Storing LinearSVC as coefficients instead of support vectors #399

Closed
HannahandRik opened this issue Nov 16, 2023 · 1 comment

Comments

@HannahandRik

I am trying to export a linear SVM to PMML. The code works, but the model is stored using the support vectors. When applied, the model returns a 0 or 1, which is of course expected behavior for an SVM. However, for local explainability I would also like to have the coefficients (or weights) of the model, and to be able to compute the "score" instead of the predicted class.

I can't figure out whether it is possible to export the linear SVM using the coefficient representation instead of the support vector representation. Some of my code is shown below to illustrate what I'm doing right now. I have also tried svm.LinearSVC, but it results in the same PMML format.

from sklearn import svm
import sklearn.pipeline
from sklearn.model_selection import GridSearchCV

linear_svm = svm.SVC(max_iter=20000, probability=True, kernel="linear")

clf_pp = sklearn.pipeline.Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("classifier", linear_svm),
    ]
)

param_grid = {
    "classifier__C": [1, 10, 100, 1000],
}

cv = GridSearchCV(clf_pp, param_grid=param_grid)
cv.fit(X_train, y_train)

This creates multiple estimators, from which I can select the best one:

best_estimator = cv.best_estimator_

And, for example, get the predicted scores (not probabilities) if I want:

pred_scores = best_estimator.decision_function(X_valid)
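For reference, here is a minimal sketch of how the "score" relates to the weights in scikit-learn itself, independent of the PMML export. It assumes a binary problem and uses synthetic data from make_classification as a stand-in, since the real X_train and y_train are not shown here. With a linear kernel, SVC exposes coef_ and intercept_, and the decision function should equal the weighted sum of the features plus the intercept.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: the real training data is not shown).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# coef_ is only available for kernel="linear".
weights = clf.coef_          # shape (1, n_features) for a binary problem
intercept = clf.intercept_   # shape (1,)

# Recompute the score by hand: w . x + b for every row of X.
manual_scores = X @ weights.ravel() + intercept[0]
library_scores = clf.decision_function(X)

# True when the hand-computed scores agree with decision_function.
scores_match = bool(np.allclose(manual_scores, library_scores))
print(scores_match)
```

In a pipeline, the same attributes would be read from the fitted classifier step, and the weights then refer to the preprocessed features rather than the raw inputs.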

I then turn it into a PMML pipeline:

from sklearn2pmml import make_pmml_pipeline, sklearn2pmml

pipeline = make_pmml_pipeline(
    best_estimator,
    target_fields=["PREDICTED_CLASS"]
)

And export it to PMML:

sklearn2pmml(
    pipeline,
    "../data/linear_svm.pmml",
    debug=True
)

Although the export works and the model can be loaded and used from it, it doesn't contain the coefficients that match the hyperplane.

A small snippet from the PMML to show what I mean:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.4">
	<Header>
		<Application name="SkLearn2PMML package" version="0.94.1"/>
		<Timestamp>2023-10-19T11:54:48Z</Timestamp>
	</Header>
	<DataDictionary>
		<DataField name="y" optype="categorical" dataType="integer">
			<Value value="0"/>
			<Value value="1"/>
		</DataField>
		<DataField name="FULL_NAME_LENGTH_DIFF" optype="continuous" dataType="double"/>
		<DataField name="FULL_NAME_LENGTH_MATCH" optype="categorical" dataType="boolean">
			<Value value="false"/>
			<Value value="true"/>
		</DataField>
	</DataDictionary>
	<SupportVectorMachineModel functionName="classification" algorithmName="sklearn.svm._classes.SVC" classificationMethod="OneAgainstOne">
		<MiningSchema>
			<MiningField name="y" usageType="target"/>
			<MiningField name="FULL_NAME_LENGTH_MATCH" missingValueReplacement="0.0" missingValueTreatment="asMode" invalidValueTreatment="asIs"/>
			<MiningField name="FULL_NAME_LENGTH_DIFF" missingValueReplacement="6.0" missingValueTreatment="asMedian"/>
		</MiningSchema>
		<LocalTransformations>
			<DerivedField name="standardScaler(FULL_NAME_LENGTH_DIFF)" optype="continuous" dataType="double">
				<Apply function="/">
					<Apply function="-">
						<FieldRef field="FULL_NAME_LENGTH_DIFF"/>
						<Constant dataType="double">7.049858585858586</Constant>
					</Apply>
					<Constant dataType="double">7.037735157109721</Constant>
				</Apply>
			</DerivedField>
		</LocalTransformations>
		<LinearKernelType/>
		<VectorDictionary>
			<VectorFields>
				<FieldRef field="standardScaler(FULL_NAME_LENGTH_DIFF)"/>
				<CategoricalPredictor name="FULL_NAME_LENGTH_MATCH" value="false" coefficient="1.0"/>
				<CategoricalPredictor name="FULL_NAME_LENGTH_MATCH" value="true" coefficient="1.0"/>
			</VectorFields>
			<VectorInstance id="22">
				<Array type="real">-0.8596314653510151 1.0 0.0</Array>
			</VectorInstance>

This goes on for a while with other support vectors.

		</VectorDictionary>
		<SupportVectorMachine targetCategory="1" alternateTargetCategory="0">
			<SupportVectors>
				<SupportVector vectorId="22"/>

The same applies here.

			</SupportVectors>
			<Coefficients absoluteValue="-1.7811373157282373E-5">
				<Coefficient value="1.0"/>

And then we get "coefficients", which seem to be one value per support vector (its "score") rather than one weight per feature.

			</Coefficients>
		</SupportVectorMachine>
	</SupportVectorMachineModel>
</PMML>

As you can see, these are not the coefficients/weights that I am looking for.
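For a linear kernel the two representations carry the same information, which the following sketch illustrates in plain Python. All numbers are made up for illustration and are not taken from the PMML above. The sketch also makes no claim about how PMML's absoluteValue attribute maps onto the intercept or its sign. It only shows that summing each support vector scaled by its per-vector coefficient yields one weight per feature, and that both forms give the same score.

```python
# Hypothetical support vectors (rows) over three input features.
support_vectors = [
    [-0.86, 1.0, 0.0],
    [0.35, 0.0, 1.0],
    [1.20, 1.0, 0.0],
]
# Hypothetical per-support-vector coefficients (one value per vector).
dual_coefficients = [1.0, -0.4, -0.6]
# Hypothetical bias term.
intercept = 0.25


def dot(a, b):
    """Plain dot product, i.e. the linear kernel."""
    return sum(x * y for x, y in zip(a, b))


def score_from_support_vectors(x):
    """Score in the support-vector form: sum_i c_i * <sv_i, x> + b."""
    kernel_sum = sum(
        c * dot(sv, x) for c, sv in zip(dual_coefficients, support_vectors)
    )
    return kernel_sum + intercept


# Collapse to one weight per feature: w_j = sum_i c_i * sv_i[j].
weights = [
    sum(c * sv[j] for c, sv in zip(dual_coefficients, support_vectors))
    for j in range(len(support_vectors[0]))
]


def score_from_weights(x):
    """Score in the coefficient form: <w, x> + b."""
    return dot(weights, x) + intercept


sample = [0.5, 0.0, 1.0]
difference = abs(score_from_support_vectors(sample) - score_from_weights(sample))
print(weights, difference)
```

This collapse only works because the linear kernel is a plain dot product. It does not carry over to RBF or polynomial kernels.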

@HannahandRik

Never mind, I tried again with the LinearSVC and now it does seem to work!
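For anyone landing here later, a rough sketch of the swap on the scikit-learn side. It uses synthetic data and a plain StandardScaler as a hypothetical stand-in for the real preprocessor, and it leaves out the PMML export step. Note that LinearSVC takes no probability argument, so that argument has to be dropped when replacing svm.SVC.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption: the real data is not shown).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf_pp = Pipeline(
    steps=[
        # Hypothetical stand-in for the real preprocessor.
        ("preprocessor", StandardScaler()),
        # LinearSVC has no `probability` parameter.
        ("classifier", LinearSVC(max_iter=20000)),
    ]
)

cv = GridSearchCV(clf_pp, param_grid={"classifier__C": [1, 10, 100]})
cv.fit(X, y)

# The weights live on the fitted classifier step of the best pipeline.
best_classifier = cv.best_estimator_.named_steps["classifier"]
weights = best_classifier.coef_        # one weight per preprocessed feature
intercept = best_classifier.intercept_

# Scores (not classes) from the whole pipeline.
scores = cv.best_estimator_.decision_function(X)
print(weights.shape, intercept.shape, scores.shape)
```

The weights refer to the features after preprocessing (here, after scaling), which matters when using them for explanations on raw inputs.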
