Storing LinearSVC as coefficients instead of support vectors #399

HannahandRik · 2023-11-16T10:57:19Z

I am trying to export a linear SVM to PMML. The code works, but the model is stored using its support vectors, and applying the model returns a 0 or 1. The latter is of course expected behavior for an SVM. However, for local explainability I would also like to have the coefficients (or weights) of the model, and to be able to compute the "score" instead of the predicted class.

I can't seem to figure out whether it is possible at all to export the linear SVM using the coefficient representation instead of the support vector representation. Some of my code is shown below to illustrate what I'm doing right now. I have also tried svm.LinearSVC, but it results in the same PMML format.

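from sklearn import svm
import sklearn.pipeline
from sklearn.model_selection import GridSearchCV
from sklearn2pmml import make_pmml_pipeline, sklearn2pmml

# preprocessor, X_train, y_train and X_valid are defined elsewhere (not shown)
linear_svm = svm.SVC(max_iter=20000, probability=True, kernel="linear")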

clf_pp = sklearn.pipeline.Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("classifier", linear_svm),
    ]
)

param_grid = {
    "classifier__C": [1, 10, 100, 1000],
}

cv = GridSearchCV(clf_pp, param_grid=param_grid)
cv.fit(X_train, y_train)

This then fits multiple estimators, and I can select the best one:

best_estimator = cv.best_estimator_

And, for example, get the predicted scores (not probabilities) if I want:

pred_scores = best_estimator.decision_function(X_valid)
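For reference, here is a minimal sketch of how those scores relate to the weights being asked about. It assumes a binary problem and uses a synthetic dataset from `make_classification` as a stand-in for the real training data, which is not shown in this post. With `kernel="linear"`, scikit-learn's `SVC` exposes `coef_` and `intercept_`, and the decision score is the weighted sum of the features plus the intercept.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary data: an assumption standing in for the real training set.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# With kernel="linear", SVC exposes the hyperplane weights and the intercept.
weights = clf.coef_          # shape (1, n_features) for a binary problem
intercept = clf.intercept_   # shape (1,)

# Recompute the decision scores by hand as w . x + b.
manual_scores = X @ weights.ravel() + intercept[0]

# Compare against the scores reported by the fitted model.
scores_match = np.allclose(manual_scores, clf.decision_function(X))
print(scores_match)
```

Inside a pipeline such as the one above, the fitted classifier would be reached through `best_estimator.named_steps["classifier"]`, and the weights apply to the preprocessed features rather than to the raw input columns.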

I then turn it into a PMML pipeline:

pipeline = make_pmml_pipeline(
    best_estimator,
    target_fields=["PREDICTED_CLASS"] 
)

And export it to PMML:

sklearn2pmml(
    pipeline, "../data/linear_svm.pmml",
    debug=True
)

Although the export works and the model can be loaded and used, it doesn't contain the coefficients that define the hyperplane.

A small snippet from the PMML to show what I mean:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.4">
	<Header>
		<Application name="SkLearn2PMML package" version="0.94.1"/>
		<Timestamp>2023-10-19T11:54:48Z</Timestamp>
	</Header>
	<DataDictionary>
		<DataField name="y" optype="categorical" dataType="integer">
			<Value value="0"/>
			<Value value="1"/>
		</DataField>
		<DataField name="FULL_NAME_LENGTH_DIFF" optype="continuous" dataType="double"/>
		<DataField name="FULL_NAME_LENGTH_MATCH" optype="categorical" dataType="boolean">
			<Value value="false"/>
			<Value value="true"/>
		</DataField>
	</DataDictionary>
	<SupportVectorMachineModel functionName="classification" algorithmName="sklearn.svm._classes.SVC" classificationMethod="OneAgainstOne">
		<MiningSchema>
			<MiningField name="y" usageType="target"/>
			<MiningField name="FULL_NAME_LENGTH_MATCH" missingValueReplacement="0.0" missingValueTreatment="asMode" invalidValueTreatment="asIs"/>
			<MiningField name="FULL_NAME_LENGTH_DIFF" missingValueReplacement="6.0" missingValueTreatment="asMedian"/>
		</MiningSchema>
		<LocalTransformations>
			<DerivedField name="standardScaler(FULL_NAME_LENGTH_DIFF)" optype="continuous" dataType="double">
				<Apply function="/">
					<Apply function="-">
						<FieldRef field="FULL_NAME_LENGTH_DIFF"/>
						<Constant dataType="double">7.049858585858586</Constant>
					</Apply>
					<Constant dataType="double">7.037735157109721</Constant>
				</Apply>
			</DerivedField>
		</LocalTransformations>
		<LinearKernelType/>
		<VectorDictionary>
			<VectorFields>
				<FieldRef field="standardScaler(FULL_NAME_LENGTH_DIFF)"/>
				<CategoricalPredictor name="FULL_NAME_LENGTH_MATCH" value="false" coefficient="1.0"/>
				<CategoricalPredictor name="FULL_NAME_LENGTH_MATCH" value="true" coefficient="1.0"/>
			</VectorFields>
			<VectorInstance id="22">
				<Array type="real">-0.8596314653510151 1.0 0.0</Array>
			</VectorInstance>

This goes on for a while with the other support vectors.

		</VectorDictionary>
		<SupportVectorMachine targetCategory="1" alternateTargetCategory="0">
			<SupportVectors>
				<SupportVector vectorId="22"/>

Similarly here.

			</SupportVectors>
			<Coefficients absoluteValue="-1.7811373157282373E-5">
				<Coefficient value="1.0"/>

And then we get the "coefficients", which seem to be the "scores" of the individual support vectors.

			</Coefficients>
		</SupportVectorMachine>
	</SupportVectorMachineModel>
</PMML>

As you can see, these are not the coefficients/weights that I am looking for.
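For a linear kernel, the two representations carry the same information: summing each support vector scaled by its coefficient gives one weight per feature. The sketch below shows that in plain Python. All numbers are illustrative and are not taken from the PMML above, the function names are hypothetical, and it assumes that `absoluteValue` plays the role of the intercept.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def collapse_to_weights(support_vectors, dual_coefs):
    """Sum coefficient * support vector to get one weight per feature."""
    weights = [0.0] * len(support_vectors[0])
    for alpha, sv in zip(dual_coefs, support_vectors):
        for j, value in enumerate(sv):
            weights[j] += alpha * value
    return weights


def score_from_support_vectors(x, support_vectors, dual_coefs, intercept):
    """Linear-kernel score written as a sum over support vectors."""
    return sum(a * dot(sv, x) for a, sv in zip(dual_coefs, support_vectors)) + intercept


def score_from_weights(x, weights, intercept):
    """The same score written as w . x + b."""
    return dot(weights, x) + intercept


# Illustrative values only (hypothetical, not the model from this issue).
support_vectors = [[-0.5, 1.0, 0.0], [1.5, 0.0, 1.0], [0.25, 1.0, 0.0]]
dual_coefs = [1.0, -0.75, -0.25]
intercept = 0.1

weights = collapse_to_weights(support_vectors, dual_coefs)
x = [0.8, 0.0, 1.0]

sv_score = score_from_support_vectors(x, support_vectors, dual_coefs, intercept)
w_score = score_from_weights(x, weights, intercept)
print(weights, sv_score, w_score)
```

Sign conventions and the order of the target categories can differ between tools, so a weight vector recovered this way from an exported file should be checked against `decision_function` before it is used for explanations.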

HannahandRik · 2023-11-16T11:02:08Z

Never mind, I tried again with the LinearSVC and now it does seem to work!

HannahandRik closed this as completed Nov 16, 2023
