feat: support classification with linear models
This corresponds to the LogisticRegression and RidgeClassifier classes from sklearn.
iamDecode committed May 22, 2021
1 parent bb2fc93 commit 9650e25
Showing 9 changed files with 475 additions and 103 deletions.
23 changes: 13 additions & 10 deletions README.md
@@ -20,19 +20,22 @@ $ pip install sklearn-pmml-model
## Status
This library is in beta, and not all models are supported yet. The following models are currently supported:

| Model | Classification | Regression | Categorical features |
|------------------------|----------------|------------|----------------------|
| [Decision Trees](sklearn_pmml_model/tree) | | | ✅<sup>1</sup> |
| [Random Forests](sklearn_pmml_model/ensemble) | | | ✅<sup>1</sup> |
| [Gradient Boosting](sklearn_pmml_model/ensemble) | | | ✅<sup>1</sup> |
| [Linear Regression](sklearn_pmml_model/linear_model) | | | |
| [Ridge](sklearn_pmml_model/linear_model) | || |
| [Lasso](sklearn_pmml_model/linear_model) | | | |
| [ElasticNet](sklearn_pmml_model/linear_model) | | | |
| [Gaussian Naive Bayes](sklearn_pmml_model/naive_bayes) | | | |
| Model | Classification | Regression | Categorical features |
|--------------------------------------------------------|----------------|------------|----------------------|
| [Decision Trees](sklearn_pmml_model/tree) || | ✅<sup>1</sup> |
| [Random Forests](sklearn_pmml_model/ensemble) || | ✅<sup>1</sup> |
| [Gradient Boosting](sklearn_pmml_model/ensemble) || | ✅<sup>1</sup> |
| [Linear Regression](sklearn_pmml_model/linear_model) | ✅<sup>2</sup> || ✅<sup>3</sup> |
| [Ridge](sklearn_pmml_model/linear_model) || | ✅<sup>3</sup> |
| [Lasso](sklearn_pmml_model/linear_model) | ✅<sup>2</sup> || ✅<sup>3</sup> |
| [ElasticNet](sklearn_pmml_model/linear_model) | ✅<sup>2</sup> || |
| [Gaussian Naive Bayes](sklearn_pmml_model/naive_bayes) || | |

<sup>1</sup> Categorical feature support using slightly modified internals, based on [scikit-learn#12866](https://github.com/scikit-learn/scikit-learn/pull/12866).

<sup>2</sup> These models differ only in how they are trained; the resulting model has the same form. Classification is supported using `PMMLLogisticRegression`.

<sup>3</sup> By one-hot encoding categorical features automatically.

---
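
Footnote 3 says categorical features are one-hot encoded automatically. As a rough illustration of what that means for a single row, here is a minimal sketch. The helper names `one_hot` and `encode_row` and the `schema` layout are hypothetical and are not part of `sklearn_pmml_model`; the field names and age brackets are taken from the PMML file added in this commit.

```python
def one_hot(value, categories):
    """Return one indicator per category, in the order given."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


def encode_row(row, schema):
    """Expand categorical fields and pass numeric fields through unchanged.

    `schema` maps a field name to its list of categories, or to None
    when the field is numeric (a hypothetical layout for this sketch).
    """
    encoded = []
    for field, categories in schema.items():
        if categories is None:
            encoded.append(float(row[field]))
        else:
            encoded.extend(one_hot(row[field], categories))
    return encoded


schema = {
    "glu": None,
    "age": ["(20,30]", "(30,40]", "(40,50]", "(50,60]", "(60,70]"],
}
print(encode_row({"glu": 148, "age": "(40,50]"}, schema))
# [148.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

The library may order or drop columns differently (for example, by treating the first level as a reference category), so this shows only the general idea.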

16 changes: 16 additions & 0 deletions models/generate_pmml.R
@@ -64,6 +64,22 @@ Pima.tr2



library(glmnet)
test = read.csv('/Users/decode/Developer/sklearn-pmml-model/models/categorical-test.csv', header=TRUE, sep=",")

clf = glm(type ~., data=test, family=binomial)
pmml_clf = pmml(clf, predicted_field = "type")
saveXML(pmml_clf, "/Users/dennis/Downloads/naive_bayes.pmml")


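# predict.glm supports type "link", "response" or "terms"; "response" gives probabilities
probs <- predict(clf, newdata = test, type = "response")
probs

# Threshold at 0.5 to turn probabilities into class labels
preds <- ifelse(probs > 0.5, "Yes", "No")
conf_matrix <- table(preds, test$type)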
conf_matrix




library(gbm)
library(r2pmml)
File renamed without changes.
92 changes: 92 additions & 0 deletions models/linear-model-lmc.pmml
@@ -0,0 +1,92 @@
<?xml version="1.0"?>
<PMML version="4.4.1" xmlns="http://www.dmg.org/PMML-4_4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_4 http://www.dmg.org/pmml/v4-4/pmml-4-4.xsd">
<Header copyright="Copyright (c) 2021 decode" description="Generalized Linear Regression Model">
<Extension name="user" value="decode" extender="SoftwareAG PMML Generator"/>
<Application name="SoftwareAG PMML Generator" version="2.4.0"/>
<Timestamp>2021-05-21 21:56:51</Timestamp>
</Header>
<DataDictionary numberOfFields="8">
<DataField name="type" optype="categorical" dataType="string">
<Value value="No"/>
<Value value="Yes"/>
</DataField>
<DataField name="npreg" optype="continuous" dataType="double"/>
<DataField name="glu" optype="continuous" dataType="double"/>
<DataField name="bp" optype="continuous" dataType="double"/>
<DataField name="skin" optype="continuous" dataType="double"/>
<DataField name="bmi" optype="continuous" dataType="double"/>
<DataField name="ped" optype="continuous" dataType="double"/>
<DataField name="age" optype="categorical" dataType="string">
<Value value="(20,30]"/>
<Value value="(30,40]"/>
<Value value="(40,50]"/>
<Value value="(50,60]"/>
<Value value="(60,70]"/>
</DataField>
</DataDictionary>
<GeneralRegressionModel modelName="General_Regression_Model" modelType="generalizedLinear" functionName="classification" algorithmName="glm" distribution="binomial" linkFunction="logit">
<MiningSchema>
<MiningField name="type" usageType="predicted" invalidValueTreatment="returnInvalid"/>
<MiningField name="npreg" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="glu" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="bp" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="skin" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="bmi" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="ped" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="age" usageType="active" invalidValueTreatment="returnInvalid"/>
</MiningSchema>
<Output>
<OutputField name="Probability_Yes" targetField="type" feature="probability" value="Yes" optype="continuous" dataType="double"/>
<OutputField name="Predicted_type" feature="predictedValue" optype="categorical" dataType="string"/>
</Output>
<ParameterList>
<Parameter name="p0" label="(Intercept)"/>
<Parameter name="p1" label="npreg"/>
<Parameter name="p2" label="glu"/>
<Parameter name="p3" label="bp"/>
<Parameter name="p4" label="skin"/>
<Parameter name="p5" label="bmi"/>
<Parameter name="p6" label="ped"/>
<Parameter name="p7" label="age(30,40]"/>
<Parameter name="p8" label="age(40,50]"/>
<Parameter name="p9" label="age(50,60]"/>
<Parameter name="p10" label="age(60,70]"/>
</ParameterList>
<FactorList>
<Predictor name="age"/>
</FactorList>
<CovariateList>
<Predictor name="npreg"/>
<Predictor name="glu"/>
<Predictor name="bp"/>
<Predictor name="skin"/>
<Predictor name="bmi"/>
<Predictor name="ped"/>
</CovariateList>
<PPMatrix>
<PPCell value="1" predictorName="npreg" parameterName="p1"/>
<PPCell value="1" predictorName="glu" parameterName="p2"/>
<PPCell value="1" predictorName="bp" parameterName="p3"/>
<PPCell value="1" predictorName="skin" parameterName="p4"/>
<PPCell value="1" predictorName="bmi" parameterName="p5"/>
<PPCell value="1" predictorName="ped" parameterName="p6"/>
<PPCell value="(30,40]" predictorName="age" parameterName="p7"/>
<PPCell value="(40,50]" predictorName="age" parameterName="p8"/>
<PPCell value="(50,60]" predictorName="age" parameterName="p9"/>
<PPCell value="(60,70]" predictorName="age" parameterName="p10"/>
</PPMatrix>
<ParamMatrix>
<PCell targetCategory="Yes" parameterName="p0" df="1" beta="-57.1799981494652"/>
<PCell targetCategory="Yes" parameterName="p1" df="1" beta="0.722654058424025"/>
<PCell targetCategory="Yes" parameterName="p2" df="1" beta="0.170651218810002"/>
<PCell targetCategory="Yes" parameterName="p3" df="1" beta="0.455725762363011"/>
<PCell targetCategory="Yes" parameterName="p4" df="1" beta="-0.473218748281948"/>
<PCell targetCategory="Yes" parameterName="p5" df="1" beta="0.275493428386101"/>
<PCell targetCategory="Yes" parameterName="p6" df="1" beta="7.40623923752118"/>
<PCell targetCategory="Yes" parameterName="p7" df="1" beta="5.6829407356491"/>
<PCell targetCategory="Yes" parameterName="p8" df="1" beta="8.82062257424644"/>
<PCell targetCategory="Yes" parameterName="p9" df="1" beta="-4.44588099376691"/>
<PCell targetCategory="Yes" parameterName="p10" df="1" beta="-26.4273990722638"/>
</ParamMatrix>
</GeneralRegressionModel>
</PMML>
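
To make the structure of this file easier to follow, here is a minimal sketch of how a `GeneralRegressionModel` with `linkFunction="logit"` can be scored by hand. It uses only the standard library and a trimmed copy of the model above (intercept, two covariates and one factor level), so the probability it prints is not the full model's prediction. The function names `load_model` and `probability_yes` are made up for this sketch, and this is not necessarily how `sklearn_pmml_model` reads the file.

```python
import math
import xml.etree.ElementTree as ET

# Trimmed copy of the model above: p0 (intercept), glu, bmi and one age level.
PMML = """<?xml version="1.0"?>
<PMML version="4.4.1" xmlns="http://www.dmg.org/PMML-4_4">
  <GeneralRegressionModel functionName="classification" linkFunction="logit">
    <FactorList><Predictor name="age"/></FactorList>
    <PPMatrix>
      <PPCell value="1" predictorName="glu" parameterName="p2"/>
      <PPCell value="1" predictorName="bmi" parameterName="p5"/>
      <PPCell value="(30,40]" predictorName="age" parameterName="p7"/>
    </PPMatrix>
    <ParamMatrix>
      <PCell targetCategory="Yes" parameterName="p0" df="1" beta="-57.1799981494652"/>
      <PCell targetCategory="Yes" parameterName="p2" df="1" beta="0.170651218810002"/>
      <PCell targetCategory="Yes" parameterName="p5" df="1" beta="0.275493428386101"/>
      <PCell targetCategory="Yes" parameterName="p7" df="1" beta="5.6829407356491"/>
    </ParamMatrix>
  </GeneralRegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_4"}


def load_model(text):
    """Return (factor names, parameter -> beta, parameter -> PPCells)."""
    model = ET.fromstring(text).find("p:GeneralRegressionModel", NS)
    factors = {e.get("name") for e in model.findall("p:FactorList/p:Predictor", NS)}
    betas = {c.get("parameterName"): float(c.get("beta"))
             for c in model.findall("p:ParamMatrix/p:PCell", NS)}
    cells = {}
    for c in model.findall("p:PPMatrix/p:PPCell", NS):
        cells.setdefault(c.get("parameterName"), []).append(
            (c.get("predictorName"), c.get("value")))
    return factors, betas, cells


def probability_yes(row, factors, betas, cells):
    """Apply the inverse logit link to the linear predictor for one row."""
    eta = 0.0
    for name, beta in betas.items():
        term = 1.0  # a parameter with no PPCell acts as the intercept
        for predictor, value in cells.get(name, []):
            if predictor in factors:
                # Factor: indicator that the row has this category.
                term *= 1.0 if row[predictor] == value else 0.0
            else:
                # Covariate: the PPCell value is the exponent (1 here).
                term *= row[predictor] ** float(value)
        eta += beta * term
    return 1.0 / (1.0 + math.exp(-eta))


factors, betas, cells = load_model(PMML)
row = {"glu": 150.0, "bmi": 30.0, "age": "(30,40]"}
print(probability_yes(row, factors, betas, cells))
```

The missing reference level `(20,30]` has no parameter of its own, so a row in that bracket simply contributes nothing for `age`.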
7 changes: 6 additions & 1 deletion sklearn_pmml_model/base.py
@@ -1,4 +1,5 @@
from sklearn.base import BaseEstimator
from sklearn.preprocessing import LabelBinarizer
from xml.etree import cElementTree as eTree
from cached_property import cached_property
from sklearn_pmml_model.datatypes import Category
@@ -228,7 +229,11 @@ def __init__(self, pmml):
         PMMLBaseEstimator.__init__(self, pmml)

         target_type: Category = get_type(self.target_field)
-        self.classes_ = np.array(target_type.categories)
+        try:
+            self.classes_ = np.array(target_type.categories)
+        except AttributeError:
+            self._label_binarizer = LabelBinarizer(pos_label=1, neg_label=-1)
+            self._label_binarizer.classes_ = np.array(target_type.categories)
         self.n_classes_ = len(self.classes_)
         self.n_outputs_ = 1
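
The `try`/`except AttributeError` in this hunk is easier to read with a small example. A plausible reason for it (an assumption on my part, not something the commit states) is that some scikit-learn classifiers, `RidgeClassifier` among them, expose `classes_` as a read-only property backed by `_label_binarizer`, so a plain assignment fails. The toy classes below are stand-ins written for this sketch and are not scikit-learn code.

```python
from types import SimpleNamespace


class PlainEstimator:
    """Toy estimator that keeps classes_ as an ordinary attribute."""


class PropertyEstimator:
    """Toy estimator whose classes_ is a read-only view of _label_binarizer."""

    @property
    def classes_(self):
        return self._label_binarizer.classes_


def assign_classes(estimator, categories):
    """Store the class labels, falling back when classes_ has no setter."""
    try:
        estimator.classes_ = list(categories)
    except AttributeError:
        # Assigning to a property without a setter raises AttributeError,
        # so fill the backing object that the property reads from.
        estimator._label_binarizer = SimpleNamespace(classes_=list(categories))
    return len(estimator.classes_)


print(assign_classes(PlainEstimator(), ["No", "Yes"]))     # 2
print(assign_classes(PropertyEstimator(), ["No", "Yes"]))  # 2
```

Either way, `len(self.classes_)` works afterwards, which is why the `n_classes_` line in the hunk can stay unchanged.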

12 changes: 10 additions & 2 deletions sklearn_pmml_model/linear_model/__init__.py
@@ -2,6 +2,14 @@
The :mod:`sklearn_pmml_model.linear_model` module implements generalized linear models.
"""

-from .implementations import PMMLLinearRegression, PMMLRidge, PMMLLasso, PMMLElasticNet
+from .implementations import PMMLLinearRegression, PMMLLogisticRegression, PMMLRidge, \
+    PMMLRidgeClassifier, PMMLLasso, PMMLElasticNet

-__all__ = ["PMMLLinearRegression", "PMMLRidge", "PMMLLasso", "PMMLElasticNet"]
+__all__ = [
+    "PMMLLinearRegression",
+    "PMMLLogisticRegression",
+    "PMMLRidge",
+    "PMMLRidgeClassifier",
+    "PMMLLasso",
+    "PMMLElasticNet"
+]
