
Add GaussianMixture #169

Merged
merged 45 commits on Jul 15, 2019
Changes from 21 commits
Commits
45 commits
9c3d365
Enables opset 9 for TextVectorizer
sdpython Apr 24, 2019
9152cb9
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 2, 2019
ede9829
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 3, 2019
9ff2aa5
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 6, 2019
37ee770
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 9, 2019
2a6f00f
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 10, 2019
8df0812
check random forest
sdpython May 10, 2019
1c3b3e5
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 15, 2019
7934c4a
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 20, 2019
086f1fe
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 21, 2019
fa0517e
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 22, 2019
4e2053b
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 22, 2019
eb5923d
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython May 27, 2019
50311d5
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython Jun 3, 2019
4d82c47
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython Jun 6, 2019
4f4cb03
[WIP] add GaussianMixture
sdpython Jun 6, 2019
2baf6fd
Merge branch 'master' of https://github.com/onnx/sklearn-onnx into gau
sdpython Jun 7, 2019
8d64dd8
fix gaussian mixture + function to investigate failures
sdpython Jun 7, 2019
badb269
Complete GaussianMixture
sdpython Jun 10, 2019
a72f77e
Merge branch 'master' of https://github.com/onnx/sklearn-onnx into gau
sdpython Jun 17, 2019
1bb81e0
update comments
sdpython Jun 17, 2019
784d64b
Enables opset 9 for TextVectorizer
sdpython Apr 24, 2019
0047376
check random forest
sdpython May 10, 2019
d18b90c
fix spaces
sdpython Jun 10, 2019
249b19f
update converter for TfIdf after a change of spec in onnxruntime
sdpython Jun 11, 2019
1cae401
check random forest
sdpython May 10, 2019
98e45ef
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython Jun 20, 2019
2e796a4
Delete test_SklearnGradientBoostingConverters.py
sdpython Jun 20, 2019
f7eecdd
Merge branch 'master' of https://github.com/onnx/sklearn-onnx
sdpython Jun 25, 2019
41e8145
rename ModuleFoundError into ImportError (not allowed in python3.5)
sdpython Jun 26, 2019
bbe958c
Update tests_helper.py
sdpython Jun 26, 2019
f4601b4
Update tests_helper.py
sdpython Jun 26, 2019
f79bfd2
pep8, CI
sdpython Jul 3, 2019
d9febca
CI, pep8
sdpython Jul 3, 2019
6aeca99
Update win32-conda-CI.yml
sdpython Jul 3, 2019
b638846
Update win32-conda-CI.yml
sdpython Jul 3, 2019
05fb4ce
Update win32-conda-CI.yml
sdpython Jul 3, 2019
10ee6c8
Update win32-conda-CI.yml
sdpython Jul 3, 2019
4b2f1d5
Update test_algebra_onnx_doc.py
sdpython Jul 3, 2019
7495ffb
shorten CI
sdpython Jul 3, 2019
f34c8d3
update CI
sdpython Jul 3, 2019
169c658
Merge branch 'importerror' of https://github.com/xadupre/sklearn-onnx…
sdpython Jul 3, 2019
a1559ac
Merge branch 'master' of https://github.com/xadupre/sklearn-onnx into…
sdpython Jul 3, 2019
8a647b9
support integer as input feature
sdpython Jul 8, 2019
391bd85
Merge branch 'master' of https://github.com/onnx/sklearn-onnx into gau
sdpython Jul 11, 2019
8 changes: 8 additions & 0 deletions skl2onnx/_parse.py
@@ -9,6 +9,7 @@
from sklearn import pipeline
from sklearn.base import ClassifierMixin, ClusterMixin
from sklearn.neighbors import NearestNeighbors
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC, NuSVC, SVC
from sklearn.preprocessing import FunctionTransformer
try:
@@ -110,6 +111,13 @@ def _parse_sklearn_simple_model(scope, model, inputs, custom_parsers=None):
FloatTensorType())
this_operator.outputs.append(index_variable)
this_operator.outputs.append(distance_variable)
elif type(model) == GaussianMixture:
label_variable = scope.declare_local_variable('label',
Int64TensorType())
prob_variable = scope.declare_local_variable('probabilities',
FloatTensorType())
this_operator.outputs.append(label_variable)
this_operator.outputs.append(prob_variable)
else:
# We assume that all scikit-learn operator produce a single output.
variable = scope.declare_local_variable('variable', FloatTensorType())
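
With this change the parser declares two outputs for a GaussianMixture, a label and a probabilities tensor, instead of the single default variable. The sketch below is not part of the PR; it only illustrates how this surfaces after conversion. The input name 'input' and the None batch dimension are arbitrary choices for the example, and the exact output names can vary slightly across skl2onnx versions.

# Sketch only: convert a fitted GaussianMixture and list the graph outputs.
import numpy as np
from sklearn.mixture import GaussianMixture
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X = np.random.RandomState(0).randn(100, 4).astype(np.float32)
gm = GaussianMixture(n_components=3, random_state=0).fit(X)
onx = convert_sklearn(gm, initial_types=[('input', FloatTensorType([None, 4]))])
print([o.name for o in onx.graph.output])  # expected: ['label', 'probabilities']
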
5 changes: 4 additions & 1 deletion skl2onnx/_supported_operators.py
@@ -24,6 +24,9 @@
from sklearn.linear_model import SGDRegressor
from sklearn.svm import LinearSVR

# Mixture
from sklearn.mixture import GaussianMixture

# Multi-class
from sklearn.multiclass import OneVsRestClassifier

@@ -143,7 +146,7 @@ def build_sklearn_operator_name_map():
RobustScaler, OneHotEncoder, DictVectorizer,
GenericUnivariateSelect, RFE, RFECV, SelectFdr, SelectFpr,
SelectFromModel, SelectFwe, SelectKBest, SelectPercentile,
VarianceThreshold,
VarianceThreshold, GaussianMixture
] if k is not None}
res.update({
ElasticNet: 'SklearnElasticNetRegressor',
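
The alias registered here, 'SklearnGaussianMixture', is the name under which the shape calculator and converter added below are looked up. A minimal check, shown only as a sketch since _supported_operators is a private module whose layout may change between versions:

# Sketch only: the alias map ties the scikit-learn class to its converter name.
from sklearn.mixture import GaussianMixture
from skl2onnx._supported_operators import build_sklearn_operator_name_map

assert build_sklearn_operator_name_map()[GaussianMixture] == 'SklearnGaussianMixture'
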
26 changes: 23 additions & 3 deletions skl2onnx/helpers/onnx_helper.py
@@ -45,16 +45,36 @@ def save_onnx_model(model, filename=None):
return content


def enumerate_model_node_outputs(model):
def enumerate_model_node_outputs(model, add_node=False):
"""
Enumerates all the node of a model.
Enumerates all the nodes of a model.

:param model: ONNX graph
:param add_node: if False, the function enumerates
all output names from every node; otherwise it
enumerates tuples (output name, node)
:return: enumerator
"""
if not hasattr(model, "graph"):
raise TypeError("Parameter model is not an ONNX model but "
"{}".format(type(model)))
for node in model.graph.node:
for out in node.output:
yield out
yield (out, node) if add_node else out


def enumerate_model_initializers(model, add_node=False):
"""
Enumerates all the initializers of a model.

:param model: ONNX graph
:param add_node: if False, the function enumerates
all initializer names; otherwise it
enumerates tuples (name, initializer)
:return: enumerator
"""
for node in model.graph.initializer:
yield (node.name, node) if add_node else node.name


def select_model_inputs_outputs(model, outputs=None, inputs=None):
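
The new add_node flag and the enumerate_model_initializers helper make it easier to inspect what a converter produced, likely what the "function to investigate failures" commit hints at. A usage sketch, not part of the PR, reusing the same kind of model as the earlier example:

# Sketch only: list node outputs and initializers of a converted model.
import numpy as np
from sklearn.mixture import GaussianMixture
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.helpers.onnx_helper import (
    enumerate_model_node_outputs, enumerate_model_initializers)

X = np.random.RandomState(0).randn(50, 3).astype(np.float32)
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
onx = convert_sklearn(gm, initial_types=[('input', FloatTensorType([None, 3]))])

# add_node=True yields (output name, node) pairs instead of plain names.
for name, node in enumerate_model_node_outputs(onx, add_node=True):
    print(node.op_type, '->', name)

# Initializers are the constants embedded in the graph (means, precisions, ...).
for name in enumerate_model_initializers(onx):
    print('initializer:', name)
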
2 changes: 2 additions & 0 deletions skl2onnx/operator_converters/__init__.py
@@ -17,6 +17,7 @@
from . import feature_selection
from . import flatten_op
from . import function_transformer
from . import gaussian_mixture
from . import gradient_boosting
from . import imputer_op
from . import k_bins_discretiser
@@ -53,6 +54,7 @@
feature_selection,
flatten_op,
function_transformer,
gaussian_mixture,
gradient_boosting,
imputer_op,
k_bins_discretiser,
147 changes: 147 additions & 0 deletions skl2onnx/operator_converters/gaussian_mixture.py
@@ -0,0 +1,147 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------

import numpy as np
from sklearn.mixture.gaussian_mixture import _compute_log_det_cholesky
from ..common._registration import register_converter
from ..algebra.onnx_ops import (
OnnxAdd, OnnxSub, OnnxMul, OnnxGemm, OnnxReduceSumSquare,
OnnxReduceLogSumExp, OnnxExp, OnnxArgMax, OnnxConcat
)


def convert_sklearn_gaussian_mixture(scope, operator, container):
"""
Converter for *GaussianMixture*.
Parameters which change the prediction function:

* *covariance_type*
"""
X = operator.inputs[0]
out = operator.outputs
op = operator.raw_operator
n_features = X.type.shape[1]
n_components = op.means_.shape[0]

# All comments come from scikit-learn code and tell
# which function is being onnxified.
# def _estimate_weighted_log_prob(self, X):
Contributor: Why commented code?

Collaborator (author): To remember where I found the implementation in scikit-learn.

Contributor: I think it would be better to have comments instead. That would make it clear to anyone reading the code.

Collaborator (author): I think I did (line 29). Does it need more?

# self._estimate_log_prob(X) + self._estimate_log_weights()
log_weights = np.log(op.weights_) # self._estimate_log_weights()

# self._estimate_log_prob(X)
Contributor: Commented code again?

Collaborator (author): Same reason.

log_det = _compute_log_det_cholesky(
op.precisions_cholesky_, op.covariance_type, n_features)

if op.covariance_type == 'full':
# shape(op.means_) = (n_components, n_features)
Contributor: I see a lot of commented code in this file, could you clean it up?

Collaborator (author): I prefer to leave it; it is how it is implemented in scikit-learn. I can add a comment to specify that it comes from sklearn.

# shape(op.precisions_cholesky_) =
# (n_components, n_features, n_features)

# log_prob = np.empty((n_samples, n_components))
# for k, (mu, prec_chol) in enumerate(zip(means, precisions_chol)):
# y = np.dot(X, prec_chol) - np.dot(mu, prec_chol)
# log_prob[:, k] = np.sum(np.square(y), axis=1)

ys = []
for c in range(n_components):
prec_chol = op.precisions_cholesky_[c, :, :]
cst = - np.dot(op.means_[c, :], prec_chol)
y = OnnxGemm(X, prec_chol, cst, alpha=1., beta=1.)
y2s = OnnxReduceSumSquare(y, axes=[1])
ys.append(y2s)
log_prob = OnnxConcat(*ys, axis=1)

elif op.covariance_type == 'tied':
# shape(op.means_) = (n_components, n_features)
# shape(op.precisions_cholesky_) =
# (n_features, n_features)

# log_prob = np.empty((n_samples, n_components))
# for k, mu in enumerate(means):
# y = np.dot(X, precisions_chol) - np.dot(mu, precisions_chol)
# log_prob[:, k] = np.sum(np.square(y), axis=1)

precisions_chol = op.precisions_cholesky_
ys = []
for f in range(n_components):
cst = - np.dot(op.means_[f, :], precisions_chol)
y = OnnxGemm(X, precisions_chol, cst, alpha=1., beta=1.)
y2s = OnnxReduceSumSquare(y, axes=[1])
ys.append(y2s)
log_prob = OnnxConcat(*ys, axis=1)

elif op.covariance_type == 'diag':
# shape(op.means_) = (n_components, n_features)
# shape(op.precisions_cholesky_) =
# (n_components, n_features)

# precisions = precisions_chol ** 2
# log_prob = (np.sum((means ** 2 * precisions), 1) -
# 2. * np.dot(X, (means * precisions).T) +
# np.dot(X ** 2, precisions.T))

precisions = op.precisions_cholesky_ ** 2
mp = np.sum((op.means_ ** 2 * precisions), 1)
zeros = np.zeros((n_components, ))
xmp = OnnxGemm(X, (op.means_ * precisions).T, zeros,
alpha=-2., beta=0.)
term = OnnxGemm(OnnxMul(X, X), precisions.T, zeros, alpha=1., beta=0.)
log_prob = OnnxAdd(OnnxAdd(mp, xmp), term)

elif op.covariance_type == 'spherical':
# shape(op.means_) = (n_components, n_features)
# shape(op.precisions_cholesky_) = (n_components, )

# precisions = precisions_chol ** 2
# log_prob = (np.sum(means ** 2, 1) * precisions -
# 2 * np.dot(X, means.T * precisions) +
# np.outer(row_norms(X, squared=True), precisions))

zeros = np.zeros((n_components, ))
precisions = op.precisions_cholesky_ ** 2
normX = OnnxReduceSumSquare(X, axes=[1])
outer = OnnxGemm(normX, precisions[np.newaxis, :], zeros,
alpha=1., beta=1.)
xmp = OnnxGemm(X, (op.means_.T * precisions), zeros,
alpha=-2., beta=0.)
mp = np.sum(op.means_ ** 2, 1) * precisions
log_prob = OnnxAdd(mp, OnnxAdd(xmp, outer))
else:
raise RuntimeError("Unknown op.covariance_type='{}'. Upgrade "
"to a mroe recent version of skearn-onnx "
"or raise an issue.".format(op.covariance_type))

# -.5 * (cst + log_prob) + log_det
cst = np.array([n_features * np.log(2 * np.pi)])
add = OnnxAdd(cst, log_prob)
mul = OnnxMul(add, np.array([-0.5]))
if isinstance(log_det, float):
log_det = np.array([log_det])
weighted_log_prob = OnnxAdd(OnnxAdd(mul, log_det), log_weights)

# labels
labels = OnnxArgMax(weighted_log_prob, axis=1, output_names=out[:1])

# def _estimate_log_prob_resp():
# np.exp(log_resp)
# weighted_log_prob = self._estimate_weighted_log_prob(X)
# log_prob_norm = logsumexp(weighted_log_prob, axis=1)
# with np.errstate(under='ignore'):
# log_resp = weighted_log_prob - log_prob_norm[:, np.newaxis]

log_prob_norm = OnnxReduceLogSumExp(weighted_log_prob, axes=[1])
log_resp = OnnxSub(weighted_log_prob, log_prob_norm)

# probabilities
probs = OnnxExp(log_resp, output_names=out[1:])

# final
labels.add_to(scope, container)
probs.add_to(scope, container)


register_converter('SklearnGaussianMixture', convert_sklearn_gaussian_mixture)
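
For every covariance type the converter assembles the weighted log-probability log(w_k) + log N(x | mu_k, Sigma_k), takes an ArgMax for the label, and obtains the probabilities by normalizing with ReduceLogSumExp followed by Exp. The NumPy sketch below is not part of the PR; it mirrors the 'full' branch (one Gemm plus ReduceSumSquare per component, then a Concat) and checks the result against scikit-learn, assuming a model fitted with covariance_type='full'.

# Sketch only: NumPy equivalent of the 'full' covariance branch above.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
gm = GaussianMixture(n_components=2, covariance_type='full', random_state=0).fit(X)

n_features = X.shape[1]
log_weights = np.log(gm.weights_)
# Same quantity as _compute_log_det_cholesky: sum of the log of the diagonal
# of each precision Cholesky factor.
log_det = np.sum(np.log(gm.precisions_cholesky_.reshape(gm.n_components, -1)
                        [:, ::n_features + 1]), axis=1)

cols = []
for c in range(gm.n_components):
    prec_chol = gm.precisions_cholesky_[c]
    # Gemm(X, prec_chol, -means[c] @ prec_chol) followed by ReduceSumSquare.
    y = X @ prec_chol - gm.means_[c] @ prec_chol
    cols.append(np.sum(y ** 2, axis=1, keepdims=True))
log_prob = np.concatenate(cols, axis=1)

weighted = -0.5 * (n_features * np.log(2 * np.pi) + log_prob) + log_det + log_weights
resp = np.exp(weighted - np.log(np.exp(weighted).sum(axis=1, keepdims=True)))

assert np.array_equal(weighted.argmax(axis=1), gm.predict(X))
assert np.allclose(resp, gm.predict_proba(X), atol=1e-6)
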
2 changes: 2 additions & 0 deletions skl2onnx/shape_calculators/__init__.py
@@ -18,6 +18,7 @@
from . import label_encoder
from . import linear_classifier
from . import linear_regressor
from . import mixture
from . import nearest_neighbours
from . import one_hot_encoder
from . import one_vs_rest_classifier
@@ -43,6 +44,7 @@
label_encoder,
linear_classifier,
linear_regressor,
mixture,
nearest_neighbours,
one_hot_encoder,
one_vs_rest_classifier,
30 changes: 30 additions & 0 deletions skl2onnx/shape_calculators/mixture.py
@@ -0,0 +1,30 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------

from ..common._registration import register_shape_calculator
from ..common.data_types import FloatTensorType, Int64TensorType
from ..common.utils import (
check_input_and_output_numbers,
check_input_and_output_types
)


def calculate_gaussian_mixture_output_shapes(operator):
check_input_and_output_numbers(operator, input_count_range=1,
output_count_range=2)
check_input_and_output_types(operator, good_input_types=[FloatTensorType])
Contributor: Why is int not allowed as an input type? Scikit allows int features.

Collaborator (author): I hesitated. Statistically, it makes no sense to fit a Gaussian mixture on integer data, as it cannot be Gaussian. I'll fix it.


if len(operator.inputs[0].type.shape) != 2:
raise RuntimeError('Input must be a [N, C]-tensor')

op = operator.raw_operator
N = operator.inputs[0].type.shape[0]
operator.outputs[0].type = Int64TensorType([N, 1])
operator.outputs[1].type = FloatTensorType([N, op.n_components])


register_shape_calculator('SklearnGaussianMixture',
calculate_gaussian_mixture_output_shapes)
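
End to end, the shape calculator and converter registered above produce an ONNX model whose first output is the predicted component and whose second output matches predict_proba. The sketch below is not part of the PR; small numerical differences are expected because the graph runs in single precision, and recent onnxruntime versions may require an explicit providers argument when creating the session.

# Sketch only: convert, run with onnxruntime and compare with scikit-learn.
import numpy as np
import onnxruntime as rt
from sklearn.mixture import GaussianMixture
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X = np.random.RandomState(0).randn(200, 4).astype(np.float32)
gm = GaussianMixture(n_components=3, covariance_type='diag', random_state=0).fit(X)

onx = convert_sklearn(gm, initial_types=[('input', FloatTensorType([None, 4]))])
sess = rt.InferenceSession(onx.SerializeToString())
labels, probs = sess.run(None, {'input': X})

assert np.array_equal(labels.ravel(), gm.predict(X))       # first output: label
assert np.allclose(probs, gm.predict_proba(X), atol=1e-4)  # second output: probabilities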