
Fixes #13173, implements faster polynomial features for dense matrices #13290

Merged: 28 commits merged into scikit-learn:master on Jul 29, 2019
Conversation

@sdpython (Contributor) commented Feb 26, 2019

Reference Issues/PRs

Fixes #13173.

What does this implement/fix? Explain your changes.

This PR implements a faster version of polynomial features for dense matrices. Instead of computing each feature independently by iterating over all combinations, it broadcasts the multiplication by one feature vector over a whole block of columns whenever possible. The speed-up is significant (about 3 times faster) for matrices with fewer than 10,000 rows.
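A minimal sketch of the idea for degree 2 (my own illustration, not the PR's exact code; shapes and names are arbitrary): instead of computing one column product per combination of features, each input column is multiplied against a whole block of columns in a single vectorized call.

import numpy as np
from itertools import combinations_with_replacement

X = np.random.rand(1000, 10)
n = X.shape[1]

# Baseline: one column product per combination of features.
combs = list(combinations_with_replacement(range(n), 2))
XP_naive = np.empty((X.shape[0], len(combs)))
for i, comb in enumerate(combs):
    XP_naive[:, i] = X[:, comb].prod(1)

# Broadcast: column j multiplies the block of columns j..n-1 at once,
# so there are n vectorized multiplies instead of n * (n + 1) / 2.
XP_fast = np.empty_like(XP_naive)
col = 0
for j in range(n):
    block = X[:, j:j + 1] * X[:, j:]
    XP_fast[:, col:col + block.shape[1]] = block
    col += block.shape[1]

assert np.allclose(XP_naive, XP_fast)

Both loops produce the same columns in the same order; the second one simply touches memory in larger contiguous chunks.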

Any other comments?

More details about the speed-up can be found here: https://github.com/sdpython/mlinsights/blob/master/_doc/notebooks/sklearn/faster_polynomial_features.ipynb.

@agramfort agramfort added this to In progress in Sprint Paris 2019 Feb 26, 2019
@GaelVaroquaux (Member) commented Feb 27, 2019

Can you add the benchmark script you used to the benchmarks folder of the repo, and paste here the graph it outputs?

@sdpython (Contributor, Author) commented Feb 27, 2019

[Figure: benchmark results. X-axis: number of observations; Y-axis: time in seconds.]

(Times in seconds; the unlabeled first column is the row index of the benchmark table.)

      degree  interaction_only      n  nfeat  order  time_0_20_2  time_current
1          2              True      1     10      C     0.000299      0.000100
25         2              True     10     10      C     0.000399      0.000100
49         2              True    100     10      C     0.000399      0.000100
73         2              True   1000     10      C     0.000898      0.000299
97         2              True  10000     10      C     0.007676      0.006383
13         2              True      1     20      C     0.001596      0.000100
37         2              True     10     20      C     0.001396      0.000199
61         2              True    100     20      C     0.001496      0.000100
85         2              True   1000     20      C     0.004189      0.002495
109        2              True  10000     20      C     0.031014      0.031117

@ogrisel (Member) commented Feb 27, 2019

The benchmark script needs a couple of quick fixes:

benchmarks/bench_plot_polynomial_features.py:10:1: F401 'sys' imported but unused
import sys
^
benchmarks/bench_plot_polynomial_features.py:11:1: F401 'warnings' imported but unused
import warnings
^
benchmarks/bench_plot_polynomial_features.py:12:1: F401 'numbers' imported but unused
import numbers
^
benchmarks/bench_plot_polynomial_features.py:24:1: F401 'sklearn.utils.testing.ignore_warnings' imported but unused
from sklearn.utils.testing import ignore_warnings
^
benchmarks/bench_plot_polynomial_features.py:87:80: E501 line too long (94 > 79 characters)
                                                              degree, interaction_only, order)
                                                                               ^
benchmarks/bench_plot_polynomial_features.py:89:80: E501 line too long (87 > 79 characters)
                                                       degree, interaction_only, order)
                                                                               ^
benchmarks/bench_plot_polynomial_features.py:105:38: E261 at least two spaces before inline comment
                                break # stops if longer than a second
@ogrisel (Member) commented Feb 27, 2019

Can you please run a benchmark with a larger number of features? The new code is significantly more complex than the old one, so I would like to see a case where the new code is significantly faster on a workload that lasts more than 10 s.

@ogrisel (Member) commented Feb 27, 2019

Arguably, it might still be interesting to be significantly faster on a small number of samples at prediction time, to reduce single-sample prediction latency, but I am not sure that a change from 0.3 ms to 0.1 ms is important enough to justify the increase in code complexity.

@sdpython (Contributor, Author) commented Feb 28, 2019

[Two figures: benchmark results for larger numbers of features.]

Full results: https://github.com/sdpython/_benchmarks/tree/master/scikit-learn/results

You are probably right about the speed-up. However, the new version is a better starting point for anybody who needs to implement a fast-prediction runtime; that was also one of my objectives in proposing this change. I don't know how many people search for numerical recipes inside the scikit-learn code base. I do, sometimes.

@ogrisel (Member) commented Feb 28, 2019

Thank you for the additional benchmark results. Indeed, for a large number of features the difference starts to be significant. Maybe this code is a good candidate for Cythonization, but I am not saying we should wait for that to merge this PR.

I would like to hear other people's opinions on the complexity vs. performance tradeoff of this PR.

@sdpython (Contributor, Author) commented Feb 28, 2019

I just made a fix that removes one data copy (the result of np.multiply is now written in place rather than copied), and the improvement is significant even for 100,000 rows.

[Figure: benchmark after removing the extra copy.]

Basically, the implementation is almost twice as fast as the current one for matrices with 100,000 rows.
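For context, the copy in question is the temporary array that a plain multiplication allocates; the out= argument of np.multiply writes the product directly into the pre-allocated result. A toy sketch (made-up shapes, not the PR's code):

import numpy as np

X = np.random.rand(100000, 20)
XP = np.empty((X.shape[0], 40))

# Allocates a temporary for the product, then copies it into XP:
# two passes over the memory.
XP[:, 20:40] = X[:, 0:1] * X

# Writes the product straight into the destination slice:
# one pass, no intermediate array.
np.multiply(X[:, 0:1], X, out=XP[:, 20:40])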

# Transpose the matrix first to multiply contiguous memory segments.
transpose = X.size <= 1e7 and self.order == 'C'
n = X.shape[1]

@jeremiedbb (Member) commented Feb 28, 2019

When the size is larger, is transposing slower or equivalent? And when the matrix is small, does transposing bring a significant part of the speed-up?

My point is that the transpose switch makes the code twice as long, and I'm wondering if we could always do one or the other.

@sdpython (Contributor, Author) commented Feb 28, 2019

[Figure: benchmark without the transpose.]

I removed the transpose and it seemed better, so I'll keep it removed. I had added it because it helped before I removed the unnecessary copy this morning. Transposing is inefficient when order == 'F', so the only way to simplify is to remove it. Updating the PR.
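For reference, the layout effect being discussed can be measured directly: column slices are strided in a C-ordered array and contiguous in an F-ordered one, which changes how the broadcast multiply walks memory. A quick sketch (timings will vary by machine):

import numpy as np
from timeit import timeit

m, n = 100000, 20
X_c = np.random.rand(m, n)        # C order: rows are contiguous
X_f = np.asfortranarray(X_c)      # F order: columns are contiguous
out = np.empty((m, n))

for A in (X_c, X_f):
    t = timeit(lambda: np.multiply(A[:, 0:1], A, out=out), number=100)
    print('C' if A.flags['C_CONTIGUOUS'] else 'F', f'{t:.4f}s')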


@sdpython (Contributor, Author) commented Feb 28, 2019

Hi,

I created a benchmark which tests the following four functions:

# Imports added for completeness; ExtendedFeatures is assumed to come
# from the author's mlinsights package, not from scikit-learn.
from mlinsights.mlmodel import ExtendedFeatures
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def fcts_model(X, y):
    model1 = SGDClassifier()
    model2 = make_pipeline(PolynomialFeatures(), SGDClassifier())
    model3 = make_pipeline(ExtendedFeatures(kind='poly'), SGDClassifier())
    model4 = make_pipeline(ExtendedFeatures(kind='poly-slow'),
                           SGDClassifier())

    model1.fit(PolynomialFeatures().fit_transform(X), y)
    model2.fit(X, y)
    model3.fit(X, y)
    model4.fit(X, y)

    def partial_fit_model1(X, y, model=model1):
        return model.partial_fit(X, y)

    def partial_fit_model2(X, y, model=model2):
        X2 = model.steps[0][1].transform(X)
        return model.steps[1][1].partial_fit(X2, y)

    def partial_fit_model3(X, y, model=model3):
        X2 = model.steps[0][1].transform(X)
        return model.steps[1][1].partial_fit(X2, y)

    def partial_fit_model4(X, y, model=model4):
        X2 = model.steps[0][1].transform(X)
        return model.steps[1][1].partial_fit(X2, y)

    return (partial_fit_model1, partial_fit_model2,
            partial_fit_model3, partial_fit_model4)

It tests the combination of PolynomialFeatures + SGDClassifier. On the following graph: SGD = SGDClassifier only, SGD-SKL = SGD + PolynomialFeatures 0.20.2, SGD-FAST is the new implementation, SGD-SLOW is the 0.20.2 implementation copied into the bench, to make sure the comparison does not depend on which branch runs the benchmark. Is that what you expected?

[Figure: SGD benchmark comparing the four pipelines.]

@ogrisel (Member) commented Feb 28, 2019

> It tests the combination of PolynomialFeatures + SGDClassifier. On the following graph: SGD = SGDClassifier only, SGD-SKL = SGD + PolynomialFeatures 0.20.2, SGD-FAST is the new implementation, SGD-SLOW is the 0.20.2 implementation copied into the bench, to make sure the comparison does not depend on which branch runs the benchmark.

This is really confusing. Can you please rephrase?

@ogrisel (Member) left a comment

Some more comments:

sklearn/preprocessing/data.py (three review threads, since resolved)
@sdpython (Contributor, Author) commented Mar 1, 2019

I added the second benchmark to the sources:

  • SGD = SGDClassifier only
  • SGD-SKL = PolynomialFeatures from scikit-learn (whichever version is installed)
  • SGD-FAST = new implementation copy-pasted into the benchmark source file
  • SGD-SLOW = implementation from 0.20.2 copy-pasted into the benchmark source file
@GaelVaroquaux (Member) commented Mar 1, 2019

I hate to say it, but I find the benchmark code very hard to read. It is heavily factored and generic. I would personally write much simpler code, even if it repeats itself a bit more.

@ogrisel (Member) commented Apr 19, 2019

In light of the benchmark results (and the relative simplicity of the code) I am +1 for merging this change, but I don't like the complexity of the benchmark scripts either.

So I would suggest simply removing the benchmark script from this PR.

@NicolasHug (Member) commented Apr 24, 2019

The last benchmark (#13290 (comment)) seems to show that the fast version is only really much faster when the current one is already pretty fast. The improvement isn't as clear on the right side of the x-axis.

@sdpython, do you have a specific use case for this? I agree with the others that we need to make sure we are addressing an actual problem here, to justify the complexity.

@ogrisel (Member) commented May 10, 2019

One use case is decreasing the latency of individual predictions in a production deployment (the red dots in the right-hand figure).

@jeremiedbb (Member) left a comment

I just made a few comments, mostly cosmetic. Besides that, LGTM.

else:
    new_index = []
    end = index[-1]
    for feature_idx in range(0, n_features):

@jeremiedbb (Member) commented May 17, 2019

range(n_features) is enough.

for feature_idx in range(0, n_features):
    a = index[feature_idx]
    new_index.append(current_col)
    start = a

@jeremiedbb (Member) commented May 17, 2019

It seems that you don't need a; simply write start = index[feature_idx].

np.multiply(XP[:, start:end],
            X[:, feature_idx:feature_idx + 1],
            out=XP[:, current_col:next_col],
            where=True, casting='no')

@jeremiedbb (Member) commented May 17, 2019

where=True is the default, so it is not necessary.

next_col = current_col + end - start
if next_col <= current_col:
    break
np.multiply(XP[:, start:end],

@jeremiedbb (Member) commented May 17, 2019

I'm not a fan of breaking out of loops, but OK.

@@ -1,10 +1,12 @@
# coding: utf-8

@jeremiedbb (Member) commented May 17, 2019

Unrelated to this PR: I just looked at other files in sklearn and the header is sometimes there and sometimes not. It seems there's no strong convention about it.

# Authors: Alexandre Gramfort <alexandre.gramfort@inria.fr>
# Mathieu Blondel <mathieu@mblondel.org>
# Olivier Grisel <olivier.grisel@ensta.org>
# Andreas Mueller <amueller@ais.uni-bonn.de>
# Eric Martin <eric@ericmart.in>
# Giorgio Patrini <giorgio.patrini@anu.edu.au>
# Eric Chang <ericchang2017@u.northwestern.edu>
# Xavier Dupré <xadupre@microsoft.com>

@jeremiedbb (Member) commented May 17, 2019

I think these lists are not maintained any more :)

@sdpython (Contributor, Author) commented May 20, 2019

I modified the code based on the comments. The build is failing due to unrelated changes (as in other PRs).

@sdpython (Contributor, Author) commented Jun 7, 2019

Do I need to make further changes to the PR?


current_col = 1 if self.include_bias else 0
for d in range(0, self.degree):
    if d == 0:

@NicolasHug (Member) commented Jun 7, 2019

put this before the loop?

@jnothman (Member) left a comment

I think this deserves some small readability improvements. Otherwise LGTM ;)

index.append(current_col)

# d >= 1
for d in range(1, self.degree):

@jnothman (Member) commented Jun 18, 2019

Suggested change:
- for d in range(1, self.degree):
+ for _ in range(1, self.degree):
index = list(range(current_col,
                   current_col + n_features))
current_col += n_features
index.append(current_col)

@jnothman (Member) commented Jun 18, 2019

I think somewhere you should describe what the index variable is for.

for i, comb in enumerate(combinations):
    XP[:, i] = X[:, comb].prod(1)

# What follows is a faster implementation of:

@jnothman (Member) commented Jun 18, 2019

At the moment I don't find the algorithm easy to read. I think if you explicitly say "dynamic programming" it might be clearer that your algorithm is building subsequent columns from precomputed partial solutions.
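A standalone sketch of that dynamic-programming scheme may make it concrete (simplified from the PR: full polynomial expansion only, no interaction_only or bias handling; poly_features is a hypothetical name). Each degree-d block is obtained by multiplying a column of X against the tail of the degree-(d-1) block, and index records where each feature's sub-block starts:

import numpy as np
from itertools import combinations_with_replacement
from math import comb

def poly_features(X, degree):
    """Dense polynomial features built degree by degree (sketch)."""
    m, n = X.shape
    n_out = sum(comb(n + d - 1, d) for d in range(1, degree + 1))
    XP = np.empty((m, n_out))

    # Degree 1 is X itself.  index[j] marks where the degree-(d-1)
    # terms whose smallest factor is feature j begin; index[-1] is
    # the end of the degree-(d-1) block.
    XP[:, :n] = X
    index = list(range(n)) + [n]
    current_col = n

    for _ in range(1, degree):
        new_index = []
        end = index[-1]
        for j in range(n):
            start = index[j]
            new_index.append(current_col)
            next_col = current_col + end - start
            # XP[:, start:end] holds every degree-(d-1) term whose
            # factors are all >= j; multiplying by column j yields
            # every degree-d term whose smallest factor is j.
            np.multiply(XP[:, start:end], X[:, j:j + 1],
                        out=XP[:, current_col:next_col])
            current_col = next_col
        new_index.append(current_col)
        index = new_index
    return XP

# Sanity check against the straightforward per-combination loop.
X = np.random.rand(50, 4)
combs = (list(combinations_with_replacement(range(4), 1))
         + list(combinations_with_replacement(range(4), 2)))
ref = np.column_stack([X[:, c].prod(1) for c in combs])
assert np.allclose(poly_features(X, 2), ref)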

new_index = []
end = index[-1]
for feature_idx in range(n_features):
    start = index[feature_idx]

@jnothman (Member) commented Jun 18, 2019

Add a comment like: # XP[start:end] are terms of degree d - 1 that exclude feature #feature_idx

@@ -1548,6 +1548,15 @@ def transform(self, X):
        # What follows is a faster implementation of:
        # for i, comb in enumerate(combinations):
        #     XP[:, i] = X[:, comb].prod(1)
        # This new implementation uses two optimisations.

@jnothman (Member) commented Jun 27, 2019

I don't think "new" belongs in merged code :)

@jnothman (Member) commented Jul 26, 2019

Suggested change:
- # This new implementation uses two optimisations.
+ # This implementation uses two optimisations.
@rth (Member) commented Jul 25, 2019

Looks like there is a +2 or +3 for merging this PR, and most of the comments have been addressed (except for the last minor wording improvement)?

Is someone among the reviewers willing to merge it, then?

@jnothman jnothman merged commit ce869d8 into scikit-learn:master Jul 29, 2019
3 of 9 checks passed:

- scikit-learn.scikit-learn (Linux py35_conda_openblas): failed
- scikit-learn.scikit-learn (Linux py35_ubuntu_atlas): failed
- scikit-learn.scikit-learn (Linux pylatest_conda_mkl_pandas): failed
- scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit): failed
- scikit-learn.scikit-learn (macOS pylatest_conda_mkl): failed
- ci/circleci lint: queued behind running builds
- LGTM analysis (C/C++): no code changes detected
- LGTM analysis (JavaScript): no code changes detected
- LGTM analysis (Python): no new or fixed alerts
Sprint Paris 2019 automation moved this from In progress to Done Jul 29, 2019
@jnothman (Member) commented Jul 29, 2019

Thanks @sdpython!
