
Fixes #13173, implements faster polynomial features for dense matrices #13290

Open
sdpython wants to merge 15 commits into base: master

Conversation

4 participants
@sdpython commented Feb 26, 2019

Reference Issues/PRs

Fixes #13173.

What does this implement/fix? Explain your changes.

This PR implements a faster version of polynomial features for dense matrices. Instead of computing each feature independently by looping over all combinations of input columns, it broadcasts the multiplication of one column against whole blocks of columns whenever possible. The speed-up is significant (about 3x) for matrices with fewer than 10,000 rows.

Any other comments?

More details about the speed-up can be found here: https://github.com/sdpython/mlinsights/blob/master/_doc/notebooks/sklearn/faster_polynomial_features.ipynb.
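
To make the idea concrete, here is a minimal sketch (not the code in this PR, and limited to degree-2 interaction terms without bias) contrasting the combination-by-combination loop with the broadcast-by-column approach:

import numpy as np
from itertools import combinations


def poly2_by_combinations(X):
    # one output column per pair of input columns, as the 0.20.x-style loop does
    cols = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + cols)


def poly2_by_broadcast(X):
    # multiply column i against the whole block of remaining columns at once
    blocks = [X]
    for i in range(X.shape[1] - 1):
        blocks.append(X[:, i:i + 1] * X[:, i + 1:])
    return np.column_stack(blocks)


X = np.random.rand(1000, 10)
assert np.allclose(poly2_by_combinations(X), poly2_by_broadcast(X))

Both functions produce the same columns in the same order; the second one simply replaces many small multiplications with a few larger, vectorized ones.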

@agramfort added this to In progress in Sprint Paris 2019 on Feb 26, 2019

sdpython added some commits Feb 26, 2019

@GaelVaroquaux (Member) commented Feb 27, 2019

Can you add the benchmark script that you used to the benchmarks folder of the repo and paste here the graph that it outputs?

@sdpython (Author) commented Feb 27, 2019

[benchmark plot: X-axis is the number of observations, Y-axis is the time in seconds]

     degree  interaction_only      n  nfeat order  time_0_20_2  time_current
1         2              True      1     10     C     0.000299      0.000100
25        2              True     10     10     C     0.000399      0.000100
49        2              True    100     10     C     0.000399      0.000100
73        2              True   1000     10     C     0.000898      0.000299
97        2              True  10000     10     C     0.007676      0.006383
13        2              True      1     20     C     0.001596      0.000100
37        2              True     10     20     C     0.001396      0.000199
61        2              True    100     20     C     0.001496      0.000100
85        2              True   1000     20     C     0.004189      0.002495
109       2              True  10000     20     C     0.031014      0.031117

@ogrisel (Member) commented Feb 27, 2019

The benchmark script needs a couple of quick fixes:

benchmarks/bench_plot_polynomial_features.py:10:1: F401 'sys' imported but unused
import sys
^
benchmarks/bench_plot_polynomial_features.py:11:1: F401 'warnings' imported but unused
import warnings
^
benchmarks/bench_plot_polynomial_features.py:12:1: F401 'numbers' imported but unused
import numbers
^
benchmarks/bench_plot_polynomial_features.py:24:1: F401 'sklearn.utils.testing.ignore_warnings' imported but unused
from sklearn.utils.testing import ignore_warnings
^
benchmarks/bench_plot_polynomial_features.py:87:80: E501 line too long (94 > 79 characters)
                                                              degree, interaction_only, order)
                                                                               ^
benchmarks/bench_plot_polynomial_features.py:89:80: E501 line too long (87 > 79 characters)
                                                       degree, interaction_only, order)
                                                                               ^
benchmarks/bench_plot_polynomial_features.py:105:38: E261 at least two spaces before inline comment
                                break # stops if longer than a second
@ogrisel (Member) commented Feb 27, 2019

Can you please do a benchmark with a larger number of features? The new code is significantly more complex than the old one, so I would like to see a case where the new code is significantly faster than the old one on a benchmark that lasts more than 10 s.

@ogrisel (Member) commented Feb 27, 2019

Arguably it might still be interesting to be significantly faster on a small number of samples at prediction time, to reduce single-sample prediction latency, but I am not sure that a change from 0.3 ms to 0.1 ms is important enough to justify the increase in code complexity.

sdpython added some commits Feb 27, 2019

@sdpython (Author) commented Feb 28, 2019

[benchmark plots for a larger number of features]

Full results: https://github.com/sdpython/_benchmarks/tree/master/scikit-learn/results

You are probably right about the speed-up. However, the new version is a better inspiration for anybody who needs to implement a runtime for fast predictions; that was also one of my objectives in proposing this change. I don't know whether many people search for numerical recipes inside the scikit-learn code. I do that sometimes.

@ogrisel (Member) commented Feb 28, 2019

Thank you for the additional benchmark results. Indeed, for a large number of features the difference starts to be significant. Maybe this code is a good candidate for cythonization, but I am not saying we should wait for cythonization to merge this PR.

I would like to have other people's opinions on the complexity vs. performance trade-off of this PR.

@sdpython (Author) commented Feb 28, 2019

I just made a fix that removes one data copy (the result of np.multiply is no longer copied), and the improvement is significant even for 100,000 rows.

[benchmark plot after removing the extra copy]

Basically, the implementation is almost twice as fast as the current one for matrices with 100,000 rows.

# When the matrix is small enough and C-ordered, transpose it first to multiply contiguous memory segments.
transpose = X.size <= 1e7 and self.order == 'C'
n = X.shape[1]
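
For context, the copy removal mentioned above can be illustrated with a small sketch (the helper name and layout are illustrative, not the PR's actual code): np.multiply can write its result directly into a slice of the pre-allocated output array instead of allocating a temporary block.

import numpy as np


def fill_degree2_interactions(X, XP, col_start):
    # Fill XP[:, col_start:] with all degree-2 interaction products of X,
    # writing each broadcast block in place via the `out` argument.
    n_features = X.shape[1]
    pos = col_start
    for i in range(n_features - 1):
        width = n_features - 1 - i
        np.multiply(X[:, i:i + 1], X[:, i + 1:],
                    out=XP[:, pos:pos + width])
        pos += width
    return XP

Here XP would be allocated once up front (e.g. with np.empty) so that no intermediate array is created inside the loop.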

@jeremiedbb (Contributor) commented Feb 28, 2019

When the size is larger, is transposing slower or equivalent?
And when the matrix is small, does the transposing bring a significant part of the speed-up?

My point is that the transpose switch makes the code twice as long, and I'm wondering if we could always do one or the other.

@sdpython (Author) commented Feb 28, 2019
[benchmark plot comparing the versions with and without the transpose]

I removed the transpose and it seemed better, so I'll keep it that way. I had added it because it helped before I removed one unnecessary copy this morning. The transpose is inefficient when order == 'F', so the only option to simplify is to remove it. Updating the PR.
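
As a side note, the contiguity question raised in this thread can be probed with a tiny self-contained timing sketch (illustrative only, not the PR's benchmark); the numbers will of course vary by machine:

import numpy as np
from timeit import timeit

X = np.random.rand(100_000, 20)     # C-ordered: columns are strided views
XT = np.ascontiguousarray(X.T)      # transposed copy: each row is contiguous

t_cols = timeit(lambda: X[:, 3] * X[:, 7], number=1000)
t_rows = timeit(lambda: XT[3] * XT[7], number=1000)
print("strided columns: %.4fs, contiguous rows: %.4fs" % (t_cols, t_rows))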


@sdpython (Author) commented Feb 28, 2019

Hi,

I created a benchmark that tests the following four functions:

from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
# ExtendedFeatures is the custom transformer used by the benchmark
# (kind='poly' is the new implementation, kind='poly-slow' mimics 0.20.2);
# it is not part of scikit-learn.


def fcts_model(X, y):
    model1 = SGDClassifier()
    model2 = make_pipeline(PolynomialFeatures(), SGDClassifier())
    model3 = make_pipeline(ExtendedFeatures(kind='poly'), SGDClassifier())
    model4 = make_pipeline(ExtendedFeatures(kind='poly-slow'), SGDClassifier())

    model1.fit(PolynomialFeatures().fit_transform(X), y)
    model2.fit(X, y)
    model3.fit(X, y)
    model4.fit(X, y)

    def partial_fit_model1(X, y, model=model1):
        return model.partial_fit(X, y)

    def partial_fit_model2(X, y, model=model2):
        X2 = model.steps[0][1].transform(X)
        return model.steps[1][1].partial_fit(X2, y)

    def partial_fit_model3(X, y, model=model3):
        X2 = model.steps[0][1].transform(X)
        return model.steps[1][1].partial_fit(X2, y)

    def partial_fit_model4(X, y, model=model4):
        X2 = model.steps[0][1].transform(X)
        return model.steps[1][1].partial_fit(X2, y)

    return (partial_fit_model1, partial_fit_model2,
            partial_fit_model3, partial_fit_model4)

It tests the combination of PolynomialFeatures + SGDClassifier. On the following graph: SGD = SGDClassifier only, SGD-SKL = SGD + PolynomialFeatures from 0.20.2, SGD-FAST = SGD + the new implementation, and SGD-SLOW = SGD + the 0.20.2 implementation re-implemented inside the benchmark, so that the comparison does not depend on which branch runs it. Is that what you expected?

[benchmark plot comparing SGD, SGD-SKL, SGD-FAST and SGD-SLOW]

@ogrisel (Member) commented Feb 28, 2019

It tests the combination of PolynomialFeatures + SGDClassifier. On the following graph: SGD = SGDClassifier only, SGD-SKL = SGD + PolynomialFeatures from 0.20.2, SGD-FAST = SGD + the new implementation, and SGD-SLOW = SGD + the 0.20.2 implementation re-implemented inside the benchmark, so that the comparison does not depend on which branch runs it.

This is really confusing. Can you please rephrase?

@ogrisel (Member) left a comment

Some more comments:

Two resolved review threads on sklearn/preprocessing/data.py (outdated)
if self.interaction_only:
    start = a + index[i+1] - index[i]
else:
    start = a

@ogrisel (Member) commented Feb 28, 2019

This could be simplified as:

                            new_index.append(pos)
                            start = index[i]
                            if self.interaction_only:
                                start += index[i+1] - index[i]

@sdpython (Author) commented Mar 1, 2019

I added the second benchmark to the sources:

  • SGD = SGDClassifier only
  • SGD-SKL = PolynomialFeatures from the installed scikit-learn
  • SGD-FAST = the new implementation, copy-pasted into the benchmark source file
  • SGD-SLOW = the 0.20.2 implementation, copy-pasted into the benchmark source file
@GaelVaroquaux (Member) commented Mar 1, 2019

I hate to say it, but I find the benchmark code very hard to read. It is very factored and generic. I would personally write much simpler code, even if it repeats a bit more.

@sdpython (Author) commented Mar 2, 2019

Both benchmarks I assume?

@GaelVaroquaux (Member) commented Mar 2, 2019

I had the online one (the one using partial_fit) in mind.

@sdpython (Author) commented Mar 3, 2019

I updated the second benchmark and the code. Below is what I measure, written in a different way.

from time import time

from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
# CustomPolynomialFeatures is defined in the benchmark file: kind='poly-fast'
# is the new implementation and kind='poly-slow' is the 0.20.2 one,
# both copy-pasted there.


def measure_sgd_only(X_train, y_train, Xs, ys):
    # baseline: the polynomial expansion is done up front, only SGD is timed
    model = SGDClassifier()
    model.fit(PolynomialFeatures().fit_transform(X_train), y_train)
    Xs = [PolynomialFeatures().fit_transform(x) for x in Xs]
    st = time()
    for X, y in zip(Xs, ys):
        model.partial_fit(X, y)
    end = time()
    return end - st


def measure_sgd_poly_skl(X_train, y_train, Xs, ys):
    # PolynomialFeatures from scikit-learn, transform timed inside the loop
    model = make_pipeline(PolynomialFeatures(), SGDClassifier())
    model.fit(X_train, y_train)
    st = time()
    for X, y in zip(Xs, ys):
        X2 = model.steps[0][1].transform(X)
        model.steps[1][1].partial_fit(X2, y)
    end = time()
    return end - st


def measure_sgd_poly_fast(X_train, y_train, Xs, ys):
    # new implementation of the polynomial expansion
    model = make_pipeline(CustomPolynomialFeatures(kind='poly-fast'),
                          SGDClassifier())
    model.fit(X_train, y_train)
    st = time()
    for X, y in zip(Xs, ys):
        X2 = model.steps[0][1].transform(X)
        model.steps[1][1].partial_fit(X2, y)
    end = time()
    return end - st


def measure_sgd_poly_slow(X_train, y_train, Xs, ys):
    # 0.20.2 implementation of the polynomial expansion
    model = make_pipeline(CustomPolynomialFeatures(kind='poly-slow'),
                          SGDClassifier())
    model.fit(X_train, y_train)
    st = time()
    for X, y in zip(Xs, ys):
        X2 = model.steps[0][1].transform(X)
        model.steps[1][1].partial_fit(X2, y)
    end = time()
    return end - st
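
A possible driver for these four functions (the data setup below is only an assumed example, not taken from the benchmark file) generates one classification problem, splits the remainder into mini-batches, and times each variant on the same stream:

import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=60_000, n_features=10, random_state=0)
X_train, y_train = X[:10_000], y[:10_000]
Xs = np.array_split(X[10_000:], 50)
ys = np.array_split(y[10_000:], 50)

for fct in (measure_sgd_only, measure_sgd_poly_skl,
            measure_sgd_poly_fast, measure_sgd_poly_slow):
    print(fct.__name__, "%1.3f s" % fct(X_train, y_train, Xs, ys))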
@sdpython (Author) commented Mar 3, 2019

I was looking for a way to show how the improvement behaves. The following graph plots the ratio (new execution time / execution time of 0.20.2) across all configurations. This shows how the ratio varies with the number of features and observations.

[plot of the ratio new execution time / 0.20.2 execution time across all configurations]
