[MRG] Avoid unnecessary copies in sklearn.preprocessing #13987

Merged: 3 commits merged into scikit-learn:master from rth:avoid-copy-preprocessing on Jun 1, 2019

Conversation

rth (Member) commented May 30, 2019

Partially addresses #13986

This removes the copy=True in the fit methods of StandardScaler, MinMaxScaler, MaxAbsScaler, and RobustScaler, where a copy is typically not necessary to compute the scaling factors.
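
For context, a minimal sketch of the idea (not the actual diff): computing the scaling factors only reads X, so input validation in fit can skip the copy, while transform still copies when requested because it modifies the data in place.

```python
# Minimal sketch, not the actual scikit-learn change: fit-time validation can
# use copy=False because computing statistics never writes to X.
import numpy as np
from sklearn.utils import check_array

X = np.arange(12, dtype=np.float64).reshape(4, 3)

# fit-time work: statistics only, no writes to X, so no defensive copy needed
X_fit = check_array(X, copy=False)
mean_, scale_ = X_fit.mean(axis=0), X_fit.std(axis=0)

# transform-time work: modifies the array, so copy when the user asked for it
X_t = check_array(X, copy=True)
X_t -= mean_
X_t /= scale_
```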

In practice, this makes StandardScaler().fit_transform 10%-20% faster on the few examples I have tried.
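
A rough way to reproduce that kind of measurement (not the exact benchmark used here; numbers depend on array shape and hardware):

```python
# Rough timing sketch; the gain comes from skipping one full copy of X
# during fit, so it grows with the size of the input array.
import numpy as np
from timeit import timeit
from sklearn.preprocessing import StandardScaler

X = np.random.RandomState(0).randn(100_000, 50)
t = timeit(lambda: StandardScaler().fit_transform(X), number=20)
print(f"fit_transform: {t / 20 * 1e3:.1f} ms per call")
```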

If the copy were necessary and this PR mistakenly removed it, check_transformer_general(.., readonly_memmap=True) would fail in the common tests.
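
For readers unfamiliar with that check, here is a sketch of invoking it directly, assuming the check_transformer_general(name, transformer, readonly_memmap=...) signature from the scikit-learn version at the time:

```python
# Hedged sketch: with readonly_memmap=True the estimator is fitted on
# read-only memory-mapped data, so an unwanted in-place modification during
# fit raises an error instead of passing silently.
from sklearn.preprocessing import StandardScaler
from sklearn.utils.estimator_checks import check_transformer_general

check_transformer_general("StandardScaler", StandardScaler(),
                          readonly_memmap=True)
```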

NicolasHug (Contributor) left a comment

LGTM.

Looks like in these specific cases inplace would have been a more descriptive parameter name than copy, and might have prevented this.

thomasjpfan (Member) left a comment

LGTM

thomasjpfan (Member) commented May 30, 2019

Does this need a whats_new entry as an enhancement or a bug fix?

rth (Member, Author) commented May 31, 2019

Thanks for the reviews! Added a what's new.

thomasjpfan (Member) commented May 31, 2019

QuantileTransformer has a _check_inputs that copies during both fit and transform. What do you think about adding a copy parameter to _check_inputs and setting it to False during fit and to self.copy during transform?
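
A hypothetical sketch of that suggestion (the real QuantileTransformer and its private _check_inputs helper differ; the class name and the statistic below are illustrative only):

```python
# Hypothetical sketch of the suggestion; not the real QuantileTransformer.
import numpy as np
from sklearn.utils import check_array


class SketchTransformer:
    def __init__(self, copy=True):
        self.copy = copy

    def _check_inputs(self, X, copy):
        # Validation copies only when the caller asks for it.
        return check_array(X, dtype=[np.float64, np.float32], copy=copy)

    def fit(self, X, y=None):
        X = self._check_inputs(X, copy=False)       # fit only reads X
        self.quantiles_ = np.percentile(X, [25, 50, 75], axis=0)
        return self

    def transform(self, X):
        X = self._check_inputs(X, copy=self.copy)   # transform may write to X
        X -= self.quantiles_[1]
        return X
```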

thomasjpfan merged commit 9661a64 into scikit-learn:master on Jun 1, 2019
16 checks passed:
LGTM analysis: C/C++: No code changes detected
LGTM analysis: JavaScript: No code changes detected
LGTM analysis: Python: No new or fixed alerts
ci/circleci: deploy: Your tests passed on CircleCI!
ci/circleci: doc: Your tests passed on CircleCI!
ci/circleci: doc-min-dependencies: Your tests passed on CircleCI!
ci/circleci: lint: Your tests passed on CircleCI!
codecov/patch: 100% of diff hit (target 96.8%)
codecov/project: 96.8% (+<.01%) compared to 3ed2002
scikit-learn.scikit-learn: Build #20190531.31 succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas): succeeded
scikit-learn.scikit-learn (Linux py35_np_atlas): succeeded
scikit-learn.scikit-learn (Linux pylatest_conda): succeeded
scikit-learn.scikit-learn (Windows py35_32): succeeded
scikit-learn.scikit-learn (Windows py37_64): succeeded
scikit-learn.scikit-learn (macOS pylatest_conda): succeeded
thomasjpfan (Member) commented Jun 1, 2019

Thank you @rth!

rth deleted the rth:avoid-copy-preprocessing branch on Jun 1, 2019
koenvandevelde added a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019