Conversation

Contributor

@MohamedBsh MohamedBsh commented Jan 22, 2022

Reference Issues/PRs

Fixes #20558.

What does this implement/fix? Explain your changes.

Possible performance improvement of FastICA.

Several contributors have demonstrated, with different benchmarks, that using np.einsum instead of np.dot can save both runtime and memory.
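
For illustration, here is a minimal sketch (not the exact PR diff; the shapes are arbitrary) of the kind of replacement involved: computing the row-wise inner products of two matrices directly instead of materialising the full product matrix and keeping only its diagonal.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((7, 7))
W = rng.standard_normal((7, 7))

# Before: allocates the full (n, n) product only to keep its diagonal.
rowwise_dot = np.diag(np.dot(W1, W.T))

# After: computes only the n row-wise inner products.
rowwise_einsum = np.einsum("ij,ij->i", W1, W)

assert np.allclose(rowwise_dot, rowwise_einsum)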

@jjerphan @chritter @norbusan

@MohamedBsh
Contributor Author

It seems that one of the tests failed because of the following warning:

sklearn/decomposition/_fastica.py:12:1: F401 'email.mime.base' imported but unused
from email.mime import base

How do I handle this?

Member

@jjerphan jjerphan left a comment

Nice to see you contributing, @MohamedBsh! 👋

Here are a few comments to help with this contribution.

Once you have addressed them, I think the only thing left will be the benchmarks.

To do this, you can:

  • Start from @ogrisel's script #20558 (comment) and adapt it to %timeit and %memit the sklearn.decomposition.FastICA.fit method
  • Report the result you get for scikit-learn current implementation (i.e. on main)
  • Report the result you get on this PR branch (i.e. perfImprovementFastICA)

Let me know if you need more information.
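
For reference, a minimal sketch of such a benchmarking session (assuming an IPython shell with memory_profiler installed; the dataset and parameters below are only placeholders):

from sklearn.datasets import load_digits
from sklearn.decomposition import FastICA

X, _ = load_digits(return_X_y=True)
transformer = FastICA(n_components=7, random_state=0, whiten="unit-variance")

# Run once on main and once on the PR branch, then compare the numbers.
%load_ext memory_profiler
%timeit transformer.fit(X)  # runtime
%memit transformer.fit(X)   # peak memory and increment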

@jjerphan jjerphan changed the title memory improvements and fast execution with np.einsum than np.dot FIX Optimise decomposition.FastICA.fit memory footprint and runtime Jan 22, 2022
@jjerphan jjerphan changed the title FIX Optimise decomposition.FastICA.fit memory footprint and runtime ENH Optimise decomposition.FastICA.fit memory footprint and runtime Jan 22, 2022
@MohamedBsh
Contributor Author

MohamedBsh commented Jan 22, 2022

Branch: perfImprovementFastICA

Hey @jjerphan, I ran the following examples taken from the documentation; here are the results comparing the main branch and this branch:

Example 1 : load_digits

from sklearn.datasets import load_digits
from sklearn.decomposition import FastICA
X, _ = load_digits(return_X_y=True)
transformer = FastICA(n_components=7, random_state=0, whiten='unit-variance')

Result - perfImprovementFastICA branch:

%timeit transformer.fit_transform(X)

> 156 ms ± 8.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%memit transformer.fit_transform(X)

> peak memory: 96.81 MiB, increment: -2.62 MiB

Result - main branch:

%timeit transformer.fit_transform(X)

> 148 ms ± 9.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%memit transformer.fit_transform(X)

> peak memory: 104.30 MiB, increment: 0.09 MiB

Example 2 : load_breast_cancer

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import FastICA
X, _ = load_breast_cancer(return_X_y=True)
transformer = FastICA(n_components=7, random_state=0, whiten='unit-variance')

Result - perfImprovementFastICA branch:

%timeit transformer.fit_transform(X)

> 7.86 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%memit transformer.fit_transform(X)

> peak memory: 97.33 MiB, increment: 1.51 MiB

Result - main branch:

%timeit transformer.fit_transform(X)

> 7.78 ms ± 1.77 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%memit transformer.fit_transform(X)

> peak memory: 97.76 MiB, increment: 0.18 MiB

@MohamedBsh
Contributor Author

Hmm, to be honest, I don't know how to fix the errors related to azure-pipelines:

Build log #L13
The HTTP request timed out after 00:00:30.

@jjerphan
Member

Thanks for the benchmark. Can you use bigger synthetic datasets?

Azure CI runs sometimes fail randomly. The only thing we can do is restart them.

To trigger the CI again, you can create an empty commit:

git commit --allow-empty -m "CI Rerun CI"

and push it.

@MohamedBsh
Contributor Author

MohamedBsh commented Jan 26, 2022

Hey, sorry for the late reply.

I have generated two synthetic datasets and also benchmarked the real-world dataset fetch_california_housing from the documentation examples.

Here are the results obtained between the main branch and this branch:

Dataset fetch_california_housing
Samples total: 20640 rows

MAIN

from sklearn.datasets import fetch_california_housing
from sklearn.decomposition import FastICA
X, _ = fetch_california_housing(return_X_y=True)
transformer = FastICA(n_components=7, random_state=0, whiten='unit-variance')

%timeit transformer.fit_transform(X)

> 35.6 ms ± 3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%load_ext memory_profiler
%memit transformer.fit_transform(X)

> peak memory: 98.03 MiB, increment: 0.08 MiB

Current branch

%timeit transformer.fit_transform(X)

> 36.6 ms ± 3.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%memit transformer.fit_transform(X)

> peak memory: 93.72 MiB, increment: 0.14 MiB

1st Synthetic Dataset
n_samples = 50000

from sklearn.decomposition import FastICA
import sklearn.datasets as dt
rand_state = 11
noise = 0.2
X,Y = dt.make_regression(n_samples=50000,
                             n_features=2,
                             noise=noise,
                             random_state=rand_state)
transformer = FastICA(n_components=7, random_state=0, whiten='unit-variance')

MAIN

%timeit transformer.fit_transform(X)

> 35.5 ms ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%memit transformer.fit_transform(X)

> peak memory: 89.88 MiB, increment: -0.24 MiB

Current branch

%timeit transformer.fit_transform(X)

> 33.7 ms ± 7.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%memit transformer.fit_transform(X)

> peak memory: 103.87 MiB, increment: 0.18 MiB

2nd Synthetic Dataset
n_samples = 100000

from sklearn.decomposition import FastICA
import sklearn.datasets as dt
rand_state = 11
noise = 0.2
X,Y = dt.make_regression(n_samples=100000,
                             n_features=2,
                             noise=noise,
                             random_state=rand_state)
transformer = FastICA(n_components=7, random_state=0, whiten='unit-variance')

MAIN

%timeit transformer.fit_transform(X)

> 35.4 ms ± 5.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%memit transformer.fit_transform(X)

> peak memory: 113.36 MiB, increment: 0.15 MiB

Current branch

%timeit transformer.fit_transform(X)

> 46.5 ms ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%memit transformer.fit_transform(X)

> peak memory: 97.33 MiB, increment: -0.48 MiB

The results seem to indicate a memory footprint gain, but the picture on the runtime side is rather unclear. Feel free to react and to reproduce these tests.

Member

@thomasjpfan thomasjpfan left a comment

Note: for the benchmarks you showed, n_components is small, which means np.einsum will spend more of its time parsing the string "ij,ij->i" than on the actual computation. (A small matrix means less computation and less memory, while parsing the string is a constant cost.)
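
To see this effect in isolation, a minimal micro-benchmark (not from the PR; the shapes are arbitrary and diag(dot) stands in for the expression being replaced) might look like:

import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
for n in (7, 500):  # small vs. large n_components
    W1 = rng.standard_normal((n, n))
    W = rng.standard_normal((n, n))
    t_dot = timeit(lambda: np.diag(np.dot(W1, W.T)), number=200)
    t_einsum = timeit(lambda: np.einsum("ij,ij->i", W1, W), number=200)
    print(f"n={n}: diag(dot) {t_dot:.4f}s, einsum {t_einsum:.4f}s")

For small n the constant einsum overhead is expected to dominate, while for large n the reduced computation and allocation should win.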

Given this benchmark with n_components=500

Benchmark script
from sklearn.decomposition import FastICA
from time import perf_counter
from sklearn.datasets import make_blobs
from tqdm import trange
from statistics import mean, stdev
from sklearn.exceptions import ConvergenceWarning
import warnings

warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, Y = make_blobs(n_samples=10000, n_features=500, random_state=0)

transformer = FastICA(n_components=500, random_state=0, whiten="unit-variance", max_iter=10)

n_repeat = 10
durations = []

for i in trange(n_repeat):
    start = perf_counter()
    transformer.fit(X)
    duration = perf_counter() - start
    durations.append(duration)

print(f"{mean(durations):.2f} +/- {stdev(durations):.2f}")

This PR: 3.67 +/- 0.08, and on main: 4.75 +/- 0.99

In theory, the memory saving for the above case is 4 * (500 * 500 - 500) / 10^6 ≈ 1 MB per iteration, which reduces memory pressure. For memory profiling I used:

Memory profiling script
from sklearn.decomposition import FastICA
from time import perf_counter
from sklearn.datasets import make_blobs
from statistics import mean, stdev

from sklearn.exceptions import ConvergenceWarning
import warnings

warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, Y = make_blobs(n_samples=10000, n_features=500, random_state=0)

transformer = FastICA(
    n_components=500, random_state=0, whiten="unit-variance", max_iter=10
)

transformer.fit(X)

Using scalene line-by-line memory profiling, I see that on main the original line used 4 MB, while with this PR the changed line uses 0.0 MB (I'm guessing the actual ~0.002 MB collapsed to zero in the display).

Summary

There is a small gain in memory, and there is a runtime benefit for high values of n_components. Overall I am +1 on this.

@jjerphan
Member

jjerphan commented Mar 4, 2022

Hi @MohamedBsh, do you still have time to work on this?

@MohamedBsh
Contributor Author

MohamedBsh commented Mar 26, 2022

Hello @jjerphan, @thomasjpfan's benchmark seems relevant. However, I can't reproduce the results with my example. When I run this code on main or on this branch, I get the following warnings:

from sklearn.datasets import fetch_california_housing
from sklearn.decomposition import FastICA
X, _ = fetch_california_housing(return_X_y=True)
transform = FastICA(n_components=500, random_state=0, whiten='unit-variance')
../scikit-learn/sklearn/decomposition/_fastica.py:540: UserWarning: n_components is too large: it will be set to 8

=> I can't set n_components=500 and do comparisons like @thomasjpfan. n_components is automatically set to 8 (this is equivalent to doing the initial benchmark with n_components=7).

transform = FastICA(n_components=500, random_state=0, whiten='unit-variance', max_iter=10)
../scikit-learn/sklearn/decomposition/_fastica.py:116: ConvergenceWarning: FastICA did not converge. Consider increasing tolerance or the maximum number of iterations.

-> When I set max_iter=10, FastICA does not converge.

Is it possible to have more information about this?

@jjerphan @thomasjpfan thank you for your time!

@ogrisel
Member

ogrisel commented Apr 5, 2022

=> I can't set n_components=500 and do comparisons like @thomasjpfan. n_components is automatically set to 8 (this is equivalent to doing the initial benchmark with n_components=7).

This is expected since the California housing dataset only has 8 features. The performance improvement can only be measured on wider datasets.
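
For example, a minimal sketch (the parameters are only placeholders) of a synthetic dataset wide enough that n_components=500 is not silently reduced:

from sklearn.datasets import make_blobs
from sklearn.decomposition import FastICA

# 500 features, so FastICA can actually keep n_components=500.
X, _ = make_blobs(n_samples=10_000, n_features=500, random_state=0)
transformer = FastICA(n_components=500, random_state=0,
                      whiten="unit-variance", max_iter=10)
transformer.fit(X)  # expect a ConvergenceWarning with max_iter=10; fine for timing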

Member

@ogrisel ogrisel left a comment

LGTM as well once #22268 (comment) is accepted.

@MohamedBsh please also document the performance improvement in a dedicated changelog entry in doc/whats_new/v1.1.rst.

@ogrisel
Member

ogrisel commented Apr 5, 2022

I merged the main branch into this PR to check whether it fixes the Circle CI problem.

@MohamedBsh
Contributor Author

Done.
Thank you for your time and your explanations @ogrisel @lorentzenchr @jjerphan @thomasjpfan.
I am +1 on this.

Member

@lorentzenchr lorentzenchr left a comment

LGTM

@lorentzenchr
Member

@MohamedBsh Could you merge the main branch into this branch once more to resolve the merge conflicts (in the what's new file)? Then we're ready to merge this PR.

Member

@jjerphan jjerphan left a comment

LGTM

@lorentzenchr lorentzenchr merged commit 8308646 into scikit-learn:main Jun 14, 2022