ENH Optimise decomposition.FastICA.fit memory footprint and runtime #22268
Conversation
It seems that one of the tests failed because of the following warning:
How do I handle this?
Nice to see you contributing, @MohamedBsh! 👋
Here are a few comments to help with this contribution.
Once you have addressed them, I think the only thing left will be benchmarks. To do this, you can:
- Start from @ogrisel's script #20558 (comment) and adapt it to %timeit and %memit the sklearn.decomposition.FastICA.fit method (see the sketch below)
- Report the result you get for the current scikit-learn implementation (i.e. on main)
- Report the result you get on this PR branch (i.e. perfImprovementFastICA)

Let me know if you need more information.
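A minimal sketch of such a benchmark, assuming an IPython session with the memory_profiler extension installed (the dataset and parameters below are only placeholders), could look like this:

from sklearn.datasets import load_digits
from sklearn.decomposition import FastICA

X, _ = load_digits(return_X_y=True)
transformer = FastICA(n_components=7, random_state=0, whiten="unit-variance")

# Runtime of the fit method
%timeit transformer.fit(X)

# Peak memory usage of the fit method
%load_ext memory_profiler
%memit transformer.fit(X)

Running it once on main and once on the PR branch gives the two sets of numbers to compare.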
Hey @jjerphan, I took the following examples from those proposed in the documentation; here are the results obtained on the main branch and on this branch.

Example 1: load_digits
Result - perfImprovementFastICA branch:
Result - main branch:

Example 2: load_breast_cancer
Result - perfImprovementFastICA branch:
Result - main branch:
Hmm, to be honest, I don't know how to fix the errors related to azure-pipelines.
Thanks for the benchmark. Can you use bigger synthetic datasets?
Azure CI runs sometimes fail randomly; the only thing we can do is restart them. To trigger the CI again, you can create an empty commit with git commit --allow-empty -m "CI Rerun CI" and push it.
Hey, sorry for the late reply. I have generated two synthetic datasets, and I also used the real-world dataset fetch_california_housing from the proposed examples. Here are the results obtained on the main branch and on this branch.

Dataset fetch_california_housing

MAIN

from sklearn.datasets import fetch_california_housing
from sklearn.decomposition import FastICA

X, _ = fetch_california_housing(return_X_y=True)
transformer = FastICA(n_components=7, random_state=0, whiten='unit-variance')

%timeit transformer.fit_transform(X)
> 35.6 ms ± 3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%load_ext memory_profiler
%memit transformer.fit_transform(X)
> peak memory: 98.03 MiB, increment: 0.08 MiB

Current branch
1st Synthetic Dataset

from sklearn.decomposition import FastICA
import sklearn.datasets as dt

rand_state = 11
noise = 0.2
X, Y = dt.make_regression(
    n_samples=50000,
    n_features=2,
    noise=noise,
    random_state=rand_state,
)
transformer = FastICA(n_components=7, random_state=0, whiten='unit-variance')

MAIN

Current branch

2nd Synthetic Dataset

from sklearn.decomposition import FastICA
import sklearn.datasets as dt

rand_state = 11
noise = 0.2
X, Y = dt.make_regression(
    n_samples=100000,
    n_features=2,
    noise=noise,
    random_state=rand_state,
)
transformer = FastICA(n_components=7, random_state=0, whiten='unit-variance')

MAIN

Current branch
The results seem to indicate a memory footprint gain, but the picture is less clear on the runtime side. Feel free to react and to reproduce these tests.
Note that for the benchmarks you have shown, n_components is small, which means np.einsum will spend more of its time parsing the string "ij,ij->i" than doing the actual computation. (A small matrix means less computation and less memory, while parsing the string takes a constant amount of time.)
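For reference, here is a small self-contained sketch of the equivalence being discussed (the matrices below are arbitrary placeholders, not the actual unmixing matrices from FastICA):

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((500, 500))
W = rng.standard_normal((500, 500))

# The np.dot route materialises the full 500 x 500 product and then keeps
# only its diagonal.
diag_dot = np.diag(np.dot(W1, W.T))

# The np.einsum route computes just the 500 row-wise dot products, i.e. the
# diagonal, without allocating the full product.
diag_einsum = np.einsum("ij,ij->i", W1, W)

assert np.allclose(diag_dot, diag_einsum)

The string parsing mentioned above is a fixed overhead of the np.einsum call, so it only pays off once the matrices are large enough.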
Given this benchmark with n_components=500:

Benchmark script
from sklearn.decomposition import FastICA
from time import perf_counter
from sklearn.datasets import make_blobs
from tqdm import trange
from statistics import mean, stdev
from sklearn.exceptions import ConvergenceWarning
import warnings
warnings.filterwarnings("ignore", category=ConvergenceWarning)
X, Y = make_blobs(n_samples=10000, n_features=500, random_state=0)
transformer = FastICA(n_components=500, random_state=0, whiten="unit-variance", max_iter=10)
n_repeat = 10
durations = []
for i in trange(n_repeat):
    start = perf_counter()
    transformer.fit(X)
    duration = perf_counter() - start
    durations.append(duration)
print(f"{mean(durations):.2f} +/- {stdev(durations):.2f}")
This PR: 3.67 +/- 0.08 seconds, and on main: 4.75 +/- 0.99 seconds.
In theory, the memory saving for the above case is 4 * (500 * 500 - 500) / 10^6 ≈ 1 MB per iteration, which reduces memory pressure. For memory profiling I used:
Memory profiling script
from sklearn.decomposition import FastICA
from time import perf_counter
from sklearn.datasets import make_blobs
from statistics import mean, stdev
from sklearn.exceptions import ConvergenceWarning
import warnings
warnings.filterwarnings("ignore", category=ConvergenceWarning)
X, Y = make_blobs(n_samples=10000, n_features=500, random_state=0)
transformer = FastICA(
    n_components=500, random_state=0, whiten="unit-variance", max_iter=10
)
transformer.fit(X)
Using scalene memory line profiling, I see that on main the original line used 4 MB, while with this PR the changed line uses 0.0 MB (I'm guessing it collapsed the ~0.002 MB).
Summary

There is a small gain in memory, and there is a runtime benefit for high values of n_components. Overall I am +1 on this.
Hi @MohamedBsh, do you still have time to work on this?
Hello @jjerphan, the benchmark of @thomasjpfan seems relevant. However, I can't reproduce the results with my example. When I run this code on main and on the branch, I get the following warnings:

from sklearn.datasets import fetch_california_housing
from sklearn.decomposition import FastICA

X, _ = fetch_california_housing(return_X_y=True)
transform = FastICA(n_components=500, random_state=0, whiten='unit-variance')

../scikit-learn/sklearn/decomposition/_fastica.py:540: UserWarning: n_components is too large: it will be set to 8

=> I can't set n_components=500 and do comparisons like @thomasjpfan: n_components is automatically set to 8 (which is equivalent to running the initial benchmark with n_components=7).

transform = FastICA(n_components=500, random_state=0, whiten='unit-variance', max_iter=10)

../scikit-learn/sklearn/decomposition/_fastica.py:116: ConvergenceWarning: FastICA did not converge. Consider increasing tolerance or the maximum number of iterations.

-> When I set max_iter=10, FastICA does not converge. Is it possible to have more information about this?

@jjerphan @thomasjpfan thank you for your time!
This is expected since the California housing dataset just has 8 dimensions. The performance improvement can only be measured with wider datasets.
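For instance, a minimal sketch of a wider synthetic dataset that avoids the capping (FastICA caps n_components at min(n_samples, n_features), which is 8 for California housing; the sizes below are only illustrative):

from sklearn.datasets import make_blobs
from sklearn.decomposition import FastICA

# 10000 samples x 500 features, so n_components=500 is not reduced.
X, _ = make_blobs(n_samples=10000, n_features=500, random_state=0)

transformer = FastICA(n_components=500, random_state=0, whiten="unit-variance")
transformer.fit(X)  # may still emit a ConvergenceWarning with the default max_iter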
LGTM to me as well once #22268 (comment) is accepted.
@MohamedBsh, please also document the performance improvement in a dedicated changelog entry in doc/whats_new/v1.1.rst.
I merged the main branch into this PR to check whether it fixes the Circle CI problem.
…log entry in doc/whats_new/v1.1.rst.
Done.
LGTM
@MohamedBsh Could you once more merge the branch main into this branch to resolve the merge conflicts (in the what's new file)? Then we're ready to merge this PR.
LGTM
…scikit-learn#22268)
* memory improvements and faster execution with np.einsum instead of np.dot
Reference Issues/PRs
Fixes #20558.
What does this implement/fix? Explain your changes.
A possible performance improvement of FastICA.
Many contributors have demonstrated with different benchmarks that there is a runtime and memory saving from using np.einsum instead of np.dot.
@jjerphan @chritter @norbusan
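As a standalone illustration of that claim (independent of FastICA itself; the matrix size below is arbitrary), a quick micro-benchmark of the two expressions could look like this:

import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
W1 = rng.standard_normal((500, 500))
W = rng.standard_normal((500, 500))

# np.dot builds the full 500 x 500 product before its diagonal is extracted...
t_dot = timeit(lambda: np.diag(np.dot(W1, W.T)), number=100)

# ...while np.einsum only computes the 500 row-wise dot products.
t_einsum = timeit(lambda: np.einsum("ij,ij->i", W1, W), number=100)

print(f"np.dot + np.diag: {t_dot:.3f}s, np.einsum: {t_einsum:.3f}s")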