Performance of LogisticRegression with saga. #13316

Closed
pierreglaser opened this Issue Feb 27, 2019 · 2 comments

@pierreglaser
commented Feb 27, 2019

LogisticRegression with solver="saga" uses a thread-based backend if possible (see these lines).
However, I observed performance issues with it (potentially due to over-subscription?).

import itertools
import time

import numpy as np
from sklearn.externals.joblib import parallel_backend
from sklearn.linear_model.logistic import LogisticRegression
from sklearn.datasets import make_classification


X, y = make_classification(n_samples=100000, n_features=10, n_informative=5,
                           n_classes=10)

for backend, n_jobs in itertools.product(['threading', 'loky'], [1, 2, 4]):
    with parallel_backend(backend):
        clf = LogisticRegression(solver='saga', n_jobs=n_jobs)
        t0 = time.time()
        clf.fit(X, y)
        total_time = time.time() - t0
        print("backend: {:>9} n_jobs: {} total time: ({:.3f}, "
              " n_iter: {})".format(backend, n_jobs, total_time,
                                   np.mean(clf.n_iter_)))

yields

backend: threading n_jobs: 1 total time: (7.078,  n_iter: 14.6)
backend: threading n_jobs: 2 total time: (8.821,  n_iter: 14.8)
backend: threading n_jobs: 4 total time: (30.201,  n_iter: 14.8)
backend:      loky n_jobs: 1 total time: (7.394,  n_iter: 14.3)
backend:      loky n_jobs: 2 total time: (5.375,  n_iter: 15.0)
backend:      loky n_jobs: 4 total time: (3.994,  n_iter: 15.2)

I traced the number of iterations to make sure it is not simply a matter of convergence.
Monitoring CPU usage showed a big (~50%) proportion of system calls.
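To make the comparison independent of convergence, the reported timings can be divided by the reported mean iteration counts. The sketch below only re-uses the numbers printed above; the helper name `seconds_per_iteration` is made up for illustration, and dividing by the mean of `n_iter_` is a rough normalization rather than an exact per-iteration cost.

```python
# Timings (seconds) and mean iteration counts copied from the output above.
results = [
    ("threading", 1, 7.078, 14.6),
    ("threading", 2, 8.821, 14.8),
    ("threading", 4, 30.201, 14.8),
    ("loky", 1, 7.394, 14.3),
    ("loky", 2, 5.375, 15.0),
    ("loky", 4, 3.994, 15.2),
]


def seconds_per_iteration(rows):
    """Hypothetical helper: map (backend, n_jobs) to total time / mean n_iter."""
    return {
        (backend, n_jobs): total_time / n_iter
        for backend, n_jobs, total_time, n_iter in rows
    }


per_iter = seconds_per_iteration(results)
for (backend, n_jobs), value in per_iter.items():
    print("backend: {:>9} n_jobs: {} s/iter: {:.3f}".format(
        backend, n_jobs, value))
```

With this normalization the threading backend still gets slower per iteration as n_jobs grows, while loky gets faster, so the slowdown is not explained by the iteration count.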

ping @ogrisel @jeremiedbb

@pierreglaser

commented Mar 1, 2019

Debugging this requires a significant amount of time, given that I have almost no experience in Cython. Should we do anything in the meantime?

@pierreglaser

commented Mar 4, 2019

Some updates: the performance issue disappears if I comment out the blocks handling numerical errors:

if not skl_isfinite{{name}}(intercept[class_ind]):
    with gil:
        raise_infinite_error(n_iter)

Note that these blocks are technically never entered in my benchmarks as no error was raised during their execution. So some unexecuted code is influencing the performance of sag.

It also turns out that Cython enables branch prediction by default in its compiler options. So I suspect its prediction behavior makes each thread fight for the GIL early, before the check has finished. As this check runs at a high frequency inside a for loop, and in every thread, it generates a lot of system calls and eventually hurts performance.

#13389 gets rid of this by propagating return codes instead of raising errors.
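As a rough illustration of the return-code pattern mentioned above, here is a plain-Python sketch. It is not the code from #13389: `update_intercepts` and `fit` are hypothetical names, and Python has no `nogil` sections, so this only shows the control flow. The inner loop reports a non-finite value through its return value, and the caller raises once, outside the hot loop.

```python
import math


def update_intercepts(intercepts, gradients, step):
    """Hypothetical inner loop: return 0 on success, -1 on a non-finite value."""
    for i in range(len(intercepts)):
        intercepts[i] -= step * gradients[i]
        if not math.isfinite(intercepts[i]):
            # Report the problem to the caller instead of raising here.
            return -1
    return 0


def fit(intercepts, gradients, step=0.1, max_iter=5):
    """Hypothetical outer loop: the only place where an error is raised."""
    for n_iter in range(max_iter):
        status = update_intercepts(intercepts, gradients, step)
        if status != 0:
            raise ValueError(
                "Non-finite value encountered at iteration {}".format(n_iter + 1))
    return intercepts


print(fit([0.0, 0.0], [1.0, -1.0]))
```

In a Cython version of this idea, the inner function could presumably stay entirely free of `with gil:` blocks, with the status code checked after the GIL-free section ends.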
