Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sag handle numerical error outside of cython #13389

Conversation

@pierreglaser
Copy link
Contributor

@pierreglaser pierreglaser commented Mar 4, 2019

Closes #13316

Cython's gcc branch prediction directive make sag unnecessarily fight for the GIL at a high frequency in case an error needs to be raised. This severely impacts the performance of parallel sag calls using a joblib "threading" backend.

I propose to handle numerical error "C-style", by propagating return codes and breaking out of for-loops. An error is raised once and for all after sag, in sag_solver.

Note that I need to manually break out of for-loops and not simply return -1 after the check, as returning requires the GIL, so the same problem would happen.

@jeremiedbb followed the issue during last sprint.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

looks good.

just a couple of minor comments.

Also can you run the benchmark #13316 with this change ?

doc/whats_new/v0.21.rst Show resolved Hide resolved
sklearn/linear_model/sag_fast.pyx.tp Outdated Show resolved Hide resolved
@pierreglaser
Copy link
Contributor Author

@pierreglaser pierreglaser commented Mar 4, 2019

Running the same #13316 benchmark, I now get:

backend: threading n_jobs: 1 total time: (6.028,  n_iter: 15.0)
backend: threading n_jobs: 2 total time: (3.815,  n_iter: 15.6)
backend: threading n_jobs: 4 total time: (3.018,  n_iter: 15.8)
backend:      loky n_jobs: 1 total time: (6.448,  n_iter: 15.3)
backend:      loky n_jobs: 2 total time: (4.532,  n_iter: 15.4)
backend:      loky n_jobs: 4 total time: (3.255,  n_iter: 15.3)
@TomDLT
TomDLT approved these changes Mar 4, 2019
Copy link
Member

@TomDLT TomDLT left a comment

LGTM

@massich
massich approved these changes Mar 5, 2019
Copy link
Contributor

@massich massich left a comment

LGTM

sklearn/linear_model/sag.py Outdated Show resolved Hide resolved
Copy link
Member

@ogrisel ogrisel left a comment

This is a really weird performance bug. Have you tried to re-run your benchmark with this PR to confirm that it fixes the performance issue observed with the threading backend?

Can you try to compile this extension without branch prediction to confirm that the GIL acquisition is caused by this as suggested in #13316 (comment)? That sounds really weird to me.

sklearn/linear_model/tests/test_sag.py Outdated Show resolved Hide resolved
@jeremiedbb
Copy link
Member

@jeremiedbb jeremiedbb commented Mar 5, 2019

This is a really weird performance bug. Have you tried to re-run your benchmark with this PR to confirm that it fixes the performance issue observed with the threading backend?

just look a bit above :)

@jeremiedbb
Copy link
Member

@jeremiedbb jeremiedbb commented Mar 5, 2019

Can you try to compile this extension without branch prediction to confirm that the GIL acquisition is caused by this as suggested in #13316 (comment)? That sounds really weird to me.

This is strange indeed but if you comment the

if something_not_finite:
    with gil:
        raise error

bloks, the issue disappears, although the condition is never met (because no error is raised)

@pierreglaser
Copy link
Contributor Author

@pierreglaser pierreglaser commented Mar 5, 2019

@ogrisel I tried disabling Cython.Compiler.Options.gcc_branch_hints using current master but that did not improve performance.
However, not trying to aquire the gil in such setting is not unusual: in sgd_fast.pyx, the same "trick" is done:

# floating-point under-/overflow check.
if (not skl_isfinite(intercept)
or any_nonfinite(<double *>weights.data, n_features)):
infinity = True
break

then further along the way:

if infinity:
raise ValueError(("Floating-point under-/overflow occurred at epoch"
" #%d. Scaling input data with StandardScaler or"
" MinMaxScaler might help.") % (epoch + 1))

@jeremiedbb
Copy link
Member

@jeremiedbb jeremiedbb commented Mar 5, 2019

I tried disabling Cython.Compiler.Options.gcc_branch_hints

these are just hints but I think it does not disable branch prediction. Branch prediction is done by the cpu. I can't see how to disable it.

@ogrisel
ogrisel approved these changes Mar 6, 2019
Copy link
Member

@ogrisel ogrisel left a comment

nitpick but otherwise looks good to me!

I suppose we might want to set prefer="threads" for LogisticRegressionCV when the saga solver is used but this should probably be done elsewhere.

sklearn/linear_model/sag_fast.pyx.tp Outdated Show resolved Hide resolved
sklearn/linear_model/sag_fast.pyx.tp Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_sag.py Show resolved Hide resolved
@ogrisel ogrisel merged commit 0f0799a into scikit-learn:master Mar 6, 2019
16 checks passed
16 checks passed
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python No new or fixed alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
@codecov
codecov/patch 100% of diff hit (target 92.36%)
Details
@codecov
codecov/project 92.36% (+<.01%) compared to a3ccbe9
Details
@azure-pipelines
scikit-learn.scikit-learn Build #20190306.9 succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Linux py35_np_atlas) Linux py35_np_atlas succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Linux pylatest_conda) Linux pylatest_conda succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Windows py35_32) Windows py35_32 succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Windows py37_64) Windows py37_64 succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (macOS pylatest_conda) macOS pylatest_conda succeeded
Details
@wamuir
Copy link

@wamuir wamuir commented Mar 10, 2019

So much better. Thank you @pierreglaser

xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019
xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019
xhlulu added a commit to xhlulu/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde added a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants