MNT Removed `_safe_accumulator_op` for first-pass algorithm in `_assert_all_finite` #23446

Micky774 · 2022-05-23T21:09:31Z

Reference Issues/PRs

Follow-up to #23347
Related to #23197
Specifically addresses #23197 (comment)

What does this implement/fix? Explain your changes.

Removes `_safe_accumulator_op` from the first-pass check in `_assert_all_finite`, since it is not needed in the average case and can be a significant bottleneck. Even if the first pass reports a false positive in the rare (and as yet untested) case, the second-pass algorithm will resolve it explicitly.
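To illustrate the reasoning above, here is a minimal sketch of a two-pass finiteness check. It is not the actual body of `_assert_all_finite`: the function name `assert_all_finite_sketch` and the error message are hypothetical, and the first pass is assumed to be a plain `np.sum` in the array's own dtype.

```python
import numpy as np


def assert_all_finite_sketch(X):
    """Hypothetical two-pass check; not scikit-learn's actual implementation."""
    X = np.asarray(X)

    # First pass: a single reduction in the array's own dtype.
    # A NaN or inf element always makes the sum non-finite, so a finite
    # sum proves that every element is finite.
    with np.errstate(over="ignore", invalid="ignore"):
        first_pass_ok = np.isfinite(np.sum(X))
    if first_pass_ok:
        return

    # Second pass: the sum may have overflowed (e.g. in float32) even though
    # every element is finite, so check element-wise before raising.
    if not np.isfinite(X).all():
        raise ValueError("Input contains NaN or infinity.")


# All elements are finite, but their float32 sum overflows to inf:
# the first pass gives a false positive and the second pass clears it.
big = np.full(4, np.finfo(np.float32).max, dtype=np.float32)
assert_all_finite_sketch(big)
```

The design point is that dropping the wider accumulator can only make the first pass more pessimistic, never less, so correctness is preserved by the element-wise fallback.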

Any other comments?

For profiling info refer to: #23197 (comment)

ogrisel

LGTM:

on main:

In [1]: import numpy as np
   ...: from sklearn.utils.validation import _assert_all_finite
   ...: a = np.random.RandomState(0).randn(int(1e8)).astype(np.float32)
   ...: %timeit _assert_all_finite(a)
54.9 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

on this branch:

In [1]: import numpy as np
   ...: from sklearn.utils.validation import _assert_all_finite
   ...: a = np.random.RandomState(0).randn(int(1e8)).astype(np.float32)
   ...: %timeit _assert_all_finite(a)
28.4 ms ± 39.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
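For readers without IPython, the difference can be approximated with the standard `timeit` module. This sketch rests on an assumption: that for float32 input `_safe_accumulator_op` behaves roughly like passing `dtype=np.float64` to the reduction. The helper names `first_pass_native` and `first_pass_float64` are hypothetical, and the array is smaller than the one timed above so the script finishes quickly.

```python
import timeit

import numpy as np

# Smaller array than in the timings above, to keep the sketch fast.
a = np.random.RandomState(0).randn(int(1e6)).astype(np.float32)


def first_pass_native(x):
    # Sum in the array's own dtype (float32 here).
    return np.isfinite(np.sum(x))


def first_pass_float64(x):
    # Assumed stand-in for the safe accumulator: sum with a float64 accumulator.
    return np.isfinite(np.sum(x, dtype=np.float64))


t_native = min(timeit.repeat(lambda: first_pass_native(a), number=10, repeat=5))
t_float64 = min(timeit.repeat(lambda: first_pass_float64(a), number=10, repeat=5))

print(f"native dtype sum:  {t_native / 10 * 1e3:.3f} ms per call")
print(f"float64 accum sum: {t_float64 / 10 * 1e3:.3f} ms per call")
```

Absolute numbers depend on the machine and the NumPy build, so this only indicates the direction of the difference, not the exact ratio reported above.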

sklearn/utils/validation.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ogrisel · 2022-05-24T17:08:04Z

Not sure if we should document this in the changelog for 1.2. Maybe we should have something like "Reduce the overhead of finiteness checks for float32 input data by leveraging numpy's SIMD optimized primitives." or something similar.

Micky774 · 2022-05-24T18:44:09Z

> Not sure if we should document this in the changelog for 1.2. Maybe we should have something like "Reduce the overhead of finiteness checks for float32 input data by leveraging numpy's SIMD optimized primitives." or something similar.

I figured this was a small enough change, far enough removed from what users actually interact with, that it would be fine to omit a changelog entry. If you/other reviewers think mentioning the performance gain would be worthwhile I will of course add an entry :)

thomasjpfan · 2022-05-24T19:34:08Z

> If you/other reviewers think mentioning the performance gain would be worthwhile I will of course add an entry :)

I think it's worth adding a changelog entry for performance improvements.

Micky774 · 2022-05-24T23:34:52Z

> I think it's worth adding a changelog entry for performance improvements.

Added!

doc/whats_new/v1.2.rst

thomasjpfan

LGTM

Removed safe_accumulator_op for first-pass algorithm

c646046

github-actions bot added the module:utils label May 23, 2022

Micky774 changed the title ~~Removed safe_accumulator_op for first-pass algorithm~~ MAINT Removed _safe_accumulator_op for first-pass algorithm in _assert_all_finite May 23, 2022

Merge branch 'main' into remove_safe_accumulator

cd25ed7

ogrisel approved these changes May 24, 2022

sklearn/utils/validation.py (outdated, resolved)

Micky774 and others added 2 commits May 24, 2022 13:04

Update sklearn/utils/validation.py

2688cdf

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Merge branch 'main' into remove_safe_accumulator

77bb18a

Merge branch 'main' into remove_safe_accumulator

89f6fed

Added changelog entry

40846f7

ogrisel reviewed May 25, 2022

doc/whats_new/v1.2.rst (outdated, resolved)

Micky774 added 2 commits May 25, 2022 08:16

Update changelog

658bb8f

Merge branch 'main' into remove_safe_accumulator

b8d54a8

Micky774 changed the title ~~MAINT Removed _safe_accumulator_op for first-pass algorithm in _assert_all_finite~~ MNT Removed _safe_accumulator_op for first-pass algorithm in _assert_all_finite May 25, 2022

thomasjpfan approved these changes May 25, 2022

thomasjpfan merged commit 38d23c4 into scikit-learn:main May 25, 2022

Micky774 deleted the remove_safe_accumulator branch May 25, 2022 20:40

mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022

MNT Removed _safe_accumulator_op for first-pass algorithm in `_assert_all_finite` (scikit-learn#23446)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

9ed9ea0
