ENH Cythonize `_assert_all_finite` using stop-on-first strategy #23197

Micky774 · 2022-04-22T23:44:14Z

Reference Issues/PRs
Fixes #11681

What does this implement/fix? Explain your changes.
Implements the code developed by jakirkham and extends it to meet the requirements of the current _assert_all_finite function.

Any other comments?
Currently struggling to adapt the function to work with np.float16 arrays.

To Do

Compare performance as "second pass" algorithm
Compare op speed by replacing w/ equality check
Benchmark w/ non-finite arrays of varying density
not np.isfinite --> np.isinf() and np.isnan()
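
As a rough illustration of the stop-on-first strategy named in the title, combined with the last To Do item (telling nan apart from inf), here is a minimal plain-Python sketch. It is not the Cython code from this PR. The enum name `FiniteStatus` is taken from a follow-up PR title further down this page, but its member names, the function name `scan_stop_on_first`, and the `allow_nan` parameter are assumptions made for the example.

```python
import math
from enum import Enum


class FiniteStatus(Enum):
    # Member names are hypothetical; only the enum's name appears on this page.
    ALL_FINITE = 0
    HAS_NAN = 1
    HAS_INFINITE = 2


def scan_stop_on_first(values, allow_nan=False):
    """Return the status of the first disallowed value, scanning left to right."""
    for v in values:
        if math.isnan(v):
            if not allow_nan:
                return FiniteStatus.HAS_NAN  # stop at the first nan
        elif math.isinf(v):
            return FiniteStatus.HAS_INFINITE  # stop at the first +/-inf
    return FiniteStatus.ALL_FINITE
```

The scan returns as soon as a disallowed value is seen, so an input with an early nan or inf costs far less than a full pass.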

thomasjpfan

Thank you for the PR!

thomasjpfan · 2022-04-23T01:41:38Z

sklearn/utils/isfinite.pyx

+        raise TypeError("Unsupported array type: %s" % repr(numpy.PyArray_TypeObjectFromType(a_type)))
+
+
+cdef inline bint c_isfinite(const_fprecision* a_ptr, size_t step, size_t size, bint_enum disallow_nan) nogil:


There is a lot of indirection to get disallow_nan passed through, and I do not see the motivation for the bint_enum.

I suspect the original implementation is trying to template out the bint_type so the overhead of checking for bint is compiled away in the loop. I am interested to see what the overhead looks like without this optimization.

I'll run some benchmarks once the rest of the pattern is stable -- I'm also interested to see the realized performance difference.

For context the bint_type was intended to move a deeply nested branch from a repeated runtime check to a one time mostly compiled check. It borrows from C++ templating tricks. That said, there are other ways to achieve the same behavior.
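
The idea described above, deciding once which check applies instead of re-testing a flag on every element, can be sketched in plain Python. This shows the structure only: the names `first_bad_index` and `_is_non_finite` are hypothetical, and in Python the predicate call has its own cost, so the sketch says nothing about the speed-up that compiled code might see.

```python
import math


def _is_non_finite(v):
    # True for nan, +inf and -inf.
    return not math.isfinite(v)


def first_bad_index(values, allow_nan):
    """Index of the first disallowed value, or -1 if there is none."""
    # Decide once, outside the loop, which predicate applies.
    is_bad = math.isinf if allow_nan else _is_non_finite
    for i, v in enumerate(values):
        if is_bad(v):  # no allow_nan test inside the loop
            return i
    return -1
```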

Micky774 · 2022-04-23T18:40:06Z

Current benchmarks indicate that the optimization depends on dtype and that the current implementation is preferable once the number of elements exceeds roughly 5000. I don't think we really need to offer a choice between the Cython and Python implementations, since the preferable algorithm is clear-cut in the vast majority of cases.

Code for benchmark:

from sklearn.utils.validation import _assert_all_finite
import numpy as np

dtype = np.float32
X = np.random.rand(1000,100).astype(dtype)

%timeit _assert_all_finite(X)
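
Outside IPython the `%timeit` magic is not available. A hedged stand-in using the standard-library `timeit` module could look like the sketch below. To stay dependency-free it times two plain-Python checks over lists rather than the scikit-learn function, so its numbers are not comparable with the ones reported in this thread; `sum_check`, `scan_check`, `best_time` and `run_benchmark` are hypothetical names.

```python
import math
import timeit


def sum_check(values):
    # Plain-Python analogue of "sum, then test the result".
    return math.isfinite(sum(values))


def scan_check(values):
    # Plain-Python analogue of an element-wise scan.
    return all(math.isfinite(v) for v in values)


def best_time(fn, values, number=20, repeat=3):
    """Best observed seconds per call, similar in spirit to %timeit."""
    timer = timeit.Timer(lambda: fn(values))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def run_benchmark(sizes=(1_000, 5_000, 20_000)):
    """Time both checks for input sizes on either side of ~5000 elements."""
    results = {}
    for n in sizes:
        values = [float(i) for i in range(n)]
        results[n] = {
            fn.__name__: best_time(fn, values) for fn in (sum_check, scan_check)
        }
    return results
```

Calling `run_benchmark()` returns a dict keyed by input size, each value mapping a check name to its best time per call in seconds.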

ogrisel · 2022-04-29T14:02:03Z

Could you try to use prange / OpenMP parallelism to try to see if parallelism can further increase the processing speed on a multicore machine?

thomasjpfan · 2022-04-29T14:33:20Z

> Could you try to use prange / OpenMP parallelism to try to see if parallelism can further increase the processing speed on a multicore machine?

Even if it is faster with prange, I prefer not to have a drop in single thread performance compared to main. OMP_NUM_THREADS is commonly set to 1 in libraries such as dask or ray to prevent oversubscription.

We can work around this by using np.isfinite + np.sum for single threaded and use this Cython version if there are enough threads. Although, I prefer not to go down this route.
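
For illustration only, the workaround mentioned above (which the comment itself says it would rather avoid) amounts to a dispatch on the available thread budget. In this sketch `effective_threads`, `choose_check` and the `min_threads` cutoff are hypothetical, and the parsing of `OMP_NUM_THREADS` is simplified: any value that is not a single positive integer falls back to the CPU count.

```python
import os


def effective_threads(environ=None):
    """Thread budget: OMP_NUM_THREADS if it is a positive integer, else the CPU count."""
    environ = os.environ if environ is None else environ
    raw = environ.get("OMP_NUM_THREADS", "")
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    return os.cpu_count() or 1


def choose_check(n_threads, min_threads=2):
    # min_threads is a hypothetical cutoff, not a value from this thread.
    return "cython_prange" if n_threads >= min_threads else "numpy_sum"
```

With `OMP_NUM_THREADS=1`, as set by libraries that want to prevent oversubscription, `choose_check(effective_threads())` selects the single-threaded path.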

ogrisel · 2022-04-29T15:38:53Z

> Even if it is faster with prange, I prefer not to have a drop in single thread performance compared to main. OMP_NUM_THREADS is commonly set to 1 in libraries such as dask or ray to prevent oversubscription.

Are you sure we will get a drop in single-threaded performance with prange when OMP_NUM_THREADS=1? I think it's worth measuring it.

thomasjpfan · 2022-04-29T17:42:47Z

With a fairly simple implementation in a Jupyter notebook (using %load_ext Cython):

Code for only using Cython + prange

%%cython --compile-args=-fopenmp

from libc.math cimport isfinite as c_isfinite
cimport cython
from cython.parallel cimport prange
from cython cimport floating

cdef int cy_isfinite(floating[:, ::1] a, int n_threads):
    cdef:
        int size = a.size
        int i
        floating* a_ptr = &a[0, 0]
        int output = 1
        
    for i in prange(size, num_threads=n_threads, nogil=True):
        if c_isfinite(a_ptr[i]) == 0:
            output = 0
            break
    return output

def my_isfinite(floating[:, ::1] a, int n_threads=1):
    return bool(cy_isfinite(a, n_threads=n_threads))

and these Python versions:

from sklearn.utils.extmath import _safe_accumulator_op
import numpy as np

def sk_isfinite(X):
    return np.isfinite(_safe_accumulator_op(np.sum, X))

def np_isfinite_all(X):
    return np.isfinite(X).all()

Running on a fairly large X:

rng = np.random.RandomState(42)
X = rng.rand(100_000, 100).astype(dtype)

%timeit sk_isfinite(X)
# 1.87 ms ± 8.21 µs per loop

%timeit np_isfinite_all(X)
# 4.71 ms ± 60.5 µs per loop

%timeit my_isfinite(X, n_threads=1)
# 15.8 ms ± 62.8 µs per loop

%timeit my_isfinite(X, n_threads=8)
# 2.47 ms ± 311 µs per loop

# Only using range
%timeit my_isfinite_single_with_range(X)
# 6.25 ms ± 17.6 µs per loop

From the results above, scikit-learn's current np.sum + np.isfinite performs better than all of the Cython implementations. I find it strange that using range is faster than using prange with n_threads=1.
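
To make the comparison concrete, the mean timings reported above can be turned into ratios. The short script below only does arithmetic on those published numbers; it measures nothing itself.

```python
# Mean timings in milliseconds, copied from the results above.
timings_ms = {
    "sk_isfinite": 1.87,
    "np_isfinite_all": 4.71,
    "my_isfinite(n_threads=1)": 15.8,
    "my_isfinite(n_threads=8)": 2.47,
    "my_isfinite_single_with_range": 6.25,
}

# Each timing relative to the current scikit-learn check.
baseline = timings_ms["sk_isfinite"]
slowdown = {name: round(t / baseline, 2) for name, t in timings_ms.items()}

# Single-threaded prange relative to the plain range loop.
prange_vs_range = round(
    timings_ms["my_isfinite(n_threads=1)"]
    / timings_ms["my_isfinite_single_with_range"],
    2,
)

for name, ratio in slowdown.items():
    print(f"{name}: {ratio}x the sk_isfinite time")
print(f"prange with one thread vs range: {prange_vs_range}x")
```

On these numbers the single-threaded prange version takes about 8.4 times as long as sk_isfinite and about 2.5 times as long as the plain range loop, while the 8-thread run is still about 1.3 times slower than sk_isfinite.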

Code for only using Cython + range

%%cython

from libc.math cimport isfinite as c_isfinite
cimport cython
from cython cimport floating

cdef int cy_isfinite_with_range(floating[:, ::1] a):
    cdef:
        int size = a.size
        int i
        floating* a_ptr = &a[0, 0]
        int output = 1
        
    with nogil:
        for i in range(size):
            if c_isfinite(a_ptr[i]) == 0:
                output = 0
                break
    return output

def my_isfinite_single_with_range(floating[:, ::1] a):
    return bool(cy_isfinite_with_range(a))

ogrisel · 2022-05-02T07:54:17Z

Thanks @thomasjpfan. It's possible that the np.sum function was recently optimized with SIMD instructions that make it more efficient than the alternatives.

ogrisel · 2022-05-02T08:00:10Z

Indeed, using py-spy top --native to monitor a process that runs your sk_isfinite on a float32 data array in a while True loop:

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                                      
 60.00%  60.00%   38.92s    38.92s   _aligned_contig_cast_float_to_double (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
 40.00%  40.00%   25.05s    25.05s   DOUBLE_pairwise_sum (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%  60.00%   0.310s    39.23s   npyiter_copy_to_buffers (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%   0.00%   0.210s    0.210s   npyiter_goto_iterindex (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%   0.00%   0.150s    0.150s   npyiter_copy_from_buffers (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%  60.00%   0.090s    39.67s   npyiter_buffered_reduce_iternext_iters2 (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%  40.00%   0.040s    25.09s   DOUBLE_add_AVX512F (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00% 100.00%   0.030s    64.84s   reduce_loop (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%  40.00%   0.030s    25.12s   generic_wrapped_legacy_loop (numpy/core/_multiarray_umath.cpython-310-x86_64-

The DOUBLE_add_AVX512F loop is SIMD optimized. However, I do not understand why numpy converts the float32 data to float64... This seems like a huge waste.

ogrisel · 2022-05-02T16:38:43Z

> However I do not understand why numpy converts the float32 data to float64... This seems like a huge waste.

Actually this is done on purpose in _safe_accumulator_op. Maybe we could skip that conversion when the goal is only to detect nan or inf values. We could call np.sum() directly and only fall back to an exact check if the sum is nan or inf.
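
The two-step idea suggested here, a cheap sum first and an exact check only when the sum is not finite, can be sketched in plain Python. This is an illustration under stated assumptions rather than the scikit-learn implementation: it works on lists with the built-in `sum` instead of NumPy arrays, and `all_finite_two_pass` is a hypothetical name.

```python
import math


def all_finite_two_pass(values):
    """Cheap sum first; exact element-wise check only if the sum is not finite."""
    total = sum(values)  # nan and inf propagate through addition
    if math.isfinite(total):
        return True  # a finite sum implies every element was finite
    # The sum can be non-finite because finite values overflowed,
    # so confirm with an exact scan before reporting a problem.
    return all(math.isfinite(v) for v in values)
```

The fallback matters for inputs such as `[1e308, 1e308]`, whose sum overflows to inf even though both elements are finite.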

sklearn/utils/_isfinite.pyx

sklearn/utils/validation.py

thomasjpfan

I think the Cython implementation is already a net improvement and further improvements can be made in follow up PRs.

sklearn/utils/validation.py

thomasjpfan

LGTM

sklearn/utils/validation.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

sklearn/utils/setup.py

Co-authored-by: jakirkham <jakirkham@gmail.com>

ogrisel

LGTM.

sklearn/utils/_isfinite.pyx

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ogrisel · 2022-07-06T16:04:52Z

Merged, thanks @Micky774 and others!

…it-learn#23197) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: jakirkham <jakirkham@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

jakirkham · 2022-07-06T23:52:20Z

Did some very minor tidying of the code in PR ( #23849 ). Hope that is ok. Please let me know what you think 🙂

Micky774 added 2 commits April 22, 2022 19:43

Initial implementation

0dd79ce

Added support for np.float16

ddf2382

Micky774 changed the title ~~ENH Cythonize _assert_all_finite using reduction scheme~~ ENH Cythonize _assert_all_finite using reduction scheme Apr 22, 2022

github-actions bot added cython module:utils labels Apr 22, 2022

Micky774 changed the title ~~ENH Cythonize _assert_all_finite using reduction scheme~~ WIP ENH Cythonize _assert_all_finite using reduction scheme Apr 22, 2022

thomasjpfan reviewed Apr 23, 2022

sklearn/utils/isfinite.pyx

Micky774 added 4 commits April 22, 2022 23:18

Corrected type-checking and removed ComplexFloat support

0db6976

Refactored function name for clarity

b1312fb

Finished refactor (forgot to add)

a3a2752

Removed cython support for FP16 and attempted prange implementation

5434849

Micky774 commented Apr 23, 2022

sklearn/utils/isfinite.pyx

thomasjpfan reviewed Apr 23, 2022

sklearn/utils/validation.py

sklearn/utils/isfinite.pyx

sklearn/utils/isfinite.pyx

Micky774 added 4 commits April 23, 2022 14:04

Simplified cython implementation

5f900dd

Merge branch 'main' into cython_assert_isfinite

0d2a36b

Added heuristic and documented explanation

369df2c

Fixed typo and changed wording

2d3ff8b

Micky774 changed the title ~~WIP ENH Cythonize _assert_all_finite using reduction scheme~~ WIP ENH Cythonize _assert_all_finite using stop-on-first strategy Apr 24, 2022

Reintroduced overflow false-positive check

f2ab3de

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn into cython_assert_isfinite

0c3bee1

Merge branch 'main' into cython_assert_isfinite

079ff55

Micky774 added 2 commits June 20, 2022 09:42

Merge branch 'main' into cython_assert_isfinite

1de9652

Trimmed unnecessary imports

47b7c6c

ogrisel reviewed Jun 20, 2022

sklearn/utils/_isfinite.pyx

Incorporated review feedback

052b567

jakirkham reviewed Jun 22, 2022

sklearn/utils/validation.py

Micky774 and others added 3 commits June 22, 2022 13:38

Added short-circuit to avoid unnecessary has_inf calculation

86d8633

Merge branch 'main' into cython_assert_isfinite

a013726

Merge branch 'main' into cython_assert_isfinite

078c00a

thomasjpfan reviewed Jun 29, 2022

sklearn/utils/validation.py

Micky774 and others added 3 commits June 30, 2022 15:25

Merge branch 'main' into cython_assert_isfinite

133ee28

Changed var name for clarity

6edca20

Merge branch 'main' into cython_assert_isfinite

5cb3017

thomasjpfan approved these changes Jul 5, 2022

sklearn/utils/validation.py

Update sklearn/utils/validation.py

bcf0a21

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

jakirkham reviewed Jul 6, 2022

sklearn/utils/setup.py

Micky774 and others added 3 commits July 6, 2022 10:24

Merge branch 'main' into cython_assert_isfinite

2ff7f9b

Update sklearn/utils/setup.py

14585a7

Co-authored-by: jakirkham <jakirkham@gmail.com>

Merge branch 'cython_assert_isfinite' of https://github.com/Micky774/scikit-learn into cython_assert_isfinite

9a1f976

ogrisel approved these changes Jul 6, 2022

ogrisel reviewed Jul 6, 2022

sklearn/utils/_isfinite.pyx

Update sklearn/utils/_isfinite.pyx

a051a0d

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ogrisel merged commit dfaef0c into scikit-learn:main Jul 6, 2022

Micky774 deleted the cython_assert_isfinite branch July 6, 2022 16:05

jakirkham mentioned this pull request Jul 6, 2022

MAINT cleanup isfinite cython implementation #23849

Merged

lorentzenchr mentioned this pull request Jul 7, 2022

Optimizing assert_all_finite check #11681

Closed

jakirkham mentioned this pull request Jul 8, 2022

MAINT Fix conversion of FiniteStatus from C to Python #23858

Closed
