[BUG] MiniRocket segfault #6252

Open
mmaaz-git opened this issue Apr 2, 2024 · 1 comment
Labels
bug Something isn't working module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing

Comments

@mmaaz-git

Describe the bug

MiniRocket, when calling transform(), segfaults on some systems.

To Reproduce

import numpy as np
from sktime.transformations.panel.rocket import MiniRocket
data = np.array([[1] * 100]).reshape(-1,1)
trf = MiniRocket(num_kernels=128)
trf.fit(data)
trf.transform(data)

Expected behavior

Expected behaviour is a 1x84 pandas dataframe containing the extracted features.

Additional context

This is a strange bug that I had not found addressed online. On the server I am currently working on, when I tried to use MiniRocket as the feature extractor, I instead got Segmentation fault (core dumped). I was able to consistently reproduce this behavior with the minimal example above. This server is hosted at a hospital, and around 20 or so people use it for machine learning tasks as part of a research collaboration. My hunch is that the server places some limits on multithreading/multiprocessing in order to conserve computational resources. The minimal example runs fine on my personal computer.

I spent a lot of time investigating the cause of this segfault, with gdb and other tools. It ended up being related to numba parallelization. Here is the (long) output from gdb.

```
#0 0x00007ffff7c2ad78 in __memset_power8 () from /lib64/glibc-hwcaps/power9/libc-2.28.so
#1 0x00007ff648213c94 in _3cdynamic_3e::__numba_parfor_gufunc_0x7fee8c382190[abi:v81][abi:c8tJTC_2fWQAlzW1yBDkop6GEOEUMEOYSPGuIQMViAQ3iQ8IbKQIMbwoOGNoQDDWwQR1NHAS3lQ9XgSucwaURU2IJtBVoBjQxb9NioVgdJqdcC7QLGLMRaAA_3d_3d](Array, Array, Array, long long, long long, Array, Array, long long, long long, Array, Array, long long) ()
#2 0x00007ff648212c74 in __gufunc__._ZN13_3cdynamic_3e36__numba_parfor_gufunc_0x7fee8c382190B3v81B126c8tJTC_2fWQAlzW1yBDkop6GEOEUMEOYSPGuIQMViAQ3iQ8IbKQIMbwoOGNoQDDWwQR1NHAS3lQ9XgSucwaURU2IJtBVoBjQxb9NioVgdJqdcC7QLGLMRaAA_3d_3dE5ArrayIyLi1E1C7mutable7alignedE5ArrayIfLi1E1A7mutable7alignedE5ArrayIiLi1E1A7mutable7alignedExx5ArrayIiLi1E1A7mutable7alignedE5ArrayIfLi2E1A7mutable7alignedExx5ArrayIfLi2E1C7mutable7alignedE5ArrayIiLi2E1C7mutable7alignedEx ()
#3 0x00007ff64956a574 in parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)::{lambda()#1}::operator()() const::{lambda(tbb::detail::d1::blocked_range const&)#1}::operator()(tbb::detail::d1::blocked_range const) const () from /datafs_a/mmaaz/anaconda3/lib/python3.11/site-packages/numba/np/ufunc/tbbpool.cpython-311-powerpc64le-linux-gnu.so
#4 0x00007ff64956c6b8 in tbb::detail::d1::start_for, parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)::{lambda()#1}::operator()() const::{lambda(tbb::detail::d1::blocked_range const&)#1}, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) () from /datafs_a/mmaaz/anaconda3/lib/python3.11/site-packages/numba/np/ufunc/tbbpool.cpython-311-powerpc64le-linux-gnu.so
#5 0x00007ff6495cea54 in tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) [clone .constprop.0] [clone .isra.0] () from /datafs_a/mmaaz/anaconda3/lib/python3.11/lib-dynload/../../libtbb.so.12
#6 0x00007ff6495cf034 in tbb::detail::r1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) () from /datafs_a/mmaaz/anaconda3/lib/python3.11/lib-dynload/../../libtbb.so.12
#7 0x00007ff64956b2e8 in tbb::detail::d1::task_arena_function::operator()() const () from /datafs_a/mmaaz/anaconda3/lib/python3.11/site-packages/numba/np/ufunc/tbbpool.cpython-311-powerpc64le-linux-gnu.so
#8 0x00007ff6495aaa10 in tbb::detail::r1::task_arena_impl::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) () from /datafs_a/mmaaz/anaconda3/lib/python3.11/lib-dynload/../../libtbb.so.12
#9 0x00007ff64956c2a0 in parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int) () from /datafs_a/mmaaz/anaconda3/lib/python3.11/site-packages/numba/np/ufunc/tbbpool.cpython-311-powerpc64le-linux-gnu.so
#10 0x00007ff64821096c in sktime::transformations::panel::rocket::_minirocket_numba::_transform[abi:v65][abi:c8tJTC_2fWQAlzW1yBDkop6GEOEUMEOYSPGuIQMViAQ3iQ8IbKQIMbwoOGNoQDDWwQR1NHAS3lQ9XgSucw86Ahb4se9NXqICn1WqDBwGiE2AEA](Array, Tuple, Array, Array >) ()
#11 0x00007ff648210e1c in cpython::sktime::transformations::panel::rocket::_minirocket_numba::_transform[abi:v65][abi:c8tJTC_2fWQAlzW1yBDkop6GEOEUMEOYSPGuIQMViAQ3iQ8IbKQIMbwoOGNoQDDWwQR1NHAS3lQ9XgSucw86Ahb4se9NXqICn1WqDBwGiE2AEA](Array, Tuple, Array, Array >) ()
#12 0x00007ff64a73430c in call_cfunc(Dispatcher*, _object*, _object*, _object*, _object*) () from /datafs_a/mmaaz/anaconda3/lib/python3.11/site-packages/numba/_dispatcher.cpython-311-powerpc64le-linux-gnu.so
#13 0x00007ff64a735834 in Dispatcher_call(Dispatcher*, _object*, _object*) () from /datafs_a/mmaaz/anaconda3/lib/python3.11/site-packages/numba/_dispatcher.cpython-311-powerpc64le-linux-gnu.so
```

I attempted to limit numba's parallelism, e.g. by setting NUMBA_NUM_THREADS=1. However, this did not fix the problem.
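For anyone retrying this, here is a minimal sketch of one way to apply such a setting from inside Python. numba reads these environment variables when it is first imported, so they have to be set before numba (or sktime) is imported. The helper name `configure_numba_env` is hypothetical. The `threading_layer` option is an extra knob that was not tried in this report, so whether it changes anything here is an open question.

```python
import os


def configure_numba_env(num_threads=1, threading_layer=None):
    """Hypothetical helper: set numba-related environment variables.

    Call this BEFORE numba or sktime is imported in the process.
    """
    # Thread count, as attempted in this report.
    os.environ["NUMBA_NUM_THREADS"] = str(num_threads)
    # Optional and untested here: request a specific threading layer,
    # e.g. "workqueue", "omp" or "tbb" (the backtrace above goes through tbb).
    if threading_layer is not None:
        os.environ["NUMBA_THREADING_LAYER"] = threading_layer


configure_numba_env(num_threads=1)
# Import numba / sktime only after this point.
```

The same variables can also be exported in the shell before starting Python, which avoids any import-order concerns.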

Solution

I then went into the sktime code files, and traced the root of the problem to the file sktime/transformations/panel/rocket/_minirocket_numba.py.

Specifically, the culprit is the function _transform(), on line 411, whose decorator reads:

@njit(
    "float32[:,:](float32[:,:],Tuple((int32[:],int32[:],float32[:])))",
    fastmath=True,
    parallel=True,
    cache=True,
)

If you change this to parallel=False, then everything works.

This is the only thing I have observed that consistently fixes the segfault.

I'm putting this issue here in case other people also experience a segfault, to spare them the frustration of debugging it themselves. To be clear, I don't think this is a bug with sktime per se. Instead, I believe the issue stems from some multiprocessing limit on the external server I am running code on for this project. Others may run into the same problem when working on a similar external server.

I'm not sure if there is a more elegant way of fixing this. For now, I have just edited the sktime code in my own installation. It would be nice if there was a better way to disable parallelization for this function.

Versions

System:
    python: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:58) [GCC 12.3.0]
executable: /datafs_a/mmaaz/anaconda3/bin/python
   machine: Linux-4.18.0-372.32.1.el8_6.ppc64le-ppc64le-with-glibc2.28

Python dependencies:
          pip: 24.0
       sktime: 0.28.0
      sklearn: 1.2.2
       skbase: 0.7.5
        numpy: 1.25.2
        scipy: 1.11.3
       pandas: 2.1.1
   matplotlib: 3.8.0
       joblib: 1.2.0
        numba: 0.59.1
  statsmodels: 0.14.0
     pmdarima: None
statsforecast: None
      tsfresh: 0.20.2
      tslearn: None
        torch: None
   tensorflow: None
tensorflow_probability: None
@mmaaz-git mmaaz-git added the bug Something isn't working label Apr 2, 2024
@fkiraly fkiraly added the module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing label Apr 2, 2024
@fkiraly
Collaborator

fkiraly commented Apr 2, 2024

Thanks for reporting!

How about making parallel a parameter of the estimator MiniRocket? That is, the user can set this?

It might require some thinking on how to pass this on, and on interactions with numba, as naive approaches may not work due to how numba compiles. The best way I can think of off the top of my head is to have two versions of the function, one decorated with parallel=True and one without, and to choose which one is called based on the estimator parameter.
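A rough sketch of that two-version idea, under stated assumptions: the kernel below is a stand-in that sums rows, not the real `_transform`, and `ToyTransformer`, `_row_sums_impl` and the `parallel` parameter are hypothetical names chosen for illustration. The same Python function is compiled twice with `njit`, and the estimator picks one at call time. If numba is not installed, the sketch falls back to plain Python so it still runs.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback so the sketch runs without numba
    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap
    prange = range


def _row_sums_impl(X):
    # Stand-in kernel (NOT the real MiniRocket transform): sum each row.
    out = np.zeros(X.shape[0], dtype=np.float32)
    for i in prange(X.shape[0]):  # acts like range() when parallel=False
        s = np.float32(0.0)
        for j in range(X.shape[1]):
            s += X[i, j]
        out[i] = s
    return out


# Compile the same function twice, once per parallel setting.
_row_sums_parallel = njit(parallel=True, fastmath=True)(_row_sums_impl)
_row_sums_serial = njit(parallel=False, fastmath=True)(_row_sums_impl)


class ToyTransformer:
    """Hypothetical estimator exposing a user-facing ``parallel`` switch."""

    def __init__(self, parallel=True):
        self.parallel = parallel

    def transform(self, X):
        X = np.ascontiguousarray(X, dtype=np.float32)
        # Dispatch to the pre-compiled variant chosen by the parameter.
        kernel = _row_sums_parallel if self.parallel else _row_sums_serial
        return kernel(X)
```

With this layout, `ToyTransformer(parallel=False).transform(X)` never enters the parallel code path, which is the effect the manual edit to the decorator achieved.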

@fkiraly changed the title from "MiniRocket segfault [BUG]" to "[BUG] MiniRocket segfault" on Apr 2, 2024