0.22.1 test_stacking ends with Bus error on ARMv7 arch (Python 3.5) #16443

Closed
h6197627 opened this issue Feb 14, 2020 · 11 comments
Labels
arch:arm ARM related issues Bug

Comments

@h6197627

h6197627 commented Feb 14, 2020

Describe the bug

Running the standard test suite for scikit-learn 0.22.1 on the ARMv7 architecture under Ubuntu 16.04 (Python 3.5) results in a fatal bus error.

# scikit_learn-0.22.1-cp35-cp35m-linux_armv7l.whl
> pytest -v /usr/local/lib/python3.5/dist-packages/sklearn
========================= test session starts =========================
platform linux -- Python 3.5.2, pytest-5.3.5, py-1.8.1, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /usr/local/lib/python3.5/dist-packages/sklearn
collected 14072 items / 3 skipped / 14069 selected

...

../../../../usr/local/lib/python3.5/dist-packages/sklearn/ensemble/tests/test_stacking.py::test_check_estimators_stacking_estimator[StackingClassifier] PASSED                                              [ 11%]
../../../../usr/local/lib/python3.5/dist-packages/sklearn/ensemble/tests/test_stacking.py::test_check_estimators_stacking_estimator[StackingRegressor] Fatal Python error: Bus error

Thread 0xa7fff470 (most recent call first):
  File "/usr/lib/python3.5/threading.py", line 293 in wait
  File "/usr/local/lib/python3.5/dist-packages/joblib/externals/loky/backend/queues.py", line 138 in _feed
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap

Thread 0xa892f470 (most recent call first):
  File "/usr/lib/python3.5/selectors.py", line 376 in select
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 911 in wait
  File "/usr/local/lib/python3.5/dist-packages/joblib/externals/loky/process_executor.py", line 615 in _queue_management_worker
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap

Current thread 0xb6ff6300 (most recent call first):
  File "/usr/local/lib/python3.5/dist-packages/sklearn/tree/_classes.py", line 367 in fit
  File "/usr/local/lib/python3.5/dist-packages/sklearn/tree/_classes.py", line 1225 in fit
  File "/usr/local/lib/python3.5/dist-packages/sklearn/ensemble/_base.py", line 36 in _parallel_fit_estimator
  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 256 in <listcomp>
  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 256 in __call__
  File "/usr/local/lib/python3.5/dist-packages/joblib/_parallel_backends.py", line 590 in __init__
  File "/usr/local/lib/python3.5/dist-packages/joblib/_parallel_backends.py", line 209 in apply_async
  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 754 in _dispatch
  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 835 in dispatch_one_batch
  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 1007 in __call__
  File "/usr/local/lib/python3.5/dist-packages/sklearn/ensemble/_stacking.py", line 141 in fit
  File "/usr/local/lib/python3.5/dist-packages/sklearn/ensemble/_stacking.py", line 643 in fit
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/estimator_checks.py", line 2201 in check_regressors_train
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/_testing.py", line 327 in wrapper
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/estimator_checks.py", line 427 in check_estimator
  File "/usr/local/lib/python3.5/dist-packages/sklearn/ensemble/tests/test_stacking.py", line 382 in test_check_estimators_stacking_estimator
  File "/usr/local/lib/python3.5/dist-packages/_pytest/python.py", line 167 in pytest_pyfunc_call
  File "/usr/local/lib/python3.5/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/local/lib/python3.5/dist-packages/pluggy/manager.py", line 87 in <lambda>
  File "/usr/local/lib/python3.5/dist-packages/pluggy/manager.py", line 93 in _hookexec
  File "/usr/local/lib/python3.5/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/local/lib/python3.5/dist-packages/_pytest/python.py", line 1445 in runtest
  File "/usr/local/lib/python3.5/dist-packages/_pytest/runner.py", line 134 in pytest_runtest_call
  File "/usr/local/lib/python3.5/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/local/lib/python3.5/dist-packages/pluggy/manager.py", line 87 in <lambda>
  File "/usr/local/lib/python3.5/dist-packages/pluggy/manager.py", line 93 in _hookexec
  File "/usr/local/lib/python3.5/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/local/lib/python3.5/dist-packages/_pytest/runner.py", line 210 in <lambda>
  File "/usr/local/lib/python3.5/dist-packages/_pytest/runner.py", line 237 in from_call
  File "/usr/local/lib/python3.5/dist-packages/_pytest/runner.py", line 210 in call_runtest_hook
  File "/usr/local/lib/python3.5/dist-packages/_pytest/runner.py", line 185 in call_and_report
  File "/usr/local/lib/python3.5/dist-packages/_pytest/runner.py", line 99 in runtestprotocol
  File "/usr/local/lib/python3.5/dist-packages/_pytest/runner.py", line 84 in pytest_runtest_protocol
  File "/usr/local/lib/python3.5/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/local/lib/python3.5/dist-packages/pluggy/manager.py", line 87 in <lambda>
  File "/usr/local/lib/python3.5/dist-packages/pluggy/manager.py", line 93 in _hookexec
  File "/usr/local/lib/python3.5/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/local/lib/python3.5/dist-packages/_pytest/main.py", line 271 in pytest_runtestloop
  File "/usr/local/lib/python3.5/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/local/lib/python3.5/dist-packages/pluggy/manager.py", line 87 in <lambda>
  File "/usr/local/lib/python3.5/dist-packages/pluggy/manager.py", line 93 in _hookexec
  File "/usr/local/lib/python3.5/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/local/lib/python3.5/dist-packages/_pytest/main.py", line 247 in _main
  File "/usr/local/lib/python3.5/dist-packages/_pytest/main.py", line 197 in wrap_session
  File "/usr/local/lib/python3.5/dist-packages/_pytest/main.py", line 240 in pytest_cmdline_main
  File "/usr/local/lib/python3.5/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/local/lib/python3.5/dist-packages/pluggy/manager.py", line 87 in <lambda>
  File "/usr/local/lib/python3.5/dist-packages/pluggy/manager.py", line 93 in _hookexec
  File "/usr/local/lib/python3.5/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/local/lib/python3.5/dist-packages/_pytest/config/__init__.py", line 93 in main
  File "/usr/local/bin/pytest", line 8 in <module>
Bus error

Versions

System:
python: 3.5.2 (default, Oct 8 2019, 13:06:37) [GCC 5.4.0 20160609]
machine: Linux-3.8.13.30-armv7l-with-Ubuntu-16.04-xenial
executable: /usr/bin/python3

Python dependencies:
matplotlib: None
scipy: 1.4.1 (test suite run without failures)
numpy: 1.18.1 (test suite run without failures)
pip: 20.0.2
sklearn: 0.22.1
setuptools: 45.2.0
pandas: None
Cython: None (was installed latest from pip and deleted after building scikit-learn)
joblib: 0.14.1

Built with OpenMP: True

@h6197627 h6197627 added the Bug label Feb 14, 2020
@h6197627
Author

I tried testing the joblib package alone (since the stack trace contains calls to it): the standard test suite finishes with a single non-fatal failure, in test_weak_array_key_map (joblib/joblib#1010). The parallel tests pass.

@h6197627
Author

h6197627 commented Mar 8, 2020

The problem is still there in 0.22.2.post1.

@jnothman jnothman added the High Priority High priority issues and pull requests label Mar 8, 2020
@h6197627
Author

I tried to find the exact place where the bus error occurs.
The failing test is test_check_estimators_stacking_estimator[StackingRegressor] from test_stacking.py. It calls check_estimator from sklearn.utils.estimator_checks with this estimator:

StackingRegressor(cv=None,
                  estimators=[('lr',
                               LinearRegression(copy_X=True, fit_intercept=True,
                                                n_jobs=None, normalize=False)),
                              ('tree',
                               DecisionTreeRegressor(ccp_alpha=0.0,
                                                     criterion='mse',
                                                     max_depth=None,
                                                     max_features=None,
                                                     max_leaf_nodes=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
                                                     presort='deprecated',
                                                     random_state=0,
                                                     splitter='best'))],
                  final_estimator=None, n_jobs=None, passthrough=False,
                  verbose=0)

and it fails in the functools.partial(<function check_regressors_train at 0xae988a98>, 'StackingRegressor', readonly_memmap=True) check (the same check with readonly_memmap=False passes). It additionally calls the create_memmap_backed_data routine. The memmapped outputs can be printed successfully.
The bus error occurs later, on line 2201 of estimator_checks.py in 0.22.2.post1:
regressor.fit(X, y_)
while fitting _BaseStacking with some kind of multithreading. However, I am not sure how to debug further.
It also looks suspicious that the joblib error reported earlier, joblib/joblib#1010, also comes from the test_memmapping.py file.

@h6197627
Author

Maybe some additional information is needed?

@ckastner
Contributor

ckastner commented Aug 8, 2020

We're also getting SIGBUS with Debian package builds for 32-bit ARM.

Here's a minimal use case that reliably triggers the bus error for me (extracted from the test_check_estimators_stacking_estimator test) using 0.23.1:

from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.estimator_checks import check_estimator

estimator = DecisionTreeClassifier(random_state=0)
check_estimator(estimator)

Backtrace:

#0  0xb381a028 in __pyx_f_7sklearn_4tree_9_splitter_12BestSplitter_node_split (__pyx_v_self=0xb0e59028, __pyx_v_impurity=0.5, 
    __pyx_v_split=0xbe9e17a8, __pyx_v_n_constant_features=0xbe9e178c) at sklearn/tree/_splitter.c:6584
#1  0xb37d6d60 in __pyx_f_7sklearn_4tree_5_tree_21DepthFirstTreeBuilder_build (__pyx_v_self=<optimized out>, 
    __pyx_v_tree=0xb0ed0090, __pyx_v_X=0xb0eb7a70, __pyx_v_y=0xb0eb7b10, __pyx_skip_dispatch=1, 
    __pyx_optional_args=0xbe9e193c) at sklearn/tree/_tree.c:6321
#2  0xb37cbdd0 in __pyx_pf_7sklearn_4tree_5_tree_21DepthFirstTreeBuilder_2build (__pyx_v_X_idx_sorted=<optimized out>, 
    __pyx_v_sample_weight=<optimized out>, __pyx_v_y=<optimized out>, __pyx_v_X=<optimized out>, 
    __pyx_v_tree=<optimized out>, __pyx_v_self=0xb0f20fa8) at sklearn/tree/_tree.c:6839
#3  __pyx_pw_7sklearn_4tree_5_tree_21DepthFirstTreeBuilder_3build (__pyx_v_self=0xb0f20fa8, __pyx_args=<optimized out>, 
    __pyx_kwds=<optimized out>) at sklearn/tree/_tree.c:6807
#4  0x00092e1e in ?? ()

Furthermore, we're also getting SIGBUS with the test test_apply_path_readonly_all_trees. I only briefly looked at this, but what @h6197627 wrote in the previous comment (create_memmap_backed_data and so forth) seems very relevant to this test, too. It's quite possible that the two failures are related.

@rth
Member

rth commented Aug 8, 2020

Thanks for the confirmation. For check_estimator, could you please run

pytest sklearn/tests/test_common.py -k DecisionTreeClassifier

or, if that doesn't work, try

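from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.estimator_checks import parametrize_with_checks

# Each common check becomes its own pytest case, so the crashing one is named.
@parametrize_with_checks([DecisionTreeClassifier(random_state=0)])
def test_common_checks(estimator, check):
    check(estimator)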

to see which common test fails specifically (check_estimator runs all of them)?

@ckastner
Contributor

ckastner commented Aug 8, 2020

Here you go:

pytest sklearn/tests/test_common.py -k DecisionTreeClassifier

produces

============================= test session starts ==============================
platform linux -- Python 3.8.5, pytest-4.6.11, py-1.8.1, pluggy-0.13.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /home/ckk/scikit-learn-0.23.1, inifile: setup.cfg
collecting ... collected 6609 items / 6563 deselected / 46 selected

sklearn/tests/test_common.py::test_parameters_default_constructible[DecisionTreeClassifier-DecisionTreeClassifier] SKIPPED [  2%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_no_attributes_set_in_init] PASSED [  4%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_estimators_dtypes] PASSED [  6%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_fit_score_takes_y] PASSED [  8%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_sample_weights_pandas_series] PASSED [ 10%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_sample_weights_not_an_array] PASSED [ 13%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_sample_weights_list] PASSED [ 15%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_sample_weights_shape] PASSED [ 17%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_sample_weights_invariance] PASSED [ 19%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_estimators_fit_returns_self] PASSED [ 21%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_estimators_fit_returns_self(readonly_memmap=True)] PASSED [ 23%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_complex_data] PASSED [ 26%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_dtype_object] PASSED [ 28%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_estimators_empty_data_messages] PASSED [ 30%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_pipeline_consistency] PASSED [ 32%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_estimators_nan_inf] PASSED [ 34%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_estimators_overwrite_params] PASSED [ 36%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_estimator_sparse_data] PASSED [ 39%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_estimators_pickle] PASSED [ 41%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_classifier_data_not_an_array] PASSED [ 43%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_classifiers_one_label] PASSED [ 45%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_classifiers_classes] PASSED [ 47%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_estimators_partial_fit_n_features] PASSED [ 50%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_classifier_multioutput] PASSED [ 52%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_classifiers_train] PASSED [ 54%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_classifiers_train(readonly_memmap=True)] PASSED [ 56%]
sklearn/tests/test_common.py::test_estimators[DecisionTreeClassifier()-check_classifiers_train(readonly_memmap=True,X_dtype=float32)] Bus error

(edit: I accidentally dropped the 'Bus error' from output, sorry)

@ckastner
Contributor

ckastner commented Aug 9, 2020

I took another look at the core dump:

(gdb) bt
#0  0xb2b2e088 in __pyx_f_7sklearn_4tree_9_splitter_12BestSplitter_node_split (__pyx_v_self=0xaf93aa68, __pyx_v_impurity=0.5, 
    __pyx_v_split=0xbeb55ce0, __pyx_v_n_constant_features=0xbeb55cc4) at sklearn/tree/_splitter.c:6584
#1  0xb2aeae04 in __pyx_f_7sklearn_4tree_5_tree_21DepthFirstTreeBuilder_build (__pyx_v_self=<optimized out>, 
    __pyx_v_tree=0xaf936b80, __pyx_v_X=0xaf960200, __pyx_v_y=0xaf960d40, __pyx_skip_dispatch=1, 
    __pyx_optional_args=0xbeb55e74) at sklearn/tree/_tree.c:6321
#2  0xb2adfe2a in __pyx_pf_7sklearn_4tree_5_tree_21DepthFirstTreeBuilder_2build (__pyx_v_X_idx_sorted=<optimized out>, 
    __pyx_v_sample_weight=<optimized out>, __pyx_v_y=<optimized out>, __pyx_v_X=<optimized out>, 
    __pyx_v_tree=<optimized out>, __pyx_v_self=0xaf95a8a8) at sklearn/tree/_tree.c:6839
#3  __pyx_pw_7sklearn_4tree_5_tree_21DepthFirstTreeBuilder_3build (__pyx_v_self=0xaf95a8a8, __pyx_args=<optimized out>, 
    __pyx_kwds=<optimized out>) at sklearn/tree/_tree.c:6807
#4  0x00092e1e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p * __pyx_v_self
$9 = {__pyx_base = {__pyx_base = {ob_base = {ob_refcnt = 3, 
        ob_type = 0xb2b54bfc <__pyx_type_7sklearn_4tree_9_splitter_BestSplitter>}, 
      __pyx_vtab = 0xb2b55ba0 <__pyx_vtable_7sklearn_4tree_9_splitter_BestSplitter>, criterion = 0xaf9d1770, 
      max_features = 2, min_samples_leaf = 1, min_weight_leaf = 0, random_state = 0xaf9d15d8, rand_r_state = 97729234, 
      samples = 0x1e0d1b8, n_samples = 200, weighted_n_samples = 200, features = 0x1ddc6e8, constant_features = 0x20f2b20, 
      n_features = 2, feature_values = 0x2207260, start = 0, end = 200, y = {memview = 0xaf95bdf8, data = 0x220cfb0 "", 
        shape = {200, 1, 0, 0, 0, 0, 0, 0}, strides = {8, 8, 0, 0, 0, 0, 0, 0}, suboffsets = {-1, -1, 0, 0, 0, 0, 0, 0}}, 
      sample_weight = 0x0}, X = {memview = 0xaf8e3028, data = 0xb6f55f47 <error: Cannot access memory at address 0xb6f55f47>, 
      shape = {200, 2, 0, 0, 0, 0, 0, 0}, strides = {8, 4, 0, 0, 0, 0, 0, 0}, suboffsets = {-1, -1, 0, 0, 0, 0, 0, 0}}, 
    X_idx_sorted = 0x38c8bc <_Py_NoneStruct>, X_idx_sorted_ptr = 0x0, X_idx_sorted_stride = 0, n_total_samples = 0, 
    sample_mask = 0x0}}

I've attached _splitter.c as generated on the affected host: _splitter.c.txt

The backtrace indicates the cause to emanate from sklearn/tree/_splitter.c:6584, which corresponds to this while loop:

while p < partition_end:
    if self.X[samples[p], best.feature] <= best.threshold:
        p += 1
    else:
        partition_end -= 1
        samples[p], samples[partition_end] = samples[partition_end], samples[p]

To my untrained eye, that doesn't look all too suspicious, though. At least, not suspicious enough to fail on only this particular architecture.
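
For readers who want to step through that loop outside of Cython, here is a minimal pure-Python sketch of the same partitioning step. The function name, its parameters and the toy data are made up for illustration and are not part of scikit-learn. The only read of X inside the loop is the element lookup in the if condition.

```python
def partition_samples(X, samples, feature, threshold, start, end):
    """Reorder samples[start:end] in place so that rows whose value in
    column `feature` is <= threshold come first; return the split index.

    Hypothetical helper mirroring the quoted while loop, not sklearn code.
    """
    p = start
    partition_end = end
    while p < partition_end:
        # Single element read from X per iteration, as in the Cython loop.
        if X[samples[p]][feature] <= threshold:
            p += 1
        else:
            partition_end -= 1
            samples[p], samples[partition_end] = samples[partition_end], samples[p]
    return p


# Toy data (assumption): four rows, two features.
X = [[0.1, 5.0], [0.9, 1.0], [0.4, 3.0], [0.7, 2.0]]
samples = [0, 1, 2, 3]
pos = partition_samples(X, samples, feature=0, threshold=0.5, start=0, end=len(samples))
print("left:", samples[:pos], "right:", samples[pos:])
```

The loop itself only swaps indices. If something goes wrong at this line at the hardware level, the load from X is the natural place to look first.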

@cmarmo cmarmo added arch:arm ARM related issues and removed High Priority High priority issues and pull requests labels Nov 22, 2021
@noloader

noloader commented Feb 16, 2022

@ckastner,

Sorry to jump in. This bug was referenced on the Debian PowerPC mailing list. The issue is present on PowerPC, too.

> To my untrained eye, that doesn't look all too suspicious, though. At least, not suspicious enough to fail on only this particular architecture.

I have some experience flushing these unaligned accesses/SIGBUS faults... You almost certainly have unaligned data causing this (based on my experience with chasing these).
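
A quick sanity check of that hypothesis can be done with the addresses in the gdb dump earlier in the thread. This is a sketch, not a confirmed diagnosis: it assumes the innermost stride of each buffer equals its element size (4 bytes for X, 8 bytes for y), and the helper name is made up for illustration.

```python
# Data pointers and innermost strides copied from the gdb dump of the
# BestSplitter struct. Assumption: innermost stride == element size.
buffers = {
    "X": (0xB6F55F47, 4),
    "y": (0x0220CFB0, 8),
}


def misalignment(address, itemsize):
    """Bytes by which `address` overshoots the previous itemsize boundary."""
    return address % itemsize


for name, (address, itemsize) in buffers.items():
    off = misalignment(address, itemsize)
    status = "aligned" if off == 0 else "misaligned by {} byte(s)".format(off)
    print("{}.data = {:#x} (itemsize {}): {}".format(name, address, itemsize, status))
```

Under that assumption, X.data sits 3 bytes past a 4-byte boundary while y.data is 8-byte aligned, which is at least consistent with an unaligned load from X.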

Here is how I would approach it. First, switch back to x86_64. It is an easier platform, and the tools work best. Second, build a debug build with -g3 -O1. Third, include undefined behavior sanitizer options. The flags of interest you should be using:

  • CPPFLAGS: none
  • CFLAGS: -g3 -O1 -fsanitize=undefined -fno-sanitize=integer-divide-by-zero -fno-sanitize=float-divide-by-zero
  • CXXFLAGS: -g3 -O1 -fsanitize=undefined -fno-sanitize=integer-divide-by-zero -fno-sanitize=float-divide-by-zero
  • LDFLAGS: -fsanitize=undefined -fno-sanitize=integer-divide-by-zero -fno-sanitize=float-divide-by-zero
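
The flags listed above could be collected into a build environment along these lines. This is a minimal sketch: the helper name sanitizer_env is made up for illustration, treating "CPPFLAGS: none" as "leave it unset" is my reading of the list, and the commented-out build command is an assumption about how the extension modules get rebuilt, not something stated in this thread.

```python
import os

# Flag sets taken from the list above.
DEBUG_FLAGS = ["-g3", "-O1"]
UBSAN_FLAGS = [
    "-fsanitize=undefined",
    "-fno-sanitize=integer-divide-by-zero",
    "-fno-sanitize=float-divide-by-zero",
]


def sanitizer_env(base=None):
    """Return a copy of `base` (default: os.environ) with the sanitizer flags set.

    Hypothetical helper; "CPPFLAGS: none" is interpreted as removing CPPFLAGS.
    """
    env = dict(os.environ if base is None else base)
    env.pop("CPPFLAGS", None)
    env["CFLAGS"] = " ".join(DEBUG_FLAGS + UBSAN_FLAGS)
    env["CXXFLAGS"] = " ".join(DEBUG_FLAGS + UBSAN_FLAGS)
    env["LDFLAGS"] = " ".join(UBSAN_FLAGS)
    return env


env = sanitizer_env()
for key in ("CFLAGS", "CXXFLAGS", "LDFLAGS"):
    print("{}={}".format(key, env[key]))

# Assumed build step, left commented out because it depends on the checkout:
# import subprocess, sys
# subprocess.run([sys.executable, "setup.py", "build_ext", "--inplace"],
#                env=env, check=True)
```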

Fourth, run your test program. The problem areas of the code produce a finding like the one below. The keywords are runtime error:.

xed25519.cpp:856:97: runtime error: reference binding to misaligned address 0xbe8cfaa4 for type 'const struct ed25519PrivateKey', which requires 8 byte alignment
0xbe8cfaa4: note: pointer points here
  38 9f 14 02 b8 9f 14 02  1c 2e 14 02 e4 2e 14 02  00 00 00 00 01 ff ff 3f  00 01 00 00 48 0a 6b 03
              ^

Finally, fix the findings. Retest on ARM32 and PowerPC.

If you get unlucky, then the problem will only surface on ARM32 and PowerPC. In this case, try the same experiment on ARM or PPC instead of x86_64.

@ckastner
Contributor

@noloader, the last build on PowerPC doesn't seem related to SIGBUS though? There's just one test failure not related to the above. SIGBUS seems to be an issue only on 32-bit ARM.

In any case, I agree that it's probably an alignment issue, and @glaubitz already discovered a build process issue that needs to be fixed first. If that doesn't resolve the issue, then it's time to follow the steps you outlined above (thanks!)

@h6197627
Author

Closing due to no activity for a long time and no interest in fixing the issue.
