[python-package] Documentation on setting up development environment #6350

Open
nicklamiller opened this issue Mar 3, 2024 · 5 comments

@nicklamiller
Contributor

Summary

Add documentation describing how to set up an environment for developing the python-package.

Motivation

Currently there's no documentation on how to set up an environment for developing the python-package (see #6310 (comment)). Adding this would make the contribution process smoother.

Description

References

@jameslamb jameslamb added the doc label Mar 3, 2024
@jameslamb
Collaborator

Thanks for writing this up!

We could give better guidance here. Until a doc like that's added, please post comments on this issue with specific questions and one of us will help.

@nicklamiller
Contributor Author

nicklamiller commented Mar 13, 2024

Thanks for outlining the dev environment setup steps! That helped a lot and seems like a great foundation for the developer setup documentation.

Following these initial steps, running pytest tests/python_package_test resulted in segmentation faults. This appeared to be an OpenMP issue: the original gcc and g++ compilers on my OS didn't have OpenMP support, so I ran brew install libomp and aliased gcc to gcc-13 and g++ to g++-13. These compilers did have OpenMP support, as I could compile dummy C and C++ files that used OpenMP. However, running pytest tests/python_package_test resulted in the same segfaults as before.
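For anyone repeating the compiler check described above, here is a minimal sketch of one way to script it. The helper name compiler_supports_openmp and the default -fopenmp flag are assumptions for illustration (that flag is what GCC uses; Apple Clang typically needs different flags plus libomp), so treat it as a starting point rather than the project's own check.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Tiny C program that only compiles and links if OpenMP is available.
OPENMP_SOURCE = """
#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        printf("thread %d\\n", omp_get_thread_num());
    }
    return 0;
}
"""


def compiler_supports_openmp(compiler: str, flags=("-fopenmp",)) -> bool:
    """Return True if `compiler` can build a small OpenMP program.

    `compiler` is an executable name such as "gcc-13" (hypothetical example);
    `flags` are the OpenMP flags to pass, which differ between compilers.
    """
    if shutil.which(compiler) is None:
        return False  # compiler is not on PATH
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "omp_check.c"
        source.write_text(OPENMP_SOURCE)
        result = subprocess.run(
            [compiler, *flags, str(source), "-o", str(Path(tmp) / "omp_check")],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0
```

Calling compiler_supports_openmp("gcc") and compiler_supports_openmp("gcc-13") side by side would show whether the alias actually points at a compiler with OpenMP support. Note that a successful build only shows the compiler and runtime are usable; it says nothing about whether the runtime segfaults later inside pytest.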

After this, I followed the steps outlined in #4229 (comment) and am now getting a TypeError: Wrong type(ChunkedArray) error for several tests, all in test_arrow.py. I'd appreciate any feedback if you've seen this error before, know a good way to handle it, or think it indicates a faulty setup 🙏.

Original segmentation fault message

tests/python_package_test/test_arrow.py Fatal Python error: Segmentation fault

Thread 0x00000001fd04e080 (most recent call first):
  File "/Users/nick/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py", line 2377 in __init_from_csr
  File "/Users/nick/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py"[1]    2968 segmentation fault  pytest tests/python_package_test

Segmentation fault message in LLDB

============================= test session starts ==============================
platform darwin -- Python 3.11.8, pytest-8.0.1, pluggy-1.4.0
rootdir: /Users/nick/development/LightGBM
plugins: cov-4.1.0
collected 710 items / 1 skipped                                      		 

Process 3635 stopped
* thread #20, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
    frame #0: 0x0000000175b1f7d4 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
->  0x175b1f7d4 <+32>: ldr    w8, [x0, #0x540]
    0x175b1f7d8 <+36>: nop    
    0x175b1f7dc <+40>: ldr    w9, 0x175b51308  		 ; _MergedGlobals + 8
    0x175b1f7e0 <+44>: add    w20, w9, #0x1
  thread #21, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
    frame #0: 0x0000000175b1f7d4 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
->  0x175b1f7d4 <+32>: ldr    w8, [x0, #0x540]
    0x175b1f7d8 <+36>: nop    
    0x175b1f7dc <+40>: ldr    w9, 0x175b51308  		 ; _MergedGlobals + 8
    0x175b1f7e0 <+44>: add    w20, w9, #0x1
Target 0: (python) stopped.

TypeError: Wrong type(ChunkedArray) errors


========================================================================= short test summary info ==========================================================================
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params0] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params1] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params2] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params3] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params4] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[<lambda>-dataset_params5] - AssertionError: assert False
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fields_fuzzy - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-array-label_data0] - TypeError: Wrong type(Int8Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-array-label_data0] - TypeError: Wrong type(Int16Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-array-label_data0] - TypeError: Wrong type(Int32Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-array-label_data0] - TypeError: Wrong type(Int64Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-array-label_data0] - TypeError: Wrong type(UInt8Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-array-label_data0] - TypeError: Wrong type(UInt16Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-array-label_data0] - TypeError: Wrong type(UInt32Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-array-label_data0] - TypeError: Wrong type(UInt64Array) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-array-label_data0] - TypeError: Wrong type(FloatArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-array-label_data0] - TypeError: Wrong type(DoubleArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights_none - TypeError: Wrong type(Int64Array) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-array-weight_data0] - TypeError: Wrong type(FloatArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-chunked_array-weight_data1] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-chunked_array-weight_data2] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-chunked_array-weight_data3] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-array-weight_data0] - TypeError: Wrong type(DoubleArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-chunked_array-weight_data1] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-chunked_array-weight_data2] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-chunked_array-weight_data3] - TypeError: Wrong type(ChunkedArray) for weight.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-array-group_data0] - TypeError: Wrong type(Int8Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-array-group_data0] - TypeError: Wrong type(Int16Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-array-group_data0] - TypeError: Wrong type(Int32Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-array-group_data0] - TypeError: Wrong type(Int64Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-array-group_data0] - TypeError: Wrong type(UInt8Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-array-group_data0] - TypeError: Wrong type(UInt16Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-array-group_data0] - TypeError: Wrong type(UInt32Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-array-group_data0] - TypeError: Wrong type(UInt64Array) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_table - TypeError: init_score must be list, numpy 1-D array or pandas Series.
FAILED tests/python_package_test/test_arrow.py::test_predict_regression - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_predict_binary_classification - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_predict_multiclass_classification - TypeError: Wrong type(ChunkedArray) for label.
FAILED tests/python_package_test/test_arrow.py::test_predict_ranking - TypeError: Wrong type(ChunkedArray) for label.

OS info

  • OS: macOS 13.5 (Ventura)
  • CPU: M2 chip
  • compiler: AppleClang 15.0.0
  • Python: 3.11.7
  • OpenMP (libomp): 18.1.1

@jameslamb
Collaborator

Thanks for the very detailed write-up and for working through this! Sorry it isn't easier.

I'm on mobile right now so apologies for being brief, but wanted to help unblock you. Try replacing this

cmake ..

with this

cmake -DUSE_OPENMP=OFF ..

@nicklamiller
Contributor Author

nicklamiller commented Mar 15, 2024

Thanks for the prompt response, and no worries, uncovering these speed bumps is giving a lot of fodder for the developer env documentation.

After making that change, I'm still getting the TypeError: Wrong type(ChunkedArray) errors and they're still in test_arrow.py. I noticed that without -DUSE_OPENMP=OFF (i.e. following the instructions in #4229 (comment) exactly)

otool -L lib_lightgbm.so

returns

lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /opt/homebrew/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)

Whereas building with -DUSE_OPENMP=OFF

otool -L lib_lightgbm.so

returns

lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)

So it looks like lib_lightgbm.so no longer links to OpenMP, and adding -DUSE_OPENMP=OFF was successful. This makes me wonder if the problem is unrelated to OpenMP. I see ChunkedArray is defined in a C++ file, and taking test_predict_ranking as an example, the Python function _list_to_1d_numpy fails because it doesn't expect a ChunkedArray. Maybe there's an incompatibility between the C++ code and the Python code, though I'm not sure how that could happen since I'm building the package from the most up-to-date code.
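The otool -L comparison above can also be done programmatically. The sketch below parses the text that otool -L prints and reports whether an OpenMP runtime appears among the linked libraries. The function names and the list of runtime prefixes (libomp, libgomp, libiomp) are assumptions chosen for illustration, and the sample strings are copied from the two outputs above.

```python
WITH_OPENMP = """lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /opt/homebrew/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)
"""

WITHOUT_OPENMP = """lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)
"""


def linked_libraries(otool_output: str) -> list[str]:
    """Extract library paths from the text printed by `otool -L`."""
    paths = []
    # The first line names the inspected file, so skip it.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            # Drop the trailing "(compatibility version ..., current version ...)".
            paths.append(line.split(" (compatibility")[0])
    return paths


def links_openmp(otool_output: str) -> bool:
    """Return True if any linked library name looks like an OpenMP runtime."""
    names = [path.rsplit("/", 1)[-1] for path in linked_libraries(otool_output)]
    return any(name.startswith(("libomp", "libgomp", "libiomp")) for name in names)


print(links_openmp(WITH_OPENMP), links_openmp(WITHOUT_OPENMP))
```

In practice the input text could come from subprocess.run(["otool", "-L", "lib_lightgbm.so"], capture_output=True, text=True).stdout on macOS.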

test_predict_ranking specific error

/Users/nick/development/LightGBM/tests/python_package_test/test_arrow.py::test_predict_ranking failed: def test_predict_ranking():
        data = generate_random_arrow_table(10, 10000, 42)
        dataset = lgb.Dataset(
            data,
            label=generate_random_arrow_array(10000, 43, generate_nulls=False, values=np.arange(4)),
            group=np.array([1000, 2000, 3000, 4000]),
            params=dummy_dataset_params(),
        )
>       booster = lgb.train(
            {"objective": "lambdarank", "num_leaves": 7},
            dataset,
            num_boost_round=5,
        )

tests/python_package_test/test_arrow.py:372: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/engine.py:260: in train
    booster = Booster(params=params, train_set=train_set)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:3624: in __init__
    train_set.construct()
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:2563: in construct
    self._lazy_init(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:2177: in _lazy_init
    self.set_label(label)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:3050: in set_label
    label_array = _list_to_1d_numpy(label, dtype=np.float32, name="label")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

data = <pyarrow.lib.ChunkedArray object at 0x288e0bab0>
[
  [
    2,
    2,
    1,
    0,
    2,
    ...
    3,
    2,
    0,
    0,
    0
  ]
]
dtype = <class 'numpy.float32'>, name = 'label'

    def _list_to_1d_numpy(
        data: Any,
        dtype: "np.typing.DTypeLike",
        name: str,
    ) -> np.ndarray:
        """Convert data to numpy 1-D array."""
        if _is_numpy_1d_array(data):
            return _cast_numpy_array_to_dtype(data, dtype)
        elif _is_numpy_column_array(data):
            _log_warning("Converting column-vector to 1d array")
            array = data.ravel()
            return _cast_numpy_array_to_dtype(array, dtype)
        elif _is_1d_list(data):
            return np.array(data, dtype=dtype, copy=False)
        elif isinstance(data, pd_Series):
            _check_for_bad_pandas_dtypes(data.to_frame().dtypes)
            return np.array(data, dtype=dtype, copy=False)  # SparseArray should be supported as well
        else:
>           raise TypeError(
                f"Wrong type({type(data).__name__}) for {name}.\n" "It should be list, numpy 1-D array or pandas Series"
            )
E           TypeError: Wrong type(ChunkedArray) for label.
E           It should be list, numpy 1-D array or pandas Series

../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:362: TypeError

@jameslamb
Collaborator

> if this problem has to do with something outside of OpenMP.

Turning off OpenMP linking was to help with the segfaults you reported. As you found with #4229 and similar, LightGBM's Python package has some outstanding issues with OpenMP support on macOS.

> TypeError: Wrong type(ChunkedArray) errors and they're still in test_arrow.py

Please install pyarrow in your development environment and try again.

conda install -c conda-forge --yes pyarrow
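Before re-running the Arrow tests, it may help to confirm which optional packages the active interpreter can actually see. This is a small sketch, not part of LightGBM: the helper name missing_modules is hypothetical, and the list passed to it is an assumption that can be extended with any other optional dependency of interest.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Run this with the same Python that runs pytest (e.g. inside the conda env).
print("missing:", missing_modules(["pyarrow"]))
```

An empty list means the module is importable from this environment; a non-empty list suggests the install landed in a different environment than the one running the tests.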

Alternatively, ignore the Arrow tests if you're just working on the scikit-learn interface (as I suspect you are, given our discussion in #6310).

pytest tests/python_package_test/test_sklearn.py
