[ci] [python-package] Python tests leave files behind #6361

Open
8 tasks
jameslamb opened this issue Mar 15, 2024 · 12 comments

@jameslamb
Collaborator

jameslamb commented Mar 15, 2024

Description

The Python unit tests in this project leave some files behind when they are done running.

They should be modified to use Python-managed temporary files that are automatically removed, so that:

  • successive test runs don't accidentally rely on outputs from previous runs
  • files aren't left behind on developers' local systems

Reproducible example

Build the Python package and run the Python tests.

cmake -B build -S .
cmake --build build --target _lightgbm
sh build-python.sh install --precompile
pytest tests/python_package_test

(for more details on this, see #6350).

Look at the files created.

git status --ignored

As of latest master (b27d81e), you'll see all of these created by tests:

categorical.model
lgb.model
lgb.pkl
lgb_train_data.bin
model.txt
Tree4.gv.pdf
Tree4.gv
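
If it helps to check this without relying on git, here is a minimal sketch of the same idea in Python: take a snapshot of the files under a directory before and after a run and report the difference. The helper names (`snapshot`, `leftover_files`, `demo`) are made up for illustration and are not part of LightGBM or pytest; the demo uses a throwaway directory and a placeholder file rather than a real test run.

```python
import tempfile
from pathlib import Path


def snapshot(root: Path) -> set:
    """Return the relative paths (POSIX style) of all files under ``root``."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def leftover_files(before: set, after: set) -> list:
    """Files present after a run that were not present before it."""
    return sorted(after - before)


def demo() -> list:
    # Use a throwaway directory so the demo itself leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        before = snapshot(root)
        # Stand-in for a test that writes to a hard-coded relative path.
        (root / "lgb.model").write_text("placeholder")
        return leftover_files(before, snapshot(root))


print(demo())  # ['lgb.model']
```

In practice you would call `snapshot(Path("."))` in the repository root before and after `pytest tests/python_package_test`, assuming nothing else writes to the directory in between.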

Approach

Find the tests that created those files, and ensure that they stop creating them.

For example, it looks like lgb.model probably comes from here:

gbm.save_model("lgb.model")

That could be avoided by using pytest's tmp_path fixture, like this:

def test_ranking_with_position_information_with_file(tmp_path):

str(tmp_path / "rank.train"),

For more on how that works, see "How to use temporary directories and files in tests" (pytest docs).
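
As a rough, self-contained sketch of that pattern (not the actual LightGBM test code): `save_model()` below is a hypothetical stand-in for `gbm.save_model()`, and the test name is made up. The only real API assumed is pytest's `tmp_path` fixture, which passes each test its own temporary directory as a `pathlib.Path`.

```python
from pathlib import Path


def save_model(path: str) -> None:
    """Hypothetical stand-in for ``gbm.save_model(path)``; writes a small text file."""
    Path(path).write_text("tree\n")


def test_save_model_uses_tmp_path(tmp_path: Path) -> None:
    # pytest injects a fresh, per-test temporary directory as ``tmp_path``.
    model_file = tmp_path / "lgb.model"
    # Pass a string, as in the snippet above that uses str(tmp_path / "rank.train").
    save_model(str(model_file))
    # The file lands inside the temporary directory, not the working directory.
    assert model_file.is_file()
    assert model_file.parent == tmp_path
```

Because the file lives under a directory that pytest manages, nothing is written next to the test sources, and each test gets a clean location.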

Additional Comments

You do not need to put up a pull request fixing all of these! Contributions that fix any of these would be welcomed.

This list will be updated as these are fixed:

  • categorical.model
  • data_dask.csv
  • lgb.model
  • lgb.pkl
  • lgb_train_data.bin
  • model.txt
  • Tree4.gv.pdf
  • Tree4.gv

If you are interested in working on this, comment here to indicate that and to ask for help if you need it.

@Hitro147

Hi @jameslamb ,

I'm new to open source and would like to take up this issue.

Thanks!

@jameslamb
Collaborator Author

Sure, thanks! @ me here if you have any questions.

@Hitro147

Hitro147 commented Mar 19, 2024

Hey @jameslamb,

I ran into a few issues while building the Python package, but I have now built it successfully. However, I am getting errors when running the tests, and I cannot find a requirements.txt file. Can you suggest any way to install all the necessary modules?

Best,
Shrikanth

Errors after running pytest tests/python_package_test

==================================================== test session starts =====================================================
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /Users/hitro/Desktop/Microsoft/LightGBM
collected 3 items / 9 errors                                                                                                 

=========================================================== ERRORS ===========================================================
__________________________________ ERROR collecting tests/python_package_test/test_arrow.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_arrow.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_arrow.py:6: in <module>
    import pyarrow as pa
E   ModuleNotFoundError: No module named 'pyarrow'
__________________________________ ERROR collecting tests/python_package_test/test_basic.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_basic.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_basic.py:12: in <module>
    from sklearn.datasets import dump_svmlight_file, load_svmlight_file
E   ModuleNotFoundError: No module named 'sklearn'
________________________________ ERROR collecting tests/python_package_test/test_callback.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_callback.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_callback.py:6: in <module>
    from .utils import SERIALIZERS, pickle_and_unpickle_object
tests/python_package_test/utils.py:6: in <module>
    import cloudpickle
E   ModuleNotFoundError: No module named 'cloudpickle'
_______________________________ ERROR collecting tests/python_package_test/test_consistency.py _______________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_consistency.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_consistency.py:5: in <module>
    from sklearn.datasets import load_svmlight_file
E   ModuleNotFoundError: No module named 'sklearn'
__________________________________ ERROR collecting tests/python_package_test/test_dask.py ___________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_dask.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_dask.py:14: in <module>
    from sklearn.metrics import accuracy_score, r2_score
E   ModuleNotFoundError: No module named 'sklearn'
__________________________________ ERROR collecting tests/python_package_test/test_dual.py ___________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_dual.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_dual.py:8: in <module>
    from sklearn.metrics import log_loss
E   ModuleNotFoundError: No module named 'sklearn'
_________________________________ ERROR collecting tests/python_package_test/test_engine.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_engine.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_engine.py:15: in <module>
    import psutil
E   ModuleNotFoundError: No module named 'psutil'
________________________________ ERROR collecting tests/python_package_test/test_plotting.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_plotting.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_plotting.py:3: in <module>
    import pandas as pd
E   ModuleNotFoundError: No module named 'pandas'
_________________________________ ERROR collecting tests/python_package_test/test_sklearn.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_sklearn.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_sklearn.py:9: in <module>
    import joblib
E   ModuleNotFoundError: No module named 'joblib'
================================================== short test summary info ===================================================
ERROR tests/python_package_test/test_arrow.py
ERROR tests/python_package_test/test_basic.py
ERROR tests/python_package_test/test_callback.py
ERROR tests/python_package_test/test_consistency.py
ERROR tests/python_package_test/test_dask.py
ERROR tests/python_package_test/test_dual.py
ERROR tests/python_package_test/test_engine.py
ERROR tests/python_package_test/test_plotting.py
ERROR tests/python_package_test/test_sklearn.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 9 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================================== 9 errors in 3.31s ======================================================

@jameslamb
Collaborator Author

Thanks for trying it out!

Please post error messages and logs as plaintext, not images, so they can be found from search engines. See these resources:

any way to install all the necessary modules

Follow these steps (but add pyarrow): #6310 (comment)
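
To confirm the environment is ready before re-running the tests, a small check like the following may help. This is only a sketch: the module list is taken from the `ModuleNotFoundError` lines in the log above, so it may not cover everything the tests need, and `missing_modules` is a name made up for illustration.

```python
from importlib.util import find_spec

# Top-level modules named in the collection errors above (possibly incomplete).
MODULES = ["pyarrow", "sklearn", "cloudpickle", "psutil", "pandas", "joblib"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # An empty list means every module in MODULES can be imported.
    print(missing_modules(MODULES))
```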

@Hitro147

Sorry about that! I've updated my comment.

Thanks for the information. I'll start working on it 😄

@jameslamb
Collaborator Author

I found another one generated by the Dask tests, added it above.

np.savetxt("data_dask.csv", np.hstack([np.array([y]).T, X]), fmt="%f,%f,%f,%f,%f")

@jameslamb
Collaborator Author

@Hitro147 Are you still interested in pursuing this?

@Hitro147

Hello @jameslamb,

I'm facing some issues with my current environment and will need time to resolve them, so I need to put this on hold for a while. If the issue is still open later, I'd like to return to it when I have more time. Feel free to assign it to someone else if they are interested.

Thanks for giving me this opportunity! 😄

@jameslamb
Collaborator Author

Ok sure, no problem. Comment here or on #6350 any time if you need help.

Anyone else reading this... you are welcome to contribute! A PR even just eliminating one of these left-behind files would be greatly appreciated 😊

@Arup-Chauhan

@jameslamb, I would like to contribute to this issue, or to any related good first issue in this repository (I see several mentioned).

@jameslamb
Collaborator Author

Sure! This is a great issue to start with @Arup-Chauhan .

I recommend focusing on a single file like categorical.model in your first contribution, to get used to the process. You can find where it's used like this:

git grep -E 'categorical\.model'

Thanks for spending some time on LightGBM, we really appreciate it!

@Arup-Chauhan

Arup-Chauhan commented Jun 13, 2024

Hi @jameslamb, thanks for this. I will get started and will reach out to you if I need assistance.
