[ci] regression tests: model files #4407

Closed
4 tasks
jameslamb opened this issue Jun 25, 2021 · 2 comments

@jameslamb
Collaborator

Summary

This project's continuous integration (CI) should include a job which tests that LightGBM model files produced by previous versions can be successfully loaded and used in newer versions.

Specifically, it should test the following claim:

Model files produced in LightGBM version (N).x.x should be readable and usable in all versions in the same major version series.

It should also include tests of expected compatibility between other versions. For example, if 4.0.0 does not include breaking changes to saving / loading of model files, then a test should be added that such a file created in LightGBM 3.2.1 can be loaded in LightGBM 4.0.0.
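The rule above could be encoded as a small helper that a CI job uses to decide which (saved-in, loaded-in) version pairs are expected to work. This is only a minimal sketch: `expect_loadable` and `EXTRA_COMPATIBLE` are hypothetical names, not part of LightGBM, and the 3.2.1 / 4.0.0 entry just mirrors the conditional example above rather than a confirmed guarantee.

```python
# Hypothetical helper; names are illustrative and not part of LightGBM.

# Cross-major pairs that maintainers have explicitly declared compatible.
# The single entry mirrors the conditional example in the issue text.
EXTRA_COMPATIBLE = {("3.2.1", "4.0.0")}


def major(version: str) -> int:
    """Return the major component of an 'N.x.x' version string."""
    return int(version.split(".")[0])


def expect_loadable(saved_in: str, loaded_in: str) -> bool:
    """True if a model file saved in `saved_in` should load in `loaded_in`."""
    # Same major series: compatibility is the claim under test.
    if major(saved_in) == major(loaded_in):
        return True
    # Otherwise only explicitly listed pairs are expected to work.
    return (saved_in, loaded_in) in EXTRA_COMPATIBLE
```

A test matrix could then be built by filtering all version pairs through `expect_loadable()`.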

"model files" refers to the following:

  • (Python, R, C++) models saved to string using LGBM_BoosterSaveModelToString()
  • (Python, R, C++) models saved to text file using LGBM_BoosterSaveModel()
  • (Python) pickled lightgbm.Booster objects (saved with cloudpickle, joblib, or pickle)
  • (R) .rds files created with saveRDS.lgb.Booster() or saveRDS()
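Since the formats listed above differ only in how a model is written and read back, one way to structure such tests is a format-agnostic round-trip check. The sketch below uses only the standard library; `roundtrip_pickle`, `roundtrip_file` and `same_predictions` are hypothetical helpers, and the `save`, `load` and `predict` callables are assumptions the caller supplies.

```python
import pickle
import tempfile
from pathlib import Path


def roundtrip_pickle(model):
    """Serialize `model` with pickle and load the result back."""
    return pickle.loads(pickle.dumps(model))


def roundtrip_file(model, save, load):
    """Write `model` with save(model, path), then read it with load(path)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.txt"
        save(model, path)
        return load(path)


def same_predictions(original, restored, predict, data, tol=1e-9):
    """Compare predictions of both models on `data`, element by element."""
    a = predict(original, data)
    b = predict(restored, data)
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))
```

With the Python package, `save` could wrap `Booster.save_model()` and `load` could wrap `lightgbm.Booster(model_file=...)`. In an actual cross-version job the saved file would come from an environment with the older release installed, not from the same process.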

Motivation

LightGBM uses semantic versioning for releases. As a result, users expect that there will not be breaking changes within a major release series. For example, they expect that a Booster saved to a text file using LightGBM 3.1.0 will be readable in any other LightGBM 3.x.x release.

Adding explicit tests of that expectation might provide greater confidence that releases are not introducing such breaking changes, and might help to catch problems like #3778 (PR #4056) before the changes that cause them are merged.

References

Created based on #4228 (comment).

See https://lightgbm.readthedocs.io/en/latest/Parallel-Learning-Guide.html#saving-dask-models for some documentation that explains the different ways that one type of LightGBM model object (Dask estimators in Python) can be saved.

saveRDS() for R objects will only work once #4208 is addressed.

@jameslamb
Collaborator Author

This issue has been added to #2302 with other feature requests. I'd like to leave it open for a few days in case others want to add comments, since I just locked discussion on #4228.

After a few days, this issue will be closed until someone leaves a comment saying they'd like to work on it.

@jameslamb
Collaborator Author

OK, now that this has been open for a few days, I am going to close it. If you're reading this and would like to work on it, please comment below and it can be re-opened!
