Sanitize GBM feature names to remove JSON special characters #3326

Merged
merged 8 commits into from
May 12, 2023

jeffkinnison
Contributor

LGBM Datasets are incompatible with features whose names contain JSON special characters (e.g. :, [, ], {, }). Currently, we raise an exception when Dataset creation fails in this way. This update adds feature name sanitizers local to LGBMTrainer and LGBMRayTrainer that clean the feature names for Dataset creation without altering the names globally.
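For readers who want a feel for the approach, here is a minimal sketch of a local feature name sanitizer. It is not the code merged in this PR: the function name `sanitize_feature_names`, the underscore replacement, and the collision handling are all assumptions made for illustration, and the character set covers only the five characters listed in the description above.

```python
import re

# Only the characters named in the PR description. The full set that
# LightGBM rejects may be larger; treat this pattern as an assumption.
_JSON_SPECIAL = re.compile(r"[:\[\]{}]")


def sanitize_feature_names(names):
    """Map each original feature name to a sanitized, unique name.

    Hypothetical helper: the original names are left untouched, and the
    returned mapping is meant to be applied only when building the Dataset.
    """
    mapping = {}
    used = set()
    for name in names:
        clean = _JSON_SPECIAL.sub("_", name)
        candidate = clean
        suffix = 1
        # Two different names can collapse to the same sanitized string,
        # so append a numeric suffix until the candidate is unique.
        while candidate in used:
            candidate = f"{clean}_{suffix}"
            suffix += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping
```

Returning a mapping rather than renaming in place matches the stated goal of keeping the change local to the trainer: the sanitized names can be used for Dataset creation while the rest of the pipeline keeps the original names.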

github-actions bot commented Apr 6, 2023

Unit Test Results

  6 files  ±       0    6 suites  ±0   1h 17m 11s ⏱️ - 7m 39s
33 tests  - 2 726  29 ✔️  - 2 718    4 💤  - 8  0 ±0 
99 runs   - 2 714  87 ✔️  - 2 709  12 💤  - 5  0 ±0 

Results for commit 389bf80. ± Comparison against base commit d4f8ccc.

♻️ This comment has been updated with latest results.

@jeffkinnison jeffkinnison changed the title from "[Draft] Sanitize GBM feature names to remove JSON special characters" to "Sanitize GBM feature names to remove JSON special characters" Apr 11, 2023
@jeffkinnison jeffkinnison marked this pull request as ready for review April 11, 2023 16:54
ludwig/trainers/trainer_lightgbm.py (outdated, resolved)
ludwig/trainers/trainer_lightgbm.py (outdated, resolved)
@jeffkinnison
Contributor Author

We're finally green! Just want to get a quick re-review before merging.

Collaborator

@justinxzhao justinxzhao left a comment

Awesome!

@jeffkinnison jeffkinnison merged commit f76b637 into master May 12, 2023
15 checks passed
@jeffkinnison jeffkinnison deleted the gbm-json-special-characters branch May 12, 2023 22:28
3 participants