Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix postprocessing on binary feature columns with number dtype #2189

Merged
merged 16 commits into from
Jun 24, 2022

Conversation

geoffreyangus
Copy link
Collaborator

@geoffreyangus geoffreyangus commented Jun 23, 2022

This PR ensures that the postprocessing behavior before #2058 is preserved when dealing with binary feature columns of dtype float/int. This is done primarily by reverting the changes introduced by #2058 in ludwig/features/binary_feature.py and ludwig/visualize.py. You can see the full changes in binary_feature.py and visualize.py relative to 0.5.2 here and here, respectively.

That said, in order to achieve this, we add several checks upstream in cast_column to ensure that the old behavior when determining dtypes is preserved. The one caveat is that this requires calling compute(), which could be expensive in large datasets.

@geoffreyangus geoffreyangus marked this pull request as draft June 23, 2022 18:28
@github-actions
Copy link

github-actions bot commented Jun 23, 2022

Unit Test Results

       6 files  ±  0         6 suites  ±0   2h 22m 4s ⏱️ + 5m 30s
2 846 tests +  6  2 812 ✔️ +  7    34 💤 ±0  0  - 1 
8 538 runs  +18  8 432 ✔️ +19  106 💤 ±0  0  - 1 

Results for commit eb78129. ± Comparison against base commit 36787ba.

♻️ This comment has been updated with latest results.

Copy link
Collaborator

@justinxzhao justinxzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. Not totally sure why this change would causes tests to be flaky.

@geoffreyangus geoffreyangus marked this pull request as ready for review June 24, 2022 19:30
@geoffreyangus geoffreyangus merged commit 48731ae into master Jun 24, 2022
@geoffreyangus geoffreyangus deleted the fix-binary-postprocessing branch June 24, 2022 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants