
Update AutoML to check for imbalanced binary or category output features #2052

Merged · 6 commits merged into ludwig-ai:master on May 25, 2022

Conversation

@amholler (Collaborator) commented on May 23, 2022

Update AutoML to check for imbalanced binary or category output features.

Currently, detection of such features produces an informational log message. In the future, AutoML could use the detection to trigger specific functionality.

Tested on the AutoML datasets in the experiments repo.
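As a rough illustration of the behavior described above, here is a minimal sketch of how a single binary or category output feature could be flagged as imbalanced from its value counts and reported with an informational log message. The metadata layout (a label-to-count mapping) and the 0.9 majority-class threshold are assumptions for illustration, not the actual Ludwig code.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed threshold: treat an output as imbalanced when one class holds
# more than 90% of the rows. This number is illustrative, not Ludwig's.
MAJORITY_CLASS_THRESHOLD = 0.9


def output_feature_is_imbalanced(feature_name: str, value_counts: dict) -> bool:
    """Return True and log an informational message if one class dominates.

    `value_counts` maps each class label to its row count; this shape is an
    assumption about the feature metadata, used only for illustration.
    """
    total = sum(value_counts.values())
    if total == 0:
        return False
    majority_fraction = max(value_counts.values()) / total
    if majority_fraction > MAJORITY_CLASS_THRESHOLD:
        logger.info(
            "Output feature '%s' is imbalanced: majority class covers %.1f%% of rows",
            feature_name,
            100 * majority_fraction,
        )
        return True
    return False
```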

@github-actions bot commented on May 23, 2022

Unit Test Results

6 files ±0 · 6 suites ±0 · 2h 17m 55s ⏱️ +6m 58s
2 798 tests ±0: 2 763 ✔️ passed ±0, 35 💤 skipped ±0, 0 failed ±0
8 394 runs ±0: 8 285 ✔️ passed ±0, 109 💤 skipped ±0, 0 failed ±0

Results for commit 3075b97. ± Comparison against base commit dd6ba79.

♻️ This comment has been updated with latest results.

Comment on lines +210 to +212
imbalanced_output = True
break
return imbalanced_output
Collaborator commented:
Would it be simpler to return True directly and then return False at the end of the function?

@amholler (Collaborator, Author) replied on May 24, 2022:

The function is written to handle the case where there is more than one output feature (the break only exits the inner loop) and to log info for each imbalanced feature. The function returns True if any of the output_features is imbalanced.

Collaborator replied:

I see. If there's value in logging each imbalanced output feature, then I agree that we shouldn't short-circuit.
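To make the discussion above concrete, here is a hedged sketch of the control flow being described: the outer loop visits every output feature so that each imbalanced one is logged, the inner break only stops scanning the metadata for the current feature, and the accumulated flag is returned at the end. The metadata field names and the imbalance test are assumptions; only the overall structure follows the conversation.

```python
import logging

logger = logging.getLogger(__name__)


def has_imbalanced_output(base_config: dict, features_metadata: list) -> bool:
    """Sketch of the reviewed helper: True if any binary/category output is imbalanced.

    Field names on `features_metadata` entries (`name`, `imbalance_ratio`) and the
    0.9 threshold are assumptions for illustration; the real structure may differ.
    """
    imbalanced_output = False
    for output_feature in base_config["output_features"]:
        if output_feature["type"] not in ("binary", "category"):
            continue
        for metadata in features_metadata:
            if metadata["name"] != output_feature["name"]:
                continue
            if metadata["imbalance_ratio"] > 0.9:  # assumed threshold
                logger.info("Output feature %s is imbalanced", output_feature["name"])
                imbalanced_output = True
            # break only exits the inner metadata scan, so every output
            # feature is still checked and logged.
            break
    return imbalanced_output
```

Note that the `bool` return annotation and the `has_imbalanced_output()` name suggested later in the review are already reflected in this sketch.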

@@ -178,3 +193,20 @@ def set_output_feature_metric(base_config):
base_config[HYPEROPT]["metric"] = output_metric
base_config[HYPEROPT]["goal"] = output_goal
return base_config


def check_imbalanced_output(base_config, features_metadata):
Collaborator commented:
nit: Add type hint that this returns bool.

@amholler (Author) replied:
done

Collaborator commented on the same check_imbalanced_output definition:

nit: Rename to has_imbalanced_output()

@amholler (Author) replied:

done


@justinxzhao merged commit d60b722 into ludwig-ai:master on May 25, 2022
tgaddair added a commit that referenced this pull request on May 29, 2022