Update AutoML to check for imbalanced binary or category output features #2052
                imbalanced_output = True
                break
    return imbalanced_output
Would it be simpler to return True directly and then return False at the end of the function?
The function is written to handle more than one output feature (the break only exits the inner loop) and to log info for each imbalanced feature.
It returns True if any of the output_features is imbalanced.
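For illustration, here is a minimal sketch of the flag-and-break pattern described in this reply. It is not the code from this PR: the `class_counts` layout, the `IMBALANCE_THRESHOLD` value, and the shape of the feature dicts are all assumptions made for the example. Only the control flow (the inner `break`, one log message per imbalanced feature, a single `return` at the end) follows the discussion.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical threshold: a feature is treated as imbalanced when its rarest
# class accounts for less than this fraction of the rows.
IMBALANCE_THRESHOLD = 0.1


def has_imbalanced_output(
    output_features: List[Dict[str, Any]],
    class_counts: Dict[str, Dict[str, int]],
) -> bool:
    """Return True if any binary or category output feature is imbalanced.

    `class_counts` maps a feature name to its per-class row counts. This
    layout is an assumption for the sketch, not Ludwig's metadata format.
    """
    imbalanced_output = False
    for feature in output_features:
        if feature["type"] not in ("binary", "category"):
            continue
        counts = class_counts.get(feature["name"], {})
        total = sum(counts.values())
        if total == 0:
            continue
        for label, count in counts.items():
            if count / total < IMBALANCE_THRESHOLD:
                logger.info(
                    "Imbalance in output feature %s: class %s covers %.1f%% of rows",
                    feature["name"], label, 100.0 * count / total,
                )
                imbalanced_output = True
                # Exits only the inner loop, so the remaining output
                # features are still checked and logged.
                break
    return imbalanced_output
```

Returning True from inside the inner loop would be shorter, but it would stop at the first imbalanced feature and skip the log messages for any others.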
I see. If there's value in logging each imbalanced output feature, then I agree that we shouldn't short-circuit.
ludwig/automl/utils.py
@@ -178,3 +193,20 @@ def set_output_feature_metric(base_config):
     base_config[HYPEROPT]["metric"] = output_metric
     base_config[HYPEROPT]["goal"] = output_goal
     return base_config
+
+
+def check_imbalanced_output(base_config, features_metadata):
nit: Add type hint that this returns bool.
done
ludwig/automl/utils.py
@@ -178,3 +193,20 @@ def set_output_feature_metric(base_config):
     base_config[HYPEROPT]["metric"] = output_metric
     base_config[HYPEROPT]["goal"] = output_goal
     return base_config
+
+
+def check_imbalanced_output(base_config, features_metadata):
nit: Rename to has_imbalanced_output()
done
Update AutoML to check for imbalanced binary or category output features.
Currently, detecting such a feature only produces an informational log message.
In the future, AutoML could use the detection to trigger specific functionality.
Tested on the AutoML datasets in the experiments repo.