Update AutoML to check for imbalanced binary or category output features #2052

amholler · 2022-05-23T21:00:17Z

Update AutoML to check for imbalanced binary or category output features.

Currently, detecting such a feature produces an informational log message.
In the future, AutoML could use the detection to trigger specific functionality.
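The description does not say how imbalance is measured. Below is a minimal sketch that assumes one common definition: a feature counts as imbalanced when the rarest class is less than some fraction of the most common class, with a hypothetical cutoff of 0.1. `is_imbalanced` is an illustrative helper, not a Ludwig function. As the description says, detection only emits an informational log message.

```python
import logging
from collections import Counter
from typing import Hashable, Iterable

logger = logging.getLogger(__name__)

# Hypothetical cutoff: flag the feature when the rarest class is less than
# 10% as frequent as the most common class.
IMBALANCE_RATIO_THRESHOLD = 0.1


def is_imbalanced(name: str, labels: Iterable[Hashable],
                  threshold: float = IMBALANCE_RATIO_THRESHOLD) -> bool:
    """Return True if the label distribution of one output feature is imbalanced.

    The minority/majority count ratio is an assumed metric for illustration.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        # Fewer than two observed classes: there is no class ratio to compute.
        return False
    ratio = min(counts.values()) / max(counts.values())
    if ratio < threshold:
        # Detection only produces an informational message; nothing else changes.
        logger.info("Output feature %s is imbalanced (minority/majority ratio %.3f)", name, ratio)
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(is_imbalanced("fraud", [0] * 95 + [1] * 5))             # True (5/95 < 0.1)
    print(is_imbalanced("churn", ["yes"] * 60 + ["no"] * 40))     # False (40/60)
```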

Tested on the AutoML datasets in the experiments repo.


github-actions · 2022-05-23T21:42:12Z

Unit Test Results

6 files ±0, 6 suites ±0, 2h 17m 55s ⏱️ +6m 58s
2 798 tests ±0: 2 763 ✔️ passed ±0, 35 💤 skipped ±0, 0 ❌ failed ±0
8 394 runs ±0: 8 285 ✔️ passed ±0, 109 💤 skipped ±0, 0 ❌ failed ±0

Results for commit 3075b97. ± Comparison against base commit dd6ba79.

♻️ This comment has been updated with latest results.

justinxzhao · 2022-05-23T23:48:02Z

ludwig/automl/utils.py

+                        imbalanced_output = True
+                    break
+    return imbalanced_output


Would it be simpler to return True directly, and then return False at the end of the function?

amholler · 2022-05-24

The function is written to handle more than one output feature (the break only exits the inner loop) and to log an info message for each imbalanced feature.
The function returns True if any of the output_features is imbalanced.


justinxzhao · 2022-05-23T23:48:25Z

ludwig/automl/utils.py

@@ -178,3 +193,20 @@ def set_output_feature_metric(base_config):
        base_config[HYPEROPT]["metric"] = output_metric
        base_config[HYPEROPT]["goal"] = output_goal
    return base_config
+
+
+def check_imbalanced_output(base_config, features_metadata):


nit: Add type hint that this returns bool.

justinxzhao · 2022-05-23T23:48:50Z

ludwig/automl/utils.py

@@ -178,3 +193,20 @@ def set_output_feature_metric(base_config):
        base_config[HYPEROPT]["metric"] = output_metric
        base_config[HYPEROPT]["goal"] = output_goal
    return base_config
+
+
+def check_imbalanced_output(base_config, features_metadata):


nit: Rename to has_imbalanced_output()


justinxzhao · 2022-05-25T17:10:37Z

ludwig/automl/utils.py

+                        imbalanced_output = True
+                    break
+    return imbalanced_output


I see. If there's value in logging each imbalanced output feature, then I agree that we shouldn't short-circuit.
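The trade-off discussed in this thread can be sketched as follows. This is not the code from the PR. It assumes a hypothetical `class_counts` mapping from output feature name to per-class counts (the PR passes `features_metadata`, whose structure is not shown here) and a hypothetical 0.1 minority/majority cutoff. It applies both review nits (the `has_imbalanced_output` name and the `-> bool` hint). It contrasts the flag-and-continue loop with the early-return alternative from the first review comment.

```python
import logging
from typing import Any, Dict, Hashable, Mapping

logger = logging.getLogger(__name__)

# Hypothetical cutoff on the minority/majority class-count ratio.
IMBALANCE_RATIO_THRESHOLD = 0.1

# Output feature types the check applies to (per the PR title).
CLASSIFICATION_TYPES = ("binary", "category")


def minority_majority_ratio(class_counts: Mapping[Hashable, int]) -> float:
    """Ratio of the rarest class count to the most common class count."""
    counts = list(class_counts.values())
    return min(counts) / max(counts)


def has_imbalanced_output(
    base_config: Dict[str, Any], class_counts: Mapping[str, Mapping[Hashable, int]]
) -> bool:
    """Return True if any binary or category output feature is imbalanced.

    The flag is set and the loop keeps going, so every imbalanced
    feature gets its own log message.
    """
    imbalanced_output = False
    for feature in base_config["output_features"]:
        if feature["type"] not in CLASSIFICATION_TYPES:
            continue
        ratio = minority_majority_ratio(class_counts[feature["name"]])
        if ratio < IMBALANCE_RATIO_THRESHOLD:
            logger.info("Output feature %s is imbalanced (ratio %.3f)", feature["name"], ratio)
            imbalanced_output = True
    return imbalanced_output


def has_imbalanced_output_short_circuit(
    base_config: Dict[str, Any], class_counts: Mapping[str, Mapping[Hashable, int]]
) -> bool:
    """The early-return alternative: simpler, but stops logging at the first hit."""
    for feature in base_config["output_features"]:
        if feature["type"] not in CLASSIFICATION_TYPES:
            continue
        ratio = minority_majority_ratio(class_counts[feature["name"]])
        if ratio < IMBALANCE_RATIO_THRESHOLD:
            logger.info("Output feature %s is imbalanced (ratio %.3f)", feature["name"], ratio)
            return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    config = {
        "output_features": [
            {"name": "is_fraud", "type": "binary"},
            {"name": "label", "type": "category"},
            {"name": "price", "type": "number"},  # skipped: not binary/category
        ]
    }
    counts = {"is_fraud": {0: 990, 1: 10}, "label": {"a": 500, "b": 20, "c": 480}}
    print(has_imbalanced_output(config, counts))  # True; logs is_fraud and label
    print(has_imbalanced_output_short_circuit(config, counts))  # True; logs is_fraud only
```

Both versions return the same boolean. The only observable difference is how many imbalanced features get reported, which is why keeping the flag is worthwhile when per-feature logging matters.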


anneholler and others added 3 commits May 23, 2022 13:56

Update AutoML to check for imbalanced output features (2c6463e)

[pre-commit.ci] auto fixes from pre-commit.com hooks (51112a9)
for more information, see https://pre-commit.ci

Address flake8 issue with line length (690cc89)

Merge from master (465db4f)

justinxzhao reviewed May 23, 2022


anneholler and others added 2 commits May 23, 2022 17:32

Respond to review comments (0f9f46f)

[pre-commit.ci] auto fixes from pre-commit.com hooks (3075b97)
for more information, see https://pre-commit.ci

justinxzhao approved these changes May 25, 2022


justinxzhao merged commit d60b722 into ludwig-ai:master May 25, 2022

tgaddair added a commit that referenced this pull request May 29, 2022

Revert "Update AutoML to check for imbalanced binary or category output features (#2052)" (18202ad)

This reverts commit d60b722.

tgaddair mentioned this pull request May 29, 2022

Revert "Update AutoML to check for imbalanced binary or category outp… #2074

Closed

