Update get_repeatable_train_val_test_split to handle non-stratified split w/ no existing split #2237

amholler · 2022-07-06T16:25:47Z

As titled. Tested with the newly added unit test and on the regression datasets in the experiments repo that have no existing split column.
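For readers unfamiliar with the idea, here is a minimal sketch of a repeatable, non-stratified train/val/test split for a dataset with no existing split column. It is not the code in this PR: the function name `assign_split_labels`, the default probabilities, the seed, and the 0/1/2 label encoding are all assumptions made for illustration, and it uses only the standard library rather than a dataframe.

```python
import random


def assign_split_labels(num_rows, probabilities=(0.7, 0.1, 0.2), seed=42):
    """Hypothetical helper: return one split label per row.

    Labels are assumed to be 0 = train, 1 = validation, 2 = test.
    The same (num_rows, probabilities, seed) always yields the same labels.
    """
    if len(probabilities) != 3 or abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must be three values summing to 1")

    # A dedicated, seeded generator keeps the shuffle repeatable and
    # independent of any global random state.
    rng = random.Random(seed)
    indices = list(range(num_rows))
    rng.shuffle(indices)

    n_train = round(probabilities[0] * num_rows)
    n_val = round(probabilities[1] * num_rows)

    # Rows default to test; the first shuffled slices become train and val.
    labels = [2] * num_rows
    for i in indices[:n_train]:
        labels[i] = 0
    for i in indices[n_train:n_train + n_val]:
        labels[i] = 1
    return labels


labels = assign_split_labels(100)
```

Because the split ignores the target column entirely, it applies to regression targets, where stratifying by class is not meaningful.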

Bonus change: avoid an AutoML crash on a dataset containing a column of all NaNs (kdd).
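As an illustration of the condition involved, the sketch below flags columns whose values are all missing. How the PR actually avoids the crash is not shown here; the helper names `is_missing` and `all_missing_columns` and the dict-of-lists input are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def all_missing_columns(columns):
    """Hypothetical helper: names of columns with no usable values.

    `columns` maps a column name to a list of values. An empty column
    is also reported, since it has no usable values either.
    """
    return [
        name
        for name, values in columns.items()
        if all(is_missing(v) for v in values)
    ]


columns = {
    "a": [1.0, float("nan")],     # partly missing, kept
    "b": [float("nan"), None],    # entirely missing, flagged
    "c": ["x", "y"],              # fully populated, kept
}
flagged = all_missing_columns(columns)
```

A caller could skip or specially handle the flagged columns before computing per-column statistics such as distinct values.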


github-actions · 2022-07-06T17:17:58Z

Unit Test Results

      6 files ±0       6 suites ±0 2h 20m 40s ⏱️ - 6m 15s
2 914 tests ±0 2 868 ✔️ ±0   46 💤 ±0 0 ❌ ±0
8 742 runs ±0 8 600 ✔️ ±0 142 💤 ±0 0 ❌ ±0

Results for commit 5ec5011. ± Comparison against base commit 27e0b9b.

♻️ This comment has been updated with latest results.

ludwig/utils/dataset_utils.py

justinxzhao · 2022-07-06T20:13:07Z

Thanks!

anneholler and others added 2 commits July 6, 2022 09:23

Update get_repeatable_train_val_test_split to handle non-stratified s…

8aa7bac

…plit w/no existing split

[pre-commit.ci] auto fixes from pre-commit.com hooks

408524b

for more information, see https://pre-commit.ci

anneholler added 2 commits July 6, 2022 10:39

Avoid AutoML crash on dataset having column with only NaNs (kdd)

12b181d

update

29313d0

justinxzhao approved these changes Jul 6, 2022

ludwig/utils/dataset_utils.py

anneholler added 2 commits July 6, 2022 11:13

Address review comment

2efadf0

Correct return Tuple on get_distinct_values

5ec5011

justinxzhao merged commit d121eeb into ludwig-ai:master Jul 6, 2022