Create a script to run the TableVectorizer on all openml datasets #665

LeoGrin · 2023-07-20T15:03:05Z

No description provided.

LilianBoulard · 2023-07-24T14:49:20Z

Also, it might be useful to implement a hot-load functionality (which is already part of the benchmark framework), in case, for example, OpenML goes down during the run. Adding a --retry-errors parameter would be useful in that sense.

Edit: never mind, the hot-load functionality is not merged yet, as it's part of #593.
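
For illustration, here is a minimal sketch of how a --retry-errors flag could interact with saved results. It is not the hot-load code from #593; the function names and the task-id-to-status mapping ("ok" / "error") are assumptions made up for this example.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI: only the --retry-errors flag comes from the discussion.
    parser = argparse.ArgumentParser(description="Resumable benchmark run (sketch)")
    parser.add_argument(
        "--retry-errors",
        action="store_true",
        help="Re-run tasks whose previous attempt ended in an error",
    )
    return parser.parse_args(argv)


def tasks_to_run(task_ids, previous_results, retry_errors=False):
    """Return the task ids that still need to be run.

    previous_results maps a task id to "ok" or "error" (assumed format).
    """
    todo = []
    for task_id in task_ids:
        status = previous_results.get(task_id)
        if status is None:
            todo.append(task_id)  # never attempted, e.g. the run was interrupted
        elif status == "error" and retry_errors:
            todo.append(task_id)  # failed before, retry only on request
        # tasks that finished ("ok") are always skipped
    return todo


args = parse_args(["--retry-errors"])
previous = {1: "ok", 2: "error"}
print(tasks_to_run([1, 2, 3], previous, retry_errors=args.retry_errors))  # [2, 3]
```

Without the flag, only task 3 would be run, so a transient outage does not force a full restart.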

LilianBoulard · 2023-07-27T08:59:16Z

Ah, the diff broke for some reason. I was able to fix it on one of my PRs by doing this:

1. Copy the branch to another name (e.g. run_on_openml_save)
2. Delete the original branch (i.e. run_on_openml)
3. Check out main, pull, and create a new branch with the original name (i.e. run_on_openml)
4. Cherry-pick the commits from the saved branch
5. Force-push the branch to your fork
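
The steps above can be spelled out as git commands. The sketch below only builds and prints them, it does not run anything. The remote name origin, the base branch main, and the placeholder commit ids are assumptions; main is checked out before the deletion because git refuses to delete the branch that is currently checked out.

```python
def rebuild_branch_commands(branch, commits, base="main", remote="origin"):
    """Return the git commands for rebuilding `branch` on top of `base`.

    `commits` is the explicit list of commit ids to replay, oldest first.
    """
    save = f"{branch}_save"
    return [
        ["git", "branch", save, branch],             # 1. copy the branch
        ["git", "checkout", base],                   #    leave the branch first
        ["git", "branch", "-D", branch],             # 2. delete the original
        ["git", "pull"],                             # 3. update the base branch
        ["git", "checkout", "-b", branch],           #    recreate the branch
        ["git", "cherry-pick", *commits],            # 4. replay the saved commits
        ["git", "push", "--force", remote, branch],  # 5. force-push to the fork
    ]


# Placeholder commit ids: replace them with the real ones from the saved branch.
for cmd in rebuild_branch_commands("run_on_openml", ["<commit-1>", "<commit-2>"]):
    print(" ".join(cmd))
```

To execute the commands instead of printing them, each list could be passed to subprocess.run(cmd, check=True).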

LeoGrin · 2023-07-29T14:47:38Z

138 tasks raised errors. Some are not linked to skrub (NaNs in y, mixed types in y, ...). The only error linked to skrub is #679 (127 times).
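
A tally like this can be produced by catching the exception for each task and counting the messages. The sketch below is not the code from benchmarks/run_on_openml_datasets.py; run_all and failing_demo are hypothetical names, and the failures are toy data rather than the results reported above.

```python
from collections import Counter


def run_all(tasks, run_one):
    """Run `run_one` on every task, collecting errors instead of stopping."""
    errors = {}
    for task_id in tasks:
        try:
            run_one(task_id)
        except Exception as exc:
            # Keep the exception type so unrelated failures stay distinguishable.
            errors[task_id] = f"{type(exc).__name__}: {exc}"
    return errors


def failing_demo(task_id):
    # Stand-in for the real per-task work; these failures are made up.
    if task_id % 2 == 0:
        raise ValueError("toy failure A")
    if task_id == 5:
        raise TypeError("toy failure B")


errors = run_all(range(1, 7), failing_demo)
for message, count in Counter(errors.values()).most_common():
    print(count, message)
```

Grouping by message makes it easy to see which failures share a single cause and which are one-off dataset problems.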

GaelVaroquaux

LGTM. Merging. Thank you!

…rub-data#665)

* create script
* cache
* Use loguru for logging, various code improvements, slightly better doc and messages
* Fix condition
* fix import bug
* fix bug for empty evals
* fix 0 featues
* improvements
* Update benchmarks/run_on_openml_datasets.py

Co-authored-by: Lilian <lilian@boulard.fr>

* import Counter
* test commit
* remove test commit
* fix bug

---------

Co-authored-by: Lilian <lilian@boulard.fr>

LeoGrin marked this pull request as draft July 20, 2023 15:03

LilianBoulard added the benchmarks label (Something related to the benchmarks) Jul 21, 2023

LilianBoulard assigned LeoGrin Jul 21, 2023

LilianBoulard reviewed Jul 27, 2023

benchmarks/run_on_openml_datasets.py

LeoGrin and others added 8 commits July 27, 2023 21:10

create script

3f70747

cache

ae7ded3

Use loguru for logging, various code improvements, slightly better doc and messages

e0401e8

Fix condition

2d68a08

fix import bug

b565477

fix bug for empty evals

dad5627

fix 0 featues

b5f318e

improvements

9ab0a9e

LeoGrin force-pushed the run_on_openml branch from 1aba3e4 to 9ab0a9e Compare July 27, 2023 19:12

LeoGrin and others added 4 commits July 27, 2023 21:13

Update benchmarks/run_on_openml_datasets.py

033914d

Co-authored-by: Lilian <lilian@boulard.fr>

import Counter

9737613

test commit

a071795

remove test commit

bb392d1

LilianBoulard approved these changes Jul 28, 2023

fix bug

c944f09

LeoGrin marked this pull request as ready for review July 29, 2023 14:46

LeoGrin mentioned this pull request Jul 31, 2023

TableVectorizer fails on sparse Dataframes #679

Closed

GaelVaroquaux approved these changes Aug 3, 2023

GaelVaroquaux merged commit 99b67a7 into skrub-data:main Aug 3, 2023
21 checks passed
