Create a script to run the TableVectorizer on all openml datasets #665

Merged: 13 commits merged into skrub-data:main on Aug 3, 2023

Conversation

LeoGrin
Contributor

@LeoGrin LeoGrin commented Jul 20, 2023

No description provided.

@LeoGrin LeoGrin marked this pull request as draft July 20, 2023 15:03
@LilianBoulard LilianBoulard added the benchmarks Something related to the benchmarks label Jul 21, 2023
@LilianBoulard
Member

LilianBoulard commented Jul 24, 2023

Also, it might be useful to implement a hot-load functionality (which is already part of the benchmark framework), in case, for example, OpenML goes down during the run. Adding a --retry-errors parameter would be useful for that.

Edit: never mind, the hot-load functionality is not yet merged, as it's part of #593.
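
For illustration only, a `--retry-errors` flag along the lines suggested above could look roughly like this. This is a minimal sketch, not the script's or the benchmark framework's actual code: the `parse_args` and `tasks_to_run` helpers, the cache layout (a dict keyed by task id as a string), and the `"status"` field are all assumptions made for the example.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --retry-errors is the flag suggested above."""
    parser = argparse.ArgumentParser(description="Run a pipeline on many tasks.")
    parser.add_argument(
        "--retry-errors",
        action="store_true",
        help="Re-run tasks whose cached result is an error.",
    )
    return parser.parse_args(argv)


def tasks_to_run(task_ids, cache, retry_errors):
    """Return the task ids that still need to run.

    `cache` is a hypothetical mapping {str(task_id): {"status": "ok" | "error"}}.
    Tasks with no cached entry always run; errored tasks run again only
    when `retry_errors` is set.
    """
    todo = []
    for task_id in task_ids:
        entry = cache.get(str(task_id))
        if entry is None:
            todo.append(task_id)
        elif retry_errors and entry.get("status") == "error":
            todo.append(task_id)
    return todo
```

With a cache like this, an interrupted run (for example if OpenML becomes unreachable) could be resumed without recomputing tasks that already succeeded.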

@LilianBoulard
Member

Ah, the diff broke for some reason. I could fix it on one of my PRs by doing this:

  1. Copy the branch to another name (e.g. run_on_openml_save)
  2. Delete the original branch (i.e. run_on_openml)
  3. Check out main, pull, then create a new branch with the original name (i.e. run_on_openml)
  4. Cherry-pick commits from the save
  5. Force-push the branch to your fork
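
The steps above can be sketched as a list of git commands, built here in Python so they can be inspected before anything runs. This is a sketch under assumptions: the helper names, the `_save` suffix, the `origin` remote name, and the idea that `git pull` on `main` fetches the up-to-date upstream are all illustrative, and the commits to cherry-pick have to be chosen by hand. Note that `main` is checked out before the delete, since git refuses to delete the branch that is currently checked out.

```python
import subprocess


def rebuild_branch_commands(branch, commits, base="main", remote="origin"):
    """Build the git commands for the five steps, in a runnable order."""
    save = f"{branch}_save"
    return [
        ["git", "branch", save, branch],             # 1. copy the branch to a backup name
        ["git", "checkout", base],                   # 3. check out main (must leave `branch` first)
        ["git", "branch", "-D", branch],             # 2. delete the original branch
        ["git", "pull"],                             # 3. bring main up to date
        ["git", "checkout", "-b", branch],           # 3. recreate the branch from main
        ["git", "cherry-pick", *commits],            # 4. replay the commits from the backup
        ["git", "push", "--force", remote, branch],  # 5. force-push to the fork
    ]


def run(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run(rebuild_branch_commands("run_on_openml", ["<commit>"]))` only prints the plan; pass `dry_run=False` to execute it once the commit list has been checked.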

@LeoGrin LeoGrin marked this pull request as ready for review July 29, 2023 14:46
@LeoGrin
Contributor Author

LeoGrin commented Jul 29, 2023

138 tasks raised errors. Some are not linked to skrub (NaNs in y, mixed types in y...). The only error linked to skrub is #679 (raised 127 times).
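
As a rough illustration of the kinds of target problems mentioned above, the following sketch flags NaNs and mixed types in a target column and tallies them with `collections.Counter`. It is not how the benchmark script classifies errors; the function names, the labels, and the plain-list representation of `y` are assumptions made for the example.

```python
import math
from collections import Counter


def target_problem(y):
    """Return a short label describing why `y` is unusable, or None if it looks fine."""
    # Treat both None and float NaN as missing values.
    if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in y):
        return "NaNs in y"
    # More than one Python type among the values counts as mixed types.
    if len({type(v) for v in y}) > 1:
        return "mixed types in y"
    return None


def summarize(targets):
    """Count problem labels across a mapping {task_id: y}."""
    labels = (target_problem(y) for y in targets.values())
    return Counter(label for label in labels if label is not None)
```

A tally like this makes it easier to separate dataset-level problems from errors that actually come from the library under test.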

Member

@GaelVaroquaux GaelVaroquaux left a comment

LGTM. Merging. Thank you!

@GaelVaroquaux GaelVaroquaux merged commit 99b67a7 into skrub-data:main Aug 3, 2023
21 checks passed
LeoGrin added a commit to LeoGrin/skrub that referenced this pull request Aug 24, 2023
…rub-data#665)

* create script

* cache

* Use loguru for logging, various code improvements, slightly better doc and messages

* Fix condition

* fix import bug

* fix bug for empty evals

* fix 0 featues

* improvements

* Update benchmarks/run_on_openml_datasets.py

Co-authored-by: Lilian <lilian@boulard.fr>

* import Counter

* test commit

* remove test commit

* fix bug

---------

Co-authored-by: Lilian <lilian@boulard.fr>