forked from automl/auto-sklearn
Development #1
Merged
* update meta-data
* more configurations
* New meta-features, update computation
* unify meta-feature computation to use same code in smbo.py and offline metadata computation
* new meta-features based on that
* new meta-data based on running BO for 2 days
* update unit tests
On our cluster, meta-data creation currently fails because the job freezes when it initiates shutil.copytree. This PR makes meta-data creation work even when that call fails, because the necessary files are now stored directly in the real working directory.
When encoding a pandas array in autosklearn.data.validator, the columns are re-ordered by the ColumnTransformer. This PR re-orders the feature types so that when passing the data to the actual ML pipeline, columns and feature types are sorted the same way.
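The re-ordering described above can be illustrated with a minimal sketch. It does not reproduce the code in autosklearn.data.validator; the function name `reorder_feat_types` and the sample columns are hypothetical. It assumes the transformer emits the encoded categorical columns first and passes the remaining columns through afterwards, which is the order a ColumnTransformer with one categorical transformer and a passthrough remainder produces.

```python
from typing import List, Sequence, Tuple


def reorder_feat_types(
    columns: Sequence[str],
    feat_types: Sequence[str],
    categorical_columns: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Return columns and feature types in the post-encoding order.

    Assumption: encoded categorical columns come first, followed by the
    remaining columns, each group keeping its original relative order.
    """
    categorical = set(categorical_columns)
    type_of = dict(zip(columns, feat_types))

    encoded_first = [c for c in columns if c in categorical]
    passthrough = [c for c in columns if c not in categorical]

    new_columns = encoded_first + passthrough
    new_types = [type_of[c] for c in new_columns]
    return new_columns, new_types


columns = ["age", "city", "income", "gender"]
feat_types = ["numerical", "categorical", "numerical", "categorical"]
new_columns, new_types = reorder_feat_types(columns, feat_types, ["city", "gender"])
```

Keeping the types keyed by column name, rather than by position, is what prevents them from drifting apart from the data once the columns move.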
* store the selector in the home directory of the user following the XDG Base Directory specification (https://specifications.freedesktop.org/basedir-spec/); by default the selector is put into ~/.cache/auto-sklearn/
* make the AutoSklearn2Classifier picklable by replacing closures with callable classes
* the initial issue with Lock objects no longer exists, as they were removed when we introduced dask for parallelism
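Both changes can be sketched in a few lines. This is an illustration under stated assumptions, not the auto-sklearn implementation: `selector_cache_dir`, `ScaleBy`, `make_closure`, and `roundtrip` are hypothetical names. The directory lookup follows the XDG rule that `$XDG_CACHE_HOME` is used when set and non-empty, with `~/.cache` as the fallback.

```python
import os
import pickle
from pathlib import Path


def selector_cache_dir(env=None) -> Path:
    """Resolve the cache directory, honouring XDG_CACHE_HOME when it is set."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    return Path(base) / "auto-sklearn"


def make_closure(factor):
    """Closure variant: the inner function is a local object and cannot be pickled."""
    def scale(x):
        return x * factor
    return scale


class ScaleBy:
    """Callable-class variant: state lives on the instance, so pickle can store it."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor


def roundtrip(obj):
    """Serialize and deserialize an object with pickle."""
    return pickle.loads(pickle.dumps(obj))
```

When this is saved as a module or run as a script, `roundtrip(ScaleBy(3))(2)` evaluates to 6, whereas `roundtrip(make_closure(3))` raises an error because pickle cannot look up a function defined inside another function.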
Co-authored-by: Rohit Agarwal <rohit.agarwal4@aexp.com>
* Ensemble with Dask
* flake fix
* More debug info on failure
* Feedback from comments
* Fix test log file
* move to proc
* Flake8
* Move to dask fixture
* Move test to use the fixture
* Increase run time to remove random crash
* Address commit feedback
* Fix outputs
* test pytest fixtures
* continue moving to pytest tests
* close dask client and cluster
* start local cluster differently
* more debug output
* run all tests again
* replace close by shutdown
* more debug output
* shutdown and close clients
* proactively delete dask client objects
* fix fixture directory, reduce debug output
* Incorporate feedback from comments
* intermediate commit refactoring unit tests
* Added pytest test to ensemble
* Fixing dummy classifier merge conflict
* additional msg to dict
* Only one active ensemble
* Feedback from comments
* Added missing pickle test
* Moving to thread based ensemble
* sleep when needed
* Minor changes to ensemble scheduling
* use future.result() to wait for a future instead of active wait with sleep
* store hash of ensemble training data in status pickle file
* build ensemble via SMAC callback
* fix ensemble time limit
* bump SMAC requirement
* PEP8
* fix tests?
* further stabilize tests
* robustify examples
* improve unit tests

Co-authored-by: chico <francisco.rivera.valverde@gmail.com>
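The switch from an active wait with sleep to `future.result()` can be shown with a small sketch. It uses the standard library's `concurrent.futures` instead of dask so that it runs without a cluster. `build_ensemble` is a hypothetical stand-in for the real ensemble job, and the two wait helpers are illustrative only.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_ensemble(n: int) -> int:
    """Hypothetical stand-in for an ensemble-building job."""
    time.sleep(0.05)
    return sum(range(n))


def wait_by_polling(future, interval: float = 0.01):
    """Old pattern: wake up repeatedly to check whether the job is done."""
    while not future.done():
        time.sleep(interval)
    return future.result()


def wait_by_blocking(future, timeout=None):
    """New pattern: block until the result is ready (or the timeout expires)."""
    return future.result(timeout=timeout)


with ThreadPoolExecutor(max_workers=1) as pool:
    polled = wait_by_polling(pool.submit(build_ensemble, 10))
    blocked = wait_by_blocking(pool.submit(build_ensemble, 10))
```

Blocking on the future removes the need to pick a polling interval and returns as soon as the job finishes; a timeout can still bound the wait.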
* Improve managing disk space
* fixes a bug in the ensemble builder that would cause the ensemble building to break when giving a limit on the disk space to use
* allow more fine-grained control over what files to save on disk
* make the output directory optional and only create it if it is actively passed in by the user
* fix bug in logging function
* Improve cv models directories (#993)
* restructure directories for models and predictions
* pep8 and mypy
* fix tests, include offline feedback
* update unittests after rebase
* add forgotten files
* add forgotten file
* fix tests
* fix merge issues
* fix unit tests
* minor improvements
* remove print statement
* Improve ensemble selection memory usage
* separate storage of data inside the ensemble builder into two dictionaries to separately store them on disk (one for scores and one for predictions). If we run over the allocated memory, we can still make use of the stored scores
* do not stop the ensemble builder when the number of models to consider can no longer be reduced so that the ensemble builder can still delete models from the hard drive if necessary.
* avoid unnecessary memory copies during ensemble construction
* improve structure of temporary directories during unit testing
* delete files before building an ensemble
* flake8
* reorder function calls in ensemble builder
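The idea of keeping scores and predictions in two separately stored dictionaries can be sketched as follows. The file names, the helper functions, and the sample data are all hypothetical; the actual ensemble builder may organise its files differently. The point is that the small score dictionary can be reloaded on its own, without pulling the much larger predictions into memory.

```python
import pickle
import tempfile
from pathlib import Path

SCORES_FILE = "scores.pkl"            # hypothetical file name
PREDICTIONS_FILE = "predictions.pkl"  # hypothetical file name


def save_state(directory, scores: dict, predictions: dict) -> None:
    """Write scores and predictions to two independent pickle files."""
    directory = Path(directory)
    with open(directory / SCORES_FILE, "wb") as fh:
        pickle.dump(scores, fh)
    with open(directory / PREDICTIONS_FILE, "wb") as fh:
        pickle.dump(predictions, fh)


def load_scores(directory) -> dict:
    """Load only the lightweight scores; predictions stay on disk."""
    with open(Path(directory) / SCORES_FILE, "rb") as fh:
        return pickle.load(fh)


def load_predictions(directory) -> dict:
    """Load the (potentially large) predictions when memory allows."""
    with open(Path(directory) / PREDICTIONS_FILE, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    scores = {"model_1": 0.91, "model_2": 0.87}
    predictions = {"model_1": [0.1, 0.9], "model_2": [0.3, 0.7]}
    save_state(tmp, scores, predictions)
    restored_scores = load_scores(tmp)
    restored_predictions = load_predictions(tmp)
```

With this split, a process that exceeded its memory limit while loading predictions could still rank models from the stored scores.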
No description provided.