Development #1

Merged
merged 15 commits into from
Nov 10, 2020

Conversation

rabsr
Owner

@rabsr rabsr commented Nov 10, 2020

No description provided.

mfeurer and others added 15 commits October 6, 2020 20:30
* update meta-data

* more configurations

* New meta-features, update computation

* unify meta-feature computation to use same code in smbo.py
  and offline metadata computation
* new meta-features based on that
* new meta-data based on running BO for 2 days

* update unit tests
On our cluster, meta-data creation currently fails because the
job freezes when it calls shutil.copytree. This PR stores the
necessary files directly in the real working directory, so
meta-data creation also works when that copy fails.
When encoding a pandas array in autosklearn.data.validator,
the columns are re-ordered by the ColumnTransformer. This PR
re-orders the feature types so that when passing the data to
the actual ML pipeline, columns and feature types are sorted
the same way.
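
The re-ordering can be sketched in plain Python. This is only an illustration, not the code from autosklearn.data.validator: the helper name `reorder_feat_types` and the sample columns are hypothetical, and it assumes the encoder emits the categorical columns first and the remaining columns after them, which is how a ColumnTransformer with `remainder="passthrough"` lays out its output.

```python
from typing import List, Sequence, Tuple


def reorder_feat_types(
    columns: Sequence[str],
    feat_types: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Return columns and feature types in the order produced by an
    encoder that emits categorical columns first, then the remainder.

    Hypothetical helper for illustration only.
    """
    if len(columns) != len(feat_types):
        raise ValueError("columns and feat_types must have equal length")
    paired = list(zip(columns, feat_types))
    # list.sort() is stable, so the relative order inside each group is kept.
    # The key is False (sorts first) for categorical columns.
    paired.sort(key=lambda item: item[1] != "categorical")
    new_columns = [name for name, _ in paired]
    new_types = [kind for _, kind in paired]
    return new_columns, new_types


columns = ["age", "city", "income", "gender"]
feat_types = ["numerical", "categorical", "numerical", "categorical"]
new_columns, new_types = reorder_feat_types(columns, feat_types)
print(new_columns)  # ['city', 'gender', 'age', 'income']
print(new_types)    # ['categorical', 'categorical', 'numerical', 'numerical']
```

Sorting the (column, type) pairs together keeps each feature type attached to its column, which is the property the commit message describes.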
* better debugging for #772

* flake8

* fix tests

* better assert?

* re-order print and queue population
* store the selector in the home directory of the user, following
  https://specifications.freedesktop.org/basedir-spec/. This means
  that by default the selector is put into ~/.cache/auto-sklearn/
* make the AutoSklearn2Classifier picklable by replacing closures
  with callable classes
* the initial issue with Lock objects no longer exists, as
  they were removed when we introduced dask for parallelism
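
The cache location described in the list above could be resolved roughly as follows. This is a minimal sketch of the XDG base directory lookup, not the code from this PR; the function name `selector_cache_dir` is hypothetical.

```python
import os
from pathlib import Path


def selector_cache_dir(app_name: str = "auto-sklearn") -> Path:
    """Resolve a per-user cache directory following the XDG spec.

    Hypothetical helper for illustration only.
    """
    xdg = os.environ.get("XDG_CACHE_HOME", "")
    # The spec requires an absolute path; fall back to ~/.cache when the
    # variable is unset, empty, or relative.
    if not os.path.isabs(xdg):
        xdg = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(xdg) / app_name


print(selector_cache_dir())
```

With XDG_CACHE_HOME unset, this yields ~/.cache/auto-sklearn, matching the default named in the commit message.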
Co-authored-by: Rohit Agarwal <rohit.agarwal4@aexp.com>
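
Why replacing closures with callable classes makes an object picklable can be shown with a small standalone example. The names `make_scaler_closure` and `Scaler` are hypothetical and unrelated to the AutoSklearn2Classifier internals; the sketch only demonstrates the general pattern.

```python
import pickle


def make_scaler_closure(factor):
    # pickle stores functions by qualified name, and a function defined
    # inside another function has no importable name, so this fails.
    def scale(x):
        return x * factor

    return scale


class Scaler:
    """Callable class carrying the same state as the closure above."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor


try:
    pickle.dumps(make_scaler_closure(3))
    closure_picklable = True
except (pickle.PicklingError, AttributeError):
    closure_picklable = False

restored = pickle.loads(pickle.dumps(Scaler(3)))
print(closure_picklable)  # False
print(restored(7))        # 21
```

The class is defined at module level, so pickle can find it by name and only needs to serialize the instance attributes.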
* Ensemble with Dask

* flake fix

* More debug info on failure

* Feedback from comments

* Fix test log file

* move to proc

* Flake8

* Move to dask fixture

* Move test to use the fixture

* Increase run time to remove random crash

* Address commit feedback

* Fix outputs

* test pytest fixtures

* continue moving to pytest tests

* close dask client and cluster

* start local cluster differently

* more debug output

* run all tests again

* replace close by shutdown

* more debug output

* shutdown and close clients

* proactively delete dask client objects

* fix fixture directory, reduce debug output

* Incorporate feedback from comments

* intermediate commit refactoring unit tests

* Added pytest test to ensemble

* Fixing dummy classifier merge conflict

* additional msg to dict

* Only one active ensemble

* Feedback from comments

* Added missing pickle test

* Moving to thread based ensemble

* sleep when needed

* Minor changes to ensemble scheduling

* use future.result() to wait for a future instead of active wait with
  sleep
* store hash of ensemble training data in status pickle file

* build ensemble via SMAC callback

* fix ensemble time limit

* bump SMAC requirement

* PEP8

* fix tests?

* further stabilize tests

* robustify examples

* improve unit tests
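
The change from an active wait with sleep to `future.result()`, mentioned in the list above, looks roughly like this. The commits use dask; this sketch uses the standard library's `concurrent.futures` as a stand-in so it runs without extra dependencies (dask futures also provide a `result()` method). The task `build_ensemble` is a placeholder.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_ensemble(n_models: int) -> str:
    """Placeholder for a long-running ensemble-building task."""
    time.sleep(0.1)
    return f"ensemble of {n_models} models"


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(build_ensemble, 5)

    # Active wait (the pattern being replaced): poll and sleep.
    # while not future.done():
    #     time.sleep(0.05)

    # Blocking wait: returns as soon as the task finishes, re-raises any
    # exception raised inside the task, and accepts an optional timeout.
    result = future.result(timeout=5)

print(result)  # ensemble of 5 models
```

Blocking on the future avoids both the wasted wake-ups and the extra latency of a fixed sleep interval.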

Co-authored-by: chico <francisco.rivera.valverde@gmail.com>
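
One way to store a hash of the ensemble training data in a status pickle file is sketched below. The commit messages do not say how the hash is used; this sketch assumes it serves to check that a saved status belongs to the same data. The function names, the file name, and the dictionary keys are all hypothetical.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def data_hash(rows) -> str:
    """Hash the training rows; repr() keeps the sketch dependency-free."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(tuple(row)).encode("utf-8"))
    return digest.hexdigest()


def save_status(path: Path, rows, state: dict) -> None:
    """Write the state together with the hash of the data it came from."""
    with open(path, "wb") as fh:
        pickle.dump({"data_hash": data_hash(rows), "state": state}, fh)


def load_status(path: Path, rows):
    """Return the saved state, or None if it belongs to other data."""
    if not path.exists():
        return None
    with open(path, "rb") as fh:
        status = pickle.load(fh)
    if status["data_hash"] != data_hash(rows):
        return None
    return status["state"]


with tempfile.TemporaryDirectory() as tmp:
    status_file = Path(tmp) / "ensemble_status.pkl"
    train = [(1.0, 2.0), (3.0, 4.0)]
    save_status(status_file, train, {"iteration": 7})
    same = load_status(status_file, train)
    other = load_status(status_file, [(9.0, 9.0)])

print(same)   # {'iteration': 7}
print(other)  # None
```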
* Improve managing disk space

* fixes a bug in the ensemble builder that caused ensemble building to break when a limit on the disk space to use was given
* allow more fine-grained control over what files to save on disk
* make the output directory optional and only create it if it is actively passed in by the user

* fix bug in logging function

* Improve cv models directories (#993)

* restructure directories for models and predictions

* pep8 and mypy

* fix tests, include offline feedback

* update unittests after rebase

* add forgotten files

* add forgotten file

* fix tests

* fix merge issues

* fix unit tests

* minor improvements

* remove print statement
* Improve ensemble selection memory usage

* split the data stored inside the ensemble builder into two dictionaries, one for scores and one for predictions, so that they can be stored on disk separately. If we exceed the allocated memory, we can still make use of the stored scores
* do not stop the ensemble builder when the number of models to consider can no longer be reduced, so that the ensemble builder can still delete models from the hard drive if necessary.
* avoid unnecessary memory copies during ensemble construction
* improve structure of temporary directories during unit testing

* delete files before building an ensemble

* flake8

* reorder function calls in ensemble builder
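
The idea of keeping scores and predictions in two separately stored dictionaries can be illustrated as follows. This is a sketch under assumptions, not the ensemble builder's code: the file names, function names, and sample values are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(directory: Path, scores: dict, predictions: dict) -> None:
    """Store the small score dict and the large prediction dict separately."""
    with open(directory / "scores.pkl", "wb") as fh:
        pickle.dump(scores, fh)
    with open(directory / "predictions.pkl", "wb") as fh:
        pickle.dump(predictions, fh)


def load_scores(directory: Path) -> dict:
    """Load only the scores, without reading any predictions into memory."""
    with open(directory / "scores.pkl", "rb") as fh:
        return pickle.load(fh)


scores = {"model_1": 0.91, "model_2": 0.87}
predictions = {
    "model_1": [0.1, 0.9, 0.8],
    "model_2": [0.2, 0.7, 0.6],
}

with tempfile.TemporaryDirectory() as tmp:
    save_state(Path(tmp), scores, predictions)
    # After running out of memory, a restarted process could still rank
    # the models from the scores file alone.
    recovered = load_scores(Path(tmp))

best_model = max(recovered, key=recovered.get)
print(best_model)  # model_1
```

Because the scores are tiny compared with the predictions, they can be reloaded cheaply even when the predictions no longer fit into the memory budget.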
@rabsr rabsr merged commit 9d89b66 into rabsr:development Nov 10, 2020
3 participants