Development #1

rabsr · 2020-11-10T10:33:33Z

No description provided.

* update meta-data
* more configurations
* new meta-features, update computation
* unify meta-feature computation to use the same code in smbo.py and in the offline meta-data computation
* new meta-features based on that
* new meta-data based on running BO for 2 days
* update unit tests

On our cluster, meta-data creation currently fails because the job freezes when calling shutil.copytree. This PR changes the behavior so that meta-data creation also works when this call fails, since the necessary files are now stored directly in the real working directory.

When encoding a pandas array in autosklearn.data.validator, the ColumnTransformer re-orders the columns. This PR re-orders the feature types accordingly, so that when the data is passed to the actual ML pipeline, the columns and the feature types are in the same order.
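As an illustration of why the re-ordering is needed: scikit-learn's ColumnTransformer concatenates the outputs of its transformers in the order they are listed, so if categorical columns go through the first transformer and numerical ones through the second, all categorical columns come first in the output. The snippet is a minimal sketch that keeps the feature types aligned with that order. It assumes exactly this one-transformer-per-type layout and that columns matching no transformer are dropped. `reorder_feature_types` is a hypothetical helper, not the code in autosklearn.data.validator.

```python
from typing import Dict, List, Sequence, Tuple


def reorder_feature_types(
    feat_types: Dict[str, str],
    transformer_order: Sequence[str] = ("categorical", "numerical"),
) -> Tuple[List[str], List[str]]:
    """Return (columns, feature types) in the order a ColumnTransformer with
    one transformer per feature type, listed in ``transformer_order``, would
    emit them. Original column order is kept within each group; columns whose
    type matches no transformer are dropped (like remainder="drop")."""
    columns: List[str] = []
    types: List[str] = []
    for kind in transformer_order:
        for column, ftype in feat_types.items():
            # Case-insensitive so "Categorical" and "categorical" both match.
            if ftype.lower() == kind.lower():
                columns.append(column)
                types.append(ftype)
    return columns, types


feat_types = {"age": "Numerical", "city": "Categorical",
              "income": "Numerical", "sex": "Categorical"}
cols, types = reorder_feature_types(feat_types)
# cols  -> ['city', 'sex', 'age', 'income']
# types -> ['Categorical', 'Categorical', 'Numerical', 'Numerical']
```

Keeping the original order inside each group matters because the transformer receives its columns in the order they appear in the input.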

* better debugging for #772
* flake8
* fix tests
* better assert?
* re-order print and queue population

* store the selector in the home directory of the user following https://specifications.freedesktop.org/basedir-spec/. This means that by default the selector is put into ~/.cache/auto-sklearn/
* make the AutoSklearn2Classifier picklable by replacing closures with callable classes
* the initial issue with Lock objects no longer exists, as they were removed when we introduced dask for parallelism
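The first two points can be sketched with the standard library alone. This is a sketch, not auto-sklearn's code: `default_cache_dir`, `make_closure` and `ThresholdCheck` are hypothetical names. The snippet resolves a per-user cache directory as the XDG Base Directory spec describes ($XDG_CACHE_HOME, falling back to ~/.cache). It also shows why a callable class pickles where a closure does not. pickle stores functions and classes by their importable qualified name, and a nested function has no such name.

```python
import os
import pickle
from pathlib import Path
from typing import Callable, Mapping, Optional


def default_cache_dir(app_name: str = "auto-sklearn",
                      env: Optional[Mapping[str, str]] = None) -> Path:
    """Per-user cache directory: $XDG_CACHE_HOME/<app_name>, or
    ~/.cache/<app_name> if the variable is unset, empty or relative."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME", "")
    # The spec treats relative paths in $XDG_* variables as invalid.
    if not base or not os.path.isabs(base):
        base = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(base) / app_name


def make_closure(threshold: float) -> Callable[[float], bool]:
    # The nested function's qualified name is "make_closure.<locals>.check",
    # which pickle cannot look up, so pickling it fails.
    def check(x: float) -> bool:
        return x > threshold
    return check


class ThresholdCheck:
    """Same behavior as the closure, but picklable: pickle records the
    module-level class by name and restores the instance's attributes."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def __call__(self, x: float) -> bool:
        return x > self.threshold


restored = pickle.loads(pickle.dumps(ThresholdCheck(0.5)))
print(restored(0.7), restored(0.3))  # prints "True False"
```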

Co-authored-by: Rohit Agarwal <rohit.agarwal4@aexp.com>

* Ensemble with Dask
* flake fix
* More debug info on failure
* Feedback from comments
* Fix test log file
* move to proc
* Flake8
* Move to dask fixture
* Move test to use the fixture
* Increase run time to remove random crash
* Address commit feedback
* Fix outputs
* test pytest fixtures
* continue moving to pytest tests
* close dask client and cluster
* start local cluster differently
* more debug output
* run all tests again
* replace close by shutdown
* more debug output
* shutdown and close clients
* proactively delete dask client objects
* fix fixture directory, reduce debug output
* Incorporate feedback from comments
* intermediate commit refactoring unit tests
* Added pytest test to ensemble
* Fixing dummy classifier merge conflict
* additional msg to dict
* Only one active ensemble
* Feedback from comments
* Added missing pickle test
* Moving to thread based ensemble
* sleep when needed
* Minor changes to ensemble scheduling
* use future.result() to wait for a future instead of active wait with sleep
* store hash of ensemble training data in status pickle file
* build ensemble via SMAC callback
* fix ensemble time limit
* bump SMAC requirement
* PEP8
* fix tests?
* further stabilize tests
* robustify examples
* improve unit tests

Co-authored-by: chico <francisco.rivera.valverde@gmail.com>
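The change from an active wait with sleep to `future.result()` can be sketched with the standard library's `concurrent.futures`. It stands in here for dask, whose `distributed.Future` also offers a blocking `result()` method. `build_ensemble` is a hypothetical placeholder for the real job, and the polling function is included only for contrast.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def build_ensemble(n_models: int) -> str:
    # Hypothetical placeholder for the real ensemble-building job.
    time.sleep(0.05)
    return f"ensemble of {n_models} models"


def wait_by_polling(future: Future, interval: float = 0.01) -> str:
    # The pattern being replaced: wake up repeatedly to check for completion.
    while not future.done():
        time.sleep(interval)
    return future.result()


def wait_for_result(future: Future, timeout: float = 10.0) -> str:
    # Block until the job finishes; raises TimeoutError after `timeout`
    # seconds and re-raises any exception raised inside the job.
    return future.result(timeout=timeout)


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(build_ensemble, 50)
    print(wait_for_result(future))  # prints "ensemble of 50 models"
```

Blocking on `result()` returns as soon as the job is done, without a polling interval to tune. It also surfaces exceptions from the job directly.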

* Improve management of disk space
* fix a bug in the ensemble builder that caused ensemble building to break when a limit on disk space usage was given
* allow more fine-grained control over which files are saved on disk
* make the output directory optional and only create it if it is explicitly passed in by the user
* fix bug in logging function
* Improve cv models directories (#993)
* restructure directories for models and predictions
* pep8 and mypy
* fix tests, include offline feedback
* update unittests after rebase
* add forgotten files
* add forgotten file
* fix tests
* fix merge issues
* fix unit tests
* minor improvements
* remove print statement
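A disk-space limit can be enforced by deleting the worst models first until the stored files fit the budget. This is only a sketch under stated assumptions: one file per model, a higher score is better, and at least one model is always kept. `enforce_disk_limit` is hypothetical and not necessarily the policy the ensemble builder uses.

```python
import os
import tempfile
from typing import Dict, List


def total_size_bytes(paths: List[str]) -> int:
    return sum(os.path.getsize(p) for p in paths if os.path.exists(p))


def enforce_disk_limit(model_scores: Dict[str, float], max_bytes: int,
                       keep_at_least: int = 1) -> List[str]:
    """Delete the lowest-scoring model files until the remaining files fit in
    ``max_bytes`` (higher score = better). Returns the deleted paths."""
    # Best model first, so list.pop() always removes the current worst one.
    ranked = sorted(model_scores, key=lambda path: model_scores[path],
                    reverse=True)
    deleted: List[str] = []
    while len(ranked) > keep_at_least and total_size_bytes(ranked) > max_bytes:
        worst = ranked.pop()
        os.remove(worst)
        deleted.append(worst)
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    scores = {}
    for name, score in [("a", 0.9), ("b", 0.7), ("c", 0.8)]:
        path = os.path.join(tmp, f"model_{name}.pkl")
        with open(path, "wb") as fh:
            fh.write(b"\0" * 100)  # 100-byte dummy model file
        scores[path] = score
    removed = enforce_disk_limit(scores, max_bytes=250)
    print([os.path.basename(p) for p in removed])  # prints "['model_b.pkl']"
```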

* Improve ensemble selection memory usage
* separate storage of data inside the ensemble builder into two dictionaries to separately store them on disk (one for scores and one for predictions). If we run over the allocated memory, we can still make use of the stored scores
* do not stop the ensemble builder when the number of models to consider can no longer be reduced, so that the ensemble builder can still delete models from the hard drive if necessary
* avoid unnecessary memory copies during ensemble construction
* improve structure of temporary directories during unit testing
* delete files before building an ensemble
* flake8
* reorder function calls in ensemble builder
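The split into two dictionaries can be sketched as follows. The builder's state goes into a small scores dictionary and a large predictions dictionary, each pickled to its own file, so the scores can be reloaded even when the predictions cannot be kept. This is a minimal sketch; the file names and the `save_state`/`load_scores` helpers are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


def save_state(directory: str, scores: Dict[str, float],
               predictions: Dict[str, Any]) -> None:
    # Small: one score per model, so it is cheap to keep and reload.
    with open(os.path.join(directory, "scores.pkl"), "wb") as fh:
        pickle.dump(scores, fh)
    # Potentially large: one prediction array per model, stored separately.
    with open(os.path.join(directory, "predictions.pkl"), "wb") as fh:
        pickle.dump(predictions, fh)


def load_scores(directory: str) -> Dict[str, float]:
    # Reads only the scores file; the predictions file is never touched.
    with open(os.path.join(directory, "scores.pkl"), "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    save_state(tmp, {"model_1": 0.91, "model_2": 0.87},
               {"model_1": [0.1, 0.9], "model_2": [0.4, 0.6]})
    # Even if the large predictions are discarded, the scores survive.
    os.remove(os.path.join(tmp, "predictions.pkl"))
    print(load_scores(tmp))  # prints "{'model_1': 0.91, 'model_2': 0.87}"
```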

mfeurer and others added 15 commits October 6, 2020 20:30

Better debugging for #772 (#968)

3f5bd20

* better debugging for #772
* flake8
* fix tests
* better assert?
* re-order print and queue population

hotfix for parallel Auto-sklearn2

4893089

handling logging error (#980)

8bdebc3

Co-authored-by: Rohit Agarwal <rohit.agarwal4@aexp.com>

Lets see if this passes tests (#948)

c363cd6

Release 0.11

9e04bd8

Fix encoding of exit status of the pynisher (#1001)

facef6b

FIX #989: pass y to data preprocessors

5f5e0db

MAINT #1000: minimal dask.distributed version

1bb3d88

rabsr merged commit 9d89b66 into rabsr:development Nov 10, 2020

rabsr commented Nov 10, 2020

Development #1

Development #1

Conversation

rabsr commented Nov 10, 2020