Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lagom / executor / driver refactor #89

Merged
merged 6 commits into from
Mar 1, 2021
Merged

lagom / executor / driver refactor #89

merged 6 commits into from
Mar 1, 2021

Conversation

amacati
Copy link
Contributor

@amacati amacati commented Feb 22, 2021

Refactor the experiment function, executors and drivers.

This PR seeks to disentangle hyperparameter optimization, ablation studies and distributed training code. With the addition of more features to Maggy the code becomes confusing if it is not split into distinct units. As a first step, the lagom function and associated functionality was rewritten. Since the implications of changing the lagom function concern the whole module, the changes are substantial. In order to make changes retraceable, I give a short summary of each file with the respective changes. It will be easiest to check the files of the PR in the order they are given in this summary.

Disentangling the lagom function:

  • maggy/experiment.py: Experiment now acts as a simple wrapper around the different lagom functionalities. It sets up basic info such as the IDs and still contains the error handling. Now takes a config object instead of the previous parameters since some keywords make no sense in the distributed setting, but are necessary for optimization.
  • maggy/experiment_config.py: The config classes for lagom.
  • maggy/core/lagom/: Contains lagom_optimization.py, lagom_ablation.py and lagom_distributed.py. These are the target functions for the dispatcher in experiment.py. Splitting the code introduced some redundancy, but I'd rather have slightly similar scripts than a large, entangled single script. Might be able to reduce redundancies further in future versions.
  • maggy/distributed/: Since experiment now dispatches for all modes, the special treatment of distributed is no longer necessary and the folder was deleted.
  • maggy/core/executors/Executor.py: The Executor class previously acted as a dispatcher for the kind of monkey-patching that was to be performed on the training function. Since dispatching has been delegated to experiment.py, the Executor class is no longer necessary.
  • maggy/core/executors/dist_executor.py: Adaption to the use of EnvSing() introduced by Create Environment classes with common interface #76 and minor changes.
  • maggy/core/executors/trial_executor.py: Minor changes to keep consistency with dist_executor.py.

Refactoring the drivers

  • maggy/core/experiment_driver/Driver.py: The base class driver previously contained all the message digestion logic for hyperparameter training as well as several hp tuning specific functions. The new driver instead lets child classes register callbacks for the message types they want to support and starts this general message digestion thread. Also switched to config based initialization instead of keyword arguments.
  • maggy/core/experiment_driver/OptimizationDriver.py: Moved most of the message digestion logic into optimization driver. Now defines and registers a callback for each message type for the digestion thread. Also moved several functions specific to hp tuning to the OptimizationDriver. Switched to config initialization.
  • maggy/core/experiment_driver/AblationDriver.py: Switched to config initialization. Simplified initialization logic and switched to config.
  • maggy/core/experiment_driver/DistributedDriver.py: Removed all code that was necessary to maintain compatibility with the previous version of the base driver. Adapted callback schema and config based initialization.

Miscellaneous changes

  • maggy/core/rpc.py: Linting changes to conform with global variable name style.
  • maggy/util.py: Added two functions to keep hops out of maggy dependencies. Fixed a bug from bug fix #88 .
  • examples/notebooks: Adapted notebooks to new lagom syntax.

Known issues
Ablation tests are still ongoing, so far there seems to be an issue with the heartbeat.

maggy/core/lagom/lagom_ablation.py Outdated Show resolved Hide resolved
maggy/core/lagom/lagom_ablation.py Outdated Show resolved Hide resolved
maggy/core/lagom/lagom_ablation.py Outdated Show resolved Hide resolved
maggy/core/lagom/lagom_ablation.py Outdated Show resolved Hide resolved
maggy/core/lagom/lagom_distributed.py Show resolved Hide resolved
maggy/core/lagom/lagom_optimization.py Show resolved Hide resolved
@RiccardoGrigoletto
Copy link
Contributor

It looks good to me, I tried it with tensorflow in my vm and works.

@amacati
Copy link
Contributor Author

amacati commented Feb 24, 2021

Why were there still calls to hopsutils and experiment_utils in the code? I converted those calls to EnvSing, pretty sure..

@RiccardoGrigoletto
Copy link
Contributor

It was in the first commit then you change them in the next commit. I didn't see it at the beginning but then I saw it and marked the comments as 'resolved'

Copy link
Contributor

@moritzmeister moritzmeister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good work! The callbacks are cool, much clearer now.

Two things:

  1. Can we change the Driver file names to lower case only? e.g. ablation_driver.py etc.
  2. What happened to grid search?

maggy/core/lagom/lagom_distributed.py Show resolved Hide resolved
maggy/core/experiment_driver/OptimizationDriver.py Outdated Show resolved Hide resolved
maggy/core/experiment_driver/OptimizationDriver.py Outdated Show resolved Hide resolved
@amacati
Copy link
Contributor Author

amacati commented Mar 1, 2021

Gridsearch is tested and should work.

Copy link
Contributor

@moritzmeister moritzmeister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳

@moritzmeister moritzmeister merged commit 5b86d39 into logicalclocks:master Mar 1, 2021
@amacati amacati deleted the driver-refactor branch March 1, 2021 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants