
[Story] Config and Experiment Management - MVP #2

Closed
roma-glushko opened this issue May 11, 2021 · 1 comment
Comments

@roma-glushko
Owner

roma-glushko commented May 11, 2021

Description

Config Management

For small to mid-size (side) projects and experiments, it's useful to have a simple, straightforward and flexible configuration system. It's not convenient to copy-paste even simple plain dictionary/class-based configuration-loading code.

Also, when using configs, we need to be able to:

  • access nested values in an easy and readable way
  • change different moving parts of the experiment (e.g. loss functions, optimisers, LR schedulers, feature extractors, architectures or parts of them)
  • load secrets from a place that is not under version control, so all env variables can be safely stored separately and loaded along with other configs.
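As a rough sketch of the first and third points (helper names like `get_nested` and `load_secrets` are hypothetical, not a fixed API):

```python
import os

def get_nested(config: dict, path: str, default=None):
    """Resolve a dot-separated path like 'optimizer.lr' in a nested dict."""
    node = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

def load_secrets(config: dict, env_keys: list) -> dict:
    """Merge selected environment variables (kept out of version control,
    e.g. loaded from an .env file) into the config under a 'secrets' key."""
    config = dict(config)
    config["secrets"] = {key: os.environ.get(key) for key in env_keys}
    return config
```

So `get_nested(config, "optimizer.lr")` reads naturally, and secrets never have to live next to the versioned config files.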

Experiment Management

At the same time, it may be helpful to have a simple and straightforward experiment management system. We can't focus too deeply on this part, as we will probably never replace Neptune.ai or W&B functionality.

However, in some cases installing them may be overkill, and all a scientist may need is a plain, quick way to understand which experiment has scored best so far and how it was produced.

In other cases, they may not be compatible with the ML/DL framework version you need (e.g. the TF 2.5 RC Ampere-compatible build may not be compatible with Neptune's dependency list).

Solution

Add a straightforward and simple way to manage configs/hyperparams and track experiment outcomes.

Configurations

We may base our configs on Python dictionaries specified as separate files/modules. This is a flexible approach that makes even such pieces of code as augmentation pipelines configurable. It would be annoying to add another layer of abstraction just to be able to experiment with augmentations.
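A config module could then look something like this (a hypothetical layout; all keys and names are illustrative only):

```python
# config/baseline.py — a plain-dict config module (hypothetical layout)

def identity(img):
    """Placeholder augmentation; real transforms (flips, crops) would go here."""
    return img

config = {
    "seed": 42,
    "optimizer": {"name": "adam", "lr": 3e-4},
    "loss": "cross_entropy",
    # augmentations stay plain Python callables — no extra abstraction layer
    "augmentations": [identity],
}
```

Such a module can be picked up with `importlib.import_module("config.baseline")`, so switching experiments is just switching the module path.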

Another useful thing would be reusable factories that allow registering different "moving" components (e.g. losses). We could then take the configured class/type value and create an instance of it via the factories.
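A minimal sketch of such a factory, assuming a decorator-based registry (the `Factory` class and `MSELoss` example are hypothetical, not an existing API):

```python
class Factory:
    """A minimal registry mapping config string keys to component constructors."""

    def __init__(self):
        self._registry = {}

    def register(self, name):
        def decorator(cls):
            self._registry[name] = cls
            return cls
        return decorator

    def create(self, name, **kwargs):
        if name not in self._registry:
            raise KeyError(f"Unknown component: {name!r}")
        return self._registry[name](**kwargs)

losses = Factory()

@losses.register("mse")
class MSELoss:
    def __init__(self, reduction="mean"):
        self.reduction = reduction

# driven by a config value like {"loss": {"type": "mse", "reduction": "sum"}}
loss = losses.create("mse", reduction="sum")
```

Swapping a loss then means editing one string in the config rather than touching the training code.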

Experiment Tracking

It's a bit harder to keep an experiment management system simple and still useful. The system would be local and file-based: each experiment's details would be logged into its own directory.

We would like to see the following information logged:

  • entire script output
  • backup of hyperparams
  • GIT commit hash of the project
  • GIT diff patch (many changes are done on the fly, so this may be helpful)
  • ability to backup specific files (in case, it's not feasible to make them configurable)
  • ability to dump and save any artifacts (like training history or embeddings)
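Several of the points above can be sketched in a few lines; this is a rough draft (the directory layout, file names, and `start_experiment`/`backup_file` helpers are assumptions, not a settled design):

```python
import json
import shutil
import subprocess
from datetime import datetime
from pathlib import Path

def start_experiment(root="experiments", hyperparams=None):
    """Create a per-run directory and log hyperparams and git state into it."""
    run_dir = Path(root) / datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)

    # backup of hyperparams
    (run_dir / "hyperparams.json").write_text(json.dumps(hyperparams or {}, indent=2))

    # GIT commit hash of the project + diff patch of on-the-fly changes
    for name, cmd in [("commit.txt", ["git", "rev-parse", "HEAD"]),
                      ("changes.patch", ["git", "diff", "HEAD"])]:
        try:
            out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
            (run_dir / name).write_text(out)
        except (OSError, subprocess.CalledProcessError):
            pass  # not a git repo, or git is missing — skip silently

    return run_dir

def backup_file(run_dir, path):
    """Copy a specific file (e.g. one that isn't feasible to make configurable)."""
    shutil.copy(path, Path(run_dir) / Path(path).name)
```

Script output capture and artifact dumps (training history, embeddings) would write into the same `run_dir`.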

We are not going to focus on more advanced ways of reusing the logged information, like plotting learning curves or embeddings.
However, DVC provides a simple way to plot such information, and we could make the system compatible with their plotting functionality.
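For instance, DVC's plotting commands can render plain headed CSV metric files, so simply appending one row per epoch would keep us compatible (a sketch; the `append_metrics` helper and column names are our assumptions):

```python
import csv
from pathlib import Path

def append_metrics(path, row: dict):
    """Append one epoch's metrics to a headed CSV file, which tools like
    `dvc plots show <file>` can render without any extra schema."""
    path = Path(path)
    write_header = not path.exists()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)

# e.g. inside the training loop:
# append_metrics("logs/history.csv", {"epoch": 1, "loss": 0.93, "val_loss": 1.02})
```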

Let's call it an MVP.

References

The main source of inspiration came from Kaggle master's pipelines and Hydra config system:

@roma-glushko roma-glushko added the enhancement New feature or request label May 11, 2021
@roma-glushko roma-glushko self-assigned this May 11, 2021
@roma-glushko roma-glushko linked a pull request May 11, 2021 that will close this issue
@roma-glushko roma-glushko changed the title Config and Experiment Management Config and Experiment Management - MVP May 11, 2021
@roma-glushko roma-glushko removed a link to a pull request Jul 11, 2021
@roma-glushko roma-glushko added this to the Experiment Tracker MVP v1 milestone Oct 3, 2021
@roma-glushko roma-glushko changed the title Config and Experiment Management - MVP [Story] Config and Experiment Management - MVP Oct 3, 2021
@roma-glushko
Owner Author

It's time to review what I have done so far from this story and move the rest to separate tickets:

  • base our configs on Python dictionaries specified as separate files/modules ✅
  • access nested values in an easy and readable way ✅
  • change different moving parts of the experiment (e.g. loss functions, optimisers, LR schedulers, feature extractors, architectures or parts of them) ✅
  • load secrets from a place that is not under version control, so all env variables can be safely stored separately and loaded along with other configs - moved to Pull values that is stored in ENV files #18
  • track entire script output ✅
  • backup hyperparams ✅
  • GIT commit hash of the project ✅
  • GIT diff patch ✅
  • ability to backup specific files - moved to Add an ability to backup needed files #19
  • ability to dump and save any artifacts ✅
  • DVC provides a simple way to plot some kinds of information, and we could make the system compatible with their plotting functionality - needs investigation: [Story] Investigate possibilities to plot charts #20
