
Scalable hyperparameters selection #1060

Open
vlad-tokarev opened this issue Oct 28, 2019 · 2 comments
Labels: improvement [Changed] for changes in existing functionality
Milestone: 1.1


vlad-tokarev commented Oct 28, 2019

Summary: we need to provide a tool that lets the user run multiple training workers with different values of the hyperparameters.
Subtasks:

  1. Develop a syntax for describing a hyperparameter-variation strategy (the examples below are only concepts – design the syntax yourself and get approval from the team). We suggest implementing a parent Legion API resource that wraps the regular ModelTrainingTemplate resource together with the extra parameters for scalable runs.
    There are three common strategies for hyperparameter search:
    1.1. Manual – the user hardcodes fixed values for the hyperparameters, reviews the results, and modifies them for the next run (Legion already allows this).
    Example:
```yaml
hyperParameters:
  strategy: "default/fixed"
  alpha: "1.0"
  beta: "2.0"
```

1.2. Random search in a space. Suppose we have X hyperparameters and N candidate values for each; brute-forcing every combination requires N^X runs (for example, 10 values for each of 3 parameters is already 1,000 runs). To reduce the cost of searching such a large space, values are usually selected at random from the given ranges; a rough sketch follows after this list.
Example:

```yaml
hyperParameters:
  strategy: "random"
  countOfSelections: 10
  alpha: { range: "[0.1, 10]", step: "0.1" }
  beta: { range: "[2, 50]", step: "2" }
  gamma: { range: "[0.1, 0.5]", step: "0.1" }
```

1.3. Varying one hyperparameter across a range while the others are fixed. One hyperparameter is described with the range syntax and the rest are hardcoded. The system guarantees that a training run is executed for every value of the varied hyperparameter.
Example:

```yaml
hyperParameters:
  strategy: "range"
  alpha: { range: "[0.1, 10]", step: "0.1" }
  beta: "2"
  gamma: "3"
```
  2. Develop a syntax for describing the resources available to scalable model training.
    Example:
```yaml
resources:
  workers: 5
  limits:
    cpu: 4024m
    memory: 4024Mi
  requests:
    cpu: 2024m
    memory: 2024Mi
```
  3. Implement the scalable training process itself: run the strategy the user describes, distributing the training runs across the declared resources (see the sketches below).
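For illustration, here is a minimal Python sketch of how the three strategies above could be expanded into one set of hyperparameter values per run. The `(lo, hi, step)` tuple representation and the names `frange`/`expand_runs` are assumptions made for the sketch, not a proposed API:

```python
import random

def frange(lo, hi, step):
    """Yield lo, lo + step, ... up to and including hi."""
    v = lo
    while v <= hi + 1e-9:  # tolerance for float accumulation
        yield round(v, 10)
        v += step

def expand_runs(spec):
    """Expand a strategy spec into one hyperparameter dict per training
    run. Ranges are (lo, hi, step) tuples; fixed values are numbers."""
    params = {k: v for k, v in spec.items()
              if k not in ("strategy", "countOfSelections")}

    if spec["strategy"] == "default/fixed":
        return [dict(params)]  # a single run with the hardcoded values

    if spec["strategy"] == "random":
        # Sample each parameter independently from its grid of steps.
        return [{name: random.choice(list(frange(*rng)))
                 for name, rng in params.items()}
                for _ in range(spec["countOfSelections"])]

    if spec["strategy"] == "range":
        # Exactly one parameter is a range; sweep it, fix the others.
        (name, rng), = [(k, v) for k, v in params.items()
                        if isinstance(v, tuple)]
        fixed = {k: v for k, v in params.items() if k != name}
        return [{**fixed, name: v} for v in frange(*rng)]

    raise ValueError("unknown strategy: " + spec["strategy"])

# 10 random selections from the ranges of example 1.2:
runs = expand_runs({
    "strategy": "random",
    "countOfSelections": 10,
    "alpha": (0.1, 10, 0.1),
    "beta": (2, 50, 2),
    "gamma": (0.1, 0.5, 0.1),
})
```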

Theory: https://en.wikipedia.org/wiki/Hyperparameter_optimization
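And a rough sketch of how the workers field from subtask 2 could bound parallelism when executing the expanded runs. `launch_training` is a hypothetical stand-in for creating a ModelTraining resource from the template:

```python
from concurrent.futures import ThreadPoolExecutor

def launch_training(params):
    # Hypothetical stand-in: in Legion this would create a ModelTraining
    # from the template, with `params` substituted in.
    print("training with", params)

def run_all(runs, workers):
    # At most `workers` trainings execute at once, mirroring the
    # resources.workers field from the example above.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for params in runs:
            pool.submit(launch_training, params)

# Sweep alpha (strategy 1.3) with beta and gamma fixed, 5 workers at a time.
run_all([{"alpha": round(a / 10, 1), "beta": 2, "gamma": 3}
         for a in range(1, 101)], workers=5)
```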

@mcdoker18 mcdoker18 added the improvement [Changed] for changes in existing functionality label Oct 28, 2019
@mcdoker18 mcdoker18 added this to To do in Legion platform board via automation Oct 28, 2019
@mcdoker18 mcdoker18 added this to the 1.1 milestone Oct 28, 2019
vlad-tokarev (Contributor, Author) commented:

Look at https://github.com/kubeflow/katib
