# Hyper-Parameter Tuning with Kubeflow

Hyperparameter optimization (HPO), or hyperparameter tuning, chooses a set of optimal hyperparameters for a learning algorithm. Hyperparameters are the parameters that control the learning process itself. An optimal set of hyperparameters yields a model that minimizes a predefined loss function on given test data.

There are many approaches to HPO:
- grid search
- random search
- Bayesian optimization
- gradient-based optimization
- evolutionary optimization
- population-based training
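
To make the first two approaches in the list concrete, here is a minimal sketch that compares grid search and random search on a toy problem. The search space and the `toy_loss` function are made up for illustration; in practice each evaluation would be a full training run.

```python
import itertools
import random

# Hypothetical search space, chosen only for illustration.
LEARNING_RATES = [0.01, 0.02, 0.03]
NUM_LAYERS = [2, 3, 4, 5]


def toy_loss(lr, num_layers):
    """Stand-in for a training run: a made-up loss, lowest at lr=0.02, num_layers=3."""
    return (lr - 0.02) ** 2 * 1e4 + (num_layers - 3) ** 2


def grid_search():
    """Evaluate every combination on the grid and return the best one."""
    candidates = itertools.product(LEARNING_RATES, NUM_LAYERS)
    return min(candidates, key=lambda c: toy_loss(*c))


def random_search(num_trials, seed=0):
    """Sample configurations uniformly at random and return the best one seen."""
    rng = random.Random(seed)
    samples = [
        (rng.uniform(0.01, 0.03), rng.randint(2, 5))
        for _ in range(num_trials)
    ]
    return min(samples, key=lambda c: toy_loss(*c))


print("grid search best:  ", grid_search())
print("random search best:", random_search(num_trials=12))
```

Grid search evaluates all 12 combinations, while random search spends the same budget on 12 sampled points and is not restricted to the grid values of the learning rate.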


# Katib

The [Katib](https://github.com/kubeflow/katib) project is inspired by the [Google Vizier Paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf). 

Katib is a scalable and flexible hyperparameter tuning framework and is tightly integrated with Kubernetes. It does not depend on any specific deep learning framework (such as TensorFlow, MXNet, or PyTorch).

Here are some notes on Katib:
* Optimizes a given objective metric such as validation accuracy 
* Supports Int, Double, Discrete, and Categorical parameter ranges
* Option for early stopping

# Hyper-Parameter Tuning Examples 

With Kubeflow Katib, we will run popular hyper-parameter tuning algorithms including `random search`, `grid search`, `bayesian optimization` and `hyperband`.

# Random Search

In [1]:
!kubectl create -f random-search-example.yaml

experiment.kubeflow.org/random-example created


If you check the manifest, you will see the following parameter definitions:


```yaml
parameters:
- name: --lr
  parameterType: double
  feasibleSpace:
    min: "0.01"
    max: "0.03"
- name: --num-layers
  parameterType: int
  feasibleSpace:
    min: "2"
    max: "5"
- name: --optimizer
  parameterType: categorical
  feasibleSpace:
    list:
    - sgd
    - adam
    - ftrl
```


This experiment tunes three hyperparameters. The manifest lists the type and range of each:

* `--lr` (learning rate) - type: double, range 0.01 to 0.03
* `--num-layers` (number of neural network layers) - type: int, range 2 to 5
* `--optimizer` (optimizer) - type: categorical, one of sgd, adam, or ftrl
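
As a rough illustration of what one random-search trial looks like, the sketch below draws a value for each parameter from its feasible space and renders the result as command-line flags. This is not Katib's implementation: `sample_assignment` and `to_cli_args` are hypothetical helpers, and passing each value to the training script as `name=value` is an assumption.

```python
import random

# Mirrors the `parameters` section of the manifest above.
PARAMETERS = [
    {"name": "--lr", "parameterType": "double",
     "feasibleSpace": {"min": "0.01", "max": "0.03"}},
    {"name": "--num-layers", "parameterType": "int",
     "feasibleSpace": {"min": "2", "max": "5"}},
    {"name": "--optimizer", "parameterType": "categorical",
     "feasibleSpace": {"list": ["sgd", "adam", "ftrl"]}},
]


def sample_assignment(parameters, rng):
    """Draw one value per parameter, uniformly from its feasible space."""
    assignment = {}
    for param in parameters:
        space = param["feasibleSpace"]
        kind = param["parameterType"]
        if kind == "double":
            value = rng.uniform(float(space["min"]), float(space["max"]))
        elif kind == "int":
            value = rng.randint(int(space["min"]), int(space["max"]))
        elif kind in ("categorical", "discrete"):
            value = rng.choice(space["list"])
        else:
            raise ValueError(f"unsupported parameter type: {kind}")
        assignment[param["name"]] = value
    return assignment


def to_cli_args(assignment):
    """Render an assignment as flags (assumed format: name=value)."""
    return [f"{name}={value}" for name, value in assignment.items()]


rng = random.Random(42)
for _ in range(3):
    print(" ".join(to_cli_args(sample_assignment(PARAMETERS, rng))))
```

Note that the manifest stores `min` and `max` as strings, so the sketch converts them before sampling.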

The demo starts an experiment that runs up to 12 trials, three at a time, each with different parameters. You can run the following command to check the experiment status.

When the last entry under `Status.Conditions` has type `Succeeded`, the experiment is finished.
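
If you prefer to check completion from code instead of reading the `describe` output, a sketch along these lines may help. It assumes the experiment's JSON carries a `status.conditions` list whose entries have `type` and `status` fields, and that a finished experiment ends with a condition of type `Succeeded`; verify both against your Katib version. The helper names are hypothetical.

```python
import json
import subprocess


def is_experiment_finished(experiment, finished_type="Succeeded"):
    """Return True if the latest status condition reports completion.

    Assumption: conditions are ordered oldest to newest, and the last one
    describes the current state of the experiment.
    """
    conditions = experiment.get("status", {}).get("conditions", [])
    if not conditions:
        return False
    latest = conditions[-1]
    return latest.get("type") == finished_type and latest.get("status") == "True"


def fetch_experiment(name, namespace="anonymous"):
    """Fetch the experiment as JSON via kubectl (requires cluster access)."""
    output = subprocess.check_output(
        ["kubectl", "get", "experiment", name, "-n", namespace, "-o", "json"]
    )
    return json.loads(output)


# Offline example with a hand-written status, not real cluster output.
sample = {
    "status": {
        "conditions": [
            {"type": "Created", "status": "True"},
            {"type": "Running", "status": "False"},
            {"type": "Succeeded", "status": "True"},
        ]
    }
}
print(is_experiment_finished(sample))
```

Against a live cluster you would call `is_experiment_finished(fetch_experiment("random-example"))`, for example in a polling loop with a sleep between checks.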


In [2]:
!kubectl describe experiment random-example

Name:         random-example
Namespace:    anonymous
Labels:       controller-tools.k8s.io=1.0
Annotations:  <none>
API Version:  kubeflow.org/v1alpha3
Kind:         Experiment
Metadata:
  Creation Timestamp:  2020-08-23T00:26:28Z
  Finalizers:
    update-prometheus-metrics
  Generation:        2
  Resource Version:  84379
  Self Link:         /apis/kubeflow.org/v1alpha3/namespaces/anonymous/experiments/random-example
  UID:               746bd5f4-7460-42e4-96b9-c6ce04b99b09
Spec:
  Algorithm:
    Algorithm Name:        random
    Algorithm Settings:    <nil>
  Max Failed Trial Count:  3
  Max Trial Count:         12
  Metrics Collector Spec:
    Collector:
      Kind:  StdOut
  Objective:
    Additional Metric Names:
      accuracy
    Goal:                   0.99
    Objective Metric Name:  Validation-accuracy
    Type:                   maximize
  Parallel Trial Count:     3
  Parameters:
    Feasible Space:
      Max:           0.03
      Min:      

# Navigate to Katib to Monitor the Hyper-Parameter Tuning Jobs

You can monitor your results in the Katib UI. If you installed Kubeflow using the deployment guide, you can access the Katib UI at `https://<your kubeflow endpoint>/katib/`

For `random-experiment`, go to `HP (Hyperparameter)` -> `Monitor` -> `random-experiment`.

![katib-experiment-selection.png](./images/katib-experiment-selection.png)

### Pick the best parameters from the results

Once you click the job and go to the detail page, you will see the different combinations of parameters and the accuracy each one achieved.


| trialName | Validation-accuracy | accuracy | --lr | --num-layers | --optimizer |
|----------------------------|----------|----------|----------------------|---|------|
| random-experiment-rfwwbnsd | 0.974920 | 0.984844 | 0.013831565266960293 | 4 | sgd  |
| random-experiment-vxgwlgqq | 0.113854 | 0.116646 | 0.024225789898529138 | 4 | ftrl |
| random-experiment-wclrwlcq | 0.979697 | 0.998437 | 0.021916171239020756 | 4 | sgd  |
| random-experiment-7lsc4pwb | 0.113854 | 0.115312 | 0.024163810384272653 | 5 | ftrl |
| random-experiment-86vv9vgv | 0.963475 | 0.971562 | 0.02943228249244735  | 3 | adam |
| random-experiment-jh884cxz | 0.981091 | 0.999219 | 0.022372025623908262 | 2 | sgd  |
| random-experiment-sgtwhrgz | 0.980693 | 0.997969 | 0.016641686851083654 | 4 | sgd  |
| random-experiment-c6vvz6dv | 0.980792 | 0.998906 | 0.0264125850165842   | 3 | sgd  |
| random-experiment-vqs2xmfj | 0.113854 | 0.105313 | 0.026629394628228185 | 4 | ftrl |
| random-experiment-bv8lsh2m | 0.980195 | 0.999375 | 0.021769570793012488 | 2 | sgd  |
| random-experiment-7vbnqc7z | 0.113854 | 0.102188 | 0.025079750575740783 | 4 | ftrl |
| random-experiment-kwj9drmg | 0.979498 | 0.995469 | 0.014985919312945063 | 4 | sgd  |
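
Picking the best trial from the table above can also be done programmatically. The sketch below copies the table rows into a list by hand (it does not query Katib) and selects the trial with the highest `Validation-accuracy`, the objective metric of this experiment.

```python
# (trial name, Validation-accuracy, accuracy, --lr, --num-layers, --optimizer)
# Values copied from the results table above.
TRIALS = [
    ("random-experiment-rfwwbnsd", 0.974920, 0.984844, 0.013831565266960293, 4, "sgd"),
    ("random-experiment-vxgwlgqq", 0.113854, 0.116646, 0.024225789898529138, 4, "ftrl"),
    ("random-experiment-wclrwlcq", 0.979697, 0.998437, 0.021916171239020756, 4, "sgd"),
    ("random-experiment-7lsc4pwb", 0.113854, 0.115312, 0.024163810384272653, 5, "ftrl"),
    ("random-experiment-86vv9vgv", 0.963475, 0.971562, 0.02943228249244735, 3, "adam"),
    ("random-experiment-jh884cxz", 0.981091, 0.999219, 0.022372025623908262, 2, "sgd"),
    ("random-experiment-sgtwhrgz", 0.980693, 0.997969, 0.016641686851083654, 4, "sgd"),
    ("random-experiment-c6vvz6dv", 0.980792, 0.998906, 0.0264125850165842, 3, "sgd"),
    ("random-experiment-vqs2xmfj", 0.113854, 0.105313, 0.026629394628228185, 4, "ftrl"),
    ("random-experiment-bv8lsh2m", 0.980195, 0.999375, 0.021769570793012488, 2, "sgd"),
    ("random-experiment-7vbnqc7z", 0.113854, 0.102188, 0.025079750575740783, 4, "ftrl"),
    ("random-experiment-kwj9drmg", 0.979498, 0.995469, 0.014985919312945063, 4, "sgd"),
]

# Best trial overall, ranked by the objective metric (index 1).
best = max(TRIALS, key=lambda trial: trial[1])
print("best trial:", best[0], "validation accuracy:", best[1])

# Best validation accuracy reached by each optimizer.
best_by_optimizer = {}
for trial in TRIALS:
    optimizer, val_acc = trial[5], trial[1]
    best_by_optimizer[optimizer] = max(best_by_optimizer.get(optimizer, 0.0), val_acc)
print(best_by_optimizer)
```

In this run the best trial is `random-experiment-jh884cxz` (sgd, 2 layers), and every `ftrl` trial stays at a validation accuracy of 0.113854, which suggests that optimizer does not train this model within the searched learning-rate range.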


![katib-experiment-result.png](./images/katib-experiment-result.png)

You can also click a trial name to check the data for that trial.

> Note: The remaining examples use different optimization algorithms.  
> You submit each job and check its lifecycle the same way as in the random search example above.

# Grid Search

In [3]:
!kubectl create -f grid-example.yaml

experiment.kubeflow.org/grid-example created


In [4]:
!kubectl describe experiment grid-example

Name:         grid-example
Namespace:    anonymous
Labels:       controller-tools.k8s.io=1.0
Annotations:  <none>
API Version:  kubeflow.org/v1alpha3
Kind:         Experiment
Metadata:
  Creation Timestamp:  2020-08-23T00:26:30Z
  Finalizers:
    update-prometheus-metrics
  Generation:        2
  Resource Version:  84410
  Self Link:         /apis/kubeflow.org/v1alpha3/namespaces/anonymous/experiments/grid-example
  UID:               cdd94200-4d62-4be1-9d67-6cc5aeba0ee7
Spec:
  Algorithm:
    Algorithm Name:        grid
    Algorithm Settings:    <nil>
  Max Failed Trial Count:  3
  Max Trial Count:         12
  Metrics Collector Spec:
    Collector:
      Kind:  StdOut
  Objective:
    Additional Metric Names:
      accuracy
    Goal:                   0.99
    Objective Metric Name:  Validation-accuracy
    Type:                   maximize
  Parallel Trial Count:     3
  Parameters:
    Feasible Space:
      Max:           0.01
      Min:           0

# Bayesian

BayesOpt: a toolbox for Bayesian optimization, experimental design, and stochastic bandits.

In [5]:
!kubectl create -f bayesopt-example.yaml

experiment.kubeflow.org/bayesopt-example created


In [6]:
!kubectl describe experiment bayesopt-example

Name:         bayesopt-example
Namespace:    anonymous
Labels:       controller-tools.k8s.io=1.0
Annotations:  <none>
API Version:  kubeflow.org/v1alpha3
Kind:         Experiment
Metadata:
  Creation Timestamp:  2020-08-23T00:26:31Z
  Finalizers:
    update-prometheus-metrics
  Generation:        1
  Resource Version:  84444
  Self Link:         /apis/kubeflow.org/v1alpha3/namespaces/anonymous/experiments/bayesopt-example
  UID:               5e93e227-78e3-4f24-abf4-2e585f57a7c1
Spec:
  Algorithm:
    Algorithm Name:  bayesianoptimization
    Algorithm Settings:
      Name:                random_state
      Value:               10
  Max Failed Trial Count:  3
  Max Trial Count:         12
  Metrics Collector Spec:
    Collector:
      Kind:  StdOut
  Objective:
    Additional Metric Names:
      accuracy
    Goal:                   0.99
    Objective Metric Name:  Validation-accuracy
    Type:                   maximize
  Parallel Trial Count:     3
  Pa

# Hyperband

In [7]:
!kubectl create -f hyperband-example.yaml

experiment.kubeflow.org/hyperband-example created


In [8]:
!kubectl describe experiment hyperband-example

Name:         hyperband-example
Namespace:    anonymous
Labels:       <none>
Annotations:  <none>
API Version:  kubeflow.org/v1alpha3
Kind:         Experiment
Metadata:
  Creation Timestamp:  2020-08-23T00:26:33Z
  Finalizers:
    update-prometheus-metrics
  Generation:        1
  Resource Version:  84473
  Self Link:         /apis/kubeflow.org/v1alpha3/namespaces/anonymous/experiments/hyperband-example
  UID:               53ead790-170f-4d69-b4a9-78a43135caf4
Spec:
  Algorithm:
    Algorithm Name:  hyperband
    Algorithm Settings:
      Name:                resource_name
      Value:               --num-epochs
      Name:                eta
      Value:               3
      Name:                r_l
      Value:               9
  Max Failed Trial Count:  9
  Max Trial Count:         9
  Metrics Collector Spec:
    Collector:
      Kind:  StdOut
  Objective:
    Additional Metric Names:
      accuracy
    Goal:                   0.99
    Objective Metr
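
The `eta` and `r_l` settings above control how Hyperband allocates training budget. As a rough guide, the sketch below computes the bracket schedule from the original Hyperband algorithm, treating `r_l` as the maximum resource (here, epochs) per trial. Both that interpretation and the exact schedule are assumptions; Katib's suggestion service may differ in detail, and `hyperband_brackets` is a hypothetical helper.

```python
def hyperband_brackets(max_resource, eta):
    """Return the Hyperband schedule as a list of brackets.

    Each bracket is a list of (num_configs, resource_per_config) rounds.
    Assumes max_resource >= 1 and an integer eta >= 2.
    """
    # s_max = floor(log_eta(max_resource)), computed with integers
    # to avoid floating-point rounding in the logarithm.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Initial number of configurations: ceil((s_max + 1) / (s + 1) * eta**s).
        numerator = (s_max + 1) * eta ** s
        n = -(-numerator // (s + 1))
        # Initial resource given to each configuration in this bracket.
        r = max_resource / eta ** s
        # Successive halving: keep the best 1/eta, give survivors eta times more.
        rounds = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        brackets.append(rounds)
    return brackets


for rounds in hyperband_brackets(max_resource=9, eta=3):
    print(rounds)
```

With `eta=3` and a maximum resource of 9, the most aggressive bracket starts 9 configurations at 1 epoch each, keeps 3 of them for 3 epochs, and trains the final one for 9 epochs.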

# Navigate to Katib to Monitor All Hyper-Parameter Tuning Jobs
![katib-experiment-selection.png](./images/katib-experiment-selection.png)