# Hyperparameter tuning using CMA-ES

In this example you will deploy three Katib experiments that use the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) from a Jupyter Notebook with the Katib SDK. The experiments differ in their resume policies.

The notebook shows how to create an experiment, get it, check its status, and delete it.

# Install required package

In [2]:
pip install kubeflow-katib

Defaulting to user installation because normal site-packages is not writeable
Collecting kubeflow-katib
  Downloading kubeflow_katib-0.0.5-py3-none-any.whl (112 kB)
     |████████████████████████████████| 112 kB 34.4 MB/s eta 0:00:01
Collecting table-logger>=0.3.5
  Downloading table_logger-0.3.6-py3-none-any.whl (14 kB)
Installing collected packages: table-logger, kubeflow-katib
Successfully installed kubeflow-katib-0.0.5 table-logger-0.3.6
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


## Restart the notebook kernel to use the SDK package

In [None]:
from IPython.display import display_html
display_html("<script>Jupyter.notebook.kernel.restart()</script>", raw=True)

## Import required packages

In [1]:
import copy

from kubeflow.katib.api.katib_client import KatibClient
from kubernetes.client import V1ObjectMeta
from kubeflow.katib import V1beta1Experiment
from kubeflow.katib import V1beta1AlgorithmSpec
from kubeflow.katib import V1beta1ObjectiveSpec
from kubeflow.katib import V1beta1FeasibleSpace
from kubeflow.katib import V1beta1ExperimentSpec
from kubeflow.katib import V1beta1ParameterSpec
from kubeflow.katib import V1beta1TrialTemplate
from kubeflow.katib import V1beta1TrialParameterSpec

## Define experiment

You have to create an experiment object before deploying it. This experiment is similar to [this example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/cmaes-example.yaml).

In [2]:
# Experiment metadata
namespace = "anonymous"
experiment_name = "cmaes-example"

metadata = V1ObjectMeta(
    name=experiment_name,
    namespace=namespace
)

# Algorithm specification
algorithm_spec=V1beta1AlgorithmSpec(
    algorithm_name="cmaes"
)

# Objective specification
objective_spec=V1beta1ObjectiveSpec(
    type="maximize",
    goal= 0.99,
    objective_metric_name="Validation-accuracy",
    additional_metric_names=["Train-accuracy"]
)

# Experiment search space. In this example we tune the learning rate, the number of layers and the optimizer.
parameters=[
    V1beta1ParameterSpec(
        name="lr",
        parameter_type="double",
        feasible_space=V1beta1FeasibleSpace(
            min="0.01",
            max="0.06"
        ),
    ),
    V1beta1ParameterSpec(
        name="num-layers",
        parameter_type="int",
        feasible_space=V1beta1FeasibleSpace(
            min="2",
            max="5"
        ),
    ),
    V1beta1ParameterSpec(
        name="optimizer",
        parameter_type="categorical",
        feasible_space=V1beta1FeasibleSpace(
            list=["sgd", "adam", "ftrl"]
        ),
    ),
]



# JSON trial template specification
trial_spec={
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "training-container",
                        "image": "docker.io/kubeflowkatib/mxnet-mnist",
                        "command": [
                            "python3",
                            "/opt/mxnet-mnist/mnist.py",
                            "--batch-size=64",
                            "--lr=${trialParameters.learningRate}",
                            "--num-layers=${trialParameters.numberLayers}",
                            "--optimizer=${trialParameters.optimizer}"
                        ]
                    }
                ],
                "restartPolicy": "Never"
            }
        }
    }
}

# Template with trial parameters and trial spec
trial_template=V1beta1TrialTemplate(
    trial_parameters=[
        V1beta1TrialParameterSpec(
            name="learningRate",
            description="Learning rate for the training model",
            reference="lr"
        ),
        V1beta1TrialParameterSpec(
            name="numberLayers",
            description="Number of training model layers",
            reference="num-layers"
        ),
        V1beta1TrialParameterSpec(
            name="optimizer",
            description="Training model optimizer (sgd, adam or ftrl)",
            reference="optimizer"
        ),
    ],
    trial_spec=trial_spec
)


# Experiment object
experiment = V1beta1Experiment(
    api_version="kubeflow.org/v1beta1",
    kind="Experiment",
    metadata=metadata,
    spec=V1beta1ExperimentSpec(
        max_trial_count=7,
        parallel_trial_count=3,
        max_failed_trial_count=3,
        algorithm=algorithm_spec,
        objective=objective_spec,
        parameters=parameters,
        trial_template=trial_template,
    )
)
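
The `${trialParameters.<name>}` placeholders in the trial spec are resolved through the `reference` field of each trial parameter. The sketch below imitates that substitution in plain Python to make the mapping visible. `render_command` and the assignment values are made up for illustration and are not part of the Katib SDK; Katib performs the substitution itself when it creates a trial.

```python
import re


# Hypothetical helper for illustration only; not a Katib SDK function.
def render_command(command, trial_parameters, assignments):
    """Replace ${trialParameters.<name>} placeholders in a command list.

    trial_parameters maps a placeholder name to the search-space
    parameter it references; assignments maps a search-space parameter
    to the value suggested for one trial.
    """
    pattern = re.compile(r"\$\{trialParameters\.(\w+)\}")

    def substitute(match):
        reference = trial_parameters[match.group(1)]
        return str(assignments[reference])

    return [pattern.sub(substitute, arg) for arg in command]


command = [
    "python3",
    "/opt/mxnet-mnist/mnist.py",
    "--batch-size=64",
    "--lr=${trialParameters.learningRate}",
    "--num-layers=${trialParameters.numberLayers}",
    "--optimizer=${trialParameters.optimizer}",
]

# Placeholder name -> search-space parameter, as in the trial template
trial_parameters = {
    "learningRate": "lr",
    "numberLayers": "num-layers",
    "optimizer": "optimizer",
}

# Example values for a single trial (made up for this sketch)
assignments = {"lr": "0.03", "num-layers": "3", "optimizer": "adam"}

rendered = render_command(command, trial_parameters, assignments)
print(rendered)
```

A placeholder whose name is missing from the trial parameters raises a `KeyError` in this sketch, which is a quick way to spot a mismatch between the trial spec and the trial parameter list.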

# Define experiments with resume policy

We will define two more experiments, one with ResumePolicy = Never and one with ResumePolicy = FromVolume.

An experiment with the _Never_ resume policy can't be resumed; its suggestion resources will be deleted.

An experiment with the _FromVolume_ resume policy can be resumed because a volume is attached to the suggestion. A PVC and PV should be created for the suggestion.

In [10]:
experiment_never_resume_name = "never-resume-cmaes"
experiment_from_volume_resume_name = "from-volume-resume-cmaes"

# Create new experiments from previous experiment info
# Define experiment with never resume
experiment_never_resume = copy.deepcopy(experiment)
experiment_never_resume.metadata.name = experiment_never_resume_name
experiment_never_resume.spec.resume_policy = "Never"
experiment_never_resume.spec.max_trial_count = 4

# Define experiment with from volume resume
experiment_from_volume_resume = copy.deepcopy(experiment)
experiment_from_volume_resume.metadata.name = experiment_from_volume_resume_name
experiment_from_volume_resume.spec.resume_policy = "FromVolume"
experiment_from_volume_resume.spec.max_trial_count = 4
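
The new experiments are built with `copy.deepcopy` so that changing their `spec` does not touch the original experiment. The sketch below shows the difference from a shallow copy using small stand-in classes (`Spec` and `Experiment` are hypothetical and only mimic the nested shape of the SDK objects).

```python
import copy


# Stand-in classes for illustration; the notebook uses V1beta1Experiment.
class Spec:
    def __init__(self, max_trial_count, resume_policy=None):
        self.max_trial_count = max_trial_count
        self.resume_policy = resume_policy


class Experiment:
    def __init__(self, name, spec):
        self.name = name
        self.spec = spec


base = Experiment("cmaes-example", Spec(max_trial_count=7))

# A shallow copy shares the nested spec object with the original.
shallow = copy.copy(base)
shallow.spec.max_trial_count = 4
shared_value = base.spec.max_trial_count  # the original changed too

# Restore the original, then use a deep copy instead.
base.spec.max_trial_count = 7
deep = copy.deepcopy(base)
deep.name = "never-resume-cmaes"
deep.spec.resume_policy = "Never"
deep.spec.max_trial_count = 4

print(shared_value)
print(base.name, base.spec.max_trial_count, base.spec.resume_policy)
print(deep.name, deep.spec.max_trial_count, deep.spec.resume_policy)
```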

You can print the experiments' info to verify it before submission.

In [11]:
print(experiment.metadata.name)
print(experiment.spec.algorithm.algorithm_name)
print("-----------------")
print(experiment_never_resume.metadata.name)
print(experiment_never_resume.spec.resume_policy)
print("-----------------")
print(experiment_from_volume_resume.metadata.name)
print(experiment_from_volume_resume.spec.resume_policy)


cmaes-example
cmaes
-----------------
never-resume-cmaes
Never
-----------------
from-volume-resume-cmaes
FromVolume


# Create experiment

You have to create a Katib client to use the SDK.

In [12]:
# Create client
kclient = KatibClient()

# Create experiment
kclient.create_experiment(experiment, namespace=namespace)

{'apiVersion': 'kubeflow.org/v1beta1',
 'kind': 'Experiment',
 'metadata': {'creationTimestamp': '2020-09-14T23:15:47Z',
  'generation': 1,
  'name': 'cmaes-example',
  'namespace': 'anonymous',
  'resourceVersion': '127102635',
  'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/anonymous/experiments/cmaes-example',
  'uid': '68c43a20-6926-4586-9440-6a7930d7712d'},
 'spec': {'algorithm': {'algorithmName': 'cmaes'},
  'maxFailedTrialCount': 3,
  'maxTrialCount': 7,
  'metricsCollectorSpec': {'collector': {'kind': 'StdOut'}},
  'objective': {'additionalMetricNames': ['Train-accuracy'],
   'goal': 0.99,
   'metricStrategies': [{'name': 'Validation-accuracy', 'value': 'max'},
    {'name': 'Train-accuracy', 'value': 'max'}],
   'objectiveMetricName': 'Validation-accuracy',
   'type': 'maximize'},
  'parallelTrialCount': 3,
  'parameters': [{'feasibleSpace': {'max': '0.06', 'min': '0.01'},
    'name': 'lr',
    'parameterType': 'double'},
   {'feasibleSpace': {'max': '5', 'min': '2'},
    

Create the other experiments.

In [13]:
# Create experiment with never resume
kclient.create_experiment(experiment_never_resume, namespace=namespace)
# Create experiment with from volume resume
kclient.create_experiment(experiment_from_volume_resume, namespace=namespace)

{'apiVersion': 'kubeflow.org/v1beta1',
 'kind': 'Experiment',
 'metadata': {'creationTimestamp': '2020-09-14T23:16:04Z',
  'generation': 1,
  'name': 'from-volume-resume-cmaes',
  'namespace': 'anonymous',
  'resourceVersion': '127102800',
  'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/anonymous/experiments/from-volume-resume-cmaes',
  'uid': '65173495-e76b-4136-88fa-a688790150cd'},
 'spec': {'algorithm': {'algorithmName': 'cmaes'},
  'maxFailedTrialCount': 3,
  'maxTrialCount': 4,
  'metricsCollectorSpec': {'collector': {'kind': 'StdOut'}},
  'objective': {'additionalMetricNames': ['Train-accuracy'],
   'goal': 0.99,
   'metricStrategies': [{'name': 'Validation-accuracy', 'value': 'max'},
    {'name': 'Train-accuracy', 'value': 'max'}],
   'objectiveMetricName': 'Validation-accuracy',
   'type': 'maximize'},
  'parallelTrialCount': 3,
  'parameters': [{'feasibleSpace': {'max': '0.06', 'min': '0.01'},
    'name': 'lr',
    'parameterType': 'double'},
   {'feasibleSpace': {'max': 

# Get experiment

You can get an experiment by name and read the data you need from it.

In [16]:
exp = kclient.get_experiment(name=experiment_name, namespace=namespace)
print(exp)
print("-----------------\n")

# Get max trial count and last status
print(exp["spec"]["maxTrialCount"])
print(exp["status"]["conditions"][-1])

{'apiVersion': 'kubeflow.org/v1beta1', 'kind': 'Experiment', 'metadata': {'creationTimestamp': '2020-09-14T23:15:47Z', 'finalizers': ['update-prometheus-metrics'], 'generation': 1, 'name': 'cmaes-example', 'namespace': 'anonymous', 'resourceVersion': '127103016', 'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/anonymous/experiments/cmaes-example', 'uid': '68c43a20-6926-4586-9440-6a7930d7712d'}, 'spec': {'algorithm': {'algorithmName': 'cmaes'}, 'maxFailedTrialCount': 3, 'maxTrialCount': 7, 'metricsCollectorSpec': {'collector': {'kind': 'StdOut'}}, 'objective': {'additionalMetricNames': ['Train-accuracy'], 'goal': 0.99, 'metricStrategies': [{'name': 'Validation-accuracy', 'value': 'max'}, {'name': 'Train-accuracy', 'value': 'max'}], 'objectiveMetricName': 'Validation-accuracy', 'type': 'maximize'}, 'parallelTrialCount': 3, 'parameters': [{'feasibleSpace': {'max': '0.06', 'min': '0.01'}, 'name': 'lr', 'parameterType': 'double'}, {'feasibleSpace': {'max': '5', 'min': '2'}, 'name': 'num-
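
Indexing `exp["status"]["conditions"][-1]` directly raises an error if the status block or its conditions are not there yet, which may be the case shortly after an experiment is created. The sketch below is one defensive way to read the latest condition. `latest_condition` is a hypothetical helper, and the sample dictionaries are trimmed-down, made-up stand-ins whose condition fields (`type`, `status`) are assumed for illustration.

```python
def latest_condition(experiment):
    """Return the last status condition of an experiment dict, or None."""
    conditions = experiment.get("status", {}).get("conditions", [])
    return conditions[-1] if conditions else None


# Made-up sample data so the sketch runs without a cluster.
without_status = {
    "metadata": {"name": "cmaes-example"},
    "spec": {"maxTrialCount": 7},
}
with_status = {
    "metadata": {"name": "cmaes-example"},
    "spec": {"maxTrialCount": 7},
    "status": {
        "conditions": [
            {"type": "Created", "status": "True"},
            {"type": "Running", "status": "True"},
        ]
    },
}

print(latest_condition(without_status))
print(latest_condition(with_status)["type"])
```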

# Get all experiments

You can get the list of current experiments.

In [17]:
# Get the names of all experiments in the namespace
exp_list = kclient.get_experiment(namespace=namespace)

for exp in exp_list["items"]:
    print(exp["metadata"]["name"])

cmaes-example
from-volume-resume-cmaes
never-resume-cmaes


# Get current experiment status

You can check the current experiment status.

In [22]:
kclient.get_experiment_status(name=experiment_name, namespace=namespace)

'Succeeded'

You can check whether the experiment has succeeded.

In [23]:
kclient.is_experiment_succeeded(name=experiment_name, namespace=namespace)

True
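
If you run the notebook top to bottom, the experiment may still be running when you reach this cell. A minimal polling sketch, assuming you want to block until a check returns `True` or a timeout expires, is shown here. `wait_until` is a hypothetical helper, not an SDK function, and `fake_is_succeeded` stands in for the SDK call so the sketch runs without a cluster.

```python
import time


def wait_until(check, timeout_seconds=600.0, poll_seconds=10.0):
    """Call check() until it returns True or the timeout expires.

    Returns True if check() succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


# Stand-in for the SDK call: reports success on the third poll.
calls = {"count": 0}


def fake_is_succeeded():
    calls["count"] += 1
    return calls["count"] >= 3


finished = wait_until(fake_is_succeeded, timeout_seconds=5.0, poll_seconds=0.01)
print(finished, calls["count"])
```

In the notebook you could pass `lambda: kclient.is_experiment_succeeded(name=experiment_name, namespace=namespace)` as `check`. The timeout matters because an experiment that fails never reports success.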

# List of current trials

You can get the list of current trials with their latest status.

In [24]:
# List trials
kclient.list_trials(name=experiment_name, namespace=namespace)

[{'name': 'cmaes-example-7jm8qj6m', 'status': 'Succeeded'},
 {'name': 'cmaes-example-b6pbtrm8', 'status': 'Succeeded'},
 {'name': 'cmaes-example-c8f55mvb', 'status': 'Succeeded'},
 {'name': 'cmaes-example-d2l7mwnb', 'status': 'Succeeded'},
 {'name': 'cmaes-example-j9rlhrfc', 'status': 'Succeeded'},
 {'name': 'cmaes-example-mmlvn8sg', 'status': 'Succeeded'},
 {'name': 'cmaes-example-vhjxdbfx', 'status': 'Succeeded'}]
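
With more trials, a summary by status is easier to read than the full list. The sketch below counts statuses with `collections.Counter`, using the output above as sample data in place of a live `list_trials` call.

```python
from collections import Counter

# Sample data copied from the list_trials output above.
trials = [
    {"name": "cmaes-example-7jm8qj6m", "status": "Succeeded"},
    {"name": "cmaes-example-b6pbtrm8", "status": "Succeeded"},
    {"name": "cmaes-example-c8f55mvb", "status": "Succeeded"},
    {"name": "cmaes-example-d2l7mwnb", "status": "Succeeded"},
    {"name": "cmaes-example-j9rlhrfc", "status": "Succeeded"},
    {"name": "cmaes-example-mmlvn8sg", "status": "Succeeded"},
    {"name": "cmaes-example-vhjxdbfx", "status": "Succeeded"},
]

# Count how many trials are in each status.
status_counts = Counter(trial["status"] for trial in trials)
print(dict(status_counts))
```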

# Get optimal hyperparameters

You can get the current optimal trial from the experiment. For each metric you can see the max, min and latest value.

In [25]:
# Optimal HPs
kclient.get_optimal_hyperparameters(name=experiment_name, namespace=namespace)

{'currentOptimalTrial': {'bestTrialName': 'cmaes-example-vhjxdbfx',
  'observation': {'metrics': [{'latest': '0.980295',
     'max': '0.980295',
     'min': '0.963774',
     'name': 'Validation-accuracy'},
    {'latest': '0.990988',
     'max': '0.991654',
     'min': '0.925773',
     'name': 'Train-accuracy'}]},
  'parameterAssignments': [{'name': 'lr', 'value': '0.04511033252270099'},
   {'name': 'num-layers', 'value': '3'},
   {'name': 'optimizer', 'value': 'sgd'}]}}
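
The result is a nested dictionary and all values are strings. The sketch below flattens it into plain dictionaries and converts the values, using the output above as sample data. The conversion types are chosen by hand to match the search space defined earlier.

```python
# Sample data copied from the get_optimal_hyperparameters output above.
optimal = {
    "currentOptimalTrial": {
        "bestTrialName": "cmaes-example-vhjxdbfx",
        "observation": {
            "metrics": [
                {"latest": "0.980295", "max": "0.980295",
                 "min": "0.963774", "name": "Validation-accuracy"},
                {"latest": "0.990988", "max": "0.991654",
                 "min": "0.925773", "name": "Train-accuracy"},
            ]
        },
        "parameterAssignments": [
            {"name": "lr", "value": "0.04511033252270099"},
            {"name": "num-layers", "value": "3"},
            {"name": "optimizer", "value": "sgd"},
        ],
    }
}

best = optimal["currentOptimalTrial"]

# Parameter name -> raw string value
assignments = {p["name"]: p["value"] for p in best["parameterAssignments"]}

# Convert each value to the type declared in the search space.
typed = {
    "lr": float(assignments["lr"]),
    "num-layers": int(assignments["num-layers"]),
    "optimizer": assignments["optimizer"],
}

# Metric name -> best (max) observed value as a float
best_metrics = {
    m["name"]: float(m["max"]) for m in best["observation"]["metrics"]
}

print(best["bestTrialName"])
print(typed)
print(best_metrics)
```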

# Status for suggestion objects

You can check the suggestion object's status for more information about the resume status.

For the experiment with FromVolume, you should be able to check the created PVC and PV.

In [28]:
# Get never resume experiment's suggestion status
suggestion = kclient.get_suggestion(name=experiment_never_resume_name, namespace=namespace)

print(suggestion["status"]["conditions"][-1]["message"])
print("-----------------")

# Get from volume resume experiment's suggestion status
suggestion = kclient.get_suggestion(name=experiment_from_volume_resume_name, namespace=namespace)

print(suggestion["status"]["conditions"][-1]["message"])


Suggestion is succeeded, can't be restarted
-----------------
Suggestion is succeeded, suggestion volume is not deleted, can be restarted


# Delete experiments

You can delete the experiments.

In [29]:
kclient.delete_experiment(name=experiment_name, namespace=namespace)
kclient.delete_experiment(name=experiment_never_resume_name, namespace=namespace)
kclient.delete_experiment(name=experiment_from_volume_resume_name, namespace=namespace)

{'apiVersion': 'kubeflow.org/v1beta1',
 'kind': 'Experiment',
 'metadata': {'creationTimestamp': '2020-09-14T23:16:04Z',
  'deletionGracePeriodSeconds': 0,
  'deletionTimestamp': '2020-09-14T23:33:24Z',
  'finalizers': ['update-prometheus-metrics'],
  'generation': 2,
  'name': 'from-volume-resume-cmaes',
  'namespace': 'anonymous',
  'resourceVersion': '127110528',
  'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/anonymous/experiments/from-volume-resume-cmaes',
  'uid': '65173495-e76b-4136-88fa-a688790150cd'},
 'spec': {'algorithm': {'algorithmName': 'cmaes'},
  'maxFailedTrialCount': 3,
  'maxTrialCount': 4,
  'metricsCollectorSpec': {'collector': {'kind': 'StdOut'}},
  'objective': {'additionalMetricNames': ['Train-accuracy'],
   'goal': 0.99,
   'metricStrategies': [{'name': 'Validation-accuracy', 'value': 'max'},
    {'name': 'Train-accuracy', 'value': 'max'}],
   'objectiveMetricName': 'Validation-accuracy',
   'type': 'maximize'},
  'parallelTrialCount': 3,
  'parameters': [