# HyperParameter tuning using CMA-ES

In this example, you will deploy three Katib Experiments that use the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm, working from a Jupyter Notebook with the Katib SDK. The three Experiments differ in their resume policies.

Reference documentation:
- https://www.kubeflow.org/docs/components/katib/experiment/#cmaes
- https://www.kubeflow.org/docs/components/katib/resume-experiment/

The notebook shows how to create an Experiment, get it, check its status, and delete it.

## Install Katib SDK

You need to install the Katib SDK to run this Notebook.

In [None]:
# TODO (andreyvelich): Change to release version when SDK with the new APIs is published.
!pip install git+https://github.com/kubeflow/katib.git#subdirectory=sdk/python/v1beta1

## Import required packages

In [2]:
import copy

from kubeflow.katib import KatibClient
from kubernetes.client import V1ObjectMeta
from kubeflow.katib import V1beta1Experiment
from kubeflow.katib import V1beta1AlgorithmSpec
from kubeflow.katib import V1beta1ObjectiveSpec
from kubeflow.katib import V1beta1FeasibleSpace
from kubeflow.katib import V1beta1ExperimentSpec
from kubeflow.katib import V1beta1ParameterSpec
from kubeflow.katib import V1beta1TrialTemplate
from kubeflow.katib import V1beta1TrialParameterSpec

## Define your Experiment

You have to create your Experiment object before deploying it. This Experiment is similar to [this](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/hp-tuning/cma-es.yaml) example.

In [46]:
# Experiment name and namespace.
namespace = "kubeflow-user-example-com"
experiment_name = "cmaes-example"

metadata = V1ObjectMeta(
    name=experiment_name,
    namespace=namespace
)

# Algorithm specification.
algorithm_spec = V1beta1AlgorithmSpec(
    algorithm_name="cmaes"
)

# Objective specification.
objective_spec = V1beta1ObjectiveSpec(
    type="minimize",
    goal=0.001,
    objective_metric_name="loss",
)

# Experiment search space. In this example we tune the learning rate and the momentum.
parameters = [
    V1beta1ParameterSpec(
        name="lr",
        parameter_type="double",
        feasible_space=V1beta1FeasibleSpace(
            min="0.01",
            max="0.06"
        ),
    ),
    V1beta1ParameterSpec(
        name="momentum",
        parameter_type="double",
        feasible_space=V1beta1FeasibleSpace(
            min="0.5",
            max="0.9"
        ),
    ),
]

# JSON template specification for the Trial's Worker Kubernetes Job.
trial_spec = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    "sidecar.istio.io/inject": "false"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "training-container",
                        "image": "docker.io/kubeflowkatib/pytorch-mnist-cpu:v0.14.0",
                        "command": [
                            "python3",
                            "/opt/pytorch-mnist/mnist.py",
                            "--epochs=1",
                            "--batch-size=64",
                            "--lr=${trialParameters.learningRate}",
                            "--momentum=${trialParameters.momentum}",
                        ]
                    }
                ],
                "restartPolicy": "Never"
            }
        }
    }
}

# Configure parameters for the Trial template.
trial_template = V1beta1TrialTemplate(
    primary_container_name="training-container",
    trial_parameters=[
        V1beta1TrialParameterSpec(
            name="learningRate",
            description="Learning rate for the training model",
            reference="lr"
        ),
        V1beta1TrialParameterSpec(
            name="momentum",
            description="Momentum for the training model",
            reference="momentum"
        ),
    ],
    trial_spec=trial_spec
)


# Experiment object.
experiment = V1beta1Experiment(
    api_version="kubeflow.org/v1beta1",
    kind="Experiment",
    metadata=metadata,
    spec=V1beta1ExperimentSpec(
        max_trial_count=3,
        parallel_trial_count=2,
        max_failed_trial_count=1,
        algorithm=algorithm_spec,
        objective=objective_spec,
        parameters=parameters,
        trial_template=trial_template,
    )
)
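
The Trial template ties three sets of names together: the `${trialParameters.<name>}` placeholders in the container command, the `name` of each trial parameter, and the `reference` that points back to a search space parameter. A mismatch is easy to introduce and only shows up after submission. The following is a minimal sketch that checks the names on plain Python data before you create the Experiment. `check_trial_template` and `render_command` are hypothetical helpers written for this illustration, they are not part of the Katib SDK, and the substitution only mimics what happens to the command for one Trial; the values `0.03` and `0.7` are made up.

```python
import re

# Hypothetical helpers for illustration only; they are NOT part of the Katib SDK.
PLACEHOLDER = re.compile(r"\$\{trialParameters\.([A-Za-z0-9_-]+)\}")


def check_trial_template(command, trial_parameters, search_space_names):
    """Return a list of problems; an empty list means the names are consistent.

    command: list of container command strings
    trial_parameters: dict of trial parameter name -> referenced search space name
    search_space_names: names declared in the Experiment search space
    """
    problems = []
    names = set(search_space_names)
    used = set()
    for arg in command:
        used.update(PLACEHOLDER.findall(arg))
    # Every placeholder in the command needs a trial parameter with that name.
    for name in sorted(used - set(trial_parameters)):
        problems.append(f"placeholder '{name}' has no trial parameter")
    # Every trial parameter must reference a declared search space parameter.
    for name, ref in sorted(trial_parameters.items()):
        if ref not in names:
            problems.append(f"trial parameter '{name}' references unknown '{ref}'")
    return problems


def render_command(command, trial_parameters, assignments):
    """Replace placeholders with the values assigned to one Trial (illustration)."""
    def replace(match):
        return str(assignments[trial_parameters[match.group(1)]])
    return [PLACEHOLDER.sub(replace, arg) for arg in command]


command = [
    "python3", "/opt/pytorch-mnist/mnist.py", "--epochs=1", "--batch-size=64",
    "--lr=${trialParameters.learningRate}",
    "--momentum=${trialParameters.momentum}",
]
trial_parameters = {"learningRate": "lr", "momentum": "momentum"}

problems = check_trial_template(command, trial_parameters, ["lr", "momentum"])
# The assignment values below are invented for the demo.
rendered = render_command(command, trial_parameters, {"lr": "0.03", "momentum": "0.7"})
print(problems)
print(rendered[-2:])
```

For the template in this notebook the check reports no problems, and the last two command arguments become `--lr=0.03` and `--momentum=0.7`.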

## Define Experiments with resume policy

We will define two more Experiments: one with ResumePolicy = Never and one with ResumePolicy = FromVolume.

An Experiment with the _Never_ resume policy can't be resumed; its Suggestion resources are deleted when the Experiment completes.

An Experiment with the _FromVolume_ resume policy can be resumed. A volume is attached to the Suggestion, and a PVC is created for it.

In [47]:
experiment_never_resume_name = "never-resume-cmaes"
experiment_from_volume_resume_name = "from-volume-resume-cmaes"

# Create new Experiments from the previous Experiment info.
# Define Experiment with Never resume.
experiment_never_resume = copy.deepcopy(experiment)
experiment_never_resume.metadata.name = experiment_never_resume_name
experiment_never_resume.spec.resume_policy = "Never"
experiment_never_resume.spec.max_trial_count = 4

# Define Experiment with FromVolume resume.
experiment_from_volume_resume = copy.deepcopy(experiment)
experiment_from_volume_resume.metadata.name = experiment_from_volume_resume_name
experiment_from_volume_resume.spec.resume_policy = "FromVolume"
experiment_from_volume_resume.spec.max_trial_count = 4
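
The cell above uses `copy.deepcopy` rather than `copy.copy` because an Experiment object holds nested objects (`metadata`, `spec`). The sketch below shows the difference on stand-in objects built from `types.SimpleNamespace`; the stand-ins are an assumption made so the example runs without the SDK, and they only mirror the attribute names used in this notebook.

```python
import copy
from types import SimpleNamespace

# Stand-in for an Experiment object: an outer object holding nested objects.
base = SimpleNamespace(
    metadata=SimpleNamespace(name="cmaes-example"),
    spec=SimpleNamespace(resume_policy=None, max_trial_count=3),
)

shallow = copy.copy(base)    # new outer object, but metadata and spec are shared
deep = copy.deepcopy(base)   # nested objects are copied as well

# Changing the deep copy leaves the base object untouched.
deep.metadata.name = "never-resume-cmaes"
deep.spec.resume_policy = "Never"
print(base.metadata.name, base.spec.resume_policy)  # cmaes-example None

# Changing the shallow copy's spec also changes the base, since spec is shared.
shallow.spec.max_trial_count = 4
print(base.spec.max_trial_count)  # 4
```

With a shallow copy, renaming one Experiment or changing its resume policy would silently change the others too.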

You can print the Experiment's info to verify it before submission.

In [48]:
print(experiment.metadata.name)
print(experiment.spec.algorithm.algorithm_name)
print("-----------------")
print(experiment_never_resume.metadata.name)
print(experiment_never_resume.spec.resume_policy)
print("-----------------")
print(experiment_from_volume_resume.metadata.name)
print(experiment_from_volume_resume.spec.resume_policy)


cmaes-example
cmaes
-----------------
never-resume-cmaes
Never
-----------------
from-volume-resume-cmaes
FromVolume


## Create your Experiment

You have to create a Katib client to use the SDK.

In [49]:
# Create Katib client.
kclient = KatibClient()

# Create your Experiment.
kclient.create_experiment(experiment, namespace=namespace)

Experiment kubeflow-user-example-com/cmaes-example has been created


### Create other Experiments

In [50]:
# Create the Experiment with the Never resume policy.
kclient.create_experiment(experiment_never_resume, namespace=namespace)
# Create the Experiment with the FromVolume resume policy.
kclient.create_experiment(experiment_from_volume_resume, namespace=namespace)

Experiment kubeflow-user-example-com/never-resume-cmaes has been created


Experiment kubeflow-user-example-com/from-volume-resume-cmaes has been created


## Get your Experiment

You can get your Experiment by name and read the fields you need from it.

In [51]:
exp = kclient.get_experiment(name=experiment_name, namespace=namespace)
print(exp)
print("-----------------\n")

# Get the max trial count and latest status.
print(exp.spec.max_trial_count)
print(exp.status.conditions[-1])

{'api_version': 'kubeflow.org/v1beta1',
 'kind': 'Experiment',
 'metadata': {'annotations': None,
              'creation_timestamp': datetime.datetime(2023, 1, 6, 14, 28, 28, tzinfo=tzlocal()),
              'deletion_grace_period_seconds': None,
              'deletion_timestamp': None,
              'finalizers': ['update-prometheus-metrics'],
              'generate_name': None,
              'generation': 1,
              'labels': None,
              'managed_fields': [{'api_version': 'kubeflow.org/v1beta1',
                                  'fields_type': 'FieldsV1',
                                  'fields_v1': {'f:spec': {'.': {},
                                                           'f:algorithm': {'.': {},
                                                                           'f:algorithmName': {}},
                                                           'f:maxFailedTrialCount': {},
                                                           'f:maxTrialCount': {}

## Get all Experiments

You can get a list of the current Experiments.

In [52]:
# Print the names of the Experiments in the namespace.
exp_list = kclient.list_experiments(namespace=namespace)

for exp in exp_list:
    print(exp.metadata.name)

cmaes-example
from-volume-resume-cmaes
never-resume-cmaes


## Get the current Experiment conditions

You can inspect the current Experiment conditions and check whether the Experiment has succeeded.

In [53]:
kclient.get_experiment_conditions(name=experiment_name, namespace=namespace)


[{'last_transition_time': datetime.datetime(2023, 1, 6, 14, 28, 28, tzinfo=tzlocal()),
  'last_update_time': datetime.datetime(2023, 1, 6, 14, 28, 28, tzinfo=tzlocal()),
  'message': 'Experiment is created',
  'reason': 'ExperimentCreated',
  'status': 'True',
  'type': 'Created'},
 {'last_transition_time': datetime.datetime(2023, 1, 6, 14, 28, 52, tzinfo=tzlocal()),
  'last_update_time': datetime.datetime(2023, 1, 6, 14, 28, 52, tzinfo=tzlocal()),
  'message': 'Experiment is running',
  'reason': 'ExperimentRunning',
  'status': 'True',
  'type': 'Running'}]

In [54]:
kclient.is_experiment_succeeded(name=experiment_name, namespace=namespace)

False
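
The check above returns `False` while the Experiment is still running, so in practice you repeat it until it succeeds or you give up. A minimal polling sketch follows. `wait_until` is a hypothetical helper, not an SDK function, and the demo uses a fake check so it runs without a cluster.

```python
import time


def wait_until(check, timeout_s=600.0, interval_s=10.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout_s has elapsed.

    Returns True if check() succeeded, False on timeout.
    check() is always called at least once.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Demo: a fake check that succeeds on the third call.
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


result = wait_until(fake_check, timeout_s=5.0, interval_s=0.0)
print(result, calls["n"])  # True 3
```

In this notebook the check could be `lambda: kclient.is_experiment_succeeded(name=experiment_name, namespace=namespace)`. The default timeout and interval are arbitrary choices; pick values that match how long your Trials run.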

## List of the current Trials

You can get a list of the current Trials with their latest status.

In [55]:
# Trial list.
trial_list = kclient.list_trials(experiment_name=experiment_name, namespace=namespace)
for trial in trial_list:
    print(f"Trial Name: {trial.metadata.name}")
    print(f"Trial Status: {trial.status.conditions[-1]}\n")

Trial Name: cmaes-example-dd4x6tsh
Trial Status: {'last_transition_time': datetime.datetime(2023, 1, 6, 14, 30, 43, tzinfo=tzlocal()),
 'last_update_time': datetime.datetime(2023, 1, 6, 14, 30, 43, tzinfo=tzlocal()),
 'message': 'Trial is running',
 'reason': 'TrialRunning',
 'status': 'True',
 'type': 'Running'}

Trial Name: cmaes-example-f64n8vb5
Trial Status: {'last_transition_time': datetime.datetime(2023, 1, 6, 14, 30, 43, tzinfo=tzlocal()),
 'last_update_time': datetime.datetime(2023, 1, 6, 14, 30, 43, tzinfo=tzlocal()),
 'message': 'Trial has succeeded',
 'reason': 'TrialSucceeded',
 'status': 'True',
 'type': 'Succeeded'}

Trial Name: cmaes-example-l6zkx5jx
Trial Status: {'last_transition_time': datetime.datetime(2023, 1, 6, 14, 30, 45, tzinfo=tzlocal()),
 'last_update_time': datetime.datetime(2023, 1, 6, 14, 30, 45, tzinfo=tzlocal()),
 'message': 'Trial has succeeded',
 'reason': 'TrialSucceeded',
 'status': 'True',
 'type': 'Succeeded'}
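
When an Experiment has many Trials, a count per latest condition type is easier to read than the full condition objects. The sketch below is one way to build that summary. `latest_condition_types` and `make_trial` are hypothetical helpers, the sample objects are `SimpleNamespace` stand-ins that mirror only the attributes used in the cell above, and the earlier conditions (`Created`, `Running`) as well as the Trial without conditions are assumptions added for the demo.

```python
from collections import Counter
from types import SimpleNamespace


def latest_condition_types(trials):
    """Count Trials by the type of their most recent condition.

    Trials whose condition list is empty or missing are counted as "Unknown"
    instead of raising an IndexError on conditions[-1].
    """
    counts = Counter()
    for trial in trials:
        conditions = getattr(trial.status, "conditions", None) or []
        counts[conditions[-1].type if conditions else "Unknown"] += 1
    return counts


def make_trial(name, *types):
    """Build a stand-in Trial object with the given condition types, oldest first."""
    conditions = [SimpleNamespace(type=t) for t in types]
    return SimpleNamespace(
        metadata=SimpleNamespace(name=name),
        status=SimpleNamespace(conditions=conditions),
    )


sample = [
    make_trial("cmaes-example-dd4x6tsh", "Created", "Running"),
    make_trial("cmaes-example-f64n8vb5", "Created", "Running", "Succeeded"),
    make_trial("cmaes-example-l6zkx5jx", "Created", "Running", "Succeeded"),
    make_trial("hypothetical-trial-without-conditions"),
]

summary = latest_condition_types(sample)
print(dict(summary))
```

The same function should work on the `trial_list` from the cell above, provided each item exposes `status.conditions` as shown there.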


## Get the optimal HyperParameters

You can get the current optimal Trial from your Experiment. For each metric you can see the max, min and latest values.

In [56]:
# Optimal HPs.
kclient.get_optimal_hyperparameters(name=experiment_name, namespace=namespace)

{'best_trial_name': 'cmaes-example-l6zkx5jx',
 'observation': {'metrics': [{'latest': '0.955613',
                              'max': '0.955613',
                              'min': '0.955613',
                              'name': 'Validation-accuracy'},
                             {'latest': '0.922775',
                              'max': '0.922775',
                              'min': '0.922775',
                              'name': 'Train-accuracy'}]},
 'parameter_assignments': [{'name': 'lr', 'value': '0.04511033252270099'},
                           {'name': 'num-layers', 'value': '3'},
                           {'name': 'optimizer', 'value': 'sgd'}]}
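
The result nests the parameter assignments and metrics in lists of name/value records, with every value stored as a string. The sketch below flattens that structure into plain dictionaries. `flatten_optimal` is a hypothetical helper that works on the dictionary form shown in the output above; how you obtain that dictionary from the SDK object (for example through a `to_dict()` method) is an assumption to verify against your SDK version. The sample data is copied from the output above.

```python
def flatten_optimal(optimal):
    """Turn the dict form of an optimal Trial into simple name -> value dicts.

    Parameter values stay strings because some are categorical (e.g. 'sgd').
    Metric values are converted to float, assuming they are numeric strings.
    """
    params = {
        p["name"]: p["value"]
        for p in optimal.get("parameter_assignments") or []
    }
    metrics = {
        m["name"]: {key: float(m[key]) for key in ("latest", "max", "min")}
        for m in (optimal.get("observation") or {}).get("metrics") or []
    }
    return params, metrics


# Sample data copied from the cell output above.
optimal = {
    "best_trial_name": "cmaes-example-l6zkx5jx",
    "observation": {"metrics": [
        {"latest": "0.955613", "max": "0.955613", "min": "0.955613",
         "name": "Validation-accuracy"},
        {"latest": "0.922775", "max": "0.922775", "min": "0.922775",
         "name": "Train-accuracy"},
    ]},
    "parameter_assignments": [
        {"name": "lr", "value": "0.04511033252270099"},
        {"name": "num-layers", "value": "3"},
        {"name": "optimizer", "value": "sgd"},
    ],
}

params, metrics = flatten_optimal(optimal)
print(params["lr"])
print(metrics["Validation-accuracy"]["max"])
```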

## Status for the Suggestion objects

Once an Experiment has succeeded, you can check the status of its Suggestion object for more information about the resume status.

For the Experiment with the FromVolume resume policy, you should also be able to see the PVC that was created.

In [59]:
# Get the current Suggestion status for the never resume Experiment.
suggestion = kclient.get_suggestion(name=experiment_never_resume_name, namespace=namespace)

print(suggestion.status.conditions[-1].message)
print("-----------------")

# Get the current Suggestion status for the from volume Experiment.
suggestion = kclient.get_suggestion(name=experiment_from_volume_resume_name, namespace=namespace)

print(suggestion.status.conditions[-1].message)

Suggestion is succeeded, can't be restarted
-----------------
Suggestion is succeeded, suggestion volume is not deleted, can be restarted


## Delete your Experiments

You can delete your Experiments.

In [61]:
kclient.delete_experiment(name=experiment_name, namespace=namespace)
kclient.delete_experiment(name=experiment_never_resume_name, namespace=namespace)
kclient.delete_experiment(name=experiment_from_volume_resume_name, namespace=namespace)

Experiment kubeflow-user-example-com/cmaes-example has been deleted
Experiment kubeflow-user-example-com/never-resume-cmaes has been deleted
Experiment kubeflow-user-example-com/from-volume-resume-cmaes has been deleted
