# YAML-Python Katib Example Submission

The Katib SDK exposes the power of Python programming to Katib users; however, the code structure is significantly more complex than in the case of direct YAML submission using kubectl.

In contrast, direct Katib Experiment submission via YAML offers an easy-to-read (and to maintain) structure, but it is very basic, and it lacks advantages that the SDK route provides.

You can combine the strenghts of both approaches. This is accomplished in the following steps:
1. Use a YAML file for simplicity and clarity in defining the basic experiment (like a config or template).
2. Convert the YAML to a Python dictionary that is compatible with the SDK.
3. Make any runtime changes.
4. Submit the experiment.

The following steps run through a simple example.

### 0. Required Packages

Before you begin, it is assumed you have access to Katib on a Kubernetes cluster. You also need to have the `kubeflow-katib` library installed, which you can do by uncommenting and running the command below.

In [None]:
# install kubeflow-katib if needed
# !pip install kubeflow-katib==0.14.0

In [1]:
import yaml
import requests
import time
import datetime as dt
import kubeflow.katib as katib

### 1. Katib Experiment in YAML

You need an experiment to do a test run with. Use the `random.yaml` experiment from the examples in the Katib GitHub repository. [here](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/hp-tuning/random.yaml).

In [2]:
# download the random.yaml Katib Experiment file
url = "https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1beta1/hp-tuning/random.yaml"
random_yaml = requests.get(url)

In [3]:
# inspect first lines
print(random_yaml.text[:200])

---
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: kubeflow
  name: random
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: Validation-accuracy
 


### 2. Load the Experiment into Python

Now, turn the YAML text into a Python dictionary.

In [4]:
experiment = yaml.safe_load(random_yaml.text)

In [5]:
experiment

{'apiVersion': 'kubeflow.org/v1beta1',
 'kind': 'Experiment',
 'metadata': {'namespace': 'kubeflow', 'name': 'random'},
 'spec': {'objective': {'type': 'maximize',
   'goal': 0.99,
   'objectiveMetricName': 'Validation-accuracy',
   'additionalMetricNames': ['Train-accuracy']},
  'algorithm': {'algorithmName': 'random'},
  'parallelTrialCount': 3,
  'maxTrialCount': 12,
  'maxFailedTrialCount': 3,
  'parameters': [{'name': 'lr',
    'parameterType': 'double',
    'feasibleSpace': {'min': '0.01', 'max': '0.03'}},
   {'name': 'num-layers',
    'parameterType': 'int',
    'feasibleSpace': {'min': '2', 'max': '5'}},
   {'name': 'optimizer',
    'parameterType': 'categorical',
    'feasibleSpace': {'list': ['sgd', 'adam', 'ftrl']}}],
  'trialTemplate': {'primaryContainerName': 'training-container',
   'trialParameters': [{'name': 'learningRate',
     'description': 'Learning rate for the training model',
     'reference': 'lr'},
    {'name': 'numberLayers',
     'description': 'Number of tr

### 3. Make any needed Edits

It's common for users (or apps) to need to change parameters at runtime. The following code appends the current timestamp to the Experiment name, and then it updates the `maxTrialCount` and `parallelTrialCount` values.

The code also sets the namespace. If the namespace `kubeflow-user-example-com` doesn't exist on your cluster, you can create it prior to continuing, or you can update the line below with your own namespace. Make sure the namespace you use has the label `katib.kubeflow.org/metrics-collector-injection: enabled`.

In [6]:
# uncomment and run commands if needed to create/label namespace
# !kubectl create namespace kubeflow-user-example-com
# !kubectl label namespace kubeflow-user-example-com katib.kubeflow.org/metrics-collector-injection=enabled

In [7]:
ns = "kubeflow-user-example-com"  # change namespace if needed

dtime = dt.datetime.now().strftime("%Y-%m-%d-%M%H%S")
exp_name = f"{experiment['metadata']['name']}-{dtime}"
experiment["metadata"]["name"] = exp_name
experiment["metadata"]["namespace"] = ns
experiment["metadata"]["labels"] = {"katib.kubeflow.org/metrics-collector-injection": "enabled"}
experiment["spec"]["maxTrialCount"] = 10
experiment["spec"]["parallelTrialCount"] = 2

### 4. Submit Katib Experiment

Submit the updated experiment to Katib.

In [8]:
client = katib.KatibClient()
ns = "kubeflow-user-example-com"  # change namespace if needed
result = client.create_experiment(experiment, namespace=ns)

The following code monitors the progress of the experiment until it either succeeds or fails, or the timeout is reached. You may also wish to visit the Katib UI to check on the experiment there.

In [9]:
timeout = 30*60  # 30 minutes in seconds
status = None
prev_time = dt.datetime.now()
last_msg = ""
while status not in ["Succeeded", "Failed"] and timeout > 0:
    try:
        status = client.get_experiment_status(exp_name, namespace=ns)
    except IndexError:
        status = None
    exp = client.get_experiment(exp_name, namespace=ns)
    trials_success = exp.get("status", {}).get("trialsSucceeded")
    msg = f"Experiment status: {status}  Trials Succeeded: {trials_success}"
    tstamp = dt.datetime.now()
    if msg != last_msg:
        print(f"{tstamp.strftime('%H:%M:%S')} {msg}")
        last_msg = msg
    if status in ["Succeeded", "Failed"]: break
    time.sleep(10)
    cur_time = dt.datetime.now()
    timeout -= (tstamp - prev_time).total_seconds()
    prev_time = tstamp

12:59:32 Experiment status: Created  Trials Succeeded: None
12:59:52 Experiment status: Running  Trials Succeeded: None
13:02:52 Experiment status: Running  Trials Succeeded: 1
13:03:42 Experiment status: Running  Trials Succeeded: 2
13:04:32 Experiment status: Running  Trials Succeeded: 4
13:05:13 Experiment status: Running  Trials Succeeded: 5
13:05:23 Experiment status: Running  Trials Succeeded: 6
13:06:03 Experiment status: Running  Trials Succeeded: 8
13:07:53 Experiment status: Running  Trials Succeeded: 9
13:08:03 Experiment status: Succeeded  Trials Succeeded: 10


Finally, we can check that the experiment succeeded, and we can get the optimal hyperparameter results.

In [10]:
client.is_experiment_succeeded(exp_name, ns)

True

In [12]:
client.get_optimal_hyperparameters(exp_name, namespace=ns)

{'currentOptimalTrial': {'bestTrialName': 'random-2022-11-10-591205-bvk9hdk8',
  'observation': {'metrics': [{'latest': '0.979896',
     'max': '0.979896',
     'min': '0.958499',
     'name': 'Validation-accuracy'},
    {'latest': '0.993737',
     'max': '0.993737',
     'min': '0.918094',
     'name': 'Train-accuracy'}]},
  'parameterAssignments': [{'name': 'lr', 'value': '0.024085346582757364'},
   {'name': 'num-layers', 'value': '3'},
   {'name': 'optimizer', 'value': 'sgd'}]}}