# How to run a Katib experiment in a pipeline

Katib is a Kubernetes-native framework for hyperparameter tuning that also works with Red Hat OpenShift. \
In this example, we train an NLP model that uses BERT to perform sentiment analysis on a dataset of plain-text IMDB movie reviews.

Katib runs from a YAML file that contains:
- The parameters of the model
- The number of trials in total
- The number of parallel trials
- The number of failed trials allowed
- An objective function
- A search algorithm
- A metrics collector spec that indicates how metrics are collected. The default collector is StdOut, but in this example we collect metrics from a file. The default format is "metric_name=value".
- A trial template that contains the information needed to run each trial: the container spec with the image of the model to tune, the command to launch, and the parameters to tune. In this example, GPU usage is also set in the "resources" section of the trial spec.
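
As an illustration of the file-based collection described in the list above, the sketch below shows how a training script might append metrics in the "metric_name=value" format. The helper names `write_metrics` and `read_metrics` are hypothetical, and the reader only mimics the idea of parsing such lines; it is not Katib's actual collector.

```python
import os
import tempfile


def write_metrics(path, metrics):
    """Append one `name=value` line per metric to the metrics file."""
    with open(path, "a") as f:
        for name, value in metrics.items():
            f.write(f"{name}={value}\n")


def read_metrics(path):
    """Parse `name=value` lines back into (name, float) pairs, in file order."""
    observations = []
    with open(path) as f:
        for line in f:
            name, sep, value = line.strip().partition("=")
            if sep:  # skip lines that are not in name=value form
                observations.append((name, float(value)))
    return observations


# Hypothetical usage: a temporary file stands in for /tmp/output.txt.
metrics_path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_metrics(metrics_path, {"loss": 0.42})  # e.g. after epoch 1
write_metrics(metrics_path, {"loss": 0.31})  # e.g. after epoch 2
print(read_metrics(metrics_path))
```

In the real training image, the script would write to the path configured in the metrics collector spec, and the metric name would match the objective metric name.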

The final YAML looks like the **example.yaml** file. However, to run Katib from Kubeflow Pipelines in a Jupyter notebook, you must build the same specification in Python with the Kubeflow Katib SDK. The SDK targets the Katib v1beta1 API and can be installed using <code>pip install kubeflow-katib</code>. \
The SDK provides a class for each of the YAML items listed above. Its documentation is in the following GitHub repo: https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1/docs
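
To make the correspondence with the list above concrete, here is a rough sketch of the experiment written as a plain Python dictionary, using the values from this notebook. The camelCase field names reflect the Katib v1beta1 experiment format as we understand it and should be treated as an assumption; **example.yaml** remains the reference. The trial spec itself is omitted for brevity.

```python
import json

# Sketch only: field names are assumed to match the Katib v1beta1 format.
experiment_spec_sketch = {
    "maxTrialCount": 12,          # number of trials in total
    "parallelTrialCount": 3,      # number of parallel trials
    "maxFailedTrialCount": 3,     # number of failed trials allowed
    "objective": {
        "type": "minimize",
        "goal": 2,
        "objectiveMetricName": "loss",
    },
    "algorithm": {"algorithmName": "random"},
    "parameters": [
        {
            "name": "epochs",
            "parameterType": "int",
            "feasibleSpace": {"min": "1", "max": "2"},
        },
        {
            "name": "initlr",
            "parameterType": "double",
            "feasibleSpace": {"min": "0.05", "max": "0.5"},
        },
    ],
    "metricsCollectorSpec": {
        "collector": {"kind": "File"},
        "source": {
            "fileSystemPath": {"kind": "File", "path": "/tmp/output.txt"}
        },
    },
    "trialTemplate": {
        "primaryContainerName": "training-container",
        "trialParameters": [
            {"name": "Epochs", "reference": "epochs"},
            {"name": "LearningRate", "reference": "initlr"},
        ],
        # "trialSpec" (the Kubernetes Job) is left out of this sketch.
    },
}

print(json.dumps(experiment_spec_sketch, indent=2))
```

The SDK classes used in the next section produce a structure of this kind when they are serialized.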

Finally, the experiment component is created from the Kubeflow Katib launcher component. The launcher has been rebuilt for IBM Power Systems and must be loaded from the file **katib-pipelines-launcher.yaml** in this folder.

More information is available in the official documentation: https://www.kubeflow.org/docs/components/katib/experiment/

## Running the experiment

In [1]:
import kfp
import kfp.dsl as dsl
from kfp import components

from kubeflow.katib import ApiClient
from kubeflow.katib import V1beta1ExperimentSpec
from kubeflow.katib import V1beta1AlgorithmSpec
from kubeflow.katib import V1beta1ObjectiveSpec
from kubeflow.katib import V1beta1ParameterSpec
from kubeflow.katib import V1beta1FeasibleSpace
from kubeflow.katib import V1beta1TrialTemplate
from kubeflow.katib import V1beta1TrialParameterSpec
from kubeflow.katib import V1beta1MetricsCollectorSpec
from kubeflow.katib import V1beta1CollectorSpec
from kubeflow.katib import V1beta1SourceSpec
from kubeflow.katib import V1beta1FileSystemPath

experiment_name = "katib-e2e"

In [217]:
def launch_katib_experiment(experiment_name, experiment_namespace):
    """Build the Katib experiment spec and return the launcher pipeline task."""
    # Trial count specification.
    max_trial_count = 12
    max_failed_trial_count = 3
    parallel_trial_count = 3

    # Objective specification: minimize the "loss" metric.
    objective = V1beta1ObjectiveSpec(
        type="minimize",
        goal=2,
        objective_metric_name="loss"
    )

    # Algorithm specification.
    algorithm = V1beta1AlgorithmSpec(
        algorithm_name="random",
    )

    # Search space: the hyperparameters to tune and their feasible ranges.
    parameters = [
        V1beta1ParameterSpec(
            name="epochs",
            parameter_type="int",
            feasible_space=V1beta1FeasibleSpace(
                min="1",
                max="2"
            ),
        ),
        V1beta1ParameterSpec(
            name="initlr",
            parameter_type="double",
            feasible_space=V1beta1FeasibleSpace(
                min="0.05",
                max="0.5"
            ),
        )
    ]
    
    # Metrics collector: read metrics from a file written by the training script.
    metrics_collector_spec = V1beta1MetricsCollectorSpec(
        collector=V1beta1CollectorSpec(kind="File"),
        source=V1beta1SourceSpec(
            file_system_path=V1beta1FileSystemPath(
                kind="File",
                path="/tmp/output.txt"
            )
        )
    )

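    # Each trial runs as a Kubernetes Job; the ${trialParameters.*} placeholders are filled in per trial.
    trial_spec = {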
        "apiVersion": "batch/v1",
        "kind": "Job",
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "sidecar.istio.io/inject": "false"
                    }
                },
                "spec": {
                    "containers": [
                        {
                            "name": "training-container",
                            "image": "quay.io/jeremie_ch/bert_model:latest",
                            "resources": {
                                "limits": {
                                    "nvidia.com/gpu": 1
                                }
                            },
                            "command": [
                                "python",
                                "/opt/bert_model.py",
                                "--epochs=${trialParameters.Epochs}",
                                "--initlr=${trialParameters.LearningRate}"
                            ]
                        }
                    ],
                    "restartPolicy": "Never"
                }
            }
        }
    }

    # Trial template: map each trial parameter to the search parameter it references.
    trial_template = V1beta1TrialTemplate(
        primary_container_name="training-container",
        trial_parameters=[
            V1beta1TrialParameterSpec(
                name="Epochs",
                description="Number of epochs",
                reference="epochs"
            ),
            V1beta1TrialParameterSpec(
                name="LearningRate",
                description="Initial learning rate",
                reference="initlr"
            ),
        ],
        trial_spec=trial_spec
    )

    experiment_spec = V1beta1ExperimentSpec(
        max_trial_count=max_trial_count,
        max_failed_trial_count=max_failed_trial_count,
        parallel_trial_count=parallel_trial_count,
        objective=objective,
        algorithm=algorithm,
        parameters=parameters,
        trial_template=trial_template,
        metrics_collector_spec=metrics_collector_spec
    )

    # Load the launcher component (rebuilt for IBM Power Systems) and pass it the serialized spec.
    katib_experiment_launcher_op = components.load_component_from_file("katib-pipelines-launcher.yaml")
    op = katib_experiment_launcher_op(
        experiment_name=experiment_name,
        experiment_namespace=experiment_namespace,
        experiment_spec=ApiClient().sanitize_for_serialization(experiment_spec),
        experiment_timeout_minutes=60,
        delete_finished_experiment=False)

    return op

The launcher component returns the experiment results as a JSON string, which the next step parses and prints.
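
The trial template in the function above links three names: a search parameter (`epochs`), a trial parameter that references it (`Epochs`), and a `${trialParameters.Epochs}` placeholder in the command. The sketch below mimics that substitution in plain Python to show how one suggested assignment becomes a concrete command. `render_command` is a hypothetical helper and the suggested values are made up; Katib performs the real substitution itself.

```python
import re

# Trial parameter name -> name of the search parameter it references.
trial_parameters = {"Epochs": "epochs", "LearningRate": "initlr"}

command = [
    "python",
    "/opt/bert_model.py",
    "--epochs=${trialParameters.Epochs}",
    "--initlr=${trialParameters.LearningRate}",
]

PLACEHOLDER = re.compile(r"\$\{trialParameters\.(\w+)\}")


def render_command(command, trial_parameters, suggestion):
    """Replace each placeholder with the suggested value of the referenced parameter."""
    def substitute(match):
        reference = trial_parameters[match.group(1)]
        return str(suggestion[reference])

    return [PLACEHOLDER.sub(substitute, arg) for arg in command]


# Made-up assignment, as the search algorithm might suggest for one trial.
suggestion = {"epochs": 2, "initlr": 0.1}
rendered = render_command(command, trial_parameters, suggestion)
print(rendered)
```

A placeholder whose name is missing from the trial parameters raises a `KeyError` in this sketch, which mirrors why the names in the command and in the trial template need to match.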

## Print optimal results

In [218]:
def print_katib_results(katib_results):
    import json
    import pprint
    katib_results_json = json.loads(katib_results)
    print("Katib results:")
    pprint.pprint(katib_results_json)

results_op = components.func_to_container_op(print_katib_results, base_image="quay.io/jeremie_ch/katib_results:latest")

## Running the pipeline

In [219]:
name = "katib-e2e"
namespace = "jeremie-chheang-ibm-com"  # Replace with your own Kubeflow namespace.

@dsl.pipeline(
    name="End to End Pipeline",
    description="Tunes a BERT sentiment-analysis model with Katib and prints the results"
)
def katib_pipeline(name=name, namespace=namespace):
    # Run the hyperparameter tuning with Katib.
    katib_op = launch_katib_experiment(name, namespace)
    # Print the JSON results returned by the launcher.
    results_task = results_op(katib_op.output)


kfp_client = kfp.Client()
run_id = kfp_client.create_run_from_pipeline_func(katib_pipeline, namespace=namespace, arguments={}).run_id
print("Run ID: ", run_id)

Run ID:  8486c74e-a2ef-4847-a095-db6d5411d0be
