# Kubeflow Pipelines with Katib component

In this notebook you will:
- Create a Katib Experiment that uses the random search algorithm.
- Use the median stopping rule as the early stopping algorithm.
- Use a Kubernetes Job with the MXNet MNIST training container as the Trial template.
- Create a Pipeline that returns the optimal hyperparameters.

Reference documentation:
- https://kubeflow.org/docs/components/katib/experiment/#random-search
- https://kubeflow.org/docs/components/katib/early-stopping/
- https://kubeflow.org/docs/pipelines/overview/concepts/component/

# Install required packages

Install the Kubeflow Pipelines SDK, the Kubeflow Katib SDK, the KFP-Tekton SDK, and the Kubernetes Python client.

In [None]:
# Update the PIP version.
!python -m pip install --upgrade pip
!pip install kfp==1.4.0
!pip install kubeflow-katib==0.11.0
!pip install kfp-tekton==0.7.0
!pip install kubernetes==12.0.0

## Restart the Notebook kernel to use the SDK packages

In [None]:
from IPython.display import display_html
display_html("<script>Jupyter.notebook.kernel.restart()</script>", raw=True)

## Import required packages

In [None]:
import kfp
import kfp.dsl as dsl
from kfp import components

from kubeflow.katib import ApiClient
from kubeflow.katib import V1beta1ExperimentSpec
from kubeflow.katib import V1beta1AlgorithmSpec
from kubeflow.katib import V1beta1EarlyStoppingSpec
from kubeflow.katib import V1beta1EarlyStoppingSetting
from kubeflow.katib import V1beta1ObjectiveSpec
from kubeflow.katib import V1beta1ParameterSpec
from kubeflow.katib import V1beta1FeasibleSpace
from kubeflow.katib import V1beta1TrialTemplate
from kubeflow.katib import V1beta1TrialParameterSpec

## Define an Experiment

You have to create an Experiment object before deploying it. This Experiment is similar to the [median-stop.yaml example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/early-stopping/median-stop.yaml).

In [None]:
# Experiment name and namespace.
experiment_name = "median-stop"
# For a multi-user deployment, specify your own namespace instead of "anonymous".
experiment_namespace = "anonymous"

# Trial count specification.
max_trial_count = 18
max_failed_trial_count = 3
parallel_trial_count = 2

# Objective specification.
objective=V1beta1ObjectiveSpec(
    type="maximize",
    goal=0.99,
    objective_metric_name="Validation-accuracy",
    additional_metric_names=[
        "Train-accuracy"
    ]
)

# Algorithm specification.
algorithm=V1beta1AlgorithmSpec(
    algorithm_name="random",
)

# Early Stopping specification.
early_stopping=V1beta1EarlyStoppingSpec(
    algorithm_name="medianstop",
    algorithm_settings=[
        V1beta1EarlyStoppingSetting(
            name="min_trials_required",
            value="2"
        )
    ]
)


# Experiment search space.
# In this example we tune the learning rate, the number of layers, and the optimizer.
# The learning rate's feasible space is deliberately poor so that more Trials are early stopped.
parameters=[
    V1beta1ParameterSpec(
        name="lr",
        parameter_type="double",
        feasible_space=V1beta1FeasibleSpace(
            min="0.01",
            max="0.3"
        ),
    ),
    V1beta1ParameterSpec(
        name="num-layers",
        parameter_type="int",
        feasible_space=V1beta1FeasibleSpace(
            min="2",
            max="5"
        ),
    ),
    V1beta1ParameterSpec(
        name="optimizer",
        parameter_type="categorical",
        feasible_space=V1beta1FeasibleSpace(
            list=[
                "sgd", 
                "adam",
                "ftrl"
            ]
        ),
    ),
]
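
As a rough illustration of what the `medianstop` setting above asks for: the median stopping rule stops a running Trial when its best objective value so far is worse than the median of the running averages of completed Trials at the same step. The sketch below is not Katib's implementation. The function names and sample histories are hypothetical, and it assumes a maximized objective, as in this Experiment.

```python
from statistics import median


def running_average(values):
    """Mean of the objective values reported so far."""
    return sum(values) / len(values)


def should_stop(trial_history, completed_histories, min_trials_required=2):
    """Illustrative median stopping rule for a maximized objective.

    trial_history: objective values reported so far by the running Trial.
    completed_histories: per-step objective values of finished Trials.
    """
    step = len(trial_history)
    if step == 0:
        return False
    # Running average of each completed Trial up to the current step.
    averages = [running_average(h[:step]) for h in completed_histories if h]
    # Too few completed Trials: no reliable median yet, so keep running.
    if len(averages) < min_trials_required:
        return False
    return max(trial_history) < median(averages)


# Hypothetical validation accuracies of two completed Trials.
completed = [[0.90, 0.95, 0.97], [0.80, 0.85, 0.90]]
stop_poor_trial = should_stop([0.10, 0.11], completed)
stop_good_trial = should_stop([0.95, 0.96], completed)
```

With `min_trials_required` set to `2`, as in the Experiment above, nothing is stopped until at least two Trials have completed.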


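To make the search space above more concrete, here is a minimal sketch of how a random search could draw one assignment per Trial from it. It uses only the Python standard library. The `search_space` layout and `sample_assignment` helper are hypothetical, and it assumes that both bounds of each range are inclusive. Katib's own suggestion service may work differently.

```python
import random

# The same search space as above, written as plain Python data.
search_space = {
    "lr": ("double", 0.01, 0.3),
    "num-layers": ("int", 2, 5),
    "optimizer": ("categorical", ["sgd", "adam", "ftrl"]),
}


def sample_assignment(space, rng):
    """Draw one value per parameter, independently and uniformly."""
    assignment = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "double":
            assignment[name] = rng.uniform(spec[1], spec[2])
        elif kind == "int":
            # randint includes both end points.
            assignment[name] = rng.randint(spec[1], spec[2])
        elif kind == "categorical":
            assignment[name] = rng.choice(spec[1])
        else:
            raise ValueError("unknown parameter type: %s" % kind)
    return assignment


# One suggestion for each of the two Trials that run in parallel.
rng = random.Random(0)
suggestions = [sample_assignment(search_space, rng) for _ in range(2)]
```
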
## Define a Trial template

In this example, the Trial's Worker is a Kubernetes Job.

In [None]:
# JSON template specification for the Trial's Worker Kubernetes Job.
trial_spec={
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                     "sidecar.istio.io/inject": "false"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "training-container",
                        "image": "docker.io/kubeflowkatib/mxnet-mnist:v1beta1-45c5727",
                        "command": [
                            "python3",
                            "/opt/mxnet-mnist/mnist.py",
                            "--batch-size=64",
                            "--lr=${trialParameters.learningRate}",
                            "--num-layers=${trialParameters.numberLayers}",
                            "--optimizer=${trialParameters.optimizer}"
                        ]
                    }
                ],
                "restartPolicy": "Never"
            }
        }
    }
}

# Configure parameters for the Trial template.
# Setting retain=True keeps the Trial Job's Kubernetes Pods after the Trial finishes instead of cleaning them up.
trial_template=V1beta1TrialTemplate(
    retain=True,
    primary_container_name="training-container",
    trial_parameters=[
        V1beta1TrialParameterSpec(
            name="learningRate",
            description="Learning rate for the training model",
            reference="lr"
        ),
        V1beta1TrialParameterSpec(
            name="numberLayers",
            description="Number of training model layers",
            reference="num-layers"
        ),
        V1beta1TrialParameterSpec(
            name="optimizer",
            description="Training model optimizer (sgd, adam or ftrl)",
            reference="optimizer"
        ),
    ],
    trial_spec=trial_spec
)
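
The Trial template above links each `${trialParameters.<name>}` placeholder to a search-space parameter through its `reference`. The following sketch shows that substitution with plain string replacement. It is only an approximation of what the Katib controller does. The `render_trial_spec` helper and the assignment values are hypothetical.

```python
import json


def render_trial_spec(trial_spec, trial_parameters, assignments):
    """Replace ${trialParameters.<name>} placeholders with suggested values.

    trial_parameters maps a placeholder name to its search-space reference.
    assignments maps a search-space parameter name to the suggested value.
    """
    rendered = json.dumps(trial_spec)
    for name, reference in trial_parameters.items():
        placeholder = "${trialParameters.%s}" % name
        rendered = rendered.replace(placeholder, str(assignments[reference]))
    return json.loads(rendered)


# Trimmed-down stand-in for the container command in the Job template.
command_template = {
    "command": [
        "python3",
        "/opt/mxnet-mnist/mnist.py",
        "--lr=${trialParameters.learningRate}",
        "--num-layers=${trialParameters.numberLayers}",
        "--optimizer=${trialParameters.optimizer}",
    ]
}
name_to_reference = {
    "learningRate": "lr",
    "numberLayers": "num-layers",
    "optimizer": "optimizer",
}
# Hypothetical values suggested for one Trial.
suggested = {"lr": "0.05", "num-layers": "3", "optimizer": "adam"}

rendered_command = render_trial_spec(command_template, name_to_reference, suggested)["command"]
```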
    

## Define an Experiment specification

Create an Experiment specification from the above parameters.

In [None]:
experiment_spec=V1beta1ExperimentSpec(
    max_trial_count=max_trial_count,
    max_failed_trial_count=max_failed_trial_count,
    parallel_trial_count=parallel_trial_count,
    objective=objective,
    algorithm=algorithm,
    early_stopping=early_stopping,
    parameters=parameters,
    trial_template=trial_template
)
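
Before this specification is passed to the launcher, it is serialized into a plain dictionary whose keys follow the camelCase field names of the Experiment resource (for example `max_trial_count` becomes `maxTrialCount`). The sketch below only illustrates that key shape with a generic renaming. The `to_camel` and `camelize_keys` helpers are hypothetical, and the real client is assumed to rely on each model's own attribute mapping rather than on a rule like this one.

```python
def to_camel(name):
    """Convert a snake_case attribute name to a camelCase field name."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize_keys(value):
    """Recursively rename dictionary keys; other values are left untouched."""
    if isinstance(value, dict):
        return {to_camel(key): camelize_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [camelize_keys(item) for item in value]
    return value


# A small excerpt of the specification, written with Python-style names.
python_style = {
    "max_trial_count": 18,
    "parallel_trial_count": 2,
    "objective": {
        "type": "maximize",
        "objective_metric_name": "Validation-accuracy",
    },
    "algorithm": {"algorithm_name": "random"},
}

serialized = camelize_keys(python_style)
```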

# Create a Pipeline using Katib component

The best hyperparameters are printed after the Experiment finishes.
The Experiment is not deleted after the Pipeline finishes.

In [None]:
# Get the Katib launcher.
katib_experiment_launcher_op = components.load_component_from_url(
    "https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kubeflow/katib-launcher/component.yaml")

@dsl.pipeline(
    name="Launch Katib early stopping Experiment",
    description="An example to launch Katib Experiment with early stopping"
)
def median_stop():
    
    # Katib launcher component.
    # Experiment Spec should be serialized to a valid Kubernetes object.
    op = katib_experiment_launcher_op(
        experiment_name=experiment_name,
        experiment_namespace=experiment_namespace,
        experiment_spec=ApiClient().sanitize_for_serialization(experiment_spec),
        experiment_timeout_minutes=60,
        delete_finished_experiment=False)
    
    # Output container to print the results.
    op_out = dsl.ContainerOp(
        name="best-hp",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=["echo Best HyperParameters: %s" % op.output],
    )
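
If a later step needs the individual values rather than the printed string, the launcher output can be parsed. The sketch below assumes the output is a JSON string containing a `currentOptimalTrial` entry with a `parameterAssignments` list, mirroring the Experiment status. That shape is an assumption, and the sample payload, including the Trial name, is made up for illustration.

```python
import json

# Hypothetical launcher output; real names and values depend on your run.
sample_output = json.dumps({
    "currentOptimalTrial": {
        "bestTrialName": "median-stop-example",
        "parameterAssignments": [
            {"name": "lr", "value": "0.05"},
            {"name": "num-layers", "value": "3"},
            {"name": "optimizer", "value": "adam"},
        ],
    }
})


def best_hyperparameters(launcher_output):
    """Return the best assignments as a {name: value} dictionary."""
    optimal_trial = json.loads(launcher_output)["currentOptimalTrial"]
    return {
        assignment["name"]: assignment["value"]
        for assignment in optimal_trial["parameterAssignments"]
    }


best = best_hyperparameters(sample_output)
```

The values stay strings here; convert them (for example with `float` or `int`) before passing them to a training step.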

# Run the Pipeline

You can check the Katib Experiment info in the Katib UI.

If you run this in a multi-user deployment, follow the instructions
here: https://github.com/kubeflow/kfp-tekton/tree/master/guides/kfp-user-guide#2-upload-pipelines-using-the-kfp_tektontektonclient-in-python
See the `multi tenant` section and create a `TektonClient` with the `host` and `cookies` arguments.
For example:
```
  TektonClient(
     host='http://<Kubeflow_public_endpoint_URL>/pipeline',
     cookies='authservice_session=xxxxxxx'
  )
```
You also need to specify the `namespace` argument when calling the `create_run_from_pipeline_func` function.

In [None]:
from kfp_tekton._client import TektonClient
# Example code for a multi-user deployment:
#   TektonClient(
#      host='http://<Kubeflow_public_endpoint_URL>/pipeline',
#      cookies='authservice_session=xxxxxxx'
#   ).create_run_from_pipeline_func(median_stop, arguments={}, namespace='<user_namespace>')
TektonClient().create_run_from_pipeline_func(median_stop, arguments={})