# Getting started with Determined, the open-source deep learning training platform - Lab 2
## Learning the base principles of Determined

For this part of the lab you will consider the well-known Iris classification problem of predicting Iris species based on the length and width measurements of their sepals and petals. You'll leverage Determined to train and tune a TensorFlow Keras based neural network model on TensorFlow's publicly available Iris [training](http://download.tensorflow.org/data/iris_training.csv) and [validation](http://download.tensorflow.org/data/iris_test.csv) datasets.

The objective of the model is to predict the **likelihood** that a flower is the given Iris species.

The small dataset consists of:
* 150 samples (120 samples for the training dataset, 30 samples for the validation dataset)
* 4 features (the characteristics of the Iris): Sepal length, Sepal width, Petal length, Petal width (in cm)
* 1 label (the species of Iris to predict): for this dataset it is an integer value of 0, 1 or 2 that corresponds to the species of Iris: 0-Iris setosa, 1-Iris versicolor, 2-Iris virginica.
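
As a rough illustration of this layout, the sketch below parses a few hand-typed rows in the same column order (four measurements in cm followed by an integer label) and maps each label to its species name. The helper name `parse_iris_rows` and the sample rows are illustrative assumptions; they are not read from the TensorFlow files.

```python
import csv
import io

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
SPECIES = {0: "Iris setosa", 1: "Iris versicolor", 2: "Iris virginica"}

# Illustrative rows only: four measurements in cm, then the integer label.
SAMPLE_CSV = """5.1,3.5,1.4,0.2,0
6.0,2.7,5.1,1.6,1
6.4,2.8,5.6,2.2,2
"""


def parse_iris_rows(text):
    """Return a list of (features_dict, species_name) tuples."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row[:4]]
        label = int(row[4])
        samples.append((dict(zip(FEATURES, values)), SPECIES[label]))
    return samples


for features, species in parse_iris_rows(SAMPLE_CSV):
    print(species, features)
```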

Determined provides a Web User Interface (UI), a Command Line Interface (CLI) and APIs to interact with the Determined system. In this part of the lab you will learn how to communicate with the Determined system, run tasks such as model training, and monitor and visualize training progress and results.

* You will install the Determined CLI in the local Jupyter Notebook server to interact with the Determined system running on a Kubernetes cluster.
* You will be introduced to Determined system components.
* You will get familiar with major commands of the Determined CLI.
* You will use the Determined CLI to create training tasks in order to train your Neural Network model with a single GPU, with multiple GPUs and using Determined AI hyperparameter tuning.
* You will also interact with the Determined AI WebUI and TensorBoard to visualize experiment metrics and results.
* You will finally use the Python API to interact with Determined, load the trained model and make some predictions (inferences) using the trained model.

> <font color="green"> **Note:** The model has already been ported to run on Determined. Porting deep learning model code to Determined is beyond the scope of this workshop. The easiest way to learn how to port your existing deep learning model code to Determined is to start with the [PyTorch Porting tutorial](https://docs.determined.ai/latest/tutorials/pytorch-porting-tutorial.html).</font>

### 1- Install Determined CLI

The Determined CLI is a command line tool that allows you to interact with the Determined system. For example, the CLI allows you to launch a new experiment to train your deep learning (DL) model. The Determined CLI is distributed as a Python package. You need a machine with Python 3.6 or later and Internet access. Your local Jupyter Notebook server fulfills these requirements. You just have to use the command _pip install determined_ to install the Determined package.

#### In the code cell below, you will use the line magic function `%pip` to install the **Determined CLI** as a new Python package in the current Jupyter kernel. The Determined CLI will be installed in the folder: ***~/.local/bin***

>**Note:** When running the code cell below, please ignore the Warning messages: _the script distro is installed in /home/studentId/.local/bin which is not on PATH_. 

>**Important Note:** After installation of the package is completed, you will need to restart the current kernel to use the newly installed package from the local Jupyter Notebook server.

In [None]:
# use the line magic function below to install new Python packages in the current Jupyter kernel.
# restart the current kernel to use the newly installed package
%pip install kfp-pipeline-spec --quiet
%pip install determined --quiet

### 2- Restart your Kernel to use the newly installed Python packages.
From the menu bar, select **"Kernel"** then **"Restart Kernel..."**

### 3- Fetch the KubeConfig file for your tenant user ID

Your local Jupyter Notebook has been tailored to interact with the Kubernetes resources using the Kubernetes API through the command line interface "***kubectl***". 

#### Use the custom magic command ***%kubeRefresh*** below to fetch the kubeconfig file that will allow you to interact directly with the Kubernetes resources from within your local Jupyter Notebook.

When prompted to enter the password, make sure to enter the password for the StudentID credentials you received in the Workshop-on-demand registration e-mail.

>**Note:** You can ignore _InsecureRequestWarning_ messages

If the password is correct, you will see the message ***kubeconfig set for user Student\<YourID\>***.

In [None]:
%kubeRefresh

### 4- Determined system components

For this hands-on workshop, the Determined system has been installed on the Kubernetes cluster managed by HPE Ezmeral Runtime Enterprise on a Kubernetes namespace ***determinedai***.  

When installing Determined on Kubernetes, an instance of the **Determined Master** and a **PostgreSQL database** are deployed in the Kubernetes cluster. Each of these components runs as a container within a Kubernetes POD.

#### Run the code cell below and check out the output. 

You should see one container POD for the Determined Master service and another POD for the Database service as well as service endpoints for the Master and the database. The Master service endpoint is a NodePort service which exposes the Master service endpoint outside the Kubernetes cluster.

In [None]:
!kubectl get pod,services -n determinedai | grep determined

The **Determined Master** is the central component of the Determined System. The Master is responsible for:

* **Scheduling** Determined training tasks as a collection of Kubernetes PODs. The Master brings up PODs to run workloads such as model training tasks, TensorBoard instances and JupyterLab instances.
* **Tracking and storing** all model training tasks metadata (description, labels, hyperparameters, search algorithm used, training metrics, validation metrics, start/end time, logs) in the PostgreSQL database.
* **Saving** model _artifacts_ (model files, code, model definition files) and training _checkpoints_ of Determined training tasks in a _checkpoint storage_ to keep records of the training tasks progress and ensure resiliency. Determined will automatically retry failed training tasks from latest checkpoint.
* **Serving** the Web User Interface (WebUI) for users to visualize training and validation metrics across their model training tasks.

>**Note:** _For this hands-on lab, the Kubernetes cluster worker nodes that run the Determined system, have been configured to connect to a distributed file system provided by HPE Ezmeral Runtime Enterprise (from the pre-integrated HPE Ezmeral Data Fabric). The distributed file system provides shared storage that works with Determined system to:_ 
>* _allow training tasks launched as container PODs on any Kubernetes worker nodes to access the shared model training/validation datasets,_ 
>* _store model training artifacts (model files, model code) and training task checkpoints on a shared checkpoint storage. Checkpoints are saved versions of trained models that users can access later to test the model and deploy it in production. Checkpoints are also used by Determined to ensure training work is not lost in case of system failure during training, so Determined can retry failed training tasks._

**Let's see Determined system in action!!!**

### 5- Get the endpoint URL of the Determined Master. 

To use Determined and interact with Determined system with the CLI, you need to tell the CLI where the Determined Master service is running. 

#### Run the code cell below to get the Determined Master endpoint URL fetched from the Kubernetes service of Determined Master. The _"kubectl describe service"_ command is used to get the Master URL.  

In [None]:
#
# Getting the DeterminedAI Master service endpoint URL:
#
masterUrl=!kubectl describe service determined-master-service-stagingdetai -n determinedai | grep gateway/8080 | awk '{print $3}'
det_master = str(masterUrl)[2:-2] # we remove any potential brackets
determined_master = "http://" + det_master
print (f"The Determined Master Service endpoint URL is: {determined_master}")
#print (f"{determined_master}")
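
The `grep`/`awk` pipeline above can also be expressed in plain Python, which avoids slicing the string form of a list. The sketch below is one possible approach: it returns the third whitespace-separated field of the first line containing `gateway/8080`, which is what `awk '{print $3}'` selects. The helper name `extract_gateway_endpoint` and the sample `kubectl describe` excerpt (including the host name and port) are hypothetical.

```python
def extract_gateway_endpoint(describe_output, marker="gateway/8080"):
    """Return the third field of the first line containing marker, or None."""
    for line in describe_output.splitlines():
        if marker in line:
            fields = line.split()
            if len(fields) >= 3:
                return fields[2]
    return None


# Hypothetical excerpt of `kubectl describe service ...` output.
sample = """Name:         determined-master-service-stagingdetai
Namespace:    determinedai
Annotations:  hpecp-internal-gateway/8080: gateway.example.com:10042
Type:         NodePort
"""

endpoint = extract_gateway_endpoint(sample)
print(f"http://{endpoint}")
```

In the notebook, the text captured from the `!kubectl describe service ...` command could be passed to this helper as a single newline-joined string.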

### 6- Set the Determined Master URL and authenticate to Determined system

#### Run the code cell below and follow steps 1 to 4 below to set an environment variable _in the terminal_ that references the Determined Master service endpoint, and authenticate to the Determined system as student\<yourId\>: 

In [None]:
userID = "student900"
#
print ("")
print ("export DET_MASTER=" + determined_master) 
print ("~/.local/bin/det user login " + userID)

1. Start a Terminal in the Launcher (navigate to Launcher tab --> Click Terminal tile; or go to Menu --> File --> New Launcher --> Terminal)
2. In the Terminal, copy/paste the two commands above to authenticate to Determined as user Student\<yourID\>.
3. Press the _Return_ key when prompted to enter a password. Users have a `blank` password by default in Determined.
4. Then continue from **Step 7** onwards. 

### 7- Check connectivity to the Determined Master service endpoint

#### Run your first Determined CLI command below to verify the connectivity to the Determined Master.

Any Det CLI command is in the form: ***det -m \<det_master_URL_or_IP:port\> \<command_argument\> \<action_verb\> [-h]***

The Master service endpoint is referenced using the ***-m*** flag. 

You can use the help flag [-h] to learn more about valid options.

The command below displays the Determined CLI client version and the Master version. 

In [None]:
# The Determined CLI is installed in '$HOME/.local/bin'
!~/.local/bin/det -m {determined_master} version
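
Because every CLI call in this lab follows the same `det -m <master> <command> <action>` pattern, the command can be assembled programmatically. The sketch below only builds the argument list and does not contact a Master. The helper name `build_det_command` and the example URL are assumptions for illustration.

```python
import os


def build_det_command(master_url, *args, det_path=None):
    """Return the argument list for a det CLI call against master_url."""
    if det_path is None:
        # The lab installs the CLI under ~/.local/bin
        det_path = os.path.expanduser("~/.local/bin/det")
    return [det_path, "-m", master_url, *args]


# Hypothetical Master URL, used only to show the resulting command line.
cmd = build_det_command("http://gateway.example.com:10042", "experiment", "list")
print(" ".join(cmd))
```

A list built this way can be handed to `subprocess.run(cmd, capture_output=True, text=True)` when you prefer Python over the `!` shell escape.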

### 8- Launch your first Determined training workloads to train your model

Let's first introduce some fundamental Determined concepts that are leveraged in this workshop: _Experiment, Trial and Hyperparameter_.

**Experiment:** In Determined terms, an ***experiment*** is a collection of one or more DL training tasks (trials). A Determined experiment can either train a single model with a single training task using one or multiple GPUs, or it can define a search over a user-defined hyperparameter space with several training tasks.

**Trial:** Each training task in an experiment is called a ***trial***. A trial is a training task that consists of the dataset (training and validation/test dataset), a deep learning model (for example the Python scripts that load the dataset, build and compile the model) `adjusted to run on Determined system`, and **an experiment configuration file** that defines how to run the training model process on Determined system (for example, the hyperparameters, the number of GPUs for each trial, the amount of data on which to train a model, or how often to report the training metrics and the validation metrics). All the elements of a training task are put together in a `model definition directory`.

**Hyperparameters:** These are user-defined variables that define how a model is trained. They affect the accuracy of the trained model. By choosing the best combination of hyperparameters you can obtain better performance of your model. 

#### Run the code cell below to display the content of the `model definition directory`:

In [None]:
!ls ~/source_control/Code -l

The Determined model definition directory contains:
- `model_def.py`: The model definition exposed to Determined. This is the core code for the model. This includes data loading code, building the model and compiling the model.
- `*.yaml`: a set of experiment configuration YAML files that each define how an experiment runs to train the model
     - _const.yaml_: Trains the model with single GPU and with constant hyperparameter values, and data located in a shared file system storage.
     - _distributed.yaml_: Same as const.yaml, but trains the model with multiple GPUs (distributed training).
     - _adaptive.yaml_: Performs a hyperparameter search using Determined's state-of-the-art adaptive hyperparameter tuning algorithm (aka a `Searcher` method).
- `startup-hook.sh`: (optional) Additional dependencies that Determined will automatically install into each POD container for this experiment. In the Iris classification example used here, the Pandas Python library will be installed.

#### Let’s start simple by training the Iris deep learning model on a single GPU by defining the hyperparameters as fixed values in the experiment configuration file. 

#### First, run the code cell below to take a closer look at the `experiment configuration file` const.yaml.
The experiment configuration file defines the hyperparameters, the Searcher method to use and the settings for that Searcher, the number of GPUs for each trial, the amount of data (batches or epochs) on which to train the model, how often to report training metrics and when the validation occurs.

>**Note:** The experiment configuration file has some required fields and some optional ones. To learn more about experiment configuration settings, check out the online documentation [here](https://docs.determined.ai/latest/training-apis/experiment-config.html).

In [None]:
!cat ~/source_control/Code/const.yaml

As you can see above, the hyperparameters (for example the _learning_rate_ and the _batch_size_) are defined as fixed values.

The ***Searcher*** section defines how Determined should explore the hyperparameter space and the amount of data on which to train the model. Here, the Searcher method is defined as _Single_ because we use fixed values for the hyperparameters. In this case, Determined does not perform any hyperparameter search or optimization. Here, the amount of data is set to 5000 batches.
   * ***batch***: group of records passed to the neural network model during training. The hyperparameter ***batch_size*** defines the number of records within a batch.

The validation metric ***val_categorical_accuracy*** is used to evaluate the performance of the training and validation over a certain amount of data expressed as _batches_ or _epochs_. For our Iris model use case the higher the metric the better the training and validation. 

The resource setting ***slots_per_trial*** specifies the number of GPUs (slots) on which to run each trial of the experiment. Here, a single GPU is used to train the model. 

The ***entrypoint*** tells the training task where to start running the model code.

The parameter ***scheduling_unit*** lets the Master receive and plot training metrics every _N_ batches of data. The default value is 100.

The parameter ***min_validation_period*** instructs Determined how often to calculate the validation metrics and how often to checkpoint the validated model. The validation metric is plotted whenever it is calculated. By default, validation occurs at the end of the trial. In this example, the validation metric is calculated every 1000 batches, and the validated model is checkpointed every 1000 batches if it is the best model so far. The validation metric is also calculated, and the model checkpointed, at the end of the trial.

Notice the ***bind_mounts*** attributes: to run an experiment that uses data stored in a shared file system, _bind_mounts_ attributes are specified in the experiment configuration file to mount the directory that stores the data to the training task container POD. Here, the bind_mounts point to the shared file system path mounted on the Kubernetes cluster worker nodes by HPE Ezmeral Runtime Enterprise. 
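
To make the numbers above concrete, the sketch below mirrors the described settings in a Python dictionary and derives how many metric reports and validations occur over the 5000 training batches. It is a worked calculation, not a copy of `const.yaml`: the batch size of 32 is an assumed placeholder, and the helper name `summarize_schedule` is hypothetical.

```python
# Settings as described in the text; this dict is illustrative, not the real file.
config = {
    "searcher": {
        "name": "single",
        "metric": "val_categorical_accuracy",
        "max_length": {"batches": 5000},
    },
    "scheduling_unit": 100,                      # report training metrics every 100 batches
    "min_validation_period": {"batches": 1000},  # validate every 1000 batches
    "resources": {"slots_per_trial": 1},         # one GPU per trial
}

ASSUMED_BATCH_SIZE = 32  # placeholder value, check const.yaml for the real one
TRAINING_SAMPLES = 120   # size of the Iris training dataset


def summarize_schedule(cfg, batch_size, n_samples):
    """Derive report/validation counts and data volume from the config."""
    total_batches = cfg["searcher"]["max_length"]["batches"]
    return {
        "metric_reports": total_batches // cfg["scheduling_unit"],
        "validations": total_batches // cfg["min_validation_period"]["batches"],
        "records_processed": total_batches * batch_size,
        "approx_epochs": total_batches * batch_size / n_samples,
    }


summary = summarize_schedule(config, ASSUMED_BATCH_SIZE, TRAINING_SAMPLES)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these assumptions the trial reports training metrics 50 times and validates 5 times, and because the dataset is so small, the model sees each training sample many times over.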

#### Next, run the command below to create your first experiment! 

The Det command specifies the experiment configuration file to use (_const.yaml_) and the model definition directory. Determined then schedules the training task as a Kubernetes POD. The POD container has all the libraries and dependencies required for training typical deep learning models.

* _det experiment create \<experiment_config_file\> \<model_definition_directory\>_

The command will return the Experiment Id.

In [None]:
# launch experiment to train a single model on a single GPU
!~/.local/bin/det -m {determined_master} experiment create ~/source_control/Code/const.yaml ~/source_control/Code

Using the command below, you will see that Determined Master has launched **one** training task (trial) for your experiment as a container POD with name in the form:

 _exp-\<experimentID\>-trial-\<TrialID\>-\<unique-name\>_
 
 >**Note:** Since this experiment trains a single model with a fixed set of hyperparameters, there is only one training task (trial) launched.
 
> <font color="blue"> **Note:** As you are sharing the same Kubernetes resources with other participants, and depending on the number of concurrent experiments running, your training task POD might be in **Pending** state waiting for GPU resources to become available. You might need to wait a few minutes until other experiments complete for your training task POD to become **Running**.</font>

In [None]:
!kubectl get pods -n determinedai
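
When many participants share the namespace, the pod list can get long. The sketch below shows one way to count only the trial PODs (names starting with `exp-`) by status from the text output of `kubectl get pods`. The helper name `trial_pods_by_status` and the sample output, including the pod names, are made up for illustration.

```python
from collections import Counter


def trial_pods_by_status(kubectl_output):
    """Count PODs whose name starts with 'exp-' by their STATUS column."""
    counts = Counter()
    for line in kubectl_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: NAME READY STATUS RESTARTS AGE
        if len(fields) >= 3 and fields[0].startswith("exp-"):
            counts[fields[2]] += 1
    return dict(counts)


# Hypothetical `kubectl get pods -n determinedai` output.
sample = """NAME                                   READY   STATUS    RESTARTS   AGE
determined-master-deployment-abc123    1/1     Running   0          5d
exp-12-trial-12-quick-fox              1/1     Running   0          2m
exp-13-trial-13-calm-owl               0/1     Pending   0          30s
"""

print(trial_pods_by_status(sample))
```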

The Det commands below are used to list your experiment and its status in the Determined system:

* _det experiment list_
* _det experiment describe \<experiment_Id\> --json | jq .[0].state_

#### Run the code cell below to track the execution progress of your experiment.

In [None]:
!~/.local/bin/det -m {determined_master} experiment list | tail -1
# Get the experiment Id, remove spaces
myexpId=!~/.local/bin/det -m {determined_master} experiment list | tail -1 | cut -d'|' -f 1 |  tr -d ' '
# strip the surrounding brackets and quotes from the captured output
myexpId=str(myexpId)[2:-2]
!~/.local/bin/det -m {determined_master} experiment describe {myexpId} --json | jq .[0].state
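
The shell pipeline above can be mirrored in Python, which makes the parsing steps explicit. The sketch below takes the first column of the last table row (as `tail -1 | cut -d'|' -f 1` does) and reads the state from the JSON output (as `jq .[0].state` does). The helper names and the sample table and JSON are hypothetical; the real CLI output has more columns and fields.

```python
import json


def last_experiment_id(list_output):
    """Return the first '|'-separated column of the last non-empty line."""
    lines = [ln for ln in list_output.splitlines() if ln.strip()]
    if not lines:
        return None
    return lines[-1].split("|")[0].strip()


def experiment_state(describe_json):
    """Python equivalent of `jq .[0].state` on the describe output."""
    return json.loads(describe_json)[0]["state"]


# Hypothetical CLI outputs, trimmed to the parts used here.
sample_table = """ ID | Owner      | Name       | State     | Progress
----+------------+------------+-----------+---------
 41 | student900 | iris_const | COMPLETED | 100.0
 42 | student900 | iris_const | ACTIVE    | 20.0
"""
sample_json = '[{"id": 42, "state": "ACTIVE"}]'

print(last_experiment_id(sample_table), experiment_state(sample_json))
```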

### 9- Monitor and visualize your experiment in Determined AI Web User Interface

To access information on both training and validation performance, simply go to the Determined **WebUI** by entering the service endpoint URL of the Determined Master in your web browser connected to the Internet.

* #### Run the code cell below to get the Determined Master WebUI URL. 
* #### Then, click on the displayed link to connect. This will open a new tab in your browser with the Determined UI login banner.
* #### You will be prompted to enter your credentials. Type your StudentID as credentials and press return. The password is `blank` by default.
* #### Upon login you should see the WebUI **dashboard** as shown in the picture below. The Dashboard page shows an overview of tasks on the Determined system as well as an overview of the GPU resources utilization. 

In [None]:
port = !kubectl describe service determined-master-service-stagingdetai -n determinedai | grep gateway/8080 | awk '{print $3}' | cut -d':' -f 2 |  tr -d ' '
portUI = str(port)[2:-2]
print (f"The Determined Master WebUI URL is: http://notebooks.hpedev.io:{portUI}")
print (f"Click the link above to connect. Login using your student Identifier: {userID}, do not enter password. Click on Sign In button")

<img src="DetWebUI-Login.png" height="298" width="300">

From the WebUI, make sure **you select your StudentID** from the ***Users*** drop-down list as shown in the picture below. By default, the Experiments are displayed. You can select other icons to display auxiliary tasks such as TensorBoard tasks and JupyterLab tasks. We will explore these auxiliary tasks in the next sections. 

<img src="DetWebUI-Users-v1.png" height="171" width="900">


##### From the dashboard, select the most recent experiment you want to visualize.

You should see the experiment in an **active** state, along with its completion percentage.

> <font color="blue"> **Important Note:** If there are multiple concurrent participants to the workshop, your experiment might not run yet because there are more experiments running than the Kubernetes cluster has GPUs. You might need to wait a few minutes until other experiments complete for your experiment to start running. </font>

After the experiment completes, you can see on the experiment detail page that training the model with the hyperparameter settings in `const.yaml` yields a validation accuracy between 93% and 97%. 

From the **Metrics** menu, under **Training Metrics**, select _categorical_accuracy_ (see picture below for an example). This metric indicates the model accuracy on training data while the _val_categorical_accuracy_ indicates the model accuracy on validation data.

Scroll down to see a list of training and validation workloads and their metrics for the metric types you previously selected. 
You might see one or two validation workloads with checkpoints. By default, Determined will checkpoint the most recent and the best model per training task (trial). If the most recent checkpoint is also the best checkpoint for a given trial, only one checkpoint will be saved for that trial.

<img src="WebUI-Exp-const-graph.png" height="520" width="900">

### 10 - TensorBoard visualization

[TensorBoard](https://www.tensorflow.org/tensorboard) is a widely used tool for visualizing and inspecting deep learning models. Determined is integrated with TensorBoard for deeper analysis of your experiment and to help you examine your neural network model by viewing the training and validation loss curves for your experiment in TensorBoard. 

Determined lets you launch a TensorBoard server and access TensorBoard in one click from the WebUI, or you can run the following command in Determined’s command line:

* _det tensorboard start \<experiment_Id\>_

#### Run the code cell below to launch the TensorBoard server instance.

This may take a minute or so as Determined has to launch the TensorBoard server as a Kubernetes POD. 

In [None]:
print (f"Start a TensorBoard server instance for your Experiment {myexpId} with TensorBoard instance ID:")
# start the tensorBoard server instance for the experiment
!~/.local/bin/det -m {determined_master} tensorboard start -d {myexpId}

#### Run the code cell below to get the TensorBoard URL for your experiment. Then, click on the link to connect.

>**Note:** The associated TensorBoard server is launched as a container POD in the Kubernetes cluster. Determined proxies HTTP requests to and from the TensorBoard container through the Determined Master node.

In [None]:
mytensorboard=!~/.local/bin/det -m {determined_master} tensorboard list | grep RUNNING | cut -d'|' -f 1 |  tr -d ' '
mytensorboard=str(mytensorboard)[2:-2]
#print (f"{mytensorboard}")
print (f"Your tensorboard is running at http://notebooks.hpedev.io:{portUI}/proxy/{mytensorboard}/")
print (f"Click on the link to connect.")
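
The URL printed above follows the pattern `http://<host>:<port>/proxy/<task-id>/`, where the Master proxies requests to the task. The sketch below builds such a URL from a `host:port` endpoint and a task ID. The helper names, the example endpoint and the all-zero task ID are placeholders.

```python
def split_endpoint(endpoint):
    """Split 'host:port' into (host, port) at the last colon."""
    host, _, port = endpoint.rpartition(":")
    return host, port


def proxy_url(host, port, task_id):
    """Build the Master proxy URL for a TensorBoard or JupyterLab task."""
    return f"http://{host}:{port}/proxy/{task_id}/"


# Placeholder endpoint and task ID for illustration only.
_, port = split_endpoint("gateway.example.com:10042")
print(proxy_url("notebooks.hpedev.io", port, "00000000-0000-0000-0000-000000000000"))
```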

<img src="TensorBoard-const-graph.png" height="413" width="900">

Determined created TensorBoard plots to show the training loss, validation loss, training accuracy and validation accuracy for the training task (trial).

#### When you have finished with TensorBoard, run the code cell below to `kill` the TensorBoard process

In [None]:
!~/.local/bin/det -m {determined_master} tensorboard kill {mytensorboard}

### 11 - List the best model created by the training process
By default, Determined will save the most recent and the best checkpoint per training task (trial) according to the validation metrics specified in the Searcher section of the configuration file for the experiment.

* _det experiment list-checkpoints [--best] [N best checkpoints to return] \<experiment_Id\>_

>**Note**: Upon completion of the training task, if the most recent checkpoint is also the best checkpoint for a given trial, only one checkpoint will be saved for that trial by Determined. Otherwise, two checkpoints will be saved. Other checkpoints will be automatically deleted to reclaim space.

#### Run the code cell below to display the best checkpoint for your experiment

In [None]:
#list the best Trial checkpoint(s) (training task):
!~/.local/bin/det -m {determined_master} experiment list-checkpoints --best 1 {myexpId}

### 12 - Launch a JupyterLab instance on the Determined system

Users can also launch a JupyterLab server instance on the Determined system, in which they run Jupyter Notebooks. This is useful to load and test a model that was trained during the experiment because the Determined CLI is installed into the JupyterLab server instance by default, and the JupyterLab server container has access to the shared file system where the checkpoints are stored. 

In the next section of this part of the lab, you will use a JupyterLab server instance on Determined system to test your trained model and make predictions. 

Determined lets you launch an instance of a JupyterLab server and access the JupyterLab server in one-click from the WebUI, or you can run the following command in Determined’s command line:

* _det notebook start [--config-file]_

The configuration file is used to control aspects of the JupyterLab environment such as a description, the checkpoint volume where trial checkpoints are stored in the shared file system, and the resources (CPU or GPU) used to launch the JupyterLab server. Run the next code cell to look at the content of the configuration file.

#### Run the code cell below to examine the settings for the JupyterLab instance.
The configuration file used here allows you to launch a JupyterLab server instance that does not use any GPUs (***resources.slots=0***) and that gets access to the shared checkpoint storage area where the model artifacts are stored.  

In [None]:
!cat ~/source_control/Code/notebook-config.yaml

#### Run the code cell below to launch an instance of the JupyterLab
This may take a minute or so for the JupyterLab instance to become active as Determined has to launch the JupyterLab server instance as a Kubernetes POD in the Kubernetes cluster. 

In [None]:
print (f"Start a JupyterLab server instance within Determined system with instance ID:")
# start the Jupyter Notebook server instance for the experiment
!~/.local/bin/det -m {determined_master} notebook start -d --config-file ~/source_control/Code/notebook-config.yaml

#### Check the status of the JupyterLab instance using the command below:
* _det notebook list_

In [None]:
!~/.local/bin/det -m {determined_master} notebook list | grep -e RUNNING -e STARTING

### 13- Inferences with Determined
When you train a model with Determined, all of the artifacts (model files) associated with that training task are tracked and stored in _checkpoint storage_. Determined lets you access the artifacts programmatically using the Python API from within the JupyterLab server launched on the Determined system. This makes it easy to export your best-performing trained model out of Determined and load it for **inferences** (the process of using a trained model and new unlabeled data to make a prediction).

* More information about the Determined Python API can be found [here](https://docs.determined.ai/latest/interact/api-experimental-client.html).
* More information for downloading a trained model can be found [here](https://docs.determined.ai/latest/post-training/use-trained-models.html).
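
Once a trained model is loaded, an inference typically returns one probability per species. The sketch below shows how such an output could be turned into a readable prediction. It assumes the probabilities are ordered by label (0, 1, 2) and uses made-up values; it does not call the Determined API, and the helper name `interpret_prediction` is hypothetical.

```python
SPECIES = ["Iris setosa", "Iris versicolor", "Iris virginica"]


def interpret_prediction(probabilities):
    """Return (species_name, probability) for the most likely class."""
    if len(probabilities) != len(SPECIES):
        raise ValueError("expected one probability per species")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return SPECIES[best], probabilities[best]


# Made-up model output for a single flower.
example_output = [0.02, 0.91, 0.07]
species, likelihood = interpret_prediction(example_output)
print(f"Predicted {species} with likelihood {likelihood:.0%}")
```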

#### Run the code cell to adjust some environment variables in the notebook **Inferences.ipynb**

In [None]:
!sed -i "s/USERNAME/$userID/" Inferences.ipynb
!sed -i "s/EXPID/$myexpId/" Inferences.ipynb
!sed -i "s/MASTERURL/$det_master/" Inferences.ipynb

#### Next, download the file **Inferences.ipynb** to your local PC/laptop. 

You will use this notebook to test your trained model by making some inferences from the JupyterLab instance you have just launched on the Determined system.

Right-click on the file **Inferences.ipynb** and choose **Download**.

#### Now, connect to the JupyterLab server instance you have just deployed: 

* Run the code cell below to get the JupyterLab URL. Then, click on the link to connect to the JupyterLab instance you have just launched on Determined System.

* On the JupyterLab instance, click the ***up arrow*** icon to **upload** the file _Inferences.ipynb_ from your local PC/laptop. Once the file is uploaded, double-click the file to open the notebook. 

In [None]:
myNotebook=!~/.local/bin/det -m {determined_master} notebook list | grep RUNNING | cut -d'|' -f 1 |  tr -d ' '
myNotebook=str(myNotebook)[2:-2]
print (f"{myNotebook}")
print (f"Your JupyterLab instance is running at http://notebooks.hpedev.io:{portUI}/proxy/{myNotebook}/")
print (f"Click on the link to connect to the JupyterLab instance you just launched.")
print (f"On JupyterLab instance, click the up arrow to upload the file Inferences.ipynb.")

> <font color="red"> **IMPORTANT: When you have finished with the Inferences in JupyterLab on the Determined system, please return to your local Jupyter Notebook to run the code cells below and perform some cleanup** </font>

### 14- Delete the checkpoints for your experiment to reclaim some storage space in the storage file system and stop the JupyterLab instance.

The default **checkpoint garbage collection policy** directs Determined to save the most recent and the best checkpoint per training task (trial). The ***save_experiment_best***, ***save_trial_best*** and ***save_trial_latest*** parameters specify which checkpoints to save. The default policy is set as follows:

  * save_experiment_best:0 
  * save_trial_best:1
  * save_trial_latest:1
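
The effect of these settings on a single trial can be sketched as follows. This is a simplified model of the policy, not Determined's implementation: it ignores `save_experiment_best` (which ranks checkpoints across all trials of an experiment), and the checkpoint list and helper name `checkpoints_to_keep` are made up.

```python
def checkpoints_to_keep(checkpoints, save_trial_best=1, save_trial_latest=1,
                        smaller_is_better=False):
    """Return the batch counts of the checkpoints that survive garbage collection.

    checkpoints: list of (batches_trained, validation_metric) for one trial.
    """
    by_latest = sorted(checkpoints, key=lambda c: c[0], reverse=True)
    by_best = sorted(checkpoints, key=lambda c: c[1], reverse=not smaller_is_better)
    keep = {c[0] for c in by_latest[:save_trial_latest]}
    keep |= {c[0] for c in by_best[:save_trial_best]}
    return sorted(keep)


# Made-up checkpoints: (batches trained, val_categorical_accuracy)
trial = [(1000, 0.90), (2000, 0.97), (3000, 0.95), (4000, 0.96), (5000, 0.93)]

print(checkpoints_to_keep(trial))        # default policy: best and latest
print(checkpoints_to_keep(trial, 0, 0))  # all zeros: nothing is kept
```

With the default policy the sketch keeps the best checkpoint (2000 batches) and the latest one (5000 batches). Setting every value to 0 keeps none, which is how the next code cell reclaims storage space.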
 
#### Run the code cell below to reclaim some storage disk space by changing the default checkpoint garbage collection policy as shown below:

In [None]:
# Delete the checkpoints data for the single model training using a single GPU
!~/.local/bin/det -m {determined_master} experiment set gc-policy --yes --save-experiment-best 0 --save-trial-best 0 --save-trial-latest 0 {myexpId}

#### Next, delete the instance of the JupyterLab server.

In [None]:
!~/.local/bin/det -m {determined_master} notebook kill {myNotebook}

#### Now that you have the base principles of Determined in mind, let's explore some more complex experiments with distributed training.

Click on Lab 3 below to open a notebook to explore Distributed Training with Determined. 
* [Lab 3](3-WKSHP-DET-AI-101-Getting-started-Dist-Training.ipynb)

### 15- Train multiple models as part of a hyperparameter search, using Determined AI hyperparameter tuning functionality (HPO)

Next, let's run an experiment with the same model definition (same code), but this time leveraging Determined's hyperparameter tuning (aka Hyperparameter Optimization or **HPO**). ML engineers typically use HPO to efficiently determine the hyperparameter values that yield the best-performing model. Here the hyperparameters in the experiment configuration file are specified as ranges instead of fixed values, and the `adaptive_asha` searcher is used to explore the hyperparameter space.

With HPO, an experiment consists of multiple training tasks (trials), each with different hyperparameters. Determined AI hyperparameter tuning functionality helps you find the best combination of hyperparameters for your particular model. 

The number of trials to run, the user-defined hyperparameter ranges and the search algorithm (aka the searcher method) are defined in the configuration file _adaptive.yaml_.

>Note: The **searcher** is a method that is used to find effective hyperparameter settings within a predefined range of hyperparameter values.

More about Hyperparameter optimization and Searcher methods supported by Determined AI can be found [here](https://docs.determined.ai/latest/training-hyperparameter/index.html#hyperparameter-tuning)
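
To give a feel for what "searching over ranges" means, the sketch below draws hyperparameter combinations at random from a small search space, one combination per trial. This is plain random sampling, not the `adaptive_asha` algorithm, which additionally stops poorly performing trials early. The search space, its key names and the helper `sample_hyperparameters` are illustrative assumptions and do not reflect the contents of `adaptive.yaml`.

```python
import math
import random

# Hypothetical search space: a log-scaled learning rate and an integer layer size.
SEARCH_SPACE = {
    "learning_rate": {"type": "log", "min": 1e-5, "max": 1e-1},
    "dense_units": {"type": "int", "min": 4, "max": 32},
}


def sample_hyperparameters(space, rng):
    """Draw one value per hyperparameter from its range."""
    sample = {}
    for name, spec in space.items():
        if spec["type"] == "log":
            # Sample the exponent uniformly so each decade is equally likely
            exponent = rng.uniform(math.log10(spec["min"]), math.log10(spec["max"]))
            sample[name] = 10 ** exponent
        elif spec["type"] == "int":
            sample[name] = rng.randint(spec["min"], spec["max"])
        else:
            raise ValueError(f"unknown type for {name}")
    return sample


rng = random.Random(0)  # fixed seed so the draws are repeatable
trials = [sample_hyperparameters(SEARCH_SPACE, rng) for _ in range(4)]
for i, hparams in enumerate(trials, start=1):
    print(f"trial {i}: {hparams}")
```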

In [None]:
!cat ~/source_control/Code/adaptive.yaml

In [None]:
# Launch experiment to train the model with hyperparameter tuning
!~/.local/bin/det -m {determined_master} experiment create ~/source_control/Code/adaptive.yaml ~/source_control/Code

In [None]:
!kubectl get pods -n determinedai

In [None]:
# Delete the checkpoints data for the HPO training
myexpId=!~/.local/bin/det -m {determined_master} experiment list | tail -1 | cut -d'|' -f 1 |  tr -d ' '
myexpId=str(myexpId)[2:-2]
print (f"{myexpId}")
!~/.local/bin/det -m {determined_master} experiment set gc-policy --yes --save-experiment-best 0 --save-trial-best 0 --save-trial-latest 0 {myexpId}

Third experiment: On the experiment detail page, we see the best categorical accuracy that Determined's adaptive search achieves over time. When the experiment finishes, we find that we reach 100% accuracy on the 30 test set examples, an improvement over the results of the fixed hyperparameter experiment. We can drill into the best-performing trial and view the associated hyperparameter values.