<span style="display: block;  text-align: center; color:#8735fb; font-size:22pt"> **HPO Benchmarking with RAPIDS and Dask** </span>

Hyper-Parameter Optimization (HPO) helps to find the best version of a model by exploring the space of possible configurations. While generally desirable, this search is computationally expensive and time-consuming.

In the notebook demo below, we compare benchmarking results to show how GPUs can accelerate HPO tuning jobs relative to CPUs.

For instance, we find a x speedup in wall-clock time (6 hours vs 3+ days) and a x reduction in cost when comparing GPU and CPU EC2 instances on 100 XGBoost HPO trials using 10 parallel workers on 10 years of the Airline Dataset.

For more details, check out our AWS blog (link).

<span style="display: block;  color:#8735fb; font-size:22pt"> **Preamble** </span>

You can set up a local environment, but we recommend launching a virtual machine on a cloud service (Azure, AWS, GCP, etc.).

For the purposes of this notebook, we will be utilizing the [Amazon Machine Image (AMI)](https://aws.amazon.com/releasenotes/aws-deep-learning-ami-gpu-tensorflow-2-12-amazon-linux-2/) as the starting point.


````{docref} /cloud/aws/
Please follow the instructions in [AWS Elastic Compute Cloud (EC2)](../../cloud/aws/ec2) to launch an EC2 instance with GPUs, the NVIDIA Driver and the NVIDIA Container Runtime.

```{note}
When configuring your instance, ensure you select the [Deep Learning AMI GPU TensorFlow or PyTorch](https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html) in the AMI selection box under **"Amazon Machine Image (AMI)"**.

![](../../_static/images/examples/xgboost-rf-gpu-cpu-benchmark/amazon-deeplearning-ami.png)
```

Once your instance is running and you have access to Jupyter, save this notebook and run through the cells.

````


<span style="display: block; color:#8735fb; font-size:20pt"> 2.1 - Dataset </span>

The data source for this workflow is 3 years of the [Airline On-Time Statistics](https://www.transtats.bts.gov/ONTIME/) dataset from the US Bureau of Transportation Statistics.

The public dataset contains logs/features about flights in the United States (17 airlines) including:

* Locations and distance ( Origin, Dest, Distance )
* Airline / carrier ( Reporting_Airline )
* Scheduled departure and arrival times ( CRSDepTime and CRSArrTime )
* Actual departure and arrival times ( DepTime and ArrTime )
* Difference between scheduled & actual times ( ArrDelay and DepDelay )
* Binary encoded version of late, aka our target variable ( ArrDel15 )
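
As a small illustration of the last bullet, the binary target can be thought of as a thresholded arrival delay. The sketch below assumes a cutoff of 15 minutes or more, as the column name suggests; the function name `is_late` is hypothetical and is not part of the dataset or the benchmark code.

```python
def is_late(arr_delay_minutes, threshold=15.0):
    """Return 1 if a flight arrived `threshold` or more minutes late, else 0.

    Assumption: the target marks arrival delays of 15 minutes or more,
    as the column name suggests.
    """
    return int(arr_delay_minutes >= threshold)


# Example: an early arrival, a slightly late one, and two late ones.
labels = [is_late(delay) for delay in (-5.0, 14.0, 15.0, 42.0)]
print(labels)
```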



In [None]:
# !aws configure

In [None]:
## DOWNLOAD THE DATASET
!aws s3 cp --recursive s3://sagemaker-rapids-hpo-us-west-2/3_year/ ./data/

<span style="display: block; color:#8735fb; font-size:20pt"> 2.2 - Local Cluster </span>

To maximize efficiency, we launch a `LocalCUDACluster` that uses the available GPUs for distributed computing, then connect a Dask `Client` to submit and manage computations on the cluster. Refer to this (link) for more information on how to achieve this.

We then submit the dataset to the Dask client and instruct Dask to keep it persisted in memory. This can improve performance by avoiding unnecessary data transfers during the HPO process.
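
The two steps above might look roughly like the following. This is a minimal sketch rather than the benchmark's actual code: the function name `start_cluster_and_persist` is hypothetical, and it assumes the downloaded files under `./data/` are in Parquet format and that `dask-cuda` and `dask-cudf` are installed on a GPU machine.

```python
def start_cluster_and_persist(data_dir="./data/"):
    """Start a local GPU Dask cluster and keep the dataset in worker memory."""
    # Imports live inside the function because they require a GPU machine
    # with RAPIDS (dask-cuda, dask-cudf) installed.
    from dask.distributed import Client, wait
    from dask_cuda import LocalCUDACluster
    import dask_cudf

    cluster = LocalCUDACluster()  # by default, one worker per visible GPU
    client = Client(cluster)      # the client submits work to the cluster

    # Assumption: the files under data_dir are Parquet.
    ddf = dask_cudf.read_parquet(data_dir)

    # persist() starts loading the partitions onto the workers and keeps
    # them there; wait() blocks until loading has finished.
    ddf = ddf.persist()
    wait(ddf)
    return client, ddf
```

On a GPU instance this would be called once, for example `client, ddf = start_cluster_and_persist()`, and the persisted `ddf` reused across HPO trials.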


<span style="display: block; color:#8735fb; font-size:22pt"> **ML Workflow** </span>

To work with the RAPIDS container, the entrypoint logic should parse arguments, load and split data, build and train a model, score/evaluate the trained model, and emit an output representing the final score for the given hyperparameter setting.
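
The shape of such an entrypoint can be sketched with the standard library alone. The example below is illustrative only and is not the benchmark's `hpo.py`: the `--model-type` and `--mode` flags mirror the ones used to launch the script, but the `RandomForest` and `cpu` choices are assumptions, and the data and "model" (a majority-label predictor) are toy stand-ins for the airline dataset and the real estimators.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse the command-line flags of the entrypoint."""
    parser = argparse.ArgumentParser(description="HPO entrypoint sketch")
    # Choices other than "XGBoost" and "gpu" are assumptions.
    parser.add_argument("--model-type", default="XGBoost",
                        choices=["XGBoost", "RandomForest"])
    parser.add_argument("--mode", default="gpu", choices=["gpu", "cpu"])
    return parser.parse_args(argv)


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows deterministically and split into train and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def train(train_rows):
    """Toy stand-in for model fitting: remember the majority label."""
    labels = [label for _, label in train_rows]
    return max(set(labels), key=labels.count)


def score(model, test_rows):
    """Accuracy of always predicting the remembered majority label."""
    hits = sum(1 for _, label in test_rows if label == model)
    return hits / len(test_rows)


def main(argv=None):
    args = parse_args(argv)
    # Toy data: (feature, label) pairs standing in for the real dataset.
    rows = [(i, int(i % 4 == 0)) for i in range(100)]
    train_rows, test_rows = split(rows)
    model = train(train_rows)
    final_score = score(model, test_rows)
    # Emit the final score so the HPO driver can read it.
    print(f"model={args.model_type} mode={args.mode} final-score: {final_score:.4f}")
    return final_score


# Example invocation with explicit arguments.
main(["--model-type", "XGBoost", "--mode", "gpu"])
```

Keeping each stage in its own function means the same skeleton can be reused for every trial, with only the hyperparameters changing between calls.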

`Optuna` is a hyperparameter optimization library for Python. We create an Optuna `study` object, which provides a framework for defining the search space, the objective function, and the optimization algorithm for the HPO process.
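
As a rough sketch of how a study ties these pieces together, consider the example below. It is not the benchmark's objective: `toy_score` is a hypothetical stand-in for training and evaluating a model, and the hyperparameter names and ranges are assumptions chosen for illustration.

```python
def toy_score(max_depth, learning_rate):
    """Hypothetical stand-in for 'train a model and return its score'."""
    return 1.0 - abs(max_depth - 6) * 0.01 - abs(learning_rate - 0.1)


def objective(trial):
    """Define the search space and return the value Optuna should maximize."""
    # Assumed hyperparameters and ranges, for illustration only.
    max_depth = trial.suggest_int("max_depth", 3, 10)
    learning_rate = trial.suggest_float("learning_rate", 0.01, 0.3, log=True)
    return toy_score(max_depth, learning_rate)


def run_study(n_trials=20):
    """Create a study and run n_trials evaluations of the objective."""
    import optuna  # imported here so the objective can be read on its own

    # With no sampler given, Optuna uses its default optimization algorithm.
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study
```

After `study = run_study()`, the best configuration found is available as `study.best_params` and its score as `study.best_value`.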

In [None]:
%cd code

In [None]:
ls

<span style="display: block; color:#8735fb; font-size:22pt"> **Build RAPIDS Container** </span>

In [None]:
!nvidia-smi

In [None]:
cat Dockerfile

In [None]:
!docker images

In [None]:
!docker build -t rapids-tco-benchmark:v23.06 .

In [None]:
!docker images

In [None]:
# !tmux

In [None]:
!docker run -it --gpus all -p 8888:8888 -p 8787:8787 -p 8786:8786 -v \
                    /home/ec2-user/tco_hpo_gpu_cpu_perf_benchmark:/rapids/notebooks/host \
                            rapids-tco-benchmark:v23.06 


<span style="display: block; color:#8735fb; font-size:22pt"> **Run HPO** </span>

Navigate to the host directory inside the container and run the Python script with the following command:

    python ./hpo.py --model-type "XGBoost" --mode "gpu"  > xgboost_gpu.txt 2>&1
