### Lab #1 - Creating your first pipeline with Amazon SageMaker Components for Kubeflow

We start by installing or upgrading the Kubeflow Pipelines SDK (`kfp`) in our Jupyter environment.

We also verify that the Domain Specific Language (DSL) compiler for Kubeflow, `dsl-compile`, is available on the path.

In [None]:
!pip install kfp --upgrade
!which dsl-compile
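
The same check can be done from Python, which is handy if you want the notebook to fail early with a clear message. This is a minimal sketch that uses only the standard library; the helper name `check_tool` is hypothetical and not part of the workshop material.

```python
import shutil
from importlib import metadata


def check_tool(package, executable):
    """Return (installed package version or None, executable path or None)."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return version, shutil.which(executable)


# Package and executable names are taken from the cell above.
version, path = check_tool("kfp", "dsl-compile")
print("kfp version:", version or "not installed")
print("dsl-compile:", path or "not found on PATH")
```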

Next, we import the Amazon SageMaker and Boto3 SDKs so that we can retrieve the execution role and interact with other services such as Amazon S3.

In [None]:
import sagemaker
import boto3

sess = boto3.Session()
sm   = sess.client('sagemaker')
role = sagemaker.get_execution_role()
sagemaker_session = sagemaker.Session(boto_session=sess)

bucket_name = sagemaker_session.default_bucket()

Now we will upload the dataset for this example, together with our pre-processing script, to Amazon S3. Both files are available in our local Jupyter notebook environment.

In [None]:
!aws s3 cp kmeans_preprocessing.py s3://$bucket_name/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
!aws s3 cp mnist.pkl.gz s3://$bucket_name/mnist_kmeans_example/mnist.pkl.gz
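
If you prefer to stay in Python rather than shell out to the AWS CLI, the two copies above could be sketched with Boto3 as follows. The helper names `build_upload_plan` and `upload_all` are hypothetical; the S3 keys mirror the ones used in the commands above, and `upload_file` is the standard Boto3 S3 client method.

```python
def build_upload_plan(prefix="mnist_kmeans_example"):
    """Map each local file to the S3 key used by the `aws s3 cp` commands."""
    return [
        ("kmeans_preprocessing.py", f"{prefix}/processing_code/kmeans_preprocessing.py"),
        ("mnist.pkl.gz", f"{prefix}/mnist.pkl.gz"),
    ]


def upload_all(bucket, plan, s3_client=None):
    """Upload every (local_path, key) pair and return the resulting S3 URIs."""
    if s3_client is None:
        import boto3  # imported lazily so the helper can be exercised with a stub client
        s3_client = boto3.client("s3")
    uris = []
    for local_path, key in plan:
        s3_client.upload_file(local_path, bucket, key)
        uris.append(f"s3://{bucket}/{key}")
    return uris
```

In the notebook this would be called as `upload_all(bucket_name, build_upload_plan())`, assuming both files sit in the current working directory.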

Finally, we will compile our pipeline defined in the Python script in our local Jupyter notebook environment.

In [None]:
!dsl-compile --py mnist-classification-pipeline.py --output mnist-classification-pipeline.tar.gz
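
Before uploading the package to the dashboard, it can be reassuring to peek inside it. The sketch below assumes the compiler wrote a gzip-compressed tar archive (as the `.tar.gz` output name suggests) and simply lists its members; the helper name `list_package_members` is hypothetical.

```python
import os
import tarfile


def list_package_members(package_path):
    """Return the member names stored in a compiled .tar.gz pipeline package."""
    with tarfile.open(package_path, "r:gz") as archive:
        return archive.getnames()


package = "mnist-classification-pipeline.tar.gz"
if os.path.exists(package):
    print(list_package_members(package))
else:
    print(package, "not found; run the compile step first")
```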

Now go back to the instructions and continue with the tasks in the Kubeflow Pipeline Dashboard.

<img src="./images/f1.png" alt="pipeline" width="600"/>

### Lab #2 - Exploring Amazon SageMaker Components with Elastic Inference and Endpoints with multiple model variants

We will now work with a new pipeline to explore [Amazon Elastic Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/ei.html) and [SageMaker Multi-Model Endpoints](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html).

For this example, we will train two models in parallel with different hyperparameters to illustrate the concept.

This lets us deploy two different variants of the model behind the same endpoint, which enables use cases such as A/B testing. You can find more details about this case in the [documentation here](https://docs.aws.amazon.com/sagemaker/latest/dg/model-ab-testing.html).
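
To make the idea of variants concrete, here is a minimal sketch of the production variant entries that a SageMaker endpoint configuration accepts, together with the traffic share implied by their weights. The dictionary keys follow the `ProductionVariants` shape of the SageMaker `CreateEndpointConfig` API; the model names, variant names, instance type, accelerator type, and helper functions are illustrative assumptions, not values taken from the workshop pipeline.

```python
def build_production_variants(model_names, instance_type="ml.m5.large", accelerator_type=None):
    """Build one equally weighted production variant per model name."""
    variants = []
    for index, model_name in enumerate(model_names, start=1):
        variant = {
            "VariantName": f"variant-{index}",  # hypothetical naming scheme
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": 1.0,
        }
        if accelerator_type is not None:
            # Attaches an Elastic Inference accelerator to the variant
            variant["AcceleratorType"] = accelerator_type
        variants.append(variant)
    return variants


def traffic_shares(variants):
    """Return the fraction of requests each variant receives, based on its weight."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# "model-a" and "model-b" are placeholder model names.
variants = build_production_variants(["model-a", "model-b"], accelerator_type="ml.eia2.medium")
print(traffic_shares(variants))  # {'variant-1': 0.5, 'variant-2': 0.5}
```

With equal weights each variant receives half of the traffic; changing one weight shifts the split proportionally, which is the basis for A/B testing.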

We will start by downloading the dataset and preparing the data:

In [None]:
import os
import urllib.request

prefix_name = 'caltech_example'

def download(url):
    # Download the file only if it is not already present locally
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)


def upload_to_s3(channel, file):
    # Upload a local file to s3://<bucket_name>/<channel>/<file>
    s3 = boto3.resource('s3')
    key = channel + '/' + file
    with open(file, "rb") as data:
        s3.Bucket(bucket_name).put_object(Key=key, Body=data)

# caltech-256
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')

In [None]:
# Two channels: train, validation
s3train = 's3://{}/{}/train/'.format(bucket_name, prefix_name)
s3validation = 's3://{}/{}/validation/'.format(bucket_name, prefix_name)

In [None]:
# upload the rec files to train and validation channels
!aws s3 cp caltech-256-60-train.rec $s3train
!aws s3 cp caltech-256-60-val.rec $s3validation

Now, we will compile our pipeline defined in the Python script in our local Jupyter notebook environment.

In [None]:
!dsl-compile --py caltech-ei-mmv-pipeline.py --output caltech-ei-mmv-pipeline.tar.gz

Now go back to the instructions and continue with the tasks in the Kubeflow Pipeline Dashboard.

<img src="./images/f3.png" alt="pipeline" width="600"/>

### Lab #3 - Exploring Amazon SageMaker Debugger and Model Monitor

We will now work with a new pipeline to explore [SageMaker Debugger](https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html) and [SageMaker Model Monitor](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html).

For this example, we will run a very simple pipeline that trains a model with the XGBoost algorithm on the same data as in Lab \#1 above. We deliberately choose poor hyperparameters, which lets us show how Debugger highlights issues during training.
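
To give an intuition for the kind of issue that poor hyperparameters can cause, the sketch below is a plain-Python illustration of a "loss is not decreasing" check. It is not Debugger's implementation; the function name, the `window` and `min_improvement` parameters, and the loss values are all made up for illustration, and losses are assumed to be positive.

```python
def loss_not_decreasing(losses, window=3, min_improvement=0.01):
    """Flag training as stalled when the latest loss has not improved by at least
    `min_improvement` (relative) over the loss recorded `window` steps earlier."""
    if len(losses) <= window:
        return False  # not enough history to judge
    previous, latest = losses[-window - 1], losses[-1]
    return latest > previous * (1.0 - min_improvement)


healthy = [1.00, 0.80, 0.65, 0.52, 0.43]  # steadily improving
stalled = [1.00, 0.98, 0.98, 0.99, 0.98]  # flat after the first step
print(loss_not_decreasing(healthy))  # False
print(loss_not_decreasing(stalled))  # True
```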

Now, we will compile our pipeline defined in the Python script in our local Jupyter notebook environment.

In [None]:
!dsl-compile --py debugger-monitor-pipeline.py --output debugger-monitor-pipeline.tar.gz

Now go back to the instructions and continue with the tasks in the Kubeflow Pipeline Dashboard.

<img src="./images/f5.png" alt="pipeline" width="600"/>

Finally, we will enable Model Monitor on the endpoint so that we can analyze the inputs to and outputs from the model's inferences.

To do that, go back to your Jupyter environment and open the "SageMaker Examples" tab. Look for the section "SageMaker Model Monitor", find the notebook called "SageMaker-Enable-Model-Monitor.ipynb", and click "Use". This creates a copy of the notebook in a folder in your Jupyter environment.

Then repeat the steps for the notebook called "SageMaker-Model-Monitor-Visualize.ipynb".

<img src="./images/f6.png" alt="pipeline" width="600"/>

You can now run those notebooks in order (first the enable notebook, then the visualize notebook) to complete this workshop.
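
As background for those notebooks: Model Monitor works on requests and responses captured from the endpoint. A minimal sketch of a data capture configuration, in the `DataCaptureConfig` shape accepted by the SageMaker `CreateEndpointConfig` API, could look like the following. The helper name, the bucket, and the prefix are hypothetical, and the example notebooks may configure capture differently.

```python
def build_data_capture_config(bucket, prefix, sampling_percentage=100):
    """Build a data capture configuration that records both inputs and outputs."""
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": f"s3://{bucket}/{prefix}/datacapture",
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }


# "my-bucket" and "monitor_example" are placeholder values.
config = build_data_capture_config("my-bucket", "monitor_example")
print(config["DestinationS3Uri"])  # s3://my-bucket/monitor_example/datacapture
```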