### Install packages
We first install the prerequisite packages, including:
- [**aiobotocore**](https://aiobotocore.readthedocs.io/en/latest/): adds async support for AWS services with [botocore](https://github.com/boto/botocore).
- [**sagemaker-studio-image-build**](https://pypi.org/project/sagemaker-studio-image-build/): CLI for building Docker images in SageMaker Studio using [AWS CodeBuild](https://aws.amazon.com/codebuild/).

In [None]:
!pip install --upgrade pip
!pip install -q s3fs==2022.5.0
!pip install -q boto3==1.21.21
!pip install -q botocore==1.24.21
!pip install -q awscli==1.22.76
!pip install -Uq aiobotocore==2.3.0
!pip install -q pandas==1.3.5
!pip install -q sagemaker-studio-image-build
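Because most of these packages are pinned to specific versions, it can help to confirm what actually ended up installed. The following is a minimal sketch using only the standard library's `importlib.metadata`. The `PINNED` dictionary simply mirrors the versions in the cell above, and `check_pins` is a hypothetical helper name. If a package was already imported before the upgrade, restart the kernel before relying on the new version.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the pip cell above (sagemaker-studio-image-build is unpinned).
PINNED = {
    "s3fs": "2022.5.0",
    "boto3": "1.21.21",
    "botocore": "1.24.21",
    "awscli": "1.22.76",
    "aiobotocore": "2.3.0",
    "pandas": "1.3.5",
}


def check_pins(pins):
    """Map each package name to (installed version or None, whether it matches the pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_pins(PINNED).items():
    print(f"{name:12} pinned {PINNED[name]:10} installed {installed} {'OK' if ok else 'MISMATCH'}")
```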

In [None]:
!wget https://raw.githubusercontent.com/manifoldailearning/mlops-with-aws-datascientists/main/Section-15-Custom-models/scikit-byoc.zip

In [None]:
!unzip -q scikit-byoc.zip
!rm scikit-byoc.zip
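# Note: `!cd` would run in a throwaway subshell and not change the notebook's
# working directory, so the cells below use paths relative to the current directory.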

### The Dockerfile
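
The Dockerfile defines the image that SageMaker runs for both training and hosting. `pygmentize` prints it with syntax highlighting.
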
In [None]:
!pygmentize scikit-byoc/container/Dockerfile
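For hosting, SageMaker starts this same image with the `serve` argument and expects the container to answer HTTP requests on port 8080: `GET /ping` as a health check and `POST /invocations` for predictions. The scikit-byoc container implements this with its own serving code under `scikit-byoc/container/decision_trees/`. The sketch below only illustrates that request contract with the standard library's `http.server`; `make_server`, `nearest_centroid`, and the nearest-centroid "model" are hypothetical stand-ins for the real decision tree, not the container's code.

```python
import csv
import io
from http.server import BaseHTTPRequestHandler, HTTPServer


def nearest_centroid(features, centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((x - c) ** 2 for x, c in zip(features, centroids[label])),
    )


def make_server(centroids, host="0.0.0.0", port=8080):
    """Build an HTTP server exposing SageMaker-style /ping and /invocations routes."""

    class Handler(BaseHTTPRequestHandler):
        def _reply(self, status, body=b"", content_type="text/plain"):
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            # Health check: report healthy only when a model is loaded.
            if self.path == "/ping":
                self._reply(200 if centroids else 404)
            else:
                self._reply(404)

        def do_POST(self):
            # Always drain the request body so the connection closes cleanly.
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            if self.path != "/invocations":
                self._reply(404)
                return
            ctype = (self.headers.get("Content-Type") or "").split(";")[0].strip()
            if ctype != "text/csv":
                self._reply(415, b"This predictor only supports CSV data")
                return
            # One CSV row per sample; respond with one predicted label per line.
            rows = [r for r in csv.reader(io.StringIO(body.decode("utf-8"))) if r]
            labels = [nearest_centroid([float(v) for v in r], centroids) for r in rows]
            self._reply(200, "\n".join(labels).encode("utf-8"), "text/csv")

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return HTTPServer((host, port), Handler)
```

In a container you would load the model once at startup and call `make_server(model).serve_forever()`; the single-threaded `HTTPServer` is fine for illustration but not for production traffic.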

### Building and registering the container

In [None]:
%%sh
cd scikit-byoc/container

chmod +x decision_trees/train
chmod +x decision_trees/serve

# Build the image with CodeBuild and push it to ECR as sagemaker-decision-trees:latest
sm-docker build . --repository sagemaker-decision-trees:latest
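The `chmod +x` lines matter because SageMaker starts the image with `train` (for a training job) or `serve` (for an endpoint) as the command, so those scripts must be executable. Inside a training job, each input channel is mounted under `/opt/ml/input/data/<channel>`, and anything written to `/opt/ml/model` is packaged as the model artifact. Below is a minimal, hypothetical `train` function that follows that layout. It assumes CSV files with the label in column 0 (as in `iris.csv`) and, to stay dependency-free, fits a nearest-centroid classifier instead of the decision tree the real `decision_trees/train` script fits. The `prefix` parameter exists only so the sketch can be exercised against a temporary directory.

```python
import csv
import json
import os


def train(prefix="/opt/ml", channel="training"):
    """Fit a nearest-centroid model from CSV files and save it under <prefix>/model.

    Hypothetical sketch: label in column 0, numeric features in the remaining columns.
    """
    input_dir = os.path.join(prefix, "input", "data", channel)
    model_dir = os.path.join(prefix, "model")

    sums, counts = {}, {}
    for name in sorted(os.listdir(input_dir)):
        if not name.endswith(".csv"):
            continue  # ignore anything that is not a CSV file
        with open(os.path.join(input_dir, name), newline="") as f:
            for row in csv.reader(f):
                if not row:
                    continue  # skip blank lines
                label, features = row[0], [float(x) for x in row[1:]]
                total = sums.setdefault(label, [0.0] * len(features))
                sums[label] = [t + x for t, x in zip(total, features)]
                counts[label] = counts.get(label, 0) + 1

    if not counts:
        raise ValueError(f"No training rows found in {input_dir}")

    # Each class centroid is the per-feature mean of that class's rows.
    centroids = {label: [s / counts[label] for s in sums[label]] for label in sums}

    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.json"), "w") as f:
        json.dump(centroids, f)
    return centroids
```

A real entry-point script would call `train()` with the defaults and, if training fails, write the error message to `/opt/ml/output/failure` (which SageMaker reports as the failure reason) and exit with a non-zero status.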

## Set Up and Upload Data

### Set Up the Environment

In [None]:
# Imports used throughout the notebook
import pandas as pd
from sagemaker import get_execution_role

# S3 key prefix under which the training data is uploaded
S3_prefix = 'mlops-scikit-byo-iris'

# IAM role that SageMaker assumes for training and hosting
role = get_execution_role()

In [None]:
import sagemaker as sage

# Session used for S3 uploads, training, and deployment
sess = sage.Session()

### Upload data to S3 Bucket

In [None]:
WORK_DIRECTORY = 'scikit-byoc/data'
# Reuse the session created above instead of opening a second one
bucket = sess.default_bucket()
data_location = sess.upload_data(WORK_DIRECTORY,
                                 bucket=bucket,
                                 key_prefix=S3_prefix)
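`upload_data` copies every file under `WORK_DIRECTORY` to `s3://<bucket>/<key_prefix>/` and, when given a directory, returns the prefix URI `s3://<bucket>/<key_prefix>`; that is why the training cell appends `'/iris.csv'` to `data_location`. When composing such URIs by hand it is easy to end up with doubled or missing slashes, so a pair of small helpers can keep them consistent. `s3_uri` and `split_s3_uri` are hypothetical names, and the bucket below is a placeholder.

```python
def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an s3:// URI, normalizing slashes."""
    key = "/".join(p.strip("/") for p in parts if p.strip("/"))
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


# Hypothetical bucket name; in the notebook this would be `bucket`.
print(s3_uri("example-bucket", "mlops-scikit-byo-iris/", "/iris.csv"))
```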

## Model Training

To train our algorithm with SageMaker, we create an `estimator` that defines how SageMaker uses the container for training:

- `image_uri (str)` - The Amazon Elastic Container Registry (ECR) path where the Docker image is registered.
- `role (str)` - The IAM role that SageMaker assumes to run the training job.
- `instance_count (int)` - The number of machines to use for training.
- `instance_type (str)` - The type of machine to use for training.
- `output_path (str)` - The S3 location where the model artifact is written.
- `sagemaker_session (sagemaker.session.Session)` - The SageMaker session object.


Then we call the `estimator.fit()` method to train on the data we uploaded.
Under the hood, `fit()` calls the Amazon SageMaker `CreateTrainingJob` API: it combines the configuration you provided to the `estimator` with the specified input training data and sends a `CreateTrainingJob` request to Amazon SageMaker.
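For reference, here is a sketch of roughly what that `CreateTrainingJob` request contains, written as keyword arguments you could pass to `boto3.client("sagemaker").create_training_job(**request)`. The field names follow the SageMaker API, but the job name, account ID, role ARN, volume size, and runtime limit below are illustrative assumptions rather than exactly what the SDK sends. A plain S3 URI string passed to `fit()` becomes the `training` input channel, which the sketch mirrors.

```python
def training_job_request(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri,
                         instance_type="ml.m4.xlarge", instance_count=1):
    """Build keyword arguments for the SageMaker CreateTrainingJob API (illustrative values)."""
    return {
        "TrainingJobName": job_name,
        # The custom image pushed to ECR; SageMaker runs it with the `train` argument.
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        # A single input channel named "training" pointing at the uploaded data.
        "InputDataConfig": [{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,  # assumed volume size
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 24 * 60 * 60},  # assumed 1-day limit
    }


# Hypothetical values for illustration only.
request = training_job_request(
    job_name="sagemaker-decision-trees-example",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-decision-trees:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/mlops-scikit-byo-iris/iris.csv",
    output_s3_uri="s3://example-bucket/output",
)
# boto3.client("sagemaker").create_training_job(**request) would submit it.
```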

In [None]:
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image_uri = '{}.dkr.ecr.{}.amazonaws.com/sagemaker-decision-trees:latest'.format(account, region)

tree = sage.estimator.Estimator(image_uri,
                                role, 
                                instance_count=1, 
                                instance_type='ml.m4.xlarge',
                                output_path="s3://{}/output".format(sess.default_bucket()),
                                sagemaker_session=sess)

file_location = data_location + '/iris.csv'
tree.fit(file_location)

## Model Deployment
You can use the trained model to get real-time predictions from an HTTPS endpoint. The following steps walk you through the process.

After the model training successfully completes, you can call the [`estimator.deploy()` method](https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.Estimator.deploy). The `deploy()` method creates a deployable model, configures the SageMaker hosting services endpoint, and launches the endpoint to host the model. 

The method uses the following configuration:
- `initial_instance_count (int)` - The number of instances on which to deploy the model.
- `instance_type (str)` - The instance type used to host the deployed model.
- `serializer (sagemaker.serializers.BaseSerializer)` - Converts input data before it is sent to the endpoint; in this example, `CSVSerializer` turns a NumPy array, list, file, or buffer into a CSV-formatted string.
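`CSVSerializer` sends its payload with the `text/csv` content type and, for a 2-D array, writes one line per row with comma-separated values. The standard-library sketch below approximates that payload format so you can see what the container receives; `to_csv_payload` is a hypothetical helper, not part of the SageMaker SDK.

```python
import csv
import io


def to_csv_payload(rows):
    """Serialize a 2-D sequence of numbers as newline-separated CSV rows."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().rstrip("\n")  # no trailing newline after the last row


payload = to_csv_payload([[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]])
print(payload)
```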


In [None]:
from sagemaker.serializers import CSVSerializer
predictor = tree.deploy(initial_instance_count=1, 
                        instance_type='ml.m4.xlarge', 
                        serializer=CSVSerializer())

## Run Inferences

In [None]:
!aws s3 ls $data_location

In [None]:
# Load the local copy of the Iris data; column 0 holds the class label
shape = pd.read_csv("scikit-byoc/data/iris.csv", header=None)
shape.sample(3)

In [None]:
# Drop the label column so only the feature columns are sent to the endpoint
shape.drop(shape.columns[[0]], axis=1, inplace=True)
shape.sample(3)

In [None]:
# Pick a fixed (not random) subset from each class: rows 40-49 of each
# 50-row class block, then drop the very last index
import itertools

a = [50*i for i in range(3)]
b = [40+i for i in range(10)]
indices = [i+j for i, j in itertools.product(a, b)]
test_data = shape.iloc[indices[:-1]]
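As a sanity check on the index arithmetic, assuming `iris.csv` has 150 rows sorted by class with 50 rows per class: the cell takes rows 40-49 of each class block, and `[:-1]` drops the final index, so the last class contributes one fewer row than the others.

```python
import itertools
from collections import Counter

ROWS_PER_CLASS = 50  # assumed: iris.csv is sorted by class, 50 rows each

class_starts = [ROWS_PER_CLASS * i for i in range(3)]  # 0, 50, 100
offsets = [40 + i for i in range(10)]                   # 40..49 within each class
indices = [start + off for start, off in itertools.product(class_starts, offsets)]
selected = indices[:-1]  # the notebook drops the final index (149)

# Count how many selected rows fall in each class block.
per_class = Counter(i // ROWS_PER_CLASS for i in selected)
print(len(selected), dict(per_class))  # prints: 29 {0: 10, 1: 10, 2: 9}
```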

### Predictions


In [None]:
print(predictor.predict(test_data.values).decode('utf-8'))
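`predictor.predict()` is a wrapper around the SageMaker Runtime `InvokeEndpoint` API. The sketch below makes the same kind of call directly through a boto3 `sagemaker-runtime` client; the client is passed in so the function can be exercised with a stub. It assumes the container replies with one predicted label per line, and `predict_csv` is a hypothetical name.

```python
def predict_csv(runtime_client, endpoint_name, payload):
    """Send a CSV payload to a SageMaker endpoint and return one label per response line."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload.encode("utf-8"),
    )
    # The response body is a stream; read it fully and decode as text.
    text = response["Body"].read().decode("utf-8")
    return [line for line in text.splitlines() if line]
```

In the notebook, a usage example would be `predict_csv(boto3.client("sagemaker-runtime"), predictor.endpoint_name, test_data.to_csv(header=False, index=False))`.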

## Cleanup
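Delete the endpoint when you are finished so you stop paying for its instances. You can delete it in the AWS Console or run the following cell, which uses the SageMaker SDK.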


In [None]:
sess.delete_endpoint(predictor.endpoint_name)
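`delete_endpoint` removes only the endpoint; the endpoint configuration and the model created by `deploy()` stay in your account. Here is a sketch of removing all three with a boto3 SageMaker client (`boto3.client("sagemaker")`). Run it instead of the cell above, because it looks up the configuration and model names with `describe_endpoint`, which fails once the endpoint is gone. `delete_endpoint_resources` is a hypothetical helper.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint plus the endpoint config and models it references."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    # Delete the endpoint first, then the resources it depended on.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return config_name, model_names
```

Usage in the notebook would be `delete_endpoint_resources(boto3.client("sagemaker"), predictor.endpoint_name)`.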