# Customizing the Build/Train/Deploy MLOps Project Template

We recently announced [Amazon SageMaker Pipelines](https://aws.amazon.com/sagemaker/pipelines/), the first 
purpose-built, easy-to-use Continuous Integration and Continuous Delivery (CI/CD) service for machine learning. 
SageMaker Pipelines has three main components that improve the operational resilience and reproducibility of your 
workflows: Pipelines, Model Registry, and Projects. 

SageMaker Projects introduce MLOps templates that automatically provision the underlying resources needed to enable 
CI/CD capabilities for your Machine Learning Development Lifecycle (MLDC). You can use one of the built-in 
templates or create your own custom templates.

This example will focus on using one of the MLOps templates to bootstrap your ML project and establish a CI/CD 
pattern from seed code. We’ll show how to use the built-in Build/Train/Deploy Project template as a base for a 
customer churn classification example. This base template will enable CI/CD for training machine learning models, 
registering model artifacts to the Model Registry, and automating model deployment with manual approval and automated 
testing.

## MLOps Template for Build, Train, and Deploy

We’ll start by taking a detailed look at which AWS services are provisioned when this build, train, and deploy MLOps 
template is launched. Later, we’ll discuss how the skeleton can be modified for a custom use case. 

To get started, SageMaker Projects [must first be enabled in the SageMaker Studio console](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-studio-updates.html). 
This can be done for existing users or while creating new ones:

<img src="img/enable_projects.png">

Within Amazon SageMaker Studio, you can now select “Projects” from a drop-down menu on the “Components and registries” 
tab as shown below:

<img src="img/select_projects.png">

From the projects page you’ll have the option to launch a pre-configured SageMaker MLOps template. We'll select the build, train and deploy template:

<img src="img/create_project.png">

NOTE: Launching this template will kick off a model building pipeline by default and will train a regression model. This will incur a small cost.

Once the project is created from the MLOps template, the following architecture will be deployed:

<img src="img/deep_dive.png">


## Modifying the Seed Code for Custom Use Case

After your project has been created, the architecture shown above will be deployed and the visualization of the 
Pipeline will be available in the “Pipelines” drop-down menu within SageMaker Studio.

In order to modify the seed code from this launched template, we’ll first need to clone the AWS CodeCommit 
repositories to our local SageMaker Studio instance. From the list of projects, select the one that was just 
created. Under the “Repositories” tab you can select the hyperlinks to locally clone the AWS CodeCommit repos:

<img src="img/clone_repos.png">


### ModelBuild Repo

The SageMaker project template creates these repositories.

In the `...-modelbuild` repository there's the code for preprocessing, training, and evaluating the model. 
The seed code trains and evaluates a model on the [UCI Abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone):

<img src="img/repo_directory.png">


**In our case, we want to create a pipeline for predicting churn (part 1 of the lab).** We can modify these files to solve our own customer churn use case.


We’ll need a dataset accessible to the project (_Churn dataset_). 

The easiest way to get it is to run the following in our notebook:

```
!wget http://dataminingconsultant.com/DKD2e_data_sets.zip
!unzip -o DKD2e_data_sets.zip
!mv "Data sets" Datasets
```

```python
import boto3
import sagemaker

prefix = 'sagemaker/DEMO-xgboost-churn'
region = boto3.Session().region_name
default_bucket = sagemaker.session.Session().default_bucket()
role = sagemaker.get_execution_role()

# S3 keys always use forward slashes, so build the key directly instead of using os.path.join
s3_key = f'{prefix}/data/RawData.csv'

# upload_file returns None, so there is no result worth assigning
boto3.Session().resource('s3').Bucket(default_bucket).Object(s3_key).upload_file('./Datasets/churn.txt')

# Keep this URL: it becomes the default value of InputDataUrl in pipeline.py
print(f's3://{default_bucket}/{s3_key}')
```

**We have now downloaded the Churn dataset and uploaded it to an S3 bucket that is accessible to the SageMaker Project role.**
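Before moving on, it can be worth confirming that the object landed where the pipeline will look for it. The following is a minimal sketch and not part of the seed code: `build_s3_uri` and `object_exists` are hypothetical helper names, and the check assumes the notebook role is allowed to read the object.

```python
def build_s3_uri(bucket: str, prefix: str, filename: str) -> str:
    """Join bucket, prefix, and filename into an s3:// URI using forward slashes."""
    key = "/".join(part.strip("/") for part in (prefix, filename) if part)
    return f"s3://{bucket}/{key}"


def object_exists(bucket: str, key: str) -> bool:
    """Return True if the object can be read; False if it is missing or inaccessible."""
    import boto3  # imported lazily so build_s3_uri works without AWS access
    from botocore.exceptions import ClientError

    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
        return True
    except ClientError:
        return False


# Example (in the notebook, after the upload above):
# print(build_s3_uri(default_bucket, prefix, "data/RawData.csv"))
# print(object_exists(default_bucket, f"{prefix}/data/RawData.csv"))
```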

---

### Modifying the code for the Churn problem

This is the sample structure of the Project (Abalone):

<img src="img/repo_directory.png">


#### We'll need to:
1. rename the `abalone` directory to `customer_churn`
2. replace `codebuild-buildspec.yml` in your current Studio project (Abalone) with the one found in [modelbuild/codebuild-buildspec.yml](modelbuild/codebuild-buildspec.yml) (Churn)
3. replace `preprocess.py` and `evaluate.py` (from the Abalone sample) with the ones found in `modelbuild/pipelines/customer_churn`
4. replace `pipeline.py` (Abalone) with the one found in `modelbuild/pipelines/customer_churn/pipeline.py`

    
5. **In the `pipeline.py` file, replace the `default_value` of `InputDataUrl` with the URL printed when you uploaded the data above.**
    
```python
# In pipeline.py
...
input_data = ParameterString(
    name="InputDataUrl",
    default_value="s3://EXAMPLE-BUCKET/PATH/TO/RawData.csv",  # Change this to the S3 location of your raw input data.
)
...
```
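Editing `default_value` changes the input for every future run. A pipeline parameter can also be overridden for a single execution, which is handy for trying a different dataset without another commit. The sketch below uses the boto3 `start_pipeline_execution` call; `to_pipeline_parameters` and `start_with_overrides` are hypothetical helper names, and the pipeline name is whatever your project generated.

```python
from typing import Dict, List


def to_pipeline_parameters(overrides: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {name: value} into the Name/Value list that StartPipelineExecution expects."""
    return [{"Name": name, "Value": str(value)} for name, value in overrides.items()]


def start_with_overrides(pipeline_name: str, overrides: Dict[str, str]) -> str:
    """Start one execution with overridden parameters and return its ARN."""
    import boto3  # imported lazily so the helper above works without AWS access

    response = boto3.client("sagemaker").start_pipeline_execution(
        PipelineName=pipeline_name,
        PipelineParameters=to_pipeline_parameters(overrides),
    )
    return response["PipelineExecutionArn"]


# Example (the pipeline name is a placeholder):
# start_with_overrides(
#     "YOUR-PIPELINE-NAME",
#     {"InputDataUrl": "s3://EXAMPLE-BUCKET/PATH/TO/RawData.csv"},
# )
```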


## Trigger a new training Pipeline Execution through git commit

By committing these changes to the AWS CodeCommit repository (easily done in the SageMaker Studio source control tab), a 
new Pipeline execution will be triggered, since an Amazon EventBridge rule monitors the repository for commits. After a 
few moments, we can monitor the execution by selecting the Pipeline inside the SageMaker Project.

<img src="img/git_push.png">

This triggers the training pipeline. Go to the `“Pipelines”` tab inside the SageMaker Project and click on the only pipeline listed. You'll see:

<img src="img/execute_pipeline.png">

Select the most recent execution:

<img src="img/dag.png">
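The same check can be done from a notebook instead of the Studio UI. This is a minimal sketch using the boto3 `list_pipeline_executions` call; the helper names are hypothetical, the sample data is made up for illustration, and only the first page of results is read.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def most_recent_execution(summaries: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the summary with the latest StartTime, or None if the list is empty."""
    return max(summaries, key=lambda s: s["StartTime"], default=None)


def fetch_execution_summaries(pipeline_name: str) -> List[Dict[str, Any]]:
    """Fetch the first page of execution summaries for a pipeline."""
    import boto3  # imported lazily so the helper above works without AWS access

    response = boto3.client("sagemaker").list_pipeline_executions(PipelineName=pipeline_name)
    return response["PipelineExecutionSummaries"]


if __name__ == "__main__":
    # Made-up summaries in the shape the API returns, for illustration only
    sample = [
        {"PipelineExecutionArn": "arn:example:1", "PipelineExecutionStatus": "Succeeded",
         "StartTime": datetime(2021, 1, 1, tzinfo=timezone.utc)},
        {"PipelineExecutionArn": "arn:example:2", "PipelineExecutionStatus": "Executing",
         "StartTime": datetime(2021, 1, 2, tzinfo=timezone.utc)},
    ]
    latest = most_recent_execution(sample)
    print(latest["PipelineExecutionArn"], latest["PipelineExecutionStatus"])
```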


## Trigger the ModelDeploy Pipeline

Once the training pipeline has completed, we can go to the `“Model groups”` tab inside the SageMaker Project and inspect the metadata attached to the model artifacts. If everything looks good, we can manually approve the model:

<img src="img/model_metrics.png">

<img src="img/approve_model.png">
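The approval can also be scripted. The sketch below assumes the boto3 `list_model_packages` and `update_model_package` calls and picks the newest version still awaiting approval; the helper names are hypothetical, and the model package group name is the one created by your project.

```python
from typing import Any, Dict, List, Optional


def latest_pending_package(packages: List[Dict[str, Any]]) -> Optional[str]:
    """Return the ARN of the newest package awaiting manual approval, or None."""
    pending = [p for p in packages if p.get("ModelApprovalStatus") == "PendingManualApproval"]
    newest = max(pending, key=lambda p: p["CreationTime"], default=None)
    return newest["ModelPackageArn"] if newest else None


def approve_latest(model_package_group: str) -> Optional[str]:
    """Approve the newest pending model package in the group and return its ARN."""
    import boto3  # imported lazily so the helper above works without AWS access

    sm = boto3.client("sagemaker")
    # Reads the first page of results only
    packages = sm.list_model_packages(ModelPackageGroupName=model_package_group)["ModelPackageSummaryList"]
    arn = latest_pending_package(packages)
    if arn is not None:
        sm.update_model_package(ModelPackageArn=arn, ModelApprovalStatus="Approved")
    return arn
```

Inspecting the metrics first, as described above, is still advisable; the script only replaces the final click.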

This approval will trigger the ModelDeploy pipeline (in CodePipeline):

<img src="img/execute_pipeline_deploy.png">

After the model is deployed to a staging environment and the tests have run, we **approve the deployment to production** in the `ApproveDeployment` stage:

<img src="img/approve_deploy_prod.png">
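For completeness, here is a hedged sketch of the same approval through the CodePipeline API, using the boto3 `get_pipeline_state` and `put_approval_result` calls. Rather than hard-coding stage and action names, it looks them up from the pipeline state; the helper names and the summary text are illustrative.

```python
from typing import Any, Dict, Optional, Tuple


def find_pending_approval(state: Dict[str, Any]) -> Optional[Tuple[str, str, str]]:
    """Return (stage name, action name, token) for the first in-progress approval, or None."""
    for stage in state.get("stageStates", []):
        for action in stage.get("actionStates", []):
            latest = action.get("latestExecution", {})
            # Only manual approval actions carry a token while they wait
            if latest.get("status") == "InProgress" and "token" in latest:
                return stage["stageName"], action["actionName"], latest["token"]
    return None


def approve_deployment(pipeline_name: str) -> bool:
    """Approve the pending manual approval, if any. Returns True when one was approved."""
    import boto3  # imported lazily so the helper above works without AWS access

    cp = boto3.client("codepipeline")
    pending = find_pending_approval(cp.get_pipeline_state(name=pipeline_name))
    if pending is None:
        return False
    stage_name, action_name, token = pending
    cp.put_approval_result(
        pipelineName=pipeline_name,
        stageName=stage_name,
        actionName=action_name,
        result={"summary": "Staging tests reviewed", "status": "Approved"},
        token=token,
    )
    return True
```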



Finally, if we go back to Studio, we will see the production endpoint for real-time inference.

<img src="img/endpoints.png">
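Once the endpoint is in service, it can be called from a notebook. The following is a minimal sketch using the boto3 `sagemaker-runtime` `invoke_endpoint` call. It assumes the endpoint accepts `text/csv` input and that the feature values are already in the order and encoding produced by `preprocess.py`; the helper names and the endpoint name are placeholders.

```python
from typing import Iterable


def to_csv_row(features: Iterable) -> str:
    """Serialize one record as a single comma-separated line (no header, no label)."""
    return ",".join(str(value) for value in features)


def predict(endpoint_name: str, features: Iterable) -> str:
    """Send one record to the endpoint and return the raw response body as text."""
    import boto3  # imported lazily so the helper above works without AWS access

    response = boto3.client("sagemaker-runtime").invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # assumption: the deployed container accepts CSV
        Body=to_csv_row(features),
    )
    return response["Body"].read().decode("utf-8")


# Example (endpoint name and feature vector are placeholders):
# print(predict("YOUR-ENDPOINT-NAME", preprocessed_feature_values))
```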