## Lab #3 - Build a pipeline following MLOps best practices

Now we will assume that we are working as DevOps engineers (or, alternatively, as infrastructure or operations engineers).

Our data scientists have provided us with the workflow AWS CloudFormation template in YAML format for running a new model in production, and we need to build the pipeline that handles source control, versioning, auditing, testing, and monitoring of the models in production.

This time, we will create the pipeline using AWS CodePipeline directly with the console.

***Note that you could use other tools for this, such as Jenkins (the same applies to the workflow in the previous lab). You could also use AWS CodePipeline but build the pipelines through infra-as-code, CLI commands, or the CDK instead of the console, as we are going to do here.***

**Note this lab requires you to have a GitHub account and fork the repository we are using. If you do not have an account or do not want to fork the repository in your account then just follow the demonstration done by the workshop staff.**

**0. Fork the GitHub repository in your account**

You will need to fork the lab repository in order to be able to source the code for our ML pipeline, and monitor for changes.

- Log into your GitHub account https://github.com/
- Go to the lab repository https://github.com/rodzanto/courier-default/ and choose "Fork" at the top right of the screen.
- Follow the steps in the GitHub documentation to create a new (OAuth 2) token with the following scopes (permissions): admin:repo_hook and repo. If you already have a token with these permissions, you can use that. You can find a list of all your personal access tokens in https://github.com/settings/tokens.
- Copy the access token to your clipboard. For security reasons, after you navigate off the page, **you will not be able to see the token again**. If you have lost your token, you can regenerate it.

**1. Create the testing AWS Lambda function**

We will need a function to test our pipeline once it is deployed in Pre-production or in Production. This function will run a couple of test inferences against our Amazon SageMaker Endpoint to get predictions in real time and evaluate whether our model is responding as expected.

- Go to the [AWS Lambda console](https://eu-west-1.console.aws.amazon.com/lambda/)
- Choose "Create function", select "Author from scratch" and name it "glovo-test-prod". Choose Python 3.8 as the Runtime, and in "Choose or create an execution role" choose "Use an existing role". Select the AWS IAM service role that you created at the beginning of the workshop. Choose "Create function".
- Once it opens, go to the "Function code" section and replace the default code that comes with the function with the code from this script: [test-prod.py](https://github.com/rodzanto/courier-default/blob/master/test-prod.py), provided in our lab repository.
- Choose "Save".
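For orientation, the following is a minimal sketch of how a Lambda test function invoked by CodePipeline is typically structured. It is not the content of `test-prod.py`. The sample record `SAMPLE_ROW`, the `text/csv` content type, and the check that the score lies between 0 and 1 are assumptions made for illustration, and the optional `runtime` and `pipeline` arguments are hypothetical hooks that only exist so the handler can be exercised locally.

```python
# Hypothetical placeholder record; the real feature layout comes from the lab dataset.
SAMPLE_ROW = "0,1,2,3"


def extract_endpoint_name(event):
    """Return the endpoint name that CodePipeline passes as user parameters."""
    job = event["CodePipeline.job"]
    config = job["data"]["actionConfiguration"]["configuration"]
    return config["UserParameters"].strip()


def lambda_handler(event, context, runtime=None, pipeline=None):
    # Clients can be injected for local testing; inside Lambda they come from boto3.
    if runtime is None or pipeline is None:
        import boto3
        runtime = runtime or boto3.client("sagemaker-runtime")
        pipeline = pipeline or boto3.client("codepipeline")

    job_id = event["CodePipeline.job"]["id"]
    try:
        endpoint = extract_endpoint_name(event)
        response = runtime.invoke_endpoint(
            EndpointName=endpoint,
            ContentType="text/csv",  # assumption: the model accepts CSV input
            Body=SAMPLE_ROW,
        )
        score = float(response["Body"].read().decode("utf-8"))
        # Assumption: the model returns a probability-like score.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"unexpected score {score}")
    except Exception as error:
        # Report the failure so the pipeline action is marked as failed.
        pipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={"type": "JobFailed", "message": str(error)},
        )
        return {"status": "failed", "message": str(error)}

    # Report success so the pipeline action can complete.
    pipeline.put_job_success_result(jobId=job_id)
    return {"status": "passed", "endpoint": endpoint, "score": score}
```

The important part is that the function reports back to CodePipeline with `put_job_success_result` or `put_job_failure_result`; without one of those calls the pipeline action keeps waiting until it times out.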

**2. Adjust the workflow template**

- The repository contains two YAML files with the templates for deploying infrastructure as code. You must update workflow.yaml: replace its content with the output of the last cell of the workflow notebook from lab #2, and save it as workflow.yaml in your repo.
- Go to your [AWS Step Functions console](https://eu-west-1.console.aws.amazon.com/states/) and delete the state machine that we created in the previous lab (#2), as our pipeline is now going to create it automatically. Just choose "Delete" and confirm.
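If you prefer to script the clean-up instead of using the console, a sketch along these lines could work. The helper name `delete_state_machine_by_name` and the optional `sfn` argument are hypothetical; you have to supply the name of the state machine you created in lab #2 yourself.

```python
def delete_state_machine_by_name(name, sfn=None):
    """Delete the first state machine whose name matches; return its ARN or None."""
    if sfn is None:
        import boto3
        sfn = boto3.client("stepfunctions", region_name="eu-west-1")

    kwargs = {}
    while True:
        page = sfn.list_state_machines(**kwargs)
        for machine in page["stateMachines"]:
            if machine["name"] == name:
                arn = machine["stateMachineArn"]
                sfn.delete_state_machine(stateMachineArn=arn)
                return arn
        # Follow pagination until there are no more pages.
        token = page.get("nextToken")
        if not token:
            return None
        kwargs = {"nextToken": token}
```

The function returns `None` when no state machine with that name exists, so it is safe to call even if the state machine was already deleted.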

**3. Create the ML pipeline**

- Go to the [AWS CodePipeline console](https://eu-west-1.console.aws.amazon.com/codesuite/codepipeline/pipelines)
- Choose "Create pipeline", name it "glovo-courier-pipeline", choose "Existing service role" and in "Role ARN" paste the ARN of the AWS IAM role we have been using since the beginning. Choose "Next".
- In the "Source" screen, for "Source provider" select "GitHub" and choose "Connect to GitHub". Log in with your GitHub credentials to establish the connection. Choose the "Repository" (your fork) and the "master" branch, then choose "Next".
- In the "Build" screen, just choose "Skip build stage" and then "Skip".
- In the "Deploy" screen, select provider "AWS CloudFormation", region "Europe (Ireland)", action mode "Create or update stack". For "Stack name" write "glovo-workflow", for "Artifact name" choose "SourceArtifact" and for "File name" write "workflow.yaml". Finally, in "Role name" paste our AWS IAM role ARN, and choose "Next". Then choose "Create pipeline".

The pipeline visualization will be shown on the screen, and the pipeline will start running. It will also deploy our ML workflow (the same one as in lab #2, with AWS Step Functions) using AWS CloudFormation for infra-as-code.
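If you were to define the pipeline as code rather than in the console, the "Deploy" settings above correspond roughly to a CodePipeline action declaration like the one built below. This is a sketch only: the helper name `cloudformation_deploy_action` and the example role ARN are hypothetical, and the declaration omits `Capabilities`, which you would need to add if your template creates IAM resources.

```python
def cloudformation_deploy_action(stack_name, template_file, role_arn,
                                 artifact="SourceArtifact", region="eu-west-1"):
    """Build a CodePipeline action declaration for a CloudFormation deployment."""
    return {
        "name": f"Deploy-{stack_name}",
        "actionTypeId": {
            "category": "Deploy",
            "owner": "AWS",
            "provider": "CloudFormation",
            "version": "1",
        },
        "inputArtifacts": [{"name": artifact}],
        "configuration": {
            "ActionMode": "CREATE_UPDATE",  # "Create or update stack" in the console
            "StackName": stack_name,
            # Template location uses the form <artifact name>::<file name>.
            "TemplatePath": f"{artifact}::{template_file}",
            "RoleArn": role_arn,
        },
        "region": region,
        "runOrder": 1,
    }


# Example with a placeholder role ARN (replace with your own).
action = cloudformation_deploy_action(
    "glovo-workflow",
    "workflow.yaml",
    "arn:aws:iam::123456789012:role/example-role",
)
print(action["configuration"]["TemplatePath"])
```

A dictionary like this would go into the `actions` list of a stage when calling `create_pipeline` or `update_pipeline` with boto3.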


Now we will add another stage to our pipeline that does three things: requires a manual approval before deploying to production, deploys an Amazon SageMaker Endpoint for real-time inference in Pre-production or Production, and runs a validation function with AWS Lambda to automate the testing of this Endpoint.

- Once the execution finishes, choose "Edit" and then "Add stage" at the end of our pipeline. Give it a name, e.g. "Production".

For the Manual Approval:

- Choose "Add action group" and give it a name, e.g. "Production-approval", then choose "Manual approval" as the action provider and choose "Done".

For the Amazon SageMaker Endpoint:

- Choose "Add action group" and give it a name, e.g. "Prod-endpoint", then choose the action provider "AWS CloudFormation", region "Europe (Ireland)", input artifact "SourceArtifact", action mode "Create or update stack", stack name "glovo-endpoint", artifact name "SourceArtifact" and file name "prod.yaml". For "Role name" use our AWS IAM role ARN.
- Expand the "Advanced" section using the arrow, and for parameter overrides paste the following:
````
{
   "ModelName": "YOUR AMAZON SAGEMAKER TRAINING JOB NAME",
   "InstanceType":"ml.m4.xlarge",
   "InstanceCount":"1"
}
````
You can verify your SageMaker training job name by checking in the console https://eu-west-1.console.aws.amazon.com/sagemaker/home?region=eu-west-1#/jobs. The name should be similar to "xgboost-YYYY-MM-DD-HH-MM-SS-sss".
- Finally, for "Variable namespace" write "Pipeline", and choose "Done".
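The parameter overrides above can also be generated rather than typed by hand. The sketch below looks up the most recent completed training job and prints the JSON to paste into the console. Both helper names and the optional `sagemaker` argument are hypothetical, and it assumes the latest completed job in the account is the one you want to deploy.

```python
import json


def latest_training_job_name(sagemaker=None):
    """Return the name of the most recently created completed training job."""
    if sagemaker is None:
        import boto3
        sagemaker = boto3.client("sagemaker", region_name="eu-west-1")
    response = sagemaker.list_training_jobs(
        SortBy="CreationTime",
        SortOrder="Descending",
        StatusEquals="Completed",
        MaxResults=1,
    )
    summaries = response["TrainingJobSummaries"]
    if not summaries:
        raise LookupError("no completed training jobs found")
    return summaries[0]["TrainingJobName"]


def build_parameter_overrides(model_name, instance_type="ml.m4.xlarge",
                              instance_count=1):
    """Return the parameter overrides as a JSON string for the console field."""
    overrides = {
        "ModelName": model_name,
        "InstanceType": instance_type,
        # CloudFormation parameter overrides are passed as strings.
        "InstanceCount": str(instance_count),
    }
    return json.dumps(overrides, indent=3)


if __name__ == "__main__":
    print(build_parameter_overrides(latest_training_job_name()))
```

Check the printed `ModelName` against the training jobs list in the SageMaker console before pasting it, since another job may have completed more recently than the one from lab #2.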

For the testing function:

- Finally, choose "Add action group" and give it a name, e.g. "Run-tests", then choose action provider "AWS Lambda", input artifacts "SourceArtifact", function name "glovo-test-prod", and user parameters "#{Pipeline.EndpointName}", and choose "Done".

- Then choose "Done", and "Save". Confirm with "Save".

You can now wait for your pipeline to complete the execution. You should then be able to check the details of each step, e.g. the results of the tests on your Pre-production/Production endpoint for a defaulting and a non-defaulting courier.
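Besides the console view, the status of each action can be read programmatically. The following sketch summarizes the latest status per stage and action; the helper name `summarize_pipeline` and the optional `codepipeline` argument are hypothetical, and the `"NotStarted"` label is only this sketch's placeholder for actions that have not run yet.

```python
def summarize_pipeline(name, codepipeline=None):
    """Map (stage name, action name) to the latest execution status."""
    if codepipeline is None:
        import boto3
        codepipeline = boto3.client("codepipeline", region_name="eu-west-1")

    state = codepipeline.get_pipeline_state(name=name)
    summary = {}
    for stage in state["stageStates"]:
        for action in stage.get("actionStates", []):
            latest = action.get("latestExecution", {})
            # Actions that have never run carry no latestExecution entry.
            status = latest.get("status", "NotStarted")
            summary[(stage["stageName"], action["actionName"])] = status
    return summary


if __name__ == "__main__":
    for (stage, action), status in summarize_pipeline("glovo-courier-pipeline").items():
        print(f"{stage} / {action}: {status}")
```

An action waiting on the manual approval shows up as `InProgress` until someone approves or rejects it in the console.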


Congratulations, you have now completed the labs.