# Module 7 AWS SageMaker 

Topics:
 * AWS SageMaker
 * SageMaker Notebooks
 * SageMaker Studio
 * SageMaker Debugger
 * SageMaker Data Wrangler
 
#### Module Kickoff Video
* [AWS SageMaker and Final Project Preview (13 min)](https://youtu.be/J1EFahOz-Bs)
 * [Slides](./resources/DSA8430_Parallel_SageMaker.pdf)
 
 

### What is SageMaker?

From the documentation:
> Amazon SageMaker is a fully managed machine learning service. With SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don't have to manage servers. It also provides common machine learning algorithms that are optimized to run efficiently against extremely large data in a distributed environment.

https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html



## Videos

* [Introduction to AWS SageMaker (5-min)](https://youtu.be/Qv_Tr_BCFCQ)
* [Build, Train and Deploy Machine Learning Models on AWS with Amazon SageMaker - AWS Online Tech Talks (35-min)](https://youtu.be/R0vC31OXt-g)



## Readings
* [How Amazon SageMaker Works](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html#how-it-works)
  * [Machine Learning with Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-mlconcepts.html)
  * [Explore, Analyze, and Process Data](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-notebooks-instances.html)
  * [Train a Model with Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html)
  * [Deploy a Model in Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html)
  * [Use Machine Learning Frameworks, Python, and R with Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/frameworks.html)
  

##### Do not dive down the "Get Started with Amazon SageMaker" rabbit hole.
We will sub-select tutorials and next steps in the Lab/Practice phase of the module.


### Reference Material 
#### [Full Documentation PDF](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-dg.pdf)


---


## Important Notes

SageMaker requires a wealth of permissions to access all the appropriate AWS resources.
Specifically, the configuration we are using requires __YOU__ to include "sagemaker" in the name of all your S3 resources (buckets, objects, etc.).
To stay in your own lane on buckets and resources, your bucket names must also start with your SSO.
So, the S3 bucket name prefix for this module is __*SSO-sagemaker-*__ (hyphens, because S3 bucket names cannot contain underscores).
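
As a minimal sketch of this naming rule, the helper below builds a bucket name that starts with an SSO and contains "sagemaker", then checks it against a subset of the S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens only). The function name `module_bucket_name` and the `lab2` suffix are hypothetical, chosen only for illustration.

```python
import re

# Subset of the S3 bucket naming rules: 3-63 characters, only lowercase
# letters, digits, dots, and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def module_bucket_name(sso: str, suffix: str) -> str:
    """Build '<sso>-sagemaker-<suffix>' (hypothetical helper) and validate it."""
    name = f"{sso}-sagemaker-{suffix}".lower()
    if not _BUCKET_RE.match(name):
        raise ValueError(f"not a valid S3 bucket name: {name!r}")
    return name


print(module_bucket_name("scottgs", "lab2"))  # scottgs-sagemaker-lab2
```

A suffix containing an underscore, such as `my_data`, is rejected with a `ValueError` because it would not be accepted as a bucket name.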

## Labs

Various labs and practice activities will leverage the expansive [Amazon SageMaker Examples repository](https://github.com/aws/amazon-sagemaker-examples).

### L1. SageMaker Basics

Source Material Reference:

Create a SageMaker domain named _SSO-sagemaker_ using the Quick Setup option.
![AWS_SageMaker_Domain.png MISSING](./images/AWS_SageMaker_Domain.png)
 * **Choose the VPC as shown below:**
 ![AWS_SageMaker_Domain_VPC.png MISSING](./images/AWS_SageMaker_Domain_VPC.png)
 

---

### L2.  Follow this Tutorial: https://docs.aws.amazon.com/sagemaker/latest/dg/gs-console.html
 * Read down to the **Tutorial Overview** then get started with steps 1-7.
 * For instance name, use __SSO-sagemaker-lab2__.
 * For the notebook name, use *Lab_SSO*.
 
 
 * **Step 3**: As you do the tutorial, ensure each code block is its own cell and you are getting the shown expected output.
 * **Step 3**: Look for areas to customize the code to use SSO prefix, such as
```
import sagemaker, boto3, os

bucket = sagemaker.Session().default_bucket()
# Tutorial default:
# prefix = "demo-sagemaker-xgboost-adult-income-prediction"
# Customized with the SSO prefix (replace "scottgs" with your own SSO):
prefix = "scottgs-sagemaker-xgboost-adult-income-prediction"
```
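
To see where the customized prefix ends up, here is a small sketch of how a bucket, a prefix, and a file name combine into an S3 location. The helper `s3_uri` and the bucket name `example-bucket` are hypothetical; in the tutorial the bucket comes from `sagemaker.Session().default_bucket()`.

```python
import posixpath


def s3_uri(bucket: str, prefix: str, *parts: str) -> str:
    """Join a bucket, a key prefix, and key parts into an s3:// URI."""
    return "s3://" + posixpath.join(bucket, prefix, *parts)


prefix = "scottgs-sagemaker-xgboost-adult-income-prediction"
print(s3_uri("example-bucket", prefix, "data", "train.csv"))
# s3://example-bucket/scottgs-sagemaker-xgboost-adult-income-prediction/data/train.csv
```

Because every object key starts with the SSO-based prefix, your uploads stay separate from those of other students who share the same bucket.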


 * **Step 4**: Keep this going in the same notebook as the data prep (step 3).
 * **Step 5**: Keep this going in the same notebook as the data prep (step 3).
 * **Step 6**: Keep this going in the same notebook as the data prep (step 3).
  * At the end of this step, save your notebook. Ensure the notebook name includes your Pawprint (SSO).
  * Download the notebook from SageMaker Studio to your local computer.
 
### Complete Step 7 to clean up the model, endpoints, and notebook instances.
 * Leave the buckets and logs for now to ensure we are not messing up other students' work.
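
If you prefer to script part of the cleanup instead of using the console, the sketch below shows one possible approach. It assumes `sm_client` is a `boto3.client("sagemaker")` object and that the endpoint's production variants each reference a model by name; the function name `delete_endpoint_resources` is hypothetical. It deliberately leaves buckets and logs alone.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint config, and the models it references.

    `sm_client` is assumed to be boto3.client("sagemaker").
    Returns the names of the deleted resources so they can be logged.
    """
    # Look up the config and model names before deleting anything.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [
        variant["ModelName"]
        for variant in config["ProductionVariants"]
        if "ModelName" in variant
    ]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)

    return {"endpoint": endpoint_name, "config": config_name, "models": model_names}


# Example usage (requires AWS credentials; the endpoint name is a placeholder):
# import boto3
# delete_endpoint_resources(boto3.client("sagemaker"), "my-endpoint-name")
```

Notebook instances still need to be stopped and deleted separately, as described in Step 7 of the tutorial.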

#### Lab 2 Artifact: Upload your downloaded SageMaker notebook to the module7/labs/ folder.

Ensure this link below opens your notebook for grading.

[Lab Notebook](./labs/Lab_lcmhng.ipynb)
 * Note: you will need to double-click this cell and update the link filename to change "SSO" to your actual SSO.



---

### L3.  SageMaker Debugger 

You have seen the debugger used in the lab above.
You will also see it in the practices. For this lab, you will just familiarize yourself with the debugger and its possibilities.
This will potentially be VERY useful for your final course project!

Source Material Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-tutorial.html

1. [Review the Debugger Architecture](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-how-it-works.html)
2. [Review two or three videos](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-videos.html)
 * (14-min) Analyze, Detect, and Get Alerted on Problems with Training Runs Using Amazon SageMaker Debugger
 * (14-min) Debug Models with Amazon SageMaker Debugger in Studio
 * (Optional 44-min Video) Deep Dive on Amazon SageMaker Debugger and SageMaker Model Monitor


--- 

## Practices

###  P1. SageMaker Studio Basics

#### [Reference this tutorial](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html)

1. To launch SageMaker Studio, start at the console page: https://us-east-1.console.aws.amazon.com/sagemaker/home?region=us-east-1#/
2. Click on the SageMaker Domain link on the left.
3. Select the domain you created with your name in the Labs.

![AWS_SageMaker_Studio_Select.png MISSING](./images/AWS_SageMaker_Studio_Select.png)

4. Use the **Launch** button to launch Studio.

![AWS_SageMaker_Studio_Launch.png MISSING](./images/AWS_SageMaker_Studio_Launch.png)



Read through the tutorial sections noted below:
 * [Amazon SageMaker Studio UI Overview](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-ui.html)
 * [Use the Amazon SageMaker Studio Launcher](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launcher.html)
 * [Use Amazon SageMaker Studio Notebooks](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks.html)
  * A subpage of this portion of the tutorial is: https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-end-to-end.html
  * <span style='background:yellow; font-weight:700'>As you read through the links, one of the activities will have you launch the notebook shown below:</span>
![AWS_SageMaker_Studio_XGBoost_Churn.png MISSING](./images/AWS_SageMaker_Studio_XGBoost_Churn.png)
  * **For this activity, review the Practice 1 Artifact.**
 
 * [Customize Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-customize.html)
 * [Perform Common Tasks in Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tasks.html)
 



#### Practice 1 Artifact: Upload your downloaded SageMaker notebook to the module7/practices/ folder.

In the XGBoost notebook `amazon-sagemaker-examples/aws_sagemaker_studio/getting_started/xgboost_customer_churn_studio.ipynb`, first clear all the output cells using `Edit > Clear All Outputs`.

Work through the notebook, download the completed notebook, upload it for submission, and update the link below.
Name your uploaded notebook `Studio_XGBoost_SSO.ipynb`, replacing SSO with your actual SSO.

[Practice Notebook](./practices/Studio_XGBoost_lcmhng.ipynb)
 * Note: you will need to double-click this cell and update the link filename to change "SSO" to your actual SSO.


---

### P2. Data Wrangler


Read an overview of Data Wrangler: https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler.html

For this practice, reference:
https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html

Specifically, we are going to work through the **[Data Wrangler Titanic Dataset Walkthrough](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html#data-wrangler-getting-started-demo)** tutorial.
Name your flow: `Titanic_Flow_SSO`, changing "SSO" to your actual SSO.
 * Due to updates outside of AWS, download [`titanic3.csv`](https://web.dsa.missouri.edu/static/data/titanic3.csv) to use instead of the link to OpenML.org data.


When you complete the Data Wrangler walkthrough, export your data flow into a notebook named `Titanic_Flow_SSO.ipynb`.
 * This is the section: **Export to Data Wrangler Job Notebook**
 * Note, the instructions are a little off. Look for this option `Export To > Amazon S3 (via Jupyter Notebook)`
![AWS_SageMaker_Data_Wrangler.png MISSING](./images/AWS_SageMaker_Data_Wrangler.png)


Once you have exported it to a notebook, review the notebook as you execute the cells.
Once the notebook is completed, save it, download it, and upload it as an artifact.
 * **Note**: In some cases, you may see an error on a cell
```
AttributeError: 'NoneType' object has no attribute 'items'
```
 * Just keep moving down the notebook.

#### Practice 2 Artifact: Upload your downloaded SageMaker Data Wrangler notebook to the module7/practices/ folder.

[Practice Notebook](./practices/Titanic_Flow_lcmhng.ipynb)
 * Note: you will need to double-click this cell and update the link filename to change "SSO" to your actual SSO.
 


---

### P3. Bias/Fairness Measurements

For this practice tutorial, reference: https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-detect-data-bias.html


#### Task: [Follow along with this tutorial](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_processing/fairness_and_explainability/fairness_and_explainability.html#Overview)

You will use the code provided, creating a notebook in SageMaker Studio.
Name the notebook `Studio_Bias_Evaluation_SSO.ipynb`, replacing SSO with your actual SSO.
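
To build intuition before running the tutorial, here is a minimal sketch of two pre-training bias metrics described in the SageMaker Clarify documentation: class imbalance (CI) and difference in proportions of labels (DPL). The function names and the counts are made up for illustration; the tutorial itself computes these metrics through the `sagemaker.clarify` module rather than by hand.

```python
def class_imbalance(n_a: int, n_d: int) -> float:
    """CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the sizes of the
    advantaged and disadvantaged facets. 0 means the facets are balanced."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(pos_a: int, n_a: int, pos_d: int, n_d: int) -> float:
    """DPL = q_a - q_d, where q is the fraction of positive labels in each facet."""
    return pos_a / n_a - pos_d / n_d


# Hypothetical counts: 80 records in facet a (40 positive labels),
# 20 records in facet d (5 positive labels).
print(class_imbalance(80, 20))                              # 0.6
print(difference_in_proportions_of_labels(40, 80, 5, 20))   # 0.25
```

Both values range from -1 to 1; values near 0 suggest the dataset treats the two facets similarly on that particular measure.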


#### Practice 3 Artifact: Upload your downloaded SageMaker notebook to the module7/practices/ folder.

Build the tutorial into a notebook in SageMaker Studio, download the completed notebook, and then upload it as `Studio_Bias_Evaluation_SSO.ipynb`, replacing SSO with your actual SSO.
 * Recall Lab 2, where the code was built into a new notebook. This is a similar activity.

[Practice Notebook](./practices/Studio_Bias_Evaluation_lcmhng.ipynb)
 * Note: you will need to double-click this cell and update the link filename to change "SSO" to your actual SSO.


## Exercises

You have done A LOT, Great Job!

There is no exercise for this module.
Instead, proceed to [Module 8](../module8/Schedule.ipynb), where you will use AWS SageMaker for the course final project!

**But, first submit this work as noted below!**

 

## Submitting your work

### <span style='background:yellow'>Please be sure the artifacts from all labs and practices are added into your repository for the commit and push!</span>

#### Steps:
  1. Open Terminal in JupyterHub
  1. Change into the course folder
  1. Stage (Git Add) the module's learning activities   
  `git add module7`
  1. Create your work snapshot (Git Commit)  
  `git commit -m "Module 7 submission"`
  1. Upload the snapshot to the server (Git Push)  
  `git push`