# AWS SageMaker

## 1-Introduction

Amazon SageMaker is a cloud machine-learning platform that was launched in November 2017. SageMaker enables developers to create, train, and deploy machine-learning models in the cloud.

#### How It Works

Label-Set up labeling jobs within Amazon SageMaker to build highly accurate training datasets, using active learning and human labeling.  
Labeling is the process of tagging data samples, that is, attaching meaning to the raw data.  
Raw data --> Labeling --> Labelers (chosen from different workforces) --> Assistive labeling --> Accurate training dataset  
Benefits: improved label accuracy, ease of use, labeling costs reduced by up to 70%, and a choice of workforce.  

Build-Connect to other AWS services and transform data in Amazon SageMaker notebooks.    
Train-Use Amazon SageMaker's algorithms and frameworks, or bring your own, for distributed training.    
Tune-Amazon SageMaker automatically tunes your model by adjusting multiple combinations of algorithm parameters.   
Deploy-Once training is completed, models can be deployed to Amazon SageMaker endpoints, for real-time predictions.   
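
The labeling stage described above combines machine and human labeling. The sketch below illustrates one common way to do that: items the model is confident about are labeled automatically, and the rest go to human labelers. It is only an illustration of the idea, not Ground Truth's actual algorithm; the function name, the threshold, and the sample data are all hypothetical.

```python
# Illustrative sketch only: this is NOT Ground Truth's actual algorithm.
# CONFIDENCE_THRESHOLD and the sample data are hypothetical.
CONFIDENCE_THRESHOLD = 0.9


def route_for_labeling(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split model predictions into auto-labeled items and items for human review.

    predictions: list of (item_id, predicted_label, confidence) tuples.
    """
    auto_labeled, needs_human = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_labeled.append((item_id, label))
        else:
            needs_human.append(item_id)
    return auto_labeled, needs_human


sample_predictions = [
    ("img-1", "cat", 0.97),
    ("img-2", "dog", 0.55),
    ("img-3", "cat", 0.91),
]
auto_labeled, needs_human = route_for_labeling(sample_predictions)
print(auto_labeled)  # items the model labels on its own
print(needs_human)   # items sent to the human workforce
```

In an active learning loop, the human-provided labels would then be fed back to retrain the model, so fewer items need human review in the next round.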

#### Benefits and Features

Labeling raw data with active learning-Machine and human labelers continuously learn from and improve each other.  
Highly accurate training datasets-Active learning models from Amazon SageMaker Ground Truth provide a very high level of consistency and accuracy for training datasets.     
Fully managed notebook instances-For training data exploration and preprocessing, Amazon SageMaker provides fully managed instances running Jupyter notebooks that include example code for common model training and hosting exercises.  
Highly optimized machine learning algorithms-Amazon SageMaker installs high-performance, scalable machine learning algorithms optimized for speed, scale, and accuracy, to run on extremely large training datasets.  
One-click training-When you're ready to train in Amazon SageMaker, simply indicate the type and quantity of instances you need and initiate training with a single click.      
Deployment without engineering effort-After training, SageMaker provides the model artifacts and scoring images to you for deployment to Amazon EC2 or anywhere else.    

## 2-Creating free-tier AWS account

## 3-Notebook Instance/SageMaker Studio

A notebook instance (or SageMaker Studio) works the same way as Jupyter Notebook/JupyterLab, but it is hosted in the AWS Cloud. You can also choose the environment you want to work in.

#### AWS Elastic Inference

Amazon Elastic Inference allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and SageMaker instances or Amazon ECS tasks, to reduce the cost of running deep learning inference by up to 75%. Amazon Elastic Inference supports TensorFlow, Apache MXNet, PyTorch and ONNX models.

Inference is the process of making predictions using a trained model. In deep learning applications, inference accounts for up to 90% of total operational costs for two reasons. Firstly, standalone GPU instances are typically designed for model training - not for inference. While training jobs batch process hundreds of data samples in parallel, inference jobs usually process a single input in real time, and thus consume a small amount of GPU compute. This makes standalone GPU inference cost-inefficient. On the other hand, standalone CPU instances are not specialized for matrix operations, and thus are often too slow for deep learning inference. Secondly, different models have different CPU, GPU, and memory requirements. Optimizing for one resource can lead to underutilization of other resources and higher costs.

Amazon Elastic Inference solves these problems by allowing you to attach just the right amount of GPU-powered inference acceleration to any EC2 or SageMaker instance type or ECS task, with no code changes. With Amazon Elastic Inference, you can choose any CPU instance in AWS that is best suited to the overall compute and memory needs of your application, and then separately configure the right amount of GPU-powered inference acceleration, allowing you to efficiently utilize resources and reduce costs.

#### IAM Role

You can use IAM to securely control individual and group access to your AWS resources. You can create and manage user identities ("IAM users") and grant permissions for those IAM users to access your resources. You can also grant permissions for users outside of AWS (federated users).
For example, if we are working on a bank application and want to access a specific S3 bucket, or all buckets, we can create the permission and grant it to an individual or a group.
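
As a rough illustration of the bank application example, the sketch below builds an IAM policy document that limits access to a single S3 bucket. The bucket name is taken from later in this notebook; the statement IDs and the exact set of actions are assumptions chosen for the example, and a real policy should be tailored to what the application actually needs.

```python
import json

# Assumption: the bucket name matches the one used later in this notebook.
BUCKET_NAME = "bankapplication"

# Hypothetical read/write policy limited to a single bucket.
s3_bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Listing applies to the bucket itself
            "Sid": "ListTheBucket",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET_NAME}"],
        },
        {
            # Object operations apply to the keys inside the bucket
            "Sid": "ReadWriteObjects",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET_NAME}/*"],
        },
    ],
}

policy_json = json.dumps(s3_bucket_policy, indent=2)
print(policy_json)
```

The resulting JSON could be attached to a user, a group, or the role used by the notebook instance.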

## 4-Importing Libraries and Creating S3 Bucket

In [None]:
# Note: this notebook uses the SageMaker Python SDK v1 API (get_image_uri, s3_input).
# In SDK v2 these became sagemaker.image_uris.retrieve and sagemaker.inputs.TrainingInput.
import sagemaker
import boto3 # AWS SDK for Python (Boto3): create, configure, and manage AWS services such as Amazon EC2 and Amazon S3
from sagemaker.amazon.amazon_estimator import get_image_uri # looks up the container image URI of a built-in algorithm
from sagemaker.session import s3_input, Session # manage interactions with the Amazon SageMaker APIs and any other AWS services needed

In [None]:
#Set the bucket name (S3 bucket names must be globally unique, so change this one) and print the region
bucket_name = 'bankapplication'
my_region = boto3.session.Session().region_name # region of the notebook instance
print(my_region)

In [None]:
#Creating the S3 bucket from the notebook instance, in the instance's own region
s3 = boto3.resource('s3')
try:
    if my_region == 'us-east-1':
        # us-east-1 is the default region and must not be passed as a LocationConstraint
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': my_region})
    print('S3 bucket created successfully')
except Exception as e:
    print('S3 error: ', e)
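
Bucket creation fails if the name is already taken by any AWS account, which is likely for a name such as `bankapplication`. A minimal sketch of one way around this is shown below: append a random suffix and run a simplified check against the main S3 naming rules (3 to 63 characters, lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit). The helper names are hypothetical, and the check is not exhaustive.

```python
import re
import uuid

# Simplified check covering the main S3 naming rules; not exhaustive.
BUCKET_NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def make_unique_bucket_name(base):
    """Append a short random suffix so the name is unlikely to collide."""
    suffix = uuid.uuid4().hex[:8]
    return f"{base}-{suffix}"


def looks_like_valid_bucket_name(name):
    """Return True if the name passes the simplified naming check."""
    return bool(BUCKET_NAME_PATTERN.match(name))


candidate = make_unique_bucket_name("bankapplication")
print(candidate, looks_like_valid_bucket_name(candidate))
```

The generated name could then be assigned to `bucket_name` before the bucket is created.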

In [None]:
#Set an output path where the trained model will be saved
prefix = 'xgboost-as-a-built-in-algo'
output_path ='s3://{}/{}/output'.format(bucket_name, prefix)
print(output_path)

## 5-Downloading The Dataset And Storing in S3

In [None]:
#Downloading the dataset
import pandas as pd
import urllib.request
try:
    urllib.request.urlretrieve ("https://d1.awsstatic.com/tmt/build-train-deploy-machine-learning-model-sagemaker/bank_clean.27f01fbbdf43271788427f3682996ae29ceca05d.csv", "bank_clean.csv")
    print('Success: downloaded bank_clean.csv.')
except Exception as e:
    print('Data load error: ',e)

try:
    model_data = pd.read_csv('./bank_clean.csv',index_col=0)
    print('Success: Data loaded into dataframe.')
except Exception as e:
    print('Data load error: ',e)

In [None]:
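#Train Test split: shuffle the rows, then use 70% for training and 30% for testing
import numpy as np
shuffled = model_data.sample(frac=1, random_state=1729)
split_at = int(0.7 * len(shuffled))
train_data, test_data = shuffled.iloc[:split_at], shuffled.iloc[split_at:]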
print(train_data.shape, test_data.shape)
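
To make the split above concrete, here is a small sketch of the same idea using only the standard library: shuffle the rows with a fixed seed so the result is reproducible, then cut at 70%. The function name and the stand-in data are hypothetical; the seed and the fraction mirror the values used in the cell above.

```python
import random


def shuffle_split(rows, train_fraction=0.7, seed=1729):
    """Shuffle rows reproducibly, then cut them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for one hundred dataset rows
train_rows, test_rows = shuffle_split(rows)
print(len(train_rows), len(test_rows))
```

Because the seed is fixed, running the split again returns exactly the same partition, which keeps training runs comparable.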

In [None]:
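#Saving the training data to the bucket (label in the first column, no header, as the built-in XGBoost algorithm expects for CSV input)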
import os
pd.concat([train_data['y_yes'], train_data.drop(['y_no', 'y_yes'], axis=1)],axis=1).to_csv('train.csv', index=False, header=False)
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket_name, prefix), content_type='csv')

In [None]:
#Saving the test data to the bucket in the same layout (label first, no header)
pd.concat([test_data['y_yes'], test_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('test.csv', index=False, header=False)
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'test/test.csv')).upload_file('test.csv')
s3_input_test = sagemaker.s3_input(s3_data='s3://{}/{}/test'.format(bucket_name, prefix), content_type='csv')
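
The two cells above move the label column to the front and drop the header, because the built-in XGBoost algorithm expects CSV training data in that layout. The sketch below shows the resulting file format on a tiny, made-up sample; the helper name and the two sample rows are hypothetical, and the real dataset has many more columns.

```python
import csv
import io


def to_xgboost_csv(records, label_key):
    """Write records as header-less CSV with the label in the first column."""
    feature_keys = [key for key in records[0] if key != label_key]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        writer.writerow([record[label_key]] + [record[key] for key in feature_keys])
    return buffer.getvalue()


# Hypothetical two-row sample, for illustration only.
sample = [
    {"age": 35, "campaign": 2, "y_yes": 0},
    {"age": 52, "campaign": 1, "y_yes": 1},
]
csv_text = to_xgboost_csv(sample, label_key="y_yes")
print(csv_text)
```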

## 6-Building the Model (XGBoost Built-in Algorithm)

In [None]:
#Look up the URI of the built-in XGBoost container image for the current region.
#Set repo_version to the XGBoost version you want to use.
container = get_image_uri(boto3.Session().region_name,
                          'xgboost', 
                          repo_version='1.0-1')

In [None]:
#Initialize hyperparameters
hyperparameters = {
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "objective":"binary:logistic",
        "num_round":50
        }

In [None]:
#Construct a SageMaker estimator that calls the xgboost-container
estimator = sagemaker.estimator.Estimator(image_name=container, 
                                          hyperparameters=hyperparameters,
                                          role=sagemaker.get_execution_role(),
                                          train_instance_count=1, 
                                          train_instance_type='ml.m5.2xlarge', 
                                          train_volume_size=5, # 5 GB 
                                          output_path=output_path,
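                                          train_use_spot_instances=True, # use managed spot training to reduce cost
                                          train_max_run=300, # maximum training time, in seconds
                                          train_max_wait=600) # maximum time to wait for spot capacity plus training, in seconds; must be >= train_max_run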
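
The estimator above uses managed spot training with a run limit of 300 seconds and a wait limit of 600 seconds. The sketch below captures two pieces of arithmetic behind those settings: the wait limit has to be at least as large as the run limit, and the savings reported for a spot job compare billable seconds with training seconds. The function names and the job durations are hypothetical.

```python
def check_spot_limits(max_run_seconds, max_wait_seconds):
    """Raise if the wait limit cannot cover the run limit."""
    if max_wait_seconds < max_run_seconds:
        raise ValueError("max_wait must be greater than or equal to max_run")


def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved when billable time is lower than training time."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Limits taken from the estimator definition
check_spot_limits(max_run_seconds=300, max_wait_seconds=600)

# Hypothetical job durations, for illustration only
savings = spot_savings_percent(training_seconds=100, billable_seconds=40)
print(f"{savings:.1f}%")
```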

In [None]:
estimator.fit({'train': s3_input_train,'validation': s3_input_test})

## 7-Prediction of Test Data

In [None]:
from sagemaker.predictor import csv_serializer
#Deploy the trained model to a real-time endpoint (the instance type is an assumption; pick one that fits your budget)
xgb_predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
test_data_array = test_data.drop(['y_no', 'y_yes'], axis=1).values #load the data into an array
xgb_predictor.content_type = 'text/csv' # set the data type for an inference
xgb_predictor.serializer = csv_serializer # set the serializer type
predictions = xgb_predictor.predict(test_data_array).decode('utf-8') # predict!
predictions_array = np.fromstring(predictions[1:], sep=',') # and turn the prediction into an array
print(predictions_array.shape)
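
The last two steps of the cell above turn the endpoint's text response into numbers. A minimal standard-library sketch of that parsing is shown below, assuming the response is a comma-separated string of scores between 0 and 1. The function name, the 0.5 threshold, and the sample payload are hypothetical.

```python
def parse_predictions(payload, threshold=0.5):
    """Turn a comma-separated string of scores into floats and 0/1 labels."""
    scores = [float(value) for value in payload.strip().split(",") if value]
    labels = [1 if score >= threshold else 0 for score in scores]
    return scores, labels


# Hypothetical endpoint response, for illustration only
scores, labels = parse_predictions("0.12,0.87,0.49,0.51")
print(scores, labels)
```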

In [None]:
predictions_array

In [None]:
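#Confusion matrix: rows are the observed classes, columns are the predicted classes
cm = pd.crosstab(index=test_data['y_yes'], columns=np.round(predictions_array), rownames=['Observed'], colnames=['Predicted'])
tn, fp = cm.iloc[0,0], cm.iloc[0,1] # observed "no purchase"
fn, tp = cm.iloc[1,0], cm.iloc[1,1] # observed "purchase"
p = (tp+tn)/(tp+tn+fp+fn)*100 # overall classification rate, in percent
print("\n{0:<20}{1:<4.1f}%\n".format("Overall Classification Rate: ", p))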
print("{0:<15}{1:<15}{2:>8}".format("Predicted", "No Purchase", "Purchase"))
print("Observed")
print("{0:<15}{1:<2.0f}% ({2:<}){3:>6.0f}% ({4:<})".format("No Purchase", tn/(tn+fn)*100,tn, fp/(tp+fp)*100, fp))
print("{0:<16}{1:<1.0f}% ({2:<}){3:>7.0f}% ({4:<}) \n".format("Purchase", fn/(tn+fn)*100,fn, tp/(tp+fp)*100, tp))
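
Besides the overall classification rate, the same four confusion-matrix counts give precision and recall, which are often more informative when one class is rare. The sketch below computes all three; the function name is hypothetical, and the counts are made up for illustration rather than taken from this notebook's results.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts, not results from this notebook
metrics = classification_metrics(tn=90, fp=10, fn=20, tp=30)
print(metrics)
```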

## 8-Deleting the endpoints

In [None]:
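#Delete the endpoint so that it stops incurring charges
sagemaker.Session().delete_endpoint(xgb_predictor.endpoint)
#Delete all objects in the S3 bucket (the bucket itself is kept)
bucket_to_delete = boto3.resource('s3').Bucket(bucket_name)
bucket_to_delete.objects.all().delete()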