# SageMaker - Logistic Regression

### Configure AWS CLI

Refs: 
- <a href='https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-creds'>To create access key for an IAM user</a>
- <a href='https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html'>CLI Configure QuickStart</a>

To run this notebook, you need to configure the AWS CLI with your credentials.

User Policies:
- AdministratorAccess
- AmazonSageMakerFullAccess

Policies for the SageMaker execution role:
- AmazonS3FullAccess
- AmazonSageMakerFullAccess
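
Before running the build script, it can help to confirm that a credentials profile actually exists. The sketch below assumes the shared credentials file that `aws configure` writes by default (`~/.aws/credentials`, INI format). The helper name `configured_profiles` is hypothetical and is not part of the AWS CLI or boto3.

```python
import configparser


def configured_profiles(credentials_path):
    """Return the profile names that define both an access key id and a secret key."""
    parser = configparser.ConfigParser()
    # read() silently skips missing files, so an absent file yields an empty list.
    parser.read(credentials_path)
    required = ("aws_access_key_id", "aws_secret_access_key")
    return [
        section
        for section in parser.sections()
        if all(parser.has_option(section, key) for key in required)
    ]


# Usage (assumed default location of the credentials file):
# from pathlib import Path
# print(configured_profiles(Path.home() / ".aws" / "credentials"))
```

An empty list suggests that `aws configure` has not been run yet, or that the credentials live somewhere else (for example in environment variables), which this sketch does not check.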

After configuring your AWS credentials, run the script build_and_push.sh to build the Docker image and push it to the AWS ECR registry:

> build_and_push.sh iris-logistic-regression

# SageMaker

### Imports

In [51]:
import boto3, re, os
import pandas as pd
from sagemaker import get_execution_role
import sagemaker as sage
from time import gmtime, strftime
from sklearn.model_selection import train_test_split
from sagemaker.debugger import Rule, ProfilerRule, rule_configs
from sagemaker.predictor import csv_serializer

### Split data between train and validation

After splitting the dataset into train and validation sets, upload the files to your S3 bucket.

In [52]:
# Data location
local_path = 'local_test/test_dir/input/data/{}'

# Load dataset
data = pd.read_csv(local_path.format('training/iris.csv'))

# Split train and validation
X_train, X_test = train_test_split(data, test_size=0.2, random_state=1)

# Write datasets
X_train.to_csv(local_path.format('training.csv'), index=False)
X_test.to_csv(local_path.format('validation.csv'), index=False)
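
The upload step itself is not shown in this notebook. A minimal sketch follows, assuming a client that behaves like `boto3.client("s3")` and its `upload_file(Filename, Bucket, Key)` method. The helper name `upload_splits` and the choice of object keys are assumptions for illustration; the bucket name in the usage comment is the one that appears later in this notebook.

```python
import posixpath


def upload_splits(s3_client, bucket, local_files, key_prefix=""):
    """Upload each local file to s3://bucket/key_prefix/<file name>.

    `s3_client` is expected to behave like boto3.client("s3"), whose
    upload_file(Filename, Bucket, Key) method copies a local file to S3.
    Returns the list of S3 URIs that were written.
    """
    uploaded = []
    for local_file in local_files:
        file_name = posixpath.basename(local_file)
        key = posixpath.join(key_prefix, file_name) if key_prefix else file_name
        s3_client.upload_file(local_file, bucket, key)
        uploaded.append("s3://{}/{}".format(bucket, key))
    return uploaded


# Usage (assumes boto3 is installed and the bucket already exists):
# import boto3
# upload_splits(
#     boto3.client("s3"),
#     "logistic-regression",
#     [local_path.format("training.csv"), local_path.format("validation.csv")],
# )
```

Passing the client in as an argument keeps the helper independent of how credentials are resolved.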

### Create a Debugger built-in rule list object

The following code cell shows how to configure a rule object for debugging and profiling. For more information about the Debugger built-in rules, see <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html">List of Debugger Built-in Rules</a>.

In [53]:
built_in_rules = [
    Rule.sagemaker(rule_configs.overfit()),
    ProfilerRule.sagemaker(rule_configs.ProfilerReport()),
]

### Create Session, an estimator and fit the model

In order to use SageMaker to fit our algorithm, we'll create an Estimator that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:

- The **container name.** This is the ECR image built and pushed by build_and_push.sh above.
- The **role.** The IAM role that SageMaker assumes to run the training job.
- The **instance count**, which is the number of machines to use for training.
- The **instance type**, which is the type of machine to use for training.
- The **output path**, which determines where the model artifact will be written.
- The **session**, which is the SageMaker session object created in the next cell.

Then we call fit() on the estimator to train against the data uploaded to S3.

In [54]:
# Create the session
sess = sage.Session()

# IAM role that SageMaker assumes (hard-coded ARN; inside a SageMaker notebook, get_execution_role() returns it)
role = 'arn:aws:iam::465270637007:role/AmazonSageMaker-ExecutionRole'

# Estimator configuration
prefix = "logistic-regression" # S3 bucket name (used as the bucket in output_path below)
data_location = "s3://logistic-regression/iris.csv"
account = sess.boto_session.client("sts").get_caller_identity()["Account"]
region = sess.boto_session.region_name
image = "{}.dkr.ecr.{}.amazonaws.com/iris-logistic-regression:latest".format(account, region)
instance_count = 1
instance_type = "ml.m5.xlarge"
output_path = "s3://{}/output".format(prefix)
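
The `image` and `output_path` strings above follow two fixed formats: an ECR image URI and an S3 URI. The sketch below spells both out with hypothetical helpers (`ecr_image_uri`, `split_s3_uri`); the account id and region in the example calls are placeholders, not values from this notebook.

```python
from urllib.parse import urlparse


def ecr_image_uri(account, region, repository, tag="latest"):
    """Build an ECR image URI in the same format as the `image` string above."""
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account, region, repository, tag)


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {!r}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


# Placeholder account id and region, for illustration only.
print(ecr_image_uri("123456789012", "us-east-1", "iris-logistic-regression"))
# 123456789012.dkr.ecr.us-east-1.amazonaws.com/iris-logistic-regression:latest

print(split_s3_uri("s3://logistic-regression/output"))
# ('logistic-regression', 'output')
```

The second call shows that `logistic-regression` is the bucket part of the output location, and `output` is the key under it.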

In [55]:
# Create estimator
model = sage.estimator.Estimator(
    image_uri=image,
    role=role,
    instance_count=instance_count,
    instance_type=instance_type,
    output_path=output_path,
    sagemaker_session=sess,
    rules=built_in_rules
)
# Fit model
model.fit(data_location)

2021-07-27 23:07:59 Starting - Starting the training job...
2021-07-27 23:08:31 Starting - Launching requested ML instancesOverfit: InProgress
ProfilerReport: InProgress
...
2021-07-27 23:09:02 Starting - Preparing the instances for training......
2021-07-27 23:10:14 Downloading - Downloading input data
2021-07-27 23:10:14 Training - Downloading the training image...
2021-07-27 23:10:43 Uploading - Uploading generated training model
2021-07-27 23:10:43 Completed - Training job completed
Starting the training.
Training complete.
Training seconds: 40
Billable seconds: 40


### Deploy model

In [56]:
predictor = model.deploy(1, instance_type, serializer=csv_serializer)

-----------!

### Predict

In [57]:
predictor.predict([7,3.2,4.7,1.4])

The csv_serializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


b'versicolor\n'
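
To make the request and response above concrete, here is a rough sketch of what travels to and from the endpoint: the feature list becomes one CSV line, and the container answers with raw bytes. This is an illustration only, not the SDK's serializer implementation, and both helper names are hypothetical.

```python
def to_csv_payload(features):
    """Serialize one row of feature values as a single CSV line."""
    return ",".join(str(value) for value in features)


def parse_label(response_body):
    """Decode the raw bytes returned by the endpoint into a label string."""
    return response_body.decode("utf-8").strip()


print(to_csv_payload([7, 3.2, 4.7, 1.4]))  # 7,3.2,4.7,1.4
print(parse_label(b"versicolor\n"))        # versicolor
```

Decoding and stripping the response is useful when the label is compared or stored, since the raw result keeps the trailing newline.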

### Delete Endpoint

In [58]:
sess.delete_endpoint(predictor.endpoint)

The endpoint attribute has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
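
The two warnings in this notebook refer to names that changed in sagemaker>=2: `sagemaker.serializers.CSVSerializer` replaces `csv_serializer`, and `predictor.endpoint_name` replaces `predictor.endpoint`. A sketch of the cleanup step with the newer attribute follows. The helper name `delete_endpoint_v2` is hypothetical, and the SDK calls are kept in comments so the block runs without the SDK installed.

```python
def delete_endpoint_v2(session, predictor):
    """Delete the endpoint using the sagemaker>=2 attribute name.

    `session` is expected to be a sagemaker.Session and `predictor` a
    sagemaker Predictor; `endpoint_name` replaces the deprecated `endpoint`.
    """
    endpoint_name = predictor.endpoint_name
    session.delete_endpoint(endpoint_name)
    return endpoint_name


# Usage with the objects from this notebook (assumes sagemaker>=2):
# from sagemaker.serializers import CSVSerializer
# predictor = model.deploy(1, instance_type, serializer=CSVSerializer())
# delete_endpoint_v2(sess, predictor)
```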


![title](img/sagemaker-painel.png)

### Important refs:
- https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html
- https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
- https://aws.amazon.com/pt/blogs/machine-learning/build-end-to-end-machine-learning-workflows-with-amazon-sagemaker-and-apache-airflow/