## Uploading to Bucket

In [136]:
%pip install -U sagemaker

Note: you may need to restart the kernel to use updated packages.


In [137]:
import boto3
import numpy as np
import pandas as pd
from botocore.exceptions import ClientError  # raised by boto3 on failed API calls

# Create an S3 client
s3_client = boto3.client('s3')

# Specify a unique bucket name
bucket_name = 'crisis-detection-bucket'

In [138]:
# Create the S3 bucket, tolerating the case where we already own it
try:
    s3_client.create_bucket(Bucket=bucket_name)
    print("Bucket created successfully!")
except ClientError as e:
    # Check if the error is due to bucket already existing
    error_code = e.response['Error']['Code']
    if error_code == 'BucketAlreadyOwnedByYou':
        print("Bucket already exists. Continuing with the existing bucket.")
    else:
        print("An error occurred while creating the bucket:", error_code)
        raise

Bucket created successfully!
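
The `create_bucket` call above works as written in `us-east-1`. In other regions, S3 expects a `CreateBucketConfiguration` with a matching `LocationConstraint`. The helper below is a minimal sketch of that rule; `create_bucket_kwargs` is a hypothetical name and is not part of the original notebook.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for s3_client.create_bucket (hypothetical helper)."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a constraint;
    # every other region needs an explicit LocationConstraint.
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# Possible usage with the client defined earlier (not executed here):
# s3_client.create_bucket(**create_bucket_kwargs(bucket_name, s3_client.meta.region_name))
```

Building the arguments separately keeps the `try`/`except` block unchanged and lets the same cell run in any region.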


In [139]:
# Specify the local file path
local_file_path = 'data.csv'

# Create a SageMaker session and upload the file to s3://<bucket_name>/data/
import sagemaker

sagemaker_session = sagemaker.Session()
training_input_path = sagemaker_session.upload_data(
    path=local_file_path, bucket=bucket_name, key_prefix='data'
)
print(training_input_path)

s3://crisis-detection-bucket/data/data.csv


In [140]:
# Read the local file as a DataFrame using pandas
df = pd.read_csv(local_file_path)
df.head()

Unnamed: 0,id,keyword,location,text,target
0,1,,,Our Deeds are the Reason of this #earthquake M...,1
1,4,,,Forest fire near La Ronge Sask. Canada,1
2,5,,,All residents asked to 'shelter in place' are ...,1
3,6,,,"13,000 people receive #wildfires evacuation or...",1
4,7,,,Just got sent this photo from Ruby #Alaska as ...,1


## Train model
The model is trained using the SageMaker SDK's Estimator class. First, get the execution role for training. This role grants the training job access to the S3 bucket from the previous step, where the train and test data are located.

In [141]:
# Use the current execution role for training. It needs access to S3
role = sagemaker.get_execution_role()
print(role)

arn:aws:iam::823616654574:role/LabRole


Next, define the estimator. We use `SKLearn`, an Estimator class specifically designed to train scikit-learn models. In this estimator, we define the following parameters:
1. The script that we want to use to train the model (i.e. `entry_point`). This is the heart of the Script Mode method. Additionally, set the `script_mode` parameter to `True`.
1. The role which allows us access to the S3 bucket containing the train and test data set (i.e. `role`)
1. How many instances we want to use in training (i.e. `instance_count`) and what type of instance we want to use in training (i.e. `instance_type`)
1. Which version of scikit-learn to use (i.e. `framework_version`)
1. Training hyperparameters (i.e. `hyperparameters`); the example below does not set any.

After setting these parameters, the `fit` function is invoked to train the model.
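
The training script itself is not shown in this notebook. As a rough sketch of what a script-mode entry point usually does first, the snippet below reads the locations SageMaker provides through the `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` environment variables (the latter corresponds to a channel named `train`). The `--n-estimators` hyperparameter and the `parse_args` function name are assumptions for illustration, not taken from the original `train.py`.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments a script-mode training job typically receives."""
    parser = argparse.ArgumentParser()
    # SageMaker sets these variables inside the training container; the
    # fallbacks are the conventional container paths.
    parser.add_argument(
        "--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    )
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    # Hypothetical hyperparameter; values passed via `hyperparameters`
    # arrive as command-line flags.
    parser.add_argument("--n-estimators", type=int, default=100)
    # Ignore any extra flags the container may pass.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

A script would then read its data from `args.train` and save the fitted model under `args.model_dir`, which SageMaker packages into `model.tar.gz`.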

In [142]:
# Docs: https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html

from sagemaker.sklearn import SKLearn

# Define the training script and dependencies
train_script = 'train.py'  # Replace with your actual training script name
dependencies = ['utils.py']  # Replace with your required dependencies

# Set up the SKLearn estimator with dependencies
sk_estimator = SKLearn(
    entry_point=train_script,
    dependencies=dependencies,
    role=role,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    framework_version="1.2-1",
    script_mode=True,
    py_version='py3',
    sagemaker_session=sagemaker_session
)

# Train the estimator
print(training_input_path)
sk_estimator.fit({"train": training_input_path})
print(sk_estimator)

s3://crisis-detection-bucket/data/data.csv
Using provided s3_resource


INFO:sagemaker:Creating training-job with name: sagemaker-scikit-learn-2023-05-20-10-13-40-781


2023-05-20 10:13:41 Starting - Starting the training job...
2023-05-20 10:13:56 Starting - Preparing the instances for training......
2023-05-20 10:15:11 Downloading - Downloading input data
2023-05-20 10:15:11 Training - Downloading the training image...
2023-05-20 10:15:37 Uploading - Uploading generated training model
2023-05-20 10:15:31,703 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2023-05-20 10:15:31,706 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-05-20 10:15:31,713 sagemaker_sklearn_container.training INFO     Invoking user training script.
2023-05-20 10:15:31,916 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-05-20 10:15:31,926 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-05-20 10:15:31,937 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus 

In [143]:
model_data = sk_estimator.model_data
image_uri = sk_estimator.image_uri
model_role = sk_estimator.role

print(f"Model Data: {model_data}\nImage URI: {image_uri}\nModel Role: {model_role}")

Model Data: s3://sagemaker-us-east-1-823616654574/sagemaker-scikit-learn-2023-05-20-10-13-40-781/output/model.tar.gz
Image URI: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3
Model Role: arn:aws:iam::823616654574:role/LabRole


## Deploy and test endpoint
After training the model, deploy it as an endpoint by calling the `deploy` method on the scikit-learn estimator. As shown in the code below, you can set the number of instances (`initial_instance_count`) and the instance type (`instance_type`) used to serve the model.

In [144]:
import time

sk_endpoint_name = "sklearn-rf-model" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
sk_predictor = sk_estimator.deploy(
    initial_instance_count=1, instance_type="ml.m5.large", endpoint_name=sk_endpoint_name
)

INFO:sagemaker:Creating model with name: sagemaker-scikit-learn-2023-05-20-10-16-25-448
INFO:sagemaker:Creating endpoint-config with name sklearn-rf-model2023-05-20-10-16-25
INFO:sagemaker:Creating endpoint with name sklearn-rf-model2023-05-20-10-16-25


----!

Once the endpoint is fully deployed, it can be invoked using the [SageMaker Runtime Client](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html) (the method used in the code cell below) or the [Scikit Learn Predictor](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html#scikit-learn-predictor). If you use the latter, make sure to use a [Serializer](https://sagemaker.readthedocs.io/en/stable/api/inference/serializers.html) so that your data is serialized properly.

In [149]:
import json

client = sagemaker_session.sagemaker_runtime_client

# Define the input data in the desired format
input_data = {"Input": ["This is a disaster","Hello world"]}

# Convert the input data to JSON payload
payload = json.dumps(input_data)

# Invoke the endpoint to get the prediction
response = client.invoke_endpoint(
    EndpointName=sk_endpoint_name,
    ContentType='application/json',
    Body=payload
)

# Parse the prediction response
response_body = response['Body'].read().decode('utf-8')
prediction_result = json.loads(response_body)['Output']
print(response_body)


{"Output":[1,0],"Probabilities":[[0.03373283183249365,0.9662671681675064],[0.8086712977030178,0.19132870229698218]]}
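
The response shown above can be unpacked into one record per input text. The sketch below assumes that each row of `Probabilities` is ordered by class label (index 0 for class 0, index 1 for class 1), which is consistent with the output above but is not confirmed by the notebook; `summarize_predictions` is a hypothetical helper name.

```python
import json


def summarize_predictions(texts, response_body):
    """Pair each input text with its predicted label and that label's probability."""
    result = json.loads(response_body)
    rows = []
    for text, label, probs in zip(texts, result["Output"], result["Probabilities"]):
        # Assumption: probs is indexed by class label.
        rows.append({"text": text, "label": label, "probability": probs[label]})
    return rows
```

With the variables from the cell above, `summarize_predictions(input_data["Input"], response_body)` would return one dictionary per text.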



## Cleanup
If the model and endpoint are no longer in use, they should be deleted to save costs and free up resources.

In [146]:
# sk_predictor.delete_model()
# sk_predictor.delete_endpoint()

In [147]:
# import boto3

# # Specify the endpoint name to delete
# endpoint_name = 'sklearn-rf-model2023-05-20-09-06-08'

# # Create a SageMaker client
# sagemaker_client = boto3.client('sagemaker')

# # Delete the endpoint
# sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
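
Deleting only the endpoint leaves its endpoint configuration and model behind. As a minimal sketch, the function below looks up both through the boto3 SageMaker client and removes all three. `delete_endpoint_resources` is a hypothetical helper, and it assumes every production variant references a model by `ModelName`.

```python
def delete_endpoint_resources(sagemaker_client, endpoint_name):
    """Delete an endpoint plus the endpoint config and models behind it."""
    # Look up the dependent resources before anything is deleted.
    endpoint = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sagemaker_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    # Delete the endpoint first, then its configuration, then the models.
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sagemaker_client.delete_model(ModelName=name)
    return config_name, model_names


# Possible usage (not executed here):
# delete_endpoint_resources(boto3.client('sagemaker'), sk_endpoint_name)
```

Passing the client in as an argument keeps the function easy to try against a stub before running it on real resources.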
