In [None]:
!pip install pycaret
!pip install python-dotenv

<h3>1. Introduction </h3>
This template notebook lets users (data scientists and data engineers) train and deploy regression machine learning models without writing all of the code from scratch. <br> 
The user only needs to provide a few inputs:
<li>the data location</li> 
<li>the target (dependent) variable in your dataset and</li>
<li>the kind of machine learning algorithm you want to use.</li>

All of these values are entered in Terraform and are applied automatically in the notebook. 

<h3>2. Imports</h3>

The libraries required for this notebook are imported below.

In [None]:
import boto3, os, tarfile
from sagemaker import get_execution_role
from dotenv import load_dotenv
from load_data import load_data
from split_data import split_data
import importlib
from save_model_to_s3 import save_model_to_s3
from deploy_model_endpoint import deploy_model

<h3>3. Loading Data</h3>
Here the user specifies the location of the data that they would like to use for prediction. A helper function is used to load the data from S3. 

<em>Note: Your data needs to be in an S3 bucket.</em>

In [None]:
# Variables Setup Stage
load_dotenv(".env")
role = get_execution_role()

# Env variables
data_location_s3 = os.getenv("data_location_s3")
algorithm_choice = os.getenv("algorithm_choice")
target = os.getenv("target")
endpoint_name = os.getenv("endpoint_name")
model_name = os.getenv("model_name")
data_location = 's3://{}'.format(data_location_s3)
# Both values below are passed to deploy_model in the deployment step, so they must be defined here
pycaret_ecr_name = os.getenv("pycaret_ecr_name")
instance_type = os.getenv("instance_type")

# print(data_location_s3, algorithm_choice, target, endpoint_name, model_name, data_location, pycaret_ecr_name, instance_type)
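
If a value is missing from the `.env` file, `os.getenv` silently returns `None` and the failure only shows up several cells later. A minimal sketch of an early check follows. The function name `read_required_env` is hypothetical and not part of this template; the variable names are the ones read in the cell above.

```python
import os

# Names the notebook reads from .env (taken from the cell above).
REQUIRED_VARS = (
    "data_location_s3",
    "algorithm_choice",
    "target",
    "endpoint_name",
    "model_name",
)


def read_required_env(names=REQUIRED_VARS, environ=None):
    """Return {name: value} for each name, raising if any is missing or empty."""
    environ = os.environ if environ is None else environ
    values = {name: environ.get(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise KeyError("Missing required settings in .env: " + ", ".join(missing))
    return values
```

Calling `read_required_env()` right after `load_dotenv(".env")` would stop the notebook at this cell with a message naming the missing settings.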

<h3>4. Read and display a sample of data</h3>

In [None]:
# Load data from S3
df = load_data(data_location)
df.head() 
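
The `load_data` helper is not shown in this notebook. The sketch below is one way such a helper could work, assuming the data is a single CSV object; `split_s3_uri` and `load_csv_from_s3` are hypothetical names, and the real helper may behave differently.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/file.csv' into ('bucket', 'path/to/file.csv')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def load_csv_from_s3(uri):
    """Hypothetical stand-in for load_data: read one CSV object into a DataFrame."""
    # Imported lazily so split_s3_uri can be used without the AWS libraries installed.
    import boto3
    import pandas as pd

    bucket, key = split_s3_uri(uri)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return pd.read_csv(body)
```

Validating the URI before calling S3 turns a malformed `data_location_s3` value into an immediate, readable error.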

## Importing Pycaret

<h3>5. Data Exploration</h3>

In [None]:
# Split and shuffle data
train_data, test_data = split_data(df, shuffle=True)
print(train_data, test_data)
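
The `split_data` helper is also defined outside this notebook. As a rough illustration of a shuffled split, the sketch below works on row positions only; the function name, the 20% test fraction and the seed are assumptions, not values taken from the helper.

```python
import random


def split_indices(n_rows, test_fraction=0.2, shuffle=True, seed=123):
    """Return (train_positions, test_positions) for a dataset with n_rows rows."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(n_rows))
    if shuffle:
        # A seeded generator keeps the split reproducible between runs.
        random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]
```

With a pandas DataFrame, the positions could be applied as `df.iloc[train_idx]` and `df.iloc[test_idx]`.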

In [None]:
# Import Pycaret library depending on the algorithm choice
pycaret = importlib.import_module(f"pycaret.{algorithm_choice}")
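
Because the module name comes straight from an environment variable, a typo produces a `ModuleNotFoundError` that does not mention the setting. A minimal sketch of a guarded import follows, assuming the template supports the three tasks named later in this notebook; `pycaret_module_name` and `import_pycaret_task` are hypothetical names.

```python
import importlib

# PyCaret task modules this template is assumed to support.
SUPPORTED_TASKS = {"regression", "classification", "time_series"}


def pycaret_module_name(algorithm_choice):
    """Return the dotted module name for a task, or raise a readable error."""
    task = (algorithm_choice or "").strip().lower()
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"algorithm_choice must be one of {sorted(SUPPORTED_TASKS)}, "
            f"got {algorithm_choice!r}"
        )
    return f"pycaret.{task}"


def import_pycaret_task(algorithm_choice):
    """Import the PyCaret module for the chosen task (requires pycaret)."""
    return importlib.import_module(pycaret_module_name(algorithm_choice))
```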

In [None]:
# Initialize data in PyCaret with all the defined parameters
pycaret.setup(data=train_data, target=target, session_id=123)

<h3>6. Feature Engineering and Model Training</h3>

Here we use PyCaret's AutoML functionality to train the model. It tries a number of machine learning algorithms, depending on the type of problem you are solving (regression, classification or time series), <br>
and then selects the best model by comparing the cross-validated evaluation metrics of the models it tried. 

In [None]:
# Train and evaluate the performance of all estimators available in the model library using cross-validation.
bestModel = pycaret.compare_models()
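
"Best" depends on the metric: error metrics such as RMSE should be minimized, while scores such as R2 or Accuracy should be maximized. The sketch below illustrates that selection rule on per-fold scores. It is not PyCaret's implementation, and `pick_best` is a hypothetical name; in PyCaret itself the ranking metric is chosen with the `sort` argument of `compare_models`.

```python
from statistics import mean

# Metrics where a larger value is better; anything else is treated as an error metric.
HIGHER_IS_BETTER = {"R2", "Accuracy", "AUC", "F1"}


def pick_best(cv_scores, metric):
    """Pick the model with the best mean cross-validation score.

    cv_scores maps a model name to its list of per-fold scores for `metric`.
    Returns (model_name, mean_score).
    """
    averages = {name: mean(scores) for name, scores in cv_scores.items()}
    chooser = max if metric in HIGHER_IS_BETTER else min
    best = chooser(averages, key=averages.get)
    return best, averages[best]
```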

<h3>7. Model Evaluation</h3>

Here we evaluate the performance of the best model, getting some visual representation of hyperparameters, features and other important details about the selected model.


In [None]:
# Evaluate model: Display UI analyzing Hyperparameters, Confusion Matrix, Class Report, etc.
pycaret.evaluate_model(bestModel)

<h3>8. Saving Model for future prediction</h3>

Here we use a function that saves the model to S3. 

In [None]:
# Finalize the model: retrain it on the full dataset, including the hold-out set
final_model = pycaret.finalize_model(bestModel)


In [None]:
# save model locally
pycaret.save_model(final_model, model_name)

In [None]:
# Upload model to s3
save_model_to_s3(model_name, f'{model_name}-model', final_model)
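
The `save_model_to_s3` helper is not shown here. A common pattern for SageMaker is to pack the saved model file into a `.tar.gz` archive and upload that archive. The sketch below assumes that pattern; `package_model` and `upload_archive` are hypothetical names, and the real helper may work differently.

```python
import tarfile
from pathlib import Path


def package_model(model_file, archive_path):
    """Pack a saved model file (for example '<model_name>.pkl') into a .tar.gz archive."""
    model_file = Path(model_file)
    if not model_file.is_file():
        raise FileNotFoundError(f"Model file not found: {model_file}")
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store only the file name so the archive has no local directory structure.
        tar.add(model_file, arcname=model_file.name)
    return Path(archive_path)


def upload_archive(archive_path, bucket, key):
    """Upload the archive to S3 (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("s3").upload_file(str(archive_path), bucket, key)
```

PyCaret's `save_model` writes `<model_name>.pkl` in the working directory, so that file would be the input to `package_model`.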

<h3>9. Deploying the model endpoints</h3> 

Here we use a function that creates the model endpoint in SageMaker. 

In [None]:
# deploy model to sagemaker endpoint
deploy_model(model_name, pycaret_ecr_name, instance_type, endpoint_name, role)

# Clean up stage
## Remove Endpoint and Endpoint Config

In [None]:
# Create a low-level SageMaker service client.
my_region = boto3.session.Session().region_name
sagemaker_client = boto3.client('sagemaker', region_name=my_region)

# Delete endpoint
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)

# Delete endpoint configuration (assumes it was created with the same name as the endpoint)
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name)
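
The cell above only works if the endpoint configuration shares the endpoint's name. A slightly more defensive sketch looks the configuration name up before deleting anything. `delete_endpoint_and_config` is a hypothetical helper; the three client methods it calls are standard boto3 SageMaker operations.

```python
def delete_endpoint_and_config(sagemaker_client, endpoint_name):
    """Delete an endpoint and the endpoint configuration it actually uses.

    Returns the name of the deleted endpoint configuration.
    """
    # Look the configuration name up first; it need not equal the endpoint name,
    # and it can no longer be queried once the endpoint is gone.
    description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]

    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=config_name)
    return config_name
```

It could be called as `delete_endpoint_and_config(sagemaker_client, endpoint_name)` with the client created in the cell above.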