# Automating Training and Deployment with SageMaker Pipelines

1.  **Automated Training Pipeline**: A single SageMaker Pipeline orchestrates data processing, model training, evaluation, and registration of the resulting model.
2.  **Conditional Model Registration**: The pipeline registers a model only if no approved baseline exists yet or if its accuracy on the test set meets a predefined threshold.
3.  **Automated Deployment**: The best model from the pipeline will be deployed to a SageMaker endpoint for real-time inference.

-----

#### Prerequisites

  * A running **SageMaker MLflow Tracking Server**. Use the existing one listed on the SageMaker Studio home page.

-----

### 1\. Setup and Configuration

First, let's install the necessary libraries and configure our environment.

In [None]:
# The SageMaker Studio environment comes with most of these pre-installed.
# This cell ensures all dependencies are present.
!pip install -q boto3 sagemaker mlflow "scikit-learn>=1.0" "pandas>=1.2"

In [None]:
import sys
import subprocess

# Ensure MLflow is installed
try:
    import mlflow
    import sagemaker_mlflow
except ImportError:
    print("Installing MLflow...")
    subprocess.check_call([sys.executable, "-m", "pip", "install",  "boto3==1.37.1", "botocore==1.37.1", "s3transfer", "mlflow==2.22.0", "sagemaker-mlflow==0.1.0"])
    import mlflow
    import sagemaker_mlflow

In [None]:
pip show sagemaker_mlflow

In [None]:
import sagemaker
import boto3
import mlflow
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
import os

# Set up the SageMaker session and AWS clients
sess = sagemaker.Session()
sagemaker_session = sess  # alias; both names refer to the same session
sagemaker_client = boto3.client("sagemaker")
s3_client = boto3.client("s3")

# Default SageMaker bucket, used below for the versioned datasets
s3_bucket = sess.default_bucket()

# ----------------------
# UPDATE THESE VARIABLES
bucket_name = 'sagemaker-iti112-common'  # e.g., 'my-company-sagemaker-bucket'
base_folder = '1234567a@nyp.edu.sg'      # e.g., 'users/my-name'
# ----------------------

# Define the base path for our datasets if needed
data_path = f"s3://{bucket_name}/{base_folder}/mlflow-demo"

# Name of your MLflow Tracking Server (update this to match your own server)
tracking_server_name = "mlflow-server-1234567a"

try:
    response = sagemaker_client.describe_mlflow_tracking_server(
        TrackingServerName=tracking_server_name
    )
    tracking_server_arn = response['TrackingServerArn']
    print(f"Found MLflow Tracking Server ARN: {tracking_server_arn}")
except Exception as e:
    print(f"Could not find tracking server: {e}")
    tracking_server_arn = None

# ARN of your MLflow Tracking Server
# Find this in the SageMaker console or by running `aws sagemaker list-mlflow-tracking-servers`
mlflow_tracking_server_arn = tracking_server_arn

# IAM role for SageMaker execution
role = sagemaker.get_execution_role()

print(f"S3 data path: {data_path}")
print(f"SageMaker Role ARN: {role}")
print(f"MLflow Tracking Server ARN: {mlflow_tracking_server_arn}")

In [None]:
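# Connect to the MLflow Tracking Server
# Set the MLflow tracking URI to your managed server
if tracking_server_arn:
    mlflow.set_tracking_uri(mlflow_tracking_server_arn)
    print("MLflow tracking URI set successfully.")
else:
    # Without a tracking server ARN, MLflow would silently log to a local ./mlruns directory
    raise RuntimeError(f"MLflow tracking server '{tracking_server_name}' was not found; check the name above.")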

# Define an experiment name. If it doesn't exist, MLflow creates it.
experiment_name = "Customer-Churn-Prediction"
mlflow.set_experiment(experiment_name)

print(f"MLflow tracking URI set to: {mlflow.get_tracking_uri()}")
print(f"MLflow experiment set to: '{experiment_name}'")

-----

### 2\. Data Versioning Simulation

Reproducibility is key in MLOps. We'll simulate two versions of a dataset to see how MLflow can track experiments tied to specific data.

In [None]:
# Create and Upload Data Version 1
print("Creating data version 1...")
X_v1, y_v1 = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=42)
df_v1 = pd.DataFrame(X_v1, columns=[f'feature_{i}' for i in range(10)])
df_v1['target'] = y_v1

# Define S3 path for v1
s3_path_v1 = f"s3://{s3_bucket}/{base_folder}/mlops-demo/data/v1/data.csv"
data_v1_s3_uri = os.path.dirname(s3_path_v1)

# Upload to S3 (pandas writes to s3:// URIs through the s3fs package)
print(f"Uploading data v1 to {s3_path_v1}")
df_v1.to_csv(s3_path_v1, index=False)

In [None]:
# Create and Upload Data Version 2 (a slightly modified version)
print("Creating data version 2 with more samples...")
X_v2, y_v2 = make_classification(n_samples=1500, n_features=10, n_informative=5, n_redundant=5, random_state=123) # More samples, different seed
df_v2 = pd.DataFrame(X_v2, columns=[f'feature_{i}' for i in range(10)])
df_v2['target'] = y_v2

# Define S3 path for v2
s3_path_v2 = f"s3://{s3_bucket}/{base_folder}/mlops-demo/data/v2/data.csv"
data_v2_s3_uri = os.path.dirname(s3_path_v2) # Log the directory URI

# Upload to S3 (pandas writes to s3:// URIs through the s3fs package)
print(f"Uploading data v2 to {s3_path_v2}")
df_v2.to_csv(s3_path_v2, index=False)
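
One way to tie an MLflow run to the exact data it used is to record a content fingerprint of the dataset next to its S3 URI. The following is a minimal sketch and not part of the original notebook: `dataset_fingerprint` is a hypothetical helper, the CSV payloads are stand-ins, and the tag names in the closing comment are assumptions.

```python
import hashlib


def dataset_fingerprint(csv_bytes: bytes, length: int = 12) -> str:
    """Return a short, stable SHA-256 fingerprint of a serialized dataset."""
    return hashlib.sha256(csv_bytes).hexdigest()[:length]


# Stand-in CSV payloads. In the notebook these could come from
# df_v1.to_csv(index=False).encode("utf-8") and the v2 equivalent.
csv_v1 = b"feature_0,target\n0.1,0\n0.2,1\n"
csv_v2 = b"feature_0,target\n0.1,0\n0.2,1\n0.3,1\n"

fp_v1 = dataset_fingerprint(csv_v1)
fp_v2 = dataset_fingerprint(csv_v2)

print(f"v1 fingerprint: {fp_v1}")
print(f"v2 fingerprint: {fp_v2}")

# Inside an active MLflow run, the values could then be recorded as tags, e.g.:
#   mlflow.set_tag("data_version", "v2")
#   mlflow.set_tag("data_fingerprint", fp_v2)
```

Because the fingerprint depends only on the bytes, re-uploading an unchanged file yields the same value, while any change to the rows produces a different one.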

-----

### 3\. Creating the SageMaker Pipeline

Now, we'll create the pipeline scripts that will be executed as steps in our SageMaker Pipeline.

#### 3.1. Preprocessing Script

This script takes the raw data, splits it into training and testing sets, and writes them to the processing output directories, which SageMaker then uploads to S3.

In [None]:
%%writefile preprocess.py

import argparse
import pandas as pd
from sklearn.model_selection import train_test_split
import os

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-path", type=str, help="Directory containing data.csv")
    parser.add_argument("--output-train-path", type=str, help="Output directory for train.csv")
    parser.add_argument("--output-test-path", type=str, help="Output directory for test.csv")
    args = parser.parse_args()

    # Use provided paths or fall back to SageMaker defaults
    input_path = args.input_path or "/opt/ml/processing/input"
    output_train_path = args.output_train_path or "/opt/ml/processing/train"
    output_test_path = args.output_test_path or "/opt/ml/processing/test"

    input_file = os.path.join(input_path, "data.csv")
    print(f"Reading input file from {input_file}...")
    df = pd.read_csv(input_file)

    print("Splitting into train/test...")
    train, test = train_test_split(df, test_size=0.2, random_state=42)

    os.makedirs(output_train_path, exist_ok=True)
    os.makedirs(output_test_path, exist_ok=True)

    train_output = os.path.join(output_train_path, "train.csv")
    test_output = os.path.join(output_test_path, "test.csv")

    print(f"Saving train to {train_output}")
    train.to_csv(train_output, index=False)

    print(f"Saving test to {test_output}")
    test.to_csv(test_output, index=False)

    print("Preprocessing complete.")


#### 3.2. Training Script

This script will train a model on the preprocessed data and log the results to MLflow.

In [None]:
%%writefile requirements.txt
boto3==1.28.57
botocore==1.31.85

mlflow
sagemaker-mlflow
scikit-learn
pandas
joblib


In [None]:
%%writefile train.py

import sys
import subprocess

# Ensure MLflow is installed
try:
    import mlflow
    import sagemaker_mlflow
except ImportError:
    print("Installing MLflow...")
    subprocess.check_call([sys.executable, "-m", "pip", "install",  "boto3==1.37.1", "botocore==1.37.1", "s3transfer", "mlflow==2.22.0", "sagemaker-mlflow==0.1.0"])
    import mlflow
    import sagemaker_mlflow

import mlflow.sklearn
import os
import argparse
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import joblib
import glob

parser = argparse.ArgumentParser()
parser.add_argument("--tracking_server_arn", type=str, required=True)
parser.add_argument("--experiment_name", type=str, default="Default")
parser.add_argument("--model_output_path", type=str, default="/opt/ml/model")
parser.add_argument("-C", "--C", type=float, default=0.5)
args, _ = parser.parse_known_args()

# Load training data
train_path = glob.glob("/opt/ml/input/data/train/*.csv")[0]
df = pd.read_csv(train_path)
X = df.drop("target", axis=1)
y = df["target"]

# Set up MLflow
mlflow.set_tracking_uri(args.tracking_server_arn)
mlflow.set_experiment(args.experiment_name)

with mlflow.start_run() as run:
    mlflow.log_param("C", args.C)
    model = LogisticRegression(C=args.C)
    model.fit(X, y)
    # Note: this is accuracy on the training data; held-out accuracy is computed in evaluate.py
    acc = accuracy_score(y, model.predict(X))
    mlflow.log_metric("accuracy", acc)

    mlflow.sklearn.log_model(sk_model=model, artifact_path="model")

    os.makedirs(args.model_output_path, exist_ok=True)
    joblib.dump(model, os.path.join(args.model_output_path, "model.joblib"))
    with open(os.path.join(args.model_output_path, "run_id.txt"), "w") as f:
        f.write(run.info.run_id)

    print(f"Training complete. Accuracy: {acc:.4f}")
    print(f"MLflow Run ID: {run.info.run_id}")
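
SageMaker script mode passes each entry of the estimator's `hyperparameters` dictionary to the entry point as `--name value` command-line arguments, which is why `train.py` reads them with `argparse`. The sketch below simulates that hand-off locally with an explicit argument list. The ARN is a made-up placeholder, `parse_hyperparameters` is a hypothetical wrapper, and `--extra_flag` is an invented flag used only to show how `parse_known_args` treats arguments it does not recognize.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse the same flags train.py expects; unknown flags are returned separately."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--tracking_server_arn", type=str, required=True)
    parser.add_argument("--experiment_name", type=str, default="Default")
    parser.add_argument("--model_output_path", type=str, default="/opt/ml/model")
    parser.add_argument("-C", "--C", type=float, default=0.5)
    return parser.parse_known_args(argv)


# Simulated command line, shaped like the hyperparameters dict in the estimator.
simulated_argv = [
    "--tracking_server_arn",
    "arn:aws:sagemaker:ap-southeast-1:111122223333:mlflow-tracking-server/example",  # placeholder
    "--experiment_name", "Customer-Churn-Prediction",
    "--C", "0.25",
    "--extra_flag", "ignored",  # hypothetical flag the parser does not define
]

args, unknown = parse_hyperparameters(simulated_argv)
print(f"C={args.C}, experiment={args.experiment_name}, output={args.model_output_path}")
print(f"Unrecognized arguments: {unknown}")
```

Using `parse_known_args` rather than `parse_args` keeps the script from exiting when it receives an argument it does not declare.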


#### 3.3. Evaluation Script

This script evaluates the trained model on the test set, checks the Model Registry for an approved baseline model, and writes both results to an evaluation report.

In [None]:
%%writefile evaluate.py
import argparse
import pandas as pd
from sklearn.metrics import accuracy_score
import joblib
import os
import json
import boto3
import tarfile

if __name__ == "__main__":
    # --- Parse Arguments ---
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", type=str, required=True, help="Path to the directory containing the model.tar.gz file.")
    parser.add_argument("--test-path", type=str, required=True, help="Path to the directory containing test.csv.")
    parser.add_argument("--output-path", type=str, required=True, help="Path to save the evaluation.json report.")
    parser.add_argument("--model-package-group-name", type=str, required=True, help="Name of the SageMaker Model Package Group.")
    parser.add_argument("--region", type=str, required=True, help="The AWS region for creating the boto3 client.")
    args = parser.parse_args()

    # --- Extract and Load Model ---
    # SageMaker packages models in a .tar.gz file. We need to extract it first.
    model_archive_path = os.path.join(args.model_path, 'model.tar.gz')
    print(f"Extracting model from archive: {model_archive_path}")
    with tarfile.open(model_archive_path, "r:gz") as tar:
        tar.extractall(path=args.model_path)

    # Load the model using joblib
    model_file_path = os.path.join(args.model_path, "model.joblib")
    if not os.path.exists(model_file_path):
        raise FileNotFoundError(f"Model file 'model.joblib' not found after extraction in: {args.model_path}")
    
    print(f"Loading model from: {model_file_path}")
    model = joblib.load(model_file_path)

    # --- Prepare Data and Evaluate ---
    test_file_path = os.path.join(args.test_path, "test.csv")
    if not os.path.exists(test_file_path):
        raise FileNotFoundError(f"Test data not found: {test_file_path}")
    
    test_df = pd.read_csv(test_file_path)
    X_test = test_df.drop("target", axis=1)
    y_test = test_df["target"]
    
    print("Running predictions on the test dataset.")
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    report = {"accuracy": accuracy}
    print(f"Calculated accuracy: {accuracy:.4f}")

    # --- Check for Existing Baseline Model in SageMaker Model Registry ---
    print(f"Checking for baseline model in region: {args.region}")
    sagemaker_client = boto3.client("sagemaker", region_name=args.region)
    try:
        response = sagemaker_client.list_model_packages(
            ModelPackageGroupName=args.model_package_group_name,
            ModelApprovalStatus="Approved",
            SortBy="CreationTime",
            SortOrder="Descending",
            MaxResults=1,
        )
        # If the list is not empty, an approved model already exists
        report["baseline_exists"] = len(response["ModelPackageSummaryList"]) > 0
        if report["baseline_exists"]:
            print(f"An approved baseline model was found in '{args.model_package_group_name}'.")
        else:
            print(f"No approved baseline model was found in '{args.model_package_group_name}'.")

    except sagemaker_client.exceptions.ClientError as e:
        # If the ModelPackageGroup doesn't exist, there is no baseline
        if "ResourceNotFound" in str(e):
            report["baseline_exists"] = False
            print(f"Model Package Group '{args.model_package_group_name}' not found. Assuming no baseline exists.")
        else:
            raise

    # --- Write Final Report ---
    os.makedirs(args.output_path, exist_ok=True)
    report_path = os.path.join(args.output_path, "evaluation.json")
    with open(report_path, "w") as f:
        json.dump(report, f, indent=4)
        
    print(f"✅ Evaluation complete. Report written to: {report_path}")
    print("Evaluation Report:")
    print(json.dumps(report, indent=4))
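
A training job packages the contents of its model directory into `model.tar.gz`, which is why `evaluate.py` extracts the archive before loading `model.joblib`. The sketch below reproduces that round trip in a temporary directory so the expected layout can be checked locally. It makes some assumptions: the file contents are placeholder bytes rather than a real joblib model, and the directory names are illustrative.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for the directory the training script writes to (/opt/ml/model).
    model_dir = os.path.join(workdir, "model")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "model.joblib"), "wb") as f:
        f.write(b"placeholder bytes, not a real model")
    with open(os.path.join(model_dir, "run_id.txt"), "w") as f:
        f.write("example-run-id")

    # Package the files at the archive root, the layout evaluate.py assumes.
    archive_path = os.path.join(workdir, "model.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            tar.add(os.path.join(model_dir, name), arcname=name)

    # Extract the archive the same way evaluate.py does.
    extract_dir = os.path.join(workdir, "processing_model")
    os.makedirs(extract_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=extract_dir)

    extracted = sorted(os.listdir(extract_dir))
    print(f"Files available after extraction: {extracted}")
```

If `model.joblib` were nested inside a subdirectory of the archive, the existence check in `evaluate.py` would fail, so the flat layout matters.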

#### 3.4. Pipeline Definition

Now, we'll define the SageMaker Pipeline using the scripts we just created.

In [None]:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput
from sagemaker.workflow.properties import PropertyFile
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.conditions import ConditionEquals, ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet, Join
from sagemaker.workflow.parameters import ParameterFloat, ParameterString
from sagemaker.model_metrics import ModelMetrics, FileSource

# Parameters
model_package_group_name = "ChurnPredictorModels"
processing_instance_type = "ml.t3.medium"
training_instance_type = "ml.m5.large"
experiment_name_param = ParameterString(name="ExperimentName", default_value="Customer-Churn-Prediction")
accuracy_threshold_param = ParameterFloat(name="AccuracyThreshold", default_value=0.85)

preprocessor = ScriptProcessor(
    image_uri=sagemaker.image_uris.retrieve("sklearn", sess.boto_region_name, "1.2-1"),
    command=[
        "python3",
    ],
    instance_type=processing_instance_type,
    instance_count=1,
    base_job_name="preprocess-data",
    role=role,
)

step_preprocess = ProcessingStep(
    name="PreprocessData",
    processor=preprocessor,
    inputs=[ProcessingInput(source=data_v2_s3_uri, destination="/opt/ml/processing/input")],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code="preprocess.py",
)

# Training Step
sklearn_estimator = SKLearn(
    entry_point="train.py", 
    framework_version="1.2-1",
    instance_type=training_instance_type,
    role=role,
    hyperparameters={
        "tracking_server_arn": mlflow_tracking_server_arn,
        "experiment_name": experiment_name_param,
        "C": 0.5,
        "model_output_path": "/opt/ml/model",
    },
    py_version="py3",
    requirements="requirements.txt" 
)

step_train = TrainingStep(
    name="TrainModel",
    estimator=sklearn_estimator,
    inputs={
        "train": TrainingInput(
            s3_data=step_preprocess.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        )
    },
)

# Evaluation Step
evaluation_processor = ScriptProcessor(
    image_uri=sagemaker.image_uris.retrieve("sklearn", sess.boto_region_name, "1.2-1"),
    command=['python3'],
    instance_type=processing_instance_type,
    instance_count=1,
    base_job_name="evaluate-model",
    role=role,
)

evaluation_report = PropertyFile(
    name="EvaluationReport", output_name="evaluation", path="evaluation.json"
)

step_eval = ProcessingStep(
    name="EvaluateModel",
    processor=evaluation_processor,
    inputs=[
        ProcessingInput(
            source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
        ProcessingInput(
            source=step_preprocess.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            destination="/opt/ml/processing/test",
        ),
    ],
    outputs=[ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation")],
    code="evaluate.py",  # SageMaker will handle uploading and running this script
    job_arguments=[  # Pass arguments here instead of in command
        "--model-path", "/opt/ml/processing/model",
        "--test-path", "/opt/ml/processing/test",
        "--output-path", "/opt/ml/processing/evaluation",
        "--model-package-group-name", model_package_group_name,
        "--region", sess.boto_region_name,  # Use the session's region instead of a hard-coded one
    ],
    property_files=[evaluation_report],
)


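model_metrics_report = ModelMetrics(
    model_statistics=FileSource(
        # Point at the evaluation.json file itself, not just the output prefix
        s3_uri=Join(
            on="/",
            values=[
                step_eval.properties.ProcessingOutputConfig.Outputs["evaluation"].S3Output.S3Uri,
                "evaluation.json",
            ],
        ),
        content_type="application/json"
    )
)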


# RegisterModel step (always defined, but executed conditionally)
step_register_new = RegisterModel(
    name="RegisterNewModel",
    estimator=sklearn_estimator,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.t2.medium"],
    transform_instances=["ml.m5.large"],
    model_package_group_name=model_package_group_name,
    model_metrics=model_metrics_report,
    approval_status="PendingManualApproval",
)

step_register_better_model = RegisterModel(
    name="RegisterBetterModel",
    estimator=sklearn_estimator,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.t2.medium"],
    transform_instances=["ml.m5.large"],
    model_package_group_name=model_package_group_name,
    model_metrics=model_metrics_report,
    approval_status="PendingManualApproval",
)


# Conditions: accuracy >= threshold, or no approved baseline model exists
cond_accuracy = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="accuracy"
    ),
    right=accuracy_threshold_param
)

cond_no_registered = ConditionEquals(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="baseline_exists" # Check the key added to the report
    ),
    right=False # Condition is TRUE if baseline_exists is False
)

# Inner step: runs only when a baseline exists; registers the model if accuracy meets the threshold
step_cond_accuracy = ConditionStep(
    name="CheckAccuracy",
    conditions=[cond_accuracy],
    if_steps=[step_register_better_model],  # Register the model if accuracy >= threshold
    else_steps=[],  # Otherwise do nothing
)

# Outer step: checks first whether an approved baseline model exists
step_cond_no_registered = ConditionStep(
    name="CheckIfModelExists",
    conditions=[cond_no_registered],
    if_steps=[step_register_new],  # No baseline yet: register the model unconditionally
    else_steps=[step_cond_accuracy],  # Baseline exists: fall through to the accuracy check
)

# Define Pipeline
pipeline = Pipeline(
    name="ChurnPredictionPipeline",
    parameters=[experiment_name_param, accuracy_threshold_param],
    steps=[step_preprocess, step_train, step_eval, step_cond_no_registered] # Use the 'no registered model' check as the primary condition step
)


pipeline.upsert(role_arn=role)
execution = pipeline.start()
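
The two nested `ConditionStep`s can be hard to follow from the pipeline definition alone. As a rough illustration, the same decision can be written as a plain Python function. This is only a sketch of the branching logic: `should_register` is a hypothetical helper that does not exist in the pipeline, and the sample reports are invented values.

```python
def should_register(accuracy: float, threshold: float, baseline_exists: bool) -> bool:
    """Mirror the nested ConditionStep logic in plain Python."""
    if not baseline_exists:
        # CheckIfModelExists: no approved baseline, so register unconditionally.
        return True
    # CheckAccuracy: a baseline exists, so require accuracy >= threshold.
    return accuracy >= threshold


# Invented evaluation reports, shaped like evaluation.json.
sample_reports = (
    {"accuracy": 0.70, "baseline_exists": False},
    {"accuracy": 0.70, "baseline_exists": True},
    {"accuracy": 0.90, "baseline_exists": True},
)

for report in sample_reports:
    decision = should_register(report["accuracy"], 0.85, report["baseline_exists"])
    print(report, "-> register" if decision else "-> skip")
```

Note that the first model is registered regardless of its accuracy, because the threshold is only consulted once an approved baseline exists.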


### 4\. Automated Deployment with a Second Pipeline

Now, let's create a separate pipeline that is triggered when a registered model is approved. This pipeline will deploy the model to a SageMaker endpoint.

#### 4.1. Deployment Script

This script will take the registered model and deploy it.

In [None]:
%%writefile deploy.py
import subprocess
import sys

# --- Install required packages ---
def install_dependencies():
    """Install pinned versions of the packages this script needs."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", "boto3==1.28.57", "botocore==1.31.57", "numpy==1.24.1", "sagemaker"])

# Ensure the sagemaker SDK is installed before importing
try:
    import sagemaker
except ImportError:
    print("sagemaker SDK not found. Installing now...")
    install_dependencies()
    import sagemaker

import argparse
import boto3
from sagemaker.model import ModelPackage

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Accept the registered model's ARN instead of the S3 data path
    parser.add_argument("--model-package-arn", type=str, required=True)
    parser.add_argument("--role", type=str, required=True)
    parser.add_argument("--endpoint-name", type=str, required=True)
    parser.add_argument("--region", type=str, required=True)
    args = parser.parse_args()

    boto_session = boto3.Session(region_name=args.region)
    sagemaker_session = sagemaker.Session(boto_session=boto_session)

    # Create a SageMaker Model object directly from the Model Package ARN
    model = ModelPackage(
        model_package_arn=args.model_package_arn,
        role=args.role,
        sagemaker_session=sagemaker_session,
    )

    # Deploy the model to an endpoint
    print(f"Deploying registered model from ARN to endpoint: {args.endpoint_name}")
    model.deploy(
        initial_instance_count=1,
        instance_type="ml.t2.medium",
        endpoint_name=args.endpoint_name,
        # Update endpoint if it already exists
        update_endpoint=True
    )
    print("Deployment complete.")


#### 4.2. Deployment Pipeline Definition

This pipeline will be triggered when a model in the model package group is approved.

In [None]:
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.processing import ScriptProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.parameters import ParameterString
import sagemaker

# Define Parameters for the deployment pipeline
# This will be provided by the EventBridge trigger
model_package_arn_param = ParameterString(name="ModelPackageArn", default_value="")
role_param = ParameterString(name="ExecutionRole", default_value=role)
endpoint_name_param = ParameterString(name="EndpointName", default_value="churn-predictor-endpoint")

# Create a ScriptProcessor for deployment (same scikit-learn image as the training pipeline)
deploy_processor = ScriptProcessor(
    image_uri=sagemaker.image_uris.retrieve("sklearn", sess.boto_region_name, version="1.2-1"),
    command=["python3"],
    instance_type="ml.t3.medium",
    instance_count=1,
    role=role_param,
    base_job_name="deploy-registered-model"
)

# Define the deployment step that takes the model ARN as an argument
step_deploy = ProcessingStep(
    name="DeployRegisteredModel",
    processor=deploy_processor,
    code="deploy.py",
    job_arguments=[
        # The model package ARN is a pipeline parameter filled in by the EventBridge trigger
        "--model-package-arn", model_package_arn_param,
        "--role", role_param,
        "--endpoint-name", endpoint_name_param,
        "--region", sess.boto_region_name,
    ]
)

# Define the independent deployment pipeline
deploy_pipeline = Pipeline(
    name="DeployChurnModelPipeline",
    parameters=[model_package_arn_param, role_param, endpoint_name_param],
    steps=[step_deploy]
)

# Create or update the pipeline definition
# Capture the response which contains the ARN
response = deploy_pipeline.upsert(role_arn=role)

# Extract the ARN from the response dictionary
pipeline_arn = response['PipelineArn']

print(f"Deployment pipeline ARN: {pipeline_arn}")

In [None]:
import boto3
import json

# Initialize the EventBridge client
events_client = boto3.client("events")

# Define the event pattern to listen for
# This pattern triggers when a model package in your group has its status changed to "Approved"
event_pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {
        "ModelPackageGroupName": [model_package_group_name],  # Defined in section 3.4
        "ModelApprovalStatus": ["Approved"]
    }
}

# Define the target for the rule (our deployment pipeline)
# We need to map the event's detail to the pipeline's parameters
target = {
    "Id": "DeployChurnModelPipelineTarget",
    "Arn": pipeline_arn, # The ARN of the pipeline we just created
    "RoleArn": role, # The execution role for the pipeline
    "SageMakerPipelineParameters": {
        "PipelineParameterList": [
            {
                # Map the ARN from the event to the pipeline's "ModelPackageArn" parameter
                "Name": "ModelPackageArn",
                "Value": "$.detail.ModelPackageArn"
            }
        ]
    }
}

# Create or update the EventBridge rule
try:
    username_lower = "1234567a@nyp.edu.sg".lower().replace("@", "-")
    rule_name = f"{username_lower}-TriggerChurnDeploymentOnApproval"
    print(f"Creating or updating EventBridge rule: {rule_name}")
    response = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(event_pattern),
        State="ENABLED",
        Description="Triggers the SageMaker pipeline to deploy a churn model upon approval."
    )
    
    # Add the pipeline as a target for the rule
    events_client.put_targets(Rule=rule_name, Targets=[target])
    print("EventBridge rule created successfully!")
    print("Now, when a model is approved in the Model Registry, the deployment pipeline will trigger automatically.")

except Exception as e:
    print(f"Error creating rule: {e}")
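
To see which events the rule reacts to, it can help to check a sample event against the pattern locally. The sketch below is a simplified illustration and not EventBridge's actual matching engine: `matches` and `resolve_path` are hypothetical helpers that cover only exact-value matching and simple `$.a.b` paths, and the sample event uses a made-up account ID and ARN.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: nested dicts recurse, lists mean 'any of these values'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def resolve_path(event: dict, path: str):
    """Resolve a simple '$.a.b' style path against the event."""
    value = event
    for part in path.split(".")[1:]:  # skip the leading '$'
        value = value[part]
    return value


pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {
        "ModelPackageGroupName": ["ChurnPredictorModels"],
        "ModelApprovalStatus": ["Approved"],
    },
}

# Invented sample event; the ARN and account ID are placeholders.
approved_event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {
        "ModelPackageGroupName": "ChurnPredictorModels",
        "ModelApprovalStatus": "Approved",
        "ModelPackageArn": "arn:aws:sagemaker:ap-southeast-1:111122223333:model-package/ChurnPredictorModels/1",
    },
}
pending_event = {
    **approved_event,
    "detail": {**approved_event["detail"], "ModelApprovalStatus": "PendingManualApproval"},
}

print("Approved event matches:", matches(pattern, approved_event))
print("Pending event matches:", matches(pattern, pending_event))
print("ModelPackageArn passed to the pipeline:", resolve_path(approved_event, "$.detail.ModelPackageArn"))
```

Since the training pipeline registers models as `PendingManualApproval`, the deployment pipeline only starts after someone approves the model version in the Model Registry.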

### View CloudWatch Logs

You can inspect an endpoint's CloudWatch logs directly from the notebook. The example below reads the logs of a previously deployed endpoint.

In [None]:
import boto3

# Enter the name of your SageMaker endpoint
endpoint_name = "sklearn-churn-predictor-v3"

# The log group is created based on the endpoint name
log_group_name = f"/aws/sagemaker/Endpoints/{endpoint_name}"

# Create a CloudWatch Logs client
logs_client = boto3.client("logs")

print(f"Searching for logs in: {log_group_name}\n")

try:
    # Find all log streams in the log group, ordered by the most recent
    response = logs_client.describe_log_streams(
        logGroupName=log_group_name,
        orderBy='LastEventTime',
        descending=True
    )

    log_streams = response.get("logStreams", [])

    if not log_streams:
        print("No log streams found. The endpoint might not have processed any requests yet.")
    
    # Loop through each stream and print its recent log events
    for stream in log_streams:
        stream_name = stream['logStreamName']
        print(f"--- Logs from stream: {stream_name} ---")

        # Get log events from the stream
        log_events = logs_client.get_log_events(
            logGroupName=log_group_name,
            logStreamName=stream_name,
            startFromHead=False,  # False gets recent logs first
            limit=50  # Get up to 50 recent log events
        )
        
        # Print events in chronological order
        for event in reversed(log_events.get("events", [])):
            print(event['message'].strip())
        
        print("-" * (len(stream_name) + 24), "\n")

except logs_client.exceptions.ResourceNotFoundException:
    print(f"Error: Log group '{log_group_name}' was not found.")
    print("Please check the endpoint name and ensure it has been invoked.")
except Exception as e:
    print(f"An error occurred: {e}")
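
Each event returned by `get_log_events` also carries a `timestamp` field in milliseconds since the Unix epoch, which the cell above does not print. A small formatter can make the output easier to correlate with requests. This is a sketch under assumptions: `format_log_event` is a hypothetical helper and the sample events are invented.

```python
from datetime import datetime, timezone


def format_log_event(event: dict) -> str:
    """Prefix a log message with its timestamp converted to UTC."""
    ts = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
    return f"{ts:%Y-%m-%d %H:%M:%S}Z  {event['message'].strip()}"


# Invented events shaped like the entries in log_events["events"].
sample_events = [
    {"timestamp": 1700000000000, "message": "Model loaded\n"},
    {"timestamp": 1700000005000, "message": "Received inference request\n"},
]

for event in sample_events:
    print(format_log_event(event))
```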

Besides the EventBridge rule created above, which starts the deployment pipeline whenever a model version in the "ChurnPredictorModels" model package group is approved, you can also start the deployment pipeline manually.

### Cleanup

To avoid incurring further charges, you should delete the resources you've created.

In [None]:
# # Delete the SageMaker endpoint (default endpoint name of the deployment pipeline)
# sagemaker_client.delete_endpoint(EndpointName="churn-predictor-endpoint")

# # Delete the SageMaker Pipelines
# pipeline.delete()
# deploy_pipeline.delete()

# # Delete the MLflow Tracking Server from the SageMaker console.