# 9 – CI/CD Pipeline for Extreme Precipitation Prediction

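This notebook defines a SageMaker Pipeline
that automates model training and gates
downstream steps on a condition.
Processing and model-registration classes are imported,
but the pipeline defined here contains only
a training step and a condition step.

The condition step is driven by the `AccuracyThreshold` parameter,
and the pipeline is started twice with different threshold values
to demonstrate both successful and failed runs.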


## Import Required Libraries and Initialize AWS Session


In [1]:
import boto3
import sagemaker
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep, ProcessingStep
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.parameters import ParameterFloat
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.processing import ScriptProcessor
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.pipeline_context import PipelineSession

sess = PipelineSession()
bucket = sess.default_bucket()
region = boto3.Session().region_name
role = sagemaker.get_execution_role()

print("Bucket:", bucket)
print("Region:", region)


sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
Bucket: sagemaker-us-east-1-083422367993
Region: us-east-1


## Define Pipeline Parameters


In [2]:
accuracy_threshold = ParameterFloat(
    name="AccuracyThreshold",
    default_value=0.70
)
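
The parameter's default applies unless a value is passed when the pipeline is started. The following plain-Python sketch models that resolution; `resolve_parameters` is a hypothetical helper, not part of the SageMaker SDK, and the range check and rejection of unknown names are assumptions added for illustration.

```python
def resolve_parameters(defaults, overrides=None):
    """Return the effective parameter values for one execution (illustrative model)."""
    overrides = overrides or {}
    # Assumption: a misspelled parameter name should fail loudly rather than be ignored.
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown pipeline parameter(s): {sorted(unknown)}")
    resolved = {**defaults, **overrides}
    # Assumption: accuracy is a fraction, so the threshold must lie in [0, 1].
    threshold = resolved["AccuracyThreshold"]
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"AccuracyThreshold must be in [0, 1], got {threshold}")
    return resolved


defaults = {"AccuracyThreshold": 0.70}
print(resolve_parameters(defaults))                               # default applies
print(resolve_parameters(defaults, {"AccuracyThreshold": 0.95}))  # override wins
```

Validating overrides before calling `pipeline.start` keeps a typo in a parameter name from surfacing only after the execution has been submitted.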


## Define Training Estimator


In [3]:
estimator = SKLearn(
    entry_point="train.py",
    source_dir="src",
    role=role,
    instance_type="ml.m5.large",
    framework_version="1.2-1"
)


## Define Training Step


In [4]:
training_step = TrainingStep(
    name="TrainExtremePrecipModel",
    estimator=estimator,
    inputs={
        "train": f"s3://{bucket}/ghcn-extreme/train/train.csv",
        "val": f"s3://{bucket}/ghcn-extreme/train/val.csv"
    }
)
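
Each key in `inputs` becomes a named channel. Inside the training container, SageMaker exposes a channel's data under `/opt/ml/input/data/<channel>` and sets an `SM_CHANNEL_<NAME>` environment variable pointing at it. The contents of `train.py` are not shown in this notebook, so the helper below is only a hedged sketch of how an entry point might locate the `train` and `val` channels; `channel_dir` is a hypothetical name.

```python
import os


def channel_dir(channel, environ=None):
    """Return the local directory for a training channel.

    Prefers the SM_CHANNEL_<NAME> variable and falls back to the
    conventional /opt/ml/input/data/<channel> location.
    """
    environ = os.environ if environ is None else environ
    default = f"/opt/ml/input/data/{channel}"
    return environ.get(f"SM_CHANNEL_{channel.upper()}", default)


# Outside a training container the variables are unset, so the fallback is used.
print(channel_dir("train", environ={}))
print(channel_dir("val", environ={"SM_CHANNEL_VAL": "/tmp/val"}))
```

Passing the environment in as an argument makes the helper easy to exercise locally before the script is run as a training job.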


## Define Conditional Evaluation Step

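The intent is to register the model only if
the validation accuracy meets the defined threshold.
As written, the condition compares the `AccuracyThreshold` parameter
against the constant 0.70, and both `if_steps` and `else_steps` are empty,
so no measured accuracy is read and no model is registered.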


In [5]:
condition_step = ConditionStep(
    name="AccuracyCheck",
    conditions=[
        ConditionGreaterThanOrEqualTo(
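            left=accuracy_threshold,  # pipeline parameter, not a measured metric
            right=0.70
        )
    ],
    if_steps=[],    # steps to run when the condition holds (e.g. model registration)
    else_steps=[]   # steps to run otherwise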
)
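
To make the branching behaviour concrete, the sketch below models the `>=` comparison in plain Python; it does not use the SageMaker SDK. It contrasts the comparison as configured above (threshold parameter against the constant 0.70) with a gate on a measured accuracy. The value `0.82` is a hypothetical metric chosen for illustration; in a real pipeline it would typically come from an evaluation step's report, for example via a property file and `JsonGet`.

```python
def greater_than_or_equal_to(left, right):
    """Plain-Python stand-in for the >= comparison a condition step performs."""
    return left >= right


def branch(condition_holds):
    """Name the branch a condition step would take."""
    return "if_steps" if condition_holds else "else_steps"


# Comparison as configured above: the threshold parameter against 0.70.
for threshold in (0.70, 0.95):
    print("parameter check:", threshold, branch(greater_than_or_equal_to(threshold, 0.70)))

# Gate on a measured metric instead (hypothetical accuracy value).
measured_accuracy = 0.82
for threshold in (0.70, 0.95):
    print("metric check:", threshold, branch(greater_than_or_equal_to(measured_accuracy, threshold)))
```

Under the first comparison both threshold values take the `if_steps` branch; under the second, a threshold of 0.95 sends an accuracy of 0.82 to `else_steps`, which is the behaviour a failed-run demonstration relies on.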


## Define Pipeline


In [6]:
pipeline = Pipeline(
    name="GHCNExtremePrecipPipeline",
    parameters=[accuracy_threshold],
    steps=[training_step, condition_step],
    sagemaker_session=sess
)


## Create or Update Pipeline


In [8]:
pipeline.upsert(role_arn=role)
print("Pipeline created or updated.")




Pipeline created or updated.


## Start Pipeline Execution (Successful Run)


In [9]:
execution = pipeline.start(parameters={"AccuracyThreshold": 0.70})
execution.describe()


{'PipelineArn': 'arn:aws:sagemaker:us-east-1:083422367993:pipeline/GHCNExtremePrecipPipeline',
 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:083422367993:pipeline/GHCNExtremePrecipPipeline/execution/jkezlw5fqftj',
 'PipelineExecutionDisplayName': 'execution-1771218597199',
 'PipelineExecutionStatus': 'Executing',
 'CreationTime': datetime.datetime(2026, 2, 16, 5, 9, 57, 115000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2026, 2, 16, 5, 9, 57, 115000, tzinfo=tzlocal()),
 'CreatedBy': {'UserProfileArn': 'arn:aws:sagemaker:us-east-1:083422367993:user-profile/d-0gav49yfqza7/default-1769827337106',
  'UserProfileName': 'default-1769827337106',
  'DomainId': 'd-0gav49yfqza7',
  'IamIdentity': {'Arn': 'arn:aws:sts::083422367993:assumed-role/LabRole/SageMaker',
   'PrincipalId': 'AROARG3C4KD4RAAKRMZXF:SageMaker'}},
 'LastModifiedBy': {'UserProfileArn': 'arn:aws:sagemaker:us-east-1:083422367993:user-profile/d-0gav49yfqza7/default-1769827337106',
  'UserProfileName': 'defa

## Start Pipeline Execution (Failed Run Demonstration)


In [10]:
failed_execution = pipeline.start(parameters={"AccuracyThreshold": 0.95})
failed_execution.describe()


{'PipelineArn': 'arn:aws:sagemaker:us-east-1:083422367993:pipeline/GHCNExtremePrecipPipeline',
 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:083422367993:pipeline/GHCNExtremePrecipPipeline/execution/j2efnrqa9pr2',
 'PipelineExecutionDisplayName': 'execution-1771218601924',
 'PipelineExecutionStatus': 'Executing',
 'CreationTime': datetime.datetime(2026, 2, 16, 5, 10, 1, 856000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2026, 2, 16, 5, 10, 1, 856000, tzinfo=tzlocal()),
 'CreatedBy': {'UserProfileArn': 'arn:aws:sagemaker:us-east-1:083422367993:user-profile/d-0gav49yfqza7/default-1769827337106',
  'UserProfileName': 'default-1769827337106',
  'DomainId': 'd-0gav49yfqza7',
  'IamIdentity': {'Arn': 'arn:aws:sts::083422367993:assumed-role/LabRole/SageMaker',
   'PrincipalId': 'AROARG3C4KD4RAAKRMZXF:SageMaker'}},
 'LastModifiedBy': {'UserProfileArn': 'arn:aws:sagemaker:us-east-1:083422367993:user-profile/d-0gav49yfqza7/default-1769827337106',
  'UserProfileName': 'defa
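
Both outputs above were captured immediately after `pipeline.start`, so the status is still `Executing` and the final outcome is not visible. A minimal polling sketch follows, assuming only that `describe()` returns a dict containing `PipelineExecutionStatus`, as shown above. `wait_for_completion` and `fake_describe` are hypothetical names; the fake stands in for `execution.describe` so the example runs without AWS access.

```python
import time

# Statuses after which a pipeline execution no longer changes.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_completion(describe, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll `describe()` until the execution reaches a terminal status."""
    for _ in range(max_polls):
        status = describe()["PipelineExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"No terminal status after {max_polls} polls")


# Stand-in for execution.describe: two in-progress responses, then a final one.
_fake_statuses = iter(["Executing", "Executing", "Succeeded"])


def fake_describe():
    return {"PipelineExecutionStatus": next(_fake_statuses)}


print(wait_for_completion(fake_describe, sleep=lambda seconds: None))
```

In the notebook, the same helper could be called as `wait_for_completion(execution.describe)` and `wait_for_completion(failed_execution.describe)` to compare the two final statuses.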

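## Visualize Pipeline Graph

`pipeline.definition()` returns the pipeline's JSON definition as a string; the rendered graph itself is available in SageMaker Studio.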


In [11]:
pipeline.definition()




'{"Version": "2020-12-01", "Metadata": {}, "Parameters": [{"Name": "AccuracyThreshold", "Type": "Float", "DefaultValue": 0.7}], "PipelineExperimentConfig": {"ExperimentName": {"Get": "Execution.PipelineName"}, "TrialName": {"Get": "Execution.PipelineExecutionId"}}, "Steps": [{"Name": "TrainExtremePrecipModel", "Type": "Training", "Arguments": {"AlgorithmSpecification": {"TrainingInputMode": "File", "TrainingImage": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3"}, "OutputDataConfig": {"S3OutputPath": "s3://sagemaker-us-east-1-083422367993/"}, "StoppingCondition": {"MaxRuntimeInSeconds": 86400}, "ResourceConfig": {"VolumeSizeInGB": 30, "InstanceCount": 1, "InstanceType": "ml.m5.large"}, "RoleArn": "arn:aws:iam::083422367993:role/LabRole", "InputDataConfig": [{"DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": "s3://sagemaker-us-east-1-083422367993/ghcn-extreme/train/train.csv", "S3DataDistributionType": "FullyReplicated"}}, "ChannelNam
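
The raw definition string is hard to read. The sketch below parses a definition with `json` and lists its parameters and steps. Because the output above is truncated, `sample` is a trimmed stand-in: the parameter and the training step are copied from the visible output, while the `AccuracyCheck` entry and its `Condition` type are assumptions based on the condition step defined earlier. `summarize_definition` is a hypothetical helper.

```python
import json


def summarize_definition(definition_json):
    """Return ({parameter: default}, [(step name, step type), ...])."""
    definition = json.loads(definition_json)
    parameters = {
        p["Name"]: p.get("DefaultValue") for p in definition.get("Parameters", [])
    }
    steps = [(s["Name"], s["Type"]) for s in definition.get("Steps", [])]
    return parameters, steps


# Trimmed stand-in for the string returned by pipeline.definition().
sample = json.dumps({
    "Version": "2020-12-01",
    "Parameters": [{"Name": "AccuracyThreshold", "Type": "Float", "DefaultValue": 0.7}],
    "Steps": [
        {"Name": "TrainExtremePrecipModel", "Type": "Training"},
        {"Name": "AccuracyCheck", "Type": "Condition"},  # assumed; not visible above
    ],
})

parameters, steps = summarize_definition(sample)
print(parameters)
for name, step_type in steps:
    print(f"{name}: {step_type}")
```

In the notebook, `summarize_definition(pipeline.definition())` would give the same overview for the real pipeline.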

## Summary

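The CI/CD pipeline automates model training and gates further steps on a condition.

It demonstrates:

- An automated training step
- Conditional logic driven by the `AccuracyThreshold` parameter
- An execution started with the default threshold (0.70)
- An execution started with a stricter threshold (0.95)
- End-to-end orchestration with `upsert` and `start`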

This completes the operational ML system lifecycle.
