# Employee Attrition Prediction
## Notebook 03: Model Training using AWS SageMaker

This notebook covers:
- Loading processed datasets from Amazon S3
- Training an XGBoost classification model using SageMaker
- Saving trained model artifacts to S3

No data preprocessing or feature engineering is performed here.


Basic Imports

In [1]:
import boto3
import sagemaker
from sagemaker.image_uris import retrieve   # resolves the built-in algorithm container URI
from sagemaker.inputs import TrainingInput  # describes an S3 data channel for training

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


SageMaker Session Setup

In [2]:
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sagemaker_session.default_bucket()
region = boto3.Session().region_name

print("Bucket:", bucket)
print("Region:", region)
print("Role:", role)

Bucket: sagemaker-us-east-1-952878272094
Region: us-east-1
Role: arn:aws:iam::952878272094:role/LabRole


Defining S3 Paths for Processed Data

In [3]:
processed_prefix = "employee-attrition/processed"

train_data_path = f"s3://{bucket}/{processed_prefix}/train.csv"
val_data_path   = f"s3://{bucket}/{processed_prefix}/validation.csv"

print("Train path:", train_data_path)
print("Validation path:", val_data_path)

Train path: s3://sagemaker-us-east-1-952878272094/employee-attrition/processed/train.csv
Validation path: s3://sagemaker-us-east-1-952878272094/employee-attrition/processed/validation.csv


Preparing SageMaker Training Inputs

In [4]:
train_input = TrainingInput(
    s3_data=train_data_path,
    content_type="text/csv"
)

validation_input = TrainingInput(
    s3_data=val_data_path,
    content_type="text/csv"
)
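SageMaker's built-in XGBoost algorithm expects CSV training data with no header row and the label in the first column. Since this notebook does no preprocessing, it can help to sanity-check a sample of the processed file before launching a job. The following is a minimal sketch using only the standard library; the function name `check_xgboost_csv` and the sample column values are hypothetical, and the 0/1 label check assumes the attrition target was already encoded as binary in the earlier notebook.

```python
import csv
import io


def check_xgboost_csv(text, max_rows=1000):
    """Return a list of problems found in CSV text meant for SageMaker XGBoost.

    Checks up to max_rows rows: every field parses as a float (so there is
    no header row), every row has the same width, and the first column
    (the label) is 0 or 1 for binary classification.
    """
    problems = []
    width = None
    reader = csv.reader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        if row_number > max_rows:
            break
        if not row:
            continue  # skip blank lines
        try:
            values = [float(field) for field in row]
        except ValueError:
            problems.append(f"row {row_number}: non-numeric field (header row?)")
            continue
        if width is None:
            width = len(values)  # first numeric row fixes the expected width
        elif len(values) != width:
            problems.append(
                f"row {row_number}: expected {width} columns, got {len(values)}"
            )
        if values[0] not in (0.0, 1.0):
            problems.append(f"row {row_number}: label {values[0]} is not 0 or 1")
    return problems


# Hypothetical samples: label first, then three made-up feature columns.
good = "1,34,5200.0,0\n0,41,7300.5,1\n"
bad = "Attrition,Age,Income,Overtime\n1,34,5200.0,0\n2,41,7300.5\n"

print(check_xgboost_csv(good))
print(check_xgboost_csv(bad))
```

In practice the text would come from the first few kilobytes of `train.csv` and `validation.csv`, downloaded from the paths defined above.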

Getting XGBoost Container Image

In [5]:
xgb_image = retrieve(
    framework="xgboost",
    region=region,
    version="1.7-1"
)

print(xgb_image)

683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.7-1


Defining XGBoost Estimator

In [6]:
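xgb_estimator = sagemaker.estimator.Estimator(
    image_uri=xgb_image,
    role=role,
    instance_count=1,               # single-instance training
    instance_type="ml.m5.large",
    volume_size=30,                 # training volume size in GB
    max_run=3600,                   # stop the job after at most 1 hour (seconds)
    output_path=f"s3://{bucket}/employee-attrition/model-artifacts",  # S3 prefix for model artifacts
    sagemaker_session=sagemaker_session
)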
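SageMaker writes the trained model under `output_path` in a subfolder named after the training job, as `<output_path>/<job-name>/output/model.tar.gz`. The sketch below builds that URI as a plain string so the artifact location can be predicted or logged. The helper name `expected_model_uri`, the bucket, and the job name are placeholders for illustration. After `fit()` completes, the estimator's `model_data` attribute holds the actual location.

```python
def expected_model_uri(output_path, job_name):
    """Build the S3 URI where SageMaker stores a training job's model archive."""
    # Tolerate a trailing slash on output_path so the URI has no double slash.
    return f"{output_path.rstrip('/')}/{job_name}/output/model.tar.gz"


# Placeholder bucket and job name, for illustration only.
example_uri = expected_model_uri(
    "s3://example-bucket/employee-attrition/model-artifacts",
    "example-training-job",
)
print(example_uri)
```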

Setting up Hyperparameters

In [7]:
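xgb_estimator.set_hyperparameters(
    objective="binary:logistic",  # predict the probability of the positive class
    eval_metric="auc",            # area under the ROC curve
    num_round=200,                # number of boosting rounds
    max_depth=5,                  # maximum depth of each tree
    eta=0.1,                      # learning rate (step-size shrinkage)
    subsample=0.8,                # fraction of training rows sampled per round
    colsample_bytree=0.8          # fraction of features sampled per tree
)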
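With `eval_metric="auc"`, the training log reports the area under the ROC curve. AUC can be read as the probability that a randomly chosen positive example (here, an employee who left) receives a higher predicted score than a randomly chosen negative one, with ties counted as half. The sketch below computes that quantity directly from pairs, to illustrate what the metric measures. It is not how XGBoost computes it internally, and the labels and scores are made up.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = left the company) and predicted probabilities.
example_labels = [1, 0, 1, 0, 0]
example_scores = [0.9, 0.3, 0.4, 0.5, 0.1]
print(auc(example_labels, example_scores))
```

This pairwise version is quadratic in the number of examples, so it suits small spot checks rather than full validation sets.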

Starting Training

In [8]:
xgb_estimator.fit(
    {
        "train": train_input,
        "validation": validation_input
    }
)

INFO:sagemaker:Creating training-job with name: sagemaker-xgboost-2026-02-19-21-45-42-632


2026-02-19 21:45:42 Starting - Starting the training job...
2026-02-19 21:46:08 Starting - Preparing the instances for training...
2026-02-19 21:46:30 Downloading - Downloading input data......
2026-02-19 21:47:21 Downloading - Downloading the training image......
2026-02-19 21:48:47 Training - Training image download completed. Training in progress.
  import pkg_resources
[2026-02-19 21:48:31.557 ip-10-2-250-160.ec2.internal:7 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2026-02-19 21:48:31.638 ip-10-2-250-160.ec2.internal:7 INFO profiler_config_parser.py:111] User has disabled profiler.
[2026-02-19:21:48:31:INFO] Imported framework sagemaker_xgboost_container.training
[2026-02-19:21:48:31:INFO] Failed to parse hyperparameter eval_metric value auc to Json.
Returning the value itself
[2026-02-19:21:48:31:INFO] Failed to parse hyperparameter objective value binary:logistic to Json.
Returning the value itself
[