GitHub Actions Lab 2 — Automated Model Training & Calibration

This lab demonstrates how to use GitHub Actions to automate the full lifecycle of a machine-learning model, from training, evaluation, and versioning to probability calibration, using the UCI Bank Marketing dataset.

It follows an MLOps-style continuous-integration workflow, where each code push automatically retrains and calibrates the model, stores versioned artifacts, and logs metrics for reproducibility.


Problem Statement

Predict whether a bank client will subscribe to a term deposit (yes or no) after being contacted by a marketing campaign.

Goal:
Build a supervised machine-learning model that classifies customers into two groups:

  • 1 → yes – customer subscribed
  • 0 → no – customer did not subscribe

Type: Binary Classification
Algorithm Used: Random Forest Classifier
Metric Focus: Accuracy • F1 Score • Brier Score


📊 Dataset — UCI Bank Marketing

Source: UCI Machine Learning Repository – Bank Marketing
File Used: bank.csv
Rows: ≈ 4,521
Columns: 17

Feature                      Description
age                          Age of the client
job                          Type of job (e.g., admin, technician)
marital                      Marital status
education                    Education level
default                      Has credit in default?
balance                      Average yearly balance in €
housing                      Has a housing loan?
loan                         Has a personal loan?
contact                      Contact communication type
day, month                   Last contact day and month
duration                     Last contact duration (seconds)
campaign                     Number of contacts during this campaign
pdays, previous, poutcome    Past campaign details
y                            Target: subscribed (yes / no)

Project Structure

GitHub_Lab_LAB2/
├── .github/
│   └── workflows/
│       ├── model_retraining_on_push.yml
│       └── model_calibration_on_push.yml
├── src/
│   ├── train_model.py
│   ├── evaluate_model.py
│   └── calibrate.py
├── models/
│   ├── model_<timestamp>.pkl
│   └── calibrated_model_<timestamp>.pkl
├── metrics/
│   ├── metrics.txt
│   └── calibration_metrics.txt
├── requirements.txt
└── README.md

Local Pipeline Overview

1. Data Loading & Preprocessing

  • Automatically downloads & extracts the UCI Bank dataset
  • Label-encodes categorical features
  • Splits data into training and testing sets
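
A minimal sketch of what this step can look like, assuming the archive URL below and standard pandas/scikit-learn helpers (the actual src/train_model.py may differ):

import io
import zipfile

import pandas as pd
import requests
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Assumed location of the UCI archive; bank.csv inside it is semicolon-separated.
URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip"

resp = requests.get(URL, timeout=60)
with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
    with zf.open("bank.csv") as f:
        df = pd.read_csv(f, sep=";")

# Label-encode every categorical column, including the target y ("no" -> 0, "yes" -> 1).
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

X, y = df.drop(columns=["y"]), df["y"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)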

2. Model Training

  • Trains a RandomForestClassifier(n_estimators=200)
  • Saves the trained model → /models/model_<timestamp>.pkl
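
Continuing the sketch above (X_train and y_train come from the preprocessing step; random_state is an assumption):

import pickle
import time

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Timestamped filename so every retraining run produces a new versioned artifact.
stamp = time.strftime("%Y%m%d_%H%M%S")
with open(f"models/model_{stamp}.pkl", "wb") as f:
    pickle.dump(model, f)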

3. Model Evaluation

  • Loads test data and computes Accuracy & F1 Score
  • Writes results → /metrics/metrics.txt
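
Again as a sketch, reusing model, X_test, and y_test from the previous steps:

from sklearn.metrics import accuracy_score, f1_score

y_pred = model.predict(X_test)

# Plain-text metrics log, one line per metric.
with open("metrics/metrics.txt", "w") as f:
    f.write(f"Accuracy = {accuracy_score(y_test, y_pred):.4f}\n")
    f.write(f"F1 = {f1_score(y_test, y_pred):.4f}\n")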

4. Model Calibration

  • Uses CalibratedClassifierCV(method='isotonic')
  • Compares Brier Score & F1 before vs after calibration
  • Saves calibrated model → /models/calibrated_model_<timestamp>.pkl
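
A sketch of the calibration step under the same assumptions (cv=5 is an assumption; the lab's calibrate.py may instead calibrate a prefit model on a holdout set):

import pickle
import time

from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import accuracy_score, brier_score_loss, f1_score

# Isotonic regression remaps the forest's raw probabilities onto observed frequencies.
calibrated = CalibratedClassifierCV(model, method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

for name, clf in [("Before", model), ("After", calibrated)]:
    proba = clf.predict_proba(X_test)[:, 1]
    pred = clf.predict(X_test)
    print(f"=== {name} Calibration ===")
    print(f"Brier Score = {brier_score_loss(y_test, proba):.4f}")
    print(f"F1 = {f1_score(y_test, pred):.4f}")
    print(f"Accuracy = {accuracy_score(y_test, pred):.4f}")

stamp = time.strftime("%Y%m%d_%H%M%S")
with open(f"models/calibrated_model_{stamp}.pkl", "wb") as f:
    pickle.dump(calibrated, f)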

GitHub Actions Workflows

Model Retraining Workflow (model_retraining_on_push.yml)

Trigger: Automatically runs on every push to the main branch

Steps:

  1. Checkout repository
  2. Set up Python environment
  3. Install dependencies from requirements.txt
  4. Run train_model.py to train the Random Forest model
  5. Run evaluate_model.py to compute metrics
  6. Commit new models and metrics back to the repository
  7. Push updated artifacts to main
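
A hedged sketch of what this workflow file can look like (action versions, Python version, and commit details are assumptions, not copied from the repo):

name: Model Retraining

on:
  push:
    branches: [main]

jobs:
  retrain:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: python src/train_model.py
      - run: python src/evaluate_model.py
      - name: Commit and push artifacts
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add models/ metrics/
          git commit -m "Retrain model [skip ci]" || echo "No changes to commit"
          git push

The [skip ci] marker in the commit message keeps the artifact commit from re-triggering the workflow and looping indefinitely.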

Example Output:

models/model_20250120_143022.pkl
metrics/metrics.txt

Model Calibration Workflow (model_calibration_on_push.yml)

Trigger: Automatically runs after the model retraining workflow completes

Steps:

  1. Checkout repository with latest trained model
  2. Set up Python environment
  3. Install dependencies
  4. Run calibrate.py to perform isotonic calibration
  5. Compare Brier Score before and after calibration
  6. Save calibrated model and calibration metrics
  7. Commit and push calibrated artifacts
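
The "runs after retraining completes" trigger is typically expressed with workflow_run; a minimal sketch, assuming the retraining workflow above is named "Model Retraining":

name: Model Calibration

on:
  workflow_run:
    workflows: ["Model Retraining"]
    types: [completed]

jobs:
  calibrate:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: main   # pick up the freshly committed model
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: python src/calibrate.py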

Example Output:

models/calibrated_model_20250120_143445.pkl
metrics/calibration_metrics.txt

Getting Started

Prerequisites

  • Python 3.8 or higher
  • Git
  • GitHub account (for Actions)

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/GitHub_Lab_LAB2.git
cd GitHub_Lab_LAB2
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Local Execution

Train the Model

python src/train_model.py

Evaluate the Model

python src/evaluate_model.py

Calibrate the Model

python src/calibrate.py

Dependencies

pandas>=1.3.0
numpy>=1.21.0
scikit-learn>=1.0.0
requests>=2.26.0

Metrics Tracked

Model Performance Metrics

  • Accuracy: Overall classification accuracy
  • F1 Score: Harmonic mean of precision and recall
  • Brier Score: Measures calibration quality (lower is better)
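
Concretely, the Brier score is the mean squared error between the predicted probabilities and the 0/1 outcomes:

Brier = (1/N) * Σ (p_i − y_i)²,  where p_i is the predicted probability of class 1 and y_i ∈ {0, 1}

A model can be accurate yet poorly calibrated (for example, always predicting 0.99 for the majority class), which is why the Brier score is tracked alongside Accuracy and F1.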

Before and After Calibration

=== Before Calibration ===
Brier Score = 0.0656
F1 = 0.4167
Accuracy = 0.9072

=== After Calibration ===
Brier Score = 0.0621
F1 = 0.4430
Accuracy = 0.9083

CI/CD Workflow Visualization

graph LR
    A[Code Push] --> B[Trigger Retraining Workflow]
    B --> C[Train Model]
    C --> D[Evaluate Model]
    D --> E[Commit Artifacts]
    E --> F[Trigger Calibration Workflow]
    F --> G[Calibrate Model]
    G --> H[Compare Metrics]
    H --> I[Commit Calibrated Model]
    I --> J[Complete]