In [1]:
import os
os.chdir("/content/drive/MyDrive/Books To Read/ML Engineering Notebook+Docs")

%pwd

'/content/drive/MyDrive/Books To Read/ML Engineering Notebook+Docs'

In [2]:
!ls

'CI Cd with github actions for MLops.ipynb'   mlops-cicd-lab
'MLflow tutorial doc.ipynb'


# **The MLOps Project Structure & Unit Testing**

1. **The Standard MLOps Directory Structure**
```text
my-mlops-project/
├── .github/              # Where GitHub Actions live (we will touch this later)
├── data/                 # Local data (gitignored!)
├── src/                  # Source code (The "Software")
│   ├── __init__.py
│   ├── preprocessing.py  # Functions to clean data
│   ├── train.py          # Functions to train models
│   └── predict.py        # Inference logic
├── tests/                # Automated tests
│   ├── __init__.py
│   └── test_preprocessing.py
├── requirements.txt      # Dependencies
└── README.md
```
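
The layout above can also be scaffolded from Python. The following is a minimal sketch, not part of the original lab: the helper name `scaffold` and the `DIRECTORIES`/`FILES` constants are hypothetical, and the files are created empty in a temporary directory so nothing on disk is overwritten.

```python
import tempfile
from pathlib import Path

# Directories and files taken from the tree above.
DIRECTORIES = [".github", "data", "src", "tests"]
FILES = [
    "src/__init__.py",
    "src/preprocessing.py",
    "src/train.py",
    "src/predict.py",
    "tests/__init__.py",
    "tests/test_preprocessing.py",
    "requirements.txt",
    "README.md",
]


def scaffold(root: Path) -> list:
    """Create the project skeleton under root and return the created file paths."""
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file_name in FILES:
        path = root / file_name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # empty placeholder file
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Demonstration in a throwaway directory.
demo_root = Path(tempfile.mkdtemp()) / "my-mlops-project"
for created in scaffold(demo_root):
    print(created)
```

The notebook cells below build the same structure step by step with `mkdir` and `touch`.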

**Step 1: Setup and Your First Test**
-
Create a new folder `mlops-cicd-lab` and set up the structure:

In [3]:
!mkdir mlops-cicd-lab

!ls

mkdir: cannot create directory ‘mlops-cicd-lab’: File exists
'MLflow tutorial doc.ipynb'   mlops-cicd-lab   Untitled0.ipynb


In [3]:
os.chdir("mlops-cicd-lab")
%pwd

'/content/drive/MyDrive/Books To Read/ML Engineering Notebook+Docs/mlops-cicd-lab'

In [5]:
!mkdir src tests

mkdir: cannot create directory ‘src’: File exists
mkdir: cannot create directory ‘tests’: File exists


In [6]:
!ls

requirements.txt  src  tests


In [7]:
!touch src/__init__.py tests/__init__.py

**Step 2: Create a Logic File**
-
Create `src/preprocessing.py`. We will write a simple function that cleans data.

In [8]:
%%writefile src/preprocessing.py
# src/preprocessing.py
import pandas as pd

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """
    Removes rows with missing values and resets index.
    """
    df_clean = df.dropna().reset_index(drop=True)
    return df_clean

def normalize_column(df: pd.DataFrame, col_name: str) -> pd.DataFrame:
    """Divides a column by its maximum value."""
    # Skip the division when the column maximum is 0 to avoid dividing by zero
    if df[col_name].max() != 0:
        df[col_name] = df[col_name] / df[col_name].max()
    return df

Overwriting src/preprocessing.py
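
Note that `normalize_column` assigns into the DataFrame it receives, so the caller's data changes as well. If that side effect is unwanted, one possible variant works on a copy. This is a sketch only; the name `normalize_column_copy` is hypothetical and not part of the lab's `src/preprocessing.py`.

```python
import pandas as pd


def normalize_column_copy(df: pd.DataFrame, col_name: str) -> pd.DataFrame:
    """Return a copy of df in which col_name is divided by its maximum value."""
    result = df.copy()
    max_value = result[col_name].max()
    # A maximum of 0 would mean dividing by zero, so the column is left as-is.
    if max_value != 0:
        result[col_name] = result[col_name] / max_value
    return result


original = pd.DataFrame({"test_col": [10, 20, 50, 25]})
normalized = normalize_column_copy(original, "test_col")

print(normalized["test_col"].tolist())  # normalized values
print(original["test_col"].tolist())    # the input DataFrame is untouched
```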


**Step 3: Create a Test File**
-
Create `tests/test_preprocessing.py`. We will test whether `clean_data` actually removes NaNs.

In [13]:
%%writefile tests/test_preprocessing.py
# tests/test_preprocessing.py
import pandas as pd
import pytest
from src.preprocessing import clean_data, normalize_column

def test_clean_data_removes_nulls():
    # 1. Arrange: Create dummy dirty data
    raw_data = pd.DataFrame({
        "feature1": [1.0, 2.0, None, 4.0],
        "feature2": ["a", "b", "c", None]
    })

    # 2. Act: Apply the function
    clean_df = clean_data(raw_data)

    # 3. Assert: Check expectations
    # We expect 2 rows to be removed (index 2 and 3)
    assert len(clean_df) == 2
    assert clean_df.isnull().sum().sum() == 0

def test_normalize_column_max_is_one():
    # 1. Arrange
    df = pd.DataFrame({'test_col': [10, 20, 50, 25]})

    # 2. Act
    df_norm = normalize_column(df, 'test_col')

    # 3. Assert
    # The max value (50) should become 1.0
    assert df_norm['test_col'].max() == 1.0
    # The min value (10) should become 0.2
    assert df_norm['test_col'].min() == 0.2

Overwriting tests/test_preprocessing.py
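
The assertions above compare floats with `==`. That holds for a single division such as 10 / 50, but it can break once a value passes through further arithmetic. A minimal sketch of a tolerance-based comparison with the standard library's `math.isclose` follows (pytest provides `pytest.approx` for the same purpose); the numbers are illustrative only.

```python
import math

# A single division matches its decimal literal exactly.
print(10 / 50 == 0.2)  # True

# Adding two normalized values introduces rounding error.
a = 1 / 10
b = 2 / 10
print(a + b == 0.3)              # False
print(math.isclose(a + b, 0.3))  # True
```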


**Step 4: Install Dependencies & Run**
-
You need pandas and pytest.

In [10]:
%%writefile requirements.txt
pandas
pytest

Overwriting requirements.txt


In [11]:
!pip install -r requirements.txt



Run the tests:

In [14]:
!pytest

platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0
rootdir: /content/drive/MyDrive/Books To Read/ML Engineering Notebook+Docs/mlops-cicd-lab
plugins: langsmith-0.4.47, typeguard-4.4.4, anyio-4.11.0
collected 2 items

tests/test_preprocessing.py ..                                           [100%]



# **The CI Pipeline (Continuous Integration)**

**The Concept**:

> CI (Continuous Integration) means: "Every time I save/push code, a robot runs the tests for me."
If the robot says "Fail," I am not allowed to merge the code.
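
What the "robot" does can be pictured as running commands and checking their exit codes. The sketch below is an illustration under assumptions, not how GitHub Actions is implemented: `run_checks` is a hypothetical helper, and the two commands are stand-ins for a real test command such as `pytest`.

```python
import subprocess
import sys


def run_checks(commands: list) -> bool:
    """Run each command in order; return False at the first non-zero exit code."""
    for command in commands:
        result = subprocess.run(command, capture_output=True)
        if result.returncode != 0:
            return False
    return True


# Stand-in commands: one that succeeds and one that fails.
passing = [sys.executable, "-c", "assert 1 + 1 == 2"]
failing = [sys.executable, "-c", "assert 1 + 1 == 3"]

print(run_checks([passing]))           # True: merge allowed
print(run_checks([passing, failing]))  # False: merge blocked
```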

**Step 1: Git Initialization**
-
Initialize git in your folder:


In [15]:
%pwd

'/content/drive/MyDrive/Books To Read/ML Engineering Notebook+Docs/mlops-cicd-lab'

In [16]:
!git init

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint:
hint: 	git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: 	git branch -m <name>
Initialized empty Git repository in /content/drive/MyDrive/Books To Read/ML Engineering Notebook+Docs/mlops-cicd-lab/.git/


Create a .gitignore file (Crucial! Never push junk files):

In [17]:
!touch .gitignore

Add this content to .gitignore:

In [18]:
%%writefile .gitignore
__pycache__/
*.pyc
.pytest_cache/
.ipynb_checkpoints/
venv/
.env
data/

Overwriting .gitignore


**Step 2: Defining the GitHub Action**
-
GitHub looks for YAML files in `.github/workflows/`.

Create the directory:

In [19]:
!mkdir -p .github/workflows

Create a file named `ci-pipeline.yml`:

In [20]:
!touch .github/workflows/ci-pipeline.yml

**Step 3: Writing the Workflow**
-
Paste this into `ci-pipeline.yml`. This is your first Robot.

In [21]:
%%writefile .github/workflows/ci-pipeline.yml
name: CI Pipeline

# Trigger: Run this pipeline on pushes to main and on pull requests targeting main
on:
  push:
    branches: ["main"]
  pull_request:
    branches: ["main"]

# Job: The actual set of tasks
jobs:
  build-and-test:
    runs-on: ubuntu-latest  # The robot is a Virtual Machine (VM) running Linux

    steps:
    # 1. Check out the code from the repo onto the VM
    - name: Checkout code
      uses: actions/checkout@v3

    # 2. Install Python 3.9 on the VM
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.9'

    # 3. Install dependencies
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt

    # 4. Run the tests
    - name: Run Unit Tests
      run: pytest

Overwriting .github/workflows/ci-pipeline.yml
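
The `on:` block of the workflow acts as a filter on events. As a rough mental model (a simplified sketch, since GitHub also supports glob patterns and other filters), it can be written as a lookup. The names `TRIGGERS` and `workflow_runs` are hypothetical.

```python
# Mirrors the `on:` block of ci-pipeline.yml: event name -> allowed branches.
TRIGGERS = {
    "push": ["main"],
    "pull_request": ["main"],
}


def workflow_runs(event: str, branch: str) -> bool:
    """Return True when the event/branch pair passes the trigger filter.

    For "push" the branch is the one being pushed to; for "pull_request"
    it is the branch the pull request targets (its base branch).
    """
    return branch in TRIGGERS.get(event, [])


print(workflow_runs("push", "main"))           # True
print(workflow_runs("push", "feature/x"))      # False
print(workflow_runs("pull_request", "main"))   # True
print(workflow_runs("release", "main"))        # False
```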


**Step 4: Activate the Robot**

1. Create a **New Repository** on your GitHub account (name it mlops-cicd-lab).

2. Push your local code to this remote repository:
```bash
git add .
git commit -m "Initial commit with tests and CI pipeline"
git branch -M main
git remote add origin https://github.com/<YOUR_USERNAME>/mlops-cicd-lab.git
git push -u origin main
```

3. Go to your GitHub Repository page.
4. Click on the "Actions" tab.

In [22]:
%pwd

'/content/drive/MyDrive/Books To Read/ML Engineering Notebook+Docs/mlops-cicd-lab'

In [23]:
!git add .

In [24]:
!git status

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   .github/workflows/ci-pipeline.yml
	new file:   .gitignore
	new file:   requirements.txt
	new file:   src/__init__.py
	new file:   src/preprocessing.py
	new file:   tests/__init__.py
	new file:   tests/test_preprocessing.py



Create a personal access token (PAT) on GitHub:
- Go to: **GitHub** → **Settings** → **Developer settings** → **Personal access tokens** → **Fine-grained tokens**
- Create a token with permissions:

1. Contents → Read & Write
2. Metadata → Read
3. Actions → Read & Write
4. Workflow → Read & Write

In [11]:
from google.colab import userdata
EMAIL = userdata.get('EMAIL')
GITHUB_USER_NAME = userdata.get('GITHUB_USER_NAME')
TOKEN = userdata.get('GITHUB_TOKEN')

GITHUB_USER_NAME

'mohamed-stifi'

In [12]:
!git config --global user.email {EMAIL}
!git config --global user.name {GITHUB_USER_NAME}

In [29]:
!git commit -m "Initial commit with tests and CI pipeline"

[master (root-commit) 7791a0e] Initial commit with tests and CI pipeline
 7 files changed, 91 insertions(+)
 create mode 100644 .github/workflows/ci-pipeline.yml
 create mode 100644 .gitignore
 create mode 100644 requirements.txt
 create mode 100644 src/__init__.py
 create mode 100644 src/preprocessing.py
 create mode 100644 tests/__init__.py
 create mode 100644 tests/test_preprocessing.py


In [53]:
!git status

On branch main
nothing to commit, working tree clean


In [31]:
!git branch -M main

In [58]:
# !git remote add origin https://github.com/{GITHUB_USER_NAME}/mlops-cicd-lab.git
!git remote set-url origin https://{GITHUB_USER_NAME}:{TOKEN}@github.com/{GITHUB_USER_NAME}/mlops-cicd-lab.git
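
The remote URL above embeds the token, so anything that prints the URL (for example `git remote -v`) also prints the token. The following sketch shows one way to build such a URL with percent-encoding and to mask the token before logging. The helper names and the dummy values are hypothetical.

```python
from urllib.parse import quote


def build_remote_url(user: str, token: str, repo: str) -> str:
    """Build an HTTPS remote URL with percent-encoded credentials."""
    safe_user = quote(user, safe="")
    safe_token = quote(token, safe="")
    return f"https://{safe_user}:{safe_token}@github.com/{user}/{repo}.git"


def mask_token(text: str, token: str) -> str:
    """Replace the token (raw or percent-encoded) with *** before printing."""
    for secret in (quote(token, safe=""), token):
        if secret:
            text = text.replace(secret, "***")
    return text


# Dummy credentials for illustration only.
url = build_remote_url("example-user", "dummy/token", "mlops-cicd-lab")
print(mask_token(url, "dummy/token"))
```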

In [62]:
#!git remote -v

In [61]:
!git push -u origin main

Enumerating objects: 12, done.
Counting objects: 100% (12/12), done.
Delta compression using up to 2 threads
Compressing objects: 100% (8/8), done.
Writing objects:   8% (1/12)Writing objects:  16% (2/12)Writing objects:  25% (3/12)Writing objects:  33% (4/12)Writing objects:  41% (5/12)Writing objects:  50% (6/12)Writing objects:  58% (7/12)Writing objects:  66% (8/12)W

# **CI for Machine Learning Logic**
> Right now, we are testing data cleaning functions. But as an ML Engineer, you need to test Model Training too.

**Step 1: Define Dependencies Properly**
-
Update `requirements.txt` in the root folder:

In [63]:
%%writefile requirements.txt
pandas
pytest
scikit-learn
joblib

Overwriting requirements.txt


**Step 2: Create the Training Logic**
-
We will write a training function that fits a simple model. We design it as a function, not a script, so it's easy to test.

Create `src/train.py`:

In [64]:
%%writefile src/train.py
# src/train.py
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import joblib

def train_model(n_estimators: int = 100):
    """Trains a Random Forest on the Iris dataset."""
    # 1. Load Data
    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # 2. Train
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)

    # 3. Evaluate
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)

    return model, accuracy

Writing src/train.py


**Step 3: Test the Model Logic**
-
How do you unit test an ML model?
- **Smoke Test**: Does the function run without crashing?
- **Sanity Check**: Is accuracy better than random guessing? (> 0.33 for Iris)
- **Shape Check**: Does prediction output have the right shape?
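
The three checks can be bundled into one reusable helper. The sketch below is an assumption-laden illustration: `check_model` is a hypothetical function, and `ThresholdModel` is a stand-in classifier so the example runs without scikit-learn. Any object with a `predict` method, such as the model returned by `train_model`, should fit the same helper.

```python
class ThresholdModel:
    """Stand-in classifier: predicts 1 when the first feature exceeds the mean."""

    def fit(self, X, y):
        values = [row[0] for row in X]
        self.threshold_ = sum(values) / len(values)
        return self

    def predict(self, X):
        return [1 if row[0] > self.threshold_ else 0 for row in X]


def check_model(model, X_test, y_test, chance_level):
    """Run smoke, shape, and sanity checks; return the measured accuracy."""
    predictions = model.predict(X_test)  # smoke: must not raise
    assert len(predictions) == len(X_test), "one prediction per input row"  # shape
    correct = sum(int(p == t) for p, t in zip(predictions, y_test))
    accuracy = correct / len(y_test)
    assert accuracy > chance_level, "model should beat random guessing"  # sanity
    return accuracy


# Tiny two-class toy data; evaluated on the training rows for brevity.
X = [[1.0], [1.2], [3.0], [3.3]]
y = [0, 0, 1, 1]
model = ThresholdModel().fit(X, y)
print(check_model(model, X, y, chance_level=0.5))
```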


Create `tests/test_train.py`:

In [65]:
%%writefile tests/test_train.py
# tests/test_train.py
import pytest
from src.train import train_model

def test_train_model_runs():
    """Smoke test: Does training finish and return a valid model?"""
    model, accuracy = train_model(n_estimators=10) # Use small N for speed

    assert accuracy > 0.5  # Iris is easy, accuracy should be high
    assert hasattr(model, "predict") # Check if it's a real model object

Writing tests/test_train.py


**Step 4: Test Check**
-
Run the full test suite again:

In [66]:
!pytest

platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0
rootdir: /content/drive/MyDrive/Books To Read/ML Engineering Notebook+Docs/mlops-cicd-lab
plugins: langsmith-0.4.47, typeguard-4.4.4, anyio-4.11.0
collected 3 items

tests/test_preprocessing.py ..                                           [ 66%]
tests/test_train.py .                                                    [100%]



**Step 5: Push to GitHub**
-


In [67]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   requirements.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	src/train.py
	tests/test_train.py

no changes added to commit (use "git add" and/or "git commit -a")


In [68]:
!git add .

In [69]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   requirements.txt
	new file:   src/train.py
	new file:   tests/test_train.py



In [70]:
!git commit -m "Add ML training logic and requirements"

[main 7387cd1] Add ML training logic and requirements
 3 files changed, 35 insertions(+)
 create mode 100644 src/train.py
 create mode 100644 tests/test_train.py


In [71]:
!git push

Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 2 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (7/7), 1.26 KiB | 37.00 KiB/s, done.
Total 7 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.co

# **Packaging & Containerization**
Now that our code is tested and "CI-verified", we need to ship it.
> We cannot email `src/train.py` to the production server. The server might have the wrong Python version or missing libraries.

We use **Docker**.

**Step 1: The Dockerfile**
-

A `Dockerfile` is a recipe to build a computer that contains your project.

Create a file named `Dockerfile` (no extension) in the root of `mlops-cicd-lab`.


In [72]:
%%writefile Dockerfile
# 1. Base Image: Start with a lightweight Linux + Python 3.9
FROM python:3.9-slim

# 2. Setup Work Directory: Create a folder inside the container
WORKDIR /app

# 3. Copy Requirements: Move the file from laptop to container
COPY requirements.txt .

# 4. Install Dependencies: Run pip inside the container
# --no-cache-dir keeps the image small
RUN pip install --no-cache-dir -r requirements.txt

# 5. Copy Code: Move the src folder
COPY src/ src/

# 6. Default Command: What happens when we run the container?
# For now, let's make it run the training script
CMD ["python", "-c", "from src.train import train_model; print(train_model())"]

Writing Dockerfile


**Step 2: Build the Image**
-
This turns the recipe into an actual "Image" (a frozen computer).

> Install Docker on your machine (if you use Google Colab like me, follow this guide: [install_docker_in_colab](https://gist.github.com/mwufi/6718b30761cd109f9aff04c5144eb885)).

Run this in your terminal:

In [None]:
# !docker build -t mlops-train:v1 .

**Step 3: Run the Container**
-
Now we spin up a container from that image.

In [None]:
#!docker run mlops-train:v1

**Step 4: Update Dockerfile**
-


In [6]:
%%writefile Dockerfile
# 1. Base Image: Start with a lightweight Linux + Python 3.9
FROM python:3.9-slim

# 2. Setup Work Directory: Create a folder inside the container
WORKDIR /app

# 3. Copy Requirements: Move the file from laptop to container
COPY requirements.txt .

# 4. Install Dependencies: Run pip inside the container
# --no-cache-dir keeps the image small
RUN pip install --no-cache-dir -r requirements.txt

# 5. Copy Code: Move the src folder
COPY src/ src/
COPY tests/ tests/

# 6. Default Command: What happens when we run the container?
# For now, let's make it run the training script
CMD ["python", "-c", "from src.train import train_model; print(train_model())"]


Overwriting Dockerfile


In [4]:
!git status

Refresh index: 100% (9/9), done.
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	Dockerfile

nothing added to commit but untracked files present (use "git add" to track)


# **Automating the Build (CD - Continuous Delivery)**

> Manual docker build is for experimentation. In production, GitHub Actions should build the image and push it to a `Registry` (like an App Store for Docker images).

We will use **Docker Hub** (it's free and standard).

**Step 1: Setup Docker Hub**
-
1. Go to [hub.docker.com](https://hub.docker.com) and create a free account (if you don't have one).
2. Create a **New Repository**:
- Name: mlops-cicd
- Visibility: Public (easier for learning, keeps it free).

**Step 2: Connect GitHub to Docker Hub**
-
Your GitHub robot needs permission to upload images to your Docker Hub account. We use **Secrets** for this.

1. Go to your GitHub Repo → Settings → Secrets and variables → Actions.
2. Click New repository secret.
3. Add two secrets:
- **DOCKER_USERNAME**: Your Docker Hub ID.
- **DOCKER_PASSWORD**: Your Docker Hub Password (or an Access Token, which is better, but password works for now).

**Step 3: Update the Pipeline**
-
We are going to modify `.github/workflows/ci-pipeline.yml`.

We will add a new **job** called `build-image`.

**Logic:**
1. Run Tests first.
2. **IF** tests pass → Log in to Docker Hub → Build Image → Push Image.

Replace your entire YAML file with this advanced version:

In [5]:
%%writefile .github/workflows/ci-pipeline.yml
name: CI/CD Pipeline

on:
  push:
    branches: ["main"]

jobs:
  # Job 1: The CI Part (Testing)
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.9'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
    - name: Run Unit Tests
      run: pytest

  # Job 2: The CD Part (Building & Delivering)
  build-image:
    needs: test  # <--- CRITICAL: Only run if 'test' passes!
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          # The image is tagged under the Docker Hub username stored in the DOCKER_USERNAME secret
          tags: ${{ secrets.DOCKER_USERNAME }}/mlops-cicd:latest

Overwriting .github/workflows/ci-pipeline.yml
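
The `needs: test` line in the workflow above is what gates the image build on the tests. A simplified model of that gating is sketched below. `run_pipeline` and `make_jobs` are hypothetical helpers, jobs are assumed to be listed in dependency order, and a dependent job is marked as skipped when a job it needs did not succeed.

```python
def run_pipeline(jobs: dict) -> dict:
    """Run jobs in order; skip any job whose `needs` did not all succeed."""
    status = {}
    for name, job in jobs.items():
        unmet = [dep for dep in job.get("needs", []) if status.get(dep) != "success"]
        if unmet:
            status[name] = "skipped"
        else:
            status[name] = "success" if job["run"]() else "failure"
    return status


def make_jobs(tests_pass: bool) -> dict:
    """Two jobs named after the workflow: `test`, then `build-image`."""
    return {
        "test": {"run": lambda: tests_pass},
        "build-image": {"needs": ["test"], "run": lambda: True},
    }


print(run_pipeline(make_jobs(tests_pass=True)))
print(run_pipeline(make_jobs(tests_pass=False)))
```

With failing tests the second job never runs, so no image is pushed.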


In [7]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   .github/workflows/ci-pipeline.yml

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	Dockerfile

no changes added to commit (use "git add" and/or "git commit -a")


In [8]:
!git add .

In [9]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   .github/workflows/ci-pipeline.yml
	new file:   Dockerfile



In [13]:
!git commit -m "Add Docker Containerization and CD Job"

[main f28fe4b] Add Docker Containerization and CD Job
 2 files changed, 47 insertions(+), 17 deletions(-)
 create mode 100644 Dockerfile


In [15]:
!git status

On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean


In [17]:
!git push

Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 2 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 1.43 KiB | 69.00 KiB/s, done.
Total 6 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/mohamed-stifi/mlops-cicd-lab.git
   7387cd1..f28fe4b  main -> main


# **The CD Pipeline (FastAPI Serving)**

Right now, your Docker image just runs a script and exits. That is useful for Batch Training, but useless for a Real-time App (like a website asking for a prediction).

> We need to wrap our model in a **REST API**. We will use `FastAPI`, a widely used framework for serving Python ML models.

**Step 1: Save the Model Artifact**
-
An API needs a trained model file (`model.joblib`) to load.

Modify `src/train.py` to **save** the model to disk after training.

In [18]:
%%writefile src/train.py
# src/train.py (Updated)
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import joblib

def train_model(n_estimators: int = 100):
    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)

    # --- NEW: Save the model ---
    joblib.dump(model, "model.joblib")
    print(f"Model saved to model.joblib with accuracy: {accuracy}")

    return model, accuracy

if __name__ == "__main__":
    train_model()

Overwriting src/train.py


**Step 2: Create the API**
-
Create a new file `src/app.py`. This script will:

1. Load the saved model.
2. Listen for HTTP requests.
3. Return predictions.

In [19]:
%%writefile src/app.py
# src/app.py
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

# 1. Define Input Schema (Data Validation)
class IrisInput(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

# 2. Load Model
app = FastAPI(title="Iris Prediction API")
model = joblib.load("model.joblib")

# 3. Define Endpoint
@app.post("/predict")
def predict(data: IrisInput):
    # Convert input to numpy array
    features = np.array([[
        data.sepal_length,
        data.sepal_width,
        data.petal_length,
        data.petal_width
    ]])

    # Make prediction
    prediction = model.predict(features)
    return {"class": int(prediction[0])}

Writing src/app.py


**Step 3: Update Requirements**
-
Add these to `requirements.txt`:
- fastapi
- uvicorn

In [20]:
%%writefile requirements.txt
pandas
pytest
scikit-learn
joblib
fastapi
uvicorn

Overwriting requirements.txt


**Step 4: Update Dockerfile (The "Serving" Image)**
-
We need to change the `Dockerfile` to:
1. Run the training script during the build (so `model.joblib` exists inside the image).
2. Start the Web Server (`uvicorn`) instead of running a script.

Update your `Dockerfile`:

In [21]:
%%writefile Dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY src/ src/
COPY tests/ tests/

# 1. Build-Time Training: Train the model so it is baked into the image
# (In advanced MLOps, we download from S3/MLflow, but this is best for starting)
RUN python src/train.py

# 2. Expose the port (Documentation only)
EXPOSE 8000

# 3. Run the API Server
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8000"]

Overwriting Dockerfile


**Rebuild locally**:

In [None]:
#!docker build -t mlops-api:v1 .

**Run the API**:

In [None]:
#!docker run -p 8000:8000 mlops-api:v1

**Test it**: Open your browser to: `http://localhost:8000/docs`.

- This is the Swagger UI (Auto-generated by FastAPI).
- Click POST /predict → Try it out.
- Enter some numbers (e.g., 5.1, 3.5, 1.4, 0.2).
- Click Execute.
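
The same request can be prepared from Python with the standard library. This is a sketch under assumptions: `build_predict_request` is a hypothetical helper, the base URL assumes the container from the previous step is listening on `localhost:8000`, and the actual network call is left commented out so the snippet runs without a server.

```python
import json
from urllib import request


def build_predict_request(base_url: str, features: dict) -> request.Request:
    """Prepare a JSON POST request for the /predict endpoint."""
    body = json.dumps(features).encode("utf-8")
    return request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}
req = build_predict_request("http://localhost:8000", payload)
print(req.get_method(), req.full_url)

# Sending the request requires the API container to be running:
# with request.urlopen(req) as response:
#     print(json.load(response))
```

The field names match the `IrisInput` schema defined in `src/app.py`.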

**Push to GitHub**

In [22]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   Dockerfile
	modified:   requirements.txt
	modified:   src/train.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	src/app.py

no changes added to commit (use "git add" and/or "git commit -a")


In [23]:
!git add .

In [24]:
!git commit -m "FastAPI Serving"

[main 64dc141] FastAPI Serving
 4 files changed, 60 insertions(+), 25 deletions(-)
 rewrite Dockerfile (80%)
 create mode 100644 src/app.py


In [25]:
!git push

Enumerating objects: 12, done.
Counting objects: 100% (12/12), done.
Delta compression using up to 2 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 1.49 KiB | 101.00 KiB/s, done.
Total 7 (

# **The "CT" Pipeline (Continuous Training)**
The Flaw: currently, your `Dockerfile` has this line:
- `RUN python src/train.py`

This means every time you build the image (even just to fix a typo in the README), the model retrains. This is wasteful and dangerous in production.

> **The Solution**: We must decouple Training from Deployment.

1. **CI Pipeline**: Checks code quality. (Fast).
2. **CT Pipeline**: Retrains the model. (Slow, triggered manually or by schedule).

We will create a new workflow that uses **GitHub Artifacts** to pass the model between jobs.

**Step 1: Clean the Dockerfile**
-
We will stop training inside the Docker build. The Docker image should just receive a pre-trained model.

Remove the `RUN python src/train.py` line.

In [26]:
%%writefile Dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

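COPY src/ src/
# COPY tests/ tests/  <-- In prod, we usually don't copy tests, but okay for lab.

# ❌ REMOVED: RUN python src/train.py
# The model.joblib must be provided from the outside now!
# It has to be present in the build context (the retrain workflow downloads it there);
# a build whose context lacks model.joblib fails at this step.
COPY model.joblib .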

EXPOSE 8000
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8000"]

Overwriting Dockerfile


**Step 2: Create the CT Workflow**
-
We will create a new file `.github/workflows/retrain.yml`.

This workflow will utilize **Workflow Dispatch** (a manual button in GitHub) so you can trigger training on demand.

**Logic:**
1. **Train Job**: Runs `src/train.py` and saves `model.joblib`. Uploads it as a temporary GitHub Artifact.
2. **Build Job**: Downloads the artifact, puts it in the folder, and then builds the Docker image.
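
The hand-off described above is needed because each job starts in its own fresh workspace, so a file written by one job is not visible to the next. The sketch below simulates that with temporary directories. The function names and the placeholder file content are hypothetical, and a plain folder stands in for GitHub's artifact storage.

```python
import shutil
import tempfile
from pathlib import Path


def train_job(workspace: Path, artifact_store: Path) -> None:
    """Write a placeholder model file and 'upload' it to the artifact store."""
    model_file = workspace / "model.joblib"
    model_file.write_bytes(b"placeholder for a serialized model")
    shutil.copy(model_file, artifact_store / "model.joblib")


def deploy_job(workspace: Path, artifact_store: Path) -> bool:
    """'Download' the artifact into this job's workspace (the build context)."""
    shutil.copy(artifact_store / "model.joblib", workspace / "model.joblib")
    return (workspace / "model.joblib").exists()


def simulate() -> tuple:
    """Return (model present before download, model present after download)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store, train_ws, deploy_ws = root / "artifacts", root / "train", root / "deploy"
        for directory in (store, train_ws, deploy_ws):
            directory.mkdir()
        train_job(train_ws, store)
        before = (deploy_ws / "model.joblib").exists()  # deploy workspace starts empty
        after = deploy_job(deploy_ws, store)
        return before, after


print(simulate())
```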

Create `.github/workflows/retrain.yml`:


In [27]:
%%writefile .github/workflows/retrain.yml
name: Manual Retrain & Deploy

# Trigger: Manual button click (or Schedule)
on:
  workflow_dispatch:
    inputs:
      n_estimators:
        description: 'Number of trees'
        required: true
        default: '150'

jobs:
  # Job 1: Train the Model
  train:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.9'

    - name: Install dependencies
      run: pip install -r requirements.txt

    - name: Train Model
      # The n_estimators input is not wired in yet: train.py accepts no
      # command-line arguments, so this step runs it with its defaults.
      run: |
        # train.py saves model.joblib in the working directory
        python src/train.py

    - name: Upload Model Artifact
      uses: actions/upload-artifact@v4
      with:
        name: trained-model
        path: model.joblib
        retention-days: 1

  # Job 2: Package & Push
  deploy:
    needs: train # Wait for training to finish
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Download Model Artifact
        uses: actions/download-artifact@v4
        with:
          name: trained-model
          # Downloads to current directory, so Dockerfile finds it!

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ secrets.DOCKER_USERNAME }}/mlops-cicd:prod

Writing .github/workflows/retrain.yml
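
The workflow above declares an `n_estimators` input, but the Train Model step runs `src/train.py` without arguments, so the value entered in the UI has no effect. One possible way to connect the two is to let the script read a command-line flag. This is a sketch only: the `--n-estimators` flag and the `parse_args` helper are hypothetical additions, not part of the lab's `train.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse the training options; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Train the Iris model")
    parser.add_argument(
        "--n-estimators",
        type=int,
        default=100,
        help="Number of trees in the random forest",
    )
    return parser.parse_args(argv)


args = parse_args(["--n-estimators", "150"])
print(args.n_estimators)

# In src/train.py the entry point could then become:
# if __name__ == "__main__":
#     train_model(n_estimators=parse_args().n_estimators)
```

The workflow step would then need to pass the value along, for example `python src/train.py --n-estimators ${{ inputs.n_estimators }}`.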


**Step 3: Push to GitHub**
-

In [28]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   Dockerfile

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.github/workflows/retrain.yml

no changes added to commit (use "git add" and/or "git commit -a")


In [29]:
!git add .

In [30]:
!git commit -m "Add Continuous Training"

[main 2289358] Add Continuous Training
 2 files changed, 68 insertions(+), 7 deletions(-)
 create mode 100644 .github/workflows/retrain.yml


In [31]:
!git push

Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 2 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (6/6), 1.43 KiB | 39.00 KiB/s, done.
Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/mohamed-stifi/mlo

# **Documentation**



In [33]:
!ls

Dockerfile  docs  requirements.txt  src  tests


In [37]:
import os

# Create the docs directory
!mkdir -p docs

# Verify
!ls

Dockerfile  docs  requirements.txt  src  tests
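
The README below points readers to `docs/project_documentation.ipynb`, but the step that puts the notebook there is not shown. A minimal sketch of one way to do it, assuming the notebook is copied rather than moved so the original stays in place. `copy_notebook_to_docs` is a hypothetical helper, and the source path is whatever your notebook is actually called.

```python
import shutil
from pathlib import Path


def copy_notebook_to_docs(
    notebook: str,
    docs_dir: str = "docs",
    new_name: str = "project_documentation.ipynb",
) -> Path:
    """Copy a notebook into docs_dir under new_name and return the new path.

    Hypothetical helper; the target name matches the path the README mentions.
    """
    src = Path(notebook)
    if not src.is_file():
        raise FileNotFoundError(f"Notebook not found: {src}")
    dest_dir = Path(docs_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p docs`
    dest = dest_dir / new_name
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```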


**Generate the README.md**
-

In [None]:
%%writefile README.md
# 🚀 MLOps CI/CD Lab: End-to-End Pipeline

This repository demonstrates a production-grade **MLOps (Machine Learning Operations)** pipeline implemented using **GitHub Actions**, **Docker**, and **FastAPI**.

It moves beyond simple model training scripts to a structured, automated ecosystem that ensures code quality, reproducibility, and automated deployment.

## 🏗️ Architecture

The system is designed with a **Separation of Concerns** principle:

1.  **CI (Continuous Integration):** Triggered on every code push; verifies the code logic via unit tests.
2.  **CD (Continuous Delivery):** Packages the model and API into a Docker container and pushes it to Docker Hub.
3.  **CT (Continuous Training):** A decoupled workflow triggered manually (or via schedule) to retrain the model on heavy compute, completely separate from the deployment logic.

---

## 📂 Project Structure

```text
mlops-cicd-lab/
├── .github/workflows/
│   ├── ci-pipeline.yml    # CI/CD: Tests code & Builds Docker image (if tests pass)
│   └── retrain.yml        # CT: Retrains model & Updates artifact (Manual Trigger)
├── docs/                  # Project documentation and tutorials
├── src/
│   ├── app.py             # FastAPI serving application
│   ├── train.py           # Model training logic (produces model.joblib)
│   └── preprocessing.py   # Data cleaning logic
├── tests/                 # Pytest unit tests
├── Dockerfile             # Recipe for the production image
├── requirements.txt       # Python dependencies
└── README.md              # Project documentation
```
---

## ⚙️ Workflows Explained

### 1. The Gatekeeper (CI Pipeline)
*   **Trigger:** `git push` to `main`.
*   **Action:** Installs dependencies and runs `pytest`.
*   **Goal:** Ensures no broken code (syntax errors, logic bugs) ever reaches the deployment stage.

### 2. The Delivery Truck (CD Pipeline)
*   **Trigger:** Successful completion of the CI Tests.
*   **Action:**
    1.  Logs into Docker Hub using GitHub Secrets.
    2.  Builds a Docker image containing the code and the model.
    3.  Pushes the image to Docker Hub with the `latest` tag.

### 3. The Factory (CT Pipeline)
*   **Trigger:** Manual "Workflow Dispatch" (UI Button).
*   **Action:**
    1.  Runs `src/train.py` to generate a new `model.joblib`.
    2.  Uploads the model as a GitHub Artifact.
    3.  Triggers the build process to package this **new** model into the Docker image.
*   **Why?** This decouples heavy training (which might need GPUs) from the lightweight CI checks.

---
## 🚀 How to Run Locally

### Prerequisites
*   Python 3.9+
*   Docker

### 1. Installation
```bash
pip install -r requirements.txt
```

### 2. Run Tests
```bash
pytest
```

### 3. Build & Run Docker Container
```bash
# Build the image
docker build -t mlops-api:v1 .

# Run the container (Mapping port 8000)
docker run -p 8000:8000 mlops-api:v1
```

### 4. Test the API
Once the container is running, open your browser to the Swagger UI:
> `http://localhost:8000/docs`

Or use curl:
```bash
curl -X 'POST' \
  'http://localhost:8000/predict' \
  -H 'Content-Type: application/json' \
  -d '{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2
}'
```
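
The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the endpoint accepts the JSON body shown above and replies with JSON. `build_predict_request` and `predict` are illustrative names, not part of this repository.

```python
import json
import urllib.request


def build_predict_request(
    features: dict, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request for /predict with a JSON body."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, base_url: str = "http://localhost:8000", timeout: float = 10.0):
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example payload, matching the curl call above.
sample = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}
```

With the container running, `predict(sample)` sends the request; the shape of the returned object depends on how `src/app.py` builds its response.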

---
## 🛠️ Technologies Used

*   **GitHub Actions:** For orchestration.
*   **Docker:** For containerization and environment consistency.
*   **FastAPI:** For high-performance model serving.
*   **Scikit-Learn:** For the ML model (Random Forest).
*   **Pytest:** For automated testing.
*   **Docker Hub:** As the Container Registry.

---
### 📖 Documentation

A full step-by-step notebook explaining the creation of this pipeline can be found in `docs/project_documentation.ipynb`.

### **Step 3: Push Changes to GitHub**

Run these commands to commit the new structure and the documentation.

```python
!git add .
!git commit -m "Docs: Add README and move notebook to docs folder"
!git push
```