# Model Versioning
## Objective

Teach systematic model versioning for production ML pipelines, emphasizing:

- Reproducibility
- Traceability
- Controlled deployment and rollback
- Governance compliance

> Versioning ensures that every deployed model is identifiable, auditable, and reproducible.

## Why Model Versioning Matters
### Risks Without Versioning

- Cannot reproduce results
- Cannot roll back to previous models
- Deployment inconsistencies
- Regulatory or audit failures

### Benefits

- Tracks model evolution
- Enables rollback and A/B testing
- Supports reproducible pipelines and CI/CD

## Versioning Concepts
### Semantic Versioning (Recommended)


| Component | Meaning                                                                     |
| --------- | --------------------------------------------------------------------------- |
| Major     | Breaking change in model architecture or input features                     |
| Minor     | Backward-compatible feature update or retraining with the same architecture |
| Patch     | Bug fixes, hyperparameter tweaks, or metadata corrections                   |


Example: v1.2.0 → Major: 1, Minor: 2, Patch: 0
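The component rules above can be sketched as a small helper. This is a minimal illustration, not part of any library: `parse_version` and `bump_version` are hypothetical names, and the sketch assumes plain `MAJOR.MINOR.PATCH` strings with an optional leading `v`.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'v1.2.0' or '1.2.0' into (major, minor, patch) integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def bump_version(version: str, part: str) -> str:
    """Return the next version string after bumping one component."""
    major, minor, patch = parse_version(version)
    if part == "major":
        # A major bump resets the minor and patch components.
        return f"v{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets only the patch component.
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("v1.2.0", "minor"))  # v1.3.0
```

Retraining on fresh data with the same architecture would then be `bump_version(current, "minor")`, while a metadata correction would use `"patch"`.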

### Version Metadata

- `model_name`

- `version`

- `training_date`

- `dataset_snapshot_id`

- `feature_set_version`

- `hyperparameters`

- `library_versions` (Python, scikit-learn, etc.)

> Store metadata as JSON next to the model artifact, or in a tracking system such as MLflow.
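One way to assemble these fields is sketched below. The helper names (`installed_version`, `build_metadata`) and the identifier values are hypothetical; only the field names come from the list above. Library versions are read from the running environment with the standard-library `importlib.metadata` module, so they do not have to be typed by hand.

```python
import json
import platform
from datetime import date
from importlib import metadata as importlib_metadata


def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return importlib_metadata.version(package)
    except importlib_metadata.PackageNotFoundError:
        return "not installed"


def build_metadata(model_name, version, dataset_snapshot_id,
                   feature_set_version, hyperparameters, packages):
    """Assemble the metadata fields listed above into one dictionary."""
    library_versions = {"python": platform.python_version()}
    library_versions.update({pkg: installed_version(pkg) for pkg in packages})
    return {
        "model_name": model_name,
        "version": version,
        "training_date": date.today().isoformat(),
        "dataset_snapshot_id": dataset_snapshot_id,
        "feature_set_version": feature_set_version,
        "hyperparameters": hyperparameters,
        "library_versions": library_versions,
    }


record = build_metadata(
    model_name="churn_classifier",
    version="1.0.0",
    dataset_snapshot_id="snapshot-001",  # hypothetical identifier
    feature_set_version="1.0.0",         # hypothetical identifier
    hyperparameters={"C": 1.0, "solver": "lbfgs"},
    packages=["scikit-learn"],
)
print(json.dumps(record, indent=4))
```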

## Tools for Model Versioning

| Tool                         | Type                    | Notes                          |
| ---------------------------- | ----------------------- | ------------------------------ |
| Git                          | Code versioning         | Track model scripts, pipelines |
| MLflow                       | Model registry          | Store artifacts, track metrics |
| DVC                          | Data & model versioning | Handles large files, pipelines |
| Custom JSON + Artifact Store | Lightweight             | Good for small projects        |


## Practical Example: Versioning a Model with MLflow
### Install MLflow

In [None]:
%pip install mlflow

### Track the Experiment

In [None]:
import mlflow
import mlflow.sklearn

mlflow.set_experiment("churn_model")


### Train and Log the Model

In [None]:
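# Assumes `pipeline`, `X_train`, `y_train`, `X_val`, and `y_val` are defined in earlier cells.
with mlflow.start_run(run_name="logistic_regression_v1") as run:
    pipeline.fit(X_train, y_train)
    # Log hyperparameters and validation accuracy alongside the model artifact.
    mlflow.log_params({"C": 1.0, "solver": "lbfgs"})
    mlflow.log_metric("accuracy", pipeline.score(X_val, y_val))
    mlflow.sklearn.log_model(pipeline, "model")
    print("run_id:", run.info.run_id)  # needed to register the model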


### Register the Model Version

In [None]:
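# Replace <run_id> with the ID of the run that logged the model.
result = mlflow.register_model(
    "runs:/<run_id>/model",
    "churn_model_registry"
)
# The registry assigns its own incrementing integer version number.
print(result.name, result.version)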
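Once a version is registered, it can be loaded by name and version number through a `models:/<name>/<version>` URI, which is also one way to roll back to an earlier version. The sketch below is illustrative: `registry_uri` and `load_registered_model` are hypothetical helper names, and it assumes that version 1 of `churn_model_registry` exists in the active registry. MLflow is imported lazily so the URI helper can be used on its own.

```python
def registry_uri(model_name: str, version: int) -> str:
    """Build an MLflow registry URI of the form models:/<name>/<version>."""
    return f"models:/{model_name}/{version}"


def load_registered_model(model_name: str, version: int):
    """Load one specific registered version as a generic pyfunc model."""
    import mlflow.pyfunc  # imported here so registry_uri works without MLflow

    return mlflow.pyfunc.load_model(registry_uri(model_name, version))


# Assumes version 1 of "churn_model_registry" exists in the active registry.
uri = registry_uri("churn_model_registry", 1)
print(uri)

# Uncomment in an environment with MLflow and a populated registry:
# model = load_registered_model("churn_model_registry", 1)
# predictions = model.predict(X_val)
```

Pinning an explicit version number in the URI keeps a deployment tied to one identifiable model rather than whichever version was registered most recently.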

## Git-Based Versioning

- Track pipeline scripts and training notebooks

- Tag releases (e.g., v1.0.0) for reproducibility

- Combine with artifact versioning (pickle/joblib/ONNX)

In [None]:
!git tag -a v1.0.0 -m "Initial production-ready model"
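To tie a Git tag to a specific artifact file, one option is to put the version in the file name and record a checksum of its contents. The sketch below uses only the standard library. The helper names and the naming scheme are assumptions for illustration, and the placeholder bytes stand in for a real serialized model.

```python
import hashlib
import tempfile
from pathlib import Path


def versioned_artifact_name(model_name: str, version: str,
                            suffix: str = ".joblib") -> str:
    """Build a file name such as 'churn_classifier-v1.0.0.joblib'."""
    return f"{model_name}-v{version}{suffix}"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / versioned_artifact_name("churn_classifier", "1.0.0")
    artifact.write_bytes(b"placeholder model bytes")  # stands in for a real model
    checksum = sha256_of_file(artifact)

print(artifact.name, checksum)
```

Storing the checksum in the tag message or in the model metadata makes it possible to confirm later that a deployed file is the one produced at that tag.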


## Lightweight JSON Versioning (Alternative)

In [None]:
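import json
from pathlib import Path

metadata = {
    "model_name": "churn_classifier",
    "version": "1.0.0",
    "training_date": "2026-02-09",
    "features": ["age", "income", "region"],
    "sklearn_version": "1.4.0"
}

# Create the output directory first; open() does not create missing folders.
Path("artifacts").mkdir(parents=True, exist_ok=True)
with open("artifacts/model_metadata.json", "w") as f:
    json.dump(metadata, f, indent=4)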
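Recorded library versions are most useful when they are checked before a model is loaded. The sketch below compares recorded versions against the running environment. `installed_versions` and `find_version_mismatches` are hypothetical helpers, and the `recorded` dictionary is assumed to be derived from the `sklearn_version` field above.

```python
from importlib import metadata as importlib_metadata


def installed_versions(packages) -> dict:
    """Map each package name to its installed version, or None if absent."""
    result = {}
    for pkg in packages:
        try:
            result[pkg] = importlib_metadata.version(pkg)
        except importlib_metadata.PackageNotFoundError:
            result[pkg] = None
    return result


def find_version_mismatches(recorded: dict, installed: dict) -> dict:
    """Return {package: (recorded, installed)} wherever the versions differ."""
    return {
        pkg: (want, installed.get(pkg))
        for pkg, want in recorded.items()
        if installed.get(pkg) != want
    }


recorded = {"scikit-learn": "1.4.0"}  # from the "sklearn_version" field above
mismatches = find_version_mismatches(recorded, installed_versions(recorded))
for pkg, (want, have) in mismatches.items():
    print(f"{pkg}: model was trained with {want}, environment has {have}")
```

Whether a mismatch should block loading or only produce a warning is a policy decision; the sketch only reports the differences.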

## Best Practices

- Always version both the code and the model artifact

- Store environment metadata (Python, libraries, OS)

- Link model version ↔ feature version ↔ dataset snapshot

- Maintain a model registry for production rollout

- Document the reason for each version change
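The linking and documentation practices above can be combined in an append-only changelog. The sketch below writes one JSON object per line. The file name, field names, and example values are assumptions chosen for illustration, not a standard format.

```python
import json
import tempfile
from pathlib import Path


def append_changelog_entry(changelog: Path, entry: dict) -> None:
    """Append one version record as a single JSON line."""
    with open(changelog, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_changelog(changelog: Path) -> list:
    """Read all records back in the order they were written."""
    with open(changelog, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    changelog = Path(tmp) / "model_changelog.jsonl"
    append_changelog_entry(changelog, {
        "model_version": "1.0.0",
        "feature_set_version": "1.0.0",         # hypothetical identifier
        "dataset_snapshot_id": "snapshot-001",  # hypothetical identifier
        "reason": "Initial production-ready model",
    })
    append_changelog_entry(changelog, {
        "model_version": "1.1.0",
        "feature_set_version": "1.0.0",
        "dataset_snapshot_id": "snapshot-002",
        "reason": "Retrained on a newer dataset snapshot",
    })
    history = read_changelog(changelog)

for record in history:
    print(record["model_version"], "-", record["reason"])
```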

## Anti-Patterns to Avoid

- ❌ Overwriting models without a version bump
- ❌ Ignoring library versions
- ❌ Using informal names like `model_final.pkl`
- ❌ Not tracking feature set changes
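The first anti-pattern can be guarded against in code. The sketch below opens the artifact file in exclusive-creation mode, so saving under an existing version fails instead of silently overwriting it. `save_new_version` and the `.bin` naming scheme are hypothetical, and raw bytes stand in for a serialized model.

```python
import tempfile
from pathlib import Path


def save_new_version(directory: Path, model_name: str, version: str,
                     payload: bytes) -> Path:
    """Write an artifact for a new version, refusing to replace an old one."""
    path = directory / f"{model_name}-v{version}.bin"
    # Mode "xb" creates the file and raises FileExistsError if it exists.
    with open(path, "xb") as f:
        f.write(payload)
    return path


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    first = save_new_version(store, "churn_classifier", "1.0.0", b"model bytes")
    try:
        # Reusing version 1.0.0 without a bump is rejected.
        save_new_version(store, "churn_classifier", "1.0.0", b"retrained bytes")
        overwritten = True
    except FileExistsError:
        overwritten = False
    kept = first.read_bytes()

print("overwritten:", overwritten)
```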

## Key Takeaways

- Model versioning is essential for reproducibility and auditability

- Semantic versioning + metadata = robust governance

- Combine artifact storage with code and environment tracking

- MLflow, DVC, or a lightweight JSON approach can each work in production, provided code, artifacts, and metadata are versioned together

## Transition Forward

➡ 02_data_and_feature_versioning.ipynb

- Version datasets and features

- Ensure deployed model inputs remain consistent