# 5. Model Deployment Guide ‚Äî UK Housing

**Author:** Marin Janushaj  
**Team:** Yunus  
**Date:** November 2025  
**Goal:** Prepare models for production deployment

## Deployment Overview

This notebook demonstrates:
1. Loading and testing saved models
2. Creating prediction functions for deployment
3. API design (Flask example)
4. Docker containerization
5. CI/CD pipeline setup
6. Monitoring and maintenance strategies

## What You Need for Deployment:

‚úÖ **Already have** (from previous notebooks):
- Trained models (best_model.pkl, pycaret_model.pkl)
- Streamlit app (app.py)
- Preprocessing pipeline

üîú **This notebook provides**:
- API code for backend
- Docker configuration
- CI/CD automation
- Monitoring tools

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import joblib
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

print("="*80)
print("MODEL DEPLOYMENT GUIDE")
print("="*80)

MODEL DEPLOYMENT GUIDE


## 1. Load and Test Models

In [2]:
# Check available models
model_dir = Path("../data/clean/")
models = list(model_dir.glob("*.pkl"))

print("Available models:")
for m in models:
    size_mb = m.stat().st_size / (1024*1024)
    print(f"  {m.name} ({size_mb:.2f} MB)")

# Load best model
try:
    model_bundle = joblib.load(model_dir / "best_model.pkl")
    
    # Check if it's a proper bundle or just the model
    if isinstance(model_bundle, dict) and 'model' in model_bundle:
        print(f"\n‚úì Loaded: {model_bundle.get('model_name', 'Unknown')}")
        print(f"  Test R¬≤: {model_bundle['metrics']['test_r2']:.4f}")
        print(f"  Test MAE: ¬£{model_bundle['metrics']['test_mae']:,.0f}")
    else:
        # It's just a model, not a bundle - create a minimal bundle
        print("\n‚ö† Model file doesn't have metadata, creating minimal bundle...")
        model_bundle = {
            'model': model_bundle,
            'model_name': 'LightGBM',
            'metrics': {'test_r2': 0.0, 'test_mae': 50000},
            'feature_names': None,
            'target_encoder': None
        }
        print("  Note: Run notebook 4 to create a proper model bundle with metrics")
        
except Exception as e:
    print(f"\n‚ùå Error loading model: {e}")
    print("   Please run notebook 4 first to create best_model.pkl")
    raise

Available models:
  best_model.pkl (0.49 MB)
  pycaret_model.pkl (1.35 MB)

‚úì Loaded: LightGBM
  Test R¬≤: 0.6836
  Test MAE: ¬£61,409


## 2. Create Production-Ready Prediction Function

In [3]:
def predict_price(property_type, is_new, duration, county, year, month=1, quarter=1):
    """
    Production prediction function with validation.
    
    Returns: dict with price and confidence interval
    """
    # Input validation
    assert property_type in ['D','S','T','F','O'], "Invalid property_type"
    assert is_new in ['Y','N'], "Invalid is_new"
    assert duration in ['F','L','U'], "Invalid duration"
    assert 1995 <= year <= 2025, "Year out of range"
    
    # Create input
    input_data = pd.DataFrame([{
        'type': property_type,
        'is_new': is_new,
        'duration': duration,
        'county': county.upper(),
        'year': year,
        'month': month,
        'quarter': quarter
    }])
    
    # Get model components
    model = model_bundle['model']
    encoder = model_bundle.get('target_encoder')
    features = model_bundle.get('feature_names')
    
    # Preprocess based on available components
    if encoder is not None and 'county' in input_data.columns:
        try:
            input_data['county_encoded'] = encoder.transform(input_data[['county']])
            input_data = input_data.drop('county', axis=1)
        except:
            # If encoding fails, use default encoding
            input_data['county_encoded'] = 0.0
            input_data = input_data.drop('county', axis=1)
    elif 'county' in input_data.columns:
        input_data = input_data.drop('county', axis=1)
    
    # One-hot encode categorical variables
    input_data = pd.get_dummies(input_data, columns=['type','is_new','duration'], drop_first=True)
    
    # Align features if we have the feature list
    if features is not None:
        for col in features:
            if col not in input_data.columns:
                input_data[col] = 0
        input_data = input_data[features]
    
    # Predict (model was trained with log-transformed target)
    prediction_log = model.predict(input_data)[0]
    
    # ALWAYS apply inverse transform (expm1 is inverse of log1p)
    # The model was trained on np.log1p(price), so predictions are in log space
    price = np.expm1(prediction_log)
    
    # Get MAE for confidence interval
    mae = model_bundle.get('metrics', {}).get('test_mae', 50000)
    
    return {
        'price': float(price),
        'price_log': float(prediction_log),
        'confidence_lower': float(max(1000, price - 2*mae)),
        'confidence_upper': float(price + 2*mae),
        'model': model_bundle.get('model_name', 'Unknown')
    }

print("‚úì Prediction function ready (FIXED: always applies expm1)")

‚úì Prediction function ready (FIXED: always applies expm1)


## 3. Test Predictions

In [4]:
# Test cases
tests = [
    ("London Flat (New Build, Leasehold)", 'F', 'Y', 'L', 'GREATER LONDON', 2016),
    ("Manchester Semi-Detached (Freehold)", 'S', 'N', 'F', 'GREATER MANCHESTER', 2015),
    ("Cornwall Detached (Freehold)", 'D', 'N', 'F', 'CORNWALL', 2017)
]

print("="*80)
print("TEST PREDICTIONS (FIXED)")
print("="*80)

for name, *params in tests:
    result = predict_price(*params)
    print(f"\n{name}:")
    print(f"  Predicted Price: ¬£{result['price']:,.0f}")
    print(f"  Log Value: {result['price_log']:.4f}")
    print(f"  Confidence Range: ¬£{result['confidence_lower']:,.0f} - ¬£{result['confidence_upper']:,.0f}")
    print(f"  Model: {result['model']}")

print("\n" + "="*80)
print("‚úÖ All predictions now show realistic UK house prices!")
print("="*80)

TEST PREDICTIONS (FIXED)

London Flat (New Build, Leasehold):
  Predicted Price: ¬£754,848
  Log Value: 13.5343
  Confidence Range: ¬£632,030 - ¬£877,666
  Model: LightGBM

Manchester Semi-Detached (Freehold):
  Predicted Price: ¬£248,214
  Log Value: 12.4221
  Confidence Range: ¬£125,396 - ¬£371,031
  Model: LightGBM

Cornwall Detached (Freehold):
  Predicted Price: ¬£321,911
  Log Value: 12.6820
  Confidence Range: ¬£199,093 - ¬£444,728
  Model: LightGBM

‚úÖ All predictions now show realistic UK house prices!


## 4. Flask API Code

Save this as `api.py` for deployment.

In [5]:
api_code = '''from flask import Flask, request, jsonify
import joblib
import pandas as pd
import numpy as np

app = Flask(__name__)
model_bundle = joblib.load('data/clean/best_model.pkl')

@app.route('/health')
def health():
    return jsonify({'status': 'ok', 'model': model_bundle['model_name']})

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    result = predict_price(**data)  # Use function from above
    return jsonify(result)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
'''

with open('api_example.py', 'w') as f:
    f.write(api_code)

print("‚úì API code saved: api_example.py")

‚úì API code saved: api_example.py


## 5. Docker Configuration

In [6]:
# Dockerfile
dockerfile = '''FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "api.py"]
'''

# docker-compose.yml
compose = '''version: '3.8'
services:
  api:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - ./data:/app/data
'''

with open('Dockerfile.example', 'w') as f:
    f.write(dockerfile)
with open('docker-compose.example.yml', 'w') as f:
    f.write(compose)

print("‚úì Docker files created")
print("Run: docker-compose up --build")

‚úì Docker files created
Run: docker-compose up --build


## 6. GitHub Actions CI/CD

In [7]:
workflow = '''name: Deploy
on:
  push:
    branches: [ main ]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Build
      run: docker build -t housing-model .
    - name: Deploy
      run: echo "Deploy to production"
'''

import os
os.makedirs('.github/workflows', exist_ok=True)
with open('.github/workflows/deploy.example.yml', 'w') as f:
    f.write(workflow)

print("‚úì CI/CD workflow created")

‚úì CI/CD workflow created


## 7. Deployment Checklist

### Before Deployment:
- [x] Model trained (R¬≤ > 0.65)
- [x] Model saved with preprocessing
- [x] Prediction function tested
- [x] Streamlit app working (`app.py`)
- [ ] API created and tested
- [ ] Docker container built
- [ ] Error handling added
- [ ] Monitoring set up

### Deployment Options:

**1. Heroku (Easiest)**
```bash
heroku create
git push heroku main
```

**2. Oracle Cloud (Free)**
- Always Free compute
- Good for students

**3. Local/Pi**
- Use ngrok for demos

### After Deployment:
- Monitor latency (< 100ms)
- Log predictions
- Retrain monthly
- Collect feedback

---

**Your Streamlit app (`app.py`) is already deployment-ready!**

**End of Deployment Guide**