# 🏥 Medicine Demand Forecasting - Model Training
## Barangay Health Center Management System

**Gradient Boosting Regression Model Training Notebook**

This notebook trains machine learning models to forecast monthly, quarterly, and seasonal medicine demand.

### Features:
- ✅ Train Gradient Boosting models for each medicine
- ✅ Monthly predictions (1-12 months ahead)
- ✅ Quarterly predictions
- ✅ Seasonal predictions (Philippine seasons)
- ✅ Model evaluation and validation
- ✅ Export trained models for VPS deployment

### Requirements:
- A CSV export of the dispensing history (see Step 4 for the required files and columns)
- Or a direct MySQL database connection

---

## 📦 Step 1: Install Required Packages

In [None]:
# Install required packages
!pip install -q pandas numpy scikit-learn joblib matplotlib seaborn plotly
!pip install -q sqlalchemy mysql-connector-python

print("✅ All packages installed successfully!")

## 📚 Step 2: Import Libraries

In [None]:
import pandas as pd
import numpy as np
import json
import warnings
import os
import joblib
import calendar
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Suppress warnings
warnings.filterwarnings('ignore')

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported successfully!")
print(f"📅 Current Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## ⚙️ Step 3: Configuration

In [None]:
# Configuration
MODEL_DIR = "trained_models"
RESULTS_DIR = "forecast_results"
DATA_DIR = "data"

# Create directories
for directory in [MODEL_DIR, RESULTS_DIR, DATA_DIR]:
    os.makedirs(directory, exist_ok=True)

# Philippine seasons
SEASONS = {
    'Dry Season (Tag-init)': [3, 4, 5],        # March-May: Hot dry
    'Wet Season (Tag-ulan)': [6, 7, 8, 9],     # June-Sept: Southwest monsoon
    'Cool Dry (Amihan)': [12, 1, 2],           # Dec-Feb: Northeast monsoon
    'Transition': [10, 11]                      # Oct-Nov: Transition
}

def get_season(month):
    """Get Philippine season from month."""
    for season, months in SEASONS.items():
        if month in months:
            return season
    return 'Transition'

print("✅ Configuration complete!")
print(f"📁 Model directory: {MODEL_DIR}")
print(f"📁 Results directory: {RESULTS_DIR}")
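
`get_season` falls back to `'Transition'` for any month missing from `SEASONS`, so a typo in the table would mislabel months silently. A minimal standalone check that every calendar month appears in exactly one season (the table is copied from the cell above; `check_season_coverage` is an illustrative helper, not part of the notebook):

```python
from collections import Counter

# Copy of the SEASONS table from the configuration cell
SEASONS = {
    'Dry Season (Tag-init)': [3, 4, 5],
    'Wet Season (Tag-ulan)': [6, 7, 8, 9],
    'Cool Dry (Amihan)': [12, 1, 2],
    'Transition': [10, 11],
}

def check_season_coverage(seasons):
    """Return (missing_months, duplicated_months) for a season -> months table."""
    counts = Counter(m for months in seasons.values() for m in months)
    missing = sorted(set(range(1, 13)) - set(counts))
    duplicated = sorted(m for m, n in counts.items() if n > 1)
    return missing, duplicated

missing, duplicated = check_season_coverage(SEASONS)
print("missing:", missing, "duplicated:", duplicated)  # missing: [] duplicated: []
```

Re-running it after any edit to `SEASONS` should print two empty lists.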

## 📤 Step 4: Upload Data

You have **two options** to load data:

### Option A: Upload CSV files
Required files:
- `medicine_dispensing.csv` (columns: med_name, quantity_given, date_given, category)
- `medicines.csv` (columns: med_name, category)
- `holidays.csv` (optional - columns: event_date, is_national_holiday)

### Option B: Connect to MySQL database directly
- Provide database credentials below
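
Before uploading, it can help to confirm that each CSV has the columns listed above, since a renamed header only surfaces later as a `KeyError`. A small sketch, assuming the column names exactly as listed; `REQUIRED_COLUMNS`, `missing_columns` and the sample row are illustrative, not part of the notebook:

```python
import io
import pandas as pd

# Columns the notebook reads from each file (holidays.csv is optional)
REQUIRED_COLUMNS = {
    'medicine_dispensing.csv': {'med_name', 'quantity_given', 'date_given', 'category'},
    'medicines.csv': {'med_name', 'category'},
    'holidays.csv': {'event_date', 'is_national_holiday'},
}

def missing_columns(filename, csv_source):
    """Return the required columns absent from a CSV (path or file-like object)."""
    header = pd.read_csv(csv_source, nrows=0).columns  # read the header row only
    return sorted(REQUIRED_COLUMNS[filename] - set(header))

# Illustrative file whose 'category' column was dropped by the export
sample = io.StringIO("med_name,quantity_given,date_given\nParacetamol,20,2024-01-15\n")
print(missing_columns('medicine_dispensing.csv', sample))  # ['category']
```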

In [None]:
from google.colab import files

# Choose your data loading method
USE_CSV = True  # Set to False if using direct database connection

if USE_CSV:
    print("📤 Upload your CSV files:")
    print("\nRequired: medicine_dispensing.csv")
    print("Required: medicines.csv")
    print("Optional: holidays.csv")
    print("\nClick 'Choose Files' to upload...")
    
    uploaded = files.upload()
    
    # Move uploaded files to data directory
    for filename in uploaded.keys():
        os.rename(filename, os.path.join(DATA_DIR, filename))
    
    print(f"\n✅ Uploaded {len(uploaded)} file(s)")
else:
    print("📊 Using direct database connection (configure in next cell)")

## 🔌 Step 5: Database Connection (Optional)

**Only needed if USE_CSV = False** (the cell skips itself when using CSV files)

In [None]:
# Database configuration (only if not using CSV)
if not USE_CSV:
    from sqlalchemy import create_engine
    
    # ⚠️ IMPORTANT: Don't commit credentials to public repositories!
    DB_CONFIG = {
        'host': 'your_host_or_ip',  # e.g., '192.168.1.100' or 'your-vps.com'
        'user': 'root',
        'password': 'your_password',
        'database': 'barangay_health_center'
    }
    
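    # Create database engine. Credentials are percent-encoded so characters
    # such as @, : or / in the password do not break the URL.
    from urllib.parse import quote_plus
    db_url = (
        f"mysql+mysqlconnector://{quote_plus(DB_CONFIG['user'])}:{quote_plus(DB_CONFIG['password'])}"
        f"@{DB_CONFIG['host']}/{DB_CONFIG['database']}"
    )
    engine = create_engine(db_url)
    
    # create_engine() connects lazily; open a connection now so bad credentials fail here
    with engine.connect():
        pass
    
    print("✅ Database connection established!")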
else:
    print("⏭️ Skipping database connection (using CSV files)")
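
Interpolating the password straight into the URL breaks when it contains URL delimiters such as `@`, `:`, `/` or `#`; the SQLAlchemy documentation recommends percent-encoding it with `urllib.parse.quote_plus`. The sketch below uses the standard-library parser only to make the effect visible (SQLAlchemy's own URL parser differs in detail); `build_mysql_url` and the password are illustrative:

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit

def build_mysql_url(user, password, host, database, escape=True):
    """Assemble a mysql+mysqlconnector URL; escape=True percent-encodes credentials."""
    if escape:
        user, password = quote_plus(user), quote_plus(password)
    return f"mysql+mysqlconnector://{user}:{password}@{host}/{database}"

# Hypothetical password containing a '/' delimiter
naive = build_mysql_url('root', 'pa/ss', '192.168.1.100', 'barangay_health_center', escape=False)
safe = build_mysql_url('root', 'pa/ss', '192.168.1.100', 'barangay_health_center')

print(urlsplit(naive).hostname)  # root  (host parsed from the wrong place)
print(urlsplit(safe).hostname)   # 192.168.1.100
```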

## 📊 Step 6: Load and Explore Data

In [None]:
# Load data based on chosen method
if USE_CSV:
    # Load from CSV
    df_dispensing = pd.read_csv(os.path.join(DATA_DIR, 'medicine_dispensing.csv'))
    df_medicines = pd.read_csv(os.path.join(DATA_DIR, 'medicines.csv'))
    
    # Load holidays if available
    holidays_file = os.path.join(DATA_DIR, 'holidays.csv')
    if os.path.exists(holidays_file):
        df_holidays = pd.read_csv(holidays_file)
    else:
        df_holidays = pd.DataFrame(columns=['event_date', 'is_national_holiday'])
        print("⚠️ No holidays.csv found - using empty holiday data")
else:
    # Load from database
    query_dispensing = """
        SELECT m.med_name, m.category, ma.quantity_given, DATE(ma.date_given) AS date_given
        FROM medicine_assistance ma
        JOIN medicines m ON ma.med_id = m.med_id
        WHERE ma.date_given IS NOT NULL
    """
    df_dispensing = pd.read_sql(query_dispensing, engine)
    df_medicines = pd.read_sql("SELECT DISTINCT med_name, category FROM medicines", engine)
    
    try:
        df_holidays = pd.read_sql("SELECT event_date, is_national_holiday FROM external_events", engine)
    except Exception:  # e.g. the external_events table does not exist
        df_holidays = pd.DataFrame(columns=['event_date', 'is_national_holiday'])

# Display data overview
print("="*80)
print("DATA OVERVIEW")
print("="*80)
print(f"\n📋 Dispensing Records: {len(df_dispensing):,}")
print(f"💊 Unique Medicines: {df_dispensing['med_name'].nunique()}")
print(f"📅 Date Range: {df_dispensing['date_given'].min()} to {df_dispensing['date_given'].max()}")
print(f"🎉 Holiday Records: {len(df_holidays):,}")

print("\n📊 Sample Dispensing Data:")
display(df_dispensing.head(10))

print("\n📈 Medicine Categories:")
display(df_medicines.groupby('category').size().sort_values(ascending=False))
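
Monthly totals are only as reliable as the raw rows behind them. A quick count of unparseable dates, non-numeric or negative quantities and missing medicine names can flag export problems before preprocessing. A sketch assuming the column names used above; `dispensing_quality_report` and the toy rows are illustrative, and how to treat flagged rows is left to the data owners:

```python
import pandas as pd

def dispensing_quality_report(df):
    """Count rows that would distort the monthly totals."""
    dates = pd.to_datetime(df['date_given'], errors='coerce')   # unparseable -> NaT
    qty = pd.to_numeric(df['quantity_given'], errors='coerce')  # non-numeric -> NaN
    return {
        'unparseable_dates': int(dates.isna().sum()),
        'non_numeric_quantities': int(qty.isna().sum()),
        'negative_quantities': int((qty < 0).sum()),
        'missing_med_names': int(df['med_name'].isna().sum()),
    }

# Toy rows for illustration only
toy = pd.DataFrame({
    'med_name': ['Paracetamol', 'Amoxicillin', None],
    'quantity_given': [10, -2, 'ten'],
    'date_given': ['2024-01-05', 'not a date', '2024-02-10'],
})
print(dispensing_quality_report(toy))
```

In the notebook it would be called as `dispensing_quality_report(df_dispensing)` right after loading.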

## 🔧 Step 7: Data Preprocessing & Feature Engineering

In [None]:
print("🔧 Starting data preprocessing...\n")

# Convert dates
df_dispensing['date_given'] = pd.to_datetime(df_dispensing['date_given'])
df_dispensing['period'] = df_dispensing['date_given'].dt.to_period('M')

# Get all medicines and date range
all_meds = df_medicines['med_name'].dropna().unique().tolist()
min_date = df_dispensing['period'].min()
max_date = df_dispensing['period'].max()
full_period_range = pd.period_range(start=min_date, end=max_date, freq='M')

print(f"📅 Creating continuous time series: {min_date} to {max_date} ({len(full_period_range)} months)")

# Create continuous time series (all medicine-month combinations)
multi_index = pd.MultiIndex.from_product(
    [all_meds, full_period_range],
    names=['med_name', 'period']
)
monthly_template = pd.DataFrame(index=multi_index).reset_index()
monthly_template['date_start'] = monthly_template['period'].apply(lambda x: x.start_time)

# Aggregate dispensing data by month
monthly_usage = df_dispensing.groupby(['med_name', 'period'])['quantity_given'].sum().reset_index()

# Merge and fill gaps with zeros
monthly_df = monthly_template.merge(monthly_usage, on=['med_name', 'period'], how='left')
monthly_df['total_quantity'] = monthly_df['quantity_given'].fillna(0)

# Add medicine categories
med_categories = df_medicines.set_index('med_name')['category'].to_dict()
monthly_df['category'] = monthly_df['med_name'].map(med_categories)

print("✅ Base time series created")
print(f"   Total records: {len(monthly_df):,}")

# === FEATURE ENGINEERING ===
print("\n🎨 Engineering features...")

# Time-based features
monthly_df['month_of_year'] = monthly_df['date_start'].dt.month
monthly_df['quarter'] = monthly_df['date_start'].dt.quarter
monthly_df['year'] = monthly_df['date_start'].dt.year
monthly_df['days_in_month'] = monthly_df['date_start'].dt.days_in_month

# Philippine season
monthly_df['season'] = monthly_df['month_of_year'].apply(get_season)

# Cyclical encoding (better for seasonality)
monthly_df['month_sin'] = np.sin(2 * np.pi * monthly_df['month_of_year'] / 12)
monthly_df['month_cos'] = np.cos(2 * np.pi * monthly_df['month_of_year'] / 12)

# Lag features (previous months)
for lag in [1, 2, 3, 6, 12]:
    monthly_df[f'lag_{lag}'] = monthly_df.groupby('med_name')['total_quantity'].shift(lag)

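# Rolling statistics over the *preceding* months. shift(1) keeps the current
# month (the prediction target) out of its own features; without it the
# rolling mean leaks the answer and inflates the evaluation scores.
for window in [3, 6, 12]:
    monthly_df[f'rolling_mean_{window}'] = monthly_df.groupby('med_name')['total_quantity'].transform(
        lambda x: x.shift(1).rolling(window=window, min_periods=1).mean()
    )
    monthly_df[f'rolling_std_{window}'] = monthly_df.groupby('med_name')['total_quantity'].transform(
        lambda x: x.shift(1).rolling(window=window, min_periods=1).std().fillna(0)
    )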

# Time index (trend)
monthly_df['time_index'] = monthly_df.groupby('med_name').cumcount()

# Holiday features
if len(df_holidays) > 0:
    df_holidays['event_date'] = pd.to_datetime(df_holidays['event_date'])
    df_holidays['period'] = df_holidays['event_date'].dt.to_period('M')
    monthly_events = df_holidays.groupby('period')['is_national_holiday'].sum().reset_index()
    monthly_events.columns = ['period', 'total_holidays']
    
    monthly_df = monthly_df.merge(monthly_events, on='period', how='left')
    monthly_df['total_holidays'] = monthly_df['total_holidays'].fillna(0)
    monthly_df['holiday_ratio'] = monthly_df['total_holidays'] / monthly_df['days_in_month']
    print("   ✅ Holiday features added")
else:
    monthly_df['total_holidays'] = 0
    monthly_df['holiday_ratio'] = 0
    print("   ⚠️ No holiday data - using zeros")

# Category encoding
category_mapping = {cat: idx for idx, cat in enumerate(monthly_df['category'].unique())}
monthly_df['category_encoded'] = monthly_df['category'].map(category_mapping)

print("✅ Feature engineering complete!")
print(f"   Total columns: {len(monthly_df.columns)}")

# Display feature summary
print("\n📊 Feature Summary:")
display(monthly_df.describe())

print("\n🎯 Sample processed data:")
display(monthly_df.head(10))
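
A rolling window that ends at the current row includes that row's `total_quantity`, which is exactly the value the model is trained to predict. The toy series below (illustrative numbers) contrasts a window ending at the current month with one shifted back by a month, which uses only months known at forecast time:

```python
import pandas as pd

demand = pd.Series([10.0, 20.0, 30.0, 40.0])  # toy monthly totals for one medicine

# Window ending at the current month: includes the value being predicted
leaky = demand.rolling(window=3, min_periods=1).mean()
# Window ending at the previous month: only information available beforehand
safe = demand.shift(1).rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({'target': demand, 'leaky_mean_3': leaky, 'safe_mean_3': safe}))
```

In the leaky column every value is partly computed from the target on the same row, so a model can score well on the hold-out without learning anything it could reproduce on future months.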

## 📈 Step 8: Data Visualization

In [None]:
# Top medicines by total dispensing
top_medicines = monthly_df.groupby('med_name')['total_quantity'].sum().sort_values(ascending=False).head(10)

plt.figure(figsize=(14, 6))
top_medicines.plot(kind='barh', color='steelblue')
plt.title('Top 10 Medicines by Total Dispensing', fontsize=16, fontweight='bold')
plt.xlabel('Total Quantity Dispensed', fontsize=12)
plt.ylabel('Medicine', fontsize=12)
plt.tight_layout()
plt.show()

# Monthly trend for top medicine
top_med = top_medicines.index[0]
top_med_data = monthly_df[monthly_df['med_name'] == top_med].sort_values('date_start')

plt.figure(figsize=(14, 6))
plt.plot(top_med_data['date_start'], top_med_data['total_quantity'], marker='o', linewidth=2)
plt.title(f'Monthly Dispensing Trend: {top_med}', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Quantity Dispensed', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Seasonal distribution
seasonal_dist = monthly_df.groupby('season')['total_quantity'].sum().sort_values(ascending=False)

plt.figure(figsize=(10, 6))
seasonal_dist.plot(kind='bar', color=['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A'])
plt.title('Total Dispensing by Philippine Season', fontsize=16, fontweight='bold')
plt.xlabel('Season', fontsize=12)
plt.ylabel('Total Quantity', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

## 🎯 Step 9: Train Gradient Boosting Models

In [None]:
print("="*80)
print("TRAINING GRADIENT BOOSTING MODELS")
print("="*80)

# Feature columns for training
feature_cols = [
    'month_of_year', 'quarter', 'month_sin', 'month_cos',
    'lag_1', 'lag_2', 'lag_3', 'lag_6', 'lag_12',
    'rolling_mean_3', 'rolling_mean_6', 'rolling_mean_12',
    'rolling_std_3', 'rolling_std_6', 'rolling_std_12',
    'time_index', 'holiday_ratio', 'days_in_month',
    'category_encoded'
]

models = {}
model_metrics = []
trained_count = 0
skipped_count = 0

print(f"\n🎯 Training models for {len(all_meds)} medicines...\n")

for i, med in enumerate(all_meds, 1):
    # Get medicine data
    med_data = monthly_df[monthly_df['med_name'] == med].copy()
    
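    # Drop the first months, which have no short-lag history
    med_data_clean = med_data.dropna(subset=['lag_1', 'lag_2', 'lag_3']).copy()
    # GradientBoostingRegressor rejects NaN inputs: treat the still-missing
    # longer lags (lag_6, lag_12) and any missing category code as 0
    med_data_clean[feature_cols] = med_data_clean[feature_cols].fillna(0)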
    
    if len(med_data_clean) < 6:
        skipped_count += 1
        if i % 10 == 0:
            print(f"   Progress: {i}/{len(all_meds)} (Trained: {trained_count}, Skipped: {skipped_count})")
        continue
    
    # Prepare features and target
    X = med_data_clean[feature_cols].values
    y = med_data_clean['total_quantity'].values
    
    # Train-test split
    if len(X) >= 12:
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, shuffle=False
        )
    else:
        X_train, y_train = X, y
        X_test, y_test = None, None
    
    # Train model
    model = GradientBoostingRegressor(
        n_estimators=150,
        learning_rate=0.1,
        max_depth=4,
        min_samples_split=4,
        min_samples_leaf=2,
        subsample=0.8,
        random_state=42,
        loss='squared_error'
    )
    
    model.fit(X_train, y_train)
    
    # Evaluate
    if X_test is not None:
        y_pred = model.predict(X_test)
        mae = mean_absolute_error(y_test, y_pred)
        rmse = np.sqrt(mean_squared_error(y_test, y_pred))
        r2 = r2_score(y_test, y_pred)
        
        model_metrics.append({
            'medicine': med,
            'category': med_data.iloc[0]['category'],
            'train_samples': len(X_train),
            'test_samples': len(X_test),
            'mae': mae,
            'rmse': rmse,
            'r2_score': r2
        })
    else:
        mae, rmse, r2 = None, None, None
    
    # Store model
    models[med] = {
        'model': model,
        'feature_cols': feature_cols,
        'last_data': med_data.iloc[-1].to_dict(),
        'category': med_data.iloc[0]['category'],
        'mae': mae,
        'r2': r2
    }
    
    # Save model to disk
    model_filename = f"{med.replace(' ', '_')}_enhanced_gbr.joblib"
    model_path = os.path.join(MODEL_DIR, model_filename)
    joblib.dump(model, model_path)
    
    trained_count += 1
    
    if i % 10 == 0:
        print(f"   Progress: {i}/{len(all_meds)} (Trained: {trained_count}, Skipped: {skipped_count})")

print("\n" + "="*80)
print("✅ Training Complete!")
print(f"   Total medicines: {len(all_meds)}")
print(f"   Models trained: {trained_count}")
print(f"   Skipped (insufficient data): {skipped_count}")
print("="*80)

# Display model performance
if model_metrics:
    df_metrics = pd.DataFrame(model_metrics)
    print("\n📊 Model Performance Summary:")
    print(f"   Average MAE: {df_metrics['mae'].mean():.2f}")
    print(f"   Average RMSE: {df_metrics['rmse'].mean():.2f}")
    print(f"   Average R²: {df_metrics['r2_score'].mean():.3f}")
    
    print("\n🏆 Top 10 Best Performing Models (by R²):")
    display(df_metrics.sort_values('r2_score', ascending=False).head(10))
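
An average R² across medicines says little on its own for sparse, low-volume series. A common sanity check is to compare each model's test MAE with a naive forecast that simply repeats the previous month's total on the same chronological hold-out. A sketch assuming `y` is one medicine's `total_quantity` in time order (for example the rows used for training); `naive_baseline_mae` and the numbers are illustrative, not part of the notebook:

```python
import numpy as np

def naive_baseline_mae(y, test_size=0.2):
    """MAE of a 'same as last month' forecast on the final test_size share of y.

    The hold-out is the last ceil(test_size * len(y)) months, with no shuffling.
    """
    y = np.asarray(y, dtype=float)
    n_test = int(np.ceil(test_size * len(y)))
    y_test = y[-n_test:]
    y_prev = y[-n_test - 1:-1]  # value one month earlier for each test month
    return float(np.mean(np.abs(y_test - y_prev)))

history = [12, 15, 11, 14, 18, 20, 17, 16, 22, 25]  # illustrative monthly totals
print(naive_baseline_mae(history))  # 4.5
```

A model whose MAE is not clearly below this baseline is probably not adding value for that medicine.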

## 🔮 Step 10: Generate Forecasts

In [None]:
def generate_future_features(models, med_name, months_ahead, history_df=None):
    """Generate future feature vectors recursively.

    Each month's prediction is appended to the demand history, so the lag and
    rolling features of the following month use it instead of repeating the
    last observed month. Row order matches `feature_cols` used in training.
    """
    if history_df is None:
        history_df = monthly_df  # processed data from Step 7
    model = models[med_name]['model']
    last_data = models[med_name]['last_data']

    def clean(value, default=0.0):
        # Missing values (None/NaN) would make model.predict() fail
        return default if value is None or pd.isna(value) else float(value)

    # Observed monthly demand for this medicine, oldest first
    history = (
        history_df.loc[history_df['med_name'] == med_name]
        .sort_values('date_start')['total_quantity']
        .astype(float)
        .tolist()
    )
    last_date = pd.to_datetime(last_data['date_start'])
    holiday_ratio = clean(last_data.get('holiday_ratio'))  # assume recent holiday density persists
    category_encoded = clean(last_data.get('category_encoded'))

    def lag(k):
        # Demand k months before the month being forecast (0 if there is no such month yet)
        return history[-k] if len(history) >= k else 0.0

    def rolling(window):
        # Mean and sample std of the `window` months preceding the forecast month
        recent = np.asarray(history[-window:], dtype=float)
        std = float(recent.std(ddof=1)) if len(recent) > 1 else 0.0
        return float(recent.mean()), std

    future_features = []
    for i in range(1, months_ahead + 1):
        future_date = last_date + relativedelta(months=i)
        month = future_date.month
        rolling_mean_3, rolling_std_3 = rolling(3)
        rolling_mean_6, rolling_std_6 = rolling(6)
        rolling_mean_12, rolling_std_12 = rolling(12)

        features = [
            month, (month - 1) // 3 + 1,
            np.sin(2 * np.pi * month / 12), np.cos(2 * np.pi * month / 12),
            lag(1), lag(2), lag(3), lag(6), lag(12),
            rolling_mean_3, rolling_mean_6, rolling_mean_12,
            rolling_std_3, rolling_std_6, rolling_std_12,
            last_data['time_index'] + i, holiday_ratio,
            calendar.monthrange(future_date.year, month)[1],
            category_encoded
        ]
        future_features.append(features)

        # Feed this month's (non-negative) prediction back as the newest observation
        prediction = model.predict(np.array([features], dtype=float))[0]
        history.append(max(float(prediction), 0.0))

    return np.array(future_features, dtype=float)

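def calculate_seasonal_predictions(monthly_preds, start_month=None):
    """Average the monthly predictions that fall in each Philippine season.

    `start_month` is the calendar month of the first prediction. It defaults to
    the month after the last month of training data (`max_date`), because the
    forecasts start there rather than at today's date.
    """
    if start_month is None:
        start_month = (max_date + 1).month
    seasonal_preds = {}
    
    month_predictions = {}
    for i, pred in enumerate(monthly_preds[:12]):
        month = ((start_month + i - 1) % 12) + 1
        month_predictions[month] = pred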
    
    for season_name, months in SEASONS.items():
        season_values = [month_predictions.get(m, 0) for m in months if m in month_predictions]
        if season_values:
            seasonal_preds[season_name] = round(float(np.mean(season_values)), 2)
        else:
            seasonal_preds[season_name] = 0
    
    return seasonal_preds

# Generate forecasts
print("🔮 Generating forecasts for all trained models...\n")

all_forecasts = {}
months_ahead = 12

for med_name in models.keys():
    try:
        # Generate future features
        X_future = generate_future_features(models, med_name, months_ahead)
        
        # Predict
        model = models[med_name]['model']
        predictions = model.predict(X_future)
        predictions = np.maximum(predictions, 0)  # Ensure non-negative
        
        # Store monthly predictions
        monthly_preds = [round(float(p), 2) for p in predictions]
        
        # Calculate quarterly
        quarterly_preds = []
        for q in range(0, len(monthly_preds), 3):
            quarter_avg = np.mean(monthly_preds[q:q+3])
            quarterly_preds.append(round(float(quarter_avg), 2))
        
        # Calculate seasonal
        seasonal_preds = calculate_seasonal_predictions(monthly_preds)
        
        all_forecasts[med_name] = {
            'monthly': {
                'next_1_month': monthly_preds[0],
                'next_2_months': monthly_preds[1],
                'next_3_months': monthly_preds[2],
                'all_months': monthly_preds
            },
            'quarterly': {
                'next_quarter': quarterly_preds[0],
                'all_quarters': quarterly_preds
            },
            'seasonal': seasonal_preds,
            'model_performance': {
                'mae': models[med_name].get('mae'),
                'r2_score': models[med_name].get('r2')
            },
            'category': models[med_name]['category']
        }
    except Exception as e:
        print(f"⚠️ Error forecasting {med_name}: {e}")

print(f"✅ Generated forecasts for {len(all_forecasts)} medicines")

# Display top predictions
sorted_forecasts = sorted(
    all_forecasts.items(),
    key=lambda x: x[1]['monthly']['next_1_month'],
    reverse=True
)[:10]

print("\n📈 TOP 10 HIGHEST PREDICTED DEMAND (Next Month):")
print("-" * 80)
print(f"{'Medicine':<40} {'Category':<20} {'Predicted Qty':<15}")
print("-" * 80)
for med, data in sorted_forecasts:
    # str() so a missing (NaN/None) category still formats
    print(f"{med:<40} {str(data['category']):<20} {data['monthly']['next_1_month']:<15.2f}")
print("-" * 80)
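
For horizons beyond one month, lag features such as `lag_1` refer to months that have not happened yet. The usual approach is recursive forecasting: predict one month, append the prediction to the history, rebuild the features, and repeat. The sketch below isolates that loop with a toy predictor standing in for a trained model; `recursive_forecast` and `mean_of_last_3` are illustrative names:

```python
def recursive_forecast(history, predict_next, steps):
    """Forecast `steps` months ahead, feeding each prediction back as history.

    predict_next(history) returns the next value and may only look at values
    already in `history`, which is what keeps multi-step lags well defined.
    """
    values = list(history)
    forecasts = []
    for _ in range(steps):
        nxt = max(float(predict_next(values)), 0.0)  # demand cannot be negative
        forecasts.append(nxt)
        values.append(nxt)  # becomes lag_1 for the following month
    return forecasts

# Toy stand-in for a trained model: average of the last 3 months (lag_1..lag_3)
mean_of_last_3 = lambda h: sum(h[-3:]) / len(h[-3:])

result = recursive_forecast([30, 60, 90], mean_of_last_3, steps=3)
print(result)  # [60.0, 70.0, 73.33333333333333]
```

Because errors compound through the fed-back lags, forecasts further out (months 7-12) are typically less reliable than the next-month figures.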

## 📊 Step 11: Visualize Forecasts

In [None]:
# Visualize forecast for top medicine
top_med = sorted_forecasts[0][0]
top_med_forecast = all_forecasts[top_med]

# Get historical data
historical = monthly_df[monthly_df['med_name'] == top_med].sort_values('date_start')

# Create future dates
last_date = historical['date_start'].max()
future_dates = [last_date + relativedelta(months=i) for i in range(1, 13)]

# Plot
plt.figure(figsize=(16, 7))

# Historical
plt.plot(historical['date_start'], historical['total_quantity'], 
         marker='o', linewidth=2, label='Historical', color='steelblue')

# Forecast
plt.plot(future_dates, top_med_forecast['monthly']['all_months'],
         marker='s', linewidth=2, linestyle='--', label='Forecast', color='orangered')

plt.axvline(x=last_date, color='gray', linestyle=':', linewidth=2, label='Forecast Start')
plt.title(f'Medicine Demand Forecast: {top_med}', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Quantity', fontsize=12)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Seasonal forecast visualization
seasonal_data = top_med_forecast['seasonal']
seasons = list(seasonal_data.keys())
values = list(seasonal_data.values())

plt.figure(figsize=(10, 6))
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A']
plt.bar(seasons, values, color=colors)
plt.title(f'Seasonal Demand Forecast: {top_med}', fontsize=16, fontweight='bold')
plt.xlabel('Season', fontsize=12)
plt.ylabel('Predicted Average Quantity', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

## 💾 Step 12: Save Results & Export Models

In [None]:
print("💾 Saving forecast results...\n")

# Save comprehensive forecast
output_file = os.path.join(RESULTS_DIR, 'enhanced_forecast_results.json')
with open(output_file, 'w') as f:
    json.dump(all_forecasts, f, indent=4)
print(f"✅ Saved: {output_file}")

# Save backward-compatible monthly forecast
monthly_simple = {med: data['monthly']['next_1_month'] for med, data in all_forecasts.items()}
with open(os.path.join(RESULTS_DIR, 'forecast_results.json'), 'w') as f:
    json.dump(monthly_simple, f, indent=4)
print(f"✅ Saved: {RESULTS_DIR}/forecast_results.json")

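# Save seasonal_forecast.json (legacy format: next-month prediction and next-quarter
# average only; per-season values are in enhanced_forecast_results.json)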
seasonal_forecast = {
    med: {
        'next_month_pred': data['monthly']['next_1_month'],
        'quarter_avg_pred': data['quarterly']['next_quarter']
    }
    for med, data in all_forecasts.items()
}
with open(os.path.join(RESULTS_DIR, 'seasonal_forecast.json'), 'w') as f:
    json.dump(seasonal_forecast, f, indent=4)
print(f"✅ Saved: {RESULTS_DIR}/seasonal_forecast.json")

# Save model performance metrics
if model_metrics:
    df_metrics.to_csv(os.path.join(RESULTS_DIR, 'model_performance.csv'), index=False)
    print(f"✅ Saved: {RESULTS_DIR}/model_performance.csv")

print("\n" + "="*80)
print("📦 EXPORT SUMMARY")
print("="*80)
print(f"\n✅ Trained Models: {len(models)} saved in '{MODEL_DIR}/'")
print(f"✅ Forecast Results: {len(all_forecasts)} saved in '{RESULTS_DIR}/'")
print(f"✅ Total Files: {len(os.listdir(MODEL_DIR)) + len(os.listdir(RESULTS_DIR))}")
print("\n" + "="*80)
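
Python's `json.dump` writes `NaN` for non-finite floats, for example when a medicine's `category` is missing and ends up as NaN. `NaN` is not valid under the JSON specification, so a strict parser on the dashboard side (such as PHP's `json_decode`) may reject the whole file. A sketch of a pre-save cleanup, assuming the nested dict/list structure built above; `nan_to_none` and the sample dict are illustrative:

```python
import json
import math

def nan_to_none(obj):
    """Recursively replace non-finite floats (NaN, inf) with None for strict JSON."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {key: nan_to_none(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [nan_to_none(value) for value in obj]
    return obj

forecast = {'Paracetamol': {'category': float('nan'),
                            'model_performance': {'mae': 1.5, 'r2_score': None}}}

print(json.dumps(forecast))  # contains NaN: not strict JSON
strict = json.dumps(nan_to_none(forecast), allow_nan=False)
print(strict)                # NaN replaced by null
```

Passing `allow_nan=False` when saving makes any non-finite value that slips through raise an error instead of silently producing an invalid file.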

## 📥 Step 13: Download Models & Results for VPS Deployment

In [None]:
import shutil
from google.colab import files

print("📦 Preparing deployment package...\n")

# Create deployment package
deployment_dir = 'medicine_forecast_deployment'
os.makedirs(deployment_dir, exist_ok=True)

# Copy models
shutil.copytree(MODEL_DIR, os.path.join(deployment_dir, 'models'), dirs_exist_ok=True)
print(f"✅ Copied {len(os.listdir(MODEL_DIR))} models")

# Copy results
shutil.copytree(RESULTS_DIR, os.path.join(deployment_dir, 'forecast_results'), dirs_exist_ok=True)
print(f"✅ Copied {len(os.listdir(RESULTS_DIR))} result files")

# Create README
readme_content = f"""# Medicine Demand Forecasting - Deployment Package

## Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

## Contents:
- `models/`: {len(os.listdir(MODEL_DIR))} trained Gradient Boosting models (.joblib files)
- `forecast_results/`: Forecast predictions (JSON files)

## Deployment Instructions:

1. **Upload to VPS:**
   - Upload the entire `models/` directory to `/home/user/health/models/`
   - Upload forecast JSON files to `/home/user/health/forecast_results/`

2. **Install Dependencies on VPS:**
   ```bash
   pip install pandas numpy scikit-learn joblib sqlalchemy mysql-connector-python
   ```

3. **Run Forecasting:**
   - Use `forecast_enhanced_gbr.py` to generate new forecasts
   - Models will be loaded from the `models/` directory

4. **API Integration:**
   - The forecast API will read from JSON files in `forecast_results/`
   - Files: `forecast_results.json`, `seasonal_forecast.json`, `enhanced_forecast_results.json`

## Model Information:
- Algorithm: Gradient Boosting Regressor
- Features: 19 (time, lag, rolling stats, holidays, categories)
- Predictions: Monthly (1-12 months), Quarterly, Seasonal

## Notes:
- Models are pre-trained and ready to use
- Re-train periodically as new dispensing data accumulates
- Ensure database credentials are configured in Python scripts
"""

with open(os.path.join(deployment_dir, 'README.md'), 'w') as f:
    f.write(readme_content)
print("✅ Created README.md")

# Create deployment script
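deploy_script = """#!/bin/bash
# Deployment script for VPS
set -e

echo "🚀 Deploying Medicine Forecasting Models..."

# Run from the directory containing this script (the extracted package)
cd "$(dirname "$0")"

# Set paths
TARGET_DIR="/home/user/health"

# Copy models (copy the directory contents so an existing models/ folder
# does not end up with a nested models/models/)
echo "📦 Copying models..."
mkdir -p "$TARGET_DIR/models"
cp -r models/. "$TARGET_DIR/models/"

# Copy forecast results
echo "📊 Copying forecast results..."
mkdir -p "$TARGET_DIR/forecast_results"
cp -r forecast_results/. "$TARGET_DIR/forecast_results/"

echo "✅ Deployment complete!"
echo "Run: python3 $TARGET_DIR/forecast_enhanced_gbr.py"
"""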

with open(os.path.join(deployment_dir, 'deploy.sh'), 'w') as f:
    f.write(deploy_script)
os.chmod(os.path.join(deployment_dir, 'deploy.sh'), 0o755)
print("✅ Created deploy.sh")

# Create ZIP archive
print("\n🗜️ Creating ZIP archive...")
shutil.make_archive('medicine_forecast_deployment', 'zip', deployment_dir)
print("✅ Created medicine_forecast_deployment.zip")

# Download
print("\n📥 Downloading deployment package...")
files.download('medicine_forecast_deployment.zip')

print("\n" + "="*80)
print("🎉 DEPLOYMENT PACKAGE READY!")
print("="*80)
print("\n📦 Package Contents:")
print(f"   - {len(os.listdir(MODEL_DIR))} trained models")
print(f"   - {len(os.listdir(RESULTS_DIR))} forecast files")
print("   - README.md with deployment instructions")
print("   - deploy.sh script for VPS setup")
print("\n✅ Download complete! Upload to your VPS and run deploy.sh")
print("="*80)
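
The VPS script `forecast_enhanced_gbr.py` is not part of this notebook, so the following is only a hypothetical sketch of how a loader could rebuild the medicine-to-model mapping from the exported filenames (each model is saved as `<med_name with spaces replaced by underscores>_enhanced_gbr.joblib`). `load_models` and its `load_fn` parameter are illustrative; on the server `load_fn` would be `joblib.load`:

```python
import os
import tempfile

MODEL_SUFFIX = '_enhanced_gbr.joblib'

def load_models(model_dir, load_fn):
    """Map medicine names to loaded models from '<name>_enhanced_gbr.joblib' files.

    Names are recovered by turning '_' back into spaces, which assumes the
    original medicine names contained no underscores.
    """
    loaded = {}
    for filename in sorted(os.listdir(model_dir)):
        if filename.endswith(MODEL_SUFFIX):
            med_name = filename[:-len(MODEL_SUFFIX)].replace('_', ' ')
            loaded[med_name] = load_fn(os.path.join(model_dir, filename))
    return loaded

# Demo with empty placeholder files; load_fn just returns the file name here
with tempfile.TemporaryDirectory() as tmp:
    for name in ['Amoxicillin_500mg_enhanced_gbr.joblib', 'README.md']:
        open(os.path.join(tmp, name), 'w').close()
    found = load_models(tmp, load_fn=os.path.basename)
print(found)  # {'Amoxicillin 500mg': 'Amoxicillin_500mg_enhanced_gbr.joblib'}
```

Note that each `.joblib` file holds only the fitted estimator. The feature order (`feature_cols`), the last observed data and the `category_mapping` codes are not exported, so a server-side script has to rebuild them exactly as Step 7 does; saving a small JSON index alongside the models would avoid both that and the lossy filename-to-name conversion.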

## 📋 Training Summary

### What We Accomplished:

1. ✅ **Data Processing**: Loaded and preprocessed dispensing records
2. ✅ **Feature Engineering**: Created 19 predictive features
3. ✅ **Model Training**: Trained a Gradient Boosting model for each medicine with enough history
4. ✅ **Forecasting**: Generated monthly, quarterly, and seasonal predictions
5. ✅ **Validation**: Evaluated model performance with MAE, RMSE, R²
6. ✅ **Export**: Created deployment package for VPS

### Next Steps:

1. **Upload to VPS**: Extract `medicine_forecast_deployment.zip` on your server
2. **Run Deployment**: Execute `./deploy.sh` to install models
3. **Test Forecasting**: Run `python3 forecast_enhanced_gbr.py`
4. **Integrate with Dashboard**: Update PHP dashboard to display forecasts
5. **Schedule Retraining**: Set up cron job to retrain models monthly

### Model Maintenance:

- **Retrain monthly** as new dispensing data accumulates
- **Monitor performance** using MAE and R¬≤ scores
- **Update features** if new data sources become available
- **Adjust parameters** if forecast accuracy drops
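
"Monitor performance" can be made concrete by comparing each month's realized dispensing with what was forecast for that month. A sketch, assuming you keep the forecasts and the `mae` stored per medicine at training time (medicines trained without a hold-out have `mae = None` and need another reference); the 1.5× tolerance and the `needs_retraining` helper are illustrative choices, not values from the notebook:

```python
def needs_retraining(actual, forecast, training_mae, tolerance=1.5):
    """Flag a model whose recent MAE exceeds `tolerance` x its training-time test MAE.

    actual / forecast: realized and predicted monthly totals for the same months.
    Returns (flag, recent_mae).
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and the same length")
    recent_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return recent_mae > tolerance * training_mae, recent_mae

flag, mae = needs_retraining(actual=[40, 35, 50], forecast=[30, 30, 30], training_mae=5.0)
print(flag, round(mae, 2))  # True 11.67
```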

---

**🎉 Congratulations! Your medicine demand forecasting system is ready for production!**