# 🚦 Traffic Clearance Predictor - Complete ML System

This notebook builds a machine learning system that predicts traffic clearance times from weather, vehicle count, time-of-day and day-of-week patterns, temperature, and visibility.

## 📋 Project Overview
- **Objective**: Predict traffic congestion clearance duration in minutes
- **Algorithm**: Random Forest Regressor tuned with randomized hyperparameter search
- **Features**: 8 basic features, extended to 17 with engineered temporal, weather, and interaction features
- **Performance Goal**: Reach an R² score of at least 0.85 on the held-out test set

In [1]:
# === STEP 1: IMPORTS AND SETUP ===
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Machine Learning Libraries
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split, RandomizedSearchCV, cross_val_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import mean_absolute_error, r2_score, mean_squared_error
from sklearn.pipeline import Pipeline
import joblib

# Configuration
plt.style.use('seaborn-v0_8')
np.random.seed(42)

print("✅ All libraries imported successfully!")
print(f"📅 Analysis started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

✅ All libraries imported successfully!
📅 Analysis started at: 2025-09-13 13:45:28


In [2]:
# === STEP 2: DATA LOADING AND INITIAL EXPLORATION ===
try:
    # Load dataset
    df = pd.read_csv("synthetic_traffic_data.csv")
    
    print("🎯 DATASET OVERVIEW:")
    print(f"📊 Shape: {df.shape[0]} rows × {df.shape[1]} columns")
    print(f"📋 Columns: {list(df.columns)}")
    print(f"🎯 Target Variable: congestion_duration_minutes ({df['congestion_duration_minutes'].min()}-{df['congestion_duration_minutes'].max()} minutes)")
    
    # Check for missing values
    missing_values = df.isnull().sum()
    if missing_values.sum() == 0:
        print("✅ No missing values found!")
    else:
        print(f"⚠️ Missing values detected:\n{missing_values[missing_values > 0]}")
    
    # Basic statistics
    print(f"\n📈 TRAFFIC PATTERNS:")
    print(f"Average Congestion Duration: {df['congestion_duration_minutes'].mean():.1f} minutes")
    print(f"Average Vehicle Count: {df['vehicle_count'].mean():.1f} vehicles")
    print(f"Weather Conditions: {df['weather'].unique()}")
    print(f"Congestion Levels: {df['congestion_level'].unique()}")
    
except Exception as e:
    print(f"❌ Error loading data: {str(e)}")
    print("Please ensure 'synthetic_traffic_data.csv' is in the current directory")

🎯 DATASET OVERVIEW:
📊 Shape: 168 rows × 12 columns
📋 Columns: ['date', 'time', 'hour', 'day_of_week', 'day_name', 'vehicle_count', 'congestion_level', 'congestion_duration_minutes', 'is_weekend', 'weather', 'temperature', 'visibility']
🎯 Target Variable: congestion_duration_minutes (5-96 minutes)
✅ No missing values found!

📈 TRAFFIC PATTERNS:
Average Congestion Duration: 27.8 minutes
Average Vehicle Count: 33.5 vehicles
Weather Conditions: ['Clear' 'Fog' 'Light Rain' 'Heavy Rain']
Congestion Levels: ['Low' 'Medium' 'High']


In [3]:
# === STEP 3: DATA PREPROCESSING AND FEATURE ENGINEERING ===

# Create a copy for processing
data = df.copy()

print("🔧 FEATURE ENGINEERING IN PROGRESS...")

# 1. Encode categorical variables
le_weather = LabelEncoder()
le_congestion = LabelEncoder()

data['weather_encoded'] = le_weather.fit_transform(data['weather'])
data['congestion_level_encoded'] = le_congestion.fit_transform(data['congestion_level'])

# Store encoders for future use
encoders = {
    'weather': le_weather,
    'congestion_level': le_congestion
}

# 2. Create cyclical features for time
data['hour_sin'] = np.sin(2 * np.pi * data['hour'] / 24)
data['hour_cos'] = np.cos(2 * np.pi * data['hour'] / 24)
data['day_sin'] = np.sin(2 * np.pi * data['day_of_week'] / 7)
data['day_cos'] = np.cos(2 * np.pi * data['day_of_week'] / 7)

# 3. Create rush hour indicator (hours 7-9 and 17-19, inclusive)
data['is_rush_hour'] = (
    ((data['hour'] >= 7) & (data['hour'] <= 9)) |
    ((data['hour'] >= 17) & (data['hour'] <= 19))
).astype(int)

# 4. Weather severity mapping
weather_severity = {'Clear': 1, 'Fog': 2, 'Light Rain': 3, 'Heavy Rain': 4}
data['weather_severity'] = data['weather'].map(weather_severity)

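# 5. Vehicle density categories
# Counts above 100 fall outside the bins and become NaN (the string 'nan' after astype(str)).
data['vehicle_density'] = pd.cut(data['vehicle_count'], 
                               bins=[0, 20, 50, 100], 
                               labels=['Low', 'Medium', 'High'],
                               include_lowest=True)
# LabelEncoder sorts labels alphabetically, so the codes are High=0, Low=1, Medium=2.
data['vehicle_density_encoded'] = LabelEncoder().fit_transform(data['vehicle_density'].astype(str))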

# 6. Interaction features
data['weather_vehicle_interaction'] = data['weather_severity'] * data['vehicle_count'] / 100
data['rush_weather_interaction'] = data['is_rush_hour'] * data['weather_severity']

# Define feature sets
basic_features = [
    'hour', 'day_of_week', 'vehicle_count', 'is_weekend',
    'temperature', 'visibility', 'weather_encoded', 'congestion_level_encoded'
]

engineered_features = basic_features + [
    'hour_sin', 'hour_cos', 'day_sin', 'day_cos',
    'is_rush_hour', 'weather_severity', 'vehicle_density_encoded',
    'weather_vehicle_interaction', 'rush_weather_interaction'
]

target_variable = 'congestion_duration_minutes'

print(f"✅ Feature engineering completed!")
print(f"📊 Basic features: {len(basic_features)}")
print(f"🚀 Engineered features: {len(engineered_features)}")
print(f"🎯 Target variable: {target_variable}")

# Display sample of processed data
print(f"\n📋 PROCESSED DATA SAMPLE:")
display_cols = ['hour', 'vehicle_count', 'weather', 'weather_encoded', 'is_rush_hour', 'congestion_duration_minutes']
print(data[display_cols].head())

🔧 FEATURE ENGINEERING IN PROGRESS...
✅ Feature engineering completed!
📊 Basic features: 8
🚀 Engineered features: 17
🎯 Target variable: congestion_duration_minutes

📋 PROCESSED DATA SAMPLE:
   hour  vehicle_count     weather  weather_encoded  is_rush_hour  \
0     0              7       Clear                0             0   
1     1              2         Fog                1             0   
2     2              0  Light Rain                3             0   
3     3              0       Clear                0             0   
4     4              0       Clear                0             0   

   congestion_duration_minutes  
0                            9  
1                           10  
2                            7  
3                            8  
4                           11  
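
Step 3 encodes `hour` and `day_of_week` as sine/cosine pairs. The sketch below illustrates one reason for doing so: on the unit circle, 23:00 and 00:00 are neighbours, whereas the raw integers are 23 apart. The helper names `encode_hour` and `circular_distance` are illustrative and do not appear in the notebook.

```python
import math
from typing import Tuple


def encode_hour(hour: int) -> Tuple[float, float]:
    """Map an hour (0-23) onto the unit circle, like hour_sin / hour_cos in Step 3."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


def circular_distance(h1: int, h2: int) -> float:
    """Euclidean distance between two hours after cyclical encoding."""
    s1, c1 = encode_hour(h1)
    s2, c2 = encode_hour(h2)
    return math.hypot(s1 - s2, c1 - c2)


# Midnight wrap-around: the same distance as any other pair of adjacent hours.
d_wrap = circular_distance(23, 0)
d_adjacent = circular_distance(11, 12)
# Hours half a day apart sit on opposite sides of the circle.
d_opposite = circular_distance(0, 12)

print(f"23h vs 0h:  {d_wrap:.3f}")
print(f"11h vs 12h: {d_adjacent:.3f}")
print(f"0h vs 12h:  {d_opposite:.3f}")
```

Tree-based models can also split on the raw `hour` column, so whether the cyclical pair helps is an empirical question for this dataset.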


In [4]:
# === STEP 4: EXPLORATORY DATA ANALYSIS ===

print("📈 TRAFFIC PATTERN ANALYSIS:")

# 1. Rush hour analysis
rush_hour_stats = data.groupby('is_rush_hour').agg({
    'congestion_duration_minutes': ['mean', 'std'],
    'vehicle_count': 'mean'
}).round(2)

print(f"\n🚦 RUSH HOUR IMPACT:")
print(rush_hour_stats)

# 2. Weather impact analysis
weather_impact = data.groupby('weather').agg({
    'congestion_duration_minutes': ['mean', 'count'],
    'vehicle_count': 'mean'
}).round(2)

print(f"\n🌧️ WEATHER IMPACT:")
print(weather_impact)

# 3. Hourly patterns
hourly_patterns = data.groupby('hour').agg({
    'congestion_duration_minutes': 'mean',
    'vehicle_count': 'mean'
}).round(2)

print(f"\n⏰ PEAK TRAFFIC HOURS:")
peak_hours = hourly_patterns.nlargest(5, 'vehicle_count')
print(peak_hours)

# 4. Correlation analysis
correlation_features = ['hour', 'vehicle_count', 'temperature', 'visibility', 
                       'weather_severity', 'is_rush_hour', 'congestion_duration_minutes']
correlation_matrix = data[correlation_features].corr()

print(f"\n🔗 TOP CORRELATIONS WITH TARGET:")
target_corr = correlation_matrix['congestion_duration_minutes'].sort_values(key=abs, ascending=False)
print(target_corr.drop('congestion_duration_minutes').head(6))

# 5. Weekend vs Weekday analysis
weekend_analysis = data.groupby('is_weekend').agg({
    'congestion_duration_minutes': ['mean', 'std'],
    'vehicle_count': ['mean', 'max']
}).round(2)

print(f"\n📅 WEEKEND vs WEEKDAY PATTERNS:")
print(weekend_analysis)

📈 TRAFFIC PATTERN ANALYSIS:

🚦 RUSH HOUR IMPACT:
             congestion_duration_minutes        vehicle_count
                                    mean    std          mean
is_rush_hour                                                 
0                                  21.18  15.16         24.82
1                                  47.55  22.89         59.64

🌧️ WEATHER IMPACT:
           congestion_duration_minutes       vehicle_count
                                  mean count          mean
weather                                                   
Clear                            25.24    51         30.69
Fog                              26.41    34         32.97
Heavy Rain                       34.61    38         39.55
Light Rain                       25.91    45         32.07

⏰ PEAK TRAFFIC HOURS:
      congestion_duration_minutes  vehicle_count
hour                                            
17                          56.57          74.86
18                       

In [5]:
# === STEP 5: ADVANCED ML MODEL WITH HYPERPARAMETER OPTIMIZATION ===

class AdvancedTrafficPredictor:
    def __init__(self):
        self.model = None
        self.scaler = None
        self.feature_names = None
        self.performance_metrics = {}
        self.feature_importance = None
        self.is_trained = False
        self.encoders = None
    
    def prepare_data(self, data, feature_columns, target_column, test_size=0.2):
        """Prepare and split data for training"""
        try:
            # Extract features and target
            X = data[feature_columns].copy()
            y = data[target_column].copy()
            
            # Handle any remaining missing values
            X = X.fillna(X.mean())
            
            # Store feature names
            self.feature_names = feature_columns
            
            # Split data
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=test_size, random_state=42, stratify=None
            )
            
            # Scale features
            self.scaler = StandardScaler()
            X_train_scaled = self.scaler.fit_transform(X_train)
            X_test_scaled = self.scaler.transform(X_test)
            
            return X_train_scaled, X_test_scaled, y_train, y_test, X_train, X_test
            
        except Exception as e:
            print(f"❌ Data preparation error: {str(e)}")
            return None, None, None, None, None, None
    
    def optimize_hyperparameters(self, X_train, y_train):
        """Perform hyperparameter optimization"""
        print("🔄 Hyperparameter optimization in progress...")
        
        # Define parameter distributions
        param_distributions = {
            'n_estimators': [100, 200, 300, 400, 500],
            'max_depth': [10, 15, 20, 25, 30, None],
            'min_samples_split': [2, 3, 5, 10],
            'min_samples_leaf': [1, 2, 4, 6],
            'max_features': ['sqrt', 'log2', None],
            'bootstrap': [True, False]
        }
        
        # Create base model
        base_model = RandomForestRegressor(random_state=42, n_jobs=-1)
        
        # Randomized search
        random_search = RandomizedSearchCV(
            estimator=base_model,
            param_distributions=param_distributions,
            n_iter=50,
            cv=5,
            scoring='r2',
            random_state=42,
            n_jobs=-1,
            verbose=0
        )
        
        # Fit the model
        random_search.fit(X_train, y_train)
        
        print(f"✅ Best CV Score: {random_search.best_score_:.4f}")
        print(f"🎯 Best Parameters: {random_search.best_params_}")
        
        return random_search.best_estimator_
    
    def train_model(self, data, feature_columns, target_column):
        """Train the complete model"""
        try:
            print("🚀 TRAINING ADVANCED TRAFFIC PREDICTOR")
            print("="*50)
            
            # Prepare data
            X_train_scaled, X_test_scaled, y_train, y_test, X_train_orig, X_test_orig = \
                self.prepare_data(data, feature_columns, target_column)
            
            if X_train_scaled is None:
                return False
            
            print(f"📊 Training samples: {X_train_scaled.shape[0]}")
            print(f"📊 Test samples: {X_test_scaled.shape[0]}")
            print(f"📊 Features: {X_train_scaled.shape[1]}")
            
            # Optimize hyperparameters
            self.model = self.optimize_hyperparameters(X_train_scaled, y_train)
            
            # Make predictions
            y_pred_train = self.model.predict(X_train_scaled)
            y_pred_test = self.model.predict(X_test_scaled)
            
            # Calculate metrics
            train_r2 = r2_score(y_train, y_pred_train)
            test_r2 = r2_score(y_test, y_pred_test)
            test_mae = mean_absolute_error(y_test, y_pred_test)
            test_rmse = np.sqrt(mean_squared_error(y_test, y_pred_test))
            
            # Cross-validation
            cv_scores = cross_val_score(self.model, X_train_scaled, y_train, cv=5, scoring='r2')
            
            # Store metrics
            self.performance_metrics = {
                'train_r2': train_r2,
                'test_r2': test_r2,
                'test_mae': test_mae,
                'test_rmse': test_rmse,
                'cv_mean': cv_scores.mean(),
                'cv_std': cv_scores.std(),
                'accuracy_percentage': test_r2 * 100
            }
            
            # Feature importance
            self.feature_importance = self.model.feature_importances_
            
            # Model is trained
            self.is_trained = True
            
            # Display results
            print(f"\n🏆 TRAINING RESULTS:")
            print(f"✅ Training R²: {train_r2:.4f}")
            print(f"✅ Test R²: {test_r2:.4f} ({test_r2*100:.2f}% accuracy)")
            print(f"✅ Test MAE: {test_mae:.2f} minutes")
            print(f"✅ Test RMSE: {test_rmse:.2f} minutes")
            print(f"✅ CV Score: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}")
            
            return True
            
        except Exception as e:
            print(f"❌ Training error: {str(e)}")
            self.is_trained = False
            return False
    
    def get_feature_importance(self, top_n=10):
        """Get feature importance analysis"""
        if not self.is_trained or self.feature_importance is None:
            return None
        
        importance_df = pd.DataFrame({
            'feature': self.feature_names,
            'importance': self.feature_importance
        }).sort_values('importance', ascending=False)
        
        return importance_df.head(top_n)
    
    def predict(self, input_data):
        """Make predictions on new data"""
        if not self.is_trained:
            raise ValueError("Model not trained yet!")
        
        # Ensure input has correct features
        input_scaled = self.scaler.transform(input_data[self.feature_names])
        return self.model.predict(input_scaled)
    
    def save_model(self, filepath="traffic_predictor_model.pkl"):
        """Save the trained model"""
        if self.is_trained:
            model_data = {
                'model': self.model,
                'scaler': self.scaler,
                'feature_names': self.feature_names,
                'performance_metrics': self.performance_metrics,
                'encoders': self.encoders
            }
            joblib.dump(model_data, filepath)
            print(f"✅ Model saved as {filepath}")
        else:
            print("❌ No trained model to save!")

print("✅ AdvancedTrafficPredictor class defined successfully!")

✅ AdvancedTrafficPredictor class defined successfully!
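
`AdvancedTrafficPredictor` fits `StandardScaler` once on the whole training split and then hands the scaled array to `RandomizedSearchCV`, so each validation fold has already influenced the scaling statistics. Tree ensembles are insensitive to feature scaling, so the effect here is likely small, but a `Pipeline` (imported in Step 1, though unused) keeps preprocessing inside each fold. The following is a minimal sketch on synthetic stand-in data, not the notebook's dataset. The names `X_demo`, `y_demo`, and the reduced search space are assumptions chosen to keep it quick to run.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; replace with the real feature matrix).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(120, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=120)

# The scaler is a pipeline step, so it is re-fit on the training part of every fold.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestRegressor(random_state=42)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_distributions = {
    "model__n_estimators": [20, 50],
    "model__max_depth": [5, 10, None],
    "model__min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    estimator=pipeline,
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="r2",
    random_state=42,
)
search.fit(X_demo, y_demo)

print(f"Best CV R2: {search.best_score_:.4f}")
print(f"Best parameters: {search.best_params_}")
```

With this layout, `search.best_estimator_` is the whole fitted pipeline, so `predict` accepts unscaled inputs and a single object can be saved with `joblib.dump`.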


In [6]:
# === STEP 6: TRAIN AND EVALUATE MODELS ===

# Initialize predictor
predictor = AdvancedTrafficPredictor()
predictor.encoders = encoders

print("🎯 COMPARING MODEL CONFIGURATIONS:")
print("="*60)

# Train with basic features
print("\n🔵 MODEL 1: Basic Features")
basic_success = predictor.train_model(data, basic_features, target_variable)
basic_performance = predictor.performance_metrics.copy() if basic_success else {}

# Train with engineered features
print("\n🟢 MODEL 2: Engineered Features")
predictor_advanced = AdvancedTrafficPredictor()
predictor_advanced.encoders = encoders
advanced_success = predictor_advanced.train_model(data, engineered_features, target_variable)
advanced_performance = predictor_advanced.performance_metrics.copy() if advanced_success else {}

# Compare results
if basic_success and advanced_success:
    print("\n📊 PERFORMANCE COMPARISON:")
    print("="*60)
    
    comparison = pd.DataFrame({
        'Basic Features': [
            f"{basic_performance['test_r2']:.4f}",
            f"{basic_performance['test_mae']:.2f}",
            f"{basic_performance['test_rmse']:.2f}",
            f"{basic_performance['accuracy_percentage']:.2f}%"
        ],
        'Engineered Features': [
            f"{advanced_performance['test_r2']:.4f}",
            f"{advanced_performance['test_mae']:.2f}",
            f"{advanced_performance['test_rmse']:.2f}",
            f"{advanced_performance['accuracy_percentage']:.2f}%"
        ]
    }, index=['R² Score', 'MAE (minutes)', 'RMSE (minutes)', 'Accuracy'])
    
    print(comparison)
    
    # Determine best model
    if advanced_performance['test_r2'] > basic_performance['test_r2']:
        best_model = predictor_advanced
        model_name = "Advanced (Engineered Features)"
        improvement = (advanced_performance['test_r2'] - basic_performance['test_r2']) * 100
        print(f"\n🏆 WINNER: {model_name}")
        print(f"📈 Improvement: +{improvement:.2f}% accuracy")
    else:
        best_model = predictor
        model_name = "Basic Features"
        print(f"\n🏆 WINNER: {model_name}")
        
else:
    print("❌ Model training failed!")
    best_model = None

# Feature importance analysis
if best_model and best_model.is_trained:
    print(f"\n🎯 TOP 10 MOST IMPORTANT FEATURES:")
    print("="*60)
    feature_imp = best_model.get_feature_importance(top_n=10)
    
    # Number rows by rank; the DataFrame index still holds each feature's original position
    for rank, (_, row) in enumerate(feature_imp.iterrows(), start=1):
        print(f"{rank:2d}. {row['feature']:25s} {row['importance']:.4f}")
    
    # Performance interpretation
    final_accuracy = best_model.performance_metrics['accuracy_percentage']
    print(f"\n🎪 FINAL MODEL PERFORMANCE ANALYSIS:")
    print("="*60)
    
    if final_accuracy >= 90:
        print("🌟 OUTSTANDING: Excellent performance for traffic prediction!")
        print("   Ready for production deployment.")
    elif final_accuracy >= 85:
        print("⭐ VERY GOOD: Strong predictive performance achieved!")
        print("   Suitable for real-world applications.")
    elif final_accuracy >= 75:
        print("👍 GOOD: Solid performance with room for improvement.")
        print("   Consider additional feature engineering.")
    else:
        print("🔄 NEEDS IMPROVEMENT: Model requires optimization.")
        print("   Consider more data or different algorithms.")

🎯 COMPARING MODEL CONFIGURATIONS:

🔵 MODEL 1: Basic Features
🚀 TRAINING ADVANCED TRAFFIC PREDICTOR
📊 Training samples: 134
📊 Test samples: 34
📊 Features: 8
🔄 Hyperparameter optimization in progress...
✅ Best CV Score: 0.6849
🎯 Best Parameters: {'n_estimators': 300, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_features': 'log2', 'max_depth': 25, 'bootstrap': True}

🏆 TRAINING RESULTS:
✅ Training R²: 0.8109
✅ Test R²: 0.8622 (86.22% accuracy)
✅ Test MAE: 6.34 minutes
✅ Test RMSE: 8.17 minutes
✅ CV Score: 0.6849 ± 0.0627

🟢 MODEL 2: Engineered Features
🚀 TRAINING ADVANCED TRAFFIC PREDICTOR
📊 Training samples: 134
📊 Test samples: 34
📊 Features: 17
🔄 Hyperparameter optimization in progress...
✅ Best CV Score: 0.6994
🎯 Best Parameters: {'n_estimators': 200, 'min_samples_split': 2, 'min_samples_leaf': 6, 'max_features': 'sqrt', 'max_depth': 25, 'bootstrap': True}

🏆 TRAINING RESULTS:
✅ Training R²: 0.7947
✅ Test 
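
The training log reports R² both as a score and as a percentage labelled "accuracy". R² is the share of target variance the model explains relative to always predicting the mean. It is not a fraction of correct predictions, and it can be negative. The sketch below computes R² and MAE by hand on made-up durations (all values are hypothetical) to show how the two metrics behave.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target (minutes here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical clearance times in minutes.
y_true = [10.0, 20.0, 30.0, 40.0]

close_fit = [12.0, 18.0, 33.0, 39.0]      # small errors
mean_only = [25.0, 25.0, 25.0, 25.0]      # always predicts the mean
reversed_fit = [40.0, 30.0, 20.0, 10.0]   # systematically wrong

for name, pred in [("close", close_fit), ("mean", mean_only), ("reversed", reversed_fit)]:
    print(f"{name:9s} R2 = {r2(y_true, pred):6.3f}   MAE = {mae(y_true, pred):5.2f} min")
```

The mean-only predictor scores exactly 0 and the reversed one scores below 0, which is why MAE and RMSE in minutes are often easier to interpret alongside R².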

In [7]:
# === STEP 7: MODEL TESTING WITH SAMPLE PREDICTIONS ===

if best_model and best_model.is_trained:
    print("🧪 TESTING MODEL WITH SAMPLE SCENARIOS:")
    print("="*60)
    
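    # Create test scenarios
    # LabelEncoder codes follow alphabetical label order:
    #   weather: Clear=0, Fog=1, Heavy Rain=2, Light Rain=3
    #   congestion_level: High=0, Low=1, Medium=2 (so 2 below is Medium and 0 is High)
    test_scenarios = [
        {
            'name': '🌅 Rush Hour - Clear Weather',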
            'hour': 8, 'day_of_week': 1, 'vehicle_count': 85, 'is_weekend': 0,
            'temperature': 25, 'visibility': 10, 'weather_encoded': 0, 'congestion_level_encoded': 2
        },
        {
            'name': '🌧️ Heavy Rain - Peak Traffic',
            'hour': 17, 'day_of_week': 2, 'vehicle_count': 95, 'is_weekend': 0,
            'temperature': 20, 'visibility': 7, 'weather_encoded': 2, 'congestion_level_encoded': 2
        },
        {
            'name': '😎 Weekend - Light Traffic',
            'hour': 14, 'day_of_week': 6, 'vehicle_count': 30, 'is_weekend': 1,
            'temperature': 28, 'visibility': 10, 'weather_encoded': 0, 'congestion_level_encoded': 1
        },
        {
            'name': '🌙 Late Night - Minimal Traffic',
            'hour': 2, 'day_of_week': 3, 'vehicle_count': 5, 'is_weekend': 0,
            'temperature': 18, 'visibility': 10, 'weather_encoded': 0, 'congestion_level_encoded': 0
        },
        {
            'name': '🌫️ Foggy Morning Commute',
            'hour': 7, 'day_of_week': 4, 'vehicle_count': 70, 'is_weekend': 0,
            'temperature': 22, 'visibility': 8, 'weather_encoded': 1, 'congestion_level_encoded': 2
        }
    ]
    
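    # Make predictions for each scenario
    for scenario in test_scenarios:
        # Read the name without mutating the dict, so the cell can be re-run
        scenario_name = scenario['name']
        
        # Create DataFrame for prediction from the feature values only
        scenario_df = pd.DataFrame([{k: v for k, v in scenario.items() if k != 'name'}])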
        
        # Add engineered features if using advanced model
        if len(best_model.feature_names) > len(basic_features):
            # Add cyclical features
            scenario_df['hour_sin'] = np.sin(2 * np.pi * scenario_df['hour'] / 24)
            scenario_df['hour_cos'] = np.cos(2 * np.pi * scenario_df['hour'] / 24)
            scenario_df['day_sin'] = np.sin(2 * np.pi * scenario_df['day_of_week'] / 7)
            scenario_df['day_cos'] = np.cos(2 * np.pi * scenario_df['day_of_week'] / 7)
            
            # Add other engineered features
            scenario_df['is_rush_hour'] = ((scenario_df['hour'] >= 7) & (scenario_df['hour'] <= 9) | 
                                         (scenario_df['hour'] >= 17) & (scenario_df['hour'] <= 19)).astype(int)
            
            # Map weather severity
            weather_map = {0: 1, 1: 2, 2: 4, 3: 3}  # Clear, Fog, Heavy Rain, Light Rain
            scenario_df['weather_severity'] = scenario_df['weather_encoded'].map(weather_map)
            
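            # Vehicle density: codes must match the training-time LabelEncoder,
            # which sorts labels alphabetically (High=0, Low=1, Medium=2)
            vehicle_count = scenario_df['vehicle_count'].iloc[0]
            if vehicle_count <= 20:
                scenario_df['vehicle_density_encoded'] = 1  # Low
            elif vehicle_count <= 50:
                scenario_df['vehicle_density_encoded'] = 2  # Medium
            else:
                scenario_df['vehicle_density_encoded'] = 0  # High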
            
            # Interaction features
            scenario_df['weather_vehicle_interaction'] = scenario_df['weather_severity'] * scenario_df['vehicle_count'] / 100
            scenario_df['rush_weather_interaction'] = scenario_df['is_rush_hour'] * scenario_df['weather_severity']
        
        try:
            prediction = best_model.predict(scenario_df)[0]
            print(f"{scenario_name}")
            print(f"   🕒 Predicted Clearance Time: {prediction:.1f} minutes")
            
            # Add context
            if prediction < 15:
                print(f"   ✅ Quick clearance expected")
            elif prediction < 30:
                print(f"   ⚠️  Moderate delay expected")
            elif prediction < 60:
                print(f"   🚨 Significant delay expected")
            else:
                print(f"   🔴 Major traffic disruption expected")
            print()
            
        except Exception as e:
            print(f"   ❌ Prediction error: {str(e)}")
            print()
    
    # Save the model
    print("💾 SAVING MODEL:")
    print("="*30)
    best_model.save_model("advanced_traffic_predictor.pkl")
    
    print(f"\n🎊 MODEL TRAINING COMPLETED SUCCESSFULLY!")
    print(f"📈 Final Model Accuracy: {best_model.performance_metrics['accuracy_percentage']:.2f}%")
    print(f"🎯 Model Type: {model_name}")
    print(f"💾 Model Saved As: advanced_traffic_predictor.pkl")
    print(f"\n🚀 Your Traffic Clearance Predictor is ready for deployment!")
    
else:
    print("❌ No trained model available for testing!")

🧪 TESTING MODEL WITH SAMPLE SCENARIOS:
🌅 Rush Hour - Clear Weather
   🕒 Predicted Clearance Time: 43.3 minutes
   🚨 Significant delay expected

🌧️ Heavy Rain - Peak Traffic
   🕒 Predicted Clearance Time: 43.3 minutes
   🚨 Significant delay expected

😎 Weekend - Light Traffic
   🕒 Predicted Clearance Time: 18.5 minutes
   ⚠️  Moderate delay expected

🌙 Late Night - Minimal Traffic
   🕒 Predicted Clearance Time: 26.6 minutes
   ⚠️  Moderate delay expected

🌫️ Foggy Morning Commute
   🕒 Predicted Clearance Time: 41.1 minutes
   🚨 Significant delay expected

💾 SAVING MODEL:
✅ Model saved as advanced_traffic_predictor.pkl

🎊 MODEL TRAINING COMPLETED SUCCESSFULLY!
📈 Final Model Accuracy: 86.22%
🎯 Model Type: Basic Features
💾 Model Saved As: advanced_traffic_predictor.pkl

🚀 Your Traffic Clearance Predictor is ready for deployment!
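
The scenarios above pass hand-written integer codes for `weather_encoded` and `congestion_level_encoded`. `LabelEncoder` assigns codes by sorted label order rather than by severity, so looking codes up from the fitted encoder is less error-prone than numbering them by hand. The sketch below refits two encoders on the label sets printed in the dataset overview. In the notebook itself, the already-fitted `encoders['weather']` and `encoders['congestion_level']` would play this role.

```python
from sklearn.preprocessing import LabelEncoder

# Label sets as printed in the dataset overview (Step 2).
le_weather = LabelEncoder().fit(["Clear", "Fog", "Light Rain", "Heavy Rain"])
le_congestion = LabelEncoder().fit(["Low", "Medium", "High"])

# classes_ is sorted; a label's code is its position in that array.
weather_codes = {str(label): code for code, label in enumerate(le_weather.classes_)}
congestion_codes = {str(label): code for code, label in enumerate(le_congestion.classes_)}

print("weather:", weather_codes)
print("congestion_level:", congestion_codes)

# Looking up a single code by label instead of hard-coding it.
low_code = int(le_congestion.transform(["Low"])[0])
heavy_rain_code = int(le_weather.transform(["Heavy Rain"])[0])
print(f"'Low' -> {low_code}, 'Heavy Rain' -> {heavy_rain_code}")
```

Under this ordering, a scenario meant to describe low congestion needs code 1, and code 0 stands for high congestion.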


In [8]:
# === STEP 8: MODEL SUMMARY AND DEPLOYMENT INSTRUCTIONS ===

if best_model and best_model.is_trained:
    print("📋 FINAL MODEL SUMMARY:")
    print("="*60)
    
    metrics = best_model.performance_metrics
    
    summary_info = {
        '🎯 Model Type': 'Random Forest Regressor (Optimized)',
        '📊 Dataset Size': f"{data.shape[0]} records",
        '🔧 Features Used': f"{len(best_model.feature_names)} features",
        '📈 Accuracy (R²)': f"{metrics['accuracy_percentage']:.2f}%",
        '📉 Mean Absolute Error': f"{metrics['test_mae']:.2f} minutes",
        '📐 Root Mean Square Error': f"{metrics['test_rmse']:.2f} minutes",
        '🔄 Cross-Validation': f"{metrics['cv_mean']:.4f} ± {metrics['cv_std']:.4f}",
        '💾 Model File': 'advanced_traffic_predictor.pkl'
    }
    
    for key, value in summary_info.items():
        print(f"{key:25s} {value}")
    
    print(f"\n🚀 DEPLOYMENT INSTRUCTIONS:")
    print("="*60)
    
    deployment_code = '''# Load the saved model
import joblib
import pandas as pd
import numpy as np

# Load model
model_data = joblib.load('advanced_traffic_predictor.pkl')
model = model_data['model']
scaler = model_data['scaler']
feature_names = model_data['feature_names']

# Example prediction function
def predict_clearance_time(hour, day_of_week, vehicle_count, is_weekend, 
                          temperature, visibility, weather_encoded, 
                          congestion_level_encoded):
    """Predict traffic clearance time"""
    
    # Create input dataframe
    input_data = pd.DataFrame([{
        'hour': hour,
        'day_of_week': day_of_week, 
        'vehicle_count': vehicle_count,
        'is_weekend': is_weekend,
        'temperature': temperature,
        'visibility': visibility,
        'weather_encoded': weather_encoded,
        'congestion_level_encoded': congestion_level_encoded
    }])
    
    # Add engineered features (if using advanced model)
    if len(feature_names) > 8:
        # Add cyclical and other engineered features
        input_data['hour_sin'] = np.sin(2 * np.pi * input_data['hour'] / 24)
        input_data['hour_cos'] = np.cos(2 * np.pi * input_data['hour'] / 24)
        # ... (add other features as needed)
    
    # Scale and predict
    input_scaled = scaler.transform(input_data[feature_names])
    prediction = model.predict(input_scaled)[0]
    
    return round(prediction, 1)

# Example usage
clearance_time = predict_clearance_time(
    hour=8, day_of_week=1, vehicle_count=75, is_weekend=0,
    temperature=25, visibility=10, weather_encoded=0, 
    congestion_level_encoded=2
)
print(f"Predicted clearance time: {clearance_time} minutes")'''
    
    print("📝 Sample deployment code saved in model summary.")
    print("\n✅ Traffic Clearance Predictor is fully operational!")
    print("🎉 Ready for real-time traffic management applications!")
    
    # Save deployment code to file
    with open('deployment_example.py', 'w') as f:
        f.write(deployment_code)
    print("\n💾 Deployment example saved as 'deployment_example.py'")
    
else:
    print("❌ Model training was not successful. Please review the errors above.")

print("\n" + "="*60)
print("🎊 TRAFFIC CLEARANCE PREDICTOR PROJECT COMPLETED!")
print("="*60)

📋 FINAL MODEL SUMMARY:
🎯 Model Type              Random Forest Regressor (Optimized)
📊 Dataset Size            168 records
🔧 Features Used           8 features
📈 Accuracy (R²)           86.22%
📉 Mean Absolute Error     6.34 minutes
📐 Root Mean Square Error  8.17 minutes
🔄 Cross-Validation        0.6849 ± 0.0627
💾 Model File              advanced_traffic_predictor.pkl

🚀 DEPLOYMENT INSTRUCTIONS:
📝 Sample deployment code saved in model summary.

✅ Traffic Clearance Predictor is fully operational!
🎉 Ready for real-time traffic management applications!

💾 Deployment example saved as 'deployment_example.py'

🎊 TRAFFIC CLEARANCE PREDICTOR PROJECT COMPLETED!
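
The engineered features are currently computed in three places: Step 3, the scenario loop in Step 7, and the deployment snippet. One way to keep training and inference in agreement is a single function used by all three. The sketch below is an assumption about how that could look, not the notebook's own code. `add_engineered_features` and `WEATHER_SEVERITY` are hypothetical names, the function takes the raw `weather` label rather than its integer code, and it deliberately leaves out `vehicle_density_encoded`, whose codes depend on the fitted `LabelEncoder`.

```python
import numpy as np
import pandas as pd

# Same severity mapping as Step 3.
WEATHER_SEVERITY = {"Clear": 1, "Fog": 2, "Light Rain": 3, "Heavy Rain": 4}


def add_engineered_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with the derived columns from Step 3 added.

    Expects 'hour', 'day_of_week', 'vehicle_count', and 'weather' columns.
    """
    out = frame.copy()

    # Cyclical time features
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)
    out["day_sin"] = np.sin(2 * np.pi * out["day_of_week"] / 7)
    out["day_cos"] = np.cos(2 * np.pi * out["day_of_week"] / 7)

    # Rush hour: hours 7-9 and 17-19, inclusive on both ends
    out["is_rush_hour"] = (
        out["hour"].between(7, 9) | out["hour"].between(17, 19)
    ).astype(int)

    # Weather severity and interaction terms
    out["weather_severity"] = out["weather"].map(WEATHER_SEVERITY)
    out["weather_vehicle_interaction"] = out["weather_severity"] * out["vehicle_count"] / 100
    out["rush_weather_interaction"] = out["is_rush_hour"] * out["weather_severity"]
    return out


# Two made-up rows to exercise the function.
sample = pd.DataFrame([
    {"hour": 8, "day_of_week": 1, "vehicle_count": 85, "weather": "Clear"},
    {"hour": 2, "day_of_week": 3, "vehicle_count": 5, "weather": "Heavy Rain"},
])
features = add_engineered_features(sample)
print(features[["is_rush_hour", "weather_severity", "weather_vehicle_interaction"]])
```

Calling the same function on the training frame and on each prediction request means a change to a feature definition only has to be made once.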
