# Model Evaluation for Rainfall Prediction

This notebook evaluates the trained rainfall-prediction models on the held-out test set using the following metrics:
- Accuracy Score
- Confusion Matrix
- ROC-AUC Curve
- Precision, Recall, and F1-Score

## 1. Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report
import warnings
warnings.filterwarnings('ignore')

print("All libraries imported successfully!")

## 2. Load the Test Data and Predictions

This section assumes the models were already trained in the training notebook and that the test labels and predictions are available, either in memory or as pickle files.

In [None]:
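# This notebook expects these objects from the training notebook to exist in the session:
#   y_test                 true test labels
#   p1_test ... p7_test    class predictions of the seven models (see Section 3)
#   x_test_scaled          scaled test features
#   Rand_Forest, scaler    the fitted Random Forest model and feature scaler
#
# If they were saved to disk instead, load them with pickle, for example:
# with open('y_test.pkl', 'rb') as f:
#     y_test = pickle.load(f)

required = ['y_test', 'p1_test', 'p2_test', 'p3_test', 'p4_test', 'p5_test',
            'p6_test', 'p7_test', 'x_test_scaled', 'Rand_Forest', 'scaler']
missing = [name for name in required if name not in globals()]
if missing:
    raise NameError(f"Missing objects from training: {missing}")

print("Test data loaded successfully!")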

## 3. Calculate Accuracy Scores for All Models

In [None]:
# Accuracy of each model on the test set
# p1_test ... p7_test are the class predictions of the seven models, in the order listed below

print("="*50)
print("Model Accuracy Scores on Test Data")
print("="*50)

accuracy_scores = {
    'xgboost': accuracy_score(y_test, p1_test),
    'Random_Forest': accuracy_score(y_test, p2_test),
    'svm': accuracy_score(y_test, p3_test),
    'Dtree': accuracy_score(y_test, p4_test),
    'GBM': accuracy_score(y_test, p5_test),
    'log': accuracy_score(y_test, p6_test),
    'KNN': accuracy_score(y_test, p7_test)
}

for model, score in accuracy_scores.items():
    print(f"{model}: {score:.4f}")

# Find the best model (on a tie, max() returns the first model listed)
best_model = max(accuracy_scores, key=accuracy_scores.get)
best_accuracy = accuracy_scores[best_model]

print(f"\n✓ Best Model: {best_model} with Accuracy: {best_accuracy:.4f}")
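As a cross-check on what `accuracy_score` computes, the sketch below recomputes accuracy in plain Python as the fraction of matching labels, and returns every model tied for the top score rather than only the first one that `max()` reports. The helper names `accuracy` and `top_models`, the model names, and the toy labels are hypothetical and not taken from the rainfall data.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def top_models(scores):
    """Return every model name whose score equals the maximum score."""
    best = max(scores.values())
    return [name for name, score in scores.items() if score == best]


# Hypothetical labels: 1 = rain, 0 = no rain
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
preds = {
    'model_a': [1, 0, 0, 1, 0, 0, 0, 0],  # misses one rain day
    'model_b': [1, 0, 1, 1, 0, 1, 0, 0],  # one false alarm
    'model_c': [0, 0, 0, 0, 0, 0, 0, 0],  # always predicts "no rain"
}
scores = {name: accuracy(y_true, p) for name, p in preds.items()}
print(scores)
print(top_models(scores))
```

Note that `model_c`, which never predicts rain, still reaches 62.5% accuracy on these toy labels; on an imbalanced test set, accuracy alone can flatter a model that rarely predicts the minority class.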

## 4. Confusion Matrix Analysis

In [None]:
# Confusion matrix for the Random Forest model (p2_test), the model saved in Section 10.
# Substitute the predictions of `best_model` if a different model scored highest.

conf_matrix = confusion_matrix(y_test, p2_test)

print("Confusion Matrix:")
print(conf_matrix)

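# Extract the four counts. For binary 0/1 labels, rows are actual classes and
# columns are predicted classes, so the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = conf_matrix.ravel()

print("\nConfusion Matrix Components:")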
print(f"True Negative (TN): {tn}")
print(f"False Positive (FP): {fp}")
print(f"False Negative (FN): {fn}")
print(f"True Positive (TP): {tp}")
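For a binary problem the four counts can also be tallied directly. This plain-Python sketch assumes 0/1 labels with 1 (rain) as the positive class; with such labels, scikit-learn's `confusion_matrix` puts actual classes in rows and predicted classes in columns, so `ravel()` yields `(TN, FP, FN, TP)` in the same order the function below returns. The function name and toy labels are illustrative only.

```python
def binary_confusion(y_true, y_pred, positive=1):
    """Return (TN, FP, FN, TP) counts for a binary classification."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # rain predicted, rain observed
        elif actual == positive:
            fn += 1  # rain day missed
        elif predicted == positive:
            fp += 1  # false alarm on a dry day
        else:
            tn += 1  # dry day correctly predicted
    return tn, fp, fn, tp


# Hypothetical labels: 1 = rain, 0 = no rain
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

tn, fp, fn, tp = binary_confusion(y_true, y_pred)
matrix = [[tn, fp], [fn, tp]]  # rows = actual, columns = predicted
print(matrix)
```

If the labels or predictions happen to contain only one class, `confusion_matrix` returns a 1×1 matrix and the four-way unpacking fails; assuming 0/1 labels, passing `labels=[0, 1]` keeps the shape at 2×2.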

## 5. Visualize Confusion Matrix

In [None]:
# Plot the confusion matrix as a heatmap, writing the count in each cell
fig, ax = plt.subplots(figsize=(7.5, 7.5))
ax.matshow(conf_matrix, alpha=0.3)

for i in range(conf_matrix.shape[0]):
    for j in range(conf_matrix.shape[1]):
        ax.text(x=j, y=i, s=conf_matrix[i, j], va='center', ha='center', size='xx-large')

plt.xlabel('Predictions', fontsize=18)
plt.ylabel('Actuals', fontsize=18)
plt.title('Confusion Matrix', fontsize=18)
plt.show()
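Raw counts can hide poor performance on the minority class. A row-normalized matrix divides each row by the number of actual examples of that class, so the diagonal shows per-class recall. The sketch below does this in plain Python on hypothetical counts; recent scikit-learn versions offer the same through `confusion_matrix(..., normalize='true')`.

```python
def normalize_rows(matrix):
    """Divide each confusion-matrix row by its total (rows = actual classes).

    Each normalized row shows how the actual examples of one class were
    spread over the predicted classes; its diagonal entry is that class's recall.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        # Guard against a class with no examples in the test set
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical counts laid out as [[TN, FP], [FN, TP]]
counts = [[90, 10], [15, 5]]
for label, row in zip(['no rain', 'rain'], normalize_rows(counts)):
    print(label, [round(v, 2) for v in row])
```

With these hypothetical counts the overall accuracy is 95/120 (about 0.79), yet only 25% of the actual rain days are predicted as rain.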

## 6. Calculate Detailed Performance Metrics

In [None]:
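# Calculate detailed metrics
# average='weighted' averages the per-class scores, weighting each class by its
# number of actual examples; the classification report below shows each class separately.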
accuracy = accuracy_score(y_test, p2_test)
precision = precision_score(y_test, p2_test, average='weighted', zero_division=0)
recall = recall_score(y_test, p2_test, average='weighted', zero_division=0)
f1 = f1_score(y_test, p2_test, average='weighted', zero_division=0)

print("\nDetailed Performance Metrics:")
print("="*50)
print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-Score:  {f1:.4f}")
print("="*50)

# Classification Report
print("\nClassification Report:")
print(classification_report(y_test, p2_test))
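The weighted averages above combine the per-class scores in proportion to each class's support. The sketch below, on hypothetical counts, shows how they can differ from the rain-class scores that `precision_score`, `recall_score`, and `f1_score` return with their default `average='binary'`. The function names and counts are illustrative assumptions.

```python
def prf_from_counts(tp, fp, fn):
    """Precision, recall and F1 for one class; 0/0 is treated as 0 (like zero_division=0)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_average(per_class, support):
    """Average per-class scores, weighting each class by its number of actual examples."""
    return sum(s * n for s, n in zip(per_class, support)) / sum(support)


# Hypothetical confusion-matrix counts (rain = positive class)
tn, fp, fn, tp = 90, 10, 15, 5

rain = prf_from_counts(tp, fp, fn)
# For the "no rain" class the roles swap: its true positives are TN,
# its false positives are FN, and its false negatives are FP
no_rain = prf_from_counts(tn, fn, fp)
support = [tn + fp, fn + tp]  # actual no-rain days, actual rain days

weighted = [weighted_average([no_rain[i], rain[i]], support) for i in range(3)]

print(f"Rain class: P={rain[0]:.3f} R={rain[1]:.3f} F1={rain[2]:.3f}")
print(f"Weighted:   P={weighted[0]:.3f} R={weighted[1]:.3f} F1={weighted[2]:.3f}")
```

Weighted recall always equals accuracy, because each class's recall is multiplied back by its own support; the rain-class recall of 0.25 is the figure that exposes the missed rain days.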

## 7. ROC-AUC Curve Analysis

In [None]:
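# ROC-AUC needs a continuous score for the positive class, not hard class predictions.
# Use the Random Forest's predicted probabilities; predict_proba columns follow the
# sorted `classes_`, so column 1 is the positive class for 0/1 labels.
y_pred_proba = Rand_Forest.predict_proba(x_test_scaled)[:, 1]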

auc = roc_auc_score(y_test, y_pred_proba)
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

print(f"AUC Score: {auc:.4f}")
print("\nROC Curve Information:")
print(f"False Positive Rate (FPR) values: {fpr[:5]}...")  # Show first 5 values
print(f"True Positive Rate (TPR) values: {tpr[:5]}...")
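The ROC curve is traced by sweeping a decision threshold over the positive-class scores and recording (FPR, TPR) at each step, which is why the scores should be probabilities rather than hard 0/1 labels. The plain-Python sketch below builds the curve, integrates it with the trapezoidal rule, and checks the result against the rank interpretation of AUC (the chance that a random positive outscores a random negative). The function names and the six toy scores are hypothetical; scikit-learn's `roc_curve` also drops collinear points by default, so its `fpr`/`tpr` arrays may be shorter.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs for every distinct threshold, highest first.

    Assumes 0/1 labels with both classes present.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Predict the positive class (rain) when the score reaches the threshold
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t != 1)
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


def rank_auc(y_true, scores):
    """Probability that a random positive outscores a random negative; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and positive-class probabilities
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

fpr, tpr = roc_points(y_true, scores)
print(f"Trapezoid AUC: {trapezoid_auc(fpr, tpr):.4f}")
print(f"Rank AUC:      {rank_auc(y_true, scores):.4f}")
```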

## 8. Plot ROC-AUC Curve

In [None]:
# Plot the ROC curve, with the diagonal of a random classifier for reference
plt.figure(figsize=(8, 8), dpi=80)
plt.plot(fpr, tpr, color='blue', label=f'ROC curve (AUC = {auc:.4f})')
plt.fill_between(fpr, tpr, facecolor='blue', alpha=0.2)
plt.plot([0, 1], [0, 1], 'k--', label='Random classifier')
plt.axis('scaled')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.title("ROC Curve and AUC")
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='lower right')
plt.show()

## 9. Model Comparison Summary

In [None]:
# Create a summary dataframe of all models
models_data = {
    'Model': ['XGBoost', 'Random Forest', 'SVM', 'Decision Tree', 'GBM', 'Logistic Regression', 'KNN'],
    'Accuracy': [accuracy_scores['xgboost'],
                 accuracy_scores['Random_Forest'],
                 accuracy_scores['svm'],
                 accuracy_scores['Dtree'],
                 accuracy_scores['GBM'],
                 accuracy_scores['log'],
                 accuracy_scores['KNN']]
}

models_df = pd.DataFrame(models_data)
models_df = models_df.sort_values('Accuracy', ascending=False)

print("\nModel Comparison Summary:")
print("="*50)
print(models_df.to_string(index=False))
print("="*50)

## 10. Save the Best Model using Pickle

In [None]:
# Save the Random Forest model (Rand_Forest); replace it with the object behind
# `best_model` if a different model scored highest in Section 3.
# `with` blocks make sure each file is closed after writing.
with open('Rainfall.pkl', 'wb') as f:
    pickle.dump(Rand_Forest, f)
print("✓ Model saved as 'Rainfall.pkl'")

# Save the fitted scaler too: new data must be scaled the same way before prediction
with open('scale.pkl', 'wb') as f:
    pickle.dump(scaler, f)
print("✓ Scaler saved as 'scale.pkl'")

# Save any other fitted preprocessing objects (e.g. an encoder or imputer) the same way:
# with open('encoder.pkl', 'wb') as f:
#     pickle.dump(encoder, f)
# with open('imputer.pkl', 'wb') as f:
#     pickle.dump(imputer, f)

print("\n✓ Model and preprocessing objects saved successfully!")
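A sketch of the save/load round trip, assuming the model and scaler are bundled into a single pickle so the two files cannot drift out of sync. Plain dictionaries stand in for the fitted objects here, and `save_artifacts`, `load_artifacts`, and the file name are hypothetical; the notebook itself saves `Rainfall.pkl` and `scale.pkl` separately.

```python
import os
import pickle
import tempfile


def save_artifacts(path, **objects):
    """Pickle several named objects into one file.

    The `with` block closes the file handle even if pickling fails,
    instead of relying on garbage collection to close it.
    """
    with open(path, 'wb') as f:
        pickle.dump(objects, f)


def load_artifacts(path):
    """Load the dictionary of objects written by save_artifacts."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Hypothetical stand-ins for the fitted model and scaler
model_state = {'threshold': 0.5}
scaler_state = {'mean': [12.0, 60.0], 'scale': [4.0, 15.0]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'rainfall_artifacts.pkl')
    save_artifacts(path, model=model_state, scaler=scaler_state)
    restored = load_artifacts(path)

print(restored['model'] == model_state, restored['scaler'] == scaler_state)
```

Only unpickle files from a trusted source: loading a pickle can execute arbitrary code.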

## 11. Load and Test the Saved Model

In [None]:
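# Load the saved model and scaler
with open('Rainfall.pkl', 'rb') as f:
    loaded_model = pickle.load(f)
with open('scale.pkl', 'rb') as f:
    loaded_scaler = pickle.load(f)  # apply to raw features of new data before predicting

print("✓ Model loaded successfully from pickle file")

# Test the loaded model on the already-scaled test features
y_pred_loaded = loaded_model.predict(x_test_scaled)
accuracy_loaded = accuracy_score(y_test, y_pred_loaded)
original_accuracy = accuracy_scores['Random_Forest']

print(f"✓ Loaded Model Accuracy: {accuracy_loaded:.4f}")
print(f"✓ Original Model Accuracy: {original_accuracy:.4f}")

# The reloaded model should reproduce the original Random Forest predictions exactly
if np.array_equal(y_pred_loaded, p2_test):
    print("\n✓ Model saved and loaded correctly!")
else:
    print("\n✗ Loaded model predictions differ from the original")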

## Summary

### Metrics Used:

1. **Accuracy Score**: Measures the ratio of correct predictions to total predictions
2. **Confusion Matrix**: Shows True Positives, True Negatives, False Positives, and False Negatives
3. **ROC-AUC Curve**: Evaluates the model's ability to distinguish between classes
4. **Performance Metrics**: 
   - **Precision**: Out of predicted positive cases, how many were actually positive
   - **Recall**: Out of actual positive cases, how many were correctly predicted
   - **F1-Score**: Harmonic mean of precision and recall
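In terms of the confusion-matrix counts (TP, TN, FP, FN), these metrics are:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
$$

$$
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}
$$

ROC-AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.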

### Model Selection:
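Models are ranked by test accuracy in Section 3. In this notebook the Random Forest is saved as `Rainfall.pkl`, with its scaler in `scale.pkl`, for deployment in the Flask application; if another model scores higher, save that model instead.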