# 🔥 AI-Based Forest Fire & Smoke Detection Using Aerial Imagery

---

## 🧩 Problem Statement

### What Problem Are We Solving?

We are building an **AI system** that can look at pictures taken by drones flying over forests and tell us:
- "This area has **FIRE**! 🔥"
- "This area is **SAFE**! ✅"

### Why Is This Important?

Forest fires are **dangerous disasters** that:
- Destroy animals' homes 🦌
- Pollute the air we breathe 🌫️
- Hurt people and their houses 🏠

If we detect fires **early**, we can save lives!

### Real-Life Analogy

Think of this like being a **doctor for the forest**:
- A doctor checks your temperature → Our AI checks how "red" areas look
- A doctor looks at X-rays → Our AI looks at drone pictures
- Doctor says "healthy" or "sick" → AI says "safe" or "fire detected"

---

## 🪜 Steps to Solve

1. **Load Data** - Read the forest tile information
2. **Understand Data** - Explore features and class distribution
3. **Visualize** - Draw graphs to see patterns
4. **Train Model** - Teach the computer to recognize fire
5. **Evaluate** - Check how well it learned
6. **Create Risk Map** - Show dangerous areas
7. **Drone Recommendations** - Suggest where to send drones

---

## 🎯 Expected Output

- **Accuracy > 85%** - Model correctly classifies most tiles
- **Fire Risk Heatmap** - Visual map showing danger zones
- **Drone Deployment Plan** - Priority list for inspection

---

## 📚 Section 1: Import Libraries

### 🔹 What Are Libraries?

Libraries are like **toolboxes** that contain pre-written code. Instead of writing everything from scratch, we use these tools.

### 🔹 Why Do We Import Libraries?

- **pandas** - Works with data tables (like Excel)
- **numpy** - Does math operations on arrays
- **matplotlib** - Creates graphs and charts
- **seaborn** - Makes beautiful statistical plots
- **scikit-learn** - Contains machine learning algorithms

### 🔹 Real-Life Analogy

Importing libraries is like:
- Getting a calculator from your pencil box (instead of doing math by hand)
- Using a ready-made template instead of designing from scratch

In [None]:
# =============================================================================
# SECTION 1: IMPORT LIBRARIES
# =============================================================================

# pandas: For working with data tables (DataFrames)
# WHY: We need to read CSV files and manipulate data
import pandas as pd

# numpy: For numerical operations on arrays
# WHY: Fast mathematical computations
import numpy as np

# matplotlib: For creating visualizations
# WHY: We need to draw graphs and charts
import matplotlib.pyplot as plt

# seaborn: For beautiful statistical visualizations
# WHY: Makes prettier graphs than matplotlib alone
import seaborn as sns

# scikit-learn: Machine learning library
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_curve, roc_auc_score
)
from sklearn.preprocessing import StandardScaler

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

# Set visual style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

print("✅ All libraries imported successfully!")

---

## 📁 Task 1: Data Understanding

### 🔹 What Is This Task?

We need to **load and explore** our dataset to understand:
- How many samples (tiles) do we have?
- What features (columns) are available?
- How are safe vs fire tiles distributed?

### 🔹 Why Is This Important?

Before teaching a computer, we must understand the data - just like a teacher reviews the textbook before teaching students.

### 🔹 Dataset Features Explained

| Feature | What It Measures | Fire Indicator? |
|---------|-----------------|------------------|
| mean_red | How red the area looks | 🔥 Fire is RED |
| mean_green | How green the area looks | 🌲 Healthy trees |
| mean_blue | How blue the area looks | 💧 Reference |
| red_blue_ratio | Red ÷ Blue | 🔥 High = fire |
| smoke_whiteness | How white/gray it looks | 💨 Smoke indicator |
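
To make the table concrete, here is a minimal sketch of how features like these might be computed from one RGB tile. The way the dataset's features were really extracted is not documented here, so the function name `tile_features`, the 0-1 pixel scaling, and especially the `smoke_whiteness` formula (brightness minus color spread) are assumptions for illustration only.

```python
import numpy as np

def tile_features(tile):
    """Compute illustrative color features for one RGB tile.

    tile: array of shape (height, width, 3) with values in [0, 1].
    The formulas are assumptions, not the dataset's documented method.
    """
    mean_red = float(tile[..., 0].mean())
    mean_green = float(tile[..., 1].mean())
    mean_blue = float(tile[..., 2].mean())

    # A small epsilon avoids division by zero on tiles with no blue at all
    red_blue_ratio = mean_red / (mean_blue + 1e-6)

    # Assumed smoke proxy: bright pixels whose channels are nearly equal (white/gray)
    brightness = tile.mean(axis=-1)
    color_spread = tile.max(axis=-1) - tile.min(axis=-1)
    smoke_whiteness = float(np.clip(brightness - color_spread, 0, 1).mean())

    return {
        "mean_red": mean_red,
        "mean_green": mean_green,
        "mean_blue": mean_blue,
        "red_blue_ratio": red_blue_ratio,
        "smoke_whiteness": smoke_whiteness,
    }

# A flame-colored tile (strong red, little blue) and a uniformly gray, smoky tile
fire_tile = np.zeros((4, 4, 3))
fire_tile[..., 0] = 0.9
fire_tile[..., 1] = 0.3
fire_tile[..., 2] = 0.1
smoke_tile = np.full((4, 4, 3), 0.7)

fire_feats = tile_features(fire_tile)
smoke_feats = tile_features(smoke_tile)
print(fire_feats)
print(smoke_feats)
```

Under these assumed formulas, the flame-colored tile gets a much higher `red_blue_ratio`, while the gray tile gets the higher `smoke_whiteness`.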

In [None]:
# =============================================================================
# TASK 1: LOAD THE DATASET
# =============================================================================

# WHAT: Read the CSV file from Google Sheets
# WHY: We need data to train our AI model
# HOW: pd.read_csv() reads CSV files into a DataFrame

DATA_URL = "https://docs.google.com/spreadsheets/d/1aszzbqsZ3G_LmH81EvRL06i5jDwJDl1SyDJXgxsbygM/export?format=csv"

print("📥 Loading dataset from Google Sheets...")
df = pd.read_csv(DATA_URL)

print("✅ Dataset loaded!")
print(f"   Shape: {df.shape[0]} rows × {df.shape[1]} columns")

### 🔍 Exploring the Dataset

Now let's look at the first few rows to understand what our data looks like.

- **df.head()** shows the first 5 rows
- Each row = One tile from the aerial image
- Each column = One feature (measurement)

In [None]:
# Display first 5 rows of the dataset
# WHAT: df.head() shows preview of data
# WHY: To visually inspect the data structure

print("📊 Dataset Preview (First 5 Rows):")
df.head()

### 📋 Data Types and Missing Values

Let's check:
1. What type of data is in each column (numbers, text, etc.)
2. Are there any missing values?

In [None]:
# Check data types
print("📋 Column Names and Data Types:")
print(df.dtypes)

print("\n🔍 Missing Values Check:")
missing = df.isnull().sum()
print(f"   Total missing values: {missing.sum()}")

if missing.sum() == 0:
    print("   ✅ No missing values - dataset is complete!")
else:
    # Show only the columns that actually contain missing values
    print(missing[missing > 0])

### 📈 Statistical Summary

The `.describe()` method gives us:
- **count**: Number of values
- **mean**: Average value
- **std**: How spread out values are
- **min/max**: Smallest and largest values

In [None]:
# Statistical summary
print("📈 Statistical Summary:")
df.describe().round(3)

### 🎯 Class Distribution (Fire vs Safe)

Let's count how many tiles are:
- **0 = Safe** (no fire)
- **1 = Fire** (fire detected)

This tells us if our data is **balanced** or **imbalanced**.
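
As a small worked example of what "imbalanced" means in practice, the sketch below measures the fire share of a toy label array and computes balanced class weights with scikit-learn's `compute_class_weight`. The toy labels are made up for illustration. Whether this notebook's dataset needs reweighting depends on the counts printed in the next cell.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels (hypothetical): three safe tiles for every fire tile
labels = np.array([0, 0, 0, 1])

# Share of fire tiles; values far from 0.5 indicate imbalance
fire_share = labels.mean()

# "balanced" weights are n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]),
                               y=labels)
class_weight = dict(zip([0, 1], weights))

print(f"Fire share: {fire_share:.2f}")
print(f"Balanced class weights: {class_weight}")
```

If the data turns out to be imbalanced, one option is to pass `class_weight="balanced"` to `RandomForestClassifier`, so that mistakes on the rarer class count for more during training.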

In [None]:
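# Class distribution
print("🎯 Target Variable Distribution (fire_label):")
# value_counts() sorts by frequency, so reindex to keep the order Safe (0), Fire (1)
# and match the bar labels and colors used in the plot below
class_counts = df['fire_label'].value_counts().reindex([0, 1], fill_value=0)

print(f"   Safe (0): {class_counts[0]} tiles ({class_counts[0]/len(df)*100:.1f}%)")
print(f"   Fire (1): {class_counts[1]} tiles ({class_counts[1]/len(df)*100:.1f}%)")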

# Visualize class distribution
plt.figure(figsize=(8, 5))
colors = ['green', 'red']
plt.bar(['Safe (0)', 'Fire (1)'], class_counts.values, color=colors)
plt.title('Class Distribution: Safe vs Fire Tiles', fontsize=14)
plt.ylabel('Number of Tiles')
plt.show()

---

## 📊 Data Visualization

### 🔹 Why Visualize Data?

Graphs help us:
- **See patterns** that numbers hide
- **Compare** fire vs safe tiles
- **Identify** which features are most useful

### 🔹 Real-Life Analogy

Like a weather map that shows temperature with colors instead of just numbers - much easier to understand!

In [None]:
# Feature distributions by class
print("📈 Feature Distributions: Safe vs Fire")

features = [col for col in df.columns if col != 'fire_label']

# Size the grid from the number of features instead of assuming exactly 10
n_cols = 5
n_rows = int(np.ceil(len(features) / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(18, 4 * n_rows))
axes = np.atleast_1d(axes).flatten()

for idx, feature in enumerate(features):
    ax = axes[idx]
    df[df['fire_label'] == 0][feature].hist(ax=ax, alpha=0.5, label='Safe', bins=20, color='green')
    df[df['fire_label'] == 1][feature].hist(ax=ax, alpha=0.5, label='Fire', bins=20, color='red')
    ax.set_title(feature, fontsize=10)
    ax.legend(fontsize=8)

# Hide any panels left over when the grid has more slots than features
for ax in axes[len(features):]:
    ax.set_visible(False)

plt.suptitle('Feature Distributions: Safe vs Fire', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

### 🔗 Correlation Heatmap

**Correlation** tells us how strongly two features move together in a straight-line (linear) way:
- **+1.0** = Perfect positive linear relationship
- **-1.0** = Perfect negative linear relationship
- **0.0** = No linear relationship
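
The three reference values can be checked by hand. The sketch below implements the Pearson correlation formula directly and compares it with `np.corrcoef`. The `redness` values and the helper name `pearson` are made up for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

redness = np.array([0.2, 0.4, 0.6, 0.8])

r_up = pearson(redness, 2 * redness + 1)            # rises together
r_down = pearson(redness, -redness)                 # one rises, the other falls
r_none = pearson(redness, [1.0, -1.0, -1.0, 1.0])   # no linear trend

# NumPy's built-in gives the same number for the first pair
r_numpy = float(np.corrcoef(redness, 2 * redness + 1)[0, 1])

print(r_up, r_down, r_none, r_numpy)
```

The last pair shows why "no relationship" really means "no linear relationship": the values follow a clear U-shaped pattern, yet the correlation is zero.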

In [None]:
# Correlation heatmap
print("📊 Correlation Heatmap")

plt.figure(figsize=(12, 10))
# numeric_only=True skips any non-numeric columns, which would otherwise
# raise an error in pandas 2.x
correlation_matrix = df.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap='RdYlGn', center=0, fmt='.2f')
plt.title('Feature Correlation Heatmap', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# Show correlation with target (reuse the matrix computed above)
print("\n🎯 Feature Correlation with Fire Label:")
correlations = correlation_matrix['fire_label'].drop('fire_label').sort_values(ascending=False)
for feature, corr in correlations.items():
    indicator = "🔥" if corr > 0.2 else "💨" if corr > 0 else "🌲"
    print(f"   {indicator} {feature}: {corr:.3f}")

---

## 🤖 Task 2: Machine Learning Model

### 🔹 What Is Machine Learning?

Machine Learning is teaching computers to learn from examples, just like:
- A child learns to recognize dogs by seeing many dog pictures
- Our model learns to recognize fire by seeing many fire examples

### 🔹 What Is Random Forest?

Random Forest is like asking **100 experts** (decision trees) and taking the **majority vote**.
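
The "majority vote" picture is a helpful analogy. In scikit-learn, the forest actually averages each tree's class probabilities rather than counting hard votes. The sketch below checks this on synthetic data. The data and the names `X_demo`, `y_demo`, and `forest` are made up for illustration and are unrelated to the fire dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data (hypothetical): the label depends on the first two columns
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_demo, y_demo)

sample = X_demo[:5]

# Ask every individual tree ("expert") for its class probabilities
tree_probas = np.array([tree.predict_proba(sample) for tree in forest.estimators_])

# Averaging the experts reproduces the forest's own output
averaged = tree_probas.mean(axis=0)
forest_proba = forest.predict_proba(sample)

print("Experts consulted:", tree_probas.shape[0])
print("Average matches forest:", np.allclose(averaged, forest_proba))
```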

### 🔹 Steps:
1. **Prepare data** - Separate features (X) and target (y)
2. **Split data** - 80% training, 20% testing
3. **Train model** - Teach the computer
4. **Evaluate** - Check how well it learned
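
The four steps above can be sketched end to end on synthetic data. This is only an illustration: `make_classification` stands in for the fire dataset, and wrapping the scaler and classifier in `make_pipeline` is one convenient option, not the approach this notebook takes in its own cells.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Prepare data: synthetic features (X) and target (y), a stand-in for the tiles
X_demo, y_demo = make_classification(n_samples=300, n_features=6,
                                     n_informative=4, random_state=0)

# 2. Split data: 80% training, 20% testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# 3. Train model: the pipeline fits the scaler and the forest on training data only
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
pipeline.fit(X_tr, y_tr)

# 4. Evaluate: score on tiles the model has never seen
demo_accuracy = accuracy_score(y_te, pipeline.predict(X_te))
print(f"Accuracy on held-out data: {demo_accuracy:.3f}")
```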

In [None]:
# =============================================================================
# PREPARE FEATURES AND TARGET
# =============================================================================

# WHAT: Separate input features (X) from target label (y)
# WHY: ML needs separate inputs and expected outputs

print("✂️ Preparing features and target...")

X = df.drop('fire_label', axis=1)  # All columns except target
y = df['fire_label']               # Target column

print(f"   Features (X): {X.shape}")
print(f"   Target (y): {y.shape}")

### ✂️ Train-Test Split

We divide data into:
- **Training set (80%)** - Used to teach the model
- **Testing set (20%)** - Used to test if learning worked

**Analogy**: Study from a textbook, then take a surprise test!
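
A quick way to see what a stratified 80/20 split does is to run it on toy labels with a known fire share. The arrays below are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 tiles, 20% of them fire
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

train_fire_share = y_tr.mean()
test_fire_share = y_te.mean()

print(f"Train: {len(y_tr)} tiles, fire share {train_fire_share:.2f}")
print(f"Test:  {len(y_te)} tiles, fire share {test_fire_share:.2f}")
```

With `stratify`, both sets keep the original 20% fire share. Without it, a small test set could end up with too few fire tiles to evaluate recall reliably.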

In [None]:
# Split data into training and testing sets
# train_test_split() randomly divides the data

X_train, X_test, y_train, y_test = train_test_split(
    X, y,                    # Data to split
    test_size=0.2,           # 20% for testing
    random_state=42,         # For reproducibility
    stratify=y               # Keep the same class proportions in both sets
)

print("📚 Data Split:")
print(f"   Training set: {X_train.shape[0]} samples")
print(f"   Testing set: {X_test.shape[0]} samples")

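### 📏 Feature Scaling

**What**: Normalize features to the same scale (mean=0, std=1)

**Why**: Some features range 0-1, others 0-10. Scaling makes them comparable. Tree-based models such as Random Forest split on one feature at a time, so they do not strictly need scaling. It does no harm here and keeps the data ready for scale-sensitive models such as logistic regression or k-nearest neighbors.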

**Analogy**: Converting different currencies to a common currency (like USD)
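
Standardization is just `z = (x - mean) / std`, with the mean and standard deviation taken from the training data. The sketch below does the arithmetic by hand on made-up numbers and checks it against `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values: column 0 ranges roughly 0-1, column 1 roughly 0-10
train = np.array([[0.2, 2.0],
                  [0.4, 6.0],
                  [0.6, 10.0]])
test = np.array([[0.4, 8.0]])

# Manual standardization using training statistics only
mean = train.mean(axis=0)
std = train.std(axis=0)          # population std (ddof=0), as StandardScaler uses
manual_test = (test - mean) / std

# Same computation with scikit-learn
scaler_demo = StandardScaler().fit(train)
sk_test = scaler_demo.transform(test)
scaled_train = scaler_demo.transform(train)

print("Manual: ", manual_test)
print("Sklearn:", sk_test)
```

Fitting on the training set and only transforming the test set keeps test information from leaking into the preprocessing.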

In [None]:
# Feature scaling using StandardScaler
# WHAT: Transforms features to have mean=0 and std=1
# WHY: Many ML algorithms work better with scaled features
#      (Random Forest itself is not sensitive to feature scale)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # Reuse them on test data to avoid leakage

print("✅ Features scaled using StandardScaler")

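### 🌲 Train Random Forest Classifier

**Parameters explained**:
- `n_estimators=100`: Use 100 decision trees
- `max_depth=10`: Each tree can be at most 10 levels deep
- `min_samples_split=5`: A node needs at least 5 samples before it can be split
- `min_samples_leaf=2`: Every leaf must keep at least 2 samples
- `random_state=42`: Ensures reproducible results
- `n_jobs=-1`: Train using all available CPU cores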

In [None]:
# Train Random Forest Classifier
print("🌲 Training Random Forest Classifier...")

rf_model = RandomForestClassifier(
    n_estimators=100,        # Number of trees
    max_depth=10,            # Maximum depth per tree
    min_samples_split=5,     # Minimum samples to split
    min_samples_leaf=2,      # Minimum samples in leaf
    random_state=42,         # Reproducibility
    n_jobs=-1                # Use all CPU cores
)

rf_model.fit(X_train_scaled, y_train)
print("✅ Model training complete!")

# Make predictions
y_pred = rf_model.predict(X_test_scaled)
y_pred_proba = rf_model.predict_proba(X_test_scaled)[:, 1]
print(f"✅ Predictions made on {len(y_pred)} test samples")

### 📈 Model Evaluation

**Metrics explained**:

| Metric | What It Measures |
|--------|------------------|
| **Accuracy** | % of all predictions that are correct |
| **Precision** | Of "fire" predictions, % that were correct |
| **Recall** | Of actual fires, % that we caught |
| **F1-Score** | Balance of precision and recall |
| **ROC-AUC** | Ranking ability (0.5=guessing, 1.0=perfect) |

> **Important**: Recall for fire is CRITICAL! Missing a fire is much worse than a false alarm.
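
Because missed fires cost more than false alarms, one common lever is the decision threshold. `predict` effectively labels a tile as fire when its predicted probability exceeds 0.5, and lowering that cutoff trades precision for recall. The sketch below uses made-up labels and probabilities. The 0.3 threshold is an illustrative assumption, not a recommended setting.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up ground truth and predicted fire probabilities for 8 tiles
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
fire_proba = np.array([0.10, 0.20, 0.40, 0.60, 0.35, 0.55, 0.70, 0.90])

def scores_at(threshold):
    """Return (precision, recall) when tiles at or above `threshold` are flagged as fire."""
    y_hat = (fire_proba >= threshold).astype(int)
    return precision_score(y_true, y_hat), recall_score(y_true, y_hat)

p_default, r_default = scores_at(0.5)
p_low, r_low = scores_at(0.3)

print(f"Threshold 0.5: precision={p_default:.2f}, recall={r_default:.2f}")
print(f"Threshold 0.3: precision={p_low:.2f}, recall={r_low:.2f}")
```

At 0.5, one fire (probability 0.35) is missed. At 0.3, every fire is caught at the price of one extra false alarm.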

In [None]:
# Calculate evaluation metrics
print("📈 MODEL EVALUATION RESULTS:")
print("-" * 60)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred_proba)

print(f"\n🎯 Accuracy:  {accuracy:.4f} ({accuracy*100:.1f}%)")
print(f"🔍 Precision: {precision:.4f} (Of fire predictions, {precision*100:.1f}% correct)")
print(f"🚨 Recall:    {recall:.4f} (Caught {recall*100:.1f}% of actual fires)")
print(f"⚖️ F1-Score:  {f1:.4f} (Balance of precision and recall)")
print(f"📊 ROC-AUC:   {roc_auc:.4f} (Ranking ability)")

print("\n📋 Classification Report:")
print(classification_report(y_test, y_pred, target_names=['Safe (0)', 'Fire (1)']))

### 🎯 Confusion Matrix

A confusion matrix shows:
- **True Negatives (TN)**: Safe tiles correctly predicted as safe
- **False Positives (FP)**: Safe tiles wrongly predicted as fire (false alarms)
- **False Negatives (FN)**: Fire tiles missed (dangerous!)
- **True Positives (TP)**: Fire tiles correctly caught
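
The four counts are enough to rebuild accuracy, precision, and recall by hand. Here is a small sketch on made-up labels, where `cm_demo` and the other names are illustrative only.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for 8 tiles (0 = safe, 1 = fire)
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_hat = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
cm_demo = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm_demo.ravel()

accuracy_demo = (tp + tn) / (tp + tn + fp + fn)
precision_demo = tp / (tp + fp)   # of tiles flagged as fire, how many were fire
recall_demo = tp / (tp + fn)      # of real fires, how many were caught

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"accuracy={accuracy_demo:.2f}, precision={precision_demo:.2f}, recall={recall_demo:.2f}")
```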

In [None]:
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted Safe', 'Predicted Fire'],
            yticklabels=['Actual Safe', 'Actual Fire'])
plt.title('Confusion Matrix', fontsize=14, fontweight='bold')
plt.ylabel('Actual Label')
plt.xlabel('Predicted Label')
plt.show()

print("\nConfusion Matrix Interpretation:")
print(f"• True Negatives (Safe→Safe): {cm[0,0]}")
print(f"• False Positives (Safe→Fire): {cm[0,1]} ⚠️ False alarms")
print(f"• False Negatives (Fire→Safe): {cm[1,0]} ❌ MISSED FIRES!")
print(f"• True Positives (Fire→Fire): {cm[1,1]} ✅ Correctly caught")

### 📈 ROC Curve

**ROC** = Receiver Operating Characteristic

- Shows tradeoff between catching fires (TPR) and false alarms (FPR)
- **Higher curve = Better model**
- AUC = 1.0 is perfect, AUC = 0.5 is random guessing
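
One way to read AUC: it is the chance that a randomly chosen fire tile receives a higher score than a randomly chosen safe tile (ignoring ties). The sketch below verifies this on four made-up scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and model scores for 4 tiles
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80])

# Compare every fire score against every safe score
fire_scores = scores[y_true == 1]
safe_scores = scores[y_true == 0]
pairwise_auc = float(np.mean(fire_scores[:, None] > safe_scores[None, :]))

auc_demo = roc_auc_score(y_true, scores)
fpr_demo, tpr_demo, _ = roc_curve(y_true, scores)

print(f"Pairwise ranking estimate: {pairwise_auc:.2f}")
print(f"roc_auc_score:             {auc_demo:.2f}")
```

Three of the four fire/safe pairs are ranked correctly (only 0.35 vs 0.40 is not), so both numbers come out at 0.75.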

In [None]:
# ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, 'b-', linewidth=2, label=f'Random Forest (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], 'r--', linewidth=1, label='Random Guess (AUC = 0.5)')
plt.fill_between(fpr, tpr, alpha=0.3)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate (Recall)')
plt.title('ROC Curve - Fire Detection Performance', fontsize=14, fontweight='bold')
plt.legend(loc='lower right')
plt.grid(True, alpha=0.3)
plt.show()

### 🏆 Feature Importance

Which features help the model most in detecting fire?

Random Forest calculates how much each feature contributes to predictions.
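
The built-in `feature_importances_` scores are impurity-based and always sum to 1. A complementary check, sketched below on synthetic data, is permutation importance: shuffle one feature at a time and measure how much the score drops. The data and names here are made up. In practice the check is usually run on held-out data rather than the training set used in this toy example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data (hypothetical): only column 0 determines the label
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_demo, y_demo)

# Impurity-based importances from the trained forest
impurity_importance = model.feature_importances_

# Permutation importance: mean accuracy drop after shuffling each column
result = permutation_importance(model, X_demo, y_demo, n_repeats=5, random_state=42)
top_feature = int(np.argmax(result.importances_mean))

print("Impurity-based:", impurity_importance.round(3))
print("Permutation:   ", result.importances_mean.round(3))
print("Most important column:", top_feature)
```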

In [None]:
# Feature Importance
print("🏆 Feature Importance:")

# Use X.columns so names always match the column order the model was trained on
feature_importance = pd.DataFrame({
    'Feature': X.columns,
    'Importance': rf_model.feature_importances_
}).sort_values('Importance', ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x='Importance', y='Feature', data=feature_importance, palette='Reds_r')
plt.title('Feature Importance for Fire Detection', fontsize=14, fontweight='bold')
plt.xlabel('Importance Score')
plt.show()

print("\nTop 5 Most Important Features:")
for i, (_, row) in enumerate(feature_importance.head(5).iterrows()):
    print(f"   {i+1}. {row['Feature']}: {row['Importance']:.3f}")

---

## 🗺️ Task 3: Spatial Risk Analysis & Visualization

### 🔹 What Is Spatial Risk Analysis?

We assign a **risk probability** to each tile and create a **heatmap** showing:
- 🔴 Red = High fire risk
- 🟡 Yellow = Medium risk
- 🟢 Green = Safe areas

### 🔹 Why Is This Useful?

Helps emergency responders **prioritize** which areas to check first.
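
Turning a probability into a risk level is a simple binning step. The sketch below uses `pd.cut` with the same cutoffs as this notebook's `classify_risk` function (0.25, 0.50, 0.75) on made-up probabilities. Using `pd.cut` instead of a row-by-row function is an alternative shown for illustration.

```python
import numpy as np
import pandas as pd

# Made-up probabilities, including values exactly on the cutoffs
probs = pd.Series([0.0, 0.25, 0.49, 0.5, 0.75, 1.0])

# right=False makes each bin include its left edge: [0, 0.25), [0.25, 0.5), ...
# The last edge is infinity so a probability of exactly 1.0 still lands in 'Critical'
risk_level = pd.cut(
    probs,
    bins=[0, 0.25, 0.5, 0.75, np.inf],
    labels=['Low', 'Medium', 'High', 'Critical'],
    right=False,
)

print(pd.DataFrame({'probability': probs, 'risk_level': risk_level}))
```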

In [None]:
# Calculate fire risk for all tiles
# NOTE: This scores every tile, including the ones the model was trained on,
# so probabilities for training tiles will look more confident than for unseen tiles.
print("🔥 Calculating fire risk probabilities...")

X_all_scaled = scaler.transform(X)
all_predictions = rf_model.predict(X_all_scaled)
all_probabilities = rf_model.predict_proba(X_all_scaled)[:, 1]

df['fire_risk_probability'] = all_probabilities
df['predicted_label'] = all_predictions

# Classify risk levels
def classify_risk(prob):
    if prob < 0.25: return 'Low'
    elif prob < 0.50: return 'Medium'
    elif prob < 0.75: return 'High'
    else: return 'Critical'

df['risk_level'] = df['fire_risk_probability'].apply(classify_risk)

print("\n📊 Risk Level Distribution:")
risk_counts = df['risk_level'].value_counts()
for level in ['Low', 'Medium', 'High', 'Critical']:
    if level in risk_counts:
        count = risk_counts[level]
        emoji = {'Low': '🟢', 'Medium': '🟡', 'High': '🟠', 'Critical': '🔴'}[level]
        print(f"   {emoji} {level}: {count} tiles ({count/len(df)*100:.1f}%)")

### 🗺️ Fire Risk Heatmap

This map shows fire risk across all tiles.

Each cell represents one tile from the aerial imagery.

In [None]:
# Create fire risk heatmap
# NOTE: Tiles are placed on a square grid in row order by index, not by real
# coordinates, so the layout is illustrative rather than geographic.
n_tiles = len(df)
grid_size = int(np.ceil(np.sqrt(n_tiles)))

# Start from NaN so unused padding cells stay blank instead of showing as zero risk
risk_grid = np.full((grid_size, grid_size), np.nan)
risk_grid.flat[:n_tiles] = all_probabilities

plt.figure(figsize=(12, 10))
im = plt.imshow(risk_grid, cmap='YlOrRd', interpolation='nearest', vmin=0, vmax=1)
plt.colorbar(im, label='Fire Risk Probability', shrink=0.8)
plt.title('🔥 Fire Risk Heatmap (Aerial Tile Analysis)', fontsize=14, fontweight='bold')
plt.xlabel('Tile Column')
plt.ylabel('Tile Row')
plt.show()

---

## 🚁 Task 4: Drone Deployment Recommendations

Based on our risk analysis, we recommend:
1. **Phase 1 (IMMEDIATE)**: Deploy drones to critical risk areas
2. **Phase 2 (URGENT)**: Patrol high-risk areas within 30 minutes
3. **Phase 3 (MONITORING)**: Regular checks on medium-risk areas
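
The three phases can also be expressed as a sortable table. The sketch below maps each risk level to its phase and orders tiles by phase, then by probability. The toy tiles and the names `phase_by_level` and `plan` are made up for illustration.

```python
import pandas as pd

# Made-up tiles with risk levels already assigned
tiles = pd.DataFrame({
    'fire_risk_probability': [0.92, 0.10, 0.61, 0.80, 0.33],
    'risk_level': ['Critical', 'Low', 'High', 'Critical', 'Medium'],
})

# Phase numbers follow the list above; low-risk tiles get no phase
phase_by_level = {'Critical': 1, 'High': 2, 'Medium': 3}

plan = (
    tiles[tiles['risk_level'].isin(list(phase_by_level))]
    .assign(phase=lambda d: d['risk_level'].map(phase_by_level))
    .sort_values(['phase', 'fire_risk_probability'], ascending=[True, False])
)

print(plan[['phase', 'risk_level', 'fire_risk_probability']])
```

Sorting inside each phase by probability gives crews a clear order of visits when several tiles share the same risk level.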

In [None]:
# Drone deployment recommendations
print("🚁 DRONE DEPLOYMENT STRATEGY")
print("=" * 60)

critical_count = len(df[df['risk_level'] == 'Critical'])
high_count = len(df[df['risk_level'] == 'High'])
medium_count = len(df[df['risk_level'] == 'Medium'])

print(f"\n🔴 PHASE 1 - IMMEDIATE (Critical Risk: {critical_count} tiles)")
print("   • Deploy all available drones")
print("   • Alert ground firefighting teams")
print("   • Notify emergency services")

print(f"\n🟠 PHASE 2 - URGENT (High Risk: {high_count} tiles)")
print("   • Schedule drone patrol within 30 minutes")
print("   • Position firefighting resources nearby")

print(f"\n🟡 PHASE 3 - MONITORING (Medium Risk: {medium_count} tiles)")
print("   • Regular patrol every 2 hours")
print("   • Set up automated monitoring")

# Show top priority tiles
high_risk_tiles = df[df['fire_risk_probability'] >= 0.75].nlargest(5, 'fire_risk_probability')
print("\n🚨 TOP 5 PRIORITY TILES:")
for i, (idx, row) in enumerate(high_risk_tiles.iterrows()):
    print(f"   #{i+1} Tile {idx}: Risk = {row['fire_risk_probability']*100:.1f}%")

---

## 📝 Task 5: Reflection

### Dataset Limitations

1. **No temporal data** - Can't track fire progression over time
2. **Tiles analyzed independently** - Fire in one tile affects neighbors
3. **No weather data** - Wind and humidity affect fire spread

### Potential Improvements

1. Use **deep learning** (CNN) on raw images
2. Add **real-time** processing for live drone feeds
3. Integrate with **GIS systems** for 3D terrain mapping

In [None]:
# Final summary
print("📊 MODEL PERFORMANCE SUMMARY")
print("=" * 60)
print("\n   Metric          Value    Target   Status")
print("   " + "-" * 40)
print(f"   Accuracy        {accuracy*100:.1f}%    >85%     {'✅' if accuracy > 0.85 else '⚠️'}")
print(f"   Precision       {precision*100:.1f}%    >75%     {'✅' if precision > 0.75 else '⚠️'}")
print(f"   Recall          {recall*100:.1f}%    >80%     {'✅' if recall > 0.80 else '⚠️'}")
print(f"   ROC-AUC         {roc_auc:.3f}     >0.85    {'✅' if roc_auc > 0.85 else '⚠️'}")

print("\n🎉 FOREST FIRE DETECTION ANALYSIS COMPLETE!")