# 🌾 AI-Based Crop Health Monitoring Using Drone Multispectral Data 🚁

## 🧩 Problem Statement

### What Problem Are We Solving?

Imagine you are a **farmer** with a HUGE farm - so big that you can't walk around and check every single plant! Some plants might be **sick** (stressed) - they might not be getting enough water, bugs might be eating them, or the soil might not be good.

**Solution:** We use a **flying robot called a DRONE** 🚁 that flies over the farm and takes special photos. Then we use **AI (Artificial Intelligence)** 🤖 to analyze these photos and tell which plants are healthy and which are stressed.

### Real-Life Analogy

| Human Health | Plant Health |
|-------------|------------|
| Doctor uses thermometer 🌡️ | Drone uses special camera 📸 |
| Doctor checks blood pressure | Drone checks plant color/moisture |
| Doctor says "Take medicine!" | AI says "Water this area!" |

---

## 🪜 Steps to Solve the Problem

1. **Load and explore** the dataset (understand what we have)
2. **Prepare data** for machine learning (clean and split)
3. **Train 5 ML models** and compare them
4. **Create a stress map** showing healthy vs stressed areas
5. **Recommend drone flight paths** for inspection

---

## 🎯 Expected Output

- Model comparison table showing which algorithm works best
- A colorful heatmap showing stressed areas in red, healthy in green
- Priority list of zones for drone inspection

---

## 📦 Section 1: Importing Libraries

### 🔹 What are libraries?

Libraries are like **toolboxes** that contain ready-made tools. Instead of building everything from scratch, we use tools others have created.

**Real-Life Analogy:** Like using a calculator instead of doing math by hand!

### Libraries we'll use:

| Library | What it does | Like in real life |
|---------|--------------|-------------------|
| `pandas` | Handles data tables | Excel spreadsheet |
| `numpy` | Math operations | Calculator |
| `matplotlib` | Creates charts | Drawing with colors |
| `seaborn` | Prettier charts | Professional artist |
| `sklearn` | Machine learning | AI brain |

In [None]:
# =============================================================================
# IMPORTING LIBRARIES
# =============================================================================

# pandas: For handling data like Excel spreadsheets
import pandas as pd

# numpy: For fast math operations
import numpy as np

# matplotlib: For creating charts and plots
import matplotlib.pyplot as plt

# seaborn: For beautiful statistical visualizations
import seaborn as sns

# Ignore warnings to keep output clean
import warnings
warnings.filterwarnings('ignore')

print("✅ All libraries imported successfully!")

---

### 🔹 Importing Machine Learning Tools from scikit-learn

#### 2.1 What does each import do?

| Import | Purpose | Simple Explanation |
|--------|---------|-------------------|
| `train_test_split` | Splits data into training and testing | Like splitting flashcards for study vs exam |
| `StandardScaler` | Puts all features on the same scale | Like grading every subject out of 100 |
| `LabelEncoder` | Converts text labels to numbers | "Healthy"→0, "Stressed"→1 |

#### 2.2 Why do we need these?
- **train_test_split**: We can't test students on questions they already studied!
- **StandardScaler**: Comparing height (180cm) with weight (70kg) is unfair - different scales!
- **LabelEncoder**: Computers understand numbers, not words

In [None]:
# =============================================================================
# IMPORTING MACHINE LEARNING TOOLS
# =============================================================================

# For splitting data into train/test sets
from sklearn.model_selection import train_test_split

# For scaling features to same range
from sklearn.preprocessing import StandardScaler

# For converting text labels to numbers
from sklearn.preprocessing import LabelEncoder

print("✅ ML preprocessing tools imported!")

---

### 🔹 Importing Classification Models

We will compare **5 different AI models** to see which one works best for detecting crop stress.

| Model | How it works | Real-Life Analogy |
|-------|--------------|-------------------|
| **Logistic Regression** | Draws a straight line to separate classes | Like sorting apples vs oranges with a ruler |
| **Decision Tree** | Makes yes/no decisions like a flowchart | Like 20 Questions game |
| **Random Forest** | Many trees voting together | Asking 100 people and going with majority |
| **SVM** | Finds the best boundary between classes | Drawing the widest possible road between groups |
| **KNN** | Looks at nearest neighbors | "You are who your friends are" |

In [None]:
# =============================================================================
# IMPORTING CLASSIFICATION MODELS
# =============================================================================

# Model 1: Logistic Regression - Simple, fast, interpretable
from sklearn.linear_model import LogisticRegression

# Model 2: Decision Tree - Easy to understand flowchart
from sklearn.tree import DecisionTreeClassifier

# Model 3: Random Forest - Many trees voting together
from sklearn.ensemble import RandomForestClassifier

# Model 4: Support Vector Machine - Finds best boundary
from sklearn.svm import SVC

# Model 5: K-Nearest Neighbors - Looks at neighbors
from sklearn.neighbors import KNeighborsClassifier

print("✅ All 5 classification models imported!")

---

### 🔹 Importing Evaluation Metrics

How do we know if our AI is good? We use **metrics** - like grades for AI!

| Metric | What it measures | Simple Explanation |
|--------|-----------------|-------------------|
| **Accuracy** | Overall correctness | % of correct answers |
| **Precision** | When AI says "Stressed", how often is it right? | Don't cry wolf if not a wolf |
| **Recall** | Of actual stressed plants, how many did AI find? | Don't miss any wolves |
| **F1-Score** | Balance of precision and recall | Harmonic mean of both |
| **ROC-AUC** | Overall ranking quality | How good at sorting |

In [None]:
# =============================================================================
# IMPORTING EVALUATION METRICS
# =============================================================================

from sklearn.metrics import (
    accuracy_score,      # Overall correct %
    precision_score,     # When I say yes, am I right?
    recall_score,        # Did I find all the yes cases?
    f1_score,           # Balance of precision and recall
    roc_auc_score,      # Overall ranking quality
    confusion_matrix,   # Table showing right/wrong predictions
    classification_report  # Detailed summary
)

# For file handling
import os

print("✅ Evaluation metrics imported!")

---

## 📊 TASK 1: DATA UNDERSTANDING

### 🔹 Loading the Dataset

#### 2.1 What is the dataset?
Our dataset contains measurements taken by a drone flying over a farm. Each row represents one small area (grid cell) of the field.

#### 2.2 Why is this important?
This data tells us HOW HEALTHY each area of the farm is using special colors of light that humans can't see!

#### 2.3 Understanding the Features (Columns)

| Feature | What it measures | Simple Explanation |
|---------|-----------------|-------------------|
| `ndvi_mean` | Plant greenness | How green and alive |
| `gndvi` | Green-based NDVI | Another greenness measure |
| `savi` | Vegetation ignoring soil | Plant health without dirt |
| `evi` | Enhanced vegetation | Better for dense crops |
| `moisture_index` | Water content | Is plant thirsty? |
| `canopy_density` | Leaf coverage | How many leaves cover ground |
| `grid_x, grid_y` | Location | Where in the field |

In [None]:
# =============================================================================
# LOADING THE DATASET
# =============================================================================

print("📊 TASK 1: DATA UNDERSTANDING")
print("=" * 50)

# Load data from local CSV file
# pd.read_csv() reads a comma-separated file into a DataFrame (table)
df = pd.read_csv("data/crop_health_data.csv")

print(f"✅ Dataset loaded successfully!")
print(f"\n📋 Dataset Shape: {df.shape[0]} rows × {df.shape[1]} columns")
print(f"   • Each row = one grid cell (small area) in the field")
print(f"   • Each column = one measurement taken by the drone")

---

### 🔹 Exploring the Data Structure

#### 2.1 What does `df.columns` do?
Lists all the column names in our dataset.

#### 2.2 Why do we need this?
To understand what measurements we have before building models.

In [None]:
# =============================================================================
# EXPLORING COLUMNS
# =============================================================================

print("\n📊 All Columns in our Dataset:")
print("-" * 40)
for i, col in enumerate(df.columns, 1):
    print(f"   {i:2}. {col}")

---

### 🔹 Viewing First Few Rows

#### 2.1 What does `df.head()` do?
Shows the first 5 rows of data (you can pass a number to show more/less).

#### 2.2 Why is this useful?
Quick visual check to see what the actual data values look like.

In [None]:
# =============================================================================
# VIEWING FIRST 5 ROWS
# =============================================================================

print("\n🔍 First 5 rows of our dataset:")
df.head()

---

### 🔹 Statistical Summary

#### 2.1 What does `df.describe()` do?
Calculates statistics like mean, min, max, standard deviation for each column.

#### 2.2 Why do we need statistics?
- To understand the RANGE of values
- To spot OUTLIERS (unusual values)
- To know if SCALING is needed

In [None]:
# =============================================================================
# STATISTICAL SUMMARY
# =============================================================================

print("\n📈 Statistical Summary:")
df.describe().round(3)

---

### 🔹 Checking for Missing Values

#### 2.1 What are missing values?
Empty cells in our data - like unanswered questions on a test.

#### 2.2 Why is this a problem?
ML models can crash or give wrong results with missing data!

#### 2.3 How do we check?
`df.isnull().sum()` counts how many missing values in each column.
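
Here is a minimal sketch of that check on a tiny made-up table. The `toy` DataFrame and its values are invented for illustration and are not part of the real crop dataset. It shows the difference between the per-column count and the single total that the notebook's check uses.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table (NOT the real crop dataset)
toy = pd.DataFrame({
    "ndvi_mean": [0.71, np.nan, 0.64],
    "moisture_index": [0.30, 0.28, np.nan],
    "canopy_density": [0.85, 0.80, 0.78],
})

# One count per column: tells us WHERE values are missing
missing_per_column = toy.isnull().sum()

# One number for the whole table: tells us IF anything is missing
missing_total = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing:", missing_total)
```

The per-column view is usually the more helpful one when cleaning, because it points to the column that needs fixing.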

In [None]:
# =============================================================================
# CHECKING FOR MISSING VALUES
# =============================================================================

print("\n🔎 Missing Values Check:")
missing = df.isnull().sum().sum()

if missing == 0:
    print("   ✅ No missing values found! Data is complete.")
else:
    print(f"   ⚠️ Found {missing} missing values. Need cleaning!")

---

### 🔹 Understanding the Target Variable

#### 2.1 What is the target variable?
The column we want to PREDICT - `crop_health_label` (Healthy or Stressed).

#### 2.2 Why check the distribution?
- **Balanced**: ~50% healthy, ~50% stressed → easy to train
- **Imbalanced**: 99% healthy, 1% stressed → model might just say "healthy" always!

#### 2.3 What does `value_counts()` do?
Counts how many of each unique value exists.
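
To see why imbalance matters, here is a small sketch on made-up labels (95 healthy, 5 stressed; these numbers are an assumption for illustration, not the real dataset). A "lazy model" that always answers with the most common class gets high accuracy while finding zero stressed plants.

```python
import pandas as pd

# Hypothetical, deliberately imbalanced labels (NOT the real dataset)
labels = pd.Series(["Healthy"] * 95 + ["Stressed"] * 5)

counts = labels.value_counts()                 # absolute count per class
shares = labels.value_counts(normalize=True)   # fractions that add up to 1

# A lazy "model" that always predicts the most common class
majority_class = counts.idxmax()
predictions = pd.Series([majority_class] * len(labels))

accuracy = (predictions == labels).mean()
stressed_mask = labels == "Stressed"
recall_stressed = (predictions[stressed_mask] == "Stressed").mean()

print(shares)
print(f"Accuracy: {accuracy:.2f}, recall on stressed: {recall_stressed:.2f}")
```

This is the reason the notebook looks at precision, recall, and F1 in addition to accuracy.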

In [None]:
# =============================================================================
# TARGET VARIABLE DISTRIBUTION
# =============================================================================

print("\n🎯 Target Variable Distribution:")
print("-" * 40)

target_counts = df['crop_health_label'].value_counts()
for label, count in target_counts.items():
    percentage = count / len(df) * 100
    emoji = "🟢" if label == "Healthy" else "🔴"
    print(f"   {emoji} {label}: {count} samples ({percentage:.1f}%)")

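# Visualize the distribution
plt.figure(figsize=(8, 5))
# Pick each bar's color from its label: value_counts() orders bars by count,
# so a fixed [green, red] list could paint "Stressed" green
colors = ['#2ecc71' if label == 'Healthy' else '#e74c3c' for label in target_counts.index]
target_counts.plot(kind='bar', color=colors, edgecolor='black')
plt.title('🎯 Distribution of Crop Health Labels', fontsize=14)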
plt.xlabel('Crop Health Status')
plt.ylabel('Number of Samples')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

---

## 🤖 TASK 2: MACHINE LEARNING MODEL COMPARISON

### 🔹 Step 1: Separating Features and Target

#### 2.1 What are we doing?
Splitting data into:
- **X (Features)**: The information we use to make predictions (vegetation indices)
- **y (Target)**: What we want to predict (Healthy/Stressed)

#### 2.2 Why separate them?
ML models need to know: "Here's the input (X), learn to predict the output (y)"

#### 2.3 Real-Life Analogy
- **X** = Student's study habits, attendance, homework scores
- **y** = Final exam result (Pass/Fail)

The model learns: "Given these habits, what's the likely result?"

In [None]:
# =============================================================================
# SEPARATING FEATURES AND TARGET
# =============================================================================

print("\n🤖 TASK 2: MACHINE LEARNING MODEL COMPARISON")
print("=" * 50)

# Define feature columns (vegetation indices only, not grid coordinates)
feature_columns = ['ndvi_mean', 'ndvi_std', 'ndvi_min', 'ndvi_max', 'gndvi', 
                   'savi', 'evi', 'red_edge_1', 'red_edge_2', 'nir_reflectance',
                   'soil_brightness', 'canopy_density', 'moisture_index']

# X = Features (input data)
X = df[feature_columns]

# y = Target (what we predict)
y = df['crop_health_label']

print(f"\n🔧 Data Preparation Summary:")
print(f"   • Number of features: {len(feature_columns)}")
print(f"   • Number of samples: {len(X)}")
print(f"   • Target variable: crop_health_label (Healthy/Stressed)")

---

### 🔹 Step 2: Encoding Labels

#### 2.1 What is label encoding?
Converting text labels to numbers that computers understand.

#### 2.2 Why do we need this?
Computers can't do math with words like "Healthy" or "Stressed" - they need numbers!

#### 2.3 How LabelEncoder works:
```
"Healthy"  → 0
"Stressed" → 1
```

#### 2.4 fit_transform() explained:
- **fit()**: Learn the mapping (which label = which number)
- **transform()**: Apply the mapping to the data
- **fit_transform()**: Do both at once
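
A minimal sketch of the three steps on a few made-up labels (the `toy_labels` array is an assumption for illustration). `LabelEncoder` sorts the classes alphabetically, which is why "Healthy" becomes 0 and "Stressed" becomes 1.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy labels (NOT the real dataset)
toy_labels = np.array(["Stressed", "Healthy", "Healthy", "Stressed"])

encoder = LabelEncoder()
encoder.fit(toy_labels)                    # fit: learn the classes (sorted alphabetically)
encoded = encoder.transform(toy_labels)    # transform: apply the learned mapping

# Read the mapping from the encoder instead of assuming it
mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}

# inverse_transform turns numbers back into the original text labels
decoded = encoder.inverse_transform(encoded)

print(mapping)
print(encoded)
```

Reading the mapping from `classes_` is safer than hard-coding it, because the order depends on the label names.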

In [None]:
# =============================================================================
# ENCODING LABELS
# =============================================================================

print("\n🏷️ Encoding target labels...")

# Create encoder object
label_encoder = LabelEncoder()

# Fit (learn) and transform (apply) in one step
y_encoded = label_encoder.fit_transform(y)

print(f"   Original labels: {list(label_encoder.classes_)}")
print(f"   Encoded values:  {list(range(len(label_encoder.classes_)))}")
print(f"   ✅ Encoding: 'Healthy' → 0, 'Stressed' → 1")

---

### 🔹 Step 3: Train-Test Split

#### 2.1 What is train-test split?
Dividing data into:
- **Training set (80%)**: Data the model learns from
- **Testing set (20%)**: Data we use to evaluate the model

#### 2.2 Why is this CRITICAL?
You can't test students on questions they already practiced! That's cheating!
Similarly, we can't test models on data they already saw.

#### 2.3 Important Parameters:

| Parameter | Value | Meaning |
|-----------|-------|----------|
| `test_size` | 0.2 | 20% for testing, 80% for training |
| `random_state` | 42 | Seed for reproducibility (same split every run) |
| `stratify` | y_encoded | Keep same class ratio in train/test |
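
The effect of `stratify` is easiest to see on a small made-up example. The sketch below assumes 100 toy samples with 30% in class 1 (invented numbers, not the real dataset) and checks that both splits keep the same 30% share.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 30% belong to class 1
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0] * 70 + [1] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy,
    test_size=0.2,      # 20 samples for testing, 80 for training
    random_state=42,    # same split every run
    stratify=y_toy,     # keep the 70/30 class ratio in both parts
)

print("Train share of class 1:", y_tr.mean())
print("Test share of class 1: ", y_te.mean())
```

Without `stratify`, a small test set could end up with too few stressed samples by chance, which makes recall and F1 noisy.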

In [None]:
# =============================================================================
# TRAIN-TEST SPLIT
# =============================================================================

print("\n✂️ Splitting data into train (80%) and test (20%) sets...")

X_train, X_test, y_train, y_test = train_test_split(
    X,              # Features
    y_encoded,      # Target (encoded)
    test_size=0.2,  # 20% for testing
    random_state=42,  # For reproducibility
    stratify=y_encoded  # Keep class ratio balanced
)

print(f"   • Training samples: {len(X_train)} (80%)")
print(f"   • Testing samples:  {len(X_test)} (20%)")

---

### 🔹 Step 4: Feature Scaling

#### 2.1 What is feature scaling?
Making all features have the same scale (mean=0, std=1).

#### 2.2 Why is this important?
Imagine comparing:
- NDVI: ranges from 0 to 1
- NIR reflectance: ranges from 0.2 to 0.9

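Each feature has its own range and spread. Without scaling, features with larger ranges dominate distance-based models such as KNN and SVM! Tree-based models (Decision Tree, Random Forest) are largely unaffected by scaling, but it does them no harm.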

#### 2.3 StandardScaler formula:
```
z = (x - mean) / std
```

#### 2.4 CRITICAL RULE:
- `fit_transform()` on TRAINING data only
- `transform()` on TEST data (no fitting!)

Why? We can't use test data statistics - that would be cheating!
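
A small sketch of the rule, using one made-up feature (the toy arrays are assumptions for illustration). The scaler learns `mean` and `std` from the training values only, and the test values are then scaled with those same numbers.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature data (NOT the real dataset)
X_train_toy = np.array([[0.2], [0.4], [0.6], [0.8]])
X_test_toy = np.array([[0.5], [1.0]])

scaler_toy = StandardScaler()
X_train_z = scaler_toy.fit_transform(X_train_toy)  # learn mean/std from TRAIN, then scale
X_test_z = scaler_toy.transform(X_test_toy)        # reuse the TRAIN mean/std

# Same result by hand: z = (x - mean) / std
manual_z = (X_test_toy - scaler_toy.mean_) / scaler_toy.scale_

print("Train mean used:", scaler_toy.mean_)
print("Scaled test values:", X_test_z.ravel())
```

Note that the scaled test set does not need to have mean 0 and std 1. Only the training set is guaranteed to, and that is expected.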

In [None]:
# =============================================================================
# FEATURE SCALING
# =============================================================================

print("\n📏 Scaling features (StandardScaler)...")

# Create scaler object
scaler = StandardScaler()

# Fit on training data AND transform
X_train_scaled = scaler.fit_transform(X_train)

# Only transform testing data (using training statistics)
X_test_scaled = scaler.transform(X_test)

print("   ✅ Features scaled to mean=0, std=1")

---

### 🔹 Step 5: Training and Comparing Models

#### 2.1 What are we doing?
Training 5 different ML models and comparing their performance.

#### 2.2 Why compare multiple models?
Different models work better for different problems. We need to find the BEST one!

#### 2.3 Evaluation Metrics Explained:

| Metric | Formula | When it matters |
|--------|---------|----------------|
| **Accuracy** | Correct / Total | General performance |
| **Precision** | TP / (TP + FP) | Cost of false alarms is high |
| **Recall** | TP / (TP + FN) | Missing cases is costly |
| **F1-Score** | 2 × (P × R) / (P + R) | Balance needed |
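
The formulas in the table can be checked by hand. The sketch below uses ten made-up predictions (an assumption for illustration, where 1 = Stressed and 0 = Healthy), counts TP, FP, FN, and TN, and compares the hand-calculated values with scikit-learn's functions.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels: 1 = Stressed, 0 = Healthy
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_toy = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true_toy, y_pred_toy))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # stressed, caught
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, false alarm
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # stressed, missed
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, correctly ignored

accuracy_manual = (tp + tn) / len(pairs)
precision_manual = tp / (tp + fp)
recall_manual = tp / (tp + fn)
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)

print(accuracy_manual, precision_manual, recall_manual, f1_manual)
print(accuracy_score(y_true_toy, y_pred_toy), f1_score(y_true_toy, y_pred_toy))
```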

In [None]:
# =============================================================================
# TRAINING AND COMPARING 5 MODELS
# =============================================================================

print("\n🏭 Training and Evaluating 5 Classification Models:")
print("=" * 60)

# Define our 5 models
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000, random_state=42),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'SVM': SVC(probability=True, random_state=42),
    'KNN': KNeighborsClassifier(n_neighbors=5)
}

# Store results
results = []
best_f1 = 0
best_model = None
best_model_name = None

for name, model in models.items():
    print(f"\n🔄 Training: {name}...")
    
    # Train the model
    model.fit(X_train_scaled, y_train)
    
    # Make predictions
    y_pred = model.predict(X_test_scaled)
    y_proba = model.predict_proba(X_test_scaled)[:, 1] if hasattr(model, 'predict_proba') else y_pred
    
    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    roc_auc = roc_auc_score(y_test, y_proba)
    
    # Store results
    results.append({
        'Model': name,
        'Accuracy': accuracy,
        'Precision': precision,
        'Recall': recall,
        'F1-Score': f1,
        'ROC-AUC': roc_auc
    })
    
    # Track best model
    if f1 > best_f1:
        best_f1 = f1
        best_model = model
        best_model_name = name
    
    # Print results
    print(f"   Accuracy:  {accuracy:.4f}")
    print(f"   Precision: {precision:.4f}")
    print(f"   Recall:    {recall:.4f}")
    print(f"   F1-Score:  {f1:.4f}")
    print(f"   ROC-AUC:   {roc_auc:.4f}")

---

### 🔹 Model Comparison Table

In [None]:
# =============================================================================
# MODEL COMPARISON TABLE
# =============================================================================

print("\n" + "=" * 60)
print("📋 MODEL COMPARISON TABLE (Sorted by F1-Score)")
print("=" * 60)

results_df = pd.DataFrame(results)
results_df = results_df.sort_values('F1-Score', ascending=False)
print(results_df.to_string(index=False))

print(f"\n🏆 BEST MODEL: {best_model_name} (F1-Score: {best_f1:.4f})")

---

### 🔹 Visualizing Model Comparison

In [None]:
# =============================================================================
# MODEL COMPARISON VISUALIZATION
# =============================================================================

fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(results_df))
width = 0.15

metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score', 'ROC-AUC']
colors = ['#3498db', '#2ecc71', '#f1c40f', '#e74c3c', '#9b59b6']

for i, metric in enumerate(metrics):
    ax.bar(x + i*width, results_df[metric], width, label=metric, color=colors[i])

ax.set_xlabel('Models', fontsize=12)
ax.set_ylabel('Score', fontsize=12)
ax.set_title('🏆 Model Performance Comparison', fontsize=14)
ax.set_xticks(x + width * 2)
ax.set_xticklabels(results_df['Model'], rotation=45, ha='right')
ax.legend(loc='lower right')
ax.set_ylim(0, 1.1)
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

---

### 🔹 Confusion Matrix for Best Model

#### What is a Confusion Matrix?
A table showing:
- **True Positives (TP)**: Actually stressed, predicted stressed ✅
- **True Negatives (TN)**: Actually healthy, predicted healthy ✅
- **False Positives (FP)**: Actually healthy, predicted stressed ❌
- **False Negatives (FN)**: Actually stressed, predicted healthy ❌
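
A short sketch of how these four numbers sit inside scikit-learn's matrix, using made-up labels (an assumption for illustration, with 0 = Healthy and 1 = Stressed). Rows are the actual labels and columns are the predicted labels, so `.ravel()` returns them in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = Healthy, 1 = Stressed
actual = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 1, 0, 1, 0, 1, 1, 0]

# labels=[0, 1] fixes the row/column order: Healthy first, Stressed second
cm_toy = confusion_matrix(actual, predicted, labels=[0, 1])

# Flatten row by row: [[TN, FP], [FN, TP]] -> TN, FP, FN, TP
tn, fp, fn, tp = cm_toy.ravel()

print(cm_toy)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

For crop monitoring, FN is usually the number to watch: those are stressed areas the drone report would tell the farmer to ignore.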

In [None]:
# =============================================================================
# CONFUSION MATRIX
# =============================================================================

y_pred_best = best_model.predict(X_test_scaled)
cm = confusion_matrix(y_test, y_pred_best)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Healthy', 'Stressed'],
            yticklabels=['Healthy', 'Stressed'])
plt.title(f'🔍 Confusion Matrix: {best_model_name}', fontsize=14)
plt.ylabel('Actual Label')
plt.xlabel('Predicted Label')
plt.tight_layout()
plt.show()

print(f"\n📊 Classification Report for {best_model_name}:\n")
print(classification_report(y_test, y_pred_best, target_names=['Healthy', 'Stressed']))

---

## 🗺️ TASK 3: SPATIAL ANALYSIS & VISUALIZATION

### 🔹 Creating a Field Stress Heatmap

#### 2.1 What is a heatmap?
A colorful map where colors represent values:
- 🟢 **Green** = Healthy crops
- 🔴 **Red** = Stressed crops

#### 2.2 Why is this useful?
Farmers can quickly see WHICH AREAS of their field need attention.

#### 2.3 How do we create it?
1. Predict stress for ALL grid cells
2. Arrange by grid_x and grid_y
3. Color based on stress probability
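
Step 2 is the least obvious one, so here is a minimal sketch of it on a made-up 2×2 field (the `cells` table and its probabilities are assumptions for illustration). `pivot_table` turns a long list of cells into a grid where rows are `grid_y` and columns are `grid_x`.

```python
import pandas as pd

# Hypothetical 2x2 field: one row per grid cell (NOT the real dataset)
cells = pd.DataFrame({
    "grid_x": [0, 1, 0, 1],
    "grid_y": [0, 0, 1, 1],
    "stress_probability": [0.10, 0.85, 0.40, 0.65],
})

# Long list of cells -> 2D grid (rows = grid_y, columns = grid_x)
grid = cells.pivot_table(
    values="stress_probability",
    index="grid_y",
    columns="grid_x",
    aggfunc="mean",   # averages cells that share the same (x, y)
)

print(grid)
```

The resulting grid is what a heatmap function colors in, one square per cell.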

In [None]:
# =============================================================================
# SPATIAL ANALYSIS - FIELD STRESS HEATMAP
# =============================================================================

print("\n🗺️ TASK 3: SPATIAL ANALYSIS & VISUALIZATION")
print("=" * 50)

print("\n🔮 Generating predictions for all grid cells...")

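# Scale ALL grid cells with the scaler that was fitted on the training set.
# Note: 80% of these cells were used for training, so the map can look
# more accurate here than it would on a field the model has never seen.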
X_all_scaled = scaler.transform(X)

# Predict on all data
y_all_pred = best_model.predict(X_all_scaled)
y_all_proba = best_model.predict_proba(X_all_scaled)[:, 1]

# Add predictions to dataframe
df['predicted_label'] = label_encoder.inverse_transform(y_all_pred)
df['stress_probability'] = y_all_proba

print(f"   ✅ Predictions generated for {len(df)} grid cells")

---

### 🔹 Stress Heatmap Visualization

In [None]:
# =============================================================================
# STRESS HEATMAP
# =============================================================================

print("\n🗺️ Creating field stress heatmap...")

# Create pivot table for heatmap
heatmap_data = df.pivot_table(
    values='stress_probability',
    index='grid_y',
    columns='grid_x',
    aggfunc='mean'
)

# Create the heatmap
plt.figure(figsize=(14, 10))
sns.heatmap(
    heatmap_data,
    cmap='RdYlGn_r',  # Red-Yellow-Green reversed (Red = stressed)
    annot=False,
    vmin=0,
    vmax=1,
    cbar_kws={'label': 'Stress Probability'}
)
plt.title('🌾 Field Stress Heatmap\n(Red = Stressed, Green = Healthy)', fontsize=14)
plt.xlabel('Grid X (Column)')
plt.ylabel('Grid Y (Row)')
plt.tight_layout()
plt.show()

# Summary statistics
print("\n📊 Stress Distribution Summary:")
stress_counts = df['predicted_label'].value_counts()
for label, count in stress_counts.items():
    percentage = count / len(df) * 100
    emoji = "🟢" if label == "Healthy" else "🔴"
    print(f"   {emoji} {label}: {count} cells ({percentage:.1f}%)")

---

## 🚁 TASK 4: DRONE INSPECTION RECOMMENDATIONS

### 🔹 Prioritizing Inspection Zones

Based on stress probability, we categorize zones:

| Priority | Stress Level | Action |
|----------|-------------|--------|
| 🔴 CRITICAL | ≥80% | Inspect IMMEDIATELY |
| 🟠 HIGH | 60-80% | Inspect within 24 hours |
| 🟡 MODERATE | 40-60% | Schedule inspection |
| 🟢 LOW | 20-40% | Monitor regularly |
| ✅ HEALTHY | <20% | No action needed |
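
The same thresholds can also be applied to a whole column at once. The sketch below is one possible alternative using `pd.cut` on a few made-up probabilities (the values are assumptions for illustration). With `right=False`, each bin includes its lower edge, so exactly 0.80 counts as CRITICAL and exactly 0.20 counts as LOW.

```python
import numpy as np
import pandas as pd

# Hypothetical stress probabilities (NOT model output)
probs = pd.Series([0.05, 0.20, 0.39, 0.55, 0.60, 0.80, 1.00])

# Bin edges taken from the priority table; np.inf makes sure 1.00 is included
bin_edges = [0.0, 0.2, 0.4, 0.6, 0.8, np.inf]
bin_labels = ["HEALTHY", "LOW", "MODERATE", "HIGH", "CRITICAL"]

# right=False -> bins are [0, 0.2), [0.2, 0.4), ... so lower edges are inclusive
priority = pd.cut(probs, bins=bin_edges, labels=bin_labels, right=False)

print(pd.DataFrame({"stress_probability": probs, "priority": priority}))
```

For a few hundred grid cells either approach is fine. The binned version mainly keeps the thresholds and labels together in one place.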

In [None]:
# =============================================================================
# DRONE INSPECTION RECOMMENDATIONS
# =============================================================================

print("\n🚁 TASK 4: DRONE INSPECTION RECOMMENDATIONS")
print("=" * 50)

# Categorize stress levels
def categorize_stress(prob):
    if prob >= 0.8:
        return 'CRITICAL'
    elif prob >= 0.6:
        return 'HIGH'
    elif prob >= 0.4:
        return 'MODERATE'
    elif prob >= 0.2:
        return 'LOW'
    else:
        return 'HEALTHY'

df['stress_priority'] = df['stress_probability'].apply(categorize_stress)

print("\n📋 INSPECTION PRIORITY ZONES:")
priority_counts = df['stress_priority'].value_counts()
priority_order = ['CRITICAL', 'HIGH', 'MODERATE', 'LOW', 'HEALTHY']
priority_emoji = {'CRITICAL': '🔴', 'HIGH': '🟠', 'MODERATE': '🟡', 'LOW': '🟢', 'HEALTHY': '✅'}

for priority in priority_order:
    if priority in priority_counts.index:
        count = priority_counts[priority]
        percentage = count / len(df) * 100
        emoji = priority_emoji[priority]
        print(f"   {emoji} {priority}: {count} cells ({percentage:.1f}%)")

---

### 🔹 Specific Zone Recommendations

In [None]:
# =============================================================================
# SPECIFIC ZONE RECOMMENDATIONS
# =============================================================================

print("\n🚁 RECOMMENDED DRONE FLIGHT PATH:")

# Sort each group so the most stressed cells are listed first
zone_columns = ['grid_x', 'grid_y', 'stress_probability']
critical_zones = df.loc[df['stress_priority'] == 'CRITICAL', zone_columns].sort_values(
    'stress_probability', ascending=False)
high_zones = df.loc[df['stress_priority'] == 'HIGH', zone_columns].sort_values(
    'stress_probability', ascending=False)

if len(critical_zones) > 0:
    print("\n⚠️ CRITICAL ZONES - Inspect IMMEDIATELY:")
    for _, row in critical_zones.head(5).iterrows():
        print(f"   📍 Grid ({int(row['grid_x'])}, {int(row['grid_y'])}) - Stress: {row['stress_probability']:.1%}")
    if len(critical_zones) > 5:
        print(f"   ... and {len(critical_zones)-5} more critical zones")

if len(high_zones) > 0:
    print("\n🟠 HIGH PRIORITY - Inspect within 24 hours:")
    for _, row in high_zones.head(5).iterrows():
        print(f"   📍 Grid ({int(row['grid_x'])}, {int(row['grid_y'])}) - Stress: {row['stress_probability']:.1%}")
    if len(high_zones) > 5:
        print(f"   ... and {len(high_zones)-5} more high-priority zones")

print("\n💡 INTERPRETATION GUIDELINES:")
print("   1. Critical zones require immediate ground inspection")
print("   2. High stress may indicate pest infestation or water stress")
print("   3. Consider sending follow-up drones with higher resolution")
print("   4. Compare with historical data for trend analysis")
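
The cell above lists zones by priority but does not decide the order in which to visit them. One simple way to turn a list of zones into a route is a greedy "nearest neighbor" rule: always fly to the closest zone not yet visited. The sketch below is only an illustration under assumptions. The function `order_by_nearest_neighbor`, the start point `(0, 0)`, and the example coordinates are all hypothetical and not part of the notebook, and a greedy route is not guaranteed to be the shortest possible one.

```python
import math

def order_by_nearest_neighbor(zones, start=(0, 0)):
    """Greedy ordering: from the current position, always visit the closest remaining zone."""
    remaining = list(zones)
    route = []
    current = start
    while remaining:
        # Straight-line distance on the grid; ties go to the earlier zone in the list
        nearest = min(remaining, key=lambda zone: math.dist(current, zone))
        remaining.remove(nearest)
        route.append(nearest)
        current = nearest
    return route

# Hypothetical (grid_x, grid_y) coordinates of critical cells
critical_cells = [(5, 5), (1, 0), (1, 4), (6, 5)]
route = order_by_nearest_neighbor(critical_cells)

for step, (gx, gy) in enumerate(route, 1):
    print(f"   Stop {step}: Grid ({gx}, {gy})")
```

With the notebook's data, the input could be built from the `grid_x` and `grid_y` columns of the critical zones.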

---

## 📝 TASK 5: REFLECTION

### 🔹 Current Limitations

In [None]:
# =============================================================================
# REFLECTION ON LIMITATIONS AND IMPROVEMENTS
# =============================================================================

print("\n📝 TASK 5: REFLECTION")
print("=" * 50)

print("""
⚠️ CURRENT LIMITATIONS:

1. 📊 Dataset Size:
   - Current dataset is limited in size
   - Real-world farms have millions of data points
   - More data would improve model accuracy

2. 🌦️ Temporal Data Missing:
   - We have single snapshot, not time series
   - Plant stress develops over time
   - Tracking trends would improve prediction

3. 🌡️ Weather Data Not Included:
   - Temperature, rainfall affect plant health
   - Integrating weather would improve accuracy

4. 🦠 No Disease/Pest Classification:
   - Current model only detects "stress"
   - Doesn't tell us WHAT is causing stress
   - Multi-class classification would be better

🚀 PROPOSED IMPROVEMENTS:

1. 🔄 Time Series Analysis using LSTM networks
2. 🧠 Deep Learning on raw drone imagery
3. 🌐 Multi-source data fusion (weather + soil sensors)
4. 📱 Real-time edge computing on drones
5. 🤝 Farmer feedback loop for model improvement
""")

---

## ✅ PROJECT SUMMARY

In [None]:
# =============================================================================
# PROJECT SUMMARY
# =============================================================================

print("\n" + "=" * 60)
print("✅ PROJECT EXECUTION COMPLETE")
print("=" * 60)

print(f"""
🏆 KEY FINDINGS:
   • Best Model: {best_model_name}
   • F1-Score: {best_f1:.4f}
   • Total Grid Cells Analyzed: {len(df)}
   • Stressed Cells Identified: {len(df[df['predicted_label'] == 'Stressed'])}
   • Critical Zones: {len(df[df['stress_priority'] == 'CRITICAL'])}

🎯 RECOMMENDATIONS:
   1. Deploy drone to critical zones first
   2. Take close-up images of stressed plants
   3. Consult agronomist for treatment plan
   4. Schedule re-scan after treatment

🌾 Thank you for using AI Crop Health Monitoring! 🚁
""")