# Agricultural Remote Sensing: Crop Health Monitoring with Satellite Data

## Overview

This notebook demonstrates fundamental techniques in agricultural remote sensing for monitoring crop health and productivity. Remote sensing enables non-destructive, large-scale assessment of crop conditions throughout the growing season.

## Dataset Description

**Study Area**: Agricultural fields monitored across a complete growing season

**Data Specifications**:
- **100 field observations** collected from multispectral satellite imagery
- **4 crop types**: Corn, Wheat, Soybean, and Cotton
- **5-month temporal coverage**: April through August (growing season)
- **Spectral bands**: Red (620-670 nm), Near-Infrared (NIR, 760-900 nm), Green (520-600 nm), Blue (450-520 nm)

## Methods

### Spectral Reflectance Analysis
We analyze how different crops reflect electromagnetic radiation across visible and near-infrared wavelengths. Healthy vegetation has a distinctive spectral signature:
- **Low Red reflectance**: Chlorophyll absorbs red light for photosynthesis
- **High NIR reflectance**: Healthy leaf cell structure strongly reflects NIR radiation
- **Moderate Green reflectance**: Creates the visible green appearance of plants

### Vegetation Indices
We calculate three key vegetation indices that quantify crop health:

1. **NDVI (Normalized Difference Vegetation Index)**: The most widely used index for vegetation monitoring
   - Range: -1 to +1
   - Healthy vegetation: >0.6
   - Moderate vegetation: 0.4-0.6
   - Sparse/stressed vegetation: 0.2-0.4
   - Non-vegetated surfaces: <0.2

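2. **EVI (Enhanced Vegetation Index)**: Improved sensitivity in high biomass regions
   - Designed to reduce atmospheric and soil-background interference
   - Better performance in dense canopy conditions, where NDVI tends to saturate
   - Computed here in its two-band form (often called EVI2), which uses only Red and NIR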

3. **SAVI (Soil-Adjusted Vegetation Index)**: Minimizes soil brightness influences
   - Particularly useful for sparse or early-season vegetation
   - Soil adjustment factor (L=0.5) reduces soil background effects
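
To illustrate how the soil adjustment factor behaves, the sketch below implements the general SAVI form, $(1 + L)(NIR - Red)/(NIR + Red + L)$. The function name `savi` and the sample reflectances are illustrative assumptions, not values from the dataset. With L = 0 the expression reduces to NDVI, which the last line prints for comparison.

```python
import numpy as np


def savi(nir, red, L=0.5):
    """General SAVI; L is the soil adjustment factor (0.5 in this notebook)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (1.0 + L) * (nir - red) / (nir + red + L)


# Hypothetical reflectances for one sparse and one dense canopy
nir = np.array([0.25, 0.50])
red = np.array([0.15, 0.08])

ndvi = (nir - red) / (nir + red)
savi_default = savi(nir, red)       # L = 0.5, as used in this notebook
savi_no_adjust = savi(nir, red, 0)  # L = 0 removes the soil adjustment

print("NDVI:        ", ndvi.round(3))
print("SAVI (L=0.5):", savi_default.round(3))
print("SAVI (L=0):  ", savi_no_adjust.round(3))
```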

### Analysis Objectives
- Characterize spectral signatures of different crop types
- Monitor temporal patterns across the growing season (phenology)
- Identify stressed vs. healthy fields using vegetation indices
- Compare crop performance and health status

## 1. Setup and Package Import

Import essential libraries for data manipulation, statistical analysis, and visualization.

In [None]:
# Numerical operations and data manipulation
import numpy as np
import pandas as pd

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Configure visualization settings
plt.style.use("seaborn-v0_8-darkgrid")
sns.set_palette("husl")
plt.rcParams["figure.figsize"] = (12, 6)
plt.rcParams["font.size"] = 10

# Display all columns in pandas
pd.set_option("display.max_columns", None)

print("Packages imported successfully!")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## 2. Load and Explore Data

Load the satellite observations and examine the dataset structure, including spectral band values for each field observation.
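
Before computing any indices, it can help to confirm that a loaded file has the expected columns and plausible reflectance values. The sketch below is one way to do that. The helper `validate_observations`, the required-column list (taken from the synthetic dataset built in the next cell), and the assumption that reflectance is stored as a fraction in [0, 1] are all illustrative choices rather than part of the original workflow.

```python
import pandas as pd

# Assumed schema, mirroring the columns of the synthetic dataset in this notebook
REQUIRED_COLUMNS = ["Field_ID", "Crop_Type", "Month", "Red", "NIR", "Green", "Blue"]
BAND_COLUMNS = ["Red", "NIR", "Green", "Blue"]


def validate_observations(frame):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    for band in BAND_COLUMNS:
        # Missing values (NaN) also fail the range check and are counted here
        out_of_range = ~frame[band].between(0.0, 1.0)
        if out_of_range.any():
            problems.append(f"{band}: {int(out_of_range.sum())} value(s) outside [0, 1]")
    return problems


# Tiny hypothetical frame: the second row has an impossible NIR reflectance
example = pd.DataFrame(
    {
        "Field_ID": ["F001", "F002"],
        "Crop_Type": ["Corn", "Wheat"],
        "Month": ["April", "May"],
        "Red": [0.10, 0.12],
        "NIR": [0.45, 1.40],
        "Green": [0.18, 0.16],
        "Blue": [0.11, 0.13],
    }
)
problems = validate_observations(example)
print(problems)
```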

In [None]:
# Load the dataset
# Note: Update the path to your actual data file location
data_path = "../data/crop_health_data.csv"

# For demonstration, create sample data if file doesn't exist
try:
    df = pd.read_csv(data_path)
    print("Data loaded from file successfully!")
except FileNotFoundError:
    print("Creating sample dataset for demonstration...")

    # Create synthetic data that mimics real agricultural satellite observations
    np.random.seed(42)

    crops = ["Corn", "Wheat", "Soybean", "Cotton"]
    months = ["April", "May", "June", "July", "August"]
    month_order = {m: i for i, m in enumerate(months)}

    n_samples = 100

    data = []
    for i in range(n_samples):
        crop = np.random.choice(crops)
        month = np.random.choice(months)
        month_idx = month_order[month]

        # Simulate seasonal growth patterns (vegetation increases then decreases)
        growth_factor = 1.0 + 0.3 * month_idx - 0.05 * (month_idx**2)

        # Crop-specific characteristics
        if crop == "Corn":
            base_nir = 0.45 * growth_factor
            base_red = 0.10 / growth_factor
        elif crop == "Wheat":
            base_nir = 0.40 * growth_factor
            base_red = 0.12 / growth_factor
        elif crop == "Soybean":
            base_nir = 0.42 * growth_factor
            base_red = 0.11 / growth_factor
        else:  # Cotton
            base_nir = 0.38 * growth_factor
            base_red = 0.13 / growth_factor

        # Add random variation for field-to-field differences
        nir = np.clip(base_nir + np.random.normal(0, 0.05), 0.15, 0.65)
        red = np.clip(base_red + np.random.normal(0, 0.02), 0.05, 0.25)
        green = np.clip(nir * 0.4 + np.random.normal(0, 0.02), 0.08, 0.30)
        blue = np.clip(red * 1.1 + np.random.normal(0, 0.015), 0.04, 0.20)

        data.append(
            {
                "Field_ID": f"F{i + 1:03d}",
                "Crop_Type": crop,
                "Month": month,
                "Month_Num": month_idx + 4,  # April = 4
                "Red": red,
                "NIR": nir,
                "Green": green,
                "Blue": blue,
                "Latitude": 40.0 + np.random.uniform(-2, 2),
                "Longitude": -95.0 + np.random.uniform(-2, 2),
            }
        )

    df = pd.DataFrame(data)
    print("Sample data created successfully!")

# Display basic information
print(f"\nDataset shape: {df.shape[0]} observations, {df.shape[1]} variables")
print("\nFirst few rows:")
df.head()

In [None]:
# Dataset information
print("Dataset Information:")
print("=" * 50)
df.info()

print("\n" + "=" * 50)
print("Statistical Summary of Spectral Bands:")
print("=" * 50)
df[["Red", "NIR", "Green", "Blue"]].describe()

In [None]:
# Group by crop type and month to see data distribution
print("Observations by Crop Type:")
print(df["Crop_Type"].value_counts().sort_index())

print("\nObservations by Month:")
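print(df["Month"].value_counts().reindex(["April", "May", "June", "July", "August"], fill_value=0))

print("\nObservations by Crop Type and Month:")
crop_month_summary = df.groupby(["Crop_Type", "Month"]).size().unstack(fill_value=0)
# reindex tolerates months with no observations, where direct column selection would raise KeyError
crop_month_summary = crop_month_summary.reindex(
    columns=["April", "May", "June", "July", "August"], fill_value=0
)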
print(crop_month_summary)

## 3. Spectral Reflectance Analysis

### Understanding Spectral Signatures

Each crop has a unique spectral signature determined by:
- **Leaf structure**: Affects NIR reflectance (internal scattering)
- **Chlorophyll content**: Determines red light absorption
- **Water content**: Influences NIR and shortwave infrared reflectance
- **Leaf angle distribution**: Affects overall reflectance patterns

The "red edge" is the steep rise in reflectance between red absorption and NIR reflection (roughly 680-750 nm), which is highly sensitive to vegetation health and chlorophyll content. The four bands in this dataset do not sample the red edge directly.

In [None]:
# Calculate mean reflectance by crop type
spectral_bands = ["Blue", "Green", "Red", "NIR"]
band_wavelengths = [475, 560, 650, 850]  # Approximate center wavelengths in nm

mean_reflectance = df.groupby("Crop_Type")[spectral_bands].mean()

# Create spectral reflectance curve plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Plot 1: Spectral curves
for crop in mean_reflectance.index:
    values = mean_reflectance.loc[crop, spectral_bands].values
    ax1.plot(band_wavelengths, values, marker="o", linewidth=2.5, markersize=8, label=crop)

ax1.set_xlabel("Wavelength (nm)", fontsize=12, fontweight="bold")
ax1.set_ylabel("Reflectance", fontsize=12, fontweight="bold")
ax1.set_title(
    "Mean Spectral Reflectance Curves by Crop Type", fontsize=14, fontweight="bold", pad=15
)
ax1.legend(title="Crop Type", fontsize=10)
ax1.grid(True, alpha=0.3)
ax1.set_xlim(450, 900)

# Shade the band regions listed in the dataset description
ax1.axvspan(450, 520, alpha=0.1, color="blue")
ax1.axvspan(520, 600, alpha=0.1, color="green")
ax1.axvspan(620, 670, alpha=0.1, color="red")
ax1.axvspan(760, 900, alpha=0.1, color="darkred")

# Plot 2: NIR vs Red scatter (key for vegetation indices)
for crop in df["Crop_Type"].unique():
    crop_data = df[df["Crop_Type"] == crop]
    ax2.scatter(crop_data["Red"], crop_data["NIR"], alpha=0.6, s=50, label=crop)

ax2.set_xlabel("Red Reflectance", fontsize=12, fontweight="bold")
ax2.set_ylabel("NIR Reflectance", fontsize=12, fontweight="bold")
ax2.set_title("NIR vs Red Reflectance: Vegetation Space", fontsize=14, fontweight="bold", pad=15)
ax2.grid(True, alpha=0.3)

# Add a 1:1 reference line as a simplified soil line.
# It is drawn before the legend is created so that its label appears in the legend.
ax2.plot([0, 0.3], [0, 0.3], "k--", alpha=0.5, linewidth=1.5, label="Soil Line")
ax2.legend(title="Crop Type", fontsize=10)

plt.tight_layout()
plt.show()

print("Mean Spectral Reflectance Values:")
print(mean_reflectance.round(3))

### Key Observations:

- **Blue band (450-520 nm)**: Low reflectance due to chlorophyll absorption
- **Green band (520-600 nm)**: Moderate reflectance (why plants appear green)
- **Red band (620-670 nm)**: Strong chlorophyll absorption, lowest reflectance
- **NIR band (760-900 nm)**: High reflectance from leaf mesophyll structure

The large contrast between Red (low) and NIR (high) reflectance is the foundation of most vegetation indices.
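
One simple way to express this contrast is the ratio NIR/Red (often called the simple ratio). NDVI is a rescaling of that ratio, since $(NIR - Red)/(NIR + Red) = (SR - 1)/(SR + 1)$ with $SR = NIR/Red$. The sketch below checks the identity numerically; the reflectance values are hypothetical and chosen only to span bare soil through healthy canopy.

```python
import numpy as np

# Hypothetical reflectances: bare soil, stressed canopy, healthy canopy
red = np.array([0.20, 0.14, 0.07])
nir = np.array([0.24, 0.30, 0.55])

simple_ratio = nir / red
ndvi_direct = (nir - red) / (nir + red)
ndvi_from_ratio = (simple_ratio - 1) / (simple_ratio + 1)

for sr, nd in zip(simple_ratio, ndvi_direct):
    print(f"NIR/Red = {sr:5.2f}  ->  NDVI = {nd:.3f}")

# Both routes give the same NDVI up to floating-point error
print("Identity holds:", np.allclose(ndvi_direct, ndvi_from_ratio))
```

The ratio is unbounded, while NDVI compresses it into the range -1 to +1, which makes values easier to compare across scenes.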

## 4. Calculate Vegetation Indices

### Mathematical Formulations

**NDVI (Normalized Difference Vegetation Index)**:
$$NDVI = \frac{NIR - Red}{NIR + Red}$$

**EVI (Enhanced Vegetation Index)**, two-band form (EVI2, no blue band):
$$EVI = 2.5 \times \frac{NIR - Red}{NIR + 2.4 \times Red + 1}$$

**SAVI (Soil-Adjusted Vegetation Index)**:
$$SAVI = 1.5 \times \frac{NIR - Red}{NIR + Red + 0.5}$$
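
As a quick check of the formulas as written above, take NIR = 0.45 and Red = 0.10. These are illustrative values that match the April corn baseline in the synthetic data generator:

$$NDVI = \frac{0.45 - 0.10}{0.45 + 0.10} = \frac{0.35}{0.55} \approx 0.636$$

$$EVI = 2.5 \times \frac{0.35}{0.45 + 2.4 \times 0.10 + 1} = \frac{0.875}{1.69} \approx 0.518$$

$$SAVI = 1.5 \times \frac{0.35}{0.45 + 0.10 + 0.5} = \frac{0.525}{1.05} = 0.500$$

The same pixel yields different absolute values under each index, so the NDVI thresholds below should not be applied directly to EVI or SAVI.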

### Interpretation Guidelines

**NDVI Values**:
- **< 0.2**: Barren soil, rock, water, or snow
- **0.2 - 0.4**: Sparse vegetation, stressed crops, or early season growth
- **0.4 - 0.6**: Moderate vegetation density
- **0.6 - 0.8**: Dense, healthy vegetation at peak growth
- **> 0.8**: Very dense vegetation (mature crops, forests)

In [None]:
# Calculate vegetation indices
df["NDVI"] = (df["NIR"] - df["Red"]) / (df["NIR"] + df["Red"])
df["EVI"] = 2.5 * ((df["NIR"] - df["Red"]) / (df["NIR"] + 2.4 * df["Red"] + 1))
df["SAVI"] = 1.5 * ((df["NIR"] - df["Red"]) / (df["NIR"] + df["Red"] + 0.5))

# Display statistics
print("Vegetation Index Statistics:")
print("=" * 60)
print(df[["NDVI", "EVI", "SAVI"]].describe().round(3))

# Distribution plots
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

indices = ["NDVI", "EVI", "SAVI"]
colors = ["#2ecc71", "#3498db", "#e74c3c"]

for idx, (index, color) in enumerate(zip(indices, colors)):
    axes[idx].hist(df[index], bins=25, color=color, alpha=0.7, edgecolor="black")
    axes[idx].axvline(
        df[index].mean(),
        color="red",
        linestyle="--",
        linewidth=2,
        label=f"Mean: {df[index].mean():.3f}",
    )
    axes[idx].axvline(
        df[index].median(),
        color="orange",
        linestyle="--",
        linewidth=2,
        label=f"Median: {df[index].median():.3f}",
    )
    axes[idx].set_xlabel(index, fontsize=11, fontweight="bold")
    axes[idx].set_ylabel("Frequency", fontsize=11, fontweight="bold")
    axes[idx].set_title(f"{index} Distribution", fontsize=12, fontweight="bold")
    axes[idx].legend(fontsize=9)
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nVegetation indices calculated successfully!")

## 5. Temporal Analysis: Growing Season Phenology

### Crop Phenology Concepts

**Growing Season Stages**:
1. **Emergence (April-May)**: Low NDVI, rapid increase as plants establish
2. **Vegetative Growth (May-June)**: Steep NDVI increase, maximum photosynthesis
3. **Peak Green (June-July)**: Maximum NDVI, full canopy cover
4. **Reproductive Stage (July)**: Stable high NDVI
5. **Senescence (August)**: NDVI decline as plants mature and dry down

Different crops have different phenological patterns based on their growth characteristics and life cycles.
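
A minimal sketch of how these stages can be read from monthly means: pivot to a crop-by-month table, then take the month of maximum NDVI and the month-to-month change. The NDVI values below are hypothetical and only illustrate two differently timed curves. The names `ndvi_means`, `table`, `peak_month`, and `month_to_month_change` are introduced here for illustration.

```python
import pandas as pd

month_names = ["April", "May", "June", "July", "August"]

# Hypothetical monthly mean NDVI for two crops (illustrative values only)
ndvi_means = pd.DataFrame(
    {
        "Crop_Type": ["Corn"] * 5 + ["Wheat"] * 5,
        "Month": month_names * 2,
        "NDVI": [0.45, 0.60, 0.75, 0.82, 0.78, 0.50, 0.68, 0.72, 0.60, 0.42],
    }
)

# Pivot to a crop x month table, then put the columns in calendar order
table = ndvi_means.pivot(index="Crop_Type", columns="Month", values="NDVI")
table = table[month_names]

peak_month = table.idxmax(axis=1)            # month with the highest mean NDVI
month_to_month_change = table.diff(axis=1)   # positive = green-up, negative = senescence

print(peak_month)
print(month_to_month_change.round(2))
```

In this toy table the second crop peaks a month earlier and declines sooner, which is the kind of timing difference the temporal plots in this section are meant to reveal.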

In [None]:
# Temporal analysis of NDVI across growing season
month_order = ["April", "May", "June", "July", "August"]

# Calculate mean NDVI by crop and month
temporal_ndvi = df.groupby(["Crop_Type", "Month"])["NDVI"].agg(["mean", "std"]).reset_index()
temporal_ndvi["Month"] = pd.Categorical(
    temporal_ndvi["Month"], categories=month_order, ordered=True
)
temporal_ndvi = temporal_ndvi.sort_values(["Crop_Type", "Month"])

# Create temporal plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 5))

# Plot 1: NDVI time series by crop
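for crop in df["Crop_Type"].unique():
    crop_data = temporal_ndvi[temporal_ndvi["Crop_Type"] == crop]
    # Use the categorical month codes as x positions so the line and the shaded
    # band stay aligned even if a crop has no observations in some month
    x_pos = crop_data["Month"].cat.codes
    ax1.plot(x_pos, crop_data["mean"], marker="o", linewidth=2.5, markersize=10, label=crop)
    ax1.fill_between(
        x_pos,
        crop_data["mean"] - crop_data["std"],
        crop_data["mean"] + crop_data["std"],
        alpha=0.2,
    )

# Label the numeric positions with month names in calendar order
ax1.set_xticks(range(len(month_order)))
ax1.set_xticklabels(month_order)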

ax1.set_xlabel("Month (Growing Season)", fontsize=12, fontweight="bold")
ax1.set_ylabel("Mean NDVI", fontsize=12, fontweight="bold")
ax1.set_title("Crop Phenology: NDVI Temporal Patterns", fontsize=14, fontweight="bold", pad=15)
ax1.legend(title="Crop Type", fontsize=10)
ax1.grid(True, alpha=0.3)
ax1.set_ylim(0, 1)

# Shade NDVI health classes: healthy (> 0.6), moderate (0.4-0.6), stressed (< 0.4)
ax1.axhspan(0.6, 1.0, alpha=0.1, color="green")
ax1.axhspan(0.4, 0.6, alpha=0.1, color="yellow")
ax1.axhspan(0, 0.4, alpha=0.1, color="red")

# Plot 2: Box plots by month showing variation
df_sorted = df.copy()
df_sorted["Month"] = pd.Categorical(df_sorted["Month"], categories=month_order, ordered=True)
df_sorted = df_sorted.sort_values("Month")

sns.boxplot(data=df_sorted, x="Month", y="NDVI", ax=ax2, palette="viridis")
ax2.set_xlabel("Month (Growing Season)", fontsize=12, fontweight="bold")
ax2.set_ylabel("NDVI", fontsize=12, fontweight="bold")
ax2.set_title("NDVI Variation Throughout Growing Season", fontsize=14, fontweight="bold", pad=15)
ax2.grid(True, alpha=0.3, axis="y")

# Add reference lines
ax2.axhline(0.6, color="green", linestyle="--", linewidth=1.5, alpha=0.7, label="Healthy threshold")
ax2.axhline(0.4, color="orange", linestyle="--", linewidth=1.5, alpha=0.7, label="Stress threshold")
ax2.legend(fontsize=9)

plt.tight_layout()
plt.show()

print("Temporal Statistics by Month:")
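monthly_stats = df.groupby("Month")["NDVI"].describe()[["mean", "std", "min", "max"]]
# groupby sorts month names alphabetically; reindex restores calendar order
print(monthly_stats.reindex(month_order).round(3))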

## 6. Health Classification

### Classification Scheme

Based on NDVI values, we classify field health status:

- **Healthy** (NDVI > 0.6): Dense canopy, optimal photosynthetic activity
- **Moderate** (0.4 ≤ NDVI ≤ 0.6): Adequate growth but potential stress
- **Stressed** (NDVI < 0.4): Sparse vegetation, significant stress factors

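Low NDVI can result from stress factors such as:
- Water deficit (drought)
- Nutrient deficiency
- Disease or pest damage
- Poor soil conditions

Low NDVI is also expected early or late in the season, when the canopy is still developing or senescing, so a "Stressed" label should be read alongside the observation month.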
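
The thresholds above can also be applied in vectorized form, which avoids a Python-level function call per row. The sketch below uses `numpy.select` on a few hypothetical NDVI values, including the boundary cases 0.4 and 0.6, to show which class each boundary falls into under this scheme.

```python
import numpy as np
import pandas as pd

# Hypothetical NDVI values, including the two boundary cases 0.4 and 0.6
ndvi = pd.Series([0.25, 0.40, 0.55, 0.60, 0.61, 0.85])

# Conditions are checked in order; the first match wins
conditions = [ndvi > 0.6, ndvi >= 0.4]
labels = ["Healthy", "Moderate"]
status = pd.Series(np.select(conditions, labels, default="Stressed"), index=ndvi.index)

print(pd.DataFrame({"NDVI": ndvi, "Health_Status": status}))
```

Note that a missing NDVI value would fall through to the default label here, so missing values are best handled before classification.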

In [None]:
# Classify fields based on NDVI thresholds
def classify_health(ndvi):
    if ndvi > 0.6:
        return "Healthy"
    elif ndvi >= 0.4:
        return "Moderate"
    else:
        return "Stressed"


df["Health_Status"] = df["NDVI"].apply(classify_health)

# Create visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: Overall health distribution
health_counts = df["Health_Status"].value_counts()
colors_health = {"Healthy": "#27ae60", "Moderate": "#f39c12", "Stressed": "#e74c3c"}
ordered_health = ["Healthy", "Moderate", "Stressed"]
# One color per bar, in the same order as ordered_health (missing classes still get a zero-height bar)
colors_ordered = [colors_health[h] for h in ordered_health]

axes[0, 0].bar(
    ordered_health,
    [health_counts.get(h, 0) for h in ordered_health],
    color=colors_ordered,
    alpha=0.8,
    edgecolor="black",
    linewidth=1.5,
)
axes[0, 0].set_ylabel("Number of Fields", fontsize=11, fontweight="bold")
axes[0, 0].set_xlabel("Health Status", fontsize=11, fontweight="bold")
axes[0, 0].set_title("Distribution of Field Health Status", fontsize=12, fontweight="bold")
axes[0, 0].grid(True, alpha=0.3, axis="y")

# Add percentage labels
total = len(df)
for i, status in enumerate(ordered_health):
    count = health_counts.get(status, 0)
    pct = 100 * count / total
    axes[0, 0].text(i, count + 1, f"{pct:.1f}%", ha="center", fontsize=10, fontweight="bold")

# Plot 2: Health by crop type
health_crop = pd.crosstab(df["Crop_Type"], df["Health_Status"], normalize="index") * 100
health_crop = health_crop[[h for h in ordered_health if h in health_crop.columns]]
health_crop.plot(
    kind="bar",
    stacked=True,
    ax=axes[0, 1],
    color=[colors_health[h] for h in health_crop.columns],
    alpha=0.8,
    edgecolor="black",
    linewidth=1,
)
axes[0, 1].set_ylabel("Percentage of Fields (%)", fontsize=11, fontweight="bold")
axes[0, 1].set_xlabel("Crop Type", fontsize=11, fontweight="bold")
axes[0, 1].set_title("Health Status Distribution by Crop Type", fontsize=12, fontweight="bold")
axes[0, 1].legend(title="Health Status", fontsize=9)
axes[0, 1].set_xticklabels(axes[0, 1].get_xticklabels(), rotation=45, ha="right")
axes[0, 1].grid(True, alpha=0.3, axis="y")

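# Plot 3: Health by month
# Rebuild the month-ordered copy: the earlier df_sorted was created before
# Health_Status existed, so it does not contain that column
df_sorted = df.copy()
df_sorted["Month"] = pd.Categorical(df_sorted["Month"], categories=month_order, ordered=True)
df_sorted["Health_Status"] = pd.Categorical(
    df_sorted["Health_Status"], categories=ordered_health, ordered=True
)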
health_month = pd.crosstab(df_sorted["Month"], df_sorted["Health_Status"], normalize="index") * 100
health_month = health_month[[h for h in ordered_health if h in health_month.columns]]
health_month.plot(
    kind="bar",
    stacked=True,
    ax=axes[1, 0],
    color=[colors_health[h] for h in health_month.columns],
    alpha=0.8,
    edgecolor="black",
    linewidth=1,
)
axes[1, 0].set_ylabel("Percentage of Fields (%)", fontsize=11, fontweight="bold")
axes[1, 0].set_xlabel("Month", fontsize=11, fontweight="bold")
axes[1, 0].set_title("Health Status Distribution by Month", fontsize=12, fontweight="bold")
axes[1, 0].legend(title="Health Status", fontsize=9)
axes[1, 0].set_xticklabels(axes[1, 0].get_xticklabels(), rotation=45, ha="right")
axes[1, 0].grid(True, alpha=0.3, axis="y")

# Plot 4: NDVI distribution by health status
health_order = ["Stressed", "Moderate", "Healthy"]
sns.violinplot(
    data=df,
    y="Health_Status",
    x="NDVI",
    ax=axes[1, 1],
    order=health_order,
    palette={"Healthy": "#27ae60", "Moderate": "#f39c12", "Stressed": "#e74c3c"},
)
axes[1, 1].set_xlabel("NDVI", fontsize=11, fontweight="bold")
axes[1, 1].set_ylabel("Health Status", fontsize=11, fontweight="bold")
axes[1, 1].set_title("NDVI Distribution by Health Category", fontsize=12, fontweight="bold")
axes[1, 1].axvline(0.4, color="orange", linestyle="--", linewidth=1.5, alpha=0.7)
axes[1, 1].axvline(0.6, color="green", linestyle="--", linewidth=1.5, alpha=0.7)
axes[1, 1].grid(True, alpha=0.3, axis="x")

plt.tight_layout()
plt.show()

# Print summary statistics
print("Health Status Summary:")
print("=" * 60)
print(f"Total fields analyzed: {len(df)}")
print(
    f"\nHealthy fields: {health_counts.get('Healthy', 0)} ({100 * health_counts.get('Healthy', 0) / len(df):.1f}%)"
)
print(
    f"Moderate health fields: {health_counts.get('Moderate', 0)} ({100 * health_counts.get('Moderate', 0) / len(df):.1f}%)"
)
print(
    f"Stressed fields: {health_counts.get('Stressed', 0)} ({100 * health_counts.get('Stressed', 0) / len(df):.1f}%)"
)

print("\nMean NDVI by Health Status:")
print(df.groupby("Health_Status")["NDVI"].mean().round(3))

## 7. Crop Type Comparison

Different crop species have distinct spectral characteristics due to:
- **Canopy architecture**: Row spacing, leaf angle distribution
- **Leaf properties**: Size, thickness, surface texture
- **Growth patterns**: C3 vs C4 photosynthetic pathways
- **Phenological timing**: Different emergence and maturation dates

In [None]:
# Compare vegetation indices across crop types
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: NDVI by crop type
sns.boxplot(data=df, x="Crop_Type", y="NDVI", ax=axes[0, 0], palette="Set2")
axes[0, 0].set_ylabel("NDVI", fontsize=11, fontweight="bold")
axes[0, 0].set_xlabel("Crop Type", fontsize=11, fontweight="bold")
axes[0, 0].set_title("NDVI Comparison Across Crop Types", fontsize=12, fontweight="bold")
axes[0, 0].axhline(0.6, color="green", linestyle="--", alpha=0.5)
axes[0, 0].axhline(0.4, color="orange", linestyle="--", alpha=0.5)
axes[0, 0].grid(True, alpha=0.3, axis="y")
axes[0, 0].set_xticklabels(axes[0, 0].get_xticklabels(), rotation=45, ha="right")

# Plot 2: EVI by crop type
sns.boxplot(data=df, x="Crop_Type", y="EVI", ax=axes[0, 1], palette="Set3")
axes[0, 1].set_ylabel("EVI", fontsize=11, fontweight="bold")
axes[0, 1].set_xlabel("Crop Type", fontsize=11, fontweight="bold")
axes[0, 1].set_title("EVI Comparison Across Crop Types", fontsize=12, fontweight="bold")
axes[0, 1].grid(True, alpha=0.3, axis="y")
axes[0, 1].set_xticklabels(axes[0, 1].get_xticklabels(), rotation=45, ha="right")

# Plot 3: SAVI by crop type
sns.boxplot(data=df, x="Crop_Type", y="SAVI", ax=axes[1, 0], palette="Pastel1")
axes[1, 0].set_ylabel("SAVI", fontsize=11, fontweight="bold")
axes[1, 0].set_xlabel("Crop Type", fontsize=11, fontweight="bold")
axes[1, 0].set_title("SAVI Comparison Across Crop Types", fontsize=12, fontweight="bold")
axes[1, 0].grid(True, alpha=0.3, axis="y")
axes[1, 0].set_xticklabels(axes[1, 0].get_xticklabels(), rotation=45, ha="right")

# Plot 4: Comparison of all indices by crop (violin plot)
indices_melted = df.melt(
    id_vars=["Crop_Type"], value_vars=["NDVI", "EVI", "SAVI"], var_name="Index", value_name="Value"
)
sns.violinplot(
    data=indices_melted,
    x="Crop_Type",
    y="Value",
    hue="Index",
    ax=axes[1, 1],
    palette="muted",
    split=False,
)
axes[1, 1].set_ylabel("Index Value", fontsize=11, fontweight="bold")
axes[1, 1].set_xlabel("Crop Type", fontsize=11, fontweight="bold")
axes[1, 1].set_title("All Vegetation Indices by Crop Type", fontsize=12, fontweight="bold")
axes[1, 1].legend(title="Index", fontsize=9)
axes[1, 1].grid(True, alpha=0.3, axis="y")
axes[1, 1].set_xticklabels(axes[1, 1].get_xticklabels(), rotation=45, ha="right")

plt.tight_layout()
plt.show()

# Statistical summary
print("Vegetation Indices by Crop Type:")
print("=" * 60)
crop_summary = df.groupby("Crop_Type")[["NDVI", "EVI", "SAVI"]].agg(["mean", "std"])
print(crop_summary.round(3))

## 8. Spatial Patterns Analysis

### Vegetation Space Visualization

The NIR-Red reflectance space (also called vegetation space) reveals important patterns. "Spatial" here refers to position in this spectral feature space, not to geographic location:
- **Soil Line**: Bare soils plot along a roughly straight line in NIR-Red space; the plots below approximate it with the 1:1 line (NIR ≈ Red)
- **Vegetation Trajectory**: Perpendicular distance from the soil line indicates vegetation amount
- **Isovegetation Lines**: Lines parallel to the soil line represent similar vegetation density under this perpendicular-distance view

Healthy vegetation appears in the upper-left region (high NIR, low Red), while stressed vegetation or sparse cover falls closer to the soil line.
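
The perpendicular-distance idea can be sketched directly; it is the basis of the Perpendicular Vegetation Index (PVI). The function below measures the signed distance of each point from a soil line NIR = slope × Red + intercept. The function name, the sample points, and the default 1:1 soil line (matching the dashed line drawn in this notebook's plots) are assumptions made for illustration. In practice the slope and intercept would be fitted from bare-soil observations.

```python
import numpy as np


def perpendicular_distance(nir, red, slope=1.0, intercept=0.0):
    """Signed perpendicular distance from the soil line NIR = slope * Red + intercept."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - slope * red - intercept) / np.sqrt(1.0 + slope**2)


# Hypothetical points: on the soil line, sparse cover, dense canopy
red = np.array([0.20, 0.15, 0.07])
nir = np.array([0.20, 0.30, 0.55])

distance = perpendicular_distance(nir, red)
print(distance.round(3))
```

Points on the soil line get a distance of zero, and the distance grows as a point moves toward the high-NIR, low-Red corner.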

In [None]:
# Spatial pattern analysis: NIR vs Red colored by different attributes
fig, axes = plt.subplots(2, 2, figsize=(15, 14))

# Plot 1: Colored by health status
for health in ordered_health:
    health_data = df[df["Health_Status"] == health]
    axes[0, 0].scatter(
        health_data["Red"],
        health_data["NIR"],
        c=colors_health[health],
        label=health,
        alpha=0.6,
        s=80,
        edgecolors="black",
        linewidth=0.5,
    )

axes[0, 0].plot([0, 0.3], [0, 0.3], "k--", alpha=0.5, linewidth=2, label="Soil Line")
axes[0, 0].set_xlabel("Red Reflectance", fontsize=12, fontweight="bold")
axes[0, 0].set_ylabel("NIR Reflectance", fontsize=12, fontweight="bold")
axes[0, 0].set_title(
    "Vegetation Space: Colored by Health Status", fontsize=13, fontweight="bold", pad=10
)
axes[0, 0].legend(fontsize=10, loc="upper left")
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Colored by crop type
crop_colors = {"Corn": "#FF6B6B", "Wheat": "#4ECDC4", "Soybean": "#45B7D1", "Cotton": "#FFA07A"}
for crop in df["Crop_Type"].unique():
    crop_data = df[df["Crop_Type"] == crop]
    axes[0, 1].scatter(
        crop_data["Red"],
        crop_data["NIR"],
        c=crop_colors.get(crop, "gray"),
        label=crop,
        alpha=0.6,
        s=80,
        edgecolors="black",
        linewidth=0.5,
    )

axes[0, 1].plot([0, 0.3], [0, 0.3], "k--", alpha=0.5, linewidth=2, label="Soil Line")
axes[0, 1].set_xlabel("Red Reflectance", fontsize=12, fontweight="bold")
axes[0, 1].set_ylabel("NIR Reflectance", fontsize=12, fontweight="bold")
axes[0, 1].set_title(
    "Vegetation Space: Colored by Crop Type", fontsize=13, fontweight="bold", pad=10
)
axes[0, 1].legend(fontsize=10, loc="upper left")
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Colored by NDVI value (continuous)
scatter = axes[1, 0].scatter(
    df["Red"],
    df["NIR"],
    c=df["NDVI"],
    cmap="RdYlGn",
    alpha=0.7,
    s=80,
    edgecolors="black",
    linewidth=0.5,
    vmin=0,
    vmax=1,
)
axes[1, 0].plot([0, 0.3], [0, 0.3], "k--", alpha=0.5, linewidth=2, label="Soil Line")
axes[1, 0].set_xlabel("Red Reflectance", fontsize=12, fontweight="bold")
axes[1, 0].set_ylabel("NIR Reflectance", fontsize=12, fontweight="bold")
axes[1, 0].set_title("Vegetation Space: Colored by NDVI", fontsize=13, fontweight="bold", pad=10)
cbar = plt.colorbar(scatter, ax=axes[1, 0])
cbar.set_label("NDVI", fontsize=11, fontweight="bold")
axes[1, 0].legend(fontsize=10, loc="upper left")
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Colored by month (temporal progression)
month_colors = {
    "April": "#e74c3c",
    "May": "#e67e22",
    "June": "#f39c12",
    "July": "#2ecc71",
    "August": "#3498db",
}
for month in month_order:
    month_data = df[df["Month"] == month]
    axes[1, 1].scatter(
        month_data["Red"],
        month_data["NIR"],
        c=month_colors[month],
        label=month,
        alpha=0.6,
        s=80,
        edgecolors="black",
        linewidth=0.5,
    )

axes[1, 1].plot([0, 0.3], [0, 0.3], "k--", alpha=0.5, linewidth=2, label="Soil Line")
axes[1, 1].set_xlabel("Red Reflectance", fontsize=12, fontweight="bold")
axes[1, 1].set_ylabel("NIR Reflectance", fontsize=12, fontweight="bold")
axes[1, 1].set_title("Vegetation Space: Colored by Month", fontsize=13, fontweight="bold", pad=10)
axes[1, 1].legend(fontsize=10, loc="upper left")
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Spatial patterns reveal:")
print("- Healthy fields cluster in high NIR, low Red region")
print("- Stressed fields fall closer to the soil line")
print("- Temporal progression shows seasonal vegetation dynamics")

## 9. Summary Statistics and Key Findings

This section synthesizes the analysis results into actionable insights for agricultural management.

In [None]:
# Create comprehensive summary table
summary_stats = []

# Overall statistics
summary_stats.append({"Metric": "Total Fields Analyzed", "Value": len(df), "Unit": "fields"})

summary_stats.append({"Metric": "Study Period", "Value": "April - August", "Unit": "5 months"})

summary_stats.append(
    {"Metric": "Crop Types", "Value": ", ".join(df["Crop_Type"].unique()), "Unit": "4 types"}
)

# NDVI statistics
summary_stats.append(
    {"Metric": "Mean NDVI (All Fields)", "Value": f"{df['NDVI'].mean():.3f}", "Unit": "index value"}
)

summary_stats.append(
    {
        "Metric": "NDVI Range",
        "Value": f"{df['NDVI'].min():.3f} - {df['NDVI'].max():.3f}",
        "Unit": "min-max",
    }
)

# Health classification
health_counts = df["Health_Status"].value_counts()
summary_stats.append(
    {
        "Metric": "Healthy Fields",
        "Value": f"{health_counts.get('Healthy', 0)} ({100 * health_counts.get('Healthy', 0) / len(df):.1f}%)",
        "Unit": "count (%)",
    }
)

summary_stats.append(
    {
        "Metric": "Moderate Health Fields",
        "Value": f"{health_counts.get('Moderate', 0)} ({100 * health_counts.get('Moderate', 0) / len(df):.1f}%)",
        "Unit": "count (%)",
    }
)

summary_stats.append(
    {
        "Metric": "Stressed Fields",
        "Value": f"{health_counts.get('Stressed', 0)} ({100 * health_counts.get('Stressed', 0) / len(df):.1f}%)",
        "Unit": "count (%)",
    }
)

# Peak month
monthly_mean = df.groupby("Month")["NDVI"].mean()
peak_month = monthly_mean.idxmax()
summary_stats.append(
    {
        "Metric": "Peak Vegetation Month",
        "Value": f"{peak_month} (NDVI: {monthly_mean[peak_month]:.3f})",
        "Unit": "month",
    }
)

# Best performing crop
crop_mean = df.groupby("Crop_Type")["NDVI"].mean()
best_crop = crop_mean.idxmax()
summary_stats.append(
    {
        "Metric": "Highest Mean NDVI Crop",
        "Value": f"{best_crop} (NDVI: {crop_mean[best_crop]:.3f})",
        "Unit": "crop type",
    }
)

# Spectral characteristics
summary_stats.append(
    {"Metric": "Mean NIR Reflectance", "Value": f"{df['NIR'].mean():.3f}", "Unit": "reflectance"}
)

summary_stats.append(
    {"Metric": "Mean Red Reflectance", "Value": f"{df['Red'].mean():.3f}", "Unit": "reflectance"}
)

summary_stats.append(
    {
        "Metric": "NIR/Red Ratio (Mean)",
        "Value": f"{(df['NIR'] / df['Red']).mean():.2f}",
        "Unit": "ratio",
    }
)

# Create summary DataFrame
summary_df = pd.DataFrame(summary_stats)

print("=" * 70)
print("AGRICULTURAL REMOTE SENSING ANALYSIS - SUMMARY REPORT")
print("=" * 70)
print(summary_df.to_string(index=False))
print("=" * 70)

# Create detailed crop comparison table
print("\n" + "=" * 70)
print("CROP-SPECIFIC PERFORMANCE METRICS")
print("=" * 70)

crop_comparison = (
    df.groupby("Crop_Type")
    .agg(
        {
            "NDVI": ["mean", "std", "min", "max"],
            "EVI": ["mean", "std"],
            "SAVI": ["mean", "std"],
            "NIR": "mean",
            "Red": "mean",
        }
    )
    .round(3)
)

print(crop_comparison)

# Health status by crop
print("\n" + "=" * 70)
print("HEALTH STATUS DISTRIBUTION BY CROP TYPE")
print("=" * 70)
health_by_crop = pd.crosstab(
    df["Crop_Type"], df["Health_Status"], margins=True, margins_name="Total"
)
health_by_crop_pct = pd.crosstab(df["Crop_Type"], df["Health_Status"], normalize="index") * 100
print(health_by_crop)
print("\nPercentages:")
print(health_by_crop_pct.round(1))

# Temporal trends
print("\n" + "=" * 70)
print("TEMPORAL TRENDS: MEAN NDVI BY MONTH")
print("=" * 70)
temporal_summary = df.groupby("Month")["NDVI"].agg(["mean", "std", "min", "max"]).round(3)
temporal_summary = temporal_summary.reindex(month_order)
print(temporal_summary)

## Key Findings and Management Recommendations

### Main Observations:

1. **Crop Health Status**:
   - The majority of fields show healthy vegetation (NDVI > 0.6)
   - Fields with moderate or stressed status may require intervention
   - Health status varies by crop type and growing stage

2. **Seasonal Patterns**:
   - NDVI increases from April (emergence) through June-July (peak growth)
   - August shows beginning of senescence for some crops
   - Different crops exhibit distinct phenological curves

3. **Crop-Specific Performance**:
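   - In the synthetic demonstration data, corn shows the highest NDVI because it is assigned the highest NIR and lowest Red baselines
   - Cotton and wheat are assigned the lowest NIR baselines, so they show lower values
   - Soybean is assigned intermediate baselines
   - With real imagery, between-crop differences depend on canopy structure, growth stage, and management, so check them against the crop-specific metrics table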

4. **Spectral Signatures**:
   - Strong NIR-Red contrast indicates healthy vegetation
   - Fields approaching soil line may indicate stress or sparse cover
   - Green peak confirms chlorophyll presence

### Agricultural Management Applications:

1. **Precision Agriculture**:
   - Use NDVI maps to guide variable-rate fertilizer application
   - Target irrigation to fields with declining indices
   - Scout low-NDVI areas for pest or disease problems

2. **Yield Prediction**:
   - Peak-season NDVI correlates with final yield
   - Monitor temporal integration of NDVI for biomass accumulation
   - Compare current year to historical baselines

3. **Stress Detection**:
   - Early identification of water stress before visible symptoms
   - Nutrient deficiency detection through spectral anomalies
   - Disease outbreak monitoring through spatial patterns

4. **Crop Insurance and Documentation**:
   - Objective evidence of crop condition throughout season
   - Support for insurance claims related to crop damage
   - Historical records for field performance analysis
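
The "temporal integration of NDVI" mentioned under yield prediction can be sketched with the trapezoidal rule: average each pair of adjacent observations and weight by the number of days between them. The observation dates and NDVI values below are hypothetical (one field, mid-month observations from April to August). The result is a relative proxy for accumulated greenness, not a calibrated biomass or yield estimate.

```python
import numpy as np

# Hypothetical NDVI observations for one field (illustrative values only)
day_of_year = np.array([105, 135, 166, 196, 227])  # mid-April through mid-August
ndvi = np.array([0.35, 0.55, 0.75, 0.80, 0.65])

# Trapezoidal rule written out explicitly:
# mean of adjacent NDVI values times the gap between observations in days
interval_days = np.diff(day_of_year)
segment_means = (ndvi[1:] + ndvi[:-1]) / 2.0
integrated_ndvi = float(np.sum(segment_means * interval_days))

season_length = int(day_of_year[-1] - day_of_year[0])
print(f"Time-integrated NDVI: {integrated_ndvi:.2f} NDVI-days")
print(f"Season-mean NDVI:     {integrated_ndvi / season_length:.3f}")
```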

### Future Analysis Directions:

- Incorporate additional spectral bands (red edge, shortwave infrared)
- Apply machine learning for automated stress classification
- Integrate with weather data for stress attribution
- Develop field-specific NDVI baselines for anomaly detection
- Create time-series models for yield prediction

---

## Conclusion

This analysis demonstrates the power of agricultural remote sensing for crop monitoring. By analyzing multispectral satellite imagery and calculating vegetation indices, we can:

- **Quantify crop health** objectively across large areas
- **Monitor temporal dynamics** throughout the growing season
- **Identify stressed areas** requiring management intervention
- **Compare crop performance** across fields and varieties
- **Support data-driven decisions** for precision agriculture

Remote sensing provides a cost-effective, non-destructive approach to agricultural monitoring that complements traditional field measurements and enables proactive management of crop production systems.