Project Introduction: Hyper-Local Agricultural Forecasting for Staple Crops
Maize is not just a crop; it is a cornerstone of global food security and a vital part of Mexico's cultural and economic heritage. Predicting its yield is critical for regional planning and market stability. This project introduces a specialized deep learning framework designed to provide localized, municipality-level forecasts for maize, one of the world's most important staple foods.

What Does This Model Do?
Our goal is to accurately predict three key agricultural metrics for maize: Production (tons), Yield (tons per hectare), and the economic Production Value. The model achieves this by learning the complex relationship between a full year of local climate data and the final harvest results.

Why Is It Important?
This tool moves beyond generic, one-size-fits-all predictions. By providing precise, municipality-level forecasts, we can help farmers mitigate climate risks, optimize planting strategies, and improve their economic outcomes. For policymakers, this information is invaluable for managing food supply chains and ensuring national food security. This project is a direct application of NASA's global data to solve local, real-world challenges.

Our Data: Fusing Space and Terrestrial Information
The strength of our model lies in its ability to combine data from two distinct, high-quality sources: NASA's global climate data and the Mexican government's official agricultural records.

1. Climate Data - from NASA POWER
We accessed time-series data for each municipality in Mexico directly from NASA's POWER (Prediction Of Worldwide Energy Resources) project. This dataset provides a comprehensive monthly overview of key meteorological variables. For this model, we focused on two critical parameters that directly influence the conditions for crop growth:

TS (Earth Skin Temperature): This is the temperature of the very top layer of the Earth's surface. Unlike air temperature, it reflects conditions at the soil surface itself, which directly affect seed germination, root development, and water evaporation from the soil.

ALLSKY_SFC_LW_DWN (All-Sky Surface Longwave Downward Irradiance): This measures the infrared heat energy radiated back from the atmosphere to the ground. It is a key component of the greenhouse effect and helps keep temperatures stable, especially overnight, which reduces the risk of frost damage.
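
As a rough illustration of how these two parameters could be requested programmatically, the sketch below builds a query URL for the NASA POWER monthly point endpoint. The source only states that the Data Access Viewer was used, so the use of the API, the `AG` community, the coordinates, and the year range are all assumptions made for this example, and `build_power_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# NASA POWER monthly point endpoint (assumed here; the project itself
# reports using the Data Access Viewer).
POWER_MONTHLY_POINT = "https://power.larc.nasa.gov/api/temporal/monthly/point"


def build_power_url(latitude, longitude, start_year, end_year,
                    parameters=("TS", "ALLSKY_SFC_LW_DWN")):
    """Build a query URL for monthly values at one point (hypothetical helper)."""
    query = {
        "parameters": ",".join(parameters),
        "community": "AG",  # agroclimatology community (assumption)
        "latitude": latitude,
        "longitude": longitude,
        "start": start_year,
        "end": end_year,
        "format": "JSON",
    }
    # Keep commas unescaped so the parameter list stays readable.
    return f"{POWER_MONTHLY_POINT}?{urlencode(query, safe=',')}"


# Illustrative coordinates only; they are not taken from the project data.
url = build_power_url(latitude=20.78, longitude=-102.91,
                      start_year=2010, end_year=2020)
print(url)
```

Any response should be checked for the fill value -999, which the training script already treats as missing data.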

2. Agricultural Data - from SIAP (Mexico)
The ground-truth data for our model comes from the Agrifood and Fisheries Information Service (SIAP) of Mexico. This source provides detailed annual statistics for major crops. The key metrics include:

Produccion (Production): The total tonnage of the crop produced.

Rendimiento (Yield): The production per hectare (t/ha).

ValorProduccion (Production Value): The total economic value of the harvest.
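
The training script refers to columns such as `jan_x` and `jan_y`. Names like these are what pandas produces by default when two tables that share column names are merged, so one plausible (but unconfirmed) way the combined file was assembled is sketched below. The `year` column, the toy values, and the order in which the two climate tables are merged are assumptions; only `municipio` and the SIAP column names come from the script.

```python
import pandas as pd

# Toy monthly tables, one per climate variable (values are made up).
ts = pd.DataFrame({"municipio": ["acatic", "ameca"], "year": [2020, 2020],
                   "jan": [18.2, 19.5], "feb": [19.9, 21.0]})
lw = pd.DataFrame({"municipio": ["acatic", "ameca"], "year": [2020, 2020],
                   "jan": [310.4, 322.1], "feb": [315.8, 327.6]})

# Toy SIAP-style annual records (values are made up).
siap = pd.DataFrame({"municipio": ["acatic", "ameca"], "year": [2020, 2020],
                     "Produccion": [52000.0, 88000.0],
                     "Rendimiento": [6.5, 8.0],
                     "ValorProduccion": [2.1e8, 3.6e8]})

keys = ["municipio", "year"]

# Overlapping non-key columns receive the default suffixes "_x" and "_y".
climate = ts.merge(lw, on=keys)

# An inner join keeps only municipality-years present in both sources.
combined = climate.merge(siap, on=keys, how="inner")

print(combined.columns.tolist())
```

Passing explicit `suffixes=("_ts", "_lw")` to `merge` would make the resulting column names self-describing.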

A Robust and Scalable Solution: The Multi-Model Approach
The Importance of Our AI Architecture
Instead of building a single, monolithic model to predict the yield for the entire country, we chose a more sophisticated and effective strategy: a hyper-local, multi-model framework.

Our architecture trains a separate, specialized LSTM model for each municipality. An LSTM (Long Short-Term Memory) network is well suited to time-series data: when the months are supplied as successive time steps, it can retain information about climate events from early in the year (such as a drought) that may affect the harvest months later.
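
A minimal sketch of feeding the monthly values to an LSTM as a 12-step sequence is shown below. It assumes the 24 inputs are ordered as 12 monthly values of one variable followed by 12 of the other; the class name `MonthlyLSTM` and the layer sizes are illustrative rather than the project's own.

```python
import torch
import torch.nn as nn


class MonthlyLSTM(nn.Module):
    """LSTM that reads 12 monthly time steps with 2 climate variables each."""

    def __init__(self, n_variables=2, hidden_dim=64, output_dim=3):
        super().__init__()
        self.n_variables = n_variables
        self.lstm = nn.LSTM(n_variables, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # x: (batch, n_variables * 12), laid out variable by variable.
        batch = x.shape[0]
        # (batch, n_variables, 12) -> (batch, 12, n_variables)
        seq = x.reshape(batch, self.n_variables, -1).transpose(1, 2).contiguous()
        lstm_out, _ = self.lstm(seq)
        # Use the hidden state after the final month.
        return self.fc(lstm_out[:, -1, :])


torch.manual_seed(42)
model = MonthlyLSTM()
dummy = torch.randn(5, 24)  # 5 years of made-up, already scaled inputs
print(model(dummy).shape)
```

With this layout the recurrent state is updated once per month, which is what allows an early-season signal to influence the final prediction.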

By creating an "expert" model for each location, we achieve several key advantages:

High Accuracy: Each model is fitted only to its own municipality's climate and harvest records, so it can capture local patterns that a national model would average away, with the aim of more precise local predictions.

Scalability: The framework is inherently scalable. Adding a new municipality simply means training a new, independent model without affecting the others.

Robustness: If data for one municipality is noisy or sparse, it doesn't degrade the performance of the models for other regions.

This approach is intended to make our predictions relevant and actionable for individual communities on the ground, not just statistically accurate at a national level.
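
Per-municipality accuracy is easiest to check on years a model has not seen. The sketch below shows one way to hold out the most recent year for each municipality before fitting any scaler or model. The `year` column and the helper name `split_last_year` are assumptions, since the source does not describe how the file is indexed in time.

```python
import pandas as pd


def split_last_year(df, group_col="municipio", year_col="year"):
    """Hold out each group's most recent year as test data (hypothetical helper)."""
    last_year = df.groupby(group_col)[year_col].transform("max")
    is_test = df[year_col] == last_year
    return df[~is_test].copy(), df[is_test].copy()


# Made-up records for two municipalities.
toy = pd.DataFrame({
    "municipio": ["acatic"] * 3 + ["ameca"] * 3,
    "year": [2018, 2019, 2020] * 2,
    "Rendimiento": [6.1, 6.4, 6.0, 7.8, 8.1, 7.9],
})

train, test = split_last_year(toy)

# Scaling statistics come from the training rows only, so the held-out
# year does not leak into preprocessing.
mean, std = train["Rendimiento"].mean(), train["Rendimiento"].std()
test_scaled = (test["Rendimiento"] - mean) / std

print(len(train), len(test))
```

Errors computed on `test` would then estimate how well each model generalizes, rather than how closely it fits its own training rows.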

Data Sources
Climate Data: NASA POWER Data Access Viewer - https://power.larc.nasa.gov/data-access-viewer/

Agricultural Data: Servicio de Información Agroalimentaria y Pesquera (SIAP) -

In [None]:
# ==============================================================================
# NASA Space Apps Challenge - Maize Yield Prediction with Multi-Model LSTM
#
# This script trains an independent LSTM model for each municipality to predict
# maize production, yield, and production value based on NASA's climate data.
# ==============================================================================

# ------------------------------------------------------------------------------
# STEP 0: ENVIRONMENT SETUP & SEED FIXING FOR REPRODUCIBILITY
# ------------------------------------------------------------------------------
print("--- STEP 0: PREPARING ENVIRONMENT AND SETTING SEEDS ---")

# Import essential libraries
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import StandardScaler
import sys

# CRITICAL: Set seeds for reproducibility as per NASA guidelines
SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

print("Seeds have been set. The model is now reproducible.")

# ------------------------------------------------------------------------------
# STEP 1: UPLOAD AND LOAD DATA
# ------------------------------------------------------------------------------
print("\n--- STEP 1: UPLOAD AND LOAD DATA ---")
print("Please select your 'Maiz. Clima.csv' file when the button appears.")

try:
    from google.colab import files
    uploaded = files.upload()
    filename = list(uploaded.keys())[0]
    print(f"\nFile '{filename}' uploaded successfully.")
    df = pd.read_csv(filename)
    print("File read successfully.")
except (ValueError, IndexError, ImportError):
    print("\nCould not run file upload in this environment or it was canceled.")
    print("Please make sure 'Maiz. Clima.csv' is in the same directory.")
    try:
        df = pd.read_csv("Maiz. Clima.csv")
    except FileNotFoundError:
        print("ERROR: 'Maiz. Clima.csv' not found. Please upload the file.")
        sys.exit()

# ------------------------------------------------------------------------------
# STEP 2: DATA PREPROCESSING
# ------------------------------------------------------------------------------
print("\n--- STEP 2: PREPROCESSING DATA ---")

# Define feature and target columns
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
climate_features_x = [f"{m}_x" for m in months]  # 12 monthly values of the first climate variable
climate_features_y = [f"{m}_y" for m in months]  # 12 monthly values of the second climate variable
all_features = climate_features_x + climate_features_y
targets = ['Produccion', 'Rendimiento', 'ValorProduccion']

# Handle missing values (-999) by replacing them with the mean of their respective municipality
print("Handling missing values...")
for col in all_features:
    df[col] = df[col].replace(-999, np.nan)
    # Group by municipality, then fill NaN with the mean of that group
    df[col] = df.groupby('municipio')[col].transform(lambda x: x.fillna(x.mean()))

# Handle any remaining NaNs (for municipalities with all NaN data) with the global mean
df.fillna(df[all_features].mean(), inplace=True)
print("Missing values handled.")

# Scale features and targets
print("Scaling data...")
scaler_X = StandardScaler()
df[all_features] = scaler_X.fit_transform(df[all_features])

scaler_y = StandardScaler()
df[targets] = scaler_y.fit_transform(df[targets])
print("Data scaling complete.")

# ------------------------------------------------------------------------------
# STEP 3: DEFINE THE LSTM MODEL
# ------------------------------------------------------------------------------
print("\n--- STEP 3: DEFINING THE LSTM MODEL ARCHITECTURE ---")

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Reshape input from (batch, 24) to (batch, seq_len=1, features=24).
        # All 24 monthly values are passed as the features of a single time step,
        # so the LSTM does not step through the months one by one.
        x = x.unsqueeze(1)
        lstm_out, _ = self.lstm(x)
        # Get the output of the last time step
        last_time_step_out = lstm_out[:, -1, :]
        out = self.fc(last_time_step_out)
        return out

print("Model class defined.")

# ------------------------------------------------------------------------------
# STEP 4: TRAIN A SEPARATE MODEL FOR EACH MUNICIPALITY
# ------------------------------------------------------------------------------
print("\n--- STEP 4: BEGINNING TRAINING LOOP FOR EACH MUNICIPALITY ---")

all_results = []
municipalities = df['municipio'].unique()
epochs = 100

for municipality in municipalities:
    print(f"\n--- Training model for: {municipality} ---")
    df_m = df[df['municipio'] == municipality]

    # Prepare data tensors for PyTorch
    X_torch = torch.tensor(df_m[all_features].values, dtype=torch.float32)
    y_torch = torch.tensor(df_m[targets].values, dtype=torch.float32)

    # Initialize model, optimizer, and loss function
    model = LSTMModel(input_dim=len(all_features), hidden_dim=64, output_dim=len(targets))
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.MSELoss()

    # Training loop
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        outputs = model(X_torch)
        loss = criterion(outputs, y_torch)
        loss.backward()
        optimizer.step()

        if (epoch + 1) % 50 == 0:
            print(f"Municipality: {municipality}, Epoch: {epoch+1}/{epochs}, Loss: {loss.item():.4f}")

    # In-sample prediction: the model is evaluated on the same rows it was trained on
    model.eval()
    with torch.no_grad():
        predictions_scaled = model(X_torch).numpy()

    # Inverse transform predictions to their original scale
    predictions_rescaled = scaler_y.inverse_transform(predictions_scaled)
    real_values_rescaled = scaler_y.inverse_transform(y_torch.numpy())

    # Store results
    for i in range(len(df_m)):
        all_results.append({
            'Municipality': municipality,
            'Real_Production': real_values_rescaled[i, 0],
            'Predicted_Production': predictions_rescaled[i, 0],
            'Real_Yield': real_values_rescaled[i, 1],
            'Predicted_Yield': predictions_rescaled[i, 1],
            'Real_ProductionValue': real_values_rescaled[i, 2],
            'Predicted_ProductionValue': predictions_rescaled[i, 2]
        })

print("\n--- All models have been trained. ---")

# ------------------------------------------------------------------------------
# STEP 5: DISPLAY FINAL RESULTS
# ------------------------------------------------------------------------------
print("\n--- STEP 5: AGGREGATING AND DISPLAYING RESULTS ---")

df_results = pd.DataFrame(all_results)

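# Calculate absolute and relative errors
df_results['Abs_Error_Yield'] = (df_results['Real_Yield'] - df_results['Predicted_Yield']).abs()
# Relative error is undefined where the real yield is zero; mark those rows as NaN
# instead of reporting a misleading zero error
df_results['Rel_Error_Yield'] = (df_results['Abs_Error_Yield'] / df_results['Real_Yield']).replace([np.inf, -np.inf], np.nan)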

print("\nFinal Predictions DataFrame (first 10 rows):")
print(df_results.head(10).to_string())

print("\nDescriptive statistics of the results:")
print(df_results[['Real_Yield', 'Predicted_Yield', 'Abs_Error_Yield', 'Rel_Error_Yield']].describe())

# Save results to a CSV file
df_results.to_csv("Maize_Model_Predictions.csv", index=False)
print("\nResults have been saved to 'Maize_Model_Predictions.csv'")



--- STEP 0: PREPARING ENVIRONMENT AND SETTING SEEDS ---
Seeds have been set. The model is now reproducible.

--- STEP 1: UPLOAD AND LOAD DATA ---
Please select your 'Maiz. Clima.csv' file when the button appears.


Saving Maiz. Clima.csv to Maiz. Clima.csv

File 'Maiz. Clima.csv' uploaded successfully.
File read successfully.

--- STEP 2: PREPROCESSING DATA ---
Handling missing values...
Missing values handled.
Scaling data...
Data scaling complete.

--- STEP 3: DEFINING THE LSTM MODEL ARCHITECTURE ---
Model class defined.

--- STEP 4: BEGINNING TRAINING LOOP FOR EACH MUNICIPALITY ---

--- Training model for: acatic ---
Municipality: acatic, Epoch: 50/100, Loss: 0.0602
Municipality: acatic, Epoch: 100/100, Loss: 0.0100

--- Training model for: amatitan ---
Municipality: amatitan, Epoch: 50/100, Loss: 0.1022
Municipality: amatitan, Epoch: 100/100, Loss: 0.0174

--- Training model for: ameca ---
Municipality: ameca, Epoch: 50/100, Loss: 1.4201
Municipality: ameca, Epoch: 100/100, Loss: 0.2890

--- Training model for: arandas ---
Municipality: arandas, Epoch: 50/100, Loss: 0.2019
Municipality: arandas, Epoch: 100/100, Loss: 0.0385

--- Training model for: atengo ---
Municipality: atengo, Epoch: 50/1