# Monte Carlo Simulation for Agricultural Forecasting (2024-2030)

## Objective

Forecast future values of four key agricultural variables under uncertainty:
- **Expected sales volume** ($X$): Market demand
- **Yield per mu** ($q$): Production capacity
- **Sales price** ($p$): Market value
- **Planting cost** ($c$): Production expense

## Methodology: Hybrid Monte Carlo Model

Two stochastic models capture different uncertainty patterns:

### Model A: Compounding Trend

For variables with "average annual growth rate" (cost, price, wheat/corn sales):

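$$V_t^{(k)} = V_{2023} \times (1 + r^{(k)})^t, \quad r^{(k)} \sim U(r_{\min}, r_{\max}), \quad t = 1, \ldots, 7$$

where $t$ is the number of years elapsed since 2023 (2024 is $t = 1$, 2030 is $t = 7$) and $k$ indexes the scenario.

- Growth rate $r^{(k)}$ is sampled **once per scenario** (independently for each crop) and held fixed for all seven years
- Models uncertainty in the long-term trend: each scenario follows a smooth compounding path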
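
A minimal vectorized sketch of Model A for one variable, assuming NumPy's `Generator` API (`np.random.default_rng`) instead of the notebook's global seed. The function name `simulate_model_a` and the baseline value of 100 are illustrative only. One rate is drawn per scenario and broadcast across the seven years, so every path grows by a constant factor each year.

```python
import numpy as np

def simulate_model_a(v_2023, r_min, r_max, n_years, n_scenarios, rng):
    """Compounding trend: one uniform rate per scenario, applied for t = 1..n_years."""
    # Shape (n_scenarios, 1) so the single rate broadcasts over all years
    r = rng.uniform(r_min, r_max, size=(n_scenarios, 1))
    t = np.arange(1, n_years + 1)  # years elapsed since 2023
    return v_2023 * (1.0 + r) ** t  # shape (n_scenarios, n_years)

rng = np.random.default_rng(42)
# Illustrative baseline of 100 with the wheat/corn sales range U(5%, 10%)
paths = simulate_model_a(100.0, 0.05, 0.10, n_years=7, n_scenarios=1000, rng=rng)
print(paths.shape)  # (1000, 7)
```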

### Model B: Annual Random Walk

For variables with "annual change" (yield, other crop sales):

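$$V_t^{(k)} = V_{t-1}^{(k)} \times (1 + r_t^{(k)}), \quad r_t^{(k)} \sim U(a, b), \quad V_0^{(k)} = V_{2023}$$

- Growth rate $r_t^{(k)}$ is sampled **every year**, independently of earlier years
- Models short-term volatility whose effects accumulate, since unrolling the recursion gives $V_t^{(k)} = V_{2023} \prod_{s=1}^{t} (1 + r_s^{(k)})$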
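
Because Model B multiplies one fresh factor per year, a whole path is a cumulative product along the year axis. A minimal NumPy sketch, assuming the `Generator` API; `simulate_model_b` and the baseline of 100 are hypothetical names and values for illustration.

```python
import numpy as np

def simulate_model_b(v_2023, a, b, n_years, n_scenarios, rng):
    """Annual random walk: a new uniform rate for every scenario and every year."""
    r = rng.uniform(a, b, size=(n_scenarios, n_years))
    # V_t = V_2023 * prod_{s=1..t} (1 + r_s): cumulative product across years
    return v_2023 * np.cumprod(1.0 + r, axis=1)

rng = np.random.default_rng(0)
# Illustrative baseline of 100 with the yield range U(-10%, +10%)
paths = simulate_model_b(100.0, -0.10, 0.10, n_years=7, n_scenarios=1000, rng=rng)
```

This is equivalent to the notebook's `last_yield * (1 + beta_t)` update inside the year loop, just computed for all scenarios at once.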

In [1]:
```python
import pandas as pd
import numpy as np

np.random.seed(42)  # fix NumPy's global RNG so the simulation is reproducible
```

## 1. Load 2023 Baseline Data

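Aggregate to one row per crop (yield, price, and cost are averaged across plot types; expected sales volume takes the first recorded value for each crop) and convert the price unit: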

$$P_{\text{kg}} = P_{\text{jin}} \times 2 \quad (\text{1 jin} = 0.5 \text{ kg})$$

In [2]:
```python
# Load the 2023 data; drop blank names, '0' placeholders, and rows whose name contains 'planting costs'
sales_df = pd.read_csv('sales_volume_data.csv')
sales_df = sales_df[sales_df['Crop Name'].notna() & (sales_df['Crop Name'] != '0')]
sales_df = sales_df[~sales_df['Crop Name'].str.contains('planting costs', na=False)]

# Aggregate to one row per crop
crop_data = sales_df.groupby('Crop Name').agg({
    'Expected_Sales_Volume': 'first',  # taken once per crop
    'Yield_per_mu': 'mean',            # averaged across plot types
    'Avg_Price': 'mean',
    'Cost_per_mu': 'mean'
}).reset_index()

# Convert price: Yuan/jin → Yuan/kg
crop_data['Price_per_kg'] = crop_data['Avg_Price'] * 2

print(f"Loaded {len(crop_data)} crops with 2023 baseline data")
```

Loaded 41 crops with 2023 baseline data
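
The `'first'` aggregation is only safe if expected sales volume is identical across all rows of a crop. A small sketch on a hypothetical table (every number below is made up) shows how `groupby(...).agg` treats each column and how that assumption can be checked with `nunique`.

```python
import pandas as pd

# Hypothetical rows (made-up numbers): Tomato appears on two plot types
toy = pd.DataFrame({
    'Crop Name': ['Tomato', 'Tomato', 'Wheat'],
    'Expected_Sales_Volume': [36000, 36000, 170000],
    'Yield_per_mu': [2400, 3000, 800],
    'Avg_Price': [3.0, 3.5, 1.75],
    'Cost_per_mu': [2000, 2400, 400],
})

# 'first' assumes one sales value per crop; verify that before relying on it
constant_sales = bool((toy.groupby('Crop Name')['Expected_Sales_Volume'].nunique() == 1).all())

crop_toy = toy.groupby('Crop Name').agg({
    'Expected_Sales_Volume': 'first',  # single value per crop (assumed constant)
    'Yield_per_mu': 'mean',            # averaged across plot types
    'Avg_Price': 'mean',
    'Cost_per_mu': 'mean',
}).reset_index()
crop_toy['Price_per_kg'] = crop_toy['Avg_Price'] * 2  # Yuan/jin -> Yuan/kg

print(constant_sales)  # True
print(crop_toy)
```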


## 2. Crop Classification

Each crop is assigned to one of three categories, which determines its growth parameters (crops not found in any list default to G2):
- **G1 (Grains)**: Field crops, climate-sensitive
- **G2 (Vegetables)**: Partial greenhouse, moderate stability
- **G3 (Mushrooms)**: Full greenhouse, market-driven

In [3]:
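```python
# Define categories
G1_crops = [
    'Wheat', 'Corn', 'Sorghum', 'Millet', 'Broomcorn Millet', 'Buckwheat', 'Barley', 'Naked Oat', 'Rice',
    'Soybean', 'Black Bean', 'Red Bean', 'Mung Bean', 'Climbing Bean', 'Cowpea', 'Sword Bean', 'Kidney Bean',
    'Potato', 'Sweet Potato', 'Pumpkin'
]

G2_crops = [
    'Tomato', 'Cucumber', 'Eggplant', 'Cauliflower', 'Chili Pepper', 'Green Pepper',
    'Chinese Cabbage', 'Cabbage', 'Yellow Heart Cabbage', 'Baby Bok Choy', 'Lettuce', 'Romaine Lettuce',
    'Celery', 'Spinach', 'Water Spinach', 'White Radish', 'Red Radish'
]

G3_crops = ['Morel Mushroom', 'Shiitake Mushroom', 'Golden Oyster Mushroom', 'White Elf Mushroom']

# Map crops to categories
crop_category = {}
for crop in G1_crops: crop_category[crop] = 'G1'
for crop in G2_crops: crop_category[crop] = 'G2'
for crop in G3_crops: crop_category[crop] = 'G3'

# Strip CSV names before lookup so stray whitespace (e.g. 'Lettuce ') cannot cause a silent mismatch
crop_data['Category'] = crop_data['Crop Name'].str.strip().map(crop_category)

# Crops missing from every list fall back to G2; report them instead of hiding them
unmapped = crop_data.loc[crop_data['Category'].isna(), 'Crop Name'].tolist()
if unmapped:
    print(f"Defaulting to G2: {unmapped}")
crop_data['Category'] = crop_data['Category'].fillna('G2')

# Count the categories actually assigned in the data (not just the list lengths)
counts = crop_data['Category'].value_counts()
print(f"G1: {counts.get('G1', 0)}, G2: {counts.get('G2', 0)}, G3: {counts.get('G3', 0)}")
```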

G1: 20, G2: 17, G3: 4


## 3. Define Growth Rate Parameters

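Annual growth-rate assumptions, taken from the problem statement. A range is the support of a uniform distribution; a single value is a fixed rate that is not sampled.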

| Variable (annual rate) | G1 (Grains) | G2 (Vegetables) | G3 (Mushrooms) | Model |
|----------|-------------|-----------------|----------------|-------|
| Sales ($\alpha$) | ±5% | ±5% | ±5% | B (random walk) |
| Yield ($\beta$) | ±10% | ±10% | ±10% | B (random walk) |
| Price ($\gamma$) | ±1% | +4% to +6% | -5% to -1% | A (sampled trend) |
| Cost ($\delta$) | +5% | +5% | +5% | A (fixed rate) |

**Special cases:**
- Wheat/Corn sales: +5% to +10% per year, modeled with Model A instead of B
- Morel Mushroom price: fixed at -5% per year (Model A)
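
Assuming independent uniform draws as in the code, both models have closed-form expected values, which give a quick sanity check on the simulated means. For Model A with $r \sim U(r_{\min}, r_{\max})$, integrating $(1 + r)^t$ over the uniform density gives:

$$\mathbb{E}\left[V_t\right] = V_{2023} \cdot \frac{(1 + r_{\max})^{t+1} - (1 + r_{\min})^{t+1}}{(t+1)(r_{\max} - r_{\min})}$$

For Model B the yearly rates are independent, so the expectation of the product is the product of the expectations:

$$\mathbb{E}\left[V_t\right] = V_{2023} \prod_{s=1}^{t} \mathbb{E}\left[1 + r_s\right] = V_{2023}\left(1 + \frac{a + b}{2}\right)^t$$

With the symmetric ranges above (±5% for sales, ±10% for yield), the Model B mean therefore stays at $V_{2023}$ in every year while the spread widens. For wheat and corn sales in 2030 ($t = 7$, $r \sim U(0.05, 0.10)$):

$$\frac{\mathbb{E}[V_7]}{V_{2023}} = \frac{1.10^8 - 1.05^8}{8 \times 0.05} \approx \frac{2.1436 - 1.4775}{0.4} \approx 1.665$$

This is slightly above the midpoint-rate value $1.075^7 \approx 1.659$, because $(1 + r)^t$ is convex in $r$.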

In [4]:
```python
# Annual growth rates: a (low, high) tuple is sampled uniformly; a bare number is a fixed rate
growth_rates = {
    'G1': {'alpha': (-0.05, 0.05), 'beta': (-0.10, 0.10), 'gamma': (-0.01, 0.01), 'delta': 0.05},
    'G2': {'alpha': (-0.05, 0.05), 'beta': (-0.10, 0.10), 'gamma': (0.04, 0.06), 'delta': 0.05},
    'G3': {'alpha': (-0.05, 0.05), 'beta': (-0.10, 0.10), 'gamma': (-0.05, -0.01), 'delta': 0.05}
}

# Crop-specific overrides of the category defaults
special_crops = {
    'Wheat': {'alpha': (0.05, 0.10)},
    'Corn': {'alpha': (0.05, 0.10)},
    'Morel Mushroom': {'gamma': -0.05}
}
```
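
The two dictionaries encode a convention: tuples are sampled, scalars are fixed, and crop entries override category entries. A minimal sketch of resolving and drawing a crop's rates under that convention; `resolve_params` and `draw_rate` are hypothetical helpers, and the dictionaries are trimmed copies so the block runs on its own.

```python
import numpy as np

# Trimmed copies of the notebook's dictionaries (only what this sketch needs)
growth_rates = {
    'G1': {'alpha': (-0.05, 0.05), 'beta': (-0.10, 0.10), 'gamma': (-0.01, 0.01), 'delta': 0.05},
    'G3': {'alpha': (-0.05, 0.05), 'beta': (-0.10, 0.10), 'gamma': (-0.05, -0.01), 'delta': 0.05},
}
special_crops = {
    'Wheat': {'alpha': (0.05, 0.10)},
    'Morel Mushroom': {'gamma': -0.05},
}

def resolve_params(crop, category):
    """Category defaults, with any crop-specific entries taking precedence."""
    return {**growth_rates[category], **special_crops.get(crop, {})}

def draw_rate(spec, rng):
    """A (low, high) tuple is sampled uniformly; a bare number is returned unchanged."""
    if isinstance(spec, tuple):
        return float(rng.uniform(*spec))
    return float(spec)

rng = np.random.default_rng(1)
wheat = resolve_params('Wheat', 'G1')
morel = resolve_params('Morel Mushroom', 'G3')
wheat_alpha = draw_rate(wheat['alpha'], rng)  # somewhere in [0.05, 0.10)
morel_gamma = draw_rate(morel['gamma'], rng)  # always -0.05
```

The notebook's loop handles the same overrides with explicit `if` checks on crop names. Note that the merge only resolves the rates; which model a variable follows (for example, Model A for wheat/corn sales) still has to be decided separately.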

## 4. Run Monte Carlo Simulation

For each scenario $k \in \{1, \ldots, 1000\}$ and each crop:

1. **Draw the Model A rates** (once per scenario):
   - Price trend: $\gamma^{(k)}$ (fixed at $-5\%$ for Morel Mushroom)
   - Sales trend: $\alpha^{(k)}$ (wheat and corn only)
   - Cost trend: $\delta = 5\%$ is fixed, so nothing is sampled

2. **Step through the years** 2024 to 2030, with $t = \text{year} - 2023 \in \{1, \ldots, 7\}$:
   - Apply Model A: $V_t = V_{2023} \times (1 + r)^{t}$
   - Draw this year's Model B rates: $r_t^{(k)}$
   - Apply Model B: $V_t = V_{t-1} \times (1 + r_t)$

3. **Store** every (scenario, crop, year) record for aggregation
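
The steps above can also be written without the nested Python loops. This is a sketch for a single crop, assuming NumPy's `Generator` API; `simulate_crop` and the baseline numbers in the usage example are hypothetical. Random numbers are drawn in a different order than in the notebook's loop, so a shared seed reproduces the same distribution, not the same values.

```python
import numpy as np

def simulate_crop(base_sales, base_yield, base_price, base_cost,
                  alpha, beta, gamma, delta, n_scenarios, n_years, rng,
                  sales_model='B'):
    """Vectorized version of the per-crop loop for one crop.

    alpha and beta are (low, high) ranges; gamma is a range or a fixed number;
    delta is a fixed rate. Returns (n_scenarios, n_years) arrays.
    """
    t = np.arange(1, n_years + 1)  # years elapsed since 2023

    # Model A, cost: fixed rate, identical in every scenario
    cost = base_cost * (1 + delta) ** t * np.ones((n_scenarios, 1))

    # Model A, price: one rate per scenario (or a fixed rate, e.g. Morel)
    if np.isscalar(gamma):
        g = np.full((n_scenarios, 1), gamma)
    else:
        g = rng.uniform(*gamma, size=(n_scenarios, 1))
    price = base_price * (1 + g) ** t

    # Model B, yield: fresh rate every year, accumulated by a cumulative product
    beta_t = rng.uniform(*beta, size=(n_scenarios, n_years))
    yld = base_yield * np.cumprod(1 + beta_t, axis=1)

    if sales_model == 'A':  # wheat/corn: compounding trend
        a = rng.uniform(*alpha, size=(n_scenarios, 1))
        sales = base_sales * (1 + a) ** t
    else:                   # other crops: annual random walk
        alpha_t = rng.uniform(*alpha, size=(n_scenarios, n_years))
        sales = base_sales * np.cumprod(1 + alpha_t, axis=1)

    return {'Sales_Volume': sales, 'Yield_per_Mu': yld, 'Price': price, 'Cost': cost}

rng = np.random.default_rng(42)
# Hypothetical baseline values for a wheat-like crop
out = simulate_crop(1000.0, 500.0, 6.0, 400.0,
                    alpha=(0.05, 0.10), beta=(-0.10, 0.10), gamma=(-0.01, 0.01), delta=0.05,
                    n_scenarios=1000, n_years=7, rng=rng, sales_model='A')
```

Each array can then be flattened into the long (scenario, crop, year) layout that `predictions_raw` uses.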

In [5]:
```python
N_SCENARIOS = 1000
YEARS = list(range(2024, 2031))
all_predictions = []

for scenario_idx in range(N_SCENARIOS):
    for _, row in crop_data.iterrows():
        crop = row['Crop Name']
        category = row['Category']
        base_params = growth_rates[category]

        # Model A rates: delta is fixed; gamma (and alpha for wheat/corn) is drawn once per scenario
        delta_trend = base_params['delta']

        if crop == 'Morel Mushroom':
            gamma_trend = special_crops[crop]['gamma']
        else:
            gamma_trend = np.random.uniform(*base_params['gamma'])

        alpha_trend = None
        if crop in ['Wheat', 'Corn']:
            alpha_trend = np.random.uniform(*special_crops[crop]['alpha'])

        # Initialize with the 2023 baseline values
        last_sales = row['Expected_Sales_Volume']
        last_yield = row['Yield_per_mu']
        base_sales = row['Expected_Sales_Volume']
        base_price = row['Price_per_kg']
        base_cost = row['Cost_per_mu']

        # Iterate through years; t counts years elapsed since 2023
        for year in YEARS:
            t = year - 2023

            # Model A: Compounding trend
            current_cost = base_cost * (1 + delta_trend) ** t
            current_price = base_price * (1 + gamma_trend) ** t

            # Model B: Annual volatility (Yield)
            beta_t = np.random.uniform(*base_params['beta'])
            current_yield = last_yield * (1 + beta_t)
            last_yield = current_yield

            # Hybrid: Sales (Model A for wheat/corn, Model B for others)
            if alpha_trend is not None:
                current_sales = base_sales * (1 + alpha_trend) ** t
            else:
                alpha_t = np.random.uniform(*base_params['alpha'])
                current_sales = last_sales * (1 + alpha_t)
                last_sales = current_sales

            all_predictions.append({
                'Scenario': scenario_idx,
                'Crop': crop,
                'Category': category,
                'Year': year,
                'Sales_Volume': current_sales,
                'Yield_per_Mu': current_yield,
                'Price': current_price,
                'Cost': current_cost
            })

predictions_raw = pd.DataFrame(all_predictions)
print(f"Simulation complete: {len(predictions_raw):,} records ({N_SCENARIOS} scenarios × {len(crop_data)} crops × {len(YEARS)} years)")
```

Simulation complete: 287,000 records (1000 scenarios × 41 crops × 7 years)


## 5. Aggregate Results

Compute the mean, sample standard deviation, minimum, and maximum of each variable across the $N$ scenarios, for every crop and year:

$$\bar{V}_{i,t} = \frac{1}{N} \sum_{k=1}^{N} V_{i,t}^{(k)}, \quad \sigma_{i,t} = \sqrt{\frac{1}{N-1} \sum_{k=1}^{N} (V_{i,t}^{(k)} - \bar{V}_{i,t})^2}$$

where $i$ = crop, $t$ = year, $N$ = 1000 scenarios.
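
The $N - 1$ denominator matters when checking results by hand: pandas' `std` defaults to `ddof=1` (the formula above), whereas `np.std` defaults to `ddof=0`. A tiny check on three hypothetical draws:

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 2.0, 4.0])  # hypothetical draws for one crop-year

pandas_std = values.std()                        # ddof=1 by default
numpy_std_sample = np.std(values.to_numpy(), ddof=1)
numpy_std_default = np.std(values.to_numpy())    # ddof=0 by default

# Squared deviations from the mean 7/3 sum to 14/3, so ddof=1 gives sqrt(7/3)
print(pandas_std, numpy_std_sample, numpy_std_default)
```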

In [9]:
```python
# Calculate statistics per crop, category, and year
# (pandas 'std' uses ddof=1, i.e. the N-1 denominator above)
predictions_stats = predictions_raw.groupby(['Crop', 'Category', 'Year']).agg({
    'Sales_Volume': ['mean', 'std', 'min', 'max'],
    'Yield_per_Mu': ['mean', 'std', 'min', 'max'],
    'Price': ['mean', 'std', 'min', 'max'],
    'Cost': ['mean', 'std', 'min', 'max']
}).reset_index()

# Flatten MultiIndex columns, e.g. ('Price', 'mean') -> 'Price_mean'
predictions_stats.columns = ['_'.join(col).strip('_') if col[1] else col[0] for col in predictions_stats.columns]

# Export
output_file = 'crop_predictions_2024_2030.csv'
predictions_stats.to_csv(output_file, index=False)

print(f"Exported to: {output_file}")
```

Exported to: crop_predictions_2024_2030.csv
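
The column-flattening step is easiest to see on a toy frame. After `reset_index`, the former index levels become columns labelled like `('Crop', '')`, which the comprehension keeps as plain names, while statistic columns are joined with `_`. The data below is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({'Crop': ['A', 'A', 'B'], 'Year': [2024, 2024, 2024], 'Price': [1.0, 3.0, 5.0]})
stats = toy.groupby(['Crop', 'Year']).agg({'Price': ['mean', 'max']}).reset_index()
# Columns before flattening: ('Crop', ''), ('Year', ''), ('Price', 'mean'), ('Price', 'max')
stats.columns = ['_'.join(col).strip('_') if col[1] else col[0] for col in stats.columns]
print(list(stats.columns))  # ['Crop', 'Year', 'Price_mean', 'Price_max']
```

pandas' named aggregation, for example `.agg(Price_mean=('Price', 'mean'))`, produces flat column names directly and is an alternative to flattening afterwards.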
