# Multi-Commodity Climate Analysis

This notebook examines the relationship between climate variables and prices for multiple commodities including:
- Coffee
- Cocoa
- Maize (Corn)
- Wheat
- Soybeans
- Cotton
- Rice

For each commodity, we analyze climate data from key growing regions and explore how climate variables correlate with price movements.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import os
import glob

# Set plot styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('viridis')

# Set up plot parameters for better display
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 12

# Import commodity region definitions
from commodity_regions import COMMODITY_NAMES, COMMODITY_REGIONS, PRIMARY_REGIONS

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

## 1. Load Commodity Climate Data

First, we'll load the climate and price data for all commodities that were processed.

In [None]:
# Find all commodity climate files
commodity_files = glob.glob("*_climate_joined.csv")

# Dictionary to store DataFrames for each commodity
commodity_dfs = {}

for file in commodity_files:
    # Extract commodity name from filename
    commodity = file.split('_')[0].capitalize()
    
    # Load the data
    df = pd.read_csv(file)
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Store in dictionary
    commodity_dfs[commodity] = df
    
    # Display basic info
    print(f"Loaded {commodity} data: {len(df)} records from {df['Date'].min().strftime('%Y-%m')} to {df['Date'].max().strftime('%Y-%m')}")
    print(f"  Regions: {', '.join(df['Region'].unique())}")
    print(f"  Price column: {commodity}_Price")
    print()

In [None]:
# Preview one dataset (coffee)
if 'Coffee' in commodity_dfs:
    display(commodity_dfs['Coffee'].head())
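
The loading cell assumes every `*_climate_joined.csv` file carries a `Date` column, a `Region` column, and a `<Commodity>_Price` column. A minimal sketch of a check that reports which of those columns are absent is shown here; `missing_columns` and its `required` default are hypothetical names introduced for illustration, and the toy frame holds made-up values.

```python
import pandas as pd

def missing_columns(df, commodity, required=("Date", "Region")):
    """Return the expected columns that are absent from df (hypothetical helper)."""
    expected = list(required) + [f"{commodity}_Price"]
    return [col for col in expected if col not in df.columns]

# Toy frame standing in for a loaded file; the values are made up
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-02-01"]),
    "Region": ["Region A", "Region A"],
    "temperature_C": [22.1, 22.8],
})

# The toy frame has no price column, so the helper reports it
print(missing_columns(toy, "Coffee"))
```

Running a check like this right after `pd.read_csv` would surface a schema problem before any plot silently skips a commodity.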

## 2. Compare Commodity Prices Over Time

Now, let's compare how the prices of different commodities have changed over time.

In [None]:
# Plot all commodity prices on the same chart
plt.figure(figsize=(14, 8))

for commodity, df in commodity_dfs.items():
    price_col = f"{commodity}_Price"
    if price_col in df.columns:
        # Normalize prices to allow comparison (earliest available value = 100)
        df_sorted = df.sort_values('Date')
        first_price = df_sorted[price_col].dropna().iloc[0]
        normalized_prices = df_sorted[price_col] / first_price * 100
        
        plt.plot(df_sorted['Date'], normalized_prices, label=commodity, linewidth=2)

plt.title('Normalized Commodity Prices (2015-2022)', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Price Index (first month = 100)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
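
Indexing each series to 100 at its first observation lets commodities quoted in different units share one axis. A small sketch with made-up prices shows the arithmetic; the column names follow the notebook's `<Commodity>_Price` convention, and the numbers are illustrative only.

```python
import pandas as pd

# Made-up prices in different units; only the relative change matters
prices = pd.DataFrame(
    {
        "Coffee_Price": [2.0, 2.5, 3.0],
        "Wheat_Price": [200.0, 190.0, 220.0],
    },
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)

# Divide each column by its first value and scale to 100
indexed = prices / prices.iloc[0] * 100
print(indexed.round(1))
```

After indexing, a value of 125 means the price is 25% above its starting level, whatever the original unit was.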

## 3. Explore Climate Variables by Commodity Region

Let's examine how climate variables differ across commodity growing regions.

In [None]:
# Compare temperature profiles across commodity regions
plt.figure(figsize=(14, 8))

for commodity, df in commodity_dfs.items():
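    # Plot only the first region so the line matches its label
    region = df['Region'].iloc[0]
    region_df = df[df['Region'] == region].sort_values('Date')
    plt.plot(region_df['Date'], region_df['temperature_C'], label=f"{region} ({commodity})")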
    
plt.title('Temperature in Different Commodity Growing Regions', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Compare precipitation profiles across commodity regions
plt.figure(figsize=(14, 8))

for commodity, df in commodity_dfs.items():
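    # Plot only the first region so the line matches its label
    region = df['Region'].iloc[0]
    region_df = df[df['Region'] == region].sort_values('Date')
    plt.plot(region_df['Date'], region_df['precip_m'], label=f"{region} ({commodity})")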
    
plt.title('Precipitation in Different Commodity Growing Regions', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Precipitation (m)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
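
When a file holds more than one region, drawing every row as a single line mixes the regions together. One way to compare them, sketched here on made-up long-format data, is to pivot to one column per region; the region names and temperatures are placeholders.

```python
import pandas as pd

# Made-up long-format data: two regions, two months
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-02-01", "2020-02-01"]
    ),
    "Region": ["Region A", "Region B", "Region A", "Region B"],
    "temperature_C": [21.0, 15.0, 22.0, 16.5],
})

# One row per date, one column per region
wide = toy.pivot(index="Date", columns="Region", values="temperature_C")
print(wide)
```

Calling `wide.plot()` on a frame shaped like this draws one line per region with the region names in the legend.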

## 4. Climate-Price Correlation Analysis

Now, let's analyze how climate variables correlate with commodity prices for each commodity.

In [None]:
# Calculate correlations for each commodity
correlation_results = {}

for commodity, df in commodity_dfs.items():
    price_col = f"{commodity}_Price"
    
    if price_col in df.columns:
        # Calculate correlations between climate variables and price
        climate_vars = ['temperature_C', 'precip_m', 'temp_anomaly', 'precip_anomaly', 'drought_index']
        climate_vars = [var for var in climate_vars if var in df.columns]
        
        corr_data = df[climate_vars + [price_col]].corr()[price_col].drop(price_col)
        correlation_results[commodity] = corr_data

# Convert to DataFrame for easier viewing
correlation_df = pd.DataFrame(correlation_results)
correlation_df

In [None]:
# Visualize correlations
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_df, annot=True, cmap='coolwarm', center=0, fmt='.2f')
plt.title('Correlation between Climate Variables and Commodity Prices', fontsize=14)
plt.tight_layout()
plt.show()
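
Correlation coefficients from a few years of monthly data can look large by chance. A rough way to gauge this, sketched below with NumPy only, is a permutation test: shuffle one series many times and count how often the shuffled correlation is at least as extreme as the observed one. The function name `permutation_pvalue` is hypothetical and the data are synthetic. Shuffling also ignores autocorrelation in monthly series, so the resulting p-value is likely optimistic for real climate and price data.

```python
import numpy as np

def permutation_pvalue(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    # Shuffle y to break any link with x, and count how often the
    # shuffled correlation is at least as extreme as the observed one
    extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(y)
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero
    return observed, (extreme + 1) / (n_permutations + 1)

# Synthetic example: y depends on x plus noise
rng = np.random.default_rng(1)
x = rng.normal(size=60)
y = 0.8 * x + rng.normal(scale=0.5, size=60)
r, p = permutation_pvalue(x, y)
print(f"r = {r:.2f}, permutation p-value = {p:.4f}")
```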

## 5. Time-Lagged Correlation Analysis

Climate impacts on commodity prices often have a time lag. Let's analyze how climate variables correlate with future commodity prices.

In [None]:
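# Function to calculate lagged correlations
def calculate_lagged_correlations(df, price_col, climate_var, max_lag=6):
    """Correlate climate_var at month t with price_col at month t + lag, for lag = 0..max_lag.

    Assumes one row per month; rows are shifted by position, not by calendar date.
    """
    results = []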
    
    # Sort DataFrame by date
    df_sorted = df.sort_values('Date').copy()
    
    # Calculate correlations with different lags
    for lag in range(max_lag + 1):
        # Create lagged price column
        lagged_col = f"{price_col}_lag_{lag}"
        df_sorted[lagged_col] = df_sorted[price_col].shift(-lag)  # Negative shift for future prices
        
        # Calculate correlation
        correlation = df_sorted[climate_var].corr(df_sorted[lagged_col])
        results.append((lag, correlation))
    
    return pd.DataFrame(results, columns=['Lag (months)', 'Correlation'])

In [None]:
# Calculate and plot lagged correlations for each commodity
for commodity, df in commodity_dfs.items():
    price_col = f"{commodity}_Price"
    
    if price_col in df.columns:
        # Create a figure for this commodity
        plt.figure(figsize=(14, 8))
        
        # Calculate lagged correlations for temperature and precipitation
        temp_corr = calculate_lagged_correlations(df, price_col, 'temperature_C', max_lag=6)
        precip_corr = calculate_lagged_correlations(df, price_col, 'precip_m', max_lag=6)
        
        # Plot temperature correlation
        plt.subplot(1, 2, 1)
        plt.bar(temp_corr['Lag (months)'], temp_corr['Correlation'], color='red', alpha=0.7)
        plt.title(f'Temperature → {commodity} Price')
        plt.xlabel('Lag (months)')
        plt.ylabel('Correlation')
        plt.axhline(y=0, color='black', linestyle='-', alpha=0.3)
        plt.xticks(range(7))
        plt.grid(True, alpha=0.3)
        
        # Plot precipitation correlation
        plt.subplot(1, 2, 2)
        plt.bar(precip_corr['Lag (months)'], precip_corr['Correlation'], color='blue', alpha=0.7)
        plt.title(f'Precipitation → {commodity} Price')
        plt.xlabel('Lag (months)')
        plt.ylabel('Correlation')
        plt.axhline(y=0, color='black', linestyle='-', alpha=0.3)
        plt.xticks(range(7))
        plt.grid(True, alpha=0.3)
        
        plt.suptitle(f'Lagged Correlations for {commodity}', fontsize=16)
        plt.tight_layout()
        plt.subplots_adjust(top=0.9)
        plt.show()
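
To see that a negative shift on the price column lines up climate at month t with price at month t + lag, it helps to run the same calculation on synthetic data with a known delay. The sketch below builds a price series that follows a random climate series by three months; `true_lag` and both series are assumptions made for the demonstration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_months = 120
true_lag = 3  # assumed delay for the demonstration

climate = pd.Series(rng.normal(size=n_months))
# Price at month t responds to climate at month t - true_lag, plus small noise
price = climate.shift(true_lag) + 0.1 * pd.Series(rng.normal(size=n_months))

# Correlate climate at month t with price at month t + lag
lag_corr = {lag: climate.corr(price.shift(-lag)) for lag in range(7)}
best_lag = max(lag_corr, key=lambda k: abs(lag_corr[k]))

for lag, corr in lag_corr.items():
    print(f"lag {lag}: {corr:+.2f}")
print("strongest lag:", best_lag)
```

The correlation peaks at the lag that was built into the data and stays near zero elsewhere, which is the pattern to look for in the bar charts.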

## 6. Seasonal Price Analysis

Let's examine the seasonal patterns in commodity prices and explore how they relate to climate variables.

In [None]:
# Plot seasonal price patterns for each commodity
plt.figure(figsize=(14, 8))

for commodity, df in commodity_dfs.items():
    price_col = f"{commodity}_Price"
    
    if price_col in df.columns:
        # Calculate monthly averages (month derived from Date, so no separate 'Month' column is required)
        monthly_avg = df.groupby(df['Date'].dt.month)[price_col].mean()
        
        # Normalize to percentage deviation from annual mean
        annual_mean = monthly_avg.mean()
        normalized_monthly = (monthly_avg / annual_mean - 1) * 100
        
        # Plot seasonal pattern
        plt.plot(normalized_monthly.index, normalized_monthly.values,
                 marker='o', linewidth=2, label=commodity)
        
plt.title('Seasonal Price Patterns by Commodity', fontsize=14)
plt.xlabel('Month')
plt.ylabel('% Deviation from Annual Mean')
plt.xticks(range(1, 13), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.axhline(y=0, color='black', linestyle='-', alpha=0.3)
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
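
The seasonal measure used here is each calendar month's average price expressed as a percentage deviation from the mean of the twelve monthly averages. A short sketch on a synthetic series with a built-in swing of about 10% shows what the calculation returns; the series is made up for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2018-01-01", periods=36, freq="MS")
month = dates.month.to_numpy()

# Synthetic price: flat level of 100 with a seasonal swing of +/-10
price = 100 + 10 * np.sin(2 * np.pi * (month - 1) / 12)
toy = pd.DataFrame({"Date": dates, "Price": price})

# Average by calendar month, then express as % deviation from the annual mean
monthly_avg = toy.groupby(toy["Date"].dt.month)["Price"].mean()
deviation = (monthly_avg / monthly_avg.mean() - 1) * 100
print(deviation.round(1))
```

By construction the deviations average to zero across the twelve months, so the plot shows only the shape of the seasonal cycle, not the price level.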

## 7. Drought Analysis

Droughts can have significant impacts on commodity production and prices. Let's analyze how drought conditions affect different commodities.

In [None]:
# Plot drought index for each commodity region
plt.figure(figsize=(14, 8))

for commodity, df in commodity_dfs.items():
    if 'drought_index' in df.columns:
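        # Plot only the first region so the line matches its label
        region = df['Region'].iloc[0]
        region_df = df[df['Region'] == region].sort_values('Date')
        plt.plot(region_df['Date'], region_df['drought_index'],
                 label=f"{region} ({commodity})")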
    
plt.title('Drought Index by Commodity Region', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Drought Index')
plt.axhline(y=0, color='black', linestyle='-', alpha=0.3)
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Compare price changes during drought vs. normal conditions
drought_impact = {}

for commodity, df in commodity_dfs.items():
    price_col = f"{commodity}_Price"
    
    if 'drought_index' in df.columns and price_col in df.columns:
        # Define drought conditions (drought index below -0.5)
        df['drought_condition'] = df['drought_index'] < -0.5
        
        # Calculate average price during drought and normal conditions
        avg_prices = df.groupby('drought_condition')[price_col].mean()
        
        # Calculate percent difference
        if False in avg_prices.index and True in avg_prices.index:
            normal_price = avg_prices[False]
            drought_price = avg_prices[True]
            percent_diff = (drought_price / normal_price - 1) * 100
            
            drought_impact[commodity] = {
                'Normal': normal_price,
                'Drought': drought_price,
                'Percent Difference': percent_diff
            }

# Convert to DataFrame
drought_impact_df = pd.DataFrame(drought_impact).T
drought_impact_df

In [None]:
# Plot drought impact
plt.figure(figsize=(12, 6))
drought_impact_df['Percent Difference'].plot(kind='bar', color='crimson')
plt.title('Price Impact During Drought Conditions', fontsize=14)
plt.xlabel('Commodity')
plt.ylabel('% Change in Price')
plt.axhline(y=0, color='black', linestyle='-', alpha=0.3)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
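
The drought comparison reduces to two group means and a ratio. A worked example on six made-up observations, using the same -0.5 threshold, makes the arithmetic explicit. Because it compares price levels, a long-run price trend that happens to coincide with dry years would show up as a drought effect; comparing month-over-month changes instead is one possible refinement, not something the notebook does.

```python
import pandas as pd

# Made-up observations: three drought months and three normal months
toy = pd.DataFrame({
    "drought_index": [-1.0, -0.8, 0.0, 0.5, 1.0, -0.6],
    "Price": [120.0, 110.0, 100.0, 100.0, 100.0, 130.0],
})

is_drought = toy["drought_index"] < -0.5
drought_price = toy.loc[is_drought, "Price"].mean()   # (120 + 110 + 130) / 3
normal_price = toy.loc[~is_drought, "Price"].mean()   # (100 + 100 + 100) / 3
percent_diff = (drought_price / normal_price - 1) * 100

print(f"drought: {drought_price:.1f}, normal: {normal_price:.1f}, "
      f"difference: {percent_diff:.1f}%")
```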

## 8. Commodity-Specific Analysis

Let's look at specific climate factors that are particularly important for each commodity.

In [None]:
# Analyze critical climate thresholds for various commodities
def analyze_climate_thresholds(commodity_data):
    results = {}
    
    for commodity, df in commodity_data.items():
        price_col = f"{commodity}_Price"
        commodity_results = {}
        
        if 'temperature_C' in df.columns and price_col in df.columns:
            # Analyze temperature thresholds
            # For most crops, there are optimal temperature ranges
            # Optimal temperature range (°C) per commodity; others use the default
            optimal_ranges = {
                'Coffee': (18, 22),
                'Cocoa': (22, 28),
                'Wheat': (15, 25),
                'Maize': (15, 25),
            }
            low, high = optimal_ranges.get(commodity, (20, 30))
            
            # Open-ended outer bins so sub-zero or extreme readings are not dropped as NaN
            df['temp_category'] = pd.cut(
                df['temperature_C'],
                bins=[-np.inf, low, high, np.inf],
                labels=['Too Cold', 'Optimal', 'Too Hot']
            )
            
            # Calculate average prices by temperature category
            temp_prices = df.groupby('temp_category')[price_col].mean()
            commodity_results['Temperature Thresholds'] = temp_prices.to_dict()
            
            # Calculate percentage of time in each category
            temp_counts = df['temp_category'].value_counts(normalize=True) * 100
            commodity_results['Temperature Distribution'] = temp_counts.to_dict()
        
        # Add results to main dictionary
        results[commodity] = commodity_results
    
    return results

# Run the analysis
threshold_analysis = analyze_climate_thresholds(commodity_dfs)

# Display results
for commodity, results in threshold_analysis.items():
    if 'Temperature Thresholds' in results:
        print(f"\n=== {commodity} ===")
        print("Average prices by temperature category:")
        for category, price in results['Temperature Thresholds'].items():
            print(f"  {category}: ${price:.2f}")
        
        print("\nPercentage of time in each temperature category:")
        for category, pct in results['Temperature Distribution'].items():
            print(f"  {category}: {pct:.1f}%")
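
Temperature binning with `pd.cut` depends on how the outermost bin edges are chosen. The sketch below uses made-up temperatures and the coffee thresholds (18 and 22 °C) to compare bounded outer edges with open-ended ones. With the default `right=True`, each bin includes its upper edge, so a reading of exactly 18 °C falls in the lower category.

```python
import pandas as pd

temps = pd.Series([-5.0, 10.0, 18.0, 20.0, 30.0])
labels = ["Too Cold", "Optimal", "Too Hot"]

# Bounded outer edges: values at or below 0 fall outside every bin and become NaN
bounded = pd.cut(temps, bins=[0, 18, 22, 100], labels=labels)

# Open-ended outer edges: every finite value gets a category
open_ended = pd.cut(
    temps, bins=[-float("inf"), 18, 22, float("inf")], labels=labels
)

print(pd.DataFrame({
    "temperature_C": temps,
    "bounded": bounded,
    "open_ended": open_ended,
}))
```

Rows that end up as NaN are silently excluded from both the grouped price averages and the percentage-of-time figures, which matters for regions with sub-zero winters.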

In [None]:
# Plot price vs. temperature scatter plots for each commodity
num_commodities = len(commodity_dfs)
cols = 2
rows = (num_commodities + 1) // 2

plt.figure(figsize=(14, 4 * rows))

for i, (commodity, df) in enumerate(commodity_dfs.items(), 1):
    price_col = f"{commodity}_Price"
    
    if 'temperature_C' in df.columns and price_col in df.columns:
        plt.subplot(rows, cols, i)
        
        # Create scatter plot (drop rows with missing values so polyfit does not fail)
        valid = df[['temperature_C', price_col]].dropna()
        plt.scatter(valid['temperature_C'], valid[price_col], alpha=0.6)
        
        # Add trend line
        z = np.polyfit(valid['temperature_C'], valid[price_col], 1)
        p = np.poly1d(z)
        x_range = np.linspace(valid['temperature_C'].min(), valid['temperature_C'].max(), 100)
        plt.plot(x_range, p(x_range), 'r--')
        
        # Add correlation value
        corr = df['temperature_C'].corr(df[price_col])
        plt.text(0.05, 0.95, f"Correlation: {corr:.2f}",
                transform=plt.gca().transAxes, fontsize=12,
                bbox=dict(facecolor='white', alpha=0.7))
        
        plt.title(f'{commodity}: Price vs. Temperature')
        plt.xlabel('Temperature (°C)')
        plt.ylabel(f'{commodity} Price')
        plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 9. Cross-Commodity Climate Impact Analysis

Let's examine if climate conditions in one region can affect prices of multiple commodities.

In [None]:
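# Combine all commodity prices into one DataFrame, aligned on Date
# Use Coffee's dates as the base when available, otherwise the first dataset loaded
base = 'Coffee' if 'Coffee' in commodity_dfs else next(iter(commodity_dfs))
price_df = pd.DataFrame({
    'Date': commodity_dfs[base]['Date'].copy()
})

# Add price columns for each commodity
for commodity, df in commodity_dfs.items():
    price_col = f"{commodity}_Price"
    if price_col in df.columns:
        # Match on Date rather than row position (mean across regions per date)
        price_by_date = df.groupby('Date')[price_col].mean()
        price_df[price_col] = price_df['Date'].map(price_by_date)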

# Calculate cross-correlation matrix between commodity prices
price_cols = [f"{c}_Price" for c in commodity_dfs.keys() if f"{c}_Price" in price_df.columns]
price_corr = price_df[price_cols].corr()

# Visualize cross-correlations
plt.figure(figsize=(10, 8))
sns.heatmap(price_corr, annot=True, cmap='coolwarm', center=0, fmt='.2f')
plt.title('Cross-Correlation between Commodity Prices', fontsize=14)
plt.tight_layout()
plt.show()
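
A price correlation matrix is only meaningful when each row refers to the same month for every commodity. A small sketch with made-up prices covering different date ranges shows how `pd.concat` with `axis=1` aligns series on their date index; the series names follow the notebook's `<Commodity>_Price` convention.

```python
import pandas as pd

coffee = pd.Series(
    [2.0, 2.1, 2.3],
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    name="Coffee_Price",
)
wheat = pd.Series(
    [200.0, 205.0],
    index=pd.to_datetime(["2020-02-01", "2020-03-01"]),
    name="Wheat_Price",
)

# concat with axis=1 aligns on the date index; missing months become NaN
aligned = pd.concat([coffee, wheat], axis=1)
print(aligned)

# corr() uses only the months where both prices are present
print(aligned.corr())
```

Assigning `.values` column by column, by contrast, pairs rows by position and either fails or misaligns months when the datasets differ in length or start date.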

In [None]:
# Add some climate variables to the combined DataFrame
# Coffee-region climate serves here as a single reference series; it is a rough proxy, not a measure of global climate
if 'Coffee' in commodity_dfs:
    coffee_df = commodity_dfs['Coffee']
    price_df['temperature_C'] = coffee_df['temperature_C'].values
    price_df['precip_m'] = coffee_df['precip_m'].values
    
    if 'drought_index' in coffee_df.columns:
        price_df['drought_index'] = coffee_df['drought_index'].values

# Calculate correlations between coffee region climate and all commodity prices
climate_vars = ['temperature_C', 'precip_m', 'drought_index']
climate_vars = [var for var in climate_vars if var in price_df.columns]

climate_price_corr = pd.DataFrame()
for var in climate_vars:
    correlations = {}
    for price_col in price_cols:
        correlations[price_col.replace('_Price', '')] = price_df[var].corr(price_df[price_col])
    climate_price_corr[var] = pd.Series(correlations)

# Display results
climate_price_corr

In [None]:
# Visualize climate-price correlations
plt.figure(figsize=(12, 8))
sns.heatmap(climate_price_corr, annot=True, cmap='coolwarm', center=0, fmt='.2f')
plt.title('Coffee Region Climate Correlations with Commodity Prices', fontsize=14)
plt.tight_layout()
plt.show()

## 10. Conclusions

Based on our analysis of climate data and commodity prices, we can draw the following conclusions:

1. **Temperature Impacts**: Different commodities show varying levels of sensitivity to temperature changes in their growing regions.

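2. **Precipitation Effects**: Precipitation patterns correlate with commodity prices, with drought conditions generally associated with higher prices. These are descriptive correlations; the notebook does not test statistical significance or control for broader market trends.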

3. **Time Lags**: Climate effects on commodity prices often show time lags of several months, reflecting the time between growing conditions and market impacts.

4. **Cross-Commodity Relationships**: Some commodities show correlated price movements, suggesting common climate drivers or market interactions.

5. **Regional Differences**: Growing regions for different commodities have distinct climate profiles, leading to varied climate sensitivities.

These insights can be valuable for:
- Commodity traders looking to forecast price movements
- Risk management strategies for agricultural producers
- Policy makers concerned with food security and climate change adaptation
- Researchers studying climate-economy interactions