# Climate and Commodity Data Analysis

This notebook explores the relationship between climate variables and coffee commodity prices. It analyzes data from multiple coffee-growing regions and investigates correlations between climate factors (such as temperature and precipitation) and coffee price fluctuations.

## Overview

1. Data Loading
2. Exploratory Data Analysis
3. Time Series Analysis
4. Regional Climate Comparison
5. Climate-Price Correlation Analysis
6. Signature Kernel Analysis
7. Advanced Time Series Analysis
8. Climate Change Impact Analysis
9. Conclusion and Key Findings

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import os

# Set plot styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('viridis')

# Set up plot parameters for better display
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 12

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

## 1. Data Loading

We'll load the combined climate and commodity dataset that was created using the `coffee_climate_monthly.py` script.

In [None]:
# Load the joined dataset
joined_data_path = "climate_commodity_joined.csv"

if os.path.exists(joined_data_path):
    df = pd.read_csv(joined_data_path)
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Display basic info
    print(f"Loaded {len(df)} records from {df['Year'].min()} to {df['Year'].max()}")
    print(f"\nRegions covered: {df['Region'].nunique()}")
    print(df['Region'].unique())
    
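    # Display first few rows (a bare expression inside an if-block is not rendered,
    # so call display() explicitly; it is available by default in Jupyter)
    display(df.head())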
else:
    print(f"Error: Combined dataset not found at {joined_data_path}")
    print("Please run the coffee_climate_monthly.py script first to generate the dataset.")
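The later cells assume specific column names in the joined dataset. A minimal sketch of an up-front check follows; the `REQUIRED_COLUMNS` list is inferred from the columns this notebook uses, and `missing_columns` is a hypothetical helper, not something provided by `coffee_climate_monthly.py`.

```python
import pandas as pd

# Columns referenced by the cells in this notebook (inferred from usage, an assumption)
REQUIRED_COLUMNS = ["Date", "Year", "Month", "Region",
                    "temperature_C", "precip_m", "Coffee_Price"]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Tiny illustrative frame with made-up values; it lacks the precipitation column
example = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01"]),
    "Year": [2020], "Month": [1], "Region": ["Example"],
    "temperature_C": [21.5], "Coffee_Price": [1.10],
})
absent = missing_columns(example)
print("Missing columns:", absent)
```

Running the same check on `df` right after loading would surface a schema mismatch before any plot fails with a `KeyError`.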

## 2. Exploratory Data Analysis

Let's examine the structure and statistics of our combined dataset.

In [None]:
# Display column information
print("Dataset columns:")
for col in df.columns:
    print(f"- {col}")

# Display summary statistics
print("\nSummary statistics:")
df.describe()

In [None]:
# Check for missing values
missing_data = df.isnull().sum()
print("Missing values by column:")
print(missing_data[missing_data > 0])
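Null counts do not reveal months that are absent from the file altogether. The sketch below lists, per region, the calendar months missing between the first and last record. It assumes one row per region per month; `missing_months` and the sample frame are illustrative only.

```python
import pandas as pd

def missing_months(frame, date_col="Date", region_col="Region"):
    """Map each region to the months absent between its first and last record."""
    gaps = {}
    for region, group in frame.groupby(region_col):
        # Collapse timestamps to monthly periods so mid-month dates still match
        observed = pd.PeriodIndex(group[date_col].dt.to_period("M").unique())
        expected = pd.period_range(observed.min(), observed.max(), freq="M")
        gaps[region] = [str(p) for p in expected.difference(observed)]
    return gaps

# Made-up sample: region A skips March 2020, region B has no gaps
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-15", "2020-02-15", "2020-04-15",
                            "2020-01-01", "2020-02-01"]),
    "Region": ["A", "A", "A", "B", "B"],
})
gaps = missing_months(sample)
print(gaps)
```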

## 3. Time Series Analysis

Let's explore how coffee prices and climate variables have changed over time.

In [None]:
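# Plot coffee price over time
# Aggregate once so the x and y values are guaranteed to line up date by date
# (df['Date'].unique() keeps file order, while groupby sorts by date)
price_by_date = df.groupby('Date')['Coffee_Price'].mean()
plt.figure(figsize=(14, 6))
plt.plot(price_by_date.index, price_by_date.values, linewidth=2)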
plt.title('Coffee Price Over Time')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# Plot temperature and precipitation over time for a specific region
region = df['Region'].unique()[0]  # Use the first region as an example

# Sort by date so the line plots connect points in chronological order
region_df = df[df['Region'] == region].sort_values('Date')

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Temperature
ax1.plot(region_df['Date'], region_df['temperature_C'], color='red', linewidth=2)
ax1.set_title(f'Temperature Over Time - {region}')
ax1.set_ylabel('Temperature (°C)')
ax1.grid(True)

# Precipitation
ax2.plot(region_df['Date'], region_df['precip_m'], color='blue', linewidth=2)
ax2.set_title(f'Precipitation Over Time - {region}')
ax2.set_xlabel('Date')
ax2.set_ylabel('Precipitation (m)')
ax2.grid(True)

plt.tight_layout()
plt.show()

## 4. Regional Climate Comparison

Compare climate patterns across different coffee growing regions.

In [None]:
# Get average temperature by region over time
plt.figure(figsize=(14, 8))

for region in df['Region'].unique():
    region_data = df[df['Region'] == region]
    
    # Group by year and month and calculate average temperature
    temp_by_month = region_data.groupby(['Year', 'Month'])['temperature_C'].mean().reset_index()
    # Create date from year and month for plotting
    temp_by_month['Date'] = pd.to_datetime(temp_by_month[['Year', 'Month']].assign(day=1))
    
    plt.plot(temp_by_month['Date'], temp_by_month['temperature_C'], label=region)

plt.title('Average Temperature by Region')
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.legend(loc='best')
plt.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# Seasonal patterns across regions - temperature by month
plt.figure(figsize=(14, 8))

for region in df['Region'].unique():
    region_data = df[df['Region'] == region]
    monthly_temp = region_data.groupby('Month')['temperature_C'].mean()
    plt.plot(monthly_temp.index, monthly_temp.values, marker='o', linewidth=2, label=region)

plt.title('Monthly Temperature Patterns by Region')
plt.xlabel('Month')
plt.ylabel('Average Temperature (°C)')
plt.xticks(range(1, 13), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.legend(loc='best')
plt.grid(True)
plt.tight_layout()
plt.show()
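The monthly averages above are a simple climatology, and subtracting them from each observation gives a deseasonalized anomaly. The sketch below shows one way to do that. It is an assumption about how an anomaly could be defined, not necessarily how the `temp_anomaly` column in the dataset was produced, and `add_monthly_anomaly` is a hypothetical helper.

```python
import pandas as pd

def add_monthly_anomaly(frame, value_col, out_col):
    """Subtract each region's long-run mean for the same calendar month."""
    climatology = frame.groupby(["Region", "Month"])[value_col].transform("mean")
    result = frame.copy()
    result[out_col] = result[value_col] - climatology
    return result

# Made-up values: two Januaries and two Julys for a single region
sample = pd.DataFrame({
    "Region": ["X", "X", "X", "X"],
    "Year": [2019, 2020, 2019, 2020],
    "Month": [1, 1, 7, 7],
    "temperature_C": [20.0, 22.0, 15.0, 15.0],
})
with_anomaly = add_monthly_anomaly(sample, "temperature_C", "temp_anomaly_demo")
print(with_anomaly)
```

Here the January mean is 21 °C, so the two Januaries get anomalies of -1 and +1, while the identical Julys get 0.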

## 5. Climate-Price Correlation Analysis

Analyze the relationship between climate variables and coffee prices.

In [None]:
# Calculate correlation between climate variables and coffee price
correlation_columns = ['temperature_C', 'precip_m', 'Coffee_Price']
# Add optional columns only if each one is actually present
for optional_col in ['relative_humidity', 'temp_anomaly', 'precip_anomaly', 'drought_index']:
    if optional_col in df.columns:
        correlation_columns.append(optional_col)

correlations = {}
for region in df['Region'].unique():
    region_data = df[df['Region'] == region]
    corr = region_data[correlation_columns].corr()
    correlations[region] = corr['Coffee_Price'].drop('Coffee_Price')

# Convert to DataFrame for easier viewing
correlation_df = pd.DataFrame(correlations)
correlation_df

In [None]:
# Visualize correlations
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_df, annot=True, cmap='coolwarm', center=0, fmt='.2f')
plt.title('Correlation between Climate Variables and Coffee Price')
plt.tight_layout()
plt.show()
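Correlations between levels can look strong simply because both series drift in the same direction over the years. One common check is to correlate month-over-month changes instead. The sketch below uses synthetic data (not the notebook's dataset) and a hypothetical helper `change_correlation` to show the difference.

```python
import numpy as np
import pandas as pd

def change_correlation(frame, climate_col, price_col="Coffee_Price"):
    """Pearson correlation between first differences of two columns, ordered by Date."""
    ordered = frame.sort_values("Date")
    changes = ordered[[climate_col, price_col]].diff().dropna()
    return changes[climate_col].corr(changes[price_col])

# Synthetic example: both series trend upward, but their monthly noise is independent
rng = np.random.default_rng(0)
n = 240
demo = pd.DataFrame({
    "Date": pd.date_range("2000-01-01", periods=n, freq="MS"),
    "temperature_C": 20 + 0.01 * np.arange(n) + rng.normal(0, 0.2, n),
    "Coffee_Price": 1 + 0.005 * np.arange(n) + rng.normal(0, 0.1, n),
})
level_corr = demo["temperature_C"].corr(demo["Coffee_Price"])
diff_corr = change_correlation(demo, "temperature_C")
print(f"Correlation of levels:  {level_corr:.2f}")
print(f"Correlation of changes: {diff_corr:.2f}")
```

In this synthetic case the level correlation is high purely because of the shared trend, while the correlation of changes stays near zero. Applying the same helper to each region's rows would show whether the heatmap values survive detrending.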

In [None]:
# Scatter plot of coffee price vs. temperature anomaly for each region
if 'temp_anomaly' in df.columns:
    plt.figure(figsize=(16, 10))
    
    for i, region in enumerate(df['Region'].unique()):
        plt.subplot(2, len(df['Region'].unique())//2 + len(df['Region'].unique())%2, i+1)
        
        region_data = df[df['Region'] == region]
        plt.scatter(region_data['temp_anomaly'], region_data['Coffee_Price'], alpha=0.6)
        
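        # Add trend line (np.polyfit cannot handle NaN, so fit on complete pairs only)
        valid = region_data[['temp_anomaly', 'Coffee_Price']].dropna()
        z = np.polyfit(valid['temp_anomaly'], valid['Coffee_Price'], 1)
        p = np.poly1d(z)
        plt.plot(valid['temp_anomaly'], p(valid['temp_anomaly']), 'r--')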
        
        plt.title(region)
        plt.xlabel('Temperature Anomaly (°C)')
        plt.ylabel('Coffee Price')
        plt.grid(True)
    
    plt.tight_layout()
    plt.show()

## 6. Signature Kernel Analysis

Analyze the climate signatures we calculated in the processing script.

In [None]:
# Plot drought index over time for all regions
if 'drought_index' in df.columns:
    plt.figure(figsize=(14, 8))
    
    for region in df['Region'].unique():
        region_data = df[df['Region'] == region]
        plt.plot(region_data['Date'], region_data['drought_index'], label=region)
    
    plt.axhline(y=0, color='k', linestyle='--', alpha=0.3)
    plt.title('Drought Index Over Time')
    plt.xlabel('Date')
    plt.ylabel('Drought Index')
    plt.legend(loc='best')
    plt.grid(True)
    plt.tight_layout()
    plt.show()

In [None]:
# Analyze impact of extreme climate events on coffee price
if 'drought_index' in df.columns:
    # Define extreme drought as drought_index < -1
    df['extreme_drought'] = df['drought_index'] < -1
    
    # Calculate average coffee price during extreme drought vs. normal conditions
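    drought_impact = df.groupby(['Region', 'extreme_drought'])['Coffee_Price'].mean().unstack()
    # Label columns by flag value so the names stay correct even if one condition never occurs
    drought_impact = (drought_impact.reindex(columns=[False, True])
                      .rename(columns={False: 'Normal', True: 'Drought'}))
    
    # Calculate percent change
    drought_impact['Percent Change'] = ((drought_impact['Drought'] - drought_impact['Normal']) / 
                                       drought_impact['Normal'] * 100)
    
    print("Impact of extreme drought on coffee prices:")
    # A bare expression inside an if-block is not rendered, so print explicitly
    print(drought_impact)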
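A mean computed from only a handful of drought months can be misleading, so it helps to report how many months fall into each group. The sketch below does this on made-up values; `drought_price_summary` is a hypothetical helper, and the `-1` threshold simply mirrors the cell above.

```python
import numpy as np
import pandas as pd

def drought_price_summary(frame, threshold=-1.0):
    """Mean price and number of months per region, split into Normal and Drought."""
    condition = np.where(frame["drought_index"] < threshold, "Drought", "Normal")
    grouped = frame.assign(condition=condition).groupby(["Region", "condition"])["Coffee_Price"]
    labels = ["Normal", "Drought"]
    means = grouped.mean().unstack().reindex(columns=labels)
    counts = grouped.size().unstack(fill_value=0).reindex(columns=labels, fill_value=0)
    # Count columns are prefixed with n_ so they sit next to the means
    return means.join(counts.add_prefix("n_"))

# Made-up sample: region A has two drought months, region B has none
sample = pd.DataFrame({
    "Region": ["A", "A", "A", "B", "B"],
    "drought_index": [0.2, -1.5, -2.0, 0.1, 0.3],
    "Coffee_Price": [1.0, 1.4, 1.6, 2.0, 2.2],
})
summary = drought_price_summary(sample)
print(summary)
```

Regions with no drought months show `NaN` for the drought mean and `0` for `n_Drought`, which makes it clear that no comparison is possible there.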

## 7. Advanced Time Series Analysis

Let's look at lagged effects and seasonal decomposition.

In [None]:
# Calculate lagged correlations (climate impacts price with a delay)
from statsmodels.tsa.stattools import ccf

region = df['Region'].unique()[0]  # Example with first region
region_data = df[df['Region'] == region].sort_values('Date')

# Create time series
temp_series = region_data['temperature_C'].values
price_series = region_data['Coffee_Price'].values

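# Calculate cross-correlation.
# statsmodels' ccf(x, y) at lag k pairs x[t+k] with y[t], so the price series goes first
# to measure how temperature in month t relates to the price k months later.
max_lag = 12  # Max 12 months lag
cross_corr = ccf(price_series, temp_series, adjusted=False)[:max_lag+1]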

# Plot cross-correlation
plt.figure(figsize=(10, 6))
plt.stem(range(len(cross_corr)), cross_corr)
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.title(f'Cross-Correlation: Temperature → Coffee Price ({region})')
plt.xlabel('Lag (months)')
plt.ylabel('Correlation')
plt.xticks(range(max_lag+1))
plt.grid(True)
plt.tight_layout()
plt.show()

# Find the lag with highest correlation
max_corr_lag = np.argmax(np.abs(cross_corr))
print(f"Strongest correlation at lag {max_corr_lag} months: {cross_corr[max_corr_lag]:.4f}")
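The direction of a lag is easy to get backwards, so it can help to cross-check with an explicit shift. The sketch below correlates climate in month t with price in month t + lag on synthetic data where the true delay is known. `lagged_correlations` is a hypothetical helper and the series are generated, not taken from the dataset.

```python
import numpy as np
import pandas as pd

def lagged_correlations(climate, price, max_lag=12):
    """Correlation between climate at month t and price at month t + lag."""
    return pd.Series(
        {lag: climate.corr(price.shift(-lag)) for lag in range(max_lag + 1)},
        name="correlation",
    )

# Synthetic example: the price follows temperature with a 3-month delay
rng = np.random.default_rng(1)
temperature = pd.Series(rng.normal(size=120))
price = temperature.shift(3) + rng.normal(scale=0.1, size=120)

lags = lagged_correlations(temperature, price)
best_lag = lags.abs().idxmax()
print(lags.round(2))
print(f"Strongest correlation at lag {best_lag} months")
```

Because `price.shift(-lag)` moves future prices back onto the current row, a peak at lag k means the climate variable leads the price by k months.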

In [None]:
# Seasonal decomposition of coffee price
from statsmodels.tsa.seasonal import seasonal_decompose

# Get unique dates and corresponding average coffee price
price_ts = df.groupby('Date')['Coffee_Price'].mean()

# Fill any gaps in the time series
date_range = pd.date_range(start=price_ts.index.min(), end=price_ts.index.max(), freq='MS')
price_ts = price_ts.reindex(date_range).interpolate()

# Apply seasonal decomposition
decomposition = seasonal_decompose(price_ts, model='additive', period=12)

# Plot the decomposition
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(14, 14), sharex=True)

# Original
ax1.plot(decomposition.observed)
ax1.set_title('Original Coffee Price')
ax1.grid(True)

# Trend
ax2.plot(decomposition.trend)
ax2.set_title('Trend')
ax2.grid(True)

# Seasonal
ax3.plot(decomposition.seasonal)
ax3.set_title('Seasonality')
ax3.grid(True)

# Residual
ax4.plot(decomposition.resid)
ax4.set_title('Residuals')
ax4.grid(True)

plt.tight_layout()
plt.show()
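To make the additive model concrete, the sketch below decomposes a synthetic series by hand: a centered moving average estimates the trend, and the averages of the detrended values by position in the cycle estimate the seasonal component. It follows the classical approach in spirit and is not a reproduction of the statsmodels implementation; `additive_decompose` is a hypothetical helper that assumes an even `period`.

```python
import numpy as np
import pandas as pd

def additive_decompose(series, period=12):
    """Classical additive decomposition (even period): trend + seasonal + residual."""
    # 2 x period moving average: half weights at both ends keep the window centered
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = series.rolling(period + 1, center=True).apply(
        lambda window: np.dot(window, weights), raw=True)
    detrended = series - trend
    # Average detrended values by position in the cycle, then center them on zero
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=series.index)
    resid = series - trend - seasonal
    return trend, seasonal, resid

# Synthetic monthly series: linear trend plus a sine-shaped yearly cycle
t = np.arange(48)
true_seasonal = 0.2 * np.sin(2 * np.pi * t / 12)
dates = pd.date_range("2015-01-01", periods=48, freq="MS")
series = pd.Series(1.0 + 0.01 * t + true_seasonal, index=dates)

trend, seasonal, resid = additive_decompose(series)
print(seasonal.head(12).round(3))
```

The trend is undefined for the first and last six months because the centered window does not fit there, which is also why the trend and residual panels of a classical decomposition start late and end early.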

## 8. Climate Change Impact Analysis

Analyze how climate change might be affecting coffee growing regions.

In [None]:
# Check for trends in temperature and precipitation over time
for region in df['Region'].unique():
    region_data = df[df['Region'] == region]
    
    # Group by year to see annual trends
    annual_data = region_data.groupby('Year').agg({
        'temperature_C': 'mean',
        'precip_m': 'sum'
    }).reset_index()
    
    # Create the plot
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10), sharex=True)
    
    # Temperature trend
    ax1.plot(annual_data['Year'], annual_data['temperature_C'], marker='o', linewidth=2)
    z1 = np.polyfit(annual_data['Year'], annual_data['temperature_C'], 1)
    p1 = np.poly1d(z1)
    ax1.plot(annual_data['Year'], p1(annual_data['Year']), 'r--')
    ax1.set_title(f'Temperature Trend - {region}')
    ax1.set_ylabel('Average Temperature (°C)')
    ax1.grid(True)
    
    # Add trend info
    trend_temp = z1[0] * 10  # Change per decade
    ax1.text(0.05, 0.95, f'Trend: {trend_temp:.2f}°C per decade', 
             transform=ax1.transAxes, fontsize=12, 
             bbox=dict(facecolor='white', alpha=0.7))
    
    # Precipitation trend
    ax2.plot(annual_data['Year'], annual_data['precip_m'], marker='o', linewidth=2)
    z2 = np.polyfit(annual_data['Year'], annual_data['precip_m'], 1)
    p2 = np.poly1d(z2)
    ax2.plot(annual_data['Year'], p2(annual_data['Year']), 'r--')
    ax2.set_title(f'Precipitation Trend - {region}')
    ax2.set_xlabel('Year')
    ax2.set_ylabel('Total Precipitation (m)')
    ax2.grid(True)
    
    # Add trend info
    trend_precip = z2[0] * 10 * 100  # Change per decade in cm
    ax2.text(0.05, 0.95, f'Trend: {trend_precip:.2f} cm per decade', 
             transform=ax2.transAxes, fontsize=12,
             bbox=dict(facecolor='white', alpha=0.7))
    
    plt.tight_layout()
    plt.show()
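Annual aggregates from a partial year can distort a fitted trend, especially for summed precipitation. The sketch below fits the per-decade slope using only years with a full set of months. `trend_per_decade` is a hypothetical helper, and the data are made up so that the true slope is known.

```python
import numpy as np
import pandas as pd

def trend_per_decade(frame, value_col, how="mean", months_required=12):
    """Linear trend of an annual aggregate, per decade, using complete years only."""
    per_year = frame.groupby("Year")[value_col].agg([how, "count"])
    complete = per_year[per_year["count"] >= months_required]
    slope, _intercept = np.polyfit(
        complete.index.to_numpy(dtype=float), complete[how].to_numpy(), 1)
    return slope * 10

# Made-up data: warming of 0.03 °C per year, plus a 3-month partial year with outlier values
rows = [(year, month, 20 + 0.03 * (year - 2000))
        for year in range(2000, 2011) for month in range(1, 13)]
rows += [(2011, month, 30.0) for month in range(1, 4)]
sample = pd.DataFrame(rows, columns=["Year", "Month", "temperature_C"])

decadal_trend = trend_per_decade(sample, "temperature_C")
print(f"Trend: {decadal_trend:.2f} °C per decade")
```

The incomplete year 2011 is excluded before fitting, so the estimate recovers the 0.3 °C per decade built into the sample.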

## 9. Conclusion and Key Findings

Once the cells above have been run, summarize the findings under the following headings:

1. **Climate-Price Relationships**: Summarize any significant correlations observed
2. **Regional Differences**: Note how climate patterns differ across coffee growing regions
3. **Time Lags**: Discuss any observed lag effects between climate changes and price responses
4. **Climate Change Impacts**: Summarize temperature and precipitation trends
5. **Extreme Events**: Discuss how drought or extreme temperature events affected prices

## Next Steps

1. Develop predictive models for coffee price based on climate variables
2. Add more climate signature features (growing degree days, frost events, etc.)
3. Investigate climate impacts on coffee quality, not just price
4. Analyze how climate affects specific stages of coffee plant growth