# Combined Commodity Climate Analysis

This notebook demonstrates how to work with the combined commodity-climate dataset that contains data for multiple commodities and their growing regions in a single file.

## Dataset Structure

The combined dataset includes:
- Commodity prices for 7 major agricultural commodities
- Climate data for the primary growing region of each commodity
- Climate signatures (temperature anomalies, drought indices, etc.)
- Date information (Year, Month)

This unified structure makes cross-commodity analysis much simpler.
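
The layout listed above can be pictured with a small synthetic frame. This is only a sketch: the column names (`Wheat_Price`, `Coffee_Region`, `temperature_anomaly`), the region labels, and all values are hypothetical, chosen to match the `_Price` / `_Region` suffix convention this notebook relies on later. They are not taken from the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the combined layout: one row per month.
# Every column name and value here is made up for illustration.
dates = pd.date_range("2015-01-01", periods=6, freq="MS")
rng = np.random.default_rng(0)

sample = pd.DataFrame({
    "Date": dates,
    "Year": dates.year.to_numpy(),
    "Month": dates.month.to_numpy(),
    "Wheat_Price": 200 + rng.normal(0, 5, len(dates)),
    "Wheat_Region": "Region A",
    "Coffee_Price": 120 + rng.normal(0, 3, len(dates)),
    "Coffee_Region": "Region B",
    "temperature_anomaly": rng.normal(0, 1, len(dates)),
})

# The suffix convention lets columns be grouped without hard-coding names.
sample_price_cols = [c for c in sample.columns if c.endswith("_Price")]
sample_region_cols = [c for c in sample.columns if c.endswith("_Region")]
print(sample_price_cols)
print(sample_region_cols)
```

Because every commodity shares the same date rows, cross-commodity operations reduce to column selections on one frame.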

In [None]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Set plot styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('viridis')

# Set up plot parameters
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 12

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

## 1. Load the Combined Dataset

Let's load the combined dataset and explore its structure.

In [None]:
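# Load the combined dataset and sort chronologically so that
# positional lookups such as .iloc[0] refer to the earliest month
combined_file = "all_commodities_combined.csv"
df = pd.read_csv(combined_file, parse_dates=['Date'])
df = df.sort_values('Date').reset_index(drop=True)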

# Display basic info
print(f"Loaded {len(df)} records from {df['Date'].min().strftime('%Y-%m')} to {df['Date'].max().strftime('%Y-%m')}")
print(f"\nDataset shape: {df.shape} (rows, columns)")

# Find price columns
price_cols = [col for col in df.columns if col.endswith('_Price')]
print(f"\nPrice columns: {price_cols}")

# Find region columns
region_cols = [col for col in df.columns if col.endswith('_Region')]
print(f"\nRegion columns: {region_cols}")

# Display the first few rows
df.head()

Let's also check the climate columns:

In [None]:
# Print climate-related columns
date_cols = ['Date', 'Year', 'Month']
non_climate_cols = date_cols + price_cols + region_cols
climate_cols = [col for col in df.columns if col not in non_climate_cols]

print("Climate-related columns:")
for col in climate_cols:
    print(f"- {col}")
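
Before moving on to the analysis, it may be worth checking the columns for missing values and the dates for gaps. The helper below is a minimal sketch, not part of the dataset's tooling: `summarize_quality` is a hypothetical name, and it assumes one row per month with dates falling on the first day of the month.

```python
import numpy as np
import pandas as pd

def summarize_quality(frame, date_col="Date"):
    """Return the share of missing values per column and any missing months.

    Assumes (hypothetically) monthly data dated on the first of each month.
    """
    missing_share = frame.drop(columns=[date_col]).isna().mean()
    dates = pd.DatetimeIndex(pd.to_datetime(frame[date_col]))
    expected = pd.date_range(dates.min(), dates.max(), freq="MS")
    gaps = expected.difference(dates)
    return missing_share, gaps

# Tiny made-up example: March is absent and one price is missing.
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-04-01"]),
    "Wheat_Price": [1.0, np.nan, 3.0],
})
missing_share, gaps = summarize_quality(demo)
print(missing_share)
print(gaps)
```

On the real data the call would be `summarize_quality(df)`. Gaps or heavily missing columns affect every correlation computed later.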

## 2. Analyze Price Trends

Now let's plot all commodity prices on a single chart with normalized values for easier comparison.

In [None]:
# Create a DataFrame for normalized prices
norm_prices = pd.DataFrame({'Date': df['Date']})

# Normalize each price series (first non-missing value = 100)
for price_col in price_cols:
    commodity = price_col.split('_')[0]
    first_price = df[price_col].dropna().iloc[0]
    norm_prices[commodity] = df[price_col] / first_price * 100

# Plot normalized prices
plt.figure(figsize=(14, 8))

for commodity in norm_prices.columns:
    if commodity != 'Date':
        plt.plot(norm_prices['Date'], norm_prices[commodity], linewidth=2, label=commodity)

plt.title(f"Normalized Commodity Prices ({df['Date'].min().year}-{df['Date'].max().year})", fontsize=14)
plt.xlabel('Date')
plt.ylabel('Price Index (First Month = 100)')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
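
The normalization used above is a simple rebasing: each price is divided by the series' starting value and multiplied by 100. A small worked example, with made-up prices and a hypothetical helper name `to_index`, shows the arithmetic and how a missing first observation can be handled:

```python
import numpy as np
import pandas as pd

def to_index(series, base=100.0):
    """Rebase a price series so its first non-missing value equals `base`."""
    first_valid = series.dropna().iloc[0]
    return series / first_valid * base

# Made-up prices; the first observation is missing on purpose.
prices = pd.Series([np.nan, 50.0, 75.0, 25.0])
indexed = to_index(prices)
print(indexed.tolist())  # [nan, 100.0, 150.0, 50.0]
```

An index of 150 means the price is 50% above its starting level, which is what makes commodities quoted in different units comparable on one axis.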

## 3. Compare Price Correlations

Let's analyze how different commodity prices correlate with each other.

In [None]:
# Calculate price correlations
price_corr = df[price_cols].corr()

# Rename for better display
price_corr.columns = [col.split('_')[0] for col in price_corr.columns]
price_corr.index = [idx.split('_')[0] for idx in price_corr.index]

# Visualize correlations
plt.figure(figsize=(10, 8))
sns.heatmap(price_corr, annot=True, cmap='coolwarm', center=0, fmt='.2f')
plt.title('Correlation Between Commodity Prices', fontsize=14)
plt.tight_layout()
plt.show()
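
One caveat with correlating price levels: two series that merely share a long-run trend can show a high correlation even when their month-to-month movements are unrelated. A common complement is to correlate percentage changes instead. The sketch below illustrates the difference on synthetic data; the series names and parameters are assumptions made for the demonstration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 96  # eight years of monthly observations

# Two hypothetical price series with a shared upward drift
# but independent month-to-month shocks.
levels = pd.DataFrame({
    "A_Price": 100 + np.cumsum(1.0 + rng.normal(0, 1.0, n)),
    "B_Price": 50 + np.cumsum(0.5 + rng.normal(0, 0.5, n)),
})

level_corr = levels.corr().loc["A_Price", "B_Price"]
return_corr = levels.pct_change().dropna().corr().loc["A_Price", "B_Price"]
print(f"levels: {level_corr:.2f}, returns: {return_corr:.2f}")
```

For the notebook's data, the equivalent would be `df[price_cols].pct_change().corr()`, which can be passed to the same heatmap call.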

## 4. Cross-Commodity Climate Analysis

Let's analyze how climate variables from one region correlate with prices of various commodities.

In [None]:
# Pick the first column whose name matches each climate variable (None if there is no match)
temp_col = next((col for col in climate_cols if 'temperature' in col.lower()), None)
precip_col = next((col for col in climate_cols if 'precip' in col.lower()), None)
drought_col = next((col for col in climate_cols if 'drought' in col.lower()), None)

# Create correlation matrix between climate variables and commodity prices
climate_price_corr = pd.DataFrame()

climate_vars = [temp_col, precip_col, drought_col]
climate_vars = [var for var in climate_vars if var is not None]

for var in climate_vars:
    corr_values = {}
    for price_col in price_cols:
        commodity = price_col.split('_')[0]
        correlation = df[var].corr(df[price_col])
        corr_values[commodity] = correlation
    
    climate_price_corr[var] = pd.Series(corr_values)

# Visualize climate-price correlations
plt.figure(figsize=(12, 8))
sns.heatmap(climate_price_corr, annot=True, cmap='coolwarm', center=0, fmt='.2f')
plt.title('Climate Variable Correlations with Commodity Prices', fontsize=14)
plt.tight_layout()
plt.show()
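
Same-month correlations can miss delayed effects, since weather during the growing season may only show up in prices some months later. The sketch below computes correlations at several lags using `Series.shift`. The function name `lagged_corr`, the 3-month delay, and the synthetic data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def lagged_corr(climate, price, max_lag=6):
    """Correlation between climate at month t - lag and price at month t."""
    return pd.Series(
        {lag: climate.shift(lag).corr(price) for lag in range(max_lag + 1)},
        name="correlation",
    )

# Synthetic example: price responds to climate from 3 months earlier.
rng = np.random.default_rng(7)
climate = pd.Series(rng.normal(0, 1, 120))
price = 100 + 5 * climate.shift(3) + rng.normal(0, 0.5, 120)

result = lagged_corr(climate, price)
best_lag = result.abs().idxmax()
print(result.round(2))
print("strongest lag:", best_lag)
```

With the real data, `lagged_corr(df[var], df[price_col])` could be evaluated for each pair in place of the single same-month correlation.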

## 5. Seasonal Price Analysis

Let's analyze seasonal patterns in all commodity prices.

In [None]:
# Plot seasonal price patterns for each commodity
plt.figure(figsize=(14, 8))

for price_col in price_cols:
    commodity = price_col.split('_')[0]
    
    # Calculate monthly averages
    monthly_avg = df.groupby('Month')[price_col].mean()
    
    # Normalize to percentage deviation from annual mean
    annual_mean = monthly_avg.mean()
    normalized_monthly = (monthly_avg / annual_mean - 1) * 100
    
    # Plot seasonal pattern
    plt.plot(normalized_monthly.index, normalized_monthly.values,
             marker='o', linewidth=2, label=commodity)
    
plt.title('Seasonal Price Patterns by Commodity', fontsize=14)
plt.xlabel('Month')
plt.ylabel('% Deviation from Annual Mean')
plt.xticks(range(1, 13), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.axhline(y=0, color='black', linestyle='-', alpha=0.3)
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
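
Averaging raw prices by calendar month can mix a long-run trend into the seasonal pattern. One possible refinement, sketched here on synthetic data, is to divide each price by a 12-month centered rolling mean before grouping by month. The trend, the 10% seasonal amplitude, and the April peak are assumptions built into the example, not properties of the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: rising trend times a seasonal factor peaking in April.
dates = pd.date_range("2015-01-01", periods=96, freq="MS")
months = dates.month.to_numpy()
trend = np.linspace(100, 200, len(dates))
seasonal = 1 + 0.10 * np.sin(2 * np.pi * (months - 1) / 12)
price = pd.Series(trend * seasonal, index=dates)

# Ratio to a 12-month centered rolling mean removes most of the trend.
# (With an even window the centering is off by half a month; this is an approximation.)
rolling_mean = price.rolling(window=12, center=True).mean()
ratio = (price / rolling_mean).dropna()

# Average ratio per calendar month, as % deviation from the local mean.
detrended_seasonal = (ratio.groupby(ratio.index.month).mean() - 1) * 100
print(detrended_seasonal.round(1))
```

The resulting series has the same shape as `normalized_monthly` in the cell above, so it can be plotted with the same code.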

## 6. Temperature Sensitivity Analysis

Let's analyze how sensitive different commodities are to temperature changes.

In [None]:
# Only run if we have a temperature column
if temp_col is not None:
    # Create a multi-panel figure
    fig, axes = plt.subplots(3, 3, figsize=(18, 14))
    axes = axes.flatten()
    
    # Scatter plot for each commodity
    for i, price_col in enumerate(price_cols):
        if i < len(axes):
            commodity = price_col.split('_')[0]
            ax = axes[i]
            
            # Scatter plot
            ax.scatter(df[temp_col], df[price_col], alpha=0.6)
            
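            # Add trend line (np.polyfit cannot handle NaN, so drop incomplete pairs)
            pair = df[[temp_col, price_col]].dropna()
            z = np.polyfit(pair[temp_col], pair[price_col], 1)
            p = np.poly1d(z)
            x_range = np.linspace(pair[temp_col].min(), pair[temp_col].max(), 100)
            ax.plot(x_range, p(x_range), 'r--')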
            
            # Add correlation coefficient
            corr = df[temp_col].corr(df[price_col])
            ax.text(0.05, 0.95, f"Correlation: {corr:.2f}",
                   transform=ax.transAxes, fontsize=12,
                   bbox=dict(facecolor='white', alpha=0.7))
            
            ax.set_title(f'{commodity}: Price vs. Temperature')
            ax.set_xlabel('Temperature (°C)')
            ax.set_ylabel('Price')
            ax.grid(True, alpha=0.3)
    
    # Hide any unused subplots
    for j in range(len(price_cols), len(axes)):
        axes[j].set_visible(False)
    
    plt.tight_layout()
    plt.suptitle('Temperature Sensitivity by Commodity', fontsize=20, y=1.02)
    plt.show()
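
The trend lines in the scatter plots can also be summarized as a single number per commodity: the least-squares slope, read as price units per degree. The sketch below is one way to compute it while skipping incomplete rows. `temperature_slope` is a hypothetical helper and the data are made up so that the slope is exactly 2.

```python
import numpy as np
import pandas as pd

def temperature_slope(frame, temp_col, price_col):
    """Least-squares slope of price on temperature, ignoring incomplete rows."""
    pair = frame[[temp_col, price_col]].dropna()
    slope, _intercept = np.polyfit(pair[temp_col], pair[price_col], 1)
    return slope

# Hypothetical data: price rises by 2 units per degree; one reading is missing.
demo = pd.DataFrame({
    "temperature": [10.0, 12.0, np.nan, 16.0, 18.0],
    "Wheat_Price": [120.0, 124.0, 130.0, 132.0, 136.0],
})
slope = temperature_slope(demo, "temperature", "Wheat_Price")
print(round(float(slope), 3))
```

Because commodities are quoted in different units, slopes are easier to compare across commodities when prices are first rebased to an index.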

## 7. Commodity Price Clustering

Let's use clustering to identify commodities that behave similarly in terms of price movements.

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Prepare data for clustering (drop months with any missing price)
price_data = df[price_cols].dropna()

# Standardize each commodity's series, then transpose so that each row
# is one commodity; KMeans and PCA treat rows as the items to group
scaler = StandardScaler()
price_scaled = scaler.fit_transform(price_data).T

# Reduce dimensions with PCA
pca = PCA(n_components=2)
price_pca = pca.fit_transform(price_scaled)

# Cluster the commodities
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(price_scaled)

# Create a DataFrame with PCA results (one row per commodity)
pca_df = pd.DataFrame(price_pca, columns=['Component 1', 'Component 2'])
pca_df['Commodity'] = [col.split('_')[0] for col in price_cols]
pca_df['Cluster'] = clusters

# Plot the clusters
plt.figure(figsize=(12, 8))
for cluster in pca_df['Cluster'].unique():
    cluster_data = pca_df[pca_df['Cluster'] == cluster]
    plt.scatter(cluster_data['Component 1'], cluster_data['Component 2'], label=f'Cluster {cluster}', s=100)

# Add commodity labels
for i, row in pca_df.iterrows():
    plt.annotate(row['Commodity'], 
                 (row['Component 1'], row['Component 2']),
                 xytext=(5, 5), textcoords='offset points',
                 fontsize=12, fontweight='bold')

plt.title('Commodity Price Pattern Clustering', fontsize=14)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Print cluster memberships
for cluster in pca_df['Cluster'].unique():
    commodities = pca_df[pca_df['Cluster'] == cluster]['Commodity'].tolist()
    print(f"Cluster {cluster}: {', '.join(commodities)}")
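
Clustering groups the rows of the array it receives, so to group commodities each commodity has to be a row and each month a feature. The sketch below makes that orientation explicit and, as a library-free sanity check, computes pairwise distances between commodities. The three synthetic series are assumptions: A and B are built to move together and C to move in the opposite direction.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
t = np.arange(60)
base = np.sin(t / 6.0)

# Hypothetical prices: A and B follow the same pattern, C mirrors it.
prices = pd.DataFrame({
    "A_Price": 100 + 10 * base + rng.normal(0, 0.5, t.size),
    "B_Price": 40 + 4 * base + rng.normal(0, 0.2, t.size),
    "C_Price": 70 - 7 * base + rng.normal(0, 0.4, t.size),
})

# Standardize each column, then transpose so each ROW is one commodity.
z = (prices - prices.mean()) / prices.std(ddof=0)
features = z.T.to_numpy()  # shape: (n_commodities, n_months)

# Pairwise Euclidean distances between commodity rows.
diff = features[:, None, :] - features[None, :, :]
dist = pd.DataFrame(np.sqrt((diff ** 2).sum(axis=2)),
                    index=prices.columns, columns=prices.columns)
print(dist.round(2))
```

Commodities that land in the same cluster should also be close in a distance table like this one, which gives a quick way to cross-check the cluster memberships.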

## 8. Climate and Price Volatility Analysis

Let's analyze how climate variables relate to price volatility for different commodities.

In [None]:
# Calculate rolling volatility (standard deviation) for each commodity price
volatility_df = pd.DataFrame({'Date': df['Date']})

window = 3  # 3-month rolling window
for price_col in price_cols:
    commodity = price_col.split('_')[0]
    # Calculate percent changes
    price_pct_change = df[price_col].pct_change()
    # Calculate rolling std dev of percent changes
    volatility = price_pct_change.rolling(window=window).std() * 100  # Scale to percentage
    volatility_df[f'{commodity}_Volatility'] = volatility

# Add climate variables
for var in climate_vars:
    volatility_df[var] = df[var]

# Drop rows with NaN (the first `window` rows, plus any months with missing climate values)
volatility_df = volatility_df.dropna()

# Calculate correlations between climate variables and price volatility
volatility_cols = [col for col in volatility_df.columns if '_Volatility' in col]
climate_volatility_corr = pd.DataFrame()

for var in climate_vars:
    corr_values = {}
    for vol_col in volatility_cols:
        commodity = vol_col.split('_')[0]
        correlation = volatility_df[var].corr(volatility_df[vol_col])
        corr_values[commodity] = correlation
    
    climate_volatility_corr[var] = pd.Series(corr_values)

# Visualize climate-volatility correlations
plt.figure(figsize=(12, 8))
sns.heatmap(climate_volatility_corr, annot=True, cmap='coolwarm', center=0, fmt='.2f')
plt.title('Climate Variable Correlations with Price Volatility', fontsize=14)
plt.tight_layout()
plt.show()
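
The volatility measure used above is the rolling standard deviation of month-over-month percentage changes. A small worked example with made-up prices shows why the first `window` rows come out as NaN and have to be dropped:

```python
import pandas as pd

window = 3  # same 3-month window as in the cell above
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 103.0, 108.0])  # made-up values

# Percent change has no value for the first month.
returns = prices.pct_change()

# A 3-month window needs three valid returns, so positions 0..2 are NaN.
volatility = returns.rolling(window=window).std() * 100

print(volatility.round(2))
```

A short window like this reacts quickly but is noisy. A longer window gives smoother estimates at the cost of losing more rows at the start.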

## 9. Conclusions

The combined analysis above points to the following, with the caveat that correlations alone do not establish causation:

1. **Price Correlations**: Certain groups of commodities show similar price patterns, suggesting common drivers

2. **Climate Sensitivity**: Different commodities show varying degrees of sensitivity to temperature, precipitation, and drought conditions

3. **Seasonal Patterns**: Each commodity has its own distinct seasonal price pattern, often related to growing seasons

4. **Climate-Price Relationships**: Climate variables from one region can correlate with prices of various commodities, which is consistent with interconnected global agricultural markets

5. **Volatility Factors**: Climate conditions can correlate with price volatility, which is consistent with (but does not prove) climate instability contributing to market instability for agricultural commodities

This combined dataset provides a powerful tool for further analysis, including predictive modeling of price movements based on climate conditions.