# 🌍 Climate & Spatial Analysis

> **PM Accelerator Mission**: "By making industry-leading tools and education available to individuals from all backgrounds, we level the playing field for future PM leaders."

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/moazmo/weather-trend-forecasting/blob/main/presentation/04_Climate_Spatial_Analysis.ipynb)

This notebook covers:
1. Long-term Climate Patterns
2. Geographic Temperature Distribution
3. Continental Comparisons
4. Climate Zone Analysis
5. Feature Importance for Temperature Prediction

In [10]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import LabelEncoder

# Load data
df = pd.read_csv('../data/processed/weather_cleaned.csv', parse_dates=['date'])
print(f"📊 Dataset: {len(df):,} records from {df['country'].nunique()} countries")

📊 Dataset: 102,652 records from 186 countries


## 1. Long-term Climate Patterns

### Yearly Temperature Trends

In [11]:
# Yearly average temperature trend
df['year'] = df['date'].dt.year
yearly_avg = df.groupby('year')['temperature_celsius'].mean().reset_index()

fig = px.line(
    yearly_avg, x='year', y='temperature_celsius',
    title='📈 Global Average Temperature by Year',
    labels={'temperature_celsius': 'Avg Temp (°C)', 'year': 'Year'},
    markers=True
)
fig.update_layout(template='plotly_dark')
fig.show()

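# Crude trend estimate: last-year mean minus first-year mean, divided by the
# number of yearly rows (an endpoint difference, not a regression slope)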
if len(yearly_avg) > 1:
    trend = (yearly_avg['temperature_celsius'].iloc[-1] - yearly_avg['temperature_celsius'].iloc[0]) / len(yearly_avg)
    print(f"\n📊 Temperature Trend: {'+' if trend > 0 else ''}{trend:.3f}°C per year")


📊 Temperature Trend: -0.975°C per year
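
The figure above is an endpoint difference, so it depends only on the first and last yearly means and on how the divisor is chosen. As a rough cross-check, the sketch below compares three estimates on a small hypothetical table: the endpoint difference per row, the endpoint difference per elapsed year, and a least-squares slope from `np.polyfit`. The values in `toy_yearly` are invented for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly averages, for illustration only (not the notebook's data)
toy_yearly = pd.DataFrame({
    'year': [2024, 2025, 2026],
    'temperature_celsius': [20.0, 19.0, 18.0],
})

years = toy_yearly['year']
temps = toy_yearly['temperature_celsius']

# Endpoint difference divided by the number of rows (the formula used above)
per_row = (temps.iloc[-1] - temps.iloc[0]) / len(toy_yearly)

# Endpoint difference divided by the number of elapsed years
per_elapsed_year = (temps.iloc[-1] - temps.iloc[0]) / (years.iloc[-1] - years.iloc[0])

# Least-squares slope over all yearly points; years are centred first,
# which leaves the slope unchanged and keeps the fit well conditioned
slope, intercept = np.polyfit(years - years.mean(), temps, 1)

print(f"per row: {per_row:.3f} | per elapsed year: {per_elapsed_year:.3f} | fitted slope: {slope:.3f}")
```

With only a handful of yearly points, and especially if the first or last year is only partly covered, any of these numbers should be read as a descriptive summary rather than a climate trend.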


### Seasonal Patterns by Climate Zone

![Climate Zone Seasonality](images/climate_zone_seasonality.png)


In [12]:
# Define climate zones based on latitude
def get_climate_zone(lat):
    abs_lat = abs(lat)
    if abs_lat < 23.5:
        return 'Tropical'
    elif abs_lat < 35:
        return 'Subtropical'
    elif abs_lat < 55:
        return 'Temperate'
    elif abs_lat < 66.5:
        return 'Subarctic'
    else:
        return 'Polar'

df['climate_zone'] = df['latitude'].apply(get_climate_zone)

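# Monthly patterns by climate zone
# (derive 'month' from the date if the cleaned CSV does not already provide it)
if 'month' not in df.columns:
    df['month'] = df['date'].dt.month
climate_monthly = df.groupby(['month', 'climate_zone'])['temperature_celsius'].mean().reset_index()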

fig = px.line(
    climate_monthly, x='month', y='temperature_celsius', color='climate_zone',
    title='🌡️ Seasonal Patterns by Climate Zone',
    labels={'temperature_celsius': 'Avg Temp (°C)', 'month': 'Month'},
    markers=True
)
fig.update_layout(template='plotly_dark')
fig.show()
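
The latitude bands in `get_climate_zone` can also be expressed as bin edges, which avoids a row-by-row `apply` on a large frame. The following is a minimal sketch using `pd.cut`. The names `ZONE_EDGES`, `ZONE_LABELS` and `climate_zone_vectorized` are hypothetical helpers introduced here, and the sample latitudes are made up to exercise the boundaries.

```python
import numpy as np
import pandas as pd

# Same latitude thresholds as get_climate_zone, expressed as bin edges
ZONE_EDGES = [0, 23.5, 35, 55, 66.5, np.inf]
ZONE_LABELS = ['Tropical', 'Subtropical', 'Temperate', 'Subarctic', 'Polar']

def climate_zone_vectorized(latitudes: pd.Series) -> pd.Series:
    """Label each latitude with a zone; right=False makes every bin [low, high)."""
    return pd.cut(latitudes.abs(), bins=ZONE_EDGES, labels=ZONE_LABELS, right=False)

# Hypothetical sample latitudes, including exact boundary values
sample = pd.Series([0.0, -23.5, 34.9, 55.0, -70.2])
zones = climate_zone_vectorized(sample)
print(zones.tolist())
```

Because `right=False` closes each bin on the left, a latitude of exactly 23.5 falls in the subtropical band, matching the strict `<` comparisons in the function above. The result is an ordered categorical, so the zones keep their equator-to-pole order when sorted.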

## 2. Geographic Temperature Distribution

![Geographic Distribution](images/geographic_distribution.png)


In [13]:
# Average temperature by country
country_stats = df.groupby('country').agg({
    'temperature_celsius': ['mean', 'std', 'min', 'max'],
    'latitude': 'first',
    'longitude': 'first'
}).reset_index()
country_stats.columns = ['country', 'temp_mean', 'temp_std', 'temp_min', 'temp_max', 'lat', 'lon']

fig = px.scatter_geo(
    country_stats,
    lat='lat', lon='lon',
    color='temp_mean',
    size='temp_std',
    hover_name='country',
    hover_data=['temp_min', 'temp_max'],
    title='🗺️ Global Temperature Distribution (Size = Variability)',
    color_continuous_scale='RdYlBu_r'
)
fig.update_layout(template='plotly_dark', geo=dict(bgcolor='rgba(0,0,0,0)'))
fig.show()
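
One detail worth checking before sizing markers by `temp_std`: a country with a single record has an undefined sample standard deviation (NaN), and a plotting library may reject or drop such rows. The sketch below shows one possible guard on a hypothetical two-country frame. The `marker_size` column is an assumption added for illustration, not part of the notebook.

```python
import pandas as pd

# Hypothetical readings: country 'B' has a single record, so its sample std is NaN
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B'],
    'temperature_celsius': [10.0, 14.0, 18.0, 25.0],
})

toy_stats = toy.groupby('country')['temperature_celsius'].agg(['mean', 'std']).reset_index()
toy_stats.columns = ['country', 'temp_mean', 'temp_std']

# Replace a missing spread with 0 so every row has a numeric marker size
toy_stats['marker_size'] = toy_stats['temp_std'].fillna(0)
print(toy_stats)
```

Treating a missing spread as zero is a display choice. Dropping those rows, or flagging them in the hover text, would be equally reasonable.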

## 3. Continental Comparisons

In [14]:
# Assign continents from coarse latitude/longitude boxes (simplified).
# Checks run in order, so the first matching box wins; unmatched points become 'Other'.
def get_continent(row):
    lat, lon = row['latitude'], row['longitude']
    if lat > 35 and -10 < lon < 60:
        return 'Europe'
    elif lat > 0 and lon > 60:
        return 'Asia'
    elif lat < 0 and lon > 100:
        return 'Oceania'
    elif -35 < lat < 35 and -20 < lon < 55:
        return 'Africa'
    elif lon < -30:
        return 'Americas'
    else:
        return 'Other'

df['continent'] = df.apply(get_continent, axis=1)

# Temperature distribution by continent
fig = px.box(
    df, x='continent', y='temperature_celsius',
    title='🌍 Temperature Distribution by Continent',
    color='continent'
)
fig.update_layout(template='plotly_dark', showlegend=False)
fig.show()

In [15]:
# Continental statistics
continent_stats = df.groupby('continent').agg({
    'temperature_celsius': ['mean', 'std'],
    'country': 'nunique'
}).round(2)
continent_stats.columns = ['Avg Temp (°C)', 'Std Dev', 'Countries']
print("📊 Continental Statistics:")
print(continent_stats.sort_values('Avg Temp (°C)', ascending=False))

📊 Continental Statistics:
           Avg Temp (°C)  Std Dev  Countries
continent                                   
Africa             26.23     5.87         56
Asia               24.41    10.23         30
Oceania            24.13     6.04         10
Other              23.61     9.64          5
Americas           21.20     6.68         36
Europe             16.05     9.85         51


## 4. Climate Zone Analysis

In [16]:
# Climate zone statistics
zone_stats = df.groupby('climate_zone').agg({
    'temperature_celsius': ['mean', 'std', 'min', 'max'],
    'humidity': 'mean',
    'country': 'nunique'
}).round(2)
zone_stats.columns = ['Avg Temp', 'Std', 'Min', 'Max', 'Avg Humidity', 'Countries']

print("📊 Climate Zone Statistics:")
print(zone_stats.sort_values('Avg Temp', ascending=False))

📊 Climate Zone Statistics:
              Avg Temp    Std   Min   Max  Avg Humidity  Countries
climate_zone                                                      
Tropical         25.56   4.56   4.3  47.1         73.05         93
Subtropical      24.82   9.18  -3.3  49.2         49.97         32
Temperate        16.47  10.41 -24.9  43.2         62.29         58
Subarctic        10.33   8.16 -11.8  32.3         75.04          8


In [17]:
# Heatmap: Temperature by Month and Climate Zone
pivot = df.pivot_table(values='temperature_celsius', index='climate_zone', columns='month', aggfunc='mean')

fig = px.imshow(
    pivot,
    title='🔥 Temperature Heatmap: Climate Zone × Month',
    labels=dict(x='Month', y='Climate Zone', color='Temp (°C)'),
    color_continuous_scale='RdYlBu_r'
)
fig.update_layout(template='plotly_dark')
fig.show()
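
`pivot_table` sorts its index alphabetically, so the heatmap rows would appear as Subarctic, Subtropical, Temperate, Tropical rather than in geographic order. A small sketch of reordering the rows from equator to pole is shown below on hypothetical values. `ZONE_ORDER` is an assumed helper list, not something defined in the notebook.

```python
import pandas as pd

ZONE_ORDER = ['Tropical', 'Subtropical', 'Temperate', 'Subarctic', 'Polar']

# Hypothetical monthly means for two months, for illustration only
toy = pd.DataFrame({
    'climate_zone': ['Temperate', 'Tropical', 'Subarctic', 'Temperate', 'Tropical', 'Subarctic'],
    'month': [1, 1, 1, 7, 7, 7],
    'temperature_celsius': [2.0, 26.0, -5.0, 21.0, 27.0, 14.0],
})
toy_pivot = toy.pivot_table(values='temperature_celsius', index='climate_zone',
                            columns='month', aggfunc='mean')

# Keep only the zones that are present, in equator-to-pole order
present = [zone for zone in ZONE_ORDER if zone in toy_pivot.index]
ordered_pivot = toy_pivot.reindex(present)
print(ordered_pivot)
```

Filtering `ZONE_ORDER` to the zones actually present avoids introducing all-NaN rows for bands with no data, such as `Polar` in the statistics table above.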

## 5. Feature Importance Analysis

A Random Forest regressor is used here to estimate which features matter most for predicting temperature.

![Feature Importance](images/feature_importance.png)


In [18]:
# Prepare features for importance analysis
feature_df = df[['latitude', 'longitude', 'month', 'humidity', 'pressure_mb', 'wind_kph', 'cloud', 'temperature_celsius']].dropna()

X = feature_df.drop('temperature_celsius', axis=1)
y = feature_df['temperature_celsius']

# Train Random Forest
rf = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)
rf.fit(X, y)

# Feature importance
importance = pd.DataFrame({
    'Feature': X.columns,
    'Importance': rf.feature_importances_
}).sort_values('Importance', ascending=True)

fig = px.bar(
    importance, x='Importance', y='Feature', orientation='h',
    title='🎯 Feature Importance for Temperature Prediction',
    color='Importance', color_continuous_scale='Blues'
)
fig.update_layout(template='plotly_dark', showlegend=False)
fig.show()

print("\n📊 Top 3 Most Important Features:")
# importance is sorted ascending, so reverse the last three rows to list the top feature first
for _, row in importance.tail(3).iloc[::-1].iterrows():
    print(f"   {row['Feature']}: {row['Importance']:.3f}")


📊 Top 3 Most Important Features:
   latitude: 0.394
   pressure_mb: 0.174
   month: 0.171


### Key Insights

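1. **Latitude** is the most important feature (0.394) - consistent with temperature being driven mainly by distance from the equator
2. **Pressure** (0.174) and **Month** (0.171) follow, almost tied - month captures seasonality
3. The remaining features (longitude, **Humidity**, wind speed and **Cloud Cover**) share the remaining ~0.26 of the importance, reflecting weaker weather correlations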
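
The importances above are impurity-based and come from a model fitted on the full dataset, so they say little about how well the model generalises. One common cross-check is permutation importance on a held-out split. The sketch below illustrates the idea on synthetic stand-in data. The columns, coefficients and the `noise` feature are assumptions made up for this example, not results from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): temperature depends strongly on
# latitude, weakly on month, and not at all on the 'noise' column
rng = np.random.default_rng(42)
n = 600
X_toy = pd.DataFrame({
    'latitude': rng.uniform(-60, 60, n),
    'month': rng.integers(1, 13, n),
    'noise': rng.normal(size=n),
})
y_toy = 30 - 0.4 * X_toy['latitude'].abs() + 0.3 * X_toy['month'] + rng.normal(scale=1.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(X_toy, y_toy, test_size=0.25, random_state=42)

rf_toy = RandomForestRegressor(n_estimators=50, random_state=42)
rf_toy.fit(X_train, y_train)

# Held-out R^2 indicates whether the model is worth interpreting at all
test_r2 = rf_toy.score(X_test, y_test)

# Permutation importance: drop in held-out score when one column is shuffled
perm = permutation_importance(rf_toy, X_test, y_test, n_repeats=5, random_state=42)
perm_importance = pd.Series(perm.importances_mean, index=X_toy.columns).sort_values(ascending=False)

print(f"Held-out R^2: {test_r2:.3f}")
print(perm_importance)
```

If the two rankings agree on the real data, that supports the insights listed above. If they disagree, the impurity-based ranking may be inflated by correlated or high-cardinality features.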

---

*Continue to Environmental Impact Analysis →*