# 📊 Model Improvement: Month-Specific Temperature Averages

This notebook computes month-specific temperature averages per country to improve forecast initialization.

**Current issue:** The model initializes its lag features from each country's overall mean temperature, which ignores seasonal patterns.

**Solution:** Use month-specific averages so predictions start from seasonally-appropriate values.

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px

print("✅ Libraries loaded")

✅ Libraries loaded


## 1. Load Data

In [2]:
# Load cleaned weather data
df = pd.read_csv('../data/processed/weather_cleaned.csv', parse_dates=['date'])
print(f"📊 Loaded {len(df):,} rows")
print(f"🌍 Countries: {df['country'].nunique()}")
print(f"📅 Date range: {df['date'].min()} to {df['date'].max()}")

📊 Loaded 102,652 rows
🌍 Countries: 186
📅 Date range: 2024-06-15 00:00:00 to 2025-12-24 00:00:00


In [3]:
# Extract month
df['month'] = df['date'].dt.month
df.head()

Unnamed: 0,country,date,temperature_celsius,humidity,pressure_mb,wind_kph,precip_mm,cloud,uv_index,latitude,...,temp_lag_1,temp_lag_2,temp_lag_3,temp_lag_7,temp_lag_14,temp_lag_30,temp_rolling_mean_7,temp_rolling_mean_14,temp_rolling_std_7,country_encoded
0,Afghanistan,2024-06-15,22.4,38.0,1009.0,9.4,0.0,27.0,6.0,34.52,...,21.0,27.7,26.6,24.1,22.5,24.3,25.128571,23.55,2.244782,0
1,Afghanistan,2024-06-16,26.3,27.0,1006.0,17.6,0.0,31.0,7.0,34.52,...,22.4,21.0,27.7,24.4,26.5,15.0,24.885714,23.542857,2.456575,0
2,Afghanistan,2024-06-17,27.0,27.0,1006.0,11.5,0.0,16.0,7.0,34.52,...,26.3,22.4,21.0,25.3,26.1,19.5,25.157143,23.528571,2.498571,0
3,Afghanistan,2024-06-18,26.8,19.0,1002.0,21.6,0.0,3.0,7.0,34.52,...,27.0,26.3,22.4,26.8,24.3,16.9,25.4,23.592857,2.595509,0
4,Afghanistan,2024-06-19,26.3,18.0,1001.0,31.0,0.0,0.0,7.0,34.52,...,26.8,27.0,26.3,26.6,19.0,14.1,25.4,23.771429,2.595509,0
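
Because the data runs from 2024-06-15 to 2025-12-24, months 6 to 12 are observed in two calendar years while months 1 to 5 appear in only one, so the monthly means rest on uneven sample sizes. The sketch below illustrates a quick coverage check; `toy` is a hypothetical stand-in for `df` that contains only a `date` column.

```python
import pandas as pd

# Hypothetical stand-in for `df`: one row per day over the notebook's date range.
toy = pd.DataFrame({"date": pd.date_range("2024-06-15", "2025-12-24", freq="D")})
toy["month"] = toy["date"].dt.month
toy["year"] = toy["date"].dt.year

# Count how many distinct calendar years contribute to each month.
years_per_month = toy.groupby("month")["year"].nunique()
print(years_per_month.to_dict())
```

Running the same `groupby` on the real `df` (optionally per country) shows which monthly averages come from a single year of observations.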


## 2. Compute Month-Specific Averages

In [4]:
# Compute monthly temperature statistics per country
monthly_stats = df.groupby(['country', 'month'])['temperature_celsius'].agg(['mean', 'std', 'count']).reset_index()
monthly_stats.columns = ['country', 'month', 'temp_mean', 'temp_std', 'count']

print(f"📊 Monthly stats: {len(monthly_stats)} rows")
monthly_stats.head(12)

📊 Monthly stats: 2224 rows


Unnamed: 0,country,month,temp_mean,temp_std,count
0,Afghanistan,1,1.312903,1.731809,31
1,Afghanistan,2,4.789286,2.308333,28
2,Afghanistan,3,11.993548,4.641762,31
3,Afghanistan,4,22.986667,3.663213,30
4,Afghanistan,5,27.267742,2.905556,31
5,Afghanistan,6,29.747826,3.420704,46
6,Afghanistan,7,32.118333,2.190928,60
7,Afghanistan,8,30.664516,2.309148,62
8,Afghanistan,9,27.525,1.876403,60
9,Afghanistan,10,21.125806,2.640041,62
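
A full grid of 186 countries × 12 months would have 2232 rows, and `monthly_stats` has 2224, which suggests that 8 country-month pairs have no observations. The following sketch shows one way to list such gaps. It uses a small hypothetical frame (`toy_stats`) and a three-month grid for brevity; on the real data the grid would be `range(1, 13)`.

```python
import pandas as pd

# Hypothetical miniature of `monthly_stats`: country "B" has no data for month 2.
toy_stats = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "month": [1, 2, 3, 1, 3],
    "temp_mean": [1.0, 2.0, 3.0, 4.0, 6.0],
})
months = [1, 2, 3]  # use range(1, 13) for the real data

# Every (country, month) pair we expect to see.
full_index = pd.MultiIndex.from_product(
    [toy_stats["country"].unique(), months], names=["country", "month"]
)
# The pairs that are actually present.
present = pd.MultiIndex.from_frame(toy_stats[["country", "month"]])

# Pairs in the full grid that have no row in the stats table.
missing = full_index.difference(present)
print(list(missing))
```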


In [5]:
# Visualize seasonal patterns for a few countries
sample_countries = ['Egypt', 'Germany', 'Australia', 'Brazil', 'Japan']
sample = monthly_stats[monthly_stats['country'].isin(sample_countries)]

fig = px.line(sample, x='month', y='temp_mean', color='country',
              title='🌡️ Monthly Temperature Patterns by Country',
              labels={'temp_mean': 'Mean Temperature (°C)', 'month': 'Month'})
fig.update_xaxes(tickmode='linear', tick0=1, dtick=1)
fig.show()

## 3. Pivot to Wide Format

In [6]:
# Pivot: one row per country, columns for each month's mean
pivot = monthly_stats.pivot(index='country', columns='month', values='temp_mean')
pivot.columns = [f'temp_mean_month_{m}' for m in pivot.columns]
pivot = pivot.reset_index()

print(f"📊 Pivot shape: {pivot.shape}")
pivot.head()

📊 Pivot shape: (186, 13)


Unnamed: 0,country,temp_mean_month_1,temp_mean_month_2,temp_mean_month_3,temp_mean_month_4,temp_mean_month_5,temp_mean_month_6,temp_mean_month_7,temp_mean_month_8,temp_mean_month_9,temp_mean_month_10,temp_mean_month_11,temp_mean_month_12
0,Afghanistan,1.312903,4.789286,11.993548,22.986667,27.267742,29.747826,32.118333,30.664516,27.525,21.125806,14.796667,8.252727
1,Albania,13.425806,12.428571,16.067742,18.403333,21.664516,30.123913,32.428333,31.601613,25.445,18.503226,13.103333,9.334545
2,Algeria,15.019355,15.425,17.616129,19.7,22.274194,28.36087,29.794915,29.598387,26.336667,21.398387,16.051667,12.681818
3,Andorra,2.035484,4.128571,1.180645,7.21,11.03871,20.506522,21.461667,22.104839,14.22,10.474194,2.888333,0.581818
4,Angola,29.380645,29.264286,29.548387,28.626667,27.86129,25.628261,23.348333,22.73871,24.326667,25.432258,26.718333,26.967273


## 4. Merge with Existing Country Stats

In [7]:
# Load existing country stats
existing = pd.read_csv('../models/country_stats.csv')
print(f"📊 Existing stats: {existing.shape}")
print(f"📊 Columns: {existing.columns.tolist()}")

📊 Existing stats: (186, 8)
📊 Columns: ['country', 'latitude', 'longitude', 'temp_mean', 'temp_std', 'temp_min', 'temp_max', 'country_encoded']


In [8]:
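# Merge monthly stats. Drop monthly columns left by a previous run first; otherwise
# re-running after the save in section 5 would create `_x`/`_y` duplicate columns.
stale_cols = [c for c in pivot.columns if c != 'country']
merged = existing.drop(columns=stale_cols, errors='ignore').merge(pivot, on='country', how='left')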

# Fill any missing months with overall mean (for countries with sparse data)
for m in range(1, 13):
    col = f'temp_mean_month_{m}'
    if col in merged.columns:
        merged[col] = merged[col].fillna(merged['temp_mean'])
    else:
        merged[col] = merged['temp_mean']

print(f"📊 Merged shape: {merged.shape}")
print(f"📊 New columns: {[c for c in merged.columns if 'month' in c]}")

📊 Merged shape: (186, 20)
📊 New columns: ['temp_mean_month_1', 'temp_mean_month_2', 'temp_mean_month_3', 'temp_mean_month_4', 'temp_mean_month_5', 'temp_mean_month_6', 'temp_mean_month_7', 'temp_mean_month_8', 'temp_mean_month_9', 'temp_mean_month_10', 'temp_mean_month_11', 'temp_mean_month_12']
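
The left merge silently leaves `NaN` for any country whose name does not match between the two tables, and the fallback then replaces those gaps with the overall mean. As a hedged sketch, pandas' `validate` and `indicator` options on `merge` can surface such cases before the fill; the two small frames below are hypothetical stand-ins for `existing` and `pivot`.

```python
import pandas as pd

# Hypothetical stand-ins: country "C" has no monthly statistics.
existing_toy = pd.DataFrame({"country": ["A", "B", "C"],
                             "temp_mean": [10.0, 20.0, 30.0]})
pivot_toy = pd.DataFrame({"country": ["A", "B"],
                          "temp_mean_month_1": [5.0, 18.0]})

# validate raises if either side has duplicate countries;
# indicator adds a `_merge` column recording where each row came from.
checked = existing_toy.merge(pivot_toy, on="country", how="left",
                             validate="one_to_one", indicator=True)

# Countries that received no monthly data and will rely on the fallback.
unmatched = checked.loc[checked["_merge"] == "left_only", "country"].tolist()
print(unmatched)

# Drop the helper column, then apply the same fallback the notebook uses.
checked = checked.drop(columns="_merge")
checked["temp_mean_month_1"] = checked["temp_mean_month_1"].fillna(checked["temp_mean"])
```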


In [9]:
# Verify a sample country
sample = merged[merged['country'] == 'Egypt'][['country', 'temp_mean'] + [f'temp_mean_month_{m}' for m in range(1, 13)]]
print("Egypt monthly averages:")
print(sample.T)

Egypt monthly averages:
                           50
country                 Egypt
temp_mean           26.673694
temp_mean_month_1   19.354839
temp_mean_month_2   16.878571
temp_mean_month_3   22.812903
temp_mean_month_4       24.17
temp_mean_month_5   28.435484
temp_mean_month_6       32.15
temp_mean_month_7   34.842373
temp_mean_month_8   33.920968
temp_mean_month_9      30.195
temp_mean_month_10  26.467742
temp_mean_month_11  21.998333
temp_mean_month_12  18.312727


## 5. Save Updated Stats

In [10]:
# Save
merged.to_csv('../models/country_stats.csv', index=False)
print("✅ Saved updated country_stats.csv with monthly averages!")
print(f"📊 Final columns ({len(merged.columns)}): {merged.columns.tolist()}")

✅ Saved updated country_stats.csv with monthly averages!
📊 Final columns (20): ['country', 'latitude', 'longitude', 'temp_mean', 'temp_std', 'temp_min', 'temp_max', 'country_encoded', 'temp_mean_month_1', 'temp_mean_month_2', 'temp_mean_month_3', 'temp_mean_month_4', 'temp_mean_month_5', 'temp_mean_month_6', 'temp_mean_month_7', 'temp_mean_month_8', 'temp_mean_month_9', 'temp_mean_month_10', 'temp_mean_month_11', 'temp_mean_month_12']
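
Since the save overwrites the file that other components read, a round-trip check can catch a missing column or an unfilled `NaN` early. The sketch below writes a hypothetical one-row table to a temporary directory instead of `../models/`, so it runs without touching the real file.

```python
import os
import tempfile

import pandas as pd

month_cols = [f"temp_mean_month_{m}" for m in range(1, 13)]

# Hypothetical one-row stand-in for `merged`.
toy = pd.DataFrame({"country": ["A"], "temp_mean": [12.5]})
for col in month_cols:
    toy[col] = 12.5

# Write and immediately read back, as a consumer of the file would.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "country_stats.csv")
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# All twelve monthly columns should survive the round trip with no gaps.
missing_cols = [c for c in month_cols if c not in reloaded.columns]
has_nan = bool(reloaded[month_cols].isna().any().any())
print(missing_cols, has_nan)
```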


## 6. Impact Analysis

In [11]:
# Show the difference between overall mean and monthly mean for sample countries
countries = ['Egypt', 'Germany', 'Russia', 'Brazil', 'Australia']

for country in countries:
    row = merged[merged['country'] == country].iloc[0]
    overall = row['temp_mean']
    print(f"\n{country} (Overall: {overall:.1f}°C):")
    for m in range(1, 13):
        monthly = row[f'temp_mean_month_{m}']
        diff = monthly - overall
        sign = '+' if diff >= 0 else ''
        print(f"  Month {m:2d}: {monthly:5.1f}°C ({sign}{diff:.1f}°C)")


Egypt (Overall: 26.7°C):
  Month  1:  19.4°C (-7.3°C)
  Month  2:  16.9°C (-9.8°C)
  Month  3:  22.8°C (-3.9°C)
  Month  4:  24.2°C (-2.5°C)
  Month  5:  28.4°C (+1.8°C)
  Month  6:  32.1°C (+5.5°C)
  Month  7:  34.8°C (+8.2°C)
  Month  8:  33.9°C (+7.2°C)
  Month  9:  30.2°C (+3.5°C)
  Month 10:  26.5°C (-0.2°C)
  Month 11:  22.0°C (-4.7°C)
  Month 12:  18.3°C (-8.4°C)

Germany (Overall: 13.4°C):
  Month  1:   3.8°C (-9.6°C)
  Month  2:   2.6°C (-10.8°C)
  Month  3:   8.1°C (-5.3°C)
  Month  4:  13.5°C (+0.1°C)
  Month  5:  15.3°C (+1.8°C)
  Month  6:  22.6°C (+9.2°C)
  Month  7:  22.5°C (+9.1°C)
  Month  8:  22.6°C (+9.2°C)
  Month  9:  18.3°C (+4.9°C)
  Month 10:  11.4°C (-2.1°C)
  Month 11:   5.0°C (-8.4°C)
  Month 12:   4.4°C (-9.0°C)

Russia (Overall: 11.2°C):
  Month  1:   0.1°C (-11.1°C)
  Month  2:  -3.8°C (-15.0°C)
  Month  3:   5.6°C (-5.6°C)
  Month  4:  10.4°C (-0.8°C)
  Month  5:  15.5°C (+4.3°C)
  Month  6:  2
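
The per-country loop can also be expressed as a single vectorized subtraction, which makes it easy to rank countries by how far the overall mean strays from any monthly mean. This is a sketch rather than the notebook's method; it uses only the Egypt row printed in section 4, and `egypt` is an illustrative name.

```python
import pandas as pd

month_cols = [f"temp_mean_month_{m}" for m in range(1, 13)]

# Egypt's values as printed in the verification cell of section 4.
egypt_monthly = [19.354839, 16.878571, 22.812903, 24.17, 28.435484, 32.15,
                 34.842373, 33.920968, 30.195, 26.467742, 21.998333, 18.312727]
egypt = pd.DataFrame({"country": ["Egypt"], "temp_mean": [26.673694],
                      **{c: [v] for c, v in zip(month_cols, egypt_monthly)}})

# Monthly mean minus overall mean, for every country and month at once.
deviation = egypt[month_cols].sub(egypt["temp_mean"], axis=0)

# Month with the largest absolute gap, and the size of that gap.
worst_col = deviation.abs().idxmax(axis=1)
worst_val = deviation.abs().max(axis=1)
print(worst_col.iloc[0], round(worst_val.iloc[0], 1))
```

Applied to the full `merged` table, sorting `worst_val` in descending order would show where seasonal initialization matters most.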

---
## ✅ Done!

The `country_stats.csv` now includes columns:
- `temp_mean_month_1` through `temp_mean_month_12`

**Next step:** Update the web app to use these monthly averages for lag initialization.
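
As a rough illustration of that next step, the lookup could be a small helper that returns the month-specific mean and falls back to the overall mean when the monthly value is absent. The function name `initial_temperature` and the toy `stats` frame are hypothetical; the web app's actual interface is not shown in this notebook.

```python
import pandas as pd


def initial_temperature(stats: pd.DataFrame, country: str, month: int) -> float:
    """Return a seasonally appropriate starting temperature for lag features.

    Hypothetical helper: uses `temp_mean_month_<month>` when available and
    falls back to the overall `temp_mean` otherwise.
    """
    rows = stats.loc[stats["country"] == country]
    if rows.empty:
        raise KeyError(f"No stats for country: {country}")
    row = rows.iloc[0]
    value = row.get(f"temp_mean_month_{month}")  # None if the column is missing
    if value is None or pd.isna(value):
        value = row["temp_mean"]
    return float(value)


# Toy table using two of Egypt's values from this notebook; month 2 is left
# empty and month 3 has no column, to exercise both fallback paths.
stats = pd.DataFrame({
    "country": ["Egypt"],
    "temp_mean": [26.673694],
    "temp_mean_month_1": [19.354839],
    "temp_mean_month_2": [float("nan")],
})

print(initial_temperature(stats, "Egypt", 1))
print(initial_temperature(stats, "Egypt", 2))
```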