# Climate data processing (Argentina) — Meteostat

This notebook builds a **monthly climate dataset for Argentina** by:
- downloading daily temperature metrics (`tavg`, `tmin`, `tmax`) from Meteostat for five major cities used as proxy locations,
- averaging across cities (unweighted) to obtain a **national proxy** time series,
- aggregating to **monthly** frequency (YYYY-MM),
- exporting a clean CSV used downstream by the SQL / DuckDB model.
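Written out, the two aggregation steps compute the following for each temperature variable (shown for `tavg`; `tmin` and `tmax` are handled identically). Here $T_{c,d}$ is the value for city $c$ on day $d$, $C_d$ is the set of cities with a non-missing value on day $d$, and $D_m$ is the set of days in month $m$ with at least one reporting city, since pandas skips missing values in both means:

$$
\bar{T}_d = \frac{1}{|C_d|} \sum_{c \in C_d} T_{c,d},
\qquad
\bar{T}_m = \frac{1}{|D_m|} \sum_{d \in D_m} \bar{T}_d
$$

Because $|C_d|$ can change from day to day, $\bar{T}_m$ is not always equal to the average of the five cities' monthly means; the two coincide when every city reports on every day of the month.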

**Output file**
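- `data/processed/climate_monthly_avg.csv` (the export cell uses the relative path `../data/processed/climate_monthly_avg.csv`, so the notebook is expected to run one directory below the project root)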

**Time period**
- 2017-01 to 2024-12 (adjustable via `start` and `end` in section 2)


## 1. Install and import dependencies

In [20]:
# %pip installs into the environment of the running kernel
%pip install meteostat





In [21]:
from meteostat import Daily, Point
import pandas as pd
from datetime import datetime

## 2. Configure time range and locations

In [16]:
start = datetime(2017, 1, 1)
end   = datetime(2024, 12, 31)


In [17]:
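# (latitude, longitude) in decimal degrees; Meteostat derives each Point's
# series from nearby weather stations rather than a single fixed station
cities = {
    "Buenos_Aires": Point(-34.61, -58.38),
    "Cordoba": Point(-31.42, -64.18),
    "Rosario": Point(-32.95, -60.64),
    "Mendoza": Point(-32.89, -68.83),
    "Tucuman": Point(-26.82, -65.22),
}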



## 3. Download daily data for each city

In [18]:
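weather_data = []

# Fetch daily observations for each location and tag the rows with the city name
for city, location in cities.items():
    data = Daily(location, start, end)
    df = data.fetch()  # DataFrame indexed by date ("time")
    df["city"] = city
    weather_data.append(df)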


In [19]:
# Stack all cities; reset_index turns the "time" index into a column
weather_df = pd.concat(weather_data).reset_index()
weather_df

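| | time | tavg | tmin | tmax | prcp | snow | wdir | wspd | wpgt | pres | tsun | city |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-01-01 | 30.3 | 27.5 | 35.0 | 0.0 | NaN | NaN | 11.1 | NaN | 1007.9 | NaN | Buenos_Aires |
| 1 | 2017-01-02 | 25.4 | 21.0 | 28.0 | NaN | NaN | NaN | 21.9 | NaN | 1011.8 | NaN | Buenos_Aires |
| 2 | 2017-01-03 | 25.8 | 23.7 | 28.1 | NaN | NaN | NaN | 14.9 | NaN | 1003.8 | NaN | Buenos_Aires |
| 3 | 2017-01-04 | 26.2 | 22.3 | 31.0 | NaN | NaN | NaN | 7.0 | NaN | 1004.5 | NaN | Buenos_Aires |
| 4 | 2017-01-05 | 25.1 | 19.7 | 30.8 | NaN | NaN | NaN | 9.6 | NaN | 1011.6 | NaN | Buenos_Aires |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14604 | 2024-12-26 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Tucuman |
| 14605 | 2024-12-27 | 25.6 | NaN | 33.9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Tucuman |
| 14606 | 2024-12-28 | 26.2 | NaN | 34.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Tucuman |
| 14607 | 2024-12-29 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Tucuman |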
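The tail of this output already shows missing days for Tucumán. A minimal sketch of a per-city coverage check, assuming `weather_df` has the `time`, `city` and `tavg` columns shown above and one row per city and day; the helper name `coverage_by_city` and the toy frame are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd


def coverage_by_city(df: pd.DataFrame, start, end, col: str = "tavg") -> pd.DataFrame:
    """Share of calendar days in [start, end] with a non-missing `col`, per city (hypothetical helper).

    Assumes one row per city and day.
    """
    expected_days = len(pd.date_range(start, end, freq="D"))
    # count() ignores NaN, so this is the number of days with a usable value
    observed = df.groupby("city")[col].count()
    return pd.DataFrame({
        "days_with_data": observed,
        "expected_days": expected_days,
        "coverage": observed / expected_days,
    })


# Toy data: two cities over four days, with gaps in the second city
toy = pd.DataFrame({
    "time": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"] * 2),
    "city": ["Buenos_Aires"] * 4 + ["Tucuman"] * 4,
    "tavg": [30.0, 25.0, 26.0, 26.0, 25.0, np.nan, 26.0, np.nan],
})
report = coverage_by_city(toy, "2017-01-01", "2017-01-04")
print(report)
```

On the notebook's data this would be called as `coverage_by_city(weather_df, start, end)`; comparing against the full date range also catches days that are absent entirely rather than present as NaN.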


## 4. Build a daily national average, aggregate to monthly, and export to CSV

In [22]:
# Unweighted mean across cities for each calendar day (NaN values are skipped)
daily_mean = (
    weather_df
    .groupby("time")[["tavg", "tmin", "tmax"]]
    .mean()
    .reset_index()
)
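On a day when a city has no reading, the "national" value is therefore the average of whichever cities did report. A minimal sketch on toy data that also records how many cities contributed each day; the `n_cities` column is an illustrative addition, not part of the notebook's output.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "time": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02"]),
    "city": ["Buenos_Aires", "Tucuman", "Buenos_Aires", "Tucuman"],
    "tavg": [30.0, 24.0, 26.0, np.nan],
})

daily = (
    toy.groupby("time")["tavg"]
    .agg(["mean", "count"])  # mean skips NaN; count = number of non-missing cities
    .rename(columns={"mean": "tavg", "count": "n_cities"})
    .reset_index()
)
print(daily)
```

Here the second day's value (26.0) comes from Buenos Aires alone. Filtering on a minimum `n_cities` is one way to keep the city mix constant; whether that is worth the lost days depends on the downstream use.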


In [23]:
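# Label each day with its calendar month (a pandas Period, e.g. 2017-01)
daily_mean["year_month"] = daily_mean["time"].dt.to_period("M")

# Monthly value = mean of the daily national means within the month
monthly_climate = (
    daily_mean
    .groupby("year_month")[["tavg", "tmin", "tmax"]]
    .mean()
    .reset_index()
)

# Store the key as a plain "YYYY-MM" string for the CSV / DuckDB side
monthly_climate["year_month"] = monthly_climate["year_month"].astype(str)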
monthly_climate.head()
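The `year_month` key is built by converting each timestamp to a monthly `Period` and then to a string, which yields the `YYYY-MM` labels seen in the output below. A small sketch of the same steps on toy dates spanning a month boundary:

```python
import pandas as pd

daily = pd.DataFrame({
    "time": pd.to_datetime(["2017-01-30", "2017-01-31", "2017-02-01"]),
    "tavg": [20.0, 22.0, 30.0],
})
daily["year_month"] = daily["time"].dt.to_period("M")  # Period('2017-01', 'M'), ...

monthly = daily.groupby("year_month")["tavg"].mean().reset_index()
monthly["year_month"] = monthly["year_month"].astype(str)  # '2017-01', '2017-02'
print(monthly)
```

Keeping the key as a zero-padded `YYYY-MM` string means it sorts chronologically as plain text, which is convenient when the CSV is joined against other monthly tables.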


| | year_month | tavg | tmin | tmax |
|---|---|---|---|---|
| 0 | 2017-01 | 26.184946 | 20.79586 | 32.185108 |
| 1 | 2017-02 | 24.659286 | 19.932202 | 30.517798 |
| 2 | 2017-03 | 21.813925 | 16.893333 | 27.658602 |
| 3 | 2017-04 | 17.732722 | 13.462611 | 24.071 |
| 4 | 2017-05 | 14.76586 | 10.345538 | 20.486452 |
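With the 2017-01 to 2024-12 window the table should have 96 rows, one per month. A minimal sketch of a completeness check that reindexes against the full month range so a missing month shows up explicitly; the helper name `missing_months` and the toy frame are hypothetical.

```python
import pandas as pd


def missing_months(monthly: pd.DataFrame, start: str, end: str) -> list[str]:
    """Return YYYY-MM labels in [start, end] that are absent or all-NaN in `monthly` (hypothetical helper)."""
    expected = pd.period_range(start, end, freq="M").astype(str)
    full = monthly.set_index("year_month").reindex(expected)
    # A month counts as missing only if every temperature column is NaN
    all_nan = full.isna().all(axis=1).to_numpy()
    return full.index[all_nan].tolist()


# Toy table with 2017-02 absent
toy = pd.DataFrame({
    "year_month": ["2017-01", "2017-03"],
    "tavg": [26.0, 22.0],
    "tmin": [21.0, 17.0],
    "tmax": [32.0, 28.0],
})
print(missing_months(toy, "2017-01", "2017-03"))
```

Applied to the notebook's output, `missing_months(monthly_climate, "2017-01", "2024-12")` should return an empty list if every month has data.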


In [24]:
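from pathlib import Path

# Path is relative to the notebook's working directory
out_path = Path("../data/processed/climate_monthly_avg.csv")
out_path.parent.mkdir(parents=True, exist_ok=True)  # to_csv does not create missing folders
monthly_climate.to_csv(out_path, index=False)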
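Because the SQL / DuckDB model reads this file back, a quick round-trip check can catch schema drift before it reaches the model. A sketch assuming the four columns shown in the previous output (`EXPECTED_COLUMNS` is an assumption, not something the notebook defines); it writes a toy table to a temporary directory so it runs standalone without touching `monthly_climate`.

```python
import tempfile
from pathlib import Path

import pandas as pd

EXPECTED_COLUMNS = ["year_month", "tavg", "tmin", "tmax"]  # assumed schema

toy_monthly = pd.DataFrame({
    "year_month": ["2017-01", "2017-02"],
    "tavg": [26.0, 24.5],
    "tmin": [20.5, 19.5],
    "tmax": [32.0, 30.5],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "climate_monthly_avg.csv"
    toy_monthly.to_csv(path, index=False)
    # Read year_month explicitly as text to match the YYYY-MM key used downstream
    back = pd.read_csv(path, dtype={"year_month": str})

assert back.columns.tolist() == EXPECTED_COLUMNS
assert back["year_month"].str.fullmatch(r"\d{4}-\d{2}").all()
assert back["year_month"].is_unique
print(back)
```

Pointing the same assertions at `../data/processed/climate_monthly_avg.csv` after the export cell would check the real file.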
