# Meteo Bronze Pipeline

This notebook handles the *Bronze* stage of the Meteo data pipeline.  
It is responsible for fetching **raw weather data** from the [Open-Meteo API](https://open-meteo.com/en/docs)  
and saving it locally for later cleaning and processing in the Silver stage.

---

### 📋 Objectives
1. Define reference data for Norwegian price areas and their representative cities.
2. Create a function to fetch weather data from Open-Meteo’s ERA5 reanalysis model.
3. Download the data and store it as CSV for each city and year.

---

**Outputs:**
- `data/bronze/df_cities.csv` – city reference table  
- `data/bronze/meteo_<city>_<year>.csv` – raw downloaded data


In [1]:
import pandas as pd
from pathlib import Path
import requests
import json

# Set up paths
DATA_DIR = Path("../../data/bronze")
DATA_DIR.mkdir(parents=True, exist_ok=True)

print(f"✅ Data folder ready: {DATA_DIR.resolve()}")

✅ Data folder ready: /Users/fabianheflo/UNI_courses/IND320/IND320/data/bronze


## Task 1 — Define price areas and representative cities

Create a reference table for the five Norwegian price areas (NO1–NO5),  
their representative cities, and corresponding latitude/longitude coordinates.
This table will be used in later tasks to request weather data from Open-Meteo.

In [2]:
# Define cities and price areas
cities = [
    {"price_area": "NO1", "city": "Oslo", "latitude": 59.9139, "longitude": 10.7522},
    {"price_area": "NO2", "city": "Kristiansand", "latitude": 58.1467, "longitude": 7.9956},
    {"price_area": "NO3", "city": "Trondheim", "latitude": 63.4305, "longitude": 10.3951},
    {"price_area": "NO4", "city": "Tromsø", "latitude": 69.6492, "longitude": 18.9553},
    {"price_area": "NO5", "city": "Bergen", "latitude": 60.3913, "longitude": 5.3221},
]

df_cities = pd.DataFrame(cities)
display(df_cities)

# Save to bronze folder
path = DATA_DIR / "df_cities.csv"
df_cities.to_csv(path, index=False)
print(f"💾 Saved city reference table → {path}")

| | price_area | city | latitude | longitude |
|---|---|---|---|---|
| 0 | NO1 | Oslo | 59.9139 | 10.7522 |
| 1 | NO2 | Kristiansand | 58.1467 | 7.9956 |
| 2 | NO3 | Trondheim | 63.4305 | 10.3951 |
| 3 | NO4 | Tromsø | 69.6492 | 18.9553 |
| 4 | NO5 | Bergen | 60.3913 | 5.3221 |


💾 Saved city reference table → ../../data/bronze/df_cities.csv


In [3]:
# Verify file was saved correctly
pd.read_csv(DATA_DIR / "df_cities.csv").head()

| | price_area | city | latitude | longitude |
|---|---|---|---|---|
| 0 | NO1 | Oslo | 59.9139 | 10.7522 |
| 1 | NO2 | Kristiansand | 58.1467 | 7.9956 |
| 2 | NO3 | Trondheim | 63.4305 | 10.3951 |
| 3 | NO4 | Tromsø | 69.6492 | 18.9553 |
| 4 | NO5 | Bergen | 60.3913 | 5.3221 |
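
Reading the file back confirms that it exists, but a few explicit checks can catch silent mistakes in the reference table. The sketch below uses a hypothetical helper, `validate_cities`, which is not part of the original notebook. It works on a list of row dictionaries, such as the `cities` list above or `df_cities.to_dict("records")`, and assumes the table should contain exactly the price areas NO1 to NO5, each once, with coordinates in valid ranges.

```python
EXPECTED_AREAS = {"NO1", "NO2", "NO3", "NO4", "NO5"}  # assumption: exactly these five


def validate_cities(rows: list[dict]) -> list[str]:
    """Return a list of problems found in the city rows (empty if none)."""
    problems = []
    areas = [row["price_area"] for row in rows]
    if len(areas) != len(set(areas)):
        problems.append("duplicate price areas")
    if set(areas) != EXPECTED_AREAS:
        problems.append("price areas differ from NO1-NO5")
    for row in rows:
        if not -90 <= row["latitude"] <= 90:
            problems.append(f"{row['city']}: latitude out of range")
        if not -180 <= row["longitude"] <= 180:
            problems.append(f"{row['city']}: longitude out of range")
    return problems


# Deliberately incomplete sample: only two of the five price areas
sample = [
    {"price_area": "NO1", "city": "Oslo", "latitude": 59.9139, "longitude": 10.7522},
    {"price_area": "NO5", "city": "Bergen", "latitude": 60.3913, "longitude": 5.3221},
]
print(validate_cities(sample))  # flags the incomplete set of price areas
```

An empty list means the table passed every check, so a notebook cell could simply assert `not validate_cities(cities)` before saving.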


## Task 2 — Fetch historical weather data from Open-Meteo API

This task downloads historical *reanalysis* data (ERA5 model) from the Open-Meteo API  
for a selected city and year.  

**Goal:**  
- Build a reusable function that takes `latitude`, `longitude`, and `year` as input.  
- Select the same weather variables as used in part 1 of the project (e.g. temperature, precipitation).  
- Test the function on Bergen for 2019.  
- Save the result as CSV in the `data/bronze/` folder.


In [4]:
def fetch_meteo_data(latitude: float, longitude: float, year: int) -> pd.DataFrame:
    """
    Download hourly ERA5 reanalysis data from Open-Meteo for one location and calendar year.

    Returns a pandas DataFrame with one row per hour, plus the requested
    latitude, longitude, and year as columns.
    """
    url = "https://archive-api.open-meteo.com/v1/era5"

    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": f"{year}-01-01",
        "end_date": f"{year}-12-31",
        # Choose the same variables as in project part 1
        "hourly": ["temperature_2m", "precipitation", "wind_speed_10m"],
        # Let the API resolve the local time zone from the coordinates
        "timezone": "auto",
    }

    print(f"🔄 Fetching {year} data for {latitude=}, {longitude=}")
    # A timeout keeps the call from hanging if the server does not respond
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # Raise an exception on an HTTP error status (4xx/5xx)

    data = response.json()

    # Convert the hourly block to a DataFrame and tag it with the request metadata
    df = pd.DataFrame(data["hourly"])
    df["time"] = pd.to_datetime(df["time"])
    df["latitude"] = latitude
    df["longitude"] = longitude
    df["year"] = year

    return df

In [5]:
# Coordinates for Bergen (from df_cities)
lat, lon = 60.3913, 5.3221
year = 2019

df_bergen = fetch_meteo_data(lat, lon, year)
df_bergen.head()

🔄 Fetching 2019 data for latitude=60.3913, longitude=5.3221


| | time | temperature_2m | precipitation | wind_speed_10m | latitude | longitude | year |
|---|---|---|---|---|---|---|---|
| 0 | 2019-01-01 00:00:00 | 5.7 | 0.7 | 37.0 | 60.3913 | 5.3221 | 2019 |
| 1 | 2019-01-01 01:00:00 | 5.8 | 0.2 | 41.0 | 60.3913 | 5.3221 | 2019 |
| 2 | 2019-01-01 02:00:00 | 6.1 | 0.7 | 42.0 | 60.3913 | 5.3221 | 2019 |
| 3 | 2019-01-01 03:00:00 | 6.3 | 0.5 | 40.9 | 60.3913 | 5.3221 | 2019 |
| 4 | 2019-01-01 04:00:00 | 5.8 | 1.1 | 41.2 | 60.3913 | 5.3221 | 2019 |
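
The first five rows look plausible, but `head()` does not show whether the whole year arrived. A simple completeness check compares the number of rows with the number of hours in the year. The sketch below assumes the API returns exactly 24 hourly values for every requested day; the helper names `expected_hourly_rows` and `is_complete` are hypothetical and not part of the original notebook.

```python
import calendar


def expected_hourly_rows(year: int) -> int:
    """Number of hourly rows in a full calendar year at 24 values per day."""
    days = 366 if calendar.isleap(year) else 365
    return 24 * days


def is_complete(row_count: int, year: int) -> bool:
    """True if the row count matches a full year of hourly values."""
    return row_count == expected_hourly_rows(year)


print(expected_hourly_rows(2019))  # 8760
print(is_complete(8760, 2019))     # True
```

In the notebook this could be called as `is_complete(len(df_bergen), year)` right after the fetch.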


In [6]:
# Save raw result
output_file = DATA_DIR / f"meteo_bergen_{year}.csv"
df_bergen.to_csv(output_file, index=False)
print(f"💾 Saved raw data → {output_file}")

💾 Saved raw data → ../../data/bronze/meteo_bergen_2019.csv
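
The cells above cover one city and one year, while the objectives call for every city and year. One way to extend this is to first plan which files are still missing and only fetch those, so that re-running the notebook does not download the same data again. The sketch below is an assumption about how this could be organised: `plan_downloads` is a hypothetical helper, and the lower-cased city name in the file name follows the `meteo_bergen_2019.csv` pattern used above.

```python
from pathlib import Path


def plan_downloads(rows, years, data_dir):
    """List (row, year, path) jobs whose output file does not exist yet."""
    jobs = []
    for row in rows:
        for year in years:
            # Assumption: file names use the lower-cased city name
            path = Path(data_dir) / f"meteo_{row['city'].lower()}_{year}.csv"
            if not path.exists():
                jobs.append((row, year, path))
    return jobs


rows = [
    {"city": "Oslo", "latitude": 59.9139, "longitude": 10.7522},
    {"city": "Bergen", "latitude": 60.3913, "longitude": 5.3221},
]
for row, year, path in plan_downloads(rows, [2019, 2020], "data/bronze"):
    print(row["city"], year, path.name)
```

Each planned job could then be passed to `fetch_meteo_data(row["latitude"], row["longitude"], year)` and the result written with `to_csv(path, index=False)`, as in the Bergen example.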
