IND 320 - NMBU

Project work, part 2 - Data Sources

\newpage

## AI usage


## Log describing



## Github and Streamlit app links

- Streamlit app: [https://liserochat-ind320-dashboard.streamlit.app](https://liserochat-ind320-dashboard.streamlit.app)  
- GitHub repository: [https://github.com/lise-dev/liserochat-ind320-dashboard.git](https://github.com/lise-dev/liserochat-ind320-dashboard.git)

\newpage

## Step 1 – Setup and imports

In [7]:
import os
from pathlib import Path
import pandas as pd
import requests
from datetime import datetime

## Step 2 – Define constants

In [14]:
API_BASE = "https://api.elhub.no/energy-data/v0/price-areas"  
DATASET  = "PRODUCTION_PER_GROUP_MBA_HOUR"
PRICE_AREAS = ["NO1","NO2","NO3","NO4","NO5"]
PROD_GROUPS = ["solar","hydro","wind","thermal","other"]
MONTHS = [f"2021-{m:02d}" for m in range(1,13)]


## Step 3 – Define helper for monthly time ranges

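The Elhub API uses UTC timestamps.  
This function takes a year-month string (e.g., `2021-01`) and returns the start and end of that month as ISO 8601 strings in UTC. The end is the first instant of the following month, so the pair describes a half-open interval.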

In [9]:
def month_range_utc(ym: str):
    start = pd.Timestamp(f"{ym}-01 00:00:00", tz="UTC")
    end   = (start + pd.offsets.MonthEnd(1)) + pd.Timedelta(days=1)
    def fmt(ts):
        s = ts.strftime("%Y-%m-%dT%H:%M:%S%z")
        return s[:-2] + ":" + s[-2:]
    return fmt(start), fmt(end)
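
To make the returned format concrete, here is a minimal sketch of the same idea. `month_bounds_utc` is a hypothetical, more compact variant (not the notebook's function) that relies on `Timestamp.isoformat()` to produce the `+00:00` offset instead of editing the `strftime` output by hand.

```python
import pandas as pd

def month_bounds_utc(ym: str):
    """Return (start, end) of month `ym` (YYYY-MM) as ISO 8601 strings in UTC.

    Hypothetical variant of month_range_utc, for illustration only.
    """
    start = pd.Timestamp(f"{ym}-01", tz="UTC")
    end = start + pd.offsets.MonthBegin(1)  # first instant of the next month
    return start.isoformat(), end.isoformat()

bounds = month_bounds_utc("2021-02")
print(bounds)
# ('2021-02-01T00:00:00+00:00', '2021-03-01T00:00:00+00:00')
```

Because the end bound is the start of the next month, consecutive months neither overlap nor leave a gap.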

## Step 4 – Fetch hourly production data for one area, month, and production group

This function:
1. Builds the API query with `dataset`, `startDate`, `endDate`, and `productionGroup`.
2. Sends a GET request to the Elhub endpoint for the given price area.
3. Extracts the `productionPerGroupMbaHour` list from the response attributes.
4. Converts it into a clean Pandas DataFrame, with `start_time` parsed as UTC, and with columns:
   - `price_area`
   - `production_group`
   - `start_time`
   - `quantity_kwh`
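
The query-building step can be previewed without touching the network. The sketch below assembles the URL that one request would resolve to; `build_request_url` is a hypothetical helper, and the query parameter names are simply the ones used in this notebook's code, not checked against the Elhub documentation.

```python
from calendar import monthrange
from urllib.parse import urlencode

API_BASE = "https://api.elhub.no/energy-data/v0/price-areas"

def build_request_url(area: str, ym: str, group: str) -> str:
    """Assemble the request URL for one price area, month, and production group.

    Hypothetical helper for illustration; it only formats strings.
    """
    year, month = map(int, ym.split("-"))
    last_day = monthrange(year, month)[1]  # number of days in the month
    params = {
        "dataset": "PRODUCTION_PER_GROUP_MBA_HOUR",
        "startDate": f"{ym}-01",
        "endDate": f"{ym}-{last_day:02d}",
        "productionGroup": group,
    }
    return f"{API_BASE}/{area}?{urlencode(params)}"

url = build_request_url("NO1", "2021-02", "solar")
print(url)
```

Printing the URL before running the full loop is a cheap way to spot a malformed date or a misspelled parameter.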


In [None]:
def fetch_month_one_group(area: str, ym: str, group: str) -> pd.DataFrame:
    # yyyy-mm → date-only strings (the API accepts this format)
    start = pd.Timestamp(f"{ym}-01").strftime("%Y-%m-%d")
    end   = (pd.Timestamp(f"{ym}-01") + pd.offsets.MonthEnd(1)).strftime("%Y-%m-%d")

    url = f"{API_BASE}/{area}"
    params = {
        "dataset": DATASET,
        "startDate": start,
        "endDate": end,
        "productionGroup": group,
    }
    r = requests.get(url, params=params, timeout=60)
    r.raise_for_status()
    js = r.json()

    data = js.get("data", [])
    if not data:
        return pd.DataFrame(columns=["price_area","production_group","start_time","quantity_kwh"])
    attrs = data[0].get("attributes", {})
    items = attrs.get("productionPerGroupMbaHour", [])

    if not items:
        return pd.DataFrame(columns=["price_area","production_group","start_time","quantity_kwh"])

    df = (pd.json_normalize(items)[["priceArea","productionGroup","startTime","quantityKwh"]]
            .rename(columns={
                "priceArea":"price_area",
                "productionGroup":"production_group",
                "startTime":"start_time",
                "quantityKwh":"quantity_kwh",
            }))

    df["start_time"] = pd.to_datetime(df["start_time"], utc=True, errors="coerce")
    return df
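
The flattening step can be tried offline on a hand-written payload. The sample below is made up: it only mirrors the nesting that the function reads (`data` → `attributes` → `productionPerGroupMbaHour`), the quantities are taken from the first rows shown in Step 5, and the `+01:00` offset in `startTime` is an assumption about how the API formats local time.

```python
import pandas as pd

# Made-up payload shaped like the structure the function expects (assumption)
sample = {
    "data": [{
        "attributes": {
            "productionPerGroupMbaHour": [
                {"priceArea": "NO1", "productionGroup": "solar",
                 "startTime": "2021-01-01T00:00:00+01:00", "quantityKwh": 6.106},
                {"priceArea": "NO1", "productionGroup": "solar",
                 "startTime": "2021-01-01T01:00:00+01:00", "quantityKwh": 4.03},
            ]
        }
    }]
}

items = sample["data"][0]["attributes"]["productionPerGroupMbaHour"]

# Flatten the list of records and switch to snake_case column names
df = pd.json_normalize(items).rename(columns={
    "priceArea": "price_area",
    "productionGroup": "production_group",
    "startTime": "start_time",
    "quantityKwh": "quantity_kwh",
})

# Normalise every timestamp to UTC, whatever offset it arrived with
df["start_time"] = pd.to_datetime(df["start_time"], utc=True, errors="coerce")
print(df)
```

With `utc=True`, midnight local time on 1 January becomes `2020-12-31 23:00:00+00:00`, which is why a 2021 dataset can start with a timestamp dated 2020.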

## Step 5 – Loop over all areas and months of 2021

We now loop through all five Norwegian price areas, all twelve months of 2021, and all five production groups.  
For each combination, we call `fetch_month_one_group()` and concatenate the resulting data frames.


In [11]:
all_chunks = []
for area in PRICE_AREAS:
    for ym in MONTHS:
        for g in PROD_GROUPS:
            try:
                dfm = fetch_month_one_group(area, ym, g)
                if not dfm.empty:
                    all_chunks.append(dfm)
                    print(f"OK  {area} {ym} {g}: {len(dfm)}")
                else:
                    print(f"EMPTY {area} {ym} {g}")
            except Exception as e:
                print(f"FAIL {area} {ym} {g}: {e}")

raw_df = (pd.concat(all_chunks, ignore_index=True)
          if all_chunks else pd.DataFrame(columns=["price_area","production_group","start_time","quantity_kwh"]))
print("TOTAL SHAPE:", raw_df.shape)
raw_df.head()


OK  NO1 2021-01 solar: 720
OK  NO1 2021-01 hydro: 720
OK  NO1 2021-01 wind: 720
OK  NO1 2021-01 thermal: 720
OK  NO1 2021-01 other: 720
OK  NO1 2021-02 solar: 648
OK  NO1 2021-02 hydro: 648
OK  NO1 2021-02 wind: 648
OK  NO1 2021-02 thermal: 648
OK  NO1 2021-02 other: 648
OK  NO1 2021-03 solar: 719
OK  NO1 2021-03 hydro: 719
OK  NO1 2021-03 wind: 719
OK  NO1 2021-03 thermal: 719
OK  NO1 2021-03 other: 719
OK  NO1 2021-04 solar: 696
OK  NO1 2021-04 hydro: 696
OK  NO1 2021-04 wind: 696
OK  NO1 2021-04 thermal: 696
OK  NO1 2021-04 other: 696
OK  NO1 2021-05 solar: 720
OK  NO1 2021-05 hydro: 720
OK  NO1 2021-05 wind: 720
OK  NO1 2021-05 thermal: 720
OK  NO1 2021-05 other: 720
OK  NO1 2021-06 solar: 696
OK  NO1 2021-06 hydro: 696
OK  NO1 2021-06 wind: 696
OK  NO1 2021-06 thermal: 696
OK  NO1 2021-06 other: 696
OK  NO1 2021-07 solar: 720
OK  NO1 2021-07 hydro: 720
OK  NO1 2021-07 wind: 720
OK  NO1 2021-07 thermal: 720
OK  NO1 2021-07 other: 720
OK  NO1 2021-08 solar: 720
OK  NO1 2021-08 hydro

|   | price_area | production_group | start_time | quantity_kwh |
|---|---|---|---|---|
| 0 | NO1 | solar | 2020-12-31 23:00:00+00:00 | 6.106 |
| 1 | NO1 | solar | 2021-01-01 00:00:00+00:00 | 4.03 |
| 2 | NO1 | solar | 2021-01-01 01:00:00+00:00 | 3.982 |
| 3 | NO1 | solar | 2021-01-01 02:00:00+00:00 | 8.146 |
| 4 | NO1 | solar | 2021-01-01 03:00:00+00:00 | 8.616 |
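
A quick sanity check is to compare the row counts in the log with the number of hours in each month. The sketch below assumes the API reports hours in Norwegian local time (`Europe/Oslo`), which the first timestamp (`2020-12-31 23:00` UTC, midnight in Oslo) suggests but does not prove. `hours_in_month` is a hypothetical helper.

```python
import pandas as pd

def hours_in_month(ym: str, tz: str = "Europe/Oslo") -> int:
    """Number of clock hours in month `ym` (YYYY-MM) in the given time zone."""
    start = pd.Timestamp(f"{ym}-01")
    end = start + pd.offsets.MonthBegin(1)  # first day of the next month
    # Localising both ends accounts for daylight saving time changes
    span = end.tz_localize(tz) - start.tz_localize(tz)
    return int(span / pd.Timedelta(hours=1))

# Row counts copied from the log above (the same for every group in NO1)
logged_rows = {"2021-01": 720, "2021-02": 648, "2021-03": 719, "2021-04": 696}

shortfall = {ym: hours_in_month(ym) - n for ym, n in logged_rows.items()}
print(shortfall)
# {'2021-01': 24, '2021-02': 24, '2021-03': 24, '2021-04': 24}
```

Each month comes out exactly 24 rows short. That would be consistent with `endDate` being treated as exclusive, so the last day of each month may be missing from the data. This is an inference from the log, not something verified against the Elhub documentation. If it holds, passing the first day of the following month as `endDate` would be one way to cover the full month.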


## Step 6 – Save the raw data to CSV

For later use (in Spark and Streamlit), we export the full dataset to `data/elhub_production_2021_raw.csv` in the repository root (`../data/` relative to this notebook).


In [None]:
from pathlib import Path
out_path = Path("../data/elhub_production_2021_raw.csv")
out_path.parent.mkdir(parents=True, exist_ok=True)
if raw_df.empty:
    print("DataFrame is empty; CSV not written.")
else:
    raw_df.to_csv(out_path, index=False)
    print(f"Saved CSV to {out_path.resolve()}")


✅ Saved CSV to /home/lse/Documents/IND320/liserochat-ind320-dashboard/data/elhub_production_2021_raw.csv
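
When the file is read back later, `start_time` arrives as plain text and has to be parsed again. The round trip below is a minimal sketch on a two-row sample written to a temporary directory; the file name `elhub_sample.csv` is made up, and the values are the first two rows shown in Step 5.

```python
import tempfile
from pathlib import Path

import pandas as pd

sample = pd.DataFrame({
    "price_area": ["NO1", "NO1"],
    "production_group": ["solar", "solar"],
    "start_time": pd.to_datetime(
        ["2020-12-31 23:00:00", "2021-01-01 00:00:00"], utc=True
    ),
    "quantity_kwh": [6.106, 4.03],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "elhub_sample.csv"  # hypothetical file name
    sample.to_csv(path, index=False)       # index=False avoids an extra index column
    loaded = pd.read_csv(path)

# CSV stores timestamps as strings, so restore the UTC-aware dtype explicitly
loaded["start_time"] = pd.to_datetime(loaded["start_time"], utc=True)
print(loaded.dtypes)
```

Writing with `index=False` matters here: without it, the index is saved as an unnamed first column, which pandas labels `Unnamed: 0` on the next read.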
