<!-- @format -->

The eventual goal is to keep a Parquet file updated with generation data for the French electricity grid, taken from RTE's API.

- Try to read the (deployed) data file to get the start date; otherwise, use a default
- Use the earlier of today's date and the start date plus 14 days as the end date
- Assemble the query for the API
- Parse the response and concatenate it to the existing data
- Write the Parquet file and add it to the top-level `resources:` in the YAML

Generation data are available from 2017-01-01 onwards.


In [1]:
import os
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo
import requests
import functools, itertools
import polars as pl
from pyprojroot.here import here

Let's determine the `start_date` and the `end_date` for the request:

- the start date will be the larger of:
  - 2017-01-01
  - the most recent date in the dataset, less a day
- the end date will be the smaller of:
  - the start date plus 14 days
  - the current date

Dates are expressed as midnight, Paris time.
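These rules are not implemented yet (see the TODO in the next cell). A minimal sketch of them follows, assuming the most recent timestamp already stored is passed in as an aware `datetime`, or `None` when there is no file yet. The names `request_window`, `paris_midnight`, `EARLIEST` and `WINDOW` are hypothetical, and reading the existing file is left out.

```python
from datetime import date, datetime, timedelta
from typing import Optional
from zoneinfo import ZoneInfo

TZ_LOCAL = ZoneInfo("Europe/Paris")
EARLIEST = datetime(2017, 1, 1, tzinfo=TZ_LOCAL)  # first date with generation data
WINDOW = timedelta(days=14)                       # request span used in this notebook


def paris_midnight(d: date) -> datetime:
    """Midnight, Paris time, at the start of calendar date `d`."""
    return datetime(d.year, d.month, d.day, tzinfo=TZ_LOCAL)


def request_window(latest: Optional[datetime], today: date) -> tuple[datetime, datetime]:
    """Return (start, end) for the next request (hypothetical helper).

    `latest` is the most recent aware timestamp already stored, or None
    when there is no existing file.
    """
    if latest is None:
        start = EARLIEST
    else:
        # Most recent date in the data, less a day, expressed as Paris midnight.
        latest_day = latest.astimezone(TZ_LOCAL).date()
        start = max(EARLIEST, paris_midnight(latest_day - timedelta(days=1)))
    # Smaller of start + 14 days and today's Paris midnight.
    end = min(start + WINDOW, paris_midnight(today))
    return start, end


start, end = request_window(None, date(2024, 1, 1))
print(start.isoformat(), end.isoformat())
```

With no existing file, this reproduces the hard-coded window in the next cell (2017-01-01 to 2017-01-15).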


In [2]:
tz_local = ZoneInfo("Europe/Paris")

# Adding a timedelta to an aware datetime is wall-clock arithmetic: the result
# keeps the same local time (midnight), and zoneinfo supplies the UTC offset in
# effect on the new date, so DST changes are accounted for.
date_start = datetime(2017, 1, 1, tzinfo=tz_local)
date_end = date_start + timedelta(days=14)

# TODO: set start and end from the existing data file and today's date
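To illustrate the wall-clock behaviour described in the comment, here is a small standard-library check across the spring 2017 DST change (dates chosen only for the example). Note that subtracting two datetimes that share the same `tzinfo` also ignores the offset, so both are converted to UTC before measuring elapsed time.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz_local = ZoneInfo("Europe/Paris")

# In 2017, Paris switched from CET to CEST on 26 March.
start = datetime(2017, 3, 20, tzinfo=tz_local)
end = start + timedelta(days=14)  # same wall-clock time, new offset

print(start.isoformat())  # 2017-03-20T00:00:00+01:00
print(end.isoformat())    # 2017-04-03T00:00:00+02:00

# Same tzinfo on both sides: the difference is computed on wall-clock values.
print(end - start)        # 14 days, 0:00:00

# In absolute (UTC) terms, one hour was skipped.
elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
print(elapsed)            # 13 days, 23:00:00
```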

In [3]:
date_start.isoformat()

'2017-01-01T00:00:00+01:00'

Then, let's request a token:


In [4]:
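# RTE_FRANCE_BASE64 holds the Base64-encoded client credentials for HTTP Basic auth.
auth = requests.post(
    "https://digital.iservices.rte-france.com/token/oauth/",
    headers={
        "Authorization": f'Basic {os.environ["RTE_FRANCE_BASE64"]}',
        "Content-Type": "application/x-www-form-urlencoded",
    },
)
auth.raise_for_status()  # fail early on bad credentials, not with a KeyError below
token = auth.json()["access_token"]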
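Presumably `RTE_FRANCE_BASE64` holds the Base64 encoding of `client_id:client_secret`, which is how HTTP Basic credentials are formed. If you only have the ID and secret, a sketch like this (the function name and placeholder values are hypothetical) produces the value to store in that environment variable:

```python
import base64


def basic_credential(client_id: str, client_secret: str) -> str:
    """Base64-encode "client_id:client_secret" for an HTTP Basic auth header."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Placeholder values; substitute the ID and secret issued for your application.
print(basic_credential("my-client-id", "my-client-secret"))
```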

We compose a request, gather the response, then pull out the data:


In [5]:
endpoint = "https://digital.iservices.rte-france.com/open_api/actual_generation/v1/generation_mix_15min_time_scale"

# Pass the dates via `params` so requests percent-encodes them; in particular,
# the "+" in a UTC offset is sent as "%2B" and cannot be misread as a space.
data_raw = requests.get(
    endpoint,
    params={
        "start_date": date_start.isoformat(),
        "end_date": date_end.isoformat(),
    },
    headers={"Authorization": f"Bearer {token}"},
)
data_raw.raise_for_status()
array = data_raw.json()["generation_mix_15min_time_scale"]
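A timestamp such as `2017-01-01T00:00:00+01:00` contains a `+`, which form-style query parsing decodes as a space unless it is percent-encoded as `%2B`. This standard-library sketch shows the difference using Python's own `parse_qs`; whether any particular server (RTE's included) decodes a raw `+` this way is not something this notebook establishes.

```python
from urllib.parse import parse_qs, urlencode

stamp = "2017-01-01T00:00:00+01:00"

# Pasted verbatim into a query string, "+" is read back as a space:
naive = f"start_date={stamp}"
print(parse_qs(naive))    # {'start_date': ['2017-01-01T00:00:00 01:00']}

# Percent-encoding turns "+" into "%2B", so the offset survives the round trip:
encoded = urlencode({"start_date": stamp})
print(encoded)            # start_date=2017-01-01T00%3A00%3A00%2B01%3A00
print(parse_qs(encoded))  # {'start_date': ['2017-01-01T00:00:00+01:00']}
```

`requests` applies the same kind of encoding to values passed through its `params=` argument.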

Here, I use some functional-programming tools (because that's what I know) to make one observation per interval and production type:


In [6]:
temp = list(
    map(
        lambda x: list(
            map(
                lambda v: {
                    "type": x["production_type"],
                    "interval_start": v["start_date"],
                    "interval_end": v["end_date"],
                    "generation": v["value"],
                },
                x["values"],
            )
        ),
        array,
    )
)

# Flatten the per-type lists into a single list of records.
fixed = list(itertools.chain.from_iterable(temp))
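For comparison, the same flattening can be written as a nested list comprehension. This sketch runs it on a small made-up payload, `sample`, shaped only by the keys the cell above reads (`production_type`, `values`, `start_date`, `end_date`, `value`); the numbers are invented. Distinct names are used so that running it does not overwrite `array` or `fixed`.

```python
# Invented payload with the same shape as the API response used above.
sample = [
    {
        "production_type": "NUCLEAR",
        "values": [
            {"start_date": "2017-01-01T00:00:00+01:00",
             "end_date": "2017-01-01T00:15:00+01:00", "value": 100},
            {"start_date": "2017-01-01T00:15:00+01:00",
             "end_date": "2017-01-01T00:30:00+01:00", "value": 101},
        ],
    },
    {
        "production_type": "WIND",
        "values": [
            {"start_date": "2017-01-01T00:00:00+01:00",
             "end_date": "2017-01-01T00:15:00+01:00", "value": 7},
        ],
    },
]

# One record per (production type, interval): the outer loop runs over types,
# the inner loop over that type's 15-minute values.
records = [
    {
        "type": x["production_type"],
        "interval_start": v["start_date"],
        "interval_end": v["end_date"],
        "generation": v["value"],
    }
    for x in sample
    for v in x["values"]
]

print(len(records))  # 3
```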

We can now convert these records to a Polars DataFrame, then parse the timestamps, summarize, and so on.


In [7]:
df_raw = pl.DataFrame(fixed)

In [8]:
df = df_raw.with_columns(
    pl.col("interval_start")
    .str.strptime(pl.Datetime("ms"))
    .dt.convert_time_zone(time_zone="Europe/Paris"),
    pl.col("interval_end")
    .str.strptime(pl.Datetime("ms"))
    .dt.convert_time_zone(time_zone="Europe/Paris"),
)
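Conceptually, the cell above turns offset-bearing strings into instants and then shows them in Paris time. A standard-library analogue (an illustration of the idea, not how Polars does it internally):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

paris = ZoneInfo("Europe/Paris")

# Two spellings of the same instant, with different UTC offsets.
a = datetime.fromisoformat("2017-01-01T00:00:00+01:00")
b = datetime.fromisoformat("2016-12-31T23:00:00+00:00")
print(a == b)  # True: aware datetimes compare as instants

# Converting to Paris time gives one consistent local representation.
print(a.astimezone(paris))  # 2017-01-01 00:00:00+01:00
print(b.astimezone(paris))  # 2017-01-01 00:00:00+01:00
```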

In [9]:
# `group_by` and `pl.len()` replace the older `groupby` and `pl.count()` names.
df.drop_nulls().group_by("type").agg(
    pl.col("interval_start").min(),
    pl.col("interval_end").max(),
    pl.col("generation").sum(),
    pl.len().alias("count"),
)

| type | interval_start | interval_end | generation | count |
|---|---|---|---|---|
| `str` | `datetime[ms, Europe/Paris]` | `datetime[ms, Europe/Paris]` | `i64` | `u32` |
| HYDRO | 2017-01-01 00:00:00 CET | 2017-01-15 00:00:00 CET | 9193056 | 1344 |
| FOSSIL_GAS | 2017-01-01 00:00:00 CET | 2017-01-15 00:00:00 CET | 10229913 | 1344 |
| NUCLEAR | 2017-01-01 00:00:00 CET | 2017-01-15 00:00:00 CET | 71882034 | 1344 |
| FOSSIL_HARD_CO… | 2017-01-01 00:00:00 CET | 2017-01-15 00:00:00 CET | 2902415 | 1344 |
| PUMPING | 2017-01-01 00:00:00 CET | 2017-01-15 00:00:00 CET | 0 | 1344 |
| BIOENERGY | 2017-01-01 00:00:00 CET | 2017-01-15 00:00:00 CET | 1045899 | 1344 |
| SOLAR | 2017-01-01 00:00:00 CET | 2017-01-15 00:00:00 CET | 576204 | 1344 |
| EXCHANGE | 2017-01-01 00:00:00 CET | 2017-01-15 00:00:00 CET | 1840635 | 1344 |
| FOSSIL_OIL | 2017-01-01 00:00:00 CET | 2017-01-15 00:00:00 CET | 729073 | 1344 |
| WIND | 2017-01-01 00:00:00 CET | 2017-01-15 00:00:00 CET | 3472324 | 1344 |

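The `count` of 1344 per type matches 14 days × 96 quarter-hours. That tidy number holds only because this window contains no DST change. A short sketch (helper names `n_intervals` and `window` are hypothetical) shows the count to expect for 14-day windows that cross one, which could serve as a completeness check:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

PARIS = ZoneInfo("Europe/Paris")
QUARTER_HOUR = timedelta(minutes=15)


def n_intervals(start: datetime, end: datetime) -> int:
    """Number of 15-minute intervals between two aware datetimes."""
    # Convert to UTC first: subtracting datetimes that share a tzinfo ignores offsets.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return elapsed // QUARTER_HOUR


def window(y: int, m: int, d: int, days: int = 14) -> tuple[datetime, datetime]:
    """Paris-midnight window starting on y-m-d, `days` calendar days long."""
    start = datetime(y, m, d, tzinfo=PARIS)
    return start, start + timedelta(days=days)


print(n_intervals(*window(2017, 1, 1)))    # 1344 = 14 x 96
print(n_intervals(*window(2017, 3, 20)))   # 1340: the spring-forward day has 23 h
print(n_intervals(*window(2017, 10, 23)))  # 1348: the fall-back day has 25 h
```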

In [10]:
df.write_parquet(here("data/generation.parquet"))

Try out this [link to the parquet file](/data/generation.parquet).