<!-- @format -->

The eventual goal is to update a parquet file with generation data from the French grid.

- Try to read a (deployed) data file to get the start date; use a default otherwise.
- Use the earlier of today's date and the start date plus 14 days as the end date.
- Assemble the query for the API.
- Parse the response and concatenate it to the existing data.
- Write the parquet file and add it to the top-level `resources:` in the YAML.

The generation data is available from 2017-01-01 onwards.


In [2]:
import os
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo
import requests
import functools, itertools
import polars as pl
from pyprojroot.here import here


Let's determine the `start_date` and the `end_date` for the request:

- the start date will be the larger of:

  - 2017-01-01
  - the most-recent date in the dataset, less a day

- the end date will be the smaller of:
  - the start date plus 14 days
  - the current date

Dates are expressed as midnight, Paris time.
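
These rules can be sketched as a small function. This is only an illustration, not the notebook's implementation: `request_window`, `midnight_paris`, `latest_in_data`, `now`, and `max_days` are hypothetical names, and "less a day" is read here as midnight of the previous day, Paris time.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

TZ_PARIS = ZoneInfo("Europe/Paris")
EARLIEST = datetime(2017, 1, 1, tzinfo=TZ_PARIS)  # first date with generation data


def midnight_paris(moment):
    """Return midnight (Paris time) of the day containing `moment`."""
    local = moment.astimezone(TZ_PARIS)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


def request_window(latest_in_data, now, max_days=14):
    """Return (start, end) following the rules listed above.

    `latest_in_data` is the most recent timestamp in the existing dataset,
    or None when there is no dataset yet (assumption: fall back to EARLIEST).
    """
    if latest_in_data is None:
        start = EARLIEST
    else:
        # Larger of 2017-01-01 and the most recent date, less a day.
        start = max(EARLIEST, midnight_paris(latest_in_data) - timedelta(days=1))
    # Smaller of start plus `max_days` and the current date.
    end = min(start + timedelta(days=max_days), midnight_paris(now))
    return start, end


start, end = request_window(None, datetime(2017, 1, 5, 12, tzinfo=TZ_PARIS))
print(start.isoformat(), end.isoformat())
# 2017-01-01T00:00:00+01:00 2017-01-05T00:00:00+01:00
```

The sketch does not guard against `now` falling before the start date; a real update job would probably want to skip the request in that case.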


In [3]:
tz_local = ZoneInfo("Europe/Paris")

# Adding a timedelta to an aware datetime is wall-clock arithmetic:
# it keeps the same local time across a DST change, not a fixed elapsed interval.
date_start = datetime(2017, 1, 1, tzinfo=tz_local)
date_end = date_start + timedelta(days=14)

# TODO: adjust start and end based on the existing file and today's date

In [4]:
date_start.isoformat()

'2017-01-01T00:00:00+01:00'
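
The remark about `timedelta` and DST can be checked with a short standalone example. It steps across the start of summer time in France (2017-03-26) and compares the wall-clock result with the time that actually elapsed; the variable names are illustrative only.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz_local = ZoneInfo("Europe/Paris")
utc = ZoneInfo("UTC")

# Summer time began in France on 2017-03-26, so this window spans the change.
before = datetime(2017, 3, 25, tzinfo=tz_local)
after = before + timedelta(days=2)

print(before.isoformat())  # 2017-03-25T00:00:00+01:00
print(after.isoformat())   # 2017-03-27T00:00:00+02:00

# Converting both ends to UTC shows the elapsed time is one hour short of two days.
elapsed = after.astimezone(utc) - before.astimezone(utc)
print(elapsed)  # 1 day, 23:00:00
```

So a request window built this way always runs from midnight to midnight in Paris time, even though its true length varies by an hour around DST changes.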


Then, let's request a token:


In [5]:
auth = requests.post(
    "https://digital.iservices.rte-france.com/token/oauth/",
    headers={
        "Authorization": f'Basic {os.environ["RTE_FRANCE_BASE64"]}',
        "Content-Type": "application/x-www-form-urlencoded",
    },
)
auth.raise_for_status()  # fail early if the credentials are rejected
token = auth.json()["access_token"]
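
The request above reads a ready-made credential from `RTE_FRANCE_BASE64`. Assuming this variable holds the usual HTTP Basic value (the base64 encoding of `client_id:client_secret`), it could be produced roughly as follows. `basic_credential` and its arguments are hypothetical names, and the example values are the ones used for illustration in RFC 7617, not RTE credentials.

```python
import base64


def basic_credential(client_id, client_secret):
    """Encode `client_id:client_secret` as an HTTP Basic credential string."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Illustrative values from RFC 7617 (assumption: RTE uses the same scheme).
credential = basic_credential("Aladdin", "open sesame")
print(credential)  # QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Storing only the encoded value in an environment variable keeps the client ID and secret out of the notebook itself.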


We compose a request, gather the response, then pull out the data:


In [6]:
endpoint = "https://digital.iservices.rte-france.com/open_api/actual_generation/v1/generation_mix_15min_time_scale"

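# Pass the dates via `params` so that requests URL-encodes the "+" in the UTC offset.
data_raw = requests.get(
    f"{endpoint}/",
    params={
        "start_date": date_start.isoformat(),
        "end_date": date_end.isoformat(),
    },
    headers={
        "Host": "digital.iservices.rte-france.com",
        "Authorization": f"Bearer {token}",
    },
)
data_raw.raise_for_status()  # surface HTTP errors before parsing the body
array = data_raw.json()["generation_mix_15min_time_scale"]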


Here, I use some functional-programming tools (because that's what I know) to make one observation per interval and production type:


In [7]:
temp = list(
    map(
        lambda x: list(
            map(
                lambda v: {
                    "type": x["production_type"],
                    "interval_start": v["start_date"],
                    "interval_end": v["end_date"],
                    "generation": v["value"],
                },
                x["values"],
            )
        ),
        array,
    )
)

# Flatten the per-type lists into a single list of observations.
fixed = list(itertools.chain.from_iterable(temp))
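
For comparison, the same reshaping can be written as a nested list comprehension. The sketch below runs on a small hand-made `sample` that only mimics the shape of the response as it is used above (`production_type`, `values`, `start_date`, `end_date`, `value`); the numbers are invented for illustration.

```python
# Hand-made stand-in for `array`; the values are made up.
sample = [
    {
        "production_type": "SOLAR",
        "values": [
            {"start_date": "2017-01-01T00:00:00+01:00",
             "end_date": "2017-01-01T00:15:00+01:00", "value": 0},
            {"start_date": "2017-01-01T00:15:00+01:00",
             "end_date": "2017-01-01T00:30:00+01:00", "value": 0},
        ],
    },
    {
        "production_type": "WIND",
        "values": [
            {"start_date": "2017-01-01T00:00:00+01:00",
             "end_date": "2017-01-01T00:15:00+01:00", "value": 1200},
        ],
    },
]

# One row per (production type, interval) pair.
rows = [
    {
        "type": series["production_type"],
        "interval_start": v["start_date"],
        "interval_end": v["end_date"],
        "generation": v["value"],
    }
    for series in sample
    for v in series["values"]
]

print(len(rows))  # 3
```

Both versions produce the same flat list of dictionaries; the comprehension simply avoids the intermediate list of lists.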


The data is now in a form that we can convert to a Polars DataFrame and then parse:


In [8]:
df_raw = pl.DataFrame(fixed)

In [9]:
df = df_raw.with_columns(
    [
        pl.col("interval_start")
        .str.strptime(pl.Datetime("ms"))
        .dt.convert_time_zone(time_zone="Europe/Paris"),
        pl.col("interval_end")
        .str.strptime(pl.Datetime("ms"))
        .dt.convert_time_zone(time_zone="Europe/Paris"),
    ]
)

In [21]:
df.groupby(pl.col("type")).agg(
    pl.col("interval_start").min(),
    pl.col("interval_end").max(),
    pl.col("generation").null_count().alias("null_count"),
).with_columns(
    pl.col("interval_start").dt.month().alias("month")
)

type,interval_start,interval_end,null_count,month
str,"datetime[ms, Europe/Paris]","datetime[ms, Europe/Paris]",u32,u32
"""FOSSIL_HARD_CO…",2017-01-01 00:00:00 CET,2017-01-15 00:00:00 CET,0,1
"""SOLAR""",2017-01-01 00:00:00 CET,2017-01-15 00:00:00 CET,0,1
"""WIND""",2017-01-01 00:00:00 CET,2017-01-15 00:00:00 CET,0,1
"""PUMPING""",2017-01-01 00:00:00 CET,2017-01-15 00:00:00 CET,0,1
"""FOSSIL_GAS""",2017-01-01 00:00:00 CET,2017-01-15 00:00:00 CET,0,1
"""EXCHANGE""",2017-01-01 00:00:00 CET,2017-01-15 00:00:00 CET,0,1
"""NUCLEAR""",2017-01-01 00:00:00 CET,2017-01-15 00:00:00 CET,0,1
"""FOSSIL_OIL""",2017-01-01 00:00:00 CET,2017-01-15 00:00:00 CET,0,1
"""HYDRO""",2017-01-01 00:00:00 CET,2017-01-15 00:00:00 CET,0,1
"""BIOENERGY""",2017-01-01 00:00:00 CET,2017-01-15 00:00:00 CET,0,1


In [11]:
df.write_parquet(here("data/generation.parquet"))

Try out this [link to the parquet file](/data/generation.parquet).

In [23]:
df._repr_html_()

shape: (13_440, 4)
type,interval_start,interval_end,generation
str,"datetime[ms, Europe/Paris]","datetime[ms, Europe/Paris]",i64
"""BIOENERGY""",2017-01-01 00:00:00 CET,2017-01-01 00:15:00 CET,766
"""BIOENERGY""",2017-01-01 00:15:00 CET,2017-01-01 00:30:00 CET,765
"""BIOENERGY""",2017-01-01 00:30:00 CET,2017-01-01 00:45:00 CET,767
"""BIOENERGY""",2017-01-01 00:45:00 CET,2017-01-01 01:00:00 CET,767
"""BIOENERGY""",2017-01-01 01:00:00 CET,2017-01-01 01:15:00 CET,767
…

In [20]:
# Relabel the time zone as UTC while keeping the wall-clock values.
# `dt.replace_time_zone` is an expression method, so no `map` is needed.
df_fake_utc = df.with_columns(
    pl.col(["interval_start", "interval_end"]).dt.replace_time_zone(time_zone="UTC"),
)
df_fake_utc
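
Judging by the name `df_fake_utc`, the intent of this last cell appears to be to keep the Paris wall-clock values while relabelling them as UTC, rather than converting the instants. The difference between the two operations can be sketched with the standard library alone; the variable names are illustrative.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

local = datetime(2017, 1, 1, 0, 0, tzinfo=ZoneInfo("Europe/Paris"))

# Converting keeps the instant and changes the wall-clock reading.
converted = local.astimezone(timezone.utc)

# Relabelling keeps the wall-clock reading and changes the instant ("fake" UTC).
relabelled = local.replace(tzinfo=timezone.utc)

print(converted.isoformat())   # 2016-12-31T23:00:00+00:00
print(relabelled.isoformat())  # 2017-01-01T00:00:00+00:00
```

In Polars terms, `dt.convert_time_zone` corresponds to the first operation and `dt.replace_time_zone` to the second.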