# How to download data from Energinet's Energy Data Service platform

`Note: As we disregard transmission, we do not run this jupyter notebook.`

In this notebook you will learn how to download data from the Danish TSO's data platform, [**https://www.energidataservice.dk/**](https://www.energidataservice.dk/). Compared to ENTSO-e's Transparency Platform, it offers more statistics, but only for Denmark (DK). In my experience, other TSOs do not provide such a comprehensive overview of the energy system, but if you are aware of similar data platforms, please feel free to share them on Absalon!

We will be downloading hourly transmission capacities (both importing and exporting). Like the generation capacities of technologies that rely on intermittent energy sources, transmission capacities vary from hour to hour. The technical transfer capacity of transmission lines (Total Transfer Capacity, TTC) can differ from the capacity actually available (Net Transfer Capacity, NTC) for multiple reasons. The main reasons include variability of intermittent renewables, power plant outages, outages in the transmission lines themselves, and variation in load patterns. You can have a look at ENTSO-e’s user information on Net Transfer Capacities (NTC) here: [**https://eepublicdownloads.entsoe.eu/clean-documents/pre2015/ntc/entsoe_NTCusersInformation.pdf**](https://eepublicdownloads.entsoe.eu/clean-documents/pre2015/ntc/entsoe_NTCusersInformation.pdf).


Before running the code, make sure you have the **requests** package installed in your conda environment. If not, you can install it by typing the following in your Anaconda Prompt:

> `$conda activate Insert_You_Environment_Name`<br>
> `$python -m pip install requests`

We start by importing a few packages:

In [1]:
import pandas as pd, numpy as np, os, pickle, requests

Let's specify an output folder:

In [2]:
direc = os.getcwd()
data_dir = os.path.join(direc,'CleanedData')

Choose the year you want to collect data for (currently only one year is supported, but you can easily adapt the notebook to collect multiple years):

In [3]:
year = 2019

Given the chosen year, define its first and last hour in Danish local time and convert both to UTC:

In [4]:
start_str, end_str = str(year)+'-01-01T00', str(year)+'-12-31T23'
start_CET = pd.Timestamp(start_str,tz='Europe/Copenhagen'); start_UTC = start_CET.tz_convert('UTC')
end_CET = pd.Timestamp(end_str,tz='Europe/Copenhagen'); end_UTC = end_CET.tz_convert('UTC')
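
As a quick sanity check on these bounds, the sketch below rebuilds the two timestamps and counts the hours between them. It assumes a non-leap year such as 2019, and the name `hours_utc` is introduced here purely for illustration.

```python
import pandas as pd

year = 2019  # same example year as in the notebook

# First and last hour of the year in Danish local time, converted to UTC
start_UTC = pd.Timestamp(f'{year}-01-01T00', tz='Europe/Copenhagen').tz_convert('UTC')
end_UTC = pd.Timestamp(f'{year}-12-31T23', tz='Europe/Copenhagen').tz_convert('UTC')

# Hourly index in UTC (hours_utc is an illustrative name, not used elsewhere)
hours_utc = pd.date_range(start_UTC, end_UTC, freq=pd.Timedelta(hours=1))

print(start_UTC)       # local midnight falls on the previous day in UTC
print(len(hours_utc))  # 365 days times 24 hours in a non-leap year
```

Because the index is built in UTC, the two daylight-saving switches in local time do not create duplicated or missing hours.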

Download the transmission capacities using the API:

In [5]:
url = f"https://api.energidataservice.dk/dataset/Transmissionlines/download?format=json&start={str(year)}-01-01T00:00&end={str(year+1)}-01-01T00:00&timezone=DK&limit=0"
r = requests.get(url)
rawdata = r.json()
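
The request above does not check the HTTP status before parsing the response. A slightly more defensive variant is sketched below. The helper names `build_download_url` and `fetch_records` and the 60-second timeout are assumptions made for illustration; the query parameters are the same ones used in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.energidataservice.dk/dataset/Transmissionlines/download"

def build_download_url(year):
    """Build the download URL for one calendar year (hypothetical helper)."""
    params = {
        "format": "json",
        "start": f"{year}-01-01T00:00",
        "end": f"{year + 1}-01-01T00:00",
        "timezone": "DK",
        "limit": 0,
    }
    # safe=':' keeps the colons in the timestamps unescaped, as in the URL above
    return f"{BASE_URL}?{urlencode(params, safe=':')}"

def fetch_records(year, timeout=60):
    """Download the records and raise on HTTP errors (hypothetical helper)."""
    import requests  # imported lazily so the URL helper can be used on its own
    response = requests.get(build_download_url(year), timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()

print(build_download_url(2019))
```

With these helpers, `rawdata = fetch_records(year)` would play the role of the three lines above.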

Keep only the variables we need and give them more descriptive names:

In [6]:
variables = ['HourUTC','PriceArea','ConnectedArea','ImportCapacity','ExportCapacity','HomePriceEUR','ConnectedPriceEUR','CongestionIncomeEUR','ScheduledExchangeDayAhead']
data = {var: [record[var] for record in rawdata] for var in variables}
df_tcap = pd.DataFrame(data).rename(columns={
    'PriceArea':'g_E','ConnectedArea':'g_EE',
    'ImportCapacity':'ImportCapacity_MW','ExportCapacity':'ExportCapacity_MW',
    'HomePriceEUR':'Price_EUR/MWh_gE','ConnectedPriceEUR':'Price_EUR/MWh_gEE'
})
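
To see what this step does without calling the API, the sketch below applies the same selection and renaming to two small example records. The `Unused` field is invented for illustration, and building the frame with `pd.DataFrame(records)[keep]` is an alternative to the dictionary comprehension above, not the notebook's own approach.

```python
import pandas as pd

# Two example records mimicking the structure of the API response
records = [
    {'HourUTC': '2019-12-31T22:00:00', 'PriceArea': 'DK1', 'ConnectedArea': 'DE',
     'ImportCapacity': 1500.0, 'ExportCapacity': -1350.0, 'Unused': 'x'},
    {'HourUTC': '2019-12-31T22:00:00', 'PriceArea': 'DK1', 'ConnectedArea': 'DK2',
     'ImportCapacity': 600.0, 'ExportCapacity': -590.0, 'Unused': 'y'},
]
keep = ['HourUTC', 'PriceArea', 'ConnectedArea', 'ImportCapacity', 'ExportCapacity']

# Select the wanted fields, then rename them as in the notebook
df_demo = pd.DataFrame(records)[keep].rename(columns={
    'PriceArea': 'g_E', 'ConnectedArea': 'g_EE',
    'ImportCapacity': 'ImportCapacity_MW', 'ExportCapacity': 'ExportCapacity_MW',
})
print(df_demo.columns.tolist())
```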

---
### `Data Description`

In [7]:
print(df_tcap['g_E'].unique()) # unique price areas
print(df_tcap['g_EE'].unique()) # unique connected areas

['DK1' 'DK2']
['DE' 'DK2' 'NL' 'NO2' 'SE3' 'DK1' 'SE4']


In [8]:
df_tcap.head()

| | HourUTC | g_E | g_EE | ImportCapacity_MW | ExportCapacity_MW | Price_EUR/MWh_gE | Price_EUR/MWh_gEE | CongestionIncomeEUR | ScheduledExchangeDayAhead |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-12-31T22:00:00 | DK1 | DE | 1500.0 | -1350.0 | 32.279999 | 38.880001 | 0.0 | -1350.0 |
| 1 | 2019-12-31T22:00:00 | DK1 | DK2 | 600.0 | -590.0 | 32.279999 | 30.83 | 0.0 | 600.0 |
| 2 | 2019-12-31T22:00:00 | DK1 | NL | {'Data': None, 'UnityType': 2, 'AssemblyName':... | {'Data': None, 'UnityType': 2, 'AssemblyName':... | {'Data': None, 'UnityType': 2, 'AssemblyName':... | {'Data': None, 'UnityType': 2, 'AssemblyName':... | {'Data': None, 'UnityType': 2, 'AssemblyName':... | {'Data': None, 'UnityType': 2, 'AssemblyName':... |
| 3 | 2019-12-31T22:00:00 | DK1 | NO2 | 850.0 | -1287.0 | 32.279999 | 32.560001 | 0.0 | -302.0 |
| 4 | 2019-12-31T22:00:00 | DK1 | SE3 | 715.0 | -441.0 | 32.279999 | 30.83 | 0.0 | 715.0 |


In [9]:
df_tcap.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63929 entries, 0 to 63928
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   HourUTC                    63929 non-null  object
 1   g_E                        63929 non-null  object
 2   g_EE                       63929 non-null  object
 3   ImportCapacity_MW          63929 non-null  object
 4   ExportCapacity_MW          63929 non-null  object
 5   Price_EUR/MWh_gE           63929 non-null  object
 6   Price_EUR/MWh_gEE          63929 non-null  object
 7   CongestionIncomeEUR        63929 non-null  object
 8   ScheduledExchangeDayAhead  63929 non-null  object
dtypes: object(9)
memory usage: 4.4+ MB


In [10]:
df_tcap.describe()

| | HourUTC | g_E | g_EE | ImportCapacity_MW | ExportCapacity_MW | Price_EUR/MWh_gE | Price_EUR/MWh_gEE | CongestionIncomeEUR | ScheduledExchangeDayAhead |
|---|---|---|---|---|---|---|---|---|---|
| count | 63929 | 63929 | 63929 | 63929.0 | 63929.0 | 63929 | 63929 | 63929.0 | 63929.0 |
| unique | 8759 | 2 | 7 | 92.0 | 1580.0 | 4058 | 4969 | 19145.0 | 16012.0 |
| top | 2019-12-31T22:00:00 | DK1 | DE | 600.0 | -600.0 | {'Data': None, 'UnityType': 2, 'AssemblyName':... | {'Data': None, 'UnityType': 2, 'AssemblyName':... | 0.0 | 0.0 |
| freq | 8 | 37652 | 17518 | 16985.0 | 8319.0 | 2616 | 2616 | 34254.0 | 7663.0 |



---

Drop NL, because its records contain dictionaries instead of numeric values:

`We are interested in the interconnection between DK1-DK2 only. So NL does not matter.`

In [7]:
# df_tcap = df_tcap[df_tcap['g_EE']!='NL']

Correct the datetime format:

In [11]:
df_tcap['HourUTC'] = pd.to_datetime(df_tcap['HourUTC'].str.replace('T',' '),utc=True)
df_tcap['HourCET/CEST'] = df_tcap['HourUTC'].dt.tz_convert('Europe/Brussels')

Convert the numeric columns to floats, inserting NaN wherever a cell contains a dictionary:

In [12]:
num_cols = ['ImportCapacity_MW','ExportCapacity_MW','Price_EUR/MWh_gE','Price_EUR/MWh_gEE','CongestionIncomeEUR','ScheduledExchangeDayAhead']
is_dict = df_tcap[num_cols].transform(lambda x: x.apply(type).eq(dict))
for col in [x for x,y in zip(is_dict.columns,is_dict.any()) if y]:
    df_tcap.loc[is_dict[col],col] = np.nan
df_tcap[num_cols] = df_tcap[num_cols].astype(float)
df_tcap['ExportCapacity_MW'] = df_tcap['ExportCapacity_MW'].abs() # export capacities are reported as negative numbers; store them as non-negative magnitudes
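
The same cleaning step can be tried on a toy frame. The sketch below is a simplified variant that maps each column directly rather than building a boolean mask first; the helper name `dicts_to_nan` and the toy values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy columns in which one cell holds a dictionary instead of a number
df_demo = pd.DataFrame({
    'ImportCapacity_MW': [1500.0, {'Data': None}, 600.0],
    'ExportCapacity_MW': [-1350.0, {'Data': None}, -590.0],
})

def dicts_to_nan(column):
    """Replace dictionary cells with NaN and cast to float (hypothetical helper)."""
    return column.map(lambda v: np.nan if isinstance(v, dict) else v).astype(float)

df_demo = df_demo.apply(dicts_to_nan)                              # column by column
df_demo['ExportCapacity_MW'] = df_demo['ExportCapacity_MW'].abs()  # non-negative magnitudes
print(df_demo)
```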

Impute missing domestic prices using the prices recorded for the same hour and price area on the other connections:

`Imputation by mean.`

In [13]:
idx = df_tcap['Price_EUR/MWh_gE'].isna()
df_tcap.loc[idx,'Price_EUR/MWh_gE'] = df_tcap.groupby(['HourUTC','g_E'])['Price_EUR/MWh_gE'].transform('mean')[idx]
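
A minimal illustration of this group-wise imputation on toy data follows. The values are invented; the point is that the price of a given area in a given hour is the same on every connection, so the group mean simply recovers it.

```python
import numpy as np
import pandas as pd

# Toy data: the DK1 price is missing on one of three connections in the same hour
df_demo = pd.DataFrame({
    'HourUTC': ['2019-12-31T22:00:00'] * 3,
    'g_E': ['DK1'] * 3,
    'g_EE': ['DE', 'NL', 'NO2'],
    'Price_EUR/MWh_gE': [32.28, np.nan, 32.28],
})

missing = df_demo['Price_EUR/MWh_gE'].isna()
# transform('mean') ignores NaN and returns one value per original row
group_mean = df_demo.groupby(['HourUTC', 'g_E'])['Price_EUR/MWh_gE'].transform('mean')
df_demo.loc[missing, 'Price_EUR/MWh_gE'] = group_mean[missing]
print(df_demo)
```

If the price is missing on every connection in an hour, the group mean is itself NaN and the gap remains.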

Assume missing capacities are zero:

In [14]:
df_tcap['ImportCapacity_MW'] = df_tcap['ImportCapacity_MW'].fillna(0)
df_tcap['ExportCapacity_MW'] = df_tcap['ExportCapacity_MW'].fillna(0)

Assume congestion income is zero if missing:

In [15]:
df_tcap['CongestionIncomeEUR'] = df_tcap['CongestionIncomeEUR'].fillna(0)

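Impute the foreign electricity price if it is missing. The code backs it out of the domestic price, treating congestion income as the price difference between the two areas times the scheduled day-ahead exchange: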

In [16]:
idx = df_tcap['Price_EUR/MWh_gEE'].isna()
# If congestion income is zero, use the domestic price:
idx_tmp = (idx) & (np.isclose(df_tcap['CongestionIncomeEUR'],0))
df_tcap.loc[idx_tmp,'Price_EUR/MWh_gEE'] = df_tcap.loc[idx_tmp,'Price_EUR/MWh_gE']
# If congestion income is not zero and DK is exporting:
idx_tmp = (idx) & (~np.isclose(df_tcap['CongestionIncomeEUR'],0)) & (df_tcap['ScheduledExchangeDayAhead']<0)
df_tcap.loc[idx_tmp,'Price_EUR/MWh_gEE'] = df_tcap.loc[idx_tmp,'Price_EUR/MWh_gE']+df_tcap.loc[idx_tmp,'CongestionIncomeEUR']/df_tcap.loc[idx_tmp,'ScheduledExchangeDayAhead'].abs()
# If congestion income is not zero and DK is importing:
idx_tmp = (idx) & (~np.isclose(df_tcap['CongestionIncomeEUR'],0)) & (df_tcap['ScheduledExchangeDayAhead']>0)
df_tcap.loc[idx_tmp,'Price_EUR/MWh_gEE'] = df_tcap.loc[idx_tmp,'Price_EUR/MWh_gE']-df_tcap.loc[idx_tmp,'CongestionIncomeEUR']/df_tcap.loc[idx_tmp,'ScheduledExchangeDayAhead']
df_tcap.drop(columns=['CongestionIncomeEUR','ScheduledExchangeDayAhead'],inplace=True)
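
The three cases can be collected in one small function, sketched below on toy numbers. The function name `impute_connected_price` and the input values are assumptions made for illustration; the sign convention (negative scheduled exchange means DK is exporting) follows the comments in the cell above.

```python
import numpy as np

def impute_connected_price(home_price, congestion_income, scheduled_exchange):
    """Back out the connected area's price (hypothetical helper).

    Assumes congestion income = |price difference| * |scheduled exchange|.
    """
    home = np.asarray(home_price, dtype=float)
    income = np.asarray(congestion_income, dtype=float)
    flow = np.asarray(scheduled_exchange, dtype=float)

    # Price spread per MWh; left at zero where there is no flow to divide by
    spread = np.divide(income, np.abs(flow), out=np.zeros_like(income), where=flow != 0)

    no_congestion = np.isclose(income, 0)
    return np.select(
        [no_congestion, flow < 0, flow > 0],   # same prices / exporting / importing
        [home, home + spread, home - spread],
        default=np.nan,                        # income without flow: leave missing
    )

prices = impute_connected_price(
    home_price=[30, 30, 30, 30],
    congestion_income=[0, 1000, 1000, 1000],
    scheduled_exchange=[50, -100, 100, 0],
)
print(prices)
```

When DK exports, power flows towards the higher price, so the spread is added; when DK imports, it is subtracted.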

Aggregate domestic electricity areas by hour (with the identity mapping for DK1 and DK2 used here, the two areas stay separate):

In [17]:
df_tcap[['g_E','g_EE']] = df_tcap[['g_E','g_EE']].replace({'DK1':'DK1','DK2':'DK2','DE':'DELU'})
df_tcap = df_tcap.groupby(['g_E','g_EE','HourUTC','HourCET/CEST'])[num_cols[0:-2]].agg({
    'ImportCapacity_MW':'sum',
    'ExportCapacity_MW':'sum',
    'Price_EUR/MWh_gE':'mean',
    'Price_EUR/MWh_gEE':'mean'
}).reset_index()
# Remove connections that have been aggregated out
idx = (df_tcap['g_E']=='DK') & (df_tcap['g_EE']=='DK')
df_tcap = df_tcap[~(idx)]
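
The aggregation rule, summing capacities and averaging prices within each connection and hour, can be checked on a toy frame. All values below are invented for illustration.

```python
import pandas as pd

# Toy data: the first two records share the same connection and hour
df_demo = pd.DataFrame({
    'g_E': ['DK1', 'DK1', 'DK2'],
    'g_EE': ['DELU', 'DELU', 'SE4'],
    'HourUTC': ['2019-01-01T00:00:00'] * 3,
    'ImportCapacity_MW': [1000.0, 500.0, 1300.0],
    'Price_EUR/MWh_gE': [10.0, 12.0, 11.0],
})

df_agg = df_demo.groupby(['g_E', 'g_EE', 'HourUTC']).agg({
    'ImportCapacity_MW': 'sum',   # capacities add up
    'Price_EUR/MWh_gE': 'mean',   # prices are averaged
}).reset_index()
print(df_agg)
```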

Add some helpful variables:

In [18]:
df_tcap['HourOfTheDay'] = df_tcap['HourCET/CEST'].dt.hour
df_tcap['Weekday'] = df_tcap['HourCET/CEST'].dt.weekday 
df_tcap['Week'] = df_tcap['HourCET/CEST'].dt.isocalendar().week
df_tcap['Month'] = df_tcap['HourCET/CEST'].dt.month
df_tcap['Year'] = df_tcap['HourCET/CEST'].dt.year
df_tcap = df_tcap[df_tcap['Year']==year].copy() # copy so that new columns can be added without a SettingWithCopyWarning
df_tcap['h'] = 1
df_tcap['h'] = df_tcap.groupby(['g_E','g_EE'])['h'].cumsum()
new_col_order = ['g_E','g_EE','HourUTC','HourCET/CEST','Year','Month','Week','Weekday','HourOfTheDay','h']+num_cols[0:-2]
df_tcap = df_tcap[new_col_order].sort_values(new_col_order).reset_index(drop=True)
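
The running hour index `h` restarts at 1 for every connection. The sketch below reproduces that on toy data, using `cumcount() + 1` as an equivalent alternative to the cumulative sum of ones above; it relies on the rows already being in chronological order within each connection. The toy frame and the use of `Europe/Copenhagen` for the local time zone are assumptions made for illustration.

```python
import pandas as pd

# Toy data: three consecutive hours for each of two connections
hours = pd.date_range('2019-01-01', periods=3, freq=pd.Timedelta(hours=1), tz='UTC')
df_demo = pd.DataFrame({
    'g_E': ['DK1'] * 3 + ['DK2'] * 3,
    'g_EE': ['DK2'] * 3 + ['DK1'] * 3,
    'HourUTC': list(hours) * 2,
})

# Local time and the running hour index within each connection
df_demo['HourCET/CEST'] = df_demo['HourUTC'].dt.tz_convert('Europe/Copenhagen')
df_demo['HourOfTheDay'] = df_demo['HourCET/CEST'].dt.hour
df_demo['h'] = df_demo.groupby(['g_E', 'g_EE']).cumcount() + 1
print(df_demo[['g_E', 'g_EE', 'HourOfTheDay', 'h']])
```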

`Check dataset once more`

In [19]:
df_tcap.head()

| | g_E | g_EE | HourUTC | HourCET/CEST | Year | Month | Week | Weekday | HourOfTheDay | h | ImportCapacity_MW | ExportCapacity_MW | Price_EUR/MWh_gE | Price_EUR/MWh_gEE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DK1 | DELU | 2018-12-31 23:00:00+00:00 | 2019-01-01 00:00:00+01:00 | 2019 | 1 | 1 | 1 | 0 | 1 | 1500.0 | 900.0 | 28.32 | 4.16 |
| 1 | DK1 | DELU | 2019-01-01 00:00:00+00:00 | 2019-01-01 01:00:00+01:00 | 2019 | 1 | 1 | 1 | 1 | 2 | 1500.0 | 900.0 | 10.07 | 0.06 |
| 2 | DK1 | DELU | 2019-01-01 01:00:00+00:00 | 2019-01-01 02:00:00+01:00 | 2019 | 1 | 1 | 1 | 2 | 3 | 1500.0 | 900.0 | -4.08 | -4.97 |
| 3 | DK1 | DELU | 2019-01-01 02:00:00+00:00 | 2019-01-01 03:00:00+01:00 | 2019 | 1 | 1 | 1 | 3 | 4 | 1500.0 | 900.0 | -9.91 | -7.17 |
| 4 | DK1 | DELU | 2019-01-01 03:00:00+00:00 | 2019-01-01 04:00:00+01:00 | 2019 | 1 | 1 | 1 | 4 | 5 | 1500.0 | 900.0 | -7.41 | 0.07 |


Save the data as a pickle:

In [20]:
file_path = os.path.join(data_dir,'TransmissionCapacities_DK_'+str(year))
df_tcap.to_pickle(file_path)
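
Note that `to_pickle` does not create the output folder if it is missing. The sketch below shows one way to guard against that and to confirm the file reads back unchanged; the temporary directory and the toy frame stand in for `data_dir` and `df_tcap` and are assumptions made for illustration.

```python
import os
import tempfile
import pandas as pd

df_demo = pd.DataFrame({'g_E': ['DK1'], 'g_EE': ['DK2'], 'ImportCapacity_MW': [600.0]})

# A temporary directory stands in for the notebook's data_dir
with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = os.path.join(tmp_dir, 'CleanedData')
    os.makedirs(out_dir, exist_ok=True)  # create the folder if it does not exist yet
    file_path = os.path.join(out_dir, 'TransmissionCapacities_DK_2019')
    df_demo.to_pickle(file_path)
    df_loaded = pd.read_pickle(file_path)  # round trip to verify the saved file

print(df_loaded.equals(df_demo))
```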