# How to download data from Energinet's Energy Data Service Platform

In this notebook you will learn how to download data from the Danish TSO's data platform, [**https://www.energidataservice.dk/**](https://www.energidataservice.dk/). Compared to ENTSO-e's Transparency Platform, it offers more statistics, but only for Denmark (DK). In my experience, other TSOs do not provide such a comprehensive overview of the energy system, but if you are aware of similar data platforms, please feel free to share them on Absalon!

We will download hourly transmission capacities (both import and export). Like the generation capacities of technologies that rely on intermittent energy sources, transmission capacities vary from hour to hour. The technical transfer capacity of transmission lines (Total Transfer Capacity, TTC) can differ from the capacity that is actually available (Net Transfer Capacity, NTC) for multiple reasons. The main ones are variability of intermittent renewables, power plant outages, outages in the transmission lines themselves, and variation in load patterns. You can have a look at ENTSO-e's user information on NTC here: [**https://eepublicdownloads.entsoe.eu/clean-documents/pre2015/ntc/entsoe_NTCusersInformation.pdf**](https://eepublicdownloads.entsoe.eu/clean-documents/pre2015/ntc/entsoe_NTCusersInformation.pdf).


Before running the code, make sure you have the **requests** package installed in your conda environment. If not, you can install it by typing the following in your Anaconda prompt:

> `$conda activate Insert_Your_Environment_Name`<br>
> `$python -m pip install requests`

We start by importing a few packages:

In [2]:
import pandas as pd, numpy as np, os, pickle, requests

Let's specify an output folder:

In [3]:
direc = os.getcwd()
data_dir = os.path.join(direc,'CleanedData')
os.makedirs(data_dir, exist_ok=True)  # create the folder if it does not exist yet

Choose the year you want to collect data for (currently only one year is supported, but you can easily adapt the notebook to collect data for multiple years):

In [4]:
year = 2022
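
If you want several years, one option is to build one request URL per year and loop over them. The sketch below only constructs the URLs, assuming the same endpoint and query parameters as the download cell in this notebook; `BASE_URL`, `build_url` and `years` are hypothetical names introduced for illustration.

```python
# Hypothetical helper: builds the download URL for one calendar year,
# mirroring the query string used in this notebook's download cell.
BASE_URL = "https://api.energidataservice.dk/dataset/Transmissionlines/download"

def build_url(year: int) -> str:
    return (
        f"{BASE_URL}?format=json"
        f"&start={year}-01-01T00:00&end={year + 1}-01-01T00:00"
        "&timezone=DK&limit=0"
    )

years = [2021, 2022]  # assumed example years
urls = {y: build_url(y) for y in years}
for y, u in urls.items():
    print(y, u)
```

Each URL could then be passed to `requests.get` in turn and the resulting frames concatenated.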

Given the chosen year, define the first and last hour of that year:

In [5]:
start_str, end_str = str(year)+'-01-01T00', str(year)+'-12-31T23'
start_CET = pd.Timestamp(start_str,tz='Europe/Copenhagen'); start_UTC = start_CET.tz_convert('UTC')
end_CET = pd.Timestamp(end_str,tz='Europe/Copenhagen'); end_UTC = end_CET.tz_convert('UTC')

In [6]:
start_CET

Timestamp('2022-01-01 00:00:00+0100', tz='Europe/Copenhagen')

In [7]:
end_CET

Timestamp('2022-12-31 23:00:00+0100', tz='Europe/Copenhagen')
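
As a quick sanity check on the time window, you can count the hours between the two timestamps. This is a minimal sketch that rebuilds the same two endpoints and steps through them in fixed one-hour increments; the variable `hours` is introduced here for illustration only.

```python
import pandas as pd

year = 2022
start_CET = pd.Timestamp(f"{year}-01-01T00", tz="Europe/Copenhagen")
end_CET = pd.Timestamp(f"{year}-12-31T23", tz="Europe/Copenhagen")

# Step in fixed one-hour increments between the two timezone-aware endpoints.
# The short and long days around the daylight saving switches cancel out.
hours = pd.date_range(start=start_CET, end=end_CET, freq=pd.Timedelta(hours=1))

print(len(hours))                   # hours per connection in a full year (8760 for 2022)
print(start_CET.tz_convert("UTC"))  # first local hour expressed in UTC
```

The number of rows per connection in the downloaded data can be compared against this count.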

Download the transmission capacities using the API:

In [8]:
# Download all records for the chosen year as JSON
url = f"https://api.energidataservice.dk/dataset/Transmissionlines/download?format=json&start={year}-01-01T00:00&end={year+1}-01-01T00:00&timezone=DK&limit=0"
r = requests.get(url)
r.raise_for_status()  # stop early if the request failed
rawdata = r.json()
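
The subsetting step that follows indexes `rawdata` like a list of records. If you are unsure what shape the parsed JSON has, a small defensive helper can normalize it first. This is a sketch under the assumption that the payload is either a bare list of records or a dict wrapping the list under a `"records"` key; `extract_records` is a hypothetical name, and the payloads below are made up.

```python
# Hypothetical helper: return a list of record dicts from a parsed JSON payload.
# Assumption: the payload is either a bare list of records or a dict that
# wraps the list under a "records" key.
def extract_records(payload):
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("records"), list):
        return payload["records"]
    raise ValueError("Unexpected payload structure")

# Made-up example payloads, not real API output
as_list = [{"HourUTC": "2022-01-01T00:00:00", "PriceArea": "DK1"}]
as_dict = {"total": 1, "records": as_list}
print(extract_records(as_list) == extract_records(as_dict))
```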

Subset data:

In [12]:
variables = ['HourUTC','PriceArea','ConnectedArea','ImportCapacity','ExportCapacity','HomePriceEUR','ConnectedPriceEUR','CongestionIncomeEUR','ScheduledExchangeDayAhead']
data = {var: [rawdata[i][var] for i in range(0,len(rawdata))] for var in variables}
df_tcap = pd.DataFrame(data).rename(columns={
    'PriceArea':'g_E','ConnectedArea':'g_EE',
    'ImportCapacity':'ImportCapacity_MW','ExportCapacity':'ExportCapacity_MW',
    'HomePriceEUR':'Price_EUR/MWh_gE','ConnectedPriceEUR':'Price_EUR/MWh_gEE'
})
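
An alternative to the dictionary comprehension is to let pandas build the frame directly from the list of records and then select the columns. The sketch below uses made-up records; unlike the comprehension, this variant fills a key that is missing from a single record with NaN instead of raising a `KeyError`.

```python
import pandas as pd

# Made-up records for illustration only
rawdata = [
    {"HourUTC": "2022-01-01T00:00:00", "PriceArea": "DK1", "ConnectedArea": "DE",
     "ImportCapacity": 870.0, "ExportCapacity": 700.0, "Extra": 1},
    {"HourUTC": "2022-01-01T01:00:00", "PriceArea": "DK1", "ConnectedArea": "DE",
     "ImportCapacity": 870.0, "ExportCapacity": 700.0, "Extra": 2},
]
variables = ["HourUTC", "PriceArea", "ConnectedArea", "ImportCapacity", "ExportCapacity"]

# Build the frame from the list of dicts, then keep and rename the wanted columns
df = pd.DataFrame(rawdata)[variables].rename(columns={
    "PriceArea": "g_E", "ConnectedArea": "g_EE",
    "ImportCapacity": "ImportCapacity_MW", "ExportCapacity": "ExportCapacity_MW",
})
print(df.columns.tolist())
```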

---
### Data Description

In [23]:
df_tcap.head()

| | g_E | g_EE | HourUTC | HourCET/CEST | ImportCapacity_MW | ExportCapacity_MW | Price_EUR/MWh_gE | Price_EUR/MWh_gEE |
|---|---|---|---|---|---|---|---|---|
| 0 | DK1 | DELU | 2021-12-31 23:00:00+00:00 | 2022-01-01 00:00:00+01:00 | 870.0 | 700.0 | 50.049999 | 50.049999 |
| 1 | DK1 | DELU | 2022-01-01 00:00:00+00:00 | 2022-01-01 01:00:00+01:00 | 870.0 | 700.0 | 41.330002 | 41.330002 |
| 2 | DK1 | DELU | 2022-01-01 01:00:00+00:00 | 2022-01-01 02:00:00+01:00 | 870.0 | 700.0 | 43.220001 | 43.220001 |
| 3 | DK1 | DELU | 2022-01-01 02:00:00+00:00 | 2022-01-01 03:00:00+01:00 | 870.0 | 700.0 | 45.459999 | 45.459999 |
| 4 | DK1 | DELU | 2022-01-01 03:00:00+00:00 | 2022-01-01 04:00:00+01:00 | 870.0 | 700.0 | 37.669998 | 37.669998 |


In [25]:
df_tcap.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 70080 entries, 0 to 70079
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype                          
---  ------             --------------  -----                          
 0   g_E                70080 non-null  object                         
 1   g_EE               70080 non-null  object                         
 2   HourUTC            70080 non-null  datetime64[ns, UTC]            
 3   HourCET/CEST       70080 non-null  datetime64[ns, Europe/Brussels]
 4   ImportCapacity_MW  70080 non-null  float64                        
 5   ExportCapacity_MW  70080 non-null  float64                        
 6   Price_EUR/MWh_gE   70080 non-null  float64                        
 7   Price_EUR/MWh_gEE  70080 non-null  float64                        
dtypes: datetime64[ns, Europe/Brussels](1), datetime64[ns, UTC](1), float64(4), object(2)
memory usage: 4.3+ MB


In [26]:
df_tcap.describe()

| | ImportCapacity_MW | ExportCapacity_MW | Price_EUR/MWh_gE | Price_EUR/MWh_gEE |
|---|---|---|---|---|
| count | 70080.0 | 70080.0 | 70080.0 | 70080.0 |
| mean | 949.898973 | 964.63857 | 215.70607 | 204.01121 |
| std | 559.291907 | 565.717462 | 147.306815 | 144.041292 |
| min | -963.0 | 0.0 | -19.040001 | -222.36 |
| 25% | 600.0 | 600.0 | 112.489998 | 101.965 |
| 50% | 715.0 | 704.0 | 189.550003 | 182.400005 |
| 75% | 1248.0 | 1290.0 | 291.220001 | 273.950004 |
| max | 2500.0 | 2500.0 | 871.0 | 871.0 |


### Missing Values

In [30]:
import missingno as msno
msno.bar(df_tcap)

<Axes: >

Note: there are no missing values (nice :-))
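
If `missingno` is not installed, a plain pandas count gives the same information as numbers instead of a bar chart. A minimal sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing price
toy = pd.DataFrame({
    "ImportCapacity_MW": [870.0, 870.0, 600.0],
    "Price_EUR/MWh_gE": [50.05, np.nan, 43.22],
})

# Number of missing entries in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)
```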


---

Optionally drop the connection to NL if its data look odd (the line is commented out here, so NL is kept):

In [7]:
# df_tcap = df_tcap[df_tcap['g_EE']!='NL']

Correct the datetime format:

In [16]:
df_tcap['HourUTC'] = pd.to_datetime(df_tcap['HourUTC'].str.replace('T',' '),utc=True)
df_tcap['HourCET/CEST'] = df_tcap['HourUTC'].dt.tz_convert('Europe/Brussels')
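
To see what the conversion does around a daylight saving switch, here is a small sketch on made-up timestamp strings. Note that `pd.to_datetime` can parse the `T` separator directly, so the string replacement is not strictly needed.

```python
import pandas as pd

# Made-up timestamp strings in the ISO-like form with a 'T' separator
s = pd.Series(["2021-12-31T23:00:00", "2022-03-27T00:00:00", "2022-03-27T01:00:00"])

hour_utc = pd.to_datetime(s, utc=True)                   # timezone-aware UTC
hour_local = hour_utc.dt.tz_convert("Europe/Brussels")   # CET in winter, CEST in summer

# The local clock jumps from 01:00 to 03:00 on the last Sunday of March
print(hour_local.tolist())
```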

Convert the numeric columns to floats and insert NaNs where the raw data contain dictionaries:

In [17]:
num_cols = ['ImportCapacity_MW','ExportCapacity_MW','Price_EUR/MWh_gE','Price_EUR/MWh_gEE','CongestionIncomeEUR','ScheduledExchangeDayAhead']
is_dict = df_tcap[num_cols].transform(lambda x: x.apply(type).eq(dict))
for col in [x for x,y in zip(is_dict.columns,is_dict.any()) if y]:
    df_tcap.loc[is_dict[col],col] = np.nan
df_tcap[num_cols] = df_tcap[num_cols].astype(float)
df_tcap['ExportCapacity_MW'] = df_tcap['ExportCapacity_MW'].abs()
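
The dictionary check can be illustrated on a single made-up column. This sketch uses `isinstance` element by element, which is one possible way to flag the offending entries before casting to float; the dict value itself is invented for the example.

```python
import numpy as np
import pandas as pd

# Made-up column where one entry is a dict instead of a number
toy = pd.DataFrame({"ImportCapacity_MW": [870.0, {"NaN": True}, 600.0]})

# Flag dict entries element-wise, then overwrite them with NaN
is_dict = toy["ImportCapacity_MW"].apply(lambda v: isinstance(v, dict))
toy.loc[is_dict, "ImportCapacity_MW"] = np.nan
toy["ImportCapacity_MW"] = toy["ImportCapacity_MW"].astype(float)

print(toy)
```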

Impute domestic prices that are missing in some hours, using the mean price observed for the same hour and price area:

In [18]:
idx = df_tcap['Price_EUR/MWh_gE'].isna()
df_tcap.loc[idx,'Price_EUR/MWh_gE'] = df_tcap.groupby(['HourUTC','g_E'])['Price_EUR/MWh_gE'].transform('mean')[idx]
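
The group-mean imputation is easiest to follow on a tiny example. In this made-up frame, DK1 has two connections in the same hour and the price is missing for one of them, so it is filled with the price observed on the other row. The hour labels and the second connected area are placeholders.

```python
import numpy as np
import pandas as pd

# Made-up data: DK1 has two connections in the same hour, one price missing
toy = pd.DataFrame({
    "HourUTC": ["h1", "h1", "h2"],
    "g_E": ["DK1", "DK1", "DK1"],
    "g_EE": ["DELU", "NO2", "DELU"],
    "Price_EUR/MWh_gE": [50.0, np.nan, 41.0],
})

idx = toy["Price_EUR/MWh_gE"].isna()
# The group mean ignores NaN, so it equals the observed price within the group
group_mean = toy.groupby(["HourUTC", "g_E"])["Price_EUR/MWh_gE"].transform("mean")
toy.loc[idx, "Price_EUR/MWh_gE"] = group_mean[idx]

print(toy)
```

If every row in a group is missing, the group mean is NaN as well and the gap remains.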

Assume missing capacities are zero:

In [19]:
df_tcap['ImportCapacity_MW'] = df_tcap['ImportCapacity_MW'].fillna(0)
df_tcap['ExportCapacity_MW'] = df_tcap['ExportCapacity_MW'].fillna(0)

Assume congestion income is zero if missing:

In [20]:
df_tcap['CongestionIncomeEUR'] = df_tcap['CongestionIncomeEUR'].fillna(0)

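Impute the foreign electricity price if it is missing. Congestion income equals the price difference times the scheduled exchange, so the foreign price can be backed out from the domestic price: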

In [21]:
idx = df_tcap['Price_EUR/MWh_gEE'].isna()
# If congestion income is zero:
idx_tmp = (idx) & (np.isclose(df_tcap['CongestionIncomeEUR'],0))
df_tcap.loc[idx_tmp,'Price_EUR/MWh_gEE'] = df_tcap.loc[idx_tmp,'Price_EUR/MWh_gE']
# If congestion income is not zero and DK is exporting:
idx_tmp = (idx) & (~np.isclose(df_tcap['CongestionIncomeEUR'],0)) & (df_tcap['ScheduledExchangeDayAhead']<0)
df_tcap.loc[idx_tmp,'Price_EUR/MWh_gEE'] = df_tcap.loc[idx_tmp,'Price_EUR/MWh_gE']+df_tcap.loc[idx_tmp,'CongestionIncomeEUR']/df_tcap.loc[idx_tmp,'ScheduledExchangeDayAhead'].abs()
# If congestion income is not zero and DK is importing:
idx_tmp = (idx) & (~np.isclose(df_tcap['CongestionIncomeEUR'],0)) & (df_tcap['ScheduledExchangeDayAhead']>0)
df_tcap.loc[idx_tmp,'Price_EUR/MWh_gEE'] = df_tcap.loc[idx_tmp,'Price_EUR/MWh_gE']-df_tcap.loc[idx_tmp,'CongestionIncomeEUR']/df_tcap.loc[idx_tmp,'ScheduledExchangeDayAhead']
df_tcap.drop(columns=['CongestionIncomeEUR','ScheduledExchangeDayAhead'],inplace=True)
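
The three cases can be checked by hand with a scalar version of the same rule. This is a sketch only: `implied_foreign_price` is a hypothetical helper, the numbers are made up, and the sign convention (negative scheduled exchange means DK is exporting) is taken from the comments in the cell above.

```python
import math

def implied_foreign_price(home_price, congestion_income, scheduled_exchange):
    """Hypothetical scalar version of the imputation rule (prices in EUR/MWh)."""
    if math.isclose(congestion_income, 0.0, abs_tol=1e-8):
        return home_price  # no congestion income: prices are equal
    if scheduled_exchange < 0:  # exporting: the foreign price is higher
        return home_price + congestion_income / abs(scheduled_exchange)
    if scheduled_exchange > 0:  # importing: the foreign price is lower
        return home_price - congestion_income / scheduled_exchange
    return float("nan")  # non-zero income but no flow: left missing

# Made-up numbers: 1000 EUR of congestion income on 100 MWh of exchange
print(implied_foreign_price(50.0, 1000.0, -100.0))  # exporting
print(implied_foreign_price(50.0, 1000.0, 100.0))   # importing
print(implied_foreign_price(50.0, 0.0, 100.0))      # no congestion
```

With 1000 EUR spread over 100 MWh, the implied price gap is 10 EUR/MWh, added to the domestic price when exporting and subtracted when importing.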

Harmonize the area names and aggregate rows that share the same pair of areas and hour:

In [22]:
df_tcap[['g_E','g_EE']] = df_tcap[['g_E','g_EE']].replace({'DK1':'DK1','DK2':'DK2','DE':'DELU'})
df_tcap = df_tcap.groupby(['g_E','g_EE','HourUTC','HourCET/CEST'])[num_cols[0:-2]].agg({
    'ImportCapacity_MW':'sum',
    'ExportCapacity_MW':'sum',
    'Price_EUR/MWh_gE':'mean',
    'Price_EUR/MWh_gEE':'mean'
}).reset_index()
# Remove connections that have been aggregated out
idx = (df_tcap['g_E']=='DK') & (df_tcap['g_EE']=='DK')
df_tcap = df_tcap[~(idx)]
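
To illustrate the aggregation rule, the sketch below uses a made-up frame in which one connection appears twice in the same hour. Capacities are summed and prices are averaged, as in the cell above; the area labels and numbers are placeholders.

```python
import pandas as pd

# Made-up data: two rows for the same connection and hour
toy = pd.DataFrame({
    "g_E": ["DK1", "DK1", "DK2"],
    "g_EE": ["DE", "DE", "SE4"],
    "HourUTC": ["h1", "h1", "h1"],
    "ImportCapacity_MW": [500.0, 370.0, 1300.0],
    "Price_EUR/MWh_gE": [50.0, 52.0, 48.0],
})

toy["g_EE"] = toy["g_EE"].replace({"DE": "DELU"})
agg = toy.groupby(["g_E", "g_EE", "HourUTC"]).agg({
    "ImportCapacity_MW": "sum",   # capacities add up
    "Price_EUR/MWh_gE": "mean",   # prices are averaged
}).reset_index()

print(agg)
```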

Add some helpful variables:

In [31]:
df_tcap['HourOfTheDay'] = df_tcap['HourCET/CEST'].dt.hour
df_tcap['Weekday'] = df_tcap['HourCET/CEST'].dt.weekday 
df_tcap['Week'] = df_tcap['HourCET/CEST'].dt.isocalendar().week
df_tcap['Month'] = df_tcap['HourCET/CEST'].dt.month
df_tcap['Year'] = df_tcap['HourCET/CEST'].dt.year
df_tcap = df_tcap[df_tcap['Year']==year].copy()  # copy so the new column below is not written to a view
df_tcap['h'] = 1
df_tcap['h'] = df_tcap.groupby(['g_E','g_EE'])['h'].cumsum()
new_col_order = ['g_E','g_EE','HourUTC','HourCET/CEST','Year','Month','Week','Weekday','HourOfTheDay','h']+num_cols[0:-2]
df_tcap = df_tcap[new_col_order].sort_values(new_col_order).reset_index(drop=True)
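
The running hour index `h` can also be written with `cumcount`, which avoids the helper column of ones. A minimal sketch on made-up rows; the counter follows the current row order, so the frame should already be sorted by time within each connection.

```python
import pandas as pd

toy = pd.DataFrame({
    "g_E": ["DK1", "DK1", "DK2", "DK1"],
    "g_EE": ["DELU", "DELU", "SE4", "DELU"],
})

# cumcount() numbers rows within each group from 0, so add 1 to start at 1
toy["h"] = toy.groupby(["g_E", "g_EE"]).cumcount() + 1
print(toy["h"].tolist())
```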

Save the data as a pickle:

In [32]:
file_path = os.path.join(data_dir,'TransmissionCapacities_DK_'+str(year))
df_tcap.to_pickle(file_path)
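
Note that `to_pickle` does not create missing folders. The sketch below shows a full round trip in a temporary directory, creating the output folder first; the folder and file names are placeholders.

```python
import os
import tempfile
import pandas as pd

toy = pd.DataFrame({"g_E": ["DK1"], "ImportCapacity_MW": [870.0]})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "CleanedData")
    os.makedirs(out_dir, exist_ok=True)   # to_pickle does not create folders
    path = os.path.join(out_dir, "toy_capacities")
    toy.to_pickle(path)
    restored = pd.read_pickle(path)

# True if values, dtypes and index survived the round trip
print(restored.equals(toy))
```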

---

In [33]:
file_path

'c:\\Users\\mpher\\Documents\\Uni\\Master\\02_Exchange\\01_Academics\\Energy Economics of the Green Transition\\EnergyEconomicsE2023\\DownloadDataForDK\\CleanedData\\TransmissionCapacities_DK_2022'

In [38]:
obj = pd.read_pickle(file_path)
obj.head()

| | g_E | g_EE | HourUTC | HourCET/CEST | Year | Month | Week | Weekday | HourOfTheDay | h | ImportCapacity_MW | ExportCapacity_MW | Price_EUR/MWh_gE | Price_EUR/MWh_gEE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DK1 | DELU | 2021-12-31 23:00:00+00:00 | 2022-01-01 00:00:00+01:00 | 2022 | 1 | 52 | 5 | 0 | 1 | 870.0 | 700.0 | 50.049999 | 50.049999 |
| 1 | DK1 | DELU | 2022-01-01 00:00:00+00:00 | 2022-01-01 01:00:00+01:00 | 2022 | 1 | 52 | 5 | 1 | 2 | 870.0 | 700.0 | 41.330002 | 41.330002 |
| 2 | DK1 | DELU | 2022-01-01 01:00:00+00:00 | 2022-01-01 02:00:00+01:00 | 2022 | 1 | 52 | 5 | 2 | 3 | 870.0 | 700.0 | 43.220001 | 43.220001 |
| 3 | DK1 | DELU | 2022-01-01 02:00:00+00:00 | 2022-01-01 03:00:00+01:00 | 2022 | 1 | 52 | 5 | 3 | 4 | 870.0 | 700.0 | 45.459999 | 45.459999 |
| 4 | DK1 | DELU | 2022-01-01 03:00:00+00:00 | 2022-01-01 04:00:00+01:00 | 2022 | 1 | 52 | 5 | 4 | 5 | 870.0 | 700.0 | 37.669998 | 37.669998 |
