In this notebook, we download data from the [**Danish Climate Outlook 2023**](https://ens.dk/service/fremskrivninger-analyser-modeller/klimastatus-og-fremskrivning-2023), which is produced by the Danish Energy Agency. As part of producing the climate outlook, the Danish Energy Agency uses a linear programming model of the European electricity system called [**RAMSES**](https://ens.dk/sites/ens.dk/files/Analyser/ramses_energisystemmodel.pdf). The input data for RAMSES is based on an inventory of all electricity and district heat generating plants in Denmark as well as a set of representative plants in the rest of Europe. The RAMSES data is confidential, but the Danish Energy Agency does make available a list of representative plants for each area/country.

We will be using this data on representative plants. It is available on their [**website**](https://ens.dk/service/fremskrivninger-analyser-modeller/klimastatus-og-fremskrivning-2023) under the heading **"Dataark for resultater"** and can also be downloaded directly here: [**"KF23 dataark – El og fjernvarme"**](https://ens.dk/sites/ens.dk/files/Basisfremskrivning/kf23_el_og_fjernvarme.xlsx). Unfortunately, the data is only available in Danish, but it is not difficult to translate using Google Translate. We will be using the raw plant data and the raw transfer capacities, which are read below from the sheets `Rådata_prod` and `Rådata_NTC`.

`Note: Although we use energy and heat prices from 2019 in our model, we use the plant data for 2023. The oldest available data is from 2020, and we argue that approximating 2019 with the plant inventory and Total Transfer Capacities (TTC) for 2023 is a reasonable assumption because we want to model an increase in heat storage capacity given today's technology and TTCs.`

Import standard packages:

In [2]:
import os
import numpy as np
import pandas as pd

Let's specify an output folder:

In [3]:
direc = os.getcwd()
data_dir = os.path.join(direc,'CleanedData')
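
The save steps at the end of each section write into `CleanedData`, so that folder has to exist before `to_pickle` is called. A minimal sketch, assuming the folder may be missing, creates it up front. The temporary `base_dir` is a stand-in used only so the example has no side effects; in the notebook it would be `os.getcwd()`.

```python
import os
import tempfile

# Stand-in for os.getcwd() so this sketch does not touch the real project folder.
base_dir = tempfile.mkdtemp()
data_dir = os.path.join(base_dir, 'CleanedData')

# exist_ok=True makes the call safe to repeat when the folder already exists.
os.makedirs(data_dir, exist_ok=True)
os.makedirs(data_dir, exist_ok=True)
print(os.path.isdir(data_dir))
```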

## 1 Settings

In [4]:
year = 2023

Electricity areas:

In [5]:
g_E = {'DK1':'DK1','DK2':'DK2'}

Heating areas:

In [6]:
g_H = {
    'Centrale områder':'Central', # Large plants located at central nodes in the grid (often urban areas)
    'Større decentrale områder':'LargeDecentral', # Large plants distributed close to actual consumption (distributed generation)
    'Mindre decentrale områder':'SmallDecentral', # Small plants distributed very close to actual consumption (distributed generation)
}

Technology types:

In [7]:
tech = {
    'Industriel overskudsel':'IndustryE', # Industrial surplus electricity
    'Kondens':'CD', # Steam turbine without heat production, only for electricity production
    'Havvind':'WS', # Offshore-Wind
    'Hydro':'ROR', # Run-of-river hydro (intermittent without storage)
    'Kedel':'BH', # Boiler
    'Kraftvarme':'BP', # Assume all CHP plants are Back Pressure. 
    'Industrivarme':'IndustryH', # Industrial surplus heat
    'PtX_Brint': 'EP', # Electrolyzer Plant 
    'Elpatron':'IH', # Electric immersion water heater (electric boiler)
    'Solvarme':'SH', # Solar heat (sun heating up water)
    'Varmepumper':'HPstandard', # Heat pumps
    'Varmepumper(overskudsvarme)':'HPsurplusheat', # Heat pumps using surplus heat
    'Geotermi':'GT', # Geothermal heat
    'Landvind':'WL', # Onshore-Wind
    'Solceller':'PV', # Photovoltaics (sun to electricity)
}

Mapping between tech and model tech:

In [8]:
tech2modelTech = {tech:'standard_E' for tech in ['IndustryE','CD','WS','ROR','WL','PV']}
tech2modelTech.update({tech:'standard_H' for tech in ['BH','IndustryH','SH','GT']})
tech2modelTech.update({'BP':'BP','HPstandard':'HP','HPsurplusheat':'HP','IH':'HP','EP':'HP'})
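
To illustrate how the two dictionaries chain together, the sketch below maps a few Danish technology names first to the short codes and then to the model technologies. It uses shortened copies of the dictionaries (`tech_demo` and `tech2model_demo` are illustrative names) and `Series.map` rather than `replace`; this is an alternative shown for illustration, not the approach used in the cells of this notebook.

```python
import pandas as pd

# Shortened copies of the dictionaries defined above (illustration only).
tech_demo = {'Havvind': 'WS', 'Kedel': 'BH', 'Kraftvarme': 'BP', 'Elpatron': 'IH'}
tech2model_demo = {'WS': 'standard_E', 'BH': 'standard_H', 'BP': 'BP', 'IH': 'HP'}

raw = pd.Series(['Havvind', 'Kedel', 'Kraftvarme', 'Elpatron'])

# Danish name -> short code -> model technology.
model = raw.map(tech_demo).map(tech2model_demo)
print(model.tolist())
```

Unlike `replace`, `map` returns NaN for names that are missing from the dictionary, which makes unmapped technologies easy to spot.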

Fuel types:

In [9]:
BFt = {
    'Biogas':'Biogas',
    'Havvind':np.nan,
    'Hydro':np.nan,
    'Naturgas':'Natgas',
    'Olie':'Oil',
    'Affald':'Waste',
    'Biomasse':'Biomass',
    'Elkedler':np.nan,
    'Kul':'Coal',
    'Solvarme':np.nan,
    'Varmepumper':np.nan,
    'Varmepumper(overskudsvarme)':np.nan,
    'Geotermi':np.nan,
    'Landvind':np.nan,
    'Solceller':np.nan,
    'Industrivarme':np.nan
}

## 2 Clean Plant Data

Get raw plant data:

In [10]:
df_plant = pd.read_excel(os.path.join(os.getcwd(),'RawData','ClimateOutlook2023_PlantData_and_TTC.xlsx'),sheet_name='Rådata_prod').drop(columns='version').rename(columns={
    'year':'Year',
    'ElArea':'g_E',
    'HeatArea_Category':'g_H',
    'Teknologitype':'TechnologyType',
    'Brændselstype':'BFt',
    'Elkapacitet_MW':'GeneratingCapacity_E',
    'Varmekapacitet_MW':'GeneratingCapacity_H',
    'Elproduktion_TWh':'Generation_E',
    'Varmeproduktion_TWh':'Generation_H',
    'Brændselsforbrug_TWh':'FuelConsumption',
})

Subset year:

In [11]:
df_plant = df_plant[df_plant['Year']==year]

Subset and aggregate electricity area:

`In other words, filter for DK1 and DK2 in column g_E.`

In [12]:
df_plant = df_plant[df_plant['g_E'].isin(g_E.keys())].replace({'g_E':g_E})

Subset and aggregate district heat area:

`In other words, filter for 'Centrale områder', 'Større decentrale områder', 'Mindre decentrale områder' in column g_H. Rows without a heating area (NaN) are kept as well.`

In [13]:
df_plant = df_plant[df_plant['g_H'].isin([x for x in g_H.keys()] + [np.nan])].replace({'g_H':g_H})
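
The filter keeps rows with a missing heating area by adding `np.nan` to the list passed to `isin`. An equivalent formulation that states the intent more explicitly combines `isin` with `isna`. The sketch below uses a toy frame; the category `'Individuel'` and the `plant` column are made up for illustration.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'g_H': ['Centrale områder', 'Individuel', np.nan],
    'plant': ['a', 'b', 'c'],
})
keep = ['Centrale områder']

# isin(keep) is False for the NaN row because NaN is not in keep,
# so the missing values are added back with an explicit isna() check.
mask = demo['g_H'].isin(keep) | demo['g_H'].isna()
print(demo.loc[mask, 'plant'].tolist())
```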

Aggregate fuel types:

In [14]:
df_plant['BFt'] = df_plant['BFt'].replace(BFt)

Deal with PtX because it is split over multiple cells:

```
Explanation:
- loc: A pandas DataFrame accessor for label-based indexing. It also accepts a boolean mask, which is how it is used here to select the rows flagged in idx.

- idx: A boolean Series that is True for the rows where 'TechnologyType' equals 'PtX_Brint'.

- 'BFt': The column label where the NaN value will be assigned.
```

In [15]:
idx = df_plant['TechnologyType']=='PtX_Brint'
df_plant.loc[idx,'BFt'] = np.nan
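
The same mask-and-assign pattern can be tried on a toy frame. The fuel values below are made up and only serve to show that rows outside the mask are left untouched.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'TechnologyType': ['PtX_Brint', 'Kedel', 'PtX_Brint'],
    'BFt': ['Biogas', 'Natgas', 'Biomass'],  # made-up fuel labels
})

mask = demo['TechnologyType'] == 'PtX_Brint'  # boolean Series, one entry per row
demo.loc[mask, 'BFt'] = np.nan                # only rows where mask is True change
print(demo['BFt'].isna().tolist())
```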

In [16]:
df_plant.head()

Unnamed: 0,Year,g_E,g_H,BFt,TechnologyType,GeneratingCapacity_E,GeneratingCapacity_H,Generation_E,Generation_H,FuelConsumption
370,2023,DK1,,Biogas,Industriel overskudsel,15.625,0.0,0.078,0.0,0.173
371,2023,DK1,,Biogas,Kondens,10.0,0.0,0.072,0.0,0.201
372,2023,DK1,,,Havvind,1606.4,0.0,5.876,0.0,5.876
373,2023,DK1,,,Hydro,6.894,0.0,0.016,0.0,0.016
375,2023,DK1,,Natgas,Industriel overskudsel,90.183,0.0,0.082,0.0,0.206


Aggregate technology types:

`Creates a new column 'tech' by joining the translated 'TechnologyType' value and the 'BFt' value with a '_'. Rows without a fuel type keep the technology name only.`

In [17]:
df_plant['tech'] = ['_'.join([y,x]) if isinstance(x,str) else y for x,y in zip(df_plant['BFt'],df_plant['TechnologyType'].replace(tech))]
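
The list comprehension can be read more easily when the rule is pulled out into a small function. The helper name `make_tech_label` is hypothetical and only used for this sketch.

```python
import numpy as np

def make_tech_label(tech_code, fuel):
    """Return 'code_fuel' when a fuel string is given, otherwise just 'code'."""
    return '_'.join([tech_code, fuel]) if isinstance(fuel, str) else tech_code

# A thermal plant with a fuel and a wind plant without one.
labels = [make_tech_label(t, f) for t, f in [('CD', 'Biogas'), ('WS', np.nan)]]
print(labels)
```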

In [18]:
df_plant.head()

Unnamed: 0,Year,g_E,g_H,BFt,TechnologyType,GeneratingCapacity_E,GeneratingCapacity_H,Generation_E,Generation_H,FuelConsumption,tech
370,2023,DK1,,Biogas,Industriel overskudsel,15.625,0.0,0.078,0.0,0.173,IndustryE_Biogas
371,2023,DK1,,Biogas,Kondens,10.0,0.0,0.072,0.0,0.201,CD_Biogas
372,2023,DK1,,,Havvind,1606.4,0.0,5.876,0.0,5.876,WS
373,2023,DK1,,,Hydro,6.894,0.0,0.016,0.0,0.016,ROR
375,2023,DK1,,Natgas,Industriel overskudsel,90.183,0.0,0.082,0.0,0.206,IndustryE_Natgas


Add model technology:

`Adds the column 'modelTech': the 'TechnologyType' values are first translated with the 'tech' dictionary and then mapped to model technologies with the 'tech2modelTech' dictionary.`

In [19]:
df_plant['modelTech'] = df_plant['TechnologyType'].replace(tech).replace(tech2modelTech)

In [20]:
df_plant.head()

Unnamed: 0,Year,g_E,g_H,BFt,TechnologyType,GeneratingCapacity_E,GeneratingCapacity_H,Generation_E,Generation_H,FuelConsumption,tech,modelTech
370,2023,DK1,,Biogas,Industriel overskudsel,15.625,0.0,0.078,0.0,0.173,IndustryE_Biogas,standard_E
371,2023,DK1,,Biogas,Kondens,10.0,0.0,0.072,0.0,0.201,CD_Biogas,standard_E
372,2023,DK1,,,Havvind,1606.4,0.0,5.876,0.0,5.876,WS,standard_E
373,2023,DK1,,,Hydro,6.894,0.0,0.016,0.0,0.016,ROR,standard_E
375,2023,DK1,,Natgas,Industriel overskudsel,90.183,0.0,0.082,0.0,0.206,IndustryE_Natgas,standard_E


In [21]:
print(tech)
print(tech2modelTech)

{'Industriel overskudsel': 'IndustryE', 'Kondens': 'CD', 'Havvind': 'WS', 'Hydro': 'ROR', 'Kedel': 'BH', 'Kraftvarme': 'BP', 'Industrivarme': 'IndustryH', 'PtX_Brint': 'EP', 'Elpatron': 'IH', 'Solvarme': 'SH', 'Varmepumper': 'HPstandard', 'Varmepumper(overskudsvarme)': 'HPsurplusheat', 'Geotermi': 'GT', 'Landvind': 'WL', 'Solceller': 'PV'}
{'IndustryE': 'standard_E', 'CD': 'standard_E', 'WS': 'standard_E', 'ROR': 'standard_E', 'WL': 'standard_E', 'PV': 'standard_E', 'BH': 'standard_H', 'IndustryH': 'standard_H', 'SH': 'standard_H', 'GT': 'standard_H', 'BP': 'BP', 'HPstandard': 'HP', 'HPsurplusheat': 'HP', 'IH': 'HP', 'EP': 'HP'}


Correct some purely electricity producing plants that are part of heating areas:

In [22]:
idx = (~df_plant['g_H'].isna()) & (df_plant['GeneratingCapacity_H']==0) & (df_plant['tech']!='EP')
df_plant.loc[idx,'g_H'] = np.nan

Aggregate plants:

In [23]:
g_cols = ['Year','g_E','g_H','BFt','tech','modelTech']
num_cols = ['GeneratingCapacity_E','GeneratingCapacity_H','Generation_E','Generation_H','FuelConsumption']
df_plant[num_cols] = df_plant[num_cols].astype(float)
df_plant = df_plant.groupby(g_cols,dropna=False)[num_cols].agg('sum').reset_index()
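
The `dropna=False` argument matters here because `g_H` and `BFt` are NaN for many plants. By default, `groupby` drops groups whose key is NaN, which would silently remove those plants. A minimal sketch on a toy frame with made-up capacities:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'g_H': ['Central', 'Central', np.nan, np.nan],
    'cap': [1.0, 2.0, 3.0, 4.0],  # made-up capacities
})

# Default behaviour: the NaN group is dropped.
default = demo.groupby('g_H')['cap'].sum().reset_index()

# dropna=False keeps the NaN group as its own row.
keep_nan = demo.groupby('g_H', dropna=False)['cap'].sum().reset_index()
print(len(default), len(keep_nan))
```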

Calculate inverse fuel efficiencies for standard thermal plants:

In [24]:
df_plant['FuelMix'] = np.nan
idx = (df_plant['modelTech'].str.find('standard_')!=-1) & (~df_plant['BFt'].isna())
df_plant.loc[idx,'FuelMix'] = df_plant.loc[idx,'FuelConsumption']/(df_plant.loc[idx,'Generation_E']+df_plant.loc[idx,'Generation_H'])

Calculate inverse fuel efficiencies for back pressure plants:

In [25]:
idx = df_plant['modelTech']=='BP'
df_plant.loc[idx,'FuelMix'] = df_plant.loc[idx,'FuelConsumption']/df_plant.loc[idx,'Generation_E']
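
The two `FuelMix` definitions differ only in the denominator. The arithmetic can be sketched with made-up annual figures (the numbers below are not taken from the dataset):

```python
# Made-up annual figures in TWh, for illustration only.
fuel_consumption = 10.0
generation_e = 4.0
generation_h = 5.0

# Standard plants: fuel input per unit of total (electricity + heat) output.
fuelmix_standard = fuel_consumption / (generation_e + generation_h)

# Back-pressure plants: fuel input per unit of electricity output only.
fuelmix_bp = fuel_consumption / generation_e

# The implied overall efficiency of the standard plant is the reciprocal.
efficiency_standard = 1 / fuelmix_standard
print(round(fuelmix_standard, 3), fuelmix_bp, round(efficiency_standard, 3))
```

For the same plant, the back-pressure value is always at least as large as the standard one, since heat output is left out of its denominator.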

Calculate the electricity-to-heat ratio of back pressure plants, heat pumps and PtX plants:

In [26]:
df_plant['E2H'] = np.nan
idx = df_plant['modelTech'].isin(['BP','HP'])
df_plant.loc[idx,'E2H'] = df_plant.loc[idx,'GeneratingCapacity_E']/df_plant.loc[idx,'GeneratingCapacity_H']
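
If a plant in this group reports a heat capacity of zero, the division returns `inf` rather than raising an error. Whether that occurs in the 2023 data is not checked here; the sketch below only shows, on made-up numbers, how such values could be turned into NaN if needed.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'cap_E': [9.0, 5.0],   # made-up electrical capacities in MW
    'cap_H': [10.0, 0.0],  # made-up heat capacities in MW
})

# The second row divides by zero, which pandas evaluates to inf.
ratio = demo['cap_E'] / demo['cap_H']

# Optional clean-up: treat infinite ratios as missing.
ratio_clean = ratio.replace([np.inf, -np.inf], np.nan)
print(ratio_clean.tolist())
```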

Calculate the electricity-to-hydrogen ratio for PtX plants:

In [27]:
df_plant['E2HH'] = np.nan
idx = df_plant['tech']=='EP'
df_plant.loc[idx,'E2HH'] = df_plant.loc[idx,'Generation_E']/df_plant.loc[idx,'FuelConsumption']

Correct capacities:

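`In words, capacities equal to 0 are replaced with np.nan (NumPy's representation of a missing or undefined value). In addition, the heat capacity of back pressure plants and the electricity capacity of plants with model technology 'HP' are set to np.nan.`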

In [28]:
df_plant = df_plant.replace({'GeneratingCapacity_E':{0:np.nan},'GeneratingCapacity_H':{0:np.nan}})
df_plant.loc[df_plant['modelTech']=='BP','GeneratingCapacity_H'] = np.nan
df_plant.loc[df_plant['modelTech']=='HP','GeneratingCapacity_E'] = np.nan

Subset database:

`As we do not need all columns.`

In [29]:
df_plant = df_plant[g_cols+['GeneratingCapacity_E','GeneratingCapacity_H','FuelMix','E2H','E2HH','Generation_E','Generation_H']]

Make plant id:

`I.e. create a unique identifier per plant.`

In [30]:
df_plant.head()

Unnamed: 0,Year,g_E,g_H,BFt,tech,modelTech,GeneratingCapacity_E,GeneratingCapacity_H,FuelMix,E2H,E2HH,Generation_E,Generation_H
0,2023,DK1,Central,Biogas,BP_Biogas,BP,9.412,,2.544304,0.895869,,0.079,0.087
1,2023,DK1,Central,Biogas,IndustryH_Biogas,standard_H,,8.414,0.518519,,,0.0,0.027
2,2023,DK1,Central,Biomass,BH_Biomass,standard_H,,266.306,0.946472,,,0.0,0.411
3,2023,DK1,Central,Biomass,BP_Biomass,BP,671.081,,4.169188,0.504207,,2.181,5.694
4,2023,DK1,Central,Coal,BP_Coal,BP,1091.285,,2.650372,0.825316,,1.882,1.131


In [31]:
df_plant['id'] = ['id_'+'_'.join([str(x),str(y),str(z)]) for x,y,z in zip(df_plant['g_E'],df_plant['g_H'],df_plant['tech'])]
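
Because every component is passed through `str`, plants without a heating area get the literal text `nan` in their identifier. The sketch below reproduces the construction on two hand-written rows to show the effect.

```python
import numpy as np

rows = [
    ('DK1', 'Central', 'BP_Coal'),  # plant inside a heating area
    ('DK1', np.nan, 'WS'),          # plant without a heating area
]

# Same construction as in the cell above: str(np.nan) yields 'nan'.
ids = ['id_' + '_'.join([str(x), str(y), str(z)]) for x, y, z in rows]
print(ids)
```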

In [32]:
df_plant.head()

Unnamed: 0,Year,g_E,g_H,BFt,tech,modelTech,GeneratingCapacity_E,GeneratingCapacity_H,FuelMix,E2H,E2HH,Generation_E,Generation_H,id
0,2023,DK1,Central,Biogas,BP_Biogas,BP,9.412,,2.544304,0.895869,,0.079,0.087,id_DK1_Central_BP_Biogas
1,2023,DK1,Central,Biogas,IndustryH_Biogas,standard_H,,8.414,0.518519,,,0.0,0.027,id_DK1_Central_IndustryH_Biogas
2,2023,DK1,Central,Biomass,BH_Biomass,standard_H,,266.306,0.946472,,,0.0,0.411,id_DK1_Central_BH_Biomass
3,2023,DK1,Central,Biomass,BP_Biomass,BP,671.081,,4.169188,0.504207,,2.181,5.694,id_DK1_Central_BP_Biomass
4,2023,DK1,Central,Coal,BP_Coal,BP,1091.285,,2.650372,0.825316,,1.882,1.131,id_DK1_Central_BP_Coal


Adjust g_H label:

`Prefix the heating area with the electricity area to show in which bidding zone each heat-producing plant is located.`

In [33]:
df_plant['g_H'] = ['_'.join([g_E,g_H]) if isinstance(g_H,str) else g_H for g_E,g_H in zip(df_plant['g_E'],df_plant['g_H'])]

In [34]:
df_plant.head()

Unnamed: 0,Year,g_E,g_H,BFt,tech,modelTech,GeneratingCapacity_E,GeneratingCapacity_H,FuelMix,E2H,E2HH,Generation_E,Generation_H,id
0,2023,DK1,DK1_Central,Biogas,BP_Biogas,BP,9.412,,2.544304,0.895869,,0.079,0.087,id_DK1_Central_BP_Biogas
1,2023,DK1,DK1_Central,Biogas,IndustryH_Biogas,standard_H,,8.414,0.518519,,,0.0,0.027,id_DK1_Central_IndustryH_Biogas
2,2023,DK1,DK1_Central,Biomass,BH_Biomass,standard_H,,266.306,0.946472,,,0.0,0.411,id_DK1_Central_BH_Biomass
3,2023,DK1,DK1_Central,Biomass,BP_Biomass,BP,671.081,,4.169188,0.504207,,2.181,5.694,id_DK1_Central_BP_Biomass
4,2023,DK1,DK1_Central,Coal,BP_Coal,BP,1091.285,,2.650372,0.825316,,1.882,1.131,id_DK1_Central_BP_Coal


Save database:

In [35]:
file_path = os.path.join(data_dir,'PlantData_DK_'+str(year))
df_plant.to_pickle(file_path)
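
A quick way to confirm that the saved file can be used downstream is to read it back with `pd.read_pickle` and compare it with the original. The sketch below does this for a one-row toy frame in a temporary folder, so it does not overwrite any real output.

```python
import os
import tempfile

import pandas as pd

demo = pd.DataFrame({
    'id': ['id_DK1_Central_BP_Coal'],
    'GeneratingCapacity_E': [1091.285],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'PlantData_DK_2023')
    demo.to_pickle(path)              # write without a file extension, as above
    restored = pd.read_pickle(path)   # read back while the folder still exists

print(restored.equals(demo))
```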

## 3 Clean TTC Data

Get raw data:

In [36]:
df_ttc = pd.read_excel(os.path.join(os.getcwd(),'RawData','ClimateOutlook2023_PlantData_and_TTC.xlsx'),sheet_name='Rådata_NTC').drop(columns='version').rename(columns={'From':'g_E','To':'g_EE','NTC_MW':'TTC','year':'Year'})

`Inspect df_ttc`

In [37]:
df_ttc.head()

Unnamed: 0,Year,g_E,g_EE,TTC
0,2026,AT,CH,1200
1,2026,AT,CZSK,900
2,2026,AT,DELU,5400
3,2026,AT,HU,800
4,2026,AT,IT,715


Subset year:

In [38]:
df_ttc = df_ttc[df_ttc['Year']==year]

Subset to DK:

In [39]:
df_ttc = df_ttc[(df_ttc['g_E'].isin(g_E.keys())) | (df_ttc['g_EE'].isin(g_E.keys()))].replace({'g_E':g_E,'g_EE':g_E})

Drop DK-to-DK connections, which arise if DK1 and DK2 are aggregated into one area:

In [40]:
df_ttc = df_ttc[df_ttc['g_E']!=df_ttc['g_EE']].reset_index(drop=True)

Shift columns to make export and import:

In [41]:
# idx = (df_ttc['g_E'].str.find('DK')!=-1) 
# idx_DK = (idx) & (df_ttc['g_EE'].str.find('DK')!=-1)
# df_ttc['ExportCapacity'] = 0
# df_ttc.loc[idx,'ExportCapacity'] = df_ttc.loc[idx,'TTC']
# df_ttc['ImportCapacity'] = 0
# df_ttc.loc[~idx,'ImportCapacity'] = df_ttc.loc[~idx,'TTC']
# df_ttc.loc[idx_DK,'ImportCapacity'] = df_ttc.loc[idx_DK,'TTC']
# g_EE = df_ttc.loc[(~idx) & (~idx_DK),'g_EE']
# df_ttc.loc[(~idx) & (~idx_DK),'g_EE'] = df_ttc.loc[(~idx) & (~idx_DK),'g_E']
# df_ttc.loc[(~idx) & (~idx_DK),'g_E'] = g_EE

Aggregate:

In [42]:
df_ttc = df_ttc.groupby(['Year','g_E','g_EE']).agg('sum').reset_index()
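
With the settings above, DK1 and DK2 stay separate, so the last two steps change little. Their purpose becomes clearer when the two areas are merged. The sketch below walks through subsetting, relabelling, dropping internal links and aggregating, using a hypothetical mapping to a single area `'DK'` and made-up capacities.

```python
import pandas as pd

# Hypothetical mapping that merges both Danish bidding zones into one area.
g_E_merged = {'DK1': 'DK', 'DK2': 'DK'}

demo = pd.DataFrame({
    'Year': [2023] * 4,
    'g_E':  ['DK1', 'DK1', 'DK2', 'AT'],
    'g_EE': ['DK2', 'DELU', 'DELU', 'CH'],
    'TTC':  [600, 2500, 585, 1200],  # made-up capacities in MW
})

# Keep links that start or end in a Danish area, then relabel the areas.
areas = list(g_E_merged)
touches_dk = demo['g_E'].isin(areas) | demo['g_EE'].isin(areas)
demo = demo[touches_dk].replace({'g_E': g_E_merged, 'g_EE': g_E_merged})

# DK1 -> DK2 has become DK -> DK and is dropped as an internal link.
demo = demo[demo['g_E'] != demo['g_EE']]

# The two remaining DK -> DELU links are summed into one row.
agg = demo.groupby(['Year', 'g_E', 'g_EE']).agg('sum').reset_index()
print(agg)
```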

Save database:

In [43]:
file_path = os.path.join(data_dir,'TTC_DK_'+str(year))
df_ttc.to_pickle(file_path)