# Construct Dataset

This Jupyter notebook constructs the dataset that goes into the $mBasicPH\_storage$ model. It builds one Excel file (stored in "EnergyEconGroupWork\Data") from the various Excel files (stored in "EnergyEconGroupWork\DownloadDataForDK\ModelData"), which were constructed from real-world data.

## Settings

Import standard packages:

In [1]:
import pandas as pd,os, numpy as np

Let's specify an output folder:

In [2]:
output_dir = os.path.join(os.getcwd(),'Final_Dataset')
os.makedirs(output_dir, exist_ok=True) # create the folder if it does not exist yet

In [3]:
print(output_dir)

c:\Users\mpher\Documents\Uni\Master\02_Exchange\01_Academics\Energy Economics of the Green Transition\EnergyEconGroupWork\DownloadDataForDK\ModelData\Final_Dataset


## Sheet "Log"

Here we construct the sheet "Log", which defines the **UNITS** of variables.

Define **UNITS**:

In [4]:
UNITS = {
    'FuelPrice':'EUR/MWh',
    'EmissionIntensity':'Ton CO2/MWh input',
    'EmissionTax':'EUR/TCO2',
    'Load':'MWh',
    'FuelMix':'MWh input / MWh output', # Data: 'TWh input / TWh output' -> no need to adjust ratio stays
    'GeneratingCapacity':'MW', 
    'OtherMC':'EUR/MWh output',
    'FOM':'EUR/(MW/(hours per model year))/8760', # convert from year to hours per model year
    'InvestCost':'EUR2015/MWhCapacity', # Data: 'Million EUR2015/GWhCapacity' -> adjust
    'LoadVariation':'Percent of annual demand',
    'CapVariation':'Percent of generating capacity',
    'MWP_E':'EUR/MWh',
    'MWP_H':'EUR/MWh',
    'E2H':'Coefficient (negative for heat pumps, positive for backpressure)'
}

Add dictionary to "Log" dataset:

In [5]:
df_Log = pd.DataFrame(list(UNITS.items()), columns=['Parameter', 'Unit/description'])

Save as excel:

In [6]:
df_Log.to_excel(os.path.join(output_dir,'Log.xlsx'),sheet_name='Log', index=False)

## Sheet "Fundamentals"

### FuelPrice

#### FuelPrice/BFt

Get the different fuel types from the "FuelMix" Excel file in the current working directory.

In [7]:
BFt = pd.read_excel(os.path.join(os.getcwd(), 'FuelMix.xlsx'))

Subset:

In [8]:
BFt = BFt['BFt'].drop_duplicates()

Convert to df and set column name:

In [9]:
Fundamentals_df = pd.DataFrame({'FuelPrice/BFt': BFt})

In [10]:
Fundamentals_df

Unnamed: 0,FuelPrice/BFt
0,Biogas
3,Biomass
5,Coal
6,Natgas
8,Oil
11,Waste


#### FuelPrice/FuelPrice

We use the file "FuelProjections" (data from DEA) from the "EnergyEconomicsE2023" GitHub repository. Fuel prices are only stated from 2020 onwards, so we use the prices for 2020. Because most fuel prices in the "FuelProjections" dataset increase over time, the 2020 estimates are the closest available approximation to 2019 prices.

In [11]:
FuelPrice = pd.read_excel(os.path.join(os.getcwd(), 'FuelProjections.xlsx'), sheet_name='prices')

In the FuelPrice df the prices are in EUR/GJ but we want EUR/MWh:

1 GJ = 0.2777777778 MWh ([Source](https://www.unitconverters.net/energy/gigajoule-to-megawatt-hour.htm))
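The factor follows from 1 MWh = 3.6 GJ, so dividing a price in EUR/GJ by 0.2777777778 is the same as multiplying it by 3.6. A minimal sketch of this arithmetic follows; the helper name `eur_per_gj_to_eur_per_mwh` and the example price are illustrative assumptions and not part of the notebook:

```python
GJ_PER_MWH = 3.6                # 1 MWh = 3600 MJ = 3.6 GJ
MWH_PER_GJ = 1 / GJ_PER_MWH     # approx. 0.2777777778, the factor used in this notebook


def eur_per_gj_to_eur_per_mwh(price_eur_per_gj):
    """Convert a price from EUR/GJ to EUR/MWh."""
    return price_eur_per_gj / MWH_PER_GJ


# Hypothetical price of 2.0 EUR/GJ, for illustration only
example_price = eur_per_gj_to_eur_per_mwh(2.0)
print(example_price)
```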

Add to fundamentals df:

In [12]:
# Create empty new column in Fundamentals_df
Fundamentals_df['FuelPrice/FuelPrice'] = np.nan

In [13]:
# Biogas
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Biogas', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Biogas'] / 0.2777777778

# Biomass (we assume Biomass only consists of Wood pellets so we get close to prices in the mBasicPH_storageLarge)
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Biomass', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Wood pellets'] / 0.2777777778

# Coal
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Coal', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Coal'] / 0.2777777778
                                                                                                          
# Natgas
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Natgas', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Natural gas'] / 0.2777777778

# Oil
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Oil', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Oil'] / 0.2777777778

# Waste
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Waste', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Waste'] / 0.2777777778
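The six assignments follow one pattern, so they could also be written with a name mapping and `Series.map`. The following is only a sketch under stated assumptions: the two small objects and their prices are made-up stand-ins for `Fundamentals_df` and the first row of `FuelPrice`, and `fuel_to_source_column` is a hypothetical name.

```python
import pandas as pd

MWH_PER_GJ = 0.2777777778

# Stand-in for the first row of the 'prices' sheet (EUR/GJ); values are made up
prices_row = pd.Series({'Biogas': 16.0, 'Wood pellets': 9.5, 'Coal': 2.0,
                        'Natural gas': 3.7, 'Oil': 7.3, 'Waste': 0.01})

# Stand-in for Fundamentals_df, with a non-contiguous index like the real one
fundamentals = pd.DataFrame({'FuelPrice/BFt': ['Biogas', 'Biomass', 'Coal',
                                               'Natgas', 'Oil', 'Waste']},
                            index=[0, 3, 5, 6, 8, 11])

# Model fuel name -> column name in the source sheet
fuel_to_source_column = {'Biogas': 'Biogas', 'Biomass': 'Wood pellets',
                         'Coal': 'Coal', 'Natgas': 'Natural gas',
                         'Oil': 'Oil', 'Waste': 'Waste'}

# Look up the source column per fuel, then the price, then convert to EUR/MWh
source_columns = fundamentals['FuelPrice/BFt'].map(fuel_to_source_column)
fundamentals['FuelPrice/FuelPrice'] = source_columns.map(prices_row) / MWH_PER_GJ
print(fundamentals)
```

Keeping the mapping in one dictionary makes the Biomass-to-Wood-pellets assumption visible in a single place.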


In [14]:
Fundamentals_df

Unnamed: 0,FuelPrice/BFt,FuelPrice/FuelPrice
0,Biogas,57.90268
3,Biomass,34.279615
5,Coal,7.339794
6,Natgas,13.265219
8,Oil,26.22273
11,Waste,0.048322


### EmissionIntensity

#### EmissionIntensity/BFt

Copy column "FuelPrice/BFt":

In [15]:
Fundamentals_df['EmissionIntensity/BFt'] = Fundamentals_df['FuelPrice/BFt']

#### EmissionIntensity/EmissionType

Fill new column with value "CO2", i.e. the same emission type for all fuels.

In [16]:
Fundamentals_df['EmissionIntensity/EmissionType'] = 'CO2'

#### EmissionIntensity/EmissionIntensity

As with the FuelPrice/FuelPrice, we use the file "FuelProjections" (data from DEA) from the "EnergyEconomicsE2023" GitHub repository. However, "EmissionIntensity" does not depend on time.

In [17]:
EmissionIntensity = pd.read_excel(os.path.join(os.getcwd(), 'FuelProjections.xlsx'), sheet_name='emissionIntensity')

In [18]:
EmissionIntensity

Unnamed: 0,EmissionIntensity/EmissionType/BFt,Coal,Oil,Natural gas,Straw,Wood pellets,Wood chips,Wood waste,Waste,Biogas,Hydrogen,Uranium
0,CO2,94.37,76.645,57,0.0,0,0,0,42.5,0,0,0
1,SO2,0.272,0.159884,0,0.2,0,0,0,0.075,0,0,0


Add the emission intensity of CO2 to Fundamentals_df:

In the "EmissionIntensity" datafile above the values are in **kg/GJ** but we want **Ton CO2/MWh**. 

Thus, we divide by $1000 \cdot 0.2777777778$.
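As a worked example of this conversion, the sketch below converts the CO2 intensity of coal from the table above (94.37 kg/GJ). The helper name `kg_per_gj_to_ton_per_mwh` is an illustrative assumption:

```python
KG_PER_TON = 1000
MWH_PER_GJ = 0.2777777778


def kg_per_gj_to_ton_per_mwh(intensity_kg_per_gj):
    """Convert an emission intensity from kg/GJ to ton/MWh."""
    return intensity_kg_per_gj / (KG_PER_TON * MWH_PER_GJ)


# CO2 intensity of coal from the 'emissionIntensity' sheet: 94.37 kg/GJ
coal_ton_per_mwh = kg_per_gj_to_ton_per_mwh(94.37)
print(round(coal_ton_per_mwh, 6))
```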

In [19]:
# Create empty new column in Fundamentals_df
Fundamentals_df['EmissionIntensity/EmissionIntensity'] = np.nan

# Biogas
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Biogas', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Biogas'] / (1000*0.2777777778)

# Biomass (we assume Biomass only consists of Wood pellets, consistent with the fuel price assumption above)
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Biomass', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Wood pellets'] / (1000*0.2777777778)

# Coal
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Coal', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Coal'] / (1000*0.2777777778)
                                                                                                          
# Natgas
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Natgas', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Natural gas'] / (1000*0.2777777778)

# Oil
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Oil', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Oil'] / (1000*0.2777777778)

# Waste
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Waste', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Waste'] / (1000*0.2777777778)

In [20]:
Fundamentals_df

Unnamed: 0,FuelPrice/BFt,FuelPrice/FuelPrice,EmissionIntensity/BFt,EmissionIntensity/EmissionType,EmissionIntensity/EmissionIntensity
0,Biogas,57.90268,Biogas,CO2,0.0
3,Biomass,34.279615,Biomass,CO2,0.0
5,Coal,7.339794,Coal,CO2,0.339732
6,Natgas,13.265219,Natgas,CO2,0.2052
8,Oil,26.22273,Oil,CO2,0.275922
11,Waste,0.048322,Waste,CO2,0.153


### EmissionTax

We assume the average EU ETS price for the year 2019. The average price of EU carbon permits during 2019 was **24.64 EUR/TCO2** ([Source](https://tradingeconomics.com/commodity/carbon)).

We can add this information to the existing *Fundamentals_df*.

In [21]:
Fundamentals_df['EmissionTax/EmissionType'] = ['CO2'] + [np.nan] * (len(Fundamentals_df) - 1)
Fundamentals_df['EmissionTax/EmissionTax'] = [24.64] + [np.nan] * (len(Fundamentals_df) - 1)

### Save as excel

In [22]:
Fundamentals_df.to_excel(os.path.join(output_dir,'Fundamentals.xlsx'),sheet_name='Fundamentals', index=False)

## Sheet "LoadVariables"

*Note: The subtitles differ slightly from the previous section. To make the code more readable, we use fewer subtitles, i.e. one per category and not one per column as in the "Sheet Fundamentals" section.*

### Electricity

We get the load from the "Load_E" file. In this step, we also rename the columns to the names we want in our output df "LoadVariables_df".

In [23]:
LoadVariables_df = pd.read_excel(os.path.join(os.getcwd(), 'Load_E.xlsx')).rename(columns={
    'c_E':'Load_E/c_E',
    'Load_E':'Load_E/Load_E'})

We add MWP from *MWP_E.xlsx*:

In [24]:
MWP_E = pd.read_excel(os.path.join(os.getcwd(), 'MWP_E.xlsx')).head(1).drop(columns='index')

# Transform df from wide to long:
MWP_E = MWP_E.melt(var_name='MWP_E/c_E', value_name='MWP_E/MWP_E')

# Combine LoadVariables_df and MWP_E:
LoadVariables_df = pd.concat([LoadVariables_df, MWP_E], axis=1)
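`melt` turns one wide row, with one column per consumer, into one row per consumer. A small sketch with made-up consumer names and values (the real column names come from *MWP_E.xlsx*):

```python
import pandas as pd

# One wide row with a column per consumer (names and values are hypothetical)
wide_df = pd.DataFrame({'c_E_A': [100.0], 'c_E_B': [120.0]})

# Column names become the values of 'MWP_E/c_E', cell values go to 'MWP_E/MWP_E'
long_df = wide_df.melt(var_name='MWP_E/c_E', value_name='MWP_E/MWP_E')
print(long_df)
```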

We get the load variation from the "LoadVariation_E" file.

In [25]:
LoadVariation_E = pd.read_excel(os.path.join(os.getcwd(), 'LoadVariation_E.xlsx')).rename(columns={
    'c_E':'LoadVariation_E/c_E',
    'h':'LoadVariation_E/h',
    'LoadVariation_E':'LoadVariation_E/LoadVariation_E'})

Add the columns to LoadVariables_df.

In [26]:
LoadVariables_df = pd.concat([LoadVariables_df, LoadVariation_E], axis=1)
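Note that `pd.concat(..., axis=1)` places the frames side by side and aligns them on the row index, so a shorter frame is padded with `NaN`. This is consistent with the `dropna` calls used in later steps when columns are read back. A small sketch with made-up values:

```python
import pandas as pd

# A short block (one row) and a longer block (three rows); values are made up
load = pd.DataFrame({'Load_E/c_E': ['c_E_A'], 'Load_E/Load_E': [35.0]})
variation = pd.DataFrame({'LoadVariation_E/h': [1, 2, 3],
                          'LoadVariation_E/LoadVariation_E': [0.2, 0.3, 0.5]})

# Side-by-side concatenation: rows 1 and 2 of the short block become NaN
combined = pd.concat([load, variation], axis=1)
print(combined)
```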

### Heat

We do the same steps as for electricity. There is no export/import of heat.

In [27]:
Load_H = pd.read_excel(os.path.join(os.getcwd(), 'Load_H.xlsx')).rename(columns={
    'index':'Load_H/c_H',
    'Load_H':'Load_H/Load_H'})

In [28]:
LoadVariables_df = pd.concat([LoadVariables_df, Load_H], axis=1)

We set the same *MWP_H* as for *MWP_E*:

In [29]:
# Copy heat generator column:
MWP_H = pd.DataFrame()
MWP_H['MWP_H/c_H'] = LoadVariables_df['Load_H/c_H'].dropna()
MWP_H['MWP_H/MWP_H'] = LoadVariables_df['MWP_E/MWP_E'].iloc[0]

# Combine LoadVariables_df and MWP_H:
LoadVariables_df = pd.concat([LoadVariables_df, MWP_H], axis=1)

In [30]:
LoadVariation_H = pd.read_excel(os.path.join(os.getcwd(), 'LoadVariation_H.xlsx')).rename(columns={
    'c_H':'LoadVariation_H/c_H',
    'h':'LoadVariation_H/h',
    'LoadVariation_H':'LoadVariation_H/LoadVariation_H'})

In [31]:
LoadVariables_df = pd.concat([LoadVariables_df, LoadVariation_H], axis=1)

### Save as excel

In [32]:
LoadVariables_df.to_excel(os.path.join(output_dir,'LoadVariables.xlsx'),sheet_name='LoadVariables', index=False)

## Sheet "LoadMaps"

We take the "c_E" and "c_H" columns from the previously created "LoadVariables" Excel file.

In [33]:
LoadMaps_df = pd.read_excel(os.path.join(output_dir, 'LoadVariables.xlsx'),
                            usecols=['Load_E/c_E', 'Load_H/c_H']).dropna(subset=['Load_H/c_H']).rename(columns={
                                'Load_E/c_E':'c_E2g_E/c_E',
                                'Load_H/c_H':'c_H2g_H/c_H'})

Define a function that checks whether "DK" is contained in a value of the columns just created:


In [34]:
def check_dk(value):
    if isinstance(value, str):
        if 'DK' in value:
            return 'DK'
    return np.nan
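To illustrate the behaviour of `check_dk`, the sketch below repeats its logic and applies it to a few hypothetical identifiers (the identifiers are made up for illustration):

```python
import numpy as np
import pandas as pd


def check_dk(value):
    # Same logic as the function above: 'DK' for strings containing 'DK', else NaN
    if isinstance(value, str) and 'DK' in value:
        return 'DK'
    return np.nan


# Hypothetical identifiers, including a missing value
ids = pd.Series(['c_E_DK', 'c_E_DE', np.nan])
areas = ids.apply(check_dk)
print(areas.tolist())
```

Non-string entries such as `NaN` are passed through as `NaN`, so the function can be applied to columns that were padded during concatenation.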

Apply the function to create the new columns "c_E2g_E/g_E" and "c_H2g_H/g_H":

In [35]:
LoadMaps_df['c_E2g_E/g_E'] = LoadMaps_df['c_E2g_E/c_E'].apply(check_dk)
LoadMaps_df['c_H2g_H/g_H'] = LoadMaps_df['c_H2g_H/c_H'].apply(check_dk)

Rearrange columns:

In [36]:
LoadMaps_df = LoadMaps_df[['c_E2g_E/c_E','c_E2g_E/g_E','c_H2g_H/c_H','c_H2g_H/g_H']]

We would take the last four columns from *g_E2g_H.xlsx* (file is in the current working directory). This mapping is now part of the "MarketMaps" sheet, so the code below is commented out.

In [37]:
#g_E2g_H = pd.read_excel(os.path.join(os.getcwd(), 'g_E2g_H.xlsx')).rename(columns={
#                                'g_E':'g_E2g/g_E',
#                                'g_H':'g_H2g/g_H'})

# Fill out two missing values with function 'check_dk'
#g_E2g_H['g_E2g/g'] = g_E2g_H['g_E2g/g_E'].apply(check_dk)
#g_E2g_H['g_H2g/g'] = g_E2g_H['g_H2g/g_H'].apply(check_dk)

# Rearrange columns
#g_E2g_H = g_E2g_H[['g_E2g/g_E','g_E2g/g','g_H2g/g_H','g_H2g/g']]

# Concat
#LoadMaps_df = pd.concat([LoadMaps_df, g_E2g_H], axis=1)

### Save as excel

In [38]:
LoadMaps_df.to_excel(os.path.join(output_dir,'LoadMaps.xlsx'),sheet_name='LoadMaps', index=False)

## Sheet "MarketMaps"

We take the market maps from *E42_Data* and adjust them to our dataset (done manually in Excel). We store the result in "EnergyEconGroupWork\DownloadDataForDK\ModelData\Final_Dataset\MarketMaps.xlsx".

## Sheet "GeneratorsVariables"

### FuelMix

Import from "FuelMix.xlsx".

In [39]:
FuelMix = pd.read_excel(os.path.join(os.getcwd(), 'FuelMix.xlsx')).rename(columns={
    'id':'FuelMix/id',
    'BFt':'FuelMix/BFt',
    'FuelMix':'FuelMix/FuelMix'}).dropna()

In [40]:
FuelMix

Unnamed: 0,FuelMix/id,FuelMix/BFt,FuelMix/FuelMix
0,id_DK_Central_BH_Biogas,Biogas,1.25
1,id_DK_Central_BP_Biogas,Biogas,2.504762
2,id_DK_Central_IndustryH_Biogas,Biogas,0.566667
3,id_DK_Central_BH_Biomass,Biomass,0.987013
4,id_DK_Central_BP_Biomass,Biomass,4.119367
5,id_DK_Central_BP_Coal,Coal,2.650372
6,id_DK_Central_BH_Natgas,Natgas,1.1
7,id_DK_Central_BP_Natgas,Natgas,2.629268
8,id_DK_Central_BH_Oil,Oil,1.124457
9,id_DK_Central_BP_Oil,Oil,3.292683


(Remove "id_" prefix from "FuelMix/id" column:)

In [41]:
#FuelMix['FuelMix/id'] = FuelMix['FuelMix/id'].str.replace('id_', '')

There is one problem with the plant *id_DK_nan_IndustryE_Biomass*: its fuel use (column: FuelMix/FuelMix) has the value "inf". We are going to drop this row.

In [42]:
FuelMix = FuelMix[FuelMix['FuelMix/FuelMix'] != np.inf]
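The comparison with `np.inf` removes only positive infinity. If negative infinity could also occur, `np.isfinite` would be a broader filter. A sketch with made-up values (the ids `id_A` and `id_B` are hypothetical):

```python
import numpy as np
import pandas as pd

fuel_mix = pd.DataFrame({
    'FuelMix/id': ['id_DK_Central_BP_Coal', 'id_A', 'id_B'],
    'FuelMix/FuelMix': [2.65, np.inf, -np.inf],
})

# Filter used in this notebook: drops +inf only
only_pos_inf_removed = fuel_mix[fuel_mix['FuelMix/FuelMix'] != np.inf]

# Broader alternative: keeps finite values only (drops +inf, -inf and NaN)
finite_only = fuel_mix[np.isfinite(fuel_mix['FuelMix/FuelMix'])]
print(len(only_pos_inf_removed), len(finite_only))
```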

### GeneratingCap Electricity

In [43]:
GeneratingCap_E = pd.read_excel(os.path.join(os.getcwd(), 'GeneratingCapacity_E.xlsx')).rename(columns={
    'id':'GeneratingCap_E/id',
    'GeneratingCapacity_E':'GeneratingCap_E/GeneratingCap_E'})

In [44]:
#GeneratingCap_E['GeneratingCap_E/id'] = GeneratingCap_E['GeneratingCap_E/id'].str.replace('id_', '')

In [45]:
# Filter out rows containing "ImportFrom"
#GeneratingCap_E = GeneratingCap_E[~GeneratingCap_E['GeneratingCap_E/id'].str.contains('ImportFrom')]

In [46]:
# Filter out rows containing "nan"
#GeneratingCap_E = GeneratingCap_E[~GeneratingCap_E['GeneratingCap_E/id'].str.contains('nan')]

### GeneratingCap Heat

In [47]:
GeneratingCap_H = pd.read_excel(os.path.join(os.getcwd(), 'GeneratingCapacity_H.xlsx')).rename(columns={
    'id':'GeneratingCap_H/id',
    'GeneratingCapacity_H':'GeneratingCap_H/GeneratingCap_H'})

In [48]:
#GeneratingCap_H['GeneratingCap_H/id'] = GeneratingCap_H['GeneratingCap_H/id'].str.replace('id_', '')

### OtherMC

We import the data and keep only the first row because, except for the import MC, which we disregard, the MC are the same across all hours.

In [49]:
OtherMC = pd.read_excel(os.path.join(os.getcwd(), 'OtherMC.xlsx')).dropna()

Filter out importing MC:

In [50]:
#OtherMC = OtherMC.filter(regex='^(?!.*ImportFrom).*$', axis=1)

Transform df from wide to long:

In [51]:
#OtherMC = OtherMC.melt(var_name='OtherMC/id', value_name='OtherMC/OtherMC')

Remove the "id_" prefix again:

In [52]:
#OtherMC['OtherMC/id'] = OtherMC['OtherMC/id'].str.replace('id_', '')

In [53]:
# Filter out rows containing "nan"
#OtherMC = OtherMC[~OtherMC['OtherMC/id'].str.contains('nan')]

### FOM

In [54]:
FOM = pd.read_excel(os.path.join(os.getcwd(), 'FOM.xlsx')).rename(columns={
    'id':'FOM/id',
    'FOM':'FOM/FOM'}).dropna()

In [55]:
#FOM['FOM/id'] = FOM['FOM/id'].str.replace('id_', '')

In [56]:
# Filter out rows containing "nan"
#FOM = FOM[~FOM['FOM/id'].str.contains('nan')]

Filter out import rows:

In [57]:
# Filter out rows containing "ImportFrom"
#FOM = FOM[~FOM['FOM/id'].str.contains('ImportFrom')]

### InvestCost

We abstract from investment costs in generators.

### E2H

In [58]:
E2H = pd.read_excel(os.path.join(os.getcwd(), 'E2H.xlsx')).rename(columns={
    'id':'E2H/id',
    'E2H':'E2H/E2H'})

In [59]:
#E2H['E2H/id'] = E2H['E2H/id'].str.replace('id_', '')

### Put dataframes together

In [60]:
GeneratorsVariables_df = pd.concat([FuelMix,GeneratingCap_E,GeneratingCap_H,OtherMC,FOM,E2H], axis=1)

### Save as excel

In [61]:
GeneratorsVariables_df.to_excel(os.path.join(output_dir,'GeneratorsVariables.xlsx'),sheet_name='GeneratorsVariables', index=False)

## Sheet "GeneratorsMaps"

### id2tech

Import from *id2tech.xlsx*:

In [62]:
id2tech = pd.read_excel(os.path.join(os.getcwd(), 'id2tech.xlsx')).rename(columns={
    'id':'id2tech/id',
    'tech':'id2tech/tech'})

Remove *id_* prefix from *id2tech/id* column:

In [63]:
#id2tech['id2tech/id'] = id2tech['id2tech/id'].str.replace('id_', '')

Filter out rows containing *ImportFrom*:

In [64]:
#id2tech = id2tech[~id2tech['id2tech/id'].str.contains('ImportFrom')]

In [65]:
# Filter out rows containing "nan"
#id2tech = id2tech[~id2tech['id2tech/id'].str.contains('nan')]

### id2hvt

Import from *id2hvt.xlsx*:

In [66]:
id2hvt = pd.read_excel(os.path.join(os.getcwd(), 'id2hvt.xlsx')).rename(columns={
    'id':'id2hvt/id',
    'hvt':'id2hvt/hvt'})

Remove *id_* prefix from *id2hvt/id* column:

In [67]:
#id2hvt['id2hvt/id'] = id2hvt['id2hvt/id'].str.replace('id_', '')

Filter out rows containing *ImportFrom*:

In [68]:
#id2hvt = id2hvt[~id2hvt['id2hvt/id'].str.contains('ImportFrom')]

In [69]:
# Filter out rows containing "nan"
#id2hvt = id2hvt[~id2hvt['id2hvt/id'].str.contains('nan')]

### id2g_E

Import from *id2g_E.xlsx*:

In [70]:
id2g_E = pd.read_excel(os.path.join(os.getcwd(), 'id2g_E.xlsx')).rename(columns={
    'id':'id2g_E/id',
    'g_E':'id2g_E/g_E'})

Remove *id_* prefix from *id2g_E/id* column:

In [71]:
#id2g_E['id2g_E/id'] = id2g_E['id2g_E/id'].str.replace('id_', '')

Filter out rows containing *ImportFrom*:

In [72]:
#id2g_E = id2g_E[~id2g_E['id2g_E/id'].str.contains('ImportFrom')]

In [73]:
# Filter out rows containing "nan"
#id2g_E = id2g_E[~id2g_E['id2g_E/id'].str.contains('nan')]

### id2g_H

Import from *id2g_H.xlsx*:

In [74]:
id2g_H = pd.read_excel(os.path.join(os.getcwd(), 'id2g_H.xlsx')).rename(columns={
    'id':'id2g_H/id',
    'g_H':'id2g_H/g_H'})

Remove *id_* prefix from *id2g_H/id* column:

In [75]:
#id2g_H['id2g_H/id'] = id2g_H['id2g_H/id'].str.replace('id_', '')

### tech2modelTech

Import from *tech2modelTech.xlsx*:

In [76]:
tech2modelTech = pd.read_excel(os.path.join(os.getcwd(), 'tech2modelTech.xlsx')).rename(columns={
    'tech':'tech2modelTech/tech',
    'modelTech':'tech2modelTech/modelTech'})

### Put dataframes together

In [77]:
GeneratorsMaps_df = pd.concat([id2tech,id2hvt,id2g_E,id2g_H,tech2modelTech], axis=1)

### Save as excel

In [78]:
GeneratorsMaps_df.to_excel(os.path.join(output_dir,'GeneratorsMaps.xlsx'),sheet_name='GeneratorsMaps', index=False)

## Sheet "StorageVariables"

For the different values below we take the DEA's *technology_datasheet_for_energy_storage.xlsx*. We consider the case of a **141 Large hot water tank**.

Notes:
- All prices in the datasheet are in EUR2020. Thus, we are going to adjust them for inflation to 2019 prices.
- As the technology *141 Large scale hot water tank* was last updated in 2018 (see the *Index* sheet in the Excel file), we are going to use the data for the year 2015 (observed and thus not estimated data). We do this without loss of generality, as the 2020 middle estimate remained unchanged compared to 2015.
- We are going to add one Heat Storage (HS) *facility* to each district heat network. Additionally, we assume that the HS facilities are identical in their technology variables. In total, we add the following six HS facilities:
    - DK1_Central_HS
    - DK1_LargeDecentral_HS
    - DK1_SmallDecentral_HS
    - DK2_Central_HS
    - DK2_LargeDecentral_HS
    - DK2_SmallDecentral_HS

Import data:

In [79]:
technology_datasheet_for_energy_storage = pd.read_excel(os.path.join(os.getcwd(), 'technology_datasheet_for_energy_storage.xlsx'),sheet_name='alldata_flat').drop(columns=['Technology'])

Subset for *141 Large hot water tank*:

In [80]:
technology_datasheet_for_energy_storage = technology_datasheet_for_energy_storage[technology_datasheet_for_energy_storage['ws'] == '141 Large hot water tank']

Subset for 2015 data:

In [81]:
technology_datasheet_for_energy_storage = technology_datasheet_for_energy_storage[technology_datasheet_for_energy_storage['year'] == 2015]

### GeneratingCap_H

We think of the heat storage facility as being equipped with a **large hot water tank and a HPstandard**. Thus, the *GeneratingCap_H* for the HS facility is the same as that of the representative *HPstandard* for the respective geographical area.

We take this data from the sheet *GeneratorsVariables.xlsx* in the *Final_Dataset* folder:

In [82]:
GeneratingCap_HS = pd.read_excel(os.path.join(output_dir,'GeneratorsVariables.xlsx'), usecols=['GeneratingCap_H/id','GeneratingCap_H/GeneratingCap_H']).dropna()

Subset to *HPstandard*:

In [83]:
GeneratingCap_HS = GeneratingCap_HS[GeneratingCap_HS['GeneratingCap_H/id'].str.contains('HPstandard')]

Rename *id*s:

In [84]:
GeneratingCap_HS['GeneratingCap_H/id'] = GeneratingCap_HS['GeneratingCap_H/id'].str.replace('HPstandard', 'HS')

### chargeCap_H

We can see from the excel file *mBasicPH_storage.xlsx* (folder path: EnergyEconomicsE2023\Documentation\Data) that the chargeCap is assumed to be the same as the GeneratingCap. Thus, we are also going to use this assumption.

In [85]:
chargeCap_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'chargeCap_H/id',
    'GeneratingCap_H/GeneratingCap_H':'chargeCap_H/chargeCap_H'})

### sCap

We calculate the *storage capacity (sCap)* as follows:

$sCap = GeneratingCap \cdot \underbrace{\frac{\text{Energy storage capacity for one unit [MWh]}}{\text{Output capacity for one unit [MW]}}}_{\equiv E2H, \ \text{data from Technology Data for Energy storage (DEA)}} = GeneratingCap \cdot \underbrace{\frac{175}{2.9}}_{\text{constant}}$.

The fraction $\frac{\text{Energy storage capacity for one unit [MWh]}}{\text{Output capacity for one unit [MW]}}$ indicates the number of hours required to completely empty the tank. Similar to exercise E2.4 of exercise class *E44_SolutionGuide.ipynb*, we call this ratio the `"energy storage to heat"-ratio/storage duration` and denote it $E2H$.
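As a worked example of the formula, the sketch below computes the storage duration from the two DEA values and applies it to a hypothetical generating capacity of 100 MW (this capacity is an assumption for illustration, not a value from our data):

```python
ENERGY_STORAGE_CAPACITY_MWH = 175   # DEA value for one unit
OUTPUT_CAPACITY_MW = 2.9            # DEA value for one unit

# Hours needed to empty a full tank at full output
storage_hours = ENERGY_STORAGE_CAPACITY_MWH / OUTPUT_CAPACITY_MW

# Hypothetical generating capacity of one HS facility, in MW
generating_cap_mw = 100.0
s_cap_mwh = generating_cap_mw * storage_hours
print(round(storage_hours, 6), round(s_cap_mwh, 2))
```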

In [86]:
EnergyStorageCapacity = (technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Energy storage capacity for one unit [MWh)', 'val']).reset_index(drop=True)
EnergyStorageCapacity

0    175
Name: val, dtype: object

In [87]:
OutputCapacity = (technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Output capacity for one unit [MW]', 'val']).reset_index(drop=True)
OutputCapacity

0    2.9
Name: val, dtype: object

In [88]:
E2H = pd.to_numeric(EnergyStorageCapacity / OutputCapacity)
E2H

0    60.344828
Name: val, dtype: float64

Now we construct the *sCap* dataframe:

In [89]:
sCap = GeneratingCap_HS.copy().rename(columns={
    'GeneratingCap_H/id':'sCap/id'})

Add column calculating the *sCap*:

In [90]:
sCap['sCap/sCap'] = sCap['GeneratingCap_H/GeneratingCap_H'] * E2H.iloc[0]

Drop *GeneratingCap_H/GeneratingCap_H* column again:

In [91]:
sCap = sCap.drop(columns='GeneratingCap_H/GeneratingCap_H')

### effC

In DEA's *technology datasheet*, `effC` corresponds to *Charge efficiency [%]*. 

In [92]:
effC = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'effC/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [93]:
DEA_effC = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='- Charge efficiency [%]', 'val']).reset_index(drop=True) / 100)
DEA_effC

0    1.0
Name: val, dtype: float64

In [94]:
effC['effC/effC'] = DEA_effC.iloc[0]

### effD

In DEA's *technology datasheet*, `effD` corresponds to *Discharge efficiency [%]*. 

In [95]:
effD = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'effD/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [96]:
DEA_effD = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='- Discharge efficiency [%]', 'val']).reset_index(drop=True) / 100)
DEA_effD

0    1.0
Name: val, dtype: float64

In [97]:
effD['effD/effD'] = DEA_effD.iloc[0]

### selfDischarge

In DEA's *technology datasheet*, `selfDischarge` corresponds to *Energy losses during storage [%/day]*.

We follow the calculations in *mBasicPH_storage.xlsx* (file path: EnergyEconomicsE2023\Documentation\Data\mBasicPH_storage.xlsx) and calculate the selfDischarge as follows:

$selfDischarge = 1 - \left(\frac{\overbrace{\text{Round trip efficiency [\%] from DEA}}^{\equiv \ 98}}{100}\right)^{\frac{1}{24}} = 1 - 0.98^{\frac{1}{24}}$.

In [98]:
selfDischarge = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'selfDischarge/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [99]:
DEA_RoundTripEfficiency = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Round trip efficiency [%]', 'val']).reset_index(drop=True))).iloc[0]
DEA_RoundTripEfficiency

98

In [100]:
beta = 1 - (DEA_RoundTripEfficiency/100)**(1/24)
beta

0.0008414252746161699
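The exponent 1/24 spreads the 98 % figure over 24 hourly steps: a loss of `beta` per hour, compounded over 24 hours, leaves 98 % of the stored energy. The sketch below is a quick check of this reading. The variable names are illustrative, and interpreting the figure as a daily retention rate is an assumption taken from the formula above:

```python
round_trip_efficiency = 98  # percent, from the DEA datasheet

# Hourly loss rate implied by the formula above
hourly_loss = 1 - (round_trip_efficiency / 100) ** (1 / 24)

# Share of stored energy left after compounding the hourly loss for 24 hours
remaining_after_24h = (1 - hourly_loss) ** 24
print(hourly_loss, remaining_after_24h)
```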

Construct df:

In [101]:
selfDischarge['selfDischarge/selfDischarge'] = beta

### OtherMC

In DEA's *technology datasheet*, `OtherMC` corresponds to *Variable O&M [EUR2015/MWhoutput]*. These costs are zero, so we do not have to adjust them for inflation.

In [102]:
OtherMC_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'OtherMC/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [103]:
DEA_OtherMC = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Variable O&M [EUR2015/MWhoutput]', 'val']).reset_index(drop=True))).iloc[0]
DEA_OtherMC

0

In [104]:
OtherMC_HS['OtherMC/OtherMC'] = DEA_OtherMC

### FOM

In DEA's *technology datasheet*, `FOM` corresponds to *Fixed O&M [EUR2015/MWhCapacity/year)*. These costs are not zero, so we adjust them for inflation.

In [105]:
FOM_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'FOM/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [106]:
DEA_FOM_2020 = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Fixed O&M [EUR2015/MWhCapacity/year)', 'val']).reset_index(drop=True))).iloc[0]
DEA_FOM_2020

9.14524

Inflation rates:

In [107]:
π = pd.Series([1.531122704, 3.289449396, 2.662841655, 1.219993423, 0.199343827, -0.06164468, 0.183334861, 1.429107433, 1.73860862, 1.630522608, 0.476498853, 2.554506996, 8.833698867],index=pd.Index(range(2010,2023),name='t')).div(100).add(1)

In [108]:
DEA_FOM_2019 = DEA_FOM_2020 / π[2020]
DEA_FOM_2019

9.101869695300339
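The adjustment divides the 2020 value by one plus the 2020 inflation rate, which expresses it in 2019 prices. A minimal sketch of this single-year step; the helper name `deflate_one_year` is an illustrative assumption:

```python
# Inflation rate for 2020 in percent, as listed in the series above
INFLATION_2020_PCT = 0.476498853


def deflate_one_year(value, inflation_pct):
    """Express a value in previous-year prices, given that year's inflation in percent."""
    return value / (1 + inflation_pct / 100)


fom_2019 = deflate_one_year(9.14524, INFLATION_2020_PCT)
print(fom_2019)
```

Going back more than one year would require dividing by the product of the yearly factors for all years in between.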

Add to df:

In [109]:
FOM_HS['FOM/FOM'] = DEA_FOM_2019

### InvestCost

In DEA's *technology datasheet*, `InvestCost` corresponds to *Specific investment [MEUR2015/GWhCapacity]*. These costs are not zero, so we adjust them for inflation. Additionally, we need to convert them to EUR (not millions of EUR) and MWh (not GWh).

Formally, we do the following:

$InvestCost \cdot \frac{1 \ \text{MEUR}}{1 \ \text{GWh}} = InvestCost \cdot \frac{1\,000\,000 \ \text{EUR}}{1\,000 \ \text{MWh}} = InvestCost \cdot 1\,000 \ \frac{\text{EUR}}{\text{MWh}}$

In [110]:
InvestCost_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'InvestCost/tech'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [111]:
DEA_InvestCost_2020 = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Specific investment [MEUR2015/GWhCapacity]', 'val']).reset_index(drop=True))).iloc[0]
DEA_InvestCost_2020

3.152747173740094

Adjust units:

In [112]:
DEA_InvestCost_2020_adj = DEA_InvestCost_2020 * 1000
DEA_InvestCost_2020_adj

3152.7471737400942

In [113]:
DEA_InvestCost_2019_adj = DEA_InvestCost_2020_adj / π[2020]
DEA_InvestCost_2019_adj

3137.7956136316548

Add to df:

In [114]:
InvestCost_HS['InvestCost/InvestCost'] = DEA_InvestCost_2019_adj

### Put dataframes together

In [115]:
StorageVariables_df = pd.concat([GeneratingCap_HS,chargeCap_HS,sCap,effC,effD,selfDischarge,OtherMC_HS,FOM_HS,InvestCost_HS], axis=1)

### Save as excel

In [116]:
StorageVariables_df.to_excel(os.path.join(output_dir,'StorageVariables.xlsx'),sheet_name='StorageVariables', index=False)

## Sheet "StorageMaps"

### id2tech

In [117]:
id2tech_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'id2tech/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H').reset_index(drop=True)

In [118]:
id2tech_HS['id2tech/tech'] = 'HS'

### id2hvt

In [119]:
id2hvt_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'id2hvt/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H').reset_index(drop=True)

In [120]:
id2hvt_HS['id2hvt/hvt'] = 'standard_H'

### id2g_H

In [121]:
id2g_H_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'id2g_H/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H').reset_index(drop=True)

In [122]:
id2g_H_HS['id2g_H/g_H'] = id2g_H_HS['id2g_H/id'].apply(check_dk)

### tech2modelTech

In [123]:
tech2modelTech_data = {
    'tech2modelTech/tech':['HS'],
    'tech2modelTech/modelTech':['HS']
}

As dataframe:

In [124]:
tech2modelTech = pd.DataFrame(tech2modelTech_data)

### Put dataframes together

In [125]:
StorageMaps_df = pd.concat([id2tech_HS,id2hvt_HS,id2g_H_HS,tech2modelTech], axis=1)

### Save as excel

In [126]:
StorageMaps_df.to_excel(os.path.join(output_dir,'StorageMaps.xlsx'),sheet_name='StorageMaps', index=False)

## Sheet "HourlyVariation"

We use the excel file *CapVariation.xlsx* (file path: EnergyEconGroupWork\DownloadDataForDK\ModelData\CapVariation.xlsx).

In [127]:
HourlyVariation_df = pd.read_excel(os.path.join(os.getcwd(), 'CapVariation.xlsx')).rename(columns={'h':'CapVariation/h/hvt'})

Drop "import" columns:

In [128]:
#HourlyVariation_df = HourlyVariation_df.filter(regex='^(?!.*ImportFrom).*$', axis=1)

Save as excel:

In [129]:
HourlyVariation_df.to_excel(os.path.join(output_dir,'HourlyVariation.xlsx'),sheet_name='HourlyVariation', index=False)

## Sheet "TransmissionLines"

`As we aggregate DK1 and DK2 we do not need this anymore.`

(We copy the sheet from *E42_Data.xlsx* as we have the same data in file *lineCapacity.xlsx*.)

## Sheet "hMaps"

We take this sheet from *E44_Data* and store it in "EnergyEconGroupWork\DownloadDataForDK\ModelData\Final_Dataset\hMaps.xlsx".

## Combine excel files as different sheets within one file

Define directory where final dataset is saved to:

In [130]:
df_final_dir = 'C:\\Users\\mpher\\Documents\\Uni\\Master\\02_Exchange\\01_Academics\\Energy Economics of the Green Transition\\EnergyEconGroupWork\\Data\\mBasicPH_storage_Data.xlsx'

In this last step, we combine the different Excel files into one Excel file with one sheet per input file.

In [131]:
#List all excel files in folder
output_dir_final = [os.path.join(root, file) for root, folder, files in os.walk(output_dir) for file in files if file.endswith(".xlsx")]

# Define order of sheets
defined_order = ['Log.xlsx', 'Fundamentals.xlsx','LoadVariables.xlsx', 'LoadMaps.xlsx','MarketMaps.xlsx','GeneratorsVariables.xlsx','GeneratorsMaps.xlsx','StorageVariables.xlsx','StorageMaps.xlsx','HourlyVariation.xlsx','hMaps.xlsx']
output_dir_final.sort(key=lambda x: defined_order.index(os.path.basename(x)))

with pd.ExcelWriter(df_final_dir) as writer:
    for excel in output_dir_final: #For each excel
        sheet_name = pd.ExcelFile(excel).sheet_names[0] #Find the sheet name
        df = pd.read_excel(excel) #Create a dataframe
        df.to_excel(writer, sheet_name=sheet_name, index=False) #Write it to a sheet in the output excel
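The sort key above relies on `list.index`, which raises a `ValueError` if the folder contains an `.xlsx` file that is not listed in `defined_order`. The sketch below shows one way to make this failure explicit before sorting. The function name `sort_by_defined_order`, the shortened order list and the file paths are hypothetical and only serve as an illustration:

```python
import os

# Shortened order list, for illustration only
defined_order = ['Log.xlsx', 'Fundamentals.xlsx', 'LoadVariables.xlsx']

# Hypothetical file paths as os.walk might return them, in arbitrary order
found = [os.path.join('Final_Dataset', 'LoadVariables.xlsx'),
         os.path.join('Final_Dataset', 'Log.xlsx'),
         os.path.join('Final_Dataset', 'Fundamentals.xlsx')]


def sort_by_defined_order(paths, order):
    """Sort paths by the position of their file name in `order`."""
    unknown = [p for p in paths if os.path.basename(p) not in order]
    if unknown:
        raise ValueError(f'Files missing from the sheet order: {unknown}')
    return sorted(paths, key=lambda p: order.index(os.path.basename(p)))


ordered = sort_by_defined_order(found, defined_order)
print([os.path.basename(p) for p in ordered])
```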