# Construct Dataset

In this Jupyter notebook we construct the dataset that goes into the $mBasicPH\_storage$ model. The notebook builds an Excel file (stored in "EnergyEconGroupWork\Data") from the various Excel files (stored in "EnergyEconGroupWork\DownloadDataForDK\ModelData") that were constructed from real-world data.

## Settings

Import standard packages:

In [1]:
import pandas as pd,os, numpy as np

Let's specify an output folder:

In [2]:
output_dir = os.path.join(os.getcwd(),'Final_Dataset')

In [3]:
print(output_dir)

c:\Users\mpher\Documents\Uni\Master\02_Exchange\01_Academics\Energy Economics of the Green Transition\EnergyEconGroupWork\DownloadDataForDK\ModelData\Final_Dataset
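
Writing to `output_dir` later on only works if the folder already exists. The following is a minimal sketch of how the folder could be created up front. It uses a temporary base folder purely for illustration (an assumption; the notebook itself builds the path from `os.getcwd()`).

```python
import os
import tempfile

# Hypothetical base folder for this demo; the notebook uses os.getcwd() instead.
base_dir = tempfile.mkdtemp()
output_dir = os.path.join(base_dir, 'Final_Dataset')

# exist_ok=True makes the call safe to repeat when the folder is already there.
os.makedirs(output_dir, exist_ok=True)
os.makedirs(output_dir, exist_ok=True)

print(os.path.isdir(output_dir))
```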


## Sheet "Log"

Here we construct the sheet "Log", which defines the **UNITS** of variables.

Define **UNITS**:

In [4]:
UNITS = {
    'FuelPrice':'EUR/MWh',
    'EmissionIntensity':'Ton CO2/MWh input',
    'EmissionTax':'EUR/TCO2',
    'Load':'MWh',
    'FuelMix':'MWh input / MWh output', # Data: 'TWh input / TWh output' -> no need to adjust ratio stays
    'GeneratingCapacity':'MW', 
    'OtherMC':'EUR/MWh output',
    'FOM':'EUR/(MW/(hours per model year))/8760', # convert from year to hours per model year
    'InvestCost':'EUR2015/MWhCapacity', # Data: 'Million EUR2015/GWhCapacity' -> adjust
    'LoadVariation':'Percent of annual demand',
    'CapVariation':'Percent of generating capacity',
    'MWP_E':'EUR/MWh',
    'MWP_H':'EUR/MWh',
    'E2H':'Coefficient (negative for heat pumps, positive for backpressure)'
}

Add dictionary to "Log" dataset:

In [5]:
df_Log = pd.DataFrame(list(UNITS.items()), columns=['Parameter', 'Unit/description'])

Save as excel:

In [6]:
df_Log.to_excel(os.path.join(output_dir,'Log.xlsx'),sheet_name='Log', index=False)

## Sheet "Fundamentals"

### FuelPrice

#### FuelPrice/BFt

Get the different fuel types from the "FuelMix" Excel file in the current working directory.

In [7]:
BFt = pd.read_excel(os.path.join(os.getcwd(), 'FuelMix.xlsx'))

Subset:

In [8]:
BFt = BFt['BFt'].drop_duplicates()

Convert to df and set column name:

In [9]:
Fundamentals_df = pd.DataFrame({'FuelPrice/BFt': BFt})

In [10]:
Fundamentals_df

Unnamed: 0,FuelPrice/BFt
0,Biogas
3,Biomass
6,Coal
7,Natgas
10,Oil
13,Waste


#### FuelPrice/FuelPrice

We use the file "FuelProjections" (data from DEA) from the "EnergyEconomicsE2023" GitHub repository. Unfortunately, it only states fuel prices from 2020 onwards, so we use the 2020 prices as a proxy for 2019. Because most fuel prices in the "FuelProjections" dataset increase over time, the 2020 estimates are the closest available values to 2019 prices.

In [11]:
FuelPrice = pd.read_excel(os.path.join(os.getcwd(), 'FuelProjections.xlsx'), sheet_name='prices')

In the FuelPrice df the prices are in EUR/GJ but we want EUR/MWh:

1 GJ = 0.2777777778 MWh ([Source](https://www.unitconverters.net/energy/gigajoule-to-megawatt-hour.htm))
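
Because the price is quoted per unit of energy, converting from EUR/GJ to EUR/MWh means dividing by the GJ-to-MWh factor (equivalently, multiplying by roughly 3.6). A small sketch of that step, where the function name is a hypothetical helper and not part of the notebook:

```python
GJ_TO_MWH = 0.2777777778  # 1 GJ expressed in MWh, as stated above


def eur_per_gj_to_eur_per_mwh(price_eur_per_gj):
    """Convert a price per GJ into a price per MWh (hypothetical helper)."""
    # One MWh contains 1 / GJ_TO_MWH gigajoules, so the price scales up by that factor.
    return price_eur_per_gj / GJ_TO_MWH


# A price of 1 EUR/GJ corresponds to roughly 3.6 EUR/MWh.
print(eur_per_gj_to_eur_per_mwh(1.0))
```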

Add to fundamentals df:

In [12]:
# Create empty new column in Fundamentals_df
Fundamentals_df['FuelPrice/FuelPrice'] = np.nan

In [13]:
# Biogas
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Biogas', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Biogas'] / 0.2777777778

# Biomass (we assume Biomass only consists of Wood pellets so we get close to prices in the mBasicPH_storageLarge)
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Biomass', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Wood pellets'] / 0.2777777778

# Coal
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Coal', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Coal'] / 0.2777777778
                                                                                                          
# Natgas
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Natgas', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Natural gas'] / 0.2777777778

# Oil
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Oil', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Oil'] / 0.2777777778

# Waste
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Waste', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Waste'] / 0.2777777778


In [14]:
Fundamentals_df

Unnamed: 0,FuelPrice/BFt,FuelPrice/FuelPrice
0,Biogas,57.90268
3,Biomass,34.279615
6,Coal,7.339794
7,Natgas,13.265219
10,Oil,26.22273
13,Waste,0.048322
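
The per-fuel assignments above all follow the same pattern, so they could also be written as a single lookup. The sketch below illustrates the idea on a toy price table. The prices are made up, and the `fuel_to_column` dictionary is a hypothetical name introduced only for this example.

```python
import pandas as pd

GJ_TO_MWH = 0.2777777778

# Made-up stand-in for the first row of the 'prices' sheet (EUR/GJ).
fuel_price = pd.DataFrame({'Biogas': [16.0], 'Wood pellets': [9.0], 'Coal': [2.0]})

# Model fuel name -> column name in the price table (Biomass is proxied by wood pellets).
fuel_to_column = {'Biogas': 'Biogas', 'Biomass': 'Wood pellets', 'Coal': 'Coal'}

df = pd.DataFrame({'FuelPrice/BFt': ['Biogas', 'Biomass', 'Coal']})

# Look up each fuel's price in row 0 and convert it from EUR/GJ to EUR/MWh.
df['FuelPrice/FuelPrice'] = df['FuelPrice/BFt'].map(
    lambda fuel: fuel_price.loc[0, fuel_to_column[fuel]] / GJ_TO_MWH
)
print(df)
```

Keeping the fuel-name mapping in one dictionary makes the wood-pellets assumption for Biomass visible in a single place.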


### EmissionIntensity

#### EmissionIntensity/BFt

Copy the column "FuelPrice/BFt":

In [15]:
Fundamentals_df['EmissionIntensity/BFt'] = Fundamentals_df['FuelPrice/BFt']

#### EmissionIntensity/EmissionType

Fill new column with value "CO2", i.e. the same emission type for all fuels.

In [16]:
Fundamentals_df['EmissionIntensity/EmissionType'] = 'CO2'

#### EmissionIntensity/EmissionIntensity

As with the FuelPrice/FuelPrice, we use the file "FuelProjections" (data from DEA) from the "EnergyEconomicsE2023" GitHub repository. However, "EmissionIntensity" does not depend on time.

In [17]:
EmissionIntensity = pd.read_excel(os.path.join(os.getcwd(), 'FuelProjections.xlsx'), sheet_name='emissionIntensity')

In [18]:
EmissionIntensity

Unnamed: 0,EmissionIntensity/EmissionType/BFt,Coal,Oil,Natural gas,Straw,Wood pellets,Wood chips,Wood waste,Waste,Biogas,Hydrogen,Uranium
0,CO2,94.37,76.645,57,0.0,0,0,0,42.5,0,0,0
1,SO2,0.272,0.159884,0,0.2,0,0,0,0.075,0,0,0


Add the emission intensity of CO2 to Fundamentals_df:

In the "EmissionIntensity" datafile above the values are in **kg/GJ** but we want **Ton CO2/MWh**. 

Thus, we divide by $1000 \cdot 0.2777777778$: the factor 1000 converts kg to tons, and 0.2777777778 converts GJ to MWh.
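
As a quick check of this conversion, the sketch below applies it to three CO2 intensities taken from the table above. The function name is a hypothetical helper. After rounding, the results should agree with the values that end up in `Fundamentals_df`.

```python
KG_PER_TON = 1000
GJ_TO_MWH = 0.2777777778


def kg_per_gj_to_ton_per_mwh(intensity_kg_per_gj):
    """Convert an emission intensity from kg/GJ to ton/MWh (hypothetical helper)."""
    return intensity_kg_per_gj / (KG_PER_TON * GJ_TO_MWH)


# CO2 intensities in kg/GJ, copied from the 'emissionIntensity' sheet shown above.
co2_kg_per_gj = {'Coal': 94.37, 'Natural gas': 57, 'Waste': 42.5}

for fuel, value in co2_kg_per_gj.items():
    print(fuel, round(kg_per_gj_to_ton_per_mwh(value), 6))
```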

In [19]:
# Create empty new column in Fundamentals_df
Fundamentals_df['EmissionIntensity/EmissionIntensity'] = np.nan

# Biogas
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Biogas', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Biogas'] / (1000*0.2777777778)

# Biomass (we assume Biomass only consists of Wood pellets, consistent with the FuelPrice assumption above)
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Biomass', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Wood pellets'] / (1000*0.2777777778)

# Coal
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Coal', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Coal'] / (1000*0.2777777778)
                                                                                                          
# Natgas
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Natgas', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Natural gas'] / (1000*0.2777777778)

# Oil
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Oil', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Oil'] / (1000*0.2777777778)

# Waste
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Waste', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Waste'] / (1000*0.2777777778)

In [20]:
Fundamentals_df

Unnamed: 0,FuelPrice/BFt,FuelPrice/FuelPrice,EmissionIntensity/BFt,EmissionIntensity/EmissionType,EmissionIntensity/EmissionIntensity
0,Biogas,57.90268,Biogas,CO2,0.0
3,Biomass,34.279615,Biomass,CO2,0.0
6,Coal,7.339794,Coal,CO2,0.339732
7,Natgas,13.265219,Natgas,CO2,0.2052
10,Oil,26.22273,Oil,CO2,0.275922
13,Waste,0.048322,Waste,CO2,0.153


### EmissionTax

We assume the average EU ETS price for the year 2019. The average price of EU carbon permits during 2019 was **24.64 EUR/TCO2** ([Source](https://tradingeconomics.com/commodity/carbon)).

We can add this information to the existing *Fundamentals_df*.

In [21]:
Fundamentals_df['EmissionTax/EmissionType'] = ['CO2'] + [np.nan] * (len(Fundamentals_df) - 1)
Fundamentals_df['EmissionTax/EmissionTax'] = [24.64] + [np.nan] * (len(Fundamentals_df) - 1)

### Save as excel

In [22]:
Fundamentals_df.to_excel(os.path.join(output_dir,'Fundamentals.xlsx'),sheet_name='Fundamentals', index=False)

## Sheet "LoadVariables"

*Note: From here on we use fewer subtitles than in the "Sheet Fundamentals" section (one per category rather than one per column) to keep the notebook readable.*

### Electricity

We get the load from the *Load_E.xlsx* file. In the same step, we rename the columns to the names we want in our output df "LoadVariables_df".

In [23]:
LoadVariables_df = pd.read_excel(os.path.join(os.getcwd(), 'Load_E.xlsx')).rename(columns={
    'c_E':'Load_E/c_E',
    'Load_E':'Load_E/Load_E'})

We get load variation from *LoadVariation_E.xlsx* file.

In [24]:
LoadVariation_E = pd.read_excel(os.path.join(os.getcwd(), 'LoadVariation_E.xlsx')).rename(columns={
    'c_E':'LoadVariation_E/c_E',
    'h':'LoadVariation_E/h',
    'LoadVariation_E':'LoadVariation_E/LoadVariation_E'})

Add the columns to LoadVariables_df.

In [25]:
LoadVariables_df = pd.concat([LoadVariables_df, LoadVariation_E], axis=1)
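
With `axis=1`, `pd.concat` places the tables side by side and aligns them on the row index, so a shorter table is padded with NaN. The sketch below shows this on two miniature tables with made-up numbers.

```python
import pandas as pd

# Made-up miniature versions of the two tables.
load = pd.DataFrame({'Load_E/c_E': ['c_DK'], 'Load_E/Load_E': [100.0]})
variation = pd.DataFrame({
    'LoadVariation_E/h': [1, 2, 3],
    'LoadVariation_E/LoadVariation_E': [0.2, 0.3, 0.5],
})

# Side-by-side concatenation: rows 1 and 2 of the 'Load_E' columns become NaN.
combined = pd.concat([load, variation], axis=1)
print(combined)
```

This is why the sheets built this way can be read column block by column block, dropping the NaN padding of each block separately.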

### Heat

We do the same steps as for electricity.

In [26]:
Load_H = pd.read_excel(os.path.join(os.getcwd(), 'Load_H.xlsx')).rename(columns={
    'index':'Load_H/c_H',
    'Load_H':'Load_H/Load_H'})

In [27]:
LoadVariables_df = pd.concat([LoadVariables_df, Load_H], axis=1)

In [28]:
LoadVariation_H = pd.read_excel(os.path.join(os.getcwd(), 'LoadVariation_H.xlsx')).rename(columns={
    'c_H':'LoadVariation_H/c_H',
    'h':'LoadVariation_H/h',
    'LoadVariation_H':'LoadVariation_H/LoadVariation_H'})

In [29]:
LoadVariables_df = pd.concat([LoadVariables_df, LoadVariation_H], axis=1)

### Save as excel

In [30]:
LoadVariables_df.to_excel(os.path.join(output_dir,'LoadVariables.xlsx'),sheet_name='LoadVariables', index=False)

## Sheet "LoadMaps"

We take the "c_E" column of electricity consumers from the "LoadVariables" Excel file created above.

In [31]:
LoadMaps_E = pd.read_excel(os.path.join(output_dir, 'LoadVariables.xlsx'),
                            usecols=['Load_E/c_E']).dropna().rename(columns={
                                'Load_E/c_E':'c_E2g_E/c_E'})

Create mapping from consumers to generators:

In [32]:
LoadMaps_E['c_E2g_E/g_E'] = 'DK'

In [33]:
LoadMaps_E

Unnamed: 0,c_E2g_E/c_E,c_E2g_E/g_E
0,c_DK,DK


We do the same steps for heat consumers:

In [34]:
LoadMaps_H = pd.read_excel(os.path.join(output_dir, 'LoadVariables.xlsx'),
                            usecols=['Load_H/c_H']).dropna().rename(columns={
                                'Load_H/c_H':'c_H2g_H/c_H'})

Define a function to create correct mapping of heat consumers and generators:

In [35]:
def mapping_heat(value):
    if isinstance(value, str):
        if 'c_DK_Central' in value:
            return 'DK_Central'
        elif 'c_DK_Decentral' in value:
            return 'DK_Decentral'
    return np.nan

Apply function:

In [36]:
LoadMaps_H['c_H2g_H/g_H'] = LoadMaps_H['c_H2g_H/c_H'].apply(mapping_heat)
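
To illustrate how the mapping behaves, the sketch below repeats the function and applies it to a few sample values. The inputs `'c_DK'` and `None` are included as assumptions to show the fallback to NaN.

```python
import numpy as np
import pandas as pd


def mapping_heat(value):
    """Map a heat consumer name to its district heating area."""
    if isinstance(value, str):
        if 'c_DK_Central' in value:
            return 'DK_Central'
        elif 'c_DK_Decentral' in value:
            return 'DK_Decentral'
    # Anything else (non-strings, unknown names) has no heat area.
    return np.nan


consumers = pd.Series(['c_DK_Central', 'c_DK_Decentral', 'c_DK', None], dtype=object)
areas = consumers.apply(mapping_heat)
print(areas)
```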

Combine mapping for electricity and heat markets:

In [37]:
LoadMaps_df = pd.concat([LoadMaps_E,LoadMaps_H],axis=1)

In [38]:
LoadMaps_df

Unnamed: 0,c_E2g_E/c_E,c_E2g_E/g_E,c_H2g_H/c_H,c_H2g_H/g_H
0,c_DK,DK,c_DK_Central,DK_Central


### Save as excel

In [39]:
LoadMaps_df.to_excel(os.path.join(output_dir,'LoadMaps.xlsx'),sheet_name='LoadMaps', index=False)

## Sheet "GeneratorsVariables"

### FuelMix

Import from "FuelMix.xlsx".

In [40]:
FuelMix = pd.read_excel(os.path.join(os.getcwd(), 'FuelMix.xlsx')).rename(columns={
    'id':'FuelMix/id',
    'BFt':'FuelMix/BFt',
    'FuelMix':'FuelMix/FuelMix'}).dropna()

In [41]:
FuelMix

Unnamed: 0,FuelMix/id,FuelMix/BFt,FuelMix/FuelMix
0,id_DK_Central_BH_Biogas,Biogas,1.5
1,id_DK_Central_BP_Biogas,Biogas,2.595057
2,id_DK_Central_IndustryH_Biogas,Biogas,0.657895
3,id_DK_Central_BH_Biomass,Biomass,1.014458
4,id_DK_Central_BP_Biomass,Biomass,4.26106
5,id_DK_Central_IndustryH_Biomass,Biomass,0.971154
6,id_DK_Central_BP_Coal,Coal,2.650372
7,id_DK_Central_BH_Natgas,Natgas,1.019108
8,id_DK_Central_BP_Natgas,Natgas,2.489051
9,id_DK_Central_IndustryH_Natgas,Natgas,1.0


(Remove "id_" prefix from "FuelMix/id" column:)

In [42]:
#FuelMix['FuelMix/id'] = FuelMix['FuelMix/id'].str.replace('id_', '')

There is one problem with the plant *id_DK_nan_IndustryE_Biomass*: its fuel use (column: FuelMix/FuelMix) is "inf". We drop this row.

In [43]:
FuelMix = FuelMix[FuelMix['FuelMix/FuelMix'] != np.inf]
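
An alternative way to express this filter is `np.isfinite`, sketched below on a toy table with hypothetical plant ids. Note that, unlike the comparison with `np.inf` above, this variant would also drop `-inf` and NaN values.

```python
import numpy as np
import pandas as pd

fuel_mix = pd.DataFrame({
    'FuelMix/id': ['id_A', 'id_B', 'id_C'],   # hypothetical plant ids
    'FuelMix/FuelMix': [1.5, np.inf, 2.6],    # made-up values
})

# Keep only rows whose fuel use is a finite number.
finite = fuel_mix[np.isfinite(fuel_mix['FuelMix/FuelMix'])]
print(finite)
```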

### GeneratingCap Electricity

In [44]:
GeneratingCap_E = pd.read_excel(os.path.join(os.getcwd(), 'GeneratingCapacity_E.xlsx')).rename(columns={
    'id':'GeneratingCap_E/id',
    'GeneratingCapacity_E':'GeneratingCap_E/GeneratingCap_E'})

In [45]:
GeneratingCap_E

Unnamed: 0,GeneratingCap_E/id,GeneratingCap_E/GeneratingCap_E
0,id_DK_Central_BP_Biogas,72.398
1,id_DK_Central_BP_Biomass,1648.792
2,id_DK_Central_BP_Coal,1091.285
3,id_DK_Central_BP_Natgas,1833.323
4,id_DK_Central_BP_Oil,75.204
5,id_DK_Central_BP_Waste,210.999
6,id_DK_nan_CD_Biogas,11.174
7,id_DK_nan_IndustryE_Biogas,49.789
8,id_DK_nan_IndustryE_Biomass,1.781
9,id_DK_nan_CD_Coal,21.34


In [46]:
#GeneratingCap_E['GeneratingCap_E/id'] = GeneratingCap_E['GeneratingCap_E/id'].str.replace('id_', '')

In [47]:
# Filter out rows containing "ImportFrom"
#GeneratingCap_E = GeneratingCap_E[~GeneratingCap_E['GeneratingCap_E/id'].str.contains('ImportFrom')]

In [48]:
# Filter out rows containing "nan"
#GeneratingCap_E = GeneratingCap_E[~GeneratingCap_E['GeneratingCap_E/id'].str.contains('nan')]

### GeneratingCap Heat

In [49]:
GeneratingCap_H = pd.read_excel(os.path.join(os.getcwd(), 'GeneratingCapacity_H.xlsx')).rename(columns={
    'id':'GeneratingCap_H/id',
    'GeneratingCapacity_H':'GeneratingCap_H/GeneratingCap_H'})

In [50]:
GeneratingCap_H

Unnamed: 0,GeneratingCap_H/id,GeneratingCap_H/GeneratingCap_H
0,id_DK_Central_BH_Biogas,30.059
1,id_DK_Central_IndustryH_Biogas,33.022
2,id_DK_Central_BH_Biomass,2663.333
3,id_DK_Central_IndustryH_Biomass,64.684
4,id_DK_Central_BH_Natgas,5490.355
5,id_DK_Central_IndustryH_Natgas,8.619
6,id_DK_Central_BH_Oil,4518.727
7,id_DK_Central_IndustryH_Oil,53.102
8,id_DK_Central_BH_Waste,99.549
9,id_DK_Central_EP,10.8


In [51]:
#GeneratingCap_H['GeneratingCap_H/id'] = GeneratingCap_H['GeneratingCap_H/id'].str.replace('id_', '')

### OtherMC

In [52]:
OtherMC = pd.read_excel(os.path.join(os.getcwd(), 'OtherMC.xlsx')).rename(columns={
    'id':'OtherMC/id',
    'OtherMC':'OtherMC/OtherMC'}).dropna()

In [53]:
OtherMC

Unnamed: 0,OtherMC/id,OtherMC/OtherMC
0,id_DK_Central_BH_Biogas,2.715967
1,id_DK_Central_BP_Biogas,9.280955
2,id_DK_Central_IndustryH_Biogas,0.0
3,id_DK_Central_BH_Biomass,1.038205
4,id_DK_Central_BP_Biomass,4.441489
5,id_DK_Central_IndustryH_Biomass,0.0
6,id_DK_Central_BP_Coal,3.046955
7,id_DK_Central_BH_Natgas,1.225786
8,id_DK_Central_BP_Natgas,6.88942
9,id_DK_Central_IndustryH_Natgas,0.0


Filter out importing MC:

In [54]:
#OtherMC = OtherMC.filter(regex='^(?!.*ImportFrom).*$', axis=1)

Transfer df from wide to long:

In [55]:
#OtherMC = OtherMC.melt(var_name='OtherMC/id', value_name='OtherMC/OtherMC')

Remove the "id_" prefix again:

In [56]:
#OtherMC['OtherMC/id'] = OtherMC['OtherMC/id'].str.replace('id_', '')

In [57]:
# Filter out rows containing "nan"
#OtherMC = OtherMC[~OtherMC['OtherMC/id'].str.contains('nan')]

### FOM

In [58]:
FOM = pd.read_excel(os.path.join(os.getcwd(), 'FOM.xlsx')).rename(columns={
    'id':'FOM/id',
    'FOM':'FOM/FOM'}).dropna()

In [59]:
FOM

Unnamed: 0,FOM/id,FOM/FOM
0,id_DK_Central_BH_Biogas,41780.640644
1,id_DK_Central_BP_Biogas,109751.668909
2,id_DK_Central_IndustryH_Biogas,0.0
3,id_DK_Central_BH_Biomass,40675.097972
4,id_DK_Central_BP_Biomass,213076.714365
5,id_DK_Central_IndustryH_Biomass,0.0
6,id_DK_Central_BP_Coal,32570.898152
7,id_DK_Central_BH_Natgas,1908.724677
8,id_DK_Central_BP_Natgas,18871.998659
9,id_DK_Central_IndustryH_Natgas,0.0


In [60]:
#FOM['FOM/id'] = FOM['FOM/id'].str.replace('id_', '')

In [61]:
# Filter out rows containing "nan"
#FOM = FOM[~FOM['FOM/id'].str.contains('nan')]

Filter out import rows:

In [62]:
# Filter out rows containing "ImportFrom"
#FOM = FOM[~FOM['FOM/id'].str.contains('ImportFrom')]

### InvestCost

We abstract from investment costs in generators.

### E2H

In [63]:
E2H = pd.read_excel(os.path.join(os.getcwd(), 'E2H.xlsx')).rename(columns={
    'id':'E2H/id',
    'E2H':'E2H/E2H'})

In [64]:
E2H

Unnamed: 0,E2H/id,E2H/E2H
0,id_DK_Central_BP_Biogas,0.831903
1,id_DK_Central_BP_Biomass,0.495391
2,id_DK_Central_BP_Coal,0.825316
3,id_DK_Central_BP_Natgas,0.855208
4,id_DK_Central_BP_Oil,0.94025
5,id_DK_Central_BP_Waste,0.212297
6,id_DK_Central_EP,-10.0
7,id_DK_Central_HPstandard,-0.294999
8,id_DK_Central_HPsurplusheat,-0.21845
9,id_DK_Central_IH,-1.0


In [65]:
#E2H['E2H/id'] = E2H['E2H/id'].str.replace('id_', '')

### Put dataframes together

In [66]:
GeneratorsVariables_df = pd.concat([FuelMix,GeneratingCap_E,GeneratingCap_H,OtherMC,FOM,E2H], axis=1)
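
Since `pd.concat(..., axis=1)` aligns on the row index, a table that had rows filtered out earlier keeps the gaps in its index, and the combined sheet then contains extra NaN rows. If a compact sheet is preferred, one option (a sketch on toy data, not what the notebook does) is to reset each index first:

```python
import pandas as pd

# Toy tables: 'a' has had its middle row removed, so its index is 0, 2.
a = pd.DataFrame({'A/id': ['x', 'y', 'z']}).drop(index=1)
b = pd.DataFrame({'B/id': ['p', 'q']})

as_is = pd.concat([a, b], axis=1)                              # index labels 0, 1, 2
compact = pd.concat([a.reset_index(drop=True), b], axis=1)     # index labels 0, 1

print(as_is)
print(compact)
```

Either layout holds the same information per column block; the difference only matters if the reader of the sheet is sensitive to NaN padding.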

### Save as excel

In [67]:
GeneratorsVariables_df.to_excel(os.path.join(output_dir,'GeneratorsVariables.xlsx'),sheet_name='GeneratorsVariables', index=False)

## Sheet "GeneratorsMaps"

### id2tech

Import from *id2tech.xlsx*:

In [68]:
id2tech = pd.read_excel(os.path.join(os.getcwd(), 'id2tech.xlsx')).rename(columns={
    'id':'id2tech/id',
    'tech':'id2tech/tech'})

In [69]:
id2tech

Unnamed: 0,id2tech/id,id2tech/tech
0,id_DK_Central_BH_Biogas,BH_Biogas
1,id_DK_Central_BP_Biogas,BP_Biogas
2,id_DK_Central_IndustryH_Biogas,IndustryH_Biogas
3,id_DK_Central_BH_Biomass,BH_Biomass
4,id_DK_Central_BP_Biomass,BP_Biomass
5,id_DK_Central_IndustryH_Biomass,IndustryH_Biomass
6,id_DK_Central_BP_Coal,BP_Coal
7,id_DK_Central_BH_Natgas,BH_Natgas
8,id_DK_Central_BP_Natgas,BP_Natgas
9,id_DK_Central_IndustryH_Natgas,IndustryH_Natgas


Remove *id_* prefix from *id2tech/id* column:

In [70]:
#id2tech['id2tech/id'] = id2tech['id2tech/id'].str.replace('id_', '')

Filter out rows containing *ImportFrom*:

In [71]:
#id2tech = id2tech[~id2tech['id2tech/id'].str.contains('ImportFrom')]

In [72]:
# Filter out rows containing "nan"
#id2tech = id2tech[~id2tech['id2tech/id'].str.contains('nan')]

### id2hvt

Import from *id2hvt.xlsx*:

In [73]:
id2hvt = pd.read_excel(os.path.join(os.getcwd(), 'id2hvt.xlsx')).rename(columns={
    'id':'id2hvt/id',
    'hvt':'id2hvt/hvt'})

In [74]:
id2hvt

Unnamed: 0,id2hvt/id,id2hvt/hvt
0,id_DK_Central_SH,SH_DK_Central
1,id_DK_nan_PV,PV_DK
2,id_DK_nan_ROR,ROR_DK
3,id_DK_nan_WL,WL_DK
4,id_DK_nan_WS,WS_DK
5,id_DK_Central_BH_Biogas,Standard
6,id_DK_Central_BP_Biogas,Standard
7,id_DK_Central_IndustryH_Biogas,Standard
8,id_DK_Central_BH_Biomass,Standard
9,id_DK_Central_BP_Biomass,Standard


Remove *id_* prefix from *id2hvt/id* column:

In [75]:
#id2hvt['id2hvt/id'] = id2hvt['id2hvt/id'].str.replace('id_', '')

Filter out rows containing *ImportFrom*:

In [76]:
#id2hvt = id2hvt[~id2hvt['id2hvt/id'].str.contains('ImportFrom')]

In [77]:
# Filter out rows containing "nan"
#id2hvt = id2hvt[~id2hvt['id2hvt/id'].str.contains('nan')]

### id2g_E

Import from *id2g_E.xlsx*:

In [78]:
id2g_E = pd.read_excel(os.path.join(os.getcwd(), 'id2g_E.xlsx')).rename(columns={
    'id':'id2g_E/id',
    'g_E':'id2g_E/g_E'})

In [79]:
id2g_E

Unnamed: 0,id2g_E/id,id2g_E/g_E
0,id_DK_Central_BP_Biogas,DK
1,id_DK_Central_BP_Biomass,DK
2,id_DK_Central_BP_Coal,DK
3,id_DK_Central_BP_Natgas,DK
4,id_DK_Central_BP_Oil,DK
5,id_DK_Central_BP_Waste,DK
6,id_DK_Central_EP,DK
7,id_DK_Central_HPstandard,DK
8,id_DK_Central_HPsurplusheat,DK
9,id_DK_Central_IH,DK


Remove *id_* prefix from *id2g_E/id* column:

In [80]:
#id2g_E['id2g_E/id'] = id2g_E['id2g_E/id'].str.replace('id_', '')

Filter out rows containing *ImportFrom*:

In [81]:
#id2g_E = id2g_E[~id2g_E['id2g_E/id'].str.contains('ImportFrom')]

In [82]:
# Filter out rows containing "nan"
#id2g_E = id2g_E[~id2g_E['id2g_E/id'].str.contains('nan')]

### id2g_H

Import from *id2g_H.xlsx*:

In [83]:
id2g_H = pd.read_excel(os.path.join(os.getcwd(), 'id2g_H.xlsx')).rename(columns={
    'id':'id2g_H/id',
    'g_H':'id2g_H/g_H'})

In [84]:
id2g_H

Unnamed: 0,id2g_H/id,id2g_H/g_H
0,id_DK_Central_BH_Biogas,DK_Central
1,id_DK_Central_BP_Biogas,DK_Central
2,id_DK_Central_IndustryH_Biogas,DK_Central
3,id_DK_Central_BH_Biomass,DK_Central
4,id_DK_Central_BP_Biomass,DK_Central
5,id_DK_Central_IndustryH_Biomass,DK_Central
6,id_DK_Central_BP_Coal,DK_Central
7,id_DK_Central_BH_Natgas,DK_Central
8,id_DK_Central_BP_Natgas,DK_Central
9,id_DK_Central_IndustryH_Natgas,DK_Central


Remove *id_* prefix from *id2g_H/id* column:

In [85]:
#id2g_H['id2g_H/id'] = id2g_H['id2g_H/id'].str.replace('id_', '')

Merge *id2g_E* and *id2g_H*:

In [86]:
id2g = pd.concat([id2g_E,id2g_H], axis=1).reset_index(drop=True)

In [87]:
id2g

Unnamed: 0,id2g_E/id,id2g_E/g_E,id2g_H/id,id2g_H/g_H
0,id_DK_Central_BP_Biogas,DK,id_DK_Central_BH_Biogas,DK_Central
1,id_DK_Central_BP_Biomass,DK,id_DK_Central_BP_Biogas,DK_Central
2,id_DK_Central_BP_Coal,DK,id_DK_Central_IndustryH_Biogas,DK_Central
3,id_DK_Central_BP_Natgas,DK,id_DK_Central_BH_Biomass,DK_Central
4,id_DK_Central_BP_Oil,DK,id_DK_Central_BP_Biomass,DK_Central
5,id_DK_Central_BP_Waste,DK,id_DK_Central_IndustryH_Biomass,DK_Central
6,id_DK_Central_EP,DK,id_DK_Central_BP_Coal,DK_Central
7,id_DK_Central_HPstandard,DK,id_DK_Central_BH_Natgas,DK_Central
8,id_DK_Central_HPsurplusheat,DK,id_DK_Central_BP_Natgas,DK_Central
9,id_DK_Central_IH,DK,id_DK_Central_IndustryH_Natgas,DK_Central


### tech2modelTech

Import from *tech2modelTech.xlsx*:

In [88]:
tech2modelTech = pd.read_excel(os.path.join(os.getcwd(), 'tech2modelTech.xlsx')).rename(columns={
    'tech':'tech2modelTech/tech',
    'modelTech':'tech2modelTech/modelTech'})

In [89]:
tech2modelTech

Unnamed: 0,tech2modelTech/tech,tech2modelTech/modelTech
0,BH_Biogas,standard_H
1,BP_Biogas,BP
2,IndustryH_Biogas,standard_H
3,BH_Biomass,standard_H
4,BP_Biomass,BP
5,IndustryH_Biomass,standard_H
6,BP_Coal,BP
7,BH_Natgas,standard_H
8,BP_Natgas,BP
9,IndustryH_Natgas,standard_H


### Put dataframes together

In [90]:
GeneratorsMaps_df = pd.concat([id2tech,id2hvt,id2g,tech2modelTech], axis=1)

In [91]:
GeneratorsMaps_df

Unnamed: 0,id2tech/id,id2tech/tech,id2hvt/id,id2hvt/hvt,id2g_E/id,id2g_E/g_E,id2g_H/id,id2g_H/g_H,tech2modelTech/tech,tech2modelTech/modelTech
0,id_DK_Central_BH_Biogas,BH_Biogas,id_DK_Central_SH,SH_DK_Central,id_DK_Central_BP_Biogas,DK,id_DK_Central_BH_Biogas,DK_Central,BH_Biogas,standard_H
1,id_DK_Central_BP_Biogas,BP_Biogas,id_DK_nan_PV,PV_DK,id_DK_Central_BP_Biomass,DK,id_DK_Central_BP_Biogas,DK_Central,BP_Biogas,BP
2,id_DK_Central_IndustryH_Biogas,IndustryH_Biogas,id_DK_nan_ROR,ROR_DK,id_DK_Central_BP_Coal,DK,id_DK_Central_IndustryH_Biogas,DK_Central,IndustryH_Biogas,standard_H
3,id_DK_Central_BH_Biomass,BH_Biomass,id_DK_nan_WL,WL_DK,id_DK_Central_BP_Natgas,DK,id_DK_Central_BH_Biomass,DK_Central,BH_Biomass,standard_H
4,id_DK_Central_BP_Biomass,BP_Biomass,id_DK_nan_WS,WS_DK,id_DK_Central_BP_Oil,DK,id_DK_Central_BP_Biomass,DK_Central,BP_Biomass,BP
5,id_DK_Central_IndustryH_Biomass,IndustryH_Biomass,id_DK_Central_BH_Biogas,Standard,id_DK_Central_BP_Waste,DK,id_DK_Central_IndustryH_Biomass,DK_Central,IndustryH_Biomass,standard_H
6,id_DK_Central_BP_Coal,BP_Coal,id_DK_Central_BP_Biogas,Standard,id_DK_Central_EP,DK,id_DK_Central_BP_Coal,DK_Central,BP_Coal,BP
7,id_DK_Central_BH_Natgas,BH_Natgas,id_DK_Central_IndustryH_Biogas,Standard,id_DK_Central_HPstandard,DK,id_DK_Central_BH_Natgas,DK_Central,BH_Natgas,standard_H
8,id_DK_Central_BP_Natgas,BP_Natgas,id_DK_Central_BH_Biomass,Standard,id_DK_Central_HPsurplusheat,DK,id_DK_Central_BP_Natgas,DK_Central,BP_Natgas,BP
9,id_DK_Central_IndustryH_Natgas,IndustryH_Natgas,id_DK_Central_BP_Biomass,Standard,id_DK_Central_IH,DK,id_DK_Central_IndustryH_Natgas,DK_Central,IndustryH_Natgas,standard_H


### Save as excel

In [92]:
GeneratorsMaps_df.to_excel(os.path.join(output_dir,'GeneratorsMaps.xlsx'),sheet_name='GeneratorsMaps', index=False)

## Sheet "StorageVariables"

For the different values below we take the DEA's *technology_datasheet_for_energy_storage.xlsx*. We consider the case of a **141 Large hot water tank**.

Notes:
- All prices in the datasheet are in EUR2020. Thus, we inflation-adjust them to 2019.
- As the technology *141 Large hot water tank* was last updated in 2018 (see the *Index* sheet in the Excel file), we use the data for year 2015 (observed rather than estimated data). This does not affect the results, as the 2020 middle estimate is unchanged compared to 2015.

Import data:

In [93]:
technology_datasheet_for_energy_storage = pd.read_excel(os.path.join(os.getcwd(), 'technology_datasheet_for_energy_storage.xlsx'),sheet_name='alldata_flat').drop(columns=['Technology'])

Subset for *141 Large hot water tank*:

In [94]:
technology_datasheet_for_energy_storage = technology_datasheet_for_energy_storage[technology_datasheet_for_energy_storage['ws'] == '141 Large hot water tank']

Subset for 2015 data:

In [95]:
technology_datasheet_for_energy_storage = technology_datasheet_for_energy_storage[technology_datasheet_for_energy_storage['year'] == 2015]
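
The two subsetting steps can also be combined into one boolean mask. The sketch below shows this on a toy long-format table that reuses the column names `ws`, `year`, `par` and `val`. The rows themselves, including 'Other technology', are made up.

```python
import pandas as pd

# Made-up long-format table with the same column names as the datasheet.
sheet = pd.DataFrame({
    'ws': ['141 Large hot water tank', '141 Large hot water tank', 'Other technology'],
    'year': [2015, 2020, 2015],
    'par': ['Round trip efficiency [%]'] * 3,
    'val': [98, 98, 90],
})

# Both conditions must hold; the parentheses are required around each comparison.
mask = (sheet['ws'] == '141 Large hot water tank') & (sheet['year'] == 2015)
subset = sheet[mask]
print(subset)
```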

### GeneratingCap_H

First we define the id column for the heat storage technology.

In [96]:
GeneratingCap_HS = pd.read_excel(os.path.join(output_dir,'GeneratorsVariables.xlsx'), usecols=['GeneratingCap_H/id']).dropna()
GeneratingCap_HS = GeneratingCap_HS[GeneratingCap_HS['GeneratingCap_H/id'].str.contains('HPstandard')]
GeneratingCap_HS['GeneratingCap_H/id'] = GeneratingCap_HS['GeneratingCap_H/id'].str.replace('HPstandard', 'HS')
GeneratingCap_HS

Unnamed: 0,GeneratingCap_H/id
11,id_DK_Central_HS


In DEA's *technology datasheet*, `GeneratingCap_H` corresponds to *Output capacity for one unit [MW]*. 

In [97]:
DEA_GeneratingCap_H = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Output capacity for one unit [MW]', 'val']).reset_index(drop=True))
DEA_GeneratingCap_H

0    276.3
Name: val, dtype: float64

In [98]:
GeneratingCap_HS['GeneratingCap_H/GeneratingCap_H'] = DEA_GeneratingCap_H.iloc[0]
GeneratingCap_HS

Unnamed: 0,GeneratingCap_H/id,GeneratingCap_H/GeneratingCap_H
11,id_DK_Central_HS,276.3
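
The same lookup pattern (select one `par` row, convert `val` to a number) is repeated for every storage parameter below. It could be wrapped in a small helper, sketched here under the assumption that each parameter appears exactly once in the subset. The name `get_par` is hypothetical.

```python
import pandas as pd


def get_par(long_df, par_name):
    """Return the single numeric value stored under `par_name` (hypothetical helper)."""
    values = pd.to_numeric(long_df.loc[long_df['par'] == par_name, 'val'])
    if len(values) != 1:
        # Fail loudly on a misspelled label or a duplicated parameter.
        raise ValueError(f'expected exactly one row for {par_name!r}, got {len(values)}')
    return values.iloc[0]


# Toy subset; values are stored as text to mimic a mixed-type 'val' column.
sheet = pd.DataFrame({
    'par': ['Output capacity for one unit [MW]', 'Input capacity for one unit [MW]'],
    'val': ['276.3', '276.3'],
})

print(get_par(sheet, 'Output capacity for one unit [MW]'))
```

Raising on zero or multiple matches guards against silently picking up the wrong row when a label in the datasheet changes.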


### chargeCap_H

In [99]:
chargeCap_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'chargeCap_H/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In DEA's *technology datasheet*, `chargeCap_H` corresponds to *Input capacity for one unit [MW]*. 

In [100]:
DEA_chargeCap_H = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Input capacity for one unit [MW]', 'val']).reset_index(drop=True))
DEA_chargeCap_H

0    276.3
Name: val, dtype: float64

In [101]:
chargeCap_HS['chargeCap_H/chargeCap_H'] = DEA_chargeCap_H.iloc[0]
chargeCap_HS

Unnamed: 0,chargeCap_H/id,chargeCap_H/chargeCap_H
11,id_DK_Central_HS,276.3


### sCap

In [102]:
sCap = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'sCap/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In DEA's *technology datasheet*, `sCap` corresponds to *Energy storage capacity for one unit [MWh)*. 

In [103]:
DEA_sCap = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Energy storage capacity for one unit [MWh)', 'val']).reset_index(drop=True))
DEA_sCap

0    16575
Name: val, dtype: int64

In [104]:
sCap['sCap/sCap'] = DEA_sCap.iloc[0]
sCap

Unnamed: 0,sCap/id,sCap/sCap
11,id_DK_Central_HS,16575


### effC

In DEA's *technology datasheet*, `effC` corresponds to *Charge efficiency [%]*. 

In [105]:
effC = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'effC/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [106]:
DEA_effC = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='- Charge efficiency [%]', 'val']).reset_index(drop=True) / 100)
DEA_effC

0    1.0
Name: val, dtype: float64

In [107]:
effC['effC/effC'] = DEA_effC.iloc[0]

In [108]:
effC

Unnamed: 0,effC/id,effC/effC
11,id_DK_Central_HS,1.0


### effD

In DEA's *technology datasheet*, `effD` corresponds to *Discharge efficiency [%]*. 

In [109]:
effD = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'effD/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [110]:
DEA_effD = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='- Discharge efficiency [%]', 'val']).reset_index(drop=True) / 100)
DEA_effD

0    1.0
Name: val, dtype: float64

In [111]:
effD['effD/effD'] = DEA_effD.iloc[0]

In [112]:
effD

Unnamed: 0,effD/id,effD/effD
11,id_DK_Central_HS,1.0


### selfDischarge

In DEA's *technology datasheet*, `selfDischarge` corresponds to *Energy losses during storage [%/day]*.

We follow the calculations in *mBasicPH_storage.xlsx* (file path: EnergyEconomicsE2023\Documentation\Data\mBasicPH_storage.xlsx) and calculate the selfDischarge as follows:

$selfDischarge = 1 - \left(\frac{\overbrace{\text{Round trip efficiency [\%] from DEA}}^{\equiv \ 98}}{100}\right)^{\frac{1}{24}} = 1 - 0.98^{\frac{1}{24}}$.

In [113]:
selfDischarge = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'selfDischarge/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [114]:
DEA_RoundTripEfficiency = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Round trip efficiency [%]', 'val']).reset_index(drop=True))).iloc[0]
DEA_RoundTripEfficiency

98

In [115]:
beta = 1 - (DEA_RoundTripEfficiency/100)**(1/24)
beta

0.0008414252746161699
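
One way to sanity-check this value is to compound the hourly retention over 24 hours, which should recover the 98 % used in the formula. A short sketch:

```python
round_trip_efficiency = 98  # percent, the value read from the datasheet above

# Hourly loss rate implied by the formula above.
hourly_loss = 1 - (round_trip_efficiency / 100) ** (1 / 24)

# Compounding the hourly retention over 24 hours should give back 0.98.
retained_after_24h = (1 - hourly_loss) ** 24

print(hourly_loss, retained_after_24h)
```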

Construct df

In [116]:
selfDischarge['selfDischarge/selfDischarge'] = beta

In [117]:
selfDischarge

Unnamed: 0,selfDischarge/id,selfDischarge/selfDischarge
11,id_DK_Central_HS,0.000841


### OtherMC

In DEA's *technology datasheet*, `OtherMC` corresponds to *Variable O&M [EUR2015/MWhoutput]*. We note that they are zero and thus we do not have to inflation adjust them.

In [118]:
OtherMC_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'OtherMC/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [119]:
DEA_OtherMC = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Variable O&M [EUR2015/MWhoutput]', 'val']).reset_index(drop=True))).iloc[0]
DEA_OtherMC

0

In [120]:
OtherMC_HS['OtherMC/OtherMC'] = DEA_OtherMC

In [121]:
OtherMC_HS

Unnamed: 0,OtherMC/id,OtherMC/OtherMC
11,id_DK_Central_HS,0


### FOM

In DEA's *technology datasheet*, `FOM` corresponds to *Fixed O&M [EUR2015/MWhCapacity/year)*. They are not zero and thus we inflation adjust them.

In [122]:
FOM_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'FOM/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [123]:
DEA_FOM_2020 = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Fixed O&M [EUR2015/MWhCapacity/year)', 'val']).reset_index(drop=True))).iloc[0]
DEA_FOM_2020

9.14524

Inflation rates:

In [124]:
π = pd.Series([1.531122704, 3.289449396, 2.662841655, 1.219993423, 0.199343827, -0.06164468, 0.183334861, 1.429107433, 1.73860862, 1.630522608, 0.476498853, 2.554506996, 8.833698867],index=pd.Index(range(2010,2023),name='t')).div(100).add(1)

In [125]:
DEA_FOM_2019 = DEA_FOM_2020 / π[2020]
DEA_FOM_2019

9.101869695300339
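
The adjustment divides the value by one plus the 2020 inflation rate, which expresses it in 2019 price terms. The sketch below redoes the step with plain numbers, assuming (as the code above does) that the 2020 entry of the series is the price change from 2019 to 2020. The function name is hypothetical.

```python
fom_2020 = 9.14524             # value read from the datasheet above
inflation_2020 = 0.476498853   # percent, the 2020 entry of the inflation series


def deflate_one_year(value, inflation_percent):
    """Express `value` in previous-year prices (hypothetical helper)."""
    return value / (1 + inflation_percent / 100)


fom_2019 = deflate_one_year(fom_2020, inflation_2020)
print(fom_2019)
```

Going back more than one year would mean dividing by the product of the yearly factors in between.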

Add to df:

In [126]:
FOM_HS['FOM/FOM'] = DEA_FOM_2019

In [127]:
FOM_HS

Unnamed: 0,FOM/id,FOM/FOM
11,id_DK_Central_HS,9.10187


### InvestCost

As with generators, we abstract from investment costs. The cells in this subsection are therefore commented out and kept for reference only.

In DEA's *technology datasheet*, `InvestCost` corresponds to *Specific investment [MEUR2015/GWhCapacity]*. The values are not zero, so using them would require an inflation adjustment. Additionally, they would need to be converted to EUR (not millions of EUR) and MWh (not GWh).

Formally, we do the following:

$InvestCost \cdot \frac{1 \ MEUR}{1 \ GWh} = InvestCost \cdot \frac{1\,000\,000 \ EUR}{1\,000 \ MWh} = InvestCost \cdot 1\,000 \ \frac{EUR}{MWh}$
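
The unit conversion can be spelled out step by step, as in the sketch below. The function name and the input of 1.5 MEUR/GWh are made up for illustration.

```python
EUR_PER_MEUR = 1_000_000
MWH_PER_GWH = 1_000


def meur_per_gwh_to_eur_per_mwh(value):
    """Convert a cost from MEUR/GWh to EUR/MWh (hypothetical helper)."""
    # Numerator: millions of EUR -> EUR; denominator: GWh -> MWh.
    return value * EUR_PER_MEUR / MWH_PER_GWH


# A made-up cost of 1.5 MEUR/GWh.
print(meur_per_gwh_to_eur_per_mwh(1.5))
```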

In [128]:
#InvestCost_HS = GeneratingCap_HS.rename(columns={
#    'GeneratingCap_H/id':'InvestCost/tech'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [129]:
#DEA_InvestCost_2020 = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Specific investment [MEUR2015/GWhCapacity]', 'val']).reset_index(drop=True))).iloc[0]
#DEA_InvestCost_2020

Adjust units:

In [130]:
#DEA_InvestCost_2020_adj = DEA_InvestCost_2020 * 1000
#DEA_InvestCost_2020_adj

In [131]:
#DEA_InvestCost_2019_adj = DEA_InvestCost_2020_adj / π[2020]
#DEA_InvestCost_2019_adj

Add to df:

In [132]:
#InvestCost_HS['InvestCost/InvestCost'] = DEA_InvestCost_2019_adj

### Put dataframes together

In [133]:
StorageVariables_df = pd.concat([GeneratingCap_HS,chargeCap_HS,sCap,effC,effD,selfDischarge,OtherMC_HS,FOM_HS], axis=1)

In [134]:
StorageVariables_df

Unnamed: 0,GeneratingCap_H/id,GeneratingCap_H/GeneratingCap_H,chargeCap_H/id,chargeCap_H/chargeCap_H,sCap/id,sCap/sCap,effC/id,effC/effC,effD/id,effD/effD,selfDischarge/id,selfDischarge/selfDischarge,OtherMC/id,OtherMC/OtherMC,FOM/id,FOM/FOM
11,id_DK_Central_HS,276.3,id_DK_Central_HS,276.3,id_DK_Central_HS,16575,id_DK_Central_HS,1.0,id_DK_Central_HS,1.0,id_DK_Central_HS,0.000841,id_DK_Central_HS,0,id_DK_Central_HS,9.10187


### Save as excel

In [135]:
StorageVariables_df.to_excel(os.path.join(output_dir,'StorageVariables.xlsx'),sheet_name='StorageVariables', index=False)

## Sheet "StorageMaps"

### id2tech

In [136]:
id2tech_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'id2tech/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H').reset_index(drop=True)

In [137]:
id2tech_HS['id2tech/tech'] = 'HS'

In [138]:
id2tech_HS

Unnamed: 0,id2tech/id,id2tech/tech
0,id_DK_Central_HS,HS


### id2hvt

In [139]:
id2hvt_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'id2hvt/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H').reset_index(drop=True)

In [140]:
id2hvt_HS['id2hvt/hvt'] = 'Standard'

In [141]:
id2hvt_HS

Unnamed: 0,id2hvt/id,id2hvt/hvt
0,id_DK_Central_HS,Standard


### id2g_H

In [142]:
id2g_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'id2g_H/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H').reset_index(drop=True)

Define a function that maps each heat storage id to its district heating area:

In [143]:
def mapping_HS(value):
    if isinstance(value, str):
        if 'id_DK_Central' in value:
            return 'DK_Central'
        elif 'id_DK_Decentral' in value:
            return 'DK_Decentral'
    return np.nan

In [144]:
id2g_HS['id2g_H/g_H'] = id2g_HS['id2g_H/id'].apply(mapping_HS)

In [145]:
id2g_HS

Unnamed: 0,id2g_H/id,id2g_H/g_H
0,id_DK_Central_HS,DK_Central
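A minimal sketch of the same mapping driven by a lookup table, which may be easier to extend if more areas are added. The names `AREA_PREFIXES` and `map_id_to_area` are hypothetical, and the sketch uses a prefix test, which is slightly stricter than the substring test in `mapping_HS`:

```python
import math

# Hypothetical lookup table: id prefix -> district heating area.
AREA_PREFIXES = {
    'id_DK_Central': 'DK_Central',
    'id_DK_Decentral': 'DK_Decentral',
}

def map_id_to_area(value):
    """Return the area whose prefix starts `value`, or NaN if none matches."""
    if isinstance(value, str):
        for prefix, area in AREA_PREFIXES.items():
            if value.startswith(prefix):
                return area
    return math.nan

print(map_id_to_area('id_DK_Central_HS'))    # DK_Central
print(map_id_to_area('id_DK_Decentral_HS'))  # DK_Decentral
```

Like `mapping_HS`, the function can be passed to `.apply` on the id column.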


### tech2modelTech

In [146]:
tech2modelTech_data = {
    'tech2modelTech/tech':['HS'],
    'tech2modelTech/modelTech':['HS']
}

As dataframe:

In [147]:
tech2modelTech = pd.DataFrame(tech2modelTech_data)

In [148]:
tech2modelTech

Unnamed: 0,tech2modelTech/tech,tech2modelTech/modelTech
0,HS,HS


### Put dataframes together

In [149]:
StorageMaps_df = pd.concat([id2tech_HS,id2hvt_HS,id2g_HS,tech2modelTech], axis=1)

In [150]:
StorageMaps_df

Unnamed: 0,id2tech/id,id2tech/tech,id2hvt/id,id2hvt/hvt,id2g_H/id,id2g_H/g_H,tech2modelTech/tech,tech2modelTech/modelTech
0,id_DK_Central_HS,HS,id_DK_Central_HS,Standard,id_DK_Central_HS,DK_Central,HS,HS
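The map dataframes above are built with `.reset_index(drop=True)`, whereas `GeneratingCap_HS` itself carries index 11. A minimal sketch of why this matters, using two small made-up dataframes `a` and `b` (illustrative names, not from the notebook): `pd.concat` with `axis=1` aligns rows on the index, so mismatched indices produce extra rows filled with NaN.

```python
import pandas as pd

# Illustrative frames: `a` keeps a leftover index (11), `b` has a fresh index (0).
a = pd.DataFrame({'id2tech/id': ['id_DK_Central_HS']}, index=[11])
b = pd.DataFrame({'tech2modelTech/tech': ['HS']})

# Without resetting, rows 11 and 0 do not line up: two rows, each half empty.
misaligned = pd.concat([a, b], axis=1)

# After resetting the index, both frames share row 0 and combine into one row.
aligned = pd.concat([a.reset_index(drop=True), b], axis=1)

print(misaligned.shape, aligned.shape)  # (2, 2) (1, 2)
```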


### Save as excel

In [151]:
StorageMaps_df.to_excel(os.path.join(output_dir,'StorageMaps.xlsx'),sheet_name='StorageMaps', index=False)

## Sheet "HourlyVariation"

We use the excel file *CapVariation.xlsx* (file path: EnergyEconGroupWork\DownloadDataForDK\ModelData\CapVariation.xlsx).

In [152]:
HourlyVariation_df = pd.read_excel(os.path.join(os.getcwd(), 'CapVariation.xlsx')).rename(columns={'h':'CapVariation/h/hvt'})

In [153]:
HourlyVariation_df

Unnamed: 0,CapVariation/h/hvt,PV_DK,WS_DK,WL_DK,SH_DK_Central,Standard,ROR_DK
0,1,0.00036,0.307815,0.591691,8.973731e-07,1,0.145054
1,2,0.00036,0.350893,0.619209,8.973731e-07,1,0.145054
2,3,0.00036,0.186378,0.584036,8.973731e-07,1,0.145054
3,4,0.00036,0.160754,0.565829,8.973731e-07,1,0.145054
4,5,0.00036,0.158525,0.553001,8.973731e-07,1,0.145054
...,...,...,...,...,...,...,...
8755,8756,0.00036,0.229828,0.242859,8.973731e-07,1,0.290107
8756,8757,0.00036,0.206432,0.205617,8.973731e-07,1,0.290107
8757,8758,0.00036,0.194919,0.183685,8.973731e-07,1,0.290107
8758,8759,0.00036,0.199004,0.174582,8.973731e-07,1,0.290107


Drop "import" columns (currently commented out, so no columns are dropped):

In [154]:
#HourlyVariation_df = HourlyVariation_df.filter(regex='^(?!.*ImportFrom).*$', axis=1)

Save as excel:

In [155]:
HourlyVariation_df.to_excel(os.path.join(output_dir,'HourlyVariation.xlsx'),sheet_name='HourlyVariation', index=False)

## Sheet "Scalars"

We get the data from *MWP_E.xlsx*:

In [156]:
Scalars_E = pd.read_excel(os.path.join(os.getcwd(), 'MWP_E.xlsx')).rename(columns={'c_DK':'MWP_E'}).drop(columns='index')

We assume the same MWP on the heat market:

In [157]:
Scalars_H = pd.read_excel(os.path.join(os.getcwd(), 'MWP_E.xlsx')).rename(columns={'c_DK':'MWP_H'}).drop(columns='index')

Put dataframes together:

In [158]:
Scalars_df = pd.concat([Scalars_E,Scalars_H], axis=1)

In [159]:
Scalars_df

Unnamed: 0,MWP_E,MWP_H
0,1000,1000


Add *lineLoss*:

In [160]:
Scalars_df['lineLoss'] = 0

Reshape the dataframe from wide to long format:

In [161]:
Scalars_df = Scalars_df.melt(var_name='Variable', value_name='Value')

In [162]:
Scalars_df

Unnamed: 0,Variable,Value
0,MWP_E,1000
1,MWP_H,1000
2,lineLoss,0


Save as excel:

In [163]:
Scalars_df.to_excel(os.path.join(output_dir,'Scalars.xlsx'),sheet_name='Scalars', index=False, header=False)

## Sheet "TransmissionLines"

We copy the sheet from *E42_Data.xlsx*, as it contains the same data as our file *lineCapacity.xlsx*. Since our model does not include transmission, we set the *linecapacity* to zero.

## Sheet "MarketMaps"

We copy the sheet from *E42_Data.xlsx* and adjust the data manually directly in the excel spreadsheet.

## Sheet "hMaps"

We copy the sheet from *E44_Data.xlsx*, so we do not have to do the header adjustment ourselves.

## Combine excel files as different sheets within one file

Define directory where final dataset is saved to:

In [164]:
df_final_dir = 'C:\\Users\\mpher\\Documents\\Uni\\Master\\02_Exchange\\01_Academics\\Energy Economics of the Green Transition\\EnergyEconGroupWork\\Data\\mBasicPH_storage_Data.xlsx'

In this last step, we combine the individual excel files into a single excel file with one sheet per file.

In [165]:
# List all excel files in the output folder
output_dir_final = [os.path.join(root, file) for root, folder, files in os.walk(output_dir) for file in files if file.endswith(".xlsx")]

# Define order of sheets
defined_order = ['log.xlsx', 'Fundamentals.xlsx','LoadVariables.xlsx', 'LoadMaps.xlsx','GeneratorsVariables.xlsx','GeneratorsMaps.xlsx','StorageVariables.xlsx','StorageMaps.xlsx','HourlyVariation.xlsx','Scalars.xlsx','TransmissionLines.xlsx','MarketMaps.xlsx','hMaps.xlsx']
# Match file names case-insensitively: the log sheet is saved as 'Log.xlsx' but listed here as 'log.xlsx'
sheet_rank = {name.lower(): rank for rank, name in enumerate(defined_order)}
output_dir_final.sort(key=lambda x: sheet_rank[os.path.basename(x).lower()])

with pd.ExcelWriter(df_final_dir) as writer:
    for excel in output_dir_final: #For each excel
        sheet_name = pd.ExcelFile(excel).sheet_names[0] #Find the sheet name
        df = pd.read_excel(excel) #Create a dataframe
        df.to_excel(writer, sheet_name=sheet_name, index=False) #Write it to a sheet in the output excel
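Because three of the sheets are copied into the folder by hand, it can help to check the folder contents before combining. A minimal sketch, assuming a hypothetical helper `check_sheet_files` and made-up example paths (not the real folder contents). It reports expected files that are missing and files that are not in the defined order; the latter would make a `list.index` lookup raise a `ValueError` during sorting:

```python
import os

def check_sheet_files(paths, expected_names):
    """Compare found workbook paths with the expected file names (case-insensitive).

    Returns (missing, unexpected) as sorted lists of lower-cased file names.
    """
    found = {os.path.basename(p).lower() for p in paths}
    expected = {name.lower() for name in expected_names}
    return sorted(expected - found), sorted(found - expected)

# Illustrative input, not the real folder contents.
example_paths = [os.path.join('Final_Dataset', 'Log.xlsx'),
                 os.path.join('Final_Dataset', 'Scalars.xlsx'),
                 os.path.join('Final_Dataset', 'Notes.xlsx')]
example_order = ['log.xlsx', 'Scalars.xlsx', 'hMaps.xlsx']

missing, unexpected = check_sheet_files(example_paths, example_order)
print(missing, unexpected)  # ['hmaps.xlsx'] ['notes.xlsx']
```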

We make some minor manual adjustments (marked in yellow in the file):
- GeneratorsVariables: we shift empty cells up so that there are no gaps of empty rows (these came from omitting NAs without resetting the index of the dataframes).
- HourlyVariation:
    - The WS_DK column contained 44 cells with negative values. This makes no sense, as it would mean that the wind turbine is producing wind instead of utilising it. We have overwritten these cells with a formula that calculates the average of the preceding and subsequent cells. This does not significantly influence our results, as the absolute values were very low (max: 0.000048) and only 44 out of 8760 hours were affected.
    - As the HourlyVariation for ROR does not get imported when we aggregate the heating and electricity areas, we manually copy this data from the disaggregated data file *mBasicPH_storage_Data.xlsx* (stored locally on our computer).
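
The WS_DK correction was done with a spreadsheet formula. A minimal sketch of an equivalent step in Python, assuming a hypothetical helper `fill_negatives_with_neighbour_mean` and made-up example values (not taken from *CapVariation.xlsx*). How negatives in the first or last position are treated is an assumption of this sketch:

```python
def fill_negatives_with_neighbour_mean(values):
    """Return a copy where each negative entry is replaced by the mean of the
    entries directly before and after it (both taken from the original list).

    Assumption of this sketch: a negative value in the first or last position
    has only one neighbour and is left unchanged.
    """
    result = list(values)
    for i in range(1, len(values) - 1):
        if values[i] < 0:
            result[i] = (values[i - 1] + values[i + 1]) / 2
    return result

example = [0.5, -0.00004, 0.25, 0.125]  # illustrative values only
print(fill_negatives_with_neighbour_mean(example))  # [0.5, 0.375, 0.25, 0.125]
```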