# Construct Dataset

In this Jupyter notebook, we construct the dataset that goes into the $mBasicPH\_storage$ model. The notebook builds one Excel file (stored in "EnergyEconGroupWork\Data") from the various Excel files (stored in "EnergyEconGroupWork\DownloadDataForDK\ModelData"), which were constructed from real-world data.

## Settings

Import standard packages:

In [None]:
import os
import numpy as np
import pandas as pd

Let's specify an output folder:

In [None]:
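output_dir = os.path.join(os.getcwd(),'Final_Dataset')
os.makedirs(output_dir, exist_ok=True) # create the folder if it does not exist yet, so the to_excel calls below do not fail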

In [None]:
print(output_dir)

## Sheet "Log"

Here we construct the sheet "Log", which defines the **UNITS** of variables.

Define **UNITS**:

In [None]:
UNITS = {
    'FuelPrice':'EUR/MWh',
    'EmissionIntensity':'Ton CO2/MWh input',
    'EmissionTax':'EUR/TCO2',
    'Load':'MWh',
    'FuelMix':'MWh input / MWh output', # Data: 'TWh input / TWh output' -> no need to adjust ratio stays
    'GeneratingCapacity':'MW', 
    'OtherMC':'EUR/MWh output',
    'FOM':'EUR/(MW/(hours per model year))/8760', # convert from year to hours per model year
    'InvestCost':'EUR2015/MWhCapacity', # Data: 'Million EUR2015/GWhCapacity' -> adjust
    'LoadVariation':'Percent of annual demand',
    'CapVariation':'Percent of generating capacity',
    'MWP_E':'EUR/MWh',
    'MWP_H':'EUR/MWh',
    'E2H':'Coefficient (negative for heat pumps, positive for backpressure)'
}

Add dictionary to "Log" dataset:

In [None]:
df_Log = pd.DataFrame(list(UNITS.items()), columns=['Parameter', 'Unit/description'])

Save as excel:

In [None]:
df_Log.to_excel(os.path.join(output_dir,'Log.xlsx'),sheet_name='Log', index=False)

## Sheet "Fundamentals"

### FuelPrice

#### FuelPrice/BFt

Get the different fuel types from the "FuelMix" Excel file in the current working directory.

In [None]:
BFt = pd.read_excel(os.path.join(os.getcwd(), 'FuelMix.xlsx'))

Subset:

In [None]:
BFt = BFt['BFt'].drop_duplicates()

Convert to df and set column name:

In [None]:
Fundamentals_df = pd.DataFrame({'FuelPrice/BFt': BFt})

In [None]:
Fundamentals_df

#### FuelPrice/FuelPrice

We use the file "FuelProjections" (data from DEA) from the "EnergyEconomicsE2023" GitHub repository. Unfortunately, fuel prices are only stated from 2020 onwards. We therefore use the 2020 prices: most fuel prices in the "FuelProjections" dataset increase over time, so the 2020 estimates are the closest available approximation of 2019 prices.

In [None]:
FuelPrice = pd.read_excel(os.path.join(os.getcwd(), 'FuelProjections.xlsx'), sheet_name='prices')

In the FuelPrice df the prices are in EUR/GJ but we want EUR/MWh:

1 GJ = 0.2777777778 MWh ([Source](https://www.unitconverters.net/energy/gigajoule-to-megawatt-hour.htm))
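
As a quick check of this conversion, here is a minimal sketch. The helper name `eur_per_gj_to_eur_per_mwh` and the input price of 10 EUR/GJ are illustrative assumptions, not values from the DEA file. Dividing a price per GJ by 0.2777777778 is the same as multiplying it by 3.6, because 1 MWh = 3.6 GJ.

```python
GJ_PER_MWH = 3.6                # 1 MWh = 3600 MJ = 3.6 GJ
MWH_PER_GJ = 1 / GJ_PER_MWH     # = 0.2777..., the factor used in this notebook


def eur_per_gj_to_eur_per_mwh(price_eur_per_gj):
    """Convert a fuel price from EUR/GJ to EUR/MWh."""
    return price_eur_per_gj / MWH_PER_GJ


# Illustrative input only: 10 EUR/GJ corresponds to roughly 36 EUR/MWh.
example_price = eur_per_gj_to_eur_per_mwh(10.0)
print(round(example_price, 6))
```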

Add to fundamentals df:

In [None]:
# Create empty new column in Fundamentals_df
Fundamentals_df['FuelPrice/FuelPrice'] = np.nan

In [None]:
# Biogas
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Biogas', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Biogas'] / 0.2777777778

# Biomass (we assume Biomass only consists of Wood pellets so we get close to prices in the mBasicPH_storageLarge)
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Biomass', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Wood pellets'] / 0.2777777778

# Coal
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Coal', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Coal'] / 0.2777777778
                                                                                                          
# Natgas
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Natgas', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Natural gas'] / 0.2777777778

# Oil
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Oil', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Oil'] / 0.2777777778

# Waste
Fundamentals_df.loc[Fundamentals_df['FuelPrice/BFt'] == 'Waste', 'FuelPrice/FuelPrice'] = FuelPrice.loc[0,'Waste'] / 0.2777777778


In [None]:
Fundamentals_df
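
The six assignments above could also be written as a single lookup. The sketch below is one possible way to do it, not the approach used in this notebook. The frames `fuel_price` and `fundamentals`, the dictionary `bft_to_price_column` and all numbers are hypothetical stand-ins. Fuels without an entry in the dictionary simply stay NaN.

```python
import pandas as pd

MWH_PER_GJ = 0.2777777778  # conversion factor stated in this notebook

# Hypothetical stand-ins for FuelPrice (one row per year) and Fundamentals_df.
fuel_price = pd.DataFrame({'Biogas': [18.0], 'Wood pellets': [9.0], 'Coal': [2.5]})
fundamentals = pd.DataFrame({'FuelPrice/BFt': ['Biogas', 'Biomass', 'Coal', 'Wind']})

# Model fuel name -> column name in the price table.
bft_to_price_column = {'Biogas': 'Biogas', 'Biomass': 'Wood pellets', 'Coal': 'Coal'}

prices_first_row = fuel_price.loc[0]  # Series indexed by price-table column name
fundamentals['FuelPrice/FuelPrice'] = (
    fundamentals['FuelPrice/BFt']
    .map(bft_to_price_column)   # fuels without a mapping become NaN
    .map(prices_first_row)      # look up the EUR/GJ price
    / MWH_PER_GJ                # convert to EUR/MWh
)
print(fundamentals)
```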

### EmissionIntensity

#### EmissionIntensity/BFt

Copy column "FuelPrice/BFt":

In [None]:
Fundamentals_df['EmissionIntensity/BFt'] = Fundamentals_df['FuelPrice/BFt']

#### EmissionIntensity/EmissionType

Fill new column with value "CO2", i.e. the same emission type for all fuels.

In [None]:
Fundamentals_df['EmissionIntensity/EmissionType'] = 'CO2'

#### EmissionIntensity/EmissionIntensity

As with the FuelPrice/FuelPrice, we use the file "FuelProjections" (data from DEA) from the "EnergyEconomicsE2023" GitHub repository. However, "EmissionIntensity" does not depend on time.

In [None]:
EmissionIntensity = pd.read_excel(os.path.join(os.getcwd(), 'FuelProjections.xlsx'), sheet_name='emissionIntensity')

In [None]:
EmissionIntensity

Add the emission intensity of CO2 to Fundamentals_df:

In the "EmissionIntensity" datafile above the values are in **kg/GJ** but we want **Ton CO2/MWh**.

Thus, we divide by $1000 \cdot 0.2777777778$ (1000 kg per ton and 0.2777777778 MWh per GJ).

In [None]:
# Create empty new column in Fundamentals_df
Fundamentals_df['EmissionIntensity/EmissionIntensity'] = np.nan

# Biogas
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Biogas', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Biogas'] / (1000*0.2777777778)

# Biomass (as for the fuel prices, we assume Biomass only consists of Wood pellets)
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Biomass', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Wood pellets'] / (1000*0.2777777778)

# Coal
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Coal', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Coal'] / (1000*0.2777777778)
                                                                                                          
# Natgas
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Natgas', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Natural gas'] / (1000*0.2777777778)

# Oil
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Oil', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Oil'] / (1000*0.2777777778)

# Waste
Fundamentals_df.loc[Fundamentals_df['EmissionIntensity/BFt'] == 'Waste', 'EmissionIntensity/EmissionIntensity'] = EmissionIntensity.loc[0,'Waste'] / (1000*0.2777777778)

In [None]:
Fundamentals_df

### EmissionTax

We assume the average EU ETS price during 2019. The average price of EU carbon permits in 2019 was **24.64 EUR/TCO2** ([Source](https://tradingeconomics.com/commodity/carbon)).

We can add this information to the existing *Fundamentals_df*.

In [None]:
Fundamentals_df['EmissionTax/EmissionType'] = ['CO2'] + [np.nan] * (len(Fundamentals_df) - 1)
Fundamentals_df['EmissionTax/EmissionTax'] = [24.64] + [np.nan] * (len(Fundamentals_df) - 1)

### Save as excel

In [None]:
Fundamentals_df.to_excel(os.path.join(output_dir,'Fundamentals.xlsx'),sheet_name='Fundamentals', index=False)

## Sheet "LoadVariables"

*Note: The subtitles in this section differ slightly from the previous one. To keep the code readable, we use one subtitle per category rather than one per column, as in the "Fundamentals" section.*

### Electricity

We get the load from the *Load_E.xlsx* file. In this step, we also rename the columns to the names we want in our output df "LoadVariables_df".

In [None]:
LoadVariables_df = pd.read_excel(os.path.join(os.getcwd(), 'Load_E.xlsx')).rename(columns={
    'c_E':'Load_E/c_E',
    'Load_E':'Load_E/Load_E'})

We get load variation from *LoadVariation_E.xlsx* file.

In [None]:
LoadVariation_E = pd.read_excel(os.path.join(os.getcwd(), 'LoadVariation_E.xlsx')).rename(columns={
    'c_E':'LoadVariation_E/c_E',
    'h':'LoadVariation_E/h',
    'LoadVariation_E':'LoadVariation_E/LoadVariation_E'})

Add the columns to LoadVariables_df.

In [None]:
LoadVariables_df = pd.concat([LoadVariables_df, LoadVariation_E], axis=1)

### Heat

We do the same steps as for electricity.

In [None]:
Load_H = pd.read_excel(os.path.join(os.getcwd(), 'Load_H.xlsx')).rename(columns={
    'index':'Load_H/c_H',
    'Load_H':'Load_H/Load_H'})

In [None]:
LoadVariables_df = pd.concat([LoadVariables_df, Load_H], axis=1)

In [None]:
LoadVariation_H = pd.read_excel(os.path.join(os.getcwd(), 'LoadVariation_H.xlsx')).rename(columns={
    'c_H':'LoadVariation_H/c_H',
    'h':'LoadVariation_H/h',
    'LoadVariation_H':'LoadVariation_H/LoadVariation_H'})

In [None]:
LoadVariables_df = pd.concat([LoadVariables_df, LoadVariation_H], axis=1)

### Save as excel

In [None]:
LoadVariables_df.to_excel(os.path.join(output_dir,'LoadVariables.xlsx'),sheet_name='LoadVariables', index=False)

## Sheet "LoadMaps"

We take the "c_E" column of electricity consumers from the "LoadVariables" Excel file created above.

In [None]:
LoadMaps_E = pd.read_excel(os.path.join(output_dir, 'LoadVariables.xlsx'),
                            usecols=['Load_E/c_E']).dropna().rename(columns={
                                'Load_E/c_E':'c_E2g_E/c_E'})

Create mapping from consumers to generators:

In [None]:
LoadMaps_E['c_E2g_E/g_E'] = 'DK'

In [None]:
LoadMaps_E

We do the same steps for heat consumers:

In [None]:
LoadMaps_H = pd.read_excel(os.path.join(output_dir, 'LoadVariables.xlsx'),
                            usecols=['Load_H/c_H']).dropna().rename(columns={
                                'Load_H/c_H':'c_H2g_H/c_H'})

Define a function to create correct mapping of heat consumers and generators:

In [None]:
def mapping_heat(value):
    if isinstance(value, str):
        if 'c_DK_Central' in value:
            return 'DK_Central'
        elif 'c_DK_Decentral' in value:
            return 'DK_Decentral'
    return np.nan

Apply function:

In [None]:
LoadMaps_H['c_H2g_H/g_H'] = LoadMaps_H['c_H2g_H/c_H'].apply(mapping_heat)
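
To illustrate what the mapping returns, the sketch below repeats the function and applies it to a few hypothetical consumer ids. The ids are chosen only to exercise each branch and are not taken from *Load_H.xlsx*. Anything that is not a string, or that matches neither pattern, maps to NaN.

```python
import numpy as np
import pandas as pd


def mapping_heat(value):
    # Same logic as the function defined above.
    if isinstance(value, str):
        if 'c_DK_Central' in value:
            return 'DK_Central'
        elif 'c_DK_Decentral' in value:
            return 'DK_Decentral'
    return np.nan


# Hypothetical consumer ids, chosen only to exercise each branch.
consumers = pd.Series(['c_DK_Central', 'c_DK_Decentral', 'c_Other', None])
areas = consumers.apply(mapping_heat)
print(areas.tolist())
```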

Combine mapping for electricity and heat markets:

In [None]:
LoadMaps_df = pd.concat([LoadMaps_E,LoadMaps_H],axis=1)

In [None]:
LoadMaps_df

### Save as excel

In [None]:
LoadMaps_df.to_excel(os.path.join(output_dir,'LoadMaps.xlsx'),sheet_name='LoadMaps', index=False)

## Sheet "GeneratorsVariables"

### FuelMix

Import from "FuelMix.xlsx".

In [None]:
FuelMix = pd.read_excel(os.path.join(os.getcwd(), 'FuelMix.xlsx')).rename(columns={
    'id':'FuelMix/id',
    'BFt':'FuelMix/BFt',
    'FuelMix':'FuelMix/FuelMix'}).dropna()

In [None]:
FuelMix

(Remove "id_" prefix from "FuelMix/id" column:)

In [None]:
#FuelMix['FuelMix/id'] = FuelMix['FuelMix/id'].str.replace('id_', '')

There is one problem with the plant *id_DK_nan_IndustryE_Biomass*: its fuel use (column: FuelMix/FuelMix) has the value "inf". We are going to drop this row.

In [None]:
FuelMix = FuelMix[FuelMix['FuelMix/FuelMix'] != np.inf]
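
A slightly broader variant of this filter is sketched below, under the assumption that negative infinity and missing values should be dropped as well (the cell above only removes positive infinity). The frame `fuel_mix` and its ids are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical fuel-mix rows; the ids are made up for illustration.
fuel_mix = pd.DataFrame({
    'FuelMix/id': ['id_A', 'id_B', 'id_C'],
    'FuelMix/FuelMix': [1.2, np.inf, -np.inf],
})

# np.isfinite is False for +inf, -inf and NaN, so one mask covers all three.
finite_rows = fuel_mix[np.isfinite(fuel_mix['FuelMix/FuelMix'])].reset_index(drop=True)
print(finite_rows)
```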

### GeneratingCap Electricity

In [None]:
GeneratingCap_E = pd.read_excel(os.path.join(os.getcwd(), 'GeneratingCapacity_E.xlsx')).rename(columns={
    'id':'GeneratingCap_E/id',
    'GeneratingCapacity_E':'GeneratingCap_E/GeneratingCap_E'})

In [None]:
GeneratingCap_E

In [None]:
#GeneratingCap_E['GeneratingCap_E/id'] = GeneratingCap_E['GeneratingCap_E/id'].str.replace('id_', '')

In [None]:
# Filter out rows containing "ImportFrom"
#GeneratingCap_E = GeneratingCap_E[~GeneratingCap_E['GeneratingCap_E/id'].str.contains('ImportFrom')]

In [None]:
# Filter out rows containing "nan"
#GeneratingCap_E = GeneratingCap_E[~GeneratingCap_E['GeneratingCap_E/id'].str.contains('nan')]

### GeneratingCap Heat

In [None]:
GeneratingCap_H = pd.read_excel(os.path.join(os.getcwd(), 'GeneratingCapacity_H.xlsx')).rename(columns={
    'id':'GeneratingCap_H/id',
    'GeneratingCapacity_H':'GeneratingCap_H/GeneratingCap_H'})

In [None]:
GeneratingCap_H

In [None]:
#GeneratingCap_H['GeneratingCap_H/id'] = GeneratingCap_H['GeneratingCap_H/id'].str.replace('id_', '')

### OtherMC

In [None]:
OtherMC = pd.read_excel(os.path.join(os.getcwd(), 'OtherMC.xlsx')).rename(columns={
    'id':'OtherMC/id',
    'OtherMC':'OtherMC/OtherMC'}).dropna()

In [None]:
OtherMC

Filter out importing MC:

In [None]:
#OtherMC = OtherMC.filter(regex='^(?!.*ImportFrom).*$', axis=1)

Transfer df from wide to long:

In [None]:
#OtherMC = OtherMC.melt(var_name='OtherMC/id', value_name='OtherMC/OtherMC')

Remove the "id_" prefix again:

In [None]:
#OtherMC['OtherMC/id'] = OtherMC['OtherMC/id'].str.replace('id_', '')

In [None]:
# Filter out rows containing "nan"
#OtherMC = OtherMC[~OtherMC['OtherMC/id'].str.contains('nan')]

### FOM

In [None]:
FOM = pd.read_excel(os.path.join(os.getcwd(), 'FOM.xlsx')).rename(columns={
    'id':'FOM/id',
    'FOM':'FOM/FOM'}).dropna()

In [None]:
FOM

In [None]:
#FOM['FOM/id'] = FOM['FOM/id'].str.replace('id_', '')

In [None]:
# Filter out rows containing "nan"
#FOM = FOM[~FOM['FOM/id'].str.contains('nan')]

Filter out imports:

In [None]:
# Filter out rows containing "ImportFrom"
#FOM = FOM[~FOM['FOM/id'].str.contains('ImportFrom')]

### InvestCost

We abstract from investment costs in generators.

### E2H

In [None]:
E2H = pd.read_excel(os.path.join(os.getcwd(), 'E2H.xlsx')).rename(columns={
    'id':'E2H/id',
    'E2H':'E2H/E2H'})

In [None]:
E2H

In [None]:
#E2H['E2H/id'] = E2H['E2H/id'].str.replace('id_', '')

### Put dataframes together

In [None]:
GeneratorsVariables_df = pd.concat([FuelMix,GeneratingCap_E,GeneratingCap_H,OtherMC,FOM,E2H], axis=1)
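
`pd.concat(..., axis=1)` aligns the frames on their index. A frame that lost rows through `dropna()` or a filter keeps its original row labels, which is where the gaps mentioned in the manual adjustments at the end of this notebook come from. The sketch below shows the effect on two hypothetical tables and one possible remedy, resetting the index before concatenating. The names `left`, `right`, `with_gap` and `packed` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical tables: the second one loses its first row through dropna().
left = pd.DataFrame({'A/id': ['id_1', 'id_2', 'id_3']})
right = pd.DataFrame({'B/id': [np.nan, 'id_2', 'id_3']}).dropna()  # index is now [1, 2]

# Index-aligned concat: the surviving rows of `right` stay on rows 1 and 2.
with_gap = pd.concat([left, right], axis=1)

# Resetting the index first packs every block from row 0 downwards.
packed = pd.concat([left, right.reset_index(drop=True)], axis=1)
print(with_gap)
print(packed)
```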

### Save as excel

In [None]:
GeneratorsVariables_df.to_excel(os.path.join(output_dir,'GeneratorsVariables.xlsx'),sheet_name='GeneratorsVariables', index=False)

## Sheet "GeneratorsMaps"

### id2tech

Import from *id2tech.xlsx*:

In [None]:
id2tech = pd.read_excel(os.path.join(os.getcwd(), 'id2tech.xlsx')).rename(columns={
    'id':'id2tech/id',
    'tech':'id2tech/tech'})

In [None]:
id2tech

Remove *id_* prefix from *id2tech/id* column:

In [None]:
#id2tech['id2tech/id'] = id2tech['id2tech/id'].str.replace('id_', '')

Filter out rows containing *ImportFrom*:

In [None]:
#id2tech = id2tech[~id2tech['id2tech/id'].str.contains('ImportFrom')]

In [None]:
# Filter out rows containing "nan"
#id2tech = id2tech[~id2tech['id2tech/id'].str.contains('nan')]

### id2hvt

Import from *id2hvt.xlsx*:

In [None]:
id2hvt = pd.read_excel(os.path.join(os.getcwd(), 'id2hvt.xlsx')).rename(columns={
    'id':'id2hvt/id',
    'hvt':'id2hvt/hvt'})

In [None]:
id2hvt

Remove *id_* prefix from *id2hvt/id* column:

In [None]:
#id2hvt['id2hvt/id'] = id2hvt['id2hvt/id'].str.replace('id_', '')

Filter out rows containing *ImportFrom*:

In [None]:
#id2hvt = id2hvt[~id2hvt['id2hvt/id'].str.contains('ImportFrom')]

In [None]:
# Filter out rows containing "nan"
#id2hvt = id2hvt[~id2hvt['id2hvt/id'].str.contains('nan')]

### id2g_E

Import from *id2g_E.xlsx*:

In [None]:
id2g_E = pd.read_excel(os.path.join(os.getcwd(), 'id2g_E.xlsx')).rename(columns={
    'id':'id2g_E/id',
    'g_E':'id2g_E/g_E'})

In [None]:
id2g_E

Remove *id_* prefix from *id2g_E/id* column:

In [None]:
#id2g_E['id2g_E/id'] = id2g_E['id2g_E/id'].str.replace('id_', '')

Filter out rows containing *ImportFrom*:

In [None]:
#id2g_E = id2g_E[~id2g_E['id2g_E/id'].str.contains('ImportFrom')]

In [None]:
# Filter out rows containing "nan"
#id2g_E = id2g_E[~id2g_E['id2g_E/id'].str.contains('nan')]

### id2g_H

Import from *id2g_H.xlsx*:

In [None]:
id2g_H = pd.read_excel(os.path.join(os.getcwd(), 'id2g_H.xlsx')).rename(columns={
    'id':'id2g_H/id',
    'g_H':'id2g_H/g_H'})

In [None]:
id2g_H

Remove *id_* prefix from *id2g_H/id* column:

In [None]:
#id2g_H['id2g_H/id'] = id2g_H['id2g_H/id'].str.replace('id_', '')

Merge *id2g_E* and *id2g_H*:

In [None]:
id2g = pd.concat([id2g_E,id2g_H], axis=1).reset_index(drop=True)

In [None]:
id2g

### tech2modelTech

Import from *tech2modelTech.xlsx*:

In [None]:
tech2modelTech = pd.read_excel(os.path.join(os.getcwd(), 'tech2modelTech.xlsx')).rename(columns={
    'tech':'tech2modelTech/tech',
    'modelTech':'tech2modelTech/modelTech'})

In [None]:
tech2modelTech

### Put dataframes together

In [None]:
GeneratorsMaps_df = pd.concat([id2tech,id2hvt,id2g,tech2modelTech], axis=1)

In [None]:
GeneratorsMaps_df

### Save as excel

In [None]:
GeneratorsMaps_df.to_excel(os.path.join(output_dir,'GeneratorsMaps.xlsx'),sheet_name='GeneratorsMaps', index=False)

## Sheet "StorageVariables"

For the different values below, we use the DEA's *technology_datasheet_for_energy_storage.xlsx*. We consider the case of a **141 Large hot water tank**.

Notes:
- All prices in the datasheet are in EUR2020. Thus, we are going to inflation-adjust them to 2019.
- As the technology *141 Large scale hot water tank* was last updated in 2018 (see *Index* sheet in excel file), we are going to use the data for year 2015 (observed and thus not estimated data). We do this without loss of generality, as the 2020 middle estimate remained unchanged compared to 2015.

Import data:

In [None]:
technology_datasheet_for_energy_storage = pd.read_excel(os.path.join(os.getcwd(), 'technology_datasheet_for_energy_storage.xlsx'),sheet_name='alldata_flat').drop(columns=['Technology'])

Subset for *141 Large hot water tank*:

In [None]:
technology_datasheet_for_energy_storage = technology_datasheet_for_energy_storage[technology_datasheet_for_energy_storage['ws'] == '141 Large hot water tank']

Subset for 2015 data:

In [None]:
technology_datasheet_for_energy_storage = technology_datasheet_for_energy_storage[technology_datasheet_for_energy_storage['year'] == 2015]

### GeneratingCap_H

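First we define the id column for the heat storage technology: we take the ids of the standard heat pump plants (`HPstandard`) and replace `HPstandard` with `HS`, so that there is one heat storage unit per such plant.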

In [None]:
GeneratingCap_HS = pd.read_excel(os.path.join(output_dir,'GeneratorsVariables.xlsx'), usecols=['GeneratingCap_H/id']).dropna()
GeneratingCap_HS = GeneratingCap_HS[GeneratingCap_HS['GeneratingCap_H/id'].str.contains('HPstandard')]
GeneratingCap_HS['GeneratingCap_H/id'] = GeneratingCap_HS['GeneratingCap_H/id'].str.replace('HPstandard', 'HS')
GeneratingCap_HS

In DEA's *technology datasheet*, `GeneratingCap_H` corresponds to *Output capacity for one unit [MW]*. 

In [None]:
DEA_GeneratingCap_H = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Output capacity for one unit [MW]', 'val']).reset_index(drop=True))
DEA_GeneratingCap_H

In [None]:
GeneratingCap_HS['GeneratingCap_H/GeneratingCap_H'] = DEA_GeneratingCap_H.iloc[0]
GeneratingCap_HS
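
The same lookup pattern (filter on `par`, convert `val` to a number, take the first entry) recurs for every parameter below. It could be wrapped in a small helper, as in the following sketch. The function name `dea_value`, the `KeyError` for missing parameters and the placeholder numbers are assumptions for illustration and are not part of the notebook or the DEA datasheet.

```python
import pandas as pd


def dea_value(datasheet, parameter):
    """Return the first value of `parameter` from a flat (par, val) table as a float."""
    values = datasheet.loc[datasheet['par'] == parameter, 'val']
    if values.empty:
        raise KeyError(f'Parameter not found: {parameter}')
    return float(pd.to_numeric(values).iloc[0])


# Hypothetical extract in the same long format; the numbers are placeholders.
datasheet = pd.DataFrame({
    'par': ['Output capacity for one unit [MW]', 'Input capacity for one unit [MW]'],
    'val': ['50', '40'],
})
output_capacity = dea_value(datasheet, 'Output capacity for one unit [MW]')
print(output_capacity)
```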

### chargeCap_H

In [None]:
chargeCap_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'chargeCap_H/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In DEA's *technology datasheet*, `chargeCap_H` corresponds to *Input capacity for one unit [MW]*. 

In [None]:
DEA_chargeCap_H = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Input capacity for one unit [MW]', 'val']).reset_index(drop=True))
DEA_chargeCap_H

In [None]:
chargeCap_HS['chargeCap_H/chargeCap_H'] = DEA_chargeCap_H.iloc[0]
chargeCap_HS

### sCap

In [None]:
sCap = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'sCap/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In DEA's *technology datasheet*, `sCap` corresponds to *Energy storage capacity for one unit [MWh)*. 

In [None]:
DEA_sCap = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Energy storage capacity for one unit [MWh)', 'val']).reset_index(drop=True))
DEA_sCap

In [None]:
sCap['sCap/sCap'] = DEA_sCap.iloc[0]
sCap

### effC

In DEA's *technology datasheet*, `effC` corresponds to *Charge efficiency [%]*. 

In [None]:
effC = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'effC/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [None]:
DEA_effC = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='- Charge efficiency [%]', 'val']).reset_index(drop=True) / 100)
DEA_effC

In [None]:
effC['effC/effC'] = DEA_effC.iloc[0]

In [None]:
effC

### effD

In DEA's *technology datasheet*, `effD` corresponds to *Discharge efficiency [%]*. 

In [None]:
effD = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'effD/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [None]:
DEA_effD = pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='- Discharge efficiency [%]', 'val']).reset_index(drop=True) / 100)
DEA_effD

In [None]:
effD['effD/effD'] = DEA_effD.iloc[0]

In [None]:
effD

### selfDischarge

In DEA's *technology datasheet*, `selfDischarge` corresponds to *Energy losses during storage [%/day]*.

We follow the calculations in *mBasicPH_storage.xlsx* (file path: EnergyEconomicsE2023\Documentation\Data\mBasicPH_storage.xlsx) and calculate the selfDischarge as follows:

$selfDischarge = 1 - \left(\frac{\overbrace{\text{Round trip efficiency [\%] from DEA}}^{\equiv \ 98}}{100}\right)^{\frac{1}{24}} = 1 - 0.98^{\frac{1}{24}}$.

In [None]:
selfDischarge = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'selfDischarge/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [None]:
DEA_RoundTripEfficiency = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Round trip efficiency [%]', 'val']).reset_index(drop=True))).iloc[0]
DEA_RoundTripEfficiency

In [None]:
beta = 1 - (DEA_RoundTripEfficiency/100)**(1/24)
beta
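
Read this way, `beta` is the hourly loss rate that, compounded over 24 hours, leaves 98 % of the stored energy. The short check below uses the value 98 from the formula above; the variable names are illustrative.

```python
round_trip_efficiency = 98.0   # percent, the value quoted in the formula above
hours_per_day = 24

# Hourly loss rate such that 24 hours of compounding reproduce the daily figure.
beta = 1 - (round_trip_efficiency / 100) ** (1 / hours_per_day)

# Compounding the hourly retention over a day recovers 0.98.
retained_after_one_day = (1 - beta) ** hours_per_day
print(beta, retained_after_one_day)
```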

Construct df

In [None]:
selfDischarge['selfDischarge/selfDischarge'] = beta

In [None]:
selfDischarge

### OtherMC

In DEA's *technology datasheet*, `OtherMC` corresponds to *Variable O&M [EUR2015/MWhoutput]*. We note that they are zero and thus we do not have to inflation-adjust them.

In [None]:
OtherMC_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'OtherMC/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [None]:
DEA_OtherMC = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Variable O&M [EUR2015/MWhoutput]', 'val']).reset_index(drop=True))).iloc[0]
DEA_OtherMC

In [None]:
OtherMC_HS['OtherMC/OtherMC'] = DEA_OtherMC

In [None]:
OtherMC_HS

### FOM

In DEA's *technology datasheet*, `FOM` corresponds to *Fixed O&M [EUR2015/MWhCapacity/year)*. They are not zero and thus we inflation-adjust them.

In [None]:
FOM_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'FOM/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [None]:
DEA_FOM_2020 = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Fixed O&M [EUR2015/MWhCapacity/year)', 'val']).reset_index(drop=True))).iloc[0]
DEA_FOM_2020

Inflation rates:

In [None]:
π = pd.Series([1.531122704, 3.289449396, 2.662841655, 1.219993423, 0.199343827, -0.06164468, 0.183334861, 1.429107433, 1.73860862, 1.630522608, 0.476498853, 2.554506996, 8.833698867],index=pd.Index(range(2010,2023),name='t')).div(100).add(1)

In [None]:
DEA_FOM_2019 = DEA_FOM_2020 / π[2020]
DEA_FOM_2019
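
Dividing by `π[2020]` moves a value from 2020 prices to 2019 prices. A more general version of this step is sketched below, assuming that each factor gives the price level of year t relative to year t-1. The helper `convert_price_year` and the amount of 100 EUR are hypothetical. The inflation rates are the ones from the cell above.

```python
import pandas as pd

# Same annual inflation rates (percent) as in the cell above, converted to gross factors.
inflation_factor = pd.Series(
    [1.531122704, 3.289449396, 2.662841655, 1.219993423, 0.199343827, -0.06164468,
     0.183334861, 1.429107433, 1.73860862, 1.630522608, 0.476498853, 2.554506996,
     8.833698867],
    index=pd.Index(range(2010, 2023), name='t'),
).div(100).add(1)


def convert_price_year(value, from_year, to_year, factors=inflation_factor):
    """Move a price between years, assuming factors[t] is the price level of t relative to t-1."""
    if to_year >= from_year:
        return value * factors.loc[from_year + 1:to_year].prod()
    return value / factors.loc[to_year + 1:from_year].prod()


# Placeholder amount: 100 EUR in 2020 prices expressed in 2019 prices.
value_2019 = convert_price_year(100.0, 2020, 2019)
print(value_2019)
```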

Add to df:

In [None]:
FOM_HS['FOM/FOM'] = DEA_FOM_2019

In [None]:
FOM_HS

### InvestCost

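As with generators, we abstract from investment costs. The code cells in this subsection are therefore commented out; they only document how the adjustment would be done.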

In DEA's *technology datasheet*, `InvestCost` corresponds to *Specific investment [MEUR2015/GWhCapacity]*. They are not zero and thus we inflation adjust them. Additionally, we need to adjust to have them in EUR (not in millions of EUR) and MWh (not in GWh).

Formally, we do the following:

$InvestCost \cdot \frac{1 \ MEUR}{1 \ GWh} = InvestCost \cdot \frac{1\,000\,000 \ EUR}{1\,000 \ MWh} = InvestCost \cdot 1\,000 \ \frac{EUR}{MWh}$

In [None]:
#InvestCost_HS = GeneratingCap_HS.rename(columns={
#    'GeneratingCap_H/id':'InvestCost/tech'}).drop(columns='GeneratingCap_H/GeneratingCap_H')

In [None]:
#DEA_InvestCost_2020 = (pd.to_numeric((technology_datasheet_for_energy_storage.loc[technology_datasheet_for_energy_storage['par']=='Specific investment [MEUR2015/GWhCapacity]', 'val']).reset_index(drop=True))).iloc[0]
#DEA_InvestCost_2020

Adjust units:

In [None]:
#DEA_InvestCost_2020_adj = DEA_InvestCost_2020 * 1000
#DEA_InvestCost_2020_adj

In [None]:
#DEA_InvestCost_2019_adj = DEA_InvestCost_2020_adj / π[2020]
#DEA_InvestCost_2019_adj

Add to df:

In [None]:
#InvestCost_HS['InvestCost/InvestCost'] = DEA_InvestCost_2019_adj

### Put dataframes together

In [None]:
StorageVariables_df = pd.concat([GeneratingCap_HS,chargeCap_HS,sCap,effC,effD,selfDischarge,OtherMC_HS,FOM_HS], axis=1)

In [None]:
StorageVariables_df

### Save as excel

In [None]:
StorageVariables_df.to_excel(os.path.join(output_dir,'StorageVariables.xlsx'),sheet_name='StorageVariables', index=False)

## Sheet "StorageMaps"

### id2tech

In [None]:
id2tech_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'id2tech/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H').reset_index(drop=True)

In [None]:
id2tech_HS['id2tech/tech'] = 'HS'

In [None]:
id2tech_HS

### id2hvt

In [None]:
id2hvt_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'id2hvt/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H').reset_index(drop=True)

In [None]:
id2hvt_HS['id2hvt/hvt'] = 'Standard'

In [None]:
id2hvt_HS

### id2g_H

In [None]:
id2g_HS = GeneratingCap_HS.rename(columns={
    'GeneratingCap_H/id':'id2g_H/id'}).drop(columns='GeneratingCap_H/GeneratingCap_H').reset_index(drop=True)

Define a function to create correct mapping of heat storage to district heat areas:

In [None]:
def mapping_HS(value):
    if isinstance(value, str):
        if 'id_DK_Central' in value:
            return 'DK_Central'
        elif 'id_DK_Decentral' in value:
            return 'DK_Decentral'
    return np.nan

In [None]:
id2g_HS['id2g_H/g_H'] = id2g_HS['id2g_H/id'].apply(mapping_HS)

In [None]:
id2g_HS

### tech2modelTech

In [None]:
tech2modelTech_data = {
    'tech2modelTech/tech':['HS'],
    'tech2modelTech/modelTech':['HS']
}

As dataframe:

In [None]:
tech2modelTech = pd.DataFrame(tech2modelTech_data)

In [None]:
tech2modelTech

### Put dataframes together

In [None]:
StorageMaps_df = pd.concat([id2tech_HS,id2hvt_HS,id2g_HS,tech2modelTech], axis=1)

In [None]:
StorageMaps_df

### Save as excel

In [None]:
StorageMaps_df.to_excel(os.path.join(output_dir,'StorageMaps.xlsx'),sheet_name='StorageMaps', index=False)

## Sheet "HourlyVariation"

We use the excel file *CapVariation.xlsx* (file path: EnergyEconGroupWork\DownloadDataForDK\ModelData\CapVariation.xlsx).

In [None]:
HourlyVariation_df = pd.read_excel(os.path.join(os.getcwd(), 'CapVariation.xlsx')).rename(columns={'h':'CapVariation/h/hvt'})

In [None]:
HourlyVariation_df

Drop "import" columns:

In [None]:
#HourlyVariation_df = HourlyVariation_df.filter(regex='^(?!.*ImportFrom).*$', axis=1)

Save as excel:

In [None]:
HourlyVariation_df.to_excel(os.path.join(output_dir,'HourlyVariation.xlsx'),sheet_name='HourlyVariation', index=False)

## Sheet "Scalars"

We get the data from *MWP_E.xlsx*:

In [None]:
Scalars_E = pd.read_excel(os.path.join(os.getcwd(), 'MWP_E.xlsx')).rename(columns={'c_DK':'MWP_E'}).drop(columns='index')

We assume the same MWP on the heat market:

In [None]:
Scalars_H = pd.read_excel(os.path.join(os.getcwd(), 'MWP_E.xlsx')).rename(columns={'c_DK':'MWP_H'}).drop(columns='index')

Put dataframes together:

In [None]:
Scalars_df = pd.concat([Scalars_E,Scalars_H], axis=1)

In [None]:
Scalars_df

Add *lineLoss*:

In [None]:
Scalars_df['lineLoss'] = 0

Transfer df from wide to long:

In [None]:
Scalars_df = Scalars_df.melt(var_name='Variable', value_name='Value')
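
For readers unfamiliar with `melt`, the sketch below shows the reshaping on a hypothetical one-row table. The value 3000 is a placeholder and not the value stored in *MWP_E.xlsx*. Each column of the wide table becomes one (Variable, Value) row.

```python
import pandas as pd

# Hypothetical one-row wide table; 3000 is a placeholder value.
wide = pd.DataFrame({'MWP_E': [3000], 'MWP_H': [3000], 'lineLoss': [0]})

# Each column becomes one (Variable, Value) row.
long = wide.melt(var_name='Variable', value_name='Value')
print(long)
```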

In [None]:
Scalars_df

Save as excel:

In [None]:
Scalars_df.to_excel(os.path.join(output_dir,'Scalars.xlsx'),sheet_name='Scalars', index=False, header=False)

## Sheet "TransmissionLines"

We copy the sheet from *E42_Data.xlsx* as we have the same data in file *lineCapacity.xlsx*. But as we do not include Transmission in our model we set the *linecapacity* to zero.

## Sheet "MarketMaps"

We copy the sheet from *E42_Data.xlsx* and adjust the data manually directly in the excel spreadsheet.

## Sheet "hMaps"

We copy the sheet from *E44_Data.xlsx*, so we do not have to do the header adjustment ourselves.

## Combine excel files as different sheets within one file

Define directory where final dataset is saved to:

In [None]:
df_final_dir = 'C:\\Users\\mpher\\Documents\\Uni\\Master\\02_Exchange\\01_Academics\\Energy Economics of the Green Transition\\EnergyEconGroupWork\\Data\\mBasicPH_storage_Data.xlsx'

In this last step, we combine the different excel files into one excel file split up into multiple sheets.

In [None]:
#List all excel files in folder
output_dir_final = [os.path.join(root, file) for root, folder, files in os.walk(output_dir) for file in files if file.endswith(".xlsx")]

# Define order of sheets
defined_order = ['Log.xlsx', 'Fundamentals.xlsx','LoadVariables.xlsx', 'LoadMaps.xlsx','GeneratorsVariables.xlsx','GeneratorsMaps.xlsx','StorageVariables.xlsx','StorageMaps.xlsx','HourlyVariation.xlsx','Scalars.xlsx','TransmissionLines.xlsx','MarketMaps.xlsx','hMaps.xlsx']
# Compare file names case-insensitively: the log file is saved as 'Log.xlsx' above
order_lower = [name.lower() for name in defined_order]
output_dir_final.sort(key=lambda x: order_lower.index(os.path.basename(x).lower()))

with pd.ExcelWriter(df_final_dir) as writer:
    for excel in output_dir_final: #For each excel
        sheet_name = pd.ExcelFile(excel).sheet_names[0] #Find the sheet name
        df = pd.read_excel(excel) #Create a dataframe
        df.to_excel(writer, sheet_name=sheet_name, index=False) #Write it to a sheet in the output excel

We do some minor adjustments manually (marked in yellow in the file):
- GeneratorsVariables: shifting empty cells up, so that there are no gaps of empty lines (these came from omitting NAs without resetting the index of the dataframes)
- HourlyVariation: 
    - The WS_DK column contained 44 cells with negative values. This makes no sense, as it would mean that the wind turbine is producing wind instead of utilising it. We have overwritten the cells in question with a formula that calculates the average of the preceding and subsequent cell. It does not significantly influence our results as the absolute values were very low (Max: 0.000048) and only 44 out of 8760 hours were affected.
    - As the HourlyVariation for ROR does not get imported when we aggregate the heating and electricity areas, we manually copy this data from the deaggregated data file *mBasicPH_storage_Data.xlsx* (stored locally on our computer).
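
The replacement of negative WS_DK values was done by hand in Excel. A possible pandas equivalent is sketched below on hypothetical numbers. For an isolated negative cell, linear interpolation gives exactly the average of the preceding and subsequent cell. For runs of consecutive negative cells it would differ from the Excel formula, so this is an approximation of the manual step rather than a reproduction of it.

```python
import pandas as pd

# Hypothetical hourly capacity factors with one isolated negative entry.
ws = pd.Series([0.10, -0.00002, 0.30, 0.25])

# Blank out negative values, then fill each gap linearly from its neighbours.
cleaned = ws.mask(ws < 0).interpolate(method='linear')
print(cleaned.tolist())
```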