# Preprocessing Observational Data For Training Bias Correction

### To train the bias correction, we need data on the turbines being simulated: the model, position and hub height of each turbine. For each turbine or farm we also need the observed power output or capacity factor.

### NOTE: This process is difficult to automate because it depends on the data you are able to obtain. Please adjust the code to produce the required result.

### (1) Inputting the turbine information data.
input: a csv with the variables described below
output: pandas dataframe called `data`

ESSENTIAL
* Latitude and longitude of the turbine/farm
* Max capacity of turbine/farm
* Number of turbines at this point (if it is a farm, to estimate the individual turbine capacity)

DESIRABLE (CAN BE ROUGHLY MATCHED LATER IF NOT PROVIDED)
* Individual turbine capacity (if it is a farm)
* Commissioning/decommissioning date (possibly not strictly required, but it helps make the training more accurate)
* Onshore or Offshore
* Turbine model
* Hub height
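
A minimal sketch of checking a loaded table against the two lists above. The column names follow the UK file used later in this notebook, except `turbine_model` and `hub_height`, which are assumed names and may not match your file. The helper `check_columns` is hypothetical.

```python
import pandas as pd

# Column names follow the UK file used in this notebook; adjust to your data.
ESSENTIAL = ["lat", "lon", "electrical_capacity", "number_of_turbines"]
# "turbine_model" and "hub_height" are assumed names, not taken from the UK file.
DESIRABLE = ["capacity_individual_turbine", "commissioning_date",
             "technology", "turbine_model", "hub_height"]


def check_columns(df):
    """Raise if an essential column is absent; return the missing desirable ones."""
    missing = [c for c in ESSENTIAL if c not in df.columns]
    if missing:
        raise ValueError(f"missing essential columns: {missing}")
    return [c for c in DESIRABLE if c not in df.columns]


# Toy table with the essential columns plus two desirable ones.
example = pd.DataFrame({
    "lat": [53.96], "lon": [-1.91],
    "electrical_capacity": [1.3], "number_of_turbines": [4],
    "technology": ["Onshore"], "commissioning_date": ["1992-01-06"],
})
to_fill = check_columns(example)
print(to_fill)
```

The returned list names the variables that step (2) would need to fill from the turbine metadata.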



### (2) Using the turbine metadata to fill missing variables from (1)
If the desirable variables cannot be found, we can fill them using turbine `metadata` collected from Denmark's turbine database. The current code assigns each turbine the model in `metadata` whose capacity is nearest to its own. Further criteria could give a more accurate match, but these have not been coded.

input: `data` and `metadata` (loaded from `models.csv`)
output: pandas dataframe called `turb_info`
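
A minimal sketch of nearest-capacity matching with `pd.merge_asof`, using three models from the metadata and toy turbine capacities. The `tolerance` of 0.1 MW is an assumption added here to leave a turbine unmatched when no model is close. The notebook's own match does not use it.

```python
import pandas as pd

# Toy turbine capacities in MW; the last one is far from every model below.
farms = pd.DataFrame({"capacity_individual_turbine": [0.22, 0.3, 2.0]})
models = pd.DataFrame({
    "model": ["Bonus.B23.150", "Vestas.V27.225", "Nordex.N29.250"],
    "capacity": [0.15, 0.225, 0.25],
})

# merge_asof requires both frames to be sorted by their key column.
matched = pd.merge_asof(
    farms.sort_values("capacity_individual_turbine"),
    models.sort_values("capacity"),
    left_on="capacity_individual_turbine",
    right_on="capacity",
    direction="nearest",
    tolerance=0.1,  # assumed threshold in MW; rows beyond it get NaN
)
print(matched)
```

Rows left as NaN can then be inspected by hand instead of silently receiving a much smaller model.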

### (3) Matching observational data with turbines/farms in `turb_info`
Observational data should be observed capacity factors covering the desired training area. Monthly generation data for each turbine/farm is preferable, but it is hard to find. Use the finest spatial and temporal resolution available, as this determines the resolution of the bias correction factors.

input: `turb_info` and `obs_cap` (loaded from observational data you find)
output: `obs_data`
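
A minimal sketch of this step, assuming the observations arrive as monthly energy per farm. The `site_id` key, the `generation_mwh` column and all values are made up for illustration. The capacity factor is energy divided by capacity times the hours in the month.

```python
import pandas as pd

# Hypothetical observation table: monthly energy per farm in MWh (made-up values).
obs_cap = pd.DataFrame({
    "site_id": [0, 0],
    "month": pd.to_datetime(["2015-01-01", "2015-02-01"]),
    "generation_mwh": [290.16, 218.4],
})
# Hypothetical slice of turb_info with a matching key; capacity in MW.
turb_info = pd.DataFrame({"site_id": [0], "electrical_capacity": [1.3]})

# Attach each farm's capacity to its observations.
obs_data = obs_cap.merge(turb_info, on="site_id", how="left")

# Capacity factor = energy / (capacity * hours in the month).
hours = obs_data["month"].dt.days_in_month * 24
obs_data["cf"] = obs_data["generation_mwh"] / (obs_data["electrical_capacity"] * hours)
print(obs_data[["month", "cf"]])
```

If only regional or national capacity factors are available, the merge key would be the region column instead, and the bias correction factors would share that coarser resolution.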

In [1]:
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### (1) Inputting the turbine information data.

In [2]:
data = pd.read_csv('../ninja-reimplementation/data/wind_data/UK/renewable_power_plants_UK_filtered.csv')
data.head()

Unnamed: 0,electrical_capacity,energy_source_level_1,energy_source_level_2,energy_source_level_3,technology,data_source,nuts_1_region,nuts_2_region,nuts_3_region,lon,...,country,commissioning_date,solar_mounting_type,chp,capacity_individual_turbine,number_of_turbines,site_name,uk_beis_id,operator,comment
0,1.3,Renewable energy,Wind,,Onshore,BEIS,UKE,UKE2,UKE22,-1.914154,...,England,1992-01-06,,,0.3,4,Chelker Reservoir,2921,Yorkshire Water,
1,2.7,Renewable energy,Wind,,Onshore,BEIS,UKC,UKC2,UKC21,-1.495191,...,England,1992-01-12,,,0.3,9,Blyth Harbour Wind Farm,3659,Border Wind Farms Ltd,
2,31.0,Renewable energy,Wind,,Onshore,BEIS,UKL,UKL2,UKL24,-3.430831,...,Wales,1993-01-01,,,0.3,103,Llandinam Windfarm,3057,CELTPOWER LTD,
3,4.8,Renewable energy,Wind,,Onshore,BEIS,UKD,UKD1,UKD12,-3.135725,...,England,1993-01-01,,,0.4,12,Kirkby Moor,2713,Npower Renewables,
4,9.6,Renewable energy,Wind,,Onshore,BEIS,UKD,UKD4,UKD46,-2.149984,...,England,1993-01-02,,,0.4,24,Coal Clough Wind Farm,3079,Renewable Energy Systems (RES),


In [3]:
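columns = ['country','technology','lon','lat','electrical_capacity','number_of_turbines','capacity_individual_turbine', 'commissioning_date']
# copy the selection so the assignment below does not act on a view of the original frame
data = data[columns].copy()
data['commissioning_date'] = pd.to_datetime(data['commissioning_date'])
# merge_asof in step (2) needs the frame sorted by the matching key
data = data.sort_values('capacity_individual_turbine')
data.head()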

Unnamed: 0,country,technology,lon,lat,electrical_capacity,number_of_turbines,capacity_individual_turbine,commissioning_date
25,Scotland,Onshore,-4.426957,57.709987,17.0,34,0.0,1997-01-09
80,England,Onshore,-3.324533,54.203954,1.2,5,0.22,2004-01-05
299,England,Onshore,-1.342652,52.826198,1.0,4,0.25,2011-10-05
0,England,Onshore,-1.914154,53.962432,1.3,4,0.3,1992-01-06
58,England,Onshore,-4.546749,50.645509,6.6,22,0.3,2002-01-06
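
The first row above reports an individual turbine capacity of 0.0 for a 17 MW farm with 34 turbines. A minimal sketch of estimating the value from the farm total, assuming a zero or missing entry means the capacity was not recorded. That is an assumption about this dataset, not something the source file documents.

```python
import pandas as pd

# Toy rows copied from the output above.
farms = pd.DataFrame({
    "electrical_capacity": [17.0, 1.2],
    "number_of_turbines": [34, 5],
    "capacity_individual_turbine": [0.0, 0.22],
})

# Assumption: zero or NaN means the per-turbine capacity was not recorded.
col = "capacity_individual_turbine"
unknown = farms[col].isna() | (farms[col] <= 0)

# Estimate per-turbine capacity as farm capacity divided by turbine count.
estimate = farms["electrical_capacity"] / farms["number_of_turbines"]
farms.loc[unknown, col] = estimate[unknown]
print(farms)
```

In the notebook this would run before the `sort_values` call, since the filled values change the sort order that `merge_asof` relies on.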


### (2) Using the turbine metadata to fill missing variables from (1)

In [4]:
# # Turn Elizabeth's metadata into a general turbine metadata file. The heights are the min and max Denmark had for each model.
# metadata = pd.read_excel('../ninja-reimplementation/data/turbine_info/Metadata_2020.xlsx')
# metadata = metadata.sort_values('Dato for \nnettilslutning')
# metadata = metadata.drop(metadata[metadata.height < 10].index)
# # group on height for both limits, and avoid shadowing the built-in max/min
# height_max = metadata.groupby('turb_match', as_index=False)['height'].max()
# height_min = metadata.groupby('turb_match', as_index=False)['height'].min()
# metadata = metadata[['Dato for \nnettilslutning', 'capacity', 'turb_match']]
# metadata.columns = ['date', 'capacity', 'model']
# metadata = metadata.drop_duplicates(subset=['model'], keep='first')
# metadata = metadata.sort_values('model').reset_index(drop=True)
# metadata['height_min'] = height_min.height
# metadata['height_max'] = height_max.height
# metadata.to_csv('../ninja-reimplementation/data/turbine_info/models.csv', index = None)

In [5]:
metadata = pd.read_csv('../ninja-reimplementation/data/turbine_info/models.csv')
metadata['date'] = pd.to_datetime(metadata['date'])
metadata = metadata.sort_values('capacity')
# convert capacity from kW to MW so it is comparable with `data`
metadata['capacity'] = metadata['capacity'] / 1000
metadata.head()

Unnamed: 0,model,capacity,height_min,height_max,date
0,Bonus.B23.150,0.15,30.0,60.0,1987-04-13
18,Nordex.N27.150,0.15,35.0,40.0,1982-04-21
44,Vestas.V27.225,0.225,29.3,39.0,1980-01-03
45,Vestas.V29.225,0.225,17.0,35.0,1979-08-16
19,Nordex.N29.250,0.25,30.0,69.0,1988-06-27


In [8]:
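# match each turbine to the model with the nearest capacity; both frames were sorted by their key above
turb_info = pd.merge_asof(data, metadata, left_on="capacity_individual_turbine", right_on="capacity", direction="nearest")
turb_info.head()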

Unnamed: 0,country,technology,lon,lat,electrical_capacity,number_of_turbines,capacity_individual_turbine,commissioning_date,model,capacity,height_min,height_max,date
0,Scotland,Onshore,-4.426957,57.709987,17.0,34,0.0,1997-01-09,Bonus.B23.150,0.15,30.0,60.0,1987-04-13
1,England,Onshore,-3.324533,54.203954,1.2,5,0.22,2004-01-05,Vestas.V27.225,0.225,29.3,39.0,1980-01-03
2,England,Onshore,-1.342652,52.826198,1.0,4,0.25,2011-10-05,Nordex.N29.250,0.25,30.0,69.0,1988-06-27
3,England,Onshore,-1.914154,53.962432,1.3,4,0.3,1992-01-06,Bonus.B33.300,0.3,105.0,130.0,1991-12-15
4,England,Onshore,-4.546749,50.645509,6.6,22,0.3,2002-01-06,Bonus.B33.300,0.3,105.0,130.0,1991-12-15
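
The match above uses capacity alone. One possible refinement, sketched here as an assumption and not as the notebook's method, is to consider only models whose `date` falls on or before the farm's commissioning date. The helper `match_model` is hypothetical, and the 1990 commissioning date is made up to show the effect.

```python
import pandas as pd

# Two models copied from the outputs above (capacity in MW).
models = pd.DataFrame({
    "model": ["Nordex.N29.250", "Bonus.B33.300"],
    "capacity": [0.25, 0.3],
    "date": pd.to_datetime(["1988-06-27", "1991-12-15"]),
})


def match_model(farm, models):
    """Nearest-capacity model among those dated on or before commissioning."""
    candidates = models[models["date"] <= farm["commissioning_date"]]
    if candidates.empty:
        candidates = models  # fall back to capacity only
    gap = (candidates["capacity"] - farm["capacity_individual_turbine"]).abs()
    return models.loc[gap.idxmin(), "model"]


farms = pd.DataFrame({
    "capacity_individual_turbine": [0.3, 0.3],
    # The first date is made up; the second is from the UK data above.
    "commissioning_date": pd.to_datetime(["1990-01-01", "1992-01-06"]),
})
farms["model"] = farms.apply(match_model, axis=1, models=models)
print(farms)
```

The row-wise `apply` is slow on large tables, so this is only meant to show the selection rule.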


### (3) Matching observational data with turbines/farms in `turb_info`