<a href="https://colab.research.google.com/github/lenare/mlses-wind-power-forecast/blob/main/prototyping.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Exploration UK
A first look at the data and a place to experiment with possible solutions.

In [None]:
# Import dependencies
import os

import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

pd.set_option('display.max_columns', None)

UK data

In [None]:
uk_dir = "../data/raw/uk/"
uk_2016_dir = f"{uk_dir}Kelmarsh_SCADA_2016_3082/"
uk_2017_dir = f"{uk_dir}Kelmarsh_SCADA_2017_3083/"
uk_2018_dir = f"{uk_dir}Kelmarsh_SCADA_2018_3084/"
uk_2019_dir = f"{uk_dir}Kelmarsh_SCADA_2019_3085/"
uk_2020_dir = f"{uk_dir}Kelmarsh_SCADA_2020_3086/"
uk_2021_dir = f"{uk_dir}Kelmarsh_SCADA_2021_3087/"

In [None]:
kelmarsh_1_2016 = f"{uk_2016_dir}Turbine_Data_Kelmarsh_1_2016-01-03_-_2017-01-01_228.csv"
kelmarsh_1_2017 = f"{uk_2017_dir}Turbine_Data_Kelmarsh_1_2017-01-01_-_2018-01-01_228.csv"
kelmarsh_1_2018 = f"{uk_2018_dir}Turbine_Data_Kelmarsh_1_2018-01-01_-_2019-01-01_228.csv"
kelmarsh_1_2019 = f"{uk_2019_dir}Turbine_Data_Kelmarsh_1_2019-01-01_-_2020-01-01_228.csv"
kelmarsh_1_2020 = f"{uk_2020_dir}Turbine_Data_Kelmarsh_1_2020-01-01_-_2021-01-01_228.csv"
kelmarsh_1_2021 = f"{uk_2021_dir}Turbine_Data_Kelmarsh_1_2021-01-01_-_2021-07-01_228.csv"

kelmarsh_1_2016_df = pd.read_csv(kelmarsh_1_2016, skiprows=9)
kelmarsh_1_2017_df = pd.read_csv(kelmarsh_1_2017, skiprows=9)
kelmarsh_1_2018_df = pd.read_csv(kelmarsh_1_2018, skiprows=9)
kelmarsh_1_2019_df = pd.read_csv(kelmarsh_1_2019, skiprows=9)
kelmarsh_1_2020_df = pd.read_csv(kelmarsh_1_2020, skiprows=9)
kelmarsh_1_2021_df = pd.read_csv(kelmarsh_1_2021, skiprows=9)
full_kelmarsh_df = pd.concat([kelmarsh_1_2016_df, kelmarsh_1_2017_df, kelmarsh_1_2018_df,
                             kelmarsh_1_2019_df, kelmarsh_1_2020_df, kelmarsh_1_2021_df])
full_kelmarsh_df.describe()
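The six near-identical `read_csv` calls above could also be written as a loop. This is a minimal sketch, assuming every yearly folder matches `Kelmarsh_SCADA_*` and every turbine file matches `Turbine_Data_Kelmarsh_1_*.csv`, as in the paths above. `load_turbine` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

import pandas as pd


def load_turbine(base_dir, turbine=1, skiprows=9):
    """Concatenate all yearly SCADA files of one turbine (hypothetical helper)."""
    pattern = f"Kelmarsh_SCADA_*/Turbine_Data_Kelmarsh_{turbine}_*.csv"
    # Sorting the paths puts the yearly folders in chronological order
    files = sorted(Path(base_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern} in {base_dir}")
    frames = [pd.read_csv(f, skiprows=skiprows) for f in files]
    # ignore_index avoids repeating the row labels 0..n-1 of each yearly file
    return pd.concat(frames, ignore_index=True)
```

With the layout used in this notebook, the call would presumably be `load_turbine("../data/raw/uk/")`.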

# Notes on data
What is vane?

How is lost production correlated with other variables?

What is Potential power default PC?

What is Cascading potential power?
- A scenario in which the failure or underperformance of one wind turbine has a cascading effect on the power generation capacity of the whole wind farm. For example, a major fault or breakdown in one turbine could reduce generation and affect the performance or operation of other turbines in the farm.

What is Power factor?
- Power factor, often written as cos φ or PF, measures how efficiently electrical power is used in an AC (alternating current) circuit. It is the ratio of real (active) power to apparent power in the circuit.

Should gearbox speed be the same as rotor speed (RPM)?
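As a worked example of the power factor definition above: for sinusoidal voltage and current, with active power P and reactive power Q, the apparent power is S = sqrt(P² + Q²) and the power factor is P / S. This is a minimal sketch with made-up values. `power_factor` is a hypothetical helper, and the numbers are not taken from the Kelmarsh data.

```python
import math


def power_factor(active_kw, reactive_kvar):
    """Ratio of active to apparent power (hypothetical helper)."""
    apparent_kva = math.hypot(active_kw, reactive_kvar)
    if apparent_kva == 0:
        # No power flowing at all: the ratio is undefined
        return float("nan")
    return active_kw / apparent_kva


print(power_factor(800.0, 600.0))  # 0.8
```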


In [None]:
full_kelmarsh_df.head()

In [None]:
full_kelmarsh_df.isna().sum()

In [None]:
columns = full_kelmarsh_df.columns
columns

In [None]:
selected_columns = {
        "# Date and time": "timestamp",
        "Wind speed (m/s)": "wind_speed",
        "Wind direction (°)": "wind_direction",
        # "Nacelle position (°)": "nacelle_position",
        "Ambient temperature (converter) (°C)": "ambient_temperature",
        "Rotor speed (RPM)": "rotor_speed",
        "Power (kW)": "power",
    }
list(selected_columns.keys())

In [None]:
df = full_kelmarsh_df[list(selected_columns.keys())]
df = df.rename(columns=selected_columns).set_index("timestamp")
df.index = pd.to_datetime(df.index)
df
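Because the yearly files are concatenated and their date ranges meet at 1 January, it may be worth checking the index before slicing by date. This is a minimal sketch on a tiny made-up frame. `check_time_index` is a hypothetical helper, and whether the real data contains duplicate timestamps is not known here.

```python
import pandas as pd


def check_time_index(frame):
    """Report basic properties of a DatetimeIndex (hypothetical helper)."""
    idx = frame.index
    return {
        "is_sorted": bool(idx.is_monotonic_increasing),
        "n_duplicates": int(idx.duplicated().sum()),
        "start": idx.min(),
        "end": idx.max(),
    }


# Made-up rows with one repeated timestamp at the year boundary
demo = pd.DataFrame(
    {"power": [1.0, 2.0, 2.0, 3.0]},
    index=pd.to_datetime(
        ["2016-12-31 23:50", "2017-01-01 00:00", "2017-01-01 00:00", "2017-01-01 00:10"]
    ),
)
report = check_time_index(demo)
print(report["is_sorted"], report["n_duplicates"])  # True 1

# Keep the first row for each repeated timestamp
deduplicated = demo[~demo.index.duplicated(keep="first")]
```

The same check could be run as `check_time_index(df)` on the frame built above.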

In [None]:
df.loc["2020-02-01":"2020-02-11"].plot(grid=True, figsize=(15, 5))

In [None]:
# diff_7 = df[["wind_speed", "speed2"]].diff(7)["2019-03":"2019-05"]

In [None]:
period = slice("2016", "2021")
df_monthly = df.resample('M').mean()  # compute the mean for each month
rolling_average_12_months = df_monthly[period].rolling(window=12).mean()

fig, ax = plt.subplots(figsize=(8, 4))
df_monthly[period].plot(ax=ax, marker=".")
rolling_average_12_months.plot(ax=ax, grid=True, legend=False)
plt.show()
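To make the rolling step above concrete: `rolling(window=12).mean()` averages each month with the 11 months before it, so the first 11 values are NaN and the smoothed line starts a year later than the monthly one. This is a minimal sketch on synthetic monthly values, not real turbine data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly means: 1.0, 2.0, ..., 14.0 (not real turbine data)
months = pd.period_range("2016-01", periods=14, freq="M")
monthly = pd.Series(np.arange(1.0, 15.0), index=months)

rolling_12 = monthly.rolling(window=12).mean()

print(rolling_12.isna().sum())  # 11
print(rolling_12.iloc[11])      # 6.5, the mean of 1..12
print(rolling_12.iloc[13])      # 8.5, the mean of 3..14
```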