# Wind Yield Assessment

### How does an energy producer decide on a location for a new wind park?

After identifying potential available locations, meteorological masts are set up to measure atmospheric data over time. Each of these time series is then used to simulate the energy potential of the respective location.

Knowing each location's energy potential, and hence its expected profitability, an energy producer is then able to make an educated investment decision.

### Wind Power

Investment decisions are based on the expected profitability of a location. The expected profitability depends on the location's expected energy production, which in turn depends on the expected power production over time. The expected power production may be roughly estimated by the wind turbine power formula, where $\rho$ is the air density, $v$ the wind speed, $A$ the wind turbine's swept area, and $C_{p}$ the wind turbine's power coefficient:

$$
P = \frac{1}{2}\rho A v^{3} C_{p}
$$

$\rho$ is dependent on temperature, pressure, and relative humidity, and both $A$ and $C_{p}$ are dependent on the way the wind turbine is constructed.
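The formula can be tried out with a few lines of Python. The following is a minimal sketch, not the simulation used in the project: the function names are hypothetical, and the air density is approximated with the ideal gas law for dry air, so the humidity dependence mentioned above is deliberately ignored.

```python
import math

R_DRY_AIR = 287.05  # specific gas constant of dry air in J/(kg*K)


def dry_air_density(temperature_c, pressure_pa):
    """Approximate air density in kg/m^3 (assumption: dry air, ideal gas)."""
    return pressure_pa / (R_DRY_AIR * (temperature_c + 273.15))


def wind_power(rho, rotor_diameter_m, wind_speed, cp):
    """Evaluate P = 1/2 * rho * A * v^3 * C_p and return the power in watts."""
    swept_area = math.pi * (rotor_diameter_m / 2) ** 2
    return 0.5 * rho * swept_area * wind_speed ** 3 * cp


# Illustrative values only: 15 degrees C, sea-level pressure,
# a 100 m rotor and a power coefficient of 0.4.
rho = dry_air_density(15.0, 101325.0)
power_at_5 = wind_power(rho, 100.0, 5.0, 0.4)
power_at_10 = wind_power(rho, 100.0, 10.0, 0.4)
```

Because the wind speed enters with the third power, doubling it multiplies the estimated power by eight. This is why errors in the wind speed measurements weigh so heavily on the yield estimate.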

### Meteorological Masts

<img src='images/met_mast.jpg'>

The figure shows a meteorological mast (met mast). Because the wind power depends on the atmospheric variables temperature, pressure, relative humidity, and wind speed, met masts are used to collect data on those variables over a longer period of time. Additionally, the mast records the wind direction over time, as changes in wind direction may affect wind energy production. Commonly, a met mast measures wind speed and direction at multiple heights.

The masts are maintained by an external provider at irregular intervals. During the maintenance process, the service person manually logs the sensors' readings at that point in time. Unfortunately, this human intervention is a source of further defects.

### Anomalies in the Sensors

The sensors might degrade over time on their own and are exposed to the weather, e.g. they may freeze during periods below 0 °C.

Counter-intuitively, the human intervention itself introduces errors such as unit shifts (the unit changes at a certain point in time) or time shifts (the time zone changes). A wind vane sensor might also be mounted on the mast at an angle instead of facing North.

Overall, these are the problems that were faced:

#### Time Shifts

The timestamps might be shifted forwards or backwards, e.g. by a time zone change, starting at a <i>random</i> point in time.
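Once the start and the size of such a shift are known, undoing it is simple. The sketch below only shows that correction step. Finding the change point is the harder part and is not covered here, and the function name and example values are hypothetical.

```python
from datetime import datetime, timedelta


def undo_time_shift(timestamps, shift_start, shift):
    """Subtract `shift` from every timestamp at or after `shift_start`."""
    return [t - shift if t >= shift_start else t for t in timestamps]


# Hourly example: from the fourth reading on, the logger runs one hour ahead.
base = datetime(2020, 1, 1)
true_times = [base + timedelta(hours=i) for i in range(6)]
recorded = [t if i < 3 else t + timedelta(hours=1)
            for i, t in enumerate(true_times)]

corrected = undo_time_shift(recorded, recorded[3], timedelta(hours=1))
```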

#### Unit Shifts

The unit might be changed starting at a <i>random</i> point in time. An example of such a problem may be found here: <a href='case_studies/wind-yield-units.ipynb'>Unit Shifts Notebook</a>.
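To illustrate the idea, assume a temperature channel that switches from degrees Celsius to Kelvin at some point. The sketch below searches for the split that best separates the series into two constant levels and reports the jump between them. It is an illustrative approach under that assumption, not necessarily the method used in the linked notebook, and `find_level_shift` is a hypothetical name.

```python
from statistics import median


def find_level_shift(values, min_segment=3):
    """Return (index, jump) of the most plausible level shift.

    Every split point is scored by the summed absolute deviation of both
    segments from their own median; the split with the lowest score wins.
    """
    best_index, best_jump, best_cost = None, 0.0, float("inf")
    for i in range(min_segment, len(values) - min_segment + 1):
        left, right = values[:i], values[i:]
        m_left, m_right = median(left), median(right)
        cost = (sum(abs(v - m_left) for v in left)
                + sum(abs(v - m_right) for v in right))
        if cost < best_cost:
            best_index, best_jump, best_cost = i, m_right - m_left, cost
    return best_index, best_jump


# Made-up readings: the last four values are logged in Kelvin.
temps = [10.0, 11.0, 10.5, 9.5, 10.0, 283.15, 284.15, 283.65, 282.65]
index, jump = find_level_shift(temps)

# A jump close to 273.15 suggests a Celsius-to-Kelvin switch.
if index is not None and abs(jump - 273.15) < 2.0:
    temps_celsius = temps[:index] + [v - 273.15 for v in temps[index:]]
else:
    temps_celsius = list(temps)
```

The brute-force search is quadratic in the number of readings, which is acceptable for a sketch but too slow for multi-year sensor data without modification.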

#### Outliers

The readings might have outliers.
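One common way to flag outliers in a time series is to compare each reading with the median of its neighbours, scaled by the median absolute deviation (MAD). The following is a minimal sketch of that technique. The source does not state which detector the project uses, so the function name, window size, and threshold are assumptions.

```python
from statistics import median


def flag_outliers(values, window=5, threshold=3.0):
    """Return the indices of readings far away from their local median."""
    half = window // 2
    flagged = []
    for i, value in enumerate(values):
        neighbours = values[max(0, i - half): i + half + 1]
        local_median = median(neighbours)
        mad = median([abs(n - local_median) for n in neighbours])
        # 1.4826 scales the MAD to the standard deviation of normal data.
        scale = 1.4826 * mad
        if scale > 0 and abs(value - local_median) > threshold * scale:
            flagged.append(i)
    return flagged


# Made-up wind speeds in m/s with one obvious spike.
speeds = [5.0, 5.2, 4.9, 5.1, 25.0, 5.0, 5.3, 4.8, 5.1]
outlier_indices = flag_outliers(speeds)
```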

#### (Partial) Icing

Mechanical sensors like anemometers and wind vane sensors might be partially or fully frozen. An example of such a problem may be found here: <a href='case_studies/wind-yield-icing.ipynb'>Icing Notebook</a>.
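A fully frozen anemometer typically reports a nearly constant wind speed while the temperature is at or below freezing. The sketch below flags such runs. It is a simplified heuristic with assumed parameters (`window`, `tolerance`) and a hypothetical function name. It does not cover partial icing, and it is not necessarily the approach taken in the linked notebook.

```python
def flag_icing(wind_speed, temperature_c, window=3, tolerance=0.05):
    """Flag readings inside runs that look frozen.

    A run of `window` consecutive readings counts as frozen if every
    temperature is at or below 0 degrees C and the wind speed varies by
    no more than `tolerance` within the run.
    """
    flags = [False] * len(wind_speed)
    for start in range(len(wind_speed) - window + 1):
        end = start + window
        speeds = wind_speed[start:end]
        cold = all(t <= 0.0 for t in temperature_c[start:end])
        stuck = max(speeds) - min(speeds) <= tolerance
        if cold and stuck:
            for i in range(start, end):
                flags[i] = True
    return flags


# Made-up readings: the anemometer is stuck at 0 m/s while it is freezing.
wind = [6.1, 5.8, 0.0, 0.0, 0.0, 0.0, 4.9, 5.2]
temp = [1.5, 0.5, -1.0, -2.0, -2.5, -1.0, -0.5, 0.5]
icing_flags = flag_icing(wind, temp)
```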

#### Sensor Degradation

A sensor might degrade and hence fail over time.

#### Wind Vane Offsets

A wind vane sensor might not be attached to the met mast tower facing true North. An example of such a problem may be found here: <a href='case_studies/wind-yield-offsets.ipynb'>Offsets Notebook</a>.
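If a trusted reference direction is available, the offset can be estimated as the circular mean of the differences and then added back to the readings. The sketch below assumes such a reference exists (for example a second, correctly mounted vane). All names are hypothetical, and the linked notebook may take a different route.

```python
import math


def circular_mean_deg(angles):
    """Mean of angles in degrees that respects the 0/360 wrap-around."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


def estimate_offset(vane, reference):
    """Estimate the angle to add to `vane` so that it matches `reference`."""
    differences = [(r - v) % 360.0 for v, r in zip(vane, reference)]
    return circular_mean_deg(differences)


def apply_offset(vane, offset):
    """Rotate all readings by `offset` degrees and wrap them into [0, 360)."""
    return [(v + offset) % 360.0 for v in vane]


# Made-up example: the vane is mounted 30 degrees off.
reference = [350.0, 10.0, 90.0, 180.0, 270.0]
vane = [(r - 30.0) % 360.0 for r in reference]

offset = estimate_offset(vane, reference)
corrected = apply_offset(vane, offset)
```

Averaging via sine and cosine avoids the pitfall of an arithmetic mean, which would treat 350° and 10° as far apart although they differ by only 20°.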

### The Solution

The traditional approach to cleaning the sensor data involves two analysts who clean each location manually, which takes about one month per location. Unsurprisingly, there is a huge potential to leverage data science to automate the whole process.

The architecture of the proposed solution looks as follows:

<img src='images/pipeline_architecture.png'>

When triggered, the data cleaning algorithm loads the master data of each location and the sensor data into its Python environment to conduct the data cleaning procedure.

It first cleans independent channels like temperature, pressure, and relative humidity. Then it moves on to the wind direction and wind speed sensors, which depend on each other, and also takes the temperature into account.

The output of the data cleaning algorithm is twofold:

Firstly, it outputs the cleaned data, which may be used for further analysis. The output may either be written to a flat file or ingested into a database.

Secondly, it outputs all flagged data with descriptions of why that particular point got flagged, and which algorithm decided to flag it.
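The two outputs could be represented roughly as follows. This is a hypothetical sketch: the record fields, the names `Flag` and `split_outputs`, and the choice to drop flagged readings from the cleaned data are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Flag:
    """One flagged reading (hypothetical schema)."""
    timestamp: str
    channel: str
    flag_type: str    # e.g. "Outliers" or "Icing"
    algorithm: str    # which algorithm decided to flag the point
    description: str  # why the point got flagged


def split_outputs(readings, flags):
    """Return (cleaned readings, flag rows) for downstream use.

    Assumption: a flagged reading is removed from the cleaned output.
    """
    flagged_keys = {(f.timestamp, f.channel) for f in flags}
    cleaned = [r for r in readings
               if (r["timestamp"], r["channel"]) not in flagged_keys]
    flag_rows = [asdict(f) for f in flags]
    return cleaned, flag_rows


# Made-up example data.
readings = [
    {"timestamp": "2020-01-01T00:00", "channel": "wind_speed", "value": 5.1},
    {"timestamp": "2020-01-01T00:10", "channel": "wind_speed", "value": 25.0},
]
flags = [
    Flag("2020-01-01T00:10", "wind_speed", "Outliers",
         "rolling_median", "Reading deviates strongly from its neighbours."),
]
cleaned, flag_rows = split_outputs(readings, flags)
```

Both results are plain lists of dictionaries, so they could be written to a flat file or a database table without further conversion.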

### Data Cleaning Monitoring Tool

The flagged data, in combination with the cleaned data, is used as input for the data cleaning monitoring tool. A live prototype of the dashboard may be found here: <a href='http://wy-data-cleaning.herokuapp.com/'>http://wy-data-cleaning.herokuapp.com/</a>

The dashboard is a self-service tool for an analyst to check the performance of the data cleaning algorithm. It was developed by leveraging the <a href='https://dash.plotly.com/'>Plotly Dash</a> framework.

An analyst is able to switch between the different channels (top) and within each channel between individual masts or the location as a whole (left).

At the top of each tab, the analyst can inspect the time series of the chosen channel including all flagged data. When hovering over a particular point, the analyst can read a short description of why that point got flagged.

<img src='images/dashboard_ts.png'>

Furthermore, the analyst can check per mast and per channel how many points were flagged for each type of flag (e.g. Outliers). Additionally, the analyst may check how a channel is distributed over the day for each month to quickly assess whether the daily cycles make sense.

<img src='images/dashboard_daily.png'>

The wind speed and wind direction channels have additional context-specific plots. Below, for example, is the wind rose for a location before (left) and after (right) the data cleaning. In the wind speed tab, the analyst will also find, in addition to the plots shown above, the tower shadow plot that was presented in the <a href='case_studies/wind-yield-icing.ipynb'>Icing Case Study</a>.

<img src='images/dashboard_rose.png'>