# Simulated Dataset

## Real-World Phenomenon
- The real-world phenomenon I have selected for this project is wind.
- The simulated data will imitate wind-related variables that could be measured at Cork Airport.
- The simulated dataset will consist of 100 hourly data points selected at random from one year.

## Variables to Simulate


### 1. Wind Speed
- <b>Description</b>

Wind is caused by air moving from high to low pressure, usually due to changes in temperature. The speed at which the air moves is the wind speed. It is sometimes measured in nautical miles per hour (knots) but for the purposes of this project it will be considered in metres/second (m/s). According to Met Eireann's records, the 30-year average wind speed at Cork Airport is 10.5 knots. [1] This is equivalent to 5.4 m/s.
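The conversion above can be checked with a short sketch. The names `KNOT_IN_MS` and `knots_to_ms` are illustrative only; the one assumption is the definition of the international knot as 1852 metres per hour.

```python
# One international knot is defined as 1852 metres per hour
KNOT_IN_MS = 1852 / 3600  # approximately 0.51444 m/s

def knots_to_ms(speed_knots):
    """Convert a wind speed from knots to metres per second."""
    return speed_knots * KNOT_IN_MS

# 30-year average wind speed at Cork Airport quoted above
cork_mean_ms = knots_to_ms(10.5)
print(round(cork_mean_ms, 1))  # 5.4
```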

- <b>Likely distribution</b>

Wind speed measurements show that strong gale-force winds occur rarely while moderate winds are much more common. Studies have concluded that the Weibull distribution is the best probability distribution for approximating hourly or 10-minute wind speeds at a site. [2]
    
<img src = "./Images/Weibull.png">
<center>[3]</center>

The Weibull probability distribution is a two-parameter function that is widely used in statistical analysis. Its two parameters are scale and shape. The scale relates to the long-term mean wind speed at the site, and the shape factor can be calculated from the standard deviation of the dataset, if known.
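As a rough illustration of how the scale relates to the mean, the sketch below uses the standard Weibull result that the mean equals the scale multiplied by Γ(1 + 1/shape). The shape value of 2.0, the seed and the helper name `weibull_scale_from_mean` are assumptions made for this example and are not taken from the Met Eireann data.

```python
import math
import numpy as np

def weibull_scale_from_mean(mean_speed, shape):
    """Scale parameter of a Weibull distribution with the given mean and shape."""
    return mean_speed / math.gamma(1 + 1 / shape)

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability

shape = 2.0        # illustrative shape factor (assumption)
mean_speed = 5.4   # long-term mean wind speed in m/s

scale = weibull_scale_from_mean(mean_speed, shape)

# NumPy draws from a Weibull distribution with scale 1, so multiply by the scale
samples = scale * rng.weibull(shape, size=100_000)
sample_mean = samples.mean()
```

With this many samples, `sample_mean` should land close to the 5.4 m/s target, which gives a quick check that the scale was derived correctly.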

The statistical distribution of wind speeds differs from site to site. It can be influenced by local climate, the landscape and topography. These can give different versions of a Weibull distribution with different mean values and shapes.

When the shape factor of a Weibull distribution is exactly 2, it is called a Rayleigh distribution. It has been shown that for mean wind speeds above 4 m/s, the Rayleigh distribution is an acceptable approximation for wind speed. [4] A shape factor of 2 corresponds to a standard deviation of about 52% of the mean wind speed. [5] This distribution is commonly used by wind turbine manufacturers to estimate the power that a wind turbine would produce at a particular site.
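The 52% figure can be recovered from the standard moments of the Rayleigh distribution. The symbols here are introduced for this derivation only: $\sigma$ is the mode of the distribution, $\bar{U}$ the mean wind speed and $s$ the standard deviation.

$$
\bar{U} = \sigma\sqrt{\frac{\pi}{2}}, \qquad
s = \sigma\sqrt{\frac{4-\pi}{2}}, \qquad
\frac{s}{\bar{U}} = \sqrt{\frac{4-\pi}{\pi}} \approx 0.523
$$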

<img src = "./Images/Rayleigh.png">
<center>[6]</center>
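The scale parameter c of a Rayleigh wind distribution (a Weibull distribution with a shape factor of 2) can be calculated as 2/√π x long-term average wind speed. [7] NumPy's Rayleigh function is parameterised by a different scale, equal to c/√2, which works out as √(2/π) x long-term average wind speed.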

Python can produce an array of random numbers with a Rayleigh probability distribution using the numpy.random.rayleigh function, which accepts the scale factor and size of the output array as inputs.
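A minimal sketch of this call is shown below. It assumes the NumPy parameterisation, in which the scale is the mode of the distribution and the mean equals the scale multiplied by √(π/2). It uses the newer `default_rng` generator rather than the legacy `numpy.random.rayleigh` function; the seed and sample size are arbitrary.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=1)  # seed chosen arbitrarily for repeatability

mean_speed = 5.4  # long-term average wind speed in m/s

# NumPy's Rayleigh scale is the mode, and mean = scale * sqrt(pi / 2)
sigma = mean_speed * math.sqrt(2 / math.pi)

# Draw a large sample so the sample statistics settle near their theoretical values
speeds = rng.rayleigh(scale=sigma, size=100_000)

sample_mean = speeds.mean()
ratio = speeds.std() / sample_mean  # should be near 0.52 for a Rayleigh distribution
```

The sample mean should sit near 5.4 m/s and the ratio of standard deviation to mean near 0.52, in line with the figure quoted for a shape factor of 2.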


### 2. Wind Direction
#### Description
-
#### Likely distribution
- 


### 3. Atmospheric Pressure
#### Description
-
#### Likely distribution
- 


### 4. Season
- <b>Description</b>

Wind speed, direction and atmospheric pressure will all vary with the time of year.

To examine this, the fourth variable generated will be the season: spring, summer, autumn or winter.

- <b>Likely distribution</b>

The 100 data points are taken at random across one year. As a result, each data point is equally likely to fall in any of the four seasons. This is a discrete uniform probability distribution.

<img src = "./Images/Uniform.png">


## Relationships Between Variables
- 
- 
- 

## Simulation of Dataset

In [None]:
# Import pandas for data analysis
import pandas as pd

# Import numpy for random number generator
import numpy as np

# Import math for mathematical functions
import math

# Import matplotlib.pyplot for plotting
import matplotlib.pyplot as plt

# Make matplotlib render plots inline in the notebook
%matplotlib inline

In [None]:
# Import monthly average measurements from Met Eireann file
Long_Term_Averages = pd.read_csv("Data/mly3904.csv",header=[0],skiprows = 19)

# Add a column to the dataframe containing wind speed in m/s (converted from knots)
Long_Term_Averages['wdsp m/s'] = Long_Term_Averages['wdsp']*0.51444

# Create an empty dataframe which will be populated with the long term average values for each month
Monthly_Mean = pd.DataFrame()

# Populate dataframe with long term monthly averages
for i in range(1,13):
    Monthly_Mean.loc[i,'Mean_wdsp'] = np.mean(Long_Term_Averages.loc[Long_Term_Averages.loc[:,'month'] == i,'wdsp m/s'])
    Monthly_Mean.loc[i,'Mean_temp'] = np.mean(Long_Term_Averages.loc[Long_Term_Averages.loc[:,'month'] == i,'meant'])

In [None]:
# Create array of possible seasons
Season_Names = ('Spring','Summer','Autumn','Winter')

# Select seasons at random for the 100 data points
Seasons = np.random.choice(Season_Names, size=100)
# Create a dataframe from the Seasons array
Wind_Variables = pd.DataFrame(Seasons, columns=['Season'])

In [None]:
# Months belonging to each season (1 = January), matching the seasonal groupings used for the Met Eireann averages
Season_Months = {'Spring': [2, 3, 4], 'Summer': [5, 6, 7], 'Autumn': [8, 9, 10], 'Winter': [11, 12, 1]}

# Create a function to generate wind speed from a Rayleigh probability distribution based on the season of the data point
def Wind_Speed_Generator(Season):
    # Average wind speed calculated from data for the three months of the season from Met Eireann
    Mean_Wind_Speed = np.mean(Monthly_Mean.loc[Season_Months[Season],'Mean_wdsp'])
    # Calculate scale factor for Rayleigh distribution from seasonal wind speed average
    # numpy's Rayleigh scale is the mode of the distribution, and mean = scale*sqrt(pi/2)
    c_Factor = math.sqrt(2/math.pi)*Mean_Wind_Speed
    # Generate wind speed from Rayleigh distribution with seasonal scale factor
    Wind_Speed = np.random.rayleigh(c_Factor)
    return Wind_Speed

# Loop through each row of wind variables dataframe and generate a wind speed based on the season using the function above.
for i, row in Wind_Variables.iterrows():
    Wind_Variables.loc[i,'Wind_Speed'] = Wind_Speed_Generator(Wind_Variables.loc[i,'Season'])
    

## Resulting Dataset

In [None]:
# Code

## References
- [1] Met Eireann. 30 Year Averages.
  https://www.met.ie/climate-ireland/1981-2010/cork.html
- [2] Justus et al. Methods for Estimating Wind Speed Frequency Distributions.
  http://ecreee.wikischolars.columbia.edu/file/view/Justus+1977+-+Estimation+of+Wind+Power+Distributions.pdf
- [3] The Swiss Wind Power Data Website. Weibull Calculator.
  https://wind-data.ch/tools/weibull.php?lng=en
- [4] US Department of Energy. The Effect of Generalized Wind Characteristics on Annual Power Estimates from Wind Turbine Generators.
  https://www.osti.gov/servlets/purl/5197838
- [5] Wind Power Program. Wind statistics and the Weibull distribution.
  http://www.wind-power-program.com/wind_statistics.htm
- [6] Mohamed Hatim Ouahabi. Yearly comparison of data observed and predicted wind speed frequencies using Weibull and Rayleigh distributions at Lafarge cement plant.
  https://www.researchgate.net/figure/Yearly-comparison-of-data-observed-and-predicted-wind-speed-frequencies-using-Weibull-and_fig4_317070870
- [7] Tony Burton et al. Wind Energy Handbook.
  https://books.google.ie/books?id=4UYm893y-34C&pg=PA14&lpg=PA14&dq=%22annual+mean+wind+speed%22+%22scale+parameter+c%22+rayleigh&source=bl&ots=2Q5w2O-aev&sig=F03_ke478E9n0wwNQjCTHdsIY6k&hl=en&sa=X&ved=2ahUKEwiS_-TljpjfAhX_RhUIHWk5Ds8Q6AEwBXoECAQQAQ#v=onepage&q=%22annual%20mean%20wind%20speed%22%20%22scale%20parameter%20c%22%20rayleigh&f=false