# Simulated Dataset

## Real-World Phenomenon
- The real world phenomenon I have selected for this project is wind.
- The data simulated will imitate features relating to wind that could be measured at Cork Airport.
- The simulated dataset will consist of 100 hourly data points selected from 1 year.

## Variables to Simulate


### 1. Wind Speed
- <b>Description</b>

Wind is caused by air moving from high to low pressure, usually due to changes in temperature. The speed at which the air moves is the wind speed. It is sometimes measured in nautical miles per hour (knots) but for the purposes of this project it will be considered in metres/second (m/s). According to Met Eireann's records, the 30-year average wind speed at Cork Airport is 10.5 knots. [1] This is equivalent to 5.4 m/s.

- <b>Likely distribution</b>

When measuring wind speed, it can be observed that strong gale force winds occur rarely while moderate winds are much more common. Studies have concluded that the best probability density distribution for approximating hourly or 10 minute wind speeds at a site is the Weibull distribution.
    
<img src = "./Images/Weibull.png">

The Weibull probability distribution is a two-parameter function that is widely used in statistical analysis. Its two parameters are scale and shape. The scale relates to the long term mean wind speed at the site and the shape factor can be calculated from the standard deviation of the dataset if known.

The statistical distribution of wind speeds differs from site to site. It can be influenced by local climate, the landscape and topography. These can give different versions of a Weibull distribution with different mean values and shapes.

When the shape factor of a Weibull distribution is exactly 2, it is called a Rayleigh distribution. It has been shown that for mean wind speeds above 4 m/s, the Rayleigh distribution is an acceptable approximation for wind speed. This is equivalent to a standard deviation of 52% of the mean wind speed. This distribution is commonly used by wind turbine manufacturers to estimate the power that would be produced by a wind turbine at that particular site.

<img src = "./Images/Rayleigh.png">

The scale parameter for a Rayleigh wind distribution can be calculated as 2/π x long-term average wind speed.

Python can produce an array of random numbers with a Rayleigh probability distribution using the numpy.random.rayleigh function, which accepts the scale factor and size of the output array as inputs.


### 2. Wind Direction
#### Description
-
#### Likely distribution
- 


### 3. Atmospheric Pressure
#### Description
-
#### Likely distribution
- 


### 4. Season
#### Description
-
#### Likely distribution
- 


## Relationships Between Variables
- 
- 
- 

## Simulation of Dataset

In [1]:
# Import pandas for data analysis
import pandas as pd

# Import numpy for random number generator
import numpy as np

# Import math for mathematical functions
import math

# Import matplotlib.pyplot for plotting
import matplotlib.pyplot as plt

# Make matplotlib show interactive plots in the notebook
%matplotlib inline

In [2]:
# Create array of monthly average wind speeds in knots from Met Eireann
Months = ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')
Monthly_Avgs_Knots = (12.1,12.0,11.6,10.3,10.1,9.4,9.0,9.0,9.4,10.7,10.9,11.6)

#Create dataframe of monthly average wind speeds
Monthly_Avgs = pd.DataFrame([Monthly_Avgs_Knots], columns=Months)

# Calculate monthly average wind speeds in m/s
Monthly_Avgs.loc[1] = Monthly_Avgs.loc[0] * 0.51444

In [3]:
# Create array of possible seasons
Season_Names = ('Spring','Summer','Autumn','Winter')

# Select seasons at random for the 100 data points
Seasons = np.random.choice(Season_Names, size=100)
Wind_Variables = pd.DataFrame(Seasons, columns=['Season'])

In [6]:
def Wind_Speed_Generator(Season):
    if Season == 'Spring':
        Mean_Wind_Speed = np.mean(Monthly_Avgs.loc[1,'Feb':'Apr'])
        c_Factor = 2/(math.pi)*Mean_Wind_Speed
        Wind_Speed = np.random.rayleigh(c_Factor)
    if Season == 'Summer':
        Mean_Wind_Speed = np.mean(Monthly_Avgs.loc[1,'May':'Jul'])
        c_Factor = 2/(math.pi)*Mean_Wind_Speed
        Wind_Speed = np.random.rayleigh(c_Factor)  
    if Season == 'Autumn':
        Mean_Wind_Speed = np.mean(Monthly_Avgs.loc[1,'Aug':'Oct'])
        c_Factor = 2/(math.pi)*Mean_Wind_Speed
        Wind_Speed = np.random.rayleigh(c_Factor)
    if Season == 'Winter':
        Mean_Wind_Speed = np.mean(Monthly_Avgs.loc[1,['Nov','Dec','Jan']])
        c_Factor = 2/(math.pi)*Mean_Wind_Speed
        Wind_Speed = np.random.rayleigh(c_Factor)
    return Wind_Speed

for i, row in Wind_Variables.iterrows():
    Wind_Variables.loc[i,'Wind_Speed'] = Wind_Speed_Generator(Wind_Variables.loc[i,'Season'])

pd.DataFrame.describe(Wind_Variables)


Unnamed: 0,Wind_Speed
count,100.0
mean,4.359
std,2.283015
min,0.605959
25%,2.666032
50%,4.102866
75%,5.560004
max,12.103031


## Resulting Dataset

In [None]:
# Code

## References
- [1] Met Eireann. 30 Year Averages.
  https://www.met.ie/climate-ireland/1981-2010/cork.html
- 
- 
