# Simulated Dataset

## Real-World Phenomenon
- The real world phenomenon I have selected for this project is climatic conditions.
- The data simulated will imitate variables that could be measured at Cork Airport.
- The simulated dataset will consist of 100 hourly data points selected from 1 year.
- The simulated dataset will be based around monthly average values at Cork Airport provided from Met Eireann. These monthly values are available from January 1962 to October 2018. [1] This data is stored in the "Data" folder accompanying this notebook.

### Variables to Simulate


### 1. Wind Speed
- <b>Description</b>

Wind is caused by air moving from high to low pressure, usually due to changes in temperature. The speed at which the air moves is the wind speed. It is sometimes measured in nautical miles per hour (knots) but for the purposes of this project it will be considered in metres/second (m/s). According to Met Eireann's records, the long term average wind speed at Cork Airport is 10.4 knots. [1] This is equivalent to 5.35 m/s.

- <b>Likely distribution</b>

When measuring wind speed, it can be observed that strong gale force winds occur rarely while moderate winds are much more common. Studies have concluded that the best probability density distribution for approximating hourly or 10 minute wind speeds at a site is the Weibull distribution. [2]
    
<img src = "./Images/Weibull.png">
<center>[3]</center>

The Weibull probability distribution is a two-parameter function that is widely used in statistical analysis. Its two parameters are scale and shape. The scale relates to the long term mean wind speed at the site and the shape factor can be calculated from the standard deviation of the dataset if known.

The statistical distribution of wind speeds differs from site to site. It can be influenced by local climate, the landscape and topography. These can give different versions of a Weibull distribution with different mean values and shapes.

When the shape factor of a Weibull distribution is exactly 2, it is called a Rayleigh distribution. It has been shown that for mean wind speeds above 4 m/s, the Rayleigh distribution is an acceptable approximation for wind speed. [4] This is equivalent to a standard deviation of 52% of the mean wind speed.[5] This distribution is commonly used by wind turbine manufacturers to estimate the power that would be produced by a wind turbine at that particular site.

<img src = "./Images/Rayleigh.png">
<center>[6]</center>
The scale parameter for a Rayleigh wind distribution (c) can be calculated as 2/π x long-term average wind speed. [7]

Python can produce an array of random numbers with a Rayleigh probability distribution using the numpy.random.rayleigh function, which accepts the scale factor and size of the output array as inputs.


### 2. Air Temperature
- <b>Description</b>

Air temperature is the most commonly measured weather parameter, measured in degrees Celsius. It causes gradients in atmospheric pressure which causes wind to blow.
According to Met Eireann's records, the long term average temperature at Cork Airport is 9.65 degrees Celsius. [1]

- <b>Likely distribution</b>

It has been found that air temperature follows a normal or Gaussian distribution. [8] This is a bell shaped, symmetrical distribution. It takes parameters of mean, which is the line of symmetry of the plot, and standard distribution which determines the spread of the plot.

<img src = "./Images/Normal.png">
<center>[9]</center>
Python can produce an array of random numbers with a normal probability distribution using the numpy.random.normal function, which accepts the mean, standard deviation and size of the output array as inputs.


### 3. Atmospheric Pressure
#### Description
-
#### Likely distribution
- 


### 4. Season
- <b>Description</b>

Wind speed, direction and atmospheric pressure will all vary with the time of year.

To examine this, the 4th variable generated will be the season, spring, summer, autumn or winter.

- <b>Likely distribution</b>

The 100 data points are taken at random across one year. As a result, each data point is equally as likely to occur in any of the 4 seasons. This is a uniform probability distribution.

<img src = "./Images/Uniform.png">


## Relationships Between Variables
- 
- 
- 

## Simulation of Dataset

In [2]:
# Import pandas for data analysis
import pandas as pd

# Import numpy for random number generator
import numpy as np

# Import math for mathematical functions
import math

# Import seaborn for data visualisation
import seaborn as sns

# Import matplotlib.pyplot for plotting
import matplotlib.pyplot as plt

# Make matplotlib show interactive plots in the notebook
%matplotlib inline

In [3]:
# Import monthly average measurements from Met Eireann file
Monthly_Averages = pd.read_csv("Data/mly3904.csv",header=[0],skiprows = 19)

# Add a column to the dataframe containing wind speed in m/s (converted from knots)
Monthly_Averages['wdsp m/s'] = Monthly_Averages['wdsp']*0.51444

# Create an empty dataframe which will be populated with the long term average values for each month
Long_Term_Averages = pd.DataFrame()

# Populate dataframe with long term monthly averages
for i in range(1,13):
    Long_Term_Averages.loc[i,'Mean_wdsp'] = np.mean(Monthly_Averages.loc[Monthly_Averages.loc[:,'month'] == i,'wdsp m/s'])
    Long_Term_Averages.loc[i,'Mean_temp'] = np.mean(Monthly_Averages.loc[Monthly_Averages.loc[:,'month'] == i,'meant'])
    
print(Long_Term_Averages)

    Mean_wdsp  Mean_temp
1    6.165157   5.447368
2    6.057757   5.400000
3    5.880861   6.475439
4    5.358299   8.103509
5    5.170573  10.521053
6    4.720213  13.207018
7    4.497289  14.936842
8    4.563173  14.717544
9    4.867324  12.957895
10   5.397107  10.449123
11   5.577999   7.483929
12   6.009762   6.116071


In [4]:
# Create array of possible seasons
Season_Names = ('Spring','Summer','Autumn','Winter')

# Select seasons at random for the 100 data points
Seasons = np.random.choice(Season_Names, size=100)
# Create a dataframe from the Seasons array
Climate_Variables = pd.DataFrame(Seasons, columns=['Season'])

In [5]:
# Create a function to generate:
#   1. Wind speed from a Rayleigh probability distribution based on the season of the data point
#   2. Temperature from a normal probability distribution based on the season of the data point
def Variable_Generator(Season):
    
    if Season == 'Spring':
        # Average wind speed calculated from data for three months of spring from Met Eireann
        Mean_Wind_Speed = np.mean(Long_Term_Averages.loc[2:4,'Mean_wdsp'])
        # Calculate scale factor for Rayleigh distribution from seasonal wind speed average
        c_Factor = 2/(math.pi)*Mean_Wind_Speed
        # Generate wind speed from Rayleigh distribution with seasonal scale factor
        Wind_Speed = np.random.rayleigh(c_Factor)
        
        # Average temperature calculated from data for three months of spring from Met Eireann
        Mean_Temp = np.mean(Long_Term_Averages.loc[2:4,'Mean_temp'])
        # Standard deviation of distribution determined by season
        St_dist = 3
        # Generate wind speed from normal distribution with standard deviation determined by season
        Temperature = np.random.normal(loc=Mean_Temp,scale=St_dist)
        
        # Calculate coefficients for pressure equation        
        a = -9.5 * 10**(-7)
        b = -5.7 * 10**(-7) * Wind_Speed - 9.7 * 10**(-7) * Temperature + 0.001949
        c = -1 + 0.00585 * Wind_Speed + 0.000996 * Temperature + 1.13 * 10**(-7) * (Wind_Speed)**2 - 3.2 * 10**(-7) * (Temperature)**2 - 3.4 * 10**(-7) * Wind_Speed * Temperature
        d = b**2 - 4 * a * c
        # Calculate both results of pressure equation
        P1 = (-b + math.sqrt(d))/(2*a)
        P2 = (-b - math.sqrt(d))/(2*a)
        # Select appropriate pressure result determined by season
        Pressure = -b/(2*a)
        
    if Season == 'Summer':
        Mean_Wind_Speed = np.mean(Long_Term_Averages.loc[5:7,'Mean_wdsp'])
        c_Factor = 2/(math.pi)*Mean_Wind_Speed
        Wind_Speed = np.random.rayleigh(c_Factor) 

        Mean_Temp = np.mean(Long_Term_Averages.loc[5:7,'Mean_temp'])
        St_dist = 2
        Temperature = np.random.normal(loc=Mean_Temp,scale=St_dist) 
        
        a = -9.5 * 10**(-7)
        b = -5.7 * 10**(-7) * Wind_Speed - 9.7 * 10**(-7) * Temperature + 0.001949
        c = -1 + 0.00585 * Wind_Speed + 0.000996 * Temperature + 1.13 * 10**(-7) * (Wind_Speed)**2 - 3.2 * 10**(-7) * (Temperature)**2 - 3.4 * 10**(-7) * Wind_Speed * Temperature
        d = b**2 - 4 * a * c
        P1 = (-b + math.sqrt(d))/(2*a)
        P2 = (-b - math.sqrt(d))/(2*a)
        Pressure = np.maximum(P1,P2)
        
    if Season == 'Autumn':
        Mean_Wind_Speed = np.mean(Long_Term_Averages.loc[8:10,'Mean_wdsp'])
        c_Factor = 2/(math.pi)*Mean_Wind_Speed
        Wind_Speed = np.random.rayleigh(c_Factor)
        
        Mean_Temp = np.mean(Long_Term_Averages.loc[8:10,'Mean_temp'])
        St_dist = 3
        Temperature = np.random.normal(loc=Mean_Temp,scale=St_dist) 
        
        a = -9.5 * 10**(-7)
        b = -5.7 * 10**(-7) * Wind_Speed - 9.7 * 10**(-7) * Temperature + 0.001949
        c = -1 + 0.00585 * Wind_Speed + 0.000996 * Temperature + 1.13 * 10**(-7) * (Wind_Speed)**2 - 3.2 * 10**(-7) * (Temperature)**2 - 3.4 * 10**(-7) * Wind_Speed * Temperature
        d = b**2 - 4 * a * c
        Pressure = -b/(2*a)
        
    if Season == 'Winter':
        Mean_Wind_Speed = np.mean(Long_Term_Averages.loc[[1,11,12],'Mean_wdsp'])
        c_Factor = 2/(math.pi)*Mean_Wind_Speed
        Wind_Speed = np.random.rayleigh(c_Factor)
        
        Mean_Temp = np.mean(Long_Term_Averages.loc[[1,11,12],'Mean_temp'])
        St_dist = 2.75
        Temperature = np.random.normal(loc=Mean_Temp,scale=St_dist) 
        
        a = -9.5 * 10**(-7)
        b = -5.7 * 10**(-7) * Wind_Speed - 9.7 * 10**(-7) * Temperature + 0.001949
        c = -1 + 0.00585 * Wind_Speed + 0.000996 * Temperature + 1.13 * 10**(-7) * (Wind_Speed)**2 - 3.2 * 10**(-7) * (Temperature)**2 - 3.4 * 10**(-7) * Wind_Speed * Temperature
        d = b**2 - 4 * a * c
        P1 = (-b + math.sqrt(d))/(2*a)
        P2 = (-b - math.sqrt(d))/(2*a)
        Pressure = np.minimum(P1,P2)
        
    return Wind_Speed, Temperature, Pressure

# Loop through each row of wind variables dataframe and generate a wind speed and temperature based on the season using the function above.
for i, row in Climate_Variables.iterrows():
    Climate_Variables.loc[i,'WindSpeed'] = Variable_Generator(Climate_Variables.loc[i,'Season'])[0]
    Climate_Variables.loc[i,'Temperature'] = Variable_Generator(Climate_Variables.loc[i,'Season'])[1]
    Climate_Variables.loc[i,'Pressure'] = Variable_Generator(Climate_Variables.loc[i,'Season'])[2]


## Resulting Dataset

In [11]:
print(Climate_Variables)

pd.DataFrame.describe(Climate_Variables)

    Season  WindSpeed  Temperature     Pressure
0   Spring   1.409115     5.681322  1021.635515
1   Winter   1.536029     8.837625   859.848213
2   Winter   4.296514     7.309285   865.032700
3   Winter   3.500901     8.821318   834.299942
4   Summer   6.248373    11.638384  1152.830712
5   Summer   6.888260    10.827745  1148.695678
6   Summer   3.063619    15.966852  1210.676704
7   Winter   4.560496     3.930342   890.001681
8   Summer   2.139692    15.355359  1220.525962
9   Spring   9.055199     7.796936  1021.372304
10  Spring   4.908199     4.963969  1021.353332
11  Summer   5.254162    12.129138  1204.962782
12  Autumn   5.377458    11.889357  1017.073829
13  Spring   4.879425     5.225390  1022.089512
14  Summer   5.877831    14.440248  1197.724595
15  Summer   5.346207    12.447755  1161.142937
16  Autumn   7.298579    12.282949  1017.828509
17  Summer   0.814562    12.965097  1133.435760
18  Winter   7.220236     6.255692   855.564085
19  Summer   3.779393    14.862122  1098

Unnamed: 0,WindSpeed,Temperature,Pressure
count,100.0,100.0,100.0
mean,4.431706,10.180899,1028.024626
std,2.065392,4.520109,119.924005
min,0.814562,-1.013239,781.689669
25%,2.707049,6.54954,942.118183
50%,4.32171,10.827576,1019.70265
75%,5.758234,13.330314,1144.610956
max,9.867202,21.158199,1220.525962


# References
- [1] Met Eireann. Historical Data.
  https://www.met.ie/climate/available-data/historical-data
  
- [2] Justus et al. Methods for Estimating Wind Speed Frequency Distributions.
  http://ecreee.wikischolars.columbia.edu/file/view/Justus+1977+-+Estimation+of+Wind+Power+Distributions.pdf
- [3] The Swiss Wind Power Data Website. Weibull Calculator.
  https://wind-data.ch/tools/weibull.php?lng=en
- [4]US Department of Energy. The Effect of Generalized Wind Characteristics on Annual Power Estimates from Wind Turbine Generators.
  https://www.osti.gov/servlets/purl/5197838
- [5] Wind Power Program. Wind statistics and the Weibull distribution. 
  http://www.wind-power-program.com/wind_statistics.htm
- [6] Mohamed Hatim Ouahabi. Yearly comparison of data observed and predicted wind speed frequencies using Weibull and Rayleigh distributions at Lafarge cement plant.
 https://www.researchgate.net/figure/Yearly-comparison-of-data-observed-and-predicted-wind-speed-frequencies-using-Weibull-and_fig4_317070870
- [7] Tony Burton et al. Wind Energy Handbook. https://books.google.ie/books?id=4UYm893y-34C&pg=PA14&lpg=PA14&dq=%22annual+mean+wind+speed%22+%22scale+parameter+c%22+rayleigh&source=bl&ots=2Q5w2O-aev&sig=F03_ke478E9n0wwNQjCTHdsIY6k&hl=en&sa=X&ved=2ahUKEwiS_-TljpjfAhX_RhUIHWk5Ds8Q6AEwBXoECAQQAQ#v=onepage&q=%22annual%20mean%20wind%20speed%22%20%22scale%20parameter%20c%22%20rayleigh&f=false
- [8] International Journal of Advances in Science Engineering and Technology. Estimate the Mean Daily Temperature from Mean Monthly (using Gaussian Function)
  http://www.iraj.in/journal/journal_file/journal_pdf/6-277-147142876171-73.pdf
- [9] Giulia Antinori. Sensitivity Analysis and Uncertainty Quantification for a Coupled Secondary Air System Thermo-Mechanical Model of a Jet Engine Low Pressure Turbine Rotor.
  https://www.researchgate.net/figure/The-temperature-at-location-x-follows-a-normal-distribution-The-nominal-solution-which_fig4_269191283
- Journal of Applied Sciences. Statistical Analysis of the Relationship Between Wind Speed, Pressure and Temperature.
  https://scialert.net/fulltextmobile/?doi=jas.2011.2712.2722