## PREDICTING WIND TURBINE POWER OUTPUT FROM WIND SPEED VALUES 
## using LINEAR REGRESSION

#### Objectives
##### 1 Perform linear regression on the dataset.
##### 2 Explain what this shows.
##### 3 Predict wind turbine power output from wind speed values.

#### SOURCES

https://www.w3schools.com/python/python_ml_linear_regression.asp
    
https://realpython.com/linear-regression-in-python/#regression
    
https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html

https://www.oreilly.com/library/view/machine-learning-with/9781491989371/ch01.html
Transforming data from a list to an array.




### IMPORTING AND PLOTTING THE DATASET

In [None]:
# importing library and dataset.
import pandas as pd

dfpower = pd.read_csv("https://raw.githubusercontent.com/ianmcloughlin/2020A-machstat-project/master/dataset/powerproduction.csv")

# examining the data type of each column.
dfpower.dtypes


In [None]:
# looking at the head and tail of the dataset.
print(dfpower)

In [None]:
# import libraries
import matplotlib.pyplot as plt
import numpy as np

# x = speed, y= power
x = dfpower.iloc[:,[0]]
y = dfpower.iloc[:,[1]]

# plot each (x,y) dot on graph.
plt.style.use("ggplot")
plt.rcParams["figure.figsize"] = (18,10)
plt.title("Wind Turbine Power")
plt.xlabel("Speed")
plt.ylabel("Power")
plt.plot(x, y, '.')




### USING SKLEARN LINEARREGRESSION TO FIND THE RELATIONSHIP BETWEEN X & Y - Objective 1

In [None]:
# import library
from sklearn.linear_model import LinearRegression

# x = speed and y = power from the dataframe dfpower
x = dfpower.iloc[:,[0]]
y = dfpower.iloc[:,[1]]

# create the variable model as the instance of LinearRegression
model=LinearRegression()

# call .fit on the model, which calculates the optimal values of the
# weights b0 (the intercept of the y axis) and b1 (the slope of the estimated
# regression line)
model.fit(x,y)

# r_sq is the proportion of the variance for a dependent variable that may
# be explained by the influence of independent variable(s)
# calling .score returns the coefficient of determination (r_sq)
r_sq = model.score(x,y)

# the closer the coefficient of determination is to 1 the better the fit;
# a value of exactly 1 means the sum of squared residuals (SSR) is 0, a perfect fit.
print("coefficient of determination:", r_sq)

# .intercept_ is the intercept of the y axis, b0
print("intercept:", model.intercept_)

# .coef_ is the slope of the regression line, b1.
print("slope:", model.coef_)
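
To make the meaning of the coefficient of determination concrete, the sketch below recomputes it by hand as 1 - SS_res / SS_tot and compares it with `.score`. It does not use the powerproduction dataset: the `speed` and `power` arrays are synthetic stand-ins, and the slope, intercept and noise level used to generate them are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption, not the powerproduction dataset).
rng = np.random.default_rng(0)
speed = np.linspace(0, 25, 50).reshape(-1, 1)    # 2D, as scikit-learn expects
power = 5.0 * speed.ravel() - 14.0 + rng.normal(0, 4, size=50)

model = LinearRegression().fit(speed, power)
residuals = power - model.predict(speed)

ss_res = np.sum(residuals ** 2)                  # sum of squared residuals
ss_tot = np.sum((power - power.mean()) ** 2)     # total sum of squares
r_sq_manual = 1 - ss_res / ss_tot

print("manual R^2: ", r_sq_manual)
print("model.score:", model.score(speed, power))
```

The two printed values should agree, which shows that a smaller sum of squared residuals pushes the score towards 1.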

In [None]:
# we can now use the y intercept and the slope multiplied by each value
# of x in turn to predict y (y_pred)
y_pred = model.intercept_+ model.coef_ * x
print("predicted response:",y_pred)
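
The same predictions can also be obtained with the model's `.predict` method, which is convenient for speeds that are not in the dataset (Objective 3). The sketch below is a minimal illustration on synthetic stand-in data; the arrays `speed` and `power` and the values in `new_speeds` are assumptions, not values from the powerproduction dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption, not the powerproduction dataset).
rng = np.random.default_rng(1)
speed = np.linspace(0, 25, 50).reshape(-1, 1)
power = 5.0 * speed.ravel() - 14.0 + rng.normal(0, 4, size=50)

model = LinearRegression().fit(speed, power)

# Prediction by hand: intercept + slope * x
manual = model.intercept_ + model.coef_[0] * speed.ravel()

# Prediction with the fitted model
via_predict = model.predict(speed)
print("same result:", np.allclose(manual, via_predict))

# Predicting power for new, hypothetical wind speeds (2D input: one column)
new_speeds = np.array([[5.0], [12.5], [20.0]])
new_power = model.predict(new_speeds)
print("predicted power:", new_power)
```

Because the line is straight, it will also return negative power at low speeds and keep rising at high speeds, so predictions outside the middle of the range should be read with caution.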

In [None]:

# Plot outputs of the x and y values as red dots and the x and y_pred
# as a blue line.
plt.scatter(x, y,  color='red')
plt.plot(x, y_pred, color='blue', linewidth=3)

plt.title("Linear Regression on Wind Speed and Turbine Power")
plt.xlabel("Speed")
plt.ylabel("Power")

plt.show()




### USING POLYFIT TO FIND THE RELATIONSHIP BETWEEN X & Y - Objective 1

In [None]:
# Using polyfit to find the relationship between the x and y variables
import numpy as np

# polyfit expects x to be a 1D vector, so convert the speed column to an np.array called speed_data_array.
speed_data = dfpower["speed"]
print("Original Series:", speed_data)

speed_data_array = np.array(speed_data)
print("One-dimensional NumPy array: ", speed_data_array)

# this returns an array of coefficients, highest power first: the slope of the
# regression line, b1, followed by the intercept of the y axis, b0.
# These are the same figures as I produced using LinearRegression.
np.polyfit(speed_data_array,y,1)
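
The order of the values returned by `np.polyfit` is easy to mix up, so here is a small sketch on a made-up exact line (slope 2, intercept 1; these numbers are an assumption for illustration only). For degree 1 the slope comes first and the intercept second, and `np.polyval` evaluates the fitted line using the same ordering.

```python
import numpy as np

# Made-up points that lie exactly on y = 2x + 1 (illustrative assumption).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Degree-1 fit: coefficients come back highest power first.
slope, intercept = np.polyfit(x, y, 1)
print("slope:", slope, "intercept:", intercept)

# Evaluate the fitted line at a new x value.
y_at_10 = np.polyval([slope, intercept], 10.0)
print("value at x = 10:", y_at_10)
```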


In [None]:
# Continuing data comparison method from Linear Regression Lecture.
# As above, the coeffs are produced by using polyfit to find the connection between x and y.
# The coefficients are the missing values in the equation of the line 4.9x - 13.9 = y
coeffs = np.polyfit(speed_data_array,y,1)

# This plots the original data
plt.plot(speed_data_array,y,".", label="Data")

# This plots the best fit line.
plt.plot(speed_data_array, coeffs[0]*speed_data_array+coeffs[1], "-", label="Bestfit")

# Show the labels in a legend.
plt.legend()

# Again the plot is the same as the one produced using LinearRegression.

In [None]:
# Calculating the cost of the line
# Cost(m, c) = Σ (y_i - m*x_i - c)², where m is the slope and c is the constant (intercept)
# This is the same idea as polyfit and produces the same results.

# To use this method I have had to take the values of x/speed and y/power out of the dataframe and use
# them as an array
x_avg = np.mean(dfpower["speed"])
y_avg = np.mean(dfpower["power"])

# subtract the means from speed (x) and power (y)
x_zero = dfpower["speed"] - x_avg

y_zero = dfpower["power"] - y_avg


# to calculate the best m (slope)
#print(x_zero * y_zero)
#print(np.sum(x_zero * y_zero))
m = np.sum(x_zero * y_zero) / np.sum(x_zero * x_zero)

c = y_avg - m * x_avg

print("m is %8.6f and c is %6.6f." % (m,c))
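
As a check on the cost idea, the sketch below defines the cost function explicitly and confirms two things on synthetic stand-in data: the closed-form `m` and `c` match `np.polyfit`, and nudging either value away from the optimum increases the cost. The helper name `cost` and the generated `x` and `y` arrays are assumptions made for this illustration, not part of the original analysis.

```python
import numpy as np

def cost(x, y, m, c):
    """Sum of squared vertical distances between the points and the line y = m*x + c."""
    return np.sum((y - m * x - c) ** 2)

# Synthetic stand-in data (an assumption, not the powerproduction dataset).
rng = np.random.default_rng(2)
x = np.linspace(0, 25, 50)
y = 5.0 * x - 14.0 + rng.normal(0, 4, size=50)

# Closed-form least squares, as in the cell above.
x_zero = x - x.mean()
y_zero = y - y.mean()
m = np.sum(x_zero * y_zero) / np.sum(x_zero * x_zero)
c = y.mean() - m * x.mean()

best = cost(x, y, m, c)
print("cost at best fit:     ", best)
print("cost with m + 0.1:    ", cost(x, y, m + 0.1, c))
print("cost with c - 1.0:    ", cost(x, y, m, c - 1.0))
print("matches np.polyfit:   ", np.allclose([m, c], np.polyfit(x, y, 1)))
```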




### INTERPRETING THE DATA - Objective 2

Source: https://enerpower.ie/portfolio/wind-turbine-faq-ireland/
    
How strong does the wind have to blow for the wind turbines to work?

"Wind turbines typically start operating at wind speeds around Beaufort Force 3 
(which is around 3-5 metres per second (m/s), or 8-12 miles per hour (mph). 
Turbines reach maximum power output at Beaufort 5 (around 11-14 m/s or 25-30 mph). 
At very high wind speeds, i.e. Beaufort Storm Force 10 winds, (around 24 m/s or 55 mph) 
or greater the wind turbines shut down to prevent excessive wear and tear. 
Since winds of this strength occur only for a handful of hours per year, 
very little energy is lost in high wind periods."
 
The information in this FAQ, from the wind energy company Enerpower, shows us how to interpret
our plot.  The plotted data shows that no power is produced until the wind speed reaches about
8 mph, and that production peaks and levels off at 20-25 mph.  Wind speeds above 20 mph on our plot
appear to be the best for producing power; higher wind speeds do not produce more power.  As the
Enerpower FAQ indicates, a turbine will shut itself down in storm force winds.
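
The FAQ quotes each threshold in both m/s and mph, so a quick conversion helps when comparing those thresholds with the speed axis of the plot. The sketch below is a small arithmetic aid only; the helper name `ms_to_mph` and the labels are assumptions, and the units of the dataset's speed column are not stated in this notebook.

```python
# 1 mph is exactly 0.44704 m/s (definition of the international mile).
MPS_PER_MPH = 0.44704

def ms_to_mph(ms):
    """Convert a speed in metres per second to miles per hour."""
    return ms / MPS_PER_MPH

# Thresholds quoted in the Enerpower FAQ, in m/s.
thresholds = [
    ("start of operation (low)", 3),
    ("start of operation (high)", 5),
    ("maximum output (low)", 11),
    ("maximum output (high)", 14),
    ("shut-down", 24),
]

for label, ms in thresholds:
    print(f"{label}: {ms} m/s = {ms_to_mph(ms):.1f} mph")
```

The mph figures in the FAQ are rounded, so small differences from these converted values are expected.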