#### SOLAR POWER PLANT COORDINATES USED (LONGITUDE, LATITUDE): 142.110216, -19.461907
#### WIND POWER PLANT COORDINATES USED (LATITUDE, LONGITUDE): 53.556563, 8.598084

Data Used:
- Sunshine
- Cloud cover
- Temperature
- Wind Speed

Source(s) of Data:
- 7Timer API
- Solar Farm and Wind Farm Monthly Schedule CSV Files
- Annual Generation Data for Solar Farm and Wind Farm
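The 7Timer forecasts are requested per point, with longitude and latitude passed as separate query parameters. A minimal sketch, assuming the public `api.pl` endpoint and its `lon`, `lat`, `product` and `output` parameters; the helper `build_7timer_url` and the choice of the `civil` product are hypothetical and not taken from this notebook.

```python
from urllib.parse import urlencode

# Assumed 7Timer endpoint; confirm against the 7Timer documentation before use
SEVEN_TIMER_ENDPOINT = "http://www.7timer.info/bin/api.pl"


def build_7timer_url(lon: float, lat: float, product: str = "civil", output: str = "json") -> str:
    """Return a 7Timer forecast URL for one point (hypothetical helper)."""
    # Longitude must lie in [-180, 180] and latitude in [-90, 90]
    if not -180 <= lon <= 180 or not -90 <= lat <= 90:
        raise ValueError("coordinates out of range")
    query = urlencode({"lon": lon, "lat": lat, "product": product, "output": output})
    return f"{SEVEN_TIMER_ENDPOINT}?{query}"


# Solar plant: longitude 142.110216, latitude -19.461907
solar_url = build_7timer_url(lon=142.110216, lat=-19.461907)
print(solar_url)
```

Passing the URL to `urllib.request.urlopen` and decoding the body with `json.loads` would then return the forecast; which fields come back depends on the product, so check them against the 7Timer documentation.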


ML MODEL 1 (SOLAR POWER PLANT)
Chosen method: Random Forest Regression
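Random forest regression fits many decision trees, each on a bootstrap sample of the training rows, and predicts by averaging the trees' outputs. A minimal sketch on synthetic data (not the solar dataset) that checks this for scikit-learn's `RandomForestRegressor`, whose fitted trees are exposed as `estimators_`:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data standing in for the weather features (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 30, size=(200, 3))
y_demo = 0.33 * X_demo[:, 0] - 0.05 * X_demo[:, 1] + rng.normal(0, 0.1, size=200)

demo_forest = RandomForestRegressor(n_estimators=50, random_state=0)
demo_forest.fit(X_demo, y_demo)

# Each fitted tree predicts on its own; the forest's prediction is their mean
per_tree = np.stack([tree.predict(X_demo[:5]) for tree in demo_forest.estimators_])
manual_average = per_tree.mean(axis=0)

print(np.allclose(manual_average, demo_forest.predict(X_demo[:5])))  # True
```

Averaging many trees trained on different bootstrap samples is what makes the forest less prone to overfitting than a single deep tree.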

In [203]:
#import libraries

import matplotlib.pyplot as plt
import pandas as pd
import datetime
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

from sklearn.pipeline import Pipeline

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

In [205]:
# Load the solar generation data for analysis
solar_data = pd.read_csv('solar_generation_data.csv')

In [206]:
#inspecting the data
solar_data.head()

   Month  Day Temp Hi Temp Low  Solar  Cloud Cover Percentage  Rainfall in mm  Power Generated in MW
0    Jan    1    109°      85°   30.0                       9             0.0                   9.93
1    Jan    2    106°      71°   30.1                       9             0.0                   9.97
2    Jan    3    106°      81°   29.5                       9             0.0                   9.77
3    Jan    4    102°      83°   13.0                       4             0.0                   4.30
4    Jan    5    105°      80°   30.1                       9             0.0                   9.97


In [207]:
solar_data.isnull().sum(axis=0) 

Month                      0
Day                        0
Temp Hi                    0
Temp Low                   0
Solar                      0
Cloud Cover Percentage     0
Rainfall in mm            53
Power Generated in MW      0
dtype: int64

The Rainfall column has 53 missing values.

A quick Google search on the impact of rainfall on solar power output returns: "While the rain itself will have no impact on the panels, the rain clouds will likely lower your production. However, the occasional rainstorm could actually be good for your solar system's production, because it's a no-fuss, safe way to clean your panels."

The main weather conditions that affect solar power output are solar irradiation, cloud cover and temperature. Since the quote suggests that rain matters mainly through the clouds that accompany it, which the cloud cover column already reflects, we proceed with the ML model but exclude the rainfall column from the predictors. So that later steps do not run into NaN values, the missing rainfall rows are filled with 0; because the column is dropped before training, the fill value does not affect the model.
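The cell below fills the gaps with 0. If rainfall were kept as a feature, the column median is a common alternative fill value, since it is less sensitive to a few heavy-rain days than the mean. A minimal sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

# Toy rainfall readings with gaps (illustrative values, not from the dataset)
rainfall = pd.Series([0.0, np.nan, 2.5, 0.0, np.nan, 10.0], name="Rainfall in mm")

# Series.median() skips NaN by default, so this is the median of [0.0, 2.5, 0.0, 10.0]
median_fill = rainfall.fillna(rainfall.median())
zero_fill = rainfall.fillna(0)

print(rainfall.median())           # 1.25
print(median_fill.isnull().sum())  # 0
```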

In [208]:
solar_data['Rainfall in mm'] = solar_data['Rainfall in mm'].fillna(0)

In [209]:
solar_data.isnull().sum(axis=0) 

Month                     0
Day                       0
Temp Hi                   0
Temp Low                  0
Solar                     0
Cloud Cover Percentage    0
Rainfall in mm            0
Power Generated in MW     0
dtype: int64

In [210]:
solar_data.shape

(365, 8)

In [211]:
solar_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 8 columns):
Month                     365 non-null object
Day                       365 non-null int64
Temp Hi                   365 non-null object
Temp Low                  365 non-null object
Solar                     365 non-null float64
Cloud Cover Percentage    365 non-null int64
Rainfall in mm            365 non-null float64
Power Generated in MW     365 non-null float64
dtypes: float64(3), int64(2), object(3)
memory usage: 22.9+ KB


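From the data types, `Temp Hi` and `Temp Low` are stored as `object` (strings) because each value carries a degree symbol (e.g. `109°`). They need to be converted to floats before the model can use them.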
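The conversion amounts to stripping the degree sign (U+00B0) and parsing the rest as a number. A minimal sketch on strings copied from the `head()` output above; `pd.to_numeric(..., downcast="float")` returns `float32`, which is why `info()` reports `float32` for these columns after conversion.

```python
import pandas as pd

# Temperature strings as they appear in the CSV (values from the first rows shown above)
temp_hi = pd.Series(["109°", "106°", "106°", "102°", "105°"], name="Temp Hi")

# Remove the degree sign literally (no regex needed), then parse as numbers
cleaned = temp_hi.str.replace("\u00b0", "", regex=False)
as_float = pd.to_numeric(cleaned, downcast="float")

print(as_float.dtype)     # float32
print(as_float.tolist())  # [109.0, 106.0, 106.0, 102.0, 105.0]
```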

In [215]:
# Strip the degree symbol (U+00B0) from the temperature strings and convert them to floats
# (downcast="float" stores the result as float32)
solar_data['Temp Hi'] = solar_data['Temp Hi'].replace('\u00b0', '', regex=True)
solar_data['Temp Hi'] = pd.to_numeric(solar_data['Temp Hi'], downcast="float")

solar_data['Temp Low'] = solar_data['Temp Low'].replace('\u00b0', '', regex=True)
solar_data['Temp Low'] = pd.to_numeric(solar_data['Temp Low'], downcast="float")

In [216]:
solar_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 8 columns):
Month                     365 non-null object
Day                       365 non-null int64
Temp Hi                   365 non-null float32
Temp Low                  365 non-null float32
Solar                     365 non-null float64
Cloud Cover Percentage    365 non-null int64
Rainfall in mm            365 non-null float64
Power Generated in MW     365 non-null float64
dtypes: float32(2), float64(3), int64(2), object(1)
memory usage: 20.0+ KB


In [228]:
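# Build the feature matrix: drop the date columns, the target and the excluded Rainfall column,
# leaving Temp Hi, Temp Low, Solar and Cloud Cover Percentage
dataset = solar_data.drop(['Month ', 'Day', 'Power Generated in MW', 'Rainfall in mm'], axis=1)
X = dataset.values
y = solar_data['Power Generated in MW'].values

# Split the data: 70% training, 30% test (fixed random_state for a reproducible split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale the features; the scaler is fitted on the training set only to avoid leaking test information
# (tree-based models do not need scaling, but it does no harm)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Create the random forest regressor (n_jobs=-1 uses all CPU cores)
forest_model = RandomForestRegressor(n_jobs=-1)

# Fit the model on the training data
forest_model.fit(X_train, y_train)

# Predict power output for the test set
y_predicted = forest_model.predict(X_test)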
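`Pipeline` is imported at the top of the notebook but not used in this cell. A minimal sketch of how the scaling and regression steps could be chained so that the scaler is always fitted on the training split only; it runs on synthetic data rather than the solar CSV, and the step names `"scale"` and `"forest"` and all `*_demo` / `Xd_*` variable names are illustrative (chosen so they do not overwrite the notebook's own variables).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for [Temp Hi, Temp Low, Solar, Cloud Cover Percentage] (made-up values)
rng = np.random.default_rng(42)
n_days = 365
X_demo = np.column_stack([
    rng.uniform(80, 110, n_days),   # Temp Hi
    rng.uniform(60, 90, n_days),    # Temp Low
    rng.uniform(5, 31, n_days),     # Solar
    rng.integers(0, 10, n_days),    # Cloud Cover Percentage
])
# Illustrative target that depends mainly on the Solar column, plus a little noise
y_demo = 0.33 * X_demo[:, 2] + rng.normal(0, 0.05, n_days)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# The scaler is fitted on the training split inside fit(); predict() reuses that scaling
demo_model = Pipeline([
    ("scale", StandardScaler()),
    ("forest", RandomForestRegressor(random_state=42)),
])
demo_model.fit(Xd_train, yd_train)
demo_pred = demo_model.predict(Xd_test)

print(demo_pred.shape)  # (110,)
print(r2_score(yd_test, demo_pred))
```

Because tree splits are thresholds on individual features, standardisation does not change what a random forest can learn, so the `"scale"` step could be dropped without affecting the model's quality.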

In [229]:
# Evaluate the random forest with the coefficient of determination (R²) on the test set
from sklearn.metrics import r2_score
score = r2_score(y_test, y_predicted)
score

0.9991094319242667
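R² is not a classification accuracy: it compares the model's squared error with that of always predicting the mean of the test targets, R² = 1 - SS_res / SS_tot. A score of 0.9991 therefore means the residual sum of squares is under 0.1% of the total sum of squares in the test set. The sketch below recomputes the score by hand and checks it against `r2_score`; the true values are taken from the `Power Generated in MW` column of the `head()` output, while the predictions are made up.

```python
import numpy as np
from sklearn.metrics import r2_score

# True values from the head() output; predictions are made up for illustration
y_true = np.array([9.93, 9.97, 9.77, 4.30, 9.97])
y_pred = np.array([9.90, 9.95, 9.80, 4.40, 9.90])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(r2_manual, r2_score(y_true, y_pred)))  # True
```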

In [230]:
y_predicted.size

110
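The 110 predictions follow from the split settings: with a fractional `test_size`, scikit-learn rounds the test-set size up, so 0.3 × 365 = 109.5 becomes 110 test rows, and the remaining 255 rows are used for training. A quick check on a stand-in array of the same length:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 365  # solar_data.shape[0]
print(math.ceil(0.3 * n_rows))  # 110

# Splitting an index array of the same length gives the same sizes
train_idx, test_idx = train_test_split(np.arange(n_rows), test_size=0.3, random_state=42)
print(len(train_idx), len(test_idx))  # 255 110
```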

In [231]:
y_predicted

array([5.967, 3.754, 8.806, 9.646, 8.727, 5.742, 8.504, 5.406, 5.872,
       6.509, 9.927, 9.533, 9.467, 7.06 , 6.181, 7.683, 9.915, 8.504,
       9.229, 7.48 , 8.962, 5.79 , 8.648, 7.654, 7.531, 5.967, 6.57 ,
       9.111, 9.997, 8.54 , 6.497, 7.117, 9.446, 9.797, 5.967, 6.761,
       8.56 , 8.316, 9.97 , 9.791, 7.302, 9.794, 2.434, 9.102, 7.108,
       7.024, 7.777, 8.236, 8.695, 9.446, 7.   , 9.888, 8.864, 8.049,
       5.76 , 9.97 , 7.792, 4.185, 9.888, 8.464, 7.1  , 9.464, 8.418,
       9.464, 7.316, 7.435, 6.702, 6.553, 6.09 , 5.052, 7.768, 5.931,
       8.841, 6.307, 9.997, 6.1  , 8.56 , 5.38 , 9.707, 9.622, 8.66 ,
       8.319, 9.869, 6.872, 6.553, 9.651, 9.64 , 7.753, 2.694, 6.506,
       9.785, 8.909, 5.462, 7.975, 6.315, 6.649, 6.088, 9.317, 5.964,
       6.226, 6.051, 9.785, 9.915, 8.233, 6.084, 5.825, 6.849, 8.73 ,
       6.279, 9.758])

In [None]:
#creation of pipeline to store and add predicted data to the dataframe
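This final cell is left empty. Because the training cell splits `.values` arrays, the link between each prediction and its original row is lost. One way to keep it, sketched on a toy frame under the assumption that the split is done on DataFrames instead so that row labels survive; the column name `Predicted Power in MW` and all `*_demo` / `Xd_*` names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Toy frame with a subset of the notebook's columns (made-up values)
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "Solar": rng.uniform(5, 31, 40),
    "Cloud Cover Percentage": rng.integers(0, 10, 40),
})
df_demo["Power Generated in MW"] = 0.33 * df_demo["Solar"]

features = ["Solar", "Cloud Cover Percentage"]
# Splitting DataFrames/Series (not .values arrays) keeps the original row labels
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    df_demo[features], df_demo["Power Generated in MW"], test_size=0.3, random_state=42
)

demo_model = RandomForestRegressor(random_state=42).fit(Xd_train, yd_train)

# Write the predictions back to the matching rows; training rows stay NaN
df_demo["Predicted Power in MW"] = np.nan
df_demo.loc[Xd_test.index, "Predicted Power in MW"] = demo_model.predict(Xd_test)

print(df_demo.loc[Xd_test.index].head())
```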