#### SOLAR POWER PLANT COORDINATES USED: 142.110216, -19.461907
#### WIND POWER PLANT COORDINATES USED: 53.556563, 8.598084

Data Used:
- Sunshine
- Cloud cover
- Temperature
- Wind Speed

Source(s) of Data: 
- 7Timer API.
- Solar Farm and Wind Farm Monthly Schedule CSV Files
- Annual Generation Data for Solar Farm and Wind Farm


ML MODEL 1 (SOLAR POWER PLANT)
Chosen method: Random Forest Regression

In [2]:
#import libraries

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

In [None]:
#loading the data for analysis
#dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
solar_data = pd.read_csv('solar_generation_data.csv')

In [None]:
#inspecting the data
solar_data.head()

In [None]:
solar_data.isnull().sum(axis=0) 

The `Rainfall in mm` column has 53 missing values.

A quick Google search on the impact of rainfall on solar power output turns up the following: "While the rain itself will have no impact on the panels, the rain clouds will likely lower your production. However, the occasional rainstorm could actually be good for your solar system's production, because it's a no-fuss, safe way to clean your panels."

The main weather conditions that affect solar power output are solar irradiation, cloud cover and temperature. We therefore proceed with the ML model but exclude the rainfall column from the prediction. The empty rainfall rows are filled with 0 so that no missing values remain in the dataframe.

In [None]:
solar_data['Rainfall in mm'] = solar_data['Rainfall in mm'].fillna(0)
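
Two common ways to fill a column like this are a constant 0 and the column median. The following is a minimal sketch on a made-up frame (the `toy` dataframe and its values are assumptions, not the real rainfall data) showing how the two choices differ:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the real data comes from solar_generation_data.csv.
toy = pd.DataFrame({"Rainfall in mm": [0.0, 2.0, np.nan, 10.0, np.nan]})

# Option A: treat a missing reading as "no rain recorded".
filled_zero = toy["Rainfall in mm"].fillna(0)

# Option B: fill with the median of the observed values (NaNs are skipped).
median_rain = toy["Rainfall in mm"].median()
filled_median = toy["Rainfall in mm"].fillna(median_rain)

print(filled_zero.tolist())
print(filled_median.tolist())
```

Since the rainfall column is dropped before training, either choice leaves the model inputs unchanged; it only matters if the column is reused later.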

In [None]:
solar_data.isnull().sum(axis=0) 

In [None]:
solar_data.shape

In [None]:
solar_data.info()

The data types show that the temperature columns are stored as text (the values carry a degree sign), so they need to be converted to floats before they can be used in the model.

In [None]:
#solar_data['Temp Hi'] = pd.to_numeric(solar_data['Temp Hi'], errors='coerce')
#solar_data['Temp Low'] = pd.to_numeric(solar_data['Temp Low'], errors='coerce')

In [None]:
solar_data.info()

In [None]:
solar_data.isnull().sum(axis=0) 

In [None]:
#strip the degree sign (\u00b0) from the temperature strings, then convert to float
solar_data['Temp Hi'] = solar_data['Temp Hi'].replace('\u00b0','', regex=True)
solar_data['Temp Hi'] = pd.to_numeric(solar_data['Temp Hi'], downcast="float")

solar_data['Temp Low'] = solar_data['Temp Low'].replace('\u00b0','', regex=True)
solar_data['Temp Low'] = pd.to_numeric(solar_data['Temp Low'], downcast="float")

#solar_data['Temp Lo'] = pd.to_numeric(solar_data['Temp Lo'])
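
If some temperature entries might not be clean numbers once the degree sign is removed, a more defensive variant can be sketched as below. The sample values, including the `"n/a"` entry, are made up for illustration; `errors="coerce"` turns anything unparseable into NaN instead of raising.

```python
import pandas as pd

# Hypothetical sample; the real columns are 'Temp Hi' and 'Temp Low'.
toy = pd.DataFrame({
    "Temp Hi": ["31\u00b0", "28\u00b0", "n/a"],
    "Temp Low": ["19\u00b0", "17\u00b0", "15\u00b0"],
})

for col in ["Temp Hi", "Temp Low"]:
    # Literal (non-regex) removal of the degree sign.
    stripped = toy[col].astype(str).str.replace("\u00b0", "", regex=False)
    # Unparseable entries become NaN rather than raising an error.
    toy[col] = pd.to_numeric(stripped, errors="coerce")

print(toy)
```

Any NaN values introduced this way would show up in a follow-up `isnull().sum()` check and would need handling before training.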

In [None]:
solar_data.info()

In [None]:
#training and test sets

# Values of attributes

dataset = solar_data.drop(['Month ', 'Day', 'Power Generated in MW', 'Rainfall in mm'], axis=1)
X = dataset.values
y = solar_data['Power Generated in MW'].values

#data splitting

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

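#data transformation (scaling)
#note: tree-based models such as random forests do not need feature scaling; the step is harmless here

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train) # fit on the training split only, so no test-set statistics leak in
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)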

#creation of regressor model

forest_model = RandomForestRegressor(n_jobs=-1)

#fitting model
forest_model.fit(X_train, y_train) # fit model

#predicting
y_predicted = forest_model.predict(X_test)
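
The scale, fit and predict steps can also be bundled into a single scikit-learn `Pipeline`, so the scaler is always fitted on the training split only. This is a minimal sketch on synthetic data; `X_demo`, `y_demo` and the step names `"scaler"` and `"forest"` are assumptions for illustration, not part of the original analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the weather features and power output (not real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Scaling and regression as one estimator; fit() scales using training data only.
model = Pipeline([
    ("scaler", StandardScaler()),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=42)),
])
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(pred.shape)
```

Fixing `random_state` on the regressor makes repeated runs give the same predictions, which helps when comparing scores between runs.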

In [None]:
#goodness of fit of the random forest regression (R^2, the coefficient of determination)
from sklearn.metrics import r2_score
score = r2_score(y_test, y_predicted)
score
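
R^2 alone does not say how large the errors are in megawatts. As a hedged illustration, the sketch below computes R^2 together with the mean absolute error and root mean squared error on made-up numbers (`y_true` and `y_pred` are hypothetical, not results from this notebook):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up values purely for illustration.
y_true = np.array([10.0, 12.0, 14.0, 16.0])
y_pred = np.array([11.0, 12.0, 13.0, 18.0])

r2 = r2_score(y_true, y_pred)                        # share of variance explained
mae = mean_absolute_error(y_true, y_pred)            # average absolute error, in target units
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # penalises large errors more heavily

print(r2, mae, rmse)
```

The same three calls could be applied to `y_test` and `y_predicted` from the cell above.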

In [None]:
y_predicted.size

In [None]:
y_predicted

In [None]:
#creation of pipeline to store and add predicted data to the dataframe
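
One possible way to attach predictions to the dataframe is to split DataFrames rather than `.values`, so the test rows keep their original index. The sketch below is only an assumption about what this step could look like; the miniature dataset, the `Cloud Cover` feature name and the `Predicted MW` column are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical miniature dataset; feature names and values are made up.
df = pd.DataFrame({
    "Cloud Cover": [10, 80, 30, 60, 20, 90, 40, 50, 70, 15],
    "Temp Hi": [30, 22, 28, 24, 29, 20, 27, 26, 23, 31],
    "Power Generated in MW": [9.0, 2.5, 7.0, 4.0, 8.0, 1.5, 6.0, 5.0, 3.0, 8.5],
})
features = df.drop(columns=["Power Generated in MW"])
target = df["Power Generated in MW"]

# Splitting DataFrames (not .values) preserves the row index for alignment.
X_tr, X_te, y_tr, y_te = train_test_split(
    features, target, test_size=0.3, random_state=42
)

model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# Write predictions back by index; rows used for training stay NaN.
df["Predicted MW"] = pd.Series(model.predict(X_te), index=X_te.index)
print(df)
```

Only the held-out rows receive a prediction here, which keeps training-set predictions from being mistaken for out-of-sample results.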

ML MODEL 2 (WIND POWER PLANT)
Chosen method: Random Forest Regression

In [None]:
wind_data = pd.read_csv('wind_generation_data.csv')

In [None]:
#inspecting the data
wind_data.head()

In [None]:
wind_data.isnull().sum(axis=0)