### Linear regression and logistic regression
#### Can we predict power generation for the next couple of days? This would allow better grid management
While analyzing the data, do we see any patterns in power generation?

In [11]:
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn import utils

#### Data

In [7]:
train1 = pd.read_csv("ML2020-p26-main/Data/Train_Plant1.csv")
test1 = pd.read_csv("ML2020-p26-main/Data/Test_Plant1.csv")

train2 = pd.read_csv("ML2020-p26-main/Data/Train_Plant2.csv")
test2 = pd.read_csv("ML2020-p26-main/Data/Test_Plant2.csv")

In [8]:
def create_features(in_data):
    in_data['DATE_TIME'] = pd.to_datetime(in_data['DATE_TIME'])
    in_data["DATE"] = in_data['DATE_TIME'].dt.date
    in_data["HOUR"] = in_data['DATE_TIME'].dt.hour
    in_data["MINUTE"] = in_data['DATE_TIME'].dt.minute
    in_data["DAY_OF_YEAR"] = in_data['DATE_TIME'].dt.dayofyear
    in_data["DAY_OF_WEEK"] = in_data['DATE_TIME'].dt.dayofweek
    in_data["MONTH"] = in_data['DATE_TIME'].dt.month
    in_data["DAY_OF_MONTH"] = in_data['DATE_TIME'].dt.day
    
create_features(train1)
create_features(test1)
    
create_features(train2)
create_features(test2)
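
To make the derived columns concrete, here is a minimal sketch that applies the same `.dt` accessors as `create_features` to a tiny frame. The two timestamps and the name `toy` are made up for illustration and are not taken from the plant data.

```python
import pandas as pd

# Hypothetical two-row frame; the real data comes from the plant CSV files.
toy = pd.DataFrame({"DATE_TIME": ["2020-05-15 06:15:00", "2020-06-01 13:45:00"]})
toy["DATE_TIME"] = pd.to_datetime(toy["DATE_TIME"])

# Same accessors as in create_features
toy["HOUR"] = toy["DATE_TIME"].dt.hour
toy["MINUTE"] = toy["DATE_TIME"].dt.minute
toy["DAY_OF_YEAR"] = toy["DATE_TIME"].dt.dayofyear
toy["DAY_OF_WEEK"] = toy["DATE_TIME"].dt.dayofweek  # Monday = 0, Sunday = 6
toy["MONTH"] = toy["DATE_TIME"].dt.month
toy["DAY_OF_MONTH"] = toy["DATE_TIME"].dt.day

print(toy.drop(columns="DATE_TIME").to_dict("records"))
```

Note that `create_features` modifies the frame it receives in place and returns nothing, which is why the calls above do not assign a result.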

#### Linear regression

In [16]:
from sklearn.linear_model import LinearRegression
from sklearn import metrics

In [34]:
train = train2
test = test2

train_features = ['HOUR','MINUTE','DAY_OF_YEAR','DAY_OF_WEEK','MONTH','DAY_OF_MONTH', 'AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE', 'IRRADIATION']
predict_column = 'AC_POWER'

np.random.seed(1111)
lr = LinearRegression()
lr.fit(train[train_features], train[[predict_column]])

# learned weights: one coefficient per entry in train_features, followed by the intercept
weights = [*lr.coef_[0], lr.intercept_[0]]
print('[' + ', '.join(str(round(w, 8)) for w in weights) + ']')

[0.01285528, -0.0020498, -0.12589129, -2.28440674, -0.03024164, 0.81159965, 3.34587997, -1.10258629, 912.94071074, -15.29409425]


In [35]:
test_predictions = lr.predict(test[train_features])

# RMSE
print(np.sqrt(metrics.mean_squared_error( test[[predict_column]] , test_predictions )))

177.04688121107873


RMSE when predicting AC POWER in Plant 1 : 335.74 -> 48.61

RMSE when predicting DC POWER in Plant 1 : 3433.54 -> 495.67

RMSE when predicting AC POWER in Plant 2 : 289.39 -> 177.05

RMSE when predicting DC POWER in Plant 2 : 295.81 -> 180.86
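
For reference, the RMSE reported above is the square root of the mean squared difference between true and predicted values. The sketch below computes it by hand with NumPy; the function name `rmse` and the three sample values are illustrative assumptions, not numbers from the plant data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    # ravel() flattens (n, 1) inputs such as a one-column DataFrame or the
    # output of lr.predict, so the subtraction cannot broadcast to (n, n).
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up values: the errors are -10, 10 and -20, so the MSE is 200.
y_true = [0.0, 100.0, 200.0]
y_pred = [10.0, 90.0, 220.0]
print(rmse(y_true, y_pred))
```

RMSE is expressed in the same units as the target, so values for AC_POWER and DC_POWER are only comparable when the two columns are on a similar scale.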

#### Logistic regression

In [13]:
from sklearn.linear_model import LogisticRegression 
lab_enc = preprocessing.LabelEncoder()

In [68]:
# Datasets for testing the code

#train22 = train2[:5000]
#test22 = test2[:1000]

#train22 = preprocessing.scale(train22)
#test22 = preprocessing.scale(test22)

In [14]:
train = train1[:9000]
test = test1

train_features = ['HOUR','MINUTE','DAY_OF_YEAR','DAY_OF_WEEK','MONTH','DAY_OF_MONTH']
predict_column = 'AC_POWER'

logr = LogisticRegression(solver='lbfgs', max_iter=1000)

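# Scale the continuous target, then label-encode it: every distinct value becomes its own class
train_pred = lab_enc.fit_transform(preprocessing.scale( train[predict_column] ))

# FIT
logr.fit(preprocessing.scale( train[train_features] ), train_pred)
# test22.loc['logr'] = logr.predict(test22[train_features])

# Note: the encoder and the scaling are refit on the test target here,
# so these class ids need not match the ids assigned during training
test_pred = lab_enc.fit_transform(preprocessing.scale( test[predict_column] ))

print(f"Accuracy of LOG {logr.score(preprocessing.scale( test[train_features] ), test_pred )*100}%")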

Accuracy of LOG 69.34046345811052%


In [17]:
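# RMSE between the raw target and test_pred
# Note: test_pred holds the label-encoded test target, not the predictions of logr
print(np.sqrt(metrics.mean_squared_error( test[[predict_column]] , test_pred )))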

1339.8456528489367


Accuracy of predicting AC POWER in Plant 1 : 69.34046345811052 % (with training data length 9000)

RMSE : 1339.85

Accuracy of predicting DC POWER in Plant 1 : 69.34046345811052 % (with training data length 9000)

RMSE : 2011.80

Accuracy of predicting AC POWER in Plant 2 : 50.32308377896613 % (with training data length 9000)

RMSE : 3193.82

Accuracy of predicting DC POWER in Plant 2 : 50.32308377896613 % (with training data length 9000)

RMSE : 3194.76
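
The logistic regression cell fits `LabelEncoder` separately on the training target and on the test target. The sketch below, on made-up numbers, shows why that can assign different class ids to the same value, and one possible alternative: binning both sets with edges chosen once. The arrays, the `edges` values and the use of `np.digitize` are assumptions for illustration, not the method used in this notebook.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up targets; 10.0 and 20.0 occur in both sets.
train_y = np.array([0.0, 10.0, 20.0])
test_y = np.array([10.0, 20.0, 30.0])

# Fitting a separate encoder per set: ids are ranks within each set.
train_ids = LabelEncoder().fit_transform(train_y)
refit_ids = LabelEncoder().fit_transform(test_y)
# 10.0 is class 1 in train_ids but class 0 in refit_ids.
print(train_ids, refit_ids)

# Alternative sketch: fixed bin edges (hypothetical values) shared by both sets,
# so a given power value always maps to the same class.
edges = np.array([5.0, 15.0])
train_bins = np.digitize(train_y, edges)
test_bins = np.digitize(test_y, edges)
print(train_bins, test_bins)
```

With shared edges the class ids mean the same thing in both sets, so an accuracy computed on the test set compares like with like.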