# Stability of Grid System

## Problem Statement
Electrical grids are stable only when electricity supply and demand are balanced. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy sources, demand response is a promising alternative: consumers change their electricity consumption in response to changes in the electricity price. In this work, we build a binary classification model that predicts whether a grid is stable or unstable, using the UCI Electrical Grid Stability Simulated dataset.

Dataset: https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+


In [110]:
#First, we import the required packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [111]:
#Next, we load the data into a pandas DataFrame for further processing

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/00471/Data_for_UCI_named.csv')
df.head()

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable


In [112]:
#We first have a look at the data using info function

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tau1    10000 non-null  float64
 1   tau2    10000 non-null  float64
 2   tau3    10000 non-null  float64
 3   tau4    10000 non-null  float64
 4   p1      10000 non-null  float64
 5   p2      10000 non-null  float64
 6   p3      10000 non-null  float64
 7   p4      10000 non-null  float64
 8   g1      10000 non-null  float64
 9   g2      10000 non-null  float64
 10  g3      10000 non-null  float64
 11  g4      10000 non-null  float64
 12  stab    10000 non-null  float64
 13  stabf   10000 non-null  object 
dtypes: float64(13), object(1)
memory usage: 1.1+ MB


The data has 12 primary predictive features and two dependent variables.

**Predictive features:**

'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);

'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = - (p2 + p3 + p4);

'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma');
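
The balance constraint on 'p1' can be checked directly. The sketch below does this in plain Python on the first five rows of the dataset, with values copied from the `df.head()` output above. The names `rows` and `balanced` are illustrative and not part of the original notebook.

```python
import math

# (p1, p2, p3, p4) for the first five rows, copied from df.head()
rows = [
    (3.763085, -0.782604, -1.257395, -1.723086),
    (5.067812, -1.940058, -1.872742, -1.255012),
    (3.405158, -1.207456, -1.277210, -0.920492),
    (3.963791, -1.027473, -1.938944, -0.997374),
    (3.525811, -1.125531, -1.845975, -0.554305),
]

# p1 should equal -(p2 + p3 + p4), up to the rounding of the printed values
balanced = [
    math.isclose(p1, -(p2 + p3 + p4), abs_tol=1e-5)
    for p1, p2, p3, p4 in rows
]
print(balanced)
```

On the full DataFrame, the same check could be written as `np.allclose(df['p1'], -(df['p2'] + df['p3'] + df['p4']))`.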

**Dependent variables:**

'stab': the maximum real part of the roots of the characteristic differential equation (if positive, the system is linearly unstable; if negative, linearly stable);

'stabf': a categorical (binary) label ('stable' or 'unstable').
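
The sign rule that links 'stab' to 'stabf' can be checked on the rows already shown. The sketch below is plain Python over the five `df.head()` rows. The helper `label_from_stab` is a hypothetical name, and treating a value of exactly zero as stable is an assumption taken from the notebook's own rule ('stab' <= 0).

```python
def label_from_stab(stab):
    """Return the 'stabf' label implied by the sign of 'stab'."""
    # Assumption: a value of exactly 0 is treated as stable
    return 'stable' if stab <= 0 else 'unstable'

# (stab, stabf) pairs for the first five rows, copied from df.head()
samples = [
    (0.055347, 'unstable'),
    (-0.005957, 'stable'),
    (0.003471, 'unstable'),
    (0.028871, 'unstable'),
    (0.049860, 'unstable'),
]

# Collect every row where the rule disagrees with the recorded label
mismatches = [(s, f) for s, f in samples if label_from_stab(s) != f]
print(mismatches)  # an empty list means the rule reproduces every label
```

Because 'stabf' is fully determined by 'stab', a check like this is best run before 'stab' is removed from the DataFrame.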

In [113]:
#'stab' and 'stabf' are directly related: 'stabf' is 'stable' if 'stab' <= 0 and 'unstable' otherwise
#Keeping 'stab' as a feature would leak the label into the model, so we drop it

df.drop('stab', axis=1, inplace=True)

In [114]:
#We check for null values

df.isnull().sum()

tau1     0
tau2     0
tau3     0
tau4     0
p1       0
p2       0
p3       0
p4       0
g1       0
g2       0
g3       0
g4       0
stabf    0
dtype: int64

In [115]:
#Since there are no null values, we can now split the data into X (features) and y (the target to be predicted)

X = df.drop('stabf', axis=1)
y = df['stabf']

In [116]:
#We use train_test_split to split data into train and test sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 1)

In [117]:
#Scaling using StandardScaler: fit on the training set only, then apply the same transform to the test set

from sklearn.preprocessing import StandardScaler
stdsc = StandardScaler()
X_train_scaled = pd.DataFrame(stdsc.fit_transform(X_train))
X_test_scaled = pd.DataFrame(stdsc.transform(X_test))

In [118]:
print (X_train_scaled.head())
print (X_test_scaled.head())

         0         1         2   ...        9         10        11
0  0.367327 -0.986042  0.650447  ...  0.339859  0.585568  0.492239
1 -0.064659  0.089437  1.035079  ... -1.558488  1.429649 -1.443521
2 -1.467850  1.298418 -0.502536  ...  1.451534 -1.045743  0.492489
3  0.820081  0.529920  1.299657  ...  1.361958  1.604140  0.275303
4  0.665424 -1.425627  0.312300  ...  0.695660  1.137504 -1.312575

[5 rows x 12 columns]
         0         1         2   ...        9         10        11
0  0.593951 -0.412733  1.503924  ...  1.167203 -1.507330  1.084726
1  0.202190  0.374416 -0.188800  ... -0.395660  1.414651  1.226011
2 -1.079044 -0.313745 -0.884634  ... -1.438495  0.651821 -1.682168
3 -0.083120 -1.107327  0.372805  ... -1.672322 -0.357714  1.055865
4  0.873921  1.438466  0.086662  ... -1.469731  0.956396 -0.819727

[5 rows x 12 columns]
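
The scaled values above come from standardization: each column is transformed as z = (x - mean) / std, where the mean and standard deviation are estimated on the training data and then reused for the test data. The following is a minimal plain-Python sketch of that idea for a single column. The split into `train` and `test` is arbitrary and only for illustration, so the numbers will not match the notebook output.

```python
import statistics

# Illustrative values only: the five 'tau1' entries shown by df.head(),
# split arbitrarily into a "training" part and a "test" part
train = [2.95906, 9.304097, 8.971707, 0.716415]
test = [3.134112]

# Fit: estimate mean and standard deviation on the training values only.
# StandardScaler uses the population standard deviation (ddof=0).
mean = statistics.fmean(train)
std = statistics.pstdev(train)

# Transform: reuse the training statistics for both sets
train_scaled = [(x - mean) / std for x in train]
test_scaled = [(x - mean) / std for x in test]

print(train_scaled)
print(test_scaled)
```

After the transform, the training values have mean 0 and standard deviation 1. The test values generally do not, because they are scaled with the training statistics.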


In [143]:
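#Creating function to report metrics

from sklearn.metrics import accuracy_score

def met(y_test, y_pred, name):
  """Print the model name and its accuracy on the test set."""
  print (' Name of model : ',name)
  print ('\n Accuracy score of model : ',"{:.4f}".format(accuracy_score(y_test,y_pred)))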
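
The score that `met` prints is the fraction of test samples whose predicted label equals the true label. A plain-Python equivalent might look like the sketch below. The function name `accuracy` and the demo labels are hypothetical and are not taken from the dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels, for illustration only
y_true_demo = ['stable', 'unstable', 'unstable', 'stable', 'unstable']
y_pred_demo = ['stable', 'unstable', 'stable', 'stable', 'unstable']

print("{:.4f}".format(accuracy(y_true_demo, y_pred_demo)))  # 0.8000
```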

In [144]:
#RandomForestClassifier

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state = 1)
rf.fit(X_train_scaled,y_train)
y_pred = rf.predict(X_test_scaled)

In [145]:
#Checking the metrics for above model

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
met (y_test,y_pred,'Random Forest Classifier')

 Name of model :  Random Forest Classifier

 Accuracy score of model :  0.9290


In [146]:
#ExtraTreesClassifier

from sklearn.ensemble import ExtraTreesClassifier
et = ExtraTreesClassifier (random_state = 1)
et.fit(X_train_scaled,y_train)
y_pred = et.predict(X_test_scaled)

In [147]:
#Checking the metrics for above model

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
met (y_test,y_pred,'Extra Trees Classifier')

 Name of model :  Extra Trees Classifier

 Accuracy score of model :  0.9280


In [148]:
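#XGBoostClassifier
#Note: recent XGBoost releases expect integer class labels (0, 1, ...), so the string labels in y_train may need encoding first (e.g. with sklearn.preprocessing.LabelEncoder)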

from xgboost import XGBClassifier
xgbc = XGBClassifier (random_state = 1)
xgbc.fit(X_train_scaled,y_train)
y_pred = xgbc.predict(X_test_scaled)

In [149]:
#Checking the metrics for above model

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
met (y_test,y_pred,'XGBoost Classifier')

 Name of model :  XGBoost Classifier

 Accuracy score of model :  0.9195


In [150]:
#LightGBMClassifier

from lightgbm import LGBMClassifier
lgbm = LGBMClassifier (random_state = 1)
lgbm.fit(X_train_scaled,y_train)
y_pred = lgbm.predict(X_test_scaled)

In [151]:
#Checking the metrics for above model

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
met (y_test,y_pred,'LGBM Classifier')

 Name of model :  LGBM Classifier

 Accuracy score of model :  0.9375


In [152]:
#Adding RandomizedSearchCV to ExtraTreesClassifier

from sklearn.model_selection import RandomizedSearchCV

n_estimators = [50, 100, 300, 500, 1000]
min_samples_split = [2, 3, 5, 7, 9]
min_samples_leaf = [1, 2, 4, 6, 8]
#Note: 'auto' was deprecated and later removed in newer scikit-learn releases; for classifiers it was equivalent to 'sqrt'
max_features = ['auto', 'sqrt', 'log2', None]
hyperparameter_grid = {
    'n_estimators': n_estimators,
    'min_samples_leaf': min_samples_leaf,
    'min_samples_split': min_samples_split,
    'max_features': max_features,
}

eth = ExtraTreesClassifier(random_state=1)
rdcv = RandomizedSearchCV(eth, param_distributions=hyperparameter_grid, cv=5, n_iter=10, n_jobs=-1, verbose=1, random_state=1, scoring='accuracy')
rdcv.fit(X_train_scaled, y_train)

rdcv.best_params_

Fitting 5 folds for each of 10 candidates, totalling 50 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  46 tasks      | elapsed:  1.4min
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed:  1.5min finished


{'max_features': None,
 'min_samples_leaf': 8,
 'min_samples_split': 2,
 'n_estimators': 1000}
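
To put the search in perspective, the sketch below counts how many parameter combinations the candidate lists define and how many of them the randomized search actually tried. It uses only the values listed in the search cell above. The variable names `all_combinations`, `n_iter` and `cv` are illustrative.

```python
from itertools import product

# Candidate values, as listed in the search cell above
n_estimators = [50, 100, 300, 500, 1000]
min_samples_split = [2, 3, 5, 7, 9]
min_samples_leaf = [1, 2, 4, 6, 8]
max_features = ['auto', 'sqrt', 'log2', None]

# Every distinct combination a full grid search would have to try
all_combinations = list(product(
    n_estimators, min_samples_split, min_samples_leaf, max_features))

n_iter = 10   # candidates sampled by RandomizedSearchCV
cv = 5        # folds per candidate

print(len(all_combinations))           # 500 combinations in the full grid
print(n_iter * cv)                     # 50 fits, matching the log above
print(n_iter / len(all_combinations))  # 0.02, i.e. 2% of the grid sampled
```

With only 2% of the combinations evaluated, the reported best parameters are the best of the sampled candidates, not necessarily the best of the whole grid.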

In [153]:
#Using best parameters for ExtraTreesClassifier

etc_best = ExtraTreesClassifier(random_state=1, max_features=None,min_samples_leaf=8,min_samples_split=2, n_estimators=1000)
etc_best.fit(X_train_scaled, y_train)
y_pred = etc_best.predict(X_test_scaled)

In [154]:
#Checking the metrics for above model

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
met (y_test,y_pred,'Extra Trees Classifier with best parameters')

 Name of model :  Extra Trees Classifier with best parameters

 Accuracy score of model :  0.9270


# Accuracy of the models evaluated



*   Random Forest Classifier : 92.9%
*   Extra Trees Classifier : 92.8%
*   XGBoost Classifier : 91.95%
*   LGBM Classifier : 93.75%
*   Extra Trees Classifier (with best parameters) : 92.7%
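
Since the test set holds 20% of the 10,000 rows, each accuracy can be converted into a count of misclassified samples, which makes the gaps between models easier to judge. The sketch below does that arithmetic in plain Python. The names `accuracies`, `n_test` and `errors` are illustrative.

```python
# Test accuracies reported above
accuracies = {
    'Random Forest Classifier': 0.9290,
    'Extra Trees Classifier': 0.9280,
    'XGBoost Classifier': 0.9195,
    'LGBM Classifier': 0.9375,
    'Extra Trees Classifier (with best parameters)': 0.9270,
}

n_test = 2000  # 20% of the 10,000 rows, held out by train_test_split

# Number of misclassified test samples implied by each accuracy
errors = {name: n_test - round(acc * n_test) for name, acc in accuracies.items()}

# Models ordered from fewest to most test errors
for name, n_err in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{name}: {n_err} errors out of {n_test}")
```

The differences are small in absolute terms: the two Extra Trees variants, for example, differ by two test samples.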





In [159]:
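#The Extra Trees Classifier tuned with RandomizedSearchCV scores slightly lower on the test set (92.7%) than the untuned one (92.8%).
#The gap is 0.1 percentage points, i.e. 2 of the 2,000 test samples. The search sampled only 10 parameter combinations and
#selected among them by cross-validated accuracy on the training set, which does not guarantee a better score on the held-out test set.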

In [160]:
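#Finding F1 score from confusion-matrix counts: F1 = TP / (TP + 0.5 * (FP + FN)), with TP = 355 and FP + FN = 1480 + 45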

355/(355 + (0.5*(1480 + 45)))

0.31767337807606266
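
The expression above has the form F1 = TP / (TP + 0.5 * (FP + FN)). The sketch below wraps it in a small function and cross-checks it against the precision/recall definition. The notebook does not say which of 1480 and 45 is the false-positive count, so the assignment below is an assumption. The formula is symmetric in the two, so the score is unaffected. `f1_from_counts` is a hypothetical helper name.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    return tp / (tp + 0.5 * (fp + fn))

# Assumption: 1480 are false positives and 45 are false negatives;
# swapping them gives the same F1 score
tp, fp, fn = 355, 1480, 45

precision = tp / (tp + fp)
recall = tp / (tp + fn)

f1_counts = f1_from_counts(tp, fp, fn)
f1_harmonic = 2 * precision * recall / (precision + recall)

print(f1_counts)
print(f1_harmonic)  # same value, computed as the harmonic mean of precision and recall
```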

In [161]:
#Finding feature importances for the tuned Extra Trees Classifier model

fimp = etc_best.feature_importances_
fimp = pd.DataFrame(fimp, index = X_train.columns, columns = ['Feature Importance'])
fimp

,Feature Importance
tau1,0.13724
tau2,0.140508
tau3,0.13468
tau4,0.135417
p1,0.003683
p2,0.005337
p3,0.005429
p4,0.004962
g1,0.102562
g2,0.107578


In [162]:
#Most and least important features

print(fimp['Feature Importance'].idxmax())
print(fimp['Feature Importance'].idxmin())

tau2
p1
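
To see how importance is distributed across the three kinds of features, the values can be summed per feature family. The sketch below uses only the ten importances visible in the table above, so the total for the 'g' family is partial. The names `importances` and `by_family` are illustrative.

```python
# Importances copied from the table above (only ten of the twelve rows are shown)
importances = {
    'tau1': 0.13724, 'tau2': 0.140508, 'tau3': 0.13468, 'tau4': 0.135417,
    'p1': 0.003683, 'p2': 0.005337, 'p3': 0.005429, 'p4': 0.004962,
    'g1': 0.102562, 'g2': 0.107578,
}

# Most and least important among the listed features
most = max(importances, key=importances.get)
least = min(importances, key=importances.get)
print(most, least)  # tau2 p1

# Total importance per feature family (the name without its trailing digit)
by_family = {}
for name, value in importances.items():
    family = name.rstrip('0123456789')
    by_family[family] = by_family.get(family, 0.0) + value

for family, total in by_family.items():
    print(f"{family}: {total:.4f}")
```

On these numbers the four reaction times together account for a little over half of the total importance, while the four nominal power values together account for about 2%. Since scikit-learn normalizes importances to sum to 1, the two rows not shown ('g3' and 'g4') would account for roughly the remaining 0.22.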
