<a href="https://colab.research.google.com/github/keshari112k/HDSC_2022_Stage_C_stability_of_the_grid_system/blob/main/HDSC_2022_Stage_C_Stability_of_the_Grid_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Name : Shivam Kesharwani

## Hamoye ID : 1476f9e72bc1f000

Electrical grids require a balance between electricity supply and demand in order to be stable. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy sources, the concept of demand response is a promising solution. This implies changes in electricity consumption in relation to electricity price changes. In this work, we’ll build a binary classification model to predict if a grid is stable or unstable using the UCI Electrical Grid Stability Simulated dataset.

In [233]:
import numpy as np
import pandas as pd

In [234]:
path = '/content/drive/MyDrive/ practice/Data_for_UCI_named.csv'
df = pd.read_csv(path)

In [235]:
df.head()

| | tau1 | tau2 | tau3 | tau4 | p1 | p2 | p3 | p4 | g1 | g2 | g3 | g4 | stab | stabf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.95906 | 3.079885 | 8.381025 | 9.780754 | 3.763085 | -0.782604 | -1.257395 | -1.723086 | 0.650456 | 0.859578 | 0.887445 | 0.958034 | 0.055347 | unstable |
| 1 | 9.304097 | 4.902524 | 3.047541 | 1.369357 | 5.067812 | -1.940058 | -1.872742 | -1.255012 | 0.413441 | 0.862414 | 0.562139 | 0.78176 | -0.005957 | stable |
| 2 | 8.971707 | 8.848428 | 3.046479 | 1.214518 | 3.405158 | -1.207456 | -1.27721 | -0.920492 | 0.163041 | 0.766689 | 0.839444 | 0.109853 | 0.003471 | unstable |
| 3 | 0.716415 | 7.6696 | 4.486641 | 2.340563 | 3.963791 | -1.027473 | -1.938944 | -0.997374 | 0.446209 | 0.976744 | 0.929381 | 0.362718 | 0.028871 | unstable |
| 4 | 3.134112 | 7.608772 | 4.943759 | 9.857573 | 3.525811 | -1.125531 | -1.845975 | -0.554305 | 0.79711 | 0.45545 | 0.656947 | 0.820923 | 0.04986 | unstable |


The dataset has 12 primary predictive features and two dependent variables.

Predictive features:

- 'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);
- 'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = -(p2 + p3 + p4);
- 'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma').

Dependent variables:

- 'stab': the maximum real part of the characteristic differential equation root (if positive, the system is linearly unstable; if negative, linearly stable);
- 'stabf': a categorical (binary) label ('stable' or 'unstable').
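
The two relationships stated above (the power balance p1 = -(p2 + p3 + p4), and the link between the sign of 'stab' and the 'stabf' label) can be checked directly. The following is a minimal sketch that re-types the five rows printed by `df.head()` instead of reading the CSV; the names `sample`, `balance_ok` and `labels_ok` are illustrative and not part of the original notebook, and the tolerance is an assumption chosen to absorb the six-decimal rounding of the printed values.

```python
import numpy as np
import pandas as pd

# The five rows shown by df.head(), restricted to the columns needed here.
sample = pd.DataFrame({
    "p1": [3.763085, 5.067812, 3.405158, 3.963791, 3.525811],
    "p2": [-0.782604, -1.940058, -1.207456, -1.027473, -1.125531],
    "p3": [-1.257395, -1.872742, -1.27721, -1.938944, -1.845975],
    "p4": [-1.723086, -1.255012, -0.920492, -0.997374, -0.554305],
    "stab": [0.055347, -0.005957, 0.003471, 0.028871, 0.04986],
    "stabf": ["unstable", "stable", "unstable", "unstable", "unstable"],
})

# Power balance: the supplier produces what the three consumers draw.
consumed = sample[["p2", "p3", "p4"]].sum(axis=1)
balance_ok = bool(np.allclose(sample["p1"], -consumed, atol=1e-5))

# Label rule: a positive 'stab' means unstable, a negative one means stable.
derived = np.where(sample["stab"] > 0, "unstable", "stable")
labels_ok = bool((derived == sample["stabf"]).all())

print(balance_ok, labels_ok)
```

On the full dataset the same two checks could be run against `df` before the 'stab' column is dropped.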

In [236]:
print("Rows {} Columns {}".format(df.shape[0], df.shape[1]))

Rows 10000 Columns 14


In [237]:
df.describe()

| | tau1 | tau2 | tau3 | tau4 | p1 | p2 | p3 | p4 | g1 | g2 | g3 | g4 | stab |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 5.25 | 5.250001 | 5.250004 | 5.249997 | 3.75 | -1.25 | -1.25 | -1.25 | 0.525 | 0.525 | 0.525 | 0.525 | 0.015731 |
| std | 2.742548 | 2.742549 | 2.742549 | 2.742556 | 0.75216 | 0.433035 | 0.433035 | 0.433035 | 0.274256 | 0.274255 | 0.274255 | 0.274255 | 0.036919 |
| min | 0.500793 | 0.500141 | 0.500788 | 0.500473 | 1.58259 | -1.999891 | -1.999945 | -1.999926 | 0.050009 | 0.050053 | 0.050054 | 0.050028 | -0.08076 |
| 25% | 2.874892 | 2.87514 | 2.875522 | 2.87495 | 3.2183 | -1.624901 | -1.625025 | -1.62496 | 0.287521 | 0.287552 | 0.287514 | 0.287494 | -0.015557 |
| 50% | 5.250004 | 5.249981 | 5.249979 | 5.249734 | 3.751025 | -1.249966 | -1.249974 | -1.250007 | 0.525009 | 0.525003 | 0.525015 | 0.525002 | 0.017142 |
| 75% | 7.62469 | 7.624893 | 7.624948 | 7.624838 | 4.28242 | -0.874977 | -0.875043 | -0.875065 | 0.762435 | 0.76249 | 0.76244 | 0.762433 | 0.044878 |
| max | 9.999469 | 9.999837 | 9.99945 | 9.999443 | 5.864418 | -0.500108 | -0.500072 | -0.500025 | 0.999937 | 0.999944 | 0.999982 | 0.99993 | 0.109403 |


In [238]:
df.duplicated().sum()

0

# **Preprocessing**

In [239]:
# Drop 'stab': it is the numeric counterpart of the 'stabf' label, so it must not be used as a feature
df.drop(columns=['stab'], inplace = True)

## Encoding stabf

In [240]:
from sklearn.preprocessing import LabelEncoder
number = LabelEncoder()
df.stabf = number.fit_transform(df.stabf.astype('str'))
df.head()

| | tau1 | tau2 | tau3 | tau4 | p1 | p2 | p3 | p4 | g1 | g2 | g3 | g4 | stabf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.95906 | 3.079885 | 8.381025 | 9.780754 | 3.763085 | -0.782604 | -1.257395 | -1.723086 | 0.650456 | 0.859578 | 0.887445 | 0.958034 | 1 |
| 1 | 9.304097 | 4.902524 | 3.047541 | 1.369357 | 5.067812 | -1.940058 | -1.872742 | -1.255012 | 0.413441 | 0.862414 | 0.562139 | 0.78176 | 0 |
| 2 | 8.971707 | 8.848428 | 3.046479 | 1.214518 | 3.405158 | -1.207456 | -1.27721 | -0.920492 | 0.163041 | 0.766689 | 0.839444 | 0.109853 | 1 |
| 3 | 0.716415 | 7.6696 | 4.486641 | 2.340563 | 3.963791 | -1.027473 | -1.938944 | -0.997374 | 0.446209 | 0.976744 | 0.929381 | 0.362718 | 1 |
| 4 | 3.134112 | 7.608772 | 4.943759 | 9.857573 | 3.525811 | -1.125531 | -1.845975 | -0.554305 | 0.79711 | 0.45545 | 0.656947 | 0.820923 | 1 |
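
`LabelEncoder` assigns integer codes in sorted order of the class labels, which is why 'stable' becomes 0 and 'unstable' becomes 1 in the table above. A small sketch on a toy list makes the mapping explicit; the names `encoder`, `encoded` and `mapping` are illustrative and do not appear in the notebook.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
# Toy labels in the same order as the first three rows of the dataset.
encoded = encoder.fit_transform(["unstable", "stable", "unstable"])

# classes_ holds the labels in sorted order; the position is the integer code.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)          # {'stable': 0, 'unstable': 1}
print(list(encoded))    # codes for the three toy labels
```

Keeping this mapping in mind matters when reading the confusion matrices and classification reports later on: class 0 is the stable grid and class 1 the unstable one.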


In [241]:
X = df.iloc[:,:12].values

In [242]:
X

array([[2.95906002, 3.0798852 , 8.38102539, ..., 0.85957811, 0.88744492,
        0.95803399],
       [9.30409723, 4.90252411, 3.04754073, ..., 0.86241408, 0.56213905,
        0.78175991],
       [8.97170691, 8.84842842, 3.04647875, ..., 0.76668866, 0.83944402,
        0.10985324],
       ...,
       [2.36403419, 2.84203025, 8.77639096, ..., 0.98650532, 0.14928646,
        0.14598403],
       [9.63151069, 3.9943976 , 2.75707093, ..., 0.58755755, 0.88911835,
        0.81839133],
       [6.53052662, 6.7817899 , 4.34969522, ..., 0.50544105, 0.37876093,
        0.94263083]])

In [243]:
y = df.iloc[:,-1].values
y

array([1, 0, 1, ..., 0, 1, 1])

## Train-test split

In [244]:
from sklearn.model_selection import train_test_split

In [245]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state = 1)

In [246]:
X_train.shape, y_train.shape

((8000, 12), (8000,))

In [247]:
X_test.shape, y_test.shape

((2000, 12), (2000,))

## Scaling

In [248]:
from sklearn.preprocessing import StandardScaler

In [249]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_train

array([[ 0.36732671, -0.98604156,  0.65044706, ...,  0.33985949,
         0.58556788,  0.49223946],
       [-0.06465869,  0.08943734,  1.03507899, ..., -1.5584875 ,
         1.42964862, -1.44352101],
       [-1.46785   ,  1.29841758, -0.50253617, ...,  1.45153362,
        -1.04574277,  0.49248925],
       ...,
       [ 0.65760851, -0.72275633, -1.4058879 , ...,  0.29310048,
        -1.55058661,  0.81034412],
       [-0.05931596, -1.26053241, -1.01047147, ..., -0.38825455,
        -0.72678059,  1.66791568],
       [-1.47321368,  0.63843757,  0.25012249, ..., -1.17410957,
         1.179282  ,  0.78362657]])

In [250]:
X_test = scaler.transform(X_test)
X_test

array([[ 0.59395058, -0.41273345,  1.50392381, ...,  1.1672034 ,
        -1.50732963,  1.08472557],
       [ 0.2021896 ,  0.37441634, -0.18880047, ..., -0.39566024,
         1.41465051,  1.22601069],
       [-1.079044  , -0.31374544, -0.88463426, ..., -1.43849538,
         0.65182081, -1.6821675 ],
       ...,
       [ 0.94782488, -1.66372653, -1.65391963, ...,  0.12639128,
         0.57344494,  1.31934985],
       [-1.1202346 ,  0.19397855, -0.2378051 , ...,  0.79408717,
        -1.36232268, -0.80197116],
       [-1.37764025,  1.51186671,  0.28265058, ..., -0.91749729,
         0.00295027,  1.18902334]])
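
The scaler above is fitted on the training split only and then reused on the test split, so the test data never influences the scaling statistics. The sketch below illustrates the effect on synthetic data (random values drawn from the 0.5 to 10 range of the 'tau' features); the `*_demo` names and the array sizes are assumptions made for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_train_demo = rng.uniform(0.5, 10, size=(800, 3))
X_test_demo = rng.uniform(0.5, 10, size=(200, 3))

# Learn mean and standard deviation from the training split only.
scaler_demo = StandardScaler().fit(X_train_demo)
train_scaled = scaler_demo.transform(X_train_demo)
test_scaled = scaler_demo.transform(X_test_demo)

# Training columns end up with mean 0 and unit variance by construction.
print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
# Test columns are only approximately centred, because they reuse the training statistics.
print(test_scaled.mean(axis=0).round(3))
```

Tree-based models such as the ones used below are generally insensitive to monotonic feature scaling, so this step mainly keeps the pipeline consistent if other estimators are tried later.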

In [251]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import roc_auc_score, confusion_matrix, accuracy_score, classification_report

import warnings
warnings.filterwarnings('ignore')


# **A. Random Forest Classifier and questions**

In [252]:
from sklearn.ensemble import RandomForestClassifier
rf_classifier = RandomForestClassifier(random_state=1)

In [253]:
rf_classifier.fit(X_train, y_train)

RandomForestClassifier(random_state=1)

In [254]:
rf_train_pred = rf_classifier.predict(X_train)
rf_pred = rf_classifier.predict(X_test)

In [255]:
print('Confusion Matrix (Random Forest Classifier):\n\n',confusion_matrix(y_test, rf_pred))
print('\n\nClassification Report:\n\n',classification_report(y_test, rf_pred))

Confusion Matrix (Random Forest Classifier):

 [[ 625   87]
 [  55 1233]]


Classification Report:

               precision    recall  f1-score   support

           0       0.92      0.88      0.90       712
           1       0.93      0.96      0.95      1288

    accuracy                           0.93      2000
   macro avg       0.93      0.92      0.92      2000
weighted avg       0.93      0.93      0.93      2000



## 19. What is the accuracy on the test set using the random forest classifier, to 4 decimal places?

In [256]:
round(accuracy_score(y_test, rf_pred),4)

0.929
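
The figures in the classification report can be recomputed from the random forest confusion matrix printed above, which is a useful sanity check. This is a minimal sketch that re-types the matrix by hand (scikit-learn puts true classes in rows and predicted classes in columns); the variable names are illustrative.

```python
import numpy as np

# Random forest confusion matrix from the output above:
# rows are true classes (0 = stable, 1 = unstable), columns are predictions.
cm = np.array([[625, 87],
               [55, 1233]])

correct = cm.diagonal()
recall = correct / cm.sum(axis=1)      # per true class
precision = correct / cm.sum(axis=0)   # per predicted class
accuracy = float(correct.sum() / cm.sum())

print(np.round(precision, 2), np.round(recall, 2), round(accuracy, 4))
```

The rounded values match the report: precision 0.92 and 0.93, recall 0.88 and 0.96, and accuracy 0.929.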

# **B. Extra Trees Classifier and questions**

In [257]:
from sklearn.ensemble import ExtraTreesClassifier

In [258]:
et_classifier = ExtraTreesClassifier(random_state=1)

In [259]:
et_classifier.fit(X_train, y_train)

ExtraTreesClassifier(random_state=1)

In [260]:
train_pred = et_classifier.predict(X_train)
et_pred = et_classifier.predict(X_test)

In [261]:
print('Confusion Matrix (Extra Tree Classifier):\n\n',confusion_matrix(y_test, et_pred))
print('\n\nClassification Report:\n\n',classification_report(y_test, et_pred))

Confusion Matrix (Extra Tree Classifier):

 [[ 606  106]
 [  38 1250]]


Classification Report:

               precision    recall  f1-score   support

           0       0.94      0.85      0.89       712
           1       0.92      0.97      0.95      1288

    accuracy                           0.93      2000
   macro avg       0.93      0.91      0.92      2000
weighted avg       0.93      0.93      0.93      2000



## 14. What is the accuracy on the test set using the Extra Trees classifier, to 4 decimal places?

In [262]:
round(accuracy_score(y_test, et_pred),4)

0.928

## 8. Using the ExtraTreesClassifier as your estimator with cv = 5, n_iter = 10, scoring = 'accuracy', n_jobs = -1, verbose = 1 and random_state = 1, what are the best hyperparameters from the randomized search CV?

- n_estimators = 300, min_samples_split = 5, min_samples_leaf = 6, max_features = 'auto'
- n_estimators = 100, min_samples_split = 7, min_samples_leaf = 4, max_features = None
- n_estimators = 500, min_samples_split = 2, min_samples_leaf = 8, max_features = 'log2'
- n_estimators = 1000, min_samples_split = 2, min_samples_leaf = 8, max_features = None

In [263]:
et_classifier1 = ExtraTreesClassifier(random_state=1)
# Search space taken from the answer options above, so some values repeat.
# max_features='auto' is only accepted by scikit-learn < 1.3; for classifiers it meant 'sqrt'.
grid_values = {'n_estimators': [100, 300, 500, 1000], 'min_samples_split': [7, 5, 2, 2], 'min_samples_leaf': [4, 6, 8, 8], 'max_features': [None, 'auto', 'log2', None]}
rand_search = RandomizedSearchCV(et_classifier1, param_distributions=grid_values, cv=5, n_iter=10, scoring='accuracy', n_jobs=-1, verbose=1, random_state=1)

In [264]:
rand_search.fit(X_train, y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


RandomizedSearchCV(cv=5, estimator=ExtraTreesClassifier(random_state=1),
                   n_jobs=-1,
                   param_distributions={'max_features': [None, 'auto', 'log2',
                                                         None],
                                        'min_samples_leaf': [4, 6, 8, 8],
                                        'min_samples_split': [7, 5, 2, 2],
                                        'n_estimators': [100, 300, 500, 1000]},
                   random_state=1, scoring='accuracy', verbose=1)

In [265]:
rand_search.best_params_

{'max_features': None,
 'min_samples_leaf': 6,
 'min_samples_split': 2,
 'n_estimators': 1000}

## 6. Train a new ExtraTreesClassifier model with the new hyperparameters from the RandomizedSearchCV (with random_state = 1). Is the accuracy of the new optimal model higher or lower than the initial ExtraTreesClassifier model with no hyperparameter tuning?

In [266]:
et1_pred = rand_search.predict(X_test)

In [267]:
round(accuracy_score(y_test, et1_pred), 4)

0.932

The new optimal model is more accurate than the Extra Trees classifier with no hyperparameter tuning: test accuracy is 0.932 with tuning versus 0.928 without, a difference of 8 of the 2,000 test samples.
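
Question 6 asks for a new model trained with the tuned hyperparameters, while the notebook predicts with `rand_search` directly. That works because `RandomizedSearchCV` uses `refit=True` by default, which retrains the best configuration on the whole training set. The sketch below shows the explicit alternative on synthetic data from `make_classification`, with a deliberately small search space so that it runs quickly; the data, the search space and all variable names are assumptions for illustration and are not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the grid data (12 features, binary target).
X_demo, y_demo = make_classification(n_samples=400, n_features=12, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)

search = RandomizedSearchCV(
    ExtraTreesClassifier(random_state=1),
    param_distributions={"n_estimators": [20, 40], "min_samples_leaf": [1, 4, 8]},
    n_iter=4, cv=3, scoring="accuracy", random_state=1,
)
search.fit(X_tr, y_tr)

# Explicitly train a fresh model with the best hyperparameters found.
retrained = ExtraTreesClassifier(random_state=1, **search.best_params_)
retrained.fit(X_tr, y_tr)

# With the same random_state, both routes should give identical predictions.
same_predictions = bool((retrained.predict(X_te) == search.predict(X_te)).all())
print(search.best_params_, same_predictions)
```

In the notebook the equivalent call would pass `rand_search.best_params_` to a new `ExtraTreesClassifier(random_state=1, ...)`.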


## 20. Find the feature importance using the optimal ExtraTreesClassifier model. Which features are the most and least important respectively?

In [269]:
cols = df.drop(columns=['stabf']).columns
cols

Index(['tau1', 'tau2', 'tau3', 'tau4', 'p1', 'p2', 'p3', 'p4', 'g1', 'g2',
       'g3', 'g4'],
      dtype='object')

In [270]:
feature_imp = rand_search.best_estimator_.feature_importances_
feature_imp

array([0.13546158, 0.13842146, 0.13312751, 0.13396645, 0.00535433,
       0.00743871, 0.00728074, 0.00687441, 0.10306406, 0.10798251,
       0.11231919, 0.10870905])

In [271]:
print('Most important features:',max(zip(feature_imp,cols)))
print('Least important features:',min(zip(feature_imp,cols)))

Most important features: (0.13842145507674694, 'tau2')
Least important features: (0.005354328485013283, 'p1')
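
Taking `max` and `min` over the zipped pairs answers the question, but a sorted view shows the whole ranking at once. The following sketch re-types the importances printed above into a pandas Series; the names `ranking` and `family_totals` are illustrative, and grouping by the feature-name prefix is an added convenience rather than something the notebook does.

```python
import pandas as pd

cols = ['tau1', 'tau2', 'tau3', 'tau4', 'p1', 'p2', 'p3', 'p4',
        'g1', 'g2', 'g3', 'g4']
# Values copied from the feature_importances_ output above.
feature_imp = [0.13546158, 0.13842146, 0.13312751, 0.13396645, 0.00535433,
               0.00743871, 0.00728074, 0.00687441, 0.10306406, 0.10798251,
               0.11231919, 0.10870905]

# Full ranking, most important feature first.
ranking = pd.Series(feature_imp, index=cols).sort_values(ascending=False)
print(ranking)

# Total importance per feature family: strip the trailing digit to get tau, p or g.
family_totals = ranking.groupby(ranking.index.str.rstrip("1234")).sum()
print(family_totals)
```

The ranking confirms 'tau2' as the most and 'p1' as the least important feature. Summed by family, the reaction times account for about 54% of the total importance, the elasticity coefficients for about 43%, and the nominal powers for under 3%.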


# **C. XGBoost and questions**

In [284]:
from xgboost import XGBClassifier

In [285]:
xgb_classifier = XGBClassifier(random_state=1)

In [286]:
xgb_classifier.fit(X_train, y_train)

XGBClassifier(random_state=1)

In [287]:
train_pred = xgb_classifier.predict(X_train)
xgb_pred = xgb_classifier.predict(X_test)

In [288]:
print('Confusion Matrix (XGboost Classifier):\n\n',confusion_matrix(y_test, xgb_pred))
print('\n\nClassification Report:\n\n',classification_report(y_test, xgb_pred))

Confusion Matrix (XGboost Classifier):

 [[ 603  109]
 [  52 1236]]


Classification Report:

               precision    recall  f1-score   support

           0       0.92      0.85      0.88       712
           1       0.92      0.96      0.94      1288

    accuracy                           0.92      2000
   macro avg       0.92      0.90      0.91      2000
weighted avg       0.92      0.92      0.92      2000



## 10. What is the accuracy on the test set using the XGBoost classifier, to 4 decimal places?

In [289]:
round(accuracy_score(y_test, xgb_pred),4)

0.9195

# **D. LightGBM and questions**

In [290]:
from lightgbm import LGBMClassifier
lgbm_classifier = LGBMClassifier(random_state=1)

In [291]:
lgbm_classifier.fit(X_train, y_train)

LGBMClassifier(random_state=1)

In [292]:
train_pred = lgbm_classifier.predict(X_train)
lgbm_pred = lgbm_classifier.predict(X_test)

In [293]:
round(accuracy_score(y_train, train_pred),4)

0.9982

In [294]:
print('Confusion Matrix (lightGBM Classifier):\n\n',confusion_matrix(y_test, lgbm_pred))
print('\n\nClassification Report:\n\n',classification_report(y_test, lgbm_pred))

Confusion Matrix (lightGBM Classifier):

 [[ 635   77]
 [  48 1240]]


Classification Report:

               precision    recall  f1-score   support

           0       0.93      0.89      0.91       712
           1       0.94      0.96      0.95      1288

    accuracy                           0.94      2000
   macro avg       0.94      0.93      0.93      2000
weighted avg       0.94      0.94      0.94      2000




## 14. What is the accuracy on the test set using the LGBM classifier, to 4 decimal places?

In [295]:
round(accuracy_score(y_test, lgbm_pred),4)

0.9375
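
To compare the four untuned models side by side, their test accuracies can be recomputed from the confusion matrices printed in sections A to D. This is a small sketch with the matrices re-typed by hand; the dictionary and variable names are illustrative. The tuned Extra Trees model (accuracy 0.932) is left out because its confusion matrix is not printed in the notebook.

```python
import numpy as np

# Test-set confusion matrices copied from the outputs above
# (rows: true class, columns: predicted class).
confusion = {
    "Random Forest": [[625, 87], [55, 1233]],
    "Extra Trees": [[606, 106], [38, 1250]],
    "XGBoost": [[603, 109], [52, 1236]],
    "LightGBM": [[635, 77], [48, 1240]],
}

# Accuracy is the share of samples on the diagonal.
accuracy = {
    name: float(np.trace(np.array(cm)) / np.sum(cm))
    for name, cm in confusion.items()
}

# Print the models from most to least accurate.
for name, acc in sorted(accuracy.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<14}{acc:.4f}")

best = max(accuracy, key=accuracy.get)
```

With default settings, LightGBM scores highest on this split (0.9375), followed by random forest (0.9290), Extra Trees (0.9280) and XGBoost (0.9195). These figures come from a single train-test split with random_state = 1, so the small gaps between models should be read with some caution.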