## Electrical Grid Stability Simulated

**Description of Dataset**

Electrical grids require a balance between electricity supply and demand in order to be stable. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy sources, demand response is a promising solution: electricity consumption changes in response to changes in the electricity price. In this work, we’ll build a binary classification model to predict whether a grid is stable or unstable using the UCI Electrical Grid Stability Simulated dataset.

**Dataset**: https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data

It has 12 primary predictive features and two dependent variables.

**Predictive features:**

**1.** 'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);    

**2.** 'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = - (p2 + p3 + p4);

**3.** 'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma');
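
The power-balance constraint in item 2 can be checked by hand. The sketch below does so for the first two rows of the dataset preview printed further down in this notebook; the helper name `supplier_power` and the tolerance are illustrative assumptions, not part of the dataset documentation.

```python
import math

def supplier_power(p2, p3, p4):
    """Supplier power implied by the balance p1 = -(p2 + p3 + p4)."""
    return -(p2 + p3 + p4)

# (p1, p2, p3, p4) copied from the first two rows of the dataset preview
rows = [
    (3.763085, -0.782604, -1.257395, -1.723086),
    (5.067812, -1.940058, -1.872742, -1.255012),
]

# Compare the stored p1 with the value implied by the consumers,
# allowing for rounding in the printed values.
balanced = [
    math.isclose(p1, supplier_power(p2, p3, p4), abs_tol=1e-5)
    for p1, p2, p3, p4 in rows
]
print(balanced)
```

Because 'p1' is fully determined by 'p2' to 'p4', it carries no independent information, which is worth keeping in mind when reading feature importances.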

**Dependent variables:**

**1.** 'stab': the maximum real part of the roots of the characteristic differential equation (if positive, the system is linearly unstable; if negative, linearly stable);

**2.** 'stabf': a categorical (binary) label ('stable' or 'unstable').

Because of the direct relationship between 'stab' and 'stabf' ('stabf' = 'stable' if 'stab' <= 0, 'unstable' otherwise), 'stab' is dropped and 'stabf' remains as the sole dependent variable (binary classification).
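
As a small illustration of the rule stated above, the sketch below derives the label from the 'stab' values of the first five rows of the dataset preview. The function name `stability_label` is a hypothetical helper introduced here for illustration.

```python
def stability_label(stab):
    """Apply the rule: 'stable' if stab <= 0, otherwise 'unstable'."""
    return 'stable' if stab <= 0 else 'unstable'

# 'stab' values of the first five rows of the dataset preview
stab_values = [0.055347, -0.005957, 0.003471, 0.028871, 0.04986]
labels = [stability_label(s) for s in stab_values]
print(labels)
```

The derived labels agree with the 'stabf' column for those rows. This is why keeping 'stab' as a feature would leak the target into the model.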

Split the data into an 80-20 train-test split with a random state of 1. Fit the standard scaler on the training features (x_train) and use it to transform both x_train and x_test; the labels (y_train, y_test) are not scaled. Use scikit-learn to train a random forest classifier and an extra trees classifier, and use xgboost and lightgbm to train an extreme gradient boosting model and a light gradient boosting model. Use random_state = 1 for all models and evaluate each one on the test set.

In [2]:
#importing libraries
import pandas as pd
import numpy as np

In [3]:
# Read the data called 'df' and check its head
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/00471/Data_for_UCI_named.csv')
df.head()

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable


In [4]:
# check the shape of data
df.shape

(10000, 14)

In [5]:
# check for missing values
df.isna().sum()

tau1     0
tau2     0
tau3     0
tau4     0
p1       0
p2       0
p3       0
p4       0
g1       0
g2       0
g3       0
g4       0
stab     0
stabf    0
dtype: int64

In [6]:
# description of the data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tau1    10000 non-null  float64
 1   tau2    10000 non-null  float64
 2   tau3    10000 non-null  float64
 3   tau4    10000 non-null  float64
 4   p1      10000 non-null  float64
 5   p2      10000 non-null  float64
 6   p3      10000 non-null  float64
 7   p4      10000 non-null  float64
 8   g1      10000 non-null  float64
 9   g2      10000 non-null  float64
 10  g3      10000 non-null  float64
 11  g4      10000 non-null  float64
 12  stab    10000 non-null  float64
 13  stabf   10000 non-null  object 
dtypes: float64(13), object(1)
memory usage: 1.1+ MB


In [7]:
# Drop "stab" column and check head values
df.drop('stab', axis=1, inplace=True)
df.head()

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,unstable


In [8]:
#split features and target variable
X = df.drop('stabf', axis=1)
y = df['stabf']

In [9]:
# splitting data into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print('X_train shape: {}'.format(x_train.shape))
print('y_train shape: {}'.format(y_train.shape))
print('X_test shape: {}'.format(x_test.shape))
print('y_test shape: {}'.format(y_test.shape))

X_train shape: (8000, 12)
y_train shape: (8000,)
X_test shape: (2000, 12)
y_test shape: (2000,)


In [10]:
#scaling dataset using standard scaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

In [11]:
# using standard scaler to transform train and test
x_train_scaler = scaler.fit_transform(x_train)
x_test_scaler = scaler.transform(x_test)

In [12]:
# scaled data sets into a dataframe
x_train_scaled = pd.DataFrame(x_train_scaler, columns = x_train.columns)
x_test_scaled = pd.DataFrame(x_test_scaler, columns = x_test.columns)
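
The scaler is fitted on the training set only and then reused on the test set, so no test-set statistics leak into training. A minimal pure-Python sketch of that idea follows. The toy values and helper names are assumptions for illustration; the population standard deviation is used because that is what `StandardScaler` computes.

```python
from statistics import fmean, pstdev

def fit_standardizer(values):
    """Learn the mean and population standard deviation of one feature."""
    return fmean(values), pstdev(values)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [(v - mean) / std for v in values]

train_col = [2.0, 4.0, 6.0]  # toy training values for a single feature
test_col = [4.0, 8.0]        # toy test values for the same feature

# Statistics come from the training column only ...
mean, std = fit_standardizer(train_col)

# ... and are then applied unchanged to both splits.
train_scaled = standardize(train_col, mean, std)
test_scaled = standardize(test_col, mean, std)
print(train_scaled, test_scaled)
```

The scaled training column has zero mean by construction, whereas the scaled test column generally does not, which is expected.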

---

### F1_score

In [13]:
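# F1 score from confusion-matrix counts
tp = 255    # true positives
tn = 20     # true negatives (not used by precision, recall, or F1)
fp = 1380   # false positives
fn = 45     # false negatives
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# named f1 so it does not clash with sklearn.metrics.f1_score, which is imported later
f1 = 2 * (precision * recall) / (precision + recall)

round(f1, 4)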

0.2636
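
The same result can be cross-checked with the equivalent count-based form of the F1 score, 2TP / (2TP + FP + FN). The sketch below uses the counts from the cell above; the function names are illustrative. It also computes the accuracy for the same counts, which shows how poor this hypothetical classifier is overall.

```python
def f1_from_counts(tp, fp, fn):
    """F1 written directly in confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)

def accuracy_from_counts(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

f1_direct = f1_from_counts(tp=255, fp=1380, fn=45)
acc = accuracy_from_counts(tp=255, tn=20, fp=1380, fn=45)
print(round(f1_direct, 4), round(acc, 4))
```

Recall is high (0.85) but precision is low because of the 1380 false positives, and the harmonic mean is pulled towards the smaller of the two.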

### RandomForestClassifier

In [14]:
# Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(random_state = 1)
random_forest.fit(x_train_scaled, y_train)

In [15]:
#make predictions on test set
randf_pred = random_forest.predict(x_test_scaled)

In [20]:
# model accuracy
from sklearn.metrics import recall_score, accuracy_score, f1_score
RFC_accuracy = accuracy_score(y_true=y_test, y_pred=randf_pred)
print('Accuracy: {}'.format(round(RFC_accuracy, 4)))

Accuracy: 0.929


### XGBClassifier

In [17]:
#pip uninstall xgboost 

Found existing installation: xgboost 1.7.4
Uninstalling xgboost-1.7.4:
  Would remove:
    /usr/local/lib/python3.8/dist-packages/xgboost-1.7.4.dist-info/*
    /usr/local/lib/python3.8/dist-packages/xgboost.libs/libgomp-a34b3233.so.1.0.0
    /usr/local/lib/python3.8/dist-packages/xgboost/*
Proceed (Y/n)? y
  Successfully uninstalled xgboost-1.7.4


In [18]:
#pip install xgboost==0.90

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting xgboost==0.90
  Downloading xgboost-0.90-py2.py3-none-manylinux1_x86_64.whl (142.8 MB)
Installing collected packages: xgboost
Successfully installed xgboost-0.90


In [16]:
from xgboost import XGBClassifier
xgbc= XGBClassifier(random_state = 1)

#fit on train set
xgbc.fit(x_train_scaled, y_train)

In [17]:
#predict on test set
xgbc_pred = xgbc.predict(x_test_scaled)

In [18]:
# accuracy on the test set
xgbc_accuracy = accuracy_score(y_test, xgbc_pred)
print('Accuracy: {}'.format(xgbc_accuracy))

Accuracy: 0.9195
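
This notebook pins xgboost 0.90 and fits directly on the string labels. As an assumption to verify against your installed version, more recent xgboost releases expect integer class labels (0, 1, ...) instead. If so, one way to adapt is to encode the labels before fitting and decode the predictions afterwards. The sketch below uses plain Python lists and hypothetical helper names; with a pandas Series, `y_train.map(label_to_int)` would do the same job.

```python
# Mapping between the string labels in 'stabf' and integer class ids
label_to_int = {'stable': 0, 'unstable': 1}
int_to_label = {code: label for label, code in label_to_int.items()}

def encode(labels):
    """Turn 'stable'/'unstable' strings into 0/1 integers."""
    return [label_to_int[label] for label in labels]

def decode(codes):
    """Turn 0/1 predictions back into the original string labels."""
    return [int_to_label[int(code)] for code in codes]

y_example = ['unstable', 'stable', 'unstable']
encoded = encode(y_example)
decoded = decode(encoded)
print(encoded, decoded)
```

Accuracy is unaffected by the encoding as long as the true labels and the predictions are compared in the same representation.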


### LGBM Classifier

In [19]:
from lightgbm import LGBMClassifier

lgbm = LGBMClassifier(random_state = 1)

#fit on train set
lgbm.fit(x_train_scaled, y_train)

#predict on test set
lgbm_pred = lgbm.predict(x_test_scaled)

In [20]:
# model accuracy
lgbm_accuracy = accuracy_score(y_test, lgbm_pred)
print('Accuracy: {}'.format(lgbm_accuracy))

Accuracy: 0.9375


### ExtraTreesClassifier

In [21]:
# importing extra tree classifier
from sklearn.ensemble import ExtraTreesClassifier
ETC = ExtraTreesClassifier(random_state=1)

#fit train set
ETC.fit(x_train_scaler, y_train)

# predict test set
ETC_pred = ETC.predict(x_test_scaler)


In [22]:
# accuracy of test set
ETC_accuracy = accuracy_score(y_true=y_test, y_pred=ETC_pred)
print('Accuracy: {}'.format(ETC_accuracy))

Accuracy: 0.928


### Hyperparameter tuning with RandomizedSearchCV

In [23]:
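# candidate values for each hyperparameter in the randomized search
# note: max_features='auto' is only accepted by older scikit-learn releases
#       (deprecated in 1.1, removed in 1.3); drop it on newer versions

n_estimators = [50, 100, 300, 500, 1000]
min_samples_split = [2, 3, 5, 7, 9]
min_samples_leaf = [1, 2, 4, 6, 8]
max_features = ['auto', 'sqrt', 'log2', None]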

hyperparameters = {'n_estimators': n_estimators,'min_samples_leaf': min_samples_leaf,
                   'min_samples_split': min_samples_split,'max_features': max_features}

In [24]:
#import randomized search CV
from sklearn.model_selection import RandomizedSearchCV
RS_CV =  RandomizedSearchCV(ETC,hyperparameters, cv=5, n_iter=10, scoring = 'accuracy', n_jobs = -1, verbose = 1, random_state = 1)

#fit train set
RS_CV1 = RS_CV.fit(x_train_scaler, y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
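
The log line above follows from the search settings: 10 sampled candidates times 5 cross-validation folds gives 50 fits, out of a much larger space of possible combinations. The sketch below enumerates that space and draws 10 candidates with Python's `random` module. It only illustrates the idea and is not expected to reproduce the candidates that scikit-learn's own sampler draws.

```python
from itertools import product
import random

search_space = {
    'n_estimators': [50, 100, 300, 500, 1000],
    'min_samples_split': [2, 3, 5, 7, 9],
    'min_samples_leaf': [1, 2, 4, 6, 8],
    'max_features': ['auto', 'sqrt', 'log2', None],
}

# Every combination an exhaustive grid search would have to try
names = list(search_space)
grid = [dict(zip(names, combo)) for combo in product(*search_space.values())]

# A randomized search evaluates only a fixed number of sampled candidates
rng = random.Random(1)
candidates = rng.sample(grid, 10)

total_fits = len(candidates) * 5  # 5-fold cross-validation per candidate
print(len(grid), len(candidates), total_fits)
```

With only 10 of 500 combinations evaluated, the reported best parameters are the best among those sampled, not necessarily the best in the whole space.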


In [25]:
# best hyperparameters from the randomized search CV
RS_CV1.best_params_

{'n_estimators': 1000,
 'min_samples_split': 2,
 'min_samples_leaf': 8,
 'max_features': None}

Retrain the ExtraTreesClassifier with the best hyperparameters found by the randomized search.

In [26]:
#new optimal ExtraTreesClassifier
from sklearn.ensemble import ExtraTreesClassifier
ETC1 = ExtraTreesClassifier(max_features=None,min_samples_leaf=8,min_samples_split=2,n_estimators=1000,random_state=1)

#fit train set
ETC1.fit(x_train_scaler, y_train)

# predict test set
ETC1_pred = ETC1.predict(x_test_scaler)

In [27]:
# model accuracy
etc_accuracy = accuracy_score(y_test, ETC1_pred)
print('Accuracy: {}'.format(etc_accuracy))

Accuracy: 0.927


The accuracy of the tuned ExtraTreesClassifier (0.927) is **lower** than that of the initial ExtraTreesClassifier with no hyperparameter tuning (0.928).
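
To put the reported accuracies in perspective, the sketch below converts each one into a count of correctly classified test samples, given the 2000-row test set, and picks the best model. The dictionary keys are descriptive labels chosen here for illustration; the numbers are the accuracies printed in the cells above.

```python
n_test = 2000  # size of the test set (20% of 10000 rows)

accuracies = {
    'RandomForestClassifier': 0.929,
    'XGBClassifier': 0.9195,
    'LGBMClassifier': 0.9375,
    'ExtraTreesClassifier (default)': 0.928,
    'ExtraTreesClassifier (tuned)': 0.927,
}

# Number of correctly classified test samples per model
correct = {name: round(acc * n_test) for name, acc in accuracies.items()}

best_model = max(accuracies, key=accuracies.get)
etc_gap = (correct['ExtraTreesClassifier (default)']
           - correct['ExtraTreesClassifier (tuned)'])
print(best_model, etc_gap)
```

The gap between the default and tuned extra trees models amounts to two test samples, so the difference is small and may well fall within the noise of a single train-test split.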

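Most and least important features of the ExtraTreesClassifier model.

In [28]:
importance_feature  = X.columns

# feature importances, sorted in ascending order
# note: these are read from ETC (the initial model); use ETC1.feature_importances_
# to inspect the tuned model instead
feature = pd.DataFrame(ETC.feature_importances_, index=importance_feature)
feature1 = feature.sort_values(0)
feature1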

,0
p1,0.039507
p2,0.040371
p4,0.040579
p3,0.040706
g1,0.089783
g2,0.093676
g4,0.094019
g3,0.096883
tau3,0.113169
tau4,0.115466


In [29]:
# most important feature
print('most important feature: {}'.format(feature1.idxmax()))

# least important feature
print('least important feature: {}'.format(feature1.idxmin()))

most important feature: 0    tau2
dtype: object
least important feature: 0    p1
dtype: object
