<a href="https://colab.research.google.com/github/ednamilgo/EASY_ML/blob/main/Classification_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Building a Classification Model

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Steps:

1. Load the dataset.
2. Preprocess the data: remove empty columns and null values
3. Split into features/input and target/output variable
4. Split the dataset into training and testing sets
5. Create a classification model
6. Fit the model to the dataset
7. Make predictions using the model
8. Evaluate the model

# Step 1: Load the Electrical Fault Detection Dataset

We will use the Electrical Fault Detection dataset. The dataset has been shared and can also be downloaded from https://www.kaggle.com/datasets/esathyaprakash/electrical-fault-detection-and-classification?select=detect_dataset.csv


The electrical power system comprises numerous complex, dynamic, and interdependent elements that are constantly susceptible to disturbances or faults. Faults in transmission lines must be accurately detected, classified, and resolved in the shortest possible time to ensure the system's reliability and efficiency.

The dataset contains the following:

Inputs - [Ia, Ib, Ic, Va, Vb, Vc]

Outputs - 0 (no fault) or 1 (fault is present)

In [None]:
detectfalt_df = pd.read_csv('detect_dataset.csv')

In [None]:
# Inspect the first 5 rows to understand the structure
detectfalt_df.head()

Unnamed: 0,Output (S),Ia,Ib,Ic,Va,Vb,Vc,Unnamed: 7,Unnamed: 8
0,0,-170.472196,9.219613,161.252583,0.05449,-0.659921,0.605431,,
1,0,-122.235754,6.168667,116.067087,0.102,-0.628612,0.526202,,
2,0,-90.161474,3.813632,86.347841,0.141026,-0.605277,0.464251,,
3,0,-79.904916,2.398803,77.506112,0.156272,-0.602235,0.445963,,
4,0,-63.885255,0.590667,63.294587,0.180451,-0.591501,0.41105,,
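The two trailing Unnamed columns in the output above are empty. One way to skip them at load time is the `usecols` parameter of `pd.read_csv`. The sketch below uses a small in-memory CSV with made-up rows as a stand-in for detect_dataset.csv; the trailing commas are an assumption about how the empty columns arise in the real file.

```python
import io
import pandas as pd

# Toy CSV standing in for detect_dataset.csv (rows are illustrative only).
# The trailing commas create two empty, unnamed columns.
toy_csv = io.StringIO(
    "Output (S),Ia,Ib,Ic,Va,Vb,Vc,,\n"
    "0,-170.47,9.22,161.25,0.054,-0.660,0.605,,\n"
    "1,12.30,-40.10,27.80,0.210,-0.105,-0.105,,\n"
)

# Read only the columns we actually need
wanted = ['Output (S)', 'Ia', 'Ib', 'Ic', 'Va', 'Vb', 'Vc']
toy_df = pd.read_csv(toy_csv, usecols=wanted)

print(toy_df.columns.tolist())
```

With this approach the empty columns never enter the DataFrame, so no later drop is needed.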


In [None]:
detectfalt_df.info()  # Show column names, non-null counts, and data types

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12001 entries, 0 to 12000
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Output (S)  12001 non-null  int64  
 1   Ia          12001 non-null  float64
 2   Ib          12001 non-null  float64
 3   Ic          12001 non-null  float64
 4   Va          12001 non-null  float64
 5   Vb          12001 non-null  float64
 6   Vc          12001 non-null  float64
 7   Unnamed: 7  0 non-null      float64
 8   Unnamed: 8  0 non-null      float64
dtypes: float64(8), int64(1)
memory usage: 843.9 KB


In [None]:
detectfalt_df.describe()

Unnamed: 0,Output (S),Ia,Ib,Ic,Va,Vb,Vc,Unnamed: 7,Unnamed: 8
count,12001.0,12001.0,12001.0,12001.0,12001.0,12001.0,12001.0,0.0,0.0
mean,0.457962,6.709369,-26.557793,22.353043,0.010517,-0.015498,0.00498,,
std,0.49825,377.15847,357.458613,302.052809,0.346221,0.357644,0.349272,,
min,0.0,-883.542316,-900.526951,-883.357762,-0.620748,-0.659921,-0.612709,,
25%,0.0,-64.348986,-51.421937,-54.562257,-0.23761,-0.313721,-0.278951,,
50%,0.0,-3.239788,4.711283,-0.399419,0.002465,-0.007192,0.008381,,
75%,1.0,53.823453,69.637787,45.274542,0.285078,0.248681,0.289681,,
max,1.0,885.738571,889.868884,901.274261,0.609864,0.627875,0.608243,,


# Step 2: Data Preprocessing

Different preprocessing strategies apply to different datasets. In this dataset, for example, the columns Unnamed: 7 and Unnamed: 8 contain no non-null values (see the info() output above), so they are not useful and we can remove them.

In [None]:
# Remove the two empty columns in a single call
detectfalt_df = detectfalt_df.drop(columns=['Unnamed: 7', 'Unnamed: 8'])
print(detectfalt_df.head())  # Print the first 5 rows to confirm the columns have been removed

   Output (S)          Ia        Ib          Ic        Va        Vb        Vc
0           0 -170.472196  9.219613  161.252583  0.054490 -0.659921  0.605431
1           0 -122.235754  6.168667  116.067087  0.102000 -0.628612  0.526202
2           0  -90.161474  3.813632   86.347841  0.141026 -0.605277  0.464251
3           0  -79.904916  2.398803   77.506112  0.156272 -0.602235  0.445963
4           0  -63.885255  0.590667   63.294587  0.180451 -0.591501  0.411050
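Naming each empty column works when the names are known. A more general sketch, assuming the goal is to remove every column that holds only missing values and then any row that still has a gap, uses `dropna`. The DataFrame here is toy data, not the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (values are illustrative only)
toy_df = pd.DataFrame({
    'Output (S)': [0, 1, 0],
    'Ia': [-170.5, 12.3, 45.0],
    'Va': [0.05, -0.21, 0.33],
    'Unnamed: 7': [np.nan, np.nan, np.nan],
    'Unnamed: 8': [np.nan, np.nan, np.nan],
})

# Drop every column whose values are all missing
cleaned_df = toy_df.dropna(axis=1, how='all')

# Then drop any remaining row that contains a missing value
cleaned_df = cleaned_df.dropna(axis=0, how='any')

print(cleaned_df.columns.tolist())
```

The order matters: dropping rows first would remove every row, because each row has missing values in the empty columns.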


# Step 3: Split the dataset into input and output

We need to separate the dataset into the features (X) and the target variable (y). The target variable is the Output (S) column.

In [None]:
# Separate features (X) and target variable (y)
X = detectfalt_df[['Ia', 'Ib', 'Ic', 'Va', 'Vb', 'Vc']]  # All columns except 'Output (S)'
y = detectfalt_df['Output (S)']  # The target variable is the Output (S) column
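Before splitting, it can help to check how the two classes are distributed in the target. The describe() output above reports a mean of about 0.46 for Output (S), which means roughly 46% of rows are labeled 1. A minimal sketch of the same check with `value_counts`, using a toy target series rather than the real column:

```python
import pandas as pd

# Toy target column (illustrative values, not the real dataset)
toy_y = pd.Series([0, 0, 1, 0, 1, 1, 0, 0], name='Output (S)')

# Fraction of samples that fall in each class
class_share = toy_y.value_counts(normalize=True).sort_index()
print(class_share)
```

On the real data, the equivalent call would be made on y.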

# Step 4: Split the dataset into training and testing sets

Next, we will split the inputs and outputs into training and testing sets. We use a common ratio of 80% for training and 20% for testing, set with test_size=0.2.



In [None]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Check the shape of the training and test sets
(X_train.shape, X_test.shape)

((9600, 6), (2401, 6))
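A random split does not guarantee that both sets have the same share of fault and no-fault rows. One option is the `stratify` argument of `train_test_split`, which keeps the class proportions the same in both sets. The sketch below uses synthetic data; the names X_toy and y_toy are placeholders, not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 60 of class 0 and 40 of class 1
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(100, 2))
y_toy = np.array([0] * 60 + [1] * 40)

# stratify=y_toy preserves the 60/40 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Share of class 1 in the training and test sets
print(y_tr.mean(), y_te.mean())
```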

# Step 5: Building the Model

We will initialize a logistic regression model with scikit-learn's default settings.

In [None]:
# Initialize the Logistic Regression model
lr_model = LogisticRegression()

# Step 6: Train the model on the training data

In [None]:
# Train the model on the training data
lr_model.fit(X_train, y_train)
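The describe() output above shows currents ranging over roughly ±900 while voltages stay within about ±0.6. Logistic regression can be sensitive to such differences in scale, so one optional variant is to standardize the features inside a pipeline. This is a sketch on synthetic data, with made-up names such as scaled_lr; whether it changes the accuracy on the real dataset is not tested here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in features on very different scales
rng = np.random.default_rng(0)
currents = rng.normal(0, 300, size=(200, 3))
voltages = rng.normal(0, 0.3, size=(200, 3))
X_toy = np.hstack([currents, voltages])

# Synthetic label that depends on the first voltage column
y_toy = (voltages[:, 0] > 0).astype(int)

# The scaler is fitted together with the model, on the training data only
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression())
scaled_lr.fit(X_toy, y_toy)

# Accuracy on the same toy data the pipeline was trained on
print(scaled_lr.score(X_toy, y_toy))
```

Putting the scaler in the pipeline means the same transformation is applied automatically when predicting on the test set.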

# Step 7: Make predictions on the test set

In [None]:
# Make predictions on the test set
y_pred = lr_model.predict(X_test)
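The predict method returns a class label (0 or 1) for each row. A fitted LogisticRegression also offers `predict_proba`, which returns the estimated probability of each class. A small sketch on toy data, where toy_model and X_new are hypothetical names:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy one-feature dataset: negative values are class 0, positive are class 1
X_toy = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
toy_model = LogisticRegression().fit(X_toy, y_toy)

X_new = np.array([[-1.5], [1.5]])
labels = toy_model.predict(X_new)       # hard class labels
probs = toy_model.predict_proba(X_new)  # one column per class: P(class 0), P(class 1)

print(labels)
print(probs)
```

Each row of probs sums to 1, and predict picks the class with the higher probability.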

# Step 8: Model Evaluation

Next, we will evaluate the model's performance using accuracy_score, which reports the fraction of test samples whose predicted label matches the true label.

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

Accuracy: 0.74
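Accuracy alone does not show which kind of mistake the model makes. For context, the describe() output reports a mean of about 0.46 for Output (S), so a model that always predicts 0 would already score about 0.54 on the full dataset (the share in the test split may differ slightly). A confusion matrix breaks the result down by class. The sketch below uses made-up labels, not the notebook's y_test and y_pred.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy true and predicted labels (illustrative only)
y_true_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred_toy = np.array([0, 0, 0, 1, 1, 1, 0, 0])

acc = accuracy_score(y_true_toy, y_pred_toy)

# Rows are true classes, columns are predicted classes:
# [[true 0 predicted 0, true 0 predicted 1],
#  [true 1 predicted 0, true 1 predicted 1]]
cm = confusion_matrix(y_true_toy, y_pred_toy)

print(f'Accuracy: {acc:.3f}')
print(cm)
```

In a fault detection setting, the bottom-left cell (faults predicted as no fault) is often the number to watch.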
