# EDA on Modbus Network Dataset

## Data Cleaning

#### Importing pandas and numpy, and reading the dataset into a pandas DataFrame :

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('../data/Modbus_Data_EXP.csv')

In [None]:
df.head()

#### Collecting basic information about the dataset :

In [None]:
df.info()

The DataFrame contains 311,496 rows and 102 columns: 7 columns with float dtype, 83 columns with int dtype, and 12 columns with object dtype.
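
The column counts per dtype can also be computed rather than read off the `df.info()` output. The following is a minimal sketch on a hypothetical three-column frame (`toy` and its column names are made up for illustration and are not taken from the Modbus data):

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
toy = pd.DataFrame({
    "duration": [0.5, 1.2, 0.1],
    "packets": [3, 7, 1],
    "protocol": ["tcp", "tcp", "udp"],
})

# Number of columns for each dtype present in the frame.
print(toy.dtypes.value_counts())

# Split the column names into numeric and non-numeric groups.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(toy.shape, numeric_cols, other_cols)
```

Running the same calls on `df` should reproduce the counts listed above.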

In [None]:
df.describe()

In [None]:
for col in df:
    print(col, df[col].dtype)

#### Checking the null values :

In [None]:
for col in df:
    print(col, df[col].isnull().sum())

The DataFrame has notable null values only in the last 9 columns, 3 of which are entirely null.
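
With 102 columns, the per-column printout is long. One possible way to condense it is to keep only the columns that actually contain nulls and to flag the fully null ones. This is a sketch on a hypothetical frame (`toy`, `null_report` and `fully_null` are illustrative names, not part of the original notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one complete column, one partly null, one fully null.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1.0, np.nan, 3.0, np.nan],
    "c": [np.nan] * 4,
})

# Null count and null fraction per column.
null_counts = toy.isnull().sum()
null_report = pd.DataFrame({
    "nulls": null_counts,
    "fraction": null_counts / len(toy),
})

# Keep only the columns that contain at least one null.
null_report = null_report[null_report["nulls"] > 0]

# Columns in which every value is null.
fully_null = null_report.index[null_report["fraction"] == 1.0].tolist()
print(null_report)
print(fully_null)
```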

#### Checking the unique values :

In [None]:
for col in df:
    print(col, df[col].nunique())

#### Dealing with null values by deletion :

In [None]:
# Drop every 'Unnamed' column whose values are all null.
for col in df.columns:
    if 'Unnamed' in col and df[col].isna().all():
        df = df.drop(columns=[col])

This deletes the Unnamed columns in which every value is null.
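
The same deletion can be written without a loop by collecting the column names first and dropping them in one call. A minimal sketch on a hypothetical frame (the column names `count`, `Unnamed: 3` and `Unnamed: 4` are invented for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one fully null Unnamed column, one partly null.
toy = pd.DataFrame({
    "count": [1, 2, 3],
    "Unnamed: 3": [np.nan, np.nan, np.nan],
    "Unnamed: 4": [0.5, np.nan, 0.7],
})

# Unnamed columns in which every value is null.
empty_unnamed = [
    col for col in toy.columns
    if "Unnamed" in col and toy[col].isna().all()
]

# Drop them all at once; partly filled Unnamed columns are kept.
cleaned = toy.drop(columns=empty_unnamed)
print(cleaned.columns.tolist())
```

Note that `toy.dropna(axis=1, how="all")` would be shorter, but it removes every fully null column regardless of its name, which is a slightly different rule from the one used here.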

In [None]:
for col in df:
    print(col, df[col].isnull().sum())

#### Dealing with null values using the mean and mode :

In [None]:
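# Fill nulls with the column mean. Assign the result back instead of calling
# fillna(inplace=True) on a column selection, which may not update df.
mean_value = df['count_of_write_op.1'].mean()
df['count_of_write_op.1'] = df['count_of_write_op.1'].fillna(mean_value)

mean_value = df['count_of_read_op.1'].mean()
df['count_of_read_op.1'] = df['count_of_read_op.1'].fillna(mean_value)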

The null values in these columns were replaced with the column mean, because most of their values lie between 0 and 1 and the columns may be important to the dataset.

In [None]:
# Fill nulls with the column mode (the most frequent value).
mode_value = df['count_of_other_frame.1'].mode().iloc[0]
df['count_of_other_frame.1'] = df['count_of_other_frame.1'].fillna(mode_value)

mode_value = df['count_of_diagnostic_op.1'].mode().iloc[0]
df['count_of_diagnostic_op.1'] = df['count_of_diagnostic_op.1'].fillna(mode_value)

The null values in these columns were replaced with the column mode, because most of their values are either 0 or 1 and the columns may be important to the dataset.
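
The two imputation rules differ only in the fill value, so they can be expressed through one small helper. The sketch below is illustrative: `fill_with`, `toy`, and the columns `ratio` and `flag` are hypothetical names, with `ratio` standing in for a column of values between 0 and 1 and `flag` for a column of 0/1 values.

```python
import numpy as np
import pandas as pd

def fill_with(frame, column, strategy):
    """Return a copy of frame with nulls in column filled by its mean or mode."""
    if strategy == "mean":
        value = frame[column].mean()
    elif strategy == "mode":
        # mode() returns a Series; take the first (most frequent) value.
        value = frame[column].mode().iloc[0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return frame.assign(**{column: frame[column].fillna(value)})

# Hypothetical toy frame.
toy = pd.DataFrame({
    "ratio": [0.25, np.nan, 0.75, 0.5],
    "flag": [0, 0, 1, np.nan],
})

filled = fill_with(toy, "ratio", "mean")
filled = fill_with(filled, "flag", "mode")
print(filled)
```

The mean keeps a fractional column fractional, while the mode keeps a 0/1 column at one of its two observed values, which matches the reasoning given for each pair of columns.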

In [None]:
for col in df:
    print(col, df[col].isnull().sum())

No null values remain in the dataset.
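
Instead of scanning the printed counts by eye, the check can be made explicit. A minimal sketch, where `columns_with_nulls` is a hypothetical helper and `clean` and `dirty` are toy frames:

```python
import numpy as np
import pandas as pd

def columns_with_nulls(frame):
    """Return the names of columns that still contain at least one null."""
    counts = frame.isnull().sum()
    return counts[counts > 0].index.tolist()

# Hypothetical toy frames: one without nulls, one with a single null.
clean = pd.DataFrame({"a": [1, 2], "b": [0.5, 0.7]})
dirty = pd.DataFrame({"a": [1, 2], "b": [0.5, np.nan]})

print(columns_with_nulls(clean))
print(columns_with_nulls(dirty))
```

An empty list for `df` would confirm the statement above.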

#### Renaming the Unnamed columns :

In [None]:
unnamed_columns = [col for col in df.columns if 'Unnamed' in col]
new_column_names = ['NewName1', 'NewName2']
# Map the remaining Unnamed columns to the new names in order; any Unnamed
# column beyond the names provided is left unchanged.
df = df.rename(columns=dict(zip(unnamed_columns, new_column_names)))

In [None]:
df.info()

The DataFrame is now cleaned; its final structure is shown in the output above.

#### Saving the cleaned df into a csv :

In [None]:
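# index=False keeps the row index out of the file, so reading it back
# does not produce an extra 'Unnamed: 0' column.
df.to_csv('Cleaned_Modbus_data.csv', index=False)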
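
By default, `to_csv` writes the row index as an extra first column with an empty header, and `read_csv` then names that column `Unnamed: 0`. The sketch below shows the effect on a hypothetical one-column frame, using an in-memory buffer instead of a file:

```python
import io

import pandas as pd

# Hypothetical toy frame.
toy = pd.DataFrame({"a": [1, 2]})

# Round trip with the index written (the to_csv default).
with_index = pd.read_csv(io.StringIO(toy.to_csv()))

# Round trip with the index left out.
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

Passing `index=False` when saving avoids having to drop that column again the next time the cleaned file is loaded.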

Data cleaning done!