# Gear Box Fault Detection

The Gearbox Fault Diagnosis dataset contains vibration data recorded with SpectraQuest’s Gearbox Fault Diagnostics Simulator. The data were recorded with four vibration sensors placed in four different directions, under loads varying from 0 to 90 percent. Two scenarios were recorded:

1) Healthy condition

2) Broken tooth condition


In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [21]:
# name_list = []
# for root, dirs, files in os.walk("."):  
#     for filename in files:
#         name_list.append(filename)
        
# name_list.remove('Untitled.ipynb')
# name_list.remove('.DS_Store')
# name_list.remove('Untitled-checkpoint.ipynb')

### 1 Load_data function

The Load_data function loads a data file and returns it as a pandas DataFrame. The file may have a '.txt' or '.csv' extension. The function is meant to be called from the Join_table function. Its only input argument is:

- file_path = file name without the location of the file. Keep the files in the same directory as the Python file.
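The slicing used in Load_data implies a file-name pattern: one letter for the condition (b or h), two digits for the frequency, two further characters, then the load percentage. The sketch below illustrates that parsing on its own. The helper `parse_name` and the example file name are hypothetical, and the pattern is inferred from the slicing rather than from a documented naming scheme.

```python
def parse_name(file_name):
    """Hypothetical helper: decode condition, frequency and load from a file name.

    Assumes names shaped like "b30hz50.txt" (inferred from the slicing in
    Load_data, not from a documented naming scheme).
    """
    stem, ext = file_name.lower().split(".")
    return {
        "fault": 1 if stem[0] == "b" else 0,  # b = broken tooth, h = healthy
        "freq": int(stem[1:3]),               # two digits after the first letter
        "load": int(stem[5:]),                # everything from the 6th character on
        "ext": ext,
    }

info = parse_name("h30hz90.csv")
print(info)
```

A name with a directory prefix or more than one dot would not fit this pattern, which is why the files are expected to sit next to the notebook.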

In [46]:

def Load_data(file_path):
    # Split the file name into its stem and its extension.
    path = file_path.lower().split(".")
    name, ext = path[0], path[1]

    # Both formats hold the same table; only the column separator differs.
    if ext == "csv":
        sep = ","
    elif ext == "txt":
        sep = "\t"
    else:
        # Fail early instead of returning an undefined DataFrame.
        raise ValueError("Invalid file format. Make sure the file is either csv or txt.")

    freq = int(name[1:3])   # characters 1-2 of the name hold the frequency
    load = int(name[5:])    # characters from position 5 on hold the load percentage

    df = pd.read_csv(file_path, header=None, sep=sep)
    df.columns = ["S1", "S2", "S3", "S4", "S5"]
    df = df.drop(["S5"], axis=1)   # the fifth column is not used
    df["Freq"] = freq
    df["Load_percent"] = load

    # The first character of the name encodes the condition: b = broken, h = healthy.
    if name[0] == "b":
        df["Fault"] = 1
    elif name[0] == "h":
        df["Fault"] = 0

    return df

### 2 Join_table function

The Join_table function loads every file in the list passed as its argument and concatenates them into a single pandas DataFrame. Its only argument is:

- file_list = list of all the files to be loaded. Every file extension must be either ".txt" or ".csv".
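Stacking tables row-wise leaves each piece with its own 0-based index, so the index has to be rebuilt after concatenation. A small sketch of that step on two made-up frames (the values are placeholders, not data from the experiment):

```python
import pandas as pd

# Two tiny stand-ins for loaded files (values are made up for illustration).
a = pd.DataFrame({"S1": [0.1, 0.2], "Load_percent": [0, 0]})
b = pd.DataFrame({"S1": [0.3, 0.4], "Load_percent": [10, 10]})

# ignore_index=True renumbers the rows 0..n-1, so no separate reset_index is needed.
joined = pd.concat([a, b], axis=0, ignore_index=True)
print(joined)
```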

In [47]:
def Join_table(file_list):
    # Load every file and stack the tables row-wise.
    frames = [Load_data(file) for file in file_list]
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    df = pd.concat(frames, axis=0, ignore_index=True)

    return df

### 3 Trimming function

The dataset consists of 10 files of faulty gearbox data and 10 files of healthy gearbox data, one for each load percentage. The files contain different numbers of data points because the experiments ran for different lengths of time. To give every load percentage the same number of rows, this function keeps the first 87000 data points from each file, that is, 290 chunks of 300 points:
$$ 290 \times 300 = 87000 $$
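The same idea on a toy table, as a sketch: each load keeps only its first 3 rows instead of 87000. The values are made up, and `groupby(...).head(...)` is used here as a compact alternative to looping over the load values.

```python
import pandas as pd

# Toy data: load 0 has 5 rows, load 10 has 4 rows (made-up values).
toy = pd.DataFrame({
    "S1": range(9),
    "Load_percent": [0] * 5 + [10] * 4,
})

keep = 3  # stands in for 87000 = 290 * 300
# groupby(...).head(keep) keeps the first `keep` rows of every load group.
trimmed = toy.groupby("Load_percent").head(keep)
print(trimmed)
```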

In [113]:
## Function to trim each load to 87000 points. 
def Trimming(data):
    
    empty = data[:0]
    for load in np.linspace(0,90,10):
        dff = data[data["Load_percent"] == load ]
        df = dff[:87000]
        empty = pd.concat([empty, df], axis = 0)
    
    return empty

### 4 Feature Creation from frequency domain dataset

The shape of the dataset after trimming is $(1740000, 7)$. The dataset is separated on the basis of fault, i.e. "Healthy" or "Broken". From this data we create the features used to train machine learning algorithms to predict the type of fault. The Feature_create function builds these features with the help of the following sub-functions:

1) FFT: converts a signal from the time domain to the frequency domain. It takes a pandas DataFrame as input and returns the absolute value of the FFT as a pandas DataFrame.

2) Concate_features: a sub-function of the RMS function. It rearranges the RMS values of each sensor into a single row and renames the columns as `S*_F#`, where * is the sensor number and # is the frequency number (ranging from 1 to 15 Hz).

3) RMS: takes the RMS value of all the data points that lie in the frequency bin of a single frequency. In this setting, each bin contains 10 data points.


Using the functions above, Feature_create builds the features for the given dataset.
The output has 60 features, 15 for each sensor.
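The binning described above can be sketched with plain NumPy for a single 300-sample window. This is an illustration on synthetic random data, not the notebook's own implementation; the array names are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
window = rng.normal(size=(300, 4))      # synthetic stand-in: 300 samples x 4 sensors

# Magnitude of the FFT along time; keep the first 150 values.
mag = np.abs(np.fft.fft(window, axis=0))[:150]

# Group the 150 values into 15 bins of 10 points each, per sensor.
bins = mag.reshape(15, 10, 4)

# RMS inside each bin: sqrt(mean(a^2)) -> shape (15 bins, 4 sensors).
rms = np.sqrt((bins ** 2).mean(axis=1))

# Flatten sensor by sensor: S1_F1..S1_F15, S2_F1.., matching the column naming.
features = rms.T.reshape(-1)
names = [f"S{s}_F{f}" for s in range(1, 5) for f in range(1, 16)]
print(len(names), features.shape)
```

Each window therefore contributes one row of 60 values, 15 per sensor.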

The logic of the function is as follows:

- Separate the data on the basis of load percentage. Each subset should have 87000 rows.

- Split each subset into chunks of 300 data points. This gives 290 chunks per subset.

- Apply the FFT to each chunk, keep the first 150 values, and take the RMS of each bin of 10 values.
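The chunking step can be checked on a placeholder array. The sketch below assumes one load's data is already a plain array of 87000 rows and 4 sensor columns; the values are just a running counter.

```python
import numpy as np

n_rows, window = 87000, 300
signal = np.arange(n_rows * 4).reshape(n_rows, 4)   # placeholder for one load's S1..S4

# 87000 / 300 = 290 non-overlapping windows of 300 samples each.
chunks = signal.reshape(n_rows // window, window, 4)
print(chunks.shape)
```

Window `n` then covers rows `n*300` to `(n+1)*300 - 1`, the same rows a positional slice selects.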


In [157]:
## 4.1
def FFT(data):
    col = data.columns.values
    df_fft = np.fft.fft(data, axis = 0)
    abs_df = pd.DataFrame(abs(df_fft), columns= col)
    
    return abs_df

In [258]:
## 4.2
def Concate_features(x):
    index_list = ["S" + str(n) for n in np.arange(1,5)]
    dff = pd.DataFrame({})
    for index in index_list:

        xx = x.loc[[index]]
        col = [index + "_" + "F" + str(n) for n in np.arange(1,16)]
        xx.columns = col 
        xx = xx.reset_index().drop(["index"], axis = 1)
        dff = pd.concat([dff, xx], axis = 1)
    return dff

In [319]:
## 4.3

def RMS(data):

    # One column per frequency bin: F1 ... F15.
    col = ["F" + str(n) for n in np.arange(1, 16)]

    bins = []
    for n in np.arange(0, 15):
        # Rows n*10 ... n*10+9 form one bin of 10 FFT magnitudes.
        y = data[n*10:(n+1)*10]
        # Root mean square per sensor: sqrt(sum(a^2) / 10).
        y = np.sqrt((y**2).sum() / 10).to_frame(col[n])
        bins.append(y)

    # Rows are the sensors (S1..S4), columns are the bins (F1..F15).
    x = pd.concat(bins, axis=1)

    df = Concate_features(x)
    return df

In [321]:
## 4.0

def Feature_create(data):
    df = pd.DataFrame({})
    for load in np.linspace(0,90,10):
        dff = data[data["Load_percent"] == load ]
        X = dff[['S1', 'S2', 'S3', 'S4']]
        for n in range(290):
            time_domain = X[n*300:(n+1)*300]
            freq_domain = FFT(time_domain)
            f = freq_domain[0:150]
            f = RMS(f)
            
            df = pd.concat([df, f], axis = 0 )
            
    return df 

In [322]:
"""Loading the file name list. The File_name.csv is the file with all the file names created separately """

name_list = pd.read_csv("File_name.csv", header = None).T
name_list = name_list.values
name_list = list(name_list[0])

In [323]:
"""Code 2 """
data = Join_table(file_list= name_list)

In [325]:
"""Separating the dataset on the basis of Fault, i.e. healthy and broken."""
healthy = data[data["Fault"] == 0]
broken = data[data["Fault"] == 1]

In [326]:
"""Code 3"""
df_health = Trimming(healthy)
df_broken = Trimming(broken)

In [327]:
"""Code 4 """

health_data = Feature_create(df_health)
health_data = health_data.reset_index().drop(["index"], axis=1) ## Reset the index in order to concat.
broken_data = Feature_create(df_broken)
broken_data = broken_data.reset_index().drop(["index"], axis=1) ## Reset the index in order to concat.

In [328]:
"""Creating Load_percentage data to join the main dataset."""

# Each load value (0, 10, ..., 90) is repeated 290 times, once per window.
load_data = pd.DataFrame({"Load": np.repeat(np.arange(0, 100, 10), 290)})

In [330]:
"""Concatenating Load_percentage and Fault data with the healthy and broken dataset"""

healthy = pd.concat([health_data, load_data], axis = 1)
broken = pd.concat([broken_data, load_data], axis = 1)

healthy["Fault"] = [0]*healthy.shape[0]
broken["Fault"] = [1]*broken.shape[0]

In [331]:
final_data = pd.concat([healthy, broken], axis = 0)

In [333]:
final_data.to_csv("Processed_final_data.csv", index= False)
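The expected size of the saved table follows from the steps above: 290 windows times 10 loads times 2 conditions gives 5800 rows, and 60 features plus the Load and Fault columns give 62 columns. A small sketch of that check, where `check_shape` is a hypothetical helper and the table is a zero-filled stand-in rather than the real output:

```python
import numpy as np
import pandas as pd

def check_shape(df, windows=290, loads=10, conditions=2, features=60):
    """Hypothetical helper: compare a table's shape with the expected one."""
    expected = (windows * loads * conditions, features + 2)  # + Load and Fault
    return df.shape == expected, expected

# Stand-in with the expected dimensions; in practice pass
# pd.read_csv("Processed_final_data.csv") instead.
dummy = pd.DataFrame(np.zeros((5800, 62)))
ok, expected = check_shape(dummy)
print(ok, expected)
```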