# Reading the original data.
In this file, we show how to use the utility functions provided in utility.py to read the data and perform basic preprocessing. The function get_data_from_excel imports the data from the Excel file.

Syntax: df_org = get_data_from_excel(filename)
* filename: A string giving the path of the file to be read.
* df_org: Dataframe containing the original data from the data sheet.

Here is the script for importing data using the utility functions.

In [1]:
from utility import get_data_from_excel

filename = r'C:\Users\Zhiguo\OneDrive - CentraleSupelec\Code\Python\ge_case_study\2022_ST4\XFD_freq_replacement - Names.xlsx'
df_org = get_data_from_excel(filename)
display(df_org)

Unnamed: 0,SN,MFG date,Repair PO,Repair Date,V700 Mosfet,Gate driver,Plastic Protection,V700 busbar,V700 filter board,EDLC,...,IGBT,V700 Rectifier board,Total Components,NbTime repaired,Previous repair po,Previous Repair Date,Failure_date,ITEM,Install_date,Duration(days)
0,101096WH6,2016-12-08,760137409,2020-08-05,0,0,0,0,0,1,...,0,0,1,0,MFG,2016-12-08,2020-06-12,5341543-2-R,2016-12-08,1282
1,101097WH4,2016-12-08,600809572,2019-05-24,0,0,0,0,0,1,...,0,0,1,0,MFG,2016-12-08,2019-05-24,5341543-2-R,2016-12-08,897
2,101100WH6,2016-12-08,600816726,2019-11-04,0,0,0,0,0,1,...,0,0,1,0,MFG,2016-12-08,2019-01-22,5341543-2-R,2016-12-08,775
3,101101WH4,2016-12-08,600850651,2020-02-03,0,0,0,0,0,1,...,0,0,1,0,MFG,2016-12-08,2020-02-03,5341543-2-R,2016-12-08,1152
4,101102WH2,2016-12-08,600868262,2020-04-07,0,0,0,0,0,1,...,0,0,1,0,MFG,2016-12-08,2020-04-07,5341543-2-R,2016-12-08,1216
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3438,99862WH5,2016-12-08,600894310,2020-08-31,0,0,0,0,0,1,...,0,0,1,1,600797780,2019-03-22,2020-08-31,5341543-2-R,2019-03-22,528
3439,99863WH3,2010-05-26,600725063,2018-02-15,0,0,0,0,0,1,...,0,0,2,0,MFG,2010-05-26,2018-02-15,5341543-2,2010-05-26,2822
3440,99864WH1,2016-12-08,600901223,2020-10-12,0,0,0,0,0,1,...,0,0,1,0,MFG,2016-12-08,2020-08-26,5341543-2-R,2016-12-08,1357
3441,99867WH4,2016-12-08,600755139,2018-08-23,1,1,0,1,0,1,...,0,0,4,0,MFG,2016-12-08,2018-08-23,5341543-2-R,2016-12-08,623
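
As a quick sanity check on the imported table, the sketch below recomputes Duration(days) as the difference between Failure_date and Install_date for three rows copied from the output above. It uses plain Python rather than the dataframe so that it runs without the Excel file; the relation itself is inferred from the displayed values and is not confirmed by utility.py.

```python
from datetime import date

# Three rows copied from the df_org output above:
# (SN, Install_date, Failure_date, Duration(days))
rows = [
    ("101096WH6", date(2016, 12, 8), date(2020, 6, 12), 1282),
    ("101097WH4", date(2016, 12, 8), date(2019, 5, 24), 897),
    ("99862WH5", date(2019, 3, 22), date(2020, 8, 31), 528),
]

# Recompute each duration as Failure_date - Install_date, in days.
recomputed = [(failure - install).days for _, install, failure, _ in rows]
reported = [duration for *_, duration in rows]

# True when every recomputed duration equals the reported one.
print(recomputed == reported)
```

Note that for a part that was repaired before (SN 99862WH5), Install_date equals Previous Repair Date, so the duration is counted from the last repair.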


# Get useful information for part reliability estimation.

Next, we can use the get_part_data() function to preprocess the part-level data.

Syntax: df_part = get_part_data(filename)
* filename: A string giving the path of the file to be read.
* df_part: Dataframe containing the information needed to estimate the part reliability.

In [2]:
from utility import get_part_data

filename = r'C:\Users\Zhiguo\OneDrive - CentraleSupelec\Code\Python\ge_case_study\2022_ST4\XFD_freq_replacement - Names.xlsx'
df_part = get_part_data(filename)

In the original data table (df_org), each row corresponds to one part failure. Most parts have not failed again since their last repair, so the time from the last repair until today should be treated as right-censored data. The returned dataframe (df_part) contains the following columns:
* SN is the serial number of the part;
* Censoring indicates whether the row is a failure or a right-censoring event (judging from the durations in the output below, 0 marks a failure and 1 a right-censored observation);
* Repair History indicates whether the part has been repaired before: 'Prime' means that the part had never failed (and been repaired) before the current failure, while 'Repaired' means that it had failed and been repaired at least once before the current failure event;
* Duration(days) records the time since the previous repair (for 'Prime' parts, the time since installation);
* ITEM gives the type of the part. There are two types, '5341543-2' and '5341543-52'. They are the same part in different generations, with some component/technology upgrades.
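
The right-censoring rule described above can be sketched as follows: for a part that is still running, the censored duration is the time from its last repair to the end of the observation window. The function name censored_row, the encoding Censoring = 1 for censored rows, and the cutoff date are assumptions made for illustration; none of them is taken from utility.py.

```python
from datetime import date

# Hypothetical end of the observation window (not stated in the source).
OBSERVATION_END = date(2022, 6, 3)


def censored_row(sn, last_repair_date, item, observation_end=OBSERVATION_END):
    """Build a right-censored record for a part still running at observation_end."""
    return {
        "SN": sn,
        "Censoring": 1,  # assumed encoding: 1 = right-censored
        "Repair History": "Repaired",
        "Duration(days)": (observation_end - last_repair_date).days,
        "ITEM": item,
    }


# Part 101096WH6 was last repaired on 2020-08-05 according to df_org.
row = censored_row("101096WH6", date(2020, 8, 5), "5341543-2")
print(row["Duration(days)"])
```

With this assumed cutoff the sketch yields 667 days for part 101096WH6, which agrees with the 'Repaired' row for that part in the df_part output. This is only a consistency check and does not confirm how get_part_data() computes the value.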

In [3]:
display(df_part)

Unnamed: 0,SN,Censoring,Repair History,Duration(days),ITEM
0,101096WH6,0,Prime,1282,5341543-2
1,101096WH6,1,Repaired,667,5341543-2
2,101097WH4,0,Prime,897,5341543-2
3,101097WH4,1,Repaired,1106,5341543-2
4,101100WH6,0,Prime,775,5341543-2
...,...,...,...,...,...
6156,99864WH1,0,Prime,1357,5341543-2
6157,99864WH1,1,Repaired,599,5341543-2
6158,99867WH4,0,Prime,623,5341543-2
6159,99867WH4,0,Repaired,1105,5341543-2


# Get useful information for component reliability estimation.

In the original data table (df_org), each row corresponds to one part failure, and each component has its own column. When using these data to estimate the component reliability, we should consider the following censoring:
* If a part failed but a given component did not (the part failure was caused by other components), the part failure time should be treated as a right-censored observation for that component;
* As in the part data, if a part has not failed since its last repair, the time elapsed until today should be treated as a right-censored observation for its components.
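
The first rule above can be illustrated with a minimal sketch that derives a per-component censoring flag from the component indicator columns. The helper component_censoring_flags is hypothetical, and the sketch assumes that a value of 1 in a component column means the component failed in that repair. It only derives the flag; it does not attempt to reproduce the durations returned by components_failure(), which for censored components appear to span several repairs.

```python
# Two part-failure records copied from the df_org output above, reduced to
# the part duration and two component indicator columns.
part_failures = [
    {"SN": "101096WH6", "Duration(days)": 1282, "EDLC": 1, "V700 Mosfet": 0},
    {"SN": "99867WH4", "Duration(days)": 623, "EDLC": 1, "V700 Mosfet": 1},
]


def component_censoring_flags(records, component):
    """Return (SN, Censored) pairs: 0 = component failed, 1 = right-censored."""
    return [(r["SN"], 0 if r[component] == 1 else 1) for r in records]


edlc_flags = component_censoring_flags(part_failures, "EDLC")
mosfet_flags = component_censoring_flags(part_failures, "V700 Mosfet")
print(edlc_flags)
print(mosfet_flags)
```

For part 101096WH6, the failure counts as an observed failure for the EDLC but as a right-censored observation for the V700 Mosfet, because only the EDLC column is set in that row.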

Given a component name and df_org, the utility function components_failure() returns a dataframe for component reliability estimation:
* SN is the serial number of the part;
* Censored indicates whether the row is a failure or a right-censoring event;
* Duration(days) records the time to failure or the censoring time of the component.


In [4]:
from utility import components_failure

df_component_EDLC = components_failure('EDLC', df_org)
display(df_component_EDLC)

Unnamed: 0,SN,Repair PO,Duration(days),Censored
0,101096WH6,760137409,1282,0
1,101096WH6,760137409,667,1
2,101097WH4,600809572,897,0
3,101097WH4,600809572,1106,1
4,101100WH6,600816726,775,0
...,...,...,...,...
5493,99864WH1,600901223,1357,0
5494,99864WH1,600901223,599,1
5495,99867WH4,600755139,623,0
5496,99867WH4,600973386,1105,0


In [5]:
df_component_V700_Mosfet = components_failure('V700 Mosfet', df_org)
display(df_component_V700_Mosfet)

Unnamed: 0,SN,Repair PO,Duration(days),Censored
0,101096WH6,760137409,1949,1
1,101097WH4,600809572,2003,1
2,101100WH6,600816726,1717,1
3,101101WH4,600850651,2003,1
4,101102WH2,600868262,2003,1
...,...,...,...,...
2987,99862WH5,600894310,1169,1
2988,99863WH3,600725063,4391,1
2989,99864WH1,600901223,1956,1
2990,99867WH4,600755139,623,0
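
Before fitting a reliability model, it can help to check how many rows are observed failures and how many are censored. The sketch below does this for the rows visible in the V700 Mosfet output above, in plain Python so that it runs without the data file. It assumes that Censored = 0 marks a failure and Censored = 1 a right-censored observation, and it covers only the displayed rows, not the full dataframe.

```python
from collections import Counter

# (SN, Duration(days), Censored) for the rows visible in the output above.
visible_rows = [
    ("101096WH6", 1949, 1),
    ("101097WH4", 2003, 1),
    ("101100WH6", 1717, 1),
    ("101101WH4", 2003, 1),
    ("101102WH2", 2003, 1),
    ("99862WH5", 1169, 1),
    ("99863WH3", 4391, 1),
    ("99864WH1", 1956, 1),
    ("99867WH4", 623, 0),
]

# Count rows per value of the Censored flag.
counts = Counter(censored for _, _, censored in visible_rows)
n_failures = counts[0]
n_censored = counts[1]
censored_share = n_censored / len(visible_rows)

print(n_failures, n_censored, round(censored_share, 2))
```

A high share of censored rows, as in this excerpt, means that most of the information about the component comes from survival times rather than from observed failures.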
