# Exercise 1 Business and Data Understanding

## Business Understanding
To reduce the number of unexpected failures and optimize maintenance intervals, large aircraft maintenance companies aim to move from scheduled maintenance to predictive maintenance. One important aspect of this is assessing the current health status of aircraft components and predicting their remaining lifetime from measured sensor data. In this exercise, we focus on predicting the remaining lifetime of aircraft engines.

The provided data contains run-to-failure simulations for different engines, generated by the C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) tool, which simulates a large commercial aircraft engine. Detailed information on how the data was generated can be found in the paper "Damage Propagation Modeling for Aircraft Engine Run-To-Failure Simulation" by Saxena, Goebel, Simon and Eklund, which we provide in Moodle.

The goal is to estimate the remaining useful lifetime (RUL) of the aircraft engines. The overall exercise is divided into three sub-tasks following the CRISP-DM model. This week, the focus is on understanding the available dataset.

## Data Understanding
In Moodle, you can download the file “train_FD001.txt”, which contains run-to-failure curves from several engines. The dataset contains 26 parameters in total, which are listed in the table below.
For each data entry, the corresponding engine number and cycle number are given. All engines were simulated until failure, meaning that the cycle numbers range from 1 to the end of life of each engine. Parameters 2 to 4 indicate the operating conditions that were input parameters for the simulation. The remaining parameters 5 to 25 are the simulated sensor data and therefore the output of the simulation.



| Parameter | Name       | Description                     | Units   |
|-----------|------------|---------------------------------|---------|
| 0         | Engine     | Engine number                   | -       |
| 1         | Cycle      | Cycle number                    | -       |
| 2         | Altitude   | Altitude                        | 1000 ft |
| 3         | MachNumber | Mach Number                     | -       |
| 4         | TRA        | Thrust Resolver Angle           | -       |
| 5         | T2         | Total temperature at fan inlet  | °R      |
| 6         | T24        | Total temperature at LPC outlet | °R      |
| 7         | T30        | Total temperature at HPC outlet | °R      |
| 8         | T50        | Total temperature at LPT outlet | °R      |
| 9         | P2         | Pressure at fan inlet           | psia    |
| 10        | P15        | Total pressure in bypass-duct   | psia    |
| 11        | P30        | Total pressure at HPC outlet    | psia    |
| 12        | Nf         | Physical fan speed              | rpm     |
| 13        | Nc         | Physical core speed             | rpm     |
| 14        | epr        | Engine pressure ratio           | -       |
| 15        | Ps30       | Static pressure at HPC outlet   | psia    |
| 16        | phi        | Ratio of fuel flow to Ps30      | pps/psi |
| 17        | NRf        | Corrected fan speed             | rpm     |
| 18        | NRc        | Corrected core speed            | rpm     |
| 19        | BPR        | Bypass ratio                    | -       |
| 20        | farB       | Burner fuel-air ratio           | -       |
| 21        | htBleed    | Bleed enthalpy                  | -       |
| 22        | Nf_dmd     | Demanded fan speed              | rpm     |
| 23        | PCNfR_dmd  | Demanded corrected fan speed    | rpm     |
| 24        | W31        | HPT coolant bleed               | lbm/s   |
| 25        | W32        | LPT coolant bleed               | lbm/s   |



For the simulation, the operating conditions were held constant at the following values:
- altitude = 0
- Mach number = 0
- Thrust Resolver Angle = 100




Please work through the following tasks to explore the given data:


### 1. Load the data into your program, e.g. into a pandas dataframe. As the txt-file does not contain a header, rename the columns with the parameter names from the table above.
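One possible way to approach the loading step is sketched below. It assumes the columns in the text file are separated by one or more spaces; the helper name `load_cmapss` and the two-row sample are made up for illustration so that the sketch runs without the real file.

```python
import io

import pandas as pd

# Column names taken from the parameter table above (parameters 0 to 25).
COLUMN_NAMES = [
    "Engine", "Cycle", "Altitude", "MachNumber", "TRA",
    "T2", "T24", "T30", "T50", "P2", "P15", "P30",
    "Nf", "Nc", "epr", "Ps30", "phi", "NRf", "NRc", "BPR",
    "farB", "htBleed", "Nf_dmd", "PCNfR_dmd", "W31", "W32",
]


def load_cmapss(source):
    """Read a header-less, whitespace-separated file into a DataFrame.

    Hypothetical helper; assumes one or more spaces between the columns.
    """
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMN_NAMES)


# Made-up sample with two rows and 26 values per row (not real data).
sample = io.StringIO(
    " ".join(["1", "1"] + ["0.5"] * 24) + "\n"
    + " ".join(["1", "2"] + ["0.6"] * 24) + "\n"
)
df = load_cmapss(sample)
print(df.shape)
print(df.columns.tolist())
```

With the real data, the call would presumably be `load_cmapss("train_FD001.txt")`, assuming the file sits in the working directory.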

### 2. Look at the basic statistical properties of the data, for example using the pandas "describe" function. What do you notice?
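A minimal sketch of how the summary statistics could be inspected follows. The toy DataFrame is made up and only stands in for the loaded data; flagging columns whose minimum equals their maximum is one possible thing to look for, not the only one.

```python
import pandas as pd

# Made-up toy data; replace with the DataFrame loaded in task 1.
df = pd.DataFrame({
    "Engine": [1, 1, 1, 2, 2],
    "Cycle": [1, 2, 3, 1, 2],
    "T2": [518.67] * 5,  # deliberately constant
    "T24": [641.8, 642.1, 642.4, 641.9, 642.3],
})

# Transpose so that every column of the data becomes one row of the summary.
summary = df.describe().T

# A column whose minimum equals its maximum never changes.
constant_columns = summary.index[summary["min"] == summary["max"]].tolist()

print(summary[["count", "mean", "std", "min", "max"]])
print("Constant columns:", constant_columns)
```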

### 3. Take a closer look at the operating conditions and plot their distributions. Is the assumption of constant operating conditions fulfilled?
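As a numerical complement to the plots, the deviation from the nominal operating conditions listed above can be measured directly. The sketch below uses made-up values; the dictionary `NOMINAL` simply restates the three nominal values from this exercise.

```python
import pandas as pd

# Nominal operating conditions as stated in the list above.
NOMINAL = {"Altitude": 0.0, "MachNumber": 0.0, "TRA": 100.0}

# Made-up toy rows; replace with the operating-condition columns of the real data.
df = pd.DataFrame({
    "Altitude": [0.0, 0.001, -0.002, 0.0],
    "MachNumber": [0.0, 0.0, 0.0001, 0.0],
    "TRA": [100.0, 100.0, 100.0, 100.0],
})

# Largest absolute deviation from the nominal value, per operating condition.
max_deviation = {
    name: float((df[name] - nominal).abs().max())
    for name, nominal in NOMINAL.items()
}

# Number of distinct values per operating condition.
distinct_values = df[list(NOMINAL)].nunique()

print(max_deviation)
print(distinct_values)
```

For the distributions themselves, `df[list(NOMINAL)].hist()` is one option, provided matplotlib is installed.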

### 4. Now inspect the sensor data and plot their distributions. What can you conclude regarding their utility for predicting the remaining useful life?

### 5. Perform a correlation analysis for the different sensors.
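The following sketch shows one way to compute a correlation matrix and list strongly correlated sensor pairs. The data is randomly generated for illustration, and the threshold of 0.9 is an arbitrary assumption rather than a recommended value.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data (not real sensor values).
rng = np.random.default_rng(0)
base = rng.normal(size=200)
df = pd.DataFrame({
    "Nf": base,
    "NRf": base + 0.01 * rng.normal(size=200),  # nearly a copy of "Nf"
    "T24": rng.normal(size=200),                # unrelated to the other two
})

# Pearson correlation between all pairs of columns (the pandas default).
corr = df.corr()

# Keep only the upper triangle so that every pair appears once
# and the diagonal (correlation of a column with itself) is excluded.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

# Arbitrary threshold for "strongly correlated"; NaN entries never pass it.
strong = pairs[pairs.abs() > 0.9]
print(strong)
```

Constant columns have zero variance, so their correlations come out as NaN; it may be convenient to drop them before this step.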

### 6. For every engine, calculate the maximum number of cycles and display the distribution of the maximum number of cycles.
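Because every engine runs until failure, its maximum cycle number is its total lifetime. A minimal sketch with made-up data, grouping by the `Engine` column:

```python
import pandas as pd

# Made-up toy data with three engines of different lifetimes.
df = pd.DataFrame({
    "Engine": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Cycle": [1, 2, 3, 1, 2, 1, 2, 3, 4],
})

# Maximum cycle number per engine, i.e. the cycle at which it failed.
max_cycles = df.groupby("Engine")["Cycle"].max()

print(max_cycles)
print(max_cycles.describe())
```

A histogram of `max_cycles`, for example via `max_cycles.hist()` with matplotlib installed, would then show the distribution.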

### 7. Calculate the prediction target "RUL" and estimate the correlations between the sensors and this target. Hint: the remaining useful lifetime is defined as the number of cycles until the end of life.
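Following the hint, the RUL of a data entry can be sketched as the engine's last cycle minus the current cycle. This assumes the convention that the last recorded cycle has an RUL of 0; the toy values below are made up.

```python
import pandas as pd

# Made-up toy data: two engines and one sensor that rises over time.
df = pd.DataFrame({
    "Engine": [1, 1, 1, 2, 2],
    "Cycle": [1, 2, 3, 1, 2],
    "T24": [641.0, 642.0, 643.0, 641.5, 642.5],
})

# Last cycle of each engine, broadcast back to every row of that engine.
last_cycle = df.groupby("Engine")["Cycle"].transform("max")

# Assumed convention: RUL is 0 at the last recorded cycle.
df["RUL"] = last_cycle - df["Cycle"]

# Correlation of each sensor column with the target.
sensor_columns = ["T24"]
rul_correlation = df[sensor_columns].corrwith(df["RUL"])

print(df)
print(rul_correlation)
```

A sensor that increases as the engine degrades shows a negative correlation with RUL, since RUL decreases over time.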

### 8. Plot the time series of interesting sensors for different engines to investigate the found correlations.
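A possible starting point for the time series plots is sketched below, assuming matplotlib is available. The data is made up, and `T24` is only a placeholder for whichever sensors turn out to be interesting.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy data: two engines with a slowly rising sensor signal.
df = pd.DataFrame({
    "Engine": [1, 1, 1, 2, 2],
    "Cycle": [1, 2, 3, 1, 2],
    "T24": [641.0, 642.0, 643.0, 641.5, 642.5],
})

fig, ax = plt.subplots()

# One line per engine, plotted over the cycle number.
for engine_id, group in df.groupby("Engine"):
    ax.plot(group["Cycle"], group["T24"], label=f"Engine {engine_id}")

ax.set_xlabel("Cycle")
ax.set_ylabel("T24 (°R)")
ax.legend()
```

Plotting against RUL instead of the cycle number aligns all engines at their end of life, which can make a common degradation trend easier to see.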