### **Tasks**
1. Calculate how many times the equipment has failed.
During the FPSO’s operation, various factors can cause the machine to fail and prolong its failure state. We ask you to explore the available data, then identify and count the times the equipment has failed throughout its operation.
2. Categorize equipment failures by setup configurations (Preset 1 and Preset 2).
How do the variables Preset_1 and Preset_2 behave during operation? What insights can we derive from these variables?
3. Categorize equipment failures by their nature/root cause according to parameter readings (temperature, pressure, and others).
Analyze patterns in these readings that could indicate specific failure types. How do these patterns differ across operational regimes? Provide insights based on your findings.
4. Create a model (or models) using the technique you think is most appropriate and measure its performance.
Based on the given time-series dataset, which models or techniques are suitable for predicting whether the equipment will fail before it occurs? Additionally, how can the model's performance be tuned and measured for this task?
5. Analyze variable importance.
After developing a model, how can we determine which variables had the greatest impact on the prediction?

### **1. Library Imports**

In [1]:
import pandas as pd 

### **2. Data**

#### **2.1 Data Loading**

In [2]:
df = pd.read_excel("Test O_G_Equipment_Data.xlsx")
print(df.shape)
df.head()

(800, 10)


| | Cycle | Preset_1 | Preset_2 | Temperature | Pressure | VibrationX | VibrationY | VibrationZ | Frequency | Fail |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | 6 | 44.235186 | 47.657254 | 46.441769 | 64.820327 | 66.45452 | 44.48325 | False |
| 1 | 2 | 2 | 4 | 60.807234 | 63.172076 | 62.005951 | 80.714431 | 81.246405 | 60.228715 | False |
| 2 | 3 | 2 | 1 | 79.027536 | 83.03219 | 82.64211 | 98.254386 | 98.785196 | 80.993479 | False |
| 3 | 4 | 2 | 3 | 79.716242 | 100.508634 | 122.362321 | 121.363429 | 118.652538 | 80.315567 | False |
| 4 | 5 | 2 | 5 | 39.989054 | 51.764833 | 42.514302 | 61.03791 | 50.716469 | 64.245166 | False |


#### **2.2 Data Quality Check**

#### **2.2.1 Missing Values Check**

In [3]:
df.isnull().sum()

Cycle          0
Preset_1       0
Preset_2       0
Temperature    0
Pressure       0
VibrationX     0
VibrationY     0
VibrationZ     0
Frequency      0
Fail           0
dtype: int64

#### **2.2.2 Duplicate Check**

In [4]:
# Check for duplicated column names in the DataFrame (this does not compare column contents)
duplicated_columns = df.columns[df.columns.duplicated()]

# Check for duplicated rows in the DataFrame
number_of_duplicated_rows = df.duplicated().sum()

# Display the results
print("Duplicated columns:", list(duplicated_columns))
print("Number of duplicated rows:", number_of_duplicated_rows)


Duplicated columns: []
Number of duplicated rows: 0


#### **2.2.3 Feature Uniqueness Check**

In [5]:
df.nunique()

Cycle          800
Preset_1         3
Preset_2         8
Temperature    800
Pressure       800
VibrationX     800
VibrationY     800
VibrationZ     800
Frequency      800
Fail             2
dtype: int64

### **🟥 TASK 1**
- Calculate how many times the equipment has failed.
    - During the FPSO’s operation, various factors can cause the machine to fail and prolong its failure state. We ask you to explore the available data, then identify and count the times the equipment has failed throughout its operation.


In [6]:
df["Fail"] = df["Fail"].astype(int)

# Counts the number of normal (0) and failure (1) states
df["Fail"].value_counts()

Fail
0    734
1     66
Name: count, dtype: int64

In [7]:
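# A new failure is counted when the 'Fail' column transitions from 0 to 1.
# diff() is NaN on the first row, so a failure already active at cycle 1 would
# not be counted; the first cycle here is not a failure, so nothing is missed.
df["New_fail"] = (df["Fail"].diff() == 1).astype(int)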

# A failure is considered resolved when the 'Fail' column transitions from 1 to 0
df["Resolved_fail"] = (df["Fail"].diff() == -1).astype(int)

print("Number of equipment failures:", df["New_fail"].sum())
print("Number of resolved equipment failures:", df["Resolved_fail"].sum())

Number of equipment failures: 10
Number of resolved equipment failures: 9


#### **Task 1: Answer**
- The equipment spent 66 of the 800 recorded cycles in a failure state.
- These cycles form 10 distinct failure events, counted as transitions from 0 to 1 in the Fail column.
- 9 failure resolutions were identified, counted as transitions from 1 to 0.
- One failure was therefore still active at the last recorded cycle, which may indicate an ongoing issue or a log that ends before recovery.
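
Because the task also mentions prolonged failure states, it can help to measure how many cycles each failure event lasted. The following is a minimal sketch on a small made-up series, not the notebook's data; the names `fail`, `run_id`, and `durations` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical Fail flags for 10 cycles; NOT the real dataset.
fail = pd.Series([0, 1, 1, 0, 0, 1, 0, 1, 1, 1], name="Fail")

# A new run starts wherever the value differs from the previous cycle.
run_id = (fail != fail.shift()).cumsum()

# Keep only the failure cycles and count how many fall in each run.
is_fail = fail == 1
durations = fail[is_fail].groupby(run_id[is_fail]).size()

# Renumber the events 1..n for readability.
durations.index = range(1, len(durations) + 1)
durations.index.name = "event"

print("Failure events:", len(durations))
print(durations)
```

Applied to `df["Fail"]`, the same grouping should yield 10 runs whose lengths sum to the 66 failure cycles counted earlier.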

### **🟥 TASK 2**
- Categorize equipment failures by setup configurations (Preset 1 and Preset 2).
    - How do the variables Preset_1 and Preset_2 behave during operation? What insights can we derive from these variables?
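
One possible starting point for this task is to tabulate failures by preset value and by preset combination. The sketch below uses the notebook's column names on a small made-up frame (`sample`, `rate_by_p1`, and `fail_counts` are illustrative names, and the values are not from the dataset).

```python
import pandas as pd

# Hypothetical sample with the notebook's column names; values are made up.
sample = pd.DataFrame({
    "Preset_1": [1, 1, 2, 2, 3, 3, 1, 2],
    "Preset_2": [1, 2, 1, 2, 1, 2, 1, 2],
    "Fail":     [0, 1, 0, 1, 0, 0, 0, 1],
})

# Share of cycles spent in failure for each Preset_1 value.
rate_by_p1 = sample.groupby("Preset_1")["Fail"].mean()

# Number of failure cycles for each (Preset_1, Preset_2) combination.
fail_counts = pd.crosstab(
    sample["Preset_1"], sample["Preset_2"],
    values=sample["Fail"], aggfunc="sum",
)

print(rate_by_p1)
print(fail_counts)
```

Counting on `Fail` measures failure cycles; swapping in `New_fail` would count failure events instead, which avoids long failures dominating the table.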

### **🟥 TASK 3**
- Categorize equipment failures by their nature/root cause according to parameter readings (temperature, pressure, and others).
    - Analyze patterns in these readings that could indicate specific failure types. How do these patterns differ across operational regimes? Provide insights based on your findings.
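
A simple way to approach this, offered only as a sketch, is to standardize each reading against the normal-operation baseline and label every failure cycle by the sensor that deviates most. The data below is made up, only three of the sensor columns are used for brevity, and the "dominant sensor" rule is an assumption rather than an established root-cause method.

```python
import pandas as pd

sensors = ["Temperature", "Pressure", "VibrationX"]

# Hypothetical readings; values are made up for illustration.
sample = pd.DataFrame({
    "Temperature": [50.0, 52.0, 48.0, 50.0, 95.0, 51.0],
    "Pressure":    [60.0, 62.0, 58.0, 60.0, 61.0, 120.0],
    "VibrationX":  [70.0, 72.0, 68.0, 70.0, 71.0, 69.0],
    "Fail":        [0, 0, 0, 0, 1, 1],
})

# Baseline statistics from normal cycles only.
normal = sample.loc[sample["Fail"] == 0, sensors]
baseline_mean = normal.mean()
baseline_std = normal.std()

# z-score of each failure cycle relative to normal operation.
z = (sample.loc[sample["Fail"] == 1, sensors] - baseline_mean) / baseline_std

# Label each failure cycle by its most deviant sensor.
dominant = z.abs().idxmax(axis=1)
print(dominant)
```

Computing the baseline separately per Preset_1/Preset_2 combination would be one way to compare patterns across operational regimes.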

### **🟥 TASK 4**
- Create a model (or models) using the technique you think is most appropriate and measure its performance.
    - Based on the given time-series dataset, which models or techniques are suitable for predicting whether the equipment will fail before it occurs? Additionally, how can the model's performance be tuned and measured for this task?
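
Predicting a failure before it occurs requires a target that looks ahead in time, and a time-ordered dataset is usually split chronologically rather than at random so that the model is never evaluated on cycles that precede its training data. The sketch below builds such a target on a made-up series; `horizon` (the lead time in cycles) and the 70/30 split are assumptions, not values from the task.

```python
import pandas as pd

# Hypothetical Fail flags for 10 consecutive cycles; NOT the real dataset.
fail = pd.Series([0, 0, 0, 1, 1, 0, 0, 0, 1, 0], name="Fail")

horizon = 2  # assumed lead time, in cycles

# target[t] = 1 if any of the next `horizon` cycles is a failure.
future = pd.concat([fail.shift(-k) for k in range(1, horizon + 1)], axis=1)
target = future.max(axis=1).fillna(0).astype(int)

# Chronological split: the first 70% of cycles for training, the rest for testing.
split = len(fail) * 7 // 10
train_target, test_target = target.iloc[:split], target.iloc[split:]

print(target.tolist())
print(len(train_target), len(test_target))
```

Since only 66 of the 800 cycles are failures, accuracy alone would be dominated by the normal class; precision, recall, and F1 on the failure class are usually more informative for this kind of imbalance.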

### **🟥 TASK 5**
- Analyze variable importance.
    - After developing a model, how can we determine which variables had the greatest impact on the prediction?
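
One model-agnostic option is permutation importance: shuffle one feature at a time and measure how much the score drops. The sketch below implements the idea by hand on made-up data; `predict` is a hypothetical stand-in for a fitted model, and the threshold of 80 is an arbitrary assumption. scikit-learn offers a ready-made version as `sklearn.inspection.permutation_importance`.

```python
import numpy as np
import pandas as pd

# Hypothetical readings and labels; values are made up for illustration.
X = pd.DataFrame({
    "Temperature": [50.0, 55.0, 60.0, 65.0, 90.0, 95.0, 100.0, 105.0],
    "Pressure":    [70.0, 40.0, 90.0, 60.0, 50.0, 80.0, 45.0, 75.0],
})
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])


def predict(frame):
    """Stand-in for a fitted model: flags failure when Temperature > 80."""
    return (frame["Temperature"] > 80).to_numpy().astype(int)


def accuracy(y_true, y_pred):
    """Fraction of cycles classified correctly."""
    return float(np.mean(y_true == y_pred))


rng = np.random.default_rng(0)
baseline = accuracy(y, predict(X))

importance = {}
for col in X.columns:
    drops = []
    for _ in range(20):  # average over several shuffles to reduce noise
        shuffled = X.copy()
        # Shuffling breaks the link between this column and the target.
        shuffled[col] = rng.permutation(shuffled[col].to_numpy())
        drops.append(baseline - accuracy(y, predict(shuffled)))
    importance[col] = float(np.mean(drops))

print(importance)
```

Here the stand-in model ignores Pressure, so its importance is zero by construction, while shuffling Temperature degrades the score.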