# Predictive Maintenance with Azure Dataset

## Project imports

In [1]:
import pandas as pd

## Data imports

The downloaded dataset consists of five CSV files. Each file is loaded into its own dataframe:

**Telemetry**

This dataset contains telemetry data collected from machines at different timestamps. It includes measurements of voltage, rotation, pressure, and vibration.

- Columns: `datetime`, `machineID`, `volt`, `rotate`, `pressure`, `vibration`

**Errors**

This dataset records the occurrence of errors in machines at different timestamps. Each error is identified by an error code.

- Columns: `datetime`, `machineID`, `errorID`

**Failures**

This dataset tracks the failures of machines at different timestamps. Each failure is associated with a specific component.

- Columns: `datetime`, `machineID`, `failure`

**Machines**

This dataset contains information about the machines, including their IDs, models, and ages.

- Columns: `machineID`, `model`, `age`

**Maintenance**

This dataset logs the maintenance activities performed on machines at different timestamps. Each maintenance record is associated with a specific component.

- Columns: `datetime`, `machineID`, `comp`

In [2]:
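def read(name: str, parse_dates: list[str] | None = ["datetime"]) -> pd.DataFrame:
    """Load `data/<name>.csv`, parsing the columns in `parse_dates` as datetimes."""
    # The default list is never mutated, so sharing it between calls is safe
    return pd.read_csv(f"data/{name}.csv", parse_dates=parse_dates)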


telemetry = read("PdM_telemetry")
errors = read("PdM_errors")
failures = read("PdM_failures")
maint = read("PdM_maint")
# 'machines' has no 'datetime' column, so skip date parsing
machines = read("PdM_machines", parse_dates=None)
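
Before merging, it can help to confirm that each loaded dataframe has the columns listed in the descriptions above. The following is a minimal sketch, not part of the original notebook: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names, and a small in-memory frame stands in for the real `errors` data so the example runs without the CSV files.

```python
import pandas as pd

# Expected columns per table, copied from the dataset descriptions above.
EXPECTED_COLUMNS = {
    "telemetry": ["datetime", "machineID", "volt", "rotate", "pressure", "vibration"],
    "errors": ["datetime", "machineID", "errorID"],
    "failures": ["datetime", "machineID", "failure"],
    "machines": ["machineID", "model", "age"],
    "maint": ["datetime", "machineID", "comp"],
}


def missing_columns(frame: pd.DataFrame, expected: list[str]) -> list[str]:
    """Return the expected column names that are absent from `frame` (hypothetical helper)."""
    return [col for col in expected if col not in frame.columns]


# Toy stand-in for the real `errors` frame (made-up values, for illustration only).
toy_errors = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2015-01-03 07:00:00"]),
        "machineID": [1],
        "errorID": ["error1"],
    }
)

# An empty list means every expected column is present.
print(missing_columns(toy_errors, EXPECTED_COLUMNS["errors"]))
# Checking against the wrong schema lists the columns that are absent.
print(missing_columns(toy_errors, EXPECTED_COLUMNS["telemetry"]))
```

With the real data, the same helper could be called once per dataframe, for example `missing_columns(errors, EXPECTED_COLUMNS["errors"])`.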

### Merging all the dataframes into a single dataframe

- Merge the `errors`, `failures`, `maint`, and `telemetry` dataframes into a single dataframe `data` with outer joins on the `datetime` and `machineID` columns, then join `machines` on `machineID` alone, since it has no `datetime` column.

In [3]:
from functools import reduce

# Merge 'telemetry' last to reduce cost of joining operations
dataframes = [errors, failures, maint, telemetry]
data = reduce(
    lambda left, right: pd.merge(
        left, right, on=["datetime", "machineID"], how="outer"
    ),
    dataframes,
)

# Merge 'machines' separately on 'machineID'
data = data.merge(machines, on="machineID", how="outer")

# Sort by 'machineID' and 'datetime'
data = data.sort_values(by=["machineID", "datetime"])

print(data.head())

             datetime  machineID errorID failure   comp        volt  \
0 2014-06-01 06:00:00          1     NaN     NaN  comp2         NaN   
1 2014-07-16 06:00:00          1     NaN     NaN  comp4         NaN   
2 2014-07-31 06:00:00          1     NaN     NaN  comp3         NaN   
3 2014-12-13 06:00:00          1     NaN     NaN  comp1         NaN   
4 2015-01-01 06:00:00          1     NaN     NaN    NaN  176.217853   

       rotate    pressure  vibration   model  age  
0         NaN         NaN        NaN  model3   18  
1         NaN         NaN        NaN  model3   18  
2         NaN         NaN        NaN  model3   18  
3         NaN         NaN        NaN  model3   18  
4  418.504078  113.077935  45.087686  model3   18  
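
The first four rows of the output carry a `comp` value but no sensor readings, which is what an outer join produces when a key appears in only one of the inputs. The sketch below reproduces that behaviour on two toy frames (made-up values, not the real data) and uses the `indicator=True` option of `pd.merge` to label where each row came from. The names `toy_maint`, `toy_telemetry`, and `toy` are illustrative only.

```python
import pandas as pd

# Toy maintenance log: one record predates the telemetry, one coincides with it.
toy_maint = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2014-06-01 06:00:00", "2015-01-01 07:00:00"]),
        "machineID": [1, 1],
        "comp": ["comp2", "comp1"],
    }
)

# Toy telemetry: two hourly readings for the same machine.
toy_telemetry = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2015-01-01 06:00:00", "2015-01-01 07:00:00"]),
        "machineID": [1, 1],
        "volt": [176.2, 162.9],
    }
)

# The outer join keeps every key from both sides; `_merge` records the origin of each row.
toy = pd.merge(
    toy_maint,
    toy_telemetry,
    on=["datetime", "machineID"],
    how="outer",
    indicator=True,
).sort_values(by=["machineID", "datetime"])

print(toy)
```

Rows marked `left_only` have no telemetry and therefore `NaN` in `volt`, rows marked `right_only` have no maintenance record and therefore `NaN` in `comp`, and rows marked `both` are fully populated.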


### Saving the merged dataframe to a CSV file for future use

In [4]:
# Write missing values as the literal text "NaN" so they are easy to spot in the CSV file;
# `na_rep` does this without converting the numeric columns of `data` to strings
data.to_csv("data/raw_data.csv", index=False, na_rep="NaN")
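
When the file is loaded again, `pd.read_csv` treats the text `NaN` as a missing value by default, so the placeholders turn back into real missing values. The sketch below checks this round trip on a toy frame (made-up rows) written to an in-memory buffer instead of `data/raw_data.csv`. It writes the placeholder with the `na_rep` argument of `to_csv`, which produces the same `NaN` text in the file.

```python
import io

import pandas as pd

# Toy frame with one missing text value and one missing numeric value.
toy = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2014-06-01 06:00:00", "2015-01-01 06:00:00"]),
        "machineID": [1, 1],
        "comp": ["comp2", None],
        "volt": [float("nan"), 176.217853],
    }
)

# Write to an in-memory buffer, using "NaN" as the missing-value placeholder.
buffer = io.StringIO()
toy.to_csv(buffer, index=False, na_rep="NaN")
buffer.seek(0)

# Read it back; 'datetime' must be parsed explicitly, as in the `read` helper.
restored = pd.read_csv(buffer, parse_dates=["datetime"])

print(restored.dtypes)
print(restored.isna().sum())
```

The numeric column comes back as `float64` with its missing value intact, so downstream code can keep using `isna()` and `fillna()` without any special handling of the placeholder.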