# Pre-processing of the ALFA dataset

Reference:

Keipour, A., Mousaei, M., & Scherer, S. (2021). ALFA: A dataset for UAV fault and anomaly detection. The International Journal of Robotics Research, 40(2-3), 515-520.
        

Dataset source: 

[ALFA: A Dataset for UAV Fault and Anomaly Detection](https://kilthub.cmu.edu/articles/dataset/ALFA_A_Dataset_for_UAV_Fault_and_Anomaly_Detection/12707963)

## Short description of some useful topics (signals)

Topics (Signals):

* mavros-time_reference:
    * stamp: system time at which the measurement was valid
    * time_ref: corresponding time from the external source
    * frame_id: not used
* mavros-imu-atm_pressure:
    * Atmospheric (air) pressure
    * fluid_pressure: absolute pressure reading in pascals
* ~mavros-battery~:
    * Variance is 0
    * Not relevant
* ~mavros-global_position-*~:
    * Not relevant
* mavros-imu-data:
    * Message holding data from an IMU (Inertial Measurement Unit)
    * orientation x, y, z
    * ~orientation_covariance, angular_velocity_covariance and linear_acceleration_covariance are not relevant~
* ~mavros-imu-mag~:
    * FCU compass data
    * Not relevant
* mavros-imu-temperature:
    * Temperature reported by the FCU (usually from the barometer)
* mavros-vfr_hud:
    * HUD: head-up display
    * std_msgs/Header header
    * float32 airspeed # m/s
    * float32 groundspeed # m/s
    * int16 heading # degrees 0..360
    * float32 throttle # normalized to 0.0..1.0
    * float32 altitude # MSL (Mean Sea Level)
    * float32 climb # current climb rate m/s
* mavros-wind_estimation:
    * linear x, y, z
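
The note that the battery variance is 0 suggests a simple screening step: a signal that never changes carries no information for anomaly detection. The sketch below is one possible way to drop such columns with plain pandas (rather than the `VarianceThreshold` class from scikit-learn). The column names and values are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical signals: "voltage" is constant, "airspeed" varies
df_example = pd.DataFrame({
    "voltage": [12.0, 12.0, 12.0],
    "airspeed": [10.0, 11.5, 12.0],
})

# Keep only the columns whose sample variance is greater than zero
variances = df_example.var()
df_kept = df_example.loc[:, variances > 0]

print(list(df_kept.columns))  # ['airspeed']
```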
   
Flights with an engine failure:

* Min flight duration: 62.0 sec.
* Mean of flight durations: 114.4 sec.
* Median of flight durations: 116.0 sec.
* Max flight duration: 156.0 sec.

Flights without a failure:

* Min flight duration: 26.0 sec.
* Mean of flight durations: 55.3 sec.
* Median of flight durations: 57.0 sec.
* Max flight duration: 89.0 sec.
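
Summary statistics of this kind can be computed with the Python standard library. The sketch below assumes one duration in seconds per flight. The helper `summarize_durations` and the example values are hypothetical and only illustrate the calculation.

```python
from statistics import mean, median


def summarize_durations(durations):
    """Return the min, mean, median and max of a list of durations in seconds."""
    return {
        "min": min(durations),
        "mean": round(mean(durations), 1),
        "median": median(durations),
        "max": max(durations),
    }


# Hypothetical durations in seconds, not taken from the dataset
example_durations = [62, 110, 122, 156]
summary = summarize_durations(example_durations)
print(summary)
```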

## Read and process the dataset

In [1]:
import os
import dill
import numpy as np
import pandas as pd
import seaborn as sns
from glob import glob
import matplotlib.pyplot as plt
from sklearn.feature_selection import VarianceThreshold

In [11]:
data_path = "data/processed/"
prefix = "carbonZ_2018-07-18-15-53-31_1_engine_failure"
failure1_path = prefix + "/"
time_column = "%time"
timestamp_column = "timestamp"

In [12]:
def read_data(full_path):
    """Read one topic CSV and index it by a datetime timestamp."""
    df_tmp = pd.read_csv(full_path)
    df_tmp = df_tmp.rename(columns={time_column: timestamp_column})

    # The time column is interpreted as nanoseconds since the Unix epoch
    df_tmp[timestamp_column] = pd.to_datetime(df_tmp[timestamp_column], unit="ns")
    df_tmp.set_index(timestamp_column, inplace=True)
    return df_tmp
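
The timestamp handling in `read_data` can be tried without the dataset on disk. The sketch below applies the same steps to a small in-memory CSV. The two rows and the `field.airspeed` column are made up for illustration and only assume a `%time` column holding nanoseconds.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt with a "%time" column in nanoseconds
csv_text = (
    "%time,field.airspeed\n"
    "1531929211000000000,12.5\n"
    "1531929211200000000,12.7\n"
)

df = pd.read_csv(io.StringIO(csv_text))
df = df.rename(columns={"%time": "timestamp"})
# Convert integer nanoseconds to datetimes and use them as the index
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ns")
df = df.set_index("timestamp")

# Spacing between the two samples
print(df.index[1] - df.index[0])
```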

In [171]:
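def extract_topic_name(flight_name, file_name):
    """Extract the topic name from a CSV file name prefixed with the flight name."""
    # Keep the part after the flight name
    topic_name = file_name.split(flight_name)[1]
    # Drop the one-character separator that follows the flight name
    topic_name = topic_name[1:]
    # Drop the ".csv" extension
    topic_name = topic_name.split(".csv")[0]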
    return topic_name
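
A quick usage example for `extract_topic_name`, written as a compact equivalent of the function above. It assumes that each file is named as the flight name, a one-character separator, the topic name, and the `.csv` extension. The file name below is constructed for illustration.

```python
def extract_topic_name(flight_name, file_name):
    # Drop the flight-name prefix and the one-character separator after it
    remainder = file_name.split(flight_name)[1][1:]
    # Drop the ".csv" extension
    return remainder.split(".csv")[0]


flight_name = "carbonZ_2018-07-18-15-53-31_1_engine_failure"
# Assumed naming pattern: <flight_name>-<topic_name>.csv
file_name = flight_name + "-mavros-vfr_hud.csv"

topic = extract_topic_name(flight_name, file_name)
print(topic)  # mavros-vfr_hud
```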

In [245]:
time_dict = {}
flight_topic_dict = {}
topic_list = []
all_columns = []
df_dict = {}



unused_flight_list = ["no_ground_truth", "aileron_failure", 
                      "rudder", "elevator", "aileron"]
unused_topic_list = ["mavlink",  "diagnostics", "failure_status", 
                     "global_position", "local_position", 
                     "setpoint_raw", "mavctrl-path_dev", 
                     "mavros-battery", "mavros-imu-mag", 
                     "mavros-imu-data", "mavros-rc", 
                     "mavros-time_reference", "mavros-state", 
                     "emergency_responder-traj_file", 
                     "mavctrl-rpy", "mavros-mission-reached"]
unused_columns = ['field.header.seq', 'field.header.stamp', 
                  'field.header.frame_id', 'field.commanded', 
                  'field.variance', 'field.twist.angular.x', 
                  'field.twist.angular.y', 'field.twist.angular.z', 
                  'field.coordinate_frame']

# Iterate over the list of flight names
for i, flight in enumerate(glob(os.path.join(data_path, "*"))):
    if any(x in flight for x in unused_flight_list):
        continue
    
    flight_name = os.path.basename(flight)
    
    print(flight_name)
    
    if flight_name not in time_dict:
        time_dict[flight_name] = []

    
    if flight_name not in flight_topic_dict:
        flight_topic_dict[flight_name] = []
    
    df_merged = None
    
    # Iterate over the list of topics
    for k, topic in enumerate(glob(flight + "/*.csv")):
        if any(x in topic for x in unused_topic_list):
            continue
        
        file_name = os.path.basename(topic)
        topic_list.append(file_name)
        topic_name = extract_topic_name(flight_name, file_name)
        flight_topic_dict[flight_name].append(topic_name)
        
        dfx = read_data(topic)
        dfx = dfx.drop(unused_columns, axis=1, errors="ignore")
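        # Prefix each column with its topic name and strip the "field." prefix
        new_columns = [f"{topic_name}.{x.replace('field.', '')}" for x in dfx.columns]
        # set_axis returns a new DataFrame with the renamed columns
        dfx = dfx.set_axis(new_columns, axis=1)
        # Resample to 5 Hz, keeping the last sample in each 200 ms bin
        dfx = dfx.resample("200ms").last()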
        
        if df_merged is None:
            df_merged = dfx
        else:
            df_merged = df_merged.merge(dfx, left_index=True, right_index=True, how="outer")
        
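        # Fill gaps in the first row with 0 so that forward filling has a starting value
        df_merged.iloc[0] = df_merged.iloc[0].fillna(0)
        # Forward-fill the remaining gaps with the last known value
        df_merged = df_merged.ffill()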

        all_columns.append(list(dfx.columns))
        
        diff_seconds = pd.to_timedelta((dfx.index[-1] - dfx.index[0])).total_seconds()
        diff_seconds = int(diff_seconds)
        time_dict[flight_name].append(diff_seconds)
    
    df_dict[flight_name] = df_merged

carbonZ_2018-07-18-15-53-31_1_engine_failure
carbonZ_2018-07-18-15-53-31_2_engine_failure
carbonZ_2018-07-18-16-22-01_engine_failure_with_emr_traj
carbonZ_2018-07-18-16-37-39_1_no_failure
carbonZ_2018-07-18-16-37-39_2_engine_failure_with_emr_traj
carbonZ_2018-07-30-16-29-45_engine_failure_with_emr_traj
carbonZ_2018-07-30-16-39-00_1_engine_failure
carbonZ_2018-07-30-16-39-00_2_engine_failure
carbonZ_2018-07-30-16-39-00_3_no_failure
carbonZ_2018-07-30-17-10-45_engine_failure_with_emr_traj
carbonZ_2018-07-30-17-20-01_engine_failure_with_emr_traj
carbonZ_2018-07-30-17-36-35_engine_failure_with_emr_traj
carbonZ_2018-07-30-17-46-31_engine_failure_with_emr_traj
carbonZ_2018-09-11-11-56-30_engine_failure
carbonZ_2018-09-11-14-16-55_no_failure
carbonZ_2018-09-11-14-22-07_1_engine_failure
carbonZ_2018-09-11-14-22-07_2_engine_failure
carbonZ_2018-09-11-14-41-38_no_failure
carbonZ_2018-09-11-15-05-11_2_no_failure
carbonZ_2018-10-05-14-34-20_1_no_failure
carbonZ_2018-10-05-14-37-22_1_no_failure
car
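
The core of the loop above (resample each topic to 5 Hz, outer-merge on the timestamp index, then fill the gaps) can be followed on a toy example. The two topics below, their column names, and their values are hypothetical. Only the sequence of pandas operations mirrors the cell above.

```python
import pandas as pd

# Two hypothetical topics sampled at different times (milliseconds since epoch)
idx_a = pd.to_datetime([0, 100, 250, 400], unit="ms")
idx_b = pd.to_datetime([250, 650], unit="ms")
df_a = pd.DataFrame({"a.value": [1.0, 2.0, 3.0, 4.0]}, index=idx_a)
df_b = pd.DataFrame({"b.value": [10.0, 20.0]}, index=idx_b)

# Keep the last sample in each 200 ms bin (5 Hz)
df_a = df_a.resample("200ms").last()
df_b = df_b.resample("200ms").last()

# Outer merge keeps every time bin seen in either topic
merged = df_a.merge(df_b, left_index=True, right_index=True, how="outer")

# Missing values in the first row become 0, the rest are forward-filled
merged.iloc[0] = merged.iloc[0].fillna(0)
merged = merged.ffill()

print(merged)
```

In this toy case the second topic has no sample in the first bin, so its first value becomes 0, and every later gap in either column repeats the most recent value.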

### Save the preprocessed data

In [None]:
with open(os.path.join("data", "df_dict.pkl"), "wb") as f:
    dill.dump(df_dict, f)

### Load the preprocessed data

In [None]:
with open(os.path.join("data", "df_dict.pkl"), 'rb') as f:
    df_dict_loaded = dill.load(f)
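
A round-trip check can confirm that the saved dictionary loads back unchanged. The sketch below uses the standard-library `pickle` module in place of `dill`, a temporary directory instead of `data/`, and a hypothetical one-flight dictionary, so it is an illustration rather than the notebook's exact procedure.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-in for df_dict: one flight with one small DataFrame
df_dict_example = {"flight_a": pd.DataFrame({"x": [1.0, 2.0]})}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_dict.pkl")
    with open(path, "wb") as f:
        pickle.dump(df_dict_example, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# Every DataFrame should compare equal after the round trip
round_trip_ok = all(restored[k].equals(v) for k, v in df_dict_example.items())
print(round_trip_ok)  # True
```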