# Consolidating data

This file contains functions that consolidate our disparate datasets into one large dataset for training our model.

The goal is to generate a file with 30 columns, where each column is a state at one point in time. The number of columns should be configurable.

Ideally, this will be done with hierarchical data, i.e. `p1` is the first point in time, and within `p1` you have an x component, a y component, etc.

See the pandas user guide on MultiIndex / advanced indexing: https://pandas.pydata.org/docs/user_guide/advanced.html
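
As a rough sketch of what that hierarchical layout could look like, the snippet below stacks consecutive states side by side under a two-level column `MultiIndex`. The helper name `make_windows`, the sliding-window scheme (one window per row), and the toy data are assumptions made for illustration; they are not part of this notebook yet.

```python
import numpy as np
import pandas as pd

def make_windows(states: pd.DataFrame, n_points: int = 30) -> pd.DataFrame:
    """Stack n_points consecutive rows of `states` side by side.

    Hypothetical helper: each output row is one sliding window, and the
    columns form a two-level MultiIndex (point, component).
    """
    components = list(states.columns)
    n_windows = len(states) - n_points + 1
    if n_windows <= 0:
        raise ValueError("need at least n_points rows")
    values = states.to_numpy()
    # Window i holds rows i .. i + n_points - 1, flattened point by point.
    rows = np.array([values[i:i + n_points].ravel() for i in range(n_windows)])
    columns = pd.MultiIndex.from_product(
        [[f"p{k}" for k in range(1, n_points + 1)], components],
        names=["point", "component"],
    )
    return pd.DataFrame(rows, columns=columns)

# Toy data: 5 states with x and y components (made up for this example).
toy = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0],
                    "y": [0.0, 10.0, 20.0, 30.0, 40.0]})
windows = make_windows(toy, n_points=3)
print(windows["p1"])  # the x and y components of the first point in each window
```

Selecting `windows["p1"]` returns every component of the first point, and `windows[("p1", "x")]` returns a single component, which is the access pattern described above. The `n_points` argument keeps the number of points variable.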

## Input data format

It is assumed that the input data will have the columns: `[timestamp,tx,ty,tz,qx,qy,qz,qw]`
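
One way to make this assumption explicit is to check for the columns before doing any processing. This is a minimal sketch; `EXPECTED_COLUMNS` and `check_columns` are hypothetical names introduced here, not existing functions in this project.

```python
import pandas as pd

# Column names taken from the assumed input format above.
EXPECTED_COLUMNS = ["timestamp", "tx", "ty", "tz", "qx", "qy", "qz", "qw"]

def check_columns(df: pd.DataFrame) -> None:
    """Raise a ValueError if any expected column is missing from df."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"input data is missing columns: {missing}")

# A frame with all expected columns passes silently.
check_columns(pd.DataFrame(columns=EXPECTED_COLUMNS))
```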

## Extracting the data we want

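For now, we only want the velocity data. The input contains positions rather than velocities, so velocity is estimated by finite differences of position over time.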

In [None]:
import numpy as np
import pandas as pd

# Hardcode the data source for now, we will access other data later
pos_df = pd.read_csv("../data/fpv_uzh/indoor_forward_3_davis_with_gt.txt", usecols=['timestamp', 'tx', 'ty', 'tz'])

def generate_velocity(position_data: pd.DataFrame) -> pd.DataFrame:
    """Estimate velocity by finite differences of position over time."""
    # Time step between consecutive samples, computed once and reused
    dt = position_data['timestamp'].diff()

    velocity_data = {
        'timestamp': position_data['timestamp'],
        'vx': position_data['tx'].diff() / dt,
        'vy': position_data['ty'].diff() / dt,
        'vz': position_data['tz'].diff() / dt
    }

    # The first row has no predecessor, so its differences are NaN; drop it
    return pd.DataFrame(velocity_data).dropna().reset_index(drop=True)

vel_df = generate_velocity(pos_df)

print(vel_df.head())
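
One caveat with this finite-difference approach: if two consecutive samples share a timestamp, the time step is zero and the division yields `inf`, which `dropna()` does not remove. The sketch below shows one possible way to filter such rows on made-up data. Whether the real dataset contains duplicate timestamps has not been checked here.

```python
import numpy as np
import pandas as pd

# Made-up positions with a repeated timestamp at 0.5
pos = pd.DataFrame({"timestamp": [0.0, 0.5, 0.5, 1.0],
                    "tx": [0.0, 1.0, 1.5, 2.0]})

dt = pos["timestamp"].diff()
vx = pos["tx"].diff() / dt  # the zero time step produces inf

# Convert infinities to NaN so that dropna() removes them as well
clean = vx.replace([np.inf, -np.inf], np.nan).dropna()
print(clean)
```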