Each data folder contains a raw data subfolder and a `syn` data subfolder. The raw subfolder holds the data as collected: not synchronised, but with high-precision timestamps. The `syn` subfolder holds the synchronised data, without high-precision timestamps.

The columns of the sensor files and the ground-truth files are listed below.

## Vicon (vi*.csv)

Time  Header  translation.x translation.y translation.z rotation.x rotation.y rotation.z rotation.w

## Sensors (imu*.csv)

Time attitude_roll(radians) attitude_pitch(radians) attitude_yaw(radians) rotation_rate_x(radians/s) rotation_rate_y(radians/s) rotation_rate_z(radians/s) gravity_x(G) gravity_y(G) gravity_z(G) user_acc_x(G) user_acc_y(G) user_acc_z(G) magnetic_field_x(microteslas) magnetic_field_y(microteslas) magnetic_field_z(microteslas)
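
The column lists above can be attached as names when a file is loaded. The following is a minimal sketch, assuming the files are comma-separated, have no header row, and contain exactly the columns listed. The helper `load_named_csv` and the constants `VICON_COLUMNS` and `IMU_COLUMNS` are illustrative names, not part of the dataset, and the units are left out of the column names.

```python
import io

import pandas as pd

# Column names copied from the lists above (units omitted:
# radians, radians/s, G and microteslas respectively).
VICON_COLUMNS = [
    "Time", "Header",
    "translation.x", "translation.y", "translation.z",
    "rotation.x", "rotation.y", "rotation.z", "rotation.w",
]
IMU_COLUMNS = [
    "Time",
    "attitude_roll", "attitude_pitch", "attitude_yaw",
    "rotation_rate_x", "rotation_rate_y", "rotation_rate_z",
    "gravity_x", "gravity_y", "gravity_z",
    "user_acc_x", "user_acc_y", "user_acc_z",
    "magnetic_field_x", "magnetic_field_y", "magnetic_field_z",
]


def load_named_csv(source, columns):
    """Read a header-less CSV (assumption) and attach the given column names."""
    df = pd.read_csv(source, header=None)
    if df.shape[1] != len(columns):
        raise ValueError(f"expected {len(columns)} columns, found {df.shape[1]}")
    df.columns = columns
    return df


# Synthetic one-row example standing in for a real vi*.csv file.
sample = io.StringIO("0.0,1,0.1,0.2,0.3,0.0,0.0,0.0,1.0\n")
vicon = load_named_csv(sample, VICON_COLUMNS)
position = vicon[["translation.x", "translation.y", "translation.z"]].to_numpy()
print(position.shape)
```

Selecting columns by name rather than by index makes it harder to pick up the wrong column by accident, and the column-count check fails loudly if a file does not match the layout assumed here.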

## Structure

The `syn` folders and the `handheld` root folder look like this:
```
user@fedora ~/C/magnetic_localization (master)> ls data/Oxford\ Inertial\ Odometry\ Dataset/handheld/data1/syn/
imu1.csv*  imu4.csv*  imu7.csv*  vi3.csv*  vi6.csv*
imu2.csv*  imu5.csv*  vi1.csv*   vi4.csv*  vi7.csv*
imu3.csv*  imu6.csv*  vi2.csv*   vi5.csv*
user@fedora ~/C/magnetic_localization (master)> ls data/Oxford\ Inertial\ Odometry\ Dataset/handheld/data2/syn/
imu1.csv*  imu2.csv*  imu3.csv*  vi1.csv*  vi2.csv*  vi3.csv*
user@fedora ~/C/magnetic_localization (master)> ls data/Oxford\ Inertial\ Odometry\ Dataset/handheld/
data1/  data3/  data5/          Test.txt*
data2/  data4/  handheld.xlsx*  Train.txt*
user@fedora ~/C/magnetic_localization (master)> pwd
/home/user/Code/magnetic_localization
```

The paired files also appear to have the same length (for example, `imu2.csv` and `vi2.csv` in `data1/syn` both have 23446 lines), so there is no need to sync the timesteps:
```
user@fedora ~/C/magnetic_localization (master)> cat data/Oxford\ Inertial\ Odometry\ Dataset/handheld/data1/syn/imu2.csv |wc
  23446   23446 3282548
user@fedora ~/C/magnetic_localization (master)> cat data/Oxford\ Inertial\ Odometry\ Dataset/handheld/data1/syn/vi2.csv |wc
  23446   23446 1740520
```
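
The same check can be run over every pair in a `syn` folder instead of one pair at a time. This is a sketch only: `count_lines` and `find_length_mismatches` are hypothetical helper names, and it assumes each `imuN.csv` is paired with the `viN.csv` of the same number.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in a file (same as `wc -l` when the file ends in a newline)."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def find_length_mismatches(syn_dir):
    """Return (imu_name, imu_lines, vicon_lines) for every pair that differs.

    vicon_lines is None when the matching vi*.csv file is missing.
    """
    mismatches = []
    for imu_path in sorted(Path(syn_dir).glob("imu*.csv")):
        # imu2.csv -> vi2.csv (pairing by number is an assumption)
        vicon_path = imu_path.with_name("vi" + imu_path.name[len("imu"):])
        imu_lines = count_lines(imu_path)
        vicon_lines = count_lines(vicon_path) if vicon_path.exists() else None
        if imu_lines != vicon_lines:
            mismatches.append((imu_path.name, imu_lines, vicon_lines))
    return mismatches
```

Calling `find_length_mismatches("data/Oxford Inertial Odometry Dataset/handheld/data1/syn")` should return an empty list if every pair in that folder lines up.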

Just ignore the `Time` and `Header` columns.

## Goal

Our goal is to predict the current `x`, `y`, `z` from all the previous sensor data, without using the previous `x`, `y`, `z` as input.
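
One way to make this goal concrete is to pair each position with a window of the sensor rows leading up to it. The sketch below is an assumption about how the inputs could be framed, not a method taken from this project: it truncates "all previous data" to a fixed `window` of rows, includes the sensor row at the current time step, and never feeds positions back in as inputs. The name `make_causal_windows` is hypothetical, and the arrays are synthetic.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_causal_windows(features, positions, window):
    """Build (inputs, targets) where inputs[i] holds the `window` feature rows
    ending at time step t = i + window - 1 and targets[i] is positions[t]."""
    features = np.asarray(features)
    positions = np.asarray(positions)
    if len(features) != len(positions):
        raise ValueError("features and positions must have the same length")
    if not 1 <= window <= len(features):
        raise ValueError("window must be between 1 and the sequence length")
    # sliding_window_view gives (T - window + 1, n_features, window);
    # move the window axis next to the batch axis.
    inputs = sliding_window_view(features, window, axis=0).transpose(0, 2, 1)
    targets = positions[window - 1:]
    return inputs, targets


# Synthetic stand-ins: 10 time steps, 4 sensor channels, 3 position channels.
features = np.arange(40, dtype=float).reshape(10, 4)
positions = np.arange(30, dtype=float).reshape(10, 3)
inputs, targets = make_causal_windows(features, positions, window=3)
print(inputs.shape, targets.shape)
```

The returned `inputs` array is a read-only view onto `features`, so it costs no extra memory; copy it before modifying it in place.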

## A naive load of the magnetic data

```python
import pandas as pd
import numpy as np
import os

# Base path to the handheld folder
base_path = './data/Oxford Inertial Odometry Dataset/handheld/'

# Function to read and process IMU data
def process_imu_data(file_path):
    df = pd.read_csv(file_path, header=None)
    # The last three columns are magnetic_field_x, magnetic_field_y, magnetic_field_z
    mag_x, mag_y, mag_z = df.iloc[:, -3], df.iloc[:, -2], df.iloc[:, -1]
    mag_total = np.sqrt(mag_x**2 + mag_y**2 + mag_z**2)
    return np.column_stack((mag_x, mag_y, mag_z, mag_total))

# Function to read and process Vicon data
def process_vicon_data(file_path):
    df = pd.read_csv(file_path, header=None)
    # Columns 0 and 1 are Time and Header; columns 2-4 are translation.x/y/z
    x, y, z = df.iloc[:, 2], df.iloc[:, 3], df.iloc[:, 4]
    return np.column_stack((x, y, z))

# Read train and test folder names
with open(os.path.join(base_path, 'Train.txt'), 'r') as f:
    train_folders = f.read().splitlines()
with open(os.path.join(base_path, 'Test.txt'), 'r') as f:
    test_folders = f.read().splitlines()

# Lists to store sequences
X_train, y_train = [], []
X_test, y_test = [], []

# Process data for each folder
for data_folder in train_folders + test_folders:
    folder_path = os.path.join(base_path, data_folder, 'syn')

    if not os.path.exists(folder_path):
        print(f"Folder not found: {folder_path}")
        continue

    imu_files = sorted([f for f in os.listdir(folder_path) if f.startswith('imu')])
    vicon_files = sorted([f for f in os.listdir(folder_path) if f.startswith('vi')])

    for imu_file, vicon_file in zip(imu_files, vicon_files):
        imu_data = process_imu_data(os.path.join(folder_path, imu_file))
        vicon_data = process_vicon_data(os.path.join(folder_path, vicon_file))

        if data_folder in train_folders:
            X_train.append(imu_data)
            y_train.append(vicon_data)
        else:
            X_test.append(imu_data)
            y_test.append(vicon_data)

print("Number of sequences in X_train:", len(X_train))
print("Number of sequences in y_train:", len(y_train))
print("Number of sequences in X_test:", len(X_test))
print("Number of sequences in y_test:", len(y_test))

print("\nShapes of sequences in X_train:")
for i, seq in enumerate(X_train):
    print(f"Sequence {i+1}: {seq.shape}")

print("\nShapes of sequences in X_test:")
for i, seq in enumerate(X_test):
    print(f"Sequence {i+1}: {seq.shape}")

# Calculate and print some statistics
train_lengths = [len(seq) for seq in X_train]
test_lengths = [len(seq) for seq in X_test]

print("\nTraining set statistics:")
print(f"Min length: {min(train_lengths)}")
print(f"Max length: {max(train_lengths)}")
print(f"Mean length: {np.mean(train_lengths):.2f}")
print(f"Median length: {np.median(train_lengths):.2f}")

print("\nTest set statistics:")
print(f"Min length: {min(test_lengths)}")
print(f"Max length: {max(test_lengths)}")
print(f"Mean length: {np.mean(test_lengths):.2f}")
print(f"Median length: {np.median(test_lengths):.2f}")
```

```
Number of sequences in X_train: 20
Number of sequences in y_train: 20
Number of sequences in X_test: 4
Number of sequences in y_test: 4

Shapes of sequences in X_train:
Sequence 1: (37602, 4)
Sequence 2: (23446, 4)
Sequence 3: (18850, 4)
Sequence 4: (21641, 4)
Sequence 5: (32160, 4)
Sequence 6: (32537, 4)
Sequence 7: (14098, 4)
Sequence 8: (32618, 4)
Sequence 9: (31179, 4)
Sequence 10: (30059, 4)
Sequence 11: (30756, 4)
Sequence 12: (37910, 4)
Sequence 13: (60868, 4)
Sequence 14: (53796, 4)
Sequence 15: (38322, 4)
Sequence 16: (31724, 4)
Sequence 17: (32228, 4)
Sequence 18: (60580, 4)
Sequence 19: (43841, 4)
Sequence 20: (35017, 4)

Shapes of sequences in X_test:
Sequence 1: (31040, 4)
Sequence 2: (59445, 4)
Sequence 3: (55979, 4)
Sequence 4: (36578, 4)

Training set statistics:
Min length: 14098
Max length: 60868
Mean length: 34961.60
Median length: 32382.50

Test set statistics:
Min length: 31040
Max length: 59445
Mean length: 45760.50
Median length: 46278.50
```
