Stationary Dataset Creation
from the KITTI dataset
My data folder contains folders from the KITTI dataset, one per sequence.
Each sequence folder contains:
- image_00: grayscale images from the left camera (camera 0)
- image_01: grayscale images from the right camera (camera 1)
- image_02: color images from the left camera (camera 2)
- image_03: color images from the right camera (camera 3)
- oxts: Oxford Technical Solutions (OXTS) data, i.e. the GPS/IMU position and motion data for the vehicle
- velodyne_points: the lidar frames from the sequence
- tracklet_labels.xml: a single file with the object labels (tracklets) for the sequence

What I would like to end up with:
- The data in KITTI format: camera images, lidar frames, and labels
(by the way, I would also have the calibration data for this).

Intermediate data:
- for each sequence folder, I would like to make a folder for each stationary sequence.
- Each of those folders contains:
    - a set of images, reindexed to start from 000000
    - a set of corresponding original lidar frames
    - a set of filtered lidar frames
    - (later) a set of labeled points frames
    - corresponding labels for each frame

To-do list to get to the intermediate data
For each of the folders in the original data:
1. Get the stationary sequences
2. Create a folder for each stationary sequence.
   Each folder can be named after the original folder plus the included frame range, and contains these subfolders:
    - images
    - lidar_original
    - lidar_filtered
    - lidar_labeled_points
    - labels
3. For each stationary sequence, copy over the images (reindexing so the first one is 000000)
4. Also copy over the corresponding original lidar frames.
5. Filter the lidar frames and copy the filtered frames over.
6. Convert the tracklet_labels.xml file to per-frame label files.
7. Copy the per-frame label files over.
8. Extract the labeled points, and copy those frames over.
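
Steps 2 to 4 could be sketched roughly as follows. This is only a sketch under assumptions: the helper names `make_sequence_folder` and `copy_reindexed`, the `<folder>_<start>_<end>` naming pattern, and the six-digit output file names are illustrative choices, not something fixed by the notes above. Source files are looked up by the integer value of their file name, so the zero padding of the originals does not matter.

```python
import os
import shutil

# Subfolders from step 2 of the to-do list
SUBFOLDERS = ['images', 'lidar_original', 'lidar_filtered',
              'lidar_labeled_points', 'labels']


def make_sequence_folder(output_root, drive_name, start, end):
    """Create <drive_name>_<start>_<end>/ with its subfolders (naming is an assumption)."""
    seq_dir = os.path.join(output_root, f"{drive_name}_{start:06d}_{end:06d}")
    for sub in SUBFOLDERS:
        os.makedirs(os.path.join(seq_dir, sub), exist_ok=True)
    return seq_dir


def copy_reindexed(src_dir, dst_dir, start, end, extension):
    """Copy frames start..end (inclusive) so that frame `start` becomes 000000."""
    # Map the numeric part of each file name to the file name itself
    by_index = {
        int(os.path.splitext(name)[0]): name
        for name in os.listdir(src_dir)
        if name.endswith(extension)
    }
    copied = []
    for frame in range(start, end + 1):
        if frame not in by_index:
            continue  # leave a gap so images and lidar frames stay aligned
        new_name = f"{frame - start:06d}{extension}"
        shutil.copy2(os.path.join(src_dir, by_index[frame]),
                     os.path.join(dst_dir, new_name))
        copied.append(new_name)
    return copied
```

Calling `copy_reindexed` once for the image folder with `'.png'` and once for the lidar folder with `'.bin'` would cover steps 3 and 4. Using `frame - start` as the new index (rather than a running counter) keeps the two modalities aligned even if a frame is missing from one of them.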

## Get stationary sequences

In [10]:
import pandas as pd
import numpy as np
import os

In [33]:
DATA_DIRECTORY = '../data/'
DATA_DIRECTORY_EXTENSION = '/oxts/data'

In [32]:
def get_data_from_dir(directory):
    directory = directory + DATA_DIRECTORY_EXTENSION
    data = []
    # Loop through each file in the directory
    for filename in os.listdir(directory):
        if filename.endswith('.txt'):
            # Extract the numeric part of the filename and convert it to integer
            file_index = int(filename.split('.')[0])  # Removes the extension and converts to int
            filepath = os.path.join(directory, filename)
            with open(filepath, 'r') as file:
                values = file.read().strip().split()
                values = [float(value) for value in values]  # Convert each value to float
                data.append([file_index] + values)  # Add file index and values to data list
    return data

In [13]:
COLUMNS = [
    'file_index', 'lat', 'lon', 'alt', 'roll', 'pitch', 'yaw', 
    'vn', 've', 'vf', 'vl', 'vu', 'ax', 'ay', 'az', 'af', 
    'al', 'au', 'wx', 'wy', 'wz', 'wf', 'wl', 'wu', 
    'pos_accuracy', 'vel_accuracy', 'navstat', 'numsats', 
    'posmode', 'velmode', 'orimode'
]

In [14]:
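# Velocity columns (in m/s) used to identify stationary frames
VELOCITY_COLUMNS = ['vn', 've', 'vf', 'vl', 'vu']

In [15]:
# A frame counts as stationary if every velocity component is below this (m/s)
STATIONARY_THRESHOLD = 0.05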

In [46]:
def get_sequences(df):
    # Boolean series to identify stationary frames
    is_stationary = df[VELOCITY_COLUMNS].abs().max(axis=1) < STATIONARY_THRESHOLD
    # Find sequences of consecutive stationary frames
    sequences = []
    current_sequence = []
    for i in df[is_stationary].index:
        if current_sequence and i == current_sequence[-1] + 1:
            current_sequence.append(i)
        else:
            if current_sequence:
                sequences.append((current_sequence[0], current_sequence[-1]))
            current_sequence = [i]

    # Append the final sequence, which the loop above never closes
    if current_sequence:
        sequences.append((current_sequence[0], current_sequence[-1]))
    
    return sequences
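
As a cross-check on the loop above, the same grouping of consecutive frames can be written without an explicit loop. The function name `get_sequences_vectorized` is hypothetical, and it takes the boolean `is_stationary` series directly rather than the full DataFrame; it is an alternative sketch, not a replacement for the cell above.

```python
import pandas as pd


def get_sequences_vectorized(is_stationary):
    """Return (start, end) pairs for runs of consecutive stationary frame indices."""
    # Keep only the frame indices flagged as stationary
    frames = pd.Series(is_stationary.index[is_stationary.to_numpy()])
    if frames.empty:
        return []
    # A new run starts wherever the gap to the previous stationary frame is not 1
    run_id = (frames.diff() != 1).cumsum()
    grouped = frames.groupby(run_id)
    return list(zip(grouped.min().tolist(), grouped.max().tolist()))


# Toy example: frames 0-1 and 3-5 are stationary
mask = pd.Series([True, True, False, True, True, True, False])
print(get_sequences_vectorized(mask))  # [(0, 1), (3, 5)]
```

Like the loop version, this treats a missing frame index as the end of a run, because the comparison is on the index values rather than on row positions.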

In [42]:
def process_one_folder(directory):
    print(directory)
    # Load data
    data = get_data_from_dir(directory)
    df = pd.DataFrame(data, columns=COLUMNS)
    # Set 'file_index' as the index of the DataFrame
    df.set_index('file_index', inplace=True)
    df.sort_index(inplace=True)  # Ensure the DataFrame is sorted by index
    
    sequences = get_sequences(df)
    
    # Print the sequences
    for start, end in sequences:
        print(f"Stationary sequence from frame {start} to frame {end}")
    print()
    # Return the sequences so later steps can reuse them
    return sequences

In [36]:
# For each folder in the data directory
def process_all_data_folders(directory):
    # Loop through each item in the directory
    for item in os.listdir(directory):
        item_path = os.path.join(directory, item)
        # Check if the item is a directory
        if os.path.isdir(item_path):
            # Process the data in the folder
            process_one_folder(item_path)


In [45]:
process_all_data_folders(DATA_DIRECTORY)

../data/2011_09_26_drive_0017_sync
file_index
0      True
1      True
2      True
3      True
4      True
       ... 
109    True
110    True
111    True
112    True
113    True
Length: 114, dtype: bool

../data/2011_09_26_drive_0018_sync
file_index
0       True
1       True
2       True
3       True
4       True
       ...  
265    False
266    False
267    False
268    False
269    False
Length: 270, dtype: bool

../data/2011_09_26_drive_0051_sync
file_index
0      False
1      False
2      False
3      False
4      False
       ...  
433    False
434    False
435    False
436    False
437    False
Length: 438, dtype: bool

../data/2011_09_26_drive_0060_sync
file_index
0     True
1     True
2     True
3     True
4     True
      ... 
73    True
74    True
75    True
76    True
77    True
Length: 78, dtype: bool

../data/2011_09_26_drive_0001_sync
file_index
0      False
1      False
2      False
3      False
4      False
       ...  
103    False
104    False
105    False
106    False

To-do list to get from the intermediate data to KITTI format
- Go through each folder, keeping track of the current index
- Move/copy all frame sets to the KITTI format while reindexing
- There will be 3 KITTI sets, one each for stationary, filtered, and labeled points
- Should the test/train split be the same?
- Perhaps also have three folders for just running inference and testing with the nuScenes model on KITTI data
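
The first two items (walking the folders with a running index and reindexing while copying) might look something like this. The function name `merge_sequences`, its parameters, and the six-digit output names are assumptions for illustration; the actual target layout is not decided in these notes.

```python
import os
import shutil


def merge_sequences(sequence_dirs, subfolder, output_dir, extension):
    """Copy one subfolder of every sequence into a single, continuously indexed folder.

    Returns the next free index, i.e. the total number of files copied.
    """
    os.makedirs(output_dir, exist_ok=True)
    current_index = 0
    # Sort folders and files so every modality is merged in the same order
    for seq_dir in sorted(sequence_dirs):
        src_dir = os.path.join(seq_dir, subfolder)
        for name in sorted(os.listdir(src_dir)):
            if not name.endswith(extension):
                continue
            new_name = f"{current_index:06d}{extension}"
            shutil.copy2(os.path.join(src_dir, name),
                         os.path.join(output_dir, new_name))
            current_index += 1
    return current_index
```

Running this once per modality (images, lidar, labels) gives matching indices only if every sequence folder holds the same set of frames in each subfolder, so it is worth checking that the returned counts agree.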