# Create the stationary session data

Stationary dataset creation from the KITTI dataset.
My data folder contains one folder per KITTI sequence.
Each sequence folder contains:
- image_00: grayscale images from camera 0 (left)
- image_01: grayscale images from camera 1 (right)
- image_02: color images from camera 2 (left)
- image_03: color images from camera 3 (right)
- oxts: the GPS/IMU and positional data for the vehicle, from the OXTS (Oxford Technical Solutions) unit
- velodyne_points: the lidar frames from the sequence
- tracklet_labels.xml: a file (not a folder) with the object labels for the lidar frames
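
Before processing, it can help to confirm that a sequence folder actually has this layout. The following is a minimal sketch, not part of the original notebook: `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, and the entry list simply mirrors the bullets above.

```python
from pathlib import Path

# Hypothetical constant: the entries listed above for one sequence folder.
EXPECTED_ENTRIES = [
    'image_00', 'image_01', 'image_02', 'image_03',
    'oxts', 'velodyne_points', 'tracklet_labels.xml',
]


def missing_entries(sequence_dir):
    """Return the expected entries that are absent from sequence_dir."""
    sequence_dir = Path(sequence_dir)
    return [name for name in EXPECTED_ENTRIES
            if not (sequence_dir / name).exists()]
```

An empty return value means the folder has everything the later steps read from.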

What I would like to end up with:
- The data in KITTI format: camera images, lidar frames, and labels
(by the way, I would also have the calibration data for this).

Intermediate data:
- For each sequence folder, I would like to make one folder per stationary sequence.
- Each of those folders contains:
    - a set of images, starting from 0000000000
    - a set of corresponding original lidar frames
    - a set of filtered lidar frames
    - (later) a set of labeled points frames
    - corresponding labels for each frame

To-do list to get to the intermediate data.
For each of the folders in the original data:
1. Get the stationary sequences.
2. Create a folder for each stationary sequence.
   Each folder can be named after the original folder plus the included frame range, with these subfolders:
    - images
    - lidar_original
    - lidar_filtered
    - lidar_labeled_points
    - labels
3. For each stationary sequence, copy over the images (reindexing so the first one is 0000000000).
4. Also copy over the corresponding original lidar frames.
5. Filter the lidar frames and copy the filtered frames over.
6. Convert the .xml files to label frames.
7. Move the frame labels over.
8. Extract the labeled points, and copy those frames over.
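
Step 2 can be sketched as a small helper. This is only an illustration under assumptions: `make_sequence_folder` and `SUBFOLDERS` are hypothetical names, and the `<name>_<start>_to_<end>` pattern is one possible reading of "the original folder plus the included frame range".

```python
from pathlib import Path

# Subfolder names taken from step 2 of the list above.
SUBFOLDERS = ['images', 'lidar_original', 'lidar_filtered',
              'lidar_labeled_points', 'labels']


def make_sequence_folder(root, sequence_name, start, end):
    """Create <root>/<sequence_name>_<start>_to_<end>/ with its subfolders."""
    folder = Path(root, f'{sequence_name}_{start}_to_{end}')
    for sub in SUBFOLDERS:
        # parents=True also creates root and the sequence folder if needed
        (folder / sub).mkdir(parents=True, exist_ok=True)
    return folder
```

Because `exist_ok=True` is set, calling the helper again for the same sequence is harmless.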

## Get stationary sequences

In [40]:
import pandas as pd
import numpy as np
import os
import shutil
from pathlib import Path

In [41]:
DATA_DIRECTORY = '../data/'
DATA_DIRECTORY_EXTENSION = '/oxts/data'
STATIONARY_DATA_DIR = '../stationary_data'

In [42]:
def get_data_from_dir(directory):
    directory = directory + DATA_DIRECTORY_EXTENSION
    data = []
    # Loop through each file in the directory
    for filename in os.listdir(directory):
        if filename.endswith('.txt'):
            # Extract the numeric part of the filename and convert it to integer
            file_index = int(filename.split('.')[0])  # Removes the extension and converts to int
            filepath = os.path.join(directory, filename)
            with open(filepath, 'r') as file:
                values = file.read().strip().split()
                values = [float(value) for value in values]  # Convert each value to float
                data.append([file_index] + values)  # Add file index and values to data list
    return data

In [43]:
COLUMNS = [
    'file_index', 'lat', 'lon', 'alt', 'roll', 'pitch', 'yaw', 
    'vn', 've', 'vf', 'vl', 'vu', 'ax', 'ay', 'az', 'af', 
    'al', 'au', 'wx', 'wy', 'wz', 'wf', 'wl', 'wu', 
    'pos_accuracy', 'vel_accuracy', 'navstat', 'numsats', 
    'posmode', 'velmode', 'orimode'
]

In [44]:
# Velocity columns used to identify stationary frames
VELOCITY_COLUMNS = ['vn', 've', 'vf', 'vl', 'vu']

In [45]:
# In m/s: a frame counts as stationary when every velocity component is below this in magnitude
STATIONARY_THRESHOLD = 0.05

In [46]:
# Sequence end indices are inclusive!
def get_sequences(df):
    # Boolean series to identify stationary frames
    is_stationary = df[VELOCITY_COLUMNS].abs().max(axis=1) < STATIONARY_THRESHOLD
    # Find sequences of consecutive stationary frames
    sequences = []
    current_sequence = []
    for i in df[is_stationary].index:
        if current_sequence and i == current_sequence[-1] + 1:
            current_sequence.append(i)
        else:
            if current_sequence:
                sequences.append((current_sequence[0], current_sequence[-1]))
            current_sequence = [i]

    # Append the final sequence, which the loop above never closes
    if current_sequence:
        sequences.append((current_sequence[0], current_sequence[-1]))
    
    return sequences
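
The same run detection can also be written without an explicit index loop. The sketch below is an alternative, not the notebook's implementation: it labels runs of consecutive frame indices with `diff` and `cumsum`. The name `stationary_runs` and the `min_length` parameter (which would drop very short runs, such as a single isolated frame) are hypothetical additions, and the toy velocities are made up.

```python
import pandas as pd


def stationary_runs(df, velocity_columns, threshold, min_length=1):
    """Return (start, end) pairs (end inclusive) of consecutive stationary frames."""
    is_stationary = df[velocity_columns].abs().max(axis=1) < threshold
    frames = pd.Series(df.index[is_stationary.to_numpy()])
    if frames.empty:
        return []
    # A new run starts wherever the gap to the previous stationary frame is not 1
    run_id = (frames.diff() != 1).cumsum()
    runs = []
    for _, run in frames.groupby(run_id):
        if len(run) >= min_length:
            runs.append((int(run.iloc[0]), int(run.iloc[-1])))
    return runs


# Toy example with made-up forward velocities (m/s)
toy = pd.DataFrame({'vf': [0.0, 0.01, 1.2, 0.0, 0.02, 0.0, 2.0, 0.0]})
print(stationary_runs(toy, ['vf'], 0.05))                # [(0, 1), (3, 5), (7, 7)]
print(stationary_runs(toy, ['vf'], 0.05, min_length=2))  # [(0, 1), (3, 5)]
```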

In [47]:
def process_one_folder(directory):
    print(directory)
    # Load data
    data = get_data_from_dir(directory)
    df = pd.DataFrame(data, columns=COLUMNS)
    # Set 'file_index' as the index of the DataFrame
    df.set_index('file_index', inplace=True)
    df.sort_index(inplace=True)  # Ensure the DataFrame is sorted by index
    
    sequences = get_sequences(df)
    
    # Print the sequences
    if sequences:
        for start, end in sequences:
            print(f"Stationary sequence from frame {start} to frame {end}")
    print()
    
    return sequences

In [48]:
def copy_over_sequence_files(start, end, data_path, stationary_data_path):
    # Copy frames start..end (inclusive), reindexing so the first frame is 0
    for i in range(start, end + 1):

        # Source name keeps the original index; target name is reindexed
        from_bin_str = str(i).zfill(10) + '.bin'
        to_bin_str = str(i - start).zfill(10) + '.bin'
        
        # Source lidar frame in the data folder
        from_path_bin = Path(data_path, 'velodyne_points', 'data', from_bin_str)
        # Target lidar frame in the stationary sequence folder
        to_path_bin = Path(stationary_data_path, 'velodyne_points', to_bin_str)
        
        shutil.copy(from_path_bin, to_path_bin)
        
        # Do the same for labels
        from_label_str = str(i).zfill(10) + '.txt'
        to_label_str = str(i - start).zfill(10) + '.txt'
        # Source label file in the data folder
        from_path_label = Path(data_path, 'labels', from_label_str)
        # Target label file in the stationary sequence folder
        to_path_label = Path(stationary_data_path, 'labels', to_label_str)
        
        shutil.copy(from_path_label, to_path_label)

        # Do the same for images (color images from image_02)
        from_image_str = str(i).zfill(10) + '.png'
        to_image_str = str(i - start).zfill(10) + '.png'
        # Source image in the data folder
        from_path_image = Path(data_path, 'image_02', 'data', from_image_str)
        # Target image in the stationary sequence folder
        to_path_image = Path(stationary_data_path, 'images', to_image_str)
        
        shutil.copy(from_path_image, to_path_image)
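
The three copy steps share one pattern, so they could be folded into a single helper. A minimal sketch, assuming the same zero-padded 10-digit file names: `copy_reindexed` is a hypothetical name, and returning the missing frames instead of raising is a design choice of this sketch, not the notebook's behavior.

```python
import shutil
from pathlib import Path


def copy_reindexed(src_dir, dst_dir, start, end, suffix):
    """Copy frames start..end (inclusive) from src_dir to dst_dir, renumbered from 0.

    Returns the original indices of any frames missing from src_dir.
    """
    missing = []
    for i in range(start, end + 1):
        src = Path(src_dir, f'{i:010d}{suffix}')
        dst = Path(dst_dir, f'{i - start:010d}{suffix}')
        if not src.is_file():
            missing.append(i)  # skip instead of raising FileNotFoundError
            continue
        shutil.copy(src, dst)
    return missing
```

For example, `copy_reindexed(Path(data_path, 'image_02', 'data'), Path(stationary_data_path, 'images'), start, end, '.png')` would stand in for the image block. Note that a skipped frame leaves a gap in the reindexed names, so a non-empty return value should be inspected.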

In [49]:
# For each folder in the data directory
def process_all_data_folders(directory):

    # Make the root folder for the stationary data
    stationary_root = Path(STATIONARY_DATA_DIR)
    stationary_root.mkdir(exist_ok=True)
    
    # Loop through each item in the directory
    for item in os.listdir(directory):
        item_path = os.path.join(directory, item)
        # Check if the item is a directory
        if os.path.isdir(item_path) and 'drive' in item_path:
            # Process the data in the folder
            sequences = process_one_folder(item_path)
            
            # Add one folder per sequence
            for start, end in sequences:
                # Make folder path and directory
                folder_path = Path(STATIONARY_DATA_DIR, item + '_' + str(start) + '_to_' + str(end))
                print(folder_path)
                folder_path.mkdir(exist_ok=True)

                # Create subfolders (this should probably be its own function)
                frame_folder_path = Path(folder_path, 'velodyne_points')
                frame_folder_path.mkdir(exist_ok=True)

                label_folder_path = Path(folder_path, 'labels')
                label_folder_path.mkdir(exist_ok=True)

                image_folder_path = Path(folder_path, 'images')
                image_folder_path.mkdir(exist_ok=True)

                # Then copy the correct stuff to the folder
                copy_over_sequence_files(start, end, item_path, folder_path)
                

In [50]:
process_all_data_folders(DATA_DIRECTORY)

../data/2011_09_26_drive_0017_sync
Stationary sequence from frame 0 to frame 113

..\stationary_data\2011_09_26_drive_0017_sync_0_to_113
../data/2011_09_26_drive_0018_sync
Stationary sequence from frame 0 to frame 178

..\stationary_data\2011_09_26_drive_0018_sync_0_to_178
../data/2011_09_26_drive_0051_sync
Stationary sequence from frame 210 to frame 210
Stationary sequence from frame 224 to frame 360

..\stationary_data\2011_09_26_drive_0051_sync_210_to_210
..\stationary_data\2011_09_26_drive_0051_sync_224_to_360
../data/2011_09_26_drive_0060_sync
Stationary sequence from frame 0 to frame 77

..\stationary_data\2011_09_26_drive_0060_sync_0_to_77
../data/2011_09_26_drive_0001_sync

../data/2011_09_26_drive_0002_sync

../data/2011_09_26_drive_0005_sync

../data/2011_09_26_drive_0009_sync
Stationary sequence from frame 404 to frame 404
Stationary sequence from frame 422 to frame 446

..\stationary_data\2011_09_26_drive_0009_sync_404_to_404
..\stationary_data\2011_09_26_drive_0009_sync_42

To-do list to get from the intermediate data to KITTI format:
- Go through each folder, keeping track of the current index.
- Move/copy all frame sets to the KITTI format while reindexing.
- There will be three KITTI sets, one each for stationary, filtered, and labeled points.
- Should the test/train split be the same?
- Perhaps also have three folders just for running inference and testing with the nuScenes model on KITTI data.
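
The first two items could be sketched as a single pass with a running index. This is a rough sketch under assumptions: the target layout (`velodyne`, `image_2`, `label_2` with 6-digit file names) follows the usual KITTI object-detection convention, `merge_into_kitti` is a hypothetical name, and the source subfolders are assumed to be the `velodyne_points`, `images`, and `labels` folders created by the code above. Calibration files and the train/test split are left out.

```python
import shutil
from pathlib import Path


def merge_into_kitti(stationary_dir, kitti_dir, lidar_subfolder='velodyne_points'):
    """Copy every frame of every sequence folder into one KITTI-style set.

    Returns the total number of frames written.
    """
    targets = {
        'velodyne': Path(kitti_dir, 'velodyne'),
        'image_2': Path(kitti_dir, 'image_2'),
        'label_2': Path(kitti_dir, 'label_2'),
    }
    for path in targets.values():
        path.mkdir(parents=True, exist_ok=True)

    current_index = 0  # running index across all sequence folders
    for seq in sorted(p for p in Path(stationary_dir).iterdir() if p.is_dir()):
        for bin_file in sorted((seq / lidar_subfolder).glob('*.bin')):
            stem = bin_file.stem
            new_name = f'{current_index:06d}'
            shutil.copy(bin_file, targets['velodyne'] / f'{new_name}.bin')
            shutil.copy(seq / 'images' / f'{stem}.png',
                        targets['image_2'] / f'{new_name}.png')
            shutil.copy(seq / 'labels' / f'{stem}.txt',
                        targets['label_2'] / f'{new_name}.txt')
            current_index += 1
    return current_index
```

Passing a different `lidar_subfolder` (for instance a folder of filtered frames) would be one way to build the three sets from the same images and labels.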