# Annotation Workflow

Start by running the annotation GUI app. It expects the audio clips to be in a directory called "audio_files" in your project folder, and it needs the path to your filtered training data.
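Before launching the app, it can help to confirm that both inputs are in place. The following is a minimal sketch of such a check, not part of the app itself: the function name `check_annotation_inputs` and its parameters are hypothetical, and it assumes the training data CSV sits in the project folder next to "audio_files".

```python
from pathlib import Path


def check_annotation_inputs(project_dir, data_csv, audio_dirname="audio_files"):
    """Return a list of problems found; an empty list means the inputs look ready.

    Hypothetical helper: it only checks that the paths exist, not that the
    audio files referenced in the CSV are actually present.
    """
    project = Path(project_dir)
    problems = []

    # The app expects the clips in a directory called "audio_files".
    audio_dir = project / audio_dirname
    if not audio_dir.is_dir():
        problems.append(f"missing audio directory: {audio_dir}")
    elif not any(audio_dir.iterdir()):
        problems.append(f"audio directory is empty: {audio_dir}")

    # The filtered training data is passed to the app by path.
    csv_path = project / data_csv
    if not csv_path.is_file():
        problems.append(f"missing training data CSV: {csv_path}")

    return problems
```

For example, `check_annotation_inputs(".", "N_1000_filtered_train_data.csv")` returns an empty list when the current folder is set up as described above.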

# Run Annotation App

Pass the input data filepath and the starting index on the command line.

When you exit the app, it writes a CSV containing the annotation for each audio file. This CSV is later merged with the original dataset.

In [6]:
!python3 annotate_app.py N_1000_filtered_train_data.csv 0

2025-04-11 14:38:22.621 Python[36556:112625712] +[IMKClient subclass]: chose IMKClient_Modern
2025-04-11 14:38:22.621 Python[36556:112625712] +[IMKInputSession subclass]: chose IMKInputSession_Modern


In [4]:
import pandas as pd
import numpy as np

# Join Training Set Data and Annotations

Merge the new annotations with the existing dataset. The new annotations are in the "question_type" column.

In [7]:
# File path to the dataset
data_path = "N_1000_filtered_train_data.csv"
annotations_path = "atc_audio_annotations.csv"


# Load original dataset  
df_data = pd.read_csv(data_path)  

# Load annotations (must contain 'audio_path' and 'question_type' columns)
df_annotations = pd.read_csv(annotations_path)  

# Merge while keeping all original rows  
df_merged = df_data.merge(df_annotations, on="audio_path", how="left")  
 

print(df_merged.head(15))

    index                                               text  \
0   10500              austrian five two hotel juliett praha   
1   10425                  austrian seven two nine thank you   
2     882                            roger standby for climb   
3   10309  csa three charlie tango runway three one clear...   
4   11616  sky travel one zero one zero confirm ready for...   
5   10676  initial one zero zero sierra papa sierra thank...   
6    9401  warsaw good morning lufthansa seven seven nine...   
7    1895                   thank you good bye five nine six   
8    3296  twojet four one six proceed to odnem and conta...   
9    3470  lufthansa seven five two request flight level ...   
10   4484  csa three foxtrot contact ruzyne ground one tw...   
11   2061  i just confirm what cleared flight level easy ...   
12  11586  and report ready for departure once your re cl...   
13   8831                              chanex one three nine   
14   6193                               
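Because the merge is a left join, rows without a matching annotation are kept with an empty "question_type". A quick way to see how many rows that affects is sketched below. It runs on small made-up DataFrames standing in for `df_data` and `df_annotations` (the values, including the annotation labels, are illustrative only), and uses the `validate` and `indicator` options of `DataFrame.merge`.

```python
import pandas as pd

# Toy stand-ins for df_data and df_annotations; all values are made up.
df_data = pd.DataFrame({
    "audio_path": ["a.wav", "b.wav", "c.wav"],
    "text": ["roger standby for climb", "thank you good bye", "contact ground"],
})
df_annotations = pd.DataFrame({
    "audio_path": ["a.wav", "c.wav"],
    "question_type": ["statement", "instruction"],
})

# validate="many_to_one" raises pandas.errors.MergeError if an audio_path
# appears more than once in the annotations, which would duplicate rows.
# indicator=True adds a "_merge" column recording where each row matched.
df_check = df_data.merge(
    df_annotations,
    on="audio_path",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Rows marked "left_only" exist in the dataset but have no annotation.
unannotated = df_check.loc[df_check["_merge"] == "left_only", "audio_path"].tolist()
print(f"{len(unannotated)} of {len(df_check)} rows have no annotation: {unannotated}")
```

On the real data, the same two keyword arguments can be added to the merge call above, and the "_merge" column dropped once the check is done.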

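If you are satisfied with the new labels, copy them into the "label" column and drop "question_type". Note that rows with no matching annotation end up with a missing (NaN) label, since the left merge leaves their "question_type" empty.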

In [8]:
df_merged['label'] = df_merged['question_type']
df_merged.drop(['question_type'], axis=1, inplace=True)

print(df_merged.head(15))

    index                                               text  \
0   10500              austrian five two hotel juliett praha   
1   10425                  austrian seven two nine thank you   
2     882                            roger standby for climb   
3   10309  csa three charlie tango runway three one clear...   
4   11616  sky travel one zero one zero confirm ready for...   
5   10676  initial one zero zero sierra papa sierra thank...   
6    9401  warsaw good morning lufthansa seven seven nine...   
7    1895                   thank you good bye five nine six   
8    3296  twojet four one six proceed to odnem and conta...   
9    3470  lufthansa seven five two request flight level ...   
10   4484  csa three foxtrot contact ruzyne ground one tw...   
11   2061  i just confirm what cleared flight level easy ...   
12  11586  and report ready for departure once your re cl...   
13   8831                              chanex one three nine   
14   6193                               
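If only part of the dataset has been annotated, overwriting "label" directly replaces the labels of unannotated rows with NaN. One possible alternative, sketched here on made-up data, keeps the existing label wherever no annotation is available. It assumes the original dataset already has a "label" column, which is not confirmed by the output shown above.

```python
import pandas as pd

# Toy stand-in for df_merged; the second row has no new annotation.
df_merged = pd.DataFrame({
    "audio_path": ["a.wav", "b.wav", "c.wav"],
    "label": ["old_a", "old_b", "old_c"],
    "question_type": ["new_a", None, "new_c"],
})

# Prefer the new annotation; fall back to the existing label where it is missing.
df_merged["label"] = df_merged["question_type"].fillna(df_merged["label"])
df_merged = df_merged.drop(columns=["question_type"])

print(df_merged)
```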

In [9]:
# Save the merged data
# Define the output filepath (by default this overwrites the original dataset)

output_path = data_path

df_merged.to_csv(output_path, index=False)
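Overwriting the original file means the pre-annotation labels cannot be recovered afterwards. If that matters, one option is to write to a sibling file instead. The helper below is a hypothetical sketch, and the "_annotated" suffix is an arbitrary choice rather than a convention used elsewhere in this workflow.

```python
from pathlib import Path


def annotated_output_path(data_path, suffix="_annotated"):
    """Build a sibling path such as 'data_annotated.csv' from 'data.csv'."""
    p = Path(data_path)
    # Insert the suffix between the file stem and its extension.
    return str(p.with_name(p.stem + suffix + p.suffix))


print(annotated_output_path("N_1000_filtered_train_data.csv"))
# prints "N_1000_filtered_train_data_annotated.csv"
```

Setting `output_path = annotated_output_path(data_path)` before the `to_csv` call leaves the original CSV untouched.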