# **Aligning Video with LSL-Synced Audio & Cutting Trial Segments**
#### *Authors: Hamza Nalbantoğlu$^{1,2}$, Šárka Kadavá$^{1}$*  
#### *Affiliation: **1 -** Leibniz-Centre General Linguistics (ZAS); **2 -** University of Potsdam (UP)*  
#### *Contact: nalbantogluhamza@gmail.com*  
---

## **Overview**

This module demonstrates a pipeline for aligning an **external video (with its own audio track)** with **Lab Streaming Layer (LSL)-synced audio** of the same event. LSL provides precise timing for recorded data streams, but some data may still be recorded outside of LSL, for instance for backup purposes. Imagine that you recorded your video and audio streams via LSL, but also placed an extra camera with a microphone in the lab that records the same experimental event. While the LSL audio and video are inherently synchronized, this external camera runs on an independent timeline. However, we can use the LSL-synced audio and the audio of the external camera to align the external video with the LSL timeline.

Here, we will use [**Shign**](https://github.com/KnurpsBram/shign)—a Python package designed for precise audio alignment—to synchronize the external video's audio with the LSL-synced audio. Instead of relying on manual adjustments, Shign detects timing mismatches between the two audio streams and computes the necessary **time shift** to align them. Once the correct time shift is determined, the external video will be trimmed to ensure synchronization with the LSL timeline.

If you also collect timestamps to cut your LSL data into individual trials (e.g., via a button box), you can apply these timestamps to the external video once it is synchronized with the LSL timeline. We demonstrate how to compute trial start and end times for the video based on the timeline of the LSL-synced audio stream. Lastly, we show how to combine the external video with the LSL audio, in case you want to 1) check that the external video is indeed synchronized with the LSL data, or 2) replace the low-quality audio of the external video with the high-quality LSL audio.

This script provides an **automated and accurate approach** to video-audio alignment, making it particularly useful for behavioural data analysis, experimental research, and other applications requiring precise synchronization.

---
## **Workflow**
### **1. Preprocessing**
- Extracting audio from the external video.
- Converting the **raw LSL-synced audio** stored in a CSV file into a WAV format.

### **2. Audio Alignment**
- Using the **Shign** package to align the LSL-synced audio with the extracted video audio.
- Computing the **time shift** needed to synchronize them (shift_ms).

### **3. Video Adjustment**
- Trimming the **external video** based on the computed **time shift**.
- Saving the **aligned video**.

### **4. Trial Segmentation**
- Extracting and computing the **trial start & end times** for the video from the LSL timeline of the audio.
- Segmenting the **aligned video** into trial-sized clips (without audio) using these timestamps.

### **5. (Optional) Audio Overlay**
- Overlaying the **corresponding trial audio** files onto each trial video to check synchronization, if available.

---
## **GitHub Repository & Installation**
To reproduce this notebook, follow these steps:

```bash
# 1 - Clone the Repository
git clone https://github.com/hamzanalbantoglu/lsl_audio_video_alignment.git
cd lsl_audio_video_alignment

# 2 - Create a Conda Environment (Recommended)
conda create --name lsl_env python=3.12
conda activate lsl_env

# 3 - Install Dependencies
pip install -r requirements.txt

# 4 - Add Conda Environment to Jupyter Notebook
pip install ipykernel
python -m ipykernel install --user --name=lsl_env --display-name "Python (lsl_env)"

# 5 - Run the Jupyter Notebook
jupyter notebook
```

In [None]:
# Imports
import os
import re
import time
import numpy as np
import pandas as pd
import sys
from matplotlib import pyplot as plt
import librosa
import subprocess
from IPython.display import Audio, HTML, display
import shign
from shign.shign import ms_to_samples
from scipy.io.wavfile import write

## Loading Input Data & Defining Output Paths

Before processing, we first specify the paths to the **external video** and the **raw LSL-synced audio**. These will serve as our primary input files to synchronize.

Additionally, we define paths for storing processed audio files at different stages of the workflow.

In [None]:
# File paths:
video_file = "external_video/external_video.mp4"                # External video to be aligned
csv_file = "lsl_synced_audio/lsl_synced_long_audio_raw.csv"     # Raw LSL-synced audio in CSV format

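# Output paths (the folder is created if missing, since FFmpeg and SciPy do not create it):
os.makedirs("outputs", exist_ok=True)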
extracted_video_audio = "outputs/extracted_video_audio.wav"     # Audio extracted from the video
lsl_synced_audio = "outputs/lsl_synced_audio.wav"               # LSL-synced audio converted to WAV format
aligned_video_audio = "outputs/aligned_video_audio.wav"         # Aligned version of the extracted video audio

## Inspecting the External Video for Audio Sample Rate

The **sample rate** of the external video's audio track is crucial information for two reasons:

- To extract the audio while preserving its original quality.
- To align it with the LSL-synced audio using Shign, which requires both audio tracks to have the same sample rate.

Running FFmpeg with only an input file prints the metadata of both the video and audio streams (to stderr, which is why we print `result.stderr` below). From the output, we can confirm that the audio sample rate is **44.1 kHz** (44100 Hz), which will be used in the next steps.

In [None]:
# Inspecting the metadata of the video using FFmpeg:
result = subprocess.run(
    ["ffmpeg",
     "-hide_banner",
     "-i", video_file],
    capture_output=True,
    text=True
)

print(result.stderr)
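
Rather than reading the sample rate off the printed metadata by eye, it can also be pulled out programmatically. The following is a minimal sketch, assuming FFmpeg's usual stream line format (for example `Stream #0:1: Audio: aac, 44100 Hz, stereo`). The helper name `parse_audio_sample_rate` and the sample text are illustrative and not part of the original notebook.

```python
import re

def parse_audio_sample_rate(ffmpeg_stderr):
    """Return the sample rate (Hz) of the first audio stream, or None if not found."""
    for line in ffmpeg_stderr.splitlines():
        if "Audio:" in line:
            match = re.search(r"(\d+) Hz", line)
            if match:
                return int(match.group(1))
    return None

# Illustrative excerpt of FFmpeg metadata output (not taken from the real video):
example_stderr = """
  Stream #0:0(und): Video: h264 (High), yuv420p, 1920x1080, 25 fps
  Stream #0:1(und): Audio: aac (LC), 44100 Hz, stereo, fltp, 128 kb/s
"""
print(parse_audio_sample_rate(example_stderr))
```

In the notebook, the same helper could be called as `parse_audio_sample_rate(result.stderr)`, and its return value passed to the `-ar` option in the next step instead of a hard-coded number.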

## Extracting the Audio from the External Video

We extract the audio track from the external video using FFmpeg, making sure that it is formatted correctly for aligning with the LSL-synced audio.

In [None]:
# Extracting audio from the video:
subprocess.run([
    "ffmpeg",              
    "-i", video_file,           # Input video  
    "-vn",                      # Remove video stream, keep only audio stream
    "-acodec", "pcm_s16le",     # Use PCM 16-bit encoding
    "-ar", "44100",             # Set sample rate to 44.1 kHz (checked earlier)
    "-ac", "1",                 # Convert stereo to mono for consistency with the lsl_synced_audio
    extracted_video_audio       
], check=True)

## Converting the Raw LSL-Synced Audio from CSV to WAV

The LSL-synced audio is stored as raw sound pressure values in a CSV file. To use it for synchronization, we need to convert it to a WAV file with the correct sample rate.

- The first column of the file contains **LSL timestamps** (in milliseconds).
- The second column contains audio **amplitude values**.

We know the recording sample rate is **16 kHz** (16000 Hz). Now we extract the pressure values and save them as a WAV file using **SciPy’s** ```write()``` **function**.

In [None]:
# Loading the CSV file and extracting the amplitude values:
audio_data = pd.read_csv(csv_file)
pressure = audio_data["1"].values.astype(np.int16)

# Saving it as a WAV file with a sample rate of 16 kHz:
sample_rate = 16000
write(lsl_synced_audio, sample_rate, pressure)
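
Because the WAV file is written with a nominal rate of 16 kHz, it can be worth checking that the timestamps in the CSV agree with that rate. The sketch below estimates the effective sample rate from the first and last timestamps. It assumes the timestamps are in milliseconds, as described above; the function name `effective_sample_rate` and the synthetic data are illustrative only.

```python
def effective_sample_rate(timestamps_ms):
    """Estimate the sample rate (Hz) from the first and last timestamps in milliseconds."""
    if len(timestamps_ms) < 2:
        raise ValueError("Need at least two timestamps.")
    duration_sec = (timestamps_ms[-1] - timestamps_ms[0]) / 1000.0
    if duration_sec <= 0:
        raise ValueError("Timestamps must be increasing.")
    # n samples span n - 1 sampling intervals:
    return (len(timestamps_ms) - 1) / duration_sec

# Synthetic timestamps: 16001 samples spaced 1/16 ms apart (i.e. exactly 16 kHz):
synthetic_ms = [i / 16.0 for i in range(16001)]
print(round(effective_sample_rate(synthetic_ms)))
```

With real data, the input would be the timestamp column of `audio_data`. A result far from 16000 would suggest dropped samples or a different recording rate than assumed.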

## Aligning the Two Audio Tracks Using "shign"

To synchronize the extracted video audio with the LSL-synced audio, we use the **```shift_align()```** function from the **Shign package**. It compares the two audio tracks using correlation, detects the timing mismatch between them, and aligns them based on the detected mismatch.

Since the LSL audio has a sample rate of 16 kHz while the extracted video audio has 44.1 kHz, we must first **downsample the video audio** to 16 kHz. ```shift_align()``` expects both tracks to have the same sample rate, so this step is required for a meaningful alignment.

In [None]:
# Loading the LSL-synced and extracted video audio:
lsl_audio, sr_lsl = librosa.load(lsl_synced_audio, sr=None)
video_audio, sr_ext = librosa.load(extracted_video_audio, sr=None)

# Downsampling the extracted video audio to match the LSL audio sample rate:
video_audio_downsampled = librosa.resample(video_audio, orig_sr=sr_ext, target_sr=sr_lsl)
sr_ext_downsampled = sr_lsl
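
To give an intuition for correlation-based alignment, here is a small sketch that recovers a known delay between two signals by locating the peak of their cross-correlation. This is not Shign's implementation, only an illustration of the general idea. The function name `estimate_lag_samples`, the sign convention, and the synthetic signals are assumptions made for this example.

```python
import numpy as np

def estimate_lag_samples(reference, other):
    """Return the lag (in samples) at which `other` best matches `reference`.

    Convention of this sketch: a positive lag means the shared content
    occurs later in `other` than in `reference`.
    """
    corr = np.correlate(other, reference, mode="full")
    # Index 0 of the "full" output corresponds to a lag of -(len(reference) - 1).
    return int(np.argmax(corr)) - (len(reference) - 1)

# Synthetic check: delay a noise signal by 50 samples and recover that delay.
rng = np.random.default_rng(0)
reference = rng.standard_normal(2000)
delayed = np.concatenate([np.zeros(50), reference])[:2000]
lag = estimate_lag_samples(reference, delayed)
print(lag, "samples =", 1000 * lag / 16000, "ms at 16 kHz")
```

`np.correlate` computes the correlation directly, which is fine for short signals. For recordings of realistic length, an FFT-based correlation such as `scipy.signal.correlate` would be considerably faster.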

Before aligning, we can plot both audio waveforms to inspect their initial timing differences:

In [None]:
plt.plot(lsl_audio, label='lsl_audio')
plt.plot(video_audio_downsampled, label='video_audio')
plt.legend(loc='lower left')
plt.show()

We use the modified **```shift_align()```** function from **Shign** to determine the **time shift** (in milliseconds) needed to align the video audio with the LSL audio.

```shift_align()``` returns the aligned versions of both audio tracks. In the modified version used here, we also obtain **shift_ms**, which tells us how much the video needs to be adjusted to synchronize with the LSL timeline.

Key Parameters:
- **```audio_a```** → The LSL-synced audio (reference audio).
- **```audio_b```** → The extracted video audio (to be aligned).
- **```sr_a```** & **```sr_b```** → Sample rates of both tracks (must match).
- **```align_how```** → **```"pad_and_crop_one_to_match_other"```** ensures that only the second audio (video audio) is adjusted, keeping the LSL audio unchanged.
- **```max_shift_sec```** = 300 → Sets a limit of 300 seconds for possible shifts (can be adjusted if needed).

In [None]:
# Aligning the video audio with the LSL-synced audio:
_, video_audio_aligned, shift_ms = shign.shift_align(  # saving shift_ms here for later use
    audio_a = lsl_audio,
    audio_b = video_audio_downsampled,
    sr_a    = sr_lsl,
    sr_b    = sr_ext_downsampled,
    align_how = "pad_and_crop_one_to_match_other",
    max_shift_sec = 300
)

print(f"Mismatch between the two audio tracks is \033[1m{shift_ms/1000} seconds\033[0m.")
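
To make the idea behind `pad_and_crop_one_to_match_other` more concrete, the sketch below shifts one track so that it lines up with a fixed reference, then pads or crops it to the reference length. It is an illustration under stated assumptions, not Shign's code: the function name `shift_to_match` and the sign convention (a positive lag means the shared content occurs later in the track) are choices made for this example and may differ from the sign of Shign's `shift_ms`.

```python
def shift_to_match(track, lag_samples, target_length, fill=0.0):
    """Shift `track` by `lag_samples` and pad/crop it to `target_length` samples.

    Convention of this sketch: a positive lag means the shared content occurs
    later in `track` than in the reference, so leading samples are dropped.
    """
    track = list(track)
    if lag_samples >= 0:
        shifted = track[lag_samples:]                 # crop the extra lead-in
    else:
        shifted = [fill] * (-lag_samples) + track     # pad the missing lead-in
    shifted = shifted[:target_length]                 # crop any overhang at the end
    return shifted + [fill] * (target_length - len(shifted))  # pad if too short

reference = [1, 2, 3, 4, 5]
late_track = [9, 9, 1, 2, 3, 4, 5, 9]    # same content, two samples late
early_track = [3, 4, 5]                  # same content, first two samples missing
print(shift_to_match(late_track, 2, len(reference)))
print(shift_to_match(early_track, -2, len(reference)))
```

In both cases the reference stays untouched and only the second track is modified, which mirrors the role of the LSL audio as the fixed timeline in this notebook.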

---
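A negative mismatch means that the second audio (video audio) starts later than the first one (LSL audio). Note that the trimming step further below uses `abs(shift_ms)` as the start offset into the video, so it presupposes that the external camera was already recording when the LSL recording began.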

Now, we can plot both audio signals again to verify that the extracted video audio is correctly shifted and aligned with the LSL-synced audio.

We can also print the lengths of both audio arrays and play them for further verification.

In [None]:
plt.plot(lsl_audio, label='lsl_audio')
plt.plot(video_audio_aligned, label='video_audio_aligned')
plt.legend(loc='lower left')
plt.show()

# Printing the lengths of both audio arrays for verification:
print(f"Length of LSL audio: {len(lsl_audio)} samples")
print(f"Length of aligned video audio: {len(video_audio_aligned)} samples\n")

# Play back the two audio to further verify synchronization:
print("\033[1mLSL-Synced Audio:\033[0m")
display(Audio(lsl_audio, rate=sr_lsl))
print("\033[1mAligned Video Audio:\033[0m")
display(Audio(video_audio_aligned, rate=sr_ext_downsampled))

In [None]:
# Saving video_audio_aligned after verifying the synchronization:
write(aligned_video_audio, sr_ext_downsampled, video_audio_aligned)

## Aligning the External Video to LSL Timeline

After extracting the audio from the external video, we aligned it with the LSL-synced audio using **Shign**. Since both recordings captured the same event but were not synchronized, the ```shift_align()``` function detected their timing mismatch and computed the **"shift_ms"** needed to align them.

Using this precise time shift, we first adjusted the video's audio to match the LSL-synced audio. Now, we apply the same **"shift_ms"** value to **trim the external video** itself, ensuring that it is perfectly synchronized with the **LSL timeline**.

To do this, we use **```ms_to_samples()```** from Shign, which converts **milliseconds into audio samples**. This conversion allows us to locate the start and end samples in the aligned video audio, which we then use to compute the corresponding **video timestamps (in seconds)** for trimming it.

In [None]:
# File paths:
video_file = "external_video/external_video.mp4"         # External video to be aligned
aligned_video = "outputs/aligned_video.mp4"              # Path to save the aligned output video

# Start sample computed from shift_ms using ms_to_samples():
start_sample = ms_to_samples(abs(shift_ms), sr=sr_ext_downsampled)
print(f"Aligned video audio starts at: \033[1m{abs(shift_ms):.2f}\033[0m milliseconds")
print(f"Aligned video audio starts at: \033[1m{start_sample:.2f}\033[0m sample")

# End sample computed from start sample + total samples:
total_samples = len(video_audio_aligned)
end_sample = start_sample + total_samples
print(f"Aligned video audio ends at: \033[1m{end_sample:.2f}\033[0m sample\n")

# Converting audio samples to video timestamps (seconds) using the sample rate of the audio:
start_time_sec = start_sample / sr_ext_downsampled
end_time_sec = end_sample / sr_ext_downsampled
print(f"Aligned video starts at: \033[1m{start_time_sec:.2f}\033[0m second")
print(f"Aligned video ends at: \033[1m{end_time_sec:.2f}\033[0m second")
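
The arithmetic behind this cell is a pair of unit conversions: milliseconds to samples, and samples back to seconds. The sketch below spells them out with plain Python. The helper names and the example shift of 2500 ms are hypothetical, and the rounding rule is an assumption of this sketch; it is not claimed to match Shign's `ms_to_samples()` exactly.

```python
def ms_to_sample_index(ms, sr):
    """Convert a duration in milliseconds to a whole number of samples at rate `sr`."""
    return int(round(ms / 1000.0 * sr))

def sample_index_to_sec(sample, sr):
    """Convert a sample index back to seconds."""
    return sample / sr

shift_ms_example = 2500.0      # hypothetical shift, for illustration only
sr_example = 16000             # LSL audio sample rate used in this notebook
start = ms_to_sample_index(abs(shift_ms_example), sr_example)
print(start, sample_index_to_sec(start, sr_example))
```

Going through samples rather than dividing milliseconds by 1000 directly keeps the video cut points on the same sample grid as the aligned audio.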

---
Once we compute the ```start_time_sec``` and ```end_time_sec``` values, we can now trim the video using **FFmpeg** to align it with the LSL-synced audio.

**This process may take a while depending on the total video duration and your computer's specifications...**

In [None]:
# Trimming and saving the video using FFmpeg:
print(f"Trimming the video from \033[1m{start_time_sec:.2f}s\033[0m to \033[1m{end_time_sec:.2f}s\033[0m...")
total_length_sec = end_time_sec-start_time_sec
minutes, seconds = divmod(total_length_sec, 60)
print(f"Aligned video length will be \033[1m{int(minutes)} minutes {int(seconds)} seconds\033[0m...")

# Setting up the FFmpeg command:
command = [
    "ffmpeg",
    "-ss", f"{start_time_sec:.3f}",                    # start time BEFORE -i ensures frame accuracy
    "-i", video_file,                                  # input video
    "-to", f"{end_time_sec - start_time_sec:.3f}",     # duration after start
    "-c:v", "libx264",                                 # re-encode video for frame accuracy
    "-c:a", "aac",                     
    "-preset", "fast",                                 # encoding speed optimization
    "-reset_timestamps", "1",                          # reset timestamps after cutting (necessary)
    aligned_video                                      # output video path
]

# Starting the subprocess:
process = subprocess.Popen(command, stderr=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

# Regular expression to find the time from FFmpeg outputs:
time_pattern = re.compile(r"time=(\d+:\d+:\d+\.\d+)")

start = time.time()  # Recording the start time for estimated time calculations

# Calculating the estimated time left:
try:
    calculating_now = False
    while True:
        line = process.stderr.readline()
        if not line:
            break
        # Find the current time from ffmpeg output:
        match = time_pattern.search(line)
        if match:
            current_time = match.group(1)
            h, m, s = map(float, current_time.split(':')) # hour, minute, seconds
            # FFmpeg reports the output position, which starts at 0 because -ss precedes -i:
            elapsed_seconds = h * 3600 + m * 60 + s
            if elapsed_seconds > 30:  # Wait 30s before printing the first estimation
                elapsed_time = time.time() - start
                # Estimated total time:
                estimated_total_time = (elapsed_time / elapsed_seconds) * (end_time_sec - start_time_sec)
                # Estimated time left:
                etl = estimated_total_time - elapsed_time
                # Converting seconds to a more readable format:
                eta_hours = int(etl // 3600)
                eta_minutes = int((etl % 3600) // 60)
                eta_seconds = int(etl % 60)
                # Print and update the estimated time left on the same line:
                print(f"Estimated time remaining: {eta_hours}h {eta_minutes}m {eta_seconds}s", end='\r')
        elif not calculating_now:
            print("Calculating remaining time...", end='\r')
            calculating_now = True
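finally:
    process.stderr.close()

# Waiting for FFmpeg to exit and checking that it succeeded:
if process.wait() != 0:
    raise RuntimeError("FFmpeg exited with an error; the aligned video may be incomplete.")

print(f"Trimming completed! Aligned video saved in \033[1m{aligned_video}\033[0m\n")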

# We can now play the aligned_video for verification:
HTML(f"""
<video width="1000" controls>
  <source src="{aligned_video}" type="video/mp4">
  Your browser does not support the video tag.
</video>
""")

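The progress estimate in the trimming cell combines two small steps: parsing the `time=` field from FFmpeg's status lines, and extrapolating the remaining wall-clock time linearly. The sketch below isolates both so they can be checked without running FFmpeg. The function names and the sample status line are illustrative assumptions, not part of the original notebook.

```python
import re

TIME_PATTERN = re.compile(r"time=(\d+):(\d+):(\d+(?:\.\d+)?)")

def progress_seconds(line):
    """Return the encoded position (seconds) from an FFmpeg status line, or None."""
    match = TIME_PATTERN.search(line)
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def estimate_remaining(wall_elapsed_sec, encoded_sec, total_sec):
    """Linearly extrapolate the remaining wall-clock time (seconds), or None."""
    if encoded_sec <= 0:
        return None
    return wall_elapsed_sec / encoded_sec * total_sec - wall_elapsed_sec

# Illustrative status line in the style FFmpeg prints while encoding:
line = "frame= 1500 fps= 50 q=28.0 size=    2048kB time=00:01:00.00 bitrate= 279.6kbits/s speed=2x"
encoded = progress_seconds(line)
print(encoded, estimate_remaining(30.0, encoded, 600.0))
```

The estimate assumes a constant encoding speed, so it is only a rough guide, especially early in the run.
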
## Computing Trial Start and End Times from LSL-Synced Raw Audio Files

Now that the video is aligned, we determine the **trial start and end times** using LSL timestamps from **trial-sized raw LSL-synced audio files**. Each of these CSV files, stored in the **csv_files** folder, contains LSL timestamps and audio amplitude values for a **single trial**.

*(Alternatively, trial timestamps can be extracted from an LSL event marker file).*

To map these LSL times to video timestamps, we:

1. Extract **trial start and end times** (first and last LSL timestamp in each CSV file).
2. Save them in a **trial_times.csv** file.
3. Convert these timestamps into **audio sample positions**.
4. Compute the **video timestamps (seconds)** using the audio **sample rate**.

Once these steps are completed, we can use the computed video timestamps to segment the aligned video into **trial-sized clips**.

In [None]:
# Folder containing the raw LSL-synced audio files of all trials:
csv_folder = "csv_files"

results = []

# Iterating over all CSV files in the folder:
for file_name in os.listdir(csv_folder):
    if file_name.endswith(".csv"):
        file_path = os.path.join(csv_folder, file_name)
            
        # For each file:
        try:
            # Read the CSV file, skipping its header row (two columns: "time_ms" and "amplitude"):
            data = pd.read_csv(file_path, skiprows=1, header=None, names=["time_ms", "amplitude"])

            # Extract the trial start and end times (first and last timestamps):
            start_time = data["time_ms"].iloc[0]
            end_time = data["time_ms"].iloc[-1]
                
            # Store results, including the current filename:
            results.append({
                "file_name": file_name,
                "start_time_ms": start_time,
                "end_time_ms": end_time
            })
            
        except Exception as e:
            print(f"Error processing file {file_name}: {e}")
    
# Converting results to DataFrame and saving:
trial_times_df = pd.DataFrame(results)
save_dir = "outputs/trial_times.csv"
trial_times_df.to_csv(save_dir, index=False)

# Printing the first 10 rows to inspect the accuracy of the saved times:
trial_times_df.head(10)

Now, we can map these **trial times** to **audio sample positions** and **video timestamps**:

In [None]:
# File paths:
csv_file = "lsl_synced_audio/lsl_synced_long_audio_raw.csv"     # Raw LSL-synced audio in CSV format
mapped_output_file = "outputs/mapped_event_markers.csv"         # Path to save the mapped timestamps

# Sample rate of LSL-synced audio:
lsl_audio_sr = sr_lsl

# Loading files:
lsl_audio_raw = pd.read_csv(csv_file, skiprows=1, header=None, names=["time_ms", "value"])       # Renaming the columns

# Adding new columns to the trial_times_df for the mapped audio sample positions and video timestamps:

trial_times_df["lsl_audio_start_sample"] = None             # Sample number in LSL-synced audio for start
trial_times_df["lsl_audio_end_sample"] = None               # Sample number in LSL-synced audio for end
trial_times_df["video_start_time"] = None                   # Time in video (seconds) for start
trial_times_df["video_end_time"] = None                     # Time in video (seconds) for end

# Iterating over all rows of the trial_times_df:
for idx, row in trial_times_df.iterrows():
    
    # Calculating lsl_audio_start_sample by counting rows up to start_time_ms in lsl_audio_raw:
    lsl_audio_start_sample = lsl_audio_raw[lsl_audio_raw["time_ms"] <= row["start_time_ms"]].shape[0]
    
    # Calculate lsl_audio_end_sample by counting rows up to end_time_ms in lsl_audio_raw:
    lsl_audio_end_sample = lsl_audio_raw[lsl_audio_raw["time_ms"] <= row["end_time_ms"]].shape[0]

    # Calculate video_start_time and video_end_time (in seconds) based on the sample rate of the LSL-synced audio:
    video_start_time = lsl_audio_start_sample / lsl_audio_sr
    video_end_time = lsl_audio_end_sample / lsl_audio_sr

    # Updating the DataFrame:
    trial_times_df.at[idx, "lsl_audio_start_sample"] = lsl_audio_start_sample
    trial_times_df.at[idx, "lsl_audio_end_sample"] = lsl_audio_end_sample
    trial_times_df.at[idx, "video_start_time"] = round(video_start_time, 6)
    trial_times_df.at[idx, "video_end_time"] = round(video_end_time, 6)

# Saving the updated DataFrame including the computed video timestamps:
trial_times_df.to_csv(mapped_output_file, index=False)

# Inspecting the first 10 rows:
trial_times_df.head(10)
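
The loop above finds each sample position by counting all rows whose timestamp is less than or equal to the trial time. The same count can be obtained with a binary search, which avoids scanning the full recording for every trial. The sketch below uses the standard library's `bisect_right` and assumes the timestamps are sorted in ascending order. The function names and the synthetic timeline are illustrative only.

```python
from bisect import bisect_right

def timestamp_to_sample(timestamps_ms, t_ms):
    """Count the samples whose timestamp is <= t_ms (timestamps must be sorted)."""
    return bisect_right(timestamps_ms, t_ms)

def sample_to_video_time(sample, sr):
    """Convert a sample position to seconds on the aligned video timeline."""
    return sample / sr

# Synthetic 16 kHz timeline starting at an arbitrary LSL time of 1000 ms:
sr = 16000
timeline_ms = [1000.0 + i * 1000.0 / sr for i in range(3 * sr)]
start_sample = timestamp_to_sample(timeline_ms, 1500.0)
end_sample = timestamp_to_sample(timeline_ms, 2500.0)
print(start_sample, end_sample)
print(sample_to_video_time(start_sample, sr), sample_to_video_time(end_sample, sr))
```

Because the aligned video starts at the first LSL audio sample, a sample position divided by the sample rate gives the position in the aligned video directly, regardless of the absolute LSL clock value.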

## Segmenting the Aligned Video into Trial-Sized Clips

We have now computed the video start and end times for each trial.

Using these timestamps, we can **segment the aligned video** into **trial-sized clips**.

**Segmenting trial clips may take a while, depending on video length, trial count, and your computer's specifications...**

In [None]:
# File paths:
input_video = "outputs/aligned_video.mp4"
output_folder = "outputs/cut_videos/"

os.makedirs(output_folder, exist_ok=True)

for i, row in trial_times_df.iterrows():
    start_time = row["video_start_time"]
    end_time = row["video_end_time"]
    file_name = row["file_name"]
    
    # Creating an output filename based on the input filename (.mp4 instead of .csv):
    output_file = os.path.join(output_folder, f"{file_name.replace('.csv', '')}.mp4")
    
    # Using FFmpeg to segment the videos - without audio:
    ffmpeg_command = [
        "ffmpeg",
        "-ss", f"{start_time:.3f}",             # Start time (placed before -i)
        "-i", input_video,                      # Input video
        "-to", f"{end_time - start_time:.3f}",  # Duration after start
        "-c:v", "libx264",                      # Re-encode video for frame accuracy
        "-an",                                  # Disable audio (replace with "-c:a", "aac", if audio is needed)
        "-preset", "fast",                      # Faster encoding
        "-reset_timestamps", "1",               # Reset timestamps
        "-filter_complex",
        "[0:v]setpts=PTS-STARTPTS[v]",          # Reset video PTS
        "-map", "[v]",                          # Map video stream
        "-movflags", "+faststart",              # Optimize for playback
        output_file
    ]
    
    # Execute the command using subprocess:
    subprocess.run(ffmpeg_command, check=True)
    print(f"Segment saved in {output_file}")
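
Before launching many FFmpeg runs, it can help to check that every trial window is usable. The sketch below flags windows that start before the video, have a non-positive duration, or extend past the end of the video. The function name `find_invalid_windows`, the example windows, and the 60-second duration are hypothetical.

```python
def find_invalid_windows(windows, video_duration_sec):
    """Return (name, reason) pairs for trial windows that cannot be cut as-is."""
    problems = []
    for name, start, end in windows:
        if start < 0:
            problems.append((name, "starts before the video"))
        elif end <= start:
            problems.append((name, "end is not after start"))
        elif end > video_duration_sec:
            problems.append((name, "ends after the video"))
    return problems

# Hypothetical windows (name, start_sec, end_sec) for a 60-second aligned video:
windows = [
    ("trial_1.csv", 2.0, 7.5),
    ("trial_2.csv", 9.0, 9.0),
    ("trial_3.csv", 55.0, 61.2),
]
print(find_invalid_windows(windows, 60.0))
```

In the notebook, the windows could be built with `list(trial_times_df[["file_name", "video_start_time", "video_end_time"]].itertuples(index=False, name=None))`.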

## (Optional) Overlaying the Audio of Each Trial onto Their Cut Videos

After segmenting the trial videos without the audio stream, we can optionally **overlay the corresponding LSL audio** to each trial clip for verification or further analysis.

**Note:** This step assumes that all raw trial audio files in the csv_files folder have already been converted to WAV files and are stored in the **```audio_files/```** folder.

To ensure correct pairing of audio and video of trials, the following chunk:

- **Sorts video files** from the trial video_folder.
- **Sorts audio files** from the audio_folder.
- Matches video and audio files **based on their filenames**.
- Uses **FFmpeg** to merge the video with its corresponding audio.
- Saves the new audio-overlaid trial videos in the **```outputs/audio_overlay/```** folder.

In [None]:
# File paths:
video_folder = "outputs/cut_videos/"          # folder with all the trial videos
audio_folder = "audio_files/"                 # folder with all the trial audio files (names should match)
output_folder = "outputs/audio_overlay/"      # output folder to save the audio-overlaid videos

os.makedirs(output_folder, exist_ok=True)

# Getting sorted lists of video and audio files:
video_files = sorted([f for f in os.listdir(video_folder) if f.endswith(".mp4")])
audio_files = sorted([f for f in os.listdir(audio_folder) if f.endswith(".wav")])

for video_file, audio_file in zip(video_files, audio_files):
    
    # Checking again if the first 10 characters of the filenames match:
    if video_file[:10] == audio_file[:10]:
        
        video_path = os.path.join(video_folder, video_file)
        audio_path = os.path.join(audio_folder, audio_file)
        
        # Saving the audio-overlaid video with the same name as the original trial video:
        output_path = os.path.join(output_folder, video_file)
        
        # FFmpeg command:
        ffmpeg_command = [
            "ffmpeg",
            "-i", video_path,    
            "-i", audio_path,     
            "-c:v", "copy",       
            "-c:a", "aac",        
            "-map", "0:v:0",      # Map the first input's video
            "-map", "1:a:0",      # Map the second input's audio
            output_path           
        ]
        
        # Running the FFmpeg command:
        subprocess.run(ffmpeg_command, check=True, capture_output=True)
        print(f"Audio-overlaid video saved in {output_path}")
    
    else:
        print(f"No matching audio found for the video: {video_file}")

print("All videos processed!")
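
Pairing files by position with `zip()` relies on both folders containing exactly the same trials in the same order: a single missing file shifts every later pair, and a check on the first 10 characters cannot distinguish names that share a prefix such as `0_1_trial_`. A more defensive option is to pair files by their full name without the extension. The sketch below shows this under the assumption that video and audio files share the same stem; the function name `pair_by_stem` and the file names are hypothetical.

```python
from pathlib import Path

def pair_by_stem(video_names, audio_names):
    """Pair video and audio files whose names match once the extension is removed."""
    audio_by_stem = {Path(name).stem: name for name in audio_names}
    pairs, unmatched = [], []
    for video in sorted(video_names):
        audio = audio_by_stem.get(Path(video).stem)
        if audio is None:
            unmatched.append(video)
        else:
            pairs.append((video, audio))
    return pairs, unmatched

# Hypothetical file names; trial_2 has no audio file:
videos = ["trial_1.mp4", "trial_2.mp4", "trial_3.mp4"]
audios = ["trial_3.wav", "trial_1.wav"]
pairs, unmatched = pair_by_stem(videos, audios)
print(pairs)
print(unmatched)
```

The returned pairs could then be fed to the same FFmpeg command as above, while the unmatched list reports exactly which trials lack audio.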

## Example Segmented Trial Videos with Overlaid Audio

In [None]:
# File path:
video_path_1 = "outputs/audio_overlay/0_1_trial_12_Mic_nominal_srate16000_p1_ei_geluiden.mp4"

HTML(f"""
<video width="1000" controls>
  <source src="{video_path_1}" type="video/mp4">
  Your browser does not support the video tag.
</video>
""")

In [None]:
# File path:
video_path_2 = "outputs/audio_overlay/0_1_pr_1_Mic_nominal_srate16000_p0_dansen_geluiden_corrected.mp4"

HTML(f"""
<video width="1000" controls>
  <source src="{video_path_2}" type="video/mp4">
  Your browser does not support the video tag.
</video>
""")