# EDANSA Inference Example

This notebook demonstrates how to set up the EDANSA environment, use the pre-trained model shipped in the repository, and run inference on a sample audio file using `inference.py`.

## 1. Setup Environment

First, we clone the repository, install necessary dependencies including the `edansa` package itself, and ensure `ffmpeg` is available for audio processing.

In [None]:
# Clone the repository (replace with your actual repo URL if different)
!git clone https://github.com/berk/EDANSA-2019.git
%cd EDANSA-2019

# Install the edansa package and its dependencies
# Note: in Colab, some dependencies may need to be pinned to specific versions
!pip install -q .

# Install ffmpeg (used as the audio loading backend)
!apt-get update && apt-get install -y -qq ffmpeg
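
Before moving on, it can help to confirm that the setup succeeded. The following is a minimal sketch rather than part of the repository: `check_setup` is a hypothetical helper that looks for `ffmpeg` on the `PATH` and checks whether a package is importable. The import name `edansa` is assumed from the package name mentioned above.

```python
import importlib.util
import shutil


def check_setup(executable="ffmpeg", package="edansa"):
    """Report whether an executable and a Python package are available.

    Hypothetical helper: the default names are assumptions based on this
    notebook, not an API provided by the repository.
    """
    return {
        # Full path to the executable, or None if it is not on PATH
        "executable_path": shutil.which(executable),
        # True if the top-level package can be located for import
        "package_found": importlib.util.find_spec(package) is not None,
    }


print(check_setup())
```

If `executable_path` is `None` or `package_found` is `False`, re-run the setup cell and check its output for errors.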

## 2. Prepare Input Data List

The inference script accepts a text file that lists the audio files to process. We'll create a list containing one of the test audio files included in the repository.

**Note:** The script requires *absolute* paths within the Colab environment.

In [None]:
import os

# Define the sample audio file path relative to the repo root
sample_audio_relative = 'src/edansa/tests/assets/audio/real/anwr/47/2022/S4A10341_20220802_235902.flac'

# Get the absolute path within the Colab environment
repo_absolute_path = '/content/EDANSA-2019' # Default cloned repo directory in Colab
sample_audio_absolute = os.path.join(repo_absolute_path, sample_audio_relative)

# Create a directory for the input list (if it doesn't exist)
input_list_dir = '/content/input_lists'
os.makedirs(input_list_dir, exist_ok=True)

# Define the path for the input file list
input_list_file = os.path.join(input_list_dir, 'sample_files.txt')

# Write the absolute path to the list file
with open(input_list_file, 'w') as f:
    f.write(f'{sample_audio_absolute}\n')

# Print the contents of the list file to verify
print(f'Created input list file at: {input_list_file}')
!cat {input_list_file}
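
To run on more than one recording, the same list file can hold several entries. The sketch below assumes one absolute path per line, which matches how the single entry is written above but is not otherwise confirmed here. `write_input_list` and its default extensions are hypothetical, chosen for illustration.

```python
from pathlib import Path


def write_input_list(audio_dir, list_path, extensions=(".flac", ".wav")):
    """Write the absolute path of every matching audio file, one per line.

    Hypothetical helper; the extension tuple is an assumption.
    Returns the list of paths that were written.
    """
    audio_dir = Path(audio_dir).resolve()
    # Search recursively and keep only files with a matching extension
    paths = sorted(
        str(p) for p in audio_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
    list_path = Path(list_path)
    list_path.parent.mkdir(parents=True, exist_ok=True)
    list_path.write_text("".join(f"{p}\n" for p in paths))
    return paths


# Example call (the list file name is illustrative):
# write_input_list('/content/EDANSA-2019/src/edansa/tests/assets/audio',
#                  '/content/input_lists/all_files.txt')
```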

## 3. Run Inference

Now we execute the `inference.py` script. We need to provide:
*   `--model_path`: Path to the pre-trained model directory.
*   `--input_files_list`: Path to the text file containing the list of audio files.
*   `--output_folder`: Directory where the prediction CSVs will be saved.

In [None]:
# Define paths (relative paths work from the repo root: /content/EDANSA-2019)
MODEL_PATH = 'src/edansa/tests/assets/31m2plxv-V1' # User specified model path within the repo
INPUT_LIST = '/content/input_lists/sample_files.txt' # Absolute path created above
OUTPUT_DIR = '/content/edansa_output' # Define an output directory in Colab
SCRIPT_PATH = 'src/edansa/inference.py' # Relative path to the script within the repo

# Create the output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Construct and run the inference command
# We are running this from the repository root directory (/content/EDANSA-2019)
!python {SCRIPT_PATH} \
    --model_path {MODEL_PATH} \
    --input_files_list {INPUT_LIST} \
    --output_folder {OUTPUT_DIR}
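
Outside a notebook, or when a failed run should stop execution, the same command can be launched with `subprocess`. This is a minimal sketch using only the three flags shown above; `build_inference_command` and `run_inference` are hypothetical helpers, and it is an assumption that the script signals failure with a non-zero exit code.

```python
import subprocess
import sys


def build_inference_command(script_path, model_path, input_list, output_dir):
    """Assemble the argument list for the inference script.

    Uses only the three flags shown in this notebook.
    """
    return [
        sys.executable, str(script_path),
        "--model_path", str(model_path),
        "--input_files_list", str(input_list),
        "--output_folder", str(output_dir),
    ]


def run_inference(cmd, cwd=None):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, cwd=cwd, check=True)


# Example call, run from the repository root as in the cell above:
# cmd = build_inference_command(SCRIPT_PATH, MODEL_PATH, INPUT_LIST, OUTPUT_DIR)
# run_inference(cmd, cwd='/content/EDANSA-2019')
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.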

## 4. Check Results

The script saves predictions in the specified output folder, mirroring the input file's path relative to a common root directory identified from the input paths. Let's list the contents of the output directory and view the generated CSV file.
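
The mirroring rule described above can be sketched as a small path computation. This illustrates the description only and is not the script's actual code: `expected_csv_path` is a hypothetical helper, and the common root is passed in explicitly because how the script derives it is not shown here.

```python
from pathlib import PurePosixPath


def expected_csv_path(audio_path, common_root, output_dir):
    """Mirror audio_path (relative to common_root) under output_dir as a .csv.

    Raises ValueError if audio_path is not located under common_root.
    """
    relative = PurePosixPath(audio_path).relative_to(common_root)
    return str(PurePosixPath(output_dir) / relative.with_suffix(".csv"))


print(expected_csv_path(
    "/content/EDANSA-2019/src/edansa/tests/assets/audio/real/anwr/47/2022/"
    "S4A10341_20220802_235902.flac",
    "/content/EDANSA-2019",
    "/content/edansa_output",
))
```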

In [None]:
import pandas as pd
import os

# List the output directory contents recursively to see the structure
print(f'Contents of {OUTPUT_DIR}:')
!ls -R {OUTPUT_DIR}

# Construct the expected output path
# The inference script determines the common root from the input paths 
# (here, '/content/EDANSA-2019/') and replicates the relative path 
# ('src/edansa/tests/assets/audio/real/anwr/47/2022/...') under the output folder.
relative_output_path = 'src/edansa/tests/assets/audio/real/anwr/47/2022/S4A10341_20220802_235902.csv'
expected_output_csv = os.path.join(OUTPUT_DIR, relative_output_path)

print(f'\nAttempting to read: {expected_output_csv}')

# Display the first few rows of the output CSV using pandas
try:
    # pandas is imported at the top of this cell (it is preinstalled in Colab)
    df = pd.read_csv(expected_output_csv)
    print('\nPrediction CSV Head:')
    print(df.head())
except FileNotFoundError:
    print(f'\nError: Output CSV not found at {expected_output_csv}')
    print('Please check the `ls -R` output above and the inference script logs to confirm the correct path and if the script ran successfully.')
except Exception as e:
    print(f'\nAn error occurred while reading the CSV: {e}')
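
If the CSV is not at the expected location, a recursive search avoids guessing the directory layout. This is a minimal sketch; `find_prediction_csvs` is a hypothetical helper, and it assumes the predictions are written with a `.csv` extension as described above.

```python
from pathlib import Path


def find_prediction_csvs(output_dir):
    """Return every CSV file found under output_dir, sorted by path."""
    return sorted(Path(output_dir).rglob("*.csv"))


# Example call, using the output directory defined earlier:
# for csv_path in find_prediction_csvs('/content/edansa_output'):
#     print(csv_path)
```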