# EDANSA Inference Example

This notebook demonstrates how to set up the EDANSA environment, download the pre-trained model and sample audio data, and run inference on a sample audio file using `inference.py`.

## 1. Setup Environment

First, we clone the repository, install necessary dependencies including the `edansa` package itself, and ensure `ffmpeg` is available for audio processing.

In [None]:
# Clone the repository
# Make sure you are in the desired parent directory before running this
!git clone https://github.com/speechLabBcCuny/EDANSA.git
%cd EDANSA

# Install the edansa package and dependencies
# This installs the package in editable mode, along with requirements
!pip install -q -e .

# Install ffmpeg (required audio loading backend for torchaudio)
# Use sudo for Colab environment, adjust if running locally
!sudo apt-get update && sudo apt-get install -y -qq ffmpeg
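
If the setup cell runs without visible errors but later steps fail, it can help to confirm that the command-line tools are actually on `PATH`. The following is a minimal sketch using only the standard library; the helper name `find_missing_tools` is hypothetical and not part of EDANSA.

```python
import shutil


def find_missing_tools(tools=("git", "ffmpeg")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = find_missing_tools()
if missing:
    print(f"Missing command-line tools: {', '.join(missing)}")
else:
    print("All required command-line tools were found on PATH.")
```

This only checks that the executables can be located; it does not verify that `torchaudio` can load audio through them.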

## 2. Download and Prepare Input Data

The main EDANSA repository does not include large audio files. We need to download some test audio assets separately.

We will download the `edansa-test-assets-pack-v1.zip` file, which contains the same assets as `src/edansa/tests/assets/` in the repository. We will then unzip it and create an input list file pointing to one of the audio samples within the pack.

**Note:** The inference script requires *absolute* paths for the files listed in the input list, especially when run in environments like Colab.

In [None]:
import os

# Define URL and local paths
ASSET_URL = 'https://github.com/speechLabBcCuny/EDANSA/releases/download/dev-test-data-v1.0/edansa-test-assets-pack-v1.zip'
ZIP_FILE = 'edansa-test-assets-pack-v1.zip'
DOWNLOAD_DEST_DIR = '/content' # Location to download zip in Colab
EXTRACT_TARGET_DIR = os.path.join(DOWNLOAD_DEST_DIR, 'downloaded_assets') # Location to unzip assets

# Download the assets
print(f'Downloading {ASSET_URL}...')
!wget -q -O {os.path.join(DOWNLOAD_DEST_DIR, ZIP_FILE)} {ASSET_URL}

# Unzip the assets into the target directory
print(f'Unzipping {ZIP_FILE} into {EXTRACT_TARGET_DIR}...')
os.makedirs(EXTRACT_TARGET_DIR, exist_ok=True)
!unzip -q -o {os.path.join(DOWNLOAD_DEST_DIR, ZIP_FILE)} -d {EXTRACT_TARGET_DIR}

# Define the path to the sample audio file WITHIN the unzipped assets
# The archive is expected to unpack into top-level folders such as 'audio/' and '31m2plxv-V1/'
sample_audio_relative_in_assets = 'audio/real/anwr/47/2022/S4A10341_20220802_235902.flac'
sample_audio_absolute = os.path.join(EXTRACT_TARGET_DIR, sample_audio_relative_in_assets)

# Verify the sample file exists; stop here if it does not, so the list file never points at a missing path
if not os.path.exists(sample_audio_absolute):
    # List the unzipped directory to help debug an unexpected archive layout
    print(f'Listing contents of {EXTRACT_TARGET_DIR}:')
    !ls -R {EXTRACT_TARGET_DIR}
    raise FileNotFoundError(f'Sample audio file not found at {sample_audio_absolute}')
print(f'Found sample audio file: {sample_audio_absolute}')

# Create a directory for the input list (if it doesn't exist)
input_list_dir = os.path.join(DOWNLOAD_DEST_DIR, 'input_lists')
os.makedirs(input_list_dir, exist_ok=True)

# Define the path for the input file list
input_list_file = os.path.join(input_list_dir, 'sample_files.txt')

# Write the absolute path to the list file
with open(input_list_file, 'w') as f:
    f.write(f'{sample_audio_absolute}\n')

# Print the contents of the list file to verify
print(f'Created input list file at: {input_list_file}')
!cat {input_list_file}
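
The cell above lists a single file. To run inference on a whole folder, the same kind of list can be generated by walking a directory and writing one absolute path per line. This is a sketch only: the helper `write_input_list` is hypothetical, and the default of matching only `.flac` files is an assumption based on the sample asset, not a statement about which formats the inference script accepts.

```python
from pathlib import Path


def write_input_list(audio_dir, list_path, extensions=(".flac",)):
    """Write the absolute path of every matching audio file under audio_dir, one per line.

    `extensions` is an assumption; adjust it to the formats you actually use.
    """
    audio_dir = Path(audio_dir).resolve()  # resolve() makes the paths absolute
    paths = sorted(
        p for p in audio_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
    list_path = Path(list_path)
    list_path.parent.mkdir(parents=True, exist_ok=True)
    list_path.write_text("".join(f"{p}\n" for p in paths))
    return paths
```

For example, `write_input_list('/content/downloaded_assets/audio', '/content/input_lists/all_files.txt')` would list every `.flac` file found under the extracted `audio` folder.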

## 3. Run Inference

Now we execute the `inference.py` script using the main pre-trained model (`31m2plxv-V1`) included in the repository's `assets` directory. We need to provide:
*   `--model_path`: Path to the specific model checkpoint file (`.pt`).
*   `--config_file`: Path to the model's configuration file (`.json`).
*   `--input_files_list`: Path to the text file containing the list of audio files (created above).
*   `--output_folder`: Directory where the prediction CSVs will be saved.
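
Outside a notebook, where the `!` shell magic is unavailable, the same invocation can be assembled as an argument list and passed to `subprocess`. The sketch below uses only the flags shown in this notebook; the helper name `build_inference_command` is hypothetical, and the paths are the ones used in this example.

```python
import shlex
import sys


def build_inference_command(script_path, model_path, config_file,
                            input_files_list, output_folder, device=None):
    """Return the inference command as a list of arguments."""
    cmd = [
        sys.executable, script_path,
        "--model_path", model_path,
        "--config_file", config_file,
        "--input_files_list", input_files_list,
        "--output_folder", output_folder,
    ]
    if device is not None:
        cmd += ["--device", device]
    return cmd


cmd = build_inference_command(
    "runs/augment/inference.py",
    "assets/31m2plxv-V1/model_info/best_model_370_val_f1_min=0.8028.pt",
    "assets/31m2plxv-V1/model_info/model_config.json",
    "/content/input_lists/sample_files.txt",
    "/content/edansa_output",
    device="cpu",
)
print(shlex.join(cmd))  # show the command before running it
# To execute: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems if any path contains spaces.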

In [None]:
import os

# Define paths relative to the repository root (/content/EDANSA)
# Note: We are running this from the '/content/EDANSA' directory (see %cd in first cell)
MODEL_PT_PATH = 'assets/31m2plxv-V1/model_info/best_model_370_val_f1_min=0.8028.pt'
CONFIG_JSON_PATH = 'assets/31m2plxv-V1/model_info/model_config.json'
INPUT_LIST = '/content/input_lists/sample_files.txt' # Absolute path created above
OUTPUT_DIR = '/content/edansa_output' # Define an output directory in Colab
SCRIPT_PATH = 'runs/augment/inference.py' # Relative path to the script

# Verify model and config files exist in the cloned repo; stop early if either is missing
for label, path in (('Model', MODEL_PT_PATH), ('Config', CONFIG_JSON_PATH)):
    if not os.path.exists(path):
        raise FileNotFoundError(f'{label} file not found at {path}')

# Create the output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Construct and run the inference command
# --device cpu forces CPU, since Colab GPU resources/setup may be limited
print('\nRunning inference...')
!python {SCRIPT_PATH} \
    --model_path {MODEL_PT_PATH} \
    --config_file {CONFIG_JSON_PATH} \
    --input_files_list {INPUT_LIST} \
    --output_folder {OUTPUT_DIR} \
    --device cpu

## 4. Check Results

The script saves predictions in the specified output folder. It determines a common 'root' directory from the paths in the input list (here, likely `/content/downloaded_assets/`) and replicates the file's relative path structure under the output folder.

Let's list the contents of the output directory and view the generated CSV file.

In [None]:
import pandas as pd
import os

# Define where the output is expected and where assets were extracted
OUTPUT_DIR = '/content/edansa_output' # Must match the --output_folder used above
EXTRACT_TARGET_DIR = '/content/downloaded_assets' # Where we unzipped the assets

# List the output directory contents recursively to see the structure
print(f'Contents of {OUTPUT_DIR}:')
!ls -R {OUTPUT_DIR}

# Construct the expected output path
# Input file was: /content/downloaded_assets/audio/real/anwr/47/2022/S4A10341_20220802_235902.flac
# Common root identified by script is likely: /content/downloaded_assets/
# Relative path should be: audio/real/anwr/47/2022/S4A10341_20220802_235902.flac
input_file_absolute = os.path.join(EXTRACT_TARGET_DIR, 'audio/real/anwr/47/2022/S4A10341_20220802_235902.flac')
assumed_root_for_output = EXTRACT_TARGET_DIR

# Calculate relative path based on the assumed root
relative_path = os.path.relpath(input_file_absolute, assumed_root_for_output)
output_filename_base = os.path.splitext(relative_path)[0] # Get path without original extension
expected_output_csv = os.path.join(OUTPUT_DIR, output_filename_base + '.csv') # Add .csv extension

print(f'\nAttempting to read: {expected_output_csv}')

# Display the first few rows of the output CSV using pandas
try:
    df = pd.read_csv(expected_output_csv)
    print('\nPrediction CSV Head:')
    print(df.head())
except FileNotFoundError:
    print(f'\nError: Output CSV not found at {expected_output_csv}')
    print('Please check the `ls -R` output above and the inference script logs.')
    print('Possible issues: Inference script failed, or the output path structure calculation is different than expected.')
    # Manually check the expected path based on the ls -R output if needed
    # e.g., print('/content/edansa_output/audio/real/anwr/47/2022/S4A10341_20220802_235902.csv')
except Exception as e:
    print(f'\nAn error occurred while reading the CSV: {e}')
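
If the CSV is not at the computed path, a recursive search of the output folder is a simple fallback that does not depend on guessing the common root. This sketch assumes the prediction file keeps the audio file's base name with a `.csv` extension, as the path calculation above expects; the helper `find_prediction_csvs` is hypothetical.

```python
from pathlib import Path


def find_prediction_csvs(output_dir, audio_path=None):
    """Return CSV files under output_dir, optionally only those matching audio_path's base name."""
    csvs = sorted(Path(output_dir).rglob("*.csv"))
    if audio_path is not None:
        stem = Path(audio_path).stem  # file name without directory or extension
        csvs = [p for p in csvs if p.stem == stem]
    return csvs


for csv_path in find_prediction_csvs('/content/edansa_output'):
    print(csv_path)
```

Any path it returns can be passed directly to `pd.read_csv`.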