# EDANSA Inference Example

This notebook demonstrates how to set up the EDANSA environment, download sample audio data, and run inference on the sample audio files with `inference.py`, using the pre-trained model that ships with the repository.

## 1. Setup Environment

First, we clone the repository, install necessary dependencies including the `edansa` package itself, and ensure `ffmpeg` is available for audio processing.

In [None]:
# Clone the repository
# Make sure you are in the desired parent directory before running this
!git clone https://github.com/speechLabBcCuny/EDANSA.git
# Change this path if you are not running in Google Colab
%cd /content/EDANSA

# Install the edansa package and dependencies
# This installs the package in editable mode, along with requirements
!pip install -q -e .

# Install ffmpeg (required audio loading backend for torchaudio)
# Use sudo for Colab environment, adjust if running locally
!sudo apt-get update && sudo apt-get install -y -qq ffmpeg
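
Before moving on, it can help to confirm that `ffmpeg` actually ended up on the `PATH`. The following is a minimal sketch using only the Python standard library; `ffmpeg_available` is a hypothetical helper name introduced here for illustration and is not part of the `edansa` package.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if ffmpeg_available():
    print("ffmpeg found at:", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found; re-run the install step or install it manually.")
```

If the check fails outside Colab, installing `ffmpeg` through your system's package manager is the usual fix.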

## 2. Download and Prepare Input Data

The main EDANSA repository does not include large audio files. We need to download some test audio assets separately.

We will download `edansa-test-assets-pack-v1.zip`, which contains the same assets as `src/edansa/tests/assets/` in the repository, and unzip it. The extracted audio folder is then passed to the inference script in the next section.

**Note:** This notebook passes a folder via `--input_folder` and uses absolute paths throughout. If you supply an input list file instead, the inference script requires *absolute* paths for the files it lists, especially when run in environments like Colab.

In [None]:
import os

# Define URL and local paths
ASSET_URL = 'https://github.com/speechLabBcCuny/EDANSA/releases/download/dev-test-data-v1.0/edansa-test-assets-pack-v1.zip'
ZIP_FILE = 'edansa-test-assets-pack-v1.zip'
DOWNLOAD_DEST_DIR = '/content' # Location to download zip in Colab
EXTRACT_TARGET_DIR = os.path.join(DOWNLOAD_DEST_DIR, 'downloaded_assets') # Location to unzip assets

# Download the assets
print(f'Downloading {ASSET_URL}...')
!wget -q -O {os.path.join(DOWNLOAD_DEST_DIR, ZIP_FILE)} {ASSET_URL}

# Unzip the assets into the target directory
print(f'Unzipping {ZIP_FILE} into {EXTRACT_TARGET_DIR}...')
os.makedirs(EXTRACT_TARGET_DIR, exist_ok=True)
!unzip -q -o {os.path.join(DOWNLOAD_DEST_DIR, ZIP_FILE)} -d {EXTRACT_TARGET_DIR}
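
To confirm what was extracted, the sketch below walks the target directory and prints the absolute path of every audio file it finds. It is an illustration only: `list_audio_files` is a hypothetical helper, and the extension set is an assumption, since the formats in the asset pack are not stated here.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to match the files in the pack.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def list_audio_files(root, extensions=AUDIO_EXTENSIONS):
    """Return sorted absolute paths of audio files found under root."""
    root_path = Path(root).resolve()
    if not root_path.is_dir():
        return []
    return sorted(
        str(p) for p in root_path.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


# '/content/downloaded_assets' is the EXTRACT_TARGET_DIR used in this notebook.
for path in list_audio_files("/content/downloaded_assets"):
    print(path)
```

An empty listing usually means the download or unzip step failed, or that the files use an extension not in the assumed set.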


## 3. Run Inference

Now we execute the `inference.py` script using the main pre-trained model (`31m2plxv-V1`) included in the repository's `assets` directory. We need to provide:
*   `--model_path`: Path to the specific model checkpoint file (`.pt`).
*   `--config_file`: Path to the model's configuration file (`.json`).
*   `--input_folder`: Path to the folder with audio files.
*   `--output_folder`: Directory where the prediction CSVs will be saved.

The command below also passes `--device cpu` to force inference on the CPU.

In [None]:

# Construct and run the inference command
print('\nRunning inference...')
!python runs/augment/inference.py \
    --model_path 'assets/31m2plxv-V1/model_info/best_model_370_val_f1_min=0.8028.pt' \
    --config_file 'assets/31m2plxv-V1/model_info/model_config.json' \
    --input_folder '/content/downloaded_assets/audio/real/' \
    --output_folder '/content/edansa_output' \
    --device cpu # Force CPU for potentially limited Colab GPU resources/setup
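
Outside a notebook, the `!` shell syntax is not available. The sketch below assembles the same command as an argument list for `subprocess`, using only the flags and paths shown in this section. The helper names `build_inference_command` and `run_inference` are hypothetical, and the command is assumed to be run from the repository root.

```python
import subprocess
import sys


def build_inference_command(model_path, config_file, input_folder,
                            output_folder, device="cpu"):
    """Assemble the argument list for runs/augment/inference.py."""
    return [
        sys.executable, "runs/augment/inference.py",
        "--model_path", model_path,
        "--config_file", config_file,
        "--input_folder", input_folder,
        "--output_folder", output_folder,
        "--device", device,
    ]


def run_inference(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


cmd = build_inference_command(
    model_path="assets/31m2plxv-V1/model_info/best_model_370_val_f1_min=0.8028.pt",
    config_file="assets/31m2plxv-V1/model_info/model_config.json",
    input_folder="/content/downloaded_assets/audio/real/",
    output_folder="/content/edansa_output",
)
print(" ".join(cmd))
# Call run_inference(cmd) from the repository root to execute it.
```

Passing the arguments as a list avoids shell quoting problems, which matters here because the checkpoint filename contains an `=` character.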

## 4. Check Results

The script saves predictions in the specified output folder. It determines a common root directory from the input paths (here, likely the input folder `/content/downloaded_assets/audio/real/`) and replicates each file's relative path under the output folder, for example `dalton/04/2023/`.
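
The path mirroring described above can be sketched as follows. This is not the script's actual implementation, only an illustration of the idea under the assumption that the root is the longest common directory of the inputs; the file paths and the helper `mirrored_output_path` are hypothetical.

```python
import os


def mirrored_output_path(input_path, input_root, output_folder):
    """Map an input audio path to a CSV path that mirrors its relative location."""
    relative = os.path.relpath(input_path, start=input_root)
    stem, _ext = os.path.splitext(relative)
    return os.path.join(output_folder, stem + ".csv")


# Hypothetical input files, for illustration only.
inputs = [
    "/data/audio/site_a/2023/rec_001.wav",
    "/data/audio/site_b/2023/rec_002.wav",
]
root = os.path.commonpath(inputs)  # longest shared directory of all inputs
for path in inputs:
    print(mirrored_output_path(path, root, "/data/output"))
```

If the real output layout differs from what this predicts, listing the output directory is the reliable way to find the generated CSVs.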

Let's list the contents of the output directory and view the generated CSV file.

In [None]:
import os
import pandas as pd

# Output directory; must match the --output_folder used above
OUTPUT_DIR = '/content/edansa_output'

# List every file the script wrote under the output directory
for dirpath, _dirnames, filenames in os.walk(OUTPUT_DIR):
    for name in filenames:
        print(os.path.join(dirpath, name))

# Path to one expected prediction file under OUTPUT_DIR
example_output_csv = os.path.join(
    OUTPUT_DIR, 'dalton/04/2023/S4A10291_20230606_025958.csv')
print(f'\nAttempting to read: {example_output_csv}')

# Display the first few rows of the output CSV using pandas
df = pd.read_csv(example_output_csv)
print('\nPrediction CSV Head:')
print(df.head())


In [None]:
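# Display files that failed during inference, if an error file was written
errors_file = '/content/edansa_output/failed_files.csv'
if os.path.exists(errors_file):
    df = pd.read_csv(errors_file)
    print('\nErrors CSV Head:')
    print(df.head())
else:
    print(f'No error file found at {errors_file}.')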