# Benchmarking Audioseal against the SHUSH attack on the RAVDESS dataset

In this notebook, we outline the steps taken to benchmark the Audioseal architecture against the SHUSH attack, applied at several strengths, on a dataset of audio files.  
In particular, we follow these steps:
- Load audio files from a dataset
- Watermark each audio file using Audioseal
- Apply perturbations/attacks to the watermarked audio files
- Run detection on the attacked files and record Audioseal's confidence that each file is watermarked.


For a better understanding of Audioseal and its functionalities, it is highly recommended to go through the [Getting started notebook](https://github.com/facebookresearch/audioseal/blob/main/examples/Getting_started.ipynb).

## Dataset

We use the [RAVDESS Emotional Speech audio](https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio) dataset for this experiment.   
This notebook provides two options for downloading and loading the dataset: `manually` or `automatically` within the notebook.

### 1. Manual Dataset Download
To download the dataset manually, follow these steps:

- Visit Kaggle's [RAVDESS Emotional Speech audio](https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio) dataset page.
- Download the dataset to your local machine and place the unzipped files in the `./kaggle` folder.
- Skip `Auto Download Step 1 & 2`.

### 2. Automatic Dataset Download in Notebook
For automatic download, run `Auto Download Step 1 & 2` after preparing your credentials:

- Obtain your Kaggle API credentials by navigating to `Account Settings` on Kaggle and generating a `kaggle.json` file.
- Place the `kaggle.json` file in the `./kaggle` folder next to this notebook.
- The code then copies `kaggle.json` to `~/.kaggle/` and downloads and unzips the dataset into the `./kaggle` folder automatically.
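
Before running the automatic download, it can help to confirm that the token file is in place and well formed. The sketch below is one way to do that. `check_kaggle_token` is a hypothetical helper (not part of the Kaggle client), and it assumes the token is a JSON object with `username` and `key` fields.

```python
import json
import os
import tempfile


def check_kaggle_token(path):
    """Return a list of problems found with a kaggle.json file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    try:
        with open(path, encoding="utf-8") as handle:
            token = json.load(handle)
    except json.JSONDecodeError as error:
        return [f"{path} is not valid JSON: {error}"]
    if not isinstance(token, dict):
        return [f"{path} does not contain a JSON object"]
    # Assumption: a Kaggle API token holds these two non-empty fields.
    return [
        f"missing field '{name}'"
        for name in ("username", "key")
        if not token.get(name)
    ]


# Demonstration on throwaway files, so no real credentials are needed.
with tempfile.TemporaryDirectory() as folder:
    good_path = os.path.join(folder, "kaggle.json")
    with open(good_path, "w", encoding="utf-8") as handle:
        json.dump({"username": "example-user", "key": "example-key"}, handle)
    good_problems = check_kaggle_token(good_path)
    missing_problems = check_kaggle_token(os.path.join(folder, "absent.json"))

print(good_problems)
print(missing_problems)
```

In the notebook, the same check could be pointed at `./kaggle/kaggle.json` before `Auto Download Step 1` copies the file.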

In [2]:
# Auto Download Step 1

import os
import shutil
!pip install kaggle

# Create the local dataset folder and the Kaggle config folder if they are missing
os.makedirs('./kaggle', exist_ok=True)
os.makedirs(os.path.expanduser('~/.kaggle'), exist_ok=True)

# Copy the API token to where the Kaggle client looks for it and restrict its permissions
shutil.copy('./kaggle/kaggle.json', os.path.expanduser('~/.kaggle/kaggle.json'))
os.chmod(os.path.expanduser('~/.kaggle/kaggle.json'), 0o600)



In [3]:
# Auto Download Step 2

from kaggle.api.kaggle_api_extended import KaggleApi

if not os.path.exists('./kaggle'):
    os.makedirs('./kaggle')

api = KaggleApi()
api.authenticate()

api.dataset_download_files('uwrfkaggler/ravdess-emotional-speech-audio', path='./kaggle', unzip=True)

print("Dataset downloaded and extracted successfully to ./kaggle folder!")

Dataset URL: https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio
Dataset downloaded and extracted successfully to ./kaggle folder!


In [4]:
import numpy as np
import pandas as pd

all_input_files = []
PARENT_FILES_DIR = './kaggle'

# Load .wav audio files from the dataset in ./kaggle folder
for dirname, _, filenames in os.walk(PARENT_FILES_DIR):
    for filename in filenames:
        if filename.endswith(".wav"):
            all_input_files.append(os.path.join(dirname, filename))

print(f"Number of input files: {len(all_input_files)}")

Number of input files: 2880
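
The run log further down lists the same file names under both `./kaggle/Actor_XX/` and `./kaggle/audio_speech_actors_01-24/Actor_XX/`, which suggests that the extracted archive holds each recording twice and that the count above includes duplicates. The results in this notebook were produced on the full list. If duplicates are unwanted, one option is to keep a single path per file name, as in the sketch below; `unique_by_basename` is a hypothetical helper and assumes that equal file names mean identical recordings.

```python
import os


def unique_by_basename(paths):
    """Keep the first path (in sorted order) for every distinct file name."""
    first_seen = {}
    for path in sorted(paths):
        # setdefault leaves an existing entry untouched, so later copies are ignored
        first_seen.setdefault(os.path.basename(path), path)
    return list(first_seen.values())


example_paths = [
    "./kaggle/audio_speech_actors_01-24/Actor_01/03-01-08-01-02-02-01.wav",
    "./kaggle/Actor_01/03-01-08-01-02-02-01.wav",
    "./kaggle/Actor_20/03-01-06-01-01-02-20.wav",
]
deduplicated = unique_by_basename(example_paths)
print(deduplicated)
```

Applied to `all_input_files`, this would leave one entry per recording.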


### Installations and Imports 

In [5]:
import sys
!{sys.executable} -m pip install -q torchaudio soundfile matplotlib audioseal

import typing as tp
import julius
import torch
import torchaudio
import urllib

### Load Audioseal models

In [6]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

Using device: cpu


In [7]:
from audioseal import AudioSeal

model = AudioSeal.load_generator("audioseal_wm_16bits")
detector = AudioSeal.load_detector("audioseal_detector_16bits")

### Helper functions to load audio data, watermark audio, and get prediction scores for audio

In [8]:
model = model.to(device)
detector = detector.to(device)

In [9]:
secret_message = torch.randint(0, 2, (1, 16), dtype=torch.int32)
secret_message = secret_message.to(device)
print(f"Secret message: {secret_message}")

# Function to load an audio file from its file path
def load_audio_file(
    file_path: str
) -> tp.Optional[tp.Tuple[torch.Tensor, int]]:
    try:
        wav, sample_rate = torchaudio.load(file_path)
        return wav, sample_rate
    except Exception as e:
        print(f"Error while loading audio: {e}")
        return None
    
# Function to generate a watermark for the audio and embed it into a new audio tensor
def generate_watermark_audio(
    tensor: torch.Tensor,
    sample_rate: int
) -> tp.Optional[torch.Tensor]:
    try:
        # Add a batch dimension: (channels, samples) -> (1, channels, samples)
        audios = tensor.unsqueeze(0).to(device)
        watermarked_audio = model(audios, sample_rate=sample_rate, message=secret_message.to(device), alpha=1)
        return watermarked_audio
    except Exception as e:
        print(f"Error while watermarking audio: {e}")
        return None
    
# Function to get the confidence score that an audio tensor was watermarked by Audioseal
def detect_watermark_audio(
    tensor: torch.Tensor,
    sample_rate: int,
    message_threshold: float = 0.50
) -> tp.Optional[float]:
    try:
        # We only keep the detection score; the decoded hidden message is ignored in this analysis
        result, _ = detector.detect_watermark(tensor, sample_rate=sample_rate, message_threshold=message_threshold)
        return float(result)
    except Exception as e:
        print(f"Error while detecting watermark: {e}")
        return None

Secret message: tensor([[1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1]], dtype=torch.int32)


## Audio attacks

- In this notebook, we use the `SHUSH` attack.
- For more attacks and their descriptions, please refer to the [source](https://github.com/facebookresearch/audioseal/blob/main/examples/attacks.py).
- To run this notebook in a cloud environment (Colab/Kaggle), copy [attacks.py](https://github.com/facebookresearch/audioseal/blob/main/examples/attacks.py) to the same folder as this notebook.
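
In a cloud environment, copying `attacks.py` can also be scripted. The sketch below converts the GitHub page URL above into its `raw.githubusercontent.com` form and defines a download function. `blob_to_raw_url` and `fetch_attacks_module` are hypothetical helpers, the conversion assumes a branch name without slashes (such as `main`), and the download itself is not executed here because it needs network access.

```python
import urllib.request
from urllib.parse import urlparse

ATTACKS_BLOB_URL = "https://github.com/facebookresearch/audioseal/blob/main/examples/attacks.py"


def blob_to_raw_url(blob_url):
    """Convert a github.com '/blob/' page URL to its raw.githubusercontent.com form."""
    parts = urlparse(blob_url).path.strip("/").split("/")
    # Expected layout: owner / repo / "blob" / ref / path...
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, ref, *file_path = parts
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *file_path])


def fetch_attacks_module(destination="attacks.py"):
    """Download attacks.py next to the notebook (requires network access)."""
    urllib.request.urlretrieve(blob_to_raw_url(ATTACKS_BLOB_URL), destination)
    return destination


raw_url = blob_to_raw_url(ATTACKS_BLOB_URL)
print(raw_url)
```

Calling `fetch_attacks_module()` once before the import cell would place the file in the working directory.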

In [10]:
from attacks import AudioEffects as af

### Experimental setup
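- `fraction` values: 0.1%, 1%, 10%, 30% (passed to the attack as `0.001`, `0.01`, `0.1`, `0.3`)
- `nomenclature`: `n`, `s`, `m`, `l`, one suffix per `fraction` value in the order above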

In this notebook, we run the SHUSH attack once per `fraction` value and record Audioseal's confidence that each attacked audio file is still watermarked, then report the average per setting.
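
As noted at the end of this notebook, SHUSH masks the first part of the audio. The sketch below illustrates that idea on a plain Python list so that it runs without the notebook's dependencies. `shush_sketch` is a hypothetical stand-in, not the `AudioEffects.shush` implementation from `attacks.py`, and it assumes that masking means setting the leading samples to zero.

```python
def shush_sketch(samples, fraction):
    """Return a copy of `samples` with the first `fraction` of entries zeroed."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    n_silenced = int(fraction * len(samples))
    # Leading samples are replaced by silence, the rest are copied unchanged
    return [0.0] * n_silenced + list(samples[n_silenced:])


waveform = [0.5, -0.5, 0.25, -0.25, 0.1, -0.1, 0.3, -0.3]
attacked = shush_sketch(waveform, fraction=0.25)
print(attacked)
```

With eight samples and `fraction=0.25`, the first two samples are silenced and the remaining six are untouched.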

In [11]:
import random
random.seed(42)
# Disable the cuDNN auto-tuner so that seeded GPU runs select algorithms deterministically
torch.backends.cudnn.benchmark = False
np.random.seed(42)
torch.manual_seed(42)

<torch._C.Generator at 0x754dff9ad0d0>

In [12]:
from tqdm import tqdm

all_scores_n = []
all_scores_s = []
all_scores_m = []
all_scores_l = []
all_saved_files = []

for input_file in tqdm(all_input_files):
    try:
        # Load audio
        audio, sample_rate = load_audio_file(input_file)

        # Generate watermarked audio
        watermarked_audio = generate_watermark_audio(audio, sample_rate)

        # Perform SHUSH attacks
        shush_attack_audio_n = af.shush(watermarked_audio, fraction=0.001)
        shush_attack_audio_s = af.shush(watermarked_audio, fraction=0.01)
        shush_attack_audio_m = af.shush(watermarked_audio, fraction=0.1)
        shush_attack_audio_l = af.shush(watermarked_audio, fraction=0.3)

        # Compute scores
        shush_score_n = detect_watermark_audio(shush_attack_audio_n, sample_rate)
        shush_score_s = detect_watermark_audio(shush_attack_audio_s, sample_rate)
        shush_score_m = detect_watermark_audio(shush_attack_audio_m, sample_rate)
        shush_score_l = detect_watermark_audio(shush_attack_audio_l, sample_rate)

        # Store scores
        all_scores_n.append(float(shush_score_n))
        all_scores_s.append(float(shush_score_s))
        all_scores_m.append(float(shush_score_m))
        all_scores_l.append(float(shush_score_l))
        all_saved_files.append(input_file)
    except Exception as e:
        # A file that fails at any stage is skipped and left out of the results
        print(f"Skipping file {input_file} due to {e}")

  0%|          | 0/2880 [00:00<?, ?it/s]

  9%|▊         | 249/2880 [05:38<40:45,  1.08it/s]  

Error while watermarking audio: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 52324] to have 1 channels, but got 2 channels instead
Skipping file ./kaggle/Actor_01/03-01-08-01-02-02-01.wav due to 'NoneType' object has no attribute 'size'


  9%|▉         | 271/2880 [06:02<44:22,  1.02s/it]  

Error while watermarking audio: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 57663] to have 1 channels, but got 2 channels instead
Skipping file ./kaggle/Actor_01/03-01-02-01-01-02-01.wav due to 'NoneType' object has no attribute 'size'


 14%|█▎        | 389/2880 [08:01<45:48,  1.10s/it]  

Error while watermarking audio: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 69942] to have 1 channels, but got 2 channels instead
Skipping file ./kaggle/Actor_20/03-01-06-01-01-02-20.wav due to 'NoneType' object has no attribute 'size'


 14%|█▍        | 406/2880 [08:16<39:30,  1.04it/s]

Error while watermarking audio: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 55528] to have 1 channels, but got 2 channels instead
Skipping file ./kaggle/Actor_20/03-01-03-01-02-01-20.wav due to 'NoneType' object has no attribute 'size'


 36%|███▌      | 1029/2880 [19:04<28:56,  1.07it/s] 

Error while watermarking audio: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 52324] to have 1 channels, but got 2 channels instead
Skipping file ./kaggle/audio_speech_actors_01-24/Actor_01/03-01-08-01-02-02-01.wav due to 'NoneType' object has no attribute 'size'


 36%|███▋      | 1051/2880 [19:26<30:44,  1.01s/it]

Error while watermarking audio: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 57663] to have 1 channels, but got 2 channels instead
Skipping file ./kaggle/audio_speech_actors_01-24/Actor_01/03-01-02-01-01-02-01.wav due to 'NoneType' object has no attribute 'size'


 41%|████      | 1169/2880 [21:20<29:49,  1.05s/it]

Error while watermarking audio: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 69942] to have 1 channels, but got 2 channels instead
Skipping file ./kaggle/audio_speech_actors_01-24/Actor_20/03-01-06-01-01-02-20.wav due to 'NoneType' object has no attribute 'size'


 41%|████      | 1186/2880 [21:36<27:26,  1.03it/s]

Error while watermarking audio: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 55528] to have 1 channels, but got 2 channels instead
Skipping file ./kaggle/audio_speech_actors_01-24/Actor_20/03-01-03-01-02-01-20.wav due to 'NoneType' object has no attribute 'size'


 59%|█████▉    | 1697/2880 [29:41<15:43,  1.25it/s]

Error while watermarking audio: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 67807] to have 1 channels, but got 2 channels instead
Skipping file ./kaggle/audio_speech_actors_01-24/Actor_05/03-01-02-01-02-02-05.wav due to 'NoneType' object has no attribute 'size'


 82%|████████▏ | 2357/2880 [39:21<07:08,  1.22it/s]

Error while watermarking audio: Given groups=1, weight of size [32, 1, 7], expected input[1, 2, 67807] to have 1 channels, but got 2 channels instead
Skipping file ./kaggle/Actor_05/03-01-02-01-02-02-05.wav due to 'NoneType' object has no attribute 'size'


100%|██████████| 2880/2880 [46:57<00:00,  1.02it/s]
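
Each of the ten skipped files in the log above fails for the same reason: the loaded tensor has two channels, while the generator expects a single channel. One possible remedy, not applied in this run, is to average the channels before watermarking. The helper below, `to_mono`, is a hypothetical addition and assumes the audio tensor has shape `(channels, samples)`, as returned by `torchaudio.load` with its default settings.

```python
import torch


def to_mono(wav: torch.Tensor) -> torch.Tensor:
    """Average the channels of a (channels, samples) tensor into a single channel."""
    if wav.dim() != 2:
        raise ValueError(f"expected a 2-D (channels, samples) tensor, got {wav.dim()} dims")
    if wav.size(0) == 1:
        return wav  # already mono
    return wav.mean(dim=0, keepdim=True)


stereo = torch.tensor([[1.0, 0.0, -1.0],
                       [0.0, 1.0, 1.0]])
mono = to_mono(stereo)
print(mono.shape)
```

Calling `to_mono(audio)` right after loading would let the stereo files go through the same pipeline, at the cost of changing their content relative to the originals.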


## Store results and calculate metrics

In [13]:
df = pd.DataFrame({
    "input_file" : all_saved_files,
    "watermark_confidence_n" : all_scores_n,
    "watermark_confidence_s" : all_scores_s,
    "watermark_confidence_m" : all_scores_m,
    "watermark_confidence_l" : all_scores_l,
})

In [14]:
df.describe()

| | watermark_confidence_n | watermark_confidence_s | watermark_confidence_m | watermark_confidence_l |
|---|---|---|---|---|
| count | 2870.0 | 2870.0 | 2870.0 | 2870.0 |
| mean | 0.998851 | 0.990335 | 0.900382 | 0.699686 |
| std | 0.000114 | 0.000347 | 0.000789 | 0.000478 |
| min | 0.996135 | 0.986962 | 0.876117 | 0.695407 |
| 25% | 0.998792 | 0.99022 | 0.900214 | 0.699631 |
| 50% | 0.998841 | 0.990385 | 0.900339 | 0.699782 |
| 75% | 0.998895 | 0.99057 | 0.900625 | 0.699918 |
| max | 0.999432 | 0.990909 | 0.901558 | 0.700586 |



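We note that Audioseal recalls the watermarks well over the 2870 files that were processed: even in the most extreme setting, which masks the first 30% of the audio, the average confidence is $0.699686$. At every setting the mean confidence stays close to $1 - \text{fraction}$ (about 0.999, 0.990, 0.900 and 0.700), which suggests that the drop in confidence comes almost entirely from the silenced portion, while the watermark in the remaining audio is still detected.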
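
The means in the `df.describe()` output can be checked against a simple reading of the attack: if the silenced samples are never flagged as watermarked and the remaining samples almost always are, the score should sit near `1 - fraction`. The sketch below tests that reading against the reported means. The interpretation of the score as a per-sample detection rate is an assumption, not something this notebook verifies.

```python
# Mean confidences copied from the df.describe() output in this notebook
reported_means = {0.001: 0.998851, 0.01: 0.990335, 0.1: 0.900382, 0.3: 0.699686}

gaps = {}
for fraction, mean_confidence in reported_means.items():
    expected = 1.0 - fraction  # assumption: silenced samples are never flagged
    gaps[fraction] = abs(mean_confidence - expected)
    print(f"fraction={fraction:<5} mean={mean_confidence:.6f} "
          f"1-fraction={expected:.3f} gap={gaps[fraction]:.6f}")

largest_gap = max(gaps.values())
print(f"largest gap: {largest_gap:.6f}")
```

For all four settings the gap stays below 0.001, which is consistent with that reading.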