# Training a New AudioSeal Watermarking Model Using AudioCraft and Dora 🚀🎶

This notebook demonstrates how to train a new AudioSeal model using AudioCraft and Dora. The training pipeline was developed using AudioCraft (version 0.1.4 and later). We use the Librispeech dataset for this example, and the training involves preparing the dataset, setting up configurations, running a training job using Dora, and finally evaluating the trained model.


### Step-by-Step Guide to Training a New Watermarking Model 📝

## Step 1: Install Dependencies and Set Up Environment ⚙️

First, we need to install all required dependencies for AudioCraft and AudioSeal.

- **Torch and Torchaudio** are necessary for working with audio data and training models.
- **Hydra-core** is used for managing configurations.
- **Dora** is a tool used for training management, specifically for defining and running grid-based experiments.
- **Flashy** is required by AudioCraft for various utility functions.

In [1]:
# Install required packages for AudioCraft and AudioSeal
!pip install torch==2.1.0 torchaudio==2.1.0 hydra-core dora flashy ipdb

Collecting torch==2.1.0
  Downloading torch-2.1.0-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
Collecting torchaudio==2.1.0
  Downloading torchaudio-2.1.0-cp310-cp310-manylinux1_x86_64.whl.metadata (5.7 kB)
Collecting hydra-core
  Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting dora
  Downloading Dora-0.0.3.tar.gz (4.9 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting flashy
  Downloading flashy-0.0.2.tar.gz (72 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m72.4/72.4 kB[0m [31m5.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting ipdb
  Downloading ipdb-0.13.13-py3-none-any.whl.metadata (14 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.1.0)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 

# Clone and install AudioCraft from source

In [2]:
!git clone https://github.com/hastagab/audiocraft.git
%cd /content/audiocraft
!pip install -e .[wm]

Cloning into 'audiocraft'...
remote: Enumerating objects: 1594, done.[K
remote: Counting objects: 100% (858/858), done.[K
remote: Compressing objects: 100% (318/318), done.[K
remote: Total 1594 (delta 590), reused 617 (delta 533), pack-reused 736 (from 1)[K
Receiving objects: 100% (1594/1594), 18.04 MiB | 17.69 MiB/s, done.
Resolving deltas: 100% (924/924), done.
/content/audiocraft
Obtaining file:///content/audiocraft
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting av==11.0.0 (from audiocraft==1.4.0a1)
  Downloading av-11.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.5 kB)
Collecting flashy>=0.0.1 (from audiocraft==1.4.0a1)
  Using cached flashy-0.0.2.tar.gz (72 kB)
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting hydra-core>=1.1 (from audiocraft==1.4.0a1)
  Using cached hydra_core-1.3.2-py3-none-any.whl.m

In [None]:
# Clone and install AudioSeal from source
!git clone https://github.com/facebookresearch/audioseal.git
%cd audioseal
!pip install -e .

Cloning into 'audioseal'...
remote: Enumerating objects: 236, done.[K
remote: Counting objects: 100% (84/84), done.[K
remote: Compressing objects: 100% (49/49), done.[K
^C
[Errno 2] No such file or directory: 'audioseal'
/content/audiocraft
Obtaining file:///content/audiocraft
  Preparing metadata (setup.py) ... [?25l[?25hdone


In [None]:
# Install ffmpeg for audio processing
!apt-get install -y ffmpeg

## Step 2: Prepare the Dataset 📂

The dataset must be in AudioCraft's required format for training. Here, we use the Librispeech dataset, which is commonly used for speech processing tasks.

- Download the Librispeech dataset (dev-clean subset).
- Extract the dataset to the appropriate directory.

In [1]:
# Download the Librispeech dataset to be used for training
!wget https://www.openslr.org/resources/12/dev-clean.tar.gz -P /content/audiocraft/egs/librispeech

--2024-11-30 21:54:16--  https://www.openslr.org/resources/12/dev-clean.tar.gz
Resolving www.openslr.org (www.openslr.org)... 46.101.158.64
Connecting to www.openslr.org (www.openslr.org)|46.101.158.64|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://openslr.elda.org/resources/12/dev-clean.tar.gz [following]
--2024-11-30 21:54:17--  https://openslr.elda.org/resources/12/dev-clean.tar.gz
Resolving openslr.elda.org (openslr.elda.org)... 141.94.109.138, 2001:41d0:203:ad8a::
Connecting to openslr.elda.org (openslr.elda.org)|141.94.109.138|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 337926286 (322M) [application/x-gzip]
Saving to: ‘/content/audiocraft/egs/librispeech/dev-clean.tar.gz’


2024-11-30 21:54:42 (13.9 MB/s) - ‘/content/audiocraft/egs/librispeech/dev-clean.tar.gz’ saved [337926286/337926286]



In [2]:
# Extract the dataset
!tar -xzf /content/audiocraft/egs/librispeech/dev-clean.tar.gz -C /content/audiocraft/egs/librispeech

We download the dataset from OpenSLR and extract it to `/content/audiocraft/egs/librispeech` using `wget` and `tar` commands. This prepares the dataset for further processing.


## Step 3: Create Dataset Manifests in AudioCraft Format 📝

To convert the Librispeech dataset into AudioCraft's required format, we create manifest files for training, validation, evaluation, and generation splits.

In [3]:
%cd /content/audiocraft

# Create the manifest files for the Librispeech dataset. This is necessary to convert the dataset to AudioCraft's required format
!python -m audiocraft.data.audio_dataset /content/audiocraft/egs/librispeech /content/audiocraft/egs/librispeech/train.jsonl.gz
!python -m audiocraft.data.audio_dataset /content/audiocraft/egs/librispeech /content/audiocraft/egs/librispeech/valid.jsonl.gz
!python -m audiocraft.data.audio_dataset /content/audiocraft/egs/librispeech /content/audiocraft/egs/librispeech/evaluate.jsonl.gz
!python -m audiocraft.data.audio_dataset /content/audiocraft/egs/librispeech /content/audiocraft/egs/librispeech/generate.jsonl.gz

/content/audiocraft
2024-11-30 21:55:09.527766: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-11-30 21:55:09.544386: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-30 21:55:09.565469: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-30 21:55:09.571839: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-30 21:55:09.587033: I tensorflow/core/platf

Manifests are metadata files that describe the dataset and its structure. They allow AudioCraft to efficiently locate and use the audio files. We use `audiocraft.data.audio_dataset` module to create these manifest files.


## Step 4: Create YAML Configuration Files 🛠️

We need a configuration file that defines the dataset and the parameters for the training job. The YAML configuration allows us to customize the training and the dataset we are using.

### Create Librispeech Dataset Configuration 📄

In [4]:
# Create a custom YAML configuration file for the Librispeech dataset
librispeech_yaml = """
datasource:
  max_sample_rate: 16000
  max_channels: 1

  train: egs/librispeech/train.jsonl.gz
  valid: egs/librispeech/valid.jsonl.gz
  evaluate: egs/librispeech/evaluate.jsonl.gz
  generate: egs/librispeech/generate.jsonl.gz
"""

In [5]:
# Save the YAML configuration for Librispeech dataset
with open("/content/audiocraft/config/dset/audio/librispeech.yaml", "w") as file:
    file.write(librispeech_yaml)

This YAML file specifies the paths to the manifest files we created earlier and sets properties like `max_sample_rate` and `max_channels` for the dataset. These properties ensure that all audio files are processed consistently.


### Create Dora Configuration File 📄


In [6]:
# Create a custom YAML configuration for Dora output directory and SLURM partitions
my_config_yaml = """
default:
  dora_dir: /tmp/audiocraft_outputs
  partitions:
    global: null
    team: null
  reference_dir: /tmp
"""

In [7]:
# Save the custom Dora configuration
with open("/content/my_config.yaml", "w") as file:
    file.write(my_config_yaml)

The Dora configuration specifies the directory where outputs and checkpoints will be saved (`dora_dir`). It also sets the SLURM partitions to `null` since we are running the experiment locally without using a cluster.


## Step 5: Set Up Environment Variables and Train with Dora 💻

To ensure everything runs smoothly, we need to set up environment variables. Here, we are running the training with a single epoch for simplicity.


In [8]:
# Set user environment variable to avoid any issues with Dora
import os
os.environ['USER'] = 'colab_user'

In [9]:
# Export Hydra error variable to provide full error messages if something fails
%env HYDRA_FULL_ERROR=1

env: HYDRA_FULL_ERROR=1


# Run Dora to start the training process with 1 epoch on the Librispeech dataset

We set up the environment to avoid any issues with user variables. We also set `HYDRA_FULL_ERROR` to get full error messages in case something goes wrong during the training. Finally, we initiate the training with `dora run`, specifying the configuration file, dataset, and training parameters.


In [10]:
# Run Dora to start the training process with 1 epoch on the Librispeech dataset
!AUDIOCRAFT_CONFIG=/content/my_config.yaml dora run solver=watermark/robustness dset=audio/librispeech dataset.num_workers=0 optim.epochs=1

2024-11-30 21:56:21.414210: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-11-30 21:56:21.431389: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-30 21:56:21.452608: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-30 21:56:21.459021: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-30 21:56:21.474147: I tensorflow/core/platform/cpu_feature_guar

FileNotFoundError: Cannot find file: /tmp/audiocraft_colab_user/xps/81d59a67/checkpoint.th

**⚠️ Note:** If you encounter errors or warnings during execution, it's likely due to Google Colab's resource limitations. This code is designed to run effectively on systems with higher computational resources. Please ignore runtime errors in Colab that stem from these limitations.


## Step 6: Download the Trained Checkpoint 📥

Once the training is complete, the checkpoint will be available in the specified Dora directory. We can download it for further evaluation.


In [11]:
from google.colab import files

# Replace the path with your checkpoint file path
checkpoint_path = "/tmp/audiocraft_colab_user/xps/81d59a67/checkpoint.th"

# Download the checkpoint file to verify the output of training
files.download(checkpoint_path)

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

The `files.download` method is used to download the checkpoint file, which contains the trained model parameters.

## Step 7: Evaluate the Trained Checkpoint 📊

After training, it is crucial to evaluate the model to assess its performance under different conditions. We use Dora again for evaluation.


In [None]:
# Evaluate the trained checkpoint to assess the model performance with different settings for nbits

!AUDIOCRAFT_CONFIG=/content/my_config.yaml dora run solver=watermark/robustness execute_only=evaluate dset=audio/example continue_from=/tmp/audiocraft_outputs/xps/4b7f280b/checkpoint.th +dummy_watermarker.nbits=16

2024-12-01 05:15:26.056646: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-12-01 05:15:26.074598: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-01 05:15:26.096120: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-01 05:15:26.102636: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-01 05:15:26.118243: I tensorflow/core/platform/cpu_feature_guar

The evaluation script loads the trained model from the checkpoint and assesses its robustness, specifically testing different settings for `nbits` which affects the watermarking strength.


This notebook provides a detailed, step-by-step walkthrough of the entire process of training a new watermarking model using AudioCraft and Dora, including dataset preparation, configuration setup, training, and evaluation. By following these steps, you can quickly set up the environment, train models, and evaluate their performance. 🌟
