# Training a New AudioSeal Watermarking Model Using AudioCraft and Dora 🚀🎶

This notebook demonstrates how to train a new AudioSeal model using AudioCraft and Dora. The training pipeline was developed using AudioCraft (version 1.4.0a1 and later). We use the LibriSpeech dataset for this example; the workflow covers preparing the dataset, setting up configurations, running a training job with Dora, and evaluating the trained model.


### Step-by-Step Guide to Training a New Watermarking Model 📝

## Step 1: Install Dependencies and Set Up Environment ⚙️

First, we need to install all required dependencies for AudioCraft and AudioSeal.

- **Torch and Torchaudio** are necessary for working with audio data and training models.
- **Hydra-core** is used for managing configurations.
- **Dora** is a tool used for training management, specifically for defining and running grid-based experiments.
- **Flashy** is required by AudioCraft for various utility functions.

In [1]:
# Install required packages for AudioCraft and AudioSeal.
# Dora is published on PyPI as `dora-search`; the package named `dora` is an unrelated project.
!pip install torch torchaudio hydra-core dora-search flashy ipdb



### Clone and Install AudioCraft from Source

In [11]:
!git --version

git version 2.45.2.windows.1


In [21]:
import torch

In [20]:
!git clone https://github.com/hastagab/audiocraft.git
%cd D:/programs/audioseal/examples/audiocraft
!pip install -e .[wm]

D:\programs\audioseal\examples\audiocraft


fatal: destination path 'audiocraft' already exists and is not an empty directory.


Obtaining file:///D:/programs/audioseal/examples/audiocraft
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
Collecting av==11.0.0 (from audiocraft==1.4.0a1)
  Using cached av-11.0.0-cp312-cp312-win_amd64.whl.metadata (4.7 kB)
Collecting einops (from audiocraft==1.4.0a1)
  Using cached einops-0.8.1-py3-none-any.whl.metadata (13 kB)
Collecting hydra_colorlog (from audiocraft==1.4.0a1)
  Using cached hydra_colorlog-1.2.0-py3-none-any.whl.metadata (949 bytes)
Collecting num2words (from audiocraft==1.4.0a1)
  Using cached num2words-0.5.14-py3-none-any

  error: subprocess-exited-with-error
  
  Getting requirements to build wheel did not run successfully.
  exit code: 1
  
  [20 lines of output]
  Traceback (most recent call last):
    File "D:\programs\audioseal\.venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 389, in <module>
      main()
    File "D:\programs\audioseal\.venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 373, in main
      json_out["return_val"] = hook(**hook_input["kwargs"])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "D:\programs\audioseal\.venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 143, in get_requires_for_build_wheel
      return hook(config_settings)
             ^^^^^^^^^^^^^^^^^^^^^
    File "C:\Users\wuann\AppData\Local\Temp\pip-build-env-3xk_9v_u\overlay\Lib\site-packages\setuptools\build_meta.py", line 331, in get_requires_for_build_wheel
      return self._get_build_re

In [22]:
# Clone and install AudioSeal from source
!git clone https://github.com/facebookresearch/audioseal.git
%cd audioseal
!pip install -e .

D:\programs\audioseal\examples\audiocraft\audioseal


Cloning into 'audioseal'...


Obtaining file:///D:/programs/audioseal/examples/audiocraft/audioseal
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: audioseal
  Building editable for audioseal (pyproject.toml): started
  Building editable for audioseal (pyproject.toml): finished with status 'done'
  Created wheel for audioseal: filename=audioseal-0.1.7-py3-none-any.whl size=5662 sha256=30411e2a2b19a2fca9ad48862f402c56ff1c60c2142bd609129dd3989a3da472
  Stored in directory: C:\Users\wuann\AppData\Local\Temp\pip-ephem-wheel-cache-ta1c17ca\w


[notice] A new release of pip is available: 25.0.1 -> 25.1.1
[notice] To update, run: python.exe -m pip install --upgrade pip
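Because the AudioCraft install above ended with a build error, it can help to check which distributions are actually present before moving on. The following is a minimal sketch, not part of the original notebook; `installed_versions` is a hypothetical helper, and the list of distribution names is an assumption based on the packages mentioned in this step.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names assumed from this step; adjust to your environment.
required = ["torch", "torchaudio", "hydra-core", "dora-search", "flashy",
            "av", "audiocraft", "audioseal"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any entry reported as `NOT INSTALLED` needs to be resolved before the later steps can import AudioCraft.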


In [28]:
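# Install ffmpeg for audio processing (apt-get exists only on Debian/Ubuntu systems such as Colab;
# on Windows, use a prebuilt ffmpeg binary instead, as in the next cell)
!apt-get install -y ffmpeg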

'apt-get' is not recognized as an internal or external command,
operable program or batch file.


In [36]:
!D:/env/ffmpeg-2025-05-07-git-1b643e3f65-essentials_build/bin/ffmpeg.exe -version

ffmpeg version 2025-05-07-git-1b643e3f65-essentials_build-www.gyan.dev Copyright (c) 2000-2025 the FFmpeg developers
built with gcc 15.1.0 (Rev1, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --ena
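Since `apt-get` is unavailable on Windows and the binary above is called by its full path, a platform-independent check for `ffmpeg` can be useful. This is a small sketch under the assumption that tools calling `ffmpeg` by name look it up on `PATH`; `find_ffmpeg` is a hypothetical helper, not an AudioCraft function.

```python
import os
import shutil


def find_ffmpeg(extra_dirs=()):
    """Return the full path of the first ffmpeg executable found, or None.

    Directories in `extra_dirs` are searched before the regular PATH.
    """
    search_path = os.pathsep.join([*map(str, extra_dirs), os.environ.get("PATH", "")])
    return shutil.which("ffmpeg", path=search_path)


# Directory taken from the cell above; replace it with your own install location.
ffmpeg_dir = "D:/env/ffmpeg-2025-05-07-git-1b643e3f65-essentials_build/bin"
print(find_ffmpeg([ffmpeg_dir]) or "ffmpeg not found")
```

If the binary is only found through `extra_dirs`, prepending that directory to `os.environ["PATH"]` makes it visible to child processes started from the notebook.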

## Step 2: Prepare the Dataset 📂

The dataset must be in AudioCraft's required format for training. Here, we use the LibriSpeech dataset, which is commonly used for speech processing tasks.

- Download the LibriSpeech dataset (dev-clean subset).
- Extract the dataset to the appropriate directory.

In [40]:
# Download the Librispeech dataset to be used for training
!D:/env/wget-1.21.4-win64/wget.exe https://www.openslr.org/resources/12/dev-clean.tar.gz -P /content/audiocraft/egs/librispeech

--2025-05-09 18:38:05--  https://www.openslr.org/resources/12/dev-clean.tar.gz
Resolving www.openslr.org (www.openslr.org)... 46.101.158.64
Connecting to www.openslr.org (www.openslr.org)|46.101.158.64|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://openslr.trmal.net/resources/12/dev-clean.tar.gz [following]
--2025-05-09 18:38:10--  https://openslr.trmal.net/resources/12/dev-clean.tar.gz
Resolving openslr.trmal.net (openslr.trmal.net)... 136.243.171.4
Connecting to openslr.trmal.net (openslr.trmal.net)|136.243.171.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 337926286 (322M) [application/x-gzip]
Saving to: '/content/audiocraft/egs/librispeech/dev-clean.tar.gz'

     0K .......... .......... .......... .......... ..........  0% 85.6K 64m14s
    50K .......... .......... .......... .......... ..........  0%  171K 48m9s
   100K .......... .......... .......... .......... ..........  0% 18.7M 32m11s
   150K .......... ..

In [44]:
# Extract the dataset
!tar -xzf D:/programs/audioseal/examples/audiocraft/egs/librispeech/dev-clean.tar.gz -C D:/programs/audioseal/examples/audiocraft/egs/librispeech

tar (child): Cannot connect to D: resolve failed

gzip: stdin: unexpected end of file
tar: Child returned status 128
tar: Error is not recoverable: exiting now


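We download the dataset from OpenSLR with `wget` and extract it with `tar`. On Colab both commands use `/content/audiocraft/egs/librispeech`; in the run shown above, the archive is saved there while `tar` reads from `D:/programs/audioseal/examples/audiocraft/egs/librispeech`, so the two paths need to be aligned before extraction can succeed. The `Cannot connect to D:` message arises because GNU `tar` interprets a colon in the archive name as a remote host; its `--force-local` option disables that interpretation.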
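Extraction can also be done from Python, which sidesteps shell-specific handling of drive letters. The sketch below uses the standard-library `tarfile` module; `extract_archive` is a hypothetical helper, and the paths in the usage comment are the ones from this notebook's Windows run.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its top-level entries."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that reject unsafe members.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return sorted(entry.name for entry in dest.iterdir())


# Example call (paths are specific to the machine used in this notebook):
# root = "D:/programs/audioseal/examples/audiocraft/egs/librispeech"
# print(extract_archive(f"{root}/dev-clean.tar.gz", root))
```

The returned list makes it easy to confirm that the archive was unpacked where the manifest step expects it.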


## Step 3: Create Dataset Manifests in AudioCraft Format 📝

To convert the LibriSpeech dataset into AudioCraft's required format, we create manifest files for the training, validation, evaluation, and generation splits.

In [45]:
%cd D:/programs/audioseal/examples/audiocraft

# Create the manifest files for the Librispeech dataset. This is necessary to convert the dataset to AudioCraft's required format
!python -m audiocraft.data.audio_dataset D:/programs/audioseal/examples/audiocraft/egs/librispeech /content/audiocraft/egs/librispeech/train.jsonl.gz
!python -m audiocraft.data.audio_dataset /content/audiocraft/egs/librispeech /content/audiocraft/egs/librispeech/valid.jsonl.gz
!python -m audiocraft.data.audio_dataset /content/audiocraft/egs/librispeech /content/audiocraft/egs/librispeech/evaluate.jsonl.gz
!python -m audiocraft.data.audio_dataset /content/audiocraft/egs/librispeech /content/audiocraft/egs/librispeech/generate.jsonl.gz

D:\programs\audioseal\examples\audiocraft


Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 112, in _get_module_details
  File "D:\programs\audioseal\examples\audiocraft\audiocraft\__init__.py", line 24, in <module>
    from . import data, modules, models
  File "D:\programs\audioseal\examples\audiocraft\audiocraft\data\__init__.py", line 10, in <module>
    from . import audio, audio_dataset, info_audio_dataset, music_dataset, sound_dataset
  File "D:\programs\audioseal\examples\audiocraft\audiocraft\data\audio.py", line 22, in <module>
    import av
ModuleNotFoundError: No module named 'av'
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 112, in _get_module_details
  File "D:\programs\audioseal\examples\audiocraft\audiocraft\__init__.py", line 24, in <module>
    from . import data, modules, models
  File "D:\programs\audioseal\examples\audiocraft\audiocraft\data\__init__.py", li

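Manifests are metadata files that describe the dataset and its structure, allowing AudioCraft to locate and use the audio files efficiently. We use the `audiocraft.data.audio_dataset` module to create them. The four commands are intended to scan the same LibriSpeech directory, so the four manifests list the same files. The `ModuleNotFoundError: No module named 'av'` in the output above means PyAV is missing, which is consistent with the failed AudioCraft install in Step 1; the manifests are only written once that install succeeds.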
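Once a manifest has been written, it can be inspected directly, since a `.jsonl.gz` file is gzip-compressed text with one JSON object per line. The sketch below is illustrative only: `load_manifest` and `summarize` are hypothetical helpers, and the `duration` field name is an assumption about the manifest entries.

```python
import gzip
import json


def load_manifest(path):
    """Read a gzip-compressed JSON Lines manifest into a list of dicts."""
    entries = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                entries.append(json.loads(line))
    return entries


def summarize(entries):
    """Return (number of entries, total duration in seconds).

    Assumes each entry may carry a numeric `duration` field; missing values count as 0.
    """
    total = sum(entry.get("duration", 0.0) for entry in entries)
    return len(entries), total


# Example call, once the manifest exists:
# print(summarize(load_manifest("egs/librispeech/train.jsonl.gz")))
```

An empty or missing manifest at this point usually means the generation command failed, as in the traceback above.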


## Step 4: Create YAML Configuration Files 🛠️

We need a configuration file that defines the dataset and the parameters for the training job. The YAML configuration allows us to customize the training and the dataset we are using.

### Create LibriSpeech Dataset Configuration 📄

In [4]:
# Create a custom YAML configuration file for the Librispeech dataset
librispeech_yaml = """
datasource:
  max_sample_rate: 16000
  max_channels: 1

  train: egs/librispeech/train.jsonl.gz
  valid: egs/librispeech/valid.jsonl.gz
  evaluate: egs/librispeech/evaluate.jsonl.gz
  generate: egs/librispeech/generate.jsonl.gz
"""

In [5]:
# Save the YAML configuration for Librispeech dataset
with open("/content/audiocraft/config/dset/audio/librispeech.yaml", "w") as file:
    file.write(librispeech_yaml)

This YAML file specifies the paths to the manifest files we created earlier and sets properties like `max_sample_rate` and `max_channels` for the dataset. These properties ensure that all audio files are processed consistently.
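The save cell above writes to a Colab path (`/content/...`) that does not exist on other machines. A minimal sketch of a more portable variant follows; `write_config` is a hypothetical helper, and the repository root in the usage comment is the checkout location used elsewhere in this notebook.

```python
from pathlib import Path


def write_config(repo_root, relative_path, text):
    """Write `text` to repo_root/relative_path, creating parent folders as needed."""
    target = Path(repo_root) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    # Drop the leading newline that triple-quoted strings usually start with.
    target.write_text(text.lstrip("\n"), encoding="utf-8")
    return target


# Example call (adjust the root to your AudioCraft checkout):
# write_config("D:/programs/audioseal/examples/audiocraft",
#              "config/dset/audio/librispeech.yaml", librispeech_yaml)
```

Keeping the relative path separate from the root makes it clear that the file must end up under AudioCraft's `config/dset/audio/` folder so that `dset=audio/librispeech` can find it.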


### Create Dora Configuration File 📄


In [6]:
# Create a custom YAML configuration for Dora output directory and SLURM partitions
my_config_yaml = """
default:
  dora_dir: /tmp/audiocraft_outputs
  partitions:
    global: null
    team: null
  reference_dir: /tmp
"""

In [7]:
# Save the custom Dora configuration
with open("/content/my_config.yaml", "w") as file:
    file.write(my_config_yaml)

The Dora configuration specifies the directory where outputs and checkpoints will be saved (`dora_dir`). It also sets the SLURM partitions to `null` since we are running the experiment locally without using a cluster.


## Step 5: Set Up Environment Variables and Train with Dora 💻

To ensure everything runs smoothly, we need to set up environment variables. Here, we are running the training with a single epoch for simplicity.


In [46]:
# Set user environment variable to avoid any issues with Dora
import os
os.environ['USER'] = 'colab_user'

In [47]:
# Export Hydra error variable to provide full error messages if something fails
%env HYDRA_FULL_ERROR=1

env: HYDRA_FULL_ERROR=1


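Setting `USER` gives Dora a user name in environments where the variable is not defined; the value shows up in the output path `/tmp/audiocraft_colab_user/...` reported in the error below. `HYDRA_FULL_ERROR=1` makes Hydra print full stack traces if something goes wrong. Finally, we start training with `dora run`, passing the solver, the dataset configuration, and the training overrides, while `AUDIOCRAFT_CONFIG` points to the custom configuration file.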
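The `VAR=value command` prefix used in the training cell is POSIX shell syntax and is not understood by the Windows command prompt. One way around this is to launch `dora` from Python with an explicit environment. The sketch below only builds the command and environment; `build_dora_run` is a hypothetical helper, and the default user name is the one set earlier in this step.

```python
import os
import subprocess


def build_dora_run(config_path, overrides, base_env=None, user="colab_user"):
    """Return (command, env) for a `dora run` call with AudioCraft settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    env["AUDIOCRAFT_CONFIG"] = str(config_path)
    env["HYDRA_FULL_ERROR"] = "1"
    env.setdefault("USER", user)  # only set when the variable is missing
    command = ["dora", "run", *overrides]
    return command, env


command, env = build_dora_run(
    "/content/my_config.yaml",
    ["solver=watermark/robustness", "dset=audio/librispeech",
     "dataset.num_workers=0", "optim.epochs=1"],
)
print(" ".join(command))
# To actually start training from the AudioCraft checkout:
# subprocess.run(command, env=env, check=True)
```

Passing the environment explicitly keeps the settings local to the training process instead of changing the notebook's own environment.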


In [10]:
# Run Dora to start the training process with 1 epoch on the Librispeech dataset
!AUDIOCRAFT_CONFIG=/content/my_config.yaml dora run solver=watermark/robustness dset=audio/librispeech dataset.num_workers=0 optim.epochs=1

2024-11-30 21:56:21.414210: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-11-30 21:56:21.431389: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-30 21:56:21.452608: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-30 21:56:21.459021: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-30 21:56:21.474147: I tensorflow/core/platform/cpu_feature_guar

FileNotFoundError: Cannot find file: /tmp/audiocraft_colab_user/xps/81d59a67/checkpoint.th

**⚠️ Note:** If execution fails on Google Colab, resource limits are a likely cause; this code is designed for systems with more computational resources. An error such as the `FileNotFoundError` above means the expected checkpoint file was not found, so the download and evaluation steps below require a training run that completes successfully.


## Step 6: Download the Trained Checkpoint 📥

Once the training is complete, the checkpoint will be available in the specified Dora directory. We can download it for further evaluation.


In [48]:
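# Replace the path with your checkpoint file path
checkpoint_path = "/tmp/audiocraft_colab_user/xps/81d59a67/checkpoint.th"

try:
    # google.colab is only importable inside Google Colab
    from google.colab import files
except ImportError:
    # Outside Colab the file is already on the local disk, so no download is needed
    print(f"Not running in Colab; checkpoint path: {checkpoint_path}")
else:
    # Download the checkpoint file to verify the output of training
    files.download(checkpoint_path)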

The `files.download` method is used to download the checkpoint file, which contains the trained model parameters.
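The checkpoint paths in this notebook follow the pattern `<dora_dir>/xps/<signature>/checkpoint.th`, and the signature differs between runs. Rather than hard-coding it, the available checkpoints can be listed. This is a sketch that assumes the layout seen in the paths above; `find_checkpoints` is a hypothetical helper.

```python
from pathlib import Path


def find_checkpoints(dora_dir, filename="checkpoint.th"):
    """Return checkpoint files under dora_dir/xps/*/, most recently modified first."""
    xps_dir = Path(dora_dir) / "xps"
    if not xps_dir.is_dir():
        return []
    found = list(xps_dir.glob(f"*/{filename}"))
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


# `dora_dir` value taken from the Dora configuration in Step 4.
for checkpoint in find_checkpoints("/tmp/audiocraft_outputs"):
    print(checkpoint)
```

An empty result indicates that no run has written a checkpoint under that directory yet.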

## Step 7: Evaluate the Trained Checkpoint 📊

After training, it is crucial to evaluate the model to assess its performance under different conditions. We use Dora again for evaluation.


In [None]:
# Evaluate the trained checkpoint to assess the model performance with different settings for nbits

!AUDIOCRAFT_CONFIG=/content/my_config.yaml dora run solver=watermark/robustness execute_only=evaluate dset=audio/example continue_from=/tmp/audiocraft_outputs/xps/4b7f280b/checkpoint.th +dummy_watermarker.nbits=16

2024-12-01 05:15:26.056646: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-12-01 05:15:26.074598: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-01 05:15:26.096120: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-01 05:15:26.102636: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-01 05:15:26.118243: I tensorflow/core/platform/cpu_feature_guar

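The evaluation run loads the trained model from the checkpoint given in `continue_from` and assesses its robustness. The `+dummy_watermarker.nbits=16` override sets `nbits`, the number of message bits the watermark carries. Note that the command above uses `dset=audio/example` and the checkpoint `/tmp/audiocraft_outputs/xps/4b7f280b/checkpoint.th`; substitute the dataset and checkpoint path from your own training run.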


This notebook provides a detailed, step-by-step walkthrough of the entire process of training a new watermarking model using AudioCraft and Dora, including dataset preparation, configuration setup, training, and evaluation. By following these steps, you can quickly set up the environment, train models, and evaluate their performance. 🌟
