# Audio Watermarking: Examples

This notebook shows how to use the Omniseal Bench Python API to watermark audio with different models, apply attacks, detect the watermarks, and print the resulting metrics.

## Example 1: Quick start

In the following example, we load an example dataset and use the public WavMark model to watermark each audio. Next, we run the attacks and check whether the watermarks survive them. Finally, we report the quality metrics and detection scores.

In [None]:
# Some Jupyter kernels check TORCH_DISTRIBUTED_DEBUG and throw an error if not found
import os
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "OFF"

# Sanity check to make sure you run the right Jupyter Kernel
import sys
sys.executable

'/private/home/tuantran/.conda/envs/omnisealbench/bin/python'

In [4]:
from omnisealbench import task, get_model

e2e_task = task(
        "default",
        modality="audio",


        # Dataset options
        dataset_type="local",  # (Only supported in Audio) type: 'hf', 'local'
        dataset_dir="../examples/audios",
        audio_pattern="*.wav",
        # if dataset_type = 'hf'
        # dataset_name="hf-internal-testing/librispeech_asr_dummy",
        # dataset_hf_subset="clean",
        # dataset_split="validation[:4]",

        # Data loading options:
        sample_rate=24_000,
        padding_strategy="longest",  # Supported padding: 'fixed', 'longest'
        # If padding_strategy='fixed', we need to specify max_length to pad or truncate all audios to a fixed length
        # max_length=24000,  # 1 second of audio at 24kHz
        batch_size=2,
        num_workers=2,
        
        # Execution options
        # attacks: Load attacks by name from the attack registry (src/omnisealbench/attacks/audio_attacks.yaml).
        # You can specify `attacks="all"` to run all attacks,
        # or pass the path to a custom attack registry, e.g. `attacks="../src/omnisealbench/attacks/audio_attacks_paper.yaml"`
        attacks=["updownresample"],
                                     
        # metrics: For audios, metrics can be "stoi", "pesq", "snr", "sisnr"
        # You can also specify `metrics="all"` for all metrics
        # If left empty `metrics=[]`, no quality metrics are computed
        metrics=["pesq", "snr"],  
        seed=42,
    )

# Load model card by name (src/omnisealbench/cards/wavmark_fast.yaml)
model = get_model("wavmark_fast", device="cuda")


avg_results, raw_results = e2e_task(model)

omnisealbench.models.wavmark.build_model has the following dependencies: ['wavmark']


Running AudioWatermarkAttacksAndDetection with attack: no-attack
Running AudioWatermarkAttacksAndDetection with attack: updownresample


### Explanation of the results:

1) `avg_results`: Set of average metrics. For each metric, we compute the average score over the evaluation data, and return:
    - `avg`: mean value
    - `count`: Number of items
    - `square`: Mean of the squares of the values
    - `avg_ci_fn`: A function applied to the bounds of the confidence interval

2) `raw_results`: Raw scores. For each attack, we have a dictionary that maps each metric to the full list of `n` individual scores.
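
Because each entry carries both `avg` and `square` (the mean of the squared values), the spread of a metric can be recovered without the raw scores. The following is a minimal sketch, assuming `square` is E[x²] as described above. The `Avg` tuple is a hypothetical stand-in used for illustration, not the library's `AverageMetric` class.

```python
import math
from collections import namedtuple

# Hypothetical stand-in with the same three numeric fields as the printed entries.
Avg = namedtuple("Avg", ["avg", "count", "square"])


def population_std(metric):
    """Standard deviation from the mean and the mean of squares: sqrt(E[x^2] - E[x]^2)."""
    variance = metric.square - metric.avg ** 2
    # Rounding can push a near-zero variance slightly below zero; clamp it.
    return math.sqrt(max(variance, 0.0))


# Illustrative values: a bit accuracy averaged over 32 items.
bit_acc = Avg(avg=0.421875, count=32, square=0.20703125)
print(f"bit_acc: mean={bit_acc.avg:.4f}, std={population_std(bit_acc):.4f}")
```

When `square` is (almost) equal to `avg ** 2`, all items scored (almost) the same value.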

In [5]:
avg_results

# Above we have a single audio and 2 attacks ("updownresample" and the default "no-attack", which is always run),
# so 2 items in total. `avg_results` has 15 metrics, falling into:

# - Quality metrics: "snr", "pesq"
# - Profiling time: 
#    - "qual_time" (average time to run quality metrics), 
#    - "det_time" (average time to run detection), 
#    - "attack_time" (average time to run attacks), 
#    - "decoder_time" (average time to decode messages)
# - Detection scores: (All remaining scores)

{'watermark_det_score': AverageMetric(avg=0.9596928954124451, count=2, square=0.9210104535051222, avg_ci_fn=None),
 'watermark_det': AverageMetric(avg=1.0, count=2, square=1.0, avg_ci_fn=None),
 'fake_det_score': AverageMetric(avg=0.39517925679683685, count=2, square=0.15616670880399797, avg_ci_fn=None),
 'fake_det': AverageMetric(avg=0.0, count=2, square=0.0, avg_ci_fn=None),
 'bit_acc': AverageMetric(avg=0.6875, count=2, square=0.47265625, avg_ci_fn=None),
 'word_acc': AverageMetric(avg=0.0, count=2, square=0.0, avg_ci_fn=None),
 'p_value': AverageMetric(avg=0.1050567626953125, count=2, square=0.011036923388019204, avg_ci_fn=None),
 'capacity': AverageMetric(avg=1.6633882522583008, count=2, square=2.7668604777509245, avg_ci_fn=None),
 'log10_p_value': AverageMetric(avg=-0.9785759860307567, count=2, square=0.9576109604360677, avg_ci_fn=None),
 'snr': AverageMetric(avg=18.41377353668213, count=2, square=339.0670558614602, avg_ci_fn=None),
 'pesq': AverageMetric(avg=3.4214930534362793, 

In [None]:
# Here `raw_results` has one dictionary per attack ("no-attack" and "updownresample"). Each dictionary has 15 keys corresponding to the 15 metrics (plus `idx`), and each value is a list with a single score.

assert len(raw_results) == 2
raw_results[0]

defaultdict(list,
            {'watermark_det_score': [0.9596928954124451],
             'watermark_det': [True],
             'fake_det_score': [0.39543184638023376],
             'fake_det': [False],
             'bit_acc': [0.6875],
             'word_acc': [False],
             'p_value': [0.1050567626953125],
             'capacity': [1.6633882522583008],
             'log10_p_value': [-0.9785759860307567],
             'snr': [18.413808822631836],
             'pesq': [3.4214959144592285],
             'decoder_time': [0.5477],
             'qual_time': [0.6647],
             'det_time': [0.1494],
             'attack_time': [0.0],
             'idx': [0]})
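
The printed `bit_acc`, `p_value` and `log10_p_value` are consistent with a one-sided binomial test on the decoded bits, i.e. the probability that random guessing gets at least that many bits right. The sketch below reproduces the numbers under the assumption of a 16-bit message. This is an inference from the output above, not a description of how Omniseal Bench computes the value, and `binomial_p_value` is a hypothetical helper.

```python
import math


def binomial_p_value(bit_acc, n_bits):
    """P(at least k of n_bits match by chance), with k = round(bit_acc * n_bits)."""
    k = round(bit_acc * n_bits)
    tail = sum(math.comb(n_bits, i) for i in range(k, n_bits + 1))
    return tail / 2 ** n_bits


# Assumption: 16 message bits, so bit_acc = 0.6875 means 11 of 16 bits matched.
p = binomial_p_value(0.6875, 16)
print(p, math.log10(p))
```

Under the same assumption, a perfect bit accuracy of 1.0 gives 1/65536, roughly 1.5e-05.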


<hr/>

### Concepts:
#### Task:

The Omniseal Bench Python API consists of 2 main building blocks: **Task** and **Model**. The Task defines the dataset to be used for the evaluation, how to load the data (batch size, padding, etc.) and which metrics to run.

To construct a task, we use `omnisealbench.task()`. You must specify at least 2 arguments: the task type and the modality. The modality can be "audio", "image", or "video". The task type can be one of the following three:

- **generation**: Run a generator over a specific dataset to watermark each of its items, and return the quality metrics of the watermarked audios.

- **detection**: Apply a specific set of attacks to a watermarked dataset, then run the detector and report the detection scores (robustness) as well as the quality metrics for each attack.

- **default**: Run end-to-end generation and detection.



In [1]:
# Get all available tasks with `omnisealbench.list_tasks` API:

from omnisealbench import list_tasks

list_tasks()

- "('generation', 'audio')": Running audio watermarking on a HuggingFace dataset and evaluating the quality of the generated watermarks. (class/function: omnisealbench.tasks.audio.evaluate_audio_watermark_generation)
- "('detection', 'audio')": Running audio watermarking on a custom dataset and evaluating the quality and robustness of the generated watermarks. (class/function: omnisealbench.tasks.audio.evaluate_audio_watermark_attacks_and_detection)
- "('default', 'audio')": Running audio watermarking end-to-end (class/function: omnisealbench.tasks.audio.evaluate_audio_watermark_end2end)
- "('default', 'image')": Running image watermarking end-to-end (class/function: omnisealbench.tasks.image.evaluate_image_watermark_end2end)
- "('generation', 'image')": Running image watermarking on a local dataset and evaluating the quality of the generated watermarks. (class/function: omnisealbench.tasks.image.evaluate_image_watermark_generation)
- "('detection', 'image')": Running image watermarkin

Each task expects a specific list of arguments, falling into three categories:
- _input data_: Specifies how the dataset is loaded into the task. Note that the dataset is only declared when the task is defined; the actual data loading happens when we execute the task.

- _execution_: Defines the behaviour of the execution, such as metrics, attacks, device, and seed for reproducibility.

- _output handling_: Defines how the results are handled. This includes the result directory and filename, whether to keep intermediate data in cache, whether to log details of a small subset of the data for inspection, etc.


To get the list of arguments for a task, call "explain()":

In [1]:
from omnisealbench import explain

print(explain("default", modality="audio"))




    End-to-end watermarking-attack-detection task for a given dataset. This task is a sequence of two individual tasks,
    "evaluate_audio_watermark_generation" and "evaluate_audio_watermark_attacks_and_detection", and it re-uses most of the
    arguments from those tasks.

    Similar to generation task, 4 types of datasets are supported:
    
    1. HuggingFace dataset:
       This type of dataset is loaded from the HuggingFace Hub. The dataset is specified by the parameter `dataset_name`.
       For this to work, the `datasets` library must be installed.

    2. Local audio dataset:
       This loads audio files of a specific patterns from a local directory specified by the parameter `dataset_dir`.

    3. Custom data reader function:
       This allows users to provide their own function for reading audio data. The function should return an iterator of
       data batches of type `omnisealbench.data.base.InputBatch`. This option is when `data_reader` is specified.

    4. On-the-

#### Model

A model is a wrapper around the watermarking model that converts its output to the format that the task understands. Please refer to the notebook [Model.ipynb](Model.ipynb) for more detailed instructions.

Omniseal Bench provides ready-to-use wrappers for popular audio watermarking models:
- [AudioSeal](https://github.com/facebookresearch/audioseal)
- [Wavmark](https://github.com/wavmark/wavmark)
- [Timbre](https://github.com/TimbreWatermarking/TimbreWatermarking)

The models are registered via model cards in "src/omnisealbench/cards". To get a model, we call `get_model()` and pass the name of the model card (e.g. "wavmark_fast"), plus any additional arguments to override the attributes defined in the card.


#### Attacks:

For each modality, we have implemented different attacks in `omnisealbench.attacks`. Each attack can be further parameterized into different _attack variants_, each with concrete function parameter values.

For convenience, we provide a fixed set of attacks and attack variants in the _attack registry_, which is a YAML file defining how to create the attacks and attack variants. These registries can be found in the "src/omnisealbench/attacks" directory (for example, "src/omnisealbench/attacks/audio_attacks.yaml").


For more details on attacks, see the notebook [Attacks](Attacks.ipynb).

## Example 2: Caching and saving data

By default, the API does not save the final results and data. To save them to external files for further inspection, we pass "cache_dir" and "result_dir":

In [None]:
# This cell assumes the notebook runs from the repository root; adjust the directory paths otherwise

run_and_save = task(
    "default",
    modality="audio",
    
    # Dataset options
    dataset_type="local",
    dataset_dir="examples/audios",
    audio_pattern="*.wav",
    
    # Data loading options
    padding_strategy="fixed",
    max_length=16000 * 4,  # 4 seconds of audio at 16kHz
    sample_rate=16000,

    # Cache directory is to save the intermediate watermarked audios
    cache_dir="watermark_results",
    
    # Directory to save the final results.
    # This consists of several sub-directories, each corresponding to the evaluation of one attack
    result_dir="final_results",
    overwrite=True,  # If true, all existing data in cache_dir and result_dir will be overwritten
    
    # Choose which items to save. 
    # We can define to save all data to hard disk ("all"), 
    # or a list of specific ids ("0,1,2"), 
    # or a range of ids ("0-3"),
    # or a mix of them ("0,2,3,4-8").
    ids_to_save="all",  # Save all audios
    
    attacks="tests/audio/audio_attacks_mini.yaml",
    metrics=[],  # We can skip the quality metrics and only report detection scores + runtime
)

model = get_model("wavmark_fast", device="cuda")

avg_metrics, raw_results = run_and_save(model)

omnisealbench.models.wavmark.build_model has the following dependencies: ['wavmark']


Result watermark_results exists and will be overriden as --overwrite is set
Result final_results exists and will be overriden as --overwrite is set
Running AudioWatermarkAttacksAndDetection with attack: no-attack
Running AudioWatermarkAttacksAndDetection with attack: updownresample
Running AudioWatermarkAttacksAndDetection with attack: bandpass_filter
Running AudioWatermarkAttacksAndDetection with attack: speed_julius__speed_1.25
Running AudioWatermarkAttacksAndDetection with attack: pink_noise__noise_std_1.0
Running AudioWatermarkAttacksAndDetection with attack: lowpass_filter__cutoff_freq_5000
Running AudioWatermarkAttacksAndDetection with attack: mp3_compression__bitrate_256k



In [17]:
import os
def print_directory_tree(start_path, indent=""):
    for item in os.listdir(start_path):
        path = os.path.join(start_path, item)
        if os.path.isdir(path):
            print(f"{indent}📁 {item}/")
            print_directory_tree(path, indent + "    ")
        else:
            print(f"{indent}📄 {item}")

            
print_directory_tree("final_results")

📁 no-attack/
    📄 attack_results.jsonl
    📄 metrics.json
📁 updownresample/
    📄 attack_results.jsonl
    📄 metrics.json
📁 bandpass_filter/
    📄 attack_results.jsonl
    📄 metrics.json
📁 speed_julius__speed_1.25/
    📄 attack_results.jsonl
    📄 metrics.json
📁 pink_noise__noise_std_1.0/
    📄 attack_results.jsonl
    📄 metrics.json
📁 lowpass_filter__cutoff_freq_5000/
    📄 attack_results.jsonl
    📄 metrics.json
📁 mp3_compression__bitrate_256k/
    📄 attack_results.jsonl
    📄 metrics.json
📄 metrics.json
📄 report.csv


### Explanation: 

- In this new task, we load a local audio dataset by specifying `dataset_type="local"` and `dataset_dir`.

- Instead of passing attack names, we can pass a custom attack registry file to the `attacks` argument.

- The final results are stored in the directory specified by `result_dir`. This consists of several sub-directories, each corresponding to the evaluation of one attack. In the above example, the YAML file "audio_attacks_mini.yaml" consists of 6 attacks, so we have 7 sub-directories (including the default "no-attack") under the result directory.

- For each attack, we also have average metrics and raw results, similar to Example 1:
   - The average metrics are saved to the file "metrics.json". For each metric, the value is saved as a list of three values "avg", "count", "square", corresponding to the object `AverageMetric`.
   - The raw results are saved to the file "attack_results.jsonl". Since this is saved per attack, we have only one JSON line in each sub-directory.

- `cache_dir`: The watermarks are generated only once; the output is saved to `cache_dir` and reused by the detection tasks of all attacks.

- `overwrite`: Whether to overwrite all existing files and results in `result_dir` and `cache_dir`. By default, if the task finds existing results, it just loads them and skips the execution.

- `ids_to_save`: Choose which items to save. We can save all data to disk ("all"), a list of specific ids ("0,1,2"), a range of ids ("0-3"), or a mix of them ("0,2,3,4-8").
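
To make the `ids_to_save` syntax concrete, here is a minimal sketch of how such a spec could be expanded into item ids. `parse_ids` is a hypothetical helper written for this notebook, not the parser used inside Omniseal Bench, and it assumes ranges are inclusive on both ends.

```python
def parse_ids(spec):
    """Expand an id spec such as "0,2,3,4-8" into a sorted list of ints.

    Returns None for "all", meaning "no filtering".
    """
    spec = spec.strip()
    if spec == "all":
        return None
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            ids.update(range(start, end + 1))  # assumption: ranges are inclusive
        else:
            ids.add(int(part))
    return sorted(ids)


print(parse_ids("0,2,3,4-8"))
```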

### Cache:

> ⚠️ **Warning:** Right now the watermark generation WILL ALWAYS be cached before running the detection. If `cache_dir` is not specified, Omniseal Bench will use the default cache location in "~/.cache/omniseal/tmp" instead. As a consequence, the directory size might grow significantly over time.
We are implementing a "streaming" mode ("keep_data_in_memory") to support quick analysis on small datasets, where no data is cached. This feature will be added soon; until then, consider cleaning the cache from time to time to save storage.
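
To decide when a clean-up is worthwhile, it helps to check how much space the cache occupies. The sketch below uses only the standard library and deletes nothing. `dir_size_bytes` is a hypothetical helper, and the path is the default location quoted in the warning.

```python
import os


def dir_size_bytes(path):
    """Total size of all regular files below `path` (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path) and not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# Default cache location quoted in the warning above.
cache_dir = os.path.expanduser("~/.cache/omniseal/tmp")
print(f"{cache_dir}: {dir_size_bytes(cache_dir) / 1e6:.1f} MB")
```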

### Final report:

The final report is saved to "report.csv", which consists of the average metrics over all attack variants. It can also be obtained programmatically with the "print_scores()" API:

In [20]:
report = run_and_save.print_scores(raw_results)
report

Unnamed: 0,watermark_det_score,watermark_det,fake_det_score,fake_det,bit_acc,word_acc,p_value,capacity,log10_p_value,decoder_time,qual_time,det_time,attack_time,idx,attack,attack_variant,cat
0,1.0,True,0.39495,False,0.6875,False,0.105057,1.663388,-0.978576,0.0863,0.0,0.0013,0.0,0,no-attack,default,none
1,1.0,True,0.413615,False,0.6875,False,0.105057,1.663388,-0.978576,0.0841,0.0,0.0012,0.0022,0,updownresample,default,TimeDomain
2,1.0,True,0.390335,False,0.6875,False,0.105057,1.663388,-0.978576,0.0836,0.0,0.0012,0.0023,0,bandpass_filter,default,AmplitudeDomain
3,0.37168,False,0.40864,False,0.6875,False,0.105057,1.663388,-0.978576,0.1678,0.0,0.0014,0.0017,0,speed_julius,speed_1.25,TimeDomain
4,0.431641,False,0.39426,False,0.6875,False,0.105057,1.663388,-0.978576,0.1325,0.0,0.0013,0.0016,0,pink_noise,noise_std_1.0,AmplitudeDomain
5,1.0,True,0.398079,False,0.6875,False,0.105057,1.663388,-0.978576,0.0886,0.0,0.0012,0.0011,0,lowpass_filter,cutoff_freq_5000,AmplitudeDomain
6,0.921875,True,0.378913,False,0.6875,False,0.105057,1.663388,-0.978576,0.0876,0.0,0.0012,0.0263,0,mp3_compression,bitrate_256k,Compression
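
A report like this can be summarized further, for example by averaging the detection score per attack category. The sketch below uses only the standard library and assumes that "report.csv" has the same columns as the table above (`watermark_det_score`, `cat`). Both helper names are hypothetical.

```python
import csv
from collections import defaultdict


def mean_score_by_category(rows, score="watermark_det_score", group="cat"):
    """Average one score column per group, given rows as dicts of strings."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group]].append(float(row[score]))
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}


def load_report(path):
    """Read a CSV report into a list of dicts (one per row)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# A few rows shaped like the report above, so the sketch runs without any files.
sample_rows = [
    {"watermark_det_score": "1.0", "attack": "no-attack", "cat": "none"},
    {"watermark_det_score": "1.0", "attack": "updownresample", "cat": "TimeDomain"},
    {"watermark_det_score": "0.37168", "attack": "speed_julius", "cat": "TimeDomain"},
]
print(mean_score_by_category(sample_rows))
# With saved results: mean_score_by_category(load_report("final_results/report.csv"))
```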


## Example 3: Generating watermarks only

If you want to generate watermarks only (and optionally look at the quality metrics) without running any attacks, use the "generation" task:

In [29]:
# On some CUDA setups in the FAIR cluster, calling AudioSeal compile() results in a torch dynamo error, so we disable compile() with NO_TORCH_COMPILE
os.environ["NO_TORCH_COMPILE"] = "1"
import warnings

# Suppress all warnings in this cell
warnings.filterwarnings("ignore")


generation_task = task(
    "generation",
    modality="audio",
    dataset_type="hf",
    dataset_name="hf-internal-testing/librispeech_asr_dummy",
    dataset_hf_subset="clean",
    dataset_split="validation",
    num_samples=8,
    result_dir="watermark_librispeech",  # We need to specify the result dir to pass to the detection task
    how_to_generate_message="per_dataset",  # Options: 'per_batch', 'per_dataset'
    overwrite=True,
    ids_to_save="0-4",  # Save the first 5 audios (ids 0 to 4)
    padding_strategy="longest",
    sample_rate=16000,
    metrics=["snr", "stoi"],  # Quality metrics are optional here; pass `metrics=[]` to skip them
)

generator = get_model("audioseal", as_type="generator", model_card_or_path="audioseal_wm_16bits", device="cuda")

avg_metrics, raw_results = generation_task(generator)

omnisealbench.models.audioseal.build_generator has the following dependencies: ['audioseal']




Result watermark_librispeech exists and will be overriden as --overwrite is set
If you specify a sample rate, this will be ignored.


### Explanation:

- In this example, we load the full "validation" split and then keep only the first 8 items with `num_samples=8`.

- We demonstrate another padding strategy (pad to the longest audio in the batch). `max_length` is not required here.

- The watermarked results are saved to `result_dir`. **NOTE**: In this task, no cache is created, because no detection task is called afterwards.

- We save the first 5 items (ids from 0 to 4, inclusive) to the `result_dir`
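
The difference between the two padding strategies can be sketched in plain Python. `pad_batch` is a hypothetical helper for illustration only. It assumes zero-padding on the right and, for "fixed", truncation to `max_length`; the library's actual padding may differ in these details.

```python
def pad_batch(waveforms, padding_strategy="longest", max_length=None):
    """Right-pad (and for 'fixed', truncate) 1-D waveforms to a common length."""
    if padding_strategy == "longest":
        target = max(len(w) for w in waveforms)
    elif padding_strategy == "fixed":
        if max_length is None:
            raise ValueError("padding_strategy='fixed' requires max_length")
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {padding_strategy!r}")
    padded = []
    for w in waveforms:
        clipped = list(w[:target])  # truncate if longer than the target
        padded.append(clipped + [0.0] * (target - len(clipped)))  # fill with silence
    return padded


batch = [[0.1, 0.2, 0.3], [0.4]]
print(pad_batch(batch, "longest"))
print(pad_batch(batch, "fixed", max_length=2))
```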


NOTE: We pass `as_type` argument to the `get_model()` call to indicate that we only load the generator checkpoint from the model. See [Model](Model.ipynb) for more details.

In [28]:
print_directory_tree("watermark_librispeech")

📁 watermarks/
    📄 watermarked_0.wav
    📄 message_0.txt
    📄 watermarked_1.wav
    📄 message_1.txt
    📄 watermarked_2.wav
    📄 message_2.txt
    📄 watermarked_3.wav
    📄 message_3.txt
    📄 watermarked_4.wav
    📄 message_4.txt
📄 watermark_results.jsonl
📄 metrics.json


The contents of `avg_metrics` and `raw_results` are saved to two files, `metrics.json` and `watermark_results.jsonl`, respectively.

For each of the saved items, we save the watermarked audio and secret message separately. Because we specify `how_to_generate_message="per_dataset"`, all "message_*.txt" should contain the same secret binary string.
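
A quick way to confirm the "same message everywhere" expectation is to collect the distinct contents of the message files. This sketch assumes the files are plain text and live under "watermarks/" as in the tree above. `distinct_messages` is a hypothetical helper.

```python
from pathlib import Path


def distinct_messages(result_dir, pattern="watermarks/message_*.txt"):
    """Return the set of distinct message strings found under `result_dir`."""
    return {p.read_text().strip() for p in Path(result_dir).glob(pattern)}


result_dir = Path("watermark_librispeech")
if result_dir.is_dir():
    messages = distinct_messages(result_dir)
    print(f"{len(messages)} distinct message(s): {messages}")
```

With `how_to_generate_message="per_dataset"` the set should contain a single string; with "per_batch" it may contain several.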

If we want to save the original data too, specify the path in `data_subdir`:

In [30]:
# On some CUDA setups in the FAIR cluster, calling AudioSeal compile() results in a torch dynamo error, so we disable compile() with NO_TORCH_COMPILE
os.environ["NO_TORCH_COMPILE"] = "1"
import warnings

# Suppress all warnings in this cell
warnings.filterwarnings("ignore")


generation_task_2 = task(
    "generation",
    modality="audio",
    dataset_type="hf",
    dataset_name="hf-internal-testing/librispeech_asr_dummy",
    dataset_hf_subset="clean",
    dataset_split="validation",
    num_samples=8,
    result_dir="watermark_librispeech",
    data_subdir="data",
    how_to_generate_message="per_dataset",
    overwrite=True,
    ids_to_save="0-4",  # Save the first 5 audios (ids 0 to 4)
    padding_strategy="longest",
    sample_rate=16000,
    metrics=["snr", "stoi"],  # Quality metrics are optional here; pass `metrics=[]` to skip them
)

avg_metrics, raw_results = generation_task_2(generator)

Result watermark_librispeech exists and will be overriden as --overwrite is set
If you specify a sample rate, this will be ignored.


In [31]:
print_directory_tree("watermark_librispeech")

📁 watermarks/
    📄 watermarked_0.wav
    📄 message_0.txt
    📄 watermarked_1.wav
    📄 message_1.txt
    📄 watermarked_2.wav
    📄 message_2.txt
    📄 watermarked_3.wav
    📄 message_3.txt
    📄 watermarked_4.wav
    📄 message_4.txt
📁 data/
    📄 original_0.wav
    📄 original_1.wav
    📄 original_2.wav
    📄 original_3.wav
    📄 original_4.wav
📄 watermark_results.jsonl
📄 metrics.json


## Example 4: Run attacks and detection on already generated watermarked audios

This scenario is useful when we just want to evaluate a model whose watermarks are generated externally, for example a latent watermarking model.


In [35]:
import warnings

# Suppress all warnings in this cell
warnings.filterwarnings("ignore")

detection_task = task(
    "detection",
    modality="audio",
    dataset_dir="watermark_librispeech",
    audio_pattern="data/*.wav",
    watermarked_audio_pattern="watermarks/*.wav",  # We just fake the watermarked audios
    message_pattern="watermarks/message_*.txt",
    metrics=["pesq"],  # We can add one more quality metric that was not previously computed
    padding_strategy="longest",
    sample_rate=16000,
    result_dir="detection_librispeech",
    overwrite=True,
    attacks=["white_noise", "bandpass_filter"],
    batch_size=2,
)

detector = get_model("audioseal", as_type="detector", model_card_or_path="audioseal_detector_16bits", device="cuda")

avg_metrics, raw_results = detection_task(detector)

omnisealbench.models.audioseal.build_detector has the following dependencies: ['audioseal']
Running AudioWatermarkAttacksAndDetection with attack: no-attack
If you specify a sample rate, this will be ignored.
Running AudioWatermarkAttacksAndDetection with attack: bandpass_filter
If you specify a sample rate, this will be ignored.


In [36]:
detection_task.print_scores(raw_results)

Unnamed: 0,watermark_det_score,watermark_det,fake_det_score,fake_det,bit_acc,word_acc,p_value,capacity,log10_p_value,pesq,decoder_time,qual_time,det_time,attack_time,idx,attack,attack_variant,cat
0,0.952818,True,0.0,False,1.0,True,1.5e-05,16.0,-4.81648,4.585849,0.0337,2.5349,0.0008,0.0,0,no-attack,default,none
1,0.821221,True,0.001911,False,1.0,True,1.5e-05,16.0,-4.81648,4.56419,0.0337,2.5349,0.0008,0.0,1,no-attack,default,none
2,0.947197,True,0.0,False,1.0,True,1.5e-05,16.0,-4.81648,4.579144,0.021,2.7285,0.001,0.0,2,no-attack,default,none
3,0.779816,True,0.0,False,1.0,True,1.5e-05,16.0,-4.81648,4.590139,0.021,2.7285,0.001,0.0,3,no-attack,default,none
4,0.903344,True,0.002564,False,1.0,True,1.5e-05,16.0,-4.81648,4.593348,0.05,5.2462,0.0017,0.0,4,no-attack,default,none
5,0.918574,True,0.0,False,1.0,True,1.5e-05,16.0,-4.81648,4.576352,0.0088,2.4683,0.0009,0.0483,0,bandpass_filter,default,AmplitudeDomain
6,0.82231,True,0.0,False,1.0,True,1.5e-05,16.0,-4.81648,4.538594,0.0088,2.4683,0.0009,0.0483,1,bandpass_filter,default,AmplitudeDomain
7,0.930411,True,0.0,False,1.0,True,1.5e-05,16.0,-4.81648,4.564594,0.0123,2.5949,0.001,0.0467,2,bandpass_filter,default,AmplitudeDomain
8,0.718272,True,0.0,False,1.0,True,1.5e-05,16.0,-4.81648,4.579339,0.0123,2.5949,0.001,0.0467,3,bandpass_filter,default,AmplitudeDomain
9,0.9026,True,0.04301,False,1.0,True,1.5e-05,16.0,-4.81648,4.58362,0.0477,5.5001,0.0017,0.0891,4,bandpass_filter,default,AmplitudeDomain


## Example 5: Run Evaluation with custom data

This scenario demonstrates how to load a user-defined data iterator and evaluate end-to-end.

In [1]:
import torch

from omnisealbench.data.base import InputBatch
from omnisealbench import task, get_model


custom_data_task = task(
    "default",
    modality="audio",
    
    # For user-defined data, we do not have any dataset or dataloading options
    
    # Cache directory is to save the intermediate watermarked audios
    cache_dir="watermark_custom_results",
    
    # Directory to save the final results.
    # This consists of several sub-directories, each corresponding to the evaluation of one attack
    result_dir="final_custom_results",
    overwrite=True,  # If true, all existing data in cache_dir and result_dir will be overwritten
    
    # We do not specify ids_to_save with custom data, because the custom data
    # might have no concept of IDs
    # ids_to_save="all",  # Save all audios
    
    attacks="tests/audio/audio_attacks_mini.yaml",
    metrics=["snr"],  # Compute only SNR as a quality metric; pass `metrics=[]` to skip quality metrics
)

model = get_model("wavmark_fast", device="cuda")


# A fake evaluation data loader that has 8 batches of 4 audio samples, each
# with 1 channel and 24000 * 4 samples (4 seconds of audio at 24kHz)
# In OmniSealBench, each data loader must return a list of `InputBatch` objects.
user_defined_data = [
    InputBatch(torch.tensor(range(i, i + 4)), torch.randn(4, 1, 24000 * 4))
    for i in range(0, 32, 4)
]


avg_metrics, raw_results = custom_data_task(model, batches=user_defined_data)



omnisealbench.models.wavmark.build_model has the following dependencies: ['wavmark']
Running AudioWatermarkAttacksAndDetection with attack: no-attack


In [3]:
avg_metrics

{'watermark_det_score': AverageMetric(avg=0.375, count=32, square=0.140625, avg_ci_fn=None),
 'watermark_det': AverageMetric(avg=0.0, count=32, square=0.0, avg_ci_fn=None),
 'fake_det_score': AverageMetric(avg=0.375, count=32, square=0.140625, avg_ci_fn=None),
 'fake_det': AverageMetric(avg=0.0, count=32, square=0.0, avg_ci_fn=None),
 'bit_acc': AverageMetric(avg=0.421875, count=32, square=0.20703125, avg_ci_fn=None),
 'word_acc': AverageMetric(avg=0.0, count=32, square=0.0, avg_ci_fn=None),
 'p_value': AverageMetric(avg=0.7141342163085938, count=32, square=0.6318017898593098, avg_ci_fn=None),
 'capacity': AverageMetric(avg=1.6845126152038574, count=32, square=4.185459876364575, avg_ci_fn=None),
 'log10_p_value': AverageMetric(avg=-0.29078994494562627, count=32, square=0.3054619264695369, avg_ci_fn=None),
 'snr': AverageMetric(avg=58.6848121881485, count=32, square=3443.9447498994255, avg_ci_fn=None),
 'decoder_time': AverageMetric(avg=2.065990625, count=32, square=4.271666096562499, a