sony/silentcipher

🔊 SilentCipher: Deep Audio Watermarking: Link to arxiv

Code for SilentCipher, a method for embedding watermarks in audio with state-of-the-art robustness.
Currently, this repository supports audio at 16kHz and 44.1kHz.
Check out our paper for more details.

We have posted examples from existing watermarking algorithms, and how they compare with our watermarking method, at EXAMPLES.

[arXiv] [Colab notebook] [🤗Hugging Face]

Summary

In this paper, we address the artefacts introduced by deep learning-based watermarking methods and remove the need for perceptual losses. This leads to stable training and allows us to achieve SOTA results in both perceptual quality and robustness against distortion. Unlike previous methods, which work at a 16kHz sampling rate, we also showcase our results at a 44.1kHz sampling rate, opening the path for practical applications.

Abstract

In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and robustness over traditional methods, the encoded messages introduce audible artefacts that restrict their usage in professional settings. In this study, we introduce three key innovations. Firstly, our work is the first deep learning-based model to integrate psychoacoustic model-based thresholding to achieve imperceptible watermarks. Secondly, we introduce pseudo-differentiable compression layers, enhancing the robustness of our watermarking algorithm. Lastly, we introduce a method to eliminate the need for perceptual losses, enabling us to achieve SOTA in both robustness and imperceptible watermarking. Our contributions lead us to SilentCipher, a model enabling users to encode messages within audio signals sampled at 44.1kHz.

🧉 Installation

SilentCipher requires Python >=3.8.
We recommend using a Python virtual environment:

python -m venv env
source env/bin/activate

To install from PyPI:

pip install silentcipher

To install from source, clone this repo and run the following commands:

git clone https://github.com/sony/silentcipher.git
cd silentcipher
pip install build
python -m build
pip install dist/<package>.whl

⚙️ Models

Find the latest models for the 44.1kHz and 16kHz sampling rates in the release section of this repository: RELEASE
The models have also been released on HuggingFace.

Note: We are working to release the training code for anyone who wants to build their own watermarker. Stay tuned!
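
As a minimal sketch, the sampling rate of a loaded file could be mapped to the matching model type as follows. The `pick_model_type` helper and its lookup table are hypothetical and not part of the silentcipher package; the `'44.1k'` and `'16k'` strings are the `model_type` values that appear in the usage example in this README. The sketch is deliberately strict and rejects any other rate rather than assuming how the library treats it.

```python
# Hypothetical helper for illustration; not part of the silentcipher package.
SUPPORTED_MODEL_TYPES = {
    44100: "44.1k",  # value used in this README's usage example
    16000: "16k",    # alternative mentioned in the same example
}


def pick_model_type(sample_rate: int) -> str:
    """Return the model_type string for one of the two supported rates."""
    try:
        return SUPPORTED_MODEL_TYPES[sample_rate]
    except KeyError:
        raise ValueError(
            f"Unsupported sampling rate {sample_rate} Hz; "
            f"expected one of {sorted(SUPPORTED_MODEL_TYPES)}"
        ) from None


if __name__ == "__main__":
    print(pick_model_type(44100))
    print(pick_model_type(16000))
```

The returned string could then be passed as `model_type` when loading a model, so that the model matches the audio's sampling rate.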

🧮 Usage

SilentCipher provides a simple API to embed watermarks in an audio sample and to detect them.

We showcase it in multiple ways, as shown in the examples directory.
We provide a simple Flask server, as documented in README_FLASK.
You can also find a simple front-end and back-end server that demonstrates the applications of SilentCipher in README_UI.
Some simple demo examples are also provided in the COLAB DIR.

The following example shows usage from Python:

import librosa
import silentcipher

model = silentcipher.get_model(
    model_type='44.1k',  # or '16k'
    device='cuda'  # use 'cpu' to run without a GPU
)
# By default the model is loaded through the Hugging Face APIs, but you can also pass ckpt_path and config_path to get_model manually:
# ckpt_path='Models/44_1_khz/73999_iteration',
# config_path='Models/44_1_khz/73999_iteration/hparams.yaml',

# Encode from waveform

y, sr = librosa.load('examples/colab/test.wav', sr=None)

# The message must consist of five 8-bit values (integers from 0 to 255), giving a total message capacity of 40 bits

encoded, sdr = model.encode_wav(y, sr, [123, 234, 111, 222, 11])

# You can also pass the message SDR (in dB) to encode_wav, but this may result in unexpected detection accuracy.
# encoded, sdr = model.encode_wav(y, sr, [123, 234, 111, 222, 11], message_sdr=47)

# Set phase_shift_decoding to True when you want the decoder to be robust to audio crops.
# Warning: this can increase the decode time drastically.

result = model.decode_wav(encoded, sr, phase_shift_decoding=False)

print(result['status'])
print(result['messages'][0] == [123, 234, 111, 222, 11])
print(result['confidences'][0])

# Encode from filename

# The message must consist of five 8-bit values (integers from 0 to 255), giving a total message capacity of 40 bits

model.encode('examples/colab/test.wav', 'examples/colab/encoded.wav', [123, 234, 111, 222, 11])

# You can also pass the message SDR (in dB) to encode, but this may result in unexpected detection accuracy.
# model.encode('test.wav', 'encoded.wav', [123, 234, 111, 222, 11], message_sdr=47)

# Set phase_shift_decoding to True when you want the decoder to be robust to audio crops.
# Warning: this can increase the decode time drastically.

result = model.decode('examples/colab/encoded.wav', phase_shift_decoding=False)

print(result['messages'][0] == [123, 234, 111, 222, 11], result['messages'][0])
print(result['confidences'][0])
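
Two details of the example above can be made concrete with small helpers: the 40-bit message format and the SDR value in dB. The sketch below is illustrative only. The names `int_to_message`, `message_to_int`, and `sdr_db` are hypothetical and not part of the silentcipher API; the big-endian byte order is an arbitrary choice for this sketch; and `sdr_db` uses the standard energy-ratio definition of signal-to-distortion ratio, which is assumed (not confirmed here) to correspond to the `sdr` value returned by the encoder.

```python
import math
from typing import List, Sequence

MESSAGE_LENGTH = 5  # five 8-bit values = 40 bits of payload


def int_to_message(value: int) -> List[int]:
    """Split a 40-bit integer into five 8-bit values (big-endian, by assumption)."""
    if not 0 <= value < 2 ** (8 * MESSAGE_LENGTH):
        raise ValueError("value must fit in 40 bits")
    return list(value.to_bytes(MESSAGE_LENGTH, byteorder="big"))


def message_to_int(message: Sequence[int]) -> int:
    """Inverse of int_to_message: join five 8-bit values into one integer."""
    if len(message) != MESSAGE_LENGTH or any(not 0 <= b <= 255 for b in message):
        raise ValueError("message must be five integers in the range 0-255")
    return int.from_bytes(bytes(message), byteorder="big")


def sdr_db(reference: Sequence[float], processed: Sequence[float]) -> float:
    """Signal-to-distortion ratio in dB: 10 * log10(signal energy / error energy)."""
    if len(reference) != len(processed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in reference)
    error_energy = sum((x - y) ** 2 for x, y in zip(reference, processed))
    if error_energy == 0:
        return math.inf  # identical signals: no distortion
    if signal_energy == 0:
        return -math.inf  # silent reference with a non-silent error
    return 10.0 * math.log10(signal_energy / error_energy)


if __name__ == "__main__":
    message = [123, 234, 111, 222, 11]  # the message used in the example above
    payload = message_to_int(message)
    print(int_to_message(payload) == message)  # round trip preserves the message

    reference = [1.0, 0.0, -1.0, 0.0]
    processed = [0.9 * x for x in reference]  # 10% amplitude error
    print(sdr_db(reference, processed))  # close to 20.0 dB
```

A higher SDR means the watermarked audio is closer to the original, which is why forcing a particular `message_sdr` trades audibility against detection accuracy.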

Demo Programs

  1. Python demo program with more detailed usage
  2. Google Colab notebook
  3. A standalone flask server
  4. A demo project management UI based on angular + django + flask

Want to contribute?

We welcome Pull Requests with improvements or suggestions. If you want to flag an issue or propose an improvement but don't know how to implement it, create a GitHub Issue.

License

  • The code in this repository is released under the MIT license as found in the LICENSE file.

Maintainers:

Citation

If you find this repository useful, please consider giving it a star ⭐ and citing it as:

@inproceedings{singh24_interspeech,
  author={Mayank Kumar Singh and Naoya Takahashi and Weihsiang Liao and Yuki Mitsufuji},
  title={{SilentCipher: Deep Audio Watermarking}},
  year=2024,
  booktitle={Proc. INTERSPEECH 2024},
}