This repository hosts the official PyTorch implementation of the paper "CoAS: Composite Audio Steganography Based on Text and Speech Synthesis" (accepted by IEEE TIFS 2025).
We propose Composite Audio Steganography (CoAS), a method based on text and speech synthesis that exploits the multi-carrier nature of audio data by using side-channel information from the transcript to support the steganographic process. We first map the secret message to Gaussian noise in a distribution-preserving manner and embed it into the audio generation process of a diffusion model. To address the reduced audio diversity caused by using a fixed random seed as a key, we embed the key into the audio text, which the receiver then recovers via speech recognition. This approach lets the system select a random key for each transmission, ensuring both accurate message extraction and diversity in the generated audio for better concealment.
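To make the first step more concrete, here is a minimal sketch of one common way to map bits to standard Gaussian noise without changing the noise distribution: each group of bits selects an equal-probability slice of the unit interval, a point is drawn uniformly inside that slice, and the inverse Gaussian CDF turns it into a sample. This is an illustration only and is not taken from the CoAS code. The function names `bits_to_noise` and `noise_to_bits` and the `payload` argument (interpreted here as bits per noise sample) are assumptions made for the example. It also assumes the message bits are already uniformly random (for instance, encrypted) and skips the diffusion sampling and inversion that sit between embedding and extraction in the real system.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist(0.0, 1.0)


def bits_to_noise(bits, payload, rng):
    """Map a list of 0/1 bits to Gaussian samples, `payload` bits per sample.

    Hypothetical helper: illustrates interval partitioning plus the inverse
    CDF, not the actual CoAS implementation.
    """
    if len(bits) % payload != 0:
        raise ValueError("bit length must be a multiple of payload")
    bins = 1 << payload  # number of equal-probability slices
    noise = []
    for i in range(0, len(bits), payload):
        # Interpret the bit group as the index of a slice of [0, 1).
        k = int("".join(str(b) for b in bits[i:i + payload]), 2)
        # Draw a point inside the slice, kept away from the slice edges
        # so that inv_cdf never sees exactly 0 or 1.
        r = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
        u = (k + r) / bins
        noise.append(_STD_NORMAL.inv_cdf(u))
    return noise


def noise_to_bits(noise, payload):
    """Recover the bits by checking which slice each sample's CDF value hits."""
    bins = 1 << payload
    bits = []
    for z in noise:
        k = min(int(_STD_NORMAL.cdf(z) * bins), bins - 1)
        bits.extend(int(c) for c in format(k, f"0{payload}b"))
    return bits
```

In this sketch, uniformly random message bits make `u` uniform on (0, 1), so the resulting samples follow a standard Gaussian, and `noise_to_bits` recovers the message exactly as long as the samples are not perturbed.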
We will gradually split out and implement the modules used in the CoAS system.
In the CoAS system, you can use any provably secure linguistic steganography method to embed the random number seed into the audio text. We do not go into details here; we recommend Discop as one option.
The audio steganography module in CoAS is based on FastDiff and ProDiff and is implemented for the text-to-speech (TTS) task.
git clone https://github.com/meterial/CoAS.git
cd CoAS
conda env create -f environment.yml
conda activate coas
We directly use the pre-trained audio generation models provided by Rongjie Huang. You can also train your own model according to the instructions in FastDiff and ProDiff and put your checkpoints in checkpoints/$method_name$/model_ckpt_steps_.ckpt
checkpoints/
├── FastDiff
│   ├── config.yaml
│   └── model_ckpt_steps_500000.ckpt
├── ProDiff
│   ├── config.yaml
│   └── model_ckpt_steps_200000.ckpt
└── ProDiff_Teacher
    ├── config.yaml
    └── model_ckpt_steps_188000.ckpt
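Before running inference, it can help to confirm that the checkpoint folders match the layout above. The following is a minimal sketch of such a check. The helper name `check_checkpoints` is hypothetical and not part of the repository, and the glob pattern `model_ckpt_steps_*.ckpt` is an assumption based on the file names shown in the tree.

```python
from pathlib import Path


def check_checkpoints(root="checkpoints",
                      methods=("FastDiff", "ProDiff", "ProDiff_Teacher")):
    """Return {method: [problems]} for each method folder that looks incomplete.

    Hypothetical helper; the expected file names follow the tree above.
    """
    problems = {}
    for method in methods:
        folder = Path(root) / method
        if not folder.is_dir():
            problems[method] = ["directory not found"]
            continue
        issues = []
        if not (folder / "config.yaml").is_file():
            issues.append("missing config.yaml")
        # Assumed naming pattern for checkpoint files.
        if not any(folder.glob("model_ckpt_steps_*.ckpt")):
            issues.append("missing model_ckpt_steps_*.ckpt")
        if issues:
            problems[method] = issues
    return problems
```

Calling `check_checkpoints()` from the repository root returns an empty dictionary when all three folders contain a config file and at least one checkpoint.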
In the message embedding phase of CoAS, in addition to the secret message, the sender also needs the audio text and the random number seed used in the provably secure linguistic steganography step above.
python inference/CoAS.py embed --text $audio_text$ --message $secret_message$ --seed $random_number_seed$
The stego audio will be saved as inferout/$audio_text$.wav.
During the message extraction phase of CoAS, the receiver needs to use the same audio text and random number seed as the sender to ensure correct extraction.
python inference/CoAS.py extract --text $audio_text$ --audio $stego_audio_path$ --seed $random_number_seed$
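Because the audio text usually contains spaces and punctuation, it may be safer to build the two commands above as argument lists rather than as shell strings. The sketch below does this using only the subcommands and flags shown above. The helper names `embed_command` and `extract_command` are hypothetical, and treating the message as a string and the seed as an integer is an assumption about the expected argument formats.

```python
import shlex

SCRIPT = "inference/CoAS.py"  # script path as used in the commands above


def embed_command(text, message, seed):
    """Argument list for the embedding step (hypothetical helper)."""
    return ["python", SCRIPT, "embed",
            "--text", text, "--message", message, "--seed", str(seed)]


def extract_command(text, audio_path, seed):
    """Argument list for the extraction step (hypothetical helper)."""
    return ["python", SCRIPT, "extract",
            "--text", text, "--audio", audio_path, "--seed", str(seed)]


# Show the shell-quoted form of an embedding command with placeholder values.
print(shlex.join(embed_command("an example sentence", "0101", 1234)))
```

Either list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids quoting problems in the audio text.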
The audio text can be recovered with the speech recognition method below, and the random number seed can then be extracted from that text using the provably secure linguistic steganography algorithm mentioned above.
CoAS uses existing speech recognition models such as Parakeet and Whisper. You can run speech recognition with the following commands.
python speech_reco/asr.py parakeet -a $audio$
python speech_reco/asr.py whisper -a $audio$
- The default payload is 4 (payload=4); you can change it in modules/FastDiff/module/util.py. Keep the payload the same for embedding and extraction, otherwise the secret message cannot be extracted.
- The sampling rate of the audio files generated by the diffusion models is 22.05 kHz.
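If a stego audio file passes through other tools before extraction, one quick sanity check is to confirm that it still has the 22.05 kHz sampling rate noted above. The sketch below uses Python's standard `wave` module. The helper names are hypothetical, and the code assumes the file is an uncompressed PCM WAV, which is the only kind `wave` is guaranteed to read.

```python
import wave

EXPECTED_RATE = 22050  # 22.05 kHz, as stated in the note above


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def has_expected_rate(path, expected=EXPECTED_RATE):
    """True if the file's sampling rate matches the expected value."""
    return wav_sample_rate(path) == expected
```

A file that has been resampled, for example to 16 kHz, would fail this check.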
We borrow heavily from the code of FastDiff and ProDiff, and we thank the authors for sharing it.
If you find our work useful for your research, please consider citing the following paper:
@ARTICLE{11036088,
author={Li, Yiming and Chen, Kejiang and Wang, Yaofei and Zhang, Xin and Wang, Guanjie and Zhang, Weiming and Yu, Nenghai},
journal={IEEE Transactions on Information Forensics and Security},
title={CoAS: Composite Audio Steganography Based on Text and Speech Synthesis},
year={2025},
volume={20},
number={},
pages={5978-5991},
keywords={Steganography;Diffusion models;Security;Speech synthesis;Receivers;Noise reduction;Gaussian noise;Entropy;Reviews;Linguistics;Steganography;provably secure;text;audio;diffusion model},
doi={10.1109/TIFS.2025.3579581} }