CoAS: Composite Audio Steganography Based on Text and Speech Synthesis

This repository hosts the official PyTorch implementation of the paper "CoAS: Composite Audio Steganography Based on Text and Speech Synthesis" (accepted by IEEE TIFS 2025).

Method

We propose Composite Audio Steganography (CoAS), a method based on text and speech synthesis that leverages the multi-carrier characteristic of audio data by using side-channel information from the transcript to facilitate the steganographic process. We first map the secret message to Gaussian noise in a distribution-preserving manner and embed it into the audio generation process of a diffusion model. To address the reduced audio diversity caused by using a fixed random seed as a key, we embed the key into the audio text, which the receiver then retrieves via speech recognition. This approach allows the system to randomly select a key for each transmission, ensuring both accurate message extraction and diversity of the generated audio for enhanced concealment.
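To make the "distribution-preserving" mapping more concrete, here is a minimal sketch of one common way to turn message bits into standard Gaussian samples. It is not the code in this repository and may differ from the mapping in the paper: the function names are hypothetical, and the `payload` parameter (bits per noise sample) is borrowed from the Additional Notes as an assumption. Each chunk of bits selects one of `2**payload` equal-probability bins of the Gaussian, and a uniform draw inside that bin is pushed through the inverse CDF.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def bits_to_gaussian(bits, payload=4, rng=None):
    """Map a list of 0/1 bits to Gaussian samples, `payload` bits per sample."""
    rng = rng or random.Random()
    if len(bits) % payload:
        raise ValueError("bit length must be a multiple of payload")
    bins = 1 << payload
    samples = []
    for i in range(0, len(bits), payload):
        index = int("".join(map(str, bits[i:i + payload])), 2)
        # Draw a uniform point inside the bin selected by this message chunk.
        u = (index + rng.random()) / bins
        u = min(max(u, 1e-12), 1 - 1e-12)  # inv_cdf requires 0 < u < 1
        samples.append(_STD_NORMAL.inv_cdf(u))
    return samples


def gaussian_to_bits(samples, payload=4):
    """Recover the bits by locating the bin that each sample falls into."""
    bins = 1 << payload
    bits = []
    for z in samples:
        index = min(int(_STD_NORMAL.cdf(z) * bins), bins - 1)
        bits.extend(int(b) for b in format(index, f"0{payload}b"))
    return bits
```

If the message bits are uniformly random (for example, after encryption), the bin index is uniform and the resulting samples follow a standard Gaussian. In the full system the receiver does not see these samples directly; it has to recover them from the audio, so this noise-free round trip is only an illustration.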

Getting Started

The modules used in the CoAS system will be split out and released gradually.

Provably Secure Linguistic Steganography

In the CoAS system, you can use any provably secure linguistic steganography method to embed the random number seed into the audio text. We do not go into details here; one method we recommend is Discop.

Audio Steganography

The audio steganography module in CoAS is based on FastDiff and ProDiff and is implemented for the text-to-speech (TTS) task.

git clone https://github.com/meterial/CoAS.git
cd CoAS
conda env create -f environment.yml 
conda activate coas

We directly use the pre-trained audio generation models provided by Rongjie Huang. You can also train your own models by following the instructions in FastDiff and ProDiff, then place your checkpoints at checkpoints/$method_name$/model_ckpt_steps_.ckpt, laid out as follows:

── checkpoints/
    ├── FastDiff
    │   ├──config.yaml
    │   └──model_ckpt_steps_500000.ckpt
    ├── ProDiff
    │   ├──config.yaml
    │   └──model_ckpt_steps_200000.ckpt
    └── ProDiff_Teacher
        ├──config.yaml
        └──model_ckpt_steps_188000.ckpt

Message Embedding

In the message embedding phase of CoAS, in addition to the secret message, the sender also needs the audio text and the random number seed used in the provably secure linguistic steganography step above.

python inference/CoAS.py embed --text $audio_text$ --message $secret_message$ --seed $random_number_seed$

The stego audio will be saved to inferout/$audio_text$.wav.

Message Extraction

During the message extraction phase of CoAS, the receiver needs to use the same audio text and random number seed as the sender to ensure correct extraction.

python inference/CoAS.py extract --text $audio_text$ --audio $stego_audio_path$ --seed $random_number_seed$

The audio text can be recovered with the speech recognition method described below, and the random number seed can then be extracted from the audio text with the provably secure linguistic steganography algorithm described above.
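Because the audio text usually contains spaces, it has to be quoted carefully when passed on the command line. One way to avoid quoting mistakes is to build the argument list in Python. The sketch below uses only the subcommands and flags shown in the commands above; the helper names are hypothetical, and the seed is passed through `str()` on the assumption that the script parses it from a string.

```python
import shlex
import sys

SCRIPT = "inference/CoAS.py"  # script path used in the commands above


def embed_command(text, message, seed):
    """Argument list for the embedding step."""
    return [sys.executable, SCRIPT, "embed",
            "--text", text, "--message", message, "--seed", str(seed)]


def extract_command(text, audio_path, seed):
    """Argument list for the extraction step."""
    return [sys.executable, SCRIPT, "extract",
            "--text", text, "--audio", audio_path, "--seed", str(seed)]


if __name__ == "__main__":
    cmd = extract_command("hello world", "inferout/hello world.wav", 1234)
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # To run it from the repository root: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` bypasses the shell entirely, so the text and seed reach the script unchanged on both the sender and receiver sides.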

Speech Recognition

CoAS uses existing speech recognition models such as parakeet and whisper. You can run speech recognition with either of the following commands.

python speech_reco/asr.py parakeet -a $audio$
python speech_reco/asr.py whisper -a $audio$

Additional Notes

  • The default payload is 4; you can change it in modules/FastDiff/module/util.py. Keep the payload identical for embedding and extraction; otherwise the secret message cannot be extracted.
  • The sampling rate of the audio files generated by the diffusion models is 22.05 kHz.

Acknowledgements

Our code borrows heavily from FastDiff and ProDiff. We thank the authors for sharing their code.

Citation

If you find our work useful for your research, please consider citing the following paper:

@ARTICLE{11036088,
 author={Li, Yiming and Chen, Kejiang and Wang, Yaofei and Zhang, Xin and Wang, Guanjie and Zhang, Weiming and Yu, Nenghai},
 journal={IEEE Transactions on Information Forensics and Security}, 
 title={CoAS: Composite Audio Steganography Based on Text and Speech Synthesis}, 
 year={2025},
 volume={20},
 number={},
 pages={5978-5991},
 keywords={Steganography;Diffusion models;Security;Speech synthesis;Receivers;Noise reduction;Gaussian noise;Entropy;Reviews;Linguistics;Steganography;provably secure;text;audio;diffusion model},
 doi={10.1109/TIFS.2025.3579581} }
