
AV-SepFormer

This is the Git repository for the official PyTorch implementation of "AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction", accepted at ICASSP 2023.

📜[Full Paper] ▶[Demo] 💿[Checkpoint]

Requirements

  • Linux

  • Python >= 3.8

  • Anaconda or Miniconda

  • NVIDIA GPU + CUDA + cuDNN (CPU is also supported)

Environment & Installation

Install Anaconda or Miniconda, then create the conda environment and install the pip packages:

# Create conda environment
conda create --name av_sep python=3.8
conda activate av_sep

# Install required packages
pip install -r requirements.txt
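
To verify the installation, a quick sanity check like the following can be run (a minimal sketch; it assumes requirements.txt installs a CUDA-enabled PyTorch build):

# Confirm the interpreter version and that PyTorch can see the GPU
python3 --version
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"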

Start Up

Clone the repository:

git clone https://github.com/lin9x/AV-Sepformer.git
cd AV-Sepformer

Data preparation

The scripts to preprocess the VoxCeleb2 dataset are the same as those in MuSE. You can go directly to that repository to preprocess your data. The data pairs we use are listed in data_list.
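
For example, the preprocessing scripts can be fetched as follows (the MuSE repository URL is an assumption here; follow that repo's own README for the authoritative steps):

# Clone MuSE and follow its VoxCeleb2 preprocessing instructions
# (URL and layout assumed; see the References section)
git clone https://github.com/zexupan/MuSE.git
cd MuSE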

Training

First, modify the configuration in config/avsepformer.yaml to match your setup.

Then you can run training:

conda activate av_sep
CUDA_VISIBLE_DEVICES=0,1 python3 run_avsepformer.py run config/avsepformer.yaml
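
For a single-GPU run, restrict the visible devices accordingly (a hypothetical variant of the command above; the training script is assumed to respect CUDA_VISIBLE_DEVICES):

# Single-GPU training (hypothetical variant)
CUDA_VISIBLE_DEVICES=0 python3 run_avsepformer.py run config/avsepformer.yaml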

If you want to train other audio-visual speech separation systems, AV-ConvTasNet and MuSE are also available in this repo. Turn to the corresponding run_system.py and config/system.yaml to train your own model.
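
For instance, the commands below mirror the AV-SepFormer invocation with the system placeholder filled in (the exact script and config file names are assumptions; check the repository tree):

# Hypothetical per-system commands following the run_system.py pattern
CUDA_VISIBLE_DEVICES=0,1 python3 run_avconvtasnet.py run config/avconvtasnet.yaml
CUDA_VISIBLE_DEVICES=0,1 python3 run_muse.py run config/muse.yaml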

References

The data preparation follows the procedure in the MuSE GitHub repository.
