Source code of paper "SLNSpeech: solving extended speech separation problem with the help of sign language".
Due to copyright issues, we cannot directly disclose our dataset, but we will publish the features extracted using our model and some samples later.
We hope the data to be categorized by person, that is, the videos of a certain speaker are in the same directory. First, using ffmpeg to cut the videos into video frames and audios, and storing the audios into audio directory and storing the video frames into frames directory, still categorized by person. The directory structure is as follows.
-dataset
-audio
-speaker1
-speaker2
-speaker3
...
-frames
-speaker1
-speaker2
-speaker3
...
Second, through python create data/create_ Index.py
creates a csv file contaning addresses which store speaker visual frames and audio. In data/create_ In index.py
, it is necessary to set the gender information of speakers.
python main.py --list_train 'path/train.csv' --list_test 'path/test.csv'
All parameters are included in arguments.py
and can be changed according to demand.