Merge pull request Rudrabha#1 from Rudrabha/master
pull base
eyaler committed Jan 11, 2021
2 parents 7ea7ef8 + 143e969 commit 5475924
Showing 12 changed files with 56,290 additions and 20 deletions.
17 changes: 8 additions & 9 deletions README.md
@@ -13,12 +13,11 @@ This code is part of the paper: _A Lip Sync Expert Is All You Need for Speech to
----------
**Highlights**
----------
- Lip-sync videos to any target speech with high accuracy. Try our [interactive demo](https://bhaasha.iiit.ac.in/lipsync).
- Works for any identity, voice, and language. Also works for CGI faces and synthetic voices.
- Complete training code, inference code, and pretrained models are available.
- Or, quick-start with the Google Colab Notebook: [Link](https://colab.research.google.com/drive/1tZpDWXz49W6wDcTprANRGLo2D_EbD5J8?usp=sharing). Checkpoints and samples are available in a Google Drive [folder](https://drive.google.com/drive/folders/1I-0dNLfFOSFwrfqjNa-SXuwaURHE5K4k?usp=sharing) as well.
- Several new, reliable evaluation benchmarks and metrics [[`evaluation/` folder of this repo]](https://github.com/Rudrabha/Wav2Lip/tree/master/evaluation) released.
- Code to calculate metrics reported in the paper is also made available.
- Lip-sync videos to any target speech with high accuracy :100:. Try our [interactive demo](https://bhaasha.iiit.ac.in/lipsync).
- :sparkles: Works for any identity, voice, and language. Also works for CGI faces and synthetic voices.
- Complete training code, inference code, and pretrained models are available :boom:
- Or, quick-start with the Google Colab Notebook: [Link](https://colab.research.google.com/drive/1tZpDWXz49W6wDcTprANRGLo2D_EbD5J8?usp=sharing). Checkpoints and samples are available in a Google Drive [folder](https://drive.google.com/drive/folders/1I-0dNLfFOSFwrfqjNa-SXuwaURHE5K4k?usp=sharing) as well. There is also a [tutorial video](https://www.youtube.com/watch?v=Ic0TBhfuOrA) on this, courtesy of [What Make Art](https://www.youtube.com/channel/UCmGXH-jy0o2CuhqtpxbaQgA).
- :fire: :fire: Several new, reliable evaluation benchmarks and metrics [[`evaluation/` folder of this repo]](https://github.com/Rudrabha/Wav2Lip/tree/master/evaluation) released. Instructions to calculate the metrics reported in the paper are also present.

--------
**Disclaimer**
@@ -27,9 +26,9 @@ All results from this open-source code or our [demo website](https://bhaasha.iii

Prerequisites
-------------
- `Python 3.5.2` (code has been tested with this version at our end, but several other users say that `3.6+` is the one that works instead.)
- `Python 3.6`
- ffmpeg: `sudo apt-get install ffmpeg`
- Install necessary packages using `pip install -r requirements.txt`
- Install necessary packages using `pip install -r requirements.txt`. Alternatively, instructions for using a docker image are provided [here](https://gist.github.com/xenogenesi/e62d3d13dadbc164124c830e9c453668). Have a look at [this comment](https://github.com/Rudrabha/Wav2Lip/issues/131#issuecomment-725478562) and comment on [the gist](https://gist.github.com/xenogenesi/e62d3d13dadbc164124c830e9c453668) if you encounter any issues. A minimal setup sketch follows this list.
- Face detection [pre-trained model](https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth) should be downloaded to `face_detection/detection/sfd/s3fd.pth`. Alternative [link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/prajwal_k_research_iiit_ac_in/EZsy6qWuivtDnANIG73iHjIBjMSoojcIV0NULXV-yiuiIg?e=qTasa8) if the above does not work.
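
Taken together, the prerequisites above amount to a short setup sequence. A minimal sketch, assuming an Ubuntu/Debian machine; the virtual-environment name is illustrative and not part of the repo:

```
# Minimal setup sketch (assumes Ubuntu/Debian, Python 3.6, and pip; the venv name is illustrative).
sudo apt-get install ffmpeg

git clone https://github.com/Rudrabha/Wav2Lip.git
cd Wav2Lip
python3 -m venv wav2lip-env && source wav2lip-env/bin/activate
pip install -r requirements.txt

# Download the S3FD face-detection weights to the path the code expects.
wget "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" \
     -O face_detection/detection/sfd/s3fd.pth
```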

Getting the weights
@@ -119,7 +118,7 @@ Will be updated.

License and Citation
----------
The software can only be used for personal/research/non-commercial purposes. Please cite the following paper if you have use this code:
The software can only be used for personal/research/non-commercial purposes. Please cite the following paper if you use this code:
```
@inproceedings{10.1145/3394171.3413532,
author = {Prajwal, K R and Mukhopadhyay, Rudrabha and Namboodiri, Vinay P. and Jawahar, C.V.},
2 changes: 1 addition & 1 deletion audio.py
@@ -1,7 +1,7 @@
import librosa
import librosa.filters
import numpy as np
import tensorflow as tf
# import tensorflow as tf
from scipy import signal
from scipy.io import wavfile
from hparams import hparams as hp
8 changes: 6 additions & 2 deletions evaluation/README.md
@@ -1,8 +1,12 @@
# Evaluation of Lip-sync using LSE-D and LSE-C metric.
# Novel Evaluation Framework, new filelists, and using the LSE-D and LSE-C metrics.

We use the pre-trained syncnet model available in this [repository](https://github.com/joonson/syncnet_python).
Our paper also proposes a novel evaluation framework (Section 4). To evaluate on LRS2, LRS3, and LRW, the filelists are present in the `test_filelists` folder. Please use the `gen_videos_from_filelist.py` script to generate the videos. After that, you can calculate the LSE-D and LSE-C scores using the instructions below. Please see [this thread](https://github.com/Rudrabha/Wav2Lip/issues/22#issuecomment-712825380) for how to calculate the FID scores.
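
In outline, that workflow is: generate Wav2Lip outputs for a test filelist, then score them with the SyncNet-based LSE-D/LSE-C tooling set up below. A rough sketch; the script arguments, filelist name, and output paths are assumptions for illustration rather than the exact interface (check each script's argparse options for the real flags):

```
# Step 1: generate videos for a test filelist (flag names and paths below are assumed).
python gen_videos_from_filelist.py \
    --filelist test_filelists/lrs2.txt \
    --results_dir eval_outputs/lrs2 \
    --checkpoint_path checkpoints/wav2lip.pth

# Step 2: score the generated videos with the scores_LSE/ wrapper shown later in this commit
# (assumed to be run from inside the configured syncnet_python checkout described below).
sh calculate_scores_real_videos.sh /path/to/eval_outputs/lrs2
```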

The videos of the ReSyncED benchmark for real-world evaluation will be released soon.

### Steps to set-up the evaluation repository for LSE-D and LSE-C metric:
We use the pre-trained syncnet model available in this [repository](https://github.com/joonson/syncnet_python).

* Clone the SyncNet repository.
```
git clone https://github.com/joonson/syncnet_python.git
```
5 changes: 3 additions & 2 deletions evaluation/scores_LSE/calculate_scores_real_videos.sh
@@ -1,7 +1,8 @@
rm all_scores.txt
yourfilenames=`ls $1`

for eachfile in $yourfilenames
do
python run_pipeline.py --videofile $eachfile --reference wav2lip --data_dir tmp_dir
python calculate_scores_real_videos.py --videofile $eachfile --reference wav2lip --data_dir /ssd_scratch/cvit/prajwalkr_rudra/tmp_dir >> all_scores.txt
python run_pipeline.py --videofile $1/$eachfile --reference wav2lip --data_dir tmp_dir
python calculate_scores_real_videos.py --videofile $1/$eachfile --reference wav2lip --data_dir tmp_dir >> all_scores.txt
done
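
With this change the script takes the folder of videos as its first argument and prepends it to each filename, so it no longer has to be run from inside that folder. A brief usage sketch; the directory path is illustrative, and the script is assumed to be invoked from a working syncnet_python setup as described in the evaluation README:

```
# Score every video in a results folder; per-video scores are appended to all_scores.txt.
sh calculate_scores_real_videos.sh /path/to/resynced_outputs
cat all_scores.txt
```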
14 changes: 13 additions & 1 deletion evaluation/test_filelists/README.md
@@ -1 +1,13 @@
Place LRS2, LRW, and LRS3 (and any other) test set filelists here.
This folder contains the filelists for the new evaluation framework proposed in the paper.

## Test filelists for LRS2, LRS3, and LRW.

This folder contains three filelists, each containing a list of names of audio-video pairs from the test sets of LRS2, LRS3, and LRW. The LRS2 and LRW filelists are strictly "Copyright BBC" and can only be used for “non-commercial research by applicants who have an agreement with the BBC to access the Lip Reading in the Wild and/or Lip Reading Sentences in the Wild datasets”. Please follow this link for more details: [https://www.bbc.co.uk/rd/projects/lip-reading-datasets](https://www.bbc.co.uk/rd/projects/lip-reading-datasets).


## ReSyncED benchmark

The sub-folder `ReSynCED` contains filelists for our own Real-world lip-Sync Evaluation Dataset (ReSyncED).


#### Instructions on how to use the above two filelists are available in the README of the parent folder.