Merge pull request Rudrabha#1 from Rudrabha/master
pull base
eyaler committed Jan 11, 2021
2 parents 7ea7ef8 + 143e969 commit 5475924
Showing 12 changed files with 56,290 additions and 20 deletions.
17 changes: 8 additions & 9 deletions README.md
@@ -13,12 +13,11 @@ This code is part of the paper: _A Lip Sync Expert Is All You Need for Speech to
----------
**Highlights**
----------
- Lip-sync videos to any target speech with high accuracy. Try our [interactive demo](https://bhaasha.iiit.ac.in/lipsync).
- Works for any identity, voice, and language. Also works for CGI faces and synthetic voices.
- Complete training code, inference code, and pretrained models are available.
- Or, quick-start with the Google Colab Notebook: [Link](https://colab.research.google.com/drive/1tZpDWXz49W6wDcTprANRGLo2D_EbD5J8?usp=sharing). Checkpoints and samples are available in a Google Drive [folder](https://drive.google.com/drive/folders/1I-0dNLfFOSFwrfqjNa-SXuwaURHE5K4k?usp=sharing) as well.
- Several new, reliable evaluation benchmarks and metrics [[`evaluation/` folder of this repo]](https://github.com/Rudrabha/Wav2Lip/tree/master/evaluation) released.
- Code to calculate metrics reported in the paper is also made available.
- Lip-sync videos to any target speech with high accuracy :100:. Try our [interactive demo](https://bhaasha.iiit.ac.in/lipsync).
- :sparkles: Works for any identity, voice, and language. Also works for CGI faces and synthetic voices.
- Complete training code, inference code, and pretrained models are available :boom:
- Or, quick-start with the Google Colab Notebook: [Link](https://colab.research.google.com/drive/1tZpDWXz49W6wDcTprANRGLo2D_EbD5J8?usp=sharing). Checkpoints and samples are available in a Google Drive [folder](https://drive.google.com/drive/folders/1I-0dNLfFOSFwrfqjNa-SXuwaURHE5K4k?usp=sharing) as well. There is also a [tutorial video](https://www.youtube.com/watch?v=Ic0TBhfuOrA) on this, courtesy of [What Make Art](https://www.youtube.com/channel/UCmGXH-jy0o2CuhqtpxbaQgA).
- :fire: :fire: Several new, reliable evaluation benchmarks and metrics [[`evaluation/` folder of this repo]](https://github.com/Rudrabha/Wav2Lip/tree/master/evaluation) released. Instructions to calculate the metrics reported in the paper are also present.

--------
**Disclaimer**
@@ -27,9 +26,9 @@ All results from this open-source code or our [demo website](https://bhaasha.iii

Prerequisites
-------------
- `Python 3.5.2` (code has been tested with this version at our end, but several other users say that `3.6+` is the one that works instead.)
- `Python 3.6`
- ffmpeg: `sudo apt-get install ffmpeg`
- Install necessary packages using `pip install -r requirements.txt`
- Install necessary packages using `pip install -r requirements.txt`. Alternatively, instructions for using a docker image are provided [here](https://gist.github.com/xenogenesi/e62d3d13dadbc164124c830e9c453668). Have a look at [this comment](https://github.com/Rudrabha/Wav2Lip/issues/131#issuecomment-725478562) and comment on [the gist](https://gist.github.com/xenogenesi/e62d3d13dadbc164124c830e9c453668) if you encounter any issues. A minimal setup sketch follows this list.
- Face detection [pre-trained model](https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth) should be downloaded to `face_detection/detection/sfd/s3fd.pth`. Alternative [link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/prajwal_k_research_iiit_ac_in/EZsy6qWuivtDnANIG73iHjIBjMSoojcIV0NULXV-yiuiIg?e=qTasa8) if the above does not work.
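
Taken together, the prerequisites above amount to a short setup sequence. A minimal sketch, assuming an Ubuntu/Debian machine; the virtual-environment name is illustrative and not part of the repo:

```
# Minimal setup sketch (assumes Ubuntu/Debian, Python 3.6, and pip; the venv name is illustrative).
sudo apt-get install ffmpeg

git clone https://github.com/Rudrabha/Wav2Lip.git
cd Wav2Lip
python3 -m venv wav2lip-env && source wav2lip-env/bin/activate
pip install -r requirements.txt

# Download the S3FD face-detection weights to the path the code expects.
wget "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" \
     -O face_detection/detection/sfd/s3fd.pth
```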

Getting the weights
@@ -119,7 +118,7 @@ Will be updated.

License and Citation
----------
The software can only be used for personal/research/non-commercial purposes. Please cite the following paper if you have use this code:
The software can only be used for personal/research/non-commercial purposes. Please cite the following paper if you use this code:
```
@inproceedings{10.1145/3394171.3413532,
author = {Prajwal, K R and Mukhopadhyay, Rudrabha and Namboodiri, Vinay P. and Jawahar, C.V.},
2 changes: 1 addition & 1 deletion audio.py
@@ -1,7 +1,7 @@
import librosa
import librosa.filters
import numpy as np
import tensorflow as tf
# import tensorflow as tf
from scipy import signal
from scipy.io import wavfile
from hparams import hparams as hp
8 changes: 6 additions & 2 deletions evaluation/README.md
@@ -1,8 +1,12 @@
# Evaluation of Lip-sync using LSE-D and LSE-C metric.
# Novel Evaluation Framework, new filelists, and using the LSE-D and LSE-C metrics.

We use the pre-trained syncnet model available in this [repository](https://github.com/joonson/syncnet_python).
Our paper also proposes a novel evaluation framework (Section 4). To evaluate on LRS2, LRS3, and LRW, the filelists are present in the `test_filelists` folder. Please use the `gen_videos_from_filelist.py` script to generate the videos. After that, you can calculate the LSE-D and LSE-C scores using the instructions below. Please see [this thread](https://github.com/Rudrabha/Wav2Lip/issues/22#issuecomment-712825380) for how to calculate the FID scores.
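
In outline, that workflow is: generate Wav2Lip outputs for a test filelist, then score them with the SyncNet-based LSE-D/LSE-C tooling set up below. A rough sketch; the script arguments, filelist name, and output paths are assumptions for illustration rather than the exact interface (check each script's argparse options for the real flags):

```
# Step 1: generate videos for a test filelist (flag names and paths below are assumed).
python gen_videos_from_filelist.py \
    --filelist test_filelists/lrs2.txt \
    --results_dir eval_outputs/lrs2 \
    --checkpoint_path checkpoints/wav2lip.pth

# Step 2: score the generated videos with the scores_LSE/ wrapper shown later in this commit
# (assumed to be run from inside the configured syncnet_python checkout described below).
sh calculate_scores_real_videos.sh /path/to/eval_outputs/lrs2
```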

The videos of the ReSyncED benchmark for real-world evaluation will be released soon.

### Steps to set-up the evaluation repository for LSE-D and LSE-C metric:
We use the pre-trained syncnet model available in this [repository](https://github.com/joonson/syncnet_python).

* Clone the SyncNet repository.
```
git clone https://github.com/joonson/syncnet_python.git
```
5 changes: 3 additions & 2 deletions evaluation/scores_LSE/calculate_scores_real_videos.sh
@@ -1,7 +1,8 @@
rm all_scores.txt
yourfilenames=`ls $1`

for eachfile in $yourfilenames
do
python run_pipeline.py --videofile $eachfile --reference wav2lip --data_dir tmp_dir
python calculate_scores_real_videos.py --videofile $eachfile --reference wav2lip --data_dir /ssd_scratch/cvit/prajwalkr_rudra/tmp_dir >> all_scores.txt
python run_pipeline.py --videofile $1/$eachfile --reference wav2lip --data_dir tmp_dir
python calculate_scores_real_videos.py --videofile $1/$eachfile --reference wav2lip --data_dir tmp_dir >> all_scores.txt
done
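
With this change the script takes the folder of videos as its first argument and prepends it to each filename, so it no longer has to be run from inside that folder. A brief usage sketch; the directory path is illustrative, and the script is assumed to be invoked from a working syncnet_python setup as described in the evaluation README:

```
# Score every video in a results folder; per-video scores are appended to all_scores.txt.
sh calculate_scores_real_videos.sh /path/to/resynced_outputs
cat all_scores.txt
```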
14 changes: 13 additions & 1 deletion evaluation/test_filelists/README.md
@@ -1 +1,13 @@
Place LRS2, LRW, and LRS3 (and any other) test set filelists here.
This folder contains the filelists for the new evaluation framework proposed in the paper.

## Test filelists for LRS2, LRS3, and LRW.

This folder contains three filelists, each containing a list of names of audio-video pairs from the test sets of LRS2, LRS3, and LRW. The LRS2 and LRW filelists are strictly "Copyright BBC" and can only be used for “non-commercial research by applicants who have an agreement with the BBC to access the Lip Reading in the Wild and/or Lip Reading Sentences in the Wild datasets”. Please follow this link for more details: [https://www.bbc.co.uk/rd/projects/lip-reading-datasets](https://www.bbc.co.uk/rd/projects/lip-reading-datasets).


## ReSyncED benchmark

The sub-folder `ReSynCED` contains filelists for our own Real-world lip-Sync Evaluation Dataset (ReSyncED).


#### Instructions on how to use the above two filelists are available in the README of the parent folder.