movie-asd

Character diarization: This repository now supports character grouping, which uses both character face and character speech information.

Hi there! This repo provides the code and setup for:

Cross-modal identity association (CMIA) framework for active speaker detection.
Audio-visual activity guided CMIA for active speaker detection.

This setup uses TalkNet as the source of audio-visual activity information.

For a quick read on the technical aspects and video illustrations, refer to the blog post on Medium. For queries regarding the setup, reach out to rahul.sharma@usc.edu.

Setup

# Recommended cuda==11.3 (cuda==11.0 also works)

# create and activate a new conda environment
conda create --name movie_asd
conda activate movie_asd

# install the requirements using one of the following
# install the requirements (for cuda==11.3)
pip install -r requirements.txt

# install the requirements (for cuda==11.0)
pip install -r requirements_cuda_11_0.txt

# download the required models
. setup.sh

How to run

The system works best with videos in .mp4 format. To use unsupervised cross-modal identity association (CMIA) for active speaker detection, run the following:

cd src
python3 main.py --videoPath <path_to_video in mp4> --cacheDir <path to store the intermediate artifacts> --partitionLength 50 --verbose

To run the setup with audio-visual activity information from TalkNet guiding CMIA:

cd src
python3 main.py --videoPath <path_to_video in mp4> --cacheDir <path to store the intermediate artifacts> --partitionLength 50 --talknet --verbose

The above commands generate a video in which active speakers' faces are enclosed in green bounding boxes and all other faces in red bounding boxes. An example output video is shown below.

The improved performance with TalkNet comes at the cost of increased processing time. For short videos (<5 min), omitting --partitionLength may improve detection performance with a slight increase in processing time. For longer videos, --partitionLength is important for keeping processing time reasonable, and we recommend keeping it at 50.
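
The guidance on --partitionLength can be captured in a small helper that assembles the command line. This is a minimal sketch and not part of the repository: the function name build_command and the constant SHORT_VIDEO_SECONDS are hypothetical, the video duration is assumed to be supplied by the caller, and only the flags themselves (--videoPath, --cacheDir, --partitionLength, --talknet, --verbose) come from this README.

```python
from typing import List

# README guidance: videos shorter than 5 minutes count as "short".
SHORT_VIDEO_SECONDS = 5 * 60


def build_command(video_path: str, cache_dir: str, duration_s: float,
                  use_talknet: bool = False,
                  partition_length: int = 50) -> List[str]:
    """Assemble the main.py argument list (hypothetical helper; run it from src/)."""
    cmd = ["python3", "main.py",
           "--videoPath", video_path,
           "--cacheDir", cache_dir]
    # Longer videos keep --partitionLength for reasonable processing time;
    # short videos omit it, which may improve detection performance.
    if duration_s >= SHORT_VIDEO_SECONDS:
        cmd += ["--partitionLength", str(partition_length)]
    if use_talknet:
        cmd.append("--talknet")
    cmd.append("--verbose")
    return cmd
```

The returned list can be passed to subprocess.run(..., cwd="src", check=True). Building a list rather than a shell string avoids quoting problems with paths that contain spaces.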

Character Diarization

To generate character clusters, use the --diarize flag as follows. This generates a file, characterSpeechFace.pkl, in the cache directory, containing the character-wise face, body, and speech occurrences for all clustered characters. It also generates a video, *_diarize.mp4, which visualizes the character diarization.

cd src
python3 main.py --videoPath <path_to_video in mp4> --cacheDir <path to store the intermediate artifacts> --partitionLength 50 --talknet --diarize
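
The internal layout of characterSpeechFace.pkl is not documented here, so the sketch below makes no assumption about it: it only loads the file and reports the top-level type, plus the keys or length when the object is a dictionary or a sequence. The helper name summarize_pickle is hypothetical. Unpickling can execute arbitrary code, so load only cache files you generated yourself, and run this inside the project environment in case the file references project or third-party classes.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object (hypothetical helper)."""
    with Path(path).open("rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        # Keys are the natural starting point for exploring a dictionary.
        summary["keys"] = list(obj.keys())
    elif isinstance(obj, (list, tuple)):
        summary["length"] = len(obj)
    return summary
```

For example, print(summarize_pickle("<path to cache dir>/characterSpeechFace.pkl")) gives a first look at the structure before writing any downstream analysis.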

Please cite the following works if you use this framework.

@ARTICLE{10102534,
  author={Sharma, Rahul and Narayanan, Shrikanth},
  journal={IEEE Open Journal of Signal Processing}, 
  title={Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection}, 
  year={2023},
  volume={4},
  number={},
  pages={225-232},
  doi={10.1109/OJSP.2023.3267269}}


@article{sharma2022unsupervised,
  title={Unsupervised active speaker detection in media content using cross-modal information},
  author={Sharma, Rahul and Narayanan, Shrikanth},
  journal={arXiv preprint arXiv:2209.11896},
  year={2022}
}

Character Diarization

@article{sharma2022using,
  title={Using active speaker faces for diarization in TV shows},
  author={Sharma, Rahul and Narayanan, Shrikanth},
  journal={arXiv preprint arXiv:2203.15961},
  year={2022}
}
