Generate realistic news videos from nothing but a transcript. The application summarizes your news text and generates a video of a speaker announcing it.
- Text Summarization
- Text Translation
- Speech Synthesis
- Video Generation
- Lip Synchronization
- Multiple Language Support
- Make Your Own Speaker
We use HuggingFace Transformers for the Natural Language Understanding tasks (summarization and translation) and then generate audio with NVIDIA's Tacotron 2. The audio is laid over predefined video material, and the speaker's lips are adjusted to match it using Wav2Lip.
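The stages described above can be read as a simple chain in which each step consumes the output of the previous one. The following is a minimal sketch of that flow, not the project's actual code: the `Job` dataclass and the function names (`summarize`, `translate`, `synthesize_speech`, `lip_sync`, `run_pipeline`) are hypothetical, and every stage is a stub standing in for the real model call (Transformers, Tacotron 2, Wav2Lip).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """State handed from stage to stage (hypothetical structure)."""
    transcript: str
    language: str = "en"
    text: str = ""
    audio_path: str = ""
    video_path: str = ""
    log: List[str] = field(default_factory=list)


def summarize(job: Job) -> Job:
    # Stub: keeps only the first sentence. A real stage would call a
    # Transformers summarization model here.
    job.text = job.transcript.split(".")[0].strip() + "."
    job.log.append("summarize")
    return job


def translate(job: Job) -> Job:
    # Stub: leaves the text unchanged. A real stage would translate
    # job.text into job.language.
    job.log.append(f"translate:{job.language}")
    return job


def synthesize_speech(job: Job) -> Job:
    # Stub: a real stage would run Tacotron 2 and write an audio file.
    job.audio_path = "speech.wav"  # placeholder file name
    job.log.append("synthesize_speech")
    return job


def lip_sync(job: Job) -> Job:
    # Stub: a real stage would run Wav2Lip on the predefined speaker footage.
    job.video_path = "news.mp4"  # placeholder file name
    job.log.append("lip_sync")
    return job


# Order mirrors the description: text processing, then audio, then video.
PIPELINE: List[Callable[[Job], Job]] = [
    summarize,
    translate,
    synthesize_speech,
    lip_sync,
]


def run_pipeline(transcript: str, language: str = "en") -> Job:
    """Run every stage in order and return the final job state."""
    job = Job(transcript=transcript, language=language)
    for stage in PIPELINE:
        job = stage(job)
    return job
```

Keeping each stage as a plain function over a shared state object makes it straightforward to swap a stub for a real model or to skip a stage (for example, translation when the target language matches the source).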
The processing pipeline is exposed through a FastAPI backend and accessed from a small React frontend. The entire application can be started using docker-compose:
docker-compose up
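Once the containers are running, the backend could in principle also be called without the React frontend. The sketch below only builds a JSON request with the Python standard library and does not send it. The URL, port, route (`/generate`), and payload fields (`transcript`, `language`) are assumptions made for illustration; check the FastAPI service for the routes it really exposes.

```python
import json
from urllib import request

# Assumed values: adjust to the host, port, and route of your deployment.
API_URL = "http://localhost:8000/generate"  # hypothetical endpoint


def build_request(transcript: str, language: str = "en") -> request.Request:
    """Build (but do not send) a JSON POST request for the backend."""
    payload = json.dumps(
        {"transcript": transcript, "language": language}  # hypothetical schema
    ).encode("utf-8")
    return request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually submit the request, pass the returned object to `urllib.request.urlopen` while the application is running.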