Fully automated AI-generated fictional crime news stories in the style of a certain Estonian 90s TV show

sisalik/politsaikroonika

politsAIkroonika

About

This is the code behind the politsAIkroonika project on Instagram and YouTube. With a single command, you can produce a short video clip featuring a fictional crime news story in the style of a certain Estonian 90s TV show. The story, audio and video are all 100% AI-generated using various models. The Estonian text-to-speech model used for the news reporter's voice has been custom-trained for maximum authenticity.

A brief overview of the process:

  1. Generate the story title, summary and script using OpenAI GPT-3.5 and GPT-4
  2. Convert the script to audio using Voice Cloning App by BenAAndrew
  3. Generate video clips to illustrate the story using ModelScope Text-to-Video
  4. Enhance the video using Topaz Video AI (optional but highly recommended; improves resolution and frame rate)
  5. Merge the video clips, audio and subtitles using ffmpeg
  6. Upload to Google Drive (optional; makes the clip easy to share)
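Step 5 can be approximated with two standard ffmpeg features: the concat demuxer to join the generated clips and the subtitles filter to burn in captions. The sketch below is an illustration under assumptions, not the project's own merge code; the helper names `write_concat_list` and `build_merge_command`, the file names and the libx264/AAC codec choice are all hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def write_concat_list(clips: Sequence[Path], list_path: Path) -> None:
    """Write a list file for ffmpeg's concat demuxer, one clip per line."""
    lines = []
    for clip in clips:
        # Paths are wrapped in single quotes, so embedded quotes must be escaped as '\''.
        escaped = clip.as_posix().replace("'", "'\\''")
        lines.append(f"file '{escaped}'\n")
    list_path.write_text("".join(lines), encoding="utf-8")


def build_merge_command(
    list_path: Path, audio: Path, subtitles: Path, output: Path
) -> List[str]:
    """Return ffmpeg arguments that join the clips, add narration and burn in subtitles."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", str(list_path),  # input 0: video clips
        "-i", str(audio),                                      # input 1: narration
        # Burn subtitles into the picture. Paths containing ':' (e.g. Windows
        # drive letters) would need extra escaping inside the filter string.
        "-vf", f"subtitles={subtitles.as_posix()}",
        "-map", "0:v", "-map", "1:a",  # video from the clips, audio from the narration
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest",  # stop at whichever stream ends first
        str(output),
    ]
```

Running it would then be a matter of calling `subprocess.run(build_merge_command(...), check=True)` once `write_concat_list` has produced the list file.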

Installation

Installing the package and its dependencies is a bit more involved than usual due to the need to install and configure the Voice Cloning App and Topaz Video AI. The following instructions are for Windows, but should be easily adaptable to Linux.

Prerequisites

  • Python 3.8 or 3.9
  • Poetry (tested with 1.6.1)
  • ffmpeg
  • NVIDIA GPU with at least 8 GB of VRAM (tested on a GTX 1070)
  • OpenAI account with an API key (paid API usage required, but the cost is only a few cents per episode)

Optional:

  • Topaz Video AI (tested with version 3.2.0)
    Whilst this is paid software, it is currently the best available option for frame interpolation and upscaling. Open-source options do exist (RIFE, CAIN, DAIN, etc.) but would require additional development to integrate.
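Before installing, a quick sanity check of the software prerequisites can save time. This is a hypothetical helper (`check_prerequisites` is not part of the package) that only checks the Python version and whether `poetry` and `ffmpeg` are on PATH; the GPU, VRAM and OpenAI account requirements are not checked.

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence


def check_prerequisites(
    version: Optional[Sequence[int]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    # Default to the version of the interpreter running this script.
    major, minor = (version if version is not None else sys.version_info)[:2]
    if (major, minor) not in {(3, 8), (3, 9)}:
        problems.append(f"Python 3.8 or 3.9 expected, found {major}.{minor}")
    # The documented commands call poetry and ffmpeg by name, so both must be on PATH.
    for tool in ("poetry", "ffmpeg"):
        if which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```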

Installing the package

Clone the repository and install the dependencies using Poetry:

poetry install

Voice-Cloning-App

Voice Cloning App is used for the text-to-speech functionality and is executed under its own virtual environment. This is because it requires specific versions of various libraries that may conflict with the versions required by this package.

Follow the manual install instructions in the Voice Cloning App repository, but install the requirements into a virtual environment under Voice-Cloning-App/.venv:

cd Voice-Cloning-App
python -m venv .venv
.venv\Scripts\activate  # On Linux: source .venv/bin/activate
pip install -r requirements.txt
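Because the Voice Cloning App lives in its own virtual environment, it has to be launched with that environment's interpreter rather than the one Poetry manages. A minimal sketch of how this can be done with `subprocess`, assuming the `.venv` layout created above (`Scripts\` on Windows, `bin/` on Linux); `venv_python` and `run_in_app_venv` are hypothetical names, the script to run is left to the caller, and running from the app directory is an assumption based on the model paths being relative to it.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def venv_python(app_dir: Path, sys_platform: Optional[str] = None) -> Path:
    """Path of the interpreter inside app_dir/.venv (Scripts/ on Windows, bin/ elsewhere)."""
    sys_platform = sys_platform or sys.platform
    if sys_platform.startswith("win"):
        return app_dir / ".venv" / "Scripts" / "python.exe"
    return app_dir / ".venv" / "bin" / "python"


def run_in_app_venv(app_dir: Path, script: str, args: List[str]) -> subprocess.CompletedProcess:
    """Run a script with the app's own interpreter so its pinned libraries stay isolated."""
    cmd = [str(venv_python(app_dir)), script, *args]
    # cwd=app_dir so relative paths such as data/models/... resolve from the app directory.
    return subprocess.run(cmd, cwd=app_dir, check=True, capture_output=True, text=True)
```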

Environment variables

The code requires several environment variables to be configured. You may choose to set these in your system environment variables, or in a .env file in the root of the repository. For the latter option, you need to install the Poetry dotenv plugin.
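A .env file is simply a list of KEY=VALUE lines. The Poetry dotenv plugin loads it for you; the simplified parser below only illustrates the basic format and is not the plugin's implementation. It assumes one assignment per line, `#` comments, optional surrounding quotes, and that variables already set in the environment should take precedence. `parse_dotenv` and `apply_dotenv` are hypothetical names.

```python
import os
from typing import Dict, Mapping, MutableMapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. around paths with spaces.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def apply_dotenv(
    values: Mapping[str, str], environ: Optional[MutableMapping[str, str]] = None
) -> None:
    """Copy values into the environment without overriding variables that are already set."""
    env = os.environ if environ is None else environ
    for key, value in values.items():
        env.setdefault(key, value)
```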

The following environment variables are required:

  • OPENAI_API_KEY - get from your OpenAI account (instructions here)
  • PATH - must include the directory containing the ffmpeg executable

If you are using Topaz Video AI, the following environment variables are also required:

  • TVAI_MODEL_DIR and TVAI_MODEL_DATA_DIR - set according to the instructions here
  • TVAI_FFMPEG - set to the path of the ffmpeg executable in your Topaz Video AI installation (e.g. C:\Program Files\Topaz Labs LLC\Topaz Video AI\ffmpeg.exe)

If you are using Google Drive, the following environment variable is also required:

  • GOOGLE_DRIVE_FOLDER_ID - the ID of the Google Drive folder where the videos will be uploaded. This is a long string of letters and numbers that can be found in the URL of the folder in Google Drive.
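The conditional requirements above can be checked up front, before any paid API calls are made. In this sketch, `missing_settings` is a hypothetical helper, and the `use_topaz`/`use_drive` switches are assumptions standing in for however the optional steps are actually enabled.

```python
import os
from typing import List, Mapping, Optional


def missing_settings(
    use_topaz: bool,
    use_drive: bool,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Names of required environment variables that are unset or empty."""
    env = os.environ if environ is None else environ
    required = ["OPENAI_API_KEY"]
    if use_topaz:
        required += ["TVAI_MODEL_DIR", "TVAI_MODEL_DATA_DIR", "TVAI_FFMPEG"]
    if use_drive:
        required.append("GOOGLE_DRIVE_FOLDER_ID")
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings(use_topaz=True, use_drive=False)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```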

Models

Once everything has been installed, download the models below and place them in the listed directories (relative to the Voice-Cloning-App directory). Create any directories that do not exist yet.

  • Voice model - download from here and place in data/models/reporter
  • Vocoder model - download from here, rename from g_02500000 to model.pt and place in data/hifigan/vctk
  • Vocoder model config file - download from here and place in data/hifigan/vctk
  • Alphabet file - copy from alphabets/Estonian.txt to data/languages/Estonian and rename to alphabet.txt
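The vocoder and alphabet steps are mechanical enough to script. This sketch assumes the vocoder checkpoint (g_02500000) has already been downloaded to `vocoder_download`; it creates the three target directories, moves and renames the checkpoint, and copies the alphabet. The voice model and vocoder config file are left to place by hand, since their file names are not given here. `place_vocoder_and_alphabet` is a hypothetical name.

```python
import shutil
from pathlib import Path


def place_vocoder_and_alphabet(app_dir: Path, vocoder_download: Path) -> None:
    """Create the expected folders, move the vocoder in as model.pt and copy the alphabet."""
    vocoder_dir = app_dir / "data" / "hifigan" / "vctk"
    alphabet_dir = app_dir / "data" / "languages" / "Estonian"
    for folder in (app_dir / "data" / "models" / "reporter", vocoder_dir, alphabet_dir):
        folder.mkdir(parents=True, exist_ok=True)
    # The downloaded vocoder checkpoint (g_02500000) is expected under the name model.pt.
    shutil.move(str(vocoder_download), str(vocoder_dir / "model.pt"))
    # The alphabet file is copied (not moved) and renamed to alphabet.txt.
    shutil.copyfile(app_dir / "alphabets" / "Estonian.txt", alphabet_dir / "alphabet.txt")
```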

The text-to-video model is automatically downloaded by the code.

For Topaz Video AI, if you have a fresh install, you may need to run the GUI first to download the required models. Simply load a video file and process it using the same models that the code uses:

  • Apollo v8 (apo-8) - frame interpolation
  • Theia Fine Tune Detail v3 (thd-3) - upscaling
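For reference, the Topaz step amounts to running Topaz's bundled ffmpeg (TVAI_FFMPEG) with its interpolation and upscaling filters. The command builder below is an assumption-laden sketch: the `tvai_fi` and `tvai_up` filter names and their `model`, `fps` and `scale` options are based on Topaz Video AI 3.x and should be checked against your installed version, and the 60 fps and 2x scale defaults are illustrative, not the values the code uses. `build_topaz_command` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def build_topaz_command(
    tvai_ffmpeg: Path, src: Path, dst: Path, fps: int = 60, scale: int = 2
) -> List[str]:
    """Assemble a Topaz ffmpeg call: apo-8 interpolation followed by thd-3 upscaling.

    ASSUMPTION: filter and option names follow Topaz Video AI 3.x's bundled ffmpeg.
    """
    # Interpolate first, then upscale the interpolated frames.
    filters = f"tvai_fi=model=apo-8:fps={fps},tvai_up=model=thd-3:scale={scale}"
    return [str(tvai_ffmpeg), "-y", "-i", str(src), "-vf", filters, str(dst)]
```

The process also needs TVAI_MODEL_DIR and TVAI_MODEL_DATA_DIR in its environment, so launch it from a shell or `subprocess.run` call where both are set.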

Usage

If everything has been installed correctly, you should be able to run the following command to generate a new episode:

poetry run python .\politsaikroonika\make_episode.py

The above command will generate a new episode using the default settings. You can customise the episode using various command line arguments. For example, to avoid the topics of animals, theft, stealing and robbery, and to include fireworks and "new year's celebration", you can run the following command:

poetry run python .\politsaikroonika\make_episode.py -v --interactive --avoid animals,theft,stealing,robbery --include fireworks --include "new year's celebration"

The -v flag is for verbose output, and the --interactive flag is for interactive mode, which will prompt you to confirm the generated text parts before proceeding, and lets you override them if you wish.
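The command-line interface described above maps naturally onto argparse. The parser below is a hypothetical reconstruction from the documented example, not the actual parser in make_episode.py: it assumes `--avoid` takes a single comma-separated list and `--include` may be repeated, which is how the example command uses them. Run the real script with `--help` for the authoritative options.

```python
import argparse
from typing import List


def comma_list(value: str) -> List[str]:
    """Split 'animals,theft' into ['animals', 'theft'], ignoring empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate an episode (illustrative sketch).")
    parser.add_argument("-v", dest="verbose", action="store_true", help="verbose output")
    parser.add_argument(
        "--interactive", action="store_true",
        help="confirm or override each generated text part",
    )
    parser.add_argument("--avoid", type=comma_list, default=[], help="comma-separated topics to avoid")
    # action="append" collects every occurrence of --include into one list.
    parser.add_argument("--include", action="append", default=[], help="topic to include; may be repeated")
    return parser
```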

More information on the available command line arguments can be found by running:

poetry run python .\politsaikroonika\make_episode.py --help

Training

If you are interested in training your own text-to-speech model, you can follow the instructions in the Voice Cloning App repository. For reference, the training data used for the Estonian model included over 1000 sentences with a total duration of around 1.5 hours. The training took approximately 2 days on a GTX 1070.

To gather the training data, I processed all publicly available clips of the original TV show and extracted their audio tracks. I then transcribed the audio using tekstiks.ee (with a fair amount of manual correction) and used split_audio.py along with various scripts in the scripts directory to split the audio into individual sentences. Background noise was removed using OpenVINO's noise-suppression-poconetlike-0001 model. Finally, the audio was upsampled using NU-Wave2.
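The sentence-splitting step can be illustrated with the standard-library wave module. This is not split_audio.py; it is a minimal sketch assuming uncompressed PCM WAV input and that sentence start and end times (in seconds) are already known, for example from transcript timestamps. `split_wav` and its output naming scheme are hypothetical.

```python
import wave
from pathlib import Path
from typing import List, Sequence, Tuple


def split_wav(src: Path, segments: Sequence[Tuple[float, float]], out_dir: Path) -> List[Path]:
    """Cut src into one WAV file per (start_s, end_s) segment, e.g. one per sentence."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        total = reader.getnframes()
        for index, (start, end) in enumerate(segments):
            # Convert seconds to frame positions, clamped to the file length.
            first = min(int(round(start * rate)), total)
            last = min(int(round(end * rate)), total)
            reader.setpos(first)
            frames = reader.readframes(max(last - first, 0))
            out_path = out_dir / f"{src.stem}_{index:04d}.wav"
            with wave.open(str(out_path), "wb") as writer:
                writer.setparams(params)  # the frame count in the header is patched on write
                writer.writeframes(frames)
            outputs.append(out_path)
    return outputs
```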
