Welcome to our repository. This project was completed for CS 685 (Spring 2023). Its goal is to generate clips of the Trash Taste Podcast. Each folder covers one stage of the data pipeline.
The repository contains several independent components. To avoid package conflicts, we recommend installing only the packages required by the component you are running.
- Install Python 3.10.x.
- Set up a virtual environment: `python -m venv .venv`
- Activate the virtual environment. The command is platform dependent, for example `source .venv/bin/activate` on Linux/macOS or `.venv\Scripts\activate` on Windows.
- Install PyTorch using the install command given on the PyTorch website; it is platform dependent.
- Install the other requirements listed in the README of the pipeline component you are running.
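
Before installing component-specific requirements, it can help to confirm that the interpreter and environment match the setup steps above. The following is a minimal sketch and not part of the repository: the function names `is_supported_python` and `in_virtualenv` are hypothetical, and the check uses only the standard library.

```python
import sys


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True for any Python 3.10.x interpreter."""
    return tuple(version_info[:2]) == (3, 10)


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: supported = {is_supported_python()}")
    print(f"Virtual environment active: {in_virtualenv()}")
```

Running this with the interpreter from `.venv` should report both checks as true; if either is false, revisit the corresponding setup step before installing PyTorch.
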
- Data download
  - Create the directory `raw_data/` and move into it.
  - Run `yt-dlp.exe --yes-playlist --write-sub --write-auto-sub --sub-lang "en.*" --sub-format json3 -f m4a https://www.youtube.com/playlist?list=PLUHmmIt9sU6i4JlDABqLeWybD_ZrJf9LB`
- Data preprocessing (look in `data_prep/`)
  - Speech diarization
  - Speaker naming tool
  - Dataset formulation
- Training the models and generating transcripts (look in `modeling/`)
  - Training and generation code for GPT2-Small, GPT2-Medium, Bloom 560M, and LLaMA 7B
- Voice cloning and audio performance generation (look in `voice_cloning/`)
  - Transcript parser
  - TTS with voice cloning using tortoise-tts
  - Sample text generations are in `test/`
- Evaluation
  - LLaMA 7B Clip
  - Bloom 560M Clip
  - GPT2-Medium Clip
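
The data download step can also be scripted, which is convenient on platforms where the executable is named `yt-dlp` rather than `yt-dlp.exe`. This is a sketch under assumptions, not code from the repository: the helper names `build_download_command` and `download_playlist` are hypothetical, the flags are copied from the command in the data download step, and `yt-dlp` is assumed to be on `PATH`.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

PLAYLIST_URL = "https://www.youtube.com/playlist?list=PLUHmmIt9sU6i4JlDABqLeWybD_ZrJf9LB"


def build_download_command(url: str = PLAYLIST_URL, executable: str = "yt-dlp") -> list[str]:
    """Build the argument list using the same flags as the README command."""
    return [
        executable,
        "--yes-playlist",
        "--write-sub",
        "--write-auto-sub",
        "--sub-lang", "en.*",
        "--sub-format", "json3",
        "-f", "m4a",
        url,
    ]


def download_playlist(target_dir: str = "raw_data") -> None:
    """Create target_dir if needed and run yt-dlp inside it (hypothetical helper)."""
    executable = shutil.which("yt-dlp") or shutil.which("yt-dlp.exe")
    if executable is None:
        raise FileNotFoundError("yt-dlp was not found on PATH")
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Passing a list (no shell) avoids quoting issues with '?' and '*' in arguments.
    subprocess.run(build_download_command(executable=executable), cwd=out_dir, check=True)
```

Calling `download_playlist()` from the repository root would create `raw_data/` if it is missing and download the audio and `json3` subtitle files into it.
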