Billion Songs, AI-powered song lyrics generator
NOTE: this repository uses git submodules, so clone it with --recurse-submodules. Learn about them here.
See the blog post Writing billion songs with C# and Deep Learning for a detailed explanation of how it works.
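As a minimal sketch of the note above: the two helpers below wrap the standard git commands for a fresh clone with submodules and for fetching submodules into an existing clone. The function names and the DRY_RUN switch are illustrative and not part of this project. The repository URL is passed as an argument because it is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Fresh clone that also fetches every submodule.
# $1 is the repository URL (supply the real one yourself).
clone_with_submodules() {
    run git clone --recurse-submodules "$1"
}

# For a clone made without --recurse-submodules:
# run from the repository root to fetch the missing submodules.
init_submodules() {
    run git submodule update --init --recursive
}
```

Source the file and call clone_with_submodules with the repository URL, or call init_submodules inside an existing clone whose External folder is empty.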
This project mainly serves as a demonstration of Gradient, our TensorFlow binding for C# and other .NET languages.
What is it, and how does it work?
This is a deep learning-powered song lyrics generator, based on GPT-2, wrapped as an ASP.NET Core website.
It generates songs word by word (or rather token by token), using the statistical relationships learned by a deep learning model called GPT-2. The actual generator code is in the GradientTextGenerator class.
Text generation is pretty slow even with a powerful GPU, so we have a bunch of caches in /Web to provide a better user experience. There is also PregeneratedSongProvider, which continuously creates new texts in the background so that clicking the "Make Random" button gives an instant result.
Detailed explanation in a blog post
- Download and install Python and TensorFlow as described in the Gradient documentation
- Install the Python package called regex (python -m pip install regex --user)
- Install the latest .NET Core SDK
- After cloning the repository, enter the Web folder and run dotnet ef database update. That should create a songs.db file in the same directory.
- Configure the database connection string (see appsettings.Development.json for an example):
"DefaultConnection": "Data Source=songs.db"
- Run dotnet run web. This should print some logs. Wait for Now listening on: http://, then open that URL in the browser. It will take up to 4 minutes to generate the first song.
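The command-line steps above can be sketched as a single script, assuming it is run from the repository root. The function name and the DRY_RUN switch are illustrative and not part of the project. Editing the connection string is left as a manual step. On .NET Core SDK 3.0 and later, dotnet ef is distributed as a separate tool, so it may need to be installed first with dotnet tool install --global dotnet-ef.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Runs the setup steps in order and stops at the first failure.
setup_and_run() {
    # Install the regex Python package for the current user.
    run python -m pip install regex --user &&
    # The website project lives in the Web folder.
    run cd Web &&
    # Apply the EF migrations; this should create songs.db in Web.
    run dotnet ef database update &&
    # Start the site, then wait for the "Now listening on:" log line.
    run dotnet run web
}
```

Running with DRY_RUN=1 prints the four commands without executing them, which is a cheap way to check the sequence before touching the machine.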
NOTE: training requires a lot of RAM (>16GB) and will be slow without a GPU.
- Download the original 117M GPT-2 model by running one of the download_model.* scripts in External/Gradient-Samples/GPT-2 from that same directory.
- Download any lyrics dataset (I used Every song you have heard (almost)!), and unpack it if needed.
- From the command line in the same directory (GPT-2), run dotnet run train --include "*.csv" --column Lyrics path/to/lyrics/folder --run Lyrics (change the column parameter to the name of the lyrics column in your dataset)
NOTE: the dev instance was trained with train -i "*.csv" --column=Lyrics Downloads\every-song-you-have-heard-almost -r Lyrics --checkpoint=fresh --save-every=100 -n 3. If training from an IDE, set the working directory to GPT-2 (which should contain the models subfolder downloaded previously).
- Interrupt the training process when the samples start looking good.
- Try the trained model by running dotnet run --run Lyrics
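The training and sampling commands above can be wrapped roughly as follows, assuming the current directory is External/Gradient-Samples/GPT-2 and the model has already been downloaded. The function names, argument order, and DRY_RUN switch are illustrative only. The *.csv pattern is quoted so that a Unix shell passes it through to the program instead of expanding it itself.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: folder with the unpacked dataset
# $2: name of the lyrics column (defaults to Lyrics)
# $3: run name (defaults to Lyrics)
train_lyrics() {
    dataset_dir="$1"
    column="${2:-Lyrics}"
    run_name="${3:-Lyrics}"
    run dotnet run train --include "*.csv" --column "$column" \
        "$dataset_dir" --run "$run_name"
}

# Generate samples from a previously trained run.
# $1: run name (defaults to Lyrics)
sample_lyrics() {
    run dotnet run --run "${1:-Lyrics}"
}
```

For example, train_lyrics path/to/lyrics/folder starts training; interrupt it with Ctrl+C once the samples look good, then call sample_lyrics.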