NOTE: this repository has git submodules, so clone with `--recurse-submodules`. Learn about them here.
See the blog post Writing billion songs with C# and Deep Learning for a detailed explanation of how it works.
This project mainly serves as a demonstration of Gradient, our TensorFlow binding for C# and other .NET languages.
This is a deep learning-powered song lyrics generator, based on GPT-2, wrapped as an ASP.NET Core website.
It generates songs word by word (or rather token by token), using the statistical relationships learned by a deep learning model called GPT-2. The actual generator code is in the `GradientTextGenerator` class.
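Conceptually, the generation loop looks something like the sketch below. This is not the actual `GradientTextGenerator` code: the model delegate, token IDs, and sampling routine are illustrative stand-ins for what GPT-2 (via Gradient) provides.

```csharp
// Illustrative sketch of autoregressive (token-by-token) text generation.
// The model scores every possible next token given the tokens produced so far;
// we sample one, append it, and repeat until an end-of-text token or a length limit.
using System;
using System.Collections.Generic;
using System.Linq;

static class TokenSamplingSketch {
    public static List<int> GenerateTokens(
        Func<IReadOnlyList<int>, float[]> nextTokenProbabilities, // stand-in for the GPT-2 model
        int endOfTextToken, int maxTokens, Random random) {
        var tokens = new List<int>();
        while (tokens.Count < maxTokens) {
            float[] distribution = nextTokenProbabilities(tokens);
            int next = Sample(distribution, random);
            if (next == endOfTextToken) break;
            tokens.Add(next);
        }
        return tokens; // a tokenizer turns these IDs back into text
    }

    // Roulette-wheel sampling from a (possibly unnormalized) distribution.
    static int Sample(float[] weights, Random random) {
        double roll = random.NextDouble() * weights.Sum();
        double cumulative = 0;
        for (int i = 0; i < weights.Length; i++) {
            cumulative += weights[i];
            if (roll <= cumulative) return i;
        }
        return weights.Length - 1;
    }
}
```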
Text generation is pretty slow even with a powerful GPU, so we have a bunch of caches in `/Web` to provide a better user experience. There is also `PregeneratedSongProvider`, which continuously creates new texts in the background to ensure that clicking the "Make Random" button gives an instant result.
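The pregeneration idea can be sketched as a background service that keeps a small buffer of finished songs topped up, so a web request only has to dequeue one. This is a minimal sketch using ASP.NET Core's `BackgroundService`; the `ITextGenerator` interface, the class name, and the buffer size are hypothetical stand-ins, not the project's actual `PregeneratedSongProvider` API.

```csharp
// Hypothetical sketch: keep a small buffer of ready-made songs so that
// "Make Random" can return instantly instead of waiting for GPT-2.
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public interface ITextGenerator {          // stand-in for GradientTextGenerator
    Task<string> GenerateAsync(CancellationToken cancel);
}

public class PregeneratedSongSketch : BackgroundService {
    readonly ITextGenerator generator;
    readonly ConcurrentQueue<string> ready = new ConcurrentQueue<string>();
    const int TargetBufferSize = 8;        // assumed value, not taken from the project

    public PregeneratedSongSketch(ITextGenerator generator) => this.generator = generator;

    // Continuously top up the buffer while the site is running.
    protected override async Task ExecuteAsync(CancellationToken stoppingToken) {
        while (!stoppingToken.IsCancellationRequested) {
            if (this.ready.Count < TargetBufferSize)
                this.ready.Enqueue(await this.generator.GenerateAsync(stoppingToken));
            else
                await Task.Delay(millisecondsDelay: 500, stoppingToken);
        }
    }

    // Called by the web layer when the user clicks "Make Random".
    public bool TryTakeSong(out string song) => this.ready.TryDequeue(out song);
}
```

The real implementation also has to deal with the caching and database persistence mentioned above, which this sketch omits.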
To run the website locally:
- Download and install Python and TensorFlow as described in the Gradient documentation.
- Install the `regex` Python package (`python -m pip install regex --user`).
- Install the latest .NET Core SDK.
- Clone the repository and enter the `Web` folder.
- .NET Core 3+ only: ensure you have the Entity Framework tool installed: `dotnet tool install --global dotnet-ef`.
- From the `Web` folder, run `dotnet ef database update`. That should create a `songs.db` file in the same directory.
- Edit `appsettings.json` (see `appsettings.Development.json` for an example; a sketch of the resulting file is shown after this list):
  - add `"DB": "sqlite"`
  - modify `DefaultConnection` to `"DefaultConnection": "Data Source=songs.db"`
  - ensure that `Generator` is not `dummy`, if you want lyrics to actually be generated
- Run `dotnet run web`. This should print some logs. Wait for `Now listening on: http://`, then open that URL in the browser. It will take up to 4 minutes to generate the first song.
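For reference, here is a minimal sketch of how the edited `appsettings.json` could look under the assumptions above. The nesting of `DefaultConnection` under `ConnectionStrings` follows the usual ASP.NET Core convention (it is not spelled out in the steps), and the `Generator` value is a placeholder, since the only documented requirement is that it is not `dummy`; check `appsettings.Development.json` for the authoritative key names. The comments are explanatory only.

```json
{
  // Assumed layout: DefaultConnection nested under the standard ConnectionStrings section.
  "DB": "sqlite",
  "ConnectionStrings": {
    "DefaultConnection": "Data Source=songs.db"
  },
  // Placeholder value: the only documented requirement is that it is not "dummy".
  "Generator": "<anything other than dummy>"
}
```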
NOTE: if you see "Can't choose between the following Python environments, as they are equally matching", set `PYTHON_CONDA_ENV_NAME` to the name of the Conda environment where you installed the TensorFlow and `regex` modules.
To (re)train the model on your own lyrics:

NOTE: training requires a lot of RAM (>16 GB) and will be slow without a GPU.
- Download the original 117M GPT-2 model by running one of the `download_model.*` scripts in External/Gradient-Samples/GPT-2 (run the script from that directory).
- Download any lyrics dataset (I used Every song you have heard (almost)!) and unpack it if needed.
- From the command line, in the same directory (GPT-2), run `dotnet run train --include *.csv --column Lyrics path/to/lyrics/folder --run Lyrics` (change the `column` parameter to the name of the lyrics column in your dataset).
NOTE: the dev instance was trained with `train -i "*.csv" --column=Lyrics Downloads\every-song-you-have-heard-almost -r Lyrics --checkpoint=fresh --save-every=100 -n 3`. If training from an IDE, set the working directory to GPT-2 (which should contain the `models` subfolder downloaded previously).
- Interrupt the training process when the samples start looking good.
- Try the trained model by running `dotnet run --run Lyrics`.