
CS230 Project: Literary Muzak

Prerequisites

Below we assume the working directory is the repository root.

Install dependencies

  • Create and activate a virtual environment

    conda create -n text_to_music python=3.6
    conda activate text_to_music
  • Install the dependencies with pip

    pip install -r requirements.txt

Train the network

The folder 'midi-lstm-gan' contains a conditional GAN with a bidirectional LSTM generator and discriminator.

cd midi-lstm-gan
python mlp_gan.py

After the script finishes, it generates 10 songs for each emotion (1 = Sad, 2 = Happy, 3 = Scary, 4 = Peaceful) in the "midi-lstm-gan/results" folder, and it also saves a plot of loss by epoch in the "midi-lstm-gan" folder.

The folder 'GAN' contains code for a plain LSTM model we tried; we did not use it in the end because it did not perform well.

Datasets

The initial dataset we used to jump-start the emotion-to-music generation model is the Music and Emotion Datasets collection from Harvard Dataverse. It contains 200 .wav clips described by a design matrix whose Melody column assigns each clip one of 4 emotion categories: 1 = Sad, 2 = Happy, 3 = Scary, 4 = Peaceful.
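
For reference, here is a minimal sketch of reading that design matrix and mapping the Melody codes to emotion names. The file name "design_matrix.csv" and the CSV format are assumptions, not the dataset's actual layout.

    # Sketch: map the design matrix's Melody codes (1-4) to emotion names.
    # "design_matrix.csv" is a hypothetical export of the design matrix.
    import pandas as pd

    EMOTIONS = {1: "Sad", 2: "Happy", 3: "Scary", 4: "Peaceful"}

    design = pd.read_csv("design_matrix.csv")
    design["Emotion"] = design["Melody"].map(EMOTIONS)
    print(design["Emotion"].value_counts())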

We prepared the datasets by converting all .wav files in a folder to .mid (MIDI) files using the following script. The conversion is based on the audio_to_midi_melodia tool.

# WAV to midi conversion
cd scripts
./wav_to_midi.sh
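
The shell script essentially runs audio_to_midi_melodia over every .wav file in a folder. A rough Python equivalent is sketched below; the input/output paths and the 60 BPM tempo are placeholder assumptions, and the exact audio_to_midi_melodia invocation should be checked against that project's README.

    # Sketch: batch-convert .wav files to .mid using audio_to_midi_melodia.
    import subprocess
    from pathlib import Path

    WAV_DIR = Path("data/wav")    # hypothetical input folder
    MIDI_DIR = Path("data/midi")  # hypothetical output folder
    MIDI_DIR.mkdir(parents=True, exist_ok=True)

    for wav in sorted(WAV_DIR.glob("*.wav")):
        midi = MIDI_DIR / (wav.stem + ".mid")
        # audio_to_midi_melodia.py expects: input file, output file, tempo (BPM)
        subprocess.run(["python", "audio_to_midi_melodia.py", str(wav), str(midi), "60"],
                       check=True)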

We also trained the model on other datasets without emotion labels:

  1. Pokemon music - 307 songs (we mark the songs from the Pokemon collection as "Happy")
  2. Pop music - 88 songs
  3. Final Fantasy music - 92 songs

Sample Songs

The generated songs sound fairly similar to each other at the moment; future work could collect more training data for each emotion. One advantage of outputting MIDI is that the notes can be rendered with any instrument to better fit the emotion (see the sketch below), so in the future we could model the instrument choice as well.

  • Bad example: song1. We first encoded the emotion as a large portion (50%) of the note sequence in the Train_X input; the song came out with many repetitive notes.
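
To illustrate the instrument flexibility mentioned above, here is a minimal sketch that writes a few generated note names to a .mid file with a chosen instrument. It assumes the music21 library and placeholder pitches; it is not the repo's own output code.

    # Sketch: render generated notes as MIDI with an instrument chosen to fit the emotion.
    from music21 import instrument, note, stream

    part = stream.Stream()
    part.append(instrument.Violin())        # swap the instrument to match the emotion
    for pitch in ["C4", "E4", "G4", "B4"]:  # placeholder generated notes
        part.append(note.Note(pitch, quarterLength=0.5))
    part.write("midi", fp="sample.mid")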

End-to-end workflow

[Figure: end-to-end workflow diagram]

Text to emotion

Our current text-to-emotion model is based on Multi-class Emotion Classification for Short Texts. The model uses "multi-channel" combinations of convolutional kernels (as in a CNN) and Long Short-Term Memory (LSTM) units to classify short text sequences (in our case, tweets) into one of five emotional classes, as opposed to the typical binary (positive/negative) or ternary (positive/negative/neutral) classes.
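
For intuition, here is a minimal Keras sketch of such a multi-channel architecture. The vocabulary size, sequence length, and layer widths are illustrative assumptions, not the cited model's exact configuration.

    # Sketch: parallel Conv1D "channels" plus a bidirectional LSTM branch,
    # merged into a 5-way softmax over emotion classes.
    from tensorflow.keras import layers, Model

    VOCAB_SIZE, SEQ_LEN, EMBED_DIM, NUM_CLASSES = 20000, 50, 128, 5

    tokens = layers.Input(shape=(SEQ_LEN,), dtype="int32")
    emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)

    # Convolutional channels capture n-gram patterns of different widths.
    conv_branches = []
    for kernel_size in (3, 4, 5):
        c = layers.Conv1D(64, kernel_size, activation="relu")(emb)
        conv_branches.append(layers.GlobalMaxPooling1D()(c))

    # Recurrent branch captures longer-range word order.
    rnn_branch = layers.Bidirectional(layers.LSTM(64))(emb)

    merged = layers.Dropout(0.5)(layers.concatenate(conv_branches + [rnn_branch]))
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(merged)

    classifier = Model(tokens, outputs)
    classifier.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])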

The model achieves a positive result, with more than 62% overall classification accuracy and precision. In particular, it achieves good validation accuracy on the happy, sad, hate, and anger (91% precision!) classes.

As future work, we will improve the model's performance in the context of music. We are also planning to train the model on music lyrics.

Emotion to music

[Figure: emotion-to-music model diagram]

We train a C-GAN (Conditional GAN) with two components; a minimal code sketch follows this list:

  • The Generator, given an emotion E and white noise WN as inputs, learns to generate emotion-styled music M by shaping the noise, so as to fool the Discriminator into thinking it is real music with the emotion it claims to have.
  • The Discriminator, given music M and emotion label E as inputs, learns to tell two things:
    • Whether the music is real or fake.
    • Whether the emotion label E matches the emotion expressed by the music M.
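
Here is a minimal Keras sketch of this setup. The sequence length, noise dimension, and layer sizes are illustrative assumptions, not the exact mlp_gan.py architecture; the single conditional real/fake score folds both checks together, since fake music and music that does not match E should both score low.

    # Sketch: conditional GAN with bidirectional LSTM generator and discriminator.
    from tensorflow.keras import layers, Model

    SEQ_LEN, NOISE_DIM, NUM_EMOTIONS = 100, 100, 4

    def build_generator():
        noise = layers.Input(shape=(NOISE_DIM,))
        emotion = layers.Input(shape=(1,), dtype="int32")
        # Embed the emotion label (codes 1-4) and mix it into the noise vector.
        e = layers.Flatten()(layers.Embedding(NUM_EMOTIONS + 1, NOISE_DIM)(emotion))
        x = layers.multiply([noise, e])
        x = layers.Dense(SEQ_LEN, activation="relu")(x)
        x = layers.Reshape((SEQ_LEN, 1))(x)
        # Bidirectional LSTM shapes the note sequence; tanh output is rescaled to notes later.
        x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
        notes = layers.TimeDistributed(layers.Dense(1, activation="tanh"))(x)
        return Model([noise, emotion], notes)

    def build_discriminator():
        seq = layers.Input(shape=(SEQ_LEN, 1))
        emotion = layers.Input(shape=(1,), dtype="int32")
        # Condition on the label by concatenating an embedded copy at every timestep.
        e = layers.Reshape((SEQ_LEN, 1))(layers.Embedding(NUM_EMOTIONS + 1, SEQ_LEN)(emotion))
        x = layers.concatenate([seq, e], axis=-1)
        x = layers.Bidirectional(layers.LSTM(256))(x)
        validity = layers.Dense(1, activation="sigmoid")(x)  # real-and-matching vs. not
        return Model([seq, emotion], validity)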

Loss vs Epochs

We trained the C-GAN on an AWS EC2 GPU instance for 1000 epochs, which took about 4 hours. The discriminator and generator losses both roughly converge to 0.75.

[Figure: generator/discriminator loss vs. epochs]

Here is another loss curve from a run that did not use a bidirectional LSTM for the generator. We can see that with the model improvement, the generator/discriminator losses converged faster.

[Figure: loss vs. epochs without a bidirectional LSTM generator]

Discriminator Architecture

[Figure: discriminator architecture]

Generator Architecture

[Figure: generator architecture]
