# Markov Music

Robbie Dozier
Abhi Devathi

In [1]:
# Import all necessary packages
import numpy as np
import matplotlib
import pandas as pd
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import animation, rc
from IPython.display import HTML
import pgmpy as pgm
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding, TSNE
import useful
from pygame import mixer
import os
os.chdir('..')
import midivectors

pygame 1.9.5
Hello from the pygame community. https://www.pygame.org/contribute.html


In [2]:
%matplotlib notebook
matplotlib.rcParams['figure.figsize'] = [16, 9]
matplotlib.rcParams['figure.dpi'] = 50

## Summary

In this project, we try to learn the structure of Bach chorales using probabilistic graphical models, and then use that structure to generate music.

## Dataset
The dataset we are using is found at [this link](https://archive.ics.uci.edu/ml/datasets/Bach+Choral+Harmony). It is a .csv file of Bach chorales with harmonies. Each row represents one chord (an event) in a specific chorale. The dataset contains all the chords from 60 chorales. The features are:

    1. Choral ID: corresponding to the file names from [Bach Central](http://www.bachcentral.com)
    2. Event Number
    3-14. Pitch Classes, as a binary variable. 3 is C (YES/NO), 4 is C# (YES/NO),...14 is B (YES/NO)
    15. Bass Note. A character that represents which pitch is the bass note of the chord.
    16. Meter: An integer (1-5) that indicates how metrically accented the event is. Lower numbers denote less accented events, higher numbers denote more accented events.
    17. Chord Label: denotes the name of the chord that is played in the event
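
As a rough illustration of the layout above, the sketch below reads rows in this format with pandas and converts the YES/NO pitch-class flags to 0/1. The column names are hypothetical, the file is assumed to have no header row, and the two rows are made up for the example rather than taken from the dataset.

```python
import io

import pandas as pd

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Hypothetical column names; the raw file is assumed to have no header row.
COLUMNS = ["chorale_id", "event"] + PITCH_CLASSES + ["bass", "meter", "chord"]

# Two made-up rows in the assumed layout, standing in for the real file.
sample = io.StringIO(
    "demo_chorale,1,YES,NO,NO,NO,YES,NO,NO,YES,NO,NO,NO,NO,C,3,C_M\n"
    "demo_chorale,2,NO,NO,YES,NO,NO,YES,NO,NO,NO,YES,NO,NO,D,2,D_m\n"
)

df = pd.read_csv(sample, header=None, names=COLUMNS, skipinitialspace=True)

# Convert the YES/NO pitch-class flags (features 3-14) to 0/1 integers.
pitches = (df[PITCH_CLASSES].apply(lambda col: col.str.strip()) == "YES").astype(int)
print(pitches.to_numpy())
```

To read the real file, the `io.StringIO` object would be replaced by the path to the downloaded .csv.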

### Pre-Processing
First, we parse the data into a form that is friendly for machine learning. The resulting dataset has the same number of events, but each vector now has 384 dimensions (12 pitch classes × 32 subdivisions, as described below).

We are only looking at the harmonic content of the data, so everything except the name of the chorale and features 3-14 is thrown out. We can then arrange each event into an $m$-dimensional vector, where $m$ is the number of notes (12 in this case). For example, a C major seventh chord (C, E, G, B) becomes:
$$
\begin{bmatrix}1\\0\\0\\0\\1\\0\\0\\1\\0\\0\\0\\1\end{bmatrix}
$$
<img src="cmaj7.png" alt="Drawing" style="width: 200px;"/>
Next, we take the vectors and arrange them into $m \times n$ matrices, where $n$ is the number of subdivisions (timesteps; in this case, 16th notes).
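
The following is a minimal sketch of this encoding, not the project's actual library code. The helper names `chord_vector` and `chords_to_matrix` are hypothetical, and the chord durations (in 16th-note subdivisions) are passed in by hand, since how they are derived from the dataset is not specified here.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_vector(notes):
    """Return a 12-dimensional binary vector with a 1 for each pitch class present."""
    vec = np.zeros(len(PITCH_CLASSES), dtype=int)
    for note in notes:
        vec[PITCH_CLASSES.index(note)] = 1
    return vec


def chords_to_matrix(chords, durations):
    """Stack chords as columns, repeating each for its duration in subdivisions."""
    columns = [
        np.tile(chord_vector(chord)[:, None], (1, duration))
        for chord, duration in zip(chords, durations)
    ]
    return np.hstack(columns)


cmaj7 = chord_vector(["C", "E", "G", "B"])
print(cmaj7)

# Two chords, each held for a quarter note (4 sixteenth-note subdivisions).
matrix = chords_to_matrix([["C", "E", "G", "B"], ["F", "A", "C"]], [4, 4])
print(matrix.shape)
```

Each row of `matrix` is a pitch class and each column is one 16th-note timestep, matching the $m \times n$ layout described above.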

Robbie wrote a simple library for doing this processing, and converting the data to MIDI and back. Next, we split the chorales into 32-subdivision (2 measure) segments, then make versions for all 12 keys. We're left with a

The pre-processing functions are found in the `midivectors` module, and the parsing is done in the file `chorales_parse.py`.

The final dataset is saved in the file `data/chorales_vectors_12_32.npy`.
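
The segmentation and transposition steps can be sketched as follows. This is an illustration under assumptions, not the code in `chorales_parse.py`: the function names are hypothetical, a trailing partial segment is simply dropped, and transposition is modeled as a circular shift along the pitch-class axis. The array layout produced here may differ from the one in the saved file.

```python
import numpy as np


def split_segments(matrix, length=32):
    """Split an (m, n) matrix into (n_segments, m, length), dropping any remainder."""
    m, n = matrix.shape
    n_segments = n // length
    trimmed = matrix[:, : n_segments * length]
    # (m, n_segments * length) -> (m, n_segments, length) -> (n_segments, m, length)
    return trimmed.reshape(m, n_segments, length).transpose(1, 0, 2)


def transpose_all_keys(segment):
    """Return the segment in all 12 keys by rotating the pitch-class axis."""
    return np.stack([np.roll(segment, shift, axis=0) for shift in range(12)])


rng = np.random.default_rng(0)
# Synthetic stand-in for one chorale: 12 pitch classes by 70 subdivisions.
chorale = (rng.random((12, 70)) < 0.3).astype(int)

segments = split_segments(chorale)            # two full 32-step segments
all_keys = transpose_all_keys(segments[0])    # 12 transpositions of the first one
print(segments.shape, all_keys.shape)
```

Because pitch classes wrap around the octave, shifting the rows by one position moves every note up a semitone, so the 12 shifts cover every key.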

## Dimensionality Reduction

In this section, we employ some dimensionality reduction techniques to visualize the structure of the data.

### Principal Component Analysis

The first technique we employ is PCA because it is simple and provides a good baseline for other dimensionality reduction techniques. Further, because it is a linear projection, it allows us to apply an inverse transform from the reduced space back to the original space. If we suspect that the high-dimensional data lie on a non-linear manifold, we will also employ some non-linear methods.
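
To make the inverse-transform point concrete, here is a small sketch using scikit-learn's `PCA` on synthetic binary vectors. The data, the 20 components, and the variable names are placeholders chosen for the example, not values from this project.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in: 100 binary vectors of dimension 384 (12 x 32).
X = (rng.random((100, 384)) < 0.25).astype(float)

pca = PCA(n_components=20)
Z = pca.fit_transform(X)          # project into the 20-dimensional reduced space
X_hat = pca.inverse_transform(Z)  # map back to the original 384 dimensions

# With the default settings, the inverse is the linear map Z @ components_ + mean_.
assert np.allclose(X_hat, Z @ pca.components_ + pca.mean_)

reconstruction_error = np.mean((X - X_hat) ** 2)
print(Z.shape, X_hat.shape, reconstruction_error)
```

The reconstruction is lossy whenever fewer components are kept than the data's rank, but any point in the reduced space can still be mapped back to a 384-dimensional vector, which is what makes PCA convenient for generation.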

First, we load the data.

In [3]:
data = np.load('data/chorales_matrices_12_32.npy')
data.shape

(12, 32, 1764)

We keep the first 200 principal components using PCA.

In [None]:
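# PCA expects a 2D (samples, features) array, so flatten each 12 x 32 segment:
# (12, 32, 1764) -> (1764, 384)
X = data.reshape(-1, data.shape[-1]).T
pca = PCA(n_components=200)
data_tr = pca.fit_transform(X)
data_tr.shape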
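
One common way to choose the number of components, shown here only as a sketch on synthetic data, is to fit a full PCA and keep the smallest number of components whose cumulative explained variance reaches a target. The 90% threshold and the random data are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for the flattened dataset: 300 samples, 384 features.
X = (rng.random((300, 384)) < 0.25).astype(float)

pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance is >= 90%.
n_components = int(np.searchsorted(cumulative, 0.90) + 1)
print(n_components, cumulative[n_components - 1])
```

scikit-learn can also perform this selection directly when `n_components` is given as a fraction, for example `PCA(n_components=0.90, svd_solver="full")`.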

Here, we plot the first three principal components in a 3D scatter plot.

In [None]:
import useful

useful.scatter_3d(
    data_tr[:, :3].T, 
    rcParams=matplotlib.rcParams)

As you can see, there is some clear structure