# Part 1. Music Data Processing

In [2]:
# Import the necessary packages
import music21
from music21 import *
import shutil
import pandas as pd
import numpy as np
import sys, re, itertools, random, os
from collections import Counter, defaultdict
from sklearn.cluster import KMeans
from functions import *

## ATTENTION

***1. All the functions used in this notebook are defined in functions.py. A separate notebook, functions.ipynb, shows each function in detail with specific examples.***

***2. This notebook processes many files, and printing everything would slow it down, so we print only a few lines of each file as an illustration.***

***3. If you want to run this part yourself, first delete the previously generated files (such as the .txt, .chord and .note files); otherwise the new information will be appended to the existing files repeatedly.***
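The cleanup described in point 3 could be scripted along these lines. This is only a sketch: `remove_generated_files` is a hypothetical helper (it is not part of functions.py), and the patterns in the commented usage are assumptions based on the file names shown in this notebook.

```python
import glob
import os


def remove_generated_files(folder, patterns):
    """Delete files in `folder` (non-recursive) that match any glob pattern.

    Hypothetical helper, not part of functions.py. Returns the removed paths.
    """
    removed = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(folder, pattern)):
            os.remove(path)
            removed.append(path)
    return sorted(removed)


# Possible usage before re-running the notebook (assumed patterns):
# remove_generated_files("jazz-music-example", ["*_chord.txt"])
# remove_generated_files("music-data", ["*.chord", "*.note"])
```

The search is deliberately non-recursive, so the input .txt files inside music-data/train, music-data/dev and music-data/test are left alone.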

## 1.1 Transfer MIDI Music to Txt File

A MIDI song is usually composed of multiple instrument tracks. One track carries the main melody throughout the song, and the other tracks serve as background. Based on this, we separate the input song into the main melody and the background music (the style), and focus on changing the background music to accomplish style transformation. The background music, however, should follow a sequence of chords so that it stays in harmony with the melody. Our project focuses on songs that have a piano track in their background music. This section reads the input MIDI songs, extracts the piano track information with [music21](http://web.mit.edu/music21/), and saves it to a txt file.

Jazz has some characteristic chords that make the style easy to recognize. We therefore want to extract the common jazz chords from the input jazz songs and build a vocabulary of the jazz style.

As an example, we use 60 jazz songs (stored in the jazz-music-example folder). The code below converts these songs to txt format with their track information.
Our original songs look like this:

![](img/jazz-music-mid.png)

In [3]:
print ("The first 10 songs in jazz example: ")
os.listdir("jazz-music-example")[0:10]

The first 10 songs in jazz example:


['Four on Six - Wes Montgomery_chord.txt',
 'Body and Soul - 2 - Parker_chord.txt',
 'Corcovado - Jobim_chord.txt',
 'C jam blues - Duke Ellington_chord.txt',
 'Falling in Love - 2 - Rodgers & Hart.mid',
 'Grooving - Buddy Rich.mid',
 'Georgia on my Mind - 2 - Hoagi Charmichael_chord.txt',
 'Boplicity - Parker.mid',
 'C jam blues - Duke Ellington.mid',
 'Blue Trane - Coltrane.mid']

In [3]:
file_path = "jazz-music-example"
for file in os.listdir(file_path):
    if file.endswith(".mid"):
        path = os.path.join(file_path, file)
        TransferMidToChordTxt(path)

**TransferMidToChordTxt** is a function defined in functions.py. To save space, we do not introduce the functions in this notebook; all of them are described in functions.ipynb with specific examples.
After this step, every song has its chord information saved as a .txt file in the jazz-music-example folder.
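Because re-running the conversion writes into existing files again, it can help to convert only the songs that have no output yet. The sketch below assumes the naming convention visible in the folder listing above (`<name>.mid` produces `<name>_chord.txt`); `pending_midi_files` is a hypothetical helper, not part of functions.py.

```python
import os


def pending_midi_files(folder):
    """Return paths of .mid files in `folder` without a matching `<name>_chord.txt`.

    Assumes the output naming seen in the listing above; hypothetical helper.
    """
    names = set(os.listdir(folder))
    pending = []
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        if ext == ".mid" and stem + "_chord.txt" not in names:
            pending.append(os.path.join(folder, name))
    return pending


# Possible usage with the function from functions.py:
# for path in pending_midi_files("jazz-music-example"):
#     TransferMidToChordTxt(path)
```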

![](img/jazz-music-txt.png)

## 1.2 Divide Data

We collected 800 jazz songs in MIDI format and converted them to .txt as in Part 1.1, then divided these txt files into three parts: train, dev and test.

In [3]:
print ("The first 10 songs in train: ")
os.listdir("music-data/train")[0:10]

The first 10 songs in train: 


['Gaviota trio 2_chord.txt',
 'sammy_walked_in-Michel-Camilo-pt_dm_chord.txt',
 'embraceable_you_jhall_chord.txt',
 'suddenly_ocean_jlh_chord.txt',
 'masquerade_dz_chord.txt',
 'i_have_but_one_heart_rs_chord.txt',
 'TakinACh_chord.txt',
 'clean_sweep_rmb_chord.txt',
 'Corcovado - Jobim_chord.txt',
 'honky_tonk_train_blues_eh2_chord.txt']

In [4]:
print ("The first 10 songs in dev: ")
os.listdir("music-data/dev")[0:10]

The first 10 songs in dev: 


['well_meet_again_rs_chord.txt',
 'you_were_never_lovelier-1939-kar_jpp_chord.txt',
 'whydidi_chord.txt',
 'when_joanna_loved_me-tony-bennett-kar_rt_chord.txt',
 'weather_channel_ce_chord.txt',
 'washington_square_bw2_chord.txt',
 'When I Fall in Love_chord.txt',
 'youll_never_know_rs_chord.txt',
 'WhyDoILoveYou_chord.txt',
 'you_belong_to_me-dk3074_dk_chord.txt']

In [5]:
print ("The first 10 songs in test: ")
os.listdir("music-data/test")[0:10]

The first 10 songs in test: 


['big_dipper_gw_chord.txt',
 'Body and Soul - 2 - Parker_chord.txt',
 'between_the_sheets_mellod_chord.txt',
 'accentuate_the_positive_jh_chord.txt',
 'a_taste_of_honey_dc_chord.txt',
 'all_of_me-1931-vs2-kar_jpp_chord.txt',
 'bags_groove_jh_chord.txt',
 'AllTheThings Reharmonized_chord.txt',
 'blues_jc3_chord.txt',
 'bach_variations_sn_chord.txt']

In [6]:
print ("Total train:",len(os.listdir("music-data/train")))
print ("Total dev:",len(os.listdir("music-data/dev")))
print ("Total test:",len(os.listdir("music-data/test")))

Total train: 600
Total dev: 50
Total test: 150
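The notebook does not state how the 800 files were assigned to the three folders. As one possible approach, the sketch below shuffles the files with a fixed seed and copies them into train, dev and test folders of the sizes shown above. The function name `split_dataset`, its parameters and the random assignment are all assumptions for illustration.

```python
import os
import random
import shutil


def split_dataset(src_dir, out_dir, n_dev=50, n_test=150, seed=0):
    """Copy `*_chord.txt` files from src_dir into out_dir/{train,dev,test}.

    Hypothetical helper: a seeded random split, which may differ from the
    method actually used for this project. Returns the file count per split.
    """
    files = sorted(f for f in os.listdir(src_dir) if f.endswith("_chord.txt"))
    random.Random(seed).shuffle(files)
    splits = {
        "dev": files[:n_dev],
        "test": files[n_dev:n_dev + n_test],
        "train": files[n_dev + n_test:],
    }
    for split, names in splits.items():
        target = os.path.join(out_dir, split)
        os.makedirs(target, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(src_dir, name), os.path.join(target, name))
    return {split: len(names) for split, names in splits.items()}
```

With 800 input files and the default sizes, this would leave 600 files for train, matching the counts printed above.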


## 1.3 Extract Note and Chord Information in Each Folder

In this part, we separate the track information into note information and chord information using the **ExtractNoteAndChord** function. Taking the train data as an example, we first read the train folder with its 600 .txt files, then extract the note and chord information of each file, and finally combine the notes and chords into the train.note and train.chord files.

In [7]:
split_length = 20
input_path = "music-data/train"
out_note_path = "music-data/train.note"
out_chord_path = "music-data/train.chord"

for file in os.listdir(input_path):
    if file.endswith(".txt"):
        path = os.path.join(input_path, file)
        ExtractNoteAndChord(path, split_length,out_note_path,out_chord_path)
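The exact role of `split_length` is defined in functions.py and is not shown here. A plausible reading, consistent with the 20 tokens per line in the printed .note output, is that each song's token sequence is cut into lines of at most `split_length` tokens. The sketch below illustrates that assumed behaviour only; `split_into_lines` is a hypothetical name.

```python
def split_into_lines(tokens, split_length):
    """Join tokens into space-separated lines of at most `split_length` tokens.

    Illustrates an assumed behaviour of `split_length`; the real logic lives
    in ExtractNoteAndChord in functions.py and may differ.
    """
    return [
        " ".join(tokens[i:i + split_length])
        for i in range(0, len(tokens), split_length)
    ]
```

Shorter lines keep each training example small, which is usually easier for a sequence-to-sequence model than one very long line per song.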

In [8]:
file_path = "music-data/train.note"
num = 5
print ("The first",num,"lines in train.note:")
PrintNumLines(file_path,num)

file_path = "music-data/train.chord"
num = 5
print ("The first",num,"lines in train.chord:")
PrintNumLines(file_path,num)

The first 5 lines in train.note:
{F-sharp_in_octave_2_|_C_in_octave_2} {C-sharp_in_octave_2_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {D_in_octave_3_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {C_in_octave_2_|_F-sharp_in_octave_2} {C-sharp_in_octave_2_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {D_in_octave_3_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {C_in_octave_2_|_F-sharp_in_octave_2} {C-sharp_in_octave_2_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {D_in_octave_3_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {C_in_octave_2_|_F-sharp_in_octave_2} {C-sharp_in_octave_2_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {D_in_octave_3_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2}
{C_in_octave_2_|_F-sharp_in_octave_2} {C-sharp_in_octave_2_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {D_in_octave_3_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_
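Each token in the .note file groups the pitches sounding together, written as `{<pitch>_in_octave_<n>_|_...}`. To inspect such tokens, a small parser can be sketched as follows; `parse_note_token` is a hypothetical helper based only on the format visible in the output above.

```python
def parse_note_token(token):
    """Parse one note token into a list of (pitch name, octave) pairs.

    Example input: '{F-sharp_in_octave_2_|_C_in_octave_2}'.
    Hypothetical helper; raises ValueError on truncated or malformed tokens.
    """
    inner = token.strip().strip("{}")
    pitches = []
    for part in inner.split("_|_"):
        name, octave = part.split("_in_octave_")
        pitches.append((name, int(octave)))
    return pitches
```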

In [9]:
input_path = "music-data/dev"
out_note_path = "music-data/dev.note"
out_chord_path = "music-data/dev.chord"

for file in os.listdir(input_path):
    if file.endswith(".txt"):
        path = os.path.join(input_path, file)
        ExtractNoteAndChord(path, split_length,out_note_path,out_chord_path)

In [10]:
file_path = "music-data/dev.note"
num = 5
print ("The first",num,"lines in dev.note:")
PrintNumLines(file_path,num)

file_path = "music-data/dev.chord"
num = 5
print ("The first",num,"lines in dev.chord:")
PrintNumLines(file_path,num)

The first 5 lines in dev.note:
{F_in_octave_3_|_G_in_octave_3_|_D_in_octave_3} {F_in_octave_3_|_A_in_octave_3} {C_in_octave_4_|_E_in_octave_3} {B_in_octave_1_|_F-sharp_in_octave_2_|_C-sharp_in_octave_3} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2}
{B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octav

In [11]:
input_path = "music-data/test"
out_note_path = "music-data/test.note"
out_chord_path = "music-data/test.chord"

for file in os.listdir(input_path):
    if file.endswith(".txt"):
        path = os.path.join(input_path, file)
        ExtractNoteAndChord(path, split_length,out_note_path,out_chord_path)

In [12]:
file_path = "music-data/test.note"
num = 5
print ("The first",num,"lines in test.note:")
PrintNumLines(file_path,num)

file_path = "music-data/test.chord"
num = 5
print ("The first",num,"lines in test.chord:")
PrintNumLines(file_path,num)

The first 5 lines in test.note:
{E_in_octave_4_|_B_in_octave_3_|_A_in_octave_3_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {E_in_octave_4_|_B_in_octave_3_|_A_in_octave_3_|_F_in_octave_3} {E_in_octave_4_|_B_in_octave_3_|_A_in_octave_3_|_F_in_octave_3} {E_in_octave_4_|_B_in_octave_3_|_A_in_octave_3_|_F_in_octave_3} {E-flat_in_octave_4_|_B_in_octave_3_|_G-sharp_in_octave_3_|_F_in_octave_3} {E-flat_in_octave_4_|_B_in_octave_3_|_G-sharp_in_octave_3_|_F_in_octave_3} {E-flat_in_octave_4_|_B_in_octave_3_|_G-sharp_in_octave_3_|_F_in_octave_3} {A_in_octave_3_|_E_in_octave_3_|_D_in_o

In [13]:
# Count the lines in each file, closing each file after reading
for split in ("train", "dev", "test"):
    for kind in ("chord", "note"):
        with open("music-data/" + split + "." + kind, 'r') as f:
            print ("Total", split, kind + ":", len(f.readlines()))

Total train chord: 12103
Total train note: 12103
Total dev chord: 813
Total dev note: 813
Total test chord: 2581
Total test note: 2581
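The chord and note files act as parallel source and target files, so each pair should have the same number of lines, as the counts above show. A check like the following could catch a mismatch early. It is a sketch; `count_lines` and `check_parallel` are hypothetical helpers.

```python
def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path, "r") as f:
        return sum(1 for _ in f)


def check_parallel(prefix, src="chord", tgt="note"):
    """Raise ValueError if prefix.src and prefix.tgt differ in line count.

    Hypothetical helper; returns the shared line count when the files agree.
    """
    n_src = count_lines(prefix + "." + src)
    n_tgt = count_lines(prefix + "." + tgt)
    if n_src != n_tgt:
        raise ValueError(
            "%s: %d %s lines but %d %s lines" % (prefix, n_src, src, n_tgt, tgt)
        )
    return n_src


# Possible usage:
# for split in ("train", "dev", "test"):
#     check_parallel("music-data/" + split)
```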


## 1.4 Build the Vocabulary of Jazz Songs

In this part, we collect all the distinct chords and notes that appear in the input jazz songs and save them as two vocabularies.

### 1.4.1 Generate The Chord Vocabulary

In [14]:
vocab = set()
with open("music-data/train.chord", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for chord in lines.split(" "):
        if chord:
            vocab.add(chord)

In [15]:
with open("music-data/dev.chord", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for chord in lines.split(" "):
        if chord:
            vocab.add(chord)

In [16]:
with open("music-data/test.chord", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for chord in lines.split(" "):
        if chord:
            vocab.add(chord)

In [17]:
with open("music-data/vocab.chord", 'w') as f:
    for chord in vocab:
        if chord:
            f.write(chord+'\n')
file_path = "music-data/vocab.chord"
num = 10
print ("The first",num,"chord vocabulary of 800 Jazz songs:")
PrintNumLines(file_path,num)

The first 10 chord vocabulary of 800 Jazz songs:
E4
B4
C4
G7
C#7
A1
C#5
F9
G4
A5


In [18]:
RemoveDuplicateVocab("music-data/vocab.chord")

Remove 1 duplicate words!
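One possible source of stray entries in the vocabulary: `"a b\n".split(" ")` returns `['a', 'b\n']`, so the last token of each line keeps its newline character. Calling `split()` with no argument splits on any whitespace and drops it. The sketch below builds the vocabulary that way; `build_vocab` and `write_vocab` are hypothetical helpers, and whether this accounts for the duplicate reported above is an assumption.

```python
def build_vocab(paths):
    """Collect the set of whitespace-separated tokens from several files."""
    vocab = set()
    for path in paths:
        with open(path, "r") as f:
            for line in f:
                # split() with no argument also strips the trailing newline
                vocab.update(line.split())
    return vocab


def write_vocab(vocab, out_path):
    """Write one token per line, sorted so the output is reproducible."""
    with open(out_path, "w") as f:
        for token in sorted(vocab):
            f.write(token + "\n")


# Possible usage:
# chords = build_vocab("music-data/%s.chord" % s for s in ("train", "dev", "test"))
# write_vocab(chords, "music-data/vocab.chord")
```

Sorting is a design choice made here for stable output; it is not something the original cells do.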


### 1.4.2 Generate The Note Vocabulary

In [19]:
vocab = set()
with open("music-data/train.note", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for note in lines.split(" "):
        if note:
            vocab.add(note)

In [20]:
with open("music-data/dev.note", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for note in lines.split(" "):
        if note:
            vocab.add(note)

In [21]:
with open("music-data/test.note", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for note in lines.split(" "):
        if note:
            vocab.add(note)

In [22]:
with open("music-data/vocab.note", 'w') as f:
    for note in vocab:
        if note:
            f.write(note+'\n')
file_path = "music-data/vocab.note"
num = 10
print ("The first",num,"note vocabulary of 800 Jazz songs:")
PrintNumLines(file_path,num)

The first 10 note vocabulary of 800 Jazz songs:
{G-sharp_in_octave_3_|_D_in_octave_5_|_D_in_octave_4_|_B_in_octave_4}
{F_in_octave_3_|_F-sharp_in_octave_6}
{B-flat_in_octave_5_|_E-flat_in_octave_4_|_G-sharp_in_octave_3}
{D_in_octave_2_|_D_in_octave_4_|_B-flat_in_octave_4_|_F-sharp_in_octave_2_|_B_in_octave_1}
{F-sharp_in_octave_4_|_E-flat_in_octave_5_|_C_in_octave_5_|_A_in_octave_4_|_C_in_octave_6}
{D_in_octave_4_|_E-flat_in_octave_4_|_G_in_octave_4}
{G_in_octave_5_|_A_in_octave_4_|_G_in_octave_2}
{A_in_octave_4_|_C_in_octave_5_|_E_in_octave_5_|_F_in_octave_4_|_C_in_octave_4_|_A_in_octave_3_|_E_in_octave_4_|_F_in_octave_3}
{A_in_octave_4_|_C-sharp_in_octave_4_|_C_in_octave_5}
{G_in_octave_3_|_G-sharp_in_octave_6}


In [23]:
RemoveDuplicateVocab("music-data/vocab.note")

Remove 1 duplicate words!


## 1.5 Delete Empty Lines in Each File

Since we will use TensorFlow Neural Machine Translation (NMT) as our model, the input files should contain no empty lines so that the model can read them. We therefore use the **RemoveEmptyLine** function to delete all the empty lines in each file.
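For readers without functions.py at hand, a stand-in for this step might look like the sketch below. `remove_empty_lines` is a hypothetical name, and treating whitespace-only lines as empty is an assumption; the real **RemoveEmptyLine** may behave differently.

```python
def remove_empty_lines(path):
    """Rewrite the file at `path` without empty lines and return how many were removed.

    Hypothetical stand-in for RemoveEmptyLine; whitespace-only lines count as empty.
    """
    with open(path, "r") as f:
        lines = f.readlines()
    kept = [line for line in lines if line.strip()]
    with open(path, "w") as f:
        f.writelines(kept)
    removed = len(lines) - len(kept)
    print("Remove", removed, "empty lines!")
    return removed
```

Because the chord and note files are cleaned independently, each pair stays aligned only if the empty lines sit at the same positions in both files. Equal removal counts for a pair are consistent with that, but do not prove it.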

In [24]:
file_path = "music-data/train.chord"
RemoveEmptyLine(file_path)

Remove 72 empty lines!


In [25]:
file_path = "music-data/train.note"
RemoveEmptyLine(file_path)

Remove 72 empty lines!


In [26]:
file_path = "music-data/dev.chord"
RemoveEmptyLine(file_path)

Remove 5 empty lines!


In [27]:
file_path = "music-data/dev.note"
RemoveEmptyLine(file_path)

Remove 5 empty lines!


In [28]:
file_path = "music-data/test.chord"
RemoveEmptyLine(file_path)

Remove 22 empty lines!


In [29]:
file_path = "music-data/test.note"
RemoveEmptyLine(file_path)

Remove 22 empty lines!


In [30]:
file_path = "music-data/vocab.chord"
RemoveEmptyLine(file_path)

Remove 1 empty lines!


In [31]:
file_path = "music-data/vocab.note"
RemoveEmptyLine(file_path)

Remove 1 empty lines!


We have now finished the music data processing part and obtained 8 files:

![](img/preprocessing.png)

## 2.1 Prepare Work

1. First, download the source code from GitHub and put the files in our project folder.

2. Second, create a new folder named "tmp" in the nmp-model folder.

3. Next, create a new folder named "train-data" and copy the 8 files generated in Part 1.5 into it.
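The folder creation and file copy can also be done from Python. In this sketch the destination `nmt/tmp/train-data` is an assumption taken from the paths in the training command of Part 2.2, the source folder `music-data` is the one used in Part 1, and `prepare_train_data` is a hypothetical helper.

```python
import os
import shutil

# The 8 files produced in Part 1: {train, dev, test, vocab} x {chord, note}
DATA_FILES = [
    split + "." + ext
    for split in ("train", "dev", "test", "vocab")
    for ext in ("chord", "note")
]


def prepare_train_data(src_dir="music-data",
                       dst_dir=os.path.join("nmt", "tmp", "train-data")):
    """Create dst_dir (and parents) and copy the 8 data files into it.

    Hypothetical helper; the default paths are assumptions. Returns the
    list of copied file names.
    """
    os.makedirs(dst_dir, exist_ok=True)
    for name in DATA_FILES:
        shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return list(DATA_FILES)
```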

![](img/train-data.png)

## 2.2 Train NMT


***ATTENTION***

***If you want to set num_units larger, please make sure that your hardware can handle it. If you do not have a GPU, or the GPU memory is not enough, please reduce num_units to avoid an out-of-memory error!***

On Ubuntu 16.04, run this command in a terminal to train the NMT model:

***python -m nmt.nmt     --src=chord --tgt=note     --vocab_prefix=nmt/tmp/train-data/vocab      --train_prefix=nmt/tmp/train-data/train     --dev_prefix=nmt/tmp/train-data/dev      --test_prefix=nmt/tmp/train-data/test     --out_dir=nmt/tmp/model-data     --num_train_steps=1200     --steps_per_stats=100     --num_layers=2     --num_units=128     --dropout=0.2     --metrics=bleu***
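To make values such as num_units easy to change between runs, the same command can be assembled in Python. The sketch below only rebuilds the argument list shown above, using the same flags and values; `build_nmt_command` is a hypothetical helper, and launching it (for example with `subprocess.run`) is left to the reader.

```python
def build_nmt_command(num_units=128, num_train_steps=1200,
                      data_dir="nmt/tmp/train-data",
                      out_dir="nmt/tmp/model-data"):
    """Return the training command above as an argument list.

    Hypothetical helper; all flags and default values are copied from the
    command shown in this section.
    """
    return [
        "python", "-m", "nmt.nmt",
        "--src=chord", "--tgt=note",
        "--vocab_prefix=" + data_dir + "/vocab",
        "--train_prefix=" + data_dir + "/train",
        "--dev_prefix=" + data_dir + "/dev",
        "--test_prefix=" + data_dir + "/test",
        "--out_dir=" + out_dir,
        "--num_train_steps=" + str(num_train_steps),
        "--steps_per_stats=100",
        "--num_layers=2",
        "--num_units=" + str(num_units),
        "--dropout=0.2",
        "--metrics=bleu",
    ]


# Possible usage on a machine with less GPU memory:
# import subprocess
# subprocess.run(build_nmt_command(num_units=64), check=True)
```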

Training ran in a tensorflow-gpu (v1.7) environment, so it is fast: according to the log file, each step takes about 0.09 s. For more information about the training environment, please read the [README](https://github.com/huuuuusy/Music-Style-Transformation/blob/master/Readme) page.

We actually trained our model for 24000 steps; to keep the output readable, we use a run of 1200 steps as an example here:

![](img/train01.png)

We can see that the command first passes the parameters to the model.

![](img/train02.png)

Since my GPU is a GTX 1070 (8 GB), the terminal shows that a GPU (device 0) was found to process the data.

![](img/train03.png)

At the end, the terminal shows a success message.

## 2.3 Check Model

Install the **tree** package to print the contents of the model-data folder as a tree.

![](img/modeltree.png)
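If the **tree** package is not available, a similar listing can be produced from Python. This is a minimal sketch using only the standard library; `format_tree` is a hypothetical helper and its indentation style differs from the output of the tree command.

```python
import os


def format_tree(root, indent="    "):
    """Return the contents of `root` as indented lines, directories first marked with '/'.

    Hypothetical helper; entries are listed in sorted order at each level.
    """
    lines = [os.path.basename(os.path.normpath(root)) + "/"]

    def walk(path, depth):
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if os.path.isdir(full):
                lines.append(indent * depth + name + "/")
                walk(full, depth + 1)
            else:
                lines.append(indent * depth + name)

    walk(root, 1)
    return lines


# Possible usage (path assumed from the training command):
# print("\n".join(format_tree("nmt/tmp/model-data")))
```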

This folder stores the model, which can be reused to transfer a non-jazz song to jazz style. The details are shown in transferMusic.ipynb.