# Part 1. Music Data Processing

In [1]:
# Import the necessary packages
import music21
from music21 import *
import shutil
import pandas as pd
import numpy as np
import sys, re, itertools, random, os
from collections import Counter, defaultdict
from sklearn.cluster import KMeans
from functions import *  # helper functions defined in functions.py

## ATTENTION

***1. All the helper functions used in this notebook are defined in functions.py. A separate notebook, functions.ipynb, walks through each function with specific examples.***

***2. This notebook processes many files, and printing everything would slow it down, so we print only a few lines from each file as an illustration.***

***3. If you want to run this part yourself, first delete the previously generated files (such as the .txt, .chord and .node files); otherwise the new output will be appended to them repeatedly.***
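
The cleanup in note 3 can be scripted. The following is a minimal sketch, assuming the generated files can be recognised by their suffix alone; `remove_generated_files` is a hypothetical helper and is not part of functions.py. It deletes every matching file in the folder, so any unrelated .txt file stored there would be removed as well.

```python
import os


def remove_generated_files(folder, suffixes=(".txt", ".chord", ".node")):
    """Delete files in `folder` whose names end with one of `suffixes`.

    Hypothetical helper (not part of functions.py). Returns the removed names.
    """
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Only touch regular files; sub-folders are left alone
        if os.path.isfile(path) and name.endswith(tuple(suffixes)):
            os.remove(path)
            removed.append(name)
    return removed
```

For example, `remove_generated_files("jazz-music-example", suffixes=(".txt",))` would leave the .mid files in place and drop only the text output.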

## 1.1 Transfer MIDI Music to Txt File

***TODO: add background information on music21 and MIDI.***

A MIDI song is usually composed of multiple instrument tracks: one track carries the main melody throughout the song, and the others serve as accompaniment. This section handles MIDI songs whose melody is played on piano. We extract the piano track information with [music21](http://web.mit.edu/music21/) and save it in a txt file.

We use 60 jazz songs (stored in the jazz-music-example folder) as an example. The command below converts these songs to txt format with their track information.
Our original songs look like this:

![](img/jazz-music-mid.png)

In [2]:
print ("The first 10 songs in jazz example: ")
os.listdir("jazz-music-example")[0:10]

The first 10 songs in jazz example: 


['Four on Six - Wes Montgomery_chord.txt',
 'Body and Soul - 2 - Parker_chord.txt',
 'Corcovado - Jobim_chord.txt',
 'C jam blues - Duke Ellington_chord.txt',
 'Falling in Love - 2 - Rodgers & Hart.mid',
 'Grooving - Buddy Rich.mid',
 'Georgia on my Mind - 2 - Hoagi Charmichael_chord.txt',
 'Boplicity - Parker.mid',
 'C jam blues - Duke Ellington.mid',
 'Blue Trane - Coltrane.mid']

In [3]:
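# Folder that holds the example MIDI files
file_path = "jazz-music-example"
for file in os.listdir(file_path):
    # Convert every MIDI file; other files (e.g. existing .txt output) are skipped
    if file.endswith(".mid"):
        path = os.path.join(file_path, file)
        TransferMidToChordTxt(path)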

**TransferMidToChordTxt** is a helper function defined in functions.py. To save space, we do not introduce the functions in this notebook; all of them are described in functions.ipynb with specific examples.
Every song now has its chord information saved as a .txt file in the jazz-music-example folder.

![](img/jazz-music-txt.png)

## 1.2 Divide Data

We collected 800 jazz-style songs in MIDI format, converted them to .txt as in Part 1.1, and divided the txt files into three sets: train (600 files), dev (50 files) and test (150 files).
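
How the files were assigned to the three sets is not stated here. As an illustration only, the sketch below shuffles a list of file names with a fixed seed and cuts it into 600/50/150; `split_dataset`, the seed and the placeholder file names are assumptions.

```python
import random


def split_dataset(names, n_dev=50, n_test=150, seed=0):
    """Shuffle `names` and return (train, dev, test) lists.

    Hypothetical helper: the default sizes match the counts reported in this
    notebook, but the random shuffle and the seed are assumptions.
    """
    names = sorted(names)               # fixed starting order for reproducibility
    random.Random(seed).shuffle(names)  # seeded shuffle, independent of global state
    dev = names[:n_dev]
    test = names[n_dev:n_dev + n_test]
    train = names[n_dev + n_test:]
    return train, dev, test


# Placeholder names standing in for the 800 converted songs
files = [f"song_{i:03d}_chord.txt" for i in range(800)]
train, dev, test = split_dataset(files)
print(len(train), len(dev), len(test))  # 600 50 150
```

Each returned list could then be moved into its folder with `shutil.move`.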

In [15]:
print ("The first 10 songs in train: ")
os.listdir("music-data/train")[0:10]

The first 10 songs in train: 


['Gaviota trio 2_chord.txt',
 'sammy_walked_in-Michel-Camilo-pt_dm_chord.txt',
 'embraceable_you_jhall_chord.txt',
 'suddenly_ocean_jlh_chord.txt',
 'masquerade_dz_chord.txt',
 'i_have_but_one_heart_rs_chord.txt',
 'TakinACh_chord.txt',
 'clean_sweep_rmb_chord.txt',
 'Corcovado - Jobim_chord.txt',
 'honky_tonk_train_blues_eh2_chord.txt']

In [16]:
print ("The first 10 songs in dev: ")
os.listdir("music-data/dev")[0:10]

The first 10 songs in dev: 


['well_meet_again_rs_chord.txt',
 'you_were_never_lovelier-1939-kar_jpp_chord.txt',
 'whydidi_chord.txt',
 'when_joanna_loved_me-tony-bennett-kar_rt_chord.txt',
 'weather_channel_ce_chord.txt',
 'washington_square_bw2_chord.txt',
 'When I Fall in Love_chord.txt',
 'youll_never_know_rs_chord.txt',
 'WhyDoILoveYou_chord.txt',
 'you_belong_to_me-dk3074_dk_chord.txt']

In [17]:
print ("The first 10 songs in test: ")
os.listdir("music-data/test")[0:10]

The first 10 songs in test: 


['big_dipper_gw_chord.txt',
 'Body and Soul - 2 - Parker_chord.txt',
 'between_the_sheets_mellod_chord.txt',
 'accentuate_the_positive_jh_chord.txt',
 'a_taste_of_honey_dc_chord.txt',
 'all_of_me-1931-vs2-kar_jpp_chord.txt',
 'bags_groove_jh_chord.txt',
 'AllTheThings Reharmonized_chord.txt',
 'blues_jc3_chord.txt',
 'bach_variations_sn_chord.txt']

In [18]:
print ("Total train:",len(os.listdir("music-data/train")))
print ("Total dev:",len(os.listdir("music-data/dev")))
print ("Total test:",len(os.listdir("music-data/test")))

Total train: 600
Total dev: 50
Total test: 150


## 1.3 Extract Node and Chord Information in Each Folder

In this part, we separate the track information into node information and chord information using the **ExtractNodeAndChord** function. Taking the train data as an example, we first read the 600 .txt files in the train folder, then extract the node and chord information of each file, and finally combine all the nodes and chords into the train.node and train.chord files.
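
The real **ExtractNodeAndChord** lives in functions.py and is not reproduced in this notebook. The sketch below only illustrates the idea we assume behind `split_length`: cut a song into chunks of at most `split_length` tokens and append one line per chunk to a pair of parallel files. `chunk_tokens` and `append_parallel_lines` are hypothetical names, and the one-chord-per-node alignment is an assumption.

```python
def chunk_tokens(tokens, split_length):
    """Split a token list into consecutive chunks of at most `split_length`."""
    return [tokens[i:i + split_length] for i in range(0, len(tokens), split_length)]


def append_parallel_lines(nodes, chords, split_length, node_path, chord_path):
    """Append aligned node/chord lines to the two output files (assumed format)."""
    if len(nodes) != len(chords):
        raise ValueError("nodes and chords must have the same length")
    # Mode "a" appends, so re-running without deleting old files duplicates data
    with open(node_path, "a") as node_file, open(chord_path, "a") as chord_file:
        for node_chunk, chord_chunk in zip(chunk_tokens(nodes, split_length),
                                           chunk_tokens(chords, split_length)):
            node_file.write(" ".join(node_chunk) + "\n")
            chord_file.write(" ".join(chord_chunk) + "\n")
```

Writing both files inside the same loop keeps line *i* of the .node file paired with line *i* of the .chord file, which a source/target corpus needs.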

In [24]:
split_length = 20
input_path = "music-data/train"
out_node_path = "music-data/train.node"
out_chord_path = "music-data/train.chord"

for file in os.listdir(input_path):
    if file.endswith(".txt"):
        path = os.path.join(input_path, file)
        ExtractNodeAndChord(path, split_length,out_node_path,out_chord_path)

In [32]:
file_path = "music-data/train.node"
num = 5
print ("The first",num,"lines in train.node:")
PrintNumLines(file_path,num)

file_path = "music-data/train.chord"
num = 5
print ("The first",num,"lines in train.chord:")
PrintNumLines(file_path,num)

The first 5 lines in train.node:
{F-sharp_in_octave_2_|_C_in_octave_2} {C-sharp_in_octave_2_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {D_in_octave_3_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {C_in_octave_2_|_F-sharp_in_octave_2} {C-sharp_in_octave_2_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {D_in_octave_3_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {C_in_octave_2_|_F-sharp_in_octave_2} {C-sharp_in_octave_2_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {D_in_octave_3_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {C_in_octave_2_|_F-sharp_in_octave_2} {C-sharp_in_octave_2_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {D_in_octave_3_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2}
{C_in_octave_2_|_F-sharp_in_octave_2} {C-sharp_in_octave_2_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_in_octave_2} {D_in_octave_3_|_F-sharp_in_octave_2} {B_in_octave_2_|_F-sharp_
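
Judging from the output above, each node token wraps one or more notes in braces, joins them with `_|_`, and writes every note as `<pitch>_in_octave_<n>`. Assuming that pattern holds for every token, a token can be parsed as sketched below; `parse_node` is a hypothetical helper, not part of functions.py.

```python
def parse_node(token):
    """Parse one node token into a list of (pitch name, octave) pairs."""
    inner = token.strip().strip("{}")     # drop surrounding whitespace and braces
    notes = []
    for part in inner.split("_|_"):       # one part per simultaneous note
        pitch, octave = part.split("_in_octave_")
        notes.append((pitch, int(octave)))
    return notes


print(parse_node("{F-sharp_in_octave_2_|_C_in_octave_2}"))
# [('F-sharp', 2), ('C', 2)]
```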

In [26]:
input_path = "music-data/dev"
out_node_path = "music-data/dev.node"
out_chord_path = "music-data/dev.chord"

for file in os.listdir(input_path):
    if file.endswith(".txt"):
        path = os.path.join(input_path, file)
        ExtractNodeAndChord(path, split_length,out_node_path,out_chord_path)

In [33]:
file_path = "music-data/dev.node"
num = 5
print ("The first",num,"lines in dev.node:")
PrintNumLines(file_path,num)

file_path = "music-data/dev.chord"
num = 5
print ("The first",num,"lines in dev.chord:")
PrintNumLines(file_path,num)

The first 5 lines in dev.node:
{F_in_octave_3_|_G_in_octave_3_|_D_in_octave_3} {F_in_octave_3_|_A_in_octave_3} {C_in_octave_4_|_E_in_octave_3} {B_in_octave_1_|_F-sharp_in_octave_2_|_C-sharp_in_octave_3} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2}
{B_in_octave_1_|_F-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octave_2} {F-sharp_in_octave_2_|_C-sharp_in_octave_2} {B_in_octave_1_|_F-sharp_in_octav

In [29]:
input_path = "music-data/test"
out_node_path = "music-data/test.node"
out_chord_path = "music-data/test.chord"

for file in os.listdir(input_path):
    if file.endswith(".txt"):
        path = os.path.join(input_path, file)
        ExtractNodeAndChord(path, split_length,out_node_path,out_chord_path)

In [34]:
file_path = "music-data/test.node"
num = 5
print ("The first",num,"lines in test.node:")
PrintNumLines(file_path,num)

file_path = "music-data/test.chord"
num = 5
print ("The first",num,"lines in test.chord:")
PrintNumLines(file_path,num)

The first 5 lines in test.node:
{E_in_octave_4_|_B_in_octave_3_|_A_in_octave_3_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {D_in_octave_4_|_B_in_octave_3_|_A_in_octave_4_|_F_in_octave_3} {E_in_octave_4_|_B_in_octave_3_|_A_in_octave_3_|_F_in_octave_3} {E_in_octave_4_|_B_in_octave_3_|_A_in_octave_3_|_F_in_octave_3} {E_in_octave_4_|_B_in_octave_3_|_A_in_octave_3_|_F_in_octave_3} {E-flat_in_octave_4_|_B_in_octave_3_|_G-sharp_in_octave_3_|_F_in_octave_3} {E-flat_in_octave_4_|_B_in_octave_3_|_G-sharp_in_octave_3_|_F_in_octave_3} {E-flat_in_octave_4_|_B_in_octave_3_|_G-sharp_in_octave_3_|_F_in_octave_3} {A_in_octave_3_|_E_in_octave_3_|_D_in_o

In [35]:
# Count the lines of every split; the chord and node files of a split should match
for split in ("train", "dev", "test"):
    for kind in ("chord", "node"):
        # The with-block closes each file as soon as it has been counted
        with open(f"music-data/{split}.{kind}", 'r') as f:
            print (f"Total {split} {kind}:", sum(1 for _ in f))

Total train chord: 12103
Total train node: 12103
Total dev chord: 813
Total dev node: 813
Total test chord: 2581
Total test node: 2581
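
The chord and node counts match for every split, which matters because the two files form a line-aligned source/target pair. A minimal sketch of an automatic check follows; `count_lines` and `check_parallel` are hypothetical helpers.

```python
def count_lines(path):
    """Count the lines in a text file without loading it all into memory."""
    with open(path, "r") as f:
        return sum(1 for _ in f)


def check_parallel(prefix, src_ext="chord", tgt_ext="node"):
    """Return the shared line count of prefix.chord / prefix.node, or raise."""
    n_src = count_lines(f"{prefix}.{src_ext}")
    n_tgt = count_lines(f"{prefix}.{tgt_ext}")
    if n_src != n_tgt:
        raise ValueError(
            f"{prefix}: {n_src} {src_ext} lines but {n_tgt} {tgt_ext} lines"
        )
    return n_src
```

Calling `check_parallel("music-data/train")` after extraction would fail early if one file had drifted out of step with the other.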


## 1.4 Build the Vocabulary of Jazz Songs

In this part, we collect every distinct chord and node that appears in the input jazz songs and save them as two vocabulary files.
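
The cells in this section repeat the same loop for the train, dev and test files. As an alternative, the steps can be folded into one function. This is a sketch rather than the code used to produce the outputs in this notebook: `build_vocab` and `write_vocab` are hypothetical names, and sorting the tokens before writing is our own choice.

```python
def build_vocab(paths):
    """Collect the set of whitespace-separated tokens found in `paths`."""
    vocab = set()
    for path in paths:
        with open(path, "r") as f:
            for line in f:
                # split() with no argument drops empty strings and the newline,
                # so no '\n' token can end up in the vocabulary
                vocab.update(line.split())
    return vocab


def write_vocab(vocab, path):
    """Write one token per line, sorted so the file is reproducible."""
    with open(path, "w") as f:
        for token in sorted(vocab):
            f.write(token + "\n")
```

A call such as `write_vocab(build_vocab(["music-data/train.chord", "music-data/dev.chord", "music-data/test.chord"]), "music-data/vocab.chord")` would cover the chord vocabulary in one step.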

### 1.4.1 Generate The Chord Vocabulary

In [68]:
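# Collect every distinct chord token from the training set
vocab = set()
with open("music-data/train.chord", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    # Splitting on " " keeps a token that consists only of the line break,
    # which is why '\n' shows up in the printed vocabulary below
    for chord in lines.split(" "):
        if chord:
            vocab.add(chord)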

In [69]:
with open("music-data/dev.chord", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for chord in lines.split(" "):
        if chord:
            vocab.add(chord)

In [70]:
with open("music-data/test.chord", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for chord in lines.split(" "):
        if chord:
            vocab.add(chord)

In [71]:
with open("music-data/vocab.chord", 'w') as f:
    for chord in vocab:
        if chord:
            f.write(chord+'\n')
print ("The chord vocabulary of 800 Jazz songs:")
print (vocab)

The chord vocabulary of 800 Jazz songs:
{'C#5', 'E7', 'G#1', 'F7', 'F#4', 'E', 'D1', 'C#7', 'F9', 'D3', 'F2', 'G6', 'B2', '\n', 'F#2', 'G#3', 'A5', 'E2', 'C6', 'F', 'D4', 'C#1', 'B', 'G5', 'G7', 'A7', 'G#7', 'C1', 'E4', 'D5', 'G2', 'A0', 'G#5', 'C5', 'A3', 'C4', 'F#3', 'G#4', 'D2', 'F3', 'B5', 'C7', 'F#7', 'B1', 'F4', 'C#4', 'E5', 'F#5', 'A6', 'F#6', 'F1', 'G0', 'C#2', 'A4', 'F5', 'B4', 'E3', 'F8', 'C#6', 'F#1', 'B0', 'B3', 'F6', 'A1', 'G#6', 'D7', 'E6', 'C#', 'G4', 'E1', 'G1', 'D6', 'C3', 'G3', 'C2', 'C#3', 'B6', 'C8', 'A2', 'G#2'}


In [72]:
RemoveDuplicateVocab("music-data/vocab.chord")

Remove 1 duplicate words!


### 1.4.2 Generate The Node Vocabulary

In [73]:
vocab = set()
with open("music-data/train.node", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for node in lines.split(" "):
        if node:
            vocab.add(node)

In [74]:
with open("music-data/dev.node", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for node in lines.split(" "):
        if node:
            vocab.add(node)

In [75]:
with open("music-data/test.node", 'r') as f:
    read_data = f.readlines()
for lines in read_data:
    for node in lines.split(" "):
        if node:
            vocab.add(node)

In [76]:
with open("music-data/vocab.node", 'w') as f:
    for node in vocab:
        if node:
            f.write(node+'\n')
file_path = "music-data/vocab.node"
num = 10
print ("The first",num,"node vocabulary of 800 Jazz songs:")
PrintNumLines(file_path,num)

The first 10 node vocabulary of 800 Jazz songs:
{G_in_octave_6_|_A_in_octave_3_|_G_in_octave_5}
{G-sharp_in_octave_3_|_C_in_octave_4}
{E-flat_in_octave_3_|_E_in_octave_4_|_B-flat_in_octave_5}
{A_in_octave_3_|_F-sharp_in_octave_4_|_C_in_octave_4_|_D_in_octave_2_|_D_in_octave_3_|_E_in_octave_4_|_A_in_octave_4_|_A_in_octave_2}
{G-sharp_in_octave_3_|_F_in_octave_4_|_C_in_octave_6_|_C_in_octave_4_|_F_in_octave_5}
{G_in_octave_3_|_C_in_octave_4_|_A_in_octave_4_|_E_in_octave_4}
{E-flat_in_octave_4_|_A_in_octave_3_|_E-flat_in_octave_6}
{C_in_octave_5_|_D_in_octave_5_|_F_in_octave_5}
{F-sharp_in_octave_3_|_E-flat_in_octave_2}
{G-sharp_in_octave_3_|_G-sharp_in_octave_6_|_C_in_octave_4_|_G-sharp_in_octave_5}


In [77]:
RemoveDuplicateVocab("music-data/vocab.node")

Remove 1 duplicate words!


## 1.5 Delete Empty Lines in Each File

Since we will use TensorFlow Neural Machine Translation (NMT) as our model, all empty lines must be removed from the input files so that the model can read them. We therefore use the **RemoveEmptyLine** function to delete the empty lines in each file.
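
**RemoveEmptyLine** is defined in functions.py. For readers without that file, the sketch below shows one way such a function could work; `remove_empty_lines` is a hypothetical stand-in, and treating whitespace-only lines as empty is an assumption.

```python
def remove_empty_lines(path):
    """Rewrite `path` without blank lines and return how many were dropped."""
    with open(path, "r") as f:
        lines = f.readlines()
    # A line counts as empty when nothing is left after stripping whitespace
    kept = [line for line in lines if line.strip()]
    with open(path, "w") as f:
        f.writelines(kept)
    return len(lines) - len(kept)
```

The paired .chord and .node files only stay aligned if the same line positions are removed from both, so it is worth confirming that the two reported counts agree, as they do in the outputs below.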

In [78]:
file_path = "music-data/train.chord"
RemoveEmptyLine(file_path)

Remove 72 empty lines!


In [79]:
file_path = "music-data/train.node"
RemoveEmptyLine(file_path)

Remove 72 empty lines!


In [80]:
file_path = "music-data/dev.chord"
RemoveEmptyLine(file_path)

Remove 5 empty lines!


In [81]:
file_path = "music-data/dev.node"
RemoveEmptyLine(file_path)

Remove 5 empty lines!


In [82]:
file_path = "music-data/test.chord"
RemoveEmptyLine(file_path)

Remove 22 empty lines!


In [83]:
file_path = "music-data/test.node"
RemoveEmptyLine(file_path)

Remove 22 empty lines!


In [84]:
file_path = "music-data/vocab.chord"
RemoveEmptyLine(file_path)

Remove 1 empty lines!


In [85]:
file_path = "music-data/vocab.node"
RemoveEmptyLine(file_path)

Remove 1 empty lines!


We have now finished the music data processing part and have 8 files:

![](img/preprocessing.png)

# Part 2. Train Neural Machine Translation (seq2seq)

In this part, we train the [TensorFlow Neural Machine Translation (seq2seq)](https://github.com/tensorflow/nmt) model, with chords as the source sequence and nodes as the target sequence.

## 2.1 Basic Information About NMT
***TODO: add background information on NMT.***

## 2.2 Preparation

1. First, download the source code from GitHub and put the files in our project folder.

2. Second, create a new folder named "tmp" in the nmp-model folder. 

3. Next, create a new folder named "train-data" inside "tmp" and copy the 8 files generated in Part 1.5 into it.

![](img/traindata.png)
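
The folder creation and copying can also be done from Python. This is a minimal sketch: `prepare_train_data` is a hypothetical helper, and the example paths in the usage note are taken from the file locations in Part 1 and the training command in 2.3.

```python
import os
import shutil


def prepare_train_data(src_dir, dst_dir):
    """Copy the eight preprocessed files from `src_dir` into `dst_dir`."""
    names = [
        f"{split}.{ext}"
        for split in ("train", "dev", "test", "vocab")
        for ext in ("chord", "node")
    ]
    os.makedirs(dst_dir, exist_ok=True)   # also creates missing parent folders
    for name in names:
        shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return names
```

With the layout used in this project, the call would be `prepare_train_data("music-data", "nmt/tmp/train-data")`.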

## 2.3 Train NMT

***TODO: add information on parameter selection.***

On Ubuntu 16.04, run this command in a terminal to train the NMT model:

***python -m nmt.nmt     --src=chord --tgt=node     --vocab_prefix=nmt/tmp/train-data/vocab      --train_prefix=nmt/tmp/train-data/train     --dev_prefix=nmt/tmp/train-data/dev      --test_prefix=nmt/tmp/train-data/test     --out_dir=nmt/tmp/model-data     --num_train_steps=1200     --steps_per_stats=100     --num_layers=2     --num_units=128     --dropout=0.2     --metrics=bleu***
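
In this command, `--src` and `--tgt` are the file suffixes of the source and target sides, and each `*_prefix` flag is joined with those suffixes to locate files such as train.chord and train.node. For readers who prefer to launch training from a script, the sketch below assembles the same command as an argument list. `build_nmt_command` is a hypothetical helper; the flag names and values are copied from the command above.

```python
import shlex


def build_nmt_command(data_dir="nmt/tmp/train-data", out_dir="nmt/tmp/model-data"):
    """Return the training command from this section as an argument list."""
    flags = {
        "src": "chord",                       # source-side file suffix
        "tgt": "node",                        # target-side file suffix
        "vocab_prefix": f"{data_dir}/vocab",
        "train_prefix": f"{data_dir}/train",
        "dev_prefix": f"{data_dir}/dev",
        "test_prefix": f"{data_dir}/test",
        "out_dir": out_dir,
        "num_train_steps": 1200,
        "steps_per_stats": 100,
        "num_layers": 2,
        "num_units": 128,
        "dropout": 0.2,
        "metrics": "bleu",
    }
    return ["python", "-m", "nmt.nmt"] + [f"--{key}={value}" for key, value in flags.items()]


command = build_nmt_command()
print(" ".join(shlex.quote(part) for part in command))
# To start training from Python: subprocess.run(command, check=True)
```

Keeping the flags in a dictionary makes it easy to change a single value, such as `num_train_steps`, without retyping the whole line.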

Training runs in a tensorflow-gpu (v1.7) environment, so it is accelerated by the GPU and finishes quickly. For more information about the training environment, please read the [README](https://github.com/huuuuusy/Music-Style-Transformation/blob/master/Readme) page. 

![](img/train01.png)

We can see that the command first passes the parameters to the model.

![](img/train02.png)

Since my GPU is a GTX 1070 (8 GB), the terminal shows that a GPU (device 0) was found to process the data.

![](img/train03.png)

At the end, the terminal reports that training finished successfully.

## 2.4 Check Model

Install the **tree** package to print the contents of the model-data folder as a tree.

![](img/modeltree.png)
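
If the **tree** package is not available, a similar listing can be produced with the Python standard library. This is a rough sketch, not a replacement for `tree`: `format_tree` is a hypothetical helper, and its indentation style differs from the screenshot above.

```python
import os


def format_tree(root):
    """Return lines listing the files under `root`, indented by depth."""
    lines = [os.path.basename(os.path.normpath(root)) + "/"]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk descend in a stable order
        if dirpath == root:
            depth = 0
        else:
            depth = os.path.relpath(dirpath, root).count(os.sep) + 1
            lines.append("    " * depth + os.path.basename(dirpath) + "/")
        for name in sorted(filenames):
            lines.append("    " * (depth + 1) + name)
    return lines
```

Running `print("\n".join(format_tree("nmt/tmp/model-data")))` would list the saved model files after training.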

This folder stores the trained model, which can be reused to transfer a non-jazz song to jazz style. The details are shown in transferMusic.ipynb.