In [2]:
""" Second take on machine learning features... 

CURRENTLY TESTING:
1. Counts on common intervals/steps? (step pattern of 0,3,7 for minor, vs. 0,4,7 for major)
2. Minor mode offers 9 notes (incl. ) vs. major's 7 notes

Add:
1. #1 is looking promising. Need to handle the case where there is no discernible difference.
2. Need to scale step frequency relative to the total number of notes (percentage rather than absolute count)
3. Should I put in all steps/interval data and let the algorithm do the heavy lifting?
    (Rather than calculating counts of the patterns I'm expecting)
4. #2 is also looking promising. Pare out rarely used notes (<1% of all notes?)
to better define the most frequently used ones.

------------------

Notes from Henry:
Ideally you want the machine to do all the work, without doing any of the computation yourself.
Potentially try minimal computation first, at least to establish the baseline.

- Stick with Logistic regression for first stab
- Naive Bayes may also work
- Random forest (AdaBoost is a related ensemble method).... may not be right for this problem
- SVM tends to need more tweaking 

"""

###############################################

import music21
import pandas as pd
import numpy as np
from pandas.tools.plotting import scatter_matrix
import matplotlib.pyplot as plt
%matplotlib inline
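
Item 3 in the notes above asks whether to feed in all step data rather than hand-picked pattern counts. A minimal sketch of that idea, assuming each note has already been reduced to a MIDI pitch number (music21 exposes this as `note.pitch.midi`); the helper name `step_proportions` and the example arpeggio are hypothetical, not part of the notebook:

```python
from __future__ import division  # keeps "/" as true division under Python 2 as well
from collections import Counter


def step_proportions(midi_pitches):
    """Return {semitone_step: share of all steps} for consecutive pitches."""
    # Signed step between each note and the one that follows it
    steps = [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]
    if not steps:
        return {}
    counts = Counter(steps)
    # Divide by the number of steps so long and short scores are comparable
    return {step: n / len(steps) for step, n in counts.items()}


# Hypothetical input: a G major arpeggio up and back down (G3 B3 D4 B3 G3)
example = step_proportions([55, 59, 62, 59, 55])
print(example)
```

Each distinct step then becomes one candidate feature column, already scaled as a proportion as item 2 suggests.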


In [9]:
training_data = []

# Read the list of MIDI filenames, skipping blank lines (e.g. a trailing newline)
with open('test_files.txt') as f:
    filenames = [line.strip() for line in f if line.strip()]

for filename in filenames:
    
    # Create empty score_instance to hold data of training features + outcome
    score_instance = {}

    # Convert MIDI file to score and iterate over notes in score 
    # Save relevant individual Note attributes into list (incl. Note obj, name, freq, major/minor at given measure)
    score = music21.converter.parse('MIDI test files/Cello solos/' + filename)
    note_attributes = []
    for note in music21.alpha.theoryAnalysis.theoryAnalyzer.getNotes(score, 0):
        if note is not None:
#             measure = int(note.offset)
#             key = music21.alpha.theoryAnalysis.theoryAnalyzer.getKeyAtMeasure(score, measure)
            note_attributes.append([note, note.name, note.frequency])
    print("Note attributes for file {} completed".format(filename))
            
    # Calculate global attributes of score (incl: Avg interval, avg steps, is_major)
    # Note: At the moment, this doesn't take into account changes b/w major + minor w/in a score

    score_instance['note_count'] = {}
    for _, note_name, _ in note_attributes:
        score_instance['note_count'][note_name] = score_instance['note_count'].get(note_name, 0) + 1
    
    count_maj_interval = 0
    count_min_interval = 0
    
    for i in range(1, len(note_attributes)-1):
        interval_1 = music21.interval.Interval(noteStart=note_attributes[i-1][0], noteEnd=note_attributes[i][0])
        interval_2 = music21.interval.Interval(noteStart=note_attributes[i][0], noteEnd=note_attributes[i+1][0])

        step_1 = ((interval_1.cents)/100.00)
        step_2 = ((interval_2.cents)/100.00)
 
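        # Minor triad outlined in root position (3 then 4 semitones), ascending or descending
        if (step_1 == 3 and step_2 == 4) or (step_1 == -4 and step_2 == -3):
            count_min_interval += 1
        # Minor triad in first inversion (4 then 5 semitones)
        if (step_1 == 4 and step_2 == 5) or (step_1 == -5 and step_2 == -4):
            count_min_interval += 1
            
        # Major triad outlined in root position (4 then 3 semitones), ascending or descending
        if (step_1 == 4 and step_2 == 3) or (step_1 == -3 and step_2 == -4):
            count_maj_interval += 1
        # Major triad in first inversion (3 then 5 semitones)
        if (step_1 == 3 and step_2 == 5) or (step_1 == -5 and step_2 == -3):
            count_maj_interval += 1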
            
    mode_at_measure_0 = music21.alpha.theoryAnalysis.theoryAnalyzer.getKeyAtMeasure(score, 0).mode
    is_major = (mode_at_measure_0 == 'major')
    
    # Saves all global attributes into a dictionary and appends to global set of all training data
    # Used to create df and plt below
    score_instance['count_maj_interval'] = count_maj_interval
    score_instance['count_min_interval'] = count_min_interval
    score_instance['is_major'] = is_major    
    
    print("Global attributes include: {}".format(score_instance))
    training_data.append(score_instance)

Note attributes for file cs1-1pre.mid completed
Global attributes include: {'note_count': {'A': 81, 'E-': 6, 'C': 68, 'B': 78, 'E': 63, 'D': 80, 'G': 95, 'F': 12, 'C#': 18, 'B-': 3, 'F#': 84, 'G#': 5}, 'is_major': True, 'count_maj_interval': 8, 'count_min_interval': 3}
Note attributes for file cs1-2all.mid completed
Global attributes include: {'note_count': {'A': 148, 'E-': 2, 'C': 100, 'B': 130, 'E': 96, 'D': 138, 'G': 152, 'F': 16, 'C#': 26, 'B-': 4, 'F#': 94, 'G#': 8}, 'is_major': True, 'count_maj_interval': 34, 'count_min_interval': 12}
Note attributes for file cs1-3cou.mid completed
Global attributes include: {'note_count': {'A': 132, 'E-': 8, 'C': 88, 'B': 120, 'E': 84, 'D': 116, 'G': 136, 'F': 18, 'C#': 22, 'F#': 88}, 'is_major': True, 'count_maj_interval': 12, 'count_min_interval': 2}
Note attributes for file cs1-4sar.mid completed
Global attributes include: {'note_count': {'A': 32, 'E-': 6, 'C': 24, 'B': 38, 'E': 36, 'D': 36, 'G': 38, 'F': 4, 'C#': 10, 'F#': 34, 'G#': 2}, 'is_
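
Item 4 in the notes suggests paring out notes that make up less than 1% of a score. A minimal sketch of that filter, using the `note_count` values printed for cs1-1pre.mid above; the helper name `pare_rare_notes` and the `min_share` parameter are hypothetical, and the 1% cutoff is the tentative figure from the notes rather than a tuned value:

```python
from __future__ import division  # keeps "/" as true division under Python 2 as well


def pare_rare_notes(note_count, min_share=0.01):
    """Return {note_name: share of all notes}, dropping notes below min_share."""
    total = sum(note_count.values())
    if total == 0:
        return {}
    return {name: n / total
            for name, n in note_count.items()
            if n / total >= min_share}


# Counts copied from the cs1-1pre.mid output above
cs1_1pre = {'A': 81, 'E-': 6, 'C': 68, 'B': 78, 'E': 63, 'D': 80, 'G': 95,
            'F': 12, 'C#': 18, 'B-': 3, 'F#': 84, 'G#': 5}

shares = pare_rare_notes(cs1_1pre)
print(sorted(shares, key=shares.get, reverse=True))
```

For this file the filter drops B- and G#, while E- stays in because it sits just above 1% of the 593 notes.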

In [None]:
for instance in training_data:
    total = instance['count_maj_interval'] + instance['count_min_interval']
    # float() avoids Python 2 integer division; a zero total means no triad patterns were found
    proportion_maj = float(instance['count_maj_interval']) / total if total else 0.0
    proportion_min = float(instance['count_min_interval']) / total if total else 0.0
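
The proportions could then be collected into feature rows and labels for a first baseline. A minimal sketch, assuming instances shaped like the `score_instance` dicts above; `to_feature_row` is a hypothetical helper, and returning `[0.0, 0.0]` when no patterns were counted is an assumption about how to handle the "no discernible difference" case from the notes:

```python
from __future__ import division  # keeps "/" as true division under Python 2 as well


def to_feature_row(instance):
    """Return [proportion_maj, proportion_min] for one score instance."""
    maj = instance['count_maj_interval']
    mnr = instance['count_min_interval']
    total = maj + mnr
    if total == 0:
        # Assumption: no triad patterns found, so neither mode is favoured
        return [0.0, 0.0]
    return [maj / total, mnr / total]


# Interval counts and labels copied from the first three files in the output above
sample = [
    {'count_maj_interval': 8, 'count_min_interval': 3, 'is_major': True},
    {'count_maj_interval': 34, 'count_min_interval': 12, 'is_major': True},
    {'count_maj_interval': 12, 'count_min_interval': 2, 'is_major': True},
]

X = [to_feature_row(inst) for inst in sample]   # one row of features per score
y = [int(inst['is_major']) for inst in sample]  # 1 = major, 0 = minor
print(X)
print(y)
```

Lists in this shape can be handed to a classifier such as the logistic regression suggested in Henry's notes, though a usable training set would need minor-mode scores as well.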
