# <center>Similarity Test</center>

This notebook reads in a generated MIDI file and a directory of the MIDI files the model was trained on, and determines how similar the generated file is to each of those songs.

This notebook was created to automatically check that the model wasn't overfitting, since it is intractable to search through several hundred songs by hand to confirm it isn't copying. We found that the model does no significant copying: at most, a generated song overlaps 10-15% with several of the training songs.

## Directions for Use

1. Place the generated MIDI file in the folder /Similarity/input
2. Place the training MIDI files in the folder /Similarity/compare_to
3. Run all code cells
4. The final cell prints the percentage similarity between the input song and each comparison song

```python
import glob
from music21 import converter, instrument, note, chord, stream, duration
from fuzzywuzzy import fuzz
import numpy as np
```

First, get all of the notes for the songs the input will be compared to.

```python
import os  # for os.path.basename below

def get_notes(path):
    """
    Gets all notes and chords from every MIDI file in `path`.

    Returns a list of per-song arrays (rows of [pitch, offset, duration])
    and a matching list of file names.
    """
    notes = []
    names = []

    for file in glob.glob(path + "*.mid"):
        song = []
        midi = converter.parse(file)

        # Get some info about the file
        key = midi.analyze('key')
        key_string = key.tonic.name + ' ' + key.mode

        print("Parsing %s" % file)
        print(key_string)

        try:  # file has instrument parts: use the first one
            s2 = instrument.partitionByInstrument(midi)
            notes_to_parse = s2.parts[0].recurse()
        except Exception:  # file has notes in a flat structure
            notes_to_parse = midi.flat.notes

        for element in notes_to_parse:
            if isinstance(element, note.Note):
                # Single note: store its pitch name, e.g. "E4"
                song.append([str(element.pitch), element.offset, element.duration])
            elif isinstance(element, chord.Chord):
                # Chord: store its pitch classes in normal order, e.g. "4.7.11"
                song_note = '.'.join(str(n) for n in element.normalOrder)
                song.append([song_note, element.offset, element.duration])
        notes.append(np.array(song, dtype=object))
        # Keep only the file name (str.strip removes characters, not a prefix)
        names.append(os.path.basename(file))

    return notes, names

comp_path = "../Similarity/compare_to/"
comp_notes, comp_names = get_notes(comp_path)
```

```
Parsing ../Similarity/compare_to\Pokemon Gold, Silver, Crystal - Cinnabar Island (HGSS Version).mid
G major
Parsing ../Similarity/compare_to\Pokemon Gold, Silver, Crystal - S.S. Aqua .mid
G major
Parsing ../Similarity/compare_to\Pokemon GoldSilverCrystal - Azalea TownBlackthorn City.mid
C# major
Parsing ../Similarity/compare_to\Pokemon GoldSilverCrystal - Bicycle.mid
E minor
Parsing ../Similarity/compare_to\Pokemon GoldSilverCrystal - Bug Catching Contest.mid
E minor
Parsing ../Similarity/compare_to\Pokemon GoldSilverCrystal - Burned Tower.mid
E minor
Parsing ../Similarity/compare_to\Pokemon GoldSilverCrystal - Champion Battle.mid
G# minor
Parsing ../Similarity/compare_to\Pokemon GoldSilverCrystal - Cherrygrove CityMahogany Town.mid
F major
Parsing ../Similarity/compare_to\Pokemon GoldSilverCrystal - Dance Theatre.mid
A minor
Parsing ../Similarity/compare_to\Pokemon GoldSilverCrystal - Dark Cave.mid
A- major
Parsing ../Similarity/compare_to\Pokemon GoldSilverCrystal - Dragons Den.mid
C
```
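When turning a glob result into a song name, `os.path.basename` is safer than `str.strip(path + '\\')`, because `str.strip` treats its argument as a set of characters to remove from both ends, not as a prefix. The file names listed above all start with "Pokemon", and "P" is not in that character set, so they come through intact. The sketch below uses a hypothetical file name, `Sprout Tower.mid`, to show where stripping goes wrong.

```python
import os

path = "../Similarity/compare_to/"
file = path + "Sprout Tower.mid"  # hypothetical file name, for illustration only

# str.strip removes ANY of these characters from both ends, so it also
# eats the leading "Spro" of the file name.
stripped = file.strip(path + "\\")
# os.path.basename removes only the directory part.
base = os.path.basename(file)

print(stripped)  # ut Tower.mid
print(base)      # Sprout Tower.mid
```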

Convert each comparison song's pitches, offsets, and durations into space-separated strings so they can be compared as text.

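```python
sep = " "
comp_pitches = []
comp_offsets = []
comp_durations = []
kept_names = []
for song, name in zip(comp_notes, comp_names):
    if len(song):  # skip files in which no notes were found
        comp_pitches.append(sep.join(song[:, 0]))
        comp_offsets.append(sep.join([str(round(float(x), 2)) for x in song[:, 1]]))
        comp_durations.append(sep.join([str(round(float(x.quarterLength), 2)) for x in song[:, 2]]))
        kept_names.append(name)
# Keep the names aligned with the strings above so they are paired correctly later
comp_names = kept_names
```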
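To make the encoding concrete, here is a minimal sketch that applies the same conversion to a few made-up rows. The note data is hypothetical, and plain numbers stand in for music21's offsets and `Duration.quarterLength` values.

```python
from fractions import Fraction

# Hypothetical [pitch, offset, duration-in-quarters] rows; "4.7.11" is an
# E minor triad in the same pitch-class encoding get_notes uses for chords.
song = [
    ["E4", 0.0, 1.0],
    ["4.7.11", Fraction(1, 3), 0.5],
    ["G4", 2.5, Fraction(2, 3)],
]

sep = " "
pitches = sep.join(row[0] for row in song)
# Offsets and durations are rounded to 2 decimal places,
# so a triplet value such as 1/3 becomes "0.33".
offsets = sep.join(str(round(float(row[1]), 2)) for row in song)
durations = sep.join(str(round(float(row[2]), 2)) for row in song)

print(pitches)    # E4 4.7.11 G4
print(offsets)    # 0.0 0.33 2.5
print(durations)  # 1.0 0.5 0.67
```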

Next, load the input song. I will assume it is the first (and only) MIDI file in the input folder.

```python
input_path = "../Similarity/input/"
notes, names = get_notes(input_path)
```

```
Parsing ../Similarity/input\Pokemon_GSC_3.mid
A- major
```


```python
input_notes = notes[0]
input_name = names[0]
```
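`glob.glob` does not guarantee any particular order, so if the input folder holds more than one file, which one counts as "the first song" depends on the file system. A minimal guard, assuming a deterministic choice is wanted, is to sort the matches and warn when there are several. `pick_input_file` is a hypothetical helper, not part of the notebook.

```python
import glob
import os
import warnings


def pick_input_file(input_dir):
    """Return the alphabetically first .mid file in input_dir (hypothetical helper)."""
    files = sorted(glob.glob(os.path.join(input_dir, "*.mid")))
    if not files:
        raise FileNotFoundError(f"No .mid files found in {input_dir}")
    if len(files) > 1:
        # Only one song is compared, so make the choice visible
        warnings.warn(f"{len(files)} MIDI files found; using {os.path.basename(files[0])}")
    return files[0]
```

For example, `pick_input_file("../Similarity/input/")` would return the path of the single input file when the directions above are followed.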

I will then process the input song.

```python
sep = " "
input_pitches = sep.join(input_notes[:,0])
input_offsets = sep.join([str(round(float(x), 2)) for x in input_notes[:,1]])
input_durations = sep.join([str(round(float(x.quarterLength), 2)) for x in input_notes[:,2]])
```

Now, I will use the `fuzz` module from the fuzzywuzzy library to test the similarity of the songs. I will weight pitch, offset, and duration equally, so the similarity score is the plain average of the three partial-ratio scores.
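With equal weights, a weighted average reduces to the plain mean. The sketch below makes the weights explicit so the equal-weight case can be checked against a non-equal one; `combined_score` and its `weights` parameter are hypothetical, and the notebook itself always uses equal weights.

```python
def combined_score(pitch_score, offset_score, duration_score, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three 0-100 similarity scores, rounded to 2 decimals."""
    scores = (pitch_score, offset_score, duration_score)
    weighted_sum = sum(w * s for w, s in zip(weights, scores))
    return round(weighted_sum / sum(weights), 2)


# Equal weights: the same as (10 + 5 + 5) / 3
print(combined_score(10, 5, 5))                     # 6.67
# Counting pitch twice as much as timing, for comparison
print(combined_score(10, 5, 5, weights=(2, 1, 1)))  # 7.5
```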

I chose to measure similarity with partial ratio because it compares the songs as whole sequences, like sentences, rather than as collections of individual words: it scores the shorter string against the best-matching substring of the same length in the longer one. I expect many of the elements to be carried over from the training songs, so simply counting how many appear in both would be too crude to compare them realistically.<br>
This is a good fit for music because the same notes played in the same order still score well when other notes are played in between them (the insertions lower the score rather than erasing the match), which is what would be desired: sounding similar while trying new things.
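To illustrate why a partial match suits this comparison, here is a simplified stand-in for `fuzz.partial_ratio` built on the standard library's `difflib.SequenceMatcher`. It is an approximation, not fuzzywuzzy's exact implementation: fuzzywuzzy only tries the windows suggested by matching blocks and may use python-Levenshtein when it is installed, whereas this sketch slides a window the length of the shorter string across the whole longer string and keeps the best ratio. The note strings are made up.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """0-100 similarity: 2 * matched characters / total characters."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio_sketch(a, b):
    """Best ratio between the shorter string and any equal-length window of the longer one."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    n = len(shorter)
    return max(ratio(shorter, longer[i:i + n]) for i in range(len(longer) - n + 1))


generated = "C4 E4 G4"
training = "A3 C4 E4 G4 B4"
print(ratio(generated, training))                 # 73: penalized for the extra notes
print(partial_ratio_sketch(generated, training))  # 100: the phrase appears verbatim
# A note inserted in the middle lowers the score but does not erase the match
print(partial_ratio_sketch(generated, "C4 D4 E4 G4"))  # 88
```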

I will present them in descending order.

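```python
similarity_scores = dict()
for (comp_pitch, comp_offset, comp_duration, name) in zip(comp_pitches, comp_offsets, comp_durations, comp_names):
    # Each partial ratio is a 0-100 score; the song's score is their equal-weight mean
    pitch_score = fuzz.partial_ratio(input_pitches, comp_pitch)
    offset_score = fuzz.partial_ratio(input_offsets, comp_offset)
    duration_score = fuzz.partial_ratio(input_durations, comp_duration)
    similarity_scores[name] = round((pitch_score + offset_score + duration_score) / 3, 2)
```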

```python
import operator

# Sort songs by similarity score, highest first
sorted_similarity_scores = sorted(similarity_scores.items(), key=operator.itemgetter(1), reverse=True)
print('Similarity Score (%)  -      Song Name')
for name, score in sorted_similarity_scores:
    print(f'{score:.2f}                -     {name}')
```

```
Similarity Score (%)  -      Song Name
6.67                -     Pokemon GoldSilverCrystal - Goldenrod City.mid
6.33                -     Pokemon HeartGoldSoulSilver - Game Corner.mid
5.67                -     Pokemon GoldSilverCrystal - Saffron City.mid
3.67                -     Pokemon HeartGoldSoulSilver - Safari Zone Gate.mid
2.00                -     Pokemon Gold, Silver, Crystal - Cinnabar Island (HGSS Version).mid
2.00                -     Pokemon GoldSilverCrystal - Champion Battle.mid
2.00                -     Pokemon GoldSilverCrystal - Girl Trainer Confrontation.mid
2.00                -     Pokemon GoldSilverCrystal - Sprout Tower.mid
2.00                -     Pokemon HeartGoldSoulSilver - Cinnabar Island.mid
1.67                -     Pokemon GoldSilverCrystal - Gym.mid
1.67                -     Pokemon GoldSilverCrystal - Lavender Town.mid
1.67                -     Pokemon GoldSilverCrystal - Pokemon March.mid
1.67                -     Pokemon GoldSilverCrystal - Victory Roa
```

## Conclusion

As can be seen here, the generated song barely references most of the training songs: the highest score in this run is 6.67%, and the similarity caps out at around 15%.