# Instrument identification granularity

### Attested labels

Instrument categories (classes) attested in the current Cytomine dataset are listed in `classes.txt`:

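```python
# Parse classes.txt; expected line format: "idx,Instrument name (MIMO code)"
items = []
with open('classes.txt') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        # Split on the first comma only, so instrument names may contain commas
        idx, rest = line.split(',', 1)
        # Split on the last '(' so names containing parentheses stay intact
        instrument, code = [part.strip() for part in rest.rsplit('(', 1)]
        code = code.rstrip(')').strip()
        items.append((idx.strip(), instrument, code))
```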
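The split-based parser above assumes that each line of `classes.txt` has the form `index,Instrument name (MIMO code)`. Here is a slightly stricter sketch under that same assumption. It uses a regular expression so that a malformed line raises an error instead of being split in the wrong place. It also assumes numeric MIMO codes, as in every row shown here. `parse_class_line` is a hypothetical helper and is not part of the original notebook.

```python
import re

# Assumed line format: "<index>,<instrument name> (<numeric MIMO code>)", e.g. "0,Cornett (3868)".
# The name may contain spaces, commas or parentheses; the code is the last parenthesised group.
LINE_RE = re.compile(r'^\s*(\d+)\s*,\s*(.+?)\s*\(\s*(\d+)\s*\)\s*$')

def parse_class_line(line):
    """Return (idx, instrument, code) as strings, or None for a blank line."""
    line = line.strip()
    if not line:
        return None
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f'unexpected line in classes.txt: {line!r}')
    return m.group(1), m.group(2), m.group(3)

print(parse_class_line('0, Cornett (3868)'))
print(parse_class_line('5,Bladder pipe (3748)'))
```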

We store the parsed classes as a CSV file, indexed by class number:

```python
import pandas as pd

df = pd.DataFrame(items, columns=('index', 'instrument', 'mimo'))
df = df.set_index('index')
df.to_csv('classes.csv')
df.head(10)
```

| index | instrument | mimo |
|---|---|---|
| 0 | Cornett | 3868 |
| 1 | Viol | 3597 |
| 2 | Galoubet | 3970 |
| 3 | Clarinet | 3836 |
| 4 | Aulos | 4173 |
| 5 | Bladder pipe | 3748 |
| 6 | Fiddle | 3142 |
| 7 | Lirone | 3183 |
| 8 | Horn | 4118 |
| 9 | Mandolin | 3510 |


```python
# Reload the saved CSV, using the 'index' column as the row index
df = pd.read_csv('classes.csv', index_col='index')
df.head(10)
```

| index | instrument | mimo |
|---|---|---|
| 0 | Cornett | 3868 |
| 1 | Viol | 3597 |
| 2 | Galoubet | 3970 |
| 3 | Clarinet | 3836 |
| 4 | Aulos | 4173 |
| 5 | Bladder pipe | 3748 |
| 6 | Fiddle | 3142 |
| 7 | Lirone | 3183 |
| 8 | Horn | 4118 |
| 9 | Mandolin | 3510 |


### MIMO thesaurus: three-level hierarchy

We use the MIMO thesaurus, distributed as a spreadsheet in the MIMO GitHub repository: [MIMO_Thesaurus.xlsx](https://github.com/philharmoniedeparis/mimo/blob/master/harvesting/Docs/MIMO_Thesaurus.xlsx).

```python
# Reading .xlsx files requires an Excel engine such as openpyxl
df = pd.read_excel('MIMO_Thesaurus.xlsx')
df.head(20)
```

| | Identifier | Level_1 | Level_2 | Level_3 | Synonyms | YOUR LANGUAGE | Synonyms (YOUR LANGUAGE) | H&S link | Nb of instruments | Définition | Original language | URI_DBPEDIA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LEXICON_00002208 | Electronic instruments | Electronic instruments | Electronic instruments | | | | | 0 | | | |
| 1 | LEXICON_00002209 | Electronic instruments | Electronic | Electronic | | | | LEXICON_00006180 | 0 | | | |
| 2 | LEXICON_00006617 | Electronic instruments | Electronic | Accord guitar | | | | | 0 | | | |
| 3 | LEXICON_00006618 | Electronic instruments | Electronic | Accordion-synthesizer | | | | | 3 | | | |
| 4 | LEXICON_00006619 | Electronic instruments | Electronic | Assemblage of electronic modules | | | | LEXICON_00006185 | 0 | | | |
| 5 | LEXICON_00002210 | Electronic instruments | Electronic | Beatbox | | | | | 4 | | | |
| 6 | LEXICON_00002211 | Electronic instruments | Electronic | Clavioline | | | | LEXICON_00006169 | 3 | | French | |
| 7 | LEXICON_00005903 | Electronic instruments | Electronic | Croix sonore | Sound cross | | | | 1 | | | |
| 8 | LEXICON_00002212 | Electronic instruments | Electronic | Gmebaphone | | | | | 1 | | French | |
| 9 | LEXICON_00002213 | Electronic instruments | Electronic | Gmebogosse | | | | | 0 | | French | |
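Each row places one Level_3 term under a Level_2 category and a Level_1 category. To make that three-level structure explicit, the rows can be folded into a nested dictionary, as in the sketch below. The toy frame copies a few rows from the preview above, and `build_tree` is a hypothetical helper.

```python
from collections import defaultdict
import pandas as pd

# A few rows copied from the thesaurus preview above.
rows = pd.DataFrame({
    'Level_1': ['Electronic instruments'] * 4,
    'Level_2': ['Electronic instruments', 'Electronic', 'Electronic', 'Electronic'],
    'Level_3': ['Electronic instruments', 'Electronic', 'Beatbox', 'Clavioline'],
})

def build_tree(df):
    """Fold (Level_1, Level_2, Level_3) rows into {level1: {level2: [level3, ...]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for l1, l2, l3 in zip(df['Level_1'], df['Level_2'], df['Level_3']):
        tree[l1][l2].append(l3)
    # Convert the nested defaultdicts to plain dicts for printing and comparison.
    return {l1: dict(sub) for l1, sub in tree.items()}

tree = build_tree(rows)
print(tree['Electronic instruments']['Electronic'])  # ['Electronic', 'Beatbox', 'Clavioline']
```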


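The `Oracle` class indexes the thesaurus by its Level_3 term. It converts an instrument name to its Level_1 category or its Level_1 > Level_2 path; at level 3 it returns the name unchanged: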

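```python
import pandas as pd


class Oracle:
    """Map a MIMO Level_3 term to its Level_1 / Level_2 ancestors."""

    def __init__(self, path='MIMO_Thesaurus.xlsx'):
        df = pd.read_excel(path)
        self.lookup = {}
        for l1, l2, l3 in zip(df['Level_1'], df['Level_2'], df['Level_3']):
            # Skip incomplete rows (empty cells are read as NaN)
            if any(pd.isna(x) for x in (l1, l2, l3)):
                continue
            # Collapse runs of whitespace so keys match cleanly typed names
            l1, l2, l3 = (' '.join(str(x).split()) for x in (l1, l2, l3))
            # Keyed on Level_3 only: a term listed twice keeps its last parents
            self.lookup[l3] = (l1, l2)

    def convert(self, q, level=2):
        """Return `q` at the given level (1, 2 or 3), or None if `q` is not a Level_3 term."""
        assert level in (1, 2, 3)
        if q not in self.lookup:
            print(f'-> "{q}" not found')
            return None
        if level == 3:
            return q
        l1, l2 = self.lookup[q]
        if level == 2:
            return ' > '.join((l1, l2))
        return l1  # level == 1
```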
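`lookup` is keyed on the Level_3 term alone. If a term were listed under two different parents, only the last one read would survive. The sketch below is a quick check that lists any such ambiguous terms. It runs on a toy frame whose names (`term-a`, `term-b`, `Group A`, ...) are hypothetical placeholders, not thesaurus entries. On the real spreadsheet, the same whitespace cleaning as in `Oracle` would need to be applied first.

```python
from collections import defaultdict
import pandas as pd

def ambiguous_terms(df):
    """Return {Level_3 term: sorted distinct (Level_1, Level_2) parents} for terms with >1 parent."""
    parents = defaultdict(set)
    for l1, l2, l3 in zip(df['Level_1'], df['Level_2'], df['Level_3']):
        parents[l3].add((l1, l2))
    return {term: sorted(p) for term, p in parents.items() if len(p) > 1}

# Hypothetical toy rows: 'term-b' appears under two different Level_2 parents.
toy = pd.DataFrame({
    'Level_1': ['Family 1', 'Family 1', 'Family 1'],
    'Level_2': ['Group A', 'Group A', 'Group B'],
    'Level_3': ['term-a', 'term-b', 'term-b'],
})
print(ambiguous_terms(toy))
```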
        

```python
oracle = Oracle()
print(oracle.convert('Pianoline'))
print(oracle.convert('Pianoline', level=1))
print(oracle.convert('Pianoline', level=2))
print(oracle.convert('Pianoline', level=3))
```

```
Electronic instruments > Electronic
Electronic instruments
Electronic instruments > Electronic
Pianoline
```
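`convert` only accepts the exact Level_3 spelling and prints a warning otherwise. When an attested label is not found, listing close spellings from the thesaurus can help. The sketch below does this with the standard-library `difflib.get_close_matches`. `suggest` is a hypothetical helper, and the short term list stands in for `oracle.lookup.keys()`.

```python
import difflib

def suggest(query, known_terms, n=3, cutoff=0.6):
    """Return up to n known terms that closely resemble `query` (case-insensitive)."""
    lowered = {t.lower(): t for t in known_terms}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[m] for m in matches]

# In the notebook, known_terms would be oracle.lookup.keys().
terms = ['Pianoline', 'Clavioline', 'Beatbox', 'Gmebaphone']
print(suggest('pianolin', terms))
```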


### Application

```python
oracle = Oracle()
attested = pd.read_csv('classes.csv')['instrument'].tolist()
```
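To gauge how coarse each level is, one can count how many attested classes collapse onto each Level_2 category. The sketch below does this with `collections.Counter`. A plain dict stands in for `oracle.convert(name, level=2)`; its few mappings are copied from the Level_2 results in this section. `group_sizes` is a hypothetical helper. In the notebook it would be called as `group_sizes(attested, lambda x: oracle.convert(x, level=2))`.

```python
from collections import Counter

# Stand-in for oracle.convert(name, level=2): mappings copied from this section's results.
level2 = {
    'Galoubet': 'Wind instruments > Flutes',
    'Panpipe': 'Wind instruments > Flutes',
    'Fiddle': 'Stringed instruments > Fiddles',
    'Lirone': 'Stringed instruments > Fiddles',
    'Bladder pipe': 'Wind instruments > Bagpipes',
    'Bagpipe': 'Wind instruments > Bagpipes',
    'Organ': 'Keyboard instruments > Organs',
}

def group_sizes(labels, convert):
    """Count how many labels map onto each coarser category (None counts unmatched labels)."""
    return Counter(convert(label) for label in labels)

counts = group_sizes(level2, level2.get)
for category, n in counts.most_common():
    print(f'{n}  {category}')
```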

```python
# Print the Level_1 and Level_2 category of every attested class
for instrument in attested:
    print(instrument, '> 1:', oracle.convert(instrument, level=1), ' 2:', oracle.convert(instrument, level=2))
```

Cornett > 1: Wind instruments  2: Wind instruments > Cornetts
Viol > 1: Stringed instruments  2: Stringed instruments > Viols
Galoubet > 1: Wind instruments  2: Wind instruments > Flutes
Clarinet > 1: Wind instruments  2: Wind instruments > Clarinets
Aulos > 1: Wind instruments  2: Wind instruments > Oboes
Bladder pipe > 1: Wind instruments  2: Wind instruments > Bagpipes
Fiddle > 1: Stringed instruments  2: Stringed instruments > Fiddles
Lirone > 1: Stringed instruments  2: Stringed instruments > Fiddles
Horn > 1: Wind instruments  2: Wind instruments > Horns
Mandolin > 1: Stringed instruments  2: Stringed instruments > Mandolins
Rattle > 1: Percussion instruments  2: Percussion instruments > Rattles
Bagpipe > 1: Wind instruments  2: Wind instruments > Bagpipes
Panpipe > 1: Wind instruments  2: Wind instruments > Flutes
Organ > 1: Keyboard instruments  2: Keyboard instruments > Organs
Drum > 1: Percussion instruments  2: Percussion instruments > Drums
Percussion instruments > 1: Percu