# Quick `ms3` reference

## To run this notebook

* install ms3 (`pip install ms3`)
* set `DATA_PATH` to the directory in which the folder `dcml_corpora`, which contains the data, should be created

Read about {ref}`Keys and IDs <keys_and_ids>`.

In [1]:
DATA_PATH = '~'

## Setup

In [2]:
import os
import ms3
from git import Repo

corpora_path = os.path.join(os.path.expanduser(DATA_PATH), 'dcml_corpora')
if os.path.isdir(corpora_path):
    repo = Repo(corpora_path)
else:
    repo = Repo.clone_from(url='https://github.com/DCMLab/dcml_corpora.git',
                           to_path=corpora_path,
                           multi_options=['--recurse-submodules', '--shallow-submodules'])
print(f"dcml_corpora @ commit {repo.commit().hexsha}")

dcml_corpora @ commit 9dcde40cba36d31b900ff12852cc557b8cca8221
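
As a minimal sketch of the path handling in the setup cell above: `os.path.expanduser` replaces a leading `~` with the current user's home directory, so the default `DATA_PATH = '~'` places `dcml_corpora` directly in the home folder. The helper name `resolve_corpora_path` is hypothetical and only serves this illustration.

```python
import os


def resolve_corpora_path(data_path: str, folder: str = "dcml_corpora") -> str:
    """Expand a leading '~' in data_path and append the corpus folder name."""
    return os.path.join(os.path.expanduser(data_path), folder)


# With the default DATA_PATH the folder ends up in the user's home directory.
print(resolve_corpora_path("~"))
# A path without '~' is passed through unchanged before the folder is appended.
print(resolve_corpora_path("/tmp/data"))
```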


## Parsing multiple scores at once

### The Corpus object

Scores often come grouped into a corpus, so when we want to parse multiple scores, we create a [Corpus](Corpus) object and pass it the directory containing the scores. `ms3` will scan the directory and discover all scores and TSV files that can potentially be parsed:

In [19]:
tchaikovsky_path = os.path.join(corpora_path, 'tchaikovsky_seasons')
corpus = ms3.Corpus(tchaikovsky_path)
corpus

[default|all]
Corpus 'tchaikovsky_seasons'
----------------------------
Location: /home/hentsche/dcml_corpora/tchaikovsky_seasons
View: This view is called 'default'. It 
	- excludes fnames that are not contained in the metadata,
	- filters out file extensions requiring conversion (such as .xml), and
	- excludes review files and folders.

All 12 pieces are listed in 'metadata.tsv':

          scores measures    notes expanded   events   chords
        detected detected detected detected detected detected
op37a01        1        1        1        1        1        1
op37a02        1        1        1        1        1        1
op37a03        1        1        1        1        1        1
op37a04        1        1        1        1        1        1
op37a05        1        1        1        1        1        1
op37a06        1        1        1        1        1        1
op37a07        1        1        1        1        1        1
op37a08        1        1        1        1   
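
To make the idea of file discovery more concrete, here is a rough standard-library sketch of what scanning a directory and grouping files by piece name could look like. It is not how `ms3` implements it. The function name `discover_files`, the folder layout, and the choice of `.mscx` and `.tsv` as the only parseable extensions are assumptions made for this illustration; as in the default view, a `.xml` file is skipped because it would require conversion.

```python
import os
import tempfile

# Assumption for this illustration only: '.mscx' marks a score and '.tsv' an
# extracted facet; ms3 itself recognizes more formats than these two.
PARSEABLE_EXTENSIONS = {".mscx", ".tsv"}


def discover_files(directory: str) -> dict:
    """Group parseable files below `directory` by file name without extension."""
    pieces = {}
    for root, _dirs, files in os.walk(directory):
        for file_name in sorted(files):
            name, ext = os.path.splitext(file_name)
            if ext.lower() in PARSEABLE_EXTENSIONS:
                pieces.setdefault(name, []).append(os.path.join(root, file_name))
    return pieces


# Build a tiny throwaway corpus to try the function on.
with tempfile.TemporaryDirectory() as tmp:
    for rel_path in ("MS3/op37a01.mscx", "notes/op37a01.tsv", "MS3/op37a02.xml"):
        full_path = os.path.join(tmp, rel_path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        open(full_path, "w").close()
    found = discover_files(tmp)
    print({name: len(paths) for name, paths in found.items()})  # {'op37a01': 2}
```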

When inspecting this object, 

In [21]:
corpora_path = '~/corelli'
corpora = ms3.Parse(corpora_path, level='c')
corpora

[default|all]
All corpora
-----------
View: This view is called 'default'. It 
	- excludes fnames that are not contained in the metadata,
	- filters out file extensions requiring conversion (such as .xml), and
	- excludes review files and folders.

             has   active   scores measures    notes expanded
        metadata     view detected detected detected detected
corpus                                                       
corelli      yes  default      149      149      149      149

452/1197 files are excluded from this view.

447 files have been excluded based on their subdir.
5 files have been excluded based on their file name.

**From here we can use the methods**

* [parse_scores()](Parse.parse_scores()) to parse all detected scores,
* [parse_tsv()](Parse.parse_tsv()) to parse all detected TSV files (previously extracted from scores),
* [parse()](Parse.parse()) to parse everything.

In [11]:
corpora.parse_scores()
corpora

[gunthamund|default|all]
All corpora
-----------
View: This view is called 'gunthamund'. It 
	- filters out file extensions requiring conversion (such as .xml), and
	- excludes review files and folders.

               has      active   scores        measures    notes   labels expanded   events
          metadata        view detected parsed detected detected detected detected detected
corpus                                                                                     
docs            no  gunthamund        4      4        0        0        0        0        0
new_tests       no  gunthamund        0      0        0        0        0        0        0
old_tests       no  gunthamund       10     10       21       21        8        4        7

4/93 files are excluded from this view.

4 files have been excluded based on their file name.


There are 3 orphans that could not be attributed to any of the respective corpus's fnames.

**Now we can extract the facets we need from the parsed scores, e.g. information on all measures from all scores:**

In [16]:
corpora.get_facet('measures')

corpus,fname,i,mc,mn,quarterbeats,duration_qb,keysig,timesig,act_dur,mc_offset,numbering_offset,dont_count,barline,breaks,repeats,next,quarterbeats_all_endings,volta,markers,jump_bwd,jump_fwd,play_until
docs,cujus,0,1,1,0,1.5,-3,3/8,3/8,0,,,,,firstMeasure,"(2,)",,,,,,
docs,cujus,1,2,2,3/2,1.5,-3,3/8,3/8,0,,,,,,"(3,)",,,,,,
docs,cujus,2,3,3,3,1.5,-3,3/8,3/8,0,,,,,,"(4,)",,,,,,
docs,cujus,3,4,4,9/2,1.5,-3,3/8,3/8,0,,,,,,"(5,)",,,,,,
docs,cujus,4,5,5,6,1.5,-3,3/8,3/8,0,,,,,,"(6,)",,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
old_tests,stabat_03_coloured,21,22,22,84,4.0,-2,4/4,1,0,,,,line,,"(23,)",,,,,,
old_tests,stabat_03_coloured,22,23,23,88,4.0,-2,4/4,1,0,,,,,,"(24,)",,,,,,
old_tests,stabat_03_coloured,23,24,24,92,4.0,-2,4/4,1,0,,,,,,"(25,)",,,,,,
old_tests,stabat_03_coloured,24,25,25,96,4.0,-2,4/4,1,0,,,,,,"(26,)",,,,,,
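
The `quarterbeats` and `duration_qb` columns can be related with a little arithmetic. Reading the rows above, `quarterbeats` appears to be each measure's offset from the beginning of the piece, counted in quarter notes, so it is the running sum of the preceding `duration_qb` values. The sketch below reproduces the first five values for `cujus` from the time signature alone. The helper names are hypothetical, and the sketch assumes complete measures without repeats or voltas.

```python
from fractions import Fraction
from itertools import accumulate


def measure_length_qb(timesig: str) -> Fraction:
    """Length of a full measure in quarter notes, e.g. '3/8' -> 3/2."""
    numerator, denominator = (int(part) for part in timesig.split("/"))
    return Fraction(numerator, denominator) * 4


def measure_offsets(timesigs):
    """Quarter-note offset of each measure, assuming every measure is complete."""
    lengths = [measure_length_qb(ts) for ts in timesigs]
    # The first measure starts at 0; each later one starts where the previous ended.
    return [Fraction(0)] + list(accumulate(lengths))[:-1]


offsets = measure_offsets(["3/8"] * 5)
print([str(offset) for offset in offsets])  # ['0', '3/2', '3', '9/2', '6']
```

The same reading fits the 4/4 rows: measure count 22 of `stabat_03_coloured` starts at 21 × 4 = 84 quarter notes.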


**Or we iterate through the corpora and print information on the first 10 notes:**

In [13]:
for corpus_name, corpus_object in corpora:
    print(f"First ten notes of {corpus_name}:")
    display(corpus_object.get_facet('notes').iloc[:10])

First ten notes of docs:


fname,notes_i,mc,mn,quarterbeats,duration_qb,mc_onset,mn_onset,timesig,staff,voice,duration,gracenote,nominal_duration,scalar,tied,tpc,midi,name,octave,chord_id
cujus,0,1,1,0,1.0,0,0,3/8,3,1,1/4,,1/4,1,,0,48,C3,3,5
cujus,1,1,1,0,1.0,0,0,3/8,2,2,1/4,,1/4,1,,-3,63,Eb4,4,2
cujus,2,1,1,0,1.5,0,0,3/8,2,3,3/8,,1/4,3/2,,1,67,G4,4,4
cujus,3,1,1,0,0.5,0,0,3/8,2,1,1/8,,1/8,1,,0,72,C5,5,0
cujus,4,1,1,1/2,1.0,1/8,1/8,3/8,2,1,1/4,,1/4,1,,2,74,D5,5,1
cujus,5,1,1,1,0.5,1/4,1/4,3/8,3,1,1/8,,1/8,1,,5,47,B2,2,6
cujus,6,1,1,1,0.5,1/4,1/4,3/8,2,2,1/8,,1/8,1,,2,62,D4,4,3
cujus,7,2,2,3/2,1.0,0,0,3/8,3,2,1/4,,1/4,1,,0,48,C3,3,12
cujus,8,2,2,3/2,1.0,0,0,3/8,3,1,1/4,,1/4,1,,0,60,C4,4,10
cujus,9,2,2,3/2,1.5,0,0,3/8,2,2,3/8,,1/4,3/2,1.0,1,67,G4,4,9


First ten notes of new_tests:


First ten notes of old_tests:


fname,notes_i,mc,mn,quarterbeats,duration_qb,mc_onset,mn_onset,timesig,staff,voice,duration,...,tremolo,nominal_duration,scalar,tied,tpc,midi,name,octave,chord_id,volta
05_symph_fant,0,1,1,0,1.0,0,0,4/4,28,1,1/4,...,1/4_r32_0,1/4,1,,10,70,A#4,4,30,
05_symph_fant,1,1,1,0,1.0,0,0,4/4,27,1,1/4,...,1/4_r32_0,1/4,1,,7,73,C#5,5,26,
05_symph_fant,2,1,1,0,1.0,0,0,4/4,26,1,1/4,...,1/4_r32_0,1/4,1,,4,76,E5,5,22,
05_symph_fant,3,1,1,0,1.0,0,0,4/4,25,1,1/4,...,1/4_r32_0,1/4,1,,1,79,G5,5,18,
05_symph_fant,4,1,1,0,1.0,0,0,4/4,24,1,1/4,...,1/4_r32_0,1/4,1,,10,82,A#5,5,14,
05_symph_fant,5,1,1,0,1.0,0,0,4/4,23,1,1/4,...,1/4_r32_0,1/4,1,,7,85,C#6,6,10,
05_symph_fant,6,1,1,0,1.0,0,0,4/4,22,1,1/4,...,1/4_r32_0,1/4,1,,4,88,E6,6,6,
05_symph_fant,7,1,1,0,1.0,0,0,4/4,21,1,1/4,...,1/4_r32_0,1/4,1,,1,91,G6,6,2,
05_symph_fant,8,1,1,1,1.0,1/4,1/4,4/4,28,1,1/4,...,1/4_r32_0,1/4,1,,10,70,A#4,4,31,
05_symph_fant,9,1,1,1,1.0,1/4,1/4,4/4,27,1,1/4,...,1/4_r32_0,1/4,1,,7,73,C#5,5,27,
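
The `tpc`, `midi`, `name`, and `octave` columns are related: `tpc` looks like a position on the line of fifths with C = 0 (G = 1, F = -1, and so on), and `name` combines the resulting note name with the octave. The following sketch reproduces the names in the rows above under that reading. The function names are hypothetical, and deriving the octave from the MIDI number is an assumption that holds for these rows but not for spellings that cross an octave boundary, such as B# or Cb.

```python
LINE_OF_FIFTHS = "FCGDAEB"


def tpc_to_name(tpc: int) -> str:
    """Spell a tonal pitch class (line of fifths, C = 0) as a note name."""
    step = LINE_OF_FIFTHS[(tpc + 1) % 7]
    accidentals = (tpc + 1) // 7  # > 0: number of sharps, < 0: number of flats
    return step + ("#" * accidentals if accidentals > 0 else "b" * -accidentals)


def note_name(tpc: int, midi: int) -> str:
    """Combine the spelled name with the octave implied by the MIDI pitch (60 = C4)."""
    return f"{tpc_to_name(tpc)}{midi // 12 - 1}"


# (tpc, midi) pairs taken from the rows above
for tpc, midi in [(0, 48), (-3, 63), (5, 47), (10, 70), (7, 73)]:
    print(tpc, midi, note_name(tpc, midi))
```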


**The available facets are `'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'`.
We can request several at the same time:**

In [14]:
corpora.get_facets(['labels', 'chords'])

corpus,fname,facet,i,mc,mn,quarterbeats,duration_qb,mc_onset,mn_onset,timesig,staff,voice,harmony_layer,...,pedal,Pedal_<sym>keyboardPedalPed</sym>,volta,Ottava:8va,Ottava:15mb,color,color_a,color_b,color_g,color_r
docs,cujus,labels,0,1,1,0,0.5,0,0,3/8,2,1,1,...,,,,,,,,,,
docs,cujus,labels,1,1,1,1/2,0.5,1/8,1/8,3/8,2,1,1,...,,,,,,,,,,
docs,cujus,labels,2,1,1,1,0.5,1/4,1/4,3/8,2,1,1,...,,,,,,,,,,
docs,cujus,labels,3,2,2,3/2,0.5,0,0,3/8,2,1,1,...,,,,,,,,,,
docs,cujus,labels,4,2,2,2,0.5,1/8,1/8,3/8,2,1,1,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
old_tests,stabat_03_coloured,chords,578,26,26,101,1.0,1/4,1/4,4/4,3,1,,...,,,,,,,,,,
old_tests,stabat_03_coloured,chords,579,26,26,100,1.0,0,0,4/4,4,1,,...,,,,,,,,,,
old_tests,stabat_03_coloured,chords,580,26,26,102,0.0,1/2,1/2,4/4,4,1,,...,,,,,,,,,,
old_tests,stabat_03_coloured,chords,581,26,26,100,1.0,0,0,4/4,4,2,,...,,,,,,,,,,
