# Welcome to the Jupyter server for MUSI 986

Click the "Run" button above and see what happens. We'll take it from there in class.

In [18]:
import chant

import pandas as pd

from IPython.display import display, HTML

# LMLO Basics

The Late Medieval Offices Corpus is an electronic corpus curated by Andrew Hughes that is no longer accessible in its original form. The `chant` module (imported by the first line in this notebook) provides convenient access to the LMLO data using the `pandas` data infrastructure (imported by the second line). Don't worry about the third line for now.

### `cd = chant.chantData`

The `chant` module provides two data tables. The first, formally known as `chant.chantData` and `cd` for short, contains one row for each of the nearly 6000 chants in the corpus. The columns correspond to data attributes for each chant: its source, mode, text, and liturgical metadata (office, service, ordinal, genre), as well as Hughes's original encoding and an intermediate encoding (volpiano) used for notation and searching.


In [19]:
chant.chantData.to_pickle('chantData.zip')
chant.noteData.to_pickle('noteData.zip')
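The cell above writes both tables to disk. As a minimal sketch of the round trip, assuming a small invented table in place of `chant.chantData` (the names `toy` and `toyData.zip` are hypothetical), this is how a pickled table can be read back:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for chant.chantData; the real table has far more rows and columns.
toy = pd.DataFrame({"mode": ["1", "2", "8"], "text": ["Ave", "Salve", "Gaude"]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toyData.zip"
    toy.to_pickle(path)              # compression is inferred from the .zip suffix
    restored = pd.read_pickle(path)  # read_pickle infers it the same way

# True when the restored table matches the original
print(restored.equals(toy))
```

Pickle files are convenient for saving a table exactly as it is, but they should only be opened when they come from a source you trust.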


In [None]:
cd = chant.chantData


pd.set_option('display.max_rows', 6)
display(cd)

### `nd = chant.noteData`
The second data table, which is much longer, contains one row for each note in each chant. The output below should give you an indication of how long this table is. We'll go over its data attributes in class.

Try changing the number passed to `pd.set_option` in the cell below. Re-run the cell by pressing Shift-Enter or clicking the Run button above. What's different?

In [None]:
nd = chant.noteData

pd.set_option('display.max_rows', 20)
display(nd)
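If you want to experiment with the `display.max_rows` option without changing it for the rest of the notebook, here is a small sketch on an invented 100-row table (`toy` is a hypothetical stand-in for `nd`). It uses `pd.option_context`, which applies a setting only inside the `with` block:

```python
import pandas as pd

# Hypothetical stand-in for chant.noteData: 100 invented rows.
toy = pd.DataFrame({"note": range(100), "pitch": ["c"] * 100})

# Inside this block, tables longer than 6 rows are printed in truncated form.
with pd.option_context("display.max_rows", 6):
    short = repr(toy)

# None removes the limit, so every row is printed.
with pd.option_context("display.max_rows", None):
    full = repr(toy)

print(len(short.splitlines()), "lines versus", len(full.splitlines()))
```

Unlike `pd.set_option`, nothing here carries over to later cells.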


# Some basic descriptive analysis

What's the highest number in the `word` column? Add 1 to it and you have the answer to the question "what's the greatest number of words any chant in the corpus has?" (Why do we have to add 1?)

In [None]:
max(nd.word)

What's the highest number in the `syll` column? Suppose we add 1 to it: what question is this [theoretically] answering?

In [None]:
max(nd.syll)

Same question for `note`.

In [None]:
max(nd.note)
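The three cells above use Python's built-in `max`. pandas columns also have their own `.max()` method, which can be combined with `groupby` to get a maximum per group. A sketch on invented data follows; the `chant` column used for grouping is a hypothetical name, not necessarily what the real table calls it:

```python
import pandas as pd

# Hypothetical miniature of chant.noteData; all values are invented.
toy = pd.DataFrame({
    "chant": [0, 0, 0, 1, 1],
    "word":  [0, 0, 1, 0, 1],
    "note":  [0, 1, 0, 0, 0],
})

builtin_max = max(toy.word)   # Python's built-in max, as in the cells above
method_max = toy.word.max()   # the equivalent pandas Series method

# Highest value of `word` within each (hypothetical) chant
per_chant = toy.groupby("chant")["word"].max()

print(builtin_max, method_max)
print(per_chant)
```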

# A thing about Jupyter 

Compare the output here:

In [None]:
max(nd.note)
max(nd.syll)
max(nd.word)

In [None]:
display(max(nd.note))
display(max(nd.syll))
display(max(nd.word))

Now compare it with the output of these cells:

In [None]:
display(max(nd.note))
display(max(nd.syll))
max(nd.word)

In [None]:
display(max(nd.note))
max(nd.syll)
display(max(nd.word))

In [None]:
max(nd.note)
display(max(nd.syll))
display(max(nd.word))

Can you figure out what's going on here? What patterns do you notice?

# More stuff to play with

In [None]:
pd.set_option('display.max_rows', 500)
nd.loc[nd['syll'] >= 8]

In [None]:
nd.loc[nd['text'] == 'edificabuntur']
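The two cells above select rows by putting a condition inside `.loc[...]`. To see what the condition does on its own, here is a sketch on a tiny invented table (the values are made up; only the column names `text` and `syll` come from the cells above):

```python
import pandas as pd

# Hypothetical miniature table; the values are invented.
toy = pd.DataFrame({
    "text": ["a", "ve", "ma", "ri", "a"],
    "syll": [0, 1, 0, 1, 2],
})

mask = toy["syll"] >= 1   # a boolean Series: one True/False per row
print(mask.tolist())

selected = toy.loc[mask]  # keeps only the rows where the mask is True
print(selected)

# Conditions combine with & (and) and | (or); each one needs its own parentheses.
both = toy.loc[(toy["syll"] >= 1) & (toy["text"] == "ve")]
print(both)
```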

In [None]:
display(pd.crosstab(cd.Service, cd.subcorpus, margins=True))
display(pd.crosstab(cd.Service, cd.subcorpus, normalize='columns', margins=True))
display(pd.crosstab(cd.Service, cd.subcorpus, normalize='index', margins=True))
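The three tables above differ only in the `normalize` argument. A sketch on invented data shows what each setting does (the service names and subcorpus labels here are made up, and `margins` is left off the normalized tables to keep them small):

```python
import pandas as pd

# Hypothetical miniature of chant.chantData; the values are invented.
toy = pd.DataFrame({
    "Service":   ["Matins", "Matins", "Lauds", "Vespers", "Lauds", "Matins"],
    "subcorpus": ["A", "B", "A", "A", "B", "A"],
})

# Raw counts, with an "All" row and column holding the totals
counts = pd.crosstab(toy.Service, toy.subcorpus, margins=True)

# Each column rescaled so that it sums to 1: the share of each service within a subcorpus
by_column = pd.crosstab(toy.Service, toy.subcorpus, normalize="columns")

# Each row rescaled so that it sums to 1: the share of each subcorpus within a service
by_row = pd.crosstab(toy.Service, toy.subcorpus, normalize="index")

print(counts)
print(by_column)
print(by_row)
```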

In [None]:
display(pd.crosstab(cd.service, cd.Service, margins=True))
display(pd.crosstab(cd.genre, cd.Genre, margins=True))
display(pd.crosstab(cd.Modus, cd.subcorpus))