# Computing "vertical variation"

This notebook explores a fundamental concept in the study of Homeric oral poetry: *vertical variation*.  *Vertical* variation refers to the presence or absence of entire hexameters in different versions of the poem.  (*Horizontal* variation refers to differences within a single hexameter in different versions of the poem.)
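To make the distinction concrete, here is a minimal sketch in plain Scala. The two toy "versions" and their texts are invented for illustration and are not taken from any manuscript; each maps a line reference to the text of that line.

```scala
// Toy data: each version maps a line reference to its text.
// References and texts are invented for illustration only.
val versionA: Map[String, String] = Map(
  "1.1" -> "first line",
  "1.2" -> "second line",
  "1.3" -> "third line"
)
val versionB: Map[String, String] = Map(
  "1.1" -> "first line",
  "1.3" -> "third line, reworded"
)

// Vertical variation: whole lines present in one version but not the other.
val onlyInA: Set[String] = versionA.keySet.diff(versionB.keySet)
val onlyInB: Set[String] = versionB.keySet.diff(versionA.keySet)

// Horizontal variation: lines present in both versions whose text differs.
val differingText: Set[String] =
  versionA.keySet
    .intersect(versionB.keySet)
    .filter(ref => versionA(ref) != versionB(ref))
```

In this toy case, line `1.2` is a vertical variant (it exists only in `versionA`), while line `1.3` is a horizontal variant (both versions have it, with different wording).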

We'll take advantage of some specialized code libraries to approach this problem.




## Importing  libraries

The following cell configures this Jupyter notebook to look for libraries, in addition to the standard locations, in a personal Maven repository hosted on `bintray.com`.

In [1]:
val personalRepo = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(personalRepo)


personalRepo: coursierapi.MavenRepository = MavenRepository(https://dl.bintray.com/neelsmith/maven)

The next cell configures the notebook to look in our code repositories for version `4.3.0` of the first library we want to use.

In [2]:
import $ivy.`edu.holycross.shot.cite::xcite:4.3.0`

import $ivy.$

### `cite` for URNs

The following cell is a standard, generic Scala statement for making a library available to our program.

Reading the `import` statement below from right to left, we're importing *all* classes (that's the "fill-in-the-blank" notation `_`) from the `cite` package belonging to the organization `edu.holycross.shot`.

In [3]:
import edu.holycross.shot.cite._

import edu.holycross.shot.cite._

Now we can use classes defined in the `cite` library.

In the following cell, identify the *type* (class) and *value* of each expression.

In [4]:
val iliadLine = CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.1")
iliadLine.textGroup
iliadLine.work
iliadLine.passageComponent
val venetusA = iliadLine.dropPassage

iliadLine: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.1")
res3_1: String = "tlg0012"
res3_2: String = "tlg001"
res3_3: String = "1.1"
venetusA: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:")

Try tabbing from `iliadLine.` to see what methods are available.

You can also [look at the user's guide and API docs](https://cite-architecture.github.io/cite-api-docs/).
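The values returned above follow directly from the layout of the URN string. As a rough sketch that uses only plain string splitting and not the `cite` library, the same pieces can be pulled apart by hand. The variable names here are illustrative and are not part of the library's API.

```scala
// A plain-string sketch of CTS URN anatomy; it does not use the cite library.
val urnString = "urn:cts:greekLit:tlg0012.tlg001.msA:1.1"

// Colon-delimited top-level fields: urn, cts, namespace, work hierarchy, passage.
val fields: Array[String] = urnString.split(":")
val namespace: String = fields(2)
val workHierarchy: Array[String] = fields(3).split("\\.")
val passage: String = fields(4)

// Period-delimited levels of the work hierarchy.
val textGroup: String = workHierarchy(0)
val work: String = workHierarchy(1)
val version: String = workHierarchy(2)
```

Here `textGroup`, `work` and `passage` hold the same strings that `iliadLine.textGroup`, `iliadLine.work` and `iliadLine.passageComponent` returned in the cell above. The library class is still preferable in practice, since a hand-rolled split does no validation.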

Identify the *type* (class) and *value* of each expression in the following cell.

In [5]:
val titleImage = Cite2Urn("urn:cite2:hmt:vaimg.2017a:VA012RN_0013@0.2060,0.2076,0.1672,0.02265")

titleImage.collection
titleImage.version
titleImage.objectComponent
val img = titleImage.dropExtensions


titleImage: Cite2Urn = Cite2Urn(
  "urn:cite2:hmt:vaimg.2017a:VA012RN_0013@0.2060,0.2076,0.1672,0.02265"
)
res4_1: String = "vaimg"
res4_2: String = "2017a"
res4_3: String = "VA012RN_0013@0.2060,0.2076,0.1672,0.02265"
img: Cite2Urn = Cite2Urn("urn:cite2:hmt:vaimg.2017a:VA012RN_0013")

### `ohco2` for citable text corpora

First, configure the Jupyter notebook.  (If you want to see the syntax for sbt, look at the `build.sbt` file in your editing repository.)

In [6]:
import $ivy.`edu.holycross.shot::ohco2:10.19.0`

import $ivy.$

Now a generic Scala import makes the library available.  Again, we'll import *all* the classes in the `ohco2` package with `_`.

In [7]:
import edu.holycross.shot.ohco2._

import edu.holycross.shot.ohco2._

We have a text of the Venetus A *Iliad* from the `2020i` release of the Homer Multitext project in our data directory here:

In [8]:
val venetusAUrl = "https://raw.githubusercontent.com/neelsmith/summer2020nbs/master/data/vaIliad-2020i.cex"


venetusAUrl: String = "https://raw.githubusercontent.com/neelsmith/summer2020nbs/master/data/vaIliad-2020i.cex"

We can use an object in our new library to load the text into a citable corpus.  What type of object does the next cell create?

In [9]:
val venetusA = CorpusSource.fromUrl(venetusAUrl)


venetusA: Corpus = Corpus(
  Vector(
    CitableNode(
      CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.1"),
      "\u039c\u1fc6\u03bd\u03b9\u03bd \u1f04\u03b5\u03b9\u03b4\u03b5 \u03b8\u03b5\u1f70 \u03a0\u03b7\u03bb\u03b7\u03ca\u1f71\u03b4\u03b5\u03c9 \u1f08\u03c7\u03b9\u03bb\u1fc6\u03bf\u03c2"
    ),
    CitableNode(
      CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.2"),
      "\u03bf\u1f50\u03bb\u03bf\u03bc\u1f73\u03bd\u03b7\u03bd\u00b7 \u1f21 \u03bc\u03c5\u03c1\u03af' \u1f08\u03c7\u03b1\u03b9\u03bf\u1fd6\u03c2 \u1f04\u03bb\u03b3\u03b5' \u1f14\u03b8\u03b7\u03ba\u03b5\u03bd\u00b7"
    ),
    CitableNode(
      CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.3"),
      "\u03c0\u03bf\u03bb\u03bb\u1f70\u03c2 \u03b4' \u1f30\u03c6\u03b8\u03af\u03bc\u03bf\u03c5\u03c2 \u03c8\u03c5\u03c7\u1f70\u03c2 \u1f0c\u03ca\u03b4\u03b9 \u03c0\u03c1\u

The `Corpus` class implements the OHCO2 model.  It contains a Vector (so an ordered sequence) of text nodes that are citable by URN (with the hierarchy of work and of passage).  Let's see how many text nodes are in this corpus.

In [10]:
venetusA.nodes.size

res9: Int = 15640

Let's see what an individual citable node in this corpus looks like.

(We'll use `println` since Jupyter notebook's display of output is not always kind to Unicode.)

In [12]:
println(venetusA.nodes.head)

CitableNode(urn:cts:greekLit:tlg0012.tlg001.msA:1.1,Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος)


The node has two components, or *members*, which we can refer to by name: `urn` and `text`.

In [15]:
venetusA.nodes.head.urn
venetusA.nodes.head.text
println(venetusA.nodes.head.text)

Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος


res14_0: CtsUrn = CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.1")
res14_1: String = "\u039c\u1fc6\u03bd\u03b9\u03bd \u1f04\u03b5\u03b9\u03b4\u03b5 \u03b8\u03b5\u1f70 \u03a0\u03b7\u03bb\u03b7\u03ca\u1f71\u03b4\u03b5\u03c9 \u1f08\u03c7\u03b9\u03bb\u1fc6\u03bf\u03c2"

## Repeat that cycle

Let's look for vertical variation between the Venetus A and the Oxford Classical Text edition of T.W. Allen.


In [17]:
val allenUrl = "https://raw.githubusercontent.com/neelsmith/summer2020nbs/master/data/iliad-allen.cex"
val allen = CorpusSource.fromUrl(allenUrl)


allenUrl: String = "https://raw.githubusercontent.com/neelsmith/summer2020nbs/master/data/iliad-allen.cex"
allen: Corpus = Corpus(
  Vector(
    CitableNode(
      CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.1"),
      "\u039c\u1fc6\u03bd\u03b9\u03bd \u1f04\u03b5\u03b9\u03b4\u03b5 \u03b8\u03b5\u1f70 \u03a0\u03b7\u03bb\u03b7\u03ca\u03ac\u03b4\u03b5\u03c9 \u1f08\u03c7\u03b9\u03bb\u1fc6\u03bf\u03c2"
    ),
    CitableNode(
      CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.2"),
      "\u03bf\u1f50\u03bb\u03bf\u03bc\u03ad\u03bd\u03b7\u03bd, \u1f23 \u03bc\u03c5\u03c1\u03af\u1fbd \u1f08\u03c7\u03b1\u03b9\u03bf\u1fd6\u03c2 \u1f04\u03bb\u03b3\u03b5\u1fbd \u1f14\u03b8\u03b7\u03ba\u03b5,"
    ),
    CitableNode(
      CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.3"),
      "\u03c0\u03bf\u03bb\u03bb\u1f70

In [19]:
allen.nodes.size
venetusA.nodes.size

res18_0: Int = 15683
res18_1: Int = 15640

Allen's text has 43 more lines than the Venetus A.  Let's compute that vertical difference.

To start with, let's just get the list of URNs for each *Iliad*.

In [20]:
val vaUrns = venetusA.nodes.map(n => n.urn)
val allenUrns = allen.nodes.map(n => n.urn)

vaUrns: Vector[CtsUrn] = Vector(
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.1"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.2"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.3"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.4"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.5"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.6"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.7"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.8"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.9"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.10"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.11"),
  CtsUrn("urn:cts:greekLit:tlg0012.tlg001.msA:1.12"),
  CtsUrn("urn:cts:greekLit:tlg001

In [21]:
allenUrns.size - vaUrns.size

res20: Int = 43

## Aligning the differences

We can bring in another library, one that we saw briefly when looking at the versions of the Gettysburg Address.  First, configure the notebook.

In [22]:
import $ivy.`edu.holycross.shot::seqcomp:2.2.1`



import $ivy.$

Then import the library.

In [23]:
import edu.holycross.shot.seqcomp._


import edu.holycross.shot.seqcomp._

In [24]:
val comparison = SequenceComp(allenUrns, vaUrns)

comparison: SequenceComp[CtsUrn] = SequenceComp(
  Vector(
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.1"),
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.2"),
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.3"),
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.4"),
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.5"),
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.6"),
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.7"),
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.8"),
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.9"),
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.10"),
    CtsUrn("urn:cts:greekLit:tlg0012.tlg001.allen:1.11"),
    CtsUrn("urn:cts:greekLi

In [None]:
comparison.align
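For a sense of what aligning two sequences involves, here is a minimal sketch based on the longest common subsequence (LCS), one common technique for this kind of comparison. Whether `seqcomp` computes `align` this way is not confirmed here; the functions `lcsTable` and `alignPairs` are hypothetical names written for this example, and the inputs are toy passage references.

```scala
// Length table for the longest common subsequence (LCS) of two sequences.
def lcsTable[T](a: Vector[T], b: Vector[T]): Array[Array[Int]] = {
  val table = Array.ofDim[Int](a.size + 1, b.size + 1)
  for (i <- 1 to a.size; j <- 1 to b.size) {
    table(i)(j) =
      if (a(i - 1) == b(j - 1)) table(i - 1)(j - 1) + 1
      else math.max(table(i - 1)(j), table(i)(j - 1))
  }
  table
}

// Walk the table backwards, pairing shared items and leaving a gap (None)
// where an item appears in only one of the two sequences.
def alignPairs[T](a: Vector[T], b: Vector[T]): Vector[(Option[T], Option[T])] = {
  val table = lcsTable(a, b)

  @annotation.tailrec
  def walk(i: Int, j: Int, acc: List[(Option[T], Option[T])]): List[(Option[T], Option[T])] =
    if (i > 0 && j > 0 && a(i - 1) == b(j - 1))
      walk(i - 1, j - 1, (Some(a(i - 1)), Some(b(j - 1))) :: acc)
    else if (j > 0 && (i == 0 || table(i)(j - 1) >= table(i - 1)(j)))
      walk(i, j - 1, (None, Some(b(j - 1))) :: acc)
    else if (i > 0)
      walk(i - 1, j, (Some(a(i - 1)), None) :: acc)
    else acc

  walk(a.size, b.size, Nil).toVector
}

// Toy passage references: the second sequence lacks "1.2".
val toyAlignment = alignPairs(Vector("1.1", "1.2", "1.3"), Vector("1.1", "1.3"))
```

Each pair in `toyAlignment` holds the matching items from the two sequences, with `None` marking a gap; the pair for `"1.2"` has a gap on the right, which is exactly a vertical variant. The table in this sketch grows with the product of the two lengths, roughly 245 million cells for two full *Iliad*s of 15,683 and 15,640 lines, so it is meant for small inputs only.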