# Find OCRE coins by morphological search

## Contents of this notebook

This notebook shows how to build a parsed Latin corpus (a `LatinCorpus` object) for OCRE texts, and how to use it to go from a single surface form to a morphologically sensitive search of the full corpus.



## Configure Jupyter notebook to find libraries

In [None]:
// 1. Add maven repository where we can find our libraries
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

In [None]:
// 2. Make libraries available with `ivy` imports:
import $ivy.`edu.holycross.shot::ohco2:10.18.1`
import $ivy.`edu.holycross.shot::midvalidator:9.2.0`
import $ivy.`edu.holycross.shot::latphone:2.7.2`
import $ivy.`edu.holycross.shot::latincorpus:2.2.1`

## Build a `LatinCorpus`

The `LatinCorpus` class lets you work with a morphologically parsed corpus of Latin texts.  We build a citable `Corpus`, and associate morphological data with it.

In [None]:
//Import all libraries:
import edu.holycross.shot.ohco2._
import scala.io.Source
import edu.holycross.shot.mid.validator._
import edu.holycross.shot.latin._
import edu.holycross.shot.latincorpus._

Download data and build a citable `Corpus`:

In [None]:
val url = "https://raw.githubusercontent.com/neelsmith/hctexts/master/cex/ocre43k.cex"
val corpus = CorpusSource.fromUrl(url, cexHeader = true)

Download previously compiled morphological analyses:

In [None]:
val fstUrl = "https://raw.githubusercontent.com/neelsmith/hctexts/master/workfiles/ocre/ocre-fst.txt"
val fstLines = Source.fromURL(fstUrl).getLines.toVector

Construct a `LatinCorpus`:


In [None]:
val ocrelat = LatinCorpus.fromFstLines(corpus, Latin24Alphabet, fstLines, strict = false)


## Find OCRE issues by morphological search

Define useful functions to look up occurrences of a given lexeme, and to find OCRE IDs for coins whose legends include any form of a given token:

In [None]:
// Find coin IDs where a specified lexeme occurs.
def findLexemeOccurrences(lexemeId: String) : Vector[String] = {
    val occurrences =  ocrelat.lexemeConcordance(lexemeId)
    // Convert text references to coin IDs:
    occurrences.map(_.collapsePassageBy(1).passageComponent)
}


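// Find coin IDs where any form of a given token appears.
// Returns an empty Vector unless the token maps to exactly one lexeme.
// Note: findLexemeOccurrences searches the notebook-level `ocrelat`.
def findOccurrences(tkn : String, latCorpus: LatinCorpus) : Vector[String] = {
    // Look up lexemes for the token passed in (not a hard-coded string)
    val lexemeIds = latCorpus.tokenLexemeIndex(tkn)
    if (lexemeIds.size == 1){
        val lexemeId = lexemeIds(0)
        findLexemeOccurrences(lexemeId)
    } else {
      println("Found " + lexemeIds.size + " lexemes for " + tkn)
      Vector.empty[String]
    }
}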
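
The two-step lookup in these functions (surface form to lexeme, then lexeme to every passage where any of its forms occurs) can be illustrated with a toy model built from plain Scala collections. This is only a sketch: the maps, lexeme IDs, passage references and `toy...` names are all hypothetical, and it does not call the `latincorpus` library. It also assumes that a passage reference becomes a coin ID when its last dot-separated component is dropped.

```scala
// Toy model of the two-step lookup; all data and names are hypothetical.

// Surface form -> IDs of the lexemes it could belong to
val toyTokenIndex: Map[String, Vector[String]] = Map(
  "libertas"   -> Vector("lex1"),
  "libertatis" -> Vector("lex1"),
  "augusti"    -> Vector("lex2", "lex3") // ambiguous surface form
)

// Lexeme ID -> passage references where any form of the lexeme occurs
val toyConcordance: Map[String, Vector[String]] = Map(
  "lex1" -> Vector("coinA.1", "coinB.2"),
  "lex2" -> Vector("coinA.2"),
  "lex3" -> Vector("coinC.1")
)

// Drop the last dot-separated component of a reference (assumed coin ID).
def toyCoinId(ref: String): String =
  ref.split('.').dropRight(1).mkString(".")

// Resolve a token to coin IDs only when it maps to exactly one lexeme.
def toyFind(tkn: String): Vector[String] =
  toyTokenIndex.getOrElse(tkn, Vector.empty) match {
    case Vector(lexemeId) =>
      toyConcordance.getOrElse(lexemeId, Vector.empty).map(toyCoinId)
    case _ =>
      Vector.empty // unknown or ambiguous token
  }

val viaGenitive  = toyFind("libertatis") // finds the same coins as "libertas"
val viaAmbiguous = toyFind("augusti")    // two candidate lexemes: no result
```

Searching for the genitive form reaches the same coins as the nominative because both forms resolve to the same lexeme, which is the point of a morphologically sensitive search.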



## `Libertas` in OCRE

Example:  find all coins with legends including a form of `libertas`.

In [None]:
val token = "libertas"
val libertasCoins = findOccurrences(token, ocrelat)
println("Found " + libertasCoins.size + " coins with legends including a form of 'libertas'")
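
If a lexeme can occur more than once in the legends of a single coin (an assumption we have not verified for this data set), the result vector will contain repeated IDs, and its size counts occurrences rather than coins. A minimal sketch of deduplicating and tallying, using a hypothetical result vector in place of `libertasCoins`:

```scala
// Hypothetical search result: "coinA" appears twice.
val toyHits = Vector("coinA", "coinB", "coinA")

// Unique coin IDs, in order of first appearance
val toyDistinct = toyHits.distinct

// Number of occurrences per coin ID
val toyCounts: Map[String, Int] =
  toyHits.groupBy(identity).map { case (id, hits) => id -> hits.size }

println(s"${toyHits.size} occurrences on ${toyDistinct.size} distinct coins")
```

The same calls (`distinct`, `groupBy`) apply unchanged to any `Vector[String]` of coin IDs.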
