# Reading `nomisma.org` data for CRRO

This notebook shows you how to load data for the *Coinage of the Roman Republic Online* (CRRO) from `nomisma.org`'s RDF format and from delimited text files built from the RDF source.  It uses version `2.0.1` of the `nomisma` library.

## Configure Jupyter notebook

Configure the notebook to find the `nomisma` library.  (You could do the same thing in other environments with `sbt` or `maven`.)

In [1]:
// 1. Add maven repository where we can find our libraries
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

[36mmyBT[39m: [32mcoursierapi[39m.[32mMavenRepository[39m = MavenRepository(https://dl.bintray.com/neelsmith/maven)

In [2]:
// 2. Make libraries available with `$ivy` imports:
import $ivy.`edu.holycross.shot::nomisma:2.0.1`
import $ivy.`edu.holycross.shot::histoutils:2.2.0`
import $ivy.`org.plotly-scala::plotly-almond:0.7.1`

[32mimport [39m[36m$ivy.$                                  
[39m
[32mimport [39m[36m$ivy.$                                     
[39m
[32mimport [39m[36m$ivy.$                                      [39m


## Load the CRRO data set

Load the CRRO data set from a `.cex` file in the `nomisma` GitHub repository.

In [3]:
import edu.holycross.shot.nomisma._
import edu.holycross.shot.histoutils._

import plotly._, plotly.element._, plotly.layout._, plotly.Almond._
// Set display defaults suggested for use in Jupyter NBs:
repl.pprinter() = repl.pprinter().copy(defaultHeight = 3)

[32mimport [39m[36medu.holycross.shot.nomisma._
[39m
[32mimport [39m[36medu.holycross.shot.histoutils._

[39m
[32mimport [39m[36mplotly._, plotly.element._, plotly.layout._, plotly.Almond._
// Set display defaults suggested for use in Jupyter NBs:
[39m

In [4]:
val crroCex = "https://raw.githubusercontent.com/neelsmith/nomisma/master/cex/crro-2019-12-03.cex"
val crro = CrroSource.fromUrl(crroCex)

// Check results:
require(crro.size > 2000)   

Dec 11, 2019 6:58:46 AM wvlet.log.Logger log
INFO: Reading 2281 lines of CEX data.
Dec 11, 2019 6:58:46 AM wvlet.log.Logger log
INFO: Created Crro with 2281 issues.


[36mcrroCex[39m: [32mString[39m = [32m"https://raw.githubusercontent.com/neelsmith/nomisma/master/cex/crro-2019-12-03.cex"[39m
[36mcrro[39m: [32mCrro[39m = [33mCrro[39m(
  [33mVector[39m(
...

In [5]:
val obvLegends = crro.hasObvLegend.issues.map(_.obvLegend)
val revLegends = crro.hasRevLegend.issues.map(_.revLegend)
val allLegends = obvLegends ++ revLegends

val legendIssues = crro.hasObvLegend.issues ++ crro.hasRevLegend.issues
require (allLegends.size == legendIssues.size)

[36mobvLegends[39m: [32mVector[39m[[32mString[39m] = [33mVector[39m(
  [32m"M\u00b7ANTON\u00b7IMP\u00b7III\u00b7VIR\u00b7R\u00b7P\u00b7C\u00b7AVG"[39m,
...
[36mrevLegends[39m: [32mVector[39m[[32mString[39m] = [33mVector[39m(
  [32m"CAESAR\u00b7IMP\u00b7PONT\u00b7III\u00b7VIR\u00b7R\u00b7P\u00b7C"[39m,
...
[36mallLegends[39m: [32mVector[39m[[32mString[39m] = [33mVector[39m(
  [32m"M\u00b7ANTON\u00b7IMP\u00b7III\u00b7VIR\u00b7R\u00b7P\u00b7C\u00b7AVG"[39m,
...
[36mlegendIssues[39m: [32mVector[39m[[32mCrroIssue[39m] = [33mVector[39m(
  [33mCrroIssue[39m(
...

In [6]:
println("Obv legends: " + obvLegends.size)
println("Rev legends: " + revLegends.size)
println("All: " + allLegends.size)

Obv legends: 1144
Rev legends: 2132
All: 3276
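
Because `legendIssues` lists an issue once for each legend it contributes, in the same order as `allLegends` (obverse legends first, then reverse legends), the two vectors can be zipped so that every legend carries the issue it came from. The sketch below illustrates the idea with a hypothetical stand-in case class `Issue` in place of the library's `CrroIssue`. It also assumes, for illustration only, that a missing legend is an empty string; how `hasObvLegend` and `hasRevLegend` actually decide this is not shown in the notebook.

```scala
// Hypothetical stand-in for CrroIssue: only the fields this sketch needs
case class Issue(id: String, obvLegend: String, revLegend: String)

// Hypothetical sample data; an empty string stands for "no legend"
val issues = Vector(
  Issue("issue-1", "ROMA", ""),
  Issue("issue-2", "C·VAL", "ROMA"),
  Issue("issue-3", "", "L·SATVRN")
)

// Mirror the notebook: obverse legends first, then reverse legends
val withObv = issues.filter(_.obvLegend.nonEmpty)
val withRev = issues.filter(_.revLegend.nonEmpty)
val legends = withObv.map(_.obvLegend) ++ withRev.map(_.revLegend)
val legendSources = withObv ++ withRev

// zip pairs each legend with the ID of the issue it was taken from
val pairs: Vector[(String, String)] = legendSources.zip(legends).map {
  case (issue, legend) => (issue.id, legend)
}
pairs.foreach { case (id, legend) => println(id + ": " + legend) }
```

An issue with both legends (here `issue-2`) appears twice, which is why the obverse and reverse counts add up to the total of all legends.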


In [7]:
val allChars = allLegends.map(_.toVector).flatten
println("All chars: " + allChars.size)

All chars: 36208


[36mallChars[39m: [32mVector[39m[[32mChar[39m] = [33mVector[39m(
  [32m'M'[39m,
...
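
The two-step `map(_.toVector).flatten` can equivalently be written with a single `flatMap`. A minimal sketch with made-up legends, which also checks that the number of characters equals the summed lengths of the legends:

```scala
// Hypothetical sample legends
val sampleLegends = Vector("ROMA", "C·VAL", "Q·MAX")

// flatMap turns each legend into its characters and concatenates the results
val sampleChars: Vector[Char] = sampleLegends.flatMap(_.toVector)

// Same result as the notebook's map(...).flatten
assert(sampleChars == sampleLegends.map(_.toVector).flatten)

// Every character is counted, including interpuncts and spaces
println(sampleChars.size)  // 14
```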

In [8]:
val charFreqs = allChars.groupBy(c => c).map{ case (c,v)=> Frequency(c.toString,v.size) }
val charHist = edu.holycross.shot.histoutils.Histogram(charFreqs.toVector)

[36mcharFreqs[39m: [32mcollection[39m.[32mimmutable[39m.[32mIterable[39m[[32mFrequency[39m[[32mString[39m]] = [33mList[39m(
  [33mFrequency[39m([32m"E"[39m, [32m1161[39m),
...
[36mcharHist[39m: [32medu[39m.[32mholycross[39m.[32mshot[39m.[32mhistoutils[39m.[32mHistogram[39m[[32mString[39m] = [33mHistogram[39m(
  [33mVector[39m(
...
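
`groupBy(c => c)` collects identical characters into one bucket, and the size of each bucket is that character's frequency. Because the grouping is a `Map`, the result has no meaningful order, which is why `charFreqs` above begins with `"E"` rather than with the most frequent character. The sketch below does the same count with only the standard library; `Freq` is a hypothetical stand-in for histoutils' `Frequency`, not the library's own type.

```scala
// Hypothetical stand-in for histoutils' Frequency(item, count)
case class Freq(item: String, count: Int)

// Hypothetical sample: the characters of two short legends
val chars: Vector[Char] = "ROMA·C·VAL".toVector

// groupBy(identity) maps each distinct character to a Vector of its occurrences;
// the size of that Vector is the character's frequency
val freqs: Vector[Freq] = chars.groupBy(identity).map {
  case (c, occurrences) => Freq(c.toString, occurrences.size)
}.toVector

// The grouping is a Map, so freqs comes out in no particular order
freqs.foreach(println)
```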

In [9]:
val charCounts = charHist.sorted.frequencies.map(_.count)
val charValues = charHist.sorted.frequencies.map(_.item)

[36mcharCounts[39m: [32mVector[39m[[32mInt[39m] = [33mVector[39m(
  [32m3725[39m,
...
[36mcharValues[39m: [32mVector[39m[[32mString[39m] = [33mVector[39m(
  [32m"\u00b7"[39m,
...
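
`charCounts` and `charValues` are two parallel vectors drawn from the same sorted sequence, so index *n* in one matches index *n* in the other. A sketch of the same step with the standard library: sort by descending count, then `unzip` the pairs. Ties are broken alphabetically so that equal counts (such as `Q` and `i`, both 225 in the output below) always appear in the same order; this tie-breaking rule is an assumption of the sketch, since the notebook does not say how `Histogram.sorted` orders ties.

```scala
// (item, count) pairs using a few of the counts printed in this notebook
val counted = Vector("Q" -> 225, "·" -> 3725, "i" -> 225, "A" -> 3119)

// Descending by count; ties broken by item so the order is deterministic
val sortedPairs = counted.sortBy { case (item, count) => (-count, item) }

// unzip splits the pairs into two parallel vectors, ready for plotting
val (values, counts) = sortedPairs.unzip

println(values)  // Vector(·, A, Q, i)
println(counts)  // Vector(3725, 3119, 225, 225)
```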

In [10]:
val charHistPlot = Seq(
  Bar(x = charValues, y = charCounts)
)
plot(charHistPlot)

[36mcharHistPlot[39m: [32mSeq[39m[[32mBar[39m] = [33mList[39m(
  [33mBar[39m(
...
[36mres9_1[39m: [32mString[39m = [32m"plot-b69c4709-f9bd-42ab-bc29-0e0b0dc3005a"[39m

In [11]:
val threshhold = 300
val rareChars = charHist.sorted.frequencies.filter(_.count < threshhold)
val lessRareChars = charHist.sorted.frequencies.filter(_.count >= threshhold)
println("Rare chars: \n" + rareChars.mkString("\n"))
println("\nCommon chars:\n" + lessRareChars.mkString("\n"))

Rare chars: 
Frequency(o,238)
Frequency(i,225)
Frequency(Q,225)
Frequency(t,224)
Frequency(s,216)
Frequency(e,212)
Frequency(?,210)
Frequency(n,201)
Frequency(r,191)
Frequency(a,177)
Frequency([,168)
Frequency(],167)
Frequency(d,146)
Frequency(c,107)
Frequency(H,98)
Frequency(.,84)
Frequency(),80)
Frequency((,80)
Frequency(/,75)
Frequency(b,45)
Frequency(l,44)
Frequency(h,34)
Frequency(g,31)
Frequency(w,27)
Frequency(p,23)
Frequency(v,21)
Frequency(f,19)
Frequency(",19)
Frequency(u,18)
Frequency(Y,16)
Frequency(K,16)
Frequency(;,14)
Frequency(m,11)
Frequency(k,10)
Frequency(:,8)
Frequency(|,7)
Frequency(Φ,3)
Frequency(',3)
Frequency(x,2)
Frequency(Ω,2)
Frequency(Α,1)
Frequency(Μ,1)
Frequency(Ν,1)
Frequency(Ι,1)
Frequency(-,1)
Frequency(Σ,1)
Frequency(Ρ,1)
Frequency(3,1)

Common chars:
Frequency(·,3725)
Frequency(I,3168)
Frequency(A,3119)
Frequency(R,2672)
Frequency( ,2290)
Frequency(M,2263)
Frequency(O,2104)
Frequency(S,1908)
Frequency(V,1868)
Frequency(C,1689)
Frequency(L,1442)
Freque

[36mthreshhold[39m: [32mInt[39m = [32m300[39m
[36mrareChars[39m: [32mVector[39m[[32mFrequency[39m[[32mString[39m]] = [33mVector[39m(
  [33mFrequency[39m([32m"o"[39m, [32m238[39m),
...
[36mlessRareChars[39m: [32mVector[39m[[32mFrequency[39m[[32mString[39m]] = [33mVector[39m(
  [33mFrequency[39m([32m"\u00b7"[39m, [32m3725[39m),
...
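
The two `filter` calls above traverse the frequencies twice with complementary predicates. `partition` produces both groups in a single pass. A minimal sketch on a few of the counts printed above, with a local case class `Freq` standing in (hypothetically) for histoutils' `Frequency`:

```scala
// Hypothetical stand-in for histoutils' Frequency(item, count)
case class Freq(item: String, count: Int)

// A few of the counts printed above
val sample = Vector(Freq("·", 3725), Freq("I", 3168), Freq("o", 238), Freq("Q", 225), Freq("3", 1))
val threshold = 300

// partition returns (elements satisfying the predicate, all the others)
val (rare, common) = sample.partition(_.count < threshold)

println("Rare: " + rare.map(_.item).mkString(" "))      // Rare: o Q 3
println("Common: " + common.map(_.item).mkString(" "))  // Common: · I
```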

In [12]:
// Define some useful functions.


// True if String s is non-empty and composed only of allowable characters:
// upper-case Latin letters, space, and the interpunct (·)
def validOrtho(s: String) : Boolean = {
    val alphaChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ·"
    s.nonEmpty && s.forall(c => alphaChars.contains(c))
}


// Map a Vector of CrroIssues to strings displaying the ID and the text of both legends
def formatLegends(v: Vector[CrroIssue]) : Vector[String] = {
    v.map( issue => issue.id + " obv: " + issue.obvLegend + " rev: " + issue.revLegend)    
}


// Limit a Vector of CrroIssues to those containing a specified string in either legend
def filterLegends(v : Vector[CrroIssue], subString : String) : Vector[CrroIssue] = {
    v.filter(issue => issue.obvLegend.contains(subString) || (issue.revLegend.contains(subString)))
}

defined [32mfunction[39m [36mvalidOrtho[39m
defined [32mfunction[39m [36mformatLegends[39m
defined [32mfunction[39m [36mfilterLegends[39m

In [13]:
def pct(i1: Int, i2: Int): Float = {
    i1 * 100.0f / i2
}

defined [32mfunction[39m [36mpct[39m

In [17]:
val sheep = allLegends.filter(validOrtho(_))
val goats = allLegends.filterNot(validOrtho(_))
val total =  (sheep.size + goats.size)
val pct1 = pct(sheep.size, total)
val pct2 = pct(goats.size, total)

println("Sheep: " + sheep.size + " (" + pct1 + "%)")
println("Goats: " + goats.size  + " (" + pct2 + "%)")
println("Total:  " + total)
require(allLegends.size == total)



Sheep: 2720 (83.02808%)
Goats: 556 (16.971916%)
Total:  3276


[36msheep[39m: [32mVector[39m[[32mString[39m] = [33mVector[39m(
  [32m"M\u00b7ANTON\u00b7IMP\u00b7III\u00b7VIR\u00b7R\u00b7P\u00b7C\u00b7AVG"[39m,
...
[36mgoats[39m: [32mVector[39m[[32mString[39m] = [33mVector[39m(
  [32m"Inscription Obverse: signature P ANTON\u00b7AVG\u00b7IMP\u00b7III\u00b7COS\u0[39m...
[36mtotal[39m: [32mInt[39m = [32m3276[39m
[36mpct1[39m: [32mFloat[39m = [32m83.02808F[39m
[36mpct2[39m: [32mFloat[39m = [32m16.971916F[39m

In [None]:
val os = filterLegends(legendIssues, "o")
println("English 'o's: " + os.size)

val bigQs = filterLegends(legendIssues, "Q")
println("Latin 'Q's: " + bigQs.size)
//println(formatLegends(bigQs.take(10)).mkString("\n\n"))

val bigHs = filterLegends(legendIssues, "H")
println("Latin 'H's: " + bigHs.size)
//println(formatLegends(bigHs.take(10)).mkString("\n\n"))


val bigKs = filterLegends(legendIssues, "K")
println("Latin 'K's: " + bigKs.size)
println(formatLegends(bigKs.take(10)).mkString("\n\n"))
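
The four `filterLegends` calls above repeat one pattern, so the counts can be gathered in a single expression over a list of probe strings. A minimal sketch with a hypothetical stand-in `Issue` class and a local copy of the notebook's `filterLegends` logic; the sample issues are invented for illustration.

```scala
// Hypothetical stand-in for CrroIssue
case class Issue(id: String, obvLegend: String, revLegend: String)

// Same logic as the notebook's filterLegends
def filterLegends(v: Vector[Issue], subString: String): Vector[Issue] =
  v.filter(issue => issue.obvLegend.contains(subString) || issue.revLegend.contains(subString))

// Hypothetical sample issues
val issues = Vector(
  Issue("issue-1", "Q·MAX", "ROMA"),
  Issue("issue-2", "Head of Roma", "ROMA"),
  Issue("issue-3", "KAR", "Q")
)

// Count the issues whose legends contain each probe string
val probes = Vector("o", "Q", "H", "K")
val counts = probes.map(p => p -> filterLegends(issues, p).size)
counts.foreach { case (p, n) => println(p + ": " + n) }
```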
