# Summarizing data in OCRE

This notebook shows you how to load OCRE data from a CEX file over the internet and summarize its contents.



### Configure Jupyter notebook

First, configure the Jupyter notebook to find the `nomisma` and `histoutils` libraries, along with `plotly-almond` for plotting. (You could do the same thing in other environments with `sbt` or Maven.)
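
In an `sbt` project, the same configuration could go in `build.sbt`. This is a sketch, assuming the library versions used in this notebook; the resolver label `neelsmith-bintray` is an arbitrary name chosen here. The `::` in the `$ivy` imports corresponds to sbt's `%%`, which appends the Scala binary version to the artifact name. `plotly-almond` is left out because it targets Almond notebooks specifically.

```scala
// build.sbt (sketch): same repository and artifacts as the $ivy imports below
resolvers += "neelsmith-bintray" at "https://dl.bintray.com/neelsmith/maven"

libraryDependencies ++= Seq(
  "edu.holycross.shot" %% "nomisma"    % "1.4.0",
  "edu.holycross.shot" %% "histoutils" % "2.2.0"
)
```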

In [10]:
// 1. Add maven repository where we can find our libraries
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

[36mmyBT[39m: [32mcoursierapi[39m.[32mMavenRepository[39m = MavenRepository(https://dl.bintray.com/neelsmith/maven)

In [11]:
// 2. Make libraries available with `$ivy` imports:
import $ivy.`edu.holycross.shot::nomisma:1.4.0`
import $ivy.`edu.holycross.shot::histoutils:2.2.0`
import $ivy.`org.plotly-scala::plotly-almond:0.7.1`

[32mimport [39m[36m$ivy.$                                  
[39m
[32mimport [39m[36m$ivy.$                                     
[39m
[32mimport [39m[36m$ivy.$                                      [39m

## Load the full OCRE data set

In [12]:
import edu.holycross.shot.nomisma._
val ocreCex = "https://raw.githubusercontent.com/neelsmith/nomisma/master/cex/ocre-valid.cex"
val ocre = OcreSource.fromUrl(ocreCex)

// Check results:
require(ocre.size > 50000) 

Nov 30, 2019 1:39:17 PM wvlet.log.Logger log
INFO: Reading 50023 lines of CEX data.
Nov 30, 2019 1:39:18 PM wvlet.log.Logger log
INFO: Created Ocre with 50023 issues.


[32mimport [39m[36medu.holycross.shot.nomisma._
[39m
[36mocreCex[39m: [32mString[39m = [32m"https://raw.githubusercontent.com/neelsmith/nomisma/master/cex/ocre-valid.cex"[39m
[36mocre[39m: [32mOcre[39m = [33mOcre[39m(
  [33mVector[39m(
    [33mOcreIssue[39m(
      [32m"3.com.43"[39m,
      [32m"RIC III Commodus 43"[39m,
      [32m"denarius"[39m,
      [32m"ar"[39m,
      [32m"commodus"[39m,
      [32m"rome"[39m,
      [32m"italy"[39m,
      [32m"Head of Commodus, laureate, right"[39m,
      [32m"M COMMODVS ANTONINVS AVG"[39m,
      [32m"rdf:resource=\"http://nomisma.org/id/commodus\""[39m,
      [32m"Roma, helmeted, draped, standing left, holding Victory in extended right hand and vertical spear in left hand"[39m,
      [32m"TR P VII IMP V COS III P P"[39m,
      [32m"rdf:resource=\"http://collection.britishmuseum.org/id/person-institution/60208\""[39m,
      [33mSome[39m([33mYearRange[39m([32m182[39m, [33mSome[39m([32m182[39m)))


## What's in it? Core information for every issue

We'll verify that every issue has information about the issuing authority, the material, and the denomination.  
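
Outside the notebook, the same check is a `forall` over the issues. A minimal sketch, assuming a hypothetical `Issue` case class that keeps only the three core fields of `OcreIssue` as plain strings (the real class has more fields); `Completeness` is likewise a hypothetical helper name:

```scala
// Hypothetical stand-in for nomisma's OcreIssue: only the three core fields
case class Issue(authority: String, material: String, denomination: String)

object Completeness {
  // True when every issue has a non-empty value for the chosen field
  def allPresent(issues: Seq[Issue], field: Issue => String): Boolean =
    issues.forall(i => field(i).nonEmpty)

  // The issues lacking a value for the chosen field, for inspecting a failed check
  def missing(issues: Seq[Issue], field: Issue => String): Seq[Issue] =
    issues.filter(i => field(i).isEmpty)
}
```

`forall` stops at the first issue with an empty value, while `missing` collects every offending issue so a failed check can be examined rather than just reported.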

In [13]:
val authorities = ocre.issues.filter(_.authority.nonEmpty)
require (authorities.size == ocre.issues.size)

val materials = ocre.issues.filter(_.material.nonEmpty)
require (materials.size == ocre.issues.size)

val denominations = ocre.issues.filter(_.denomination.nonEmpty)
require (denominations.size == ocre.issues.size)

[36mauthorities[39m: [32mVector[39m[[32mOcreIssue[39m] = [33mVector[39m(
  [33mOcreIssue[39m(
    [32m"3.com.43"[39m,
    [32m"RIC III Commodus 43"[39m,
    [32m"denarius"[39m,
    [32m"ar"[39m,
    [32m"commodus"[39m,
    [32m"rome"[39m,
    [32m"italy"[39m,
    [32m"Head of Commodus, laureate, right"[39m,
    [32m"M COMMODVS ANTONINVS AVG"[39m,
    [32m"rdf:resource=\"http://nomisma.org/id/commodus\""[39m,
    [32m"Roma, helmeted, draped, standing left, holding Victory in extended right hand and vertical spear in left hand"[39m,
    [32m"TR P VII IMP V COS III P P"[39m,
    [32m"rdf:resource=\"http://collection.britishmuseum.org/id/person-institution/60208\""[39m,
    [33mSome[39m([33mYearRange[39m([32m182[39m, [33mSome[39m([32m182[39m)))
  ),
  [33mOcreIssue[39m(
    [32m"9.thes.27B.iii"[39m,
    [32m"RIC IX Thessalonica 27B: Subtype iii"[39m,
    [32m"ae3"[39m,
    [32m"ae"[39m,
    [32m"valentinian_i"[39m,
    [32m"thessal

### Distinct values for core information
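
Each count is a `map` over one field followed by `distinct`. Keeping each label next to its field accessor, as in this sketch, makes it harder to count one field under another field's label. The `CoreRecord` case class and `DistinctValues` object are hypothetical stand-ins, not part of `nomisma`:

```scala
// Hypothetical stand-in for OcreIssue, keeping only the core fields
case class CoreRecord(authority: String, material: String, denomination: String)

object DistinctValues {
  // Each label is paired with the accessor for the field it describes
  val fields: Vector[(String, CoreRecord => String)] = Vector(
    ("issuing authorities", (r: CoreRecord) => r.authority),
    ("metals", (r: CoreRecord) => r.material),
    ("denominations", (r: CoreRecord) => r.denomination)
  )

  // Number of distinct values for each labeled field
  def counts(records: Seq[CoreRecord]): Vector[(String, Int)] =
    fields.map { case (label, field) => (label, records.map(field).distinct.size) }
}
```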

In [14]:
val authorityValues = authorities.map(_.authority).distinct
val materialValues = materials.map(_.material).distinct
// Distinct denominations come from the denomination field, not the authority field
val denominationValues = denominations.map(_.denomination).distinct

println("OCRE corpus includes values for:")
println(s"${authorityValues.size} issuing authorities")
println(s"${materialValues.size} metals")
println(s"${denominationValues.size} denominations")

OCRE corpus includes values for:
131 issuing authorities
4 metals


[36mauthorityValues[39m: [32mVector[39m[[32mString[39m] = [33mVector[39m(
  [32m"commodus"[39m,
  [32m"valentinian_i"[39m,
  [32m"severus_alexander"[39m,
  [32m"constantine_i"[39m,
  [32m"valerian"[39m,
  [32m"trajan_decius"[39m,
  [32m"maxentius"[39m,
  [32m"constantius_chlorus"[39m,
  [32m"maximian"[39m,
  [32m"valentinian_ii"[39m,
  [32m"severus_ii"[39m,
  [32m"hadrian"[39m,
  [32m"septimius_severus"[39m,
  [32m"macrinus"[39m,
  [32m"diadumenian"[39m,
  [32m"elagabalus"[39m,
  [32m"geta"[39m,
  [32m"caracalla"[39m,
  [32m"uncertain_value"[39m,
  [32m"gratian"[39m,
  [32m"tacitus"[39m,
  [32m"aurelian"[39m,
  [32m"valens"[39m,
  [32m"clodius_albinus"[39m,
  [32m"pescennius_niger"[39m,
  [32m"didius_julianus"[39m,
  [32m"pertinax"[39m,
  [32m"antoninus_pius"[39m,
  [32m"gallienus"[39m,
  [32m"claudius"[39m,
  [32m"tiberius"[39m,
  [32m"marcus_aurelius"[39m,
  [32m"augustus"[39m,
  [32m"trajan"[39m,
  [32m"the

### How are they distributed?
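
The counting step in the next cell groups identical authority names and takes the size of each group. A generic sketch of the same idea, assuming nothing from `histoutils` (it returns a plain `Map` rather than `Frequency` objects; `FrequencyCount` is a hypothetical name):

```scala
object FrequencyCount {
  // Group identical values together, then replace each group with its size
  def frequencies[T](values: Seq[T]): Map[T, Int] =
    values.groupBy(identity).map { case (value, group) => value -> group.size }
}
```

On Scala 2.13 and later, `values.groupMapReduce(identity)(_ => 1)(_ + _)` computes the same counts without building the intermediate groups.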

In [15]:
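import edu.holycross.shot.histoutils._

// Count the issues for each authority as histoutils Frequency objects
val authorityFreqs = ocre.issues.map(_.authority).groupBy(d => d).map { case (k,v) => Frequency(k, v.size)}
// Collect the frequencies into a histoutils Histogram
val authorityHistogram = edu.holycross.shot.histoutils.Histogram(authorityFreqs.toVector)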



[32mimport [39m[36medu.holycross.shot.histoutils._


[39m
[36mauthorityFreqs[39m: [32mcollection[39m.[32mimmutable[39m.[32mIterable[39m[[32mFrequency[39m[[32mString[39m]] = [33mList[39m(
  [33mFrequency[39m([32m"quietus"[39m, [32m13[39m),
  [33mFrequency[39m([32m"zenobia"[39m, [32m2[39m),
  [33mFrequency[39m([32m"maximus_barcelona"[39m, [32m4[39m),
  [33mFrequency[39m([32m"trebonianus_gallus"[39m, [32m307[39m),
  [33mFrequency[39m([32m"carausius"[39m, [32m1146[39m),
  [33mFrequency[39m([32m"tetricus_i"[39m, [32m295[39m),
  [33mFrequency[39m([32m"postumus"[39m, [32m392[39m),
  [33mFrequency[39m([32m"macrianus_minor"[39m, [32m13[39m),
  [33mFrequency[39m([32m"anthemius"[39m, [32m107[39m),
  [33mFrequency[39m([32m"magnus_maximus"[39m, [32m110[39m),
  [33mFrequency[39m([32m"otho"[39m, [32m24[39m),
  [33mFrequency[39m([32m"pertinax"[39m, [32m71[39m),
  [33mFrequency[39m([32m"basiliscus"[39m, [32m5

Visualize the distribution as a bar graph using the `plotly` library:

In [17]:
// Import plotly-scala and its Almond integration
import plotly._, plotly.element._, plotly.layout._, plotly.Almond._
// Truncate pretty-printed REPL output to 3 lines per value
repl.pprinter() = repl.pprinter().copy(defaultHeight = 3)

[32mimport [39m[36mplotly._, plotly.element._, plotly.layout._, plotly.Almond._
[39m

Plot the number of issues for each authority, sorted from largest to smallest:
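
A bar chart needs two parallel vectors: labels for the x axis and counts for the y axis. The notebook gets them from the `histoutils` histogram's `sorted` method; the sketch below shows one way to do the same with plain collections, assuming the frequencies are in a `Map[String, Int]`. Breaking ties alphabetically is a choice made here so the bar order is deterministic; `BarData` is a hypothetical name.

```scala
object BarData {
  // Sort (label, count) pairs from largest to smallest count,
  // breaking ties alphabetically by label
  def sortedDescending(freqs: Map[String, Int]): Vector[(String, Int)] =
    freqs.toVector.sortBy { case (label, count) => (-count, label) }

  // Split the sorted pairs into parallel label (x) and count (y) vectors
  def axes(freqs: Map[String, Int]): (Vector[String], Vector[Int]) =
    sortedDescending(freqs).unzip
}
```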

In [18]:
val authNames = authorityHistogram.sorted.frequencies.map(_.item)
val authCounts = authorityHistogram.sorted.frequencies.map(_.count)
val authPlot = Seq(
  Bar(
   authNames, authCounts
  )
)
plot(authPlot)


[36mauthNames[39m: [32mVector[39m[[32mString[39m] = [33mVector[39m(
  [32m"constantine_i"[39m,
...
[36mauthCounts[39m: [32mVector[39m[[32mInt[39m] = [33mVector[39m(
  [32m4096[39m,
...
[36mauthPlot[39m: [32mSeq[39m[[32mBar[39m] = [33mList[39m(
  [33mBar[39m(
...
[36mres17_3[39m: [32mString[39m = [32m"plot-50623266-35d6-4e0f-9778-140cc3a384ba"[39m

## Notebook in progress.  

More to come!




Next: view the authorities chronologically.

- use ocre functions to get date ranges for authorities
- sort histogram by minimum date for each authority
- plot
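
The plan above can be sketched with plain collections. This assumes each issue carries an optional date range like the `Some(YearRange(182, Some(182)))` in the loaded data, modeled here by hypothetical `Years` and `DatedIssue` case classes, and that years before the common era are negative integers (an assumption, not checked against `nomisma`). Undated issues are skipped.

```scala
// Hypothetical stand-ins: a year range with optional end year, and an issue with an optional date
case class Years(start: Int, end: Option[Int])
case class DatedIssue(authority: String, dates: Option[Years])

object Chronology {
  // Earliest start year among each authority's dated issues;
  // authorities with no dated issues are left out
  def earliestYear(issues: Seq[DatedIssue]): Map[String, Int] =
    issues
      .collect { case DatedIssue(auth, Some(years)) => auth -> years.start }
      .groupBy(_._1)
      .map { case (auth, pairs) => auth -> pairs.map(_._2).min }

  // Authorities ordered by earliest year, with ties broken by name
  def chronologicalOrder(issues: Seq[DatedIssue]): Vector[String] =
    earliestYear(issues).toVector
      .sortBy { case (auth, year) => (year, auth) }
      .map(_._1)
}
```

The resulting order could then be used to arrange the authority names and counts before plotting.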

## Other information



In [None]:
ocre.size
ocre.datable.size


In [None]:
ocre.issues.filter(_.obvLegend.nonEmpty).size
ocre.issues.filter(_.revLegend.nonEmpty).size


In [None]:
ocre.issues.filter(_.obvType.nonEmpty).size
ocre.issues.filter(_.revType.nonEmpty).size

In [None]:
val noRType = ocre.issues.filter(_.revType.isEmpty)

In [None]:
val legends = ocre.corpus

In [None]:
val obv = legends.nodes.filter(_.urn.passageComponent.contains("obv"))
val rev = legends.nodes.filter(_.urn.passageComponent.contains("rev"))

In [None]:
legends.size
obv.size
rev.size

In [None]:
val mints = ocre.issues.filter(_.mint.nonEmpty)
mints.size
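
Each of the cells above counts how many issues populate one optional field. A sketch that reports every field at once, along with its share of the total, assuming a hypothetical `IssueFields` case class whose string fields mirror the `OcreIssue` members used above (`obvLegend`, `revLegend`, `obvType`, `revType`, `mint`); `Coverage` is also a hypothetical name:

```scala
// Hypothetical stand-in for an issue's optional descriptive fields
case class IssueFields(obvLegend: String, revLegend: String, obvType: String, revType: String, mint: String)

object Coverage {
  val fields: Vector[(String, IssueFields => String)] = Vector(
    ("obverse legend", (i: IssueFields) => i.obvLegend),
    ("reverse legend", (i: IssueFields) => i.revLegend),
    ("obverse type", (i: IssueFields) => i.obvType),
    ("reverse type", (i: IssueFields) => i.revType),
    ("mint", (i: IssueFields) => i.mint)
  )

  // For each field: the number of issues with a non-empty value, and that number
  // as a percentage of all issues (0.0 for an empty input, avoiding division by zero)
  def report(issues: Seq[IssueFields]): Vector[(String, Int, Double)] =
    fields.map { case (label, field) =>
      val n = issues.count(i => field(i).nonEmpty)
      val pct = if (issues.isEmpty) 0.0 else 100.0 * n / issues.size
      (label, n, pct)
    }
}
```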