# Summarizing data in OCRE

This notebook shows how to load OCRE data from a CEX file over the internet and summarize its contents.



## Configure Jupyter notebook

First configure the Jupyter notebook to find the `nomisma` library.  (You could do the same thing in other environments with `sbt` or `maven`.)

In [None]:
// 1. Add maven repository where we can find our libraries
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

In [None]:
// 2. Make libraries available with `$ivy` imports:
import $ivy.`edu.holycross.shot::nomisma:1.4.0`
import $ivy.`edu.holycross.shot::histoutils:2.2.0`
import $ivy.`org.plotly-scala::plotly-almond:0.7.1`

## Load the full OCRE data set

In [None]:
import edu.holycross.shot.nomisma._
val ocreCex = "https://raw.githubusercontent.com/neelsmith/nomisma/master/cex/ocre-valid.cex"
val ocre = OcreSource.fromUrl(ocreCex)

// Check results:
require(ocre.size > 50000) 

## What's in it? Core information for every issue

We'll verify that every issue has information about the issuing authority, the material, and the denomination.  

In [None]:
// Each filter keeps only the issues where the field is non-empty,
// so its size should equal the total number of issues.
val authorities = ocre.issues.filter(_.authority.nonEmpty)
require(authorities.size == ocre.issues.size)

val materials = ocre.issues.filter(_.material.nonEmpty)
require(materials.size == ocre.issues.size)

val denominations = ocre.issues.filter(_.denomination.nonEmpty)
require(denominations.size == ocre.issues.size)
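
If one of these checks ever failed, it could help to know how many issues lack each field. The following is only a sketch on made-up data: `CoreRecord`, `sampleRecords`, `coreChecks` and `missingCounts` are hypothetical names, and the fields are modeled as plain strings, which may not match the types the `nomisma` library actually uses.

```scala
// Hypothetical stand-in for an OCRE issue, reduced to the three core fields.
case class CoreRecord(authority: String, material: String, denomination: String)

// Made-up sample data: two of the three records are incomplete.
val sampleRecords = Vector(
  CoreRecord("augustus", "ar", "denarius"),
  CoreRecord("nero", "", "as"),
  CoreRecord("", "av", "aureus")
)

// Pair each field name with a test for "this record has a value".
val coreChecks: Vector[(String, CoreRecord => Boolean)] = Vector(
  ("authority", (r: CoreRecord) => r.authority.nonEmpty),
  ("material", (r: CoreRecord) => r.material.nonEmpty),
  ("denomination", (r: CoreRecord) => r.denomination.nonEmpty)
)

// Count, for each field, the records that fail the test.
val missingCounts: Vector[(String, Int)] =
  coreChecks.map { case (label, hasValue) =>
    (label, sampleRecords.count(r => !hasValue(r)))
  }

missingCounts.foreach { case (label, n) => println(s"$label: $n missing") }
```

With the real data, the same pattern would presumably apply to `ocre.issues` in place of `sampleRecords`.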

### Distinct values for core information

In [None]:
val authorityValues = authorities.map(_.authority).distinct
val materialValues = materials.map(_.material).distinct
val denominationValues = denominations.map(_.denomination).distinct

println("OCRE corpus includes values for:")
println(authorityValues.size + " issuing authorities")
println(materialValues.size + " materials")
println(denominationValues.size + " denominations")

### How are they distributed?

In [None]:
import edu.holycross.shot.histoutils._


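// Group identical authority values together and count the issues in each group
val authorityFreqs = ocre.issues.map(_.authority).groupBy(d => d).map { case (k,v) => Frequency(k, v.size)}
// Wrap the counts in a histogram so they can be sorted and plotted
val authorityHistogram = edu.holycross.shot.histoutils.Histogram(authorityFreqs.toVector)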
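
The `groupBy` step above is the core of the frequency count. As a rough illustration that does not depend on the OCRE data or on `histoutils`, here is the same idea with plain Scala collections. The names `sampleAuthorities`, `sampleCounts` and `sampleRanked` are made up for this sketch, and the plain strings are only a stand-in for whatever type `authority` has in the library.

```scala
// Hypothetical stand-in for the values produced by `ocre.issues.map(_.authority)`.
val sampleAuthorities = Vector("augustus", "nero", "augustus", "trajan", "nero", "augustus")

// Group identical values together, then count the members of each group.
val sampleCounts: Map[String, Int] =
  sampleAuthorities.groupBy(identity).map { case (name, occurrences) =>
    (name, occurrences.size)
  }

// Order from most to least frequent, breaking ties alphabetically
// so that the result is stable.
val sampleRanked: Vector[(String, Int)] =
  sampleCounts.toVector.sortBy { case (name, count) => (-count, name) }

sampleRanked.foreach { case (name, count) => println(s"$name: $count") }
```

Sorting on the negated count is one simple way to get descending order without reversing the collection afterwards.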



Visualize the histogram as a bar graph using the `plotly` library:

In [None]:
// Import plotly, and limit pretty-printed output to 3 lines
import plotly._, plotly.element._, plotly.layout._, plotly.Almond._
repl.pprinter() = repl.pprinter().copy(defaultHeight = 3)

Plot the number of issues for each authority, sorted from the largest to the smallest number of issues:

In [None]:
val authNames = authorityHistogram.sorted.frequencies.map(_.item)
val authCounts = authorityHistogram.sorted.frequencies.map(_.count)
val authPlot = Seq(
  Bar(
   authNames, authCounts
  )
)
plot(authPlot)


## Notebook in progress

More to come!




Next, view the authorities chronologically?

- use ocre functions to get date ranges for authorities
- sort histogram by minimum date for each authority
- plot
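
The steps listed above might look roughly like this. It is only a sketch on made-up data: `SampleIssue`, its `startYear` field, and the sample values are hypothetical, and the real OCRE functions for date ranges may work quite differently.

```scala
// Hypothetical, simplified record: the real issue class and its dating API may differ.
case class SampleIssue(authority: String, startYear: Int)

// Made-up sample data; negative years stand for BCE.
val sampleIssues = Vector(
  SampleIssue("trajan", 98),
  SampleIssue("augustus", -27),
  SampleIssue("nero", 54),
  SampleIssue("augustus", -15),
  SampleIssue("trajan", 103)
)

// For each authority, record its earliest start year and its number of issues.
val byAuthority: Vector[(String, Int, Int)] =
  sampleIssues
    .groupBy(_.authority)
    .map { case (name, issues) => (name, issues.map(_.startYear).min, issues.size) }
    .toVector

// Sort the authorities by earliest year instead of by frequency.
val chronological = byAuthority.sortBy { case (_, earliest, _) => earliest }

// Names and counts in chronological order, ready to be plotted.
val chronNames = chronological.map(_._1)
val chronCounts = chronological.map(_._3)
```

The resulting `chronNames` and `chronCounts` could then be passed to `Bar` in the same way as `authNames` and `authCounts` in the plot above.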

## Other information



In [None]:
ocre.size
ocre.datable.size


In [None]:
ocre.issues.filter(_.obvLegend.nonEmpty).size
ocre.issues.filter(_.revLegend.nonEmpty).size


In [None]:
ocre.issues.filter(_.obvType.nonEmpty).size
ocre.issues.filter(_.revType.nonEmpty).size

In [None]:
val noRType = ocre.issues.filter(_.revType.isEmpty)

In [None]:
val legends = ocre.corpus

In [None]:
val obv = legends.nodes.filter(_.urn.passageComponent.contains("obv"))
val rev = legends.nodes.filter(_.urn.passageComponent.contains("rev"))

In [None]:
legends.size
obv.size
rev.size

In [None]:
val mints = ocre.issues.filter(_.mint.nonEmpty)
mints.size