# Verifying data in OCRE

This notebook shows you how to load OCRE data from a CEX file over the internet, and verify its contents.  It uses version `1.4.3` of the `nomisma` library.


## Configure Jupyter notebook

First configure the Jupyter notebook to find the `nomisma` library.  (You could do the same thing in other environments with `sbt` or `maven`.)

In [None]:
// 1. Add maven repository where we can find our libraries
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

In [None]:
// 2. Make libraries available with `$ivy` imports:
import $ivy.`edu.holycross.shot::nomisma:1.4.3`
import $ivy.`edu.holycross.shot::histoutils:2.2.0`
import $ivy.`org.plotly-scala::plotly-almond:0.7.1`

## Load the full OCRE data set

In [None]:
import edu.holycross.shot.nomisma._
val ocreCex = "https://raw.githubusercontent.com/neelsmith/nomisma/master/cex/ocre-cite-ids.cex"
val ocre = OcreSource.fromUrl(ocreCex)

// Check results:
require(ocre.size > 50000) 

## Contents of an `Ocre` object

The object `ocre` created in the preceding cell belongs to the `Ocre` class.  `Ocre` objects have a Vector of `OcreIssue`s, each of which in turn has the following properties:


    id: String,
    labelText:  String,
    denomination: String,
    material: String,
    authority: String,
    mint: String,
    region: String,
    obvType: String,
    obvLegend: String,
    obvPortraitId: String,
    revType: String,
    revLegend: String,
    revPortraitId: String,
    dateRange: Option[YearRange]
    
    
    
The first seven properties should have values for each issue.  As a first step in validating the contents of `ocre`, we'll verify that each of those String properties is non-empty.

In [None]:
println("Number of issues in OCRE: " + ocre.size)
require (ocre.issues.filter(_.id.nonEmpty).size == ocre.issues.size)
require (ocre.issues.filter(_.labelText.nonEmpty).size == ocre.issues.size)
require (ocre.issues.filter(_.denomination.nonEmpty).size == ocre.issues.size)
require (ocre.issues.filter(_.material.nonEmpty).size == ocre.issues.size)
require (ocre.issues.filter(_.authority.nonEmpty).size == ocre.issues.size)
require (ocre.issues.filter(_.mint.nonEmpty).size == ocre.issues.size)
require (ocre.issues.filter(_.region.nonEmpty).size == ocre.issues.size)

val requiredProperties = List("id", "labelText", "denomination", "material", "authority", "mint", "region")
println("All issues have a data value for:\n" + requiredProperties.mkString("\n"))
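
The seven `require` calls above repeat one pattern. A table-driven sketch can pair each property name with its accessor once, and report which properties (if any) are empty for some issue. The `Issue` case class, the `emptyProperties` helper, and the sample values here are hypothetical stand-ins so that the code runs without the `nomisma` library; with real data you would presumably apply the same idea to `ocre.issues` and `OcreIssue`.

```scala
// Hypothetical stand-in for OcreIssue, limited to the seven required properties.
case class Issue(
  id: String, labelText: String, denomination: String, material: String,
  authority: String, mint: String, region: String
)

// Pair each property name with a function that extracts its value.
val requiredAccessors: List[(String, Issue => String)] = List(
  "id" -> ((i: Issue) => i.id),
  "labelText" -> ((i: Issue) => i.labelText),
  "denomination" -> ((i: Issue) => i.denomination),
  "material" -> ((i: Issue) => i.material),
  "authority" -> ((i: Issue) => i.authority),
  "mint" -> ((i: Issue) => i.mint),
  "region" -> ((i: Issue) => i.region)
)

// Names of required properties that are empty in at least one issue.
def emptyProperties(issues: Seq[Issue]): List[String] =
  requiredAccessors.collect {
    case (name, accessor) if issues.exists(i => accessor(i).isEmpty) => name
  }

// Made-up sample data: the second issue lacks a material.
val sample = Vector(
  Issue("issue-1", "Sample issue 1", "denarius", "ar", "authority-a", "mint-a", "region-a"),
  Issue("issue-2", "Sample issue 2", "denarius", "", "authority-a", "mint-a", "region-a")
)

val missing = emptyProperties(sample)
println("Properties with empty values: " + missing.mkString(", "))
```

Compared with one `require` per property, this reports every failing property by name instead of stopping at the first failure.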

In [None]:
val obvTypes = ocre.issues.filter(_.obvType.nonEmpty)
val obvLegends = ocre.issues.filter(_.obvLegend.nonEmpty)
val obvPortraitIds = ocre.issues.filter(_.obvPortraitId.nonEmpty)

println("Issues with obverse types: " + obvTypes.size)
println("Issues with obverse legends: " + obvLegends.size)
println("Issues with obverse portrait IDs: " + obvPortraitIds.size)

In [None]:
val revTypes = ocre.issues.filter(_.revType.nonEmpty)
val revLegends = ocre.issues.filter(_.revLegend.nonEmpty)
val revPortraitIds = ocre.issues.filter(_.revPortraitId.nonEmpty)

println("Issues with reverse types: " + revTypes.size)
println("Issues with reverse legends: " + revLegends.size)
println("Issues with reverse portrait IDs: " + revPortraitIds.size)

## Contents of required properties

In [None]:
val mints = ocre.issues.filter(issue => (issue.mint.trim != "none") && ! issue.mint.trim.contains("uncertain"))
println("Issues with mints: " + mints.size)

In [None]:
val mintValues = mints.map(_.mint).distinct
println(mintValues.mkString("\n"))

In [None]:
val regions = ocre.issues.filterNot(_.region.contains("uncertain")).filterNot(_.region.contains("none"))
val regionValues = regions.map(_.region).distinct
println(regionValues.size)
println(regionValues.mkString("\n"))
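
The mint and region filters above test for the same placeholder strings ("none" and "uncertain") in slightly different ways. A single predicate is one way to keep the two checks consistent. This is a minimal sketch on made-up values: the name `isPlaceholder` is hypothetical, and treating empty strings and mixed-case values as placeholders is an assumption that goes beyond what the cells above do.

```scala
// True when a value is a placeholder instead of a real mint or region.
// Assumption: empty strings, "none", and anything containing "uncertain" count as placeholders.
def isPlaceholder(value: String): Boolean = {
  val v = value.trim.toLowerCase
  v.isEmpty || v == "none" || v.contains("uncertain")
}

// Made-up sample values, not taken from OCRE.
val sampleMints = Vector("rome", "none", "uncertain_value", " lugdunum ", "")
val knownMints = sampleMints.filterNot(isPlaceholder).map(_.trim)
println(knownMints.mkString("\n"))
```

With the real data, the same predicate could be applied to both properties, for example `ocre.issues.filterNot(issue => isPlaceholder(issue.mint))`.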

### Distinct values for core information

In [None]:
val authorityValues = ocre.issues.map(_.authority).distinct
val materialValues =  ocre.issues.map(_.material).distinct
val denominationValues =  ocre.issues.map(_.denomination).distinct


In [None]:
println(authorityValues.size + " issuing authorities")
println(authorityValues.mkString("\n"))

In [None]:
println(materialValues.size + " metals")
println(materialValues.mkString("\n"))

In [None]:
println(denominationValues.size + " denominations")
println(denominationValues.mkString("\n"))
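
Listing distinct values shows which values occur but not how often. As a sketch that uses only the Scala standard library, the hypothetical helper `frequencies` below counts occurrences and sorts them from most to least frequent; the sample values are made up.

```scala
// Count how often each value occurs, most frequent first (ties broken alphabetically).
def frequencies(values: Seq[String]): Vector[(String, Int)] =
  values
    .groupBy(identity)
    .map { case (value, occurrences) => (value, occurrences.size) }
    .toVector
    .sortBy { case (value, count) => (-count, value) }

// Made-up sample values, not taken from OCRE.
val sampleMaterials = Vector("ar", "ae", "ar", "av", "ae", "ar")
val materialCounts = frequencies(sampleMaterials)
materialCounts.foreach { case (value, count) => println(value + ": " + count) }
```

With the real data, a call such as `frequencies(ocre.issues.map(_.material))` should give the equivalent table for materials, and likewise for authorities and denominations.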