# Verifying data in CRRO

This notebook shows you how to load CRRO data from a CEX file over the internet, and verify its contents.  It uses version `1.8.0` of the `nomisma` library.


## Configure Jupyter notebook

First configure the Jupyter notebook to find the `nomisma` library.  (You could do the same thing in other environments with `sbt` or `maven`.)

In [1]:
// 1. Add maven repository where we can find our libraries
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

myBT: coursierapi.MavenRepository = MavenRepository(https://dl.bintray.com/neelsmith/maven)

In [2]:
// 2. Make libraries available with `$ivy` imports:
import $ivy.`edu.holycross.shot::nomisma:1.8.0`

Downloading https://repo1.maven.org/maven2/edu/holycross/shot/nomisma_2.12/1.8.0/nomisma_2.12-1.8.0.pom
Downloaded https://repo1.maven.org/maven2/edu/holycross/shot/nomisma_2.12/1.8.0/nomisma_2.12-1.8.0.pom
Downloading https://repo1.maven.org/maven2/edu/holycross/shot/nomisma_2.12/1.8.0/nomisma_2.12-1.8.0.pom.sha1
Downloaded https://repo1.maven.org/maven2/edu/holycross/shot/nomisma_2.12/1.8.0/nomisma_2.12-1.8.0.pom.sha1
Downloading https://dl.bintray.com/neelsmith/maven/edu/holycross/shot/nomisma_2.12/1.8.0/nomisma_2.12-1.8.0.pom
Downloaded https://dl.bintray.com/neelsmith/maven/edu/holycross/shot/nomisma_2.12/1.8.0/nomisma_2.12-1.8.0.pom
Downloading https://repo1.maven.org/maven2/edu/holycross/shot/cite/xcite_2.12/4.1.1/xcite_2.12-4.1.1.pom
Downloading https://repo1.maven.org/maven2/org/scala-lang/modules/scala-xml_2.12/1.0.6/scala-xml_2.12-1.0.6.pom
Downloading https://repo1.maven.org/maven2/org/wvlet/airframe/airframe-log_2.12/19.8.10/airframe-log_2.12-19.8.10.pom
Downloading https:

import $ivy.$

## Load the CRRO data set

In [3]:
import edu.holycross.shot.nomisma._
val crroCex = "https://raw.githubusercontent.com/neelsmith/nomisma/master/cex/crro-2019-12-03.cex"
val crro = CrroSource.fromUrl(crroCex)

// Sanity check:
require(crro.size > 2000) 

Dec 03, 2019 9:11:01 AM wvlet.log.Logger log
INFO: Reading 2281 lines of CEX data.
Dec 03, 2019 9:11:01 AM wvlet.log.Logger log
INFO: Created Crro with 2281 issues.


import edu.holycross.shot.nomisma._

crroCex: String = "https://raw.githubusercontent.com/neelsmith/nomisma/master/cex/crro-2019-12-03.cex"
crro: Crro = Crro(
  Vector(
    CrroIssue(
      "http://numismatics.org/crro/id/rrc-528.3",
      "RRC 528/3",
      "denarius",
      "ar",
      "none",
      "uncertain_value",
      "none",
      "Head of M. Antonius right; around, inscription. Border of dots.",
      "M\u00b7ANTON\u00b7IMP\u00b7III\u00b7VIR\u00b7R\u00b7P\u00b7C\u00b7AVG",
      "",
      "Head of Octavian right; around, inscription. Border of dots.",
      "CAESAR\u00b7IMP\u00b7PONT\u00b7III\u00b7VIR\u00b7R\u00b7P\u00b7C",
      "",
      Some(YearRange(-39, Some(-39)))
    ),
    CrroIssue

## Contents of a `Crro` object

The object `crro` created in the preceding cell belongs to the `Crro` class.  The `Crro` class and the parallel `Ocre` class for *Online Coins of the Roman Empire* extend a trait defining a common structure for catalogs of issues.  These catalogs have a Vector of `Issue`s, each of which in turn has the following properties:


    id: String,
    labelText:  String,
    denomination: String,
    material: String,
    authority: String,
    mint: String,
    region: String,
    obvType: String,
    obvLegend: String,
    obvPortraitId: String,
    revType: String,
    revLegend: String,
    revPortraitId: String,
    dateRange: Option[YearRange]
    
In this notebook, we'll check for each property that all the values in the 2,000+ records of Roman Republican coin issues look reasonable.


## Check for presence of required properties

    
The first seven properties should have values for each issue.  As a first step in validating the contents of `crro`, we'll verify that each of those String properties is non-empty.


In [4]:
println("Number of issues in CRRO: " + crro.size)
require (crro.issues.filter(_.id.nonEmpty).size == crro.issues.size)
require (crro.issues.filter(_.labelText.nonEmpty).size == crro.issues.size)
require (crro.issues.filter(_.denomination.nonEmpty).size == crro.issues.size)
require (crro.issues.filter(_.material.nonEmpty).size == crro.issues.size)
require (crro.issues.filter(_.authority.nonEmpty).size == crro.issues.size)
require (crro.issues.filter(_.mint.nonEmpty).size == crro.issues.size)
require (crro.issues.filter(_.region.nonEmpty).size == crro.issues.size)

val requiredProperties = List("id", "labelText", "denomination", "material", "authority", "mint", "region")
println("All issues have a non-empty data value for:\n" + requiredProperties.mkString("\n"))

Number of issues in CRRO: 2281
All issues have a non-empty data value for:
id
labelText
denomination
material
authority
mint
region


requiredProperties: List[String] = List(
  "id",
  "labelText",
  "denomination",
  "material",
  "authority",
  "mint",
  "region"
)
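
The seven `require` calls in the cell above repeat a single pattern. As a minimal sketch of how the same check could be driven by a table of property names and accessor functions, the following uses a hypothetical stand-in `SketchIssue` case class rather than the library's own issue type. The names `SketchIssue`, `requiredAccessors`, `emptyProperties` and the sample records are assumptions made for illustration; none of them are part of `nomisma`.

```scala
object RequiredPropertySketch {
  // Hypothetical stand-in for an issue: only the seven required String properties.
  case class SketchIssue(
    id: String,
    labelText: String,
    denomination: String,
    material: String,
    authority: String,
    mint: String,
    region: String
  )

  // Pair each property name with a function that extracts its value.
  val requiredAccessors: Vector[(String, SketchIssue => String)] = Vector(
    ("id", (i: SketchIssue) => i.id),
    ("labelText", (i: SketchIssue) => i.labelText),
    ("denomination", (i: SketchIssue) => i.denomination),
    ("material", (i: SketchIssue) => i.material),
    ("authority", (i: SketchIssue) => i.authority),
    ("mint", (i: SketchIssue) => i.mint),
    ("region", (i: SketchIssue) => i.region)
  )

  // Names of required properties that are empty for at least one issue.
  def emptyProperties(issues: Vector[SketchIssue]): Vector[String] =
    requiredAccessors.collect {
      case (name, accessor) if issues.exists(issue => accessor(issue).isEmpty) => name
    }

  // Invented sample records: the second one has an empty mint.
  val sample: Vector[SketchIssue] = Vector(
    SketchIssue("id-1", "Label 1", "denarius", "ar", "none", "none", "none"),
    SketchIssue("id-2", "Label 2", "aureus", "av", "none", "", "none")
  )
}
```

With this shape, a single `require(RequiredPropertySketch.emptyProperties(issues).isEmpty)` would cover all seven properties, and a failure could report which property names were at fault.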

## Check values of required properties

Now we want to see if those non-empty values look reasonable.  The first constraint to check is that all values for the `id` and `labelText` properties must be unique.

In [5]:
require(crro.issues.map(_.id).distinct.size == crro.size)
require(crro.issues.map(_.labelText).distinct.size == crro.size)
println("All id and labelText values are unique.")

All id and labelText values are unique.
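
If one of these `require` calls fails, it does not say which values collide. The following is a minimal sketch of one way to list repeated values, using only the Scala standard library on a plain `Vector[String]`. The helper name `duplicates` is an assumption for illustration and is not a `nomisma` function.

```scala
object DuplicateSketch {
  // Return each value that occurs more than once, in sorted order.
  def duplicates(values: Vector[String]): Vector[String] =
    values
      .groupBy(identity)
      .collect { case (value, occurrences) if occurrences.size > 1 => value }
      .toVector
      .sorted
}
```

Assuming `crro.issues` is a `Vector`, as the output earlier in the notebook suggests, `DuplicateSketch.duplicates(crro.issues.map(_.id))` should be empty whenever the uniqueness check passes.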


Classes extending the `IssueCollection` trait include functions to list all values for a given property.  The names of these functions have the form `[PROPERTYNAME]List`.  Let's look at the `material` property as an example.

In [6]:
println(crro.materialList.mkString("\n"))

ae
ar
av
none


You can see that in addition to abbreviations for bronze (`ae`), silver (`ar`) and gold (`av`), there is a fourth category, `none`.  `IssueCollection` classes also include functions for each property that create a new collection (here, a `Crro`) containing only issues with meaningful values for that property.  The names of these functions have the form `has[PROPERTYNAME]`. The `hasMint` function, for example, creates a new `Crro` containing only issues that have a value other than `none` for the mint property.

In [7]:
println("All issues in CRRO: " + crro.size)
println("Issues with mint not equal to 'none': " + crro.hasMint.size)

All issues in CRRO: 2281
Issues with mint not equal to 'none': 2021
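
To make the `[PROPERTYNAME]List` and `has[PROPERTYNAME]` ideas concrete, here is a minimal sketch of the listing and filtering they describe, written against a hypothetical `SketchCatalog` class. It is not the library's implementation: treating only the literal string `none` as a placeholder follows the description above, while the sorted order of the list and all names and sample records are assumptions.

```scala
object HasPropertySketch {
  // Hypothetical stand-in for an issue with just two properties.
  case class SketchIssue(id: String, mint: String)

  // Hypothetical stand-in for a catalog of issues.
  case class SketchCatalog(issues: Vector[SketchIssue]) {
    def size: Int = issues.size

    // Distinct values of the mint property, placeholders included, sorted.
    def mintList: Vector[String] = issues.map(_.mint).distinct.sorted

    // New catalog keeping only issues whose mint is not the placeholder "none".
    def hasMint: SketchCatalog = SketchCatalog(issues.filter(_.mint != "none"))
  }

  // Invented sample data.
  val catalog: SketchCatalog = SketchCatalog(
    Vector(
      SketchIssue("a", "rome"),
      SketchIssue("b", "none"),
      SketchIssue("c", "uncertain_value"),
      SketchIssue("d", "rome")
    )
  )
}
```

Because `hasMint` returns another catalog rather than a bare list, filters written this way can be chained, which is the property the later cells on type descriptions and legends rely on.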


## Optional properties

CRRO's RDF records optionally include, for each side (obverse and reverse), a type description, a legend and identifiers for portraits.  Unlike the required properties, these properties appear in the delimited-text records simply as empty strings that the `Crro` object ignores, so you won't find entries like `none` or `uncertain_value` in the list of values for these properties.


### Portrait identifiers

Let's start with the values for identifiers for obverse portraits.  You'll see that while the identifiers include a mix of plain strings and URLs in the `britishmuseum.org` domain, there are no values like `none`.

In [8]:
// Not all issues have an obverse portrait ID:
println("Number of issues in CRRO: " + crro.size)
println("Issues with obv. portrait ID: " + crro.hasObvPortraitId.size  + "\n")

// and there are no "no data" values in obvPortraitId:
println("Distinct values for obvPortraitId:")
println(crro.obvPortraitIdList.mkString("\n"))

// reverse portrait identifiers work the same way.

Number of issues in CRRO: 2281
Issues with obv. portrait ID: 1826

Distinct values for obvPortraitId:
Janus
http://collection.britishmuseum.org/id/person-institution/175498
http://collection.britishmuseum.org/id/person-institution/56979
http://collection.britishmuseum.org/id/person-institution/56988
http://collection.britishmuseum.org/id/person-institution/57039
http://collection.britishmuseum.org/id/person-institution/57060
http://collection.britishmuseum.org/id/person-institution/57291
http://collection.britishmuseum.org/id/person-institution/57638
http://collection.britishmuseum.org/id/person-institution/57655
http://collection.britishmuseum.org/id/person-institution/57657
http://collection.britishmuseum.org/id/person-institution/57930
http://collection.britishmuseum.org/id/person-institution/57951
http://collection.britishmuseum.org/id/person-institution/58247
http://collection.britishmuseum.org/id/person-institution/58260
http://collection.britishmuseum.org/id/person-institution/5

### Type descriptions 

The optional description of obverse and reverse types is free text, so unlike the properties we've looked at above, there is no `[obv|rev]TypeList` function to get a list of controlled vocabulary.  `Crro` does have functions named `hasObvType` and `hasRevType` to create a new `Crro` including only those issues with an obverse or reverse type description, respectively.

As the following cell shows, we can string those functions together to create a `Crro` containing only issues that include *both* an obverse and a reverse type description.

In [9]:
val oTypes = crro.hasObvType
val rTypes = crro.hasRevType
println("Total number of issues in CRRO: " + crro.size)
println("Issues with obv. type description: " + oTypes.size)
println("Issues with rev. type description: " + rTypes.size)
val bothTypes = oTypes.hasRevType // == rTypes.hasObvType
println("Issues with both obv. and rev. type description: " + bothTypes.size)

Total number of issues in CRRO: 2281
Issues with obv. type description: 2260
Issues with rev. type description: 2263
Issues with both obv. and rev. type description: 2260


oTypes: Crro = Crro(
  Vector(
    CrroIssue(
      "http://numismatics.org/crro/id/rrc-528.3",
      "RRC 528/3",
      "denarius",
      "ar",
      "none",
      "uncertain_value",
      "none",
      "Head of M. Antonius right; around, inscription. Border of dots.",
      "M\u00b7ANTON\u00b7IMP\u00b7III\u00b7VIR\u00b7R\u00b7P\u00b7C\u00b7AVG",
      "",
      "Head of Octavian right; around, inscription. Border of dots.",
      "CAESAR\u00b7IMP\u00b7PONT\u00b7III\u00b7VIR\u00b7R\u00b7P\u00b7C",
      "",
      Some(YearRange(-39, Some(-39)))
    ),
    CrroIssue(
      "http://numismatics.org/crro/id/rrc-529.1",
      "RRC 529/1",
      "aureus",
      "av",
      "none",
      "uncertain_val

### Legends

Like type descriptions, obverse and reverse legends are free text, and therefore `Crro` does not have functions `[obv|rev]LegendList` to get a list of controlled vocabulary.

As you would expect by now, the `hasObvLegend` and `hasRevLegend` functions create a new `Crro` including only those issues with an obverse or reverse legend, respectively.

In [10]:
val oLegends = crro.hasObvLegend
val rLegends = crro.hasRevLegend
println("Total number of issues in CRRO: " + crro.size)
println("Issues with obv. legend: " + oLegends.size)
println("Issues with rev. legend: " + rLegends.size)
val bothLegends = oLegends.hasRevLegend // == rLegends.hasObvLegend
println("Issues with both obv. and rev. legends: " + bothLegends.size)


Total number of issues in CRRO: 2281
Issues with obv. legend: 1144
Issues with rev. legend: 2132
Issues with both obv. and rev. legends: 1098


oLegends: Crro = Crro(
  Vector(
    CrroIssue(
      "http://numismatics.org/crro/id/rrc-528.3",
      "RRC 528/3",
      "denarius",
      "ar",
      "none",
      "uncertain_value",
      "none",
      "Head of M. Antonius right; around, inscription. Border of dots.",
      "M\u00b7ANTON\u00b7IMP\u00b7III\u00b7VIR\u00b7R\u00b7P\u00b7C\u00b7AVG",
      "",
      "Head of Octavian right; around, inscription. Border of dots.",
      "CAESAR\u00b7IMP\u00b7PONT\u00b7III\u00b7VIR\u00b7R\u00b7P\u00b7C",
      "",
      Some(YearRange(-39, Some(-39)))
    ),
    CrroIssue(
      "http://numismatics.org/crro/id/rrc-529.1",
      "RRC 529/1",
      "aureus",
      "av",
      "none",
      "uncertain_v
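
The counts printed by the cell above also determine how many issues carry at least one legend and how many carry none, by inclusion-exclusion. The following small sketch works through that arithmetic using only the four numbers reported above; it assumes that the three filtered collections are subsets of the same 2281 issues. The object and value names are hypothetical.

```scala
object LegendCountSketch {
  // Counts copied from the output of the preceding cell.
  val total = 2281
  val withObvLegend = 1144
  val withRevLegend = 2132
  val withBothLegends = 1098

  // Inclusion-exclusion: issues with an obverse legend, a reverse legend, or both.
  val withEitherLegend: Int = withObvLegend + withRevLegend - withBothLegends

  // Issues with no legend on either side.
  val withNeitherLegend: Int = total - withEitherLegend
}
```

Under that assumption, `withEitherLegend` evaluates to 2178 and `withNeitherLegend` to 103.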

## Dating information

`Issue` classes include a final optional property with date information about each issue.  Instead of a simple string value, it's an object modeling a range of years.

The `datable` function creates a new `Crro` containing only issues that have dating information.  The functions `dateRange`, `minDate` and `maxDate` identify the chronological limits of all the issues in a given `Crro` instance.  Negative values represent years BCE.


In [12]:
// The date range object:
println("Total number of issues in CRRO: " + crro.size)
println("Number of datable issues: " + crro.datable.size)

println("Chronological range of issues in CRRO: " + crro.dateRange)
println("Earliest issue: " + crro.minDate)
println("Latest issue: " + crro.maxDate)


Total number of issues in CRRO: 2281
Number of datable issues: 2281
Chronological range of issues in CRRO: -326:-31
Earliest issue: -326
Latest issue: -31
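
As a rough illustration of how a range of years with an optional end point can be modeled and summarized, the following sketch defines a hypothetical `SketchYearRange`. Its two-value shape mirrors the `YearRange(-39, Some(-39))` entries in the output earlier in the notebook, but the field names, the helper functions and the sample ranges are all assumptions, not the library's API.

```scala
object YearRangeSketch {
  // Hypothetical model: a starting year and an optional closing year.
  // Negative numbers represent years BCE.
  case class SketchYearRange(startYear: Int, endYear: Option[Int]) {
    // A range without a closing year is treated as a single year.
    def lastYear: Int = endYear.getOrElse(startYear)
  }

  // Earliest starting year among the ranges, or None for an empty collection.
  def minDate(ranges: Vector[SketchYearRange]): Option[Int] =
    if (ranges.isEmpty) None else Some(ranges.map(_.startYear).min)

  // Latest closing year among the ranges, or None for an empty collection.
  def maxDate(ranges: Vector[SketchYearRange]): Option[Int] =
    if (ranges.isEmpty) None else Some(ranges.map(_.lastYear).max)

  // Invented sample ranges, not taken from the CRRO data.
  val sample: Vector[SketchYearRange] = Vector(
    SketchYearRange(-211, Some(-208)),
    SketchYearRange(-100, None),
    SketchYearRange(-39, Some(-39))
  )
}
```

Returning `Option[Int]` keeps the empty-collection case explicit instead of throwing on `min` or `max`; whether the library makes the same choice is not shown in this notebook.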
