# Examine DSE data in a CITE library

This notebook uses a CEX release of HMT project data with known errors that were cleaned up in the summer 2019 HMT seminar. Its purpose is to illustrate how to identify inconsistencies in vectors of `DsePassage`s using new functions introduced in version 6.0.0 of the `dse` library.


## Configure Jupyter notebook




In [None]:
// 1. Add maven repository where we can find our libraries
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

In [None]:
// 2. Make libraries available with `$ivy` imports:
import $ivy.`edu.holycross.shot::scm:7.2.0`
import $ivy.`edu.holycross.shot.cite::xcite:4.2.0`
import $ivy.`edu.holycross.shot::dse:6.0.2`


## Load a CITE library from CEX source

In [None]:
import edu.holycross.shot.scm._
import edu.holycross.shot.cite._

In [None]:
val url = "https://raw.githubusercontent.com/homermultitext/hmt-archive/master/releases-cex/hmt-2018e-tweaked.cex"
val lib = CiteLibrarySource.fromUrl(url)

// or if you've downloaded the data locally:
//val f = "hmt-2018e-errors.cex"
//val lib = CiteLibrarySource.fromFile(f)

val citecoll = lib.collectionRepository.get

## Assemble CITE collection objects for DSE collections

This data set has only one collection implementing the DSE model.  We'll get all its DSE data as CITE Collection objects.

In [None]:
val dseModel = Cite2Urn("urn:cite2:cite:datamodels.v1:dsemodel")
val dseCollections = lib.collectionsForModel(dseModel)

// verify that the data only include 1 collection:
require(dseCollections.size == 1)
val vaPageCollection = dseCollections(0)
val vaPages = citecoll.objectsForCollection(vaPageCollection)

Construct `DsePassage`s from the cite collection objects:

In [None]:
val vaDsePassages = vaPages.map(edu.holycross.shot.dse.DsePassage.fromCitableObject(_))



## Use `DseVector` object to find inconsistencies

A text passage must not appear in more than one DSE entry.

In [None]:
val duplicateTexts = edu.holycross.shot.dse.DseVector.duplicatePassages(vaDsePassages)
println(duplicateTexts.size + " texts appear more than once in the DSE passages.")
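
The idea behind this check can be sketched without the library. The following is a minimal illustration under stated assumptions, not the `dse` library's implementation: `ToyDse` is a hypothetical stand-in for `DsePassage` that holds plain strings instead of URNs, and the sample values are invented. It groups the records by text passage and keeps the passages that occur in more than one record.

```scala
// Hypothetical stand-in for DsePassage: plain strings instead of URN objects.
case class ToyDse(passage: String, image: String, surface: String)

// Invented sample records; passage "1.1" is recorded twice.
val toyRecords = Vector(
  ToyDse("1.1", "imgA@r1", "12r"),
  ToyDse("1.2", "imgA@r2", "12r"),
  ToyDse("1.1", "imgA@r3", "12r")
)

// Group records by text passage and keep the passages that occur more than once.
val toyDuplicates: Vector[String] = toyRecords
  .groupBy(_.passage)
  .collect { case (psg, recs) if recs.size > 1 => psg }
  .toVector

println(s"${toyDuplicates.size} texts appear more than once in the toy records.")
```

An empty result means that every passage is recorded exactly once.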

Require that text-bearing surfaces be indexed consistently to a single reference image.

In [None]:
val badPageIndexing = edu.holycross.shot.dse.DseVector.doubleIndexedSurfaces(vaDsePassages)
println(badPageIndexing.size + " pages are indexed to more than one reference image.")
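
A comparable sketch shows what "indexed to more than one reference image" means in practice. It rests on assumptions: `ToyIndex` is a hypothetical stand-in for `DsePassage`, the strings are invented, and the convention that `@` separates an image identifier from its region of interest is adopted here only for illustration. The library's own logic may differ.

```scala
// Hypothetical stand-in for DsePassage: plain strings instead of URN objects.
case class ToyIndex(passage: String, image: String, surface: String)

// Invented sample records; surface "12v" points to two different images.
val toyIndex = Vector(
  ToyIndex("1.1", "imgA@0.1,0.1,0.5,0.1", "12r"),
  ToyIndex("1.2", "imgA@0.1,0.2,0.5,0.1", "12r"),
  ToyIndex("1.3", "imgB@0.1,0.1,0.5,0.1", "12v"),
  ToyIndex("1.4", "imgC@0.1,0.2,0.5,0.1", "12v")
)

// Drop the region suffix (everything from '@' onward) to get the whole image.
def wholeImage(ref: String): String = ref.takeWhile(_ != '@')

// For each surface, collect the distinct whole images its records point to.
val imagesBySurface: Map[String, Set[String]] = toyIndex
  .groupBy(_.surface)
  .map { case (surface, recs) => surface -> recs.map(r => wholeImage(r.image)).toSet }

// A surface is suspect if its records reference more than one image.
val toyDoubleIndexed: Vector[String] =
  imagesBySurface.filter { case (_, imgs) => imgs.size > 1 }.keys.toVector

println(s"${toyDoubleIndexed.size} toy pages are indexed to more than one reference image.")
```

Comparing whole images rather than full references matters because every passage on a page normally has its own region of the same image.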

## (Optional) Obsessive consistency checking

Check that text passages appearing on each surface are indexed to the same reference image as the physical surface.

>Warning: this step is **slow**.


In practice, this check is often unnecessary. If you are building from a source such as a delimited-text listing of DSE triples, the `doubleIndexedSurfaces` function will identify the same faulty pages much more quickly.

In [None]:
val dseInconsistencies = edu.holycross.shot.dse.DseVector.triangleConsistencyErrors(vaDsePassages)
println(dseInconsistencies.size + " entries had inconsistencies between text passage and surface.")