# Search diplomatic text of HMT *scholia*


### How to use this notebook

1. First, run step 1 (e.g., select the cell under the heading **Step 1: load everything** and choose "Run All Below" from the "Cell" menu). This will be slow, and how slow depends on how quickly the remote resources it downloads (code libraries and the HMT data release) respond at the time.
2. In the cell just below the heading **Step 2: search**, type the string you want to find between the quotation marks in the call to `search`, then run that cell (e.g., select it and choose "Run Cells" from the "Cell" menu).



# Step 2: search

In [None]:
search("τέον")

# Step 1: load everything


Releases of the archive are published in [this directory](https://github.com/homermultitext/hmt-archive/tree/master/releases-cex); check there for the most recent release and, if needed, update the release identifier in the following cell.

In [None]:
// Check for most recent release at
// https://github.com/homermultitext/hmt-archive/tree/master/releases-cex
// and change this value if needed:
val releaseId = "2020i"


## Configure Jupyter notebook

In [None]:

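// 1. Add the Maven repository where our libraries are published.
// Note: Bintray was shut down in 2021, so this URL may no longer resolve;
// if it fails, point this at wherever the libraries are now hosted.
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)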

In [None]:
// 2. Make libraries available with `$ivy` imports:
import $ivy.`edu.holycross.shot::scm:7.4.0`
import $ivy.`edu.holycross.shot::ohco2:10.20.4`
import $ivy.`edu.holycross.shot.cite::xcite:4.3.0`
import $ivy.`edu.holycross.shot::dse:7.1.3`
import $ivy.`edu.holycross.shot::greek:9.0.0`

## Load HMT data

Data releases of the Homer Multitext project archive are published as CITE libraries in CEX format and committed to the `hmt-archive` GitHub repository.
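To give a feel for the format: a CEX file is plain text divided into blocks, each introduced by a header line beginning `#!` (for instance `#!ctsdata` for text passages); lines beginning `//` are comments, and each line of a `#!ctsdata` block pairs a CTS URN with that passage's text, separated by `#`. The following hypothetical sketch parses just that much using only the Scala standard library. `cexBlocks` and `ctsData` are illustrative names, not part of the `scm` library (which the notebook uses below to read the full format), and the sample data are invented.

```scala
// Hypothetical sketch, not the `scm` library's implementation: split a CEX
// string into labelled blocks, then read `#!ctsdata` lines as (URN, text)
// pairs. Assumes the default `#` column delimiter.
def cexBlocks(cex: String): Map[String, Vector[String]] = {
  // Drop empty lines and `//` comment lines
  val lines = cex.linesIterator.map(_.trim)
    .filterNot(l => l.isEmpty || l.startsWith("//")).toVector
  // Start a new (label, lines) pair at each `#!` header line
  val blocks = lines.foldLeft(Vector.empty[(String, Vector[String])]) { (acc, line) =>
    if (line.startsWith("#!")) acc :+ (line.drop(2) -> Vector.empty[String])
    else if (acc.isEmpty) acc // ignore anything before the first header
    else acc.init :+ (acc.last._1 -> (acc.last._2 :+ line))
  }
  // A label may occur in several blocks: concatenate their contents
  blocks.groupBy(_._1).map { case (label, bs) => label -> bs.flatMap(_._2) }
}

// Read the `ctsdata` block as (URN, text) pairs
def ctsData(cex: String): Vector[(String, String)] =
  cexBlocks(cex).getOrElse("ctsdata", Vector.empty).flatMap { line =>
    line.split("#", 2) match {
      case Array(urn, text) => Some(urn -> text)
      case _                => None // no delimiter: skip malformed line
    }
  }

// Invented sample data, for illustration only
val sample =
  """#!citelibrary
    |name#Invented sample library
    |#!ctsdata
    |// two made-up passages of one scholion
    |urn:cts:greekLit:tlg5026.msA.hmt:1.2.lemma#μῆνιν
    |urn:cts:greekLit:tlg5026.msA.hmt:1.2.comment#ἀπὸ τοῦ μένειν
    |""".stripMargin

ctsData(sample).foreach { case (urn, text) => println(s"$urn -> $text") }
```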



In [None]:
import edu.holycross.shot.scm._

val url = s"https://raw.githubusercontent.com/homermultitext/hmt-archive/master/releases-cex/hmt-${releaseId}.cex"
val lib = CiteLibrarySource.fromUrl(url)

In [None]:
import edu.holycross.shot.ohco2._
import edu.holycross.shot.dse._
import edu.holycross.shot.greek._

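// All text passages in the release, as an OHCO2 corpus
val corpus = lib.textRepository.get.corpus
// DSE records relating text passages to manuscript pages and images
val dsev = DseVector.fromCiteLibrary(lib)
// Keep only passages in text group tlg5026, the scholia
val scholia = corpus.nodes.filter(_.urn.textGroup == "tlg5026")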
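The last line keeps only passages whose URN belongs to text group `tlg5026`, the TLG identifier for the Homeric *scholia*. A CTS URN has the form `urn:cts:NAMESPACE:WORK:PASSAGE`, and the text group is the first dot-separated part of the work component. As a hypothetical, stdlib-only illustration of what `_.urn.textGroup` reads (in the notebook, the `xcite` library's `CtsUrn` class does this parsing), here is a helper named `textGroup`, applied to invented sample URNs:

```scala
// Hypothetical helper: read the text-group identifier from a CTS URN
// string of the form urn:cts:NAMESPACE:WORK:PASSAGE, where WORK is
// textgroup.work[.version[.exemplar]]. Only minimal validation.
def textGroup(urn: String): Option[String] =
  urn.split(":") match {
    case Array("urn", "cts", _, work, _*) => work.split("\\.").headOption
    case _                                => None // not a CTS URN
  }

// Invented sample URNs, for illustration only
val urns = Vector(
  "urn:cts:greekLit:tlg5026.msA.hmt:1.2.comment",
  "urn:cts:greekLit:tlg0012.tlg001.msA:1.1"
)

// Same test as the notebook's filter, applied to plain strings
val scholiaUrns = urns.filter(u => textGroup(u).contains("tlg5026"))
println(scholiaUrns)
```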

## Search and format results 

In [None]:

// Base URL for the HMT's online facsimile of the Venetus A
val pageBaseUrl = "http://www.homermultitext.org/facsimiles/venetus-a/"

// Find scholia whose text contains the literal string `s`, and display each
// matching scholion in full, with `s` highlighted and a link to its page.
def search(s: String) = {
  val matchedPsgs = scholia.filter(_.text.contains(s))
  val pls = if (matchedPsgs.size == 1) "" else "s"
  val hdr = s"<h2>Search for string ${s}</h2>" +
    s"<p>Found ${matchedPsgs.size} passage${pls}</p>"
  val results = for ((urn, idx) <- matchedPsgs.map(_.urn).zipWithIndex) yield {
    // Drop the last level of the passage reference to get the whole scholion
    val scholion = urn.collapsePassageBy(1)
    // All passages belonging to that scholion (all are in tlg5026)
    val nodes = scholia.filter(nd => scholion > nd.urn)
    // Literal `replace`, not `replaceAll`, so `s` is not read as a regex
    val text = nodes.map(n => "<blockquote>" + n.text.replace(s, "<strong>" + s + "</strong>") + "</blockquote>")
    val count = s"<strong>${idx + 1}/${matchedPsgs.size}</strong>"
    dsev.tbsForText(scholion) match {
      case None =>
        s"<li> ${count} ${scholion} (Sadly, no page indexed in DSE) " + text.mkString("\n") + "</li>"
      case Some(pgUrn) =>
        val pg = pgUrn.objectComponent
        val link = "<a href=\"" + pageBaseUrl + pg + "/\">facsimile</a>"
        s"<li> ${count} ${scholion}, page ${pg} (${link})" + text.mkString("\n") + "</li>"
    }
  }
  // Wrap the <li> items in a list element so the HTML is well formed
  Html(hdr + "<ol>\n" + results.mkString("\n") + "\n</ol>")
}
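The `search` function finds every scholia passage containing the search string, collapses each matching URN by one passage level to identify the whole scholion (for example, a reference ending `1.2.comment` becomes `1.2`), and displays every passage in that scholion with the match highlighted. To make that logic easy to follow and test without the HMT libraries, here is a hypothetical sketch on plain strings; `collapseBy1`, `inScholion`, `highlight`, and `searchPairs` are illustrative names, not part of `ohco2` or `xcite`, and the sample passages are invented. Unlike the notebook's `search`, it removes duplicate scholia with `distinct`, so a scholion with two matching passages is listed once.

```scala
// Hypothetical sketch of the logic of `search`, using plain (URN, text)
// strings instead of the HMT libraries' CtsUrn and CitableNode types.

// Drop the last level of a passage reference, e.g. 1.2.comment -> 1.2
def collapseBy1(urn: String): String = {
  val i = urn.lastIndexOf('.')
  if (i > urn.lastIndexOf(':')) urn.substring(0, i) else urn
}

// Is `urn` the scholion itself or a passage below it? Appending "." keeps
// scholion 1.2 from matching passages of scholion 1.20.
def inScholion(scholion: String, urn: String): Boolean =
  urn == scholion || urn.startsWith(scholion + ".")

// Wrap each literal occurrence of `s` in <strong>; `replace` is not regex-based
def highlight(text: String, s: String): String =
  text.replace(s, s"<strong>${s}</strong>")

// For each scholion with a passage containing `s`, return the scholion URN
// and all of its passages with `s` highlighted
def searchPairs(nodes: Vector[(String, String)], s: String): Vector[(String, Vector[String])] = {
  val matched = nodes.collect { case (urn, text) if text.contains(s) => collapseBy1(urn) }.distinct
  matched.map { scholion =>
    scholion -> nodes.collect { case (urn, text) if inScholion(scholion, urn) => highlight(text, s) }
  }
}

// Invented sample passages, for illustration only
val nodes = Vector(
  "urn:cts:greekLit:tlg5026.msA.hmt:1.2.lemma" -> "μῆνιν",
  "urn:cts:greekLit:tlg5026.msA.hmt:1.2.comment" -> "ἀπὸ τοῦ μένειν",
  "urn:cts:greekLit:tlg5026.msA.hmt:1.20.comment" -> "ἄλλο τι"
)

for ((scholion, texts) <- searchPairs(nodes, "μένειν")) {
  println(scholion)
  texts.foreach(t => println("  " + t))
}
```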
