# Visualizing frequency of lexemes

## Configuring the Jupyter notebook

In [None]:
// Add the Maven repository that hosts the edu.holycross.shot libraries:
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)

In [None]:
import $ivy.`edu.holycross.shot::ohco2:10.18.2`
import $ivy.`edu.holycross.shot.cite::xcite:4.2.0`
import $ivy.`edu.holycross.shot::midvalidator:10.0.0`
import $ivy.`edu.holycross.shot::latincorpus:2.2.1`
import $ivy.`edu.holycross.shot::latphone:2.7.2`

## Load a citable corpus from a URL

In [None]:
import edu.holycross.shot.cite._
import edu.holycross.shot.ohco2._

val hyginusUrl = "https://raw.githubusercontent.com/neelsmith/hctexts/master/cex/hyginus.cex"
val corpus = CorpusSource.fromUrl(hyginusUrl, cexHeader = true)

## Create a tokenizable corpus

Combine the citable corpus with an orthographic system to create a tokenizable corpus.

The `TokenizableCorpus` class has many methods, including one that generates an alphabetized list of lexical tokens (in colloquial English, a word list). This list can be used as input to a morphological parser.
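
To make the idea of a word list concrete, here is a minimal sketch in plain Scala. It does not use `TokenizableCorpus` or `Latin23Alphabet`: the sample sentence, the name `toyWordList`, and the rule of splitting on non-letter characters are all assumptions made for illustration, and a real orthographic system tokenizes far more carefully.

```scala
// Toy illustration only: NOT how TokenizableCorpus tokenizes.
// Splitting on non-letter characters is an assumption made for this sketch.
val sample = "Arma virumque cano, Troiae qui primus ab oris; arma cano."

val toyWordList: Vector[String] = sample
  .split("[^\\p{L}]+")   // break on runs of non-letter characters
  .filter(_.nonEmpty)    // drop any empty token
  .map(_.toLowerCase)    // fold case so "Arma" and "arma" collapse
  .distinct              // keep each word form once
  .sorted                // alphabetize
  .toVector
```

Each distinct word form appears once, in alphabetical order, which is the shape of input a morphological parser typically expects.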



In [None]:
import edu.holycross.shot.latin._
import edu.holycross.shot.mid.validator._

val tcorpus = TokenizableCorpus(corpus, Latin23Alphabet)
val wordList = tcorpus.wordList

## Create a `LatinCorpus`

Add the output of a morphological parser to the tokenizable corpus to create a `LatinCorpus`.

In [None]:
import scala.io.Source

// Read the morphological parser's output from a URL:
val hyginusFstUrl = "https://raw.githubusercontent.com/neelsmith/hctexts/master/parser-output/hyginus/hyginus-parses.txt"
val fstOutput = Source.fromURL(hyginusFstUrl).getLines().toVector

In [None]:
import edu.holycross.shot.latincorpus._

val lc = LatinCorpus.fromFstLines(
  corpus,
  Latin23Alphabet,
  fstOutput,
  strict = false
)

## Zipf's Law for analyzed lexemes in Hyginus

A `LatinCorpus` has a method to create a histogram of lexemes.
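
As a rough picture of what such a histogram holds, the sketch below counts occurrences in a small hand-made list using only the Scala standard library. The names `toyLexemes` and `toyHistogram` and the data are hypothetical, and the descending sort is chosen here for readability; nothing in this sketch is taken from the `LatinCorpus` implementation.

```scala
// Toy illustration only: a hypothetical stand-in for a lexeme histogram,
// not the LatinCorpus implementation.
val toyLexemes = Vector("sum", "et", "rex", "et", "sum", "et")

val toyHistogram: Vector[(String, Int)] = toyLexemes
  .groupBy(identity)                                   // lexeme -> all of its occurrences
  .map { case (lexeme, hits) => (lexeme, hits.size) }  // lexeme -> number of occurrences
  .toVector
  .sortBy { case (lexeme, count) => (-count, lexeme) } // most frequent first, ties alphabetical
```

Each entry pairs an item with its count, which corresponds to the `item` and `count` fields read from the histogram's frequencies later in this notebook.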

In [None]:
// This is the histogram of recognized lexemes:
lc.labelledLexemeHistogram

In [None]:
// To visualize the histogram, make the plotly-scala library
// available to this notebook with an Ammonite `$ivy` import:
import $ivy.`org.plotly-scala::plotly-almond:0.7.1`

In [None]:
// Import plotly libraries, and set display defaults suggested for use in Jupyter notebooks:
import plotly._, plotly.element._, plotly.layout._, plotly.Almond._
repl.pprinter() = repl.pprinter().copy(defaultHeight = 3)

In [None]:
// Compute the histogram once, then split it into labels and counts for plotting:
val histogram = lc.labelledLexemeHistogram
val items = histogram.frequencies.map(fr => fr.item)
val counts = histogram.frequencies.map(fr => fr.count)

val zipf = Vector(
  Bar(x = items, y = counts)
)
plot(zipf)
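
Zipf's Law says, roughly, that an item's frequency is inversely proportional to its frequency rank, so the product of rank and frequency should stay close to constant. The sketch below shows that check on made-up numbers; `toyCounts` is hypothetical data, not counts from Hyginus, and it is assumed to be sorted in descending order already.

```scala
// Hypothetical counts, already sorted in descending order; not data from Hyginus.
val toyCounts = Vector(1200, 610, 395, 300, 245)

// Pair each count with its 1-based rank and compute rank * frequency.
// Under Zipf's Law these products should be roughly equal.
val rankTimesFreq: Vector[Int] = toyCounts.zipWithIndex.map {
  case (count, idx) => (idx + 1) * count
}
```

For these toy values the products all fall between 1185 and 1225. The same calculation could be tried on the `counts` vector above, assuming it holds integer counts and is first sorted from most to least frequent.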