# Aorists in Polybius and Arrianus

We'll use the `CapitainCorpusReader`, the `AGLDTReader` and the `CitableConcordanceIndex` to extract and index the finite aorist forms in these two authors.

In [2]:
import sys
sys.path.append("../")
sys.path.append("../../")

In [37]:
from perseus_nlp_toolkit.reader import CapitainCorpusReader, AGLDTReader
from perseus_nlp_toolkit.text import CitableConcordanceIndex
import os
# we use etree to parse the treebank XML: if you have MyCapytain installed, then you also have lxml
from lxml import etree
import re

In [9]:
pers_root = os.path.expanduser("~/cltk_data/greek/text/canonical-greekLit-master/data")

In [83]:
arrian = CapitainCorpusReader(pers_root, "tlg0074/tlg001/tlg0074.tlg001.perseus-grc1.xml")

## Polybius (treebank)

For Polybius, we will start by indexing the aorist forms that match our constraints from [Vanessa and Bob Gorman's annotated treebanks](https://perseids-publications.github.io/gorman-trees/). The actual files can be downloaded from the project's GitHub [repo](https://github.com/perseids-publications/gorman-trees).

Unfortunately, not all of the historian's text has been treebanked, but a substantial portion is already available; [Vanessa and Bob Gorman](https://perseids-publications.github.io/gorman-trees/) deserve thanks and a lot of credit for the service they are providing to all of us!

### Load the files

We use the `AGLDTReader` to read the Polybius files into memory.

In [18]:
tbroot = os.path.expanduser("~/cltk_data/greek/agdt/gorman-trees-master/xml/")
# raw string, so that \. reaches the regex engine as an escaped dot
polyfs = r"polybius.*\.xml"

In [22]:
polybius = AGLDTReader(tbroot, polyfs)

In [23]:
poly_words = polybius.annotated_words()

In [25]:
len(poly_words)

105694

### Add the subdoc citation

Unfortunately, Vanessa Gorman's treebanks don't provide token-by-token citations, so we cannot cite more precisely than the paragraph level. But it's already something...

We'll take the citation from the `subdoc` property of each sentence's metadata and repeat it for every word in that sentence.

In [28]:
poly_sents = polybius.annotated_sents()
sent_meta = polybius.get_sentences_metadata()

In [33]:
cites = []
for m,s in zip(sent_meta, poly_sents):
    for w in s:
        cites.append(m.subdoc)

In [34]:
len(cites) == len(poly_words)

True
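The alignment step can be illustrated on toy data. In this minimal sketch, `Meta` is a hypothetical stand-in for the reader's sentence metadata (only the `subdoc` attribute is taken from the cells above), and the toy sentences are hand-written.

```python
from collections import namedtuple

# Hypothetical stand-in for the sentence metadata returned by the reader
Meta = namedtuple("Meta", ["subdoc"])

toy_meta = [Meta("10.1.6"), Meta("10.2.2")]
toy_sents = [["τεκμήραιτο", "δʼ", "ἄν"], ["ἐπετελέσατο", "πράξεις"]]

# Repeat each sentence's citation once per token, so that the flat list
# of citations runs parallel to the flat list of words
toy_cites = [m.subdoc for m, s in zip(toy_meta, toy_sents) for _ in s]
toy_words = [w for s in toy_sents for w in s]

print(list(zip(toy_words, toy_cites)))
```

Because both lists are built by walking the sentences in the same order, position `i` in one always corresponds to position `i` in the other, which is what the length check above verifies.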

### Create the concordances

First we create a function that returns the string `"True"` if the tag matches our query and the string `"False"` if it doesn't; `"True"` is the key we look up in the concordance afterwards.

In [66]:
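# question: do we want imperatives ('m')?

# postag position 4 is the tense ('a' = aorist), position 5 the mood:
# indicative, subjunctive, optative, imperative
AORIST_RE = re.compile(r'^...a[isom]....$')

def is_aorist(token):
    # token is a (form, postag) pair; a postag of None raises TypeError
    try:
        m = AORIST_RE.search(token[1])
    except TypeError:
        return "False"
    # string keys, because the concordance is queried with "True"
    return "True" if m else "False"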
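The pattern relies on the positional layout of the nine-character AGLDT postag, where the fourth character encodes tense and the fifth encodes mood. The sketch below runs the same regular expression over a few tags; apart from `v-pppang-`, which appears in the notebook output, the tags are hand-written for illustration, and the names `AORIST_FINITE` and `sample` are not part of the toolkit.

```python
import re

# Positions in the 9-character AGLDT postag:
# 1 part of speech, 2 person, 3 number, 4 tense, 5 mood,
# 6 voice, 7 gender, 8 case, 9 degree
AORIST_FINITE = re.compile(r"^...a[isom]....$")

# Hand-written tags for illustration only
sample = {
    "v3saim---": "aorist indicative middle",
    "v3paoa---": "aorist optative active",
    "v-sapamn-": "aorist participle (not finite)",
    "v3siia---": "imperfect indicative",
    "v-pppang-": "present participle",
}

matches = {tag: bool(AORIST_FINITE.search(tag)) for tag in sample}
for tag, label in sample.items():
    print(tag, label, matches[tag])
```

Participles and infinitives are excluded because their mood character (`p`, `n`) is not in the `[isom]` class.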

In [67]:
tbtokens = [(t.form, t.postag) for t in poly_words]

In [68]:
poly_conc = CitableConcordanceIndex(tbtokens, cites, is_aorist)
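The string return values of `is_aorist` make sense if the index groups token positions under whatever key the function returns, as NLTK's `ConcordanceIndex` does with its `key` argument. Whether `CitableConcordanceIndex` works exactly this way internally is an assumption; the sketch below only illustrates the idea with a plain dictionary, and `build_offsets`, `toy_key` and the toy tokens are hypothetical.

```python
from collections import defaultdict

def build_offsets(tokens, key):
    """Map each key value to the positions of the tokens that produce it."""
    offsets = defaultdict(list)
    for i, tok in enumerate(tokens):
        offsets[key(tok)].append(i)
    return offsets

# (form, postag) pairs, hand-written for illustration
toy_tokens = [("ὁ", "l-s---mn-"), ("ἦλθεν", "v3saia---"), ("ἐπί", "r--------")]
toy_cites = ["10.4.3", "10.4.3", "10.4.3"]

def toy_key(token):
    # Simplified key: aorist indicative only
    return "True" if token[1][3:5] == "ai" else "False"

offsets = build_offsets(toy_tokens, toy_key)
for i in offsets["True"]:
    print(toy_tokens[i][0], toy_cites[i])
```

Looking up `"True"` then yields every matching position at once, and the parallel list of citations supplies the reference for each hit.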

In [75]:
poly_conc.print_concordance("True", lines=10)

Displaying 10 of 2279 matches:
ταύτῃ ποιεῖσθαι τῇ πόλει . φερομένους τεκμήραιτο δʼ ἄν τις τοῦ τόπου τὴν εὐκαιρίαν ἐκ  (10.1.6)
 ὁρμηθεὶς τὰς τηλικαύτας καὶ τοσαύτας ἐπετελέσατο πράξεις , ἀγνοεῖν δὲ καὶ ψευδοδοξεῖν  (10.2.2)
ὶ τὸ προτεθὲν ἐντεταμένος , οὐθεὶς ἂν συγχωρήσειε πλὴν τῶν συμβεβιωκότων καὶ τεθεαμένων (10.3.1)
ν καιρὸν ὁ πατὴρ αὐτοῦ τὴν ἱππομαχίαν συνεστήσατο πρὸς Ἀννίβαν περὶ τὸν Πάδον καλούμενο (10.3.3)
 τετρωμένον ἐπισφαλῶς , τὰς μὲν ἀρχὰς ἐπεβάλετο παρακαλεῖν τοὺς μεθʼ αὑτοῦ βοηθῆσαι τ (10.3.4)
μβαλεῖν οἱ μὲν πολέμιοι καταπλαγέντες διέστησαν , ὁ δὲ Πόπλιος ἀνελπίστως σωθεὶς πρῶτ (10.3.6)
ς σωθεὶς πρῶτος αὐτὸς τὸν υἱὸν σωτῆρα προσεφώνησε πάντων ἀκουόντων . περιγενομένης δʼ α (10.3.6)
 κατʼ ἰδίαν κινδύνους , ὅτʼ εἰς αὐτὸν ἀναρτηθεῖεν ὑπὸ τῆς πατρίδος αἱ τῶν ὅλων ἐλπίδες  (10.3.7)
τῆς προθέσεως , εἰ συμφρονήσαντες ἅμα ποιήσαιντο τὴν ἐπιβολήν , ἦλθεν ἐπί τινα τοιαύτη (10.4.3)
ήσαντες ἅμα ποιήσαιντο τὴν ἐπιβολήν , ἦλθεν ἐπί τινα τοιαύτην ἔννοιαν . θεωρῶν γὰ (10.4.3)


### Style and write out

In [79]:
conc =  poly_conc.find_concordance("True")
h = '''<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>Polybius: aorist concordances</title>
</head>
<body>
<ul>
'''
for c in conc:
    h += '<li>{} <span style="color:blue">{}</span> {} (<span style="color:green">{}</span>)</li>\n'.format(c.left_print,
                                                                                                       c.query,
                                                                                                       c.right_print,
                                                                                                       c.cite)
    
h += '''</ul>
</body>
</html>
'''

In [80]:
with open("/home/francesco/Desktop/polybius_treebank.html", "w", encoding="utf-8") as out:
    out.write(h)
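The concordance lines are interpolated directly into the markup, so any `&`, `<` or `>` in the text would reach the page unescaped. A minimal sketch of an escaped variant follows; `Line` is a hypothetical stand-in for the objects returned by `find_concordance`, carrying the four attribute names used in the loop above, and `render_item` is not part of the toolkit.

```python
from collections import namedtuple
from html import escape

# Hypothetical stand-in for a concordance line
Line = namedtuple("Line", ["left_print", "query", "right_print", "cite"])

ITEM = ('<li>{} <span style="color:blue">{}</span> {} '
        '(<span style="color:green">{}</span>)</li>')

def render_item(c):
    # Escape every field so that &, < and > cannot break the markup
    return ITEM.format(*(escape(str(f)) for f in c))

toy = Line("Α & Β", "ἦλθεν", "<ἐπί>", "10.4.3")
item = render_item(toy)
print(item)
```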

In [82]:
conc[0].query

'τεκμήραιτο'

## Using Morpheus

In [92]:
import requests
from lxml import etree
import json

In [None]:
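import functools

@functools.lru_cache(maxsize=256)
def search_morpheus(word):
    # fetch the analyses as JSON, as in the cells below; results are cached per word
    url = "https://morph.perseids.org/analysis/word?lang=grc&engine=morpheusgrc&word=" + word
    return requests.get(url).json()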

In [95]:
resp = requests.get("https://morph.perseids.org/analysis/word?lang=grc&engine=morpheusgrc&word=%CE%BA%CF%89%CE%BB%CF%8D%CF%83%CE%B5%CE%B9")
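The long `%CE%BA...` sequence in the URL is simply the UTF-8 percent-encoding of the Greek word κωλύσει. The sketch below reproduces it with the standard library, without contacting the service; `BASE` is an illustrative name for the URL prefix used above.

```python
from urllib.parse import quote

BASE = "https://morph.perseids.org/analysis/word?lang=grc&engine=morpheusgrc&word="

# κωλύσει, written with escapes so the code points are unambiguous
word = "\u03ba\u03c9\u03bb\u03cd\u03c3\u03b5\u03b9"

url = BASE + quote(word)
print(url)
```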

In [97]:
j = resp.json()

In [100]:
print(json.dumps(j, indent=2))

{
  "RDF": {
    "Annotation": {
      "about": "urn:TuftsMorphologyService:\u03ba\u03c9\u03bb\u03cd\u03c3\u03b5\u03b9:morpheusgrc",
      "creator": {
        "Agent": {
          "about": "org.perseus:tools:morpheus.v1"
        }
      },
      "created": {
        "$": "2019-05-24T13:58:12.704031"
      },
      "hasTarget": {
        "Description": {
          "about": "urn:word:\u03ba\u03c9\u03bb\u03cd\u03c3\u03b5\u03b9"
        }
      },
      "title": {},
      "hasBody": [
        {
          "resource": "urn:uuid:idm140232419604384"
        },
        {
          "resource": "urn:uuid:idm140232421406720"
        },
        {
          "resource": "urn:uuid:idm140232421366016"
        }
      ],
      "Body": [
        {
          "about": "urn:uuid:idm140232419604384",
          "type": {
            "resource": "cnt:ContentAsXML"
          },
          "rest": {
            "entry": {
              "uri": null,
              "dict": {
                "hdwd": {
              

In [101]:
results = j["RDF"]["Annotation"]["Body"]

In [104]:
results[1]

{'about': 'urn:uuid:idm140232421406720',
 'type': {'resource': 'cnt:ContentAsXML'},
 'rest': {'entry': {'uri': None,
   'dict': {'hdwd': {'lang': 'grc', '$': 'κώλυσις'},
    'pofs': {'order': 3, '$': 'noun'},
    'decl': {'$': '3rd'},
    'gend': {'$': 'feminine'}},
   'infl': [{'term': {'lang': 'grc',
      'stem': {'$': 'κωλυς'},
      'suff': {'$': 'ει'}},
     'pofs': {'order': 3, '$': 'noun'},
     'decl': {'$': '3rd'},
     'case': {'order': 7, '$': 'nominative'},
     'gend': {'$': 'feminine'},
     'num': {'$': 'dual'},
     'dial': {'$': 'Attic epic'},
     'stemtype': {'$': 'is_ews'},
     'morph': {'$': 'contr'}},
    {'term': {'lang': 'grc', 'stem': {'$': 'κωλυς'}, 'suff': {'$': 'ει'}},
     'pofs': {'order': 3, '$': 'noun'},
     'decl': {'$': '3rd'},
     'case': {'order': 1, '$': 'vocative'},
     'gend': {'$': 'feminine'},
     'num': {'$': 'dual'},
     'dial': {'$': 'Attic epic'},
     'stemtype': {'$': 'is_ews'},
     'morph': {'$': 'contr'}},
    {'term': {'lang': '
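To filter aorists out of such a response, one could walk the `Body` entries and inspect each inflection. The sketch below is a guess at how that might look, not a tested client: it assumes that verbal analyses carry `tense` and `mood` keys shaped like the `case` and `num` keys shown above, and that `Body` or `infl` may be a single dict rather than a list when there is only one item. The toy data is hand-made, and `as_list` and `aorist_analyses` are hypothetical helpers.

```python
def as_list(x):
    """Wrap a lone dict in a list; treat a missing value as empty."""
    if x is None:
        return []
    return x if isinstance(x, list) else [x]

def aorist_analyses(bodies):
    """Yield (headword, mood) for every inflection whose tense is aorist."""
    for body in as_list(bodies):
        entry = body["rest"]["entry"]
        hdwd = entry["dict"]["hdwd"]["$"]
        for infl in as_list(entry.get("infl")):
            # ASSUMPTION: verbal analyses expose 'tense' and 'mood' keys
            if infl.get("tense", {}).get("$") == "aorist":
                yield hdwd, infl.get("mood", {}).get("$")

# Hand-made data mimicking the structure printed above
toy = [
    {"rest": {"entry": {
        "dict": {"hdwd": {"lang": "grc", "$": "κώλυσις"}},
        "infl": [{"pofs": {"order": 3, "$": "noun"},
                  "case": {"order": 7, "$": "nominative"}}]}}},
    {"rest": {"entry": {
        "dict": {"hdwd": {"lang": "grc", "$": "κωλύω"}},
        "infl": {"pofs": {"$": "verb"},
                 "tense": {"$": "aorist"},
                 "mood": {"$": "subjunctive"}}}}},
]

found = list(aorist_analyses(toy))
print(found)
```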

---

In [72]:
matches = poly_conc.find_concordance("True")

In [47]:
w = poly_sents[23][31]

In [36]:
s.replace("-", ".")

'...ai....'

In [56]:
poly_words[0]

Word(id='1', form='ὄντων', lemma='εἰμί', postag='v-pppang-', head='21', relation='ADV', cite=None)