# Spanning sentences

## A bit of dataset development

Ernst Boogert is developing a converter that turns many Greek sources from TEI into TF (Text-Fabric).
He has to deal with many patterns in the sources, some of them confusing or conflicting.

So a fair amount of trial and error is involved.

Here we show how a dataset with unhelpful sentence boundaries got improved.
Some sentence endings had not been detected by the converter,
causing more than a hundred sentences to spill over into the next chapter.

After Ernst reported that he had remedied this, we ran a check (this notebook).

**Note in passing how easy it is to load previous versions of the data.**

In [1]:
from tf.app import use

## After the improvement

There are no sentences that span chapters anymore.

We test version 1.1 at a moment when the `athenaeus` app has not yet been updated on GitHub, only locally.
The app is configured to a specific version of the data.

By means of the specifier `:clone` we ask for the app in its development version in our local
git repository.

In [2]:
# A = use('athenaeus', hoist=globals())
A = use("athenaeus:clone", checkout="clone", hoist=globals())

### Find sentences across chapters

A sentence that spans multiple chapters is not embedded in any chapter.
So if we look up the embedders of a sentence, restricted to chapters, and we do not find any, we have a chapter-spanning sentence.
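The reasoning can be tried out on a toy model that does not need Text-Fabric. This is only a minimal sketch: sentences and chapters are modeled as plain sets of slot numbers (word positions), and the names `embedders`, `find_spanning` and `spanning_toy` are hypothetical helpers for illustration, not part of the TF API.

```python
# Toy model: every node is the set of slots (word positions) it occupies.
chapters = {
    "ch1": set(range(1, 6)),   # slots 1..5
    "ch2": set(range(6, 11)),  # slots 6..10
}
sentences = {
    "s1": set(range(1, 4)),    # slots 1..3, inside ch1
    "s2": set(range(4, 8)),    # slots 4..7, crosses from ch1 into ch2
    "s3": set(range(8, 11)),   # slots 8..10, inside ch2
}


def embedders(slots, containers):
    """Return the names of the containers whose slots include all given slots."""
    return [name for name, cslots in containers.items() if slots <= cslots]


def find_spanning(sentences, chapters):
    """Return the sentences that no single chapter embeds."""
    return [
        name for name, slots in sentences.items() if not embedders(slots, chapters)
    ]


spanning_toy = find_spanning(sentences, chapters)
print(spanning_toy)  # ['s2']
```

Sentence `s2` has words in both chapters, so neither chapter contains all of its slots and its list of embedders is empty. The cell below applies the same test to the real data, with `L.u()` doing the lookup of embedders.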

In [3]:
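def getSpanning():
    """Collect the sentences that are not embedded in any chapter."""
    spanning = []

    for s in F.otype.s("_sentence"):
        # L.u() gives the nodes that embed s, here restricted to chapters
        chapters = L.u(s, otype="chapter")
        if not chapters:
            # store s as a singleton tuple, so that we can feed the list to table() later
            spanning.append((s,))

    print(len(spanning))
    return spanning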

In [4]:
spanning = getSpanning()

0


No spanning sentences. Good!

## Before the improvement

But it has not always been so good.

We use the newest app, but we override the data version.

In [5]:
# A = use('athenaeus', version='1.0', hoist=globals())
A = use("athenaeus:clone", checkout="clone", version="1.0", hoist=globals())

In [6]:
spanning = getSpanning()

138


In [7]:
N.otypeRank

{'word': 0,
 '_sentence': 1,
 'l': 2,
 'bibl': 3,
 'quote': 4,
 'p': 5,
 'pb': 6,
 'chapter': 7,
 'add': 8,
 'num': 9,
 'cit': 10,
 'hi': 11,
 'book': 12,
 'head': 13,
 '_book': 14}

In [8]:
A.table(spanning, full=True)

n,p,_sentence
1,1 3:16,"‘ἆρ’ οὖν ἐθελήσεις καὶ ἡμῖν τῶν καλῶν ἐπικυλικίων λόγων μεταδοῦναι— τρὶς δ’ ἀπομαξαμένοισι θεοὶ διδόασιν ἄμεινον, ὥς πού φησιν ὁ Κυρηναῖος ποιητής( )—ʼἢ παρ’ ἄλλου τινὸς ἡμᾶς ἀναπυνθάνεσθαι δεῖ;’ εἶτα εἰσβάλλει μετ’ ὀλίγον εἰς τὸν τοῦ Λαρηνσίου ἔπαινον καὶ λέγει· ὃς ὑπὸ φιλοτιμίας πολλοὺς τῶν ἀπὸ παιδείας συναθροίζων οὐ μόνον τοῖς ἄλλοις ἀλλὰ καὶ λόγοις εἱστία, τὰ μὲν προβάλλων τῶν ἀξίων ζητήσεως, τὰ δὲ ἀνευρίσκων, οὐκ ἀβασανίστως οὐδ’ ἐκ τοῦ παρατυχόντος τὰς ζητήσεις ποιούμενος, ἀλλ’ ὡς ἔνι μάλιστα μετὰ κριτικῆς τινος καὶ Σωκρατικῆς ἐπιστήμης, ὡς πάντας θαυμάζειν τῶν ζητήσεων τὴν τήρησιν."
2,1 6:39,"‘γυργάθους ψηφισμάτων φέροντες,’ Ἀριστοφάνης φησίν( ). ὅτι Ἀρχέστρατος ὁ Συρακούσιος ἢ Γελῷος ἐν τῇ ὡς Χρύσιππος ἐπιγράφει Γαστρονομίᾳ, ὡς δὲ Λυγκεὺς καὶ Καλλίμαχος Ἡδυπαθείᾳ, ὡς δὲ Κλέαρχος Δειπνολογίᾳ, ὡς δ’ ἄλλοι Ὀψοποιίᾳ—ἐπικὸν Ὀψοποιίᾳ—ἐπικὸν δὲ τὸ ποίημα, οὗ ἡ ἀρχή( )· ἱστορίης ἐπίδειγμα ποιούμενος Ἑλλάδι πάσῃ— φησί( )· πρὸς δὲ μιᾷ πάντας δειπνεῖν ἁβρόδαιτι τραπέζῃ."
3,1 11:87,"ἐπεὶ δὲ τὴν ἐρωμένην Γαλάτειαν ἐφωράθη διαφθείρων, εἰς τὰς λατομίας ἐνεβλήθη· ἐν αἷς ποιῶν τὸν Κύκλωπα συνέθηκε τὸν μῦθον εἰς τὸ περὶ αὑτὸν γενόμενον πάθος, τὸν μὲν Διονύσιον Κύκλωπα ὑποστησάμενος, τὴν δ’ αὐλητρίδα Γαλάτειαν, ἑαυτὸν δ’ Ὀδυσσέα( ). ἐγένετο δὲ κατὰ τοὺς Τιβερίου χρόνους ἀνήρ τις Ἀπίκιος, πλουσιώτατος τρυφητής, ἀφ’ οὗ πλακούντων γένη πολλὰ Ἀπίκια ὀνομάζεται."
4,1 20:170,"οὕτω σφόδρ᾽ ἦν ἀρχαῖος,’ Ἀντιφάνης φησί( ). καὶ τῶν κρεῶν δὲ μοῖραι ἐνέμοντο· ὅθεν ἐίσας φησὶ τὰς δαῖτας ἀπὸ τῆς ἰσότητος."
5,1 28:256,"ἀδύνατον γὰρ μὴ φρονίμους εἶναι Φαίακας, οἳ μάλα φίλοι εἰσὶ θεοῖσιν, ὡς ἡ Ναυσικάα φησί( ). καὶ οἱ μνηστῆρες δὲ παρ’ αὐτῷ πεσσοῖσι προπάροιθε θυράων ἐτέρποντο, οὐ παρὰ τοῦ μεγάλου Διοδώρου [ἢ Θεοδώρου] μαθόντες τὴν πεττείαν οὐδὲ τοῦ Μιτυληναίου Λέοντος τοῦ ἀνέκαθεν Ἀθηναίου, ὃς ἀήττητος ἦν κατὰ τὴν πεττευτικήν, ὥς φησι Φαινίας( ). Ἀπίων δὲ ὁ Ἀλεξανδρεὺς καὶ ἀκηκοέναι φησὶ παρὰ τοῦ Ἰθακησίου Κτήσωνος τὴν τῶν μνηστήρων πεττείαν οἵα ἦν."
6,1 30:283,"ἀλλ’ οὐδ’ ὅτε μνηστῆρας εἰσάγει μεθύοντας, οὐδὲ τότε τοιαύτην ἀκοσμίαν εἰσήγαγεν ὡς Σοφοκλῆς καὶ Αἰσχύλος πεποιήκασιν, ἀλλὰ πόδα βόειον ἐπὶ τὸν Ὀδυσσέα ῥιπτούμενον( ). καθέζονται δ’ ἐν τοῖς συνδείπνοις οἱ ἥρωες, οὐ κατακέκλινται."
7,1 35:326,Φιλίππου δὲ τοῦ γελωτοποιοῦ Ξενοφῶν μνημονεύει( ). ὅρος οἰκουμένης.
8,1 38:350,"οὗ μήτε πράττεται τέλος μηδεὶς... ἡμᾶς μήτε τιμὴν δόντα δεῖ ἑτέρων λαβεῖν, φέρει δὲ τοῖς μὲν χρωμένοις δόξης τιν᾽ ὄγκον, τοῖς δ’ ὁρῶσιν ἡδονήν, κόσμον δὲ τῷ βίῳ—τὸ βίῳ—τὸ τοιοῦτον γέρας τίς οὐκ ἂν αὑτῷ κτῷτο φάσκων νοῦν ἔχειν; καὶ Αἰσχύλος δὲ οὐ μόνον ἐξεῦρε τὴν τῆς στολῆς εὐπρέπειαν καὶ σεμνότητα, ἣν ζηλώσαντες οἱ ἱεροφάνται καὶ δᾳδοῦχοι ἀμφιέννυνται, ἀλλὰ καὶ πολλὰ σχήματα ὀρχηστικὰ αὐτὸς ἐξευρίσκων ἀνεδίδου τοῖς χορευταῖς."
9,1 58:594,"οὐδὲν ἀπόβλητον Διονύσιον, οὐδὲ γίγαρτον, ὁ Κεῖός φησι ποιητής( ). τῶν οἴνων ὃ μὲν λευκός, ὃ δὲ κιρρός, ὃ δὲ μέλας."
10,2 19:837,"τὰ αὐτὰ δ’ ἰαμβεῖα καὶ Ὠφελίων φησί( ). τοιαῦτα ὥσπερ οἱ ῥήτορες πρὸς ὕδωρ εἰπὼν καὶ βραχὺ ἀναπαυσάμενος αὖθις ἔφη· ‘Ἄμφις ὁ κωμικός πού φησιν( )· ἐνῆν ἄρ’, ὡς ἔοικε, κἀν οἴνῳ λόγος· ἔνιοι δ’ ὕδωρ πίνοντές εἰσ᾽ ἀβέλτεροι."


We could also have loaded the app at the commit where it still asks for version `1.0` of the data, by passing the commit hash as specifier.

But then we would need an earlier version of Text-Fabric as well, as the message below shows.

In [9]:
A = use("athenaeus:5e24f733392baaf28658dc6bef0fb42d04f8d296", hoist=globals())

rate limit is 5000 requests per hour, with 4997 left for this hour
	connecting to online GitHub repo annotation/app-athenaeus ... connected


App `athenaeus` requires API version 0 but Text-Fabric provides 3.
Your copy of the TF app `athenaeus` is outdated for this version of TF.
Recommendation: obtain a newer version of `athenaeus`.
Hint: load the app in one of the following ways:

    athenaeus
    athenaeus:latest
    athenaeus:hot

    For example:

    The Text-Fabric browser:

        text-fabric athenaeus:latest

    In a program/notebook:

        A = use('athenaeus:latest', hoist=globals())

