Adapt parallel reader endpoint #25

jtauber · 2019-04-22T18:11:18Z

Replace HOMER PARALLEL READER with one that uses GRC's translations and is driven by the HOMER REFERENCE INPUT.

jtauber · 2019-04-22T18:14:29Z

DDN: do we serve up the texts and the alignment together? A better approach would be to decouple them: serve each language separately, and serve the alignment itself separately.

jtauber · 2019-04-23T03:46:13Z

Data: https://docs.google.com/spreadsheets/d/1-Zz1TCm1bVygngmWeXuskzlfQHN3ER6HERX0DC-VRI0/edit#gid=0

jacobwegner · 2019-04-24T16:49:19Z

Notes from our discussion:

Pass a Greek reference (1.1 - 1.18)
Return alignment chunks, expanding if necessary (1.1-1.7, 1.8, 1.9, 1.9-1.12, 1.12-1.16, 1.17-1.19)
Each chunk has text from both translations
Will follow up with ISSUE GOES HERE to strip back the endpoint to not include the text, but rather references
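
The "expanding if necessary" step in the notes above could be sketched roughly as below. This is only an illustration, assuming each chunk is described by a `book.line` range string; `parse_ref`, `parse_range`, and `chunks_for` are hypothetical names, and the trailing `1.20-1.21` chunk is invented to show a non-match.

```python
from typing import List, Tuple

Ref = Tuple[int, int]  # (book, line)


def parse_ref(ref: str) -> Ref:
    """Turn "1.12" into (1, 12) so references compare numerically."""
    book, line = ref.strip().split(".")
    return int(book), int(line)


def parse_range(text: str) -> Tuple[Ref, Ref]:
    """Turn "1.1-1.7" (or a single "1.8") into a (start, end) pair."""
    start, _, end = text.partition("-")
    return parse_ref(start), parse_ref(end or start)


def chunks_for(passage: str, chunk_ranges: List[str]) -> List[str]:
    """Return every chunk that overlaps the passage, in source order."""
    p_start, p_end = parse_range(passage)
    hits = []
    for chunk in chunk_ranges:
        c_start, c_end = parse_range(chunk)
        # Two closed ranges overlap when each starts before the other ends.
        if c_start <= p_end and p_start <= c_end:
            hits.append(chunk)
    return hits


# Chunk ranges from the notes, plus one hypothetical chunk past the passage.
CHUNKS = ["1.1-1.7", "1.8", "1.9", "1.9-1.12", "1.12-1.16", "1.17-1.19", "1.20-1.21"]
print(chunks_for("1.1-1.18", CHUNKS))
```

Because whole chunks are returned, a request for 1.1-1.18 ends up covering 1.1-1.19.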

First pass at a spec:

Show other chunking schemes, completion
1.1 to 1.19 is chapter 1
May be a "follow-on" like with GraphQL

{
    "metadata": {
        "self_url": "/<text-identifier-1>/alignment/<text-identifier-2>/<range>/",
        "refs_url": "/<text-identifier-1>:<range>/",
        "refs": {
            "start": "1.1",
            "end": "1.7"
        }
    },
    "chunks": [
        {
            "metadata": {
                "id": 123456,
                "self_url": "/<text-identifier-1>/alignment/<text-identifier-2>/by-id/<id>/"
            },
            "items": [
                {
                    "metadata": {
                        "self_url": "/<text-identifier-1>:<range>/"
                    },
                    "text_html": "",
                    "refs": {
                        "start": "1.1",
                        "end": "1.7"
                    }
                },
                {
                    "metadata": {},
                    "text_html": "",
                    "refs": {}
                }
            ]
        }
    ]
}
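
As a rough illustration of how a payload in this shape might be assembled, here is a minimal sketch. The helper names (`make_item`, `make_chunk`, `make_response`) and the `grc` / `eng` identifiers are assumptions for the example, not part of the spec.

```python
import json
from typing import Dict, List


def ref_range(start: str, end: str) -> str:
    """Collapse a start/end pair into "1.1-1.7", or "1.8" for a single line."""
    return start if start == end else f"{start}-{end}"


def make_item(text_id: str, start: str, end: str, text_html: str) -> Dict:
    """One side of an aligned chunk (hypothetical helper)."""
    return {
        "metadata": {"self_url": f"/{text_id}:{ref_range(start, end)}/"},
        "text_html": text_html,
        "refs": {"start": start, "end": end},
    }


def make_chunk(text_1: str, text_2: str, chunk_id: int, items: List[Dict]) -> Dict:
    """An alignment chunk holding the items for both texts."""
    return {
        "metadata": {
            "id": chunk_id,
            "self_url": f"/{text_1}/alignment/{text_2}/by-id/{chunk_id}/",
        },
        "items": items,
    }


def make_response(text_1: str, text_2: str, start: str, end: str,
                  chunks: List[Dict]) -> Dict:
    """Top-level payload: request metadata plus the resolved chunks."""
    rng = ref_range(start, end)
    return {
        "metadata": {
            "self_url": f"/{text_1}/alignment/{text_2}/{rng}/",
            "refs_url": f"/{text_1}:{rng}/",
            "refs": {"start": start, "end": end},
        },
        "chunks": chunks,
    }


# "grc" and "eng" stand in for <text-identifier-1> and <text-identifier-2>.
items = [make_item("grc", "1.1", "1.7", ""), make_item("eng", "1.1", "1.7", "")]
payload = make_response("grc", "eng", "1.1", "1.7",
                        [make_chunk("grc", "eng", 123456, items)])
print(json.dumps(payload, indent=4))
```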

jacobwegner · 2019-04-26T17:12:28Z

Endpoints

I've made the first pass with the following endpoints:

Alignment by reference

/<work_1_urn>/alignment/eng/<reference>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/3.411-3.412/

Retrieves the alignment based on the provided reference.

Returns 404 if the reference is not valid

Alignment by offset

/<work_1_urn>/alignment/eng/paginate/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/

Paginates through alignment milestones.

Supports limit (defaults to 10) and offset (defaults to 0) arguments. Includes previous/next URLs.
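
For readers unfamiliar with limit/offset paging, the behaviour described above could look something like the following sketch. The response keys (`previous`, `next`, `chunks`) and the function names are assumptions for illustration; the actual endpoint may shape its output differently.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode


def page_url(base: str, offset: int, limit: int) -> str:
    """Build a pagination URL such as "/paginate/?offset=10&limit=10"."""
    return f"{base}?{urlencode({'offset': offset, 'limit': limit})}"


def paginate(milestones: List, base: str, offset: int = 0, limit: int = 10) -> Dict:
    """Slice the milestones and attach previous/next URLs where they exist."""
    previous: Optional[str] = None
    following: Optional[str] = None
    if offset > 0:
        # Never step back past the first milestone.
        previous = page_url(base, max(offset - limit, 0), limit)
    if offset + limit < len(milestones):
        following = page_url(base, offset + limit, limit)
    return {
        "previous": previous,
        "next": following,
        "chunks": milestones[offset:offset + limit],
    }


# Integers stand in for milestone payloads.
page = paginate(list(range(25)), "/alignment/eng/paginate/", offset=10)
print(page["previous"], page["next"])
```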

Alignment by offset from reference

/<work_1_urn>/alignment/eng/paginate/<reference>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/3.411-3.412/

Redirects to "Alignment by offset" at the first offset where the milestone contains the passage (e.g. https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/?offset=985&limit=10)

I thought this could be a useful shortcut to start pagination without having to work out what the offset is for a particular reference.
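
A minimal sketch of that shortcut, assuming the milestones are held as an ordered list of range strings and that "contains the passage" means "overlaps it". `first_offset` and `redirect_location` are hypothetical names, and the milestone ranges are the ones from the earlier discussion notes.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

Ref = Tuple[int, int]  # (book, line)


def parse_range(text: str) -> Tuple[Ref, Ref]:
    """Turn "3.411-3.412" (or a single "3.411") into numeric (start, end)."""
    start, _, end = text.partition("-")

    def to_ref(ref: str) -> Ref:
        book, line = ref.strip().split(".")
        return int(book), int(line)

    return to_ref(start), to_ref(end or start)


def first_offset(passage: str, milestone_ranges: List[str]) -> Optional[int]:
    """Index of the first milestone that overlaps the passage, if any."""
    p_start, p_end = parse_range(passage)
    for index, milestone in enumerate(milestone_ranges):
        m_start, m_end = parse_range(milestone)
        if m_start <= p_end and p_start <= m_end:
            return index
    return None


def redirect_location(base: str, passage: str, milestone_ranges: List[str],
                      limit: int = 10) -> Optional[str]:
    """URL to redirect to, or None when the reference matches no milestone."""
    offset = first_offset(passage, milestone_ranges)
    if offset is None:
        return None  # a caller would presumably answer with a 404 here
    return f"{base}?{urlencode({'offset': offset, 'limit': limit})}"


MILESTONES = ["1.1-1.7", "1.8", "1.9", "1.9-1.12", "1.12-1.16", "1.17-1.19"]
print(redirect_location("/alignment/eng/paginate/", "1.12", MILESTONES))
```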

Alignment milestone detail

/<work_1_urn>/alignment/eng/by-id/<milestone_id>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/by-id/2275092/

A detail view for a single alignment milestone, looked up by its milestone id. The payload is the same as the milestones returned in chunks by the other endpoints; this is just a quick way to load one particular milestone.

Gotchas hit along the way

My original spec assumed a 1:1 relationship between references and milestones, but several references appear in more than one milestone. When resolving alignment milestones from a reference (and calculating the offset of that milestone), we now peek at the milestones immediately before and after the one where the reference is indexed. Here are two samples:
- https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/1.9/
- https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/1.47/
Because that relationship is not 1:1, I'm not attempting to resolve "subreferences" from leaves within the text server. Instead, I'm stripping the references out of the "Greek" field on each row in the source CSV and returning the remaining content.
I found several errors in the "Citation" field in the source data. For example, look at 2185582 for the Iliad: citation="1.8" but greek="[1.80] τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:". Since I was already parsing the Greek content, I re-created the citations from it.
Greg's Iliad has a citation for 18.616-18.617, but the Iliad text we're using doesn't have 18.617. I wrote an edge-case fix for this that can be expanded as desired (HEALED_CITATIONS).
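
The citation rebuilding described above might be sketched as follows. This assumes references appear in the Greek field as bracketed `[book.line]` markers, as in the example row. The regular expression, the `rebuild` helper, and especially the shape and contents of `HEALED_CITATIONS` are assumptions for illustration, not the actual implementation.

```python
import re
from typing import Dict, Tuple

# Matches markers like "[1.80] " that prefix each line in the Greek field.
REF_PATTERN = re.compile(r"\[(\d+\.\d+)\]\s*")

# Hypothetical shape for the edge-case table: a citation derived from the
# alignment data mapped to one that exists in the served Iliad text.
# The replacement value here is a guess used only for this example.
HEALED_CITATIONS: Dict[str, str] = {"18.616-18.617": "18.616"}


def rebuild(greek: str) -> Tuple[str, str]:
    """Return (citation, content) derived from the Greek field alone."""
    refs = REF_PATTERN.findall(greek)
    if not refs:
        raise ValueError("no [book.line] markers found")
    first, last = refs[0], refs[-1]
    citation = first if first == last else f"{first}-{last}"
    # Apply any known edge-case correction.
    citation = HEALED_CITATIONS.get(citation, citation)
    # Strip the markers so only the text itself is returned.
    content = REF_PATTERN.sub("", greek).strip()
    return citation, content


row = "[1.80] τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:"
print(rebuild(row))
```

With this approach the unreliable "Citation" column is never read; the row above yields a citation of 1.80 regardless of what that column says.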

TODOs

There are some circular dependencies between the various Python modules that I'd like to clean up. Might take the opportunity to move from Flask to Django too.
I'd also like to port the alignment functionality over to the text server backends; everything is currently in memory but probably needs to be available in Redis as well. Doing the pagination bits has also made me think that an RDBMS (hello, Postgres!) backend might be something we should consider (and that might allow us to leverage something like Django Rest Framework on the backend and get pagination, cursors, etc. "for free").
Add a top-level metadata endpoint for each work that lists available alignments (right now we're hardcoding to a <work-urn> and eng pairing, but we know that in things like Digital Sira it won't be hardcoded)
Get access to the homer-api app on Heroku (likely work with @jtauber on that). Currently, we're hosting on a new readhomer-dev-api app under the SV team.
Determine how to resolve other formatting errors in the Greek content
Determine if we want to standardize the output format (rather than returning text for the Greek and English items in the alignment, returning tokens instead, etc.)

jacobwegner · 2019-04-30T18:19:16Z

Add-ons

Retain line breaks
text --> (reference, content, continuation) triples (English retains text)
continuation has a kind or None

Future

Ideally, "mute" the portion of the line that overlaps
Enumerate continuation kinds

jtauber added this to Backlog in Project Apr 22, 2019

jtauber added the backend-api New API or change to existing API label Apr 22, 2019

jacobwegner mentioned this issue Apr 24, 2019

synchronised reading of translations #17

Open

jacobwegner changed the title ~~new parallel reader~~ Adapt parallel reader endpoint May 28, 2019

jacobwegner mentioned this issue Jul 5, 2019

Translation Alignments: Port translation alignment endpoints from Flask app to ATLAS scaife-viewer/explorehomer-atlas#4

Merged
