Adapt parallel reader endpoint #25

Open
jtauber opened this issue Apr 22, 2019 · 5 comments
Labels
backend-api New API or change to existing API
Projects

Comments

@jtauber
Member

jtauber commented Apr 22, 2019

Replace the HOMER PARALLEL READER with one that uses GRC's translations and is driven by the HOMER REFERENCE INPUT.

@jtauber jtauber added this to Backlog in Project Apr 22, 2019
@jtauber jtauber added the backend-api New API or change to existing API label Apr 22, 2019
@jtauber
Member Author

jtauber commented Apr 22, 2019

DDN: do we serve the alignment together with both texts? A better approach would be to decouple things: serve each language separately, and serve the alignment itself separately too.

@jtauber
Member Author

jtauber commented Apr 23, 2019

@jacobwegner
Contributor

jacobwegner commented Apr 24, 2019

Notes from our discussion:

  • Pass a Greek reference (e.g. 1.1-1.18)
  • Return alignment chunks, expanding if necessary (1.1-1.7, 1.8, 1.9, 1.9-1.12, 1.12-1.16, 1.17-1.19)
  • Each chunk has text from both translations
  • Will follow up with ISSUE GOES HERE to strip the endpoint back so it returns references rather than text
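A rough sketch of the "expand to whole chunks" step, in Python. It assumes references are plain book.line strings and that chunk boundaries are available as a list of range strings; `parse_ref`, `parse_range` and `covering_chunks` are hypothetical helper names, not existing code.

```python
from typing import List, Tuple

Ref = Tuple[int, int]  # (book, line); "1.8" -> (1, 8)


def parse_ref(ref: str) -> Ref:
    """Parse a single "book.line" reference such as "1.8"."""
    book, line = ref.strip().split(".")
    return int(book), int(line)


def parse_range(range_str: str) -> Tuple[Ref, Ref]:
    """Parse "1.1-1.18" (spaces allowed) or a single "1.8" into (start, end)."""
    start, _, end = range_str.partition("-")
    start_ref = parse_ref(start)
    end_ref = parse_ref(end) if end else start_ref
    return start_ref, end_ref


def covering_chunks(requested: str, chunks: List[str]) -> List[str]:
    """Return every alignment chunk that overlaps the requested range.

    A chunk may start before or end after the request, so the result is
    the request "expanded" to whole chunks. Tuples compare book first,
    then line, which gives the right ordering for references.
    """
    req_start, req_end = parse_range(requested)
    result = []
    for chunk in chunks:
        start, end = parse_range(chunk)
        # Closed intervals overlap unless one ends before the other starts.
        if start <= req_end and end >= req_start:
            result.append(chunk)
    return result


if __name__ == "__main__":
    # Chunk boundaries from the notes, plus a hypothetical "1.20-1.25".
    chunks = ["1.1-1.7", "1.8", "1.9", "1.9-1.12", "1.12-1.16", "1.17-1.19", "1.20-1.25"]
    print(covering_chunks("1.1 - 1.18", chunks))
```

With the chunk boundaries from the notes, requesting 1.1-1.18 returns all six chunks up to 1.17-1.19 and leaves out the hypothetical 1.20-1.25. Comparing (book, line) tuples avoids the string-ordering trap where "1.10" sorts before "1.9".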

First pass at a spec:

  • Show other chunking schemes, completion
  • 1.1 to 1.19 is chapter 1
  • May be a "follow-on" like with GraphQL
{
    "metadata": {
        "self_url": "/<text-identifier-1>/alignment/<text-identifier-2>/<range>/",
        "refs_url": "/<text-identifier-1>:<range>/",
        "refs": {
            "start": "1.1",
            "end": "1.7"
        }
    },
    "chunks": [
        {
            "metadata": {
                "id": 123456,
                "self_url": "/<text-identifier-1>/alignment/<text-identifier-2>/by-id/<id>/"
            },
            "items": [
                {
                    "metadata": {
                        "self_url": "/<text-identifier-1>:<range>/"
                    },
                    "text_html": "",
                    "refs": {
                        "start": "1.1",
                        "end": "1.7"
                    }
                },
                {
                    "metadata": {},
                    "text_html": "",
                    "refs": {}
                }
            ]
        }
    ]
}
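For reference, here's one way the spec's shape could be assembled in Python. The function name, the example identifiers and the input shape for `chunks` are assumptions for illustration. Serializing with `json.dumps` also guarantees the output has no trailing commas, which strict JSON parsers reject.

```python
import json
from typing import Any, Dict, List


def build_alignment_payload(text_1: str, text_2: str, range_str: str,
                            start: str, end: str,
                            chunks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble a response in the shape of the first-pass spec.

    Each chunk is assumed (hypothetically) to look like
    {"id": int, "items": [{"self_url": str, "text_html": str, "refs": dict}, ...]}.
    """
    return {
        "metadata": {
            "self_url": f"/{text_1}/alignment/{text_2}/{range_str}/",
            "refs_url": f"/{text_1}:{range_str}/",
            "refs": {"start": start, "end": end},
        },
        "chunks": [
            {
                "metadata": {
                    "id": chunk["id"],
                    "self_url": f"/{text_1}/alignment/{text_2}/by-id/{chunk['id']}/",
                },
                "items": [
                    {
                        # Items without a self_url get empty metadata,
                        # like the second item in the spec.
                        "metadata": {"self_url": item["self_url"]} if "self_url" in item else {},
                        "text_html": item.get("text_html", ""),
                        "refs": item.get("refs", {}),
                    }
                    for item in chunk["items"]
                ],
            }
            for chunk in chunks
        ],
    }


if __name__ == "__main__":
    # Hypothetical identifiers; only the shape matters here.
    payload = build_alignment_payload(
        "tlg0012.tlg001", "eng", "1.1-1.7", "1.1", "1.7",
        [{"id": 123456, "items": [
            {"self_url": "/tlg0012.tlg001:1.1-1.7/", "refs": {"start": "1.1", "end": "1.7"}},
            {},
        ]}],
    )
    print(json.dumps(payload, indent=4))
```

Passing an empty dict as an item reproduces the second item in the spec, with empty `metadata` and `refs`.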

@jacobwegner
Contributor

Endpoints

I've made a first pass with the following endpoints:

Alignment by reference

/<work_1_urn>/alignment/eng/<reference>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/3.411-3.412/

Retrieves the alignment based on the provided reference.

Returns 404 if the reference is not valid.
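A minimal sketch of the validation behind the 404, assuming references are book.line or book.line-book.line and that milestones are held in memory as dicts with (book, line) start/end tuples. Treating a well-formed reference with no matching milestone as a 404 is my assumption, as is the `detail` key in the error body; the function names are hypothetical.

```python
import re
from http import HTTPStatus
from typing import Any, Dict, List, Optional, Tuple

Ref = Tuple[int, int]  # (book, line)

# "3.411" or "3.411-3.412"; anything else is treated as an invalid reference.
REFERENCE_RE = re.compile(r"(\d+)\.(\d+)(?:-(\d+)\.(\d+))?")


def parse_reference(reference: str) -> Optional[Tuple[Ref, Ref]]:
    match = REFERENCE_RE.fullmatch(reference)
    if match is None:
        return None
    b1, l1, b2, l2 = match.groups()
    start = (int(b1), int(l1))
    end = (int(b2), int(l2)) if b2 is not None else start
    # A backwards range such as "3.412-3.411" is also rejected.
    return (start, end) if start <= end else None


def alignment_by_reference(
    reference: str, milestones: List[Dict[str, Any]]
) -> Tuple[HTTPStatus, Dict[str, Any]]:
    """Return (status, body) for a reference lookup.

    `milestones` is a hypothetical in-memory list of dicts with "id",
    "start" and "end" keys, where start/end are (book, line) tuples.
    """
    parsed = parse_reference(reference)
    if parsed is None:
        return HTTPStatus.NOT_FOUND, {"detail": f"invalid reference: {reference}"}
    start, end = parsed
    chunks = [m for m in milestones if m["start"] <= end and m["end"] >= start]
    if not chunks:
        # Well-formed but outside the aligned text: also a 404 in this sketch.
        return HTTPStatus.NOT_FOUND, {"detail": f"no alignment for: {reference}"}
    return HTTPStatus.OK, {"chunks": chunks}
```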

Alignment by offset

/<work_1_urn>/alignment/eng/paginate/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/

Paginates through alignment milestones.

Supports limit (defaults to 10) and offset (defaults to 0) arguments. Includes previous/next URLs.
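Roughly how limit/offset pagination with previous/next URLs could work, as a framework-independent Python sketch. The `count` key and the function name are hypothetical; the defaults (limit 10, offset 0) and the `?offset=...&limit=...` query shape follow the description and example URL above.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode

DEFAULT_LIMIT = 10
DEFAULT_OFFSET = 0


def paginate(milestones: List[Any], base_url: str,
             limit: int = DEFAULT_LIMIT, offset: int = DEFAULT_OFFSET) -> Dict[str, Any]:
    """Slice milestones by limit/offset and attach previous/next URLs."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")

    def page_url(new_offset: int) -> str:
        # Same query-string shape as the example URL: ?offset=...&limit=...
        return f"{base_url}?{urlencode({'offset': new_offset, 'limit': limit})}"

    # No previous page at offset 0; otherwise step back, clamping at 0.
    previous_url: Optional[str] = page_url(max(offset - limit, 0)) if offset > 0 else None
    # No next page once the current page reaches the end of the list.
    next_url: Optional[str] = page_url(offset + limit) if offset + limit < len(milestones) else None
    return {
        "count": len(milestones),
        "previous": previous_url,
        "next": next_url,
        "chunks": milestones[offset:offset + limit],
    }
```

Keeping the URL building inside the paginator means clients never have to compute offsets themselves; they just follow `previous`/`next`.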

Alignment by offset from reference

/<work_1_urn>/alignment/eng/paginate/<reference>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/3.411-3.412/

Redirects to "Alignment by offset" at the first offset where the milestone contains the passage (e.g. https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/?offset=985&limit=10)

I thought this could be a useful shortcut for starting pagination without having to work out the offset for a particular reference.
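A sketch of working out the redirect target, assuming milestones are an ordered list of (start, end) spans and that "contains the passage" means the milestone overlaps it. It's a linear scan, which is the simplest option when references can appear in more than one milestone; the function names are hypothetical.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

Ref = Tuple[int, int]   # (book, line)
Span = Tuple[Ref, Ref]  # (start, end) of one milestone


def first_offset_for(start: Ref, end: Ref, milestones: List[Span]) -> Optional[int]:
    """Index of the first milestone that overlaps the passage [start, end].

    Several milestones may match; taking the first one means the page
    starting at that offset begins with the passage.
    """
    for index, (m_start, m_end) in enumerate(milestones):
        if m_start <= end and m_end >= start:
            return index
    return None


def paginate_redirect(base_url: str, start: Ref, end: Ref,
                      milestones: List[Span], limit: int = 10) -> Optional[str]:
    """Build the redirect target, or None when nothing matches (a 404)."""
    offset = first_offset_for(start, end, milestones)
    if offset is None:
        return None
    return f"{base_url}?{urlencode({'offset': offset, 'limit': limit})}"
```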

Alignment milestone detail

/<work_1_urn>/alignment/eng/by-id/<milestone_id>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/by-id/2275092/

A detail view for a single alignment milestone, looked up by its milestone id. It returns the same structure as the milestones in the chunks from the other endpoints; it's just a quick way to load one particular milestone.
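Since the four URL shapes share a prefix, it can help to see how they can be told apart. This is a stdlib-only regex sketch, not how the Flask app actually routes these, and the route names are hypothetical. The main point is ordering: `paginate/` has to be checked before the generic `<reference>` pattern, otherwise "paginate" would be captured as a reference.

```python
import re
from typing import Dict, Optional, Tuple

# Checked in order; the URN segment is anything without a "/".
ROUTES = [
    ("milestone_detail",
     re.compile(r"/(?P<urn>[^/]+)/alignment/eng/by-id/(?P<milestone_id>\d+)/")),
    ("paginate_from_reference",
     re.compile(r"/(?P<urn>[^/]+)/alignment/eng/paginate/(?P<reference>[^/]+)/")),
    ("paginate",
     re.compile(r"/(?P<urn>[^/]+)/alignment/eng/paginate/")),
    # Most generic pattern last: it would also match ".../paginate/".
    ("by_reference",
     re.compile(r"/(?P<urn>[^/]+)/alignment/eng/(?P<reference>[^/]+)/")),
]


def resolve(path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (route name, captured parameters) or None if nothing matches."""
    for name, pattern in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return name, match.groupdict()
    return None
```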

Gotchas hit along the way

  • My original spec had assumed a 1:1 relationship between references and milestones, but several references appear in multiple milestones. When resolving alignment milestones from a reference (and calculating the offset for that particular milestone), we peek at the previous and next milestones around the one where the reference is indexed. Here are two samples:

  • Because that relationship is not 1:1, I'm not attempting to resolve "subreferences" from leaves within the text server. Instead, I'm stripping the references out of the "Greek" field on each row of the source CSV and returning that content.

  • There are several errors in the "Citation" field of the source data. For example, look at 2185582 for the Iliad: citation="1.8" but greek="[1.80] τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:". Since I was already parsing the Greek content, I re-created the citations from it.

  • Greg's Iliad has a citation for 18.616-18.617, but the Iliad text we're using doesn't have 18.617. I wrote an edge-case fix for this (HEALED_CITATIONS) that can be expanded as needed.
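The Greek-field cleanup described in the gotchas might look something like this, assuming the reference markers always have the form `[book.line]`. `split_greek` and `citation_from_greek` are hypothetical names, and the `HEALED_CITATIONS` mapping shown here is a guess at its shape (source citation to the citation the text server actually has), not the real table.

```python
import re
from typing import Dict, List, Optional, Tuple

# Markers such as "[1.80] "; only spaces/tabs after the marker are consumed,
# so line breaks in the Greek field survive.
MARKER_RE = re.compile(r"\[(\d+\.\d+)\][ \t]*")

# Hypothetical shape: citation from the source data -> citation to use instead.
HEALED_CITATIONS: Dict[str, str] = {"18.616-18.617": "18.616"}


def split_greek(greek: str) -> Tuple[List[str], str]:
    """Return (citations found in markers, Greek text with markers removed)."""
    citations = MARKER_RE.findall(greek)
    text = MARKER_RE.sub("", greek).strip()
    return citations, text


def citation_from_greek(greek: str) -> Optional[str]:
    """Re-create a row's citation from its markers instead of the Citation field."""
    citations, _ = split_greek(greek)
    if not citations:
        return None
    if len(citations) == 1:
        citation = citations[0]
    else:
        citation = f"{citations[0]}-{citations[-1]}"
    return HEALED_CITATIONS.get(citation, citation)
```

For row 2185582 this yields "1.80" regardless of what the Citation field says.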

TODOs

  • There are some circular dependencies between the various Python modules that I'd like to clean up. Might take the opportunity to move from Flask to Django too.
  • I'd also like to port the alignment functionality over to the text server backends; everything is currently in memory but probably needs to be available in Redis as well. Doing the pagination bits has also made me think that an RDBMS (hello, Postgres!) backend might be something we should consider (that might allow us to leverage something like Django REST framework on the backend and get pagination, cursors, etc. "for free").
  • Add a top-level metadata endpoint for each work that lists the available alignments (right now we're hardcoding a <work-urn> and eng pairing, but we know that for projects like Digital Sira the pairing won't be hardcoded)
  • Get access to the homer-api app on Heroku (likely work with @jtauber on that). Currently, we're hosting on a new readhomer-dev-api app under the SV team.
  • Determine how to resolve other formatting errors in the Greek content
  • Determine if we want to standardize the output format (rather than returning text for the Greek and English items in the alignment, returning tokens instead, etc.)

@jacobwegner
Contributor

jacobwegner commented Apr 30, 2019

Add-ons

  • Retain line breaks
  • text --> (reference, content, continuation) triples (English retains text)
  • continuation has a kind or None
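One possible shape for the (reference, content, continuation) triples, sketched as a Python dataclass. Since continuation kinds haven't been enumerated yet, `continuation` is just an optional string; the `greek`/`english` keys, the function names, and the policy of skipping lines without a marker are all assumptions. Splitting on line breaks is one way to retain them: each line becomes its own triple.

```python
import re
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional

MARKER_RE = re.compile(r"\[(\d+\.\d+)\][ \t]*")


@dataclass(frozen=True)
class GreekLine:
    """One (reference, content, continuation) triple for the Greek side."""
    reference: str
    content: str
    # The kind of continuation, or None. Kinds aren't enumerated yet,
    # so a plain string stands in for them here.
    continuation: Optional[str] = None


def to_triples(greek: str) -> List[GreekLine]:
    """Turn a marked-up Greek field into one triple per line."""
    triples = []
    for line in greek.splitlines():
        match = MARKER_RE.match(line)
        if match is None:
            continue  # hypothetical policy: skip lines without a reference marker
        triples.append(GreekLine(reference=match.group(1), content=line[match.end():]))
    return triples


def item_payload(greek: str, english: str) -> Dict[str, Any]:
    """Greek becomes a list of triples; English keeps its plain text."""
    return {"greek": [asdict(t) for t in to_triples(greek)], "english": english}
```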

Future

  • Ideally, "mute" the portion of the line that overlaps
  • Enumerate continuation kinds

Projects
Project
  
Backlog
Development

No branches or pull requests

2 participants