Improve implementation of disambiguated lemmas #142

jacobwegner · 2023-05-23T15:11:12Z

See our LSJ entries for ἄωρος in urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:12.89:

https://beyond-translation.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:12.89?mode=dictionary-entries&entryUrn=urn%3Acite2%3Ascafife-viewer%3Adictionary-entries.atlas_v1%3Alsj-18938


jacobwegner · 2023-06-01T13:02:31Z

@jtauber:

We had introduced a "normalized" version of the entry headword:

If I use the "display" value instead of the normalized value, things get cluttered:

I can make use of the "display" version when choosing a "sibling":

Any thoughts?

I'll get a deploy done soon so you can play around with this some more...

jacobwegner · 2023-06-06T13:37:15Z

(Deployed to https://beyond-transl-pr-143.herokuapp.com/reader/urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:12.89?mode=dictionary-entries&entryUrn=urn%3Acite2%3Ascafife-viewer%3Adictionary-entries.atlas_v1%3Acambridge-greek-lexicon-2307 )

jacobwegner · 2023-06-06T13:49:47Z

@jtauber to investigate δελφίς --> https://beyond-transl-pr-143.herokuapp.com/reader/urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:12.96?mode=dictionary-entries&entryUrn=urn%3Acite2%3AexploreHomer%3Aentries.atlas_v1%3A1.2121

jacobwegner · 2023-06-06T13:55:11Z

(To review character stripping)

jacobwegner · 2023-06-16T12:09:11Z

@jtauber: Here is a better explanation of what is going on with δελφῖνάς in Odyssey 12.
If you click through to load this query:

https://tinyurl.com/gh-bt-142-sample

You can see that headwordNormalizedStripped for LSJ, Cunliffe and Cambridge is stored as δελφις.

headword is provided directly from each lexicon.

headwordNormalized is computed in normalized_no_digits:

get the NFD normalized form
get the case-folded NFKC form of the NFD normalized form
strip digits (they are only there for disambiguation, e.g. ἄωρος1 vs. ἄωρος2)
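A minimal sketch of the three steps above, using only the Python standard library. The function name `normalized_no_digits_sketch` is hypothetical, and the order of NFKC and case folding is an assumption; the real `normalized_no_digits` may differ in detail. Note that Python's `str.casefold()` also maps final sigma (ς) to σ, so the real function may fold case another way.

```python
import re
import unicodedata


def normalized_no_digits_sketch(headword: str) -> str:
    """Hypothetical stand-in for normalized_no_digits (accents are kept)."""
    # Step 1: decompose into base characters plus combining marks.
    nfd = unicodedata.normalize("NFD", headword)
    # Step 2: recompose with NFKC, then case-fold (ordering is an assumption).
    folded = unicodedata.normalize("NFKC", nfd).casefold()
    # Step 3: drop the disambiguation digits.
    return re.sub(r"\d", "", folded)


# Numbered homographs collapse to one lookup key...
same_key = normalized_no_digits_sketch("ἄωρος1") == normalized_no_digits_sketch("ἄωρος2")
# ...while headwords that differ only in accent placement stay apart.
accent_kept = normalized_no_digits_sketch("θεά1") != normalized_no_digits_sketch("θέα2")
```

Under these assumptions both `same_key` and `accent_kept` are `True`.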

headwordNormalizedStripped is computed in normalize_and_strip_marks:

get the NFD normalized form
remove characters matching UNICODE_MARK_CATEGORY_REGEX
get the case-folded NFKC form of the NFD normalized, mark-stripped value
does not strip digits (so that θεά1 and θέα2 in LSJ remain distinct)
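The mark-stripping variant can be sketched the same way. The name `normalize_and_strip_marks_sketch` is hypothetical, and the `unicodedata.category` check is only a stand-in for `UNICODE_MARK_CATEGORY_REGEX`, whose exact pattern is not shown in this thread; it is assumed here to match the Unicode mark categories (Mn, Mc, Me).

```python
import unicodedata


def normalize_and_strip_marks_sketch(headword: str) -> str:
    """Hypothetical stand-in for normalize_and_strip_marks (digits are kept)."""
    # Decompose so that accents, breathings and macrons become separate code points.
    nfd = unicodedata.normalize("NFD", headword)
    # Assumed equivalent of UNICODE_MARK_CATEGORY_REGEX: drop every mark (M*) code point.
    stripped = "".join(
        ch for ch in nfd if not unicodedata.category(ch).startswith("M")
    )
    # Recompose with NFKC, then case-fold (ordering is an assumption).
    return unicodedata.normalize("NFKC", stripped).casefold()


plain = "δελφ\u03afς"          # δελφίς: iota with tonos
macron = "δελφ\u1fd1\u0301ς"   # δελφῑ́ς: iota with macron, plus combining acute

# With marks removed, both spellings yield the same key.
marks_ignored = normalize_and_strip_marks_sketch(plain) == normalize_and_strip_marks_sketch(macron)
# Digits survive, so numbered homographs remain distinct.
digits_kept = normalize_and_strip_marks_sketch("θεά1") != normalize_and_strip_marks_sketch("θέα2")
```

Without the digits, θεά and θέα would collapse to the same stripped key, which is the collision the digits guard against here.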

Beyond Translation currently uses headwordNormalized for lookups; I believe this was done to avoid exactly the kind of error where a single lookup resolves to both θεά and θέα within LSJ.

We apply the same normalization used for headwordNormalized to the search term a user provides on the frontend.

So, back to δελφῖνάς in Od. 12:

The lemma we're using is δελφίς
The headwordNormalized form for the Cambridge Greek Lexicon is δελφῑ́ς, which keeps the macron and so does not match the lemma
If we could make the headword in the file you're providing for Cambridge Greek Lexicon δελφίς or δελφίς, the headwordNormalized would then become δελφίς
headwordDisplay could continue to have δελφῑ́ς
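To illustrate the mismatch and the proposed fix, here is a small sketch. The helper `normalize_keep_accents`, the dictionary-based index and the entry layout are all hypothetical; only the field names `headword` and `headwordDisplay` come from this thread. It also assumes the two spellings suggested for the headword differ only in tonos vs. oxia encoding, which NFD/NFKC normalization unifies.

```python
import re
import unicodedata


def normalize_keep_accents(value: str) -> str:
    """Hypothetical accent-preserving, digit-stripping normalization."""
    nfd = unicodedata.normalize("NFD", value)
    return re.sub(r"\d", "", unicodedata.normalize("NFKC", nfd).casefold())


LEMMA = "δελφ\u03afς"                    # δελφίς: iota with tonos
LEMMA_OXIA = "δελφ\u1f77ς"               # same word, iota with oxia
CAMBRIDGE_DISPLAY = "δελφ\u1fd1\u0301ς"  # δελφῑ́ς: macron plus acute

# The macron survives normalization, so a lookup keyed on the display form misses.
display_matches = normalize_keep_accents(CAMBRIDGE_DISPLAY) == normalize_keep_accents(LEMMA)

# Tonos and oxia encodings normalize to the same key.
encodings_match = normalize_keep_accents(LEMMA_OXIA) == normalize_keep_accents(LEMMA)

# Proposed shape: a plain headword for lookups, the macron form for display only.
entry = {"headword": LEMMA, "headwordDisplay": CAMBRIDGE_DISPLAY}
index = {normalize_keep_accents(entry["headword"]): entry}
match = index.get(normalize_keep_accents(LEMMA))
```

Here `display_matches` is `False` and `encodings_match` is `True`, while `match` is the Cambridge entry with its `headwordDisplay` still carrying the macron.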

Does that make sense to you? I have some additional things I'd like to document around this, but I think having this new headwordDisplay option will be a big help going forward.

jacobwegner · 2023-09-19T12:39:20Z

(We should review this for Cambridge and Lexicon Thucydideum, as well as replicating what the "word study tool" does for lookups https://www.perseus.tufts.edu/hopper/morph?l=%CF%84%CE%B1%CF%81%CE%AC%CF%83%CF%83%CF%89&la=greek)

jacobwegner mentioned this issue May 24, 2023

Ingest and display Cambridge Greek Lexicon #63

Open

jacobwegner added this to the Perseus 4 Baseline Functionality milestone Sep 19, 2023
