Incomplete display of delimited dictionary entries #5168

ceaum · 2019-08-02T17:27:19Z

KOReader version: v2019.07
Device: Kobo Clara HD

Dictionary look-up of a word that contains an entry with delimited portions displays a seemingly arbitrary portion.

Two examples using the wikt-en-ALL-2018-05-15 dictionary:

Querying "haan" in StarDict displays both the Dutch and Finnish entries [screenshot]. The KOReader dictionary only displays the Dutch entry [screenshot], which happens to be the first one.
Querying "vraag" displays this [screenshot] in StarDict, but only the Dutch entry in KOReader, which here happens to be the last entry.

The .dict.dz, .idx and .ifo files are placed within a directory in /mnt/onboard/.adds/koreader/data/dict/.

E: Title and text edits because I accidentally posted before completing the Issue.
E2: formatting

Frenzie · 2019-08-02T17:52:30Z

See Dushistov/sdcv#30.

cyphar · 2021-10-18T06:02:44Z

Unsurprisingly because this affects Japanese text I went and fixed it 😅. I've submitted Dushistov/sdcv#78 upstream which should fix this issue.

poire-z · 2021-10-18T07:55:46Z

Just asking: how do you judge the performance impact of your upstream PR?
sdcv can be slow when you have lots of dicts and no results, and/or when it needs to fall back to fuzzy search.
It feels like your fix only kicks in after an entry is found and then reads the entries before/after it, so it shouldn't be too expensive, and it will do nothing when nothing is found, right?

cyphar · 2021-10-18T09:02:45Z

Yes to your questions -- only after finding an entry (binary search) will it do the minimum possible extra work to find any extra entries (linearly look before and after the found index, comparing each with the string). If there are no identical entries it'll add only two extra string comparisons; if there are identical entries I doubt you can do better than O(number-of-identical-entries), which is what we are doing. There is a sort-and-remove-duplicates step at the end of the search -- which I guess isn't strictly necessary -- but that's O(n log n) where n is the number of results (which is going to be small).

All in all, it shouldn't make lookups much slower than they already were. On my laptop, exact searches with my 7 relatively large Japanese dictionaries take ~80ms for both the no-matches case and the lots-of-entries-matching case (>100 for はい). Fuzzy searching takes 300-500ms (depending on whether it finds anything during fuzzy searching). This is basically identical to the time taken with sdcv master.

EDIT: I added some micro-optimisations (using std::set so there is no need to sort the vector, and iterating over the match block only once rather than twice in the worst case). There wasn't any change to the timing, but now there aren't any low-hanging optimisations left to apply.
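
For readers who want to see the shape of the approach described above, here is a minimal sketch. It is not the code from the upstream PR: the `Index` type and the `lookup_all` function are hypothetical names made up for illustration, and it assumes a plain byte-wise sorted list of headwords, whereas a real StarDict index uses its own comparison function. It binary-searches for any matching entry, then scans outwards in both directions, collecting positions in a `std::set` so the results come out ordered and free of duplicates without a separate sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary index: headwords sorted byte-wise,
// duplicates allowed (e.g. one "haan" per language section).
using Index = std::vector<std::string>;

// Returns the positions of every entry equal to `word`.
// Illustrative only; names and types are not sdcv's actual API.
std::set<std::size_t> lookup_all(const Index &index, const std::string &word)
{
    // Debug-only precondition: binary search needs a sorted index.
    assert(std::is_sorted(index.begin(), index.end()));

    std::set<std::size_t> hits;

    // Step 1: binary search that stops at *any* matching entry.
    std::size_t lo = 0, hi = index.size();
    std::size_t found = index.size(); // sentinel: "not found"
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (index[mid] == word) {
            found = mid;
            break;
        }
        if (index[mid] < word)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (found == index.size())
        return hits; // nothing found, so no extra work is done

    // Step 2: scan outwards from the hit, visiting each identical
    // entry once. With no duplicates this costs two extra comparisons
    // (one neighbour on each side).
    hits.insert(found);
    for (std::size_t i = found; i > 0 && index[i - 1] == word; --i)
        hits.insert(i - 1);
    for (std::size_t i = found + 1; i < index.size() && index[i] == word; ++i)
        hits.insert(i);

    return hits;
}
```

With an index such as `{"haai", "haan", "haan", "haar", "vraag"}`, looking up "haan" yields positions 1 and 2, while a word that is absent returns an empty set after the binary search alone.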

ceaum changed the title ~~Incomplete display of dictionary entry~~ Incomplete display of delimited dictionary entries Aug 2, 2019

Frenzie added the bug label Aug 2, 2019

Frenzie mentioned this issue Aug 2, 2019

Duden stardict dictionary issues #2951

Closed

Frenzie added the Upstream label Aug 3, 2019

NiLuJe mentioned this issue Nov 10, 2020

Dictionary shows only one definition #6863

Closed

Frenzie closed this as completed Oct 18, 2021

Frenzie added this to the 2021.11 milestone Oct 18, 2021

This was referenced Nov 14, 2021

sdcv: update to include multiple results fix koreader/koreader-base#1431

Merged

koreader-base: update to include sdcv update #8446

Merged
