Dictionary support
KOReader supports dictionary lookup in EPUB and even in scanned PDF/DJVU documents. To look up a word in the dictionary or on Wikipedia, press and hold it; to select several words for other functions, hold and drag.
To use the dictionary lookup function, you first need to install one or more dictionaries in the StarDict format.
StarDict dictionary files have the suffixes `*.idx`, `*.ifo` or `*.ifo.gz`, and `*.dict` or `*.dict.dz`.
The dictionaries need to be installed into one of the following directories:
- `/sdcard/koreader/data/dict` directory for Android
- `/mnt/private/koreader/data/dict` for Cervantes
- `koreader/data/dict` directory for Kindle
- `.adds/koreader/data/dict/` directory for Kobo
- `applications/koreader/data/dict` directory for Pocketbook
- `$HOME/.config/koreader/data/dict` directory for Linux
- `$HOME/Library/Application Support/koreader/data/dict` directory for macOS
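Before copying a dictionary to the device, it can help to confirm that all of the StarDict parts listed above are present. The following is only a sketch, not part of KOReader: the function name `missing_stardict_parts`, the dictionary name `mydict`, and the assumption that each dictionary sits in its own subfolder are all hypothetical. The file check is injectable so the logic can be tried without real files.

```lua
-- Sketch (not KOReader code): report which StarDict parts are missing.
local function default_exists(path)
    local f = io.open(path, "rb")
    if f then f:close() return true end
    return false
end

-- Each entry lists the acceptable suffixes for one required part,
-- following the suffix list given in the text above.
local REQUIRED_PARTS = {
    { ".ifo", ".ifo.gz" },
    { ".idx" },
    { ".dict", ".dict.dz" },
}

-- `file_exists` is optional; it defaults to a real file system check.
local function missing_stardict_parts(dir, basename, file_exists)
    file_exists = file_exists or default_exists
    local missing = {}
    for _, suffixes in ipairs(REQUIRED_PARTS) do
        local found = false
        for _, suffix in ipairs(suffixes) do
            if file_exists(dir .. "/" .. basename .. suffix) then
                found = true
                break
            end
        end
        if not found then
            missing[#missing + 1] = table.concat(suffixes, " or ")
        end
    end
    return missing
end

-- Example with a fake file listing (hypothetical dictionary "mydict").
local fake_files = {
    ["koreader/data/dict/mydict/mydict.ifo"] = true,
    ["koreader/data/dict/mydict/mydict.dict.dz"] = true,
}
local missing = missing_stardict_parts("koreader/data/dict/mydict", "mydict",
    function(path) return fake_files[path] == true end)
print(table.concat(missing, ", "))  --> .idx
```

Called without the third argument, the function checks the real file system instead of the fake listing.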
Since v2020.04 you can override the directory where dictionaries are installed. This is useful for avoiding duplicates if your device has more than one app that can handle StarDict dictionaries. To do so, add the full path to `defaults.custom.lua`. For example: `STARDICT_DATA_DIR = "/mnt/onboard/.adds/vlasovsoft/dictionary"`.
- The ebook-reader-dict project provides StarDict versions of daily dumps of Wiktionary monolingual dictionaries for a variety of languages.
- The WikDict project provides bilingual StarDict dictionaries (download link) based on Wiktionary for a lot of language pairs.
- This GitHub repository contains dictionaries based on Wiktionary from many languages to English, including English-English.
- The DictInfo website provides outdated monolingual dictionaries based on Wiktionary.
- The Firedict site contains a list of freely available dictionaries.
- One can convert between different dictionary formats using PyGlossary.
- Some freely available dictionaries can be converted to the StarDict format with stardicter. See also wiktionary-to-stardict.
- It is also possible to convert dict.cc dictionaries to the StarDict format with dictcc-stardict.
- You may also be able to use `DICT` files used by the standard dictd daemon and the related dict packages that contain `.dict` files. Those files can be converted to `stardict` format using the `/usr/lib/stardict-tools/dictd2dic` command provided in the `stardict-tools` package, although it seems to fail to create the necessary metadata files like the `.ifo` file.
- You can download dictionaries from the internet within KOReader as shown here.
- Fictionaries provides dictionaries for various speculative fiction books and series.
You can use HTML encoded dictionaries, as described here.
Also, dictionaries can be tweaked with a custom CSS file, as described here and here. You can find sample files showing how to tweak them here. And some more discussion can be found here.
MuPDF is used to render the HTML dictionary results. If KOReader notices that MuPDF could not handle the HTML, it falls back to stripping the tags while keeping line feeds, and passes the result back to MuPDF.
We can't easily fix up the HTML, but you can add a `.lua` file in the `dict` directory with code to tweak the output before it is fed to MuPDF.
You need to be at ease with Lua, or you can just adapt the samples @poire-z created for some French dictionaries. More details in #3585 (and #3606, #3611).
You can strip the inline CSS (or, more simply, stop MuPDF from interpreting it) with something like the following in `<dictfilename>.lua`:
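return function(html)
    -- Rename every inline style attribute so that MuPDF no longer interprets it.
    -- Case-sensitive variant:
    -- html = html:gsub(' style=', ' zzztyle=')
    -- Case-insensitive variant:
    html = html:gsub(' [Ss][Tt][Yy][Ll][Ee]=', ' zzztyle=')
    return html
end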
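To see what the filter above does, it can be run on a desktop with a plain Lua interpreter. The snippet below is a minimal sketch: the name `strip_inline_style` and the sample entry are made up for illustration, and in a real dictionary `.lua` file the function is returned rather than stored in a local.

```lua
-- Same substitution as the filter above, bound to a local so it can be called directly.
local strip_inline_style = function(html)
    html = html:gsub(' [Ss][Tt][Yy][Ll][Ee]=', ' zzztyle=')
    return html
end

-- Hypothetical dictionary entry with mixed-case style attributes.
local sample = '<p STYLE="color:red">quaint <b style="font-size:3em">adj.</b></p>'
local result = strip_inline_style(sample)
print(result)
--> <p zzztyle="color:red">quaint <b zzztyle="font-size:3em">adj.</b></p>
```

The attribute values stay in the markup, but because `zzztyle` is not a known attribute they no longer affect rendering.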
- Edit the `.ifo` file in the dictionary folder. It should contain a `sametypesequence` parameter. For CSS stripping to work, it should be `sametypesequence=h`.
- Keep in mind that CSS stripping is a very powerful tool that can lead to enormous substitutions. To play it safe, check the output of the StarDict binary to find out which tags are used in the HTML layout. For example, from SSH or a terminal on the device, go to the `koreader/` directory and call `./sdcv -02 data/dict quaint`, where `data/dict` is the dictionary folder and `quaint` is the search query. The output should look like this:
[root@kindle koreader]# ./sdcv -02 data/dict/ quaint
Found 2 items, similar to quaint.
-->Longman Dictionary of Contemporary English 5th Ed. (En-En)
-->quaint
<k>quaint</k>
<c c="blue"><b>quaint</b></c> /kweɪnt/ <abr>BrE</abr> <rref>bre_quaint0205.wav</rref> <abr>AmE</abr> <rref>ame_quaint.wav</rref><i><c> adjective</c></i>
<blockquote><blockquote>[<c c="lightcoral">Date: </c><c c="darkgray">1100-1200</c>; <c c="lightcoral">Language: </c><c c="darkgray">Old French</c>; <c c="lightcoral">Origin: </c><c c="darkgray">cointe</c><c c="darkgray"> </c><i><c c="lightseagreen">'clever'</c></i><c c="darkgray">, from </c><c c="darkgray">Latin</c><c c="darkgray"> </c><c c="darkgray">cognitus</c><c c="darkgray"> </c><i><c c="lightseagreen">'known'</c></i>]</blockquote></blockquote>
<blockquote><blockquote> unusual and attractive, especially in an old-fashioned way: </blockquote></blockquote>
<blockquote><blockquote><blockquote><blockquote> <rref>exa_p008-000464505.wav</rref> <ex>a quaint little village in Yorkshire</ex></blockquote></blockquote></blockquote></blockquote>
From the output, several things can be learned. One - the main tag for paragraphs is `<blockquote>`. Two - the main tag for colored text is `<c c="color">`, which is not a standard CSS coloring scheme. Moreover, the colors are written out as names instead of HTML RGB references, so they might be completely ignored by KOReader. Three - there are references to `.wav` sound files, which are redundant for KOReader. Dictionary applications that support such references show them as small speaker icons that act as buttons to trigger the sound. In KOReader's dictionary, however, they are rendered as plain text exactly as in the HTML source, e.g. `bre_quaint0205.wav`. Four - the queried word is repeated in the `<k>` tag.
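For long entries, reading the raw output by eye gets tedious. As a rough aid, the tags can be counted with a few lines of Lua on a desktop. This is only a sketch: `count_tags` is a hypothetical helper, and the sample is a shortened excerpt of the output shown above.

```lua
-- Sketch: count the opening tags in a chunk of dictionary output.
local function count_tags(html)
    local counts = {}
    -- "<" followed by a letter starts an opening tag; closing tags ("</...") are skipped.
    for tag in html:gmatch("<(%a%w*)") do
        counts[tag] = (counts[tag] or 0) + 1
    end
    return counts
end

-- Shortened excerpt of the sdcv output above.
local sample = '<k>quaint</k>\n'
    .. '<c c="blue"><b>quaint</b></c> <abr>BrE</abr> <rref>bre_quaint0205.wav</rref>\n'
    .. '<blockquote><blockquote> unusual and attractive </blockquote></blockquote>'

local counts = count_tags(sample)

-- Print the tag names in alphabetical order with their counts.
local names = {}
for tag in pairs(counts) do names[#names + 1] = tag end
table.sort(names)
for _, tag in ipairs(names) do
    print(tag, counts[tag])
end
```

Every tag name that shows up here is a candidate for a `gsub` rule in the dictionary's `.lua` file.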
- After you find out what you would like to replace, create a `.lua` file with exactly the same name as the `.ifo` file, apart from the file extension. Here is example content for such a file. It replaces the color scheme and definitions with standard ones, replaces `.wav` references with a Unicode speaker icon (to distinguish sound examples from the word explanation), removes the `<k>` tag word, and makes sure the images point to the right path, relative to the `...koreader/data/dict/DICTNAME/res/` directory.
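return function(html)
    -- Replace sound-file references with a speaker icon.
    html = html:gsub('<rref[^>]*>[^<]*%.wav</rref>', '🔊')
    -- Drop the repeated headword.
    html = html:gsub('<k[^>]*>[^<]*</k>', '')
    -- Turn the <c> color tags into <span> elements with a CSS color.
    html = html:gsub('<c>', '<span>')
    html = html:gsub('</c>', '</span>')
    html = html:gsub('<c c="', '<span style="color:')
    -- Map the color names used by this dictionary to hex values.
    html = html:gsub('"color:indigo"', '"color:#4B0082"')
    html = html:gsub('"color:darkgray"', '"color:#A9A9A9"')
    html = html:gsub('"color:lightcoral"', '"color:#F08080"')
    html = html:gsub('"color:lightseagreen"', '"color:#20B2AA"')
    html = html:gsub('"color:darkgoldenrod"', '"color:#B8860B"')
    -- The remaining <rref> entries are treated as images.
    html = html:gsub('<rref[^>]*>', '<img src="/')
    -- "%." matches a literal dot; a bare "." would match any character.
    html = html:gsub('%.jpg</rref>', '.jpg">')
    return html
end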
- If you want to tweak the text output with CSS, create a `.css` file with the same name as the `.ifo` and `.lua` files, apart from the file extension. For this particular example, the CSS file looks like:
blockquote {
    margin-left: 1.0rem;
    margin-right: 0.5rem;
    text-align: justify;
}
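Before copying the files to the device, the Lua filter from the steps above can be tried on a line of real output with a desktop Lua interpreter. The sketch below uses a condensed copy of that filter (only two of the color rules, no image handling) and a sample assembled from fragments of the `quaint` entry; the local name `filter` is purely illustrative.

```lua
-- Condensed copy of the dictionary filter, for experimenting outside KOReader.
local function filter(html)
    html = html:gsub('<rref[^>]*>[^<]*%.wav</rref>', '🔊')
    html = html:gsub('<k[^>]*>[^<]*</k>', '')
    html = html:gsub('<c>', '<span>')
    html = html:gsub('</c>', '</span>')
    html = html:gsub('<c c="', '<span style="color:')
    html = html:gsub('"color:lightcoral"', '"color:#F08080"')
    html = html:gsub('"color:darkgray"', '"color:#A9A9A9"')
    return html
end

-- Fragments taken from the sdcv output for "quaint".
local sample = '<k>quaint</k>'
    .. '<rref>bre_quaint0205.wav</rref>'
    .. '<i><c> adjective</c></i>'
    .. '[<c c="lightcoral">Date: </c><c c="darkgray">1100-1200</c>]'

local result = filter(sample)
print(result)
--> 🔊<i><span> adjective</span></i>[<span style="color:#F08080">Date: </span><span style="color:#A9A9A9">1100-1200</span>]
```

Note that the order of the substitutions matters: the `.wav` rule has to run before any generic `<rref>` rule, otherwise sound references would be rewritten as images.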
Here is a screenshot of how it looked before, with the default `sametypesequence=x`, and after changing it to `sametypesequence=h` and adding the `.lua` and `.css` files:
KOReader has a built-in OCR engine for recognizing words in scanned PDF/DJVU pages. To use OCR on scanned pages in a language other than English or Chinese, you need to install the corresponding Tesseract trained data and add the new document languages to `koreader/defaults.lua`.
- Download language data files for Tesseract 4.00+ and copy the appropriate language data file (e.g. `eng.traineddata` in the tesseract-fast repository for English and `spa.traineddata` for Spanish) into `koreader/data/tessdata`.
- To add new languages, open `koreader/defaults.custom.lua` and add languages via their ISO 3-letter code (important, this needs to match the training data filename!) to the `DKOPTREADER_CONFIG_DOC_LANGS_CODE` array:
DKOPTREADER_CONFIG_DOC_LANGS_CODE = {"eng", "chi_sim"} -- language code, make sure you have corresponding training data
For example, for Kazakh this would be `kaz`; for Russian, `rus`, etc. If you are unsure of the code for your language, look at the tessdata filenames first.
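Since each code has to match a training data filename, a quick check can catch typos before you start KOReader. The following is a sketch under stated assumptions, not KOReader code: `traineddata_paths` is a hypothetical helper, `spa` and `rus` are added to the list only as an illustration, and the script is assumed to run from the directory that contains `koreader/`.

```lua
-- Language codes as they might appear in koreader/defaults.custom.lua.
-- (Declared local here because this is a standalone script; "spa" and
-- "rus" are illustrative additions.)
local DKOPTREADER_CONFIG_DOC_LANGS_CODE = {"eng", "chi_sim", "spa", "rus"}

-- Build the expected training data path for every language code.
local function traineddata_paths(codes, tessdata_dir)
    local paths = {}
    for i, code in ipairs(codes) do
        paths[i] = tessdata_dir .. "/" .. code .. ".traineddata"
    end
    return paths
end

-- Report which training data files are present.
local paths = traineddata_paths(DKOPTREADER_CONFIG_DOC_LANGS_CODE, "koreader/data/tessdata")
for _, path in ipairs(paths) do
    local f = io.open(path, "rb")
    if f then
        f:close()
        print("found   " .. path)
    else
        print("MISSING " .. path)
    end
end
```

Any line marked `MISSING` points either to a file that still needs to be copied or to a code that does not match its filename.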
If you've never customized any advanced settings before, the file will not exist. In that case, use the menu described in the next sentence instead: any modified entries will appear in bold and will automatically be added to the file on exit (this also helps make sure that the file is syntactically sound).
If you don't need to add new entries and simply want to modify the existing ones, you can also go to `Tools` > `More tools` > `Advanced settings` in the file manager's top menu and find the `DKOPTREADER_CONFIG_DOC_LANGS_CODE` entry there.
The `Forced OCR` option makes KOReader ignore any built-in text layer that comes with a PDF/DJVU document and rely only on OCR with the tessdata instead.
You can configure the order of dictionaries in the interface below.
Tap the name of a dictionary (not the checkbox) to select it. You can then move it up or down using the buttons at the bottom of the screen.
More info can be found here.
To look up a word in the dictionary, press and hold on the word. If you press and hold for more than 3 seconds, a menu with more options opens, as described here.
The dictionary supports a history of searched words, accessible through the menu. More info can be found here (with images).
You can cancel any search, for example one that is taking too long, with a tap. More on this here.