Wikipedia .zim files #2333
Comments
|
In my experience slob is significantly faster. I don't know if it's inherent in the format or just an implementation detail. The slob format is used in Aard Dictionary 2. |
|
I've tried opening zim files (from here "http://wiki.kiwix.org/wiki/Special:MyLanguage/Content_in_all_languages") with kobo firmware, search is basic (titles not full text), only one .zim open, speed is fair, still usable. Slob search, as you say, is better and faster and I see there are wikipedia and wiktionaries also in slob formats. I don't know how feasible it is, if it is easier to implement using slob or zim format, but I think it's a nice option to have a searchable (titles only) wikipedia (and wiktionaries) offline. |
|
I suppose there are fewer available, although it's easy to compile your Anyway, it was just a remark. It's not like I currently have the time to
|
|
I just tried Kiwix on my phone for the first time in a couple of years and actually it's as fast as Aard Dictionary or faster. Matching is worse though, with deja not matching déjà, which especially on a phone (or KOReader) is rather convenient. I might actually be interested in finally playing around with Lua bindings, in this case in combination with libzim, but I just don't have the time to atm. :-) |
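For illustration of the matching point (deja not matching déjà), here is a minimal sketch of accent-insensitive prefix matching in plain Java using `java.text.Normalizer`. The class and method names (`AccentFold`, `fold`, `matches`) are hypothetical, and nothing here is taken from how Kiwix or Aard Dictionary actually implement lookup.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper for illustration only; not code from Kiwix, Aard or KOReader.
final class AccentFold {
    private AccentFold() {}

    // Decompose to NFD, strip combining marks (accents), then lowercase.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // True when the folded title starts with the folded query.
    static boolean matches(String query, String title) {
        return fold(title).startsWith(fold(query));
    }

    public static void main(String[] args) {
        // "d\u00e9j\u00e0" is "déjà", written with escapes to avoid source-encoding issues.
        System.out.println(fold("d\u00e9j\u00e0"));                  // deja
        System.out.println(matches("deja", "D\u00e9j\u00e0 vu"));     // true
    }
}
```

Folding both the query and the title at comparison time keeps the stored titles untouched; a real index would more likely store the folded key alongside each title.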
|
On 2017-02-27 16:02, Frans de Jonge wrote:
I just tried Kiwix on my phone for the first time in a couple of years
and actually it's as fast as Aard Dictionary or faster. Matching is
worse though, with deja not matching déjà, which especially on a
phone (or KOReader) is rather convenient.
I might actually be interested in finally playing around with Lua
bindings, in this case in combination with libzim, but I just don't
have the time to atm. :-)
I still think that offline access to Wikipedia (and Wiktionaries and
others) is a very useful feature, and I'm happy to hear you may
start implementing it in KOReader. Thanks for your work.
|
|
Note that you can already get at least an English StarDict Wiktionary here. |
|
Kiwix has a bunch of command line tools that might be useful over at https://github.com/kiwix/kiwix-tools, but unfortunately they compile into some pretty enormous executables and they require a server to boot (kiwix-read and kiwix-search don't do exactly what I thought they might). |
|
Responding to #2345 (comment) @didierm Note that a selected word is currently automatically copied to the clipboard, so unlike on a platform like Kobo all you need to do is switch & paste. Btw, this issue is about offline, while the one where you posted is about online. @pazos Do you happen to know if this is more or less generic (besides the
Intent intent = new Intent("aard2.lookup");
intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK | Intent.FLAG_ACTIVITY_CLEAR_TOP);
intent.putExtra(SearchManager.QUERY, "Foo Bar");
startActivity(intent); |
|
@Frenzie While running the latest release 2019.04, highlighting a word does not copy it to the clipboard (with Settings > Document > Highlight action = 'popup' or 'highlight'). |
|
Pardon, I added a Copy action to the highlight popup. The auto-copy was just for testing back in the day, I suppose. I think @poire-z made it so you can always get to it by holding on it slightly longer or some such. |
|
We may as well copy all selected words/text. Or make that dependent on a new toggle menu item under |
|
I see : I tried before, holding on it for 2-3 seconds, but it seems 3-4 seconds are needed to get to the "Copy" popup. Hardly fluent and rather interruptive/invasive when e.g. reading in a foreign language ... Hence my proposal for a more immersive direct search/lookup with Aard2 locally installed slobs. (note : a StarDict conversion of the English wiktionary is available at http://dictinfo.com/). |
I have no idea. But looking at https://developer.android.com/reference/android/content/Intent.html#FLAG_ACTIVITY_NEW_TASK I see no problem.
I have no idea either, but it seems harder to implement than the "simple" share button you'd expect in a normal Android app. |
That's what I was thinking about :)
Intent intent = getIntent();
Bundle extras = intent.getExtras();
String action = intent.getAction();
if (Intent.ACTION_SEND.equals(action)) {
    // getExtras() may return null when the intent carries no extras
    if (extras != null && extras.containsKey(Intent.EXTRA_TEXT)) {
        String text = intent.getStringExtra(Intent.EXTRA_TEXT);
        // ... hand `text` over to the lookup
    }
}
I gave it a try today. Kinda works. The Android backend is easily doable, but each dict app requires some oddities. It happens that Coolreader has great support for external dictionaries, but in our case we might want to move some things to Lua. I'm thinking about the table of supported dicts (each dict has a display name, a package name, and an action name) and the current dict. Here is a (partial) example of the android-luajit-launcher side of things:
+
+    public void openInExternalDict(String text, String packageName, String actionName) {
+        if (isAppInstalled(packageName)) {
+            openLookup(text, packageName, actionName);
+        }
+    }
+
+    private boolean isAppInstalled(String packageName) {
+        try {
+            PackageManager pm = getPackageManager();
+            pm.getPackageInfo(packageName, PackageManager.GET_ACTIVITIES);
+            return pm.getApplicationInfo(packageName, 0).enabled;
+        } catch (PackageManager.NameNotFoundException e) {
+            Logger.e(TAG, e.toString());
+            return false;
+        }
+    }
+
+    private void openLookup(String text, String packageName, String actionName) {
+        try {
+            Intent intent = new Intent(actionName);
+            // Aard2 needs its own flags and extra; other apps will need their own cases
+            if ("aard2.lookup".equals(actionName)) {
+                intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK | Intent.FLAG_ACTIVITY_CLEAR_TOP);
+                intent.putExtra(SearchManager.QUERY, text);
+            }
+            startActivity(intent);
+
+        } catch (Exception e) {
+            // void method: log the failure, nothing to return
+            Logger.e(TAG, e.toString());
+        }
+
+    }
+
Oddities are handled in Java. From lua side we can use: the openInExternalDict would be:
+ android.openInExternalDict = function(query, package, action)
+     android.DEBUG("launching external dict application " .. package)
+     JNI:context(android.app.activity.vm, function(JNI)
+         local query = JNI.env[0].NewStringUTF(JNI.env, query)
+         local package = JNI.env[0].NewStringUTF(JNI.env, package)
+         local action = JNI.env[0].NewStringUTF(JNI.env, action)
+         JNI:callVoidMethod(
+             android.app.activity.clazz,
+             "openInExternalDict",
+             "(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)V",
+             query, package, action
+         )
+         JNI.env[0].DeleteLocalRef(JNI.env, query)
+         JNI.env[0].DeleteLocalRef(JNI.env, package)
+         JNI.env[0].DeleteLocalRef(JNI.env, action)
+     end)
+ end
We probably want to show a menu of supported dictionaries and just disable them if they're not installed and ready to use. If we can build a simple interface for external lookups, like the openLink function, I will be more than happy. |
|
Looks fine to me, except for the obvious complaints about hardcoding all that. |
If you were requesting our help with the UI stuff :), here's something that should allow you to get going with the Android part (dummy implementation for the emulator):
--- a/frontend/device/generic/device.lua
+++ b/frontend/device/generic/device.lua
@@ -71,6 +71,7 @@ local Device = {
     canOpenLink = no,
     openLink = no,
+    canExternalDictLookup = no,
 }
 function Device:new(o)
--- a/frontend/device/sdl/device.lua
+++ b/frontend/device/sdl/device.lua
@@ -6,6 +6,26 @@ local logger = require("logger")
 local function yes() return true end
 local function no() return false end
+local EXTERNAL_DICTS_AVAILABILITY_CHECKED = false
+local EXTERNAL_DICTS = {
+    -- Internal id (stored as setting), Display name, available, params
+    { "Fora", "Fora dict", false, {package="com.ngc.fora.ForaDictionary", action="ACTION_SEARCH"} },
+    { "ColorDict", "ColorDict", false, {package="com.socialnmobile.colordict.activity.Main", action="ACTION_SEARCH"} },
+    { "AardDict", "AardDict", false, {package="aarddict.android.Article", action="ACTION_SEARCH"} },
+}
+local function getExternalDicts()
+    if not EXTERNAL_DICTS_AVAILABILITY_CHECKED then
+        EXTERNAL_DICTS_AVAILABILITY_CHECKED = true
+        for i, v in ipairs(EXTERNAL_DICTS) do
+            -- check availability with v[4] (params)
+            if i % 2 == 1 then -- for test, have 1 out of 2 available
+                v[3] = true
+            end
+        end
+    end
+    return EXTERNAL_DICTS
+end
+
 local Device = Generic:new{
     model = "SDL",
     isSDL = yes,
@@ -22,6 +42,19 @@ local Device = Generic:new{
         if not link or type(link) ~= "string" then return end
         return os.execute("xdg-open '"..link.."'") == 0
     end,
+
+    canExternalDictLookup = yes,
+    getExternalDictLookupList = getExternalDicts,
+    doExternalDictLookup = function(self, text, method)
+        local params = nil
+        for i, v in ipairs(getExternalDicts()) do
+            if v[1] == method then
+                params = v[4]
+                break
+            end
+        end
+        logger.info("External dict lookup for", text, "with params:", params)
+    end,
 }
 local AppImage = Device:new{
--- a/frontend/apps/reader/modules/readerdictionary.lua
+++ b/frontend/apps/reader/modules/readerdictionary.lua
@@ -298,6 +298,53 @@ If you'd like to change the order in which dictionaries are queried (and their r
             }
         }
     }
+    if Device:canExternalDictLookup() then
+        local function genExternalDictItems()
+            local items_table = {}
+            for i, v in ipairs(Device:getExternalDictLookupList()) do
+                table.insert(items_table, {
+                    text = v[2],
+                    checked_func = function()
+                        return v[1] == G_reader_settings:readSetting("external_dict_lookup_method")
+                    end,
+                    enabled_func = function()
+                        return v[3] == true
+                    end,
+                    callback = function()
+                        G_reader_settings:saveSetting("external_dict_lookup_method", v[1])
+                    end,
+                })
+            end
+            return items_table
+        end
+        table.insert(menu_items.dictionary_settings.sub_item_table, 1, {
+            text = _("Use external dictionary"),
+            checked_func = function()
+                return G_reader_settings:isTrue("external_dict_lookup")
+            end,
+            callback = function()
+                G_reader_settings:flipNilOrFalse("external_dict_lookup")
+            end,
+        })
+        table.insert(menu_items.dictionary_settings.sub_item_table, 2, {
+            text_func = function()
+                local display_name = _("none")
+                local ext_id = G_reader_settings:readSetting("external_dict_lookup_method")
+                for i, v in ipairs(Device:getExternalDictLookupList()) do
+                    if v[1] == ext_id then
+                        display_name = v[2]
+                        break
+                    end
+                end
+                return T(_("Dictionary: %1"), display_name)
+            end,
+            enabled_func = function()
+                return G_reader_settings:isTrue("external_dict_lookup")
+            end,
+            sub_item_table = genExternalDictItems(),
+            separator = true,
+        })
+    end
 end
 function ReaderDictionary:onLookupWord(word, box, highlight, link)
@@ -710,6 +757,11 @@ function ReaderDictionary:stardictLookup(word, dict_names, fuzzy_search, box, li
         })
     end
+    if Device:canExternalDictLookup() and G_reader_settings:isTrue("external_dict_lookup") then
+        Device:doExternalDictLookup(word, G_reader_settings:readSetting("external_dict_lookup_method"))
+        return
+    end
+
     if fuzzy_search then
         self:showLookupInfo(word)
     end
(Had to be put into Maybe @Frenzie will want to implement that on the sdl/linux side with a list of common Linux dictionary apps (if there are some, dunno). And what to do with the word highlighted? |
|
Even on mobile I have my doubts compared to plain copy paste, but on desktop it makes significantly less sense. Maybe emulator exclusive for testing if desired ? The dictionary could just be launching a browser link to Wiktionary or something. |
Thanks a lot for the help!!!
There are some ways to know when the lookup activity ends: We can know what happened to the intent by overriding onActivityResult. (*) KOReader activity is managed by the Android Framework. We have another thread running the luajit vm and our frontend. That main thread is running without caring about application state.
Please, because the other solution involves a boolean value holding isLookup, overriding onActivityResult to set isLookup to false and lua code polling while android.isLookup() == true |
|
@poire-z: I just tested your diff and it works great. I think I managed to get some form of universal dictionary APK finder working, but sadly I'm not sure how to convert a java List<String,String,String> to a Lua table and I hate spending time on JNI stuff, so meh. The table of supported dicts is the way to go for me at the moment. We need to refresh the table when the activity is resumed (because a user can install/uninstall software in between). I implemented a few specific methods for Aard2 and ColorDict and general methods for ACTION_SEND and ACTION_SEARCH. Now I just need to find some dictionaries. I think I will start by adding support for those already present in F-Droid. Any suggestions? |
|
Many of the Wiktionaries have achieved reasonable quality. |
Cool, I've added https://f-droid.org/en/packages/de.reimardoeffinger.quickdic/ |
Is the packages-installed-checks expensive? |
FYI, I personally have installed the following (F-Droid hosted) dictionaries / information sources :
I guess Aard 2 (https://f-droid.org/en/packages/itkach.aard2/) is the most important one, as it provides access to locally installed information such as (in my case) WordNet, WikiWoordenboek (nl), Wiktionary (en), Wiktionary (nl), and Babylon English-Dutch. |
Aard2 is currently supported. Aard has not been updated since 2012 and is now deprecated. Are the rest of the packages intended for offline lookup? I think Wikipedia / Wiktionary are not. |
|
No, they aren't, AFAIK. |
|
Back to OP: libzim integration (as a dict, not as a Wikipedia replacement) looks straightforward. The main blocker is its prerequisites: https://github.com/openzim/libzim#dependencies |
|
Yeah, ICU is going to be a problem: it's a ginormous binary if unfiltered. |
|
There's also a barebones implementation in https://github.com/imapersonman/ZimReader/tree/master/ZimReader that I just discovered. Didn't play much with it but works fine to read the header: It relies on LZMA only and could work as a minimal reader. |
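To give an idea of how little is needed just to read the header, here is a rough sketch in plain Java. It assumes the field layout as I understand it from the openZIM format description (little-endian, 80-byte header, magic number 72173914 at offset 0, versions at offsets 4 and 6, entry and cluster counts at offsets 24 and 28); I haven't checked it against the ZimReader repo above, and the names `ZimHeaderSketch`, `Header` and `parse` are made up for this example.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; offsets are an assumption based on the openZIM format description.
final class ZimHeaderSketch {
    static final int MAGIC = 0x044D495A; // 72173914, bytes on disk: 'Z' 'I' 'M' 0x04
    static final int HEADER_SIZE = 80;

    static final class Header {
        final int majorVersion;
        final int minorVersion;
        final long entryCount;
        final long clusterCount;

        Header(int majorVersion, int minorVersion, long entryCount, long clusterCount) {
            this.majorVersion = majorVersion;
            this.minorVersion = minorVersion;
            this.entryCount = entryCount;
            this.clusterCount = clusterCount;
        }
    }

    // Parses the first few fixed-offset fields; all integers are unsigned little-endian.
    static Header parse(byte[] raw) {
        if (raw.length < HEADER_SIZE) {
            throw new IllegalArgumentException("need at least " + HEADER_SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != MAGIC) {
            throw new IllegalArgumentException("not a ZIM file (bad magic number)");
        }
        int major = Short.toUnsignedInt(buf.getShort(4));
        int minor = Short.toUnsignedInt(buf.getShort(6));
        long entries = Integer.toUnsignedLong(buf.getInt(24));
        long clusters = Integer.toUnsignedLong(buf.getInt(28));
        return new Header(major, minor, entries, clusters);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ZimHeaderSketch <file.zim>");
            return;
        }
        byte[] raw = new byte[HEADER_SIZE];
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            in.readFully(raw); // throws EOFException if the file is shorter than the header
        }
        Header h = parse(raw);
        System.out.println("ZIM v" + h.majorVersion + "." + h.minorVersion
                + ", entries=" + h.entryCount + ", clusters=" + h.clusterCount);
    }
}
```

Reading actual articles is a different matter: that still needs the pointer lists and cluster decompression, which is where LZMA comes in.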
|
hi, is there any progress (and will to do so) with reading zim* files by KOreader? *) or any other format allowing for reading wikipedia offline |
|
I believe https://github.com/ilius/pyglossary can convert from Slob and Zim (among others) to StarDict. |
|
@Frenzie thanks for the reply and suggestion, I'll try and see if it is feasible to do so; it would be a perfect Wikipedia reader |
|
Does KOReader handle huge ePub files well, like loading them only partially? If so, could we use PyGlossary to convert Wikipedia dumps (or ZIM files) to ePub (or MOBI) files with hyperlinks between articles and lazy-load only the interesting article? |
It does not. But note that "huge" doesn't really matter except in the number of elements sense. That is, you can have a couple hundred kilobyte EPUB that's "huge" and you can have one that's 1 GB that's not. But a Wikipedia dump is huge by any metric, of course. |
|
moving out of #9534 where i feel i'm becoming too noisy...
i've just bought a 512GB microSD card for basically 50$USD here (well, 80$CAD which is 60$ or so) which can not only fit all of wikipedia, with images, but it can fit it multiple times, then have my book collection, then some more. so, really, there's room in there.
in #9534, there's a debate on whether we should implement libzim support or some other library time. the PR itself implements its own sqlite database that needs to be converted from ZIM files. i don't feel this is the simplest or even best approach.
instead, i think a quick fix would be to have, "simply", a crude web browser plugin. power users like me could deploy their own kiwix-serve backend and files, figure out how to launch that at boot, and then connect to it with the web browser. it wouldn't be as seamless as the dictionary or current (online) wikipedia plugin, but it would actually have a working offline wikipedia, something i've been dreaming of for basically forever (since, anyways, kobo kind of implemented it by accident and then pulled it away from us forever).
has anyone considered that approach? just make a simple web browser and let the user setup kiwix?
oh, and i also wanted to mention other prior art, mostly concerning nickel: https://a3nm.net/blog/kobo_glo_hacking.html both of those take a similar approach: they deploy the precompiled arm kiwix binaries on the kobo, then patch the nickel binary to allow the web browser to run offline, and boom, you're done. surely we can do better than binary patching here! :) |
It isn't :)
This-does-not-compute :d There's no such thing as a "simple web browser". We have a pretty good engine (without javascript and ffmpeg), thanks to @poire-z's work on crengine, and even then we're using MuPDF for the html widgets (I mean, surely there's a reason for that, I have no clue :)) And a browser is much more than a renderer.
I don't know if
If kiwix doesn't provide an API then it is painful to parse the HTTP response, so it's better to read the zim file directly. Building and shipping ICU is the only roadblock I can see for 1st party support. If you want to discuss 3rdparty, that's another story. Nowadays with go it should be trivial to write a zim reader and package it as a binary for the various ABIs we support. |
i'm not sure i understand this... kiwix does provide a binary API through the library but it also provides an HTTP server, which, for me, seems simpler to implement. we already do implement something like this for the normal wikipedia plugin, no? we pull the page from wikipedia over HTTP? but i'm a bit out of my depth here... |
|
oh, and another thing i didn't realize is that /mnt/onboard gets forcibly mounted as |
In terms of storage that's fairly inoffensive, but you could also phrase that as a doubling of the download size. (But of course I wouldn't be saying that if we were talking about going from 100 kB to 200 kB.)
We use https://wikimedia.org/api/rest_v1/ to retrieve exactly what we want. There are no "Readability" type heuristics involved to ignore parts of the page. For Zim it's basically just stored webpages, so it's a bit different (read: harder). I think they're not actually the webpages but dynamically rendered from a Mediawiki data dump at the time of creation so they're probably at least somewhat optimized, but I haven't kept a close eye on how they've evolved the past couple of years.
There's https://github.com/akhenakh/gozim btw. |
|
On 2024-04-27 00:36:42, Frans de Jonge wrote:
> (It also puts into perspective concerns about ICU's size, IMHO. I don't know about how it would impact koreader, but here on Debian, the package is 37... megabytes...)
In terms of storage that's fairly inoffensive, but you could also phrase that as a doubling of the download size. (But of course I wouldn't be saying that if we were talking about going from 100 kB to 200 kB.)
Oh yeah, that's true, I didn't have that perspective.
> i'm not sure i understand this... kiwix does provide a binary API through the library but it also provides an HTTP server, which, for me, seems simpler to implement. we already do implement something like this for the normal wikipedia plugin, no? we pull the page from wikipedia over HTTP?
We use https://wikimedia.org/api/rest_v1/ to retrieve exactly what we want. There are no "Readability" type heuristics involved to ignore parts of the page.
Oh, I see! Didn't realize such an API even existed, but of course...
For Zim it's basically just stored webpages, so it's a bit different (read: harder). I think they're not actually the webpages but dynamically rendered from a Mediawiki data dump at the time of creation so they're probably at least somewhat optimized, but I haven't kept a close eye on how they've evolved the past couple of years.
Right.
> If you want to discuss 3rdparty, that's another story. Nowadays with `go` it should be trivial to write a zim reader and package it as a binary for the various ABI's we support
There's https://github.com/akhenakh/gozim btw.
Thanks! But that's basically like kiwix-serve: it makes a webserver out
of the ZIM files.
|
|
It looks like
So an interested third party would be able to use kiwix-serve, hosted on the same device that runs KO or anywhere else on the network, to retrieve info. From our PoV it makes no sense to rely on middleware, which involves a few extra setup steps, and it should be straightforward to implement proper zim support directly if somebody takes care of integrating the missing bits (ICU, xapian and libzim itself) into the build system and checks that it doesn't add too much size to our binaries. (No idea what's sensible, maybe 3-5MB in size?)
The Debian package is less than 30MiB in size; uncompressed it is more than 100MiB, mostly fonts, hyphenation, translations and libraries:
20M /usr/lib/koreader/fonts/
30M /usr/lib/koreader/l10n/
8,1M /usr/lib/koreader/data/hyph/
28M /usr/lib/koreader/libs/
Other platforms are even worse than that because at least in Debian we repurpose most fonts from the repos. So a few MB for a library that's not mandatory for reading might be too much to spend. Not sure :/ |
Okay, well that's promising at first glance, but digging a little deeper, it doesn't seem to work very well. I can talk to the root endpoint, but from there, most links end up in dead ends. The "all entries" links to a single PNG image, and the categories are a similar dead end. The easiest way to test this is to run kiwix on another machine and use the Koreader OPDS plugin to browse it from outside. But even browsing it from another machine leads to similar dead end, it's not an issue specific to Koreader's OPDS plugin, which otherwise works fine. |



Hi all,
is it possible to open Wikipedia .zim files like they do in the original Kobo firmware (http://www.mobileread.com/forums/showthread.php?t=276219)? If it is not already possible, it would be a nice feature in my opinion.
Thanks for developing Koreader.