
Wikipedia .zim files #2333

Open
gorlan opened this issue Nov 11, 2016 · 43 comments

Comments

@gorlan

gorlan commented Nov 11, 2016

Hi all,
is it possible to open Wikipedia .zim files like they do in the original Kobo firmware (http://www.mobileread.com/forums/showthread.php?t=276219)? If it is not already possible, it would be a nice feature in my opinion.
Thanks for developing Koreader.

@Frenzie
Member

Frenzie commented Nov 11, 2016

In my experience slob is significantly faster. I don't know if it's inherent in the format or just an implementation detail. The slob format is used in Aard Dictionary 2.

https://github.com/itkach/slob

@gorlan
Author

gorlan commented Nov 11, 2016

I've tried opening zim files (from here "http://wiki.kiwix.org/wiki/Special:MyLanguage/Content_in_all_languages") with kobo firmware, search is basic (titles not full text), only one .zim open, speed is fair, still usable. Slob search, as you say, is better and faster and I see there are wikipedia and wiktionaries also in slob formats. I don't know how feasible it is, if it is easier to implement using slob or zim format, but I think it's a nice option to have a searchable (titles only) wikipedia (and wiktionaries) offline.

@Frenzie
Member

Frenzie commented Nov 11, 2016

I suppose there are fewer available, although it's easy to compile your
own: https://github.com/itkach/slob/wiki/Dictionaries

Anyway, it was just a remark. It's not like I currently have the time to
look into either. ;)

On Fri, Nov 11, 2016 at 10:58 PM, gorlan notifications@github.com wrote:

I've tried opening a zim file with kobo firmware, search is basic (titles
not full text), only one .zim open, speed is fair, still usable. Slob
search is better and faster, but I don't know if there are available in
"slob format" all collections in different languages of wikipedia,
wiktionary, ... like these
("http://wiki.kiwix.org/wiki/Special:MyLanguage/Content_in_all_languages")
that are in zim format. I
think it's a nice option to have a searchable (titles only) wikipedia (and
wiktionaries) offline. I don't know how feasible it is, maybe it is not
that easy.



@Frenzie
Member

Frenzie commented Feb 27, 2017

I just tried Kiwix on my phone for the first time in a couple of years and actually it's as fast as Aard Dictionary or faster. Matching is worse though: deja doesn't match déjà, and that kind of accent-insensitive matching is rather convenient, especially on a phone (or in KOReader).
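
For illustration only, the deja/déjà matching mentioned above can be sketched in plain Java by folding diacritics before comparing. This is not how Kiwix or Aard actually match; the class and method names (AccentFold, fold, matches) are made up for the example, and only java.text.Normalizer from the standard library is used.

```java
import java.text.Normalizer;
import java.util.Locale;

final class AccentFold {
    // Decompose to NFD, drop combining marks (Unicode category M), then lowercase.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Title-prefix match on the folded forms, in the spirit of a suggestion search.
    static boolean matches(String query, String title) {
        return fold(title).startsWith(fold(query));
    }

    public static void main(String[] args) {
        // The accented title is written with Unicode escapes to keep the source ASCII-only.
        System.out.println(matches("deja", "d\u00e9j\u00e0 vu")); // true
        System.out.println(matches("test", "liefdestest"));        // false (prefix match only)
    }
}
```

Folding both the query and the stored title means the index itself does not have to change; whether that is cheap enough on an e-reader is a separate question.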

I might actually be interested in finally playing around with Lua bindings, in this case in combination with libzim, but I just don't have the time to atm. :-)

@gorlan
Author

gorlan commented Feb 28, 2017 via email

@Frenzie
Member

Frenzie commented Feb 28, 2017

Note that you can already get at least an English StarDict Wiktionary here.

@Frenzie
Member

Frenzie commented Apr 13, 2017

Kiwix has a bunch of command line tools that might be useful over at https://github.com/kiwix/kiwix-tools, but unfortunately they compile into some pretty enormous executables and they require a server to boot (kiwix-read and kiwix-search don't do exactly what I thought they might).

$ ./kiwix-search 
Usage: kiwix-search [--verbose|-v] [--backend|-b=xapian] INDEX_PATH SEARCH
$ ./kiwix-search data/index/wiktionary_nl_all_2017-01.zim.idx/ test
test
test probe
IQ-test
test uit
test de grossesse
liefdestest
laboratoriumtest
persoonlijkheidstest
testen/vervoeging
burgerschapstest
$ ./kiwix-read 
Usage: kiwix-read --suggest=<PATTERN> ZIM_FILE_PATH
$ ./kiwix-read --suggest=test ./data/content/wiktionary_nl_all_2017-01.zim 
Searching suggestions for: test
test
test de grossesse
test probe
test uit
testa
testaba
testabais
testaban
testabas
testad

@Frenzie
Member

Frenzie commented Feb 23, 2019

@Frenzie
Member

Frenzie commented Apr 24, 2019

Responding to #2345 (comment)

@didierm Note that a selected word is currently automatically copied to the clipboard, so unlike on a platform like Kobo all you need to do is switch & paste. Btw, this issue is about offline, while the one where you posted is about online.

@pazos Do you happen to know if this is more or less generic (besides the aard2.lookup)?

Intent intent = new Intent("aard2.lookup");
intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK | Intent.FLAG_ACTIVITY_CLEAR_TOP);
intent.putExtra(SearchManager.QUERY, "Foo Bar");
startActivity(intent);

@didierm

didierm commented Apr 24, 2019

@Frenzie While running the latest release 2019.04, highlighting a word does not copy it to the clipboard (with Settings > Document > Highlight action = 'popup' or 'highlight').

@Frenzie
Member

Frenzie commented Apr 24, 2019

Pardon, I added a Copy action to the highlight popup. The auto-copy was just for testing back in the day, I suppose. I think @poire-z made it so you can always get to it by holding on it slightly longer or some such.

@poire-z
Contributor

poire-z commented Apr 24, 2019

We may as well copy all selected words/text. Or make that dependent on a new toggle menu item under Document > ?
Also, on Android, we may add a button (extending the 2x4 dialog) to send the selected text with an Intent (mimetype text/plain ?) so that any app that supports that could be selected in the Android popup that would then be shown?

@didierm

didierm commented Apr 24, 2019

I see : I tried before, holding on it for 2-3 seconds, but it seems 3-4 seconds are needed to get to the "Copy" popup.

Hardly fluent and rather interruptive/invasive when e.g. reading in a foreign language ...

Hence my proposal for a more immersive direct search/lookup with Aard2 locally installed slobs.
One of the nice things of Aard2 : next to off-line Wikipedia access, it integrates e.g. wiktionaries from different languages (i.e. examining the etymology of an English word in the English wiktionary and seamlessly jumping to its Dutch translation in the Dutch wiktionary).

(note : a StarDict conversion of the English wiktionary is available at http://dictinfo.com/).

@pazos
Member

pazos commented Apr 24, 2019

Do you happen to know if this is more or less generic (besides the aard2.lookup)?

I have no idea. But looking at https://developer.android.com/reference/android/content/Intent.html#FLAG_ACTIVITY_NEW_TASK I see no problem.

Also, on Android, we may add a button (extending the 2x4 dialog) to send the selected text with an Intent (mimetype text/plain ?) so that any app that supports that could be selected in the Android popup that would then be shown?

I have no idea either, but it seems harder to implement than the "simple" share button you expect on a normal android app.

@poire-z
Contributor

poire-z commented Apr 24, 2019

seems harder to implement than the "simple" share button you expect on a normal android app.

That's what I was thinking about :)
Looking at some old Android app I once glued together from various bits found here and there, KOReader would send an Intent that another app could receive with these (quite standard, I think) bits:

Intent intent = getIntent();
Bundle extras = intent.getExtras();
String action = intent.getAction();
// extras is null when the Intent carries no extras at all
if (Intent.ACTION_SEND.equals(action) && extras != null) {
  if (extras.containsKey(Intent.EXTRA_TEXT)) {
    String text = intent.getStringExtra(Intent.EXTRA_TEXT);
    // ... hand the received text to the lookup
  }
}

@pazos
Member

pazos commented Jun 13, 2019

seems harder to implement than the "simple" share button you expect on a normal android app.

That's what I was thinking about :)
Looking at some old Android app I once glued together from various bits found here and there, KOReader would send an Intent that another app could receive with these (quite standard, I think) bits:

Intent intent = getIntent();
Bundle extras = intent.getExtras();
String action = intent.getAction();
// extras is null when the Intent carries no extras at all
if (Intent.ACTION_SEND.equals(action) && extras != null) {
  if (extras.containsKey(Intent.EXTRA_TEXT)) {
    String text = intent.getStringExtra(Intent.EXTRA_TEXT);
    // ... hand the received text to the lookup
  }
}

@poire-z @Frenzie

I gave it a try today. Kinda works. The Android backend is easily doable, but each dict app has its own oddities. It happens that Coolreader has great support for external dictionaries, but in our case we might want to move some things to Lua.

I'm thinking about the table of supported dicts (each dict has a display name, a package name, and an action name) and the current dict.

Here is a (partial) example of the android-luajit-launcher side of the things:

+
+    public void openInExternalDict(String text, String packageName, String actionName) {
+        if (isAppInstalled(packageName)) {
+            openLookup(text, packageName, actionName);
+        }
+    }
+
+    private boolean isAppInstalled(String packageName) {
+        try {
+            PackageManager pm = getPackageManager();
+            pm.getPackageInfo(packageName, PackageManager.GET_ACTIVITIES);
+            return pm.getApplicationInfo(packageName, 0).enabled;
+        } catch (PackageManager.NameNotFoundException e) {
+            Logger.e(TAG, e.toString());
+            return false;
+        }
+    }
+
+    private void openLookup(String text, String packageName, String actionName) {
+        try {
+            Intent intent = new Intent(actionName);
+            // Aard2 expects the query as a SearchManager.QUERY extra
+            if ("aard2.lookup".equals(actionName)) {
+                intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK | Intent.FLAG_ACTIVITY_CLEAR_TOP);
+                intent.putExtra(SearchManager.QUERY, text);
+            }
+            startActivity(intent);
+        } catch (Exception e) {
+            // e.g. ActivityNotFoundException when nothing handles the action
+            Logger.e(TAG, e.toString());
+        }
+    }
+

Oddities are handled in Java. From the Lua side we can use android.openInExternalDict("some text", "itkach.aard2", "aard2.lookup") to open the lookup in that specific dictionary.

openInExternalDict would be:

+    android.openInExternalDict = function(query, package, action)
+        android.DEBUG("launching external dict application " .. package)
+        JNI:context(android.app.activity.vm, function(JNI)
+            local query = JNI.env[0].NewStringUTF(JNI.env, query)
+            local package = JNI.env[0].NewStringUTF(JNI.env, package)
+            local action = JNI.env[0].NewStringUTF(JNI.env, action)
+            JNI:callVoidMethod(
+                android.app.activity.clazz,
+                "openInExternalDict",
+                "(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)V",
+                query, package, action
+            )
+            JNI.env[0].DeleteLocalRef(JNI.env, query)
+            JNI.env[0].DeleteLocalRef(JNI.env, package)
+            JNI.env[0].DeleteLocalRef(JNI.env, action)
+        end)
+    end

We probably want to show a menu of supported dictionaries and just disable them if they're not installed and ready to use.

If we can build a simple interface for external lookups, like the openLink function, I will be more than happy.

@Frenzie
Member

Frenzie commented Jun 14, 2019

Looks fine to me, except for the obvious complaints about hardcoding all that.

@poire-z
Contributor

poire-z commented Jun 14, 2019

If we can build a simple interface for external lookups, like the openLink function, I will be more than happy.

If you were requesting our help with the UI stuff :), here's something that should allow you to get going with the android part (dummy implementation for the emulator)

--- a/frontend/device/generic/device.lua
+++ b/frontend/device/generic/device.lua
@@ -71,6 +71,7 @@ local Device = {

     canOpenLink = no,
     openLink = no,
+    canExternalDictLookup = no,
 }

 function Device:new(o)
--- a/frontend/device/sdl/device.lua
+++ b/frontend/device/sdl/device.lua
@@ -6,6 +6,26 @@ local logger = require("logger")
 local function yes() return true end
 local function no() return false end

+local EXTERNAL_DICTS_AVAILABILITY_CHECKED = false
+local EXTERNAL_DICTS = {
+        -- Internal id (stored as setting), Display name, available, params
+        { "Fora", "Fora dict", false, {package="com.ngc.fora.ForaDictionary", action="ACTION_SEARCH"} },
+        { "ColorDict", "ColorDict", false, {package="com.socialnmobile.colordict.activity.Main", action="ACTION_SEARCH"} },
+        { "AardDict", "AardDict", false, {package="aarddict.android.Article", action="ACTION_SEARCH"} },
+}
+local function getExternalDicts()
+    if not EXTERNAL_DICTS_AVAILABILITY_CHECKED then
+        EXTERNAL_DICTS_AVAILABILITY_CHECKED = true
+        for i, v in ipairs(EXTERNAL_DICTS) do
+            -- check availability with v[4] (params)
+            if i % 2 == 1 then -- for test, have 1 out of 2 available
+                v[3] = true
+            end
+        end
+    end
+    return EXTERNAL_DICTS
+end
+
 local Device = Generic:new{
     model = "SDL",
     isSDL = yes,
@@ -22,6 +42,19 @@ local Device = Generic:new{
         if not link or type(link) ~= "string" then return end
         return os.execute("xdg-open '"..link.."'") == 0
     end,
+
+    canExternalDictLookup = yes,
+    getExternalDictLookupList = getExternalDicts,
+    doExternalDictLookup = function(self, text, method)
+        local params = nil
+        for i, v in ipairs(getExternalDicts()) do
+            if v[1] == method then
+                params = v[4]
+                break
+            end
+        end
+        logger.info("External dict lookup for", text, "with params:", params)
+    end,
 }

 local AppImage = Device:new{
--- a/frontend/apps/reader/modules/readerdictionary.lua
+++ b/frontend/apps/reader/modules/readerdictionary.lua
@@ -298,6 +298,53 @@ If you'd like to change the order in which dictionaries are queried (and their r
             }
         }
     }
+    if Device:canExternalDictLookup() then
+        local function genExternalDictItems()
+            local items_table = {}
+            for i, v in ipairs(Device:getExternalDictLookupList()) do
+                table.insert(items_table, {
+                    text = v[2],
+                    checked_func = function()
+                        return v[1] == G_reader_settings:readSetting("external_dict_lookup_method")
+                    end,
+                    enabled_func = function()
+                        return v[3] == true
+                    end,
+                    callback = function()
+                        G_reader_settings:saveSetting("external_dict_lookup_method", v[1])
+                    end,
+                })
+            end
+            return items_table
+        end
+        table.insert(menu_items.dictionary_settings.sub_item_table, 1, {
+            text = _("Use external dictionary"),
+            checked_func = function()
+                return G_reader_settings:isTrue("external_dict_lookup")
+            end,
+            callback = function()
+                G_reader_settings:flipNilOrFalse("external_dict_lookup")
+            end,
+        })
+        table.insert(menu_items.dictionary_settings.sub_item_table, 2, {
+            text_func = function()
+                local display_name = _("none")
+                local ext_id = G_reader_settings:readSetting("external_dict_lookup_method")
+                for i, v in ipairs(Device:getExternalDictLookupList()) do
+                    if v[1] == ext_id then
+                        display_name = v[2]
+                        break
+                    end
+                end
+                return T(_("Dictionary: %1"), display_name)
+            end,
+            enabled_func = function()
+                return G_reader_settings:isTrue("external_dict_lookup")
+            end,
+            sub_item_table = genExternalDictItems(),
+            separator = true,
+        })
+    end
 end

 function ReaderDictionary:onLookupWord(word, box, highlight, link)
@@ -710,6 +757,11 @@ function ReaderDictionary:stardictLookup(word, dict_names, fuzzy_search, box, li
         })
     end

+    if Device:canExternalDictLookup() and G_reader_settings:isTrue("external_dict_lookup") then
+        Device:doExternalDictLookup(word, G_reader_settings:readSetting("external_dict_lookup_method"))
+        return
+    end
+
     if fuzzy_search then
         self:showLookupInfo(word)
     end

(Had to be put into ReaderDictionary:stardictLookup() for the word to be added to lookup history.)

image

image

image

Maybe @Frenzie will want to implement that on the sdl/linux side with a list of common linux dictionary apps (if there are some, dunno).
Might need a bit more thinking on the Device methods and settings names.

And what to do with the word highlighted?
With internal dict lookup, it stays highlighted until all dict windows are closed.
On android, with that intent thing, I guess you can't know when the user is done with the lookup app and back to KOReader, to clear the highlight. So, maybe clear it before launching the intent?

@Frenzie
Member

Frenzie commented Jun 14, 2019

Even on mobile I have my doubts compared to plain copy-paste, but on desktop it makes significantly less sense. Maybe emulator-exclusive for testing, if desired? The dictionary could just launch a browser link to Wiktionary or something.

@pazos
Member

pazos commented Jun 15, 2019

If you were requested our help with the UI stuff :), here's something that should allow you to get going with the android part (dummy implementation for the emulator)

Thanks a lot for the help!!!
I will try your patch and report how it behaves

And what to do with the word highlighted?
With internal dict lookup, it stays highlighted until all dict windows are closed.
On android, with that intent thing, I guess you can't know when the user is done with the lookup app and back to KOReader, to clear the highlight.

There are some ways to know when the lookup activity ends:

We can know what happened to the intent by overriding onActivityResult.
When you invoke startActivity(intent), the KOReader activity* is paused and the new one takes focus.

(*) The KOReader activity is managed by the Android framework. We have another thread running the LuaJIT VM and our frontend. That main thread runs without caring about application state.

So, may be clear it before launching the intent?

Please, because the other solution involves a boolean value holding isLookup, overriding onActivityResult to set isLookup to false, and Lua code polling while android.isLookup() == true.

@pazos
Member

pazos commented Jul 3, 2019

@poire-z: I just tested your diff and it works great.

I think I managed to have some form of universal dictionary apk finder, but sadly I'm not sure how to convert a java List<String,String,String> to a lua table and I hate spending time on jni stuff, so meh.

The table of supported dicts is the way to go for me at the moment. We need to refresh the table when the activity is resumed (because a user can install/uninstall software in between). I implemented a few specific methods for Aard2 and ColorDict and general methods for ACTION_SEND and ACTION_SEARCH.
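
A rough, framework-free sketch of such a table and its refresh step, to make the idea concrete. It is not the actual android-luajit-launcher code: the class names, the "Example Dict" entry and its package are hypothetical, and the install check is passed in as a predicate so the example runs without Android (on a device it would wrap something like the isAppInstalled check shown earlier and be called on resume).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class ExternalDicts {
    // One row of the table: display name, package name, action name, availability.
    static final class Dict {
        final String displayName, packageName, action;
        boolean available;

        Dict(String displayName, String packageName, String action) {
            this.displayName = displayName;
            this.packageName = packageName;
            this.action = action;
        }
    }

    private final List<Dict> dicts = new ArrayList<>();

    ExternalDicts() {
        // Aard 2 values are the ones quoted in this thread.
        dicts.add(new Dict("Aard 2", "itkach.aard2", "aard2.lookup"));
        // Hypothetical entry, only here to show an app that is not installed.
        dicts.add(new Dict("Example Dict", "com.example.dict", "android.intent.action.SEARCH"));
    }

    // Re-checks every entry; meant to run whenever the activity is resumed.
    void refresh(Predicate<String> isInstalled) {
        for (Dict d : dicts) {
            d.available = isInstalled.test(d.packageName);
        }
    }

    // Names of the entries that a menu would show as enabled.
    List<String> availableNames() {
        List<String> names = new ArrayList<>();
        for (Dict d : dicts) {
            if (d.available) names.add(d.displayName);
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> installed = new HashSet<>(Arrays.asList("itkach.aard2"));
        ExternalDicts table = new ExternalDicts();
        table.refresh(installed::contains);
        System.out.println(table.availableNames()); // [Aard 2]
    }
}
```

Keeping unavailable entries in the table (rather than dropping them) matches the earlier idea of showing all supported dictionaries and just disabling the ones that are not installed.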

Now I just need to find some dictionaries. I think I will start to add support for those already present in F-Droid. Any suggestion?

@Frenzie
Member

Frenzie commented Jul 4, 2019

Many of the Wiktionaries have achieved reasonable quality.

@pazos
Member

pazos commented Jul 4, 2019

Many of the Wiktionaries have achieved reasonable quality.

Cool, I've added https://f-droid.org/en/packages/de.reimardoeffinger.quickdic/

@poire-z
Contributor

poire-z commented Jul 4, 2019

We need to refresh the table when the activity is resumed (because a user can install/uninstall software in between).

Is the packages-installed-checks expensive?
Is it really necessary to do it each time KOReader is brought to the foreground? Once some dicts are installed, the user may never install any new ones. And if the user installs one and doesn't see it in the menu, they will naturally assume that it needs a real quit/restart.

@didierm

didierm commented Jul 4, 2019

@poire-z: I just tested your diff and it works great.
...
Now I just need to find some dictionaries. I think I will start to add support for those already present in F-Droid. Any suggestion?

FYI, I personally have installed the following (F-Droid hosted) dictionaries / information sources :

  • Aard (aarddict.android)
  • Aard 2 (itkach.aard2)
  • Wikipedia (org.wikipedia)
  • WikipOff (fr.renzo.wikipoff)
  • Wiktionary (org.wiktionary)

I guess Aard 2 (https://f-droid.org/en/packages/itkach.aard2/) is the most important one, as it provides access to locally installed information such as (in my case) WordNet, WikiWoordenboek (nl), Wiktionary (en), Wiktionary (nl), and Babylon English-Dutch.

@pazos
Member

pazos commented Jul 4, 2019

FYI, I personally have installed the following (F-Droid hosted) dictionaries / information sources :

  • Aard (aarddict.android)
  • Aard 2 (itkach.aard2)
  • Wikipedia (org.wikipedia)
  • WikipOff (fr.renzo.wikipoff)
  • Wiktionary (org.wiktionary)

Aard2 is currently supported. Aard has not been updated since 2012 and is currently deprecated.

Are the rest of the packages intended for offline lookup? I think Wikipedia / Wiktionary are not.

@didierm

didierm commented Jul 4, 2019

No, they aren't, AFAIK.
Aard 2 fits the bill.

@pazos
Member

pazos commented Jan 24, 2021

Back to OP: libzim integration (as a dict, not as a wikipedia replacement) looks straightforward. The main blocker is its prerequisites: https://github.com/openzim/libzim#dependencies

@NiLuJe
Member

NiLuJe commented Jan 24, 2021

Yeah, ICU is going to be a problem: it's a ginormous binary if unfiltered.

@pazos
Member

pazos commented Jan 24, 2021

There's also a barebones implementation in https://github.com/imapersonman/ZimReader/tree/master/ZimReader that I just discovered. Didn't play much with it but works fine to read the header:

Header read successfully
Magic Number: 72173914
Version: 5
UUID Lower: 15199123272871078998
UUID Upper: 7095794308258182112
Article Count: 2198
Cluster Count: 6
URL Pointer Position: 6473059
Title Pointer Position: 6490643
Cluster Pointer Position: 6499435
Mime List Position: 80
Main Page: 2050
Layout Page: -1
Checksum Position: 6499483
Geo Index Position: 18446744073709551615
Mime List read successfully
Pointer Lists read successfull

It relies on LZMA only and could work as a minimal reader.
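
To give an idea of how little is needed for that first step, here is a hedged Java sketch that parses the fixed-size header. The field order and the 80-byte size follow my reading of the openZIM file format page (little-endian; magic, major/minor version, UUID, counts, pointer positions, main/layout page, checksum position) and should be treated as an assumption; the geo index field printed by the tool above is ignored. ZimHeader and sample() are made-up names, and sample() only rebuilds a synthetic header from the values in the output above (the minor version 0 and the zeroed UUID are placeholders).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ZimHeader {
    static final int MAGIC = 72173914; // 0x044D495A, the magic number printed above
    static final int SIZE = 80;        // assumed fixed header size in bytes

    final int majorVersion, minorVersion;
    final long entryCount, clusterCount;
    final long urlPtrPos, titlePtrPos, clusterPtrPos, mimeListPos;
    final long mainPage, layoutPage, checksumPos;

    // Reads the fields that follow the magic number, in the assumed order.
    private ZimHeader(ByteBuffer buf) {
        majorVersion = buf.getShort() & 0xFFFF;
        minorVersion = buf.getShort() & 0xFFFF;
        buf.position(buf.position() + 16);       // skip the 16-byte UUID
        entryCount = buf.getInt() & 0xFFFFFFFFL;  // unsigned 32-bit values
        clusterCount = buf.getInt() & 0xFFFFFFFFL;
        urlPtrPos = buf.getLong();
        titlePtrPos = buf.getLong();
        clusterPtrPos = buf.getLong();
        mimeListPos = buf.getLong();
        mainPage = buf.getInt() & 0xFFFFFFFFL;
        layoutPage = buf.getInt() & 0xFFFFFFFFL;  // 0xFFFFFFFF means "no layout page"
        checksumPos = buf.getLong();
    }

    static ZimHeader parse(byte[] raw) {
        if (raw.length < SIZE) throw new IllegalArgumentException("header too short");
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != MAGIC) throw new IllegalArgumentException("not a ZIM file");
        return new ZimHeader(buf);
    }

    // Synthetic header built from the values printed above, for demonstration only.
    static byte[] sample() {
        ByteBuffer buf = ByteBuffer.allocate(SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(MAGIC).putShort((short) 5).putShort((short) 0);
        buf.putLong(0L).putLong(0L);              // placeholder UUID
        buf.putInt(2198).putInt(6);
        buf.putLong(6473059L).putLong(6490643L).putLong(6499435L).putLong(80L);
        buf.putInt(2050).putInt(-1);              // -1 is stored as 0xFFFFFFFF
        buf.putLong(6499483L);
        return buf.array();
    }

    public static void main(String[] args) {
        ZimHeader h = parse(sample());
        System.out.println("version " + h.majorVersion + "." + h.minorVersion
                + ", entries " + h.entryCount + ", clusters " + h.clusterCount);
    }
}
```

Everything after the header (pointer lists, directory entries, compressed clusters) is where the real work and the LZMA/zstd dependency come in; this only covers the part shown in the output above.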

@kiicia

kiicia commented Jul 2, 2021

hi, is there any progress (and will to do so) with reading zim* files by KOreader?

*) or any other format allowing for reading wikipedia offline

@Frenzie
Member

Frenzie commented Jul 2, 2021

I believe https://github.com/ilius/pyglossary can convert from Slob and Zim (among others) to StarDict.

@kiicia

kiicia commented Jul 2, 2021

@Frenzie thanks for the reply and suggestion, I'll try and see if it is feasible to do so, it would be a perfect wikipedia reader

@niutech

niutech commented Nov 21, 2021

Does KOReader handle huge ePub files well, like loading them only partially? If so, could we use PyGlossary to convert Wikipedia dumps (or ZIM files) to ePub (or MOBI) files with hyperlinks between articles and lazy-load only the interesting article?

@Frenzie
Member

Frenzie commented Nov 21, 2021

Does KOReader handle huge ePub files well, like loading them only partially?

It does not. But note that "huge" doesn't really matter except in the number of elements sense. That is, you can have a couple hundred kilobyte EPUB that's "huge" and you can have one that's 1 GB that's not. But a Wikipedia dump is huge by any metric, of course.

@anarcat
Contributor

anarcat commented Apr 26, 2024

moving out of #9534 where i feel i'm becoming too noisy... i've just bought a 512GB microSD card for basically 50$USD here (well, 80$CAD which is 60$ or so) which can not only fit all of wikipedia, with images, but it can fit it multiple times, then have my book collection, then some more.

so, really, there's room in there.

in #9534, there's a debate on whether we should implement libzim support or some other library time. the PR itself implements its own sqlite database that needs to be converted from ZIM files.

i don't feel this is the simplest or even best approach.

instead, i think a quick fix would be to have, "simply", a crude web browser plugin. power users like me could deploy their own kiwix-serve backend and files, figure out how to launch that at boot, and then connect to it with the web browser. it wouldn't be as seamless as the dictionary or current (online) wikipedia plugin, but it would actually have a working offline wikipedia, something i've been dreaming of for basically forever (since, anyways, kobo kind of implemented it by accident and then pulled it away from us forever).

has anyone considered that approach? just make a simple web browser and let the user setup kiwix?

oh, and i also wanted to mention other prior art, mostly concerning nickel:

https://a3nm.net/blog/kobo_glo_hacking.html
https://phire.cc/Offline-Wikipedia-on-the-Kobo.html

both of those take a similar approach: they deploy the precompiled arm kiwix binaries on the kobo, then patch the nickel binary to allow the web browser to run offline, and boom, you're done.

surely we can do better than binary patching here! :)

@pazos
Member

pazos commented Apr 27, 2024

in #9534, there's a debate on whether we should implement libzim support or some other library time. the PR itself implements its own sqlite database that needs to be converted from ZIM files.

i don't feel this is the simplest or even best approach.

It isn't :)
IMHO its code is good and easy to maintain, it serves a purpose and it's already done. So not bad either :)

has anyone considered that approach? just make a simple web browser and let the user setup kiwix?

This-does-not-compute :d

There's no such thing as a "simple web browser".

We have a pretty good engine (without javascript and ffmpeg), thanks to @poire-z's work on crengine, and even then we're using MuPDF for the html widgets (I mean, surely there's a reason for that, I have no clue :))

And a browser is much more than a renderer.

best approach.

I don't know if kiwix-serve has a consumer endpoint. If so use it directly.

If kiwix doesn't provide an API then it is painful to parse the http response. So better to read the zim file directly.
In both cases you'll need unicode support, which is huge.

Building and shipping ICU is the only roadblock I can see for 1st party support.

If you want to discuss 3rdparty, that's another story. Nowadays with Go it should be trivial to write a zim reader and package it as a binary for the various ABIs we support.

@anarcat
Contributor

anarcat commented Apr 27, 2024

If kiwix doesn't provide an API then it is painful to parse the http response

i'm not sure i understand this... kiwix does provide a binary API through the library but it also provides an HTTP server, which, for me, seems simpler to implement. we already do implement something like this for the normal wikipedia plugin, no? we pull the page from wikipedia over HTTP?

but i'm a bit out of my depth here...

@anarcat
Contributor

anarcat commented Apr 27, 2024

oh, and another thing i didn't realize is that /mnt/onboard gets forcibly mounted as vfat by kobo's rcS thing, which messes up any attempt at reformatting that partition into something sane. i tried patching the boot script to remove -t vfat but that uh... seems to have broken my kobo, i think it's doing a factory reset now... oops. :) (good thing i was working on a copy!)

@Frenzie
Member

Frenzie commented Apr 27, 2024

(It also puts into perspective concerns about ICU's size, IMHO. I don't know about how it would impact koreader, but here on Debian, the package is 37... megabytes...)

In terms of storage that's fairly inoffensive, but you could also phrase that as a doubling of the download size. (But of course I wouldn't be saying that if we were talking about going from 100 kB to 200 kB.)

i'm not sure i understand this... kiwix does provide a binary API through the library but it also provides an HTTP server, which, for me, seems simpler to implement. we already do implement something like this for the normal wikipedia plugin, no? we pull the page from wikipedia over HTTP?

We use https://wikimedia.org/api/rest_v1/ to retrieve exactly what we want. There are no "Readability" type heuristics involved to ignore parts of the page.

For Zim it's basically just stored webpages, so it's a bit different (read: harder). I think they're not actually the webpages but dynamically rendered from a Mediawiki data dump at the time of creation so they're probably at least somewhat optimized, but I haven't kept a close eye on how they've evolved the past couple of years.


If you want to discuss 3rdparty, that's another story. Nowadays with go it should be trivial to write a zim reader and package it as a binary for the various ABI's we support

There's https://github.com/akhenakh/gozim btw.

@anarcat
Contributor

anarcat commented Apr 29, 2024 via email

@pazos
Member

pazos commented Apr 29, 2024

It looks like kiwix-serve provides an OPDS API that mimics the workflow used to retrieve entries from lzma/zstd blobs in zim files (explained in https://wiki.openzim.org/wiki/ZIM_file_format)

So an interested third party would be able to use kiwix-serve, hosted on the same device that runs KO or in any other place of the network, to retrieve info.

From our PoV it makes no sense to rely on middleware, which involves a few extra setup steps. It should be straightforward to implement proper zim support directly if somebody takes care of integrating the missing bits (ICU, xapian and libzim itself) into the build system and checks that that doesn't add too much size to our binaries. (No idea what's sensible, maybe 3-5MB in size?)

I don't know about how it would impact koreader, but here on Debian, the package is 37... megabytes...)

The debian package is less than 30MiB in size, uncompressed is more than 100MiB, mostly fonts, hyphenation, translations and libraries:

20M	/usr/lib/koreader/fonts/
30M	/usr/lib/koreader/l10n/
8,1M	/usr/lib/koreader/data/hyph/
28M	/usr/lib/koreader/libs/

Other platforms are even worse than that because at least in debian we repurpose most fonts from the repos.

So, a few mb for a library that's not mandatory to read might be too much to spend. Not sure :/

@anarcat
Contributor

anarcat commented Apr 29, 2024

It looks like kiwix-serve provides an opds api that mimics the workflow used to retrieve entries from lzma/zstd blobs in zim files (explained in https://wiki.openzim.org/wiki/ZIM_file_format)

So an interested third party would be able to use kiwix-serve, hosted on the same device that runs KO or in any other place of the network, to retrieve info.

Okay, well that's promising at first glance, but digging a little deeper, it doesn't seem to work very well. I can talk to the root endpoint, but from there, most links end up in dead ends. The "all entries" link leads to a single PNG image, and the categories are a similar dead end.

The easiest way to test this is to run kiwix on another machine and use the KOReader OPDS plugin to browse it from outside. But even browsing it from another machine leads to a similar dead end, so it's not an issue specific to KOReader's OPDS plugin, which otherwise works fine.
