New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] Add dictionary download option #3176

Merged
merged 17 commits into from Dec 13, 2018

Conversation

Projects
None yet
7 participants
@Frenzie
Copy link
Member

Frenzie commented Sep 5, 2017

Last month I used Nickel for a couple of hours to see what it was like these days and it had one feature I missed in KOReader: downloading dictionaries. It's been in the back of my mind ever since, so last night I wrote this. It might still need some polish but I'm putting it up here to discuss the design (in code and UX) to avoid putting in extra effort.

Anyway, so there's a new menu entry in dictionary settings.
screenshot_2017-09-05_18-12-41

And then you can download pretty much all of the easily available freely licensed dictionaries I could find.
screenshot_2017-09-05_18-08-49

You download just by tapping on the name. I'm not entirely happy with that, but I prefer to reuse/abuse KeyValuePage if at all reasonably possible.

Pinging @lightonflux @KenMaltby @Markismus @Eduardomb22 for general comments

Pinging @poire-z as the resident dictionary expert who'll look at the code with a critical eye. ;-)

@@ -16,6 +17,115 @@ local T = require("ffi/util").template
-- so we only have to look for them on the first :init()
local available_ifos = nil

-- largely thanks to https://tuxor1337.github.io/firedict/dictionaries.html
local downloadable_dicts = {

This comment has been minimized.

@poire-z

poire-z Sep 5, 2017

Contributor

May be this could be imported from another file, for easier update by people?
And may be a LanguageSorter class that would make a first level menu by language, with the available dicts that have it as lang_in or lang_out :)

This comment has been minimized.

@Frenzie

Frenzie Sep 5, 2017

Member

The Czech dicts would get a double entry in the list? The zip includes both ways.

This comment has been minimized.

@poire-z

poire-z Sep 5, 2017

Contributor

They would appear only once in each of the English> and Czech> menu.

This comment has been minimized.

@Frenzie

Frenzie Sep 5, 2017

Member

That does seem like a good idea to sort them.

So drop the KeyValuePage idea alltogether and switch to individual MultiConfirmBoxes or something? Or per language KeyValuePage with slightly more generous info?

This comment has been minimized.

@poire-z

poire-z Sep 5, 2017

Contributor

Oh, right, this was KeyValuePage. I've been to much into Menu these last weeks :)
You could just add the languages as menu items, and they will launch a KeyValuePage just like you have there, with all the dicts related to this language.
Most users (and me) would probably only go see the dicts related to their mothertongue, and probably have a look at Latin and Greek (was ok to me that they translate to english and not to french).

end,
})
else
self:downloadDictionary(dict, download_location)

This comment has been minimized.

@poire-z

poire-z Sep 5, 2017

Contributor

Can we have a ConfirmBox before download starts?
Could be nice to have the size of what will be downloaded, so one can choose small dicts, or see it's too big to download now.

This comment has been minimized.

@Frenzie

Frenzie Sep 5, 2017

Member

Yes, (approximate) size might be more relevant than number of entries. There's a definite correlation between the two anyway.

@a-kohout

This comment has been minimized.

Copy link
Contributor

a-kohout commented Sep 5, 2017

There seems to be something wrong with "GNU Collaborative International English (GCIDE)" from the source you are using. It should contain more entries than Webster's 1913 (on which it builds), not less.
The version here seems to be better: http://download.huzheng.org/dict.org/
Also consider including "Moby Thesaurus II" as well as "Webster's Revised Unabridged Dictionary (1913)" which is quite popular and more widely known than GCIDE.
The licenses on this page are wrong though, "Moby" is Public domain, "Webster's" is Public domain (copyright lapsed)

@KenMaltby

This comment has been minimized.

Copy link

KenMaltby commented Sep 5, 2017

Unlike with Kobo's dictionaries, KOReader's use of standard stardict dictionaries means that there can be many sources to download from. Once you have them on your PC, the sideloading process seems simple enough, but I guess some sort of "Dictionary Manager" could provide a way to load and offload them, on the device. In any case I would differ to @Markismus , as a dictionary use expert.

@Frenzie

This comment has been minimized.

Copy link
Member

Frenzie commented Sep 5, 2017

@a-kohout

The licenses on this page are wrong though, "Moby" is Public domain, "Webster's" is Public domain (copyright lapsed)

I'm more concerned about all the stuff under the red line that is "Free to use", lol

@KenMaltby
If you just quickly want a dictionary it's nice to be able to get it.

@a-kohout

This comment has been minimized.

Copy link
Contributor

a-kohout commented Sep 5, 2017

Yeah, most dictionaries on this website are probably illegal.
Btw the en-en wiktionary on http://www.dictinfo.com/ is great as well.
Especially since it knows foreign words that are sometimes used in English.
The all languages or all western languages to English are rather slow (70mb+).
Wiktionary is GFDL/CC BY-SA 3.0

Once the ability to download is there, deleting dictionaries would be great as well (including sideloaded ones).
And a reordering option in lieu of the "info on dictionary order" would make sense as well.

@KenMaltby
The dictionary manager already offers enable/disable
A curated selection of high quality dictionaries is a big plus imho.
Finding the good ones out there takes time.

@KenMaltby

This comment has been minimized.

Copy link

KenMaltby commented Sep 6, 2017

I agree it would be great to be able to provide the user an easier way to obtain what they need to acquire a usable dictionary. This is a very good addition to the current sideloading process, (to cover all the other options out there). Another approach would be to provide a dictionary in the download.

@Frenzie

This comment has been minimized.

Copy link
Member

Frenzie commented Sep 6, 2017

Bundling doesn't really seem like a good option to me, except in the sense that we could consider hosting a few on koreader.rocks ourselves to ensure speed and availability. Even the smaller <5 MB ones that are little more than toys (e.g., Jargon File, CIA World Factbook) would add some serious extra bulk to the initial download if nothing else, not to mention the language issue.

@KenMaltby

This comment has been minimized.

Copy link

KenMaltby commented Sep 6, 2017

Agree, what you are providing here might be the most usable approach.
Luck;
Ken

@Frenzie Frenzie added the enhancement label Sep 16, 2017

@Frenzie Frenzie force-pushed the Frenzie:download_dic branch from 8fae6fc to 8859ad3 Sep 19, 2017

@Frenzie Frenzie force-pushed the Frenzie:download_dic branch 2 times, most recently from ed6e09b to a0e5920 Oct 23, 2017

@Frenzie

This comment has been minimized.

Copy link
Member

Frenzie commented Oct 23, 2017

@poire-z I got a bit distracted by Android but I pushed the stuff that was sitting around on my drive. The size is still missing and there's the zip conundrum, but the menu stuff is there. I'll have to look at the GH terms of service. It might be easiest and most dependable to stick 'em in a GH repo as tar.gz (or xz if our tar supports it).

screenshot_2017-10-23_17-13-25
screenshot_2017-10-23_17-13-52
screenshot_2017-10-23_17-13-58

end

function ReaderDictionary:downloadDictionaryPrep(dict, size, continue)
continue = continue or false

This comment has been minimized.

@Frenzie

Frenzie Oct 23, 2017

Member

Should stick a wifiprompt here.

@poire-z

This comment has been minimized.

Copy link
Contributor

poire-z commented Oct 23, 2017

Looks ok.
About "English/Czech", shouldn't these not make for such a menu entry, but appear both in the English and the Czech menu ?
Download dictionaries above or below Installed dictionaries ?

And, not related to this PR: shouldn't we be more positive, and replace
[ ] Disable something (default) / [x] Disable something
with
[x] Enable something (default) / [ ] Enable something
even if the settings is itself named disable_fuzzy_search or disable_lookup_history

@Frenzie

This comment has been minimized.

Copy link
Member

Frenzie commented Oct 23, 2017

Yes, that's a matter of changing the data. I only moved it for the moment.

Probably agreed on the options.

@Frenzie Frenzie force-pushed the Frenzie:download_dic branch 2 times, most recently from 0a7de67 to 10d6b1b Oct 24, 2017

@Frenzie Frenzie force-pushed the Frenzie:download_dic branch from d5dbfb9 to 801e623 Jan 31, 2018

@cramoisi

This comment has been minimized.

Copy link
Contributor

cramoisi commented Mar 23, 2018

Very often, I add words in my dicts (with csv -> penelope -> dict) and thinking about this, it should be great to have the possibility to download (or delete) our own dictionaries the same way we download books from ftp or Dropbox storage.

@Frenzie Frenzie force-pushed the Frenzie:download_dic branch from 801e623 to 21ae7f3 Dec 12, 2018

@Frenzie

This comment has been minimized.

Copy link
Member

Frenzie commented Dec 12, 2018

@poire-z I finally got around to a quick touch-up. I figure it should be basically ready to go, but could you have a quick peek to see if anything jumps out at you?

Edit: oh wait, I still have to deal with a couple of your remarks from last year. I'll do that tomorrow. :-)

Frenzie added some commits Dec 12, 2018

@Frenzie Frenzie changed the title [WIP] Add dictionary download option [feat] Add dictionary download option Dec 12, 2018

Frenzie added some commits Dec 12, 2018

@Frenzie

This comment has been minimized.

Copy link
Member

Frenzie commented Dec 12, 2018

Okay, I moved down the menu item and removed the crud. Ready for review.

screenshot_2018-12-12_22-27-39
screenshot_2018-12-12_22-27-46
screenshot_2018-12-12_22-27-58
screenshot_2018-12-12_22-28-08
screenshot_2018-12-12_22-28-18
screenshot_2018-12-12_22-28-23

@poire-z
Copy link
Contributor

poire-z left a comment

lgtm

local res = {}
for k,v in ipairs(lang) do
if not hash[v.name] then
res[#res+1] = v

This comment has been minimized.

@poire-z

poire-z Dec 12, 2018

Contributor

Not used to see this done that way in current code :) We're most often using table.insert(res, v)

@AlanSP1

This comment has been minimized.

@Frenzie Frenzie merged commit b261a64 into koreader:master Dec 13, 2018

1 check passed

ci/circleci Your tests passed on CircleCI!
Details
@Frenzie

This comment has been minimized.

Copy link
Member

Frenzie commented Dec 13, 2018

Incidentally, here are some potentially interesting links:

https://github.com/korhoj/wiktionary-convert/
http://www.dictinfo.com/

@AlanSP1 Thanks!

@avsej

This comment has been minimized.

Copy link
Contributor

avsej commented Dec 13, 2018

@Frenzie, could you also link dicts from here https://sourceforge.net/projects/xdxf/files/dicts-stardict-form-xdxf/?

For instance stardict-comn_sdict05_eng_rus_full-2.4.2.tar.bz2 says it is under GPL license.

StarDict's dict ifo file
version=2.4.2
wordcount=526873
idxfilesize=12272791
bookname=English-Russian full dictionary
date=2009.01.30
sametypesequence=x
description=Copyright: Converted by swaj under GNU Public License.; Version: 1.0
@Frenzie

This comment has been minimized.

Copy link
Member

Frenzie commented Dec 13, 2018

@avsej Sourceforge can't be used directly as a download source (see here for the list I added) but assuming the license is correct on that one a copy could hosted elsewhere (e.g., GitLab or our own server).

@Frenzie Frenzie deleted the Frenzie:download_dic branch Dec 13, 2018

@avsej

This comment has been minimized.

Copy link
Contributor

avsej commented Dec 13, 2018

@Frenzie I've uploaded dicts from sourceforge to gitlab. Maybe now you can add them to the list?
https://gitlab.com/avsej/dicts-stardict-form-xdxf/blob/master/dictionaries.lua

@Frenzie

This comment has been minimized.

Copy link
Member

Frenzie commented Dec 13, 2018

Wow, that's great work! Yes, that'll do the trick. You don't want to submit a PR yourself? Besides using tabs instead of spaces (our coding style, sorry ;-) ) that list is very good quality already and almost ready to go.

We can't ship include some of the more dubious licenses I spotted on that list of course, but most of it looks like it should be good.

@avsej

This comment has been minimized.

Copy link
Contributor

avsej commented Dec 13, 2018

I trust you to create proper PR, @Frenzie.

@Frenzie

This comment has been minimized.

Copy link
Member

Frenzie commented Dec 13, 2018

The more people contributing code/lists the merrier.

I can probably take a closer at it tomorrow. It'll take me a little while to look at the full list in more detail. But on the up(?)side, I figure I'd have to do that anyway.

Frenzie added a commit to Frenzie/koreader that referenced this pull request Dec 14, 2018

[feat] Add many more downloadable dictionaries
Virtually the entire list that was originally on Sourceforge, except for some dictionaries with unclear or questionable licences.

Many thanks to @avsej who prepared it all.

See koreader#3176 (comment)

Frenzie added a commit that referenced this pull request Dec 21, 2018

[feat] Add many more downloadable dictionaries (#4401)
* Virtually the entire list that was originally on Sourceforge, except for some dictionaries with unclear or questionable licences.

* Without the Webster's duplicates. The GCIDE (GNU Collaborative International Dictionary of English), already included, is the extended version of it.

* Without Freelang. To "dissociate" dictionaries, the permission of individual authors has to be obtained.

Many thanks to @avsej who prepared it all.

See #3176 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment