[feat] Add many more downloadable dictionaries #4401

Merged
merged 4 commits into from Dec 21, 2018

Frenzie commented Dec 14, 2018

Virtually the entire list that was originally on Sourceforge, except for some dictionaries with unclear or questionable licences.

Many thanks to @avsej who prepared it all.

See #3176 (comment)

@Frenzie Frenzie added the enhancement label Dec 14, 2018

lang_in = "Russian",
lang_out = "Russian",
entries = 65372,
license = "See translation for <<00-database-...>> ",

Frenzie commented Dec 14, 2018

Might need further review.

url = "https://gitlab.com/avsej/dicts-stardict-form-xdxf/raw/264aadf8/002d/stardict-comn_sdict_axm05_Vietnamese_English-2.4.2.tar.bz2",
},
{
name = "Webster's 1913 Dictionary",

Frenzie commented Dec 14, 2018

Duplicate.

url = "https://gitlab.com/avsej/dicts-stardict-form-xdxf/raw/264aadf8/002d/stardict-comn_sdict_axm05_webster_1913-2.4.2.tar.bz2",
},
{
name = "Webster's Revised Unabridged Dictionary (1913)",

Frenzie commented Dec 14, 2018

Duplicate.

avsej commented Dec 14, 2018

Not really, because the number of articles is different.

Frenzie commented Dec 14, 2018

Not duplicates in the sense that they're exactly the same, but I think adding two inferior copies of almost the same dictionary to the list is confusing at best.

Remove Webster's duplicates
The GCIDE (GNU Collaborative International Dictionary of English), already included, is the extended version of it.
url = "https://gitlab.com/avsej/dicts-stardict-form-xdxf/raw/264aadf8/002c/stardict-comn_sdict05_afrikaans-english-2.4.2.tar.bz2",
},
{
name = "Albanian-English dictionary",

Frenzie commented Dec 14, 2018

We should probably change all of these to something like FREELANG Lang A – Lang B for clarity.
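
A rough sketch of what that renaming could look like, assuming each entry keeps the `lang_in` and `lang_out` fields seen in the list. The helper name `freelang_name` and the example values are hypothetical and only serve as illustration:

```lua
-- Hypothetical helper: builds a display name in the proposed
-- "FREELANG Lang A – Lang B" style from an entry's language fields.
local function freelang_name(entry)
    return ("FREELANG %s – %s"):format(entry.lang_in, entry.lang_out)
end

-- Example entry; the values are made up for illustration.
local entry = { lang_in = "Albanian", lang_out = "English" }
print(freelang_name(entry))
```

Deriving the name from the language fields would keep the labels consistent across the whole group instead of relying on hand-written names.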

Frenzie commented Dec 14, 2018

I was looking for the FREELANG license and it's not permissive. I'll send them an e-mail, but these may have to be excluded.

avsej commented Dec 14, 2018

@Frenzie I have updated the list: https://gitlab.com/avsej/dicts-stardict-form-xdxf/blob/0c5ec5f6768410e34ea9d6f3c35fe303ab0c1d84/dictionaries.lua (the old list will also keep working, because the permalinks include the commit SHA-1).

It now uses plain .tar instead of .tar.bz2. Using bzip2 does not save much space, because the .dz dicts are already compressed. Also, the device needs to have bzip2 installed, which at least Cervantes does not.

I've noticed here

koreader/frontend/util.lua

Lines 709 to 711 in e044093

if archive:match("%.tar%.bz2$") or archive:match("%.tar%.gz$") or archive:match("%.tar%.lz$") or archive:match("%.tgz$") then
ok = os.execute(("./tar xf %q -C %q"):format(archive, extract_to))
else

You use tar xf instead of tar axf. I wonder how it even works when you tested your lzip-compressed dicts. Again, Cervantes does not have lzip installed either.
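
For readers following along, the suffix test quoted above can be sketched as a small standalone helper. This is only an illustration, not KOReader's code: `is_tar_archive` and `TAR_SUFFIXES` are hypothetical names, and the plain `.tar` pattern is an assumption added here because the quoted condition does not list it.

```lua
-- Hypothetical helper, not part of KOReader: reports whether a file name
-- ends in one of the tar suffixes discussed in this thread.
local TAR_SUFFIXES = {
    "%.tar$",       -- plain tar (assumption: not in the quoted condition)
    "%.tar%.gz$",
    "%.tgz$",
    "%.tar%.bz2$",
    "%.tar%.lz$",
}

local function is_tar_archive(archive)
    for _, pattern in ipairs(TAR_SUFFIXES) do
        -- "%." matches a literal dot and "$" anchors the match at the end.
        if archive:match(pattern) then
            return true
        end
    end
    return false
end

print(is_tar_archive("stardict-example-2.4.2.tar"))
print(is_tar_archive("stardict-example-2.4.2.tar.bz2"))
print(is_tar_archive("stardict-example-2.4.2.zip"))
```

Keeping the patterns in a table makes it easy to see at a glance which archive formats the download code is prepared to hand to `./tar`.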

Frenzie commented Dec 14, 2018

All good points. Switching to plain tar probably makes the most sense, since even best-case compression doesn't reduce it by more than about 5 %. (On the flip side, adding gzip compression here also reduces it by some 5 %, so an argument can certainly be made in its favor.)

As for -a, I believe that's only relevant when compressing? Whatever the case, it makes no difference for extracting with our GNU Tar 1.30. It automatically extracts the files correctly with xf.
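
One plausible reason `xf` is enough: when reading an archive, GNU tar recognizes the compression format from the file's leading signature bytes rather than from its name. A minimal sketch of that kind of detection follows; `detect_compression` and the `MAGIC` table are hypothetical names for illustration, and this is not GNU tar's or KOReader's actual code.

```lua
-- Leading signature bytes of the compression formats mentioned in this thread.
local MAGIC = {
    { name = "gzip",  bytes = string.char(0x1F, 0x8B) },
    { name = "bzip2", bytes = "BZh" },
    { name = "xz",    bytes = string.char(0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00) },
    { name = "lzip",  bytes = "LZIP" },
}

-- Returns the format name for the first bytes of a file, or nil when no
-- known signature matches (for example an uncompressed tar).
local function detect_compression(header)
    for _, magic in ipairs(MAGIC) do
        if header:sub(1, #magic.bytes) == magic.bytes then
            return magic.name
        end
    end
    return nil
end

print(detect_compression(string.char(0x1F, 0x8B, 0x08, 0x00)))
print(detect_compression("BZh91AY"))
print(detect_compression("no signature here"))
```

Detection only selects the decompressor. GNU tar still calls the matching external program, so that program has to be present on the device, which is the concern raised about Cervantes.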

avsej commented Dec 14, 2018

The .tar files contain dictionaries in .dz format, which are already gzipped, so there is no point in using bzip2.

Frenzie commented Dec 14, 2018

Something like stardict-comn_sdict_axm05_Ukrainian_English-2.4.2 would be 6 MB without compression and is 2 MB with bz2 compression. The .dict.dz file may yield no or slightly negative returns, but the 4 MB .idx file compresses down to a little over 1 MB in gzip. And it goes down much further still with LZMA-type compressions:

[Screenshot: screenshot_2018-12-14_11-46-52]

So my 5 % was a slightly misguided example based on the specific dictionary I happened to be looking at, although it's likely to be generally more representative.

Now on my personal internet connection a few megabytes more or less are negligible, but the difference in filesize can certainly be quite pronounced.
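
The figures above can be turned into percentages with a little arithmetic. The sketch below uses the rounded sizes from this comment (6 MB to 2 MB for the whole archive, 4 MB to roughly 1 MB for the .idx file); `savings_percent` is a hypothetical helper name.

```lua
-- Percentage of space saved when a file shrinks from original_mb to compressed_mb.
local function savings_percent(original_mb, compressed_mb)
    return (1 - compressed_mb / original_mb) * 100
end

-- Whole archive: 6 MB uncompressed, 2 MB with bz2.
print(string.format("archive: %.0f %%", savings_percent(6, 2)))

-- .idx file: 4 MB uncompressed, about 1 MB with gzip (rounded down here,
-- so the real saving is slightly lower).
print(string.format(".idx: %.0f %%", savings_percent(4, 1)))
```

With these rounded inputs the archive shrinks by about two thirds and the index by about three quarters, well above the 5 % seen for the first dictionary examined.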

avsej commented Dec 14, 2018

I can repackage the dicts in any format, as long as KOReader guarantees that its ./tar command supports it. How about bundling xz, for example?

avsej commented Dec 14, 2018

@Frenzie, I've compressed everything with gzip (assuming it is available everywhere). The new list is here (current master): https://gitlab.com/avsej/dicts-stardict-form-xdxf/blob/9ccdaaac22dc52b4168c71a2025323d2d0691dd8/dictionaries.lua

Frenzie commented Dec 14, 2018

Thanks!

Note to self (and interested onlookers), link to FREELANG license for reference: https://www.freelang.net/dictionary/dic-copyrights.php

poire-z commented Dec 16, 2018

You could add keep_menu_open (#4189) to the languages list menu, so that we return to it when leaving the KeyValuePage with the dicts for one language.

Frenzie commented Dec 16, 2018

Frenzie commented Dec 21, 2018

Freelang reiterated their license. It seems that to "dissociate" dictionaries, the permission of individual authors has to be obtained.

@Frenzie Frenzie merged commit c41dfc2 into koreader:master Dec 21, 2018

1 check passed

ci/circleci Your tests passed on CircleCI!

@Frenzie Frenzie deleted the Frenzie:more-dicts branch Dec 21, 2018

Frenzie added a commit to Frenzie/koreader that referenced this pull request Dec 21, 2018

Frenzie added a commit that referenced this pull request Dec 23, 2018
