
Support writing dictionaries for Kobo e-Readers #205

Closed
karlb opened this issue May 15, 2020 · 11 comments

@karlb
Contributor

karlb commented May 15, 2020

The dictionary format of Kobo's e-Readers would be a useful new output format for pyglossary. This could be based on Penelope's Kobo writer. Using Penelope as a user is no longer a good option, as that project is discontinued (see #44).

I would be interested in FreeDict/TEI to Kobo conversion. If anyone works on that, I can help with FreeDict knowledge and testing on my Kobo.

@ilius
Owner

ilius commented May 15, 2020

Please use the kobo branch and test it.
First install the dependency: sudo pip3 install marisa-trie
Then use either the command line or the GUI, and give NAME.kobo.zip as the output file name.
Make sure the NAME.kobo directory does not already exist.

Alternatively, you may need to give NAME.kobo as the output file name and then make the zip file manually.
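For the manual route, here is a minimal sketch using Python's standard zipfile module. It assumes the reader wants the dictionary files at the top level of the archive rather than inside a NAME.kobo/ folder; `zip_kobo_dir` is a hypothetical helper, not part of pyglossary.

```python
import os
import zipfile


def zip_kobo_dir(src_dir, zip_path):
    """Pack every regular file in src_dir into zip_path (hypothetical helper).

    Assumption: the files belong at the archive root, so the directory
    prefix is dropped from each archive name.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if os.path.isfile(path):
                # arcname=name stores "aa.html", not "NAME.kobo/aa.html"
                zf.write(path, arcname=name)
                names.append(name)
    return names
```

Usage would look like `zip_kobo_dir("NAME.kobo", "NAME.kobo.zip")`. Adding the files individually avoids the trap of archiving only the directory entry.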

Please let me know
Thanks

@karlb
Contributor Author

karlb commented May 15, 2020

Thanks for working on this! I used the swe-deu slob file from FreeDict as input to try it out. The CLI output looks good:

karl@t480k:~/code/github/pyglossary (kobo)$ python3 main.py ~/Downloads/freedict-swe-deu-2020.02.08.slob swe-deu.kobo.zip
Using Reader class from Aard2Slob plugin for direct conversion without loading into memory

Writing to Kobo format requires full sort, falling back to indirect mode
Reading|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|100.0% Time: 00:00:07
Loaded 29847 entries
Writing to file "/home/karl/code/github/pyglossary/swe-deu.kobo"
Writing|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|100.0% Time: 00:00:03

Writing file "/home/karl/code/github/pyglossary/swe-deu.kobo.zip" done.
Running time of convert: 10.9 seconds

but the resulting swe-deu.kobo.zip contains only an empty swe-deu.kobo directory (as happens when calling zip on a directory without -r).
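One quick way to distinguish this failure from a good archive is to list only the zip's non-directory entries. This is a sketch with the standard zipfile module; `zip_file_entries` is a hypothetical helper name.

```python
import zipfile


def zip_file_entries(zip_path):
    """Return the names of regular file entries in zip_path.

    Directory entries (names ending in "/") are skipped, so an archive
    that holds nothing but an empty folder yields an empty list.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]
```

An empty result means the archive contains no dictionary files at all.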

The swe-deu.kobo directory in the pyglossary folder contains reasonable-looking HTML files (gzip-compressed, as expected), but it lacks an index.
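To inspect such a directory by hand, here is a small sketch. It assumes that entries live in `*.html` files that are gzip-compressed UTF-8, and that the index would be a file named `words`. Both the helper names and the encoding are assumptions.

```python
import gzip
import os


def inspect_kobo_dir(kobo_dir):
    """Return (sorted .html file names, whether a "words" index file exists)."""
    html = sorted(n for n in os.listdir(kobo_dir) if n.endswith(".html"))
    has_words = os.path.isfile(os.path.join(kobo_dir, "words"))
    return html, has_words


def read_kobo_html(path, encoding="utf-8"):
    """Decompress one gzip-compressed HTML file and return its text."""
    with gzip.open(path, "rt", encoding=encoding) as f:
        return f.read()
```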

I used this source to find out what to expect, though I don't know whether it is correct and up to date.

@ilius
Owner

ilius commented May 16, 2020

I think index.txt is only used to create the words file; it is not itself part of the output glossary.
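Here is a sketch of that step. It assumes the words file is a marisa-trie built from the headwords; the thread only shows that marisa-trie is a dependency. The helper names are hypothetical, and the trie is written only if the marisa-trie package is installed.

```python
def collect_index_words(headwords):
    """Deduplicate and sort non-empty headwords (hypothetical helper).

    Any case normalization the reader might expect is not shown here.
    """
    return sorted({w.strip() for w in headwords if w.strip()})


def write_words_file(words, path):
    """Save words as a marisa-trie at path; return False if unavailable."""
    try:
        import marisa_trie  # installed with: pip3 install marisa-trie
    except ImportError:
        return False
    marisa_trie.Trie(words).save(path)
    return True
```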

@ilius
Owner

ilius commented May 16, 2020

I pushed a fix for the empty zip file to the kobo branch.
Please try again.

@karlb
Contributor Author

karlb commented May 16, 2020

The zip works after naming it dicthtml-sv-de.zip. This naming scheme is required for the e-reader to pick up the dictionary.
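The pattern in that working name can be captured in a tiny helper. This is a sketch that generalizes only from the single example dicthtml-sv-de.zip. The helper name is hypothetical, and it assumes the reader wants two-letter ISO 639-1 codes for the source and target languages.

```python
def kobo_dict_zip_name(from_lang, to_lang):
    """Build "dicthtml-<from>-<to>.zip" from two-letter language codes."""
    for code in (from_lang, to_lang):
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"expected a two-letter language code, got {code!r}")
    return f"dicthtml-{from_lang.lower()}-{to_lang.lower()}.zip"
```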

First results show up on the reader, but the formatting is a bit off and I find far fewer words than I expected. I'll look into it and report back when I have more details.

@karlb
Contributor Author

karlb commented May 16, 2020

The slightly strange formatting comes from the input and is out of pyglossary's scope.

But the low number of found words is caused by the fact that only two-letter words are found in the dictionary. Do the prefixes go into the index instead of the whole words?
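One plausible reading of that question is that entries are grouped into HTML files keyed by a short prefix, while the index must still list whole words. The sketch below illustrates this. The bucketing rules are assumptions and are not taken from pyglossary or Penelope: the first two letters are lowercased, one-letter words are padded with "a", and words starting with a non-letter go into "11".

```python
import unicodedata
from collections import defaultdict


def entry_prefix(word):
    """Hypothetical bucket key for the HTML file that holds this word."""
    prefix = word.strip().lower()[:2]
    if not prefix or not all(unicodedata.category(c).startswith("L") for c in prefix):
        return "11"
    return prefix.ljust(2, "a")


def build_buckets_and_index(words):
    """Group words into per-prefix buckets and build a full-word index.

    If the index held only the prefixes, a lookup could match nothing
    but two-letter words, which is the symptom described above.
    """
    buckets = defaultdict(list)
    for w in words:
        buckets[entry_prefix(w)].append(w)
    index = sorted(set(words))
    return dict(buckets), index
```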

ilius added a commit that referenced this issue May 16, 2020
@ilius
Owner

ilius commented May 16, 2020

I just force-pushed to the kobo branch.
Please try again.
Thanks

@karlb
Contributor Author

karlb commented May 16, 2020

Works great!

(Attached photo: IMG_20200516_111359)

@ilius
Owner

ilius commented May 16, 2020

Thanks!
I pushed it to master as well.

ilius added a commit that referenced this issue May 16, 2020
- no need to inherit from EbookWriter class
- no need for uuid info key
- no need for second argument to write_groups
- rename self.glos to self._glos
- pep8 style fixes
@ilius
Owner

ilius commented May 16, 2020

I did some refactoring and also fixed unsafe file names (mostly a problem on Windows).
I'd appreciate it if you tested once more.
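For context on what "unsafe" means on Windows, here is a general-purpose sketch. It is not the fix from the commit: `safe_file_name` is a hypothetical helper. It replaces the characters Windows forbids in file names, strips trailing dots and spaces, and prefixes reserved device names such as CON or COM1.

```python
import re

# Characters Windows forbids in file names, plus ASCII control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves regardless of extension.
_RESERVED = {"CON", "PRN", "AUX", "NUL",
             *(f"COM{i}" for i in range(1, 10)),
             *(f"LPT{i}" for i in range(1, 10))}


def safe_file_name(name, replacement="_"):
    """Return a version of name that Windows should accept as a file name."""
    cleaned = _UNSAFE.sub(replacement, name).rstrip(" .")
    if not cleaned:
        cleaned = replacement
    if cleaned.split(".")[0].upper() in _RESERVED:
        cleaned = replacement + cleaned
    return cleaned
```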

@karlb
Contributor Author

karlb commented May 19, 2020

The current master works just as well!


2 participants