Converting this zim file failed #352

sobaee · 2022-01-09T20:03:36Z

Hello Saeed

Could you please help with this issue?

I have a ZIM file, called "Essential drugs", that was built from a website using youzim.it.

This file works normally in kiwix.apk, and anyone can search for any drug and find its definition.

When I tried to convert this file to another format such as slob, it reported many errors. The conversion completed, but only a small file was produced. When I open the converted slob dictionary in aard2.apk, it contains strange headwords that look like HTML code, with strange definitions.

The errors:
[WARNING] Unrecognized mimetype='application/warc-headers'
[ERROR] unknown content type for 'medicalguidelines.msf.org/viewport/EssDr/english/eflornithine-injectable-16682680.html'
[ERROR] unknown content type for 'update.googleapis.com/service/update2/json?cup2key=10:2622198263&cup2hreq=0338fb5b5cb30f0e5182132d1ff9620dec7fccf4020a155becf04a6b4c2d247a'
[ERROR] unknown content type for 'Xapian Fulltext Index'
[ERROR] unknown content type for 'Xapian Title Index'
[INFO] ZIM Entry Count: 4334
[INFO] Empty Content Count: 2
[INFO] Redirect Count: 1
Converting | |█████████████|%100.0 Time: 0:00:04

The file download link:
https://s3.us-west-1.wasabisys.com/org-kiwix-zimit/other/medicalguidelines.msf.org_07e28661.zim

I think this file follows the same idea as the Wikipedia ZIM files, which are converted successfully without any problem.

Is it possible to convert a file like this?

I also have this file in .epub and .mobi formats. Would it be possible to support reading from these formats?

ilius · 2022-01-09T23:27:13Z

Please try again.

Also can you upload your .epub and .mobi files?

sobaee · 2022-01-09T23:55:36Z

I get this error as soon as I start converting:

$python main.py essential-drugs.zim essential-drugs.slob
Traceback (most recent call last):
File "/storage/emulated/0/pyglossary-master/main.py", line 8, in <module>
from pyglossary.ui.main import main
File "/storage/emulated/0/pyglossary-master/pyglossary/ui/main.py", line 30, in <module>
from pyglossary.ui.base import UIBase
ImportError: cannot import name 'UIBase' from 'pyglossary.ui.base' (/storage/emulated/0/pyglossary-master/pyglossary/ui/base.py)
Traceback locals:
__name__ = 'pyglossary.ui.main'
__doc__ = None
__package__ = 'pyglossary.ui'
__loader__ = <_frozen_importlib_external.SourceFileLoader object at 0x780...
__spec__ = ModuleSpec(name='pyglossary.ui.main', loader=<_frozen_importli...
__file__ = '/storage/emulated/0/pyglossary-master/pyglossary/ui/main.py'
__cached__ = '/storage/emulated/0/pyglossary-master/pyglossary/ui/__pycac...
len(__cached__) = 84
__builtins__ = {'__name__': 'builtins', '__doc__': "Built-in functions, e...
len(__builtins__) = 155
os = <module 'os' from '/data/data/com.termux/files/usr/lib/python3.10/os...
sys = <module 'sys' (built-in)>
argparse = <module 'argparse' from '/data/data/com.termux/files/usr/lib/p...
json = <module 'json' from '/data/data/com.termux/files/usr/lib/python3.1...
logging = <module 'logging' from '/data/data/com.termux/files/usr/lib/pyt...
core = <module 'pyglossary.core' from '/storage/emulated/0/pyglossary-mas...
Entry = <class 'pyglossary.entry.Entry'>

sobaee · 2022-01-09T23:57:48Z

Mobi and epub files:

https://drive.google.com/file/d/1yHqN8cSBsGcyqTvqfr_6Wc6_Foww923y/view?usp=drivesdk

https://drive.google.com/file/d/1yDK09bLzLWHETCwWvKmHuZmaCD-0h2vM/view?usp=drivesdk

sobaee · 2022-01-10T00:44:29Z

I went back to the previous PyGlossary version (the one before this) so that it would run, and only replaced the zimfile.py plugin with the new one.

This time I get the original headwords alongside the HTML ones, but the definitions of both are incorrect; they show something that looks like code.
See this:

I got these errors during conversion:
[ERROR] unknown content type for 'update.googleapis.com/service/update2/json?cup2key=10:2622198263&cup2hreq=0338fb5b5cb30f0e5182132d1ff9620dec7fccf4020a155becf04a6b4c2d247a'
[ERROR] unknown content type for 'Xapian Fulltext Index'
[ERROR] unknown content type for 'Xapian Title Index'
[INFO] ZIM Entry Count: 4334
[ERROR] Files with name too long: 692
[INFO] Empty Content Count: 2
[INFO] Redirect Count: 1
Converting | |█████████████|%100.0 Time: 0:00:02
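
The "unknown content type" messages above suggest that a ZIM file made from a crawled website mixes article pages with crawl metadata and index entries that a dictionary has no use for. Below is a minimal sketch of one way to sort entries by MIME type before converting. It is not the logic of PyGlossary's zimfile plugin: the function name `classify_entry` and the MIME type lists are assumptions made for illustration.

```python
# Hypothetical helper, not part of PyGlossary.
# The MIME type sets below are illustrative assumptions, not an official list.
ARTICLE_MIMETYPES = {"text/html", "text/plain"}
RESOURCE_MIMETYPES = {"text/css", "text/javascript", "application/javascript"}
RESOURCE_PREFIXES = ("image/", "font/", "audio/", "video/")


def classify_entry(mimetype: str) -> str:
    """Return 'article', 'resource', or 'skip' for an entry's MIME type."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    base = mimetype.split(";", 1)[0].strip().lower()
    if base in ARTICLE_MIMETYPES:
        return "article"
    if base in RESOURCE_MIMETYPES or base.startswith(RESOURCE_PREFIXES):
        return "resource"
    # Anything else (for example application/warc-headers) is ignored
    # instead of being reported as an error.
    return "skip"
```

With this kind of filter, only entries classified as "article" would become headwords, and entries classified as "skip" would be dropped without an error message.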

ilius · 2022-01-10T01:52:55Z

I tested with Aard2 Web (in desktop browser).
Images are not shown, but text is shown correctly.

Can you open the epub in mobile and search the drug name?
It's also got a nice list of drugs you can use to look up.
(But you have to keep going back to that page I guess, unless the reader app has a Back button!)

ilius · 2022-01-10T02:09:22Z

Reading the epub is definitely possible.
Each entry seems to be a separate html file.
I will try to do it.
But I'm not sure I can include it in PyGlossary, since it's too specific (one entry per html file).
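
To illustrate the "one entry per html file" idea: an epub is a zip archive, so each HTML file inside it can be read and turned into one entry. The sketch below uses only the Python standard library. It is not the code of the plugin; the names `TitleParser` and `iter_epub_entries` are made up here, and taking the headword from the `<title>` element is an assumption.

```python
import zipfile
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def iter_epub_entries(epub_path):
    """Yield (headword, html) for every HTML file that has a non-empty title."""
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue  # skip images, stylesheets, and epub metadata
            html = zf.read(name).decode("utf-8", errors="replace")
            parser = TitleParser()
            parser.feed(html)
            headword = parser.title.strip()
            if headword:
                yield headword, html
```

This only works when the book really is split into one file per entry; a book with many entries in a single HTML file would need a different splitting rule.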

sobaee · 2022-01-10T12:28:33Z

In BlueDict it still shows no definitions when converting ZIM to MDX.

I get multiple errors during converting:
Traceback (most recent call last):
File "/storage/emulated/0/pyglossary-master/pyglossary/entry.py", line 85, in save
with open(fpath, "wb") as toFile:
PermissionError: [Errno 1] Operation not permitted: '/storage/emulated/0/pyglossary-master/essential-drugs.mtxt_res/medicalguidelines.msf.org/s/e8c3fbfc487e50239343e141213e915a-CDN/-qljuxx/8402/45c55aec607bd3c0b24eb377ecd790d998a06033/e05c0ca06e5a38e49b9110818c14a22e/_/download/contextbatch/css/viewcontent,-_super/batch.css?highlightactions=true'
[ERROR] error while saving /storage/emulated/0/pyglossary-master/essential-drugs.mtxt_res/update.googleapis.com/service/update2/json?cup2key=10:2622198263&cup2hreq=0338fb5b5cb30f0e5182132d1ff9620dec7fccf4020a155becf04a6b4c2d247a
Traceback (most recent call last):
File "/storage/emulated/0/pyglossary-master/pyglossary/entry.py", line 85, in save
with open(fpath, "wb") as toFile:
PermissionError: [Errno 1] Operation not permitted: '/storage/emulated/0/pyglossary-master/essential-drugs.mtxt_res/update.googleapis.com/service/update2/json?cup2key=10:2622198263&cup2hreq=0338fb5b5cb30f0e5182132d1ff9620dec7fccf4020a155becf04a6b4c2d247a'
Traceback (most recent call last):
File "/storage/emulated/0/pyglossary-master/pyglossary/entry.py", line 85, in save
with open(fpath, "wb") as toFile:
PermissionError: [Errno 1] Operation not permitted: '/storage/emulated/0/pyglossary-master/essential-drugs.mtxt_res/update.googleapis.com/service/update2/json?cup2key=10:2622198263&cup2hreq=0338fb5b5cb30f0e5182132d1ff9620dec7fccf4020a155becf04a6b4c2d247a'
[INFO] ZIM Entry Count: 4334
[ERROR] Files with name too long: 692
[INFO] Empty Content Count: 2
[INFO] Redirect Count: 1
Converting | |█████████████|%100.0 Time: 0:00:17

Is there any possibility to get the definitions to appear?
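
One plausible cause of the PermissionError lines above (an assumption, not confirmed in this thread) is that the resource names contain characters such as "?" and ":", which some file systems, including Android shared storage, refuse in file names. The "Files with name too long" count points to a second limit on name length. The sketch below shows how such names could be made safe before saving. `safe_component`, `safe_resource_path`, and the 255-byte limit are hypothetical and are not taken from PyGlossary.

```python
import hashlib
import re

# Assumed per-component limit; many file systems use 255 bytes.
MAX_NAME_BYTES = 255


def safe_component(name: str, max_bytes: int = MAX_NAME_BYTES) -> str:
    """Make one path component acceptable to restrictive file systems."""
    # Replace characters that FAT-style storage commonly rejects.
    cleaned = re.sub(r'[\\:*?"<>|]', "_", name)
    encoded = cleaned.encode("utf-8")
    if len(encoded) <= max_bytes:
        return cleaned
    # Too long: keep the start and append a short hash so names stay unique.
    digest = hashlib.sha1(encoded).hexdigest()[:12]
    keep = max_bytes - len(digest) - 1
    head = encoded[:keep].decode("utf-8", errors="ignore")
    return f"{head}_{digest}"


def safe_resource_path(path: str) -> str:
    """Sanitise every component of a '/'-separated resource path."""
    return "/".join(safe_component(part) for part in path.split("/") if part)
```

Note that truncating a long name this way also drops its file extension, and any links to the renamed resources inside the HTML would have to be rewritten to match.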

sobaee · 2022-01-10T12:38:19Z

Is it possible to get the produced dictionary (MDX, slob, or ifo) to work on mobile and on desktop, just like the Wikipedia ZIM files that were converted with PyGlossary before?

Is there any progress on the .epub conversion?

Please consider this when you have time.

Thank you Saeed

ilius · 2022-01-10T13:55:46Z

I converted the epub to slob and StarDict:

https://mega.nz/file/w1oESToB#h3HzcHLz7dmWZaRcyPE_PXJGnEHgr-FshVOXrR16aXc

https://mega.nz/file/10pEkJKS#oXVUM6WeKtxYQfHn4X3f9hj5aN6_xw9SqJgMIZ_Am0U

Here is the plugin code:
https://gist.github.com/ilius/b5a4cbec5a81ff77557f4a54e7221692

sobaee · 2022-01-10T17:47:39Z

I appreciate that Saeed

Thanks a lot 🙏

This epub plugin worked well with essential-drugs.epub, but it didn't work with other epub files from the same source, like this one:

https://medicalguidelines.msf.org/msf-books-hosting/16686604-English.epub

Or this:
https://medicalguidelines.msf.org/msf-books-hosting/51415817-english.epub

The error:
[ERROR] Exception while calling plugin's write function
Traceback (most recent call last):
File "/storage/emulated/0/pyglossary-master/pyglossary/glossary.py", line 1214, in write
for entry in self:
File "/storage/emulated/0/pyglossary-master/pyglossary/glossary.py", line 406, in _readersEntryGen
for index, entry in enumerate(self._applyEntryFiltersGen(reader)):
File "/storage/emulated/0/pyglossary-master/pyglossary/glossary.py", line 417, in _applyEntryFiltersGen
for index, entry in enumerate(gen):
File "/storage/emulated/0/pyglossary-master/pyglossary/plugins/epub_ungrouped.py", line 77, in __iter__
title = doc.find(".//title").text
AttributeError: 'NoneType' object has no attribute 'text'
Traceback (most recent call last):
File "/storage/emulated/0/pyglossary-master/pyglossary/glossary.py", line 1214, in write
for entry in self:
File "/storage/emulated/0/pyglossary-master/pyglossary/glossary.py", line 406, in _readersEntryGen
for index, entry in enumerate(self._applyEntryFiltersGen(reader)):
File "/storage/emulated/0/pyglossary-master/pyglossary/glossary.py", line 417, in _applyEntryFiltersGen
for index, entry in enumerate(gen):
File "/storage/emulated/0/pyglossary-master/pyglossary/plugins/epub_ungrouped.py", line 77, in __iter__
title = doc.find(".//title").text
AttributeError: 'NoneType' object has no attribute 'text'
[ERROR] Writing file 'clinical-guidelines.txt' failed.

If this plugin has to be adapted file by file, please tell me what to change in the plugin code to make it suitable for other epub files 🙏

I know this could be a lot of work, but if you get this plugin to work with all epub files, it would open up the possibility of converting more and more dictionaries.
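
The AttributeError in the traceback above means that `doc.find(".//title")` returned None for at least one HTML file, so there was no `.text` to read. A minimal sketch of a more forgiving lookup follows. It is not the fix that went into the gist: it assumes the document is parsed with the standard library's `xml.etree.ElementTree`, and the function name `entry_title` and the fallback order (title, then first h1, then file name) are choices made for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# XHTML files inside an epub usually declare this namespace, in which case
# a plain ".//title" search does not match the namespaced tag.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def entry_title(doc: ET.Element, file_name: str) -> str:
    """Return a headword for one HTML file, never raising on a missing title."""
    for tag in ("title", "h1"):
        for path in (f".//{tag}", f".//{XHTML_NS}{tag}"):
            node = doc.find(path)
            if node is None:
                continue
            # itertext() also collects text nested in child elements.
            text = "".join(node.itertext()).strip()
            if text:
                return text
    # Last resort: use the file name without its extension.
    return PurePosixPath(file_name).stem
```

For example, a file `OEBPS/ch01.xhtml` with neither a title nor an h1 would get the headword `ch01` instead of stopping the whole conversion.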

ilius · 2022-01-10T23:29:47Z

Updated https://gist.github.com/ilius/b5a4cbec5a81ff77557f4a54e7221692

sobaee · 2022-01-11T00:23:20Z

Perfect
Thank you man 👍👍

ilius · 2022-01-11T03:05:17Z

No worries.

I'd like to know, if you test it later, whether it works with epubs from other sources as well (not generated by PyGlossary).

sobaee · 2022-01-11T14:03:25Z

It looks like we need an epub file that has a separate HTML file for each entry for this to work. I tried it with a more complicated epub file that was converted from a PDF, whose entries are the outline (TOC) items of the PDF, but it didn't show any entries after conversion.

I will try with more epub books in their original form (not converted or manipulated).

Thanks

ilius added a commit that referenced this issue Jan 9, 2022

zimfile: make improvements, #352

31a43ce

ilius added the Improvement label Jan 9, 2022

ilius added a commit that referenced this issue Jan 10, 2022

WIP: zimfile.py, #352

ad4c556

ilius added a commit that referenced this issue Jan 10, 2022

slob: add 2 mime types, #352

59bfd31

ilius added Q&A and removed Improvement labels Jan 21, 2022

ilius closed this as completed Feb 5, 2022
