Word pages look unformatted #129

Open
dbogdanov opened this issue Jul 15, 2020 · 6 comments

Comments

@dbogdanov

The pages for individual words look unformatted, with lots of metadata tags output as raw text.
Is this intentional? I've just installed the app and am testing it.

An example (EN.quickdic) rendered in QuickDic compared to the same Wiktionary page in Firefox Android:

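[Two screenshots: the entry as rendered in QuickDic (left) and the same Wiktionary page in Firefox for Android (right)]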

App version: 5.5.6

@rdoeffinger
Owner

"Intentional" is the wrong word.
The raw Wiktionary data looks like the left side, and there is no easy-to-use or easy-to-integrate code to convert it into what is shown on the right side.
Support has been added for some specific, common tags. It would be possible to add support for more, and others could simply be removed: they increase dictionary size without much benefit (online links, for example, are of questionable use in an offline dictionary).
It would take some work, though, and it would only improve things, not fix them completely.
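
To make the approach described above concrete, here is a minimal sketch (in Python for brevity, not QuickDic's actual Java code) of rendering a few known templates and dropping the rest. The handler table, the function names, and the two template names chosen (`l` for a link, `lb` for a label) are illustrative assumptions; real Wiktionary templates accept many more parameters than this handles.

```python
import re

# Matches an innermost template: "{{ ... }}" with no braces inside.
TEMPLATE = re.compile(r"\{\{([^{}]*)\}\}")


def render_link(args):
    # Assumed shape: {{l|en|word}} -> "word" (args[0] is the language code).
    return args[1] if len(args) > 1 else ""


def render_label(args):
    # Assumed shape: {{lb|en|informal|dated}} -> "(informal, dated)".
    labels = args[1:]
    return "(" + ", ".join(labels) + ")" if labels else ""


# Hypothetical table of the templates we choose to support.
HANDLERS = {"l": render_link, "lb": render_label}


def render_templates(text):
    """Render supported templates and remove all unsupported ones."""

    def replace(match):
        name, *args = [part.strip() for part in match.group(1).split("|")]
        handler = HANDLERS.get(name)
        # Unsupported templates are dropped rather than shown as raw text.
        return handler(args) if handler else ""

    # Repeat until stable so nested templates resolve from the inside out.
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE.sub(replace, text)
    # Collapse the double spaces left behind by removed templates.
    return re.sub(r" {2,}", " ", text).strip()


print(render_templates("{{lb|en|informal}} A {{l|en|dog}}. {{wikipedia|Dog}}"))
```

As the comment above notes, this only improves the output: every template without a handler is silently removed, so any information it carried is lost.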

@Huy-Ngo

Huy-Ngo commented Jan 1, 2021

I suppose Wikimedia should have a parser for this markup. Maybe you can import it?

@ilius

ilius commented Jan 1, 2021

I have this problem as well in my Python tool: ilius/pyglossary#48

I think using .zim files (from the Kiwix project) is the easiest way to use Wiktionary or Wikipedia offline.
There is libzim for reading them.

@shaked6540

There actually is an easy way to extract the formatted data using https://github.com/tatuylonen/wiktextract

@ilius

ilius commented Jan 29, 2021

That tool simply downloads the rendered HTML from Wiktionary website one entry at a time.
It does not render it.
It's also in Python. This is a Java project.

@shaked6540

You use it to extract the information, which you can then convert to the same format this dictionary is using, making it human-readable. I'm using it in my app. There's no readme yet, but you can compile it and see for yourself how much cleaner and more readable it is.
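
The extract-then-convert step described above could look roughly like the sketch below. It assumes the extracted data is JSON Lines, one entry per line, with `word`, `pos`, and `senses[].glosses` fields, which is how wiktextract's output is commonly described. The sample record and the function names are made up for illustration, and the sketch numbers every gloss it finds without handling nested sub-senses.

```python
import json


def format_entry(record):
    """Turn one extracted entry (a dict) into plain, readable text."""
    word = record.get("word", "?")
    pos = record.get("pos", "")
    lines = [f"{word} ({pos})" if pos else word]
    number = 0
    for sense in record.get("senses", []):
        for gloss in sense.get("glosses", []):
            number += 1
            lines.append(f"  {number}. {gloss}")
    return "\n".join(lines)


def format_jsonl(lines):
    """Format every non-empty JSON line; 'lines' is any iterable of strings."""
    return [format_entry(json.loads(line)) for line in lines if line.strip()]


# Made-up sample record, not real extracted data.
sample = (
    '{"word": "dog", "pos": "noun", "senses": ['
    '{"glosses": ["A domesticated mammal."]}, '
    '{"glosses": ["A mean person."]}]}'
)
print(format_jsonl([sample])[0])
```

The resulting text would still need to be packed into whatever format the dictionary app reads, which this sketch does not attempt.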
