Read Wikimedia Dumps #48
Hi
@ilius Yes, absolutely. Read support is good enough. 👍
any update on this?
This is my incomplete (but kind of working) code. I'm not sure I'll have enough time to finish it.
Well done. Thanks!
Does this read from the html dump only? It seems like the static html dumps are abandoned.
Yes, unfortunately, it looks that way.
For now you may want to use slob dictionaries for Wikipedia:
I tried to implement this. But Wikipedia is too big to be one glossary, even for less popular languages (the XML for Persian is 5.6 gigabytes!), so sadly I'm going to give up on this. Wiktionary, on the other hand, should be doable. There are already a lot of glossaries converted from Wiktionary here:
I added basic support for Wiktionary dumps.
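For anyone curious how such a dump can be read, here is a minimal sketch in Python, assuming the standard MediaWiki export XML layout. The namespace URI and element names are assumptions based on current exports, not PyGlossary's actual plugin code:

```python
# Sketch: stream (title, wikitext) pairs out of a MediaWiki
# pages-articles XML export without loading it all into memory.
# The export namespace URI below is an assumption; check your dump.
import io
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def iter_pages(fileobj):
    """Yield (title, wikitext) for each <page> in the export stream."""
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text") or ""
            yield title, text
            elem.clear()  # free memory for pages already seen

# Tiny inline sample shaped like a real dump, for demonstration:
sample = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>google</title>
    <revision><text>{{en-verb}} To search with Google.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.StringIO(sample)))
```

Real dumps are bz2-compressed, so in practice you would wrap the file in `bz2.open(path, "rb")` before passing it in.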
Hello ilius and all developers and users. I have a clarification related to wiki dumps: there are multiple files for each Wiktionary on the Wikimedia dumps site. The site for arwiktionary, as an example: I converted these files after extraction: I was only able to figure this out after trying a lot with the other files on the dumps site, with no benefit. The converted file is good for me because it contains the basic translation for any word, unlike the converted slob files, which contain many unusable details. One small issue: the produced file contains some unusable headwords like "Category: something…", but for now it's an amazing solution. Thanks to ilius for updating and adding the wikidump plugin yesterday. Amazing work 👍
I think you should download pages-articles.xml
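For reference, here is a hypothetical helper for locating that file. The URL pattern is an assumption based on the usual dumps.wikimedia.org layout, so double-check it against the site before relying on it:

```python
# Hypothetical helper: build the "latest" pages-articles URL for a
# wiki such as "arwiktionary". The path layout is an assumption
# based on how dumps.wikimedia.org is commonly organized.
def dump_url(wiki: str) -> str:
    return (
        f"https://dumps.wikimedia.org/{wiki}/latest/"
        f"{wiki}-latest-pages-articles.xml.bz2"
    )

print(dump_url("arwiktionary"))
```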
Is this working with small-sized Wikipedias, or just with Wiktionaries?
Let's focus on Wiktionary for now.
I have to add something: I converted the Wikimedia dump XML for enwiktionary today (to txt, ifo, and also slob) and used the converted file to translate some words (e.g. "google"). Unfortunately, I discovered that the definitions contain a lot of unintelligible markup, for example:

Verb ⚫︎ : ''Tom '''googles''' all of his prospective girlfriends.'' ⚫︎ * {{quote-av|en|title=w|Buffy the Vampire Slayer

This is completely frustrating; I just need to see a clear, correct translation of each word instead of this! On the other hand, the translations found in the slob files I converted before were completely organized (but unfortunately some important wiki dumps there are outdated, and some are not included).
Hi @ilius. I think there is something wrong either with the dumps themselves, with the file I downloaded from there (pages-articles; maybe one file is not enough to build a complete, clear dictionary!), or with the conversion tools. I really don't know.
Yes, templates are not rendered.
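To illustrate what "not rendered" means in practice, here is a rough sketch that merely strips `{{...}}` templates and quote markup out of wikitext instead of rendering them. Stripping loses the information the templates carry; actually rendering them requires the wiki's full template database:

```python
import re

# Rough sketch: delete {{...}} template calls and ''/''' quote
# markup from wikitext. This is NOT rendering -- the template
# content is simply discarded.
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")

def strip_templates(wikitext: str) -> str:
    prev = None
    while prev != wikitext:  # repeat so nested templates unwind
        prev = wikitext
        wikitext = TEMPLATE.sub("", wikitext)
    # drop bold ('''...''') and italic (''...'') markers
    wikitext = wikitext.replace("'''", "").replace("''", "")
    return wikitext.strip()

cleaned = strip_templates(
    "''Tom '''googles''' all of his prospective girlfriends.'' "
    "{{quote-av|en|title=w}}"
)
print(cleaned)
```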
Is there any solution for this on the horizon?
Hi @ilius
There is a project that uses Wiktionary as input data: eBook Reader Dictionaries. We are actively working to generate more output files; see BoboTiG/ebook-reader-dict#409, which uses pyglossary. For now, only the Kobo is supported.
We are now generating a
So we have two alternatives:
Both solutions are much better than anything I could include in PyGlossary anytime soon. So I'm going to remove the plugin and close this issue.
How can zim be used to read Wiktionary dumps?
You need to download zim files instead of Wiktionary dumps.
Hi @ilius,
Thanks for developing this wonderful package! I would like to check whether this package is able to, or has any plan to, support wiki dumps. I think a wiki dump is a kind of glossary, and it could fit into the scope of this project.
Regards,
Longqi