Indexing wikipedia zim - once and for all #914
Comments
If you do move forward on this, please consider adding the option to create a Wikipedia-based dictionary from dumps, i.e. something that would enable you to input, say, "The Terminator" and have GD output "Терминатор (фильм)" (if you've selected English->Russian). |
wikipedia_en_all_nopic_2017-08.zim, 64-bit Qt5-based GD - 8 minutes for indexing, 720 MB index file. Up to 7 GB memory consumption while indexing. What criterion do you suggest for reducing headword handling? |
…ord headwords while indexing (issie #914)
I have added a parameter to the config file. |
Thanks. The problem is still there because:
|
No. The limit should be turned on only if GD can't index the dictionary without one. 64-bit GD with 8+ GB RAM is not a rare configuration nowadays. |
Fair enough. Thanks for fixing! 👍 Future Visitors:
anything between 2M and 10M should work. This makes title indexing less extensive for files with more than 2M articles. For example, with reduced indexing, if you search for "Alice" you'll have "Alice in wonderland" turn up, but it won't appear if you search for "wonderland". With full indexing, "wonderland" would also match the "Alice in wonderland" article. |
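To make the difference concrete, here is a minimal Python sketch of the two indexing modes and how a prefix search behaves under each. It is not GoldenDict's code: the function names are made up for illustration, `casefold()` is only a stand-in for GD's real title folding, and the index is a plain dict rather than GD's on-disk structure.

```python
def index_keys(title, full_indexing):
    """Return the lookup keys generated for one article title."""
    folded = title.casefold()  # stand-in for GD's folding (assumption)
    if not full_indexing:
        # Reduced indexing: only the whole title is a key.
        return [folded]
    # Full indexing: one key per word boundary
    # ("alice in wonderland", "in wonderland", "wonderland").
    words = folded.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def build_index(titles, full_indexing):
    """Map every generated key to the titles it came from."""
    index = {}
    for title in titles:
        for key in index_keys(title, full_indexing):
            index.setdefault(key, []).append(title)
    return index


def prefix_search(index, query):
    """Return titles that have at least one key starting with the query."""
    q = query.casefold()
    hits = []
    for key, titles in index.items():
        if key.startswith(q):
            for t in titles:
                if t not in hits:
                    hits.append(t)
    return hits


titles = ["Alice in wonderland"]
reduced = build_index(titles, full_indexing=False)
full = build_index(titles, full_indexing=True)
```

With `reduced`, `prefix_search(reduced, "Alice")` finds the article but `prefix_search(reduced, "wonderland")` does not; with `full`, both queries find it.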
@darnn |
@jjzz That is, in fact, exactly what I want! I'm clueless when it comes to Linux, but I managed to run it as far as getting the txt and the dsl file, after which it gave an error on the dos2unix command. I copied over the files and changed the linebreak style to Windows, and turned on BOM, and Windows Goldendict now reads it! So, thank you so much! |
Previously discussed: #546 #680 #763
I'd like to see wikipedia zim files work out of the box with goldendict and I'm willing to do the work.
The solution (1) I have now is to change the addWord() logic to index only the full (folded) title and not generate additional entries on word boundaries, if the article count is large enough.
How about it, @Abs62? I can send a PR for (1) soon if you're interested.
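A rough Python sketch of that idea, to show how the article count could switch the behaviour and how much it shrinks the index. This is an illustration only, not the actual patch: `add_word`, `index_titles` and `headword_limit` are hypothetical names, and `casefold()` stands in for GD's real folding.

```python
def add_word(index, title, expand):
    """Add one title to the index, optionally expanding on word boundaries."""
    words = title.casefold().split()  # stand-in for GD's folding (assumption)
    if not words:
        return
    # expand=True: one entry per word boundary; expand=False: full title only.
    starts = range(len(words)) if expand else range(1)
    for i in starts:
        index.setdefault(" ".join(words[i:]), set()).add(title)


def index_titles(titles, headword_limit):
    """Expand on word boundaries only if the article count is within the limit."""
    expand = len(titles) <= headword_limit
    index = {}
    for title in titles:
        add_word(index, title, expand)
    return index


titles = ["The Terminator", "Alice in wonderland"]
small_dict = index_titles(titles, headword_limit=10)  # under the limit: expanded
large_dict = index_titles(titles, headword_limit=1)   # over the limit: titles only
```

Here `small_dict` ends up with five keys and `large_dict` with two, which is where the savings in index size and indexing memory would come from on a multi-million-article file.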