Can't create Bulbapedia zim #149
I would try
I ran it with the --localParsoid flag, but I still got this: http://termbin.com/v9s4 (the last 1000 lines). There are lots of warnings, and then it just fails with a timeout. I'm running it again, but are the warnings before that bad, or can I just ignore them?
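When a run dies on a timeout, rerunning it by hand is one option. Below is a minimal bash sketch that automates the reruns. The `retry` function and its arguments are hypothetical helpers, not an mwoffliner feature, and this only helps if the timeouts are intermittent: a page that always exceeds Parsoid's time limit will fail on every attempt.

```bash
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY CMD...: run CMD until it exits 0, at most
# MAX_ATTEMPTS times, sleeping DELAY seconds between attempts.
# Returns 0 on success, otherwise the exit status of the last attempt.
# (Hypothetical helper, not part of mwoffliner.)
retry() {
  local max_attempts=$1 delay=$2 attempt=1 status
  shift 2
  while true; do
    "$@" && return 0
    status=$?
    if (( attempt >= max_attempts )); then
      echo "retry: giving up after $attempt attempt(s) (exit $status)" >&2
      return "$status"
    fi
    echo "retry: attempt $attempt failed (exit $status); retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}
```

For example, `retry 3 60 mwoffliner --mwUrl=https://bulbapedia.bulbagarden.net/ --adminEmail=rashiq@kiwix.org --articleList articles.txt` would rerun the whole command up to three times, one minute apart.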
@rashiq I'm running into a similar problem on my side.
The problem comes from Parsoid, not from mwoffliner directly. @subbuss, any clue why we have a problem here? I find it quite strange to have this
Do you have the page title that produced the error?
Oh, never mind. I see it in the error message.
@subbuss could you figure out what's causing it? :)
Sorry, not yet. I started and got distracted, but my quick comments are:
A sample time profile from Parsoid when I ran it locally.
Since ~30 s of the profiled time is unaccounted for, that is likely I/O wait time. So my bet is that the long parse time is caused by network I/O and/or a slow MediaWiki server.
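The reasoning above treats profiled time that is not spent computing as waiting. A minimal bash sketch of the same check for any command: subtract user and system CPU time from wall-clock time, and treat a large remainder as a hint that the process was mostly waiting (for example on network I/O to the MediaWiki API) rather than parsing. It assumes bash's `time` keyword and `TIMEFORMAT` variable; the function name `wait_time` is hypothetical.

```bash
#!/usr/bin/env bash
# wait_time CMD...: run CMD and print a rough estimate (in seconds) of the time
# it spent off the CPU, i.e. wall-clock time minus user and system CPU time.
# Assumes bash and a locale that prints '.' as the decimal separator.
# Multi-threaded commands can use more CPU time than wall time, so the
# result may be negative for them. (Hypothetical helper.)
wait_time() {
  # %R = real (wall-clock), %U = user CPU, %S = system CPU, all in seconds.
  local TIMEFORMAT='%R %U %S'
  local timing
  # The command's own output is discarded; only the timing report, which
  # `time` writes to the shell's stderr, is captured via the outer 2>&1.
  timing=$( { time "$@" >/dev/null 2>&1 ; } 2>&1 )
  awk '{ printf "%.2f\n", $1 - $2 - $3 }' <<<"$timing"
}
```

For instance, prefixing the Parsoid parse.js invocation used in this thread with `wait_time` would show roughly how much of its run time was spent waiting rather than computing.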
I have another wiki with the same timeout issue. Has anyone been able to fix this?
@subbuss
I would run the command again with
@subbuss After multiplying all the Parsoid timeouts by 3, I managed to get further... but now mwoffliner seems to crash on a new article, "Battle_Frontier_(Generation_IV)/Pokémon_(Group_3,_001-251)". Parsoid seems simply unable to parse it properly (it is the only title in the articles list passed to this command):
./bin/mwoffliner.script.js --mwUrl=https://bulbapedia.bulbagarden.net/ --adminEmail=rashiq@kiwix.org --withZimFullTextIndex --localParsoid --verbose --speed=0.1 --articleList=articles
Locally on my laptop, with the latest version of Parsoid, it parses in 35 s:
parse.js --apiURL https://bulbapedia.bulbagarden.net/w/api.php --pageName "Battle_Frontier_(Generation_IV)/Pokémon_(Group_3,_001-251)" --trace time --dump wt2html:limits < /dev/null
@subbuss @rashiq I managed to create a ZIM file on a more powerful system with --speed=0.1: https://download.kiwix.org/zim/other/bulbagarden_en_all_2017-12.zim
Awesome! Thank you so much @kelson42!! :)
I'm trying to create a ZIM file of bulbapedia.bulbagarden.net, but it's not working.
For testing purposes, here is a command you can use to download only a single article:
mwoffliner --mwUrl=https://bulbapedia.bulbagarden.net/ --adminEmail=rashiq@kiwix.org --articleList articles.txt
articles.txt:
Output:
I'm using the mwoffliner Docker image.
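To the best of my knowledge, the file given to --articleList lists one article title per line (check `mwoffliner --help` for your version). A minimal bash sketch for building such a file for a quick test run; the helper name `write_article_list` is hypothetical, and the example title is simply the one that gave Parsoid trouble elsewhere in this thread.

```bash
#!/usr/bin/env bash
# write_article_list FILE TITLE...: write each TITLE on its own line in FILE,
# overwriting FILE if it already exists. (Hypothetical helper, not part of mwoffliner.)
write_article_list() {
  local file=$1
  shift
  printf '%s\n' "$@" > "$file"
}

# Example: a one-article list for a quick single-page test.
write_article_list articles.txt "Battle_Frontier_(Generation_IV)/Pokémon_(Group_3,_001-251)"
```

The resulting articles.txt can then be passed to mwoffliner with --articleList articles.txt, as in the single-article command in this issue.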