`.download()` scalability? #51
> What categories do you want to synchronize?

The aim is to provide convenient dumps for each category in Category:Lingua_Libre_pronunciation. That is the point of using WikiapiJS.

> Well, it seems I need to do some work...

Nice!

This scale-up question is handled in two related issues:
Hi there,
I'm using WikiapiJS to code a wikiapi-egg (script) which will download all Commons files from target categories. My three largest target categories currently hold about 50k audio files each, each file being about 1.5 KB. Do you know:
- `cmlimit=500` for regular users, `cmlimit=5000` with the `apihighlimits` user right.

**Scale up**
The goal is to provide the public with direct, convenient dumps of LinguaLibre's audio assets on a per-language basis. We want to create periodic (weekly?) dumps on our Lili server.
We want to keep a local dump synchronized with Wikimedia Commons. We are talking about 700,000 files so far. Given the test durations above, the initial synchronization would take about 21 days, which is OK.
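Listing 700,000 category members alone already means roughly 1,400 paged `categorymembers` calls at `cmlimit=500`. A minimal sketch of that paging loop with plain `fetch` (illustrative only, not WikiapiJS internals):

```javascript
// Build a categorymembers query URL for the Commons API.
// cmlimit=500 is the cap for regular users (5000 with apihighlimits).
function buildCategoryMembersUrl(category, cmcontinue) {
  const params = new URLSearchParams({
    action: 'query',
    list: 'categorymembers',
    cmtitle: 'Category:' + category,
    cmtype: 'file',
    cmlimit: '500',
    format: 'json'
  });
  if (cmcontinue) params.set('cmcontinue', cmcontinue);
  return 'https://commons.wikimedia.org/w/api.php?' + params.toString();
}

// Walk every page of results (network sketch, not executed here).
async function listAllMembers(category) {
  const titles = [];
  let cmcontinue;
  do {
    const res = await fetch(buildCategoryMembersUrl(category, cmcontinue));
    const data = await res.json();
    for (const m of data.query.categorymembers) titles.push(m.title);
    cmcontinue = data.continue && data.continue.cmcontinue;
  } while (cmcontinue);
  return titles;
}
```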
But subsequent updates a week later would still require about 15 days, even though only 1-2% of the files (7,000-15,000) would be new and need downloading.
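For scale, using the figures above: the initial sync implies roughly 2.6 seconds per file, so if an update only touched the ~15,000 genuinely new files it should take about 11 hours, not 15 days; the rest is presumably spent re-checking files that are already local:

```javascript
// Implied per-file cost of the initial synchronization (figures from above).
const fullSyncSeconds = 21 * 24 * 3600;          // 21 days = 1,814,400 s
const secondsPerFile = fullSyncSeconds / 700000; // ≈ 2.59 s per file

// If a weekly update only touched the ~15,000 new files:
const updateHours = 15000 * secondsPerFile / 3600; // ≈ 10.8 h
console.log(secondsPerFile.toFixed(2), updateHours.toFixed(1));
```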
Do you see any possible optimizations?
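One possible direction (a suggestion on my side, not an existing WikiapiJS feature): the `categorymembers` API supports `cmsort=timestamp` with `cmdir=newer` and `cmstart`, so a weekly run could list only files added to the category since the last sync, then download only what is missing locally:

```javascript
// Query only category members added since the last synchronization.
// cmsort=timestamp + cmstart restricts results to new additions,
// so a weekly update touches ~1-2% of the category, not all of it.
function buildIncrementalUrl(category, lastSyncIso) {
  const params = new URLSearchParams({
    action: 'query',
    list: 'categorymembers',
    cmtitle: 'Category:' + category,
    cmtype: 'file',
    cmsort: 'timestamp',
    cmdir: 'newer',
    cmstart: lastSyncIso, // e.g. '2021-01-01T00:00:00Z'
    cmlimit: '500',
    format: 'json'
  });
  return 'https://commons.wikimedia.org/w/api.php?' + params.toString();
}

// Locally, skip anything already on disk before downloading.
function filterMissing(remoteTitles, localTitles) {
  const have = new Set(localTitles);
  return remoteTitles.filter(t => !have.has(t));
}
```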
WikiapiJS's download worked on tiny categories (n=12 files). See the code in #48.
I'm currently reluctant to test further, for fear of being banned.
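On the ban worry: Wikimedia's API etiquette asks clients to make requests sequentially rather than in parallel, and to honor the `maxlag` parameter. A simple client-side throttle keeps the request rate bounded; a generic sketch (the one-request-per-second pace is my guess, not an official limit):

```javascript
// Minimal fixed-interval throttle: each call resolves no sooner than
// `intervalMs` after the previous call was allowed through.
function makeThrottle(intervalMs) {
  let next = 0;
  return function throttle() {
    const now = Date.now();
    const wait = Math.max(0, next - now);
    next = now + wait + intervalMs;
    return new Promise(resolve => setTimeout(resolve, wait));
  };
}

// Usage sketch: one download per second, strictly sequential.
// const throttle = makeThrottle(1000);
// for (const title of titles) {
//   await throttle();
//   await downloadFile(title); // hypothetical helper
// }
```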
**`.download()` benchmark (1)**

OK, I decided to test anyway on a category with n=369.