As discussed in #22, Wikipedia has a limit of 2 concurrent connections and seems to rate limit each to about 4 MB/s. There are at least two mirrors of the Enterprise dumps.
For the fastest speeds, ideally we could share downloads between Wikipedia and the mirrors, or even download different parts of the same file concurrently like aria2c.
Unfortunately, none of the parallel downloaders I've seen allow setting connection limits per host (e.g. 2 for dumps.wikimedia.org, 4 for the rest).
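If we did roll our own, per-host limits aren't hard to bolt onto a thread-pool downloader: one semaphore per host caps dumps.wikimedia.org at 2 connections while letting mirrors go higher. A minimal sketch below, where the host names, limits, and helper names are assumptions for illustration, not from any existing tool:

```python
# Sketch: per-host connection caps on top of a plain thread-pool downloader.
# Limits are illustrative: 2 for dumps.wikimedia.org, 4 for anything else.
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

HOST_LIMITS = {"dumps.wikimedia.org": 2}
DEFAULT_LIMIT = 4
_semaphores = {}
_lock = threading.Lock()

def _semaphore_for(host):
    # Lazily create one semaphore per host so each host gets its own cap.
    with _lock:
        if host not in _semaphores:
            _semaphores[host] = threading.Semaphore(HOST_LIMITS.get(host, DEFAULT_LIMIT))
        return _semaphores[host]

def download(url, dest):
    host = urlparse(url).netloc
    with _semaphore_for(host):  # blocks while the host is at its limit
        urllib.request.urlretrieve(url, dest)

def download_all(jobs):
    # Total workers can exceed any single host's limit; the semaphores
    # keep each host within its own cap.
    with ThreadPoolExecutor(max_workers=6) as pool:
        for url, dest in jobs:  # jobs is a list of (url, dest) pairs
            pool.submit(download, url, dest)
```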
So besides writing our own downloader, to respect the Wikimedia limits we could:
Keep the 2-thread limit and divide the files across the available hosts
Increase the 2-thread limit and only use dumps.wikimedia.org for two files (see the sketch after this list)
Increase the 2-thread limit and don't use dumps.wikimedia.org for any files
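For the second option, the file assignment could look something like the sketch below: hand the first two files to dumps.wikimedia.org and round-robin the rest across the mirrors. The mirror URLs and the fixed two-file split are placeholders, assuming whatever mirror list we settle on:

```python
# Sketch: assign only two files to dumps.wikimedia.org, spread the rest
# across mirrors. Mirror hostnames are hypothetical.
from itertools import cycle

WIKIMEDIA = "https://dumps.wikimedia.org"
MIRRORS = ["https://mirror-a.example", "https://mirror-b.example"]

def assign_hosts(paths):
    """Return (host, path) pairs: first two files from Wikimedia, rest from mirrors."""
    mirror_cycle = cycle(MIRRORS)
    assignments = []
    for i, path in enumerate(paths):
        host = WIKIMEDIA if i < 2 else next(mirror_cycle)
        assignments.append((host, path))
    return assignments
```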
The simplest option is to use only a single host.
Beyond that, I think the second option would provide the best throughput increase and still be relatively straightforward.