
download: cURL "collected" files in series #485

Merged: missinglink merged 1 commit into master from download_collected_series on Jul 8, 2021

Conversation

Member

@missinglink missinglink commented Jul 8, 2021

As per the discussion in #484, this PR changes the "collected" OA downloads (i.e. openaddr-collected-global.zip and openaddr-collected-global-sa.zip) to run in series rather than in parallel.

The reason for this change is that the OA CDN has a "Maximum Connections Per IP" limit of 1.

Prior to this PR, cURL would intermittently receive an HTML file containing the text 503 Service Unavailable. When unzip attempted to open this file, it failed with the cryptic message End-of-central-directory signature not found.
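To illustrate why unzip fails on that response, here is a minimal sketch of a sanity check that tells a zip archive apart from an HTML error page by its leading bytes. The helper name `looksLikeZip` and the sample buffers are hypothetical and are not part of this PR; the only fact relied on is that zip files begin with the ASCII bytes "PK".

```javascript
// Hypothetical helper, shown for illustration only (not part of the importer).
// Zip archives begin with the signature bytes "PK" (0x50 0x4B);
// an HTML error page such as a 503 response does not.
function looksLikeZip(buffer) {
  return buffer.length >= 4 &&
    buffer[0] === 0x50 &&
    buffer[1] === 0x4b;
}

// Sample data standing in for what cURL might have written to disk.
const htmlError = Buffer.from('<html><body>503 Service Unavailable</body></html>');
const zipHeader = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);

console.log(looksLikeZip(htmlError)); // false
console.log(looksLikeZip(zipHeader)); // true
```

A check along these lines could surface a clearer error than the one unzip reports, although this PR fixes the cause rather than the symptom.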

The positive effect of this PR is that the downloads no longer fail intermittently. The negative effect is that downloads will be slower, since the second file isn't started until the first has completed.
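The series behaviour described above can be sketched as follows. This is a hand-rolled stand-in written for illustration, not the code from this PR and not the `async` library itself; `runInSeries` and `fakeDownload` are hypothetical names, and the downloads are simulated rather than fetched with cURL.

```javascript
// Minimal stand-in for series execution: each task starts only after the
// previous one has called back, so at most one "connection" is open at a time.
function runInSeries(tasks, done) {
  const results = [];
  function next(index) {
    if (index === tasks.length) { return done(null, results); }
    tasks[index]((err, result) => {
      if (err) { return done(err, results); } // stop at the first failure
      results.push(result);
      next(index + 1);
    });
  }
  next(0);
}

// Hypothetical download task: records start/finish events instead of running cURL.
const events = [];
function fakeDownload(name) {
  return (callback) => {
    events.push(`start ${name}`);
    setImmediate(() => {
      events.push(`finish ${name}`);
      callback(null, name);
    });
  };
}

runInSeries([
  fakeDownload('openaddr-collected-global.zip'),
  fakeDownload('openaddr-collected-global-sa.zip')
], (err, results) => {
  // The first file finishes before the second one starts.
  console.log(events);
});
```

The trade-off is the one noted above: total time is roughly the sum of the two downloads rather than the longer of the two.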

I noticed that the "filtered download" code (i.e. where the user selects only a subset of the OA database) is already using async.series().

Hopefully in the future we can rework this a bit and return to parallel downloads. The financial costs of hosting these downloads at scale can be significant, and abuse is widespread, so I understand the need for the IP limits.

resolves #484

@missinglink missinglink merged commit 1b60e97 into master Jul 8, 2021
@missinglink missinglink deleted the download_collected_series branch July 8, 2021 08:05
Successfully merging this pull request may close these issues.

Importer data download fails - OA now requires authentication
1 participant