Connection refused / aborted / other errors with simplest CDX API GET request? #252

Open
WunderWungiel opened this issue Feb 2, 2023 · 0 comments

@WunderWungiel

Hello everyone.
I am writing a small Python script that fetches files matching a specific URL pattern from the Wayback Machine. To search for them, I use the CDX API, requests, and regular expressions.
A small sample of my code:
url = "https://web.archive.org/cdx/search/cdx?url=d.ovi.com/p/g/store/&matchType=prefix&collapse=urlkey&fl=original"
response = requests.get(url, headers=headers, allow_redirects=True)
As you can see, it's the simplest possible GET request to the CDX API.

However, it seems like I am being treated as some kind of bot, spammer, or attacker: about 9 out of 10 requests end with errors like these:
Connection aborted, connection refused, connect timeout (max retries exceeded with url: /cdx/...), and similar.
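
For anyone trying to reproduce this, the following is a minimal, self-contained sketch of the same query with a simple retry loop and exponential backoff. It is not the script from this issue: it uses the standard library's urllib instead of requests so that it runs without extra dependencies, and the helper names (`build_cdx_url`, `fetch_with_retries`), the attempt count, the delays, and the timeout are all assumptions chosen for illustration. Whether retrying actually gets around the errors described above is not something this sketch can confirm.

```python
import time
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(target, endpoint=CDX_ENDPOINT):
    """Build the CDX query URL with the same parameters as the request above."""
    query = urllib.parse.urlencode({
        "url": target,
        "matchType": "prefix",
        "collapse": "urlkey",
        "fl": "original",
    })
    return f"{endpoint}?{query}"


def fetch_with_retries(url, attempts=5, base_delay=2.0, timeout=30.0):
    """Fetch url, retrying on connection-level errors (hypothetical helper).

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts. These values are illustrative assumptions, not documented limits.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read().decode("utf-8", errors="replace")
        except OSError as exc:
            # URLError, HTTPError, ConnectionError, and socket timeouts
            # are all subclasses of OSError.
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

Calling `fetch_with_retries(build_cdx_url("d.ovi.com/p/g/store/"))` would issue the query and return the response body as text, or raise `RuntimeError` once every attempt has failed.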

What's going on? It's just a simple GET request...

I would be thankful for any help, clue, or tip.

Have a good day, and thanks in advance.
