question: able to download a website historically while only saving the 1st successful page? #70

Open
devinschumacher opened this issue Nov 26, 2023 · 2 comments

Comments

@devinschumacher

any chance to get a feature where we can download a site across a range of dates? for example, 2015 to today, to try and get every copy of a URL but only save the most successful download?

the use case is that i'm trying to get a website, but some pages are "blocked by cloudflare" in certain versions on archive.org

thanks!

@devinschumacher changed the title from "able to download a website historically while only saving the 1st successful page?" to "question: able to download a website historically while only saving the 1st successful page?" on Nov 26, 2023
@jsvine
Owner

jsvine commented Nov 27, 2023

I don't think waybackpack currently supports this, but I would be open to a PR that adds it. One tricky bit might be defining a criterion for "successful", particularly if the HTTP status code does not make it clear.

@devinschumacher
Author

devinschumacher commented Nov 28, 2023

> I don't think waybackpack currently supports this, but would be open to a PR that adds it. One tricky bit might be defining a criteria for "successful", particularly if the HTTP status code does not make it clear.

yeah, i was thinking the same thing about the criteria.

it would probably be a list of words/patterns that gets added to over time until it's reasonably comprehensive? there might be some signals in the HTML tags as well. i bet the meta title and description on pages like that would always give it away.

what i normally see are things like "Cloudflare", "Login", "Too Many Requests", etc.
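
a rough sketch of what that check could look like, just to make the idea concrete. none of this is waybackpack's API: `BLOCK_PATTERNS`, `looks_successful`, and `first_successful` are hypothetical names, the pattern list is only the three phrases above, and the snapshots are assumed to already be fetched as `(timestamp, status_code, html)` tuples in date order. it only looks at the `<title>` and meta description so that a normal page with a "Login" link in the body isn't rejected.

```python
import re
from html.parser import HTMLParser
from typing import Iterable, Optional, Tuple

# Hypothetical starter list, taken from the phrases mentioned in this thread.
# It would need to grow over time to be reasonably comprehensive.
BLOCK_PATTERNS = [r"cloudflare", r"too many requests", r"\blog ?in\b"]
BLOCK_RE = re.compile("|".join(BLOCK_PATTERNS), re.IGNORECASE)


class _HeadSummary(HTMLParser):
    """Collects the <title> text and the meta description of a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "description":
                self.parts.append(attr_map.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def looks_successful(status_code: int, html: str) -> bool:
    """Heuristic: HTTP 200 and no block phrase in the title or description."""
    if status_code != 200:
        return False
    parser = _HeadSummary()
    parser.feed(html)
    summary = " ".join(parser.parts)
    return BLOCK_RE.search(summary) is None


def first_successful(
    snapshots: Iterable[Tuple[str, int, str]]
) -> Optional[Tuple[str, str]]:
    """Return (timestamp, html) of the first snapshot that passes the check."""
    for timestamp, status_code, html in snapshots:
        if looks_successful(status_code, html):
            return timestamp, html
    return None
```

the pattern list is the part that would need tuning; a status code other than 200 is treated as a failure outright, which may be too strict for some sites.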
