CLI option --filename-conflict-action=skip should not attempt to download page if file already exists #800

Open
andrewdbate opened this issue Oct 18, 2021 · 1 comment

@andrewdbate andrewdbate commented Oct 18, 2021

The current behavior of --filename-conflict-action=skip is as follows:

  1. download the page as usual
  2. if the file to be created already exists, do not overwrite it.

It would be more efficient to check first whether the file already exists and to download the page only if it does not.
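
A minimal sketch of the proposed ordering follows. It assumes the output filename can be computed before the download, and it uses a hypothetical `download` callable that fetches and serializes a page. It is not SingleFile's code; it only shows the existence check moving ahead of the fetch.

```python
from pathlib import Path
from typing import Callable


def save_page(url: str, output_path: Path, download: Callable[[str], str]) -> bool:
    """Save `url` to `output_path` unless that file already exists.

    Returns True if the page was downloaded and written, False if skipped.
    `download` is a hypothetical callable that returns the serialized HTML.
    """
    # Proposed behavior: check for an existing file *before* fetching,
    # so a skipped URL costs one filesystem lookup instead of a page load.
    if output_path.exists():
        return False
    html = download(url)
    output_path.write_text(html, encoding="utf-8")
    return True
```

With this ordering, re-running over a list in which most pages are already saved only pays for the pages that are still missing.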

This would support the following use case:

Suppose we use --urls-file to download a list of URLs. Some of those pages may fail to download (e.g., due to a network failure). In my experience, with a large list of URLs it is likely that at least one page will fail. When a page fails to download, no file is created (at least, that appears to be the behavior).

I was hoping to combine the options --filename-template="{url-pathname-flat}.html" and --filename-conflict-action=skip with --urls-file to resume after an error, so that SingleFile would only attempt to download the URLs that did not already have files.

However, because the current implementation downloads every page again before deciding to skip it, this is too slow to be practical.
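
The resume workflow described above could also be approximated outside SingleFile by filtering the URL list before each run. This is a minimal sketch under two assumptions: that `{url-pathname-flat}` expands to the URL path with its outer slashes stripped and the remaining `/` replaced by `_` (an unverified approximation; SingleFile's real expansion may differ), and that all files are written to one output directory. `flat_pathname` and `pending_urls` are hypothetical helpers.

```python
from pathlib import Path
from urllib.parse import urlparse


def flat_pathname(url: str) -> str:
    """Hypothetical stand-in for SingleFile's {url-pathname-flat} variable.

    Assumption: the URL path with leading/trailing slashes stripped and the
    remaining "/" separators replaced by "_". The real expansion may differ.
    """
    path = urlparse(url).path.strip("/")
    return path.replace("/", "_")


def pending_urls(urls_file: Path, output_dir: Path) -> list[str]:
    """Return the URLs from `urls_file` whose output file does not exist yet."""
    pending = []
    for line in urls_file.read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        target = output_dir / f"{flat_pathname(url)}.html"
        if not target.exists():
            pending.append(url)
    return pending
```

The returned URLs could be written to a new file and passed to --urls-file for the next run, so that only the missing pages are fetched.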

@gildas-lormeau (Owner) commented Oct 19, 2021

An optimization could indeed be made when the template contains no variables that depend on the content of the page. Note that, by default, the template contains a variable for the page title.
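
A rough sketch of the check described here: scan the filename template for variables, and resolve the filename (and look for an existing file) before downloading only when none of them depend on the page. The `{name}` syntax matches the templates quoted in this issue. The set of content-dependent variables, including the name `{page-title}` for the page title, is an assumption for illustration.

```python
import re

# Assumption: template variables whose values are only known after the page
# has loaded. Only the page-title variable is included here, and its name
# "page-title" is assumed; the real list may be longer.
CONTENT_DEPENDENT = {"page-title"}

# Matches "{variable-name}" placeholders and captures the name.
VARIABLE = re.compile(r"\{([a-z0-9-]+)\}")


def can_resolve_before_download(template: str) -> bool:
    """True if no variable in `template` depends on the page content."""
    return not any(name in CONTENT_DEPENDENT for name in VARIABLE.findall(template))
```

Under these assumptions, a template such as `{url-pathname-flat}.html` could be resolved from the URL alone, while the default title-based template could not.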
