Website Downloader

Simple Bash script to download websites (including all assets needed to display them properly) locally. The script uses Wget to retrieve files.

Usage

On the command line:

  1. Clone the repository, e.g. git clone git@gitlab.com:jonasjacek/website-downloader.git
  2. Go to the repository, e.g. cd website-downloader/
  3. Add the list of URLs to retrieve to website-downloader_urls.txt.
  4. Adjust script options as needed. See Options below.
  5. Run the website downloader, e.g. . website-downloader.sh (an example session is sketched below).
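
For reference, a complete session might look like the following. The clone URL is the GitLab remote from step 1; the URL-file contents are placeholders, and the one-URL-per-line format is an assumption that matches what wget's --input-file expects:

    # Clone and enter the repository.
    git clone git@gitlab.com:jonasjacek/website-downloader.git
    cd website-downloader/

    # List the URLs to retrieve, presumably one per line (placeholder URLs).
    cat > website-downloader_urls.txt <<'EOF'
    https://example.com/
    https://example.org/docs/
    EOF

    # Source the script so it runs in the current shell.
    . website-downloader.sh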

Options

  • --restrict-file-names=modes
    Change which characters found in remote URLs must be escaped during generation of local filenames. Values: [unix|windows]
  • -r, --recursive
    Turn on recursive retrieving. See Recursive Download, for more details. The default maximum depth is 5.
  • -x, --force-directories
    The opposite of -nd: create a hierarchy of directories, even if one would not have been created otherwise.
  • -k, --convert-links
    After the download is complete, convert the links in the document to make them suitable for local viewing.
  • -p, --page-requisites
    This option causes Wget to download all the files that are necessary to properly display a given HTML page.
  • -E, --adjust-extension
    If a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to be appended to the local filename.
  • --no-cache
    Disable server-side cache.
  • -w seconds, --wait=seconds
    Wait the specified number of seconds between the retrievals.
  • -e robots=off
    Ignore robots.txt files and do not download them.
  • --show-progress
    Force wget to display the progress bar in any verbosity.
  • --progress=type
    Select the type of the progress indicator you wish to use. Legal indicators are “dot” and “bar”.
  • -i file, --input-file=file
    Read URLs from a local or external file.
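
Taken together, the options above suggest a wget invocation along the following lines. This is a sketch of what the script plausibly runs, not its exact command; the wait time of 2 seconds and the unix file-name mode are example values:

    # Hypothetical invocation combining the options listed above.
    wget --restrict-file-names=unix \
         --recursive \
         --force-directories \
         --convert-links \
         --page-requisites \
         --adjust-extension \
         --no-cache \
         --wait=2 \
         -e robots=off \
         --show-progress \
         --progress=bar \
         --input-file=website-downloader_urls.txt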

Further Options

  • -np, --no-parent
    Do not ever ascend to the parent directory when retrieving recursively.
  • -H, --span-hosts
    Enable spanning across hosts when doing recursive retrieving (see Spanning Hosts).
  • -D domain-list, --domains=domain-list
    Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.
  • -a logfile, --append-output=logfile
    Append to logfile.
  • -q, --quiet
    Turn off Wget’s output.
  • -t number, --tries=number
    Set number of tries to number. Specify 0 or ‘inf’ for infinite retrying.
  • -nd, --no-directories
    Do not create a hierarchy of directories when retrieving recursively.
  • --no-check-certificate
    Don’t check the server certificate against the available certificate authorities.
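
When some of these further options are enabled, the command might be extended as sketched below; the domain list and log-file name are illustrative assumptions:

    # Hypothetical extension of the command above: never ascend past the
    # start directory, allow spanning to a named asset host, retry failed
    # downloads three times, and append all output to a log file.
    wget --recursive \
         --no-parent \
         --span-hosts \
         --domains=example.com,cdn.example.com \
         --tries=3 \
         --append-output=website-downloader.log \
         --page-requisites \
         --convert-links \
         --input-file=website-downloader_urls.txt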

Mirrors

You can find this repository at:

  • https://gitlab.com/jonasjacek/website-downloader
  • https://github.com/jonasjacek/website-downloader

Warranty and Liability

Website Downloader is a small, private project. The author makes no claims or representations as to warranties regarding the accuracy or completeness of the information provided. You may use the information in this repository AT YOUR OWN RISK.

License

Website Downloader by Jonas Jacek is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available upon request.

Contribute

Found a mistake? Open an issue or send a merge request. Want to help in another way? Contact me.
