A simple Bash script to download websites locally, including all assets needed to display them properly. The script uses Wget to retrieve the files.
On the command line:
- Clone the repository, e.g. `git clone git@gitlab.com:jonasjacek/website-downloader.git`
- Change into the repository, e.g. `cd website-downloader/`
- Add the list of URLs to retrieve to `website-downloader_urls.txt` (a sketch of the file follows this list).
- Adjust the script options as needed. See Options.
- Run the website downloader, e.g. `. website-downloader.sh`
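The expected format of `website-downloader_urls.txt` is an assumption here: with Wget's `--input-file`, each line is read as one URL, so a plain list like the following should work:

```
https://example.com/
https://example.org/docs/
```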
Options
- `--restrict-file-names=modes`: Change which characters found in remote URLs must be escaped during generation of local filenames. Values: `unix`, `windows`.
- `-r`, `--recursive`: Turn on recursive retrieving. See Recursive Download for more details. The default maximum depth is 5.
- `-x`, `--force-directories`: The opposite of `-nd`: create a hierarchy of directories, even if one would not have been created otherwise.
- `-k`, `--convert-links`: After the download is complete, convert the links in the document to make them suitable for local viewing.
- `-p`, `--page-requisites`: Download all the files that are necessary to properly display a given HTML page.
- `-E`, `--adjust-extension`: If a file of type `application/xhtml+xml` or `text/html` is downloaded and the URL does not end with the regexp `\.[Hh][Tt][Mm][Ll]?`, append the suffix `.html` to the local filename.
- `--no-cache`: Disable server-side cache.
- `-w seconds`, `--wait=seconds`: Wait the specified number of seconds between retrievals.
- `-e robots=off`: Ignore robots.txt exclusions; do not download robots.txt files.
- `--show-progress`: Force Wget to display the progress bar in any verbosity.
- `--progress=type`: Select the type of progress indicator to use. Legal indicators are "dot" and "bar".
- `-i file`, `--input-file=file`: Read URLs from a local or external file.
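Taken together, the options above suggest a core retrieval command along the following lines. This is a sketch assembled from the documented options, not necessarily the script's exact invocation; the 2-second wait and the `unix` file-name mode are assumptions:

```sh
#!/usr/bin/env bash
# Sketch: one Wget call combining the options documented above.
# The wait time and restrict-file-names mode are assumed values.
wget \
  --restrict-file-names=unix \
  --recursive \
  --force-directories \
  --convert-links \
  --page-requisites \
  --adjust-extension \
  --no-cache \
  --wait=2 \
  -e robots=off \
  --show-progress \
  --progress=bar \
  --input-file=website-downloader_urls.txt
```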
Further Options
- `-np`, `--no-parent`: Do not ever ascend to the parent directory when retrieving recursively.
- `-H`, `--span-hosts`: Enable spanning across hosts when doing recursive retrieving (see Spanning Hosts).
- `-D domain-list`, `--domains=domain-list`: Set domains to be followed. `domain-list` is a comma-separated list of domains. Note that it does not turn on `-H`.
- `-a logfile`, `--append-output=logfile`: Append to `logfile`.
- `-q`, `--quiet`: Turn off Wget's output.
- `-t number`, `--tries=number`: Set the number of tries to `number`. Specify 0 or `inf` for infinite retrying.
- `-nd`, `--no-directories`: Do not create a hierarchy of directories when retrieving recursively.
- `--no-check-certificate`: Don't check the server certificate against the available certificate authorities.
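For illustration, the further options can be layered onto a run when a page pulls assets from another host. The domain names and the log file name below are placeholders, not part of the script:

```sh
# Hypothetical run: follow page requisites onto a known CDN host,
# retry each file up to 3 times, and append Wget's output to a log.
wget --recursive --page-requisites --span-hosts \
  --domains=example.com,cdn.example.com \
  --tries=3 --append-output=wget.log \
  https://example.com/
```

Note that `--span-hosts` is passed explicitly: as documented above, `--domains` restricts which hosts may be followed but does not enable host spanning by itself.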
You can find this repository at:
- GitLab: https://gitlab.com/jonasjacek/website-downloader
- GitHub: https://github.com/jonasjacek/website-downloader
Website Downloader is a small, private project. The author makes no claims or representations as to warranties regarding the accuracy or completeness of the information provided. You may use the information in this repository AT YOUR OWN RISK.
Website Downloader by Jonas Jacek is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available upon request.
Found a mistake? Open an issue or send a merge request. Want to help in another way? Contact me.