wget Cheatsheet

Andrew Pennebaker


GNU wget is a command-line Web page downloader.


GNU wget manual

man wget


$ apt-get install wget

$ brew install wget

C:\> choco install wget




Reference Dotfile

Supported protocols

  • HTTP
  • HTTPS
  • FTP

Basic usage

Download a Web page

$ wget <URL>

$ less index.html
<!doctype html>...

Handle special URL characters

$ wget "<URL>"

Save as

$ wget -O <filename> <URL>

Add a directory prefix

-P <name>

Hide progress info

-q

Output to STDOUT

$ wget -qO- <URL> | less
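Because -qO- sends the document to STDOUT with no progress output, the result can feed any text filter. A minimal sketch, assuming the page keeps its lowercase <title> element on a single line; page_title is a hypothetical helper name, not part of wget:

```shell
# page_title: print the contents of the first <title> element of a page.
# Assumes <title>...</title> sits on one line and is written in lowercase.
page_title() {
  wget -qO- "$1" |
    sed -n 's:.*<title>\(.*\)</title>.*:\1:p' |
    head -n 1
}

# Usage: page_title "<URL>"
```

For pages with multi-line or uppercase title tags, a real HTML parser would be a safer choice than sed.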

Continue an interrupted download

$ wget [flags] <URL>

$ wget -c [flags] <URL>
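On a flaky connection, the -c rerun can be wrapped in a loop. The following is a sketch only: resume_download and the RETRY_DELAY variable are hypothetical names, and the default of five attempts is an arbitrary choice.

```shell
# resume_download: rerun `wget -c` until it succeeds or attempts run out.
# $1 is the URL, $2 the maximum number of attempts (default 5).
# RETRY_DELAY is the pause in seconds between attempts (default 5).
resume_download() {
  url="$1"
  max_attempts="${2:-5}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -c continues from the end of the partially downloaded local file.
    if wget -c "$url"; then
      return 0
    fi
    echo "attempt $attempt failed" >&2
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"
  done
  return 1
}

# Usage: resume_download "<URL>" 10
```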

Specify a user agent string

-U Firefox

Crawling Websites

wget offers -m to mirror a website, downloading a local copy of each remote file.

$ wget -m <URL>

wget users often crawl a website with -mpNHk, which bundles mirroring (-m), page requisites (-p), timestamping (-N), host spanning (-H), and link conversion (-k).

$ wget -mpNHk <URL>
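In scripts, the same bundle may be easier to read with long options. A minimal sketch, where mirror_site is a hypothetical wrapper name and each long option is the spelled-out form of one letter in -mpNHk:

```shell
# mirror_site: long-option equivalent of `wget -mpNHk <URL>`.
mirror_site() {
  # --mirror           = -m  recursive download suited to mirroring
  # --page-requisites  = -p  images, stylesheets and other page assets
  # --timestamping     = -N  skip files that are not newer than local copies
  # --span-hosts       = -H  allow the crawl to leave the starting host
  # --convert-links    = -k  rewrite links for local viewing
  wget --mirror \
       --page-requisites \
       --timestamping \
       --span-hosts \
       --convert-links \
       "$1"
}

# Usage: mirror_site "<URL>"
```

Since --span-hosts lets the crawl follow links to any host, pairing it with a -D domain list is a reasonable precaution.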

Traverse related domains

-H

Specify allowed domains

-D <domain>,<domain>

Wait between requests

-w <seconds>

Convert links

wget can convert links in downloaded pages to point at the local copies, so the mirror behaves as a more self-contained version of the original website.

-k

Include related media files

-p

Preserve timestamps

-N

Save all files in the same directory

-nd

Limit traversal to child and sister directories

-np

Whitelist directory patterns

-I img/,meme/

Blacklist directory patterns

-X thumbs/

Whitelist file patterns

-A .html,.htm,.jpg,.jpeg,.gif,.png

Blacklist file patterns

-R "*avatar*,*\?*,*_[0-9][0-9][0-9].*"
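The file filters combine naturally with mirroring and a wait interval. A sketch under stated assumptions: crawl_images is a hypothetical wrapper name, the two-second wait is arbitrary, and the patterns are illustrative. -np keeps the crawl from ascending to the parent directory.

```shell
# crawl_images: mirror only image files, pausing between requests.
crawl_images() {
  # -m   mirror recursively
  # -np  never ascend to the parent directory
  # -w 2 wait two seconds between requests
  # -A   keep only files with these suffixes
  # -R   drop files whose names match this pattern
  wget -m -np -w 2 \
    -A '.jpg,.jpeg,.gif,.png' \
    -R '*avatar*' \
    "$1"
}

# Usage: crawl_images "<URL>"
```

During a recursive crawl, wget still fetches HTML pages to discover links, then deletes those that do not match the accept list.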

Example scripts

politerips offers example shell scripts for crawling websites with wget.

Unofficial protocols


Instead of hdfs://<host>:8020/<path>, use:

$ wget http://<host>:50075/streamFile/<path>
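The rewrite above can be scripted. A minimal sketch, assuming the same host serves both ports as in the pattern shown; hdfs_to_http is a hypothetical helper name:

```shell
# hdfs_to_http: rewrite hdfs://<host>[:port]/<path> into the
# http://<host>:50075/streamFile/<path> form.
hdfs_to_http() {
  rest="${1#hdfs://}"       # drop the scheme
  hostport="${rest%%/*}"    # host[:port] before the first slash
  path="${rest#*/}"         # everything after the first slash
  host="${hostport%%:*}"    # drop the port, if any
  printf 'http://%s:50075/streamFile/%s\n' "$host" "$path"
}

# Usage: wget "$(hdfs_to_http "hdfs://<host>:8020/<path>")"
```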


Instead of couchdb://<authority>, use:

$ wget http://<host>:5984/<authority>


Alternatives

  • curl outputs to STDOUT by default, making it a popular choice for debugging REST services.
  • lftp specializes in FTP transfers.
  • scp specializes in SSH file transfers.
  • WWW::Mechanize is a Perl library for fine-tuned Web crawling.