Commit

spider: outputs all of the unique URLs on a domain
robmiller committed Feb 13, 2018
1 parent c773358 commit fb3dc29
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions bin/spider
@@ -0,0 +1,14 @@
#!/bin/bash
#
# spider
#
# Author: Rob Miller <rob@bigfish.co.uk>
#
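# Outputs the unique URLs of the HTML pages on a given domain.
#
# Usage: spider <start-url>
wget -r -nd --delete-after -w 0.1 "$1" 2>&1 |  # crawl recursively; wget logs to stderr
grep -B3 text/html |                            # keep responses served as text/html
grep -B2 '200 OK' | grep -E 'https?://' |       # keep successful ones, then their URL lines
cut -d' ' -f3- |                                # drop the timestamp fields
sort -u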
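
The filtering half of the pipeline can be tried without crawling anything. The sketch below is not part of the commit: `sample_log` and `extract_html_urls` are hypothetical names, and the log text is a hand-written approximation of what wget prints when it reuses a connection (timestamp line, connection line, status line, `Length` line with the content type). It assumes a `grep` that supports `-B`, as the script itself does.

```bash
#!/bin/bash
# Hypothetical sample of wget's log output; the real log varies by wget
# version and by whether a new connection had to be opened.
sample_log() {
cat <<'EOF'
--2018-02-13 10:00:00--  http://example.com/
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 1270 (1.2K) [text/html]

--2018-02-13 10:00:01--  http://example.com/logo.png
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 5120 (5.0K) [image/png]

--2018-02-13 10:00:02--  http://example.com/about
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 2048 (2.0K) [text/html]

--2018-02-13 10:00:03--  http://example.com/
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 1270 (1.2K) [text/html]
EOF
}

# Same filter stages as bin/spider, reading a wget log from stdin.
extract_html_urls() {
  grep -B3 'text/html' |   # each text/html response plus the 3 lines before it
  grep -B2 '200 OK' |      # of those, each 200 response plus the 2 lines before it
  grep -E 'https?://' |    # only the lines that carry a URL
  cut -d' ' -f3- |         # strip the date and time fields
  sort | uniq              # one line per distinct URL
}

sample_log | extract_html_urls
```

With this sample the PNG is dropped and the repeated front page collapses to a single line. Each URL comes out with one leading space, because the sample puts two spaces between the timestamp and the URL and `cut` treats every space as a field separator.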
