Crude SEO Spider
Provides a simple way of spidering a website and gathering basic URL information to assist with Search Engine Optimisation.
- Detects duplicate content using MD5 hashes (see the sketch after this list)
- Shows the HTTP status code for each URL
- Displays the response time and page size
- Follows redirects
- Exports results to CSV format
- Supports the Robots Exclusion Protocol (robots.txt)
- Supports the rel="nofollow" link attribute
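As a rough illustration of the duplicate-detection idea, here is a minimal sketch, not spider.pl's actual code: it assumes the standard LWP::UserAgent, Digest::MD5 and Time::HiRes modules, uses placeholder URLs, and omits the robots.txt and nofollow handling the spider itself provides.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use Digest::MD5 qw(md5_hex);
    use Time::HiRes qw(time);

    # LWP::UserAgent follows redirects on GET by default,
    # much like the spider itself.
    my $ua = LWP::UserAgent->new( timeout => 10 );

    my %seen;    # MD5 hash of page body => first URL seen with it

    for my $url ( 'http://example.com/', 'http://example.com/copy' ) {
        my $start    = time;
        my $response = $ua->get($url);
        my $elapsed  = time - $start;

        # Hash the raw body so byte-identical pages collide.
        my $body = $response->content;
        my $hash = md5_hex($body);

        # Report status code, response time and page size per URL.
        printf "%s %d %.3fs %d bytes\n",
            $url, $response->code, $elapsed, length $body;

        print "  duplicate of $seen{$hash}\n" if exists $seen{$hash};
        $seen{$hash} //= $url;
    }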
For usage parameters, run:
First, open and edit the spider.pl script and, at the top, set the full path to the lib directory.
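As a sketch of what that edit usually looks like in a Perl script (the path shown is a placeholder, not the repository's actual layout):

    # Near the top of spider.pl -- replace the placeholder with the
    # real, absolute path to the spider's lib directory.
    use lib '/full/path/to/crude-seo-spider/lib';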
Then modify the options in the spider.conf file; each option is commented, so it should be self-explanatory.
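For illustration only, such a file is typically a set of commented key/value lines; the option names below are hypothetical, as the real spider.conf keys are not listed in this README:

    # Hypothetical option names -- illustrative only, not the real
    # spider.conf keys (consult the comments in the bundled file).

    # URL the spider starts crawling from
    start_url = http://www.example.com/

    # Maximum number of pages to fetch
    max_pages = 500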
Run the spider either by executing the script directly:

    ./spider.pl

or by running the script through perl:

    perl spider.pl
While running, the script displays information about the URLs it is currently tracking and writes the results to the results.txt file.
To output to a CSV file, provide the --csv=FILE parameter.
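For example (results.csv is just a placeholder filename):

    ./spider.pl --csv=results.csv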