Site Lists

- ``All.txt``: 18,000+ sites compiled from Alexa's top sites. This is the text
  file that should work in Scrapy as a test of production readiness.
- ``test.txt``: A custom page designed to be 3 pages deep, so that the depth
  limit can be verified, along with the rule that pages outside the domain are
  not followed.
- ``test_2.txt``: Just google.com and mozilla.com, for a less controlled but
  larger manual test.
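
As a rough illustration of how one of these lists might be prepared for a crawl, the sketch below reads site entries and derives start URLs plus the domains a crawler should stay within. It assumes one site per line, with or without a scheme; the actual file layout is not documented here, and the helper name ``load_sites`` is hypothetical.

```python
from urllib.parse import urlparse


def load_sites(lines):
    """Return (start_urls, allowed_domains) from an iterable of site-list lines.

    Assumption: each line holds one site, either a bare hostname
    (e.g. "google.com") or a full URL. Blank lines are skipped and
    duplicates are dropped while keeping first-seen order.
    """
    start_urls = []
    allowed_domains = []
    seen_urls = set()
    seen_domains = set()
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue
        # Bare hostnames get a default scheme so urlparse can find the host.
        url = entry if "://" in entry else "http://" + entry
        host = urlparse(url).hostname
        if host is None:
            continue
        if url not in seen_urls:
            seen_urls.add(url)
            start_urls.append(url)
        if host not in seen_domains:
            seen_domains.add(host)
            allowed_domains.append(host)
    return start_urls, allowed_domains


if __name__ == "__main__":
    # test_2.txt is described above as holding google.com and mozilla.com.
    with open("test_2.txt") as handle:
        urls, domains = load_sites(handle)
    print(urls)
    print(domains)
```

In a Scrapy spider, the two lists could plausibly be assigned to ``start_urls`` and ``allowed_domains``, with the ``DEPTH_LIMIT`` setting bounding how deep the crawl goes; that wiring is left out here to keep the sketch dependency-free.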