NO LONGER IN USE

GOV.UK Spider

This is a web spider that crawls every page it can find on www.gov.uk and reports the HTTP response status and the referrer for each page.

Requirements

You will need:

  • Python 2.6 or 2.7 (Scrapy v0.14 does not run on Python 3)
  • Scrapy v0.14.4

To install the project dependencies you can use pip with the following command:

pip install -r requirements.txt

Running the spider

Once the dependencies are installed, you can list the available spiders by running:

scrapy list

Then run a specific spider, replacing insert_spider_name with one of the names from that list:

scrapy crawl insert_spider_name

To write the crawl results to a local file in JSON Lines format (one JSON object per line), add the following options:

scrapy crawl insert_spider_name -o items.json -t jsonlines
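
As a rough illustration of how the resulting file might be inspected, the sketch below reads JSON Lines output and summarises it by response status. The field names `url`, `status` and `referer` are assumptions made for this example; the actual item fields are defined by the spider and may differ, so adjust them to match your output.

```python
import json


def parse_items(lines):
    """Yield one dict per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_statuses(lines):
    """Return a dict mapping each response status to the number of items."""
    counts = {}
    for item in parse_items(lines):
        # "status" is a hypothetical field name, not confirmed by the spider.
        status = item.get("status")
        counts[status] = counts.get(status, 0) + 1
    return counts


def find_failures(lines):
    """Return (url, status, referer) tuples for items with a status >= 400."""
    failures = []
    for item in parse_items(lines):
        status = item.get("status")
        if isinstance(status, int) and status >= 400:
            # "url" and "referer" are hypothetical field names as well.
            failures.append((item.get("url"), status, item.get("referer")))
    return failures
```

Both functions accept any iterable of lines, so an open file object for items.json can be passed in directly. Reading line by line avoids loading the whole crawl into memory, which is the main reason to prefer JSON Lines over a single JSON array for a large site.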