t
Agency.pm
Makefile
README
agencies.yaml
agencies_schema.yaml
commit_updated.pl
cpanfile
cpanfile.snapshot
csv2tsv.pl
dump_tables.pl
find_contract.pl
list_of_agencies.html
log.conf
parks_canada_contracts.csv.bak
report.csv
scrape.py
scrape_agency.pl
store_contract.pl
temp.pl
verify_csv.pl

README

Before you can run the scraping scripts, you need to set the GOAT_HOME environment variable to point to the directory where you checked out goat.
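
For example, in a Bourne-style shell this could look like the sketch below. The checkout path is hypothetical; substitute the directory you actually used.

```shell
# Assumption: goat was checked out to $HOME/src/goat (hypothetical path).
export GOAT_HOME="$HOME/src/goat"

# Confirm the variable is visible to child processes such as the scraper.
sh -c 'echo "GOAT_HOME=$GOAT_HOME"'
```

Adding the export line to your shell profile should make the setting persist across sessions.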


Using Carton:

To install Carton on either Mac OS X or Linux, and then install the dependencies listed in cpanfile:
> sudo cpan Carton
> carton install

Running the scraper:
> carton exec perl scrape_agency.pl oag # this would scrape 'Office of the Auditor General of Canada'
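
Since the scraper depends on GOAT_HOME, a small wrapper can catch a missing setting before anything runs. This is only a sketch: the function name "scrape" is hypothetical and is not part of this repository, and it assumes you run it from the directory containing scrape_agency.pl.

```shell
# Hypothetical helper, not shipped with goat.
# Usage: scrape <agency-code>, e.g. "scrape oag".
scrape() {
    if [ -z "${GOAT_HOME:-}" ]; then
        echo "GOAT_HOME is not set" >&2
        return 1
    fi
    if [ -z "${1:-}" ]; then
        echo "usage: scrape <agency-code>" >&2
        return 1
    fi
    # Same command as above, with the agency code passed through.
    carton exec perl scrape_agency.pl "$1"
}
```

With the function defined, "scrape oag" behaves like the command above but fails early with a message when GOAT_HOME is unset.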


Alternatively, you could try installing the dependencies using Ubuntu packages. NOT RECOMMENDED. OBSOLETE.

Prerequisites for Ubuntu Intrepid Ibex:
	sudo apt-get install libhtml-tableextract-perl \
	 libwww-mechanize-perl \
	 libdata-dump-streamer-perl \
	 libtest-exception-perl \
	 libjson-xs-perl \
	 libtest-warn-perl \
	 libtext-csv-xs-perl \
	 liblog-log4perl-perl \
	 libdatetime-perl \
	 libdate-calc-perl \
	 liblist-moreutils-perl \
	 libdatetime-set-perl \
	 libdatetime-format-builder-perl \
	 libreadonly-perl \
	 libtest-deep-perl \
	 libtest-differences-perl \
	 libtest-harness-perl \
	 libexception-class-perl \
	 libperl6-slurp-perl
	
	sudo cpan -i DateTimeX::Easy \
	 YAML::LibYAML \
	 WWW::Mechanize::Pluggable \
	 WWW::Mechanize::Plugin::Retry