This repository has been archived by the owner on May 10, 2019. It is now read-only.

Fork of code from ScraperWiki at https://classic.scraperwiki.com/scra…
mlandauer committed Jan 20, 2014
0 parents commit ac497e1
Showing 3 changed files with 31 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
# Ignore output of scraper
data.sqlite
1 change: 1 addition & 0 deletions README.md
@@ -0,0 +1 @@
Gets development applications for the [City of Ryde](http://www.ryde.nsw.gov.au/development/pn.htm).
28 changes: 28 additions & 0 deletions scraper.rb
@@ -0,0 +1,28 @@
require 'rubygems'
require 'mechanize'
require 'scraperwiki'
require 'date'

url = 'http://www.ryde.nsw.gov.au/Development/Development+Applications/DAs+on+Exhibition/Received+Development+Applications'
agent = Mechanize.new

page = agent.get(url)

page.at('div.content-spacing').search('p').each do |para|
  labels = para.search('strong')
  # Skip if this isn't a DA (a DA paragraph has at least three <strong> labels)
  next if labels.count < 3

  record = {
    # Text after the 2nd label, with the ': ' separator and any '. ' removed
    'council_reference' => labels[1].next.inner_text.gsub(': ', '').gsub('. ', '').strip,
    # Description sits three sibling nodes after the 3rd label in the page markup
    'description' => labels[2].next.next.next.inner_text.strip,
    'address' => labels[0].next.inner_text.gsub(': ', '').strip,
    'info_url' => url,
    'comment_url' => url,
    'date_scraped' => Date.today.to_s
  }

  # swdata doesn't exist until the first save, so a failed query means "not saved yet"
  if (ScraperWiki.select("* from swdata where `council_reference`='#{record['council_reference']}'").empty? rescue true)
    ScraperWiki.save_sqlite(['council_reference'], record)
  else
    puts "Skipping already saved record #{record['council_reference']}"
  end
end
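The field clean-up in the loop is a chain of `gsub`/`strip` calls on the text node that follows each `<strong>` label. A minimal pure-Ruby sketch of that chain, pulled out into a hypothetical helper `clean_label_value` (not part of the scraper), makes its behaviour easier to check; the sample strings below are invented, not taken from the Ryde site.

```ruby
# Hypothetical helper, not in the original scraper: mirrors the
# gsub/strip chain applied to the text after a <strong> label.
def clean_label_value(text, drop_period_space: false)
  value = text.gsub(': ', '')                        # drop the ": " separator
  value = value.gsub('. ', '') if drop_period_space  # drop every ". " (used for the reference)
  value.strip                                        # trim surrounding whitespace
end

puts clean_label_value(": LDA2014/0001. ", drop_period_space: true)
puts clean_label_value(": 1 Example Street, Ryde ")
```

Because `gsub('. ', '')` runs before `strip`, a reference that ends in a bare `.` with no trailing space keeps its period, e.g. `": X1."` cleans to `"X1."`.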
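The `ScraperWiki.select` check before `save_sqlite` appears to exist so that re-running the scraper doesn't touch records it has already seen: assuming `save_sqlite` with `['council_reference']` as the unique key replaces any existing row with the same reference, saving unconditionally would overwrite the original `date_scraped`. A sketch of the same skip-if-seen logic, using a hypothetical in-memory `RecordStore` as a stand-in for the SQLite `swdata` table (this is not the ScraperWiki API):

```ruby
# Hypothetical stand-in for the swdata table, keyed on council_reference.
class RecordStore
  def initialize
    @rows = {}
  end

  # Save the record unless its reference is already stored.
  # Returns true if saved, false if skipped.
  def save_unless_seen(record)
    key = record['council_reference']
    return false if @rows.key?(key)
    @rows[key] = record
    true
  end

  def [](key)
    @rows[key]
  end

  def size
    @rows.size
  end
end

store = RecordStore.new
store.save_unless_seen('council_reference' => 'LDA2014/0001', 'date_scraped' => '2014-01-20')
store.save_unless_seen('council_reference' => 'LDA2014/0001', 'date_scraped' => '2014-01-21')
puts store['LDA2014/0001']['date_scraped']
```

The second save is skipped, so the stored `date_scraped` stays at the first value, which is the behaviour the `select`-then-`save_sqlite` check preserves.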
