Merge pull request planningalerts-scrapers#2 from LoveMyData/master
Added support for data recovery
MichaelBone committed Nov 8, 2018
2 parents 13de2d5 + 2d9d1c2 commit f550fd0
Showing 4 changed files with 92 additions and 2 deletions.
6 changes: 6 additions & 0 deletions Gemfile
@@ -0,0 +1,6 @@
source "https://rubygems.org"

ruby "~>2.4"

gem "scraperwiki", git: "https://github.com/openaustralia/scraperwiki-ruby.git", branch: "morph_defaults"
gem "mechanize"
57 changes: 57 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,57 @@
GIT
remote: https://github.com/openaustralia/scraperwiki-ruby.git
revision: fc50176812505e463077d5c673d504a6a234aa78
branch: morph_defaults
specs:
scraperwiki (3.0.1)
httpclient
sqlite_magic

GEM
remote: https://rubygems.org/
specs:
connection_pool (2.2.2)
domain_name (0.5.20180417)
unf (>= 0.0.5, < 1.0.0)
http-cookie (1.0.3)
domain_name (~> 0.5)
httpclient (2.8.3)
mechanize (2.7.6)
domain_name (~> 0.5, >= 0.5.1)
http-cookie (~> 1.0)
mime-types (>= 1.17.2)
net-http-digest_auth (~> 1.1, >= 1.1.1)
net-http-persistent (>= 2.5.2)
nokogiri (~> 1.6)
ntlm-http (~> 0.1, >= 0.1.1)
webrobots (>= 0.0.9, < 0.2)
mime-types (3.2.2)
mime-types-data (~> 3.2015)
mime-types-data (3.2018.0812)
mini_portile2 (2.3.0)
net-http-digest_auth (1.4.1)
net-http-persistent (3.0.0)
connection_pool (~> 2.2)
nokogiri (1.8.5)
mini_portile2 (~> 2.3.0)
ntlm-http (0.1.1)
sqlite3 (1.3.13)
sqlite_magic (0.0.6)
sqlite3
unf (0.1.4)
unf_ext
unf_ext (0.0.7.5)
webrobots (0.1.2)

PLATFORMS
ruby

DEPENDENCIES
mechanize
scraperwiki!

RUBY VERSION
ruby 2.4.1p111

BUNDLED WITH
1.15.1
15 changes: 14 additions & 1 deletion README.md
@@ -1 +1,14 @@
This is a scraper that runs on [Morph](https://morph.io). To get started [see the documentation](https://morph.io/documentation)
# City of Ipswich Council scraper

* Accept Terms - Yes
* Cookie tracking - Yes
* Request actual DA page for data - No
* Clearly defined data within a row - No, but reasonable

Set up MORPH_PERIOD for data recovery; the available options are:

* not set = past 14 days (default)
* thismonth
* lastmonth
* thisyear
* lastyear
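The options above are resolved into a date-range query string by the scraper. As a sketch of that mapping (the helper name `morph_period_to_query` is illustrative only, not part of the repository; the logic mirrors the `case` statement added to scraper.rb):

```ruby
require 'date'

# Illustrative helper: turn a MORPH_PERIOD value into the date-range portion
# of the PD Online query string (parameters 1 and 2). 'lastmonth' and
# 'thismonth' are keywords the site understands directly; the other options
# become explicit d/m/Y date ranges.
def morph_period_to_query(value)
  today = Date.today
  case value
  when 'lastyear'
    a_year_ago = today - 365
    a_year_ago.strftime('1/1/%Y') + '&2=' + a_year_ago.strftime('31/12/%Y')
  when 'thisyear'
    today.strftime('1/1/%Y') + '&2=' + today.strftime('31/12/%Y')
  when 'lastmonth', 'thismonth'
    value
  else
    # default: the past 14 days
    (today - 14).strftime('%d/%m/%Y') + '&2=' + today.strftime('%d/%m/%Y')
  end
end

puts morph_period_to_query(ENV['MORPH_PERIOD'])
```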
16 changes: 15 additions & 1 deletion scraper.rb
@@ -1,6 +1,20 @@
require 'scraperwiki'
require 'mechanize'
require 'date'

case ENV['MORPH_PERIOD']
when 'lastyear'
period = (Date.today-365).strftime("1/1/%Y")+'&2='+(Date.today-365).strftime("31/12/%Y")
when 'thisyear'
period = (Date.today).strftime("1/1/%Y")+'&2='+(Date.today).strftime("31/12/%Y")
when 'lastmonth'
period = 'lastmonth'
when 'thismonth'
period = 'thismonth'
else
period = (Date.today - 14).strftime("%d/%m/%Y")+'&2='+(Date.today).strftime("%d/%m/%Y")
end

puts "Collecting data from " + period
# Scraping from Masterview 2.0

def scrape_page(page, comment_url)
@@ -41,7 +55,7 @@ def click(page, doc)
end
end

url = "http://pdonline.ipswich.qld.gov.au/pdonline/modules/applicationmaster/default.aspx?page=found&1=thismonth&5=T&6=F"
url = "http://pdonline.ipswich.qld.gov.au/pdonline/modules/applicationmaster/default.aspx?page=found&1="+period+"&5=T&6=F"
comment_url = "mailto:plandev@ipswich.qld.gov.au"

agent = Mechanize.new
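The one-line change above swaps the hard-coded `thismonth` in the search URL for the computed `period` value. The construction can be sketched as a small helper (the name `search_url` is illustrative, not from the repository; the query shape matches the `url` assignment in the diff):

```ruby
# Illustrative helper: build the full PD Online search URL from an
# already-resolved period string. page=found, 5=T and 6=F are the fixed
# parameters used by the scraper's url assignment.
def search_url(period)
  base = 'http://pdonline.ipswich.qld.gov.au/pdonline/modules/applicationmaster/default.aspx'
  "#{base}?page=found&1=#{period}&5=T&6=F"
end

puts search_url('thismonth')
```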
