This repository has been archived by the owner on Jun 24, 2019. It is now read-only.

Commit

Merge pull request #5 from LoveMyData/master
improved code
MichaelBone committed Aug 29, 2018
2 parents a84c048 + b66c1a8 commit 412275e
Showing 4 changed files with 91 additions and 2 deletions.
10 changes: 10 additions & 0 deletions Gemfile
@@ -0,0 +1,10 @@
# It's easy to add more libraries or choose different versions. Any libraries
# specified here will be installed and made available to your morph.io scraper.
# Find out more: https://morph.io/documentation/ruby

source "https://rubygems.org"

ruby "~>2.4"

gem "scraperwiki", git: "https://github.com/openaustralia/scraperwiki-ruby.git", branch: "morph_defaults"
gem "mechanize"
57 changes: 57 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,57 @@
GIT
remote: https://github.com/openaustralia/scraperwiki-ruby.git
revision: fc50176812505e463077d5c673d504a6a234aa78
branch: morph_defaults
specs:
scraperwiki (3.0.1)
httpclient
sqlite_magic

GEM
remote: https://rubygems.org/
specs:
connection_pool (2.2.2)
domain_name (0.5.20180417)
unf (>= 0.0.5, < 1.0.0)
http-cookie (1.0.3)
domain_name (~> 0.5)
httpclient (2.8.3)
mechanize (2.7.6)
domain_name (~> 0.5, >= 0.5.1)
http-cookie (~> 1.0)
mime-types (>= 1.17.2)
net-http-digest_auth (~> 1.1, >= 1.1.1)
net-http-persistent (>= 2.5.2)
nokogiri (~> 1.6)
ntlm-http (~> 0.1, >= 0.1.1)
webrobots (>= 0.0.9, < 0.2)
mime-types (3.2.2)
mime-types-data (~> 3.2015)
mime-types-data (3.2018.0812)
mini_portile2 (2.3.0)
net-http-digest_auth (1.4.1)
net-http-persistent (3.0.0)
connection_pool (~> 2.2)
nokogiri (1.8.4)
mini_portile2 (~> 2.3.0)
ntlm-http (0.1.1)
sqlite3 (1.3.13)
sqlite_magic (0.0.6)
sqlite3
unf (0.1.4)
unf_ext
unf_ext (0.0.7.5)
webrobots (0.1.2)

PLATFORMS
ruby

DEPENDENCIES
mechanize
scraperwiki!

RUBY VERSION
ruby 2.4.1p111

BUNDLED WITH
1.15.1
13 changes: 12 additions & 1 deletion README.md
@@ -1 +1,12 @@
This is a scraper that runs on [Morph](https://morph.io). To get started [see the documentation](https://morph.io/documentation)
# Brisbane Council scraper

* Accept Terms - Yes
* Cookie tracking - Yes
* Request actual DA page for data - Yes
* Clearly defined data within a row - No but reasonable

Set MORPH_PERIOD to control how far back data is recovered; the available options are:

* not set = past 14 days (default)
* thismonth
* lastmonth
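The options above select the date filter passed to the MasterView querystring. A minimal sketch of the selection logic (the `resolve_period` helper name is hypothetical; the default encodes the past 14 days as a `<from>&2=<to>` pair, matching the scraper's URL format):

```ruby
require 'date'

# Hypothetical helper mirroring the scraper's period selection.
def resolve_period(morph_period, today = Date.today)
  case morph_period
  when 'lastmonth', 'thismonth'
    morph_period
  else
    # Default: a 14-day range, formatted for the querystring "&1=<from>&2=<to>".
    (today - 14).strftime("%d/%m/%Y") + '&2=' + today.strftime("%d/%m/%Y")
  end
end

puts resolve_period(ENV['MORPH_PERIOD'])
```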
13 changes: 12 additions & 1 deletion scraper.rb
@@ -1,6 +1,16 @@
require 'scraperwiki'
require 'mechanize'
require 'date' # Date.today is used below

case ENV['MORPH_PERIOD']
when 'lastmonth'
period = 'lastmonth'
when 'thismonth'
period = 'thismonth'
else
period = (Date.today - 14).strftime("%d/%m/%Y")+'&2='+(Date.today).strftime("%d/%m/%Y")
end

puts "Collecting data from " + period
# Scraping from Masterview 2.0

$agent = Mechanize.new
@@ -40,6 +50,7 @@ def scrape_page(page)

#p record
if (ScraperWiki.select("* from data where `council_reference`='#{record['council_reference']}'").empty? rescue true)
puts "Saving record " + record['council_reference'] + ' - ' + record['address']
ScraperWiki.save_sqlite(['council_reference'], record)
else
puts "Skipping already saved record " + record['council_reference']
@@ -68,7 +79,7 @@ def scrape_and_follow_next_link(page)
end
end

url = "https://pdonline.brisbane.qld.gov.au/MasterViewUI/Modules/ApplicationMaster/default.aspx?page=found&1=thismonth&6=F"
url = "https://pdonline.brisbane.qld.gov.au/MasterViewUI/Modules/ApplicationMaster/default.aspx?page=found&1="+period+"&6=F"



