My first scraper
equivalentideas committed Nov 20, 2018
1 parent fb8dc00 commit 5e34200
Showing 4 changed files with 53 additions and 41 deletions.
1 change: 1 addition & 0 deletions .ruby-version
@@ -0,0 +1 @@
+2.5.3
2 changes: 1 addition & 1 deletion Gemfile
@@ -4,7 +4,7 @@

 source "https://rubygems.org"

-ruby "2.0.0"
+ruby "2.5.3"

 gem "scraperwiki", git: "https://github.com/openaustralia/scraperwiki-ruby.git", branch: "morph_defaults"
 gem "mechanize"
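
This commit pins Ruby 2.5.3 in two places, `.ruby-version` and the `ruby` directive in the `Gemfile`, and the two are only useful if they agree. A minimal sketch of a consistency check follows. The helpers `gemfile_ruby_version` and `versions_agree?` are hypothetical names, not part of the commit, and the sketch assumes `.ruby-version` holds a bare version string such as `2.5.3`.

```ruby
# Hypothetical check (not part of the commit): compare the version pinned in
# .ruby-version with the `ruby` directive in the Gemfile.

# Returns the version string from a line like `ruby "2.5.3"`, or nil if the
# Gemfile text has no such directive.
def gemfile_ruby_version(gemfile_text)
  match = gemfile_text.match(/^\s*ruby\s+["']([^"']+)["']/)
  match && match[1]
end

# True when the bare version in .ruby-version equals the Gemfile directive.
# Assumption: .ruby-version contains only the version number.
def versions_agree?(ruby_version_text, gemfile_text)
  ruby_version_text.strip == gemfile_ruby_version(gemfile_text)
end

# Inline copy of the relevant Gemfile lines, so the sketch runs on its own.
gemfile = <<~GEMFILE
  source "https://rubygems.org"

  ruby "2.5.3"

  gem "mechanize"
GEMFILE

puts versions_agree?("2.5.3\n", gemfile)
```

Against a real checkout, the same check could be run as `versions_agree?(File.read(".ruby-version"), File.read("Gemfile"))`.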
44 changes: 27 additions & 17 deletions Gemfile.lock
@@ -10,38 +10,48 @@ GIT
 GEM
   remote: https://rubygems.org/
   specs:
-    domain_name (0.5.24)
+    connection_pool (2.2.2)
+    domain_name (0.5.20180417)
       unf (>= 0.0.5, < 1.0.0)
-    http-cookie (1.0.2)
+    http-cookie (1.0.3)
       domain_name (~> 0.5)
-    httpclient (2.6.0.1)
-    mechanize (2.7.3)
+    httpclient (2.8.3)
+    mechanize (2.7.6)
       domain_name (~> 0.5, >= 0.5.1)
       http-cookie (~> 1.0)
-      mime-types (~> 2.0)
+      mime-types (>= 1.17.2)
       net-http-digest_auth (~> 1.1, >= 1.1.1)
-      net-http-persistent (~> 2.5, >= 2.5.2)
-      nokogiri (~> 1.4)
+      net-http-persistent (>= 2.5.2)
+      nokogiri (~> 1.6)
       ntlm-http (~> 0.1, >= 0.1.1)
       webrobots (>= 0.0.9, < 0.2)
-    mime-types (2.5)
-    mini_portile (0.6.2)
-    net-http-digest_auth (1.4)
-    net-http-persistent (2.9.4)
-    nokogiri (1.6.6.2)
-      mini_portile (~> 0.6.0)
+    mime-types (3.2.2)
+      mime-types-data (~> 3.2015)
+    mime-types-data (3.2018.0812)
+    mini_portile2 (2.3.0)
+    net-http-digest_auth (1.4.1)
+    net-http-persistent (3.0.0)
+      connection_pool (~> 2.2)
+    nokogiri (1.8.5)
+      mini_portile2 (~> 2.3.0)
     ntlm-http (0.1.1)
-    sqlite3 (1.3.10)
-    sqlite_magic (0.0.3)
+    sqlite3 (1.3.13)
+    sqlite_magic (0.0.6)
       sqlite3
     unf (0.1.4)
       unf_ext
-    unf_ext (0.0.7.1)
-    webrobots (0.1.1)
+    unf_ext (0.0.7.5)
+    webrobots (0.1.2)

 PLATFORMS
   ruby

 DEPENDENCIES
   mechanize
   scraperwiki!
+
+RUBY VERSION
+   ruby 2.5.3p105
+
+BUNDLED WITH
+   1.17.1
47 changes: 24 additions & 23 deletions scraper.rb
@@ -1,25 +1,26 @@
-# This is a template for a Ruby scraper on morph.io (https://morph.io)
-# including some code snippets below that you should find helpful
+require 'scraperwiki'
+require 'mechanize'

-# require 'scraperwiki'
-# require 'mechanize'
-#
-# agent = Mechanize.new
-#
-# # Read in a page
-# page = agent.get("http://foo.com")
-#
-# # Find something on the page using css selectors
-# p page.at('div.content')
-#
-# # Write out to the sqlite database using scraperwiki library
-# ScraperWiki.save_sqlite(["name"], {"name" => "susan", "occupation" => "software developer"})
-#
-# # An arbitrary query against the database
-# ScraperWiki.select("* from data where 'name'='peter'")
+agent = Mechanize.new

-# You don't have to do things with the Mechanize or ScraperWiki libraries.
-# You can use whatever gems you want: https://morph.io/documentation/ruby
-# All that matters is that your final data is written to an SQLite database
-# called "data.sqlite" in the current working directory which has at least a table
-# called "data".
+# Read in a page
+page = agent.get("http://www.rfs.nsw.gov.au/fire-information/fdr-and-tobans")
+
+# Locate our table
+table = page.at('table.danger-ratings-table')
+
+# Iterate over each row in the table body
+table.search('tbody tr').each do |table_row|
+  # Create an object with the bits we want
+  fire_area = {
+    area_name: table_row.search('td')[0].text,
+    fire_danger_today: table_row.search('td')[1].text,
+    councils_effected: table_row.search('td')[-1].text
+  }
+
+  # Print out the object for debugging purposes
+  puts fire_area
+
+  # Save it to the database
+  ScraperWiki.save_sqlite([:area_name], fire_area)
+end
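
The loop in `scraper.rb` reads cells by position and saves their raw text, so a row with fewer cells than expected would raise on `nil.text`, and any layout whitespace ends up in the database. One possible way to guard against both is sketched below. The helper `build_fire_area` is hypothetical and not part of this commit, and the sample cell values are made up for illustration. It works on plain strings so it can run without Mechanize or a network connection.

```ruby
# Hypothetical helper (not part of the commit): turns the text of one table
# row's cells into the hash that scraper.rb saves.
def build_fire_area(cells)
  # Skip rows that lack both an area name and a rating cell
  # (for example a header or spacer row).
  return nil if cells.length < 2

  # Collapse runs of whitespace left over from the HTML layout.
  cleaned = cells.map { |text| text.gsub(/\s+/, ' ').strip }

  {
    area_name: cleaned[0],
    fire_danger_today: cleaned[1],
    councils_effected: cleaned[-1] # key spelling kept as in scraper.rb
  }
end

# Made-up sample cell texts standing in for one table row.
sample = ["  Example Area \n", "HIGH", " Council A;  Council B "]
record = build_fire_area(sample)
puts record.inspect
```

Inside the scraper's loop this could be called as `build_fire_area(table_row.search('td').map(&:text))`, saving the result only when it is not `nil`.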
