Also scrape Ministers page

Ministers can be appointed from outside the House, and automatically
become members, so we also need to scrape their page.

However, ministers who were already members have a page in both
sections, with a different ID on each, so we need to filter out any that
appear twice. (We assume here that no two members have the same name.)
tmtmtmtm committed Sep 21, 2018
1 parent 0ec83d6 commit 460d4c9
Showing 2 changed files with 6 additions and 2 deletions.
2 changes: 1 addition & 1 deletion lib/member_page.rb
@@ -13,7 +13,7 @@ class MemberPage < Scraped::HTML
   end

   field :name do
-    noko.css('.page-header h2').text.tidy
+    noko.css('.page-header h2').text.sub('Hon. ', '').tidy
   end

   field :image do
6 changes: 5 additions & 1 deletion scraper.rb
@@ -13,10 +13,14 @@ def scrape(h)
   klass.new(response: Scraped::Request.new(url: url).response)
 end

-data = %w[peoples nobles].flat_map do |section|
+alldata = %w[peoples nobles ministers].flat_map do |section|
   url = "http://parliament.gov.to/members-of-parliament/#{section}/"
   scrape(url => MembersPage).member_urls.map { |mem_url| scrape(mem_url => MemberPage).to_h }
 end
+
+# Some of the members appear on more than one page, so only take the
+# first entry for each
+data = alldata.group_by { |mem| mem[:name] }.map { |_m, ms| ms.first }
 data.each { |mem| puts mem.reject { |_, v| v.to_s.empty? }.sort_by { |k, _| k }.to_h } if ENV['MORPH_DEBUG']

 ScraperWiki.sqliteexecute('DROP TABLE data') rescue nil
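
A minimal standalone sketch of the de-duplication this commit introduces, runnable without the scraper's dependencies. The sample rows, their IDs, and the `clean_name` helper are hypothetical; in particular, it is only an assumption that the "Hon. " prefix appears on the ministers pages, which would explain why the commit strips it before grouping by name. `clean_name` is a plain-Ruby stand-in for the `.sub('Hon. ', '').tidy` call in lib/member_page.rb.

```ruby
# Hypothetical stand-in for the name clean-up in MemberPage#name.
# The real scraper uses .tidy; here whitespace is normalised by hand.
def clean_name(raw)
  raw.sub('Hon. ', '').gsub(/\s+/, ' ').strip
end

# Invented sample rows: one member appears in two sections under different IDs.
raw_rows = [
  { id: 'peoples-1',   name: 'Member One' },
  { id: 'nobles-1',    name: 'Lord Two' },
  { id: 'ministers-1', name: 'Hon. Member One' },     # same person, different ID
  { id: 'ministers-2', name: 'Hon. Outsider Three' }  # appointed from outside the House
]

alldata = raw_rows.map { |row| row.merge(name: clean_name(row[:name])) }

# Same approach as the commit: group by name, keep the first row in each group.
data = alldata.group_by { |mem| mem[:name] }.map { |_name, rows| rows.first }

# Array#uniq with a block also keeps the first element seen for each key.
same = alldata.uniq { |mem| mem[:name] }

data.each { |mem| puts mem }
```

Because the sections are scraped in the order peoples, nobles, ministers, keeping the first entry means a minister who was already a member retains the ID from their original section. As the commit message notes, this relies on no two distinct members sharing a name.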