
outputs results from a single state page

1 parent aeb804b commit d3edc6ae8dffd3e69ca6c89405f645c2119340c1 @szTheory committed
Showing with 27 additions and 3 deletions.
  1. +2 −1 TODO
  2. +25 −2 scrape.rb
3 TODO
@@ -1,2 +1,3 @@
read in entries for one state (PA/NJ/etc)
-divide entries for state into type (installer, developer, etc)
+divide entries for state into type (installer, developer, etc)
+split up the addresses into state, zipcode, etc
27 scrape.rb
@@ -18,13 +18,36 @@ def self.scrape_page
# set form values
form = page.form("search")
- form["filter_equal.member_type.name"] = 'Distributor'
+ form["filter_equal.member_type.name"] = 'Contractor/Installer'
form["filter_equal.dm_seia_organization.address.state"] = 'PA'
form["filter_like.dm_seia_organization.description"] = ''
# submit form, get results page
result_page = agent.submit(form, form.buttons.first)
- puts result_page.body
+
+ # scrape entries from results page
+ entries = []
+ e = {}
+ result_page.search('.results p').each do |p|
+ className = p.attr('class')
+
+ # 1st/last in sequence for this entry?
+ is_first = className == 'company'
+ is_last = className == 'description'
+
+ # puts "#{p.attr('class')} => #{p.text}"
+
+ e = {} if is_first #new entry if we just started
+ e[className] = p.text.chomp #get next tag in sequence
+ entries << e if is_last #push the entry if we're done
+ end
+
+ # output each entry
+ entries.each_with_index do |e, i|
+ e.each_pair do |k, v|
+ puts "#{k} => #{v}"
+ end
+ end
end
# phone number regex (src: http://blog.stevenlevithan.com/archives/validate-phone-number#r4-2-v-inline)
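
Note on the grouping loop added in scrape.rb: it starts a new hash when it sees a `company` paragraph and pushes the hash when it sees a `description` paragraph. The sketch below isolates that logic so it can be run without Mechanize or a network connection. `Row`, `group_entries`, and the sample rows are hypothetical stand-ins, not part of this commit; the sample data is invented. Unlike the committed loop, the sketch resets the current hash after pushing and skips rows that appear before any `company` row, which is one possible way to avoid mutating an already-pushed entry.

```ruby
# Hypothetical stand-in for the (class, text) pairs that
# result_page.search('.results p') would yield. Sample values are invented.
Row = Struct.new(:css_class, :text)

# Group a flat sequence of rows into one hash per entry.
# An entry starts at a row whose class equals `first` and
# is closed by a row whose class equals `last`.
def group_entries(rows, first: 'company', last: 'description')
  entries = []
  current = nil
  rows.each do |row|
    current = {} if row.css_class == first   # start a new entry
    next if current.nil?                     # ignore rows outside any entry
    current[row.css_class] = row.text.strip  # store this field
    if row.css_class == last                 # entry complete
      entries << current
      current = nil
    end
  end
  entries
end

rows = [
  Row.new('company', "Acme Solar\n"),
  Row.new('address', "1 Main St, Pittsburgh, PA 15201\n"),
  Row.new('description', "Residential installs\n"),
  Row.new('company', "Sunny Co\n"),
  Row.new('description', "Commercial installs\n")
]

# Output each entry in the same "key => value" form the commit uses.
group_entries(rows).each do |entry|
  entry.each_pair { |k, v| puts "#{k} => #{v}" }
end
```

Entries that never reach a `description` row are dropped in this sketch; whether that is the right choice depends on how consistent the real results page is.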
