Commit

refine the collection of topics to ignore blank ones
alterisian committed Jan 22, 2015
1 parent b6e744f commit 7226715
Showing 1 changed file with 5 additions and 18 deletions.
23 changes: 5 additions & 18 deletions ggscraper.rb
@@ -84,28 +84,15 @@ def get_topics
  # Would be useful to detect the <current_number> of <total_topics> or even human enter the number as part of the initiation.
  # It seems that google groups initial load is 30 threads.
  sleep(6) #wait for it to load

  #topics = driver.find_elements(:class, 'GFP-UI5CPO')
  topics = @driver.find_elements(:tag_name, 'a')
- puts "#{topics.count} total topics found."
-
- found_last_link = false #Add welcome message
+ puts "#{topics.count} total topics found."

  topics.each do |topic|
-   #if topic.attribute(:href).startsWith("#!topic/") # it is a topic.
-   #TODO-put in above, as this will mean no need to parse out blanks
-   if topic.text.nil? or topic.text==""
-     puts "blank topic"
-   else
-     puts "#{topic.text}"
-
-     if found_last_link
-       populated_topics << topic
-     end
-
-     if topic.text == "Add welcome message"
-       found_last_link = true
-     end
-
+   if !topic.nil? and !topic.attribute(:href).nil? and topic.attribute(:href).include? "#!topic/" # it is a topic.
+     puts "#{topic.text}"
+     populated_topics << topic
+   end
  end
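
The added condition keeps a link only when its `href` contains `#!topic/`. Below is a minimal sketch of that filter as a standalone method, so it can be run without a browser. `FakeLink`, `select_topic_links`, `TOPIC_MARKER` and the sample URLs are hypothetical names and data made up for illustration; they are not part of `ggscraper.rb`. The sketch assumes only that each element responds to `text` and `attribute(:href)`, as the elements in the diff do.

```ruby
# Hypothetical stand-in for a browser element: it only needs to
# respond to #text and #attribute(:href), as the links in the diff do.
FakeLink = Struct.new(:text, :href) do
  def attribute(name)
    name.to_sym == :href ? href : nil
  end
end

# Marker taken from the condition in the diff.
TOPIC_MARKER = "#!topic/"

# Keep only links whose href contains the topic marker.
# nil elements and elements without an href are skipped.
def select_topic_links(links)
  links.select do |link|
    next false if link.nil?

    href = link.attribute(:href)
    !href.nil? && href.include?(TOPIC_MARKER)
  end
end

# Sample data (invented URLs) covering the cases the condition guards against.
links = [
  FakeLink.new("Add welcome message", "https://groups.google.com/forum/#!forum/example"),
  FakeLink.new("", nil),
  FakeLink.new("First thread", "https://groups.google.com/forum/#!topic/example/abc123"),
  nil
]

topics = select_topic_links(links)
puts "#{topics.count} topic links kept."
topics.each { |topic| puts topic.text }
```

Note that this filter, like the condition in the diff, tests only the `href`. A link with a topic `href` but empty text would still be kept, so a separate check on `text` may still be needed if blank titles can occur.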
