Commit

Merge pull request #9 from ephox/master
Improved handling of redirects
stewartmckee committed Jul 17, 2012
2 parents ca6196a + 57da8c1 commit a92c337
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions lib/crawl_job.rb
@@ -35,8 +35,11 @@ def self.perform(content_request)
# if there is no limit, or we're still under it, let's get the url
if within_crawl_limits?(content_request[:crawl_limit])
begin
# move the url from the queued list to the crawled list - for both the original url, and the content url (to handle redirects)
@redis.srem "queued", content_request[:url]
@redis.sadd "crawled", content_request[:url]
@redis.srem "queued", content[:url]
@redis.sadd "crawled", content[:url]
# increment the counter if we are not limiting by page only || we are limiting count by page and it is a page
if content_request[:crawl_limit_by_page]
if content[:mime_type].match("text/html")
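
The queue bookkeeping in this hunk can be sketched in isolation. The example below is a minimal illustration, not the project's code: `FakeRedis` is a hypothetical in-memory stand-in that mimics only the `sadd`, `srem`, and `smembers` set commands, `mark_crawled` is a hypothetical helper name, and the URLs are made up. It assumes, following the comment in the diff, that `content[:url]` holds the URL the request ended up at after any redirect.

```ruby
require "set"

# Hypothetical in-memory stand-in for the Redis set commands used in the diff.
class FakeRedis
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, member)
    @sets[key].add(member)
  end

  def srem(key, member)
    @sets[key].delete(member)
  end

  def smembers(key)
    @sets[key].to_a
  end
end

# Move both the requested URL and the final (post-redirect) URL
# from the "queued" set to the "crawled" set.
def mark_crawled(redis, requested_url, final_url)
  [requested_url, final_url].uniq.each do |url|
    redis.srem("queued", url)
    redis.sadd("crawled", url)
  end
end

redis = FakeRedis.new
redis.sadd("queued", "http://example.com/old")
redis.sadd("queued", "http://example.com/new")
mark_crawled(redis, "http://example.com/old", "http://example.com/new")
```

Recording the redirect target as crawled means that if the same target was also queued under its own URL, it is not fetched a second time.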
