Added the target of redirected urls to the list of crawled urls.
rojotek committed Jul 17, 2012
1 parent ca6196a commit 57da8c1
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions lib/crawl_job.rb
@@ -35,8 +35,11 @@ def self.perform(content_request)
 # if there is no limit or we're still under it lets get the url
 if within_crawl_limits?(content_request[:crawl_limit])
   begin
+    # move the url from the queued list to the crawled list - for both the original url, and the content url (to handle redirects)
     @redis.srem "queued", content_request[:url]
     @redis.sadd "crawled", content_request[:url]
+    @redis.srem "queued", content[:url]
+    @redis.sadd "crawled", content[:url]
     # increment the counter if we are not limiting by page only || we are limiting count by page and it is a page
     if content_request[:crawl_limit_by_page]
       if content[:mime_type].match("text/html")
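
The bookkeeping in this hunk can be illustrated without a Redis server. The sketch below is not the project's code: it assumes Ruby's in-memory Set as a stand-in for the Redis "queued" and "crawled" sets, and mark_crawled and the example.com URLs are hypothetical names chosen for illustration. It shows why both the requested url and the redirect target are recorded: otherwise the target could stay in "queued" and be fetched a second time.

```ruby
require "set"

# In-memory stand-ins for the Redis "queued" and "crawled" sets.
# Assumption: the real job calls @redis.srem / @redis.sadd instead.
# mark_crawled is a hypothetical helper, not part of lib/crawl_job.rb.
def mark_crawled(queued, crawled, requested_url, final_url)
  # uniq avoids doing the same work twice when there was no redirect
  [requested_url, final_url].uniq.each do |url|
    queued.delete(url) # like SREM: no-op if the url is not queued
    crawled.add(url)   # like SADD: no-op if the url is already recorded
  end
end

queued  = Set.new(["http://example.com/old", "http://example.com/new"])
crawled = Set.new

# The request for /old was redirected to /new, so both are marked as crawled.
mark_crawled(queued, crawled, "http://example.com/old", "http://example.com/new")
```

Because both set operations are idempotent, calling the helper with identical requested and final urls (no redirect) leaves the sets in the same state as the original two-line version.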