# CrawlerDetect


## About

CrawlerDetect is a Ruby version of the PHP class @CrawlerDetect.

It detects bots/crawlers/spiders via the user agent and other HTTP headers, and is currently able to recognize thousands of them.

## Why CrawlerDetect?

Compared with other popular bot-detection gems:

|                                                | CrawlerDetect | Voight-Kampff | Browser |
| ---------------------------------------------- | ------------- | ------------- | ------- |
| Number of bot patterns                         | >1000         | ~280          | ~280    |
| Number of checked HTTP headers                 | 10            | 1             | 1       |
| Number of bot-list updates (1st half of 2018)  | 14            | 1             | 7       |

In order to remain up-to-date, this gem does not accept any crawler data updates; any PRs to edit the crawler data should be submitted to the original JayBizzle/CrawlerDetect project.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'crawler_detect'
```
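
Then install it with Bundler (the standard workflow, not specific to this gem):

```
bundle install
```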

## Basic Usage

```ruby
CrawlerDetect.is_crawler?("Bot user agent")
# => true
```

Or if you need the crawler name:

```ruby
detector = CrawlerDetect.new("Googlebot/2.1 (http://www.google.com/bot.html)")
detector.is_crawler?
# => true
detector.crawler_name
# => "Googlebot"
```

## Rack::Request extension

Optionally, you can add extra methods to the request object:

```ruby
request.is_crawler?
# => false
request.crawler_name
# => nil
```

Using request.is_crawler? is more flexible than CrawlerDetect.is_crawler? because it automatically checks 10 HTTP headers, not only HTTP_USER_AGENT.

The only thing you have to do is configure the Rack::CrawlerDetect middleware:

### Rails

```ruby
class Application < Rails::Application
  # ...
  config.middleware.use Rack::CrawlerDetect
end
```

### Rack

```ruby
use Rack::CrawlerDetect
```
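
With the middleware in place, the request helpers are available wherever the request object is. For instance, here is a minimal sketch of skipping analytics for bots in a Rails controller (TrackingService is a hypothetical name, not part of this gem):

```ruby
class ApplicationController < ActionController::Base
  before_action :track_visit

  private

  def track_visit
    # request.is_crawler? checks all configured HTTP headers, not just the user agent
    return if request.is_crawler?

    TrackingService.record(request) # hypothetical analytics call
  end
end
```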

## Configuration

In some cases you may want to use your own whitelist, blacklist, or list of HTTP headers used to detect the user agent.

This is possible via CrawlerDetect::Config. For example, you may have an initializer like this:

```ruby
CrawlerDetect.setup! do |config|
  config.raw_headers_path    = File.expand_path("crawlers/MyHeaders.json", __dir__)
  config.raw_crawlers_path   = File.expand_path("crawlers/MyCrawlers.json", __dir__)
  config.raw_exclusions_path = File.expand_path("crawlers/MyExclusions.json", __dir__)
end
```

Make sure your files are valid JSON. Look at the raw files used by default for more information.
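
For example, a custom headers file would plausibly be a JSON array of header names (a sketch only, assuming the same shape as the default raw files; the entries below are illustrative):

```json
["HTTP_USER_AGENT", "HTTP_X_DEVICE_USER_AGENT"]
```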

## License

MIT License
