CrawlerDetect


About

CrawlerDetect is a Ruby port of the PHP library JayBizzle/CrawlerDetect.

It detects bots/crawlers/spiders via the user agent and other HTTP headers, and can currently identify thousands of them.

Why CrawlerDetect?

Comparing with other popular bot-detection gems:

|                                               | CrawlerDetect | Voight-Kampff | Browser |
| --------------------------------------------- | ------------- | ------------- | ------- |
| Number of bot patterns                        | >1000         | ~280          | ~280    |
| Number of checked HTTP headers                | 10            | 1             | 1       |
| Number of bot-list updates (1st half of 2018) | 14            | 1             | 7       |

In order to remain up-to-date, this gem does not accept crawler-data updates directly; any PRs that edit the crawler data should be submitted to the original JayBizzle/CrawlerDetect project.

Installation

Add this line to your application's Gemfile:

gem 'crawler_detect'

Basic Usage

CrawlerDetect.is_crawler?("Bot user agent")
# => true

Or, if you need the crawler name:

detector = CrawlerDetect.new("Googlebot/2.1 (http://www.google.com/bot.html)")
detector.is_crawler?
# => true
detector.crawler_name
# => "Googlebot"

Rack::Request extension

Optionally, you can add extra crawler-detection methods to the request object:

request.is_crawler?
# => false
request.crawler_name
# => nil

It's more flexible to use request.is_crawler? rather than CrawlerDetect.is_crawler?, because it automatically checks 10 HTTP headers, not only HTTP_USER_AGENT.
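The idea of checking several headers against a pattern list can be illustrated with a small, self-contained sketch. This is not the gem's actual implementation; the header names and patterns below are examples only:

```ruby
# Illustrative sketch of multi-header crawler detection.
# NOT the gem's implementation: header list and patterns are examples only.
CHECKED_HEADERS = %w[
  HTTP_USER_AGENT
  HTTP_X_OPERAMINI_PHONE_UA
  HTTP_FROM
  HTTP_VIA
].freeze

# A tiny pattern list; the real gem matches thousands of patterns.
BOT_PATTERN = Regexp.union(/googlebot/i, /bingbot/i, /crawler/i, /spider/i)

def crawler?(env)
  # A request counts as a crawler if any of the checked headers matches.
  CHECKED_HEADERS.any? do |header|
    value = env[header]
    value && value.match?(BOT_PATTERN)
  end
end

env = { "HTTP_USER_AGENT" => "Googlebot/2.1 (+http://www.google.com/bot.html)" }
crawler?(env) # => true
```

The real gem applies the same principle, but with a much larger, regularly updated pattern set.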

The only thing you have to do is configure the Rack::CrawlerDetect middleware:

Rails

class Application < Rails::Application
  # ...
  config.middleware.use Rack::CrawlerDetect
end

Rack

use Rack::CrawlerDetect
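For a plain Rack app, the middleware goes in config.ru. A minimal sketch (the inner app is just a placeholder):

```ruby
# config.ru -- minimal sketch; the inner app is a placeholder.
require "crawler_detect"

use Rack::CrawlerDetect

run lambda { |env|
  request = Rack::Request.new(env)
  body = request.is_crawler? ? "crawler: #{request.crawler_name}" : "human"
  [200, { "Content-Type" => "text/plain" }, [body]]
}
```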

Configuration

In some cases you may want to use your own white-list, black-list, or list of HTTP headers for detecting the user agent.

This is possible via CrawlerDetect::Config. For example, you may have an initializer like this:

CrawlerDetect.setup! do |config|
  config.raw_headers_path    = File.expand_path("crawlers/MyHeaders.json", __dir__)
  config.raw_crawlers_path   = File.expand_path("crawlers/MyCrawlers.json", __dir__)
  config.raw_exclusions_path = File.expand_path("crawlers/MyExclusions.json", __dir__)
end

Make sure that your files are valid JSON. Look at the raw files used by default for more information.

License

MIT License