Add some bots found in logs (#5518)
* Dataprovider Bot

* sixy.ch

* SafeDNS Bot

* Netcraft extension

* MetaJobBot

* remove quotes in tests

* Qwantify Bot from Qwant
HostedDinner authored and sgiehl committed Aug 3, 2016
1 parent 04fddf6 commit 720f7e0
Showing 2 changed files with 102 additions and 8 deletions.
68 changes: 61 additions & 7 deletions Tests/fixtures/bots.yml
@@ -408,6 +408,15 @@
     producer:
       name: Cốc Cốc
       url: http://coccoc.com/
+-
+  user_agent: Mozilla/5.0 (compatible; Dataprovider/6.92; +https://www.dataprovider.com/)
+  bot:
+    name: Dataprovider
+    category: Crawler
+    url:
+    producer:
+      name: Dataprovider B.V.
+      url: https://www.dataprovider.com/
 -
   user_agent: Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server;) Daumoa/4.0;
   bot:
@@ -1135,6 +1144,15 @@
     producer:
       name: Meanpath
       url: http://www.meanpath.com
+-
+  user_agent: Mozilla/5.0 (compatible; MetaJobBot; http://www.metajob.de/crawler)
+  bot:
+    name: MetaJobBot
+    category: Crawler
+    url: http://www.metajob.at/the/crawler
+    producer:
+      name: MetaJob
+      url: http://www.metajob.at
 -
   user_agent: Mozilla/5.0 (compatible; MixrankBot; crawler@mixrank.com)
   bot:
@@ -1193,6 +1211,15 @@
     producer:
       name: Netcraft
       url: http://www.netcraft.com
+-
+  user_agent: Netcraft SSL Server Survey - contact info@netcraft.com
+  bot:
+    name: Netcraft Survey Bot
+    category: Search bot
+    url:
+    producer:
+      name: Netcraft
+      url: http://www.netcraft.com
 -
   user_agent: Netvibes (http://www.netvibes.com)
   bot:
@@ -1517,21 +1544,30 @@
     producer:
       name: QueryEye Inc.
       url: http://queryeye.com
--
-  user_agent: ROI Hunter; https://api-dev.roihunter.com
+-
+  user_agent: Mozilla/5.0 (compatible; Qwantify/2.2w; +https://www.qwant.com/)/*
   bot:
-    name: ROI Hunter
+    name: Qwantify
     category: Crawler
-    url:
+    url: https://www.qwant.com/
     producer:
-      name: Roihunter a.s.
-      url: http://roihunter.com/
--
+      name: Qwant Corporation
+      url: https://www.qwant.com/
+-
   user_agent: Rainmeter WebParser plugin
   bot:
     name: Rainmeter
     category: Crawler
     url: https://www.rainmeter.net
+-
+  user_agent: ROI Hunter; https://api-dev.roihunter.com
+  bot:
+    name: ROI Hunter
+    category: Crawler
+    url:
+    producer:
+      name: Roihunter a.s.
+      url: http://roihunter.com/
 -
   user_agent: Mozilla/5.0 (compatible; redditbot/1.0; +http://www.reddit.com/feedback)
   bot:
@@ -1586,6 +1622,15 @@
     producer:
       name: SISTRIX GmbH
       url: http://www.sistrix.de
+-
+  user_agent: SafeDNSBot (https://www.safedns.com/searchbot)
+  bot:
+    name: SafeDNSBot
+    category: Crawler
+    url: https://www.safedns.com/searchbot
+    producer:
+      name: SafeDNS, Inc.
+      url: https://www.safedns.com/
 -
   user_agent: Scrapy/1.0.3.post6+g2d688cd (+http://scrapy.org)
   bot:
@@ -1722,6 +1767,15 @@
     producer:
       name: Site24x7
       url: https://www.site24x7.com
+-
+  user_agent: sixy.ch/1.0
+  bot:
+    name: Sixy.ch
+    category: Site Monitor
+    url: http://sixy.ch
+    producer:
+      name: Manuel Kasper
+      url: https://neon1.net/
 -
   user_agent: Mozilla/5.0 (Windows NT 6.1; WOW64) SkypeUriPreview Preview/0.5
   bot:
42 changes: 41 additions & 1 deletion regexes/bots.yml
@@ -245,6 +245,14 @@
     name: ''
     url: ''

+- regex: 'Dataprovider'
+  name: 'Dataprovider'
+  category: 'Crawler'
+  url: ''
+  producer:
+    name: 'Dataprovider B.V.'
+    url: 'https://www.dataprovider.com/'
+
 - regex: 'Daum(oa)?[ /][0-9]'
   name: 'Daum'
   category: 'Search bot'
@@ -540,6 +548,14 @@
     name: 'Meanpath'
     url: 'http://www.meanpath.com'

+- regex: 'MetaJobBot'
+  name: 'MetaJobBot'
+  category: 'Crawler'
+  url: 'http://www.metajob.at/the/crawler'
+  producer:
+    name: 'MetaJob'
+    url: 'http://www.metajob.at'
+
 - regex: 'MixrankBot'
   name: 'Mixrank Bot'
   category: 'Crawler'
@@ -572,7 +588,7 @@
     name: 'Jaroslav Kuboš'
     url: ''

-- regex: 'Netcraft Web Server Survey'
+- regex: 'Netcraft (Web Server Survey|SSL Server Survey)'
   name: 'Netcraft Survey Bot'
   category: 'Search bot'
   url: ''
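
Review note on the hunk above: the Netcraft pattern is widened with an alternation so that the existing "Netcraft Survey Bot" entry also covers the SSL survey user agent added to the fixtures. The sketch below illustrates the effect with Python's `re` module. It assumes a plain case-insensitive search; the project's own matcher may wrap patterns differently. `WEB_UA` is a hypothetical sample string, not taken from the fixtures.

```python
import re

# Patterns copied from the removed and added lines of the hunk.
OLD_PATTERN = r"Netcraft Web Server Survey"
NEW_PATTERN = r"Netcraft (Web Server Survey|SSL Server Survey)"

# User agent from the new fixture in Tests/fixtures/bots.yml.
SSL_UA = "Netcraft SSL Server Survey - contact info@netcraft.com"
# Hypothetical sample for the pre-existing case (assumption, not from the source).
WEB_UA = "Netcraft Web Server Survey"


def matches(pattern: str, user_agent: str) -> bool:
    """Return True if the pattern occurs anywhere in the user agent."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


print(matches(OLD_PATTERN, SSL_UA))  # False: the old pattern misses the SSL survey
print(matches(NEW_PATTERN, SSL_UA))  # True
print(matches(NEW_PATTERN, WEB_UA))  # True: the original case still matches
```
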
@@ -692,6 +708,14 @@
     name: 'QueryEye Inc.'
     url: 'http://queryeye.com'

+- regex: 'Qwantify'
+  name: 'Qwantify'
+  category: 'Crawler'
+  url: 'https://www.qwant.com/'
+  producer:
+    name: 'Qwant Corporation'
+    url: 'https://www.qwant.com/'
+
 - regex: 'Rainmeter'
   name: 'Rainmeter'
   category: 'Crawler'
@@ -721,6 +745,14 @@
     name: 'Roihunter a.s.'
     url: 'http://roihunter.com/'

+- regex: 'SafeDNSBot'
+  name: 'SafeDNSBot'
+  category: 'Crawler'
+  url: 'https://www.safedns.com/searchbot'
+  producer:
+    name: 'SafeDNS, Inc.'
+    url: 'https://www.safedns.com/'
+
 - regex: 'Scrapy'
   name: 'Scrapy'
   category: 'Crawler'
@@ -814,6 +846,14 @@
     name: 'SISTRIX GmbH'
     url: 'http://www.sistrix.de'

+- regex: 'sixy.ch'
+  name: 'Sixy.ch'
+  category: 'Site Monitor'
+  url: 'http://sixy.ch'
+  producer:
+    name: 'Manuel Kasper'
+    url: 'https://neon1.net/'
+
 - regex: 'Slackbot|Slack-ImgProxy'
   name: 'Slackbot'
   category: 'Crawler'
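
Review note: a rough way to sanity-check this commit is to run each new fixture user agent against the patterns added in regexes/bots.yml. The sketch below does that under two assumptions: rules are tried in file order with a case-insensitive search, and only the six rules touched by this commit are loaded. The real file holds many more rules that could match earlier, so this is not a substitute for the project's test suite. `detect_bot` and the `sixy-ch/1.0` string are hypothetical names introduced for illustration.

```python
import re

# (regex, bot name) pairs from this commit, in the order they appear in the diff.
RULES = [
    ("Dataprovider", "Dataprovider"),
    ("MetaJobBot", "MetaJobBot"),
    ("Netcraft (Web Server Survey|SSL Server Survey)", "Netcraft Survey Bot"),
    ("Qwantify", "Qwantify"),
    ("SafeDNSBot", "SafeDNSBot"),
    ("sixy.ch", "Sixy.ch"),
]

# user_agent -> expected bot name, copied from the new fixtures.
FIXTURES = {
    "Mozilla/5.0 (compatible; Dataprovider/6.92; +https://www.dataprovider.com/)": "Dataprovider",
    "Mozilla/5.0 (compatible; MetaJobBot; http://www.metajob.de/crawler)": "MetaJobBot",
    "Netcraft SSL Server Survey - contact info@netcraft.com": "Netcraft Survey Bot",
    "Mozilla/5.0 (compatible; Qwantify/2.2w; +https://www.qwant.com/)/*": "Qwantify",
    "SafeDNSBot (https://www.safedns.com/searchbot)": "SafeDNSBot",
    "sixy.ch/1.0": "Sixy.ch",
}


def detect_bot(user_agent):
    """Return the name of the first rule whose regex matches, else None."""
    for pattern, name in RULES:
        if re.search(pattern, user_agent, re.IGNORECASE):
            return name
    return None


for user_agent, expected in FIXTURES.items():
    status = "ok" if detect_bot(user_agent) == expected else "MISMATCH"
    print(f"{status}: {expected}")

# The dot in 'sixy.ch' is unescaped, so it matches any single character.
print(bool(re.search("sixy.ch", "sixy-ch/1.0")))    # True
print(bool(re.search(r"sixy\.ch", "sixy-ch/1.0")))  # False
```

The last two lines show that the `sixy.ch` pattern is slightly looser than a literal match; escaping the dot as `sixy\.ch` would tighten it, though the looser form is unlikely to matter for real traffic.
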
