Improve bot detection regex #21

Merged
merged 7 commits into from Sep 24, 2012

lencioni (Contributor) commented Aug 5, 2012

This pull request improves the bot detection regex by:

  • adding a number of missing strings, such as alexa, facebookexternalhit, feedburner, google web preview, nagios, postrank, pingdom, slurp, yahoo!, and yandex
  • removing redundant strings such as googlebot and robot (redundant because of the inclusion of a general "bot" string)
  • consolidating similar strings ("crawler" and "crawling" became "crawl(er|ing)")
  • alphabetizing the list of bots
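
The changes listed above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the project's actual code: the token list contains only the strings named in this description, and the names `BOT_TOKENS`, `BOT_REGEX`, and `is_bot` are hypothetical.

```python
import re

# Hypothetical token list: only the strings named in this pull request,
# kept in alphabetical order. The project's real list is likely longer.
BOT_TOKENS = [
    r"alexa",
    r"bot",            # general string; also covers "googlebot" and "robot"
    r"crawl(er|ing)",  # consolidates "crawler" and "crawling"
    r"facebookexternalhit",
    r"feedburner",
    r"google web preview",
    r"nagios",
    r"pingdom",
    r"postrank",
    r"slurp",
    r"yahoo!",
    r"yandex",
]

# Join the alternatives into a single case-insensitive pattern.
BOT_REGEX = re.compile("|".join(BOT_TOKENS), re.IGNORECASE)


def is_bot(user_agent: str) -> bool:
    """Return True if any token appears anywhere in the user-agent string."""
    return BOT_REGEX.search(user_agent) is not None
```

Because the pattern is searched as a substring match, the general "bot" alternative already matches any user agent containing "googlebot" or "robot", which is why listing those separately adds nothing.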

This commit improves the bot detection regex by:

  - adding a number of missing strings, such as alexa, facebookexternalhit, feedburner, nagios, postrank, pingdom, slurp, and yahoo!
  - removing redundant strings such as googlebot and robot (redundant because of the inclusion of a general "bot" string)
  - consolidating similar strings ("crawler" and "crawling" became "crawl(er|ing)")
  - alphabetizing the list of bots

To generate previews on the fly, Google uses the user-agent "Google Web Preview" (the fully-qualified user-agent you see in your server logs may change from time to time) to render images on demand. This commit adds "google web preview" to the bot detection regex.

This commit adds "yandex" to the list of bots, to detect the popular Russian search engine.
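
One way to sanity-check these two additions is to run a few sample user-agent strings through the pattern. The sketch below is Python for illustration only: the pattern `ADDED` is a reduced, hypothetical one holding just the two new strings, and the sample user agents are made up for the example rather than taken from real server logs.

```python
import re

# Reduced, hypothetical pattern containing only the two strings added by
# these commits; the project's actual regex has more alternatives.
ADDED = re.compile(r"google web preview|yandex", re.IGNORECASE)

# Made-up sample user agents mapped to the expected result (assumptions).
samples = {
    "Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; "
    "Google Web Preview) Version/3.1 Safari/525.13": True,
    "Mozilla/5.0 (compatible; YandexImages/3.0)": True,
    "Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0": False,
}

# Record whether each sample is flagged by the reduced pattern.
results = {ua: bool(ADDED.search(ua)) for ua in samples}
for ua, flagged in results.items():
    print(f"{flagged!s:5}  {ua}")
```

Neither of the first two sample strings contains "bot", so a general "bot" alternative alone would not catch them; that is the kind of gap these explicit entries are meant to close.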

lencioni (Contributor, Author) commented Aug 6, 2012

It looks like this may address issues #14 and #18.

ghost commented Aug 6, 2012

This would be helpful to us at Causes.

lencioni and others added 4 commits August 6, 2012 15:33
@twobitlabs

+1 for this!

@kevinelliott (Owner)

I'll get this tested and merged in very soon!

On Sep 14, 2012, at 4:03 PM, Two Bit Labs notifications@github.com wrote:

+1 for this!

kevinelliott added a commit that referenced this pull request Sep 24, 2012
Improve bot detection regex
@kevinelliott kevinelliott merged commit fcc453f into kevinelliott:master Sep 24, 2012