Predator is a prototype web application designed to demonstrate anti-crawling, anti-automation & bot detection techniques. It can be used as a honeypot, an anti-crawling system or a false positive test bed for vulnerability scanners.
Warning: I strongly discourage using the demonstrated methods on a production server without knowing exactly what they do. Remember, only the techniques that suit the web application should be implemented. Predator is a collection of techniques; its code shouldn't be used as is.
The mind map below is a loose visualization of how the techniques demonstrated here can be implemented in a production environment.
Note: The numbers and factors in "Observation Phase" can be used to assign a reputation to a client, which can then be used as a strong indicator of malicious activity once a threshold is hit.
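The reputation idea in the note above can be sketched as a running score per client. This is only a minimal illustration: the factor names, weights and threshold are hypothetical placeholders, not values taken from Predator.

```python
from collections import defaultdict

# Hypothetical weights: how much each observed factor adds to a client's score.
WEIGHTS = {
    "missing_user_agent": 3,
    "hidden_link_followed": 5,
    "no_css_requests": 2,
}
THRESHOLD = 6  # assumed cut-off; tune it for the application

# Running score per client, keyed by any stable identifier (an IP address here).
scores = defaultdict(int)


def observe(client_id: str, factor: str) -> bool:
    """Record one observation and report whether the client reached the threshold."""
    scores[client_id] += WEIGHTS[factor]
    return scores[client_id] >= THRESHOLD
```

A single weak signal does not flag a client here; only the accumulated score does, which keeps one odd request from triggering a ban.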
User-Agent and Header Inspection
HTTP headers sent by bots often appear in a different order than those of a real browser, or are missing altogether. Many bots disclose themselves in the User-Agent header for the sake of ethics, while others don't send one at all.
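A rough sketch of such an inspection is shown below. The bot pattern and the list of expected headers are assumptions chosen for illustration, and the header order check is left out because the reference order differs between browsers.

```python
import re

# Assumed substrings that self-identifying bots commonly put in User-Agent.
BOT_UA_PATTERN = re.compile(
    r"bot|crawler|spider|curl|wget|python-requests", re.IGNORECASE
)

# Headers a mainstream browser normally sends when loading a page (assumed list).
EXPECTED_HEADERS = ("host", "user-agent", "accept", "accept-language", "accept-encoding")


def inspect_headers(headers: list[tuple[str, str]]) -> list[str]:
    """Return the reasons a request looks automated (empty list if none)."""
    values = {name.lower(): value for name, value in headers}
    reasons = []

    user_agent = values.get("user-agent", "")
    if not user_agent:
        reasons.append("no User-Agent")
    elif BOT_UA_PATTERN.search(user_agent):
        reasons.append("self-declared bot")

    missing = [name for name in EXPECTED_HEADERS if name not in values]
    if missing:
        reasons.append("missing headers: " + ", ".join(missing))
    return reasons
```

The function returns reasons rather than a verdict so that each finding can feed a reputation score instead of causing an immediate block.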
Most of the HTML mutation techniques described here can be bypassed with browser-based frameworks such as puppeteer, but such frameworks can be detected with various tests, as implemented in isBot.js.
Most bots only request web pages and images, while resource files such as .css are often ignored because the HTTP implementation in use doesn't download them. Bots can be detected when the ratio of web page and image requests to such resource file requests exceeds a predefined threshold.
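The ratio check might look roughly like the following. The threshold, the minimum sample size and the set of resource extensions are assumed values, not ones used by Predator.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

RESOURCE_EXTENSIONS = {".css"}  # extend with other static resource types as needed
MAX_RATIO = 5.0    # assumed threshold for pages+images per resource file
MIN_REQUESTS = 10  # assumed: don't judge a client on too few requests


def looks_like_bot(paths: list[str]) -> bool:
    """Flag a client whose page/image requests far outnumber its resource requests."""
    if len(paths) < MIN_REQUESTS:
        return False
    resources = sum(
        1
        for path in paths
        if PurePosixPath(urlparse(path).path).suffix.lower() in RESOURCE_EXTENSIONS
    )
    pages = len(paths) - resources
    if resources == 0:
        # No resource file was ever fetched: the ratio is effectively infinite.
        return True
    return pages / resources > MAX_RATIO
```

Browser caching can lower the number of resource requests from real users, so a threshold like this is best treated as one signal among several.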
A lot of HTML parsers used in crawlers can't handle broken HTML as browsers do. For example, clicking the following link in a browser leads to
page_1 but affected parsers parse the latter value i.e.
This can be used to keep crawlers away or ban them without affecting the user experience.
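To illustrate the idea, the sketch below uses a duplicated href attribute as one possible form of broken HTML; this markup and the page_2 target are assumptions made for the example, not necessarily the link Predator serves. The HTML standard tells browsers to keep the first occurrence of a duplicated attribute, whereas a parser that collapses attributes into a dictionary ends up with the last one.

```python
from html.parser import HTMLParser

# Hypothetical trap link: a browser follows page_1, a naive parser may pick page_2.
TRAP = '<a href="page_1" href="page_2">next</a>'


class LinkExtractor(HTMLParser):
    """Collect link targets, keeping either the first or the last duplicate href."""

    def __init__(self, keep_first: bool):
        super().__init__()
        self.keep_first = keep_first
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs and still contains duplicates.
        hrefs = [value for name, value in attrs if name == "href"]
        if hrefs:
            self.links.append(hrefs[0] if self.keep_first else hrefs[-1])


def extract(html: str, keep_first: bool) -> list[str]:
    parser = LinkExtractor(keep_first)
    parser.feed(html)
    parser.close()
    return parser.links
```

With `keep_first=True` the extractor behaves like a browser and yields `page_1`; with `keep_first=False` it mimics an affected parser and yields `page_2`, so any request for that second page can be attributed to a crawler.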
Some links are hidden from users using CSS, but automated programs can still see them. These links can be used to detect bots and take a desired action such as banning the IP address.
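A minimal sketch of such a trap follows. The trap path, the inline style and the in-memory ban list are illustrative assumptions; a real deployment would hook this into its own routing and storage.

```python
HONEYPOT_PATH = "/do-not-follow"  # hypothetical trap URL, never shown to users
banned = set()  # in-memory ban list, for illustration only


def honeypot_link() -> str:
    """Markup to embed in pages: present in the HTML, hidden from users by CSS."""
    return f'<a href="{HONEYPOT_PATH}" style="display:none" rel="nofollow">archive</a>'


def handle_request(ip: str, path: str) -> int:
    """Return the HTTP status to send, banning any client that follows the trap."""
    if ip in banned:
        return 403
    if path == HONEYPOT_PATH:
        banned.add(ip)
        return 403
    return 200
```

Listing the trap path in robots.txt as disallowed is a common companion measure, so that well-behaved crawlers are not caught by it.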
When Predator suspects that the visitor is a bot, it generates a random number of random links that lead to a page containing more random links, and this process keeps repeating.
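The endless link generation can be sketched as below. The `/maze/` prefix, the slug length and the upper bound on links are made-up values for the example.

```python
import random
import string


def random_links(rng: random.Random, prefix: str = "/maze/", max_links: int = 10) -> list[str]:
    """Build a random number of anchors pointing at random paths under the prefix."""
    count = rng.randint(1, max_links)
    alphabet = string.ascii_lowercase + string.digits
    links = []
    for _ in range(count):
        slug = "".join(rng.choices(alphabet, k=12))
        links.append(f'<a href="{prefix}{slug}">{slug}</a>')
    return links


def maze_page(rng=None) -> str:
    """Page served for every path under the prefix, so each link leads to more links."""
    rng = rng or random.Random()
    return "<html><body>" + "<br>".join(random_links(rng)) + "</body></html>"
```

Because every generated path is answered with another `maze_page()`, a crawler that follows links keeps finding new URLs without ever reaching real content.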
Vulnerability scanners usually submit a payload and check whether the web app responds in a certain way. Predator can pretend to have a vulnerability by including the expected response, i.e. the signature, within the HTML.
Predator mimics the following vulnerabilities at the moment:
- SQL Injection
- Cross Site Scripting (XSS)
- Local File Inclusion (LFI)
This method makes it possible to set up a honeypot without actually hosting any vulnerable code and serves as a test bed for false positive testing.
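As a rough illustration of the signature approach, the sketch below covers only SQL injection and LFI. The payload checks and the page template are simplified assumptions; the signatures are a typical MySQL error message and a typical first line of /etc/passwd. Note that the submitted value itself is never echoed back or executed.

```python
# Canned responses that scanners commonly look for.
SQL_ERROR = (
    "You have an error in your SQL syntax; check the manual that "
    "corresponds to your MySQL server version"
)
PASSWD_LINE = "root:x:0:0:root:/root:/bin/bash"


def fake_signature(param_value: str) -> str:
    """Pick the signature matching the kind of payload seen (simplified checks)."""
    if "etc/passwd" in param_value:
        return PASSWD_LINE  # pretend a local file was included
    if "'" in param_value or '"' in param_value:
        return SQL_ERROR  # pretend the quote broke a SQL query
    return ""


def render(param_value: str) -> str:
    """Build the response page; only a fixed signature is inserted, never the input."""
    return f"<html><body><p>Results</p>{fake_signature(param_value)}</body></html>"
```

For example, `render("1'")` contains the SQL error text, while `render("hello")` returns the plain page.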
PatheticGeek did all the front-end magic to make Predator look good.