Predator

Anti-Automation System

Introduction

Predator is a prototype web application designed to demonstrate anti-crawling, anti-automation, and bot detection techniques. It can be used as a honeypot, an anti-crawling system, or a false-positive test bed for vulnerability scanners.

Warning: I strongly discourage using the demonstrated methods on a production server without understanding exactly what they do. Implement only the techniques that suit your web application. Predator is a collection of techniques; its code shouldn't be used as is.

The mind map below is a loose visualization of how the techniques demonstrated here can be implemented in a production environment.

[Image: workflow mind map]

Note: The numbers and factors in the "Observation Phase" can be used to assign a reputation score to each client, which then serves as a strong indicator of malicious activity once a threshold is hit.
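The reputation idea in the note can be sketched roughly as follows. This is a minimal illustration, not Predator's code: the factor names, the weights, and the threshold are all assumed values, since the README does not specify any.

```javascript
// Hypothetical weights per observed factor; the README gives no numbers.
const WEIGHTS = {
  missingUserAgent: 3,
  hiddenLinkHit: 5,
  skippedStylesheets: 2,
};
const BAN_THRESHOLD = 6; // assumed value

// Client identifier (for example an IP address) -> accumulated score.
const scores = new Map();

// Record one observation and report whether the client has reached the threshold.
function observe(clientId, factor) {
  const next = (scores.get(clientId) || 0) + (WEIGHTS[factor] || 0);
  scores.set(clientId, next);
  return { score: next, flagged: next >= BAN_THRESHOLD };
}
```

A single weak signal leaves the client unflagged; only the accumulated score across several observations crosses the threshold.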

Techniques Used

Bot Detection

User-Agent and Header Inspection

The HTTP headers sent by bots often arrive in a different order than a real browser's, or are missing altogether. Many bots identify themselves in the User-Agent header for the sake of ethics, while others don't send one at all.
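A header check along these lines might look like the sketch below. The marker list and the set of expected headers are illustrative assumptions, and the header-order comparison mentioned above is left out because it depends on which browsers you choose to fingerprint.

```javascript
// Substrings that self-identifying bots often put in User-Agent (illustrative list).
const BOT_MARKERS = ["bot", "crawler", "spider", "curl", "python-requests"];
// Headers a mainstream browser normally sends with a page request (assumed list).
const EXPECTED_HEADERS = ["accept", "accept-language", "accept-encoding"];

// Takes a plain object of request headers and returns the reasons it looks automated.
function inspectHeaders(headers) {
  const lower = {};
  for (const [name, value] of Object.entries(headers)) {
    lower[name.toLowerCase()] = String(value);
  }
  const reasons = [];
  const ua = lower["user-agent"];
  if (!ua) {
    reasons.push("no user-agent");
  } else if (BOT_MARKERS.some((m) => ua.toLowerCase().includes(m))) {
    reasons.push("self-identified bot");
  }
  for (const name of EXPECTED_HEADERS) {
    if (!(name in lower)) reasons.push(`missing ${name}`);
  }
  return reasons;
}
```

An empty result means none of these particular checks fired, not that the client is human.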

Webdriver Detection

Most of the HTML mutation techniques described here can be bypassed with browser-based frameworks such as Selenium and Puppeteer, but these frameworks can be detected with various tests, as implemented in isBot.js.
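The sketch below shows a few commonly cited client-side signals. It is not a copy of isBot.js, whose exact tests are not listed here; the function name and the choice of signals are assumptions made for illustration.

```javascript
// nav and win stand in for the browser's navigator and window objects,
// passed as arguments so the function can also be exercised outside a browser.
function webdriverSignals(nav, win) {
  const signals = [];
  // navigator.webdriver is set to true when the browser is driven through WebDriver.
  if (nav.webdriver === true) signals.push("navigator.webdriver");
  // Headless Chrome has historically advertised itself in the User-Agent string.
  if (/HeadlessChrome/.test(nav.userAgent || "")) signals.push("headless user-agent");
  // Older PhantomJS builds exposed these globals on window.
  if (win && (win.callPhantom || win._phantom)) signals.push("phantomjs globals");
  return signals;
}
```

In a page this would be called as `webdriverSignals(navigator, window)`. Each of these signals can be hidden by a determined operator, so they are better treated as inputs to a score than as proof.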

Resource Usage

Most bots only request web pages and images; resource files such as .css are often ignored because the HTTP implementation in use doesn't download them. A bot can be detected when the ratio of web page and image requests to such resource file requests rises above a predefined threshold.
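As a rough sketch of the ratio test, assuming the server keeps a list of the paths each client has requested: the threshold, the minimum sample size, and the function names are assumptions, and only .css files are counted as resources here.

```javascript
const RATIO_THRESHOLD = 10; // assumed: more than 10 pages/images per stylesheet is suspicious
const MIN_PAGES = 5;        // assumed: do not judge clients with very little traffic

// Treat stylesheet requests as "resource", everything else as "page".
function classifyRequest(path) {
  return /\.css(\?.*)?$/i.test(path) ? "resource" : "page";
}

// paths: every path one client has requested so far.
function looksLikeBot(paths) {
  let pages = 0;
  let resources = 0;
  for (const p of paths) {
    if (classifyRequest(p) === "resource") resources++;
    else pages++;
  }
  if (pages < MIN_PAGES) return false; // not enough data yet
  // Math.max avoids division by zero when no stylesheet was ever requested.
  return pages / Math.max(resources, 1) > RATIO_THRESHOLD;
}
```

Real browsers cache stylesheets, so the ratio also rises for long human sessions; the threshold needs tuning against real traffic before acting on it.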

Malformed HTML

Many HTML parsers used in crawlers can't handle broken HTML the way browsers do. For example, clicking the following link in a browser leads to page_1, but affected parsers take the latter value, i.e. page_2:

<a/href="page_1"/href="page_2">Click</a>

This can be used to keep off and ban crawlers without affecting the user experience.
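The divergence can be illustrated with a small model of the two behaviors. Browsers follow the HTML parsing rules, under which the first of two duplicate attributes wins; the "naive" function below mimics a parser that overwrites on duplicates. The function names are hypothetical and the regular expression is a simplification that only handles double-quoted values.

```javascript
// Collect every double-quoted href value in the tag, in source order.
function hrefValues(tag) {
  return [...tag.matchAll(/href="([^"]*)"/g)].map((m) => m[1]);
}

// Browser behavior: a duplicate attribute is dropped, so the first value is used.
function browserTarget(tag) {
  return hrefValues(tag)[0];
}

// Naive parser behavior: each duplicate overwrites the previous value.
function naiveTarget(tag) {
  const values = hrefValues(tag);
  return values[values.length - 1];
}

// A request counts as a trap hit when it goes where only a naive parser would go.
function isTrapHit(tag, requestedPath) {
  return requestedPath === naiveTarget(tag) && requestedPath !== browserTarget(tag);
}
```

With the link from the example, a request for page_2 is a trap hit while a request for page_1 is not.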

Invisible Links

Some links are hidden from users with CSS, but automated programs can still see them. These links can be used to detect bots and take a desired action, such as banning the IP address.
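A hidden-link trap could be wired up roughly like this. The trap path, the inline style, and the in-memory ban list are all assumptions for the sake of a short example; a real deployment would persist bans and might prefer a stylesheet class over an inline style.

```javascript
const TRAP_PATHS = new Set(["/internal-report"]); // hypothetical trap URL
const banned = new Set();                          // banned client IPs, in memory only

// Markup for a link that is present in the HTML but not rendered or focusable.
function hiddenLinkHtml(path) {
  return `<a href="${path}" style="display:none" tabindex="-1" aria-hidden="true">report</a>`;
}

// Returns the HTTP status to send for a request from the given IP.
function handleRequest(ip, path) {
  if (banned.has(ip)) return 403;
  if (TRAP_PATHS.has(path)) {
    banned.add(ip); // only automation should ever follow a hidden link
    return 403;
  }
  return 200;
}
```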

Bait Links

When Predator suspects that the visitor is a bot, it generates a random number of random links that point to a page (limbo.php) containing more random links, and the process keeps repeating.
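The generation step can be sketched as below. The `id` query parameter, the token length, and the link-count range are assumptions; only the limbo.php target comes from the description above.

```javascript
// Random lowercase alphanumeric token of the given length.
function randomToken(length) {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return out;
}

// Build between min and max (inclusive) bait links that all lead back to limbo.php.
function baitLinks(min = 5, max = 15) {
  const count = min + Math.floor(Math.random() * (max - min + 1));
  const links = [];
  for (let i = 0; i < count; i++) {
    const id = randomToken(8);
    links.push(`<a href="limbo.php?id=${id}">${id}</a>`);
  }
  return links;
}
```

Because limbo.php would emit a fresh set on every request, a crawler that follows every link never runs out of new URLs.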

Signature Reversing

Vulnerability scanners usually submit a payload and check whether the web app responds in a certain way. Predator can pretend to have a vulnerability by including the expected response, i.e. the signature, within the HTML.
Predator mimics the following vulnerabilities at the moment:

  • SQL Injection
  • Cross Site Scripting (XSS)
  • Local File Inclusion (LFI)

This method makes it possible to set up a honeypot without actually hosting any vulnerable code, and it also serves as a test bed for false-positive testing.
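A minimal sketch of the idea for two of the three cases follows. The probe patterns and the canned responses are assumptions chosen to resemble what scanners commonly look for, not Predator's actual signatures. XSS is left out because faking it by reflecting the payload would amount to hosting a real reflection.

```javascript
// Each entry pairs a rough payload pattern with the canned text a scanner tends to look for.
const SIGNATURES = [
  {
    name: "sqli",
    probe: /'|\bunion\b|\bor\s+1=1/i,
    response: "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version",
  },
  {
    name: "lfi",
    probe: /\.\.\/|etc\/passwd/i,
    response: "root:x:0:0:root:/root:/bin/bash",
  },
];

// Returns the fake signature to embed in the page, or null for ordinary input.
function fakeResponse(input) {
  for (const sig of SIGNATURES) {
    if (sig.probe.test(input)) return { name: sig.name, body: sig.response };
  }
  return null;
}
```

No query is run and no file is read; the server only returns static text that matches what the scanner expects to see.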

Credits

PatheticGeek did all the front-end magic to make Predator look good.
