Experimental web tracker for separating humans from other visitors

Human tracker/session viewer/visitor analytics

An experimental web tracking system to separate humans from bots. Currently tries to use these conditions:

  • Time spent on page.
  • Session tracking with site-local cookie.
  • Number of scroll events.
  • Number of mouse events.
  • Bot-specific JavaScript global variables.
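The last condition can be sketched as a small check for globals that headless browsers and automation tools tend to leave behind. This is a minimal illustration, not the code in human.js: the function name `detectBotGlobals` is hypothetical, and the property names are assumptions inferred from the `window_*` column names of the pages table.

```javascript
// Maps a pages-table column to the global property it presumably records.
// ASSUMPTION: property names are guessed from the column names.
const BOT_GLOBALS = {
    window_call_phantom: 'callPhantom',
    window_buffer: 'Buffer',
    window_emit: 'emit',
    window_spawn: 'spawn',
    window_webdriver: 'webdriver',
    window_dom_automation: 'domAutomation'
};

// Returns one 0/1 flag per column for the given window-like object.
function detectBotGlobals(win) {
    const flags = {};
    Object.keys(BOT_GLOBALS).forEach((column) => {
        const property = BOT_GLOBALS[column];
        flags[column] = typeof win[property] !== 'undefined' ? 1 : 0;
    });
    return flags;
}
```

In a browser this would be called as `detectBotGlobals(window)`. Taking the window as a parameter keeps the check easy to exercise outside a browser.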

The main motivation for this custom system is referral spam on Google Analytics and other large services. Referral spam makes them almost useless for smaller sites.

Data is dumped into MySQL. The gateway is written in Node.js. There is no analytics component at the moment, so analysis must be done manually by running MySQL queries.

Available data

Available data is documented by the database table schemas.

Pages table, one row per page view:

    `pid` CHAR(20) NOT NULL,
    `sid` CHAR(20) DEFAULT NULL,
    `ip` VARCHAR(255) NOT NULL,
    `location` VARCHAR(255) NOT NULL,
    `agent` VARCHAR(255) DEFAULT NULL,
    `referrer` VARCHAR(255) DEFAULT NULL,
    `platform` VARCHAR(255) DEFAULT NULL,
    `window_call_phantom` TINYINT(1) NOT NULL DEFAULT 0,
    `window_buffer` TINYINT(1) NOT NULL DEFAULT 0,
    `window_emit` TINYINT(1) NOT NULL DEFAULT 0,
    `window_spawn` TINYINT(1) NOT NULL DEFAULT 0,
    `window_webdriver` TINYINT(1) NOT NULL DEFAULT 0,
    `window_dom_automation` TINYINT(1) NOT NULL DEFAULT 0,
    `elapsed` INT NOT NULL DEFAULT 0,
    `scroll` INT NOT NULL DEFAULT 0,
    `mouse` INT NOT NULL DEFAULT 0,
    `added` BIGINT NOT NULL,
    PRIMARY KEY (`pid`)
  • pid - pageview id, used for referencing the times table.
  • sid - session-scoped random id, can be used for session tracking.
  • ip - may be any textual form of an IPv4 or IPv6 address. Not normalized.
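Since analysis is manual, one way to use a fetched pages row is a simple rule-based check. The sketch below is only an illustration: `isLikelyHuman` and both thresholds are hypothetical, `elapsed` is assumed to be in seconds, and a NULL `sid` is assumed to mean that no session cookie was stored.

```javascript
// Hypothetical thresholds; tune them against real traffic.
const MIN_ELAPSED_SECONDS = 5;
const MIN_INPUT_EVENTS = 1;

// The bot-marker columns of the pages table.
const BOT_FLAG_COLUMNS = [
    'window_call_phantom',
    'window_buffer',
    'window_emit',
    'window_spawn',
    'window_webdriver',
    'window_dom_automation'
];

// Returns true when a pages row looks like a human page view.
function isLikelyHuman(row) {
    // Any bot-specific global is treated as disqualifying.
    if (BOT_FLAG_COLUMNS.some((column) => row[column] === 1)) {
        return false;
    }
    // ASSUMPTION: a missing sid means the visitor kept no cookie.
    if (!row.sid) {
        return false;
    }
    if (row.elapsed < MIN_ELAPSED_SECONDS) {
        return false;
    }
    return row.scroll + row.mouse >= MIN_INPUT_EVENTS;
}
```

The rules are deliberately strict and independent of each other, so each one can be loosened or dropped after comparing its verdicts with known traffic.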

Sessions table, one row per session:

    `site_id` VARCHAR(255) NOT NULL,
    `sid` CHAR(20) NOT NULL,
    `started` BIGINT NOT NULL,
    `start_page` CHAR(20) NOT NULL,
    `total_elapsed` INT NOT NULL DEFAULT 0,
    `total_mouse` INT NOT NULL DEFAULT 0,
    `total_scroll` INT NOT NULL DEFAULT 0,
    `page_count` INT NOT NULL DEFAULT 1,
    PRIMARY KEY (`sid`)
  • site_id is the site/domain-specific identifier.
  • total_elapsed is the total session length in seconds.
  • total_scroll is the number of scroll events that occurred during the session.
  • total_mouse is the number of mousemove events that occurred during the session.
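The relationship between the two tables can be sketched by folding the page rows of one session into a row shaped like the sessions table. This is an illustration under assumptions, not the gateway's code: `summarizeSession` is a hypothetical name, `start_page` is assumed to hold the `pid` of the earliest page view, and `started` is assumed to equal that page's `added` timestamp.

```javascript
// Sums one numeric column over a list of rows.
function sumColumn(rows, column) {
    return rows.reduce((total, row) => total + row[column], 0);
}

// Builds a sessions-style row from the pages rows that share one sid.
function summarizeSession(siteId, pages) {
    if (pages.length === 0) {
        throw new Error('A session needs at least one page view.');
    }
    // Sort a copy so the caller's array is left untouched.
    const ordered = pages.slice().sort((a, b) => a.added - b.added);
    const first = ordered[0];
    return {
        site_id: siteId,
        sid: first.sid,
        started: first.added,     // ASSUMPTION: timestamp of the first page view.
        start_page: first.pid,    // ASSUMPTION: pid of the first page view.
        total_elapsed: sumColumn(ordered, 'elapsed'),
        total_mouse: sumColumn(ordered, 'mouse'),
        total_scroll: sumColumn(ordered, 'scroll'),
        page_count: ordered.length
    };
}
```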

App and proxy configuration

Copy config.example.json to config.json and modify the MySQL server configuration. Use sql/schema/schema.sql to create the initial schema. Use sql/schema/user.sql to create the database user (change password).

Setup overview:

Human tracker setup

Nginx proxy (tracked site)

For Nginx, in a server block:

location /human {
    proxy_pass http://ip:port/human;
    proxy_set_header X-Real-IP $remote_addr;
}

Where http://ip:port is the gateway server. Use http://127.0.0.1:30080 when the gateway runs on the same machine with the default port.
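With the `X-Real-IP` header set by the proxy, the gateway can recover the visitor's address instead of the proxy's. A minimal sketch of such a lookup follows; the helper name `clientIp` is hypothetical and the gateway's actual handling may differ.

```javascript
// Returns the visitor's IP for a Node.js HTTP request.
// Node.js exposes incoming header names in lower case.
function clientIp(req) {
    const forwarded = req.headers['x-real-ip'];
    if (typeof forwarded === 'string' && forwarded.length > 0) {
        return forwarded;
    }
    // Without the header this is the address of the direct peer,
    // which is the proxy itself when one is in front of the app.
    return req.socket.remoteAddress;
}
```

The header should only be trusted when the app is reachable solely through the proxy, because a client that connects directly can send any value it likes.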

The site also needs to load human.js or human.min.js (available in the browser directory). It is best served together with the rest of the site's static files. The script should be loaded in the footer of every page.

App config file

The app configuration file (config.json) has the following structure:

    {
        "port": Integer,
        "session": SessionConfiguration,
        "db": DatabaseConfiguration,
        "viewer": ViewerConfiguration,
        "sites": [String],
        "filters": FiltersConfiguration,
        "users": UsersConfiguration
    }
  • port - app's HTTP port.
  • sites - array of allowed site IDs.
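A quick sanity check of a parsed config.json can catch typos before the app starts. The sketch below only checks the shape of the keys listed above. `checkConfig` is a hypothetical helper, not part of the app, and the port range is an assumption.

```javascript
// Returns a list of human-readable problems; an empty list means
// the top-level structure looks right.
function checkConfig(config) {
    const problems = [];
    if (!Number.isInteger(config.port) || config.port < 1 || config.port > 65535) {
        problems.push('port must be an integer between 1 and 65535');
    }
    if (!config.session || typeof config.session.key !== 'string' || config.session.key === '') {
        problems.push('session.key must be a non-empty string');
    }
    if (!Array.isArray(config.sites) || !config.sites.every((id) => typeof id === 'string')) {
        problems.push('sites must be an array of site ID strings');
    }
    ['db', 'viewer', 'filters'].forEach((key) => {
        if (typeof config[key] !== 'object' || config[key] === null) {
            problems.push(key + ' must be an object');
        }
    });
    if (!Array.isArray(config.users)) {
        problems.push('users must be an array');
    }
    return problems;
}
```

It could be run as `checkConfig(JSON.parse(fs.readFileSync('config.json', 'utf8')))` with Node's `fs` module.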

Session configuration

{
    "key": String
}

The default session key should be changed right after installing the application.

Database configuration

{
    "host": "localhost",
    "user": "logging",
    "password": "logging123",
    "database": "logging"
}

Full information on the database configuration options can be found in the mysql package documentation. The application creates connections using a pool.

Viewer configuration

{
    "enabled": Boolean,
    "date_format": String
}
  • enabled - whether to enable the session viewer.
  • date_format - UI date format. Used through moment.js.

Filters configuration

{
    "site1_id": [String],
    "site2_id": [String]
}

Filters are treated as regular expressions matched against visitor IP addresses. When the /human endpoint is behind a reverse proxy, the appropriate header must be configured on the proxy to forward the visitor's actual IP to the app (see the Nginx proxy configuration above).

Example filter string: "^192". This excludes all requests from IP addresses starting with 192.
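The matching described above can be sketched as follows. `isFiltered` is a hypothetical helper shown only to illustrate the semantics; the app's own implementation may differ.

```javascript
// Returns true when the IP matches any filter configured for the site.
function isFiltered(filters, siteId, ip) {
    const hasSite = Object.prototype.hasOwnProperty.call(filters, siteId);
    const patterns = hasSite && Array.isArray(filters[siteId]) ? filters[siteId] : [];
    // Each filter string is compiled as a regular expression.
    return patterns.some((pattern) => new RegExp(pattern).test(ip));
}

// Example configuration matching the filter string above.
const exampleFilters = { site1_id: ['^192'] };
isFiltered(exampleFilters, 'site1_id', '192.168.1.10'); // true
isFiltered(exampleFilters, 'site1_id', '10.0.0.1');     // false
```

A pattern such as `^192\.168\.` is more precise than `^192`. In JSON the backslashes must be doubled, so the string is written as "^192\\.168\\.".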

Users configuration

[
    {
        "username": String,
        "salt": String,
        "hash": String,
        "sites": [String]
    }
]
  • sites - array of authorized site IDs.

Generating user password/salt hashes (SHA-1):

npm run password

and enter the password. The hash and salt values, along with the chosen username, have to be manually copied into the configuration file.
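For orientation, a salted SHA-1 hash can be produced with Node's built-in crypto module as sketched below. The way the salt and password are combined here (salt prepended to the password) and the salt length are assumptions. The script behind `npm run password` may do this differently, so values produced by this sketch should not be expected to work in config.json.

```javascript
const crypto = require('crypto');

// Generates a random salt as a hex string (16 bytes is an arbitrary choice).
function generateSalt() {
    return crypto.randomBytes(16).toString('hex');
}

// ASSUMPTION: the salt is prepended to the password before hashing.
function hashPassword(password, salt) {
    return crypto.createHash('sha1').update(salt + password, 'utf8').digest('hex');
}

// Checks a login attempt against a user entry with "salt" and "hash" fields.
function verifyPassword(password, user) {
    const expected = Buffer.from(user.hash, 'hex');
    const actual = Buffer.from(hashPassword(password, user.salt), 'hex');
    // timingSafeEqual requires buffers of equal length.
    return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}
```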

Sources of ideas

Debugging

Run as:

DEBUG=app node app.js

GDPR and cookie consent

Starting from version 1.2.0 it is required to set

window.privacyConsent = true;

for data to be sent to the server. It may be set dynamically while the page is open (for example, when visitor consent is requested with a popover).
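One way a script could honour a flag that is set later is to poll for it, as sketched below. This is an illustration only: `sendWhenConsented` and the one-second default interval are hypothetical, and human.js may handle consent differently.

```javascript
// Calls send() once win.privacyConsent becomes truthy.
// Returns the polling timer, or null when consent was already given.
function sendWhenConsented(win, send, intervalMs) {
    if (win.privacyConsent) {
        send();
        return null;
    }
    const timer = setInterval(() => {
        if (win.privacyConsent) {
            clearInterval(timer);
            send();
        }
    }, intervalMs || 1000);
    return timer;
}
```

A consent popover would then only need to run `window.privacyConsent = true;` when the visitor agrees.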

IP address storage

IP address storage can be disabled by setting "noip": true in the configuration file.

Changelog

  • 2018-07-25 version 1.3.0 - option to disable IP address collection.
  • 2018-07-13 version 1.2.0 - requires window.privacyConsent to be set to a truthy value.
  • 2018-01-20 version 1.1.0 - record duplicate page entries (GoogleBot has a deterministic Math.random function).
  • 2018-01-14 version 1.0.0 - session/page time calculation is fixed. Updated browser script.

Version 1.1.0 schema update

ALTER TABLE `pages` ADD COLUMN `duplicate` TINYINT(1) NOT NULL DEFAULT 0;

License

The MIT License. See the LICENSE file.