


WhoTracks.Me

Bringing Transparency to Online Tracking

Transparency · Privacy · Tracking landscape · Built by Ghostery


This repository contains:

  • data on trackers and websites as shown on whotracks.me (WTM)
  • database mapping tracker domains to companies
  • code to render the whotracks.me site

⚠️ Upcoming changes

We are in the process of migrating the website to ghostery.com/whotracksme.

The following documentation has not been updated yet and still points to the old website. You can already try out the new website at ghostery.com/whotracksme. If you have any feedback about missing functionality, or if you spot inconsistencies, feel free to create a GitHub issue.

Monthly data dumps will not be affected by these changes. The licensing also remains unchanged.

For more information, see #367

For now, if you came from the Ghostery website and have questions about the data itself (e.g. how to download it, or where to find technical documentation of the fields), the best place to start is:

https://github.com/whotracksme/whotracks.me/blob/master/whotracksme/data/Readme.md

If it does not answer your questions, feel free to create a GitHub issue. It helps us improve the missing parts of the documentation.


Installation

Python 3.11 is needed to build the site. We recommend creating a virtualenv to install the dependencies (or using pipenv or a similar tool):

python -m venv venv
. venv/bin/activate

For nushell:

python -m virtualenv venv
overlay use venv/bin/activate.nu

After the initial setup, you can proceed with installing whotracks.me.
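
As a quick sanity check before installing, the snippet below reports whether the running interpreter meets the Python 3.11 requirement and whether a virtual environment is active. It is a minimal sketch using only the standard library: `check_environment` is a hypothetical helper, not part of whotracksme, and treating newer Python versions as acceptable is an assumption. Detection relies on `sys.prefix` differing from `sys.base_prefix`, which holds for the standard venv module and recent virtualenv versions.

```python
import sys


def check_environment(min_version=(3, 11)):
    """Return (version_ok, in_venv) for the running interpreter.

    Hypothetical helper. Assumption: versions newer than 3.11 are also fine.
    """
    version_ok = sys.version_info[:2] >= min_version
    # Inside a venv/virtualenv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    in_venv = sys.prefix != sys.base_prefix
    return version_ok, in_venv


if __name__ == "__main__":
    version_ok, in_venv = check_environment()
    print(f"Python {sys.version.split()[0]}: {'OK' if version_ok else 'too old'}")
    print("virtual environment active" if in_venv else "no virtual environment active")
```

Run it with the same `python` you will use for the installation; if it reports no active virtual environment, activate `venv` first.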

With Pip

$ python -m pip install git+https://github.com/ghostery/whotracks.me.git

From source

After cloning the repository:

$ python -m pip install -r requirements.txt
$ python -m pip install -e .

That’s all you need to get started!

Downloading the data

Each month, we release a new version of the website. The raw data from which the graphs are computed is also available as an open data set, updated monthly.

The data for each month can also be accessed directly through the website.

More information on the raw data can be found in whotracksme/data/Readme.md.

Using the data

To get started with the data, everything you need can be found in whotracksme.data:

from whotracksme.data.loader import DataSource

data = DataSource()

# available entities
data.trackers
data.companies
data.sites

A whitepaper for whotracks.me is available at https://arxiv.org/abs/1804.08959, and here's a BibTeX entry that you can use to cite it in a publication:

@misc{whotracksme,
    title={WhoTracks.Me: Shedding light on the opaque world of online tracking},
    author={Arjaldo Karaj and Sam Macbeth and Rémi Berson and Josep M. Pujol},
    year={2018},
    eprint={1804.08959},
    archivePrefix={arXiv},
    primaryClass={cs.CY}
}

Building the site

Building the site requires a few extra dependencies, which are not installed by default to keep the installation lightweight. You will need to install whotracksme from the repository, because not all assets are packaged in the whotracksme release on PyPI:

$ python -m pip install -r requirements-dev.txt
$ python -m pip install -e '.[dev]'

Once this is done, you will have access to a whotracksme entry point that can be used this way:

$ whotracksme website [serve]

The serve argument is optional; it is useful while making changes to the website.

All generated artifacts can be found in the _site/ folder.
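
If you only want to look at artifacts that were already generated in _site/, without rebuilding, any static file server will do. A minimal sketch using the standard library's http.server follows; the helper name `make_preview_server` and the default port 8000 are illustrative choices, not part of whotracksme.

```python
import functools
import http.server


def make_preview_server(directory="_site", port=8000):
    """Build a local static file server for already generated artifacts.

    Hypothetical helper; binds to localhost only, for previewing.
    """
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+),
    # so the folder is served without changing the working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

From the repository root, call `make_preview_server().serve_forever()` and open http://127.0.0.1:8000/ (stop it with Ctrl+C); the one-liner `python -m http.server --directory _site 8000` is equivalent. For iterative changes, `whotracksme website serve` as described above is the better fit.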

When debugging the website generator, you can disable parallel execution by setting the environment variable DEBUG=1.
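
To illustrate what such a switch typically looks like, here is a hypothetical sketch (not the actual whotracksme generator code): pages are rendered in a worker pool unless DEBUG=1 is set, in which case they are rendered serially so that exceptions surface with plain tracebacks and a debugger such as pdb can step through the code. `render_page` and `render_all` are made-up names, and a thread pool is used only to keep the example self-contained.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def render_page(name):
    # Hypothetical stand-in for rendering one page of the site.
    return f"<html><title>{name}</title></html>"


def render_all(names):
    """Render pages in a pool, or serially when DEBUG=1 (hypothetical)."""
    if os.environ.get("DEBUG") == "1":
        # Serial path: errors are raised directly from the loop, which keeps
        # tracebacks simple and makes breakpoints behave predictably.
        return [render_page(name) for name in names]
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in input order, like the serial path.
        return list(pool.map(render_page, names))
```

Both paths return the same results in the same order, so toggling DEBUG changes only how the work is scheduled, not the output.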

Tests

To run the tests, you will need pytest. Alternatively, install whotracksme with the dev extra:

$ python -m pip install -e '.[dev]'
$ pytest

Contributing

We are happy to take contributions on:

  • Guest articles for our blog on the topics of tracking, privacy and security. Feel free to use the data in this repository if you need inspiration.
  • Feature requests that are feasible with the WTM database.
  • Curating our database of tracker profiles. Open an issue if you spot anything odd.

Right to Amend

Please read our guidelines for third parties who want to suggest corrections to their data.

License

The content of this project itself is licensed under the Creative Commons Attribution 4.0 license, and the underlying source code used to generate and display that content is licensed under the MIT license.