A framework for conducting large-scale web privacy studies.
git clone https://github.com/fpdetective/fpdetective.git
cd fpdetective
Then follow the instructions for running FPDetective in a virtual machine:
- Check the documentation
- Read the paper: FPDetective: Dusting the Web for Fingerprinters (CCS 2013)
- Visit the FPDetective website
- Instructions for using FPDetective with a VM
- Check out the recent binary releases
- Check out the FPDetective browser extensions
Below is a description of the parameters accepted by the agents.py module.
- --url_file: path to the file containing the list of URLs to crawl
- --stop: index in the URL file at which the crawl will stop
- --start (optional): index in the URL file at which the crawl will start
- --type: the agent type, which can be:
  - lazy: uses PhantomJS and visits homepages
  - clicker: uses PhantomJS and clicks a number of links
  - chrome_lazy: uses Chromium and visits homepages
  - chrome_clicker: uses Chromium and clicks a number of links
  - dnt: visits homepages with the DNT header set to 1
  - screenshot: visits homepages and takes a screenshot
- --max_proc: maximum number of processes that will run in parallel
- --fc_debug: boolean that sets the system environment variable used to log OS font requests
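As an illustration of how --start and --stop bound a crawl, here is a minimal Python sketch. The exact semantics (1-based indices, start exclusive / stop inclusive) and the Alexa CSV format ("rank,domain") are assumptions for this example, not taken from the agents.py source; `select_urls` is a hypothetical helper, not part of FPDetective.

```python
# Hypothetical sketch: how --start/--stop might select a slice of the URL file.
# Assumed: lines are in Alexa CSV format ("rank,domain"); the crawl visits
# rows whose 1-based index lies in (start, stop].

def select_urls(lines, start=0, stop=None):
    """Return the domains between the assumed start and stop indices."""
    urls = []
    for index, line in enumerate(lines, start=1):
        if index <= start:
            continue  # skip rows up to and including the start index
        if stop is not None and index > stop:
            break     # stop once the stop index is passed
        rank, domain = line.strip().split(",", 1)
        urls.append(domain)
    return urls

csv_lines = ["1,google.com", "2,facebook.com", "3,youtube.com", "4,yahoo.com"]
print(select_urls(csv_lines, stop=2))           # first two sites
print(select_urls(csv_lines, start=2, stop=4))  # resume from index 2
```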
You can use the following command to crawl the homepages of the Alexa top 100 sites with 10 browsers running in parallel:
Change to the FPDetective source directory (~/fpbase/src/crawler) and run:
python agents.py --url_file ~/fpbase/run/top-1m.csv --stop 100 --type lazy --max_proc 10
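Because --start is optional, a long crawl can be split into ranges or resumed after an interruption. The command below is a hypothetical example built from the parameters documented above; whether the start index is inclusive or exclusive is an assumption, so check the agents.py source before relying on exact boundaries.

```shell
# Hypothetical: continue the crawl over sites 101-200 of the same URL file
python agents.py --url_file ~/fpbase/run/top-1m.csv --start 100 --stop 200 --type lazy --max_proc 10
```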
Once the crawl is finished, you can check the log in run/logs/latest or connect to the database using phpMyAdmin (the password for the root user is fpdetective).
You can use the following patches to build modified Chromium and PhantomJS browsers from source. Please consult the instructions for further explanation.