Cookies That Give You Away: The Surveillance Implications of Web Tracking
This is the public code release for our WWW 2015 paper. You should also check
out the paper and the data.
The measurements were taken on three Amazon EC2 instances using OpenWPM
v0.1, which is included in this repo.
run_crawl.py - Runs a specific crawl. Settings should be changed here
for each configuration. Only a single configuration from the paper is
Run after the crawl, on the same instance.
This will do DNS lookups for each unique hostname seen during the
crawl and run a traceroute to each.
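
The lookup half of this step can be sketched as follows. This is only an illustration under stated assumptions: the helper names `unique_hostnames` and `resolve_all` are hypothetical, the input is assumed to be a list of request URLs, and the traceroute step is omitted entirely. The resolver is passed in as a parameter so the logic can be exercised without network access.

```python
import socket
from urllib.parse import urlsplit


def unique_hostnames(urls):
    """Return the sorted set of hostnames found in a list of URLs."""
    hosts = {urlsplit(url).hostname for url in urls}
    # urlsplit yields None for strings without a network location.
    return sorted(host for host in hosts if host)


def resolve_all(hostnames, resolver=socket.gethostbyname):
    """Map each hostname to an IPv4 address, or None if the lookup fails."""
    results = {}
    for host in hostnames:
        try:
            results[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results
```

Deduplicating before resolving keeps the number of lookups proportional to the number of distinct hosts rather than the number of requests.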
make_profiles.py - Create Alexa profiles by randomly subsampling the
respective top Alexa sites from
make_full_list.py - Creates union_of_sites.txt, a list of sites to
feed into synchronized crawls for ID detection.
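
As a rough illustration of the union step, the sketch below merges several per-profile site lists into one deduplicated, sorted list and writes it one site per line. The function names, the in-memory profile format (a list of site lists), and the output layout are assumptions made for this example; make_full_list.py may work differently.

```python
from pathlib import Path


def union_of_sites(profiles):
    """Return the sorted union of sites across all profiles."""
    sites = set()
    for profile in profiles:
        # Light normalization so trivially different spellings collapse.
        sites.update(site.strip().lower() for site in profile if site.strip())
    return sorted(sites)


def write_union_of_sites(profiles, path="union_of_sites.txt"):
    """Write the union to `path`, one site per line, and return it."""
    sites = union_of_sites(profiles)
    Path(path).write_text("".join(site + "\n" for site in sites))
    return sites
```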
profiles - Contains the 25 AOL profiles used in the paper, as well as
three Alexa models as pickled Python objects.
automation - OpenWPM v0.1
Will extract ID cookies using two SQLite databases created through
a synchronized crawl, as described in Section 4.5 of the paper.
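
To make the idea of comparing two synchronized crawls concrete, here is a minimal sketch. It assumes a hypothetical table `cookies(domain, name, value)` in each database and illustrative thresholds (`min_len`, `max_similarity`). The real schema and the heuristics of Section 4.5 are defined by the script and the paper, not by this example. A cookie is flagged as a candidate ID when both crawls saw it with values of equal length that are nevertheless sufficiently different.

```python
import sqlite3
from difflib import SequenceMatcher


def load_cookies(conn):
    """Map (domain, name) -> value from a hypothetical `cookies` table."""
    rows = conn.execute("SELECT domain, name, value FROM cookies")
    return {(domain, name): value for domain, name, value in rows}


def find_id_cookies(conn_a, conn_b, min_len=8, max_similarity=0.5):
    """Return (domain, name) keys whose values look user-specific."""
    a, b = load_cookies(conn_a), load_cookies(conn_b)
    candidates = []
    for key in sorted(a.keys() & b.keys()):
        value_a, value_b = a[key], b[key]
        # Short or variable-length values are unlikely to be identifiers.
        if len(value_a) < min_len or len(value_a) != len(value_b):
            continue
        # Values shared across crawls (e.g. a language setting) score near 1.
        if SequenceMatcher(None, value_a, value_b).ratio() <= max_similarity:
            candidates.append(key)
    return candidates
```

Typical usage would open both crawl databases with `sqlite3.connect(...)` and pass the two connections to `find_id_cookies`.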
create_graph.py - Builds the cookie linking graph based on parameters set in
generate_samples(), as described in Section 4.6 of the paper.
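
One way to picture a linking graph is as a clustering of page visits that share identifiers. The sketch below is not the algorithm in create_graph.py. It is a generic connected-components pass over hypothetical `(visit, cookie_id)` pairs, where two visits are linked if they share at least one ID cookie, directly or through a chain of other visits.

```python
from collections import defaultdict


def link_visits(observations):
    """Cluster visits transitively linked by shared ID cookies.

    `observations` is an iterable of (visit, cookie_id) pairs.
    Returns a list of sets of visits, largest cluster first.
    """
    visit_to_cookies = defaultdict(set)
    cookie_to_visits = defaultdict(set)
    for visit, cookie_id in observations:
        visit_to_cookies[visit].add(cookie_id)
        cookie_to_visits[cookie_id].add(visit)

    seen, clusters = set(), []
    for start in visit_to_cookies:
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], set()
        # Depth-first traversal: visit -> its cookies -> other visits.
        while stack:
            visit = stack.pop()
            cluster.add(visit)
            for cookie_id in visit_to_cookies[visit]:
                for neighbor in cookie_to_visits[cookie_id]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)
```

The size of the largest cluster relative to the total number of visits gives a simple measure of how much of a browsing history can be tied together.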
haversine.py - Adds several columns to the crawl
databases, including the geocheck described in Section 4.4 of the paper.
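
The filename suggests a great-circle distance computation. For reference, the standard haversine formula can be sketched as below. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions of this example, and how the script uses the distance in its geocheck is not shown here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```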
Cookie.py - Parses HTTP Request/Response
headers to pull out cookies. Integrated into the more recent releases of
Cookie.py is included in the Python standard library, but its
parsing rules are nowhere near what is used in practice. The version here is
heavily modified. I recommend using
cookies.py, which is based on
identity_parser.py - Parses and prints statistics on identity leakers given
- Crawl data are available as bzip2-compressed SQLite databases. Each database contains measurement data for 25 simulated users.
The following test cases are available for download:
GeoLite2-City.mmdb - available for download here
identity_leaks.txt - Data collected from a manual study of identity leakers, as described
in Section 4.7 of the paper.