Code release for: Cookies that give you away: The surveillance implications of web tracking

Cookies That Give You Away: The Surveillance Implications of Web Tracking

This is the public code release for our WWW 2015 paper. You should also check out the paper, the presentation, and the data.

Data Collection

The measurements were taken on three Amazon EC2 instances using OpenWPM v0.1, which is included in this repo.

  • run_crawl.py - Runs a specific crawl. Change the settings in this file for each configuration; only a single configuration from the paper is included here.
  • run_network_measurment.py / get_dns.py / get_traceroute.py - Run these after the crawl, on the same instance. They perform a DNS lookup for each unique hostname seen during the crawl and run a traceroute to each.
  • make_profiles.py - Create Alexa profiles by randomly subsampling the respective top Alexa sites from alexa_top_500_{IE,JP,US}.txt.
  • make_full_list.py - Create union_of_sites.txt, a list of sites to feed into synchronized crawls for ID detection.
  • profiles - Contains the 25 AOL profiles used in the paper, as well as three Alexa models as pickled Python objects.
  • automation - OpenWPM v0.1
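
The random subsampling step described for make_profiles.py can be sketched roughly as below. This is a minimal illustration, not the script shipped in this repo: the function names `load_sites` and `sample_profile`, the one-site-per-line file format, and the profile size are all assumptions made for the example.

```python
import random


def load_sites(path):
    """Read one site per line from a text file, skipping blank lines (assumed format)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def sample_profile(sites, size, seed=None):
    """Return `size` distinct sites drawn uniformly at random from `sites`.

    Hypothetical helper: duplicates in the input are collapsed first so a
    profile never visits the same site twice.
    """
    rng = random.Random(seed)  # a private RNG keeps runs reproducible for a given seed
    unique = list(dict.fromkeys(s.strip() for s in sites if s.strip()))
    if size > len(unique):
        raise ValueError("profile size exceeds the number of available sites")
    return rng.sample(unique, size)
```

Under these assumptions, a US profile could be drawn with `sample_profile(load_sites("alexa_top_500_US.txt"), size, seed=0)`, where `size` is whatever profile length the experiment calls for. Fixing the seed makes a profile reproducible across runs.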

Data Analysis

  • create_id_dict.py / cookie_util.py / extract_cookie_ids.py - Extract ID cookies from two SQLite databases created through a synchronized crawl, as described in Section 4.5 of the paper.
  • create_graph.py - Builds the cookie linking graph based on parameters set in generate_samples(), as described in Section 4.6 of the paper.
  • db_postprocessing.py / haversine.py - Adds several columns to the crawl databases, including the geocheck described in Section 4.4 of the paper.
  • build_cookie_table.py / Cookie.py - Parses HTTP Request/Response headers to pull out cookies. Integrated into the more recent releases of OpenWPM.
    • NOTE: Cookie.py is included in the Python standard library, but its parsing rules are nowhere near what is used in practice. The version here is heavily modified. I recommend using cookies.py, which is based on RFC 6265.
  • identity_parser.py - Parses and prints statistics on the identity leakers listed in identity_leaks.txt.
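
For readers unfamiliar with the distance computation that haversine.py is named after, the standard haversine great-circle formula can be sketched as below. This is the textbook formula, not necessarily the code in this repo: the function name `haversine_km`, the argument order, and the mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

A check of this kind could compare the returned distance between two coordinate pairs against a threshold; the threshold actually used for the geocheck is given in Section 4.4 of the paper and is not reproduced here.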

Data