Artifacts for IEEE Security & Privacy (2020) submission #760: Improving Web Content Blocking With Event-Loop-Turn Granularity JavaScript Signatures
This repository includes both the source code of the PageGraph implementation and the data on which our evaluation (Sections IV and V of the paper) was based.
Contents:
The PageGraph implementation can be found under page_graph/. We publish it as a set of patches for Chromium's Blink layout engine and V8 JavaScript engine. The modifications to Blink and V8 are mostly hook points that invoke our custom runtime, which maintains the graph representation in memory. This custom runtime is found in third_party/blink/brave_page_graph.
The graph representations of the websites in our crawl of the Alexa top 100K can be found at this Google Drive link. The graphs are in the GraphML format. Each GraphML file corresponds to one visited website, which is indicated by the file's name.
This Google Drive link hosts all of the signatures generated by applying the algorithm described in Section III-B of the paper to each website's graph representation. Since the signatures are essentially subgraphs of the original graphs, they are also in the GraphML format.
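Because both the page graphs and the signatures are GraphML (XML) files, a quick sanity check is to count their nodes and edges. The following is a minimal sketch using only the Python standard library; the function name summarize_graphml is hypothetical, and the code assumes only that the files contain standard GraphML node and edge elements. It does not interpret any PageGraph-specific attributes.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_graphml(source):
    """Count <node> and <edge> elements in a GraphML file path or file object."""
    counts = Counter()
    for element in ET.parse(source).getroot().iter():
        # Strip any "{namespace}" prefix so the check works with or without one.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("node", "edge"):
            counts[local_name] += 1
    return counts["node"], counts["edge"]
```

Calling summarize_graphml on the path of a downloaded graph or signature file returns a (node_count, edge_count) tuple. Since signatures are subgraphs, a signature should be no larger than the graph it was extracted from.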
We also include the EasyList and EasyPrivacy (EL/EP) rules that we used to determine the ground truth for privacy-harming behaviours. These are found in data/easylist.txt and data/easyprivacy.txt.
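To get a rough overview of these filter lists, the lines can be grouped by kind. The sketch below relies on the common Adblock Plus filter syntax (comments start with "!", exception rules start with "@@", element-hiding rules contain "##", "#@#" or "#?#"). The function name and the four categories are illustrative assumptions, not the classification used in the paper, and the sketch is not a full filter parser.

```python
def classify_filter_lines(lines):
    """Roughly bucket Adblock-style filter list lines by kind."""
    counts = {"comment": 0, "cosmetic": 0, "exception": 0, "network": 0}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("!") or line.startswith("[Adblock"):
            counts["comment"] += 1  # comments and the list header
        elif "##" in line or "#@#" in line or "#?#" in line:
            counts["cosmetic"] += 1  # element-hiding rules
        elif line.startswith("@@"):
            counts["exception"] += 1  # allow-list (exception) rules
        else:
            counts["network"] += 1  # request-blocking rules
    return counts
```

For example, passing an open file handle for data/easylist.txt to classify_filter_lines returns a dictionary with one count per category.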
We publish the scripts that match signatures of privacy-harming behaviours, along with the first-party websites where they are found, in a separate JSON file, data/summary_readable.json. For each script, we also indicate whether it is blocked by an existing EL/EP rule (i.e., a rule in data/easylist.txt or data/easyprivacy.txt) and, if so, which rule blocked it. Additionally, we publish all of the scripts referenced in that file as a tarball at this Google Drive link; the correspondence between the script URLs and the file names in the tarball is encoded in data/mappings.json.
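Looking up the source of a script from its URL could be sketched as follows. This is only an illustration: it assumes that data/mappings.json is a flat JSON object mapping each script URL to a file name, and that tarball members can be matched by base name. Neither assumption is confirmed here, so inspect both files first and adapt the code if the layout differs. The function names are hypothetical.

```python
import json
import os
import tarfile


def load_mapping(path):
    """Load the URL-to-file-name mapping (ASSUMED to be a flat JSON object)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def read_script(tarball_path, mapping, script_url):
    """Return the bytes of the script stored for script_url, or None."""
    file_name = mapping.get(script_url)
    if file_name is None:
        return None
    with tarfile.open(tarball_path) as tar:
        for member in tar:
            # ASSUMPTION: members are matched by base name, ignoring directories.
            if member.isfile() and os.path.basename(member.name) == file_name:
                return tar.extractfile(member).read()
    return None
```

With the tarball downloaded locally, read_script(tarball_path, load_mapping("data/mappings.json"), url) would return the script body for a URL listed in data/summary_readable.json, or None if the URL is not in the mapping or the file is not in the tarball.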