
Privacy Pioneer

The idea of Privacy Pioneer is to help people understand the data collection and sharing practices of the websites they visit. For instance, the following URL-encoded string contains the latitude and longitude where a person is located:

https%3A%2F%2Fwww.example.com%2Flocation%3Flat%3D32.715736%26lon%3D%20-117.161087

If such a string is sent to a website, it can be concluded that the website is collecting or sharing location data. Privacy Pioneer automatically detects such behaviors and shows them to the user.
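To make the example concrete, here is a minimal sketch of how the string above could be decoded and its coordinates read out. It is an illustration only, not Privacy Pioneer's actual detection code; the function name `extractCoordinates` is hypothetical, and the parameter names `lat` and `lon` are taken from the example URL (real traffic uses many different names and formats).

```javascript
// Illustrative sketch only; not the extension's detection logic.
// Decodes a URL-encoded string and reads "lat"/"lon" from its query string.
function extractCoordinates(encoded) {
  let url;
  try {
    // Turn %3A, %2F, %3D, ... back into ":", "/", "=", ...
    url = new URL(decodeURIComponent(encoded));
  } catch (error) {
    return null; // Malformed encoding or not a valid absolute URL
  }
  // parseFloat ignores leading whitespace, such as the %20 before the longitude
  const lat = Number.parseFloat(url.searchParams.get("lat"));
  const lon = Number.parseFloat(url.searchParams.get("lon"));
  if (Number.isNaN(lat) || Number.isNaN(lon)) {
    return null; // One or both parameters are missing or not numeric
  }
  return { lat, lon };
}

const example =
  "https%3A%2F%2Fwww.example.com%2Flocation%3Flat%3D32.715736%26lon%3D%20-117.161087";
console.log(extractCoordinates(example));
```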

Privacy Pioneer's privacy practice analysis is based on a machine learning model as well as rule-based heuristics. When you install Privacy Pioneer, the model is served from our machine learning repo.

Privacy Pioneer is implemented as a browser extension for Firefox (currently, the only browser we support).


Privacy Pioneer is developed and maintained by Dominik Dadak (@dadak-dom), Nate Levinson (@natelevinson10), Harry Yu (@atlasharry), and Sebastian Zimmeck (@SebastianZimmeck) of the privacy-tech-lab. Hamza Harkous (@harkous) is also collaborating with the team on the research.

Former contributors are Daniel Goldelman (@danielgoldelman), Joe Champeau (@JoeChampeau), Judeley Jean-Charles (@jjeancharles), Wesley Tan (@wesley-tan), Justin Casler (@JustinCasler), Logan Brown (@Lr-Brown), Owen Kaplan (@notowen333), Rafael Goldstein (@rgoldstein01), and David Baraka (@davebaraka).

Contact us with any questions or comments at sebastian@privacytechlab.org.

1. Research Publications
2. Promo Video
3. Development
4. Production
5. Testing
6. Source Directory Layout
7. Privacy Practice Analysis
8. Watchlist Notifications
9. Extension Architecture
10. Third Party Libraries and Resources
11. Known Issues
12. Thank You!

1. Research Publications

2. Promo Video

Unmute or turn up the volume if you do not hear any sound.

privpioneerdemo.mp4

3. Development

Here is how you install Privacy Pioneer for development purposes:

  1. Ensure that you have node and npm installed.

    You can install the latest version of node from the official site.

    You can install the latest version of npm via your terminal with the command:

    npm install -g npm
  2. Start installing Privacy Pioneer by cloning this repo to a local directory with:

    git clone https://github.com/privacy-tech-lab/privacy-pioneer.git

    Then, install Privacy Pioneer's dependencies by running in the root of your local directory:

    npm install --production=false

    If you encounter install errors with the TensorFlow libraries, for Windows try running the steps described in this comment. For macOS try running:

    xcode-select --install

    For any issues with incorrect dependency versions, copy the package-lock.json file from the repository into your local directory and run:

    npm ci
  3. Privacy Pioneer uses an external service, ipinfo.io, to automate the identification of a user's location in web traffic of visited websites. For this purpose Privacy Pioneer sends a user's IP address to ipinfo.io when the user restarts the browser or makes changes to the Watchlist. An ipinfo API token is required for Privacy Pioneer to work.

    In order to get your ipinfo API token, you must sign up for a free account with ipinfo via https://ipinfo.io. Then, once you are signed in, go to the token tab and copy the token that is displayed to your clipboard. Once this is done, do the following:

    Create a file holdAPI.js and save it in the /src/libs/ folder with your ipinfo API token as follows:

    export const apiIPToken = "<your ipinfo API token>";

    Be sure not to commit your ipinfo API token to GitHub, to avoid misuse.

  4. To start Privacy Pioneer, run:

    npm start
    • Runs Privacy Pioneer in development mode
    • The popup, options, and background page will reload if you make edits
    • You will also see any lint errors in the console

    A dev folder will be generated in the root directory, housing the generated extension files. Firefox should automatically open with the extension installed. If not, you can follow the instructions here, where dev will be the new src folder.

    Note: If you experience errors regarding missing dependencies (usually due to a newly incorporated node package), delete the node_modules folder and then re-run the installation steps above. You may also want to delete package-lock.json along with the node_modules folder as a second attempt to solve this issue. If needed, you can create a new package-lock.json file with:

    npm install --package-lock-only
  5. Note to lab members: Changes affecting the privacy analysis may need to be manually ported to the Privacy Pioneer Web Crawler as well. Check the instructions for doing so and contact the crawler team with any questions.

4. Production

Once you have the development version of Privacy Pioneer working, you can build Privacy Pioneer for production to the dist folder by running:

npm run build
  • The build is minified and the filenames include hashes
  • It correctly bundles and optimizes the extension for the best performance

The web-ext CLI is included in the project. Learn more about packaging and signing for release at the extension workshop.

5. Testing

Privacy Pioneer uses Jest to run unit tests in order to maintain the integrity of the extension. All test files live in /src/tests. To create a new test, either add it to an existing test file or create a new file whose name ends with .test.js.

All tests will be run on GitHub upon creating a pull request.

Run all tests locally with:

npm run test

6. Source Directory Layout

.
├── src                 # Extension source code
│   ├── assets          # Images and other public files used in the extension
│   ├── background      # Code for extension background tasks (e.g., HTTP analysis)
│   ├── libs            # Reusable utility functions and components used in the frontend
│   ├── options         # Options page frontend SPA
│   ├── popup           # Popup dialog view frontend SPA
│   ├── tests           # Testing capabilities
│   └── manifest.json   # Extension metadata
└── ...

The options and popup directories are similarly structured. Like many React projects, each has an index.html file and an index.js file that serve as entry points. Each also has a components directory, which contains reusable components for use within its parent directory, and a views directory, which contains page views (which are just more React components). Each component has an index.js file as its entry point and may also contain a style.js file for scoped styling. More complex components or views may have an additional components directory of their own.

For styling, we use CSS in JS, applying styles with the third party library styled components. Styled components are prefixed with an S, e.g., SContainer. For transitions and animations we use the third party library framer motion.

The src/libs/indexed-db directory contains functions that instantiate and communicate with the IndexedDB in the Firefox browser.

Some logos and other assets are in a Figma here.

7. Privacy Practice Analysis

Privacy Pioneer analyzes the following privacy practices for each first and third party website:

  • Monetization
    • Advertising (from Disconnect)
    • Analytics (from Disconnect)
    • Social Networking (Social from Disconnect)
  • Location
    • GPS Location (Fine and Coarse Location)
    • ZIP Code
    • Street Address
    • City
    • Region
  • Tracking
    • Tracking Pixel
    • IP Address
    • Browser Fingerprinting (FingerprintingInvasive from Disconnect and our own list)
  • Personal
    • Phone Number
    • Email Address
    • Custom Keywords

Privacy Pioneer uses the Geolocation API, which is built into most modern browsers, to obtain the user's latitude and longitude. This information is not shared with the developers or any third parties. It is used only to check whether the user's latitude or longitude shows up in any of the network data being collected by first parties or shared with third parties by the current website.

Privacy Pioneer distinguishes between Fine Location and Coarse Location within the GPS Location privacy practice. Fine Location means that the value calculated by the individual website is within ±0.1 degrees of the Geolocation API value, and Coarse Location means that it is within ±1.0 degrees. Thus, an instance of a user's latitude or longitude being collected or shared can result in one of the following outcomes in the extension:

  • The evidence is flagged by Privacy Pioneer as an instance of Coarse Location but not Fine Location. This means that the latitude or longitude value is within ±1.0 degrees, but not within ±0.1 degrees, of the value determined by the Geolocation API.
  • The evidence is flagged by Privacy Pioneer as an instance of both Coarse Location and Fine Location. This means that the latitude or longitude value is within ±0.1 (and thus also ±1.0) degrees of the value determined by the Geolocation API.
  • The evidence is not flagged due to obfuscation or some other website defense.
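The thresholds above can be sketched as a small classification function. This is a simplified illustration under stated assumptions, not the extension's actual implementation: the function name `classifyLocation`, the constant names, and the returned label strings are hypothetical, and treating the thresholds as inclusive is an assumption.

```javascript
// Illustrative sketch of the Fine/Coarse distinction; names are hypothetical.
const FINE_THRESHOLD_DEGREES = 0.1;
const COARSE_THRESHOLD_DEGREES = 1.0;

// observed:  a latitude or longitude value found in network traffic
// reference: the corresponding value reported by the Geolocation API
function classifyLocation(observed, reference) {
  const difference = Math.abs(observed - reference);
  const labels = [];
  // Assumption: thresholds are inclusive
  if (difference <= COARSE_THRESHOLD_DEGREES) labels.push("Coarse Location");
  if (difference <= FINE_THRESHOLD_DEGREES) labels.push("Fine Location");
  return labels; // Empty when the value is too far away to be flagged
}

const referenceLatitude = 32.715736;
console.log(classifyLocation(32.75, referenceLatitude)); // Coarse and Fine
console.log(classifyLocation(33.4, referenceLatitude)); // Coarse only
console.log(classifyLocation(40.0, referenceLatitude)); // Not flagged
```

Note that a Fine Location match always implies a Coarse Location match, which is why the function returns both labels in that case.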

ipinfo.io is sent the user's IP address and returns information about their location based on that IP address. We take the user's ZIP Code, City, and Region from this returned information and store it as an entry in the user's Watchlist to be looked for in new HTTP requests.
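As a rough sketch of this step, the function below turns an ipinfo.io-style response object into Watchlist entries. It rests on several assumptions: the response is taken to expose `postal`, `city`, and `region` fields, the entry shape `{ type, keyword }` and the function name `watchlistEntriesFromIpinfo` are hypothetical, and the sample values are made up for illustration. The extension's real storage format may differ.

```javascript
// Illustrative sketch; entry shape and function name are hypothetical.
// Assumes the response object has "postal", "city", and "region" fields.
function watchlistEntriesFromIpinfo(response) {
  const fields = [
    ["postal", "ZIP Code"],
    ["city", "City"],
    ["region", "Region"],
  ];
  return fields
    // Skip fields that are missing or empty in the response
    .filter(([key]) => typeof response[key] === "string" && response[key] !== "")
    .map(([key, type]) => ({ type, keyword: response[key] }));
}

// Made-up sample data (203.0.113.0/24 is reserved for documentation)
const sampleResponse = {
  ip: "203.0.113.7",
  city: "San Diego",
  region: "California",
  postal: "92101",
};
console.log(watchlistEntriesFromIpinfo(sampleResponse));
```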

8. Watchlist Notifications

Users can enter keywords in Privacy Pioneer's Watchlist. Privacy Pioneer will then notify a user when any of their Watchlist keywords has been seen in their network traffic if the user has enabled notifications on the browser and in the Watchlist page of Privacy Pioneer. These notifications will appear about 15 seconds after a site has loaded to let the page load most of its data.
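The following sketch illustrates the idea of matching Watchlist keywords against traffic and delaying the notification. It is a simplification, not the extension's code: the function names, the case-insensitive substring matching, and the `notify` callback are assumptions, and the 15-second delay is taken from the description above.

```javascript
// Illustrative sketch; names and matching strategy are assumptions.
const NOTIFICATION_DELAY_MS = 15000; // Roughly 15 seconds, per the text above

// Returns the Watchlist keywords that occur in the given traffic string.
// Assumption: matching is a case-insensitive substring search.
function findWatchlistMatches(keywords, traffic) {
  const haystack = traffic.toLowerCase();
  return keywords.filter((keyword) => {
    const needle = keyword.trim().toLowerCase();
    return needle !== "" && haystack.includes(needle);
  });
}

// Calls notify(matches) after the delay; returns the timer handle,
// or null when there is nothing to report.
function scheduleNotification(matches, notify, delayMs = NOTIFICATION_DELAY_MS) {
  if (matches.length === 0) return null;
  return setTimeout(() => notify(matches), delayMs);
}

const watchlist = ["92101", "jane@example.com"];
const requestBody = "user=Jane%40example.com&zip=92101";
console.log(findWatchlistMatches(watchlist, requestBody));
```

In this example only the ZIP code matches, because the email address appears URL-encoded in the request body. Real traffic often needs decoding before a keyword search can find such values.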

9. Extension Architecture

An overview of the architecture of Privacy Pioneer is available separately. (The document is up to date as of its most recent commit date. Later architectural changes are not reflected.)

10. Third Party Libraries and Resources

Privacy Pioneer uses various third party libraries.

It also uses the following resources.

We thank the developers.

11. Known Issues

  • Some warnings may occur when you run npm install --production=false, but they will not negatively affect building or running Privacy Pioneer.
  • When the Overview page of Privacy Pioneer is open, data from websites visited after opening it will not be shown until the Overview is refreshed.
  • For performance reasons Privacy Pioneer only analyzes HTTP messages up to 100,000 characters, only certain webRequest.ResourceTypes, and only request body, response body, and selected headers. See section 3.5 of our paper, Website Data Transparency in the Browser, for details. In addition to what is described in the paper, we broadened the analysis scope slightly allowing requests from both the Fetch and Beacon API, which otherwise could cause the extension to miss relevant requests.
  • Privacy Pioneer is turned off in Firefox's Private Windows even if you have enabled the "Run in Private Windows" option in the extension settings.

12. Thank You!

We would like to thank our supporters!


Major financial support provided by Google.


Additional financial support provided by Wesleyan University and the Anil Fernando Endowment.


Conclusions reached or positions taken are our own and not necessarily those of our financial supporters or their trustees, officers, or staff.
