Facebook Political Ad Collector

This is the source code behind our project to collect political ads on Facebook. Built versions are available for Firefox and Chrome.

(The Globe and Mail, previously a partner on the ad collector, has taken over stewardship of the project, which was previously maintained by ProPublica.)

We're asking our readers to use this extension when they are browsing Facebook. While they are on Facebook, a background script runs to collect the ads they see. Optionally, users who want to help train the political classifier can vote on whether a particular ad is political. Server-side, we use those ratings to train a naive Bayes classifier that then automatically rates the other ads we've collected. The extension also asks the server for the most recent ads that the classifier thinks are political, so that users can see political ads they haven't seen. We're careful to protect our users' privacy by not sending identifying information to our backend server.
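
To give a feel for the classification step, here is a minimal sketch of a naive Bayes text classifier trained on user votes. It is not the code from fbpac-classifier: the class name, the function names and the sample ads are all made up for illustration, and it uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the ad text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration only, not the fbpac-classifier model.
    """

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocabulary = set()

    def train(self, rated_ads):
        """rated_ads: iterable of (ad_text, is_political) pairs from user votes."""
        for text, is_political in rated_ads:
            tokens = tokenize(text)
            self.word_counts[is_political].update(tokens)
            self.doc_counts[is_political] += 1
            self.vocabulary.update(tokens)

    def _log_score(self, tokens, label):
        # Smoothed class prior, so a class with no examples never yields log(0).
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        # Smoothed per-word likelihoods.
        total_words = sum(self.word_counts[label].values())
        denominator = total_words + len(self.vocabulary)
        for token in tokens:
            score += math.log((self.word_counts[label][token] + 1) / denominator)
        return score

    def political_probability(self, text):
        """Return the estimated probability that an ad is political."""
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        log_political = self._log_score(tokens, True)
        log_other = self._log_score(tokens, False)
        # Normalize the two log scores into a probability without overflow.
        top = max(log_political, log_other)
        political = math.exp(log_political - top)
        other = math.exp(log_other - top)
        return political / (political + other)


# Invented example votes: (ad text, did the user mark it as political?)
RATED_ADS = [
    ("Vote for Smith on election day", True),
    ("Tell Congress to protect our rivers", True),
    ("Huge sale on running shoes this weekend", False),
    ("Try our new pizza delivery app", False),
]

model = NaiveBayes()
model.train(RATED_ADS)
```

Calling `model.political_probability("Vote on election day")` returns a value above 0.5 with this toy data, while `model.political_probability("pizza sale this weekend")` returns a value below 0.5. A real model needs far more ratings than this to be useful.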

We're open sourcing this project because we'd love your help. Collecting these ads is challenging, and the more eyes on the problem the better.

Download and try it

Run it on your own

The collector is broken out into a series of services, each with its own repo:

  • fbpac-extension: The browser extension that monitors your Facebook feed and sends ad data to the app.
  • fbpac-api: Rails app. Serves API for React portion of website (public-facing, if applicable, and admin); manages login for admin dashboard; serves entire Targeting Breakdown page. Runs on Amazon ECS.
  • fbpac-backend: Rust app. Receives ads and ad ratings submitted by extension users; serves a rudimentary feed for Ads Others Are Seeing tab in extension; serves assets for website. Runs on Amazon ECS.
  • fbpac-classifier: Python scripts for building a model to predict whether an ad is political given its text, along with scripts for updating each database record with that prediction. Runs on Amazon ECS.
  • fbpac-archiver: A cron to archive the database to a CSV file weekly for analysis, if necessary. Runs on Amazon ECS.
  • fbpac-page: A simple site showing how to install the ad collector and explaining WTF the FBPAC project is.
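
As a rough illustration of the kind of job fbpac-archiver performs, the sketch below dumps one table to CSV through a standard Python DB-API connection. It is an assumption-laden example, not the archiver's actual code: the function name, the `ads` table and its columns are hypothetical, and an in-memory SQLite database stands in for the Postgres database so the snippet runs on its own.

```python
import csv
import io
import sqlite3


def archive_table(connection, table, out_file):
    """Write every row of `table` to `out_file` as CSV, header row first.

    `table` is interpolated into the query, so it must come from trusted
    configuration and never from user input.
    """
    cursor = connection.cursor()
    cursor.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out_file)
    # cursor.description holds one entry per column; the name comes first.
    writer.writerow([column[0] for column in cursor.description])
    rows_written = 0
    for row in cursor:
        writer.writerow(row)
        rows_written += 1
    return rows_written


# Stand-in database with a hypothetical schema and invented rows.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE ads (id TEXT, message TEXT, political_probability REAL)"
)
connection.executemany(
    "INSERT INTO ads VALUES (?, ?, ?)",
    [("1", "Vote for Smith", 0.97), ("2", "Shoes, on sale", 0.03)],
)

buffer = io.StringIO()
count = archive_table(connection, "ads", buffer)
```

In a weekly cron job, `out_file` would be a real file opened with `newline=""` rather than an in-memory buffer.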

Instructions on how to install the different parts of the project are contained in the individual repos, in their respective READMEs. Most of the repos rely on Docker for deployment, so make sure you have that installed.

To learn more about how to internationalize the app for a new country, read the i18n guide.

The app is designed to be deployed to an Amazon Web Services Elastic Container Service (ECS) environment.

For the ads database, we use an Amazon RDS db.t2.medium Postgres instance. That may be beefier than necessary, but as far as we know it has never been overloaded.

We also need an Amazon S3 bucket for storing ad images; the Rust app has to have credentials to write to this bucket.

Stories

Types of ads the collector doesn't collect

  • mobile ads
  • pre-roll, midstream video ads
  • video from ads in the stream
  • Instagram-only ads (Note that many ads are set to run on Facebook and Instagram with the same creative)

Where we need your help (aka TODOs)

In general, the project needs more tests. We've written a couple of tests for parsing the Facebook timeline in the extension directory, and a few for the tricky bits in the server, but any help here would be great!

Also, the Rust backend needs a bit of love and care, and there is a bit of a mess in https://github.com/globeandmail/fbpac-backend/tree/master/server/src/server.rs that could use cleaning up.

We could also use help orchestrating the deployment of all the repos to AWS. Right now, setting up the ECS tasks, permissions and so on is still a fairly manual process. It'd be great to have something like a Terraform script that would set everything up in one fell swoop and make redeployment or environment changes a breeze.

A few other TODOs:

  • consider triggering the ad parsing routine only on scroll, to mitigate the clicking-off problems.
  • consider turning off the panelist_ads table, etc.
  • consider seeding the partisanship model in new languages with political tweets.
  • internationalize the targeting parser.