Scraper for Facebook's Archive of Ads with Political Content
Switch branches/tags
Nothing to show
Clone or download
justinlittman Merge pull request #5 from dnass/fix-ad-performance-click
Fix "See Ad Performance" button click issue
Latest commit 44f32b0 Sep 8, 2018
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.gitignore Initial commit. Jun 3, 2018
LICENSE Initial commit Jun 3, 2018
README.md Updated README. Aug 27, 2018
requirements.txt
scraper.py

README.md

fb-ad-archive-scraper

Scraper for Facebook's Archive of Ads with Political Content ... until Facebook provides an API.

fb-ad-archive-scraper will produce:

  • CSV containing the text and metadata of the ads.
  • Screenshots of each ad.
  • A README file.

Like any scraper, fb-ad-archive-scraper is fragile. It will break if Facebook changes the structure / code of the Archive. If fb-ad-archive-scraper breaks, let me know.

Tickets / PRs are welcome.

Install

  1. Clone the repo:

     git clone https://github.com/justinlittman/fb-ad-archive-scraper.git
    
  2. Change to the directory:

     cd fb-ad-archive-scraper
    
  3. Optionally, create a virtual environment:

     virtualenv -p python3 ENV
     source ENV/bin/activate
    
  4. Install requirements:

     pip install -r requirements.txt
    
  5. Install Chromedriver. On a Mac, this is:

     brew cask install chromedriver
    

    If already installed, upgrade Chromedriver with:

     brew cask upgrade chromedriver
    

Usage

    usage: scraper.py [-h] [--limit LIMIT] [--headed] [--wait WAIT]
                      [--country {ALL,US,BR}]
                      [--type {news_ads,political_and_issue_ads}]
                      [--status {all,active,inactive}]
                      email password query [query ...]
    
    Scrape Facebook's Archive of Ads with Political Content
    
    positional arguments:
      email                 Email address for FB account
      password              Password for FB account
      query                 Query
    
    optional arguments:
      -h, --help            show this help message and exit
      --limit LIMIT         Limit on number of ads to scrape
      --headed              Use a headed chrome browser
      --wait WAIT           Seconds to sleep between requests
      --country {ALL,US,BR}
                            Limit ads by country. Choices: ALL, US, BR. Default is
                            ALL.
      --type {news_ads,political_and_issue_ads}
                            Limit ads by type. Choices: news_ads,
                            political_and_issue_ads. Default is
                            political_and_issue_ads.
      --status {all,active,inactive}
                            Limit ads by status. Choices: all, active, inactive.
                            Default is all.

For example:

    python scraper.py fbuser@gmail.com password pelosi

Notes:

  • fb-ad-archive-scraper uses a headless Chrome browser. This means that you will not see the browser at work.
  • The output of each run will be placed in a separate directory and include a README, CSV file summarizing the ads, PNG images, and JSON retrieved from Facebook.

The approach of extracting data from XHRs came from Ranjit Hatnagar.