Downloading Web Data Without Scraping

Michelle Minkoff and Scott Klein, NICAR Denver 2016

Software to Install

Google Chrome: Internet Explorer and Firefox both have excellent developer tools, but the Web Inspector in Chrome will be the basis of our examples. Some examples use Firefox extensions, but if you don’t have Firefox, no worries.

JSONView Chrome Extension:


CSVKit: Requires Python. Mac and Linux come with Python installed. Use Anthony DeBarros’s great guide to install Python on Windows. Install Python 2.7.x because CSVKit doesn’t work with Python 3.

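Google Spreadsheets: You can import an HTML table directly by typing =ImportHTML("url", "elementtype", index), where elementtype is "table" or "list" and index is the element's position on the page, counting from 1. Use straight quotes; curly quotes will break the formula.

Ex: =ImportHTML("", "table", 2)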
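To make the "table number on the page" idea concrete, here is a minimal Python sketch of what a formula like ImportHTML does conceptually: find every table in a page and return the rows of the one you asked for. It is not how Google implements it. The names `TableExtractor`, `import_html_table`, and `SAMPLE_HTML` are hypothetical, the sample page is made up, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of every <table> in a page, in document order."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found
        self._row = None   # row currently being built, or None
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index):
    """Return the rows of the index-th table, counting from 1."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[index - 1]


# Made-up page: the first table is navigation, the second holds the data.
SAMPLE_HTML = """
<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
<table>
  <tr><th>Name</th><th>Office</th></tr>
  <tr><td>Jane Doe</td><td>Mayor</td></tr>
  <tr><td>John Roe</td><td>Auditor</td></tr>
</table>
"""

rows = import_html_table(SAMPLE_HTML, 2)
for row in rows:
    print(row)
```

The sample shows why the index matters: pages often use tables for layout or navigation, so the data you want is frequently not table 1.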

Tools we’re using

Open Refine:

A tool for inspecting and cleaning spreadsheet-style data. It lets you run queries and transformations on the data to reshape it the way you want.


Click on a type of information (names, emails, URLs, etc.), then right-click (control-click on a Mac) and choose Scrape Similar. There’s an option to bring the results into a Google spreadsheet.


Tabula turns tabular PDF data into tables. Free software from the Knight-Mozilla OpenNews project.


A Firefox extension that detects what types of assets you might want to download on a page and allows you to download them, well, all. Doesn’t work for everything, but a good quick one to try.


Break a page down into the elements it is made up of. Grab all pictures on a page at once, or create a scraper using various HTML elements as start and end points. This is quite powerful if you learn how to use it well. More robust than DownThemAll.

Import.Io - Like Outwit, but less code required

Break a page down into the elements it is made up of. Click on elements to "train" the program as to what you want in a certain column. Crawl through multiple paginated results. Solid tutorial here:


Helpful command line utility for working with JSON files.
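As an illustration of the kind of JSON work this covers, here is a minimal Python sketch that turns a JSON array of records into CSV text using only the standard library. It is a stand-in for the command line utility, not that tool itself. The function name `json_records_to_csv` and the sample data are hypothetical, and it assumes the JSON is a flat list of objects with no nesting.

```python
import csv
import io
import json

# Made-up sample: a flat list of records, like many feeds you find in the Web Inspector.
SAMPLE_JSON = """
[
  {"name": "Project A", "state": "CO", "amount": 1200},
  {"name": "Project B", "state": "CO"}
]
"""


def json_records_to_csv(text):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(text)

    # Build the header from every key seen, in first-seen order,
    # so records with missing fields still line up.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing fields are written as empty cells
    return out.getvalue()


csv_text = json_records_to_csv(SAMPLE_JSON)
print(csv_text)
```

Real feeds are often nested, in which case you would first pull out the list of records you care about before converting.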

JSON Viewer:

Other Important Links

Dan Nguyen’s Terrific Web Inspector Guide:

Example Sites

When grabbing data, please make sure you have permission to get the information you are grabbing. Just because you can take photos off a site doesn’t mean you can use them for your project. With great power comes great responsibility!

  1. Denver elected officials – for HTML table
  2. Denver university list – for Scraper
  3. Denver image search
  4. ProPublica’s Recovery Tracker
  5. ProPublica’s Intern Lawsuits Tracker
  6. White House Recovery Act Projects

