# Using the web inspector for complex scrapes

This repo contains example scrapes in both Ruby and Python that use concepts taught in the NICAR 2015 advanced web scraping course. The class focuses on using the web inspector to find the information needed to conduct more sophisticated scrapes. The slide deck for the presentation can be found here.
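The inspector's Network panel shows the exact request a page sends when you submit a form or click a button: its method, URL, form fields and headers. Rebuilding that request in code is often simpler than scraping the rendered page. The sketch below uses Requests and assumes a hypothetical search form at `https://example.com/search` that POSTs fields named `q` and `page`; the URL, field names and `User-Agent` string are placeholders, not values from this repo's scripts.

```python
import requests

# Hypothetical values standing in for what the Network panel shows after
# submitting a search form. Replace them with the real ones from the inspector.
SEARCH_URL = "https://example.com/search"
HEADERS = {"User-Agent": "Mozilla/5.0 (scraper example)"}


def build_search_request(query, page=1):
    """Rebuild the browser's POST request without sending it yet."""
    form_data = {"q": query, "page": str(page)}
    req = requests.Request("POST", SEARCH_URL, data=form_data, headers=HEADERS)
    # prepare() URL-encodes the form body and sets the Content-Type header.
    return req.prepare()


def send(prepared):
    """Send a prepared request and return the response body as text."""
    with requests.Session() as session:
        response = session.send(prepared, timeout=30)
        response.raise_for_status()
        return response.text
```

Comparing `prepared.body` with the Form Data listed in the inspector is a quick way to confirm the fields match before calling `send(prepared)`.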

## Requirements

### Python

The Python scrapes require only two modules that are not part of the Python standard library. BeautifulSoup4 is a module for parsing markup languages such as HTML and XML. Requests is used to make both GET and POST web requests. Both can be installed individually with pip, or together with `pip install -r requirements.txt`.
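A minimal sketch of how the two modules divide the work: Requests fetches the page, and BeautifulSoup pulls rows out of a table whose `id` you found in the inspector's Elements panel. The table id `results` and the sample HTML are hypothetical, used only to illustrate the parsing step without a network call.

```python
import requests
from bs4 import BeautifulSoup


def fetch_page(url, params=None):
    """GET a page (query-string parameters go in `params`) and return its HTML."""
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.text


def parse_rows(html, table_id):
    """Return each data row of the table with the given id as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows use <th>, so they yield no <td> cells
            rows.append(cells)
    return rows


# Hypothetical markup standing in for a page you would fetch with fetch_page().
SAMPLE_HTML = """
<table id="results">
  <tr><th>Name</th><th>Amount</th></tr>
  <tr><td>Smith</td><td>$250</td></tr>
  <tr><td> Jones </td><td>$1,000</td></tr>
</table>
"""

if __name__ == "__main__":
    print(parse_rows(SAMPLE_HTML, "results"))
```

Keeping fetching and parsing in separate functions lets you test the parser against a saved copy of the page instead of hitting the live site on every run.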

### Ruby

The Ruby scripts require three libraries. The first is Nokogiri, Ruby's parser for HTML and XML. The ASP.NET scrape requires Mechanize to emulate a browser. Rest-Client is needed to make web requests in the `mapscrape.rb` example. If you have Bundler installed, you can navigate to the `ruby` directory and run `bundle install` to install the required libraries. Otherwise, use `gem install <package name>`.
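ASP.NET pages keep their state in hidden form fields such as `__VIEWSTATE` and `__EVENTVALIDATION`, and navigation (paging, clicking links in a grid) is done by POSTing the form back with `__EVENTTARGET` and `__EVENTARGUMENT` set. Mechanize handles this by submitting the whole form like a browser would. For comparison, here is a rough Python sketch of the same idea with Requests and BeautifulSoup; it is not a port of the repo's Ruby scrape. It copies only the hidden fields, so any visible inputs must be passed in `extra`. The control names `gvResults` and `txtName` and the sample form are hypothetical; GridView paging commonly uses arguments like `Page$2`, but check the `__doPostBack` call in the inspector for the real target and argument.

```python
from bs4 import BeautifulSoup


def hidden_fields(html):
    """Collect every <input type="hidden"> name/value pair on the page."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for tag in soup.find_all("input", attrs={"type": "hidden"}):
        name = tag.get("name")
        if name:
            fields[name] = tag.get("value", "")
    return fields


def postback_data(html, event_target, event_argument="", extra=None):
    """Build the form body for an ASP.NET postback fired by `event_target`."""
    data = hidden_fields(html)  # carries __VIEWSTATE, __EVENTVALIDATION, etc.
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    data.update(extra or {})  # visible inputs such as text boxes
    return data


def postback(session, url, event_target, event_argument="", extra=None):
    """GET the form page, then POST it back with its state fields (needs network).

    `session` is expected to be a requests.Session so cookies persist.
    """
    page = session.get(url, timeout=30)
    page.raise_for_status()
    data = postback_data(page.text, event_target, event_argument, extra)
    response = session.post(url, data=data, timeout=30)
    response.raise_for_status()
    return response.text


# Hypothetical ASP.NET form used to illustrate the field handling offline.
SAMPLE_FORM = """
<form method="post" action="./Search.aspx">
  <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="abc123" />
  <input type="text" name="txtName" value="" />
</form>
"""

if __name__ == "__main__":
    print(postback_data(SAMPLE_FORM, "gvResults", "Page$2", {"txtName": "Smith"}))
```

Because `__VIEWSTATE` changes from page to page, each postback should be built from the HTML of the page you just received, which is why `postback` fetches the form before posting it.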