Commit
Robert Gebeloff committed Mar 7, 2016
1 parent 872983e, commit 2f0f489
Showing 1 changed file with 13 additions and 1 deletion.
@@ -1,4 +1,16 @@
 <h1>NICAR 2016 Training Material</h1>
 <br>
 <h2>Web Scraping Without Programming</h2>
-<p> In this presentation, Tom Johnson of the Institute for Analytic Journalism and myself will demonstrate various ways of harvesting data from the Internet without programming. While we heartily recommend that reporters explore the power of programming languages such as Python, Ruby and R, we believe these software tools are a valuable means to getting information that is otherwise unobtainable.</p>
+<p> In this presentation, Tom Johnson of the <a href="http://www.analyticjournalism.com/">Institute for Analytic Journalism</a> and <a href="http://www.geb.net">I</a> will demonstrate various ways of harvesting data from the Internet without programming. While we heartily recommend that reporters explore the power of programming languages such as Python, Ruby and R, we believe these software tools are a valuable means to getting information that is otherwise unobtainable.</p>
+<ul>You can download</ul>
+<li>The primary handout</li>
+<li>Our powerpoint</li>
+<li>A detailed tutorial on using import.io to scrape Web sites</li>
+<li>An example of how to find and parse hidden XML or JSON data</li>
+<li>A walkthrough of various methods for dealing with PDFs</li>
+</ul>
+<br>
+<h2>An Introduction to Open Refine</h2>
+<p>Open Refine is a vital tool for cleaning dirty data. A typical example is when a dataset contains names of people or companies but with inconsistent spelling that needs to be standardized. At NICAR, Nils Mulvad and I will walk through a tutorial he created. The exercise is here, the practice data here and here.</p>