Commit

intro
Robert Gebeloff committed Mar 7, 2016
1 parent 872983e commit 2f0f489
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion README.md
@@ -1,4 +1,16 @@
<h1>NICAR 2016 Training Material</h1>
<br>
<h2>Web Scraping Without Programming</h2>
<p> In this presentation, Tom Johnson of the Institute for Analytic Journalism and myself will demonstrate various ways of harvesting data from the Internet without programming. While we heartily recommend that reporters explore the power of programming languages such as Python, Ruby and R, we believe these software tools are a valuable means to getting information that is otherwise unobtainable.</p>
<p> In this presentation, Tom Johnson of the <a href="http://www.analyticjournalism.com/">Institute for Analytic Journalism</a> and <a href="http://www.geb.net">I</a> will demonstrate various ways of harvesting data from the Internet without programming. While we heartily recommend that reporters explore the power of programming languages such as Python, Ruby and R, we believe these software tools are a valuable means to getting information that is otherwise unobtainable.</p>
<p>You can download:</p> <ul>
<li>The primary handout</li>
<li>Our PowerPoint</li>
<li>A detailed tutorial on using import.io to scrape Web sites</li>
<li>An example of how to find and parse hidden XML or JSON data</li>
<li>A walkthrough of various methods for dealing with PDFs</li>
</ul>
<br>
<h2>An Introduction to OpenRefine</h2>
<p>OpenRefine is a vital tool for cleaning dirty data. A typical example is a dataset that contains names of people or companies with inconsistent spellings that need to be standardized. At NICAR, Nils Mulvad and I will walk through a tutorial he created. The exercise is here, the practice data here and here.</p>

