intro
Robert Gebeloff committed Mar 7, 2016
1 parent 2f0f489 commit 224d1e2
Showing 3 changed files with 6 additions and 6 deletions.
Binary file added .DS_Store
Binary file not shown.
12 changes: 6 additions & 6 deletions README.md
@@ -3,14 +3,14 @@
<h2>Web Scraping Without Programming</h2>
<p> In this presentation, Tom Johnson of the <a href="http://www.analyticjournalism.com/">Institute for Analytic Journalism</a> and <a href="http://www.geb.net">I</a> will demonstrate various ways of harvesting data from the Internet without programming. While we heartily recommend that reporters explore the power of programming languages such as Python, Ruby and R, we believe these software tools are a valuable means to getting information that is otherwise unobtainable.</p>
<ul>You can download</ul>
-<li>The primary handout</li>
-<li>Our powerpoint</li>
-<li>A detailed tutorial on using import.io to scrape Web sites</li>
-<li>An example of how to find and parse hidden XML or JSON data</li>
-<li>A walkthrough of various methods for dealing with PDFs</li>
+<li><a href="https://github.com/gebelo/nicar2016/blob/master/no_programing_handout.docx">The primary handout</a></li>
+<li><a href="https://github.com/gebelo/nicar2016/blob/master/no_programming.pptx">Our powerpoint</a></li>
+<li><a href="https://github.com/gebelo/nicar2016/blob/master/importio.docx">A detailed tutorial on using import.io to scrape Web sites</a></li>
+<li><a href="https://github.com/gebelo/nicar2016/blob/master/xml_miracle.docx">An example of how to find and parse hidden XML or JSON data</a></li>
+<li><a href="https://github.com/gebelo/nicar2016/blob/master/pdf_wrangling16.docx">A walkthrough of various methods for dealing with PDFs</a></li>
</ul>
<br>
<h2>An Introduction to Open Refine</h2>
-<p>Open Refine is a vital tool for cleaning dirty data. A typical example is when a dataset contains names of people or companies but with inconsistent spelling that needs to be standardized. At NICAR, Nils Mulvad and I will walk through a tutorial he created. The exercise is here, the practice data here and here.</p>
+<p>Open Refine is a vital tool for cleaning dirty data. A typical example is when a dataset contains names of people or companies but with inconsistent spelling that needs to be standardized. At NICAR, <a href="http://www.kaasogmulvad.dk/en/">Nils Mulvad</a> and <a href="http://www.geb.net">I</a> will walk through a tutorial he created. The exercise is <a href="https://github.com/gebelo/nicar2016/blob/master/refine.pdf">here</a>, the practice data <a href="https://github.com/gebelo/nicar2016/blob/master/prof.csv">here</a> and <a href="https://github.com/gebelo/nicar2016/blob/master/defendants.xlsx">here</a>.</p>


Binary file removed scraping.pptx
Binary file not shown.
