---
title: OpenRefine
tags: presentation
slideOptions:
  theme: solarized
  transition: fade
  spotlight:
    enabled: true
---

Data Cleaning with OpenRefine


Setup

  1. Download the data and save it to your desktop
  2. Download the latest stable release of OpenRefine and unzip it to a convenient location on your computer

Introduction

Questions:

  • What is OpenRefine useful for?

Introduction

Objectives:

  • Describe OpenRefine's uses and applications.
  • Differentiate data cleaning from data organization.
  • Experiment with OpenRefine's user interface.
  • Locate helpful resources to learn more about OpenRefine.

What is OpenRefine?

  • OpenRefine is a powerful, free, and open source tool that helps you with data wrangling
  • OpenRefine was known as Google Refine before it was handed over to the open source community
  • It combines a spreadsheet-style graphical interface, like Excel's, with the reproducibility of scripting languages like R and Python

Benefits of OpenRefine

  • Automatically keeps a log of every change you make
  • Never modifies your original file: it works on a copy of the data inside an OpenRefine project
  • Any operation can be undone
  • Can repeat your steps for more than one data set
  • Provides a user-friendly interface for complex clustering algorithms
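To make the clustering bullet concrete, here is a minimal Python sketch of key-collision clustering in the spirit of OpenRefine's "fingerprint" method. It is an approximation for illustration, not OpenRefine's actual implementation; the function names and the sample values are assumptions made up for this example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalized key so that near-duplicates collide."""
    # Trim surrounding whitespace and ignore case.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII equivalents.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split into tokens, remove duplicates, and sort so word order is ignored.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample data, not taken from the workshop data set.
names = ["New York", "new york ", "York, New", "Boston", "boston", "Chicago"]
clusters = cluster(names)
for group in clusters:
    print(group)
```

Each printed group is a set of spellings that a person would then review and merge into one value, which is the step OpenRefine's clustering dialog supports.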

Uses of OpenRefine

  • Get an overview of a data set
  • Resolve inconsistencies
  • Split data into more granular parts
  • Match local data with other data sets
  • Save a set of data cleaning steps for replay on multiple files

Working with OpenRefine

Questions:

  • How can we import data into OpenRefine?
  • How can we sort and summarize data with OpenRefine?
  • How can we find and correct errors with OpenRefine?

Working with OpenRefine

Objectives:

  • Create a new OpenRefine project from a .csv file.
  • Use facets to sort and summarize data.
  • Use clustering to find and correct groups of typos.
  • Undo and redo steps.
  • Split values into multiple columns.
  • Remove leading and trailing whitespace from cells.
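For readers who know a scripting language, the facet, whitespace, and split objectives above can be sketched in plain Python. This only illustrates the ideas and is not how OpenRefine works internally; the helper names, the sample rows, and the numbered naming of the new columns are assumptions for this example.

```python
from collections import Counter


def text_facet(rows, column):
    """Count how often each distinct value occurs, like a text facet."""
    return Counter(row[column] for row in rows)


def trim_column(rows, column):
    """Return new rows with leading and trailing whitespace removed from one column."""
    return [{**row, column: row[column].strip()} for row in rows]


def split_column(rows, column, separator):
    """Replace one column with numbered columns, one per separated part."""
    result = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for index, part in enumerate(row[column].split(separator), start=1):
            new_row[f"{column} {index}"] = part
        result.append(new_row)
    return result


# Hypothetical sample rows, not taken from the workshop data set.
rows = [
    {"city": "Boston", "contact": "Ana;ana@example.org"},
    {"city": "Boston ", "contact": "Ben;ben@example.org"},
    {"city": "Chicago", "contact": "Cy;cy@example.org"},
]

before = text_facet(rows, "city")      # "Boston" and "Boston " are counted separately
cleaned = trim_column(rows, "city")
after = text_facet(cleaned, "city")    # the two Boston rows now fall under one value
split_rows = split_column(cleaned, "contact", ";")

print(before)
print(after)
print(split_rows[0])
```

The facet counts before and after trimming show why stray whitespace matters: values that look identical on screen are treated as different until they are cleaned.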