Slides and materials from my talk for Data Wranglers DC on June 4, 2014
Data Wrangling in SQL & Other Tools

Scripting Reproducible and Understandable Data Wrangling and Analysis Pipelines with Tabular and Relational Data

This repository contains materials for my talk at the Data Wranglers DC meetup on June 4, 2014.


The materials for the talk consist of several major components:

  • A slide deck (./slides) in Apple Keynote, PDF and HTML formats
  • Sample data in CSV format (./csv), courtesy of tilling
  • A set of SQL scripts (./sql) that create the local PostgreSQL database used for the examples and perform the simple linear model analysis example
  • An RMarkdown document (./R), published on RPubs, that uses the data from the database to perform the analysis in R and compare with the SQL results
  • An IPython notebook (./python) that uses the data from the database to perform the example analysis, compare the results across SQL and R, and plot the resulting linear models
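
For readers who want a feel for the kind of simple linear model the examples revolve around before opening the notebooks, here is a minimal sketch of an ordinary least squares fit for one predictor. It is not taken from the repository: the function name `simple_linear_fit` and the sample points are made up for illustration, and the scripts in ./sql, ./R and ./python may well compute the model differently.

```python
from statistics import mean


def simple_linear_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration only; it is not part of this repository.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical, slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up points that lie exactly on the line y = 1 + 2x.
slope, intercept = simple_linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

The sums of products used here map naturally onto aggregate queries, which is one reason a model like this can plausibly be computed in SQL as well as in R or Python.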

Where do I start?

I recommend that anyone wishing to understand what I've done should tackle these pieces in order, starting with the slide deck.

Future Work

Given time and maturity of database libraries, I hope to add a parallel example in Julia soon.


This work and the opinions expressed here are my own, and do not purport to represent the views of my current or former employers.
