Data Wrangling in SQL & Other Tools
Scripting Reproducible and Understandable Data Wrangling and Analysis Pipelines with Tabular and Relational Data
This repository contains materials for my talk at the Data Wranglers DC meetup on June 4, 2014.
The talk consists of several major directions:
- A slide deck (
./slides) in Apple Keynote, PDF and HTML formats
- Sample data in CSV format (
./csv), courtesy of tilling
- A set of SQL scripts (
./sql) that create the local PostgreSQL database used for the examples and perform the simple linear model analysis example
- An RMarkdown document (
./R), published on RPubs, that uses the data from the database to perform the analysis in R and compare with the SQL results
- An iPython notebook document (
./python) that uses the data from the database to perform the example analysis, compare the results across SQL and R, and plot the resulting linear models
Where do I start?
I recommend that anyone wishing to understand what I've done should tackle these pieces in order, starting with the slide deck.
Given time and maturity of database libraries, I hope to add a parallel example in Julia soon.
This work and the opinions expressed here are my own, and do not purport to represent the views of my current or former employers.