SQL, the Sequel
More SQL in the Database, and SQL in Data Science Contexts (i.e. SQL on DataFrames)
This repository contains materials for my talk at the Data Wranglers DC meetup on August 6, 2014, which is a follow-on to my talk at the Data Wranglers DC meetup on June 4, 2014. Materials for the prior talk are in the GitHub Repo nihonjinrxs/dwdc-june2014.
The talk takes two main directions:
- Using more advanced SQL techniques in a database system (examples in PostgreSQL) to script auto-updating computations
- Using SQL on data frames in R and in Python (also maybe Julia?)
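The core idea behind SQL on data frames can be sketched in a few lines: copy the frame's columns into a throwaway in-memory database and run ordinary SQL against it. By default, both sqldf and pandasql run their queries through SQLite. The sketch below uses only Python's standard sqlite3 module. It assumes a "data frame" is a plain dict mapping column names to equal-length lists, as a stand-in for a real R data.frame or pandas DataFrame. The helper name sql_on_frame and the sample data are hypothetical and are not part of either package's API.

```python
import sqlite3


def sql_on_frame(sql, **frames):
    """Run `sql` against in-memory SQLite tables built from column dicts.

    Hypothetical helper: each keyword argument names a table, and its value
    is a dict of column name -> list of values (a stand-in for a data frame).
    """
    conn = sqlite3.connect(":memory:")
    try:
        for name, frame in frames.items():
            columns = list(frame)
            # Transpose column lists into row tuples for executemany.
            rows = list(zip(*(frame[c] for c in columns)))
            col_list = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            # SQLite allows untyped columns, so no type mapping is needed here.
            conn.execute(f'CREATE TABLE "{name}" ({col_list})')
            conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)
        cursor = conn.execute(sql)
        header = [d[0] for d in cursor.description]
        return header, cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data.
sales = {
    "region": ["east", "west", "east", "north"],
    "amount": [120, 80, 45, 60],
}
header, rows = sql_on_frame(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC",
    sales=sales,
)
print(header, rows)  # ['region', 'total'] [('east', 165), ('west', 80), ('north', 60)]
```

The real packages also handle type conversion and return a data frame rather than a list of tuples. However, the round trip through an embedded SQL engine is the part that makes "query a data frame as if it were a table" work.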
Folders are as follows:
- A slide deck (./slides) in Apple Keynote, PDF and HTML formats
- A set of SQL scripts (./sql) that create the local PostgreSQL database objects demonstrating creation and use of views, custom functions and indexes for use in data analysis
- An RMarkdown document (./R), published on RPubs, that demonstrates using sqldf in R to perform SQL queries on data frames as if they are tables
- An IPython notebook (./python), available at IPython nbviewer, that demonstrates using the pandasql package to perform SQL queries on Pandas DataFrame objects as if they are tables
- An IJulia notebook document (./julia) that demonstrates using the SQLite.jl package in Julia to perform SQL queries on data frames as if they are tables (in progress, and not working yet)
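The SQL scripts combine three building blocks: a view, a custom function and an index. Together they produce computations that stay current as data changes. The sketch below illustrates that pattern. It is not a copy of the scripts in ./sql. It uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server. In PostgreSQL the function would be declared with CREATE FUNCTION. Here, a Python function is registered with Connection.create_function instead. The table, view, index and function names, and the data, are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def pct_change(old, new):
    """Percent change from old to new; None (SQL NULL) if old is missing or 0."""
    if old is None or new is None or old == 0:
        return None
    return round(100.0 * (new - old) / old, 1)


# Register the Python function so SQL statements, including views, can call it.
# (PostgreSQL would use a CREATE FUNCTION statement instead.)
conn.create_function("pct_change", 2, pct_change)

conn.executescript("""
    CREATE TABLE population (
        town   TEXT    NOT NULL,
        year   INTEGER NOT NULL,
        people INTEGER NOT NULL
    );

    -- Index supporting the (town, year) self-join used by the view.
    CREATE INDEX idx_population_town_year ON population (town, year);

    -- A view stores no rows: it is recomputed from the table on each query,
    -- so its results stay current as new data arrives.
    CREATE VIEW population_growth AS
    SELECT cur.town, cur.year,
           pct_change(prev.people, cur.people) AS growth_pct
    FROM population AS cur
    JOIN population AS prev
      ON prev.town = cur.town AND prev.year = cur.year - 1;
""")

GROWTH_QUERY = "SELECT town, year, growth_pct FROM population_growth ORDER BY town, year"

conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?)",
    [("town_a", 2012, 100), ("town_a", 2013, 110)],
)
before = conn.execute(GROWTH_QUERY).fetchall()

# Add a new year of data; the view picks it up with no extra step.
conn.execute("INSERT INTO population VALUES ('town_a', 2014, 99)")
after = conn.execute(GROWTH_QUERY).fetchall()

print(before)  # [('town_a', 2013, 10.0)]
print(after)   # [('town_a', 2013, 10.0), ('town_a', 2014, -10.0)]
```

A plain view like this one is always computed on the fly. The index is there only to keep the self-join cheap as the table grows.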
Where do I start?
I recommend that anyone wishing to understand what I've done start with the prior talk materials, then work through these pieces in order, beginning with the slide deck.
Given time, I hope to get sqldf working in Julia as well; Julia is a young language, and it's a little finicky at the moment. A few examples of SQL views with INSERT and UPDATE rules, and a SQL trigger or two, would also be nice additions.
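As a rough illustration of the view-plus-trigger ideas mentioned above, here is a minimal sketch. It is not part of this repository's materials. It uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. SQLite has no CREATE RULE, so an INSTEAD OF trigger on the view plays the part that an INSERT rule would play in PostgreSQL. PostgreSQL also supports INSTEAD OF triggers on views. A second, ordinary AFTER INSERT trigger keeps a summary table up to date. All table, view and trigger names, and the data, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id           INTEGER PRIMARY KEY,
        customer     TEXT    NOT NULL,
        amount_cents INTEGER NOT NULL
    );

    CREATE TABLE customer_totals (
        customer    TEXT PRIMARY KEY,
        total_cents INTEGER NOT NULL
    );

    -- AFTER INSERT trigger: keep the summary table in step with orders.
    CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
    BEGIN
        INSERT OR IGNORE INTO customer_totals (customer, total_cents)
        VALUES (NEW.customer, 0);
        UPDATE customer_totals
        SET total_cents = total_cents + NEW.amount_cents
        WHERE customer = NEW.customer;
    END;

    -- A view that presents amounts in dollars.
    CREATE VIEW orders_dollars AS
    SELECT id, customer, amount_cents / 100.0 AS amount
    FROM orders;

    -- INSTEAD OF trigger: rewrite INSERTs on the view into INSERTs on the
    -- base table (the role a PostgreSQL INSERT rule on the view would play).
    CREATE TRIGGER orders_dollars_insert INSTEAD OF INSERT ON orders_dollars
    BEGIN
        INSERT INTO orders (customer, amount_cents)
        VALUES (NEW.customer, CAST(ROUND(NEW.amount * 100) AS INTEGER));
    END;
""")

conn.execute("INSERT INTO orders (customer, amount_cents) VALUES ('ada', 1250)")
# These go through the view; the INSTEAD OF trigger inserts into orders,
# which in turn fires the AFTER INSERT trigger on orders.
conn.execute("INSERT INTO orders_dollars (customer, amount) VALUES ('ada', 2.75)")
conn.execute("INSERT INTO orders_dollars (customer, amount) VALUES ('grace', 10)")

totals = conn.execute(
    "SELECT customer, total_cents FROM customer_totals ORDER BY customer"
).fetchall()
dollars = conn.execute(
    "SELECT customer, amount FROM orders_dollars ORDER BY id"
).fetchall()

print(totals)   # [('ada', 1525), ('grace', 1000)]
print(dollars)  # [('ada', 12.5), ('ada', 2.75), ('grace', 10.0)]
```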
This work and the opinions expressed here are my own, and do not purport to represent the views of my current or former employers.