# Lecture 09: SQL
<div style="border: 1px double black; padding: 10px; margin: 10px">

**Goals for today's lecture:**
* Learn SQL (Structured Query Language)
</div>


A huge amount of data lives in relational databases, so it is important to understand how to connect to a relational database and work with it.

To connect to the database from R, you’ll use a pair of packages:

* DBI (database interface) - this provides a set of generic functions that connect to the database, upload data, run SQL queries, etc.
* You’ll also use a package tailored for the DBMS you’re connecting to. This package translates the generic DBI commands into the specifics needed for a given DBMS. There’s usually one package for each DBMS, e.g. RPostgres for Postgres and RMariaDB for MySQL. In this example we use SQLite, so the package to use is **RSQLite**.


In [None]:
# install.packages('RSQLite')  # if required

In [1]:
library(DBI)
library(dbplyr)
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.0 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::ident()  masks dbplyr::ident()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::sql()    masks dbplyr::sql()


In [33]:
library(DBI)
# Create an ephemeral in-memory RSQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbListTables(con)

Because the database was just created, the above command returns an empty result: there are no tables yet.

### dbplyr
**dbplyr** is a dplyr backend: you keep writing dplyr code and dbplyr translates it to SQL.
Now we are going to create a table called 'mpg' in our SQLite database from the mpg tibble. Note that `dbWriteTable()` is a DBI function, so it is DBI and RSQLite (not dbplyr) that generate the necessary SQL statements behind the scenes.

In [62]:
dbWriteTable(con, "mpg", mpg, overwrite = TRUE)
dbListTables(con)
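To make the dplyr-to-SQL translation concrete, here is a minimal sketch. It assumes the dplyr, dbplyr, RSQLite and ggplot2 packages are installed; `demo_con`, `mpg_db` and `four_cyl_query` are illustrative names that do not appear elsewhere in this lecture.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Separate throwaway connection so the lecture's `con` is left untouched
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(demo_con, "mpg", ggplot2::mpg, overwrite = TRUE)

# tbl() creates a lazy reference to the table; no rows are pulled yet
mpg_db <- tbl(demo_con, "mpg")

four_cyl_query <- mpg_db %>%
  filter(cyl == 4) %>%
  group_by(manufacturer) %>%
  summarise(max_hwy = max(hwy, na.rm = TRUE))

# Print the SQL that dbplyr generated for the pipeline above
show_query(four_cyl_query)

# collect() runs the query and returns the result as a tibble
four_cyl_result <- collect(four_cyl_query)

dbDisconnect(demo_con)
```

Nothing is sent to the database until `collect()` (or printing) forces the query, which is why `show_query()` can display the SQL without running it.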

In [65]:
# dbRemoveTable(con, 'mpg-d')

In [57]:
?dbWriteTable

dbWriteTable {DBI}, R Documentation

conn: A DBIConnection object, as returned by dbConnect().
name: The table name, passed on to dbQuoteIdentifier(). Options are:
  - a character string with the unquoted DBMS table name, e.g. "table_name"
  - a call to Id() with components to the fully qualified table name, e.g. Id(schema = "my_schema", table = "table_name")
  - a call to SQL() with the quoted and fully qualified table name given verbatim, e.g. SQL('"my_schema"."table_name"')
value: a data.frame (or coercible to data.frame).
...: Other parameters passed on to methods.


Now let us look at the column names of this table.

In [55]:
dbListFields(con, "mpg")

Time to read the entire table!

In [56]:
dbReadTable(con, "mpg")

manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class
<chr>,<chr>,<dbl>,<int>,<int>,<chr>,<chr>,<int>,<int>,<chr>,<chr>
audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact
audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact
audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact
audi,a4,2.8,1999,6,manual(m5),f,18,26,p,compact
audi,a4,3.1,2008,6,auto(av),f,18,27,p,compact
audi,a4 quattro,1.8,1999,4,manual(m5),4,18,26,p,compact
audi,a4 quattro,1.8,1999,4,auto(l5),4,16,25,p,compact
audi,a4 quattro,2.0,2008,4,manual(m6),4,20,28,p,compact


Let us select some records.

In [53]:
res <- dbSendQuery(con, "SELECT * FROM mpg WHERE cyl = 4")



In [16]:
dbFetch(res)

manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class
<chr>,<chr>,<dbl>,<int>,<int>,<chr>,<chr>,<int>,<int>,<chr>,<chr>


In [17]:
dbClearResult(res)

In [18]:
dbFetch(res)

ERROR: Error: Invalid result set
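As the error above shows, a result set can no longer be fetched once it has been cleared. When you simply want the rows back, DBI's `dbGetQuery()` sends the query, fetches all rows and clears the result in one call. The sketch below also shows a parameterised query using `?` placeholders with the `params` argument. It assumes RSQLite and ggplot2 are installed; `demo_con`, `four_cyl` and `audi_six` are illustrative names.

```r
library(DBI)

# Separate throwaway connection so the lecture's `con` is left untouched
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(demo_con, "mpg", ggplot2::mpg, overwrite = TRUE)

# Send + fetch + clear in a single call; returns a data.frame
four_cyl <- dbGetQuery(demo_con, "SELECT * FROM mpg WHERE cyl = 4")
head(four_cyl)

# Placeholders keep values out of the SQL string (safer than paste())
audi_six <- dbGetQuery(
  demo_con,
  "SELECT * FROM mpg WHERE manufacturer = ? AND cyl = ?",
  params = list("audi", 6)
)
head(audi_six)

dbDisconnect(demo_con)
```

`dbSendQuery()` with `dbFetch()` remains useful when a result is too large to load in one go.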


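You can also fetch results in a loop until the result set is exhausted. Note that dbFetch() returns all remaining rows unless you pass `n`, so the loop below runs only once; use `dbFetch(res, n = 1)` to get one row at a time.

In [30]:
res <- dbSendQuery(con, "SELECT * FROM mpg WHERE manufacturer = 'audi'")
# Without `n`, dbFetch() returns every remaining row in a single call
while (!dbHasCompleted(res)) {
  rows <- dbFetch(res)
  print(rows)
}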

   manufacturer      model displ year cyl      trans drv cty hwy fl   class
1          audi         a4   1.8 1999   4   auto(l5)   f  18  29  p compact
2          audi         a4   1.8 1999   4 manual(m5)   f  21  29  p compact
3          audi         a4   2.0 2008   4 manual(m6)   f  20  31  p compact
4          audi         a4   2.0 2008   4   auto(av)   f  21  30  p compact
5          audi         a4   2.8 1999   6   auto(l5)   f  16  26  p compact
6          audi         a4   2.8 1999   6 manual(m5)   f  18  26  p compact
7          audi         a4   3.1 2008   6   auto(av)   f  18  27  p compact
8          audi a4 quattro   1.8 1999   4 manual(m5)   4  18  26  p compact
9          audi a4 quattro   1.8 1999   4   auto(l5)   4  16  25  p compact
10         audi a4 quattro   2.0 2008   4 manual(m6)   4  20  28  p compact
11         audi a4 quattro   2.0 2008   4   auto(s6)   4  19  27  p compact
12         audi a4 quattro   2.8 1999   6   auto(l5)   4  15  25  p compact
13         a
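To retrieve rows in batches rather than all at once, pass `n` to `dbFetch()`. The following is a minimal sketch, assuming RSQLite and ggplot2 are installed; the chunk size of 5 and the names `demo_con`, `n_chunks` and `total_rows` are arbitrary choices for illustration.

```r
library(DBI)

# Separate throwaway connection so the lecture's `con` is left untouched
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(demo_con, "mpg", ggplot2::mpg, overwrite = TRUE)

res <- dbSendQuery(demo_con, "SELECT * FROM mpg WHERE manufacturer = 'audi'")

n_chunks <- 0
total_rows <- 0
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 5)   # at most 5 rows per call
  n_chunks <- n_chunks + 1
  total_rows <- total_rows + nrow(chunk)
  print(chunk)
}

# Always release the result set once you are done with it
dbClearResult(res)
dbDisconnect(demo_con)
```

Setting `n = 1` gives true row-at-a-time processing, at the cost of one call per row.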

### Solve
Find the maximum highway mileage (hwy) for each manufacturer.

In [48]:
res <- dbSendQuery(con, "")
dbFetch(res)
dbClearResult(res)

manufacturer,max(hwy)
<chr>,<int>
land rover,18
lincoln,18
mercury,19
jeep,22
dodge,24
ford,26
subaru,27
pontiac,28
chevrolet,30
audi,31
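One possible query for this exercise is sketched below; it is not necessarily the query that produced the output above. It assumes RSQLite and ggplot2 are installed, uses a separate connection named `demo_con`, and adds the alias `max_hwy` (the output above shows the unaliased column name `max(hwy)`). The `ORDER BY` is a guess based on the ascending order seen in that output.

```r
library(DBI)

# Separate throwaway connection so the lecture's `con` is left untouched
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(demo_con, "mpg", ggplot2::mpg, overwrite = TRUE)

res <- dbSendQuery(demo_con, "
  SELECT manufacturer, MAX(hwy) AS max_hwy
  FROM mpg
  GROUP BY manufacturer      -- one output row per manufacturer
  ORDER BY max_hwy           -- ascending by the aggregated value
")
max_hwy_by_make <- dbFetch(res)
dbClearResult(res)

print(max_hwy_by_make)
dbDisconnect(demo_con)
```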


## 🤔 Quiz

How many 4-cylinder cars (rows) are present in this database?

<ol style="list-style-type: upper-alpha;">
    <li>81</li>
    <li>52</li>
    <li>63</li>
    <li>91</li>
</ol>

In [None]:
# fill in the blanks
res <- dbSendQuery(con, "")
dbFetch(res)
dbClearResult(res)

### Solve
Extend the above query to retrieve the minimum hwy value for each combination of manufacturer and model.

In [67]:
res <- dbSendQuery(con, "")
dbFetch(res)
dbClearResult(res)

manufacturer,model,min(hwy)
<chr>,<chr>,<int>
audi,a4,26
audi,a4 quattro,25
audi,a6 quattro,23
chevrolet,c1500 suburban 2wd,15
chevrolet,corvette,23
chevrolet,k1500 tahoe 4wd,14
chevrolet,malibu,26
dodge,caravan 2wd,17
dodge,dakota pickup 4wd,12
dodge,durango 4wd,12
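A possible approach, shown as a sketch rather than the definitive answer: group by both columns so that each manufacturer and model pair gets its own row. It assumes RSQLite and ggplot2 are installed; `demo_con` and the alias `min_hwy` are illustrative names.

```r
library(DBI)

# Separate throwaway connection so the lecture's `con` is left untouched
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(demo_con, "mpg", ggplot2::mpg, overwrite = TRUE)

res <- dbSendQuery(demo_con, "
  SELECT manufacturer, model, MIN(hwy) AS min_hwy
  FROM mpg
  GROUP BY manufacturer, model   -- one row per (manufacturer, model) pair
  ORDER BY manufacturer, model
")
min_hwy_by_model <- dbFetch(res)
dbClearResult(res)

head(min_hwy_by_model, 10)
dbDisconnect(demo_con)
```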


### Solve
Get all manufacturers that have four or more cars in the 'compact' class.

In [71]:
# WHERE filters rows before grouping; HAVING filters the groups afterwards
res <- dbSendQuery(con, "SELECT manufacturer, class, count(*) FROM mpg WHERE class = 'compact' GROUP BY manufacturer, class HAVING count(*) >= 4")
dbFetch(res)
dbClearResult(res)

manufacturer,class,count(*)
<chr>,<chr>,<int>
audi,compact,15
subaru,compact,4
toyota,compact,12
volkswagen,compact,14


### Solve
Get the distinct car manufacturers in this dataset.

In [72]:
res <- dbSendQuery(con, "")
dbFetch(res)
dbClearResult(res)

manufacturer
<chr>
audi
chevrolet
dodge
ford
honda
hyundai
jeep
land rover
lincoln
mercury
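One way to approach this exercise is `SELECT DISTINCT`, sketched below under the assumption that RSQLite and ggplot2 are installed. `demo_con` and `makes` are illustrative names, and the `ORDER BY` is added only to make the output order predictable.

```r
library(DBI)

# Separate throwaway connection so the lecture's `con` is left untouched
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(demo_con, "mpg", ggplot2::mpg, overwrite = TRUE)

res <- dbSendQuery(demo_con, "
  SELECT DISTINCT manufacturer   -- collapse duplicate values to one row each
  FROM mpg
  ORDER BY manufacturer
")
makes <- dbFetch(res)
dbClearResult(res)

print(makes)
dbDisconnect(demo_con)
```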


Once you are done, it is a good idea to clear any pending result set and disconnect from the database.

In [31]:
dbClearResult(res)
dbDisconnect(con)