Afternoon update
edgararuiz committed Jan 26, 2018
1 parent dbdcc82 commit 046bf35
Showing 29 changed files with 31,509 additions and 29,146 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -2,3 +2,4 @@
.Rhistory
.RData
.Ruserdata
*.html
2 changes: 1 addition & 1 deletion README.md
@@ -4,4 +4,4 @@ Repo containing all of the class material.

- `workbook` - Contains a `bookdown` with all of the exercises that will be used during the class

- `dashboard` - Has a completed version of the `shinydashboard` that will be created across multiple exercises during the class
- `dashboard` (under the `assets` folder) - Has a completed version of the `shinydashboard` that will be created across multiple exercises during the class
8 changes: 8 additions & 0 deletions config.yml
@@ -0,0 +1,8 @@
default:
  datawarehouse-dev:
    driver: 'PostgreSQL'
    server: 'localhost'
    uid: 'rstudio_admin'
    pwd: 'admin_user_be_careful'
    port: 5432
    database: 'postgres'
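
A minimal sketch of how a file with this shape might be read using `config::get()`. To stay runnable without the repository, it writes a throwaway YAML file to a temporary path; the `uid` and `pwd` values and the names `cfg_path` and `dw` are placeholders chosen for illustration, not part of the commit.

```r
# Write a throwaway config file that mirrors the nesting of config.yml.
# The credential values are placeholders, not the class credentials.
cfg_path <- tempfile(fileext = ".yml")
writeLines(c(
  "default:",
  "  datawarehouse-dev:",
  "    driver: 'PostgreSQL'",
  "    server: 'localhost'",
  "    uid: 'example_user'",
  "    pwd: 'example_password'",
  "    port: 5432",
  "    database: 'postgres'"
), cfg_path)

# config::get() reads the active configuration ("default" unless
# R_CONFIG_ACTIVE is set) and returns the requested entry as a list.
dw <- config::get("datawarehouse-dev", file = cfg_path)

dw$server
dw$port
```

Each element of `dw` could then be passed to `dbConnect()` as a connection argument, for example `Server = dw$server`, so that no credential is typed into the script itself.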
43 changes: 0 additions & 43 deletions daily.Rmd

This file was deleted.

2 changes: 2 additions & 0 deletions options.R
@@ -0,0 +1,2 @@
options(database_userid = "rstudio_dev")
options(database_password = "dev_user")

This file was deleted.

30 changes: 25 additions & 5 deletions workbook/01-database-access.Rmd
@@ -1,13 +1,15 @@
```{r, section01, include = FALSE}
knitr::opts_chunk$set(eval = TRUE)
```

# Access a database

```{r, include = FALSE}
library(dplyr)
library(dbplyr)
library(DBI)
```

# Access a database

## Connect to a database

*The simplest way to connect to a database. More complex examples will be examined later in the class.*
@@ -133,6 +135,12 @@ con <- dbConnect(

2. When prompted, type in **rstudio_dev** for the user, and **dev_user** as the password

3. Disconnect from the database using `dbDisconnect()`
```{r}
dbDisconnect(con)
```


## Secure credentials in a file

*Credentials can be saved in a YAML file and then read using the `config` package: http://db.rstudio.com/best-practices/managing-credentials/#stored-in-a-file-with-config *
@@ -160,6 +168,13 @@ con <- dbConnect(odbc::odbc(),
)
```

5. Disconnect from the database using `dbDisconnect()`
```{r}
dbDisconnect(con)
```



## Environment variables
*Use .Renviron file to store credentials*

@@ -183,6 +198,12 @@ con <- dbConnect(
)
```

4. Disconnect from the database using `dbDisconnect()`
```{r}
dbDisconnect(con)
```
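
A minimal sketch of the environment-variable approach. The variable names `DB_UID` and `DB_PWD` are hypothetical (the workbook's own names are not visible in this diff), and `Sys.setenv()` stands in for entries that would normally live in `.Renviron`.

```r
# Stand-in for lines in .Renviron such as DB_UID=example_user
# (hypothetical names and placeholder values).
Sys.setenv(DB_UID = "example_user", DB_PWD = "example_password")

# Read the credentials back at connection time.
uid <- Sys.getenv("DB_UID")
pwd <- Sys.getenv("DB_PWD")

# Sys.getenv() returns "" for an unset variable, so fail early
# instead of sending empty credentials to the database.
stopifnot(nzchar(uid), nzchar(pwd))
```

Because `.Renviron` is read when R starts, a new session is needed after editing the file.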


## Use options()
*Set options() in a separate R script*

@@ -211,8 +232,7 @@ con <- dbConnect(
)
```



```{r, include = FALSE}
5. Disconnect from the database using `dbDisconnect()`
```{r}
dbDisconnect(con)
```
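
A minimal sketch of the `options()` approach, using the option names from `options.R` with placeholder values. The first call stands in for `source("options.R")`; the option name `database_missing_option` is made up to show the unset case.

```r
# Stand-in for source("options.R"); the values are placeholders.
options(
  database_userid = "example_user",
  database_password = "example_password"
)

# Retrieve the values where the connection is opened.
uid <- getOption("database_userid")
pwd <- getOption("database_password")

# getOption() returns NULL for an option that was never set,
# which makes a missing credential easy to detect.
missing_opt <- getOption("database_missing_option")
is.null(missing_opt)
```

Keeping the script that sets the options out of version control is what keeps the credentials out of the shared code.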
134 changes: 71 additions & 63 deletions workbook/02-dplyr-basics.Rmd
@@ -1,13 +1,11 @@

```{r, include = FALSE}
```{r, section02, include = FALSE}
knitr::opts_chunk$set(eval = TRUE)
```


# `dplyr` Basics

```{r, include = FALSE}

```{r, include = FALSE}
library(dplyr)
library(dbplyr)
library(DBI)
@@ -18,16 +16,15 @@ library(DBI)
*Basics of how to point a variable in R to a table or view inside the database*


1. Load the `dplyr` and `dbplyr` libraries
1. Load the `dplyr`, `DBI` and `dbplyr` libraries
```{r, dplyr}
library(dplyr)
library(dbplyr)
library(DBI)
```

2. *(Optional)* Open a connection to the database if it's currently closed
```{r}
library(DBI)
con <- dbConnect(odbc::odbc(), "Postgres Dev")
```

@@ -50,37 +47,10 @@ airports

6. Set up the pointers to the other tables
```{r}
flights <- tbl(con, in_schema("datawarehouse", "flight"))
flights <- tbl(con, in_schema("datawarehouse", "vflight"))
carriers <- tbl(con, in_schema("datawarehouse", "carrier"))
```

## Basic aggregation
*A couple of `dplyr` commands that run in-database*

1. How many records are in the **airport** table?
```{r}
tbl(con, in_schema("datawarehouse", "vflight")) %>%
group_by(month) %>%
tally()
```

2. What is the average character length of the airport codes? How many characters is the longest and the shortest airport name?
```{r}
airports %>%
summarise(
avg_airport_length = mean(length(airport), na.rm = TRUE),
max_airport_name = max(length(airportname), na.rm = TRUE),
min_airport_name = min(length(airportname), na.rm = TRUE),
total_records = n()
)
```

**Additional exercises:**

1. How many records are in the **carrier** table?

2. How many characters is the longest **carriername**?

## Under the hood
*Use `show_query()` to preview the SQL statement that will be sent to the database*

@@ -91,22 +61,24 @@ show_query(airports)

2. Easily view the resulting query by adding `show_query()` in another piped command
```{r}
carriers %>%
summarise(n())
airports %>%
show_query()
```

3. Run the same for last exercise in the previous section
3. Insert `head()` in between the two statements to see how the SQL changes
```{r}
airports %>%
summarise(
avg_airport_length = mean(length(airport), na.rm = TRUE),
max_airport_name = max(length(airportname), na.rm = TRUE),
min_airport_name = min(length(airportname), na.rm = TRUE),
total_records = n()
) %>%
head() %>%
show_query()
```

4. Use `sql_render()` and `simulate_mssql()` to see how the SQL statement changes from vendor to vendor
```{r}
airports %>%
  head() %>%
  sql_render(con = simulate_mssql())
```
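
The vendor comparison can also be tried without a live connection. This is a sketch that assumes `dbplyr`'s `lazy_frame()` with simulated connections; the column values are illustrative stand-ins for the airport table, and `airports_pg`, `airports_ms`, `pg_sql` and `ms_sql` are names invented here.

```r
library(dplyr)
library(dbplyr)

# Lazy tables backed by simulated connections; no database is contacted.
airports_pg <- lazy_frame(
  airport = "ABC", airportname = "Example", con = simulate_postgres()
)
airports_ms <- lazy_frame(
  airport = "ABC", airportname = "Example", con = simulate_mssql()
)

# Render the same pipeline for each vendor.
pg_sql <- airports_pg %>% head() %>% sql_render()
ms_sql <- airports_ms %>% head() %>% sql_render()

pg_sql
ms_sql
```

The PostgreSQL rendering should limit rows with a `LIMIT` clause, while the SQL Server rendering should use `TOP`, which is the kind of vendor difference this exercise is meant to surface.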

## Un-translated R commands
*Review of how `dbplyr` handles R commands that have not been translated into an equivalent SQL command*

@@ -131,25 +103,6 @@ airports %>%
select(today) %>%
head()
```
## knitr SQL engine

1. Copy the result of `show_query()`
```{r}
airports %>%
summarise(
avg_airport_length = mean(length(airport), na.rm = TRUE),
max_airport_name = max(length(airportname), na.rm = TRUE),
min_airport_name = min(length(airportname), na.rm = TRUE),
total_records = n()
) %>%
show_query()
```

2. Paste the result in this SQL chunk
```{sql, connection = con}
SELECT AVG(LENGTH("airport")) AS "avg_airport_length", MAX(LENGTH("airportname")) AS "max_airport_name", MIN(LENGTH("airportname")) AS "min_airport_name", COUNT(*) AS "total_records"
FROM datawarehouse.airport
```

## Using bang-bang
*Intro on passing unevaluated code to a dplyr verb*
@@ -172,9 +125,64 @@ airports %>%
```{r}
airports %>%
  mutate(today = !!Sys.time()) %>%
  select(today) %>%
  head()
```
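
A small sketch of what `!!` changes, run against a simulated connection so that no database is needed. It uses `toupper()` rather than `Sys.time()` because `toupper()` has a known SQL translation, which makes the contrast easy to see; `lf`, `translated` and `inlined` are names invented for this illustration.

```r
library(dplyr)
library(dbplyr)

# Lazy table on a simulated PostgreSQL connection (illustrative column).
lf <- lazy_frame(airport = "ABC", con = simulate_postgres())

# Without !!, toupper() is translated to SQL and would run in the database.
translated <- lf %>%
  mutate(code = toupper("abc")) %>%
  sql_render()

# With !!, toupper("abc") is evaluated in R first and only its
# result is sent to the database as a literal.
inlined <- lf %>%
  mutate(code = !!toupper("abc")) %>%
  sql_render()

translated
inlined
```

The first statement should contain a call to `UPPER`, while the second should contain only the literal `'ABC'`.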

## knitr SQL engine

1. Copy the result of the latest `show_query()` exercise
```{r}
airports %>%
  mutate(today = !!Sys.time()) %>%
  show_query()
```

2. Paste the result in this SQL chunk
```{sql, connection = con}
SELECT "airport", "airportname", "city", "state", "country", "lat", "long", '2018-01-26T14:50:10Z' AS "today"
FROM datawarehouse.airport
```


## Basic aggregation
*A couple of `dplyr` commands that run in-database*

1. How many records are in the **airport** table?
```{r}
tbl(con, in_schema("datawarehouse", "airport")) %>%
  tally()
```

2. What is the average character length of the airport codes? How many characters is the longest and the shortest airport name?
```{r}
airports %>%
  summarise(
    avg_airport_length = mean(str_length(airport), na.rm = TRUE),
    max_airport_name = max(str_length(airportname), na.rm = TRUE),
    min_airport_name = min(str_length(airportname), na.rm = TRUE),
    total_records = n()
  )
```

3. How many records are in the **carrier** table?
```{r}
carriers %>%
  tally()
```

4. How many characters is the longest **carriername**?
```{r}
carriers %>%
  summarise(x = max(str_length(carriername), na.rm = TRUE))
```

5. What is the SQL statement sent in exercise 4?
```{r}
carriers %>%
  summarise(x = max(str_length(carriername), na.rm = TRUE)) %>%
  show_query()
```

```{r, include = FALSE}
dbDisconnect(con)