<font size="6"><b>DATA VISUALIZATION</b></font>

In [None]:
library(data.table)
library(tidyverse)
library(plotly)
library(nycflights13)

In [None]:
pw1 <- getOption("repr.plot.width")
pw1

In [None]:
ph1 <- getOption("repr.plot.height")
ph1

In [None]:
options(repr.matrix.max.rows=20, repr.matrix.max.cols=30) # limit the number of rows and columns shown when tables are printed

![xkcd](../imagesbb/movie_narrative_charts_large.png)

(https://xkcd.com/657)

In this session, we will visualize data using the ggplot2 and plotly packages.

We will use the same data from the nycflights13 package.

# Datasets

Let's recall the tables in the nycflights13 package:

In [None]:
head(airlines)

In [None]:
head(airports)

In [None]:
head(planes)

In [None]:
head(weather)

In [None]:
head(flights)

# ggplot2

Let's wrangle flights slightly:

In [None]:
flights2 <- copy(flights)

In [None]:
setDT(flights2)

In [None]:
flights2[, date1 := as.Date(time_hour)]

In [None]:
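# Convert the HHMM clock columns to lubridate periods; air_time is excluded because it is already a duration in minutes
flights2 <- flights2 %>% mutate_at(vars(ends_with("time"), -air_time), function(x) lubridate::hm(sprintf("%s:%s", x %/% 100, x %% 100)))

In [None]:
# Speed in miles per hour: distance is in miles and air_time is in minutes
flights2[, speed := distance / air_time * 60]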

In [None]:
flights2
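
The clock columns in flights store times as HHMM integers, and the `%/%` and `%%` operators used above split such a number into hours and minutes. Here is a minimal sketch of that arithmetic in base R; the sample values are made up for illustration:

```r
# Hypothetical sample of HHMM clock times (517 means 5:17, 1545 means 15:45)
hhmm <- c(517, 1545, 5)

hours   <- hhmm %/% 100   # integer division keeps the hour part
minutes <- hhmm %% 100    # the remainder keeps the minute part

# The same "H:M" strings that the wrangling step builds with sprintf()
clock_strings <- sprintf("%s:%s", hours, minutes)
clock_strings

# Minutes since midnight, as a quick sanity check
minutes_since_midnight <- hours * 60 + minutes
minutes_since_midnight
```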

First, let's create a line chart across dates, where the y axis shows the average departure delay for each day.

Columns are mapped to the dimensions of the chart (the aesthetics) inside the `aes()` function:

In [None]:
flights2 %>%
group_by(date1) %>%
summarise_at("dep_delay", mean, na.rm = T) %>%
ggplot(aes(x = date1, y = dep_delay)) +
geom_line()
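
To see the difference between mapping a column inside `aes()` and setting a fixed value outside it, here is a small sketch on made-up data. The data frame `toy` and the colour `"steelblue"` are assumptions chosen only for illustration:

```r
library(ggplot2)

# Hypothetical toy data: five days with made-up average delays
toy <- data.frame(day = 1:5, delay = c(12, 7, 15, 9, 11))

# x and y are mapped to columns inside aes(); the colour is a fixed value,
# so it is passed outside aes() as a plain argument of the geom
p_toy <- ggplot(toy, aes(x = day, y = delay)) +
  geom_line(color = "steelblue")
p_toy
```

A value placed inside `aes()` is treated as data and gets a legend, while a value placed outside is applied to the whole layer as is.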

Now let's differentiate the lines by color according to the origin airport, so we add a third dimension:

In [None]:
flights2 %>%
group_by(date1, origin) %>%
summarise_at("dep_delay", mean, na.rm = T) %>%
ggplot(aes(x = date1, y = dep_delay, color = origin)) +
geom_line()

Let's try a scatter plot of average daily departure delays against average daily arrival delays:

In [None]:
flights2 %>%
group_by(date1) %>%
summarise_at(c("dep_delay", "arr_delay"), mean, na.rm = T) %>%
ggplot(aes(x = dep_delay, y = arr_delay)) +
geom_point()
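
The `summarise_at()` calls used in this notebook belong to dplyr's scoped verbs, which are superseded by `across()`. As a sketch of what the aggregation step does, here is the equivalent `across()` form on a made-up table; `toy` and its values are hypothetical:

```r
library(dplyr)

# Hypothetical toy data with missing delays
toy <- tibble(
  origin    = c("EWR", "EWR", "JFK", "JFK"),
  dep_delay = c(10, NA, 4, 8),
  arr_delay = c(20, 6, NA, 2)
)

# Mean of each delay column per origin, ignoring missing values
group_means <- toy %>%
  group_by(origin) %>%
  summarise(across(c(dep_delay, arr_delay), ~ mean(.x, na.rm = TRUE)))
group_means
```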

Let's add a third dimension by changing the size of points according to the flight distance, so we have a bubble chart:

In [None]:
flights2 %>%
group_by(date1) %>%
summarise_at(c("dep_delay", "arr_delay", "distance"), mean, na.rm = T) %>%
ggplot(aes(x = dep_delay, y = arr_delay, size = distance)) +
geom_point(alpha = 0.5)

And let's add a fourth dimension by changing the color of points according to the origin airport:

In [None]:
flights2 %>%
group_by(date1, origin) %>%
summarise_at(c("dep_delay", "arr_delay", "distance"), mean, na.rm = T) %>%
ggplot(aes(x = dep_delay, y = arr_delay, size = distance, color = origin)) +
geom_point(alpha = 0.5)

We can also draw a separate panel for each weekday using the `facet_wrap()` function, which gives us five dimensions:

In [None]:
flights2 %>%
group_by(date1, origin) %>%
summarise_at(c("dep_delay", "arr_delay", "distance"), mean, na.rm = T) %>%
mutate(weekday = lubridate::wday(date1, label = T)) %>%
ggplot(aes(x = dep_delay, y = arr_delay, size = distance, color = origin)) +
geom_point(alpha = 0.5) +
facet_wrap(. ~ weekday)

And let's create a separate panel for each combination of weekday and quarter (three months) using the `facet_grid()` function. Now we have six dimensions!

In [None]:
options(repr.plot.width = 10, repr.plot.height = 10)
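
These options stay in effect for every later cell. One possible way to get the earlier plot size back afterwards relies on the fact that `options()` returns the previous values of the options it changes. A minimal sketch; `width_before` and `old_opts` are names made up for this example:

```r
# Remember the current width so that the restore can be checked afterwards
width_before <- getOption("repr.plot.width")

# options() returns the previous values of the options it changes
old_opts <- options(repr.plot.width = 10, repr.plot.height = 10)

# ... draw the large chart here ...

# Restore the earlier plot size
options(old_opts)
identical(getOption("repr.plot.width"), width_before)
```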

In [None]:
flights2 %>%
group_by(date1, origin) %>%
summarise_at(c("dep_delay", "arr_delay", "distance"), mean, na.rm = T) %>%
mutate(weekday = lubridate::wday(date1, label = T)) %>%
mutate(mnt = month(date1)) %>%
mutate(quartx = paste0("Q", (mnt - 1) %/% 3 + 1)) %>%
ggplot(aes(x = dep_delay, y = arr_delay, size = distance, color = origin)) +
geom_point(alpha = 0.5) +
facet_grid(weekday ~ quartx)
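
The quarter label comes from integer division of the month number. A minimal sketch of that arithmetic in base R, using `paste0()` so that the label contains no space:

```r
mnt <- 1:12

# Months 1-3 map to 1, months 4-6 to 2, and so on
quartx <- paste0("Q", (mnt - 1) %/% 3 + 1)
quartx
```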

# plotly

So far we had static charts with no interactions, just pictures.

Now let's do something fancy very easily:

- First create a ggplot chart and assign it to a named object
- Then pass that object to the `ggplotly()` function from the plotly package

In [None]:
pl1 <- flights2 %>%
group_by(date1, origin) %>%
summarise_at(c("dep_delay", "arr_delay", "distance"), mean, na.rm = T) %>%
ggplot(aes(x = dep_delay, y = arr_delay, size = distance, color = origin)) +
geom_point(alpha = 0.5)

In [None]:
ggplotly(pl1)

It is the same chart, but now we can hover over points to see the data in a pop-up tooltip, switch colors on and off from the legend, zoom, pan, and so on.
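
The content of the tooltip can be narrowed with the `tooltip` argument of `ggplotly()`. Here is a small sketch on made-up data; the data frame `toy` is hypothetical and stands in for the daily averages:

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data standing in for the daily averages
toy <- data.frame(
  dep_delay = c(5, 12, 30),
  arr_delay = c(2, 10, 35),
  origin    = c("EWR", "JFK", "LGA")
)

p_toy <- ggplot(toy, aes(x = dep_delay, y = arr_delay, color = origin)) +
  geom_point()

# tooltip restricts the hover text to the listed aesthetics
w_toy <- ggplotly(p_toy, tooltip = c("x", "y"))
w_toy
```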

Faceted charts can also be converted to plotly:

In [None]:
pl2 <- flights2 %>%
group_by(date1, origin) %>%
summarise_at(c("dep_delay", "arr_delay", "distance"), mean, na.rm = T) %>%
mutate(weekday = lubridate::wday(date1, label = T)) %>%
ggplot(aes(x = dep_delay, y = arr_delay, size = distance, color = origin)) +
geom_point(alpha = 0.5) +
facet_wrap(. ~ weekday)

In [None]:
ggplotly(pl2)

We can also create animated charts using plotly's own syntax by mapping a variable to the `frame` argument:

In [None]:
flights2 %>%
group_by(date1, origin) %>%
summarise_at(c("dep_delay", "arr_delay", "distance"), mean, na.rm = T) %>%
mutate(mnt = month(date1)) %>%
ungroup() %>% # drop the date1 grouping left over from summarise_at()
plot_ly(x = ~dep_delay, y = ~arr_delay) %>%
add_trace(color = ~origin, frame = ~mnt, type = "scatter", mode = "markers") %>%
animation_opts(
    frame = 200, redraw = T, easing = "linear", mode = "next"
)
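
To make the role of `frame` easier to see, here is a small sketch on made-up data with one frame per month. The data frame `toy` and the slider prefix are assumptions for illustration only:

```r
library(plotly)

# Hypothetical toy data: two points per month for three months
toy <- data.frame(
  mnt       = rep(1:3, each = 2),
  dep_delay = c(5, 8, 12, 3, 20, 15),
  arr_delay = c(2, 9, 10, 1, 25, 12)
)

# Each distinct value of mnt becomes one animation frame
anim <- plot_ly(toy, x = ~dep_delay, y = ~arr_delay, frame = ~mnt,
                type = "scatter", mode = "markers") %>%
  animation_opts(frame = 500, redraw = FALSE) %>%
  animation_slider(currentvalue = list(prefix = "Month: "))
anim
```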

We can even create interactive, rotatable 3D charts easily in plotly:

In [None]:
flights3 <- flights2 %>%
group_by(date1, origin) %>%
summarise_at(c("dep_delay", "arr_delay", "distance"), mean, na.rm = T) %>%
mutate(mnt = month(date1))

In [None]:
# width and height are set in plot_ly(), since setting them in layout() is deprecated
plot_ly(width = 800, height = 800) %>%
    add_trace(data = flights3, x = ~dep_delay, y = ~arr_delay, z = ~distance, type = "mesh3d") %>%
    layout(autosize = FALSE,
           scene = list(xaxis = list(title = "dep_delay"),
                        yaxis = list(title = "arr_delay"),
                        zaxis = list(title = "distance")))
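
A mesh draws a surface through the points. If we only want to see the individual points in 3D, one alternative is the `scatter3d` trace type. A minimal sketch on made-up data; `toy` is hypothetical and this is a different trace type from the chart above:

```r
library(plotly)

# Hypothetical toy data standing in for the daily averages
toy <- data.frame(
  dep_delay = c(5, 12, 30, 8),
  arr_delay = c(2, 10, 35, 4),
  distance  = c(900, 1100, 1000, 1200)
)

# One marker per row, positioned by the three mapped columns
p3d <- plot_ly(toy, x = ~dep_delay, y = ~arr_delay, z = ~distance,
               type = "scatter3d", mode = "markers")
p3d
```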