# Time-Series Data Visualization with R and ggplot2

Simone Santoni  
2024-11-27

This notebook shows how to create some key visualizations for
time-series data. The first half of the notebook deals with classic
visual forms in the field of time-series analysis. The second half
highlights some specialized data visualization libraries for
time-series data.

# Notebook setup

## Load (time-series) data visualization libraries

Common R libraries for time-series visualization include:

-   `dygraphs`
-   `ggplot2`
-   `ggplot2`’s extension `ggTimeSeries`

In [None]:
library(ggplot2)
library(ggthemes)
library(ggTimeSeries)
library(dygraphs)

## Load further libraries for time-series data

We also load two libraries for fetching stock market data and
manipulating time-series data.

In [None]:
library(tseries)
library(quantmod)

# Time series data structure

## Toy data in R’s `tseries` package

Let us consider a few examples, available in the package `tseries`.

### Historical data on wheat price

In [None]:
data("bev")
bev

Time Series:
Start = 1500 
End = 1869 
Frequency = 1 
  [1]  17.0  19.0  20.0  15.0  13.0  14.0  14.0  14.0  14.0  11.0  16.0  19.0
 [13]  23.0  18.0  17.0  20.0  20.0  18.0  14.0  16.0  21.0  24.0  15.0  16.0
 [25]  20.0  14.0  16.0  25.5  25.8  26.0  26.0  29.0  20.0  18.0  16.0  22.0
 [37]  22.0  16.0  19.0  17.0  17.0  19.0  20.0  24.0  28.0  36.0  20.0  14.0
 [49]  18.0  27.0  29.0  36.0  29.0  27.0  30.0  38.0  50.0  24.0  25.0  30.0
 [61]  31.0  37.0  41.0  36.0  32.0  47.0  42.0  37.0  34.0  36.0  43.0  55.0
 [73]  64.0  79.0  59.0  47.0  48.0  49.0  45.0  53.0  55.0  55.0  54.0  56.0
 [85]  52.0  76.0 113.0  68.0  59.0  74.0  78.0  69.0  78.0  73.0  88.0  98.0
 [97] 109.0 106.0  87.0  77.0  77.0  63.0  70.0  70.0  63.0  61.0  66.0  78.0
[109]  93.0  97.0  77.0  83.0  81.0  82.0  78.0  75.0  80.0  87.0  72.0  65.0
[121]  74.0  91.0 115.0  99.0  99.0 115.0 101.0  90.0  95.0 108.0 147.0 112.0
[133] 108.0  99.0  96.0 102.0 105.0 114.0 103.0  98.0 103.0 101.0 110.0 109.0
[145]  98.

### Monthly wine sales

In [None]:
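# NOTE: machine-specific path; point it at your local copy of the data folder
setwd("/home/simone/githubRepos/data-viz-smm635/data")
ws <- read.csv("wineSales/monthly-australian-wine-sales.csv", header = TRUE)
# monthly series (frequency = 12) starting in January 1985, built from column 2
ws.ts <- ts(ws[, 2], frequency = 12, start = c(1985, 1))
ws.ts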

       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1985 15136 16733 20016 17708 18019 19227 22893 23739 21133 22591 26786 29740
1986 15028 17977 20008 21354 19498 22125 25817 28779 20960 22254 27392 29945
1987 16933 17892 20533 23569 22417 22084 26580 27454 24081 23451 28991 31386
1988 16896 20045 23471 21747 25621 23859 25500 30998 24475 23145 29701 34365
1989 17556 22077 25702 22214 26886 23191 27831 35406 23195 25110 30009 36242
1990 18450 21845 26488 22394 28057 25451 24872 33424 24052 28449 33533 37351
1991 19969 21701 26249 24493 24603 26485 30723 34569 26689 26157 32064 38870
1992 21337 19419 23166 28286 24570 24001 33151 24878 26804 28967 33311 40226
1993 20504 23060 23562 27562 23940 24584 34303 25517 23494 29095 32903 34379
1994 16991 21109 23740 25552 21752 20294 29009 25500 24166 26960 31222 38641
1995 14672 17543 25453 32683 22449 22316 27595 25451 25421 25288 32568 35110
1996 16052 22146 21198 19543 22084 23816 29961 26773 26635 26972 30207 38687
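
The `ts()` call above attaches a start period and a frequency to a plain numeric vector. As a minimal sketch of how that metadata can be inspected, the snippet below uses a synthetic monthly series (the values are made up and the object names `x` and `x_2021` are illustrative, not part of the notebook's data).

```r
# Synthetic monthly series: 36 made-up values, not real sales data
set.seed(1)
x <- ts(round(rnorm(36, mean = 100, sd = 10)), frequency = 12, start = c(2020, 1))

start(x)       # first period as c(year, month)
end(x)         # last period as c(year, month)
frequency(x)   # number of observations per year
head(time(x))  # time index expressed in decimal years

# window() subsets by time period rather than by position
x_2021 <- window(x, start = c(2021, 1), end = c(2021, 12))
length(x_2021)
```

Subsetting with `window()` keeps the time attributes intact, which is usually more convenient than indexing a `ts` object by position.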

## Stock market data

We can also fetch stock market data using the `quantmod` package. Here,
we are fetching data on Nvidia stock price.

In [None]:
# custom function
fetch_stock <- function(tkr, sd, ed) {
  data <- getSymbols(tkr, src = "yahoo", from = sd, to = ed, auto.assign = FALSE)
  colnames(data) <- gsub(paste0(tkr, "\\."), "", colnames(data))
  data <- data[, c("Volume", "Adjusted")]
  return(data)
}
# fetch data
nv <- fetch_stock(tkr = "NVDA", sd = "2023-01-01", ed = "2024-10-31")
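
The renaming step inside `fetch_stock()` can be checked without a network connection. The sketch below applies the same `gsub()` pattern to the column names that `getSymbols()` returns for a ticker; the vectors `raw_names` and `clean_names` are illustrative names introduced here.

```r
# Column names as returned by quantmod::getSymbols() for NVDA:
# each one is prefixed with the ticker symbol and a dot
raw_names <- c("NVDA.Open", "NVDA.High", "NVDA.Low",
               "NVDA.Close", "NVDA.Volume", "NVDA.Adjusted")

tkr <- "NVDA"

# Same pattern as in fetch_stock(): the dot is escaped so that it matches
# a literal "." rather than any character
clean_names <- gsub(paste0(tkr, "\\."), "", raw_names)
clean_names
```

With the prefix removed, the columns can be selected by generic names such as `"Volume"` and `"Adjusted"`, whatever ticker is passed to the function.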

# Terminology

A time series is said to be **continuous** when observations are made
continuously. The adjective ‘continuous’ is used for series of this type
even when the measured variable can only take a discrete set of values.

A time series is said to be **discrete** when observations are taken
only at specific times, usually equally spaced. The term ‘discrete’ is
used for series of this type even when the measured variable is a
continuous variable.

# Time-series decomposition

Time-series decomposition is a common time-series data manipulation,
which is also very helpful for data visualization purposes. The
intuition is that we can express a time series $X_{t}$ as the linear
combination of several terms:

\begin{equation}
X_{t} = T_{t} + S_{t} + \epsilon_{t}
\end{equation}

where $T_{t}$ is the trend component, $S_{t}$ is the seasonal component,
and $\epsilon_{t}$ is the residual component (ideally, purely stochastic).

In [None]:
ws.de <- decompose(ws.ts, type = "additive")
plot(ws.de)
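
To see the additive identity at work, the components returned by `decompose()` can be summed back together and compared with the original series. The sketch below is not run on the wine sales data; it assumes the built-in `co2` dataset (monthly CO2 concentrations shipped with base R) as a stand-in, since the CSV file is not bundled with R.

```r
# Assumption: `co2` stands in for the wine sales series
de <- decompose(co2, type = "additive")

# decompose() stores the terms as separate elements of a list
names(de)

# Rebuild the series from its components: X_t = T_t + S_t + e_t
rebuilt <- de$trend + de$seasonal + de$random

# The moving-average trend is undefined at both ends of the sample,
# so the comparison is restricted to the non-missing periods
ok <- !is.na(rebuilt)
all.equal(as.numeric(co2[ok]), as.numeric(rebuilt[ok]))
```

The missing values at the edges are a consequence of the centered moving average used to estimate the trend, which is worth keeping in mind when plotting the components.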

# Data visualization libraries for time-series data

The library `tseries` offers some options for visualizing time-series
data, which are based on R’s built-in plotting capabilities. The package
`ggplot2` — and some of its extensions — provides further data
visualization options, which are quite flexible and customizable.

## `dygraphs` and stock market data

The library [`dygraphs`](https://rstudio.github.io/dygraphs/) is a
popular, specialized library for time-series data visualization. It is
particularly useful for stock market data, as it allows for the
visualization of time-series data in an interactive way.

The following four charts show how to visualize Nvidia stock price data
using `dygraphs` with increasing levels of customization.

In [None]:
dygraph(nv$Adjusted)

In [None]:
dygraph(
  nv$Adjusted,
  main = "Nvidia Stock Price (NVDA)",
  xlab = "Time period",
  ylab = "Adjusted price (USD)",
  width = 600,
  height = 400
)

In [None]:
dygraph(
  nv$Adjusted,
  main = "Nvidia Stock Price (NVDA)",
  xlab = "Time period", 
  ylab = "Adjusted price (USD)"
  ) %>% 
  dySeries(
    color = "red", 
    drawPoints = TRUE, 
    pointSize = 1.5, 
    pointShape = "circle"
    )

`dygraphs` can also plot several time series in one chart. Here, we
plot the adjusted price of Nvidia stock together with the trade volume,
rescaled to tens of millions of shares so that both series fit on one
axis.

In [None]:
nv$VolumeScaled <- nv[, "Volume"] / 10000000

dygraph(
  nv[, c("Adjusted", "VolumeScaled")], 
  main = "Nvidia Stock Price (NVDA) and Trade Volume") %>% 
  dySeries(
    "Adjusted", 
    label = "Adjusted Price (USD)", 
    color = "magenta", 
    drawPoints = TRUE, 
    pointSize = 1.5, 
    pointShape = "circle") %>% 
  dySeries(
    "VolumeScaled", 
    label = "Trade Volume (tens of millions)", 
    stepPlot = TRUE, 
    fillGraph = TRUE, 
    color = "green"
    )
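
Rescaling the volume by hand is one way to put two series on the same chart. An alternative, sketched below, is to bind the second series to a secondary axis with `dySeries(axis = "y2")` and `dyAxis("y2", ...)`. The example assumes synthetic data rather than the NVDA download; the object `toy` and its columns `price` and `volume` are made up for illustration.

```r
library(xts)
library(dygraphs)

# Synthetic daily data (made up for illustration, not NVDA quotes)
set.seed(42)
dates <- seq(as.Date("2024-01-01"), by = "day", length.out = 60)
toy <- xts(
  cbind(
    price  = 100 + cumsum(rnorm(60)),
    volume = round(runif(60, min = 1e6, max = 5e6))
  ),
  order.by = dates
)

# Bind the volume series to the secondary axis ("y2") so that each
# series keeps its own scale
d <- dygraph(toy, main = "Toy price and volume") %>%
  dySeries("price", label = "Price") %>%
  dySeries("volume", label = "Volume", axis = "y2",
           stepPlot = TRUE, fillGraph = TRUE) %>%
  dyAxis("y", label = "Price") %>%
  dyAxis("y2", label = "Volume", independentTicks = TRUE)
d  # renders the widget in a notebook or interactive session
```

Whether a secondary axis or a rescaled series reads better depends on the audience; dual axes keep the original units but can suggest relationships that are only an effect of the chosen scales.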

## `ggplot2`

The library `ggplot2` is a Swiss-army knife for data visualization in R,
and it works well for time-series data too. The following code snippet
shows how to visualize unemployment data in the US using `ggplot2`. To
do so, we use the `economics` dataset shipped with the package
`ggplot2`.

In [None]:
df <- economics[economics$date > as.Date("2000-01-01"), ]
dim(df)
head(df)

[1] 183   6

# A tibble: 6 × 6
  date         pce    pop psavert uempmed unemploy
  <date>     <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
1 2000-02-01 6620. 281190     4.8     6.1     5858
2 2000-03-01 6686. 281409     4.5     6       5733
3 2000-04-01 6671. 281653     5       6.1     5481
4 2000-05-01 6708. 281877     4.9     5.8     5758
5 2000-06-01 6744. 282126     4.9     5.7     5651
6 2000-07-01 6764. 282385     5.2     6       5747

Creating a time-series plot with `ggplot2` mostly comes down to mapping
the date to the x-axis and calling `geom_line()`.

In [None]:
# set theme
theme_set(theme_minimal())
# create the plot 
p <- ggplot(data = df, mapping = aes(x = date, y = unemploy))
p + geom_line()
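
Time-series plots often benefit from explicit date breaks and axis labels. The following sketch extends the basic line chart with `scale_x_date()` and `labs()`; the two-year break interval and the label wording are illustrative choices, and `p_labeled` is a name introduced here.

```r
library(ggplot2)

df <- economics[economics$date > as.Date("2000-01-01"), ]

p_labeled <- ggplot(df, aes(x = date, y = unemploy)) +
  geom_line() +
  # one tick every two years, labelled with the four-digit year
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  labs(
    title = "Unemployment in the US",
    x = NULL,
    y = "Unemployed persons (thousands)"
  )
p_labeled
```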

Adding a fit line is straightforward with `ggplot2`. Here, we add a
linear fit (purple) and a LOESS smoother (green) to the unemployment
data.

In [None]:
p +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE, color = "purple") +
  geom_smooth(method = "loess", se = FALSE, color = "green")
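
`ggplot2` expects a data frame, whereas series built with `ts()` are plain vectors with time attributes. A minimal sketch of the conversion follows, assuming a synthetic monthly series; the names `x` and `x_df` are illustrative, and the date sequence is built by hand to match the start period passed to `ts()`.

```r
library(ggplot2)

# Synthetic monthly series (made-up values)
set.seed(7)
x <- ts(100 + cumsum(rnorm(48)), frequency = 12, start = c(2020, 1))

# Build a Date column that mirrors the start period and monthly frequency
x_df <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = length(x)),
  value = as.numeric(x)
)

p_ts <- ggplot(x_df, aes(x = date, y = value)) +
  geom_line()
p_ts
```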

## `ggplot2`’s extension `ggTimeSeries`

One of the most developed extensions of `ggplot2` is
[`ggTimeSeries`](https://github.com/AtherEnergy/ggTimeSeries), which
provides at least three distinctive visual forms for time-series data:

-   Waterfall charts
-   Occurrence dot plots
-   Calendar heatmaps
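
To give a sense of what a calendar heatmap looks like, the sketch below approximates the idea with plain `ggplot2` and `geom_tile()`. It does not use `ggTimeSeries` and is not that package's implementation; the data are synthetic and the column names (`week`, `weekday`, `value`) are assumptions made for this illustration.

```r
library(ggplot2)

# Synthetic daily values for one year (made up for illustration)
set.seed(123)
cal <- data.frame(
  date = seq(as.Date("2023-01-01"), as.Date("2023-12-31"), by = "day")
)
cal$value <- rnorm(nrow(cal))

# Week of the year (Monday as first day) on the x-axis
cal$week <- as.integer(format(cal$date, "%W"))

# Day of the week on the y-axis; levels are reversed so Monday is drawn on top
cal$weekday <- factor(
  format(cal$date, "%u"),  # "1" = Monday ... "7" = Sunday
  levels = as.character(7:1),
  labels = c("Sun", "Sat", "Fri", "Thu", "Wed", "Tue", "Mon")
)

# One tile per day, colored by the daily value
p_cal <- ggplot(cal, aes(x = week, y = weekday, fill = value)) +
  geom_tile(color = "white") +
  coord_equal() +
  labs(x = "Week of year", y = NULL, fill = "Value")
p_cal
```

Each column is a week and each row a weekday, so weekly and seasonal patterns show up as horizontal bands or clusters of similarly colored tiles.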