# ML: Time Series Analysis with zoo and dynlm
In this notebook, we demonstrate simple time series analyses using the `zoo` package.

        The package zoo provides an S3 class and methods for indexed totally ordered
        observations, such as both regular and irregular time series.

        An indexed object of class "zoo" can be thought of as data plus index
        where the data are essentially vectors or matrices and the index can be
        a vector of (in principle) arbitrary class.
        
Time series regression is performed with the `dynlm` package.
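
As a minimal sketch of the "data plus index" idea, the following builds a small `zoo` series from a numeric vector and a `Date` index. The values and dates are made up for illustration and are not taken from the Google data.

```r
library(zoo)

# hypothetical toy data: three observations on irregularly spaced dates
values <- c(10.5, 11.2, 10.9)
dates  <- as.Date(c("2016-01-04", "2016-01-05", "2016-01-08"))

# data plus index
z <- zoo(values, order.by = dates)

coredata(z)  # the data part
index(z)     # the index part
```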

## Preparatory code

In [None]:
# essentials
library(dplyr)
library(magrittr)
library(ggplot2)

# best package for dates
# https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
library(lubridate)

google_stock <- readRDS("small_data/google_stock.RDS")

## Working with dates is simple with lubridate (and dplyr)

In [None]:
google_stock$Date %<>% ymd
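
To illustrate what `ymd` does, here is a small sketch on made-up strings (the format of the `Date` column in the actual file is not shown here, so these inputs are assumptions). `ymd` parses year-month-day text with several common separators into `Date` values.

```r
library(lubridate)

# hypothetical inputs in year-month-day order with different separators
raw <- c("2016-01-04", "20160105", "2016/01/06")
parsed <- ymd(raw)

class(parsed)
parsed
```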

In [None]:
google_stock %>% glimpse

In [None]:
# easy filter with dplyr
# note: jupyter seems to have an issue with `scale_x_date()`
google_stock %>%
    filter(Date > "2015-01-01") %>%
    qplot(Date, Open, data=.) # + scale_x_date()

In [None]:
#qplot(Date, Open, data=google_stock) + geom_line() + scale_x_date()

## Time series analysis with zoo

In [None]:
library(zoo)

### Handling missing data
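Imputing missing values changes the data, so make sure the chosen method is appropriate for your analysis.

(These functions only matter when values are missing. This data set has none, so the code below is illustrative and is not run.)

    ## linear interpolation of a value column
    interpolated <- na.approx(google_stock$Open)

    ## last observation carried forward
    locf <- na.locf(google_stock$Open)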
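
Since the Google data has no gaps, a toy series can show what the two methods do. This is a sketch on invented values; `z` is a hypothetical series, not part of the notebook's data.

```r
library(zoo)

# hypothetical daily series with two missing values
dates <- as.Date("2016-01-04") + 0:4
z <- zoo(c(1, NA, 3, NA, 7), order.by = dates)

# linear interpolation between the neighbouring observations
interpolated <- na.approx(z)

# last observation carried forward
locf <- na.locf(z)

merge(z, interpolated, locf)
```

Note that `na.approx` interpolates along the index, so with unevenly spaced dates the gap length affects the filled value, whereas `na.locf` simply repeats the most recent observed value.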

### Convert a data frame to zoo object
Details: https://cran.r-project.org/web/packages/zoo/vignettes/zoo-read.pdf

In [None]:
# convert to zoo object; read.zoo uses the first column (Date) as the index
google <- google_stock %>%
    select(Date, Open, Close, Volume) %>%
    as.data.frame %>%
    read.zoo

In [None]:
# use as.data.frame only for printing
google %>% as.data.frame %>% head
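
The same conversion can be sketched on a small made-up data frame. The column names mirror the ones above, but the numbers are invented for illustration.

```r
library(zoo)

# hypothetical data frame with the date in the first column
df <- data.frame(
  Date  = as.Date(c("2016-01-04", "2016-01-05", "2016-01-06")),
  Open  = c(100, 101, 103),
  Close = c(101, 102, 102)
)

# by default read.zoo takes the first column as the index
z <- read.zoo(df)

index(z)     # the dates
colnames(z)  # the remaining columns become the data
```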

In [None]:
# automatic paneling with zoo object
plot(google)

In [None]:
plot(google %>% log %>% diff)
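
Differencing the logged series gives the change in logs from one observation to the next, which for a price series is the log return. A minimal sketch with invented prices:

```r
library(zoo)

# hypothetical prices on three consecutive days
prices <- zoo(c(100, 110, 99), order.by = as.Date("2016-01-04") + 0:2)

# log return: log(p_t) - log(p_{t-1}) = log(p_t / p_{t-1})
log_returns <- diff(log(prices))

log_returns
```

The result is one observation shorter than the input because the first date has no predecessor.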

### Obtain lagged values

In [None]:
google$Open %>% as.data.frame %>% head

In [None]:
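# lag 2 observations back
# stats::lag() is called explicitly because dplyr masks lag();
# for zoo objects a negative k looks back, a positive k looks ahead
stats::lag(google$Open, -2) %>% as.data.frame %>% head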
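
The sign convention of `lag` for `zoo` objects is easy to get wrong, so here is a small sketch on an invented series. It calls `stats::lag` explicitly, since `dplyr` masks `lag` when it is attached.

```r
library(zoo)

# hypothetical series on four consecutive days
dates <- as.Date("2016-01-04") + 0:3
z <- zoo(c(10, 20, 30, 40), order.by = dates)

lagged <- stats::lag(z, k = -1)  # each date gets the previous observation
led    <- stats::lag(z, k = 1)   # each date gets the next observation

merge(z, lagged, led)
```

Without padding, each shifted series is one observation shorter than `z`; passing `na.pad = TRUE` keeps the original length.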

### Compute rolling/moving average and other statistics

In [None]:
## apply mean over a window of 2 time points
rollapply(google, 2, mean) %>% as.data.frame %>% head

In [None]:
# options: align sets which edge of the window the result is dated at,
# fill pads the result back to the original length
rollapply(google, 2, mean, align="right") %>% as.data.frame %>% head
rollapply(google, 2, mean, fill=NA, align="right") %>% as.data.frame %>% head

In [None]:
# more efficient implementation of common functions
# rollmean(), rollmedian(), rollmax()
rollmax(google, 2, align="right", fill=NA) %>% as.data.frame %>% head
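
The effect of `align` and `fill` is easier to see on a tiny invented series than on the stock data. The following is a sketch under that assumption.

```r
library(zoo)

# hypothetical series on four consecutive days
dates <- as.Date("2016-01-04") + 0:3
z <- zoo(c(1, 3, 5, 7), order.by = dates)

# window of 2, dated at the right edge of each window
unpadded <- rollmean(z, 2, align = "right")

# same, padded with NA so the result has the original length
right_mean <- rollmean(z, 2, align = "right", fill = NA)

# the general-purpose equivalent
via_rollapply <- rollapply(z, 2, mean, align = "right", fill = NA)

merge(z, unpadded, right_mean, via_rollapply)
```

With `align = "right"` each average is attached to the last date in its window, so it only uses information available up to that date.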

## Regression with dynlm

In [None]:
library(broom)
library(dynlm)

In [None]:
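# L(Open, 1) is dynlm's lag operator: the previous observation of Open
# (lag() is masked by dplyr here, and stats::lag() with a positive k is a lead)
dynlm(Open ~ L(Open, 1), data=google) %>% summary

In [None]:
# tidy output with broom!
dynlm(Open ~ L(Open, 1), data=google) %>% tidy

In [None]:
# one-row summary of model-level statistics such as R-squared
dynlm(Open ~ L(Open, 1), data=google) %>% summary %>% glance

In [None]:
# add more predictors
dynlm(Open ~ L(Open, 1) + L(Volume, 2), data=google) %>% summary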
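
To make explicit what a lagged regression fits, the sketch below uses simulated data (not the Google series) and compares `dynlm` with its lag operator `L()` against an ordinary `lm` where the lag alignment is done by hand. All variable names here are hypothetical.

```r
library(zoo)
library(dynlm)

set.seed(42)
n <- 200

# simulated data: an AR(1) series and an unrelated noise series
x_raw <- rnorm(n)
y_raw <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
sim <- zoo(cbind(y = y_raw, x = x_raw),
           order.by = as.Date("2016-01-01") + 0:(n - 1))

# regress y on the previous observation of y and of x
fit <- dynlm(y ~ L(y, 1) + L(x, 1), data = sim)

# the same regression with the alignment done manually:
# observations 2..n on the left, observations 1..(n-1) on the right
manual <- lm(y_raw[-1] ~ y_raw[-n] + x_raw[-n])

all.equal(unname(coef(fit)), unname(coef(manual)))
```

The point of `dynlm` is that it handles this alignment, and the loss of the first observations, from the formula alone.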

## Resources

[The lubridate vignette from CRAN](https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html)

[Convert a data frame to zoo object [PDF]](https://cran.r-project.org/web/packages/zoo/vignettes/zoo-read.pdf)

*Copyright &copy; 2016 The Data Incubator.  All rights reserved.*