# Using R With Quandl Data

We begin by loading the R packages this notebook relies on.

In [None]:
library(MASS)
library(Quandl)
library(ggplot2)
library(stringr)
library(R.cache)

We are likely to run this notebook over and over, so we add memoization to the data loading function. This makes testing and re-running our code far faster, and even lets us work offline.

In [None]:
reload.data = TRUE
QLoad <- R.cache::addMemoization(Quandl::Quandl)
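
To make the idea concrete, here is a minimal base-R sketch of memoization. It is not how `R.cache` is implemented (that package keeps its cache on the file system, which is what allows offline re-runs); the helper `memoize` and the stand-in loader `slow.load` are hypothetical names used only for illustration.

```r
# Hypothetical helper: wrap a one-argument function so that repeated calls
# with the same key reuse the stored result instead of recomputing it.
memoize <- function(f) {
    cache <- new.env()
    function(key) {
        if (!exists(key, envir = cache, inherits = FALSE)) {
            assign(key, f(key), envir = cache)
        }
        get(key, envir = cache, inherits = FALSE)
    }
}

calls <- 0
# Stand-in for a slow download; it only counts how often it really runs.
slow.load <- function(code) {
    calls <<- calls + 1
    paste("data for", code)
}
fast.load <- memoize(slow.load)

first  <- fast.load("WIKI/XOM")
second <- fast.load("WIKI/XOM")   # served from the in-memory cache
```

After both calls, `calls` is still 1, because the second request never reaches `slow.load`.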

### Load the Data

The code below loads our raw data for Exxon Mobil (XOM) and the oil futures.

In [None]:
if (reload.data) {
    py <- plotly('brianboonstra', BrianBoonstraPrivateKeys.Plotly.API)
    Quandl.auth(BrianBoonstraPrivateKeys.Quandl)
    cat("Data Reload\n")
    xom.raw <- QLoad(c('WIKI/XOM'))
    oil.raw <- QLoad(c('NSE/OIL'))
}


Adjust the column names so that we can merge the two data sets into a single data frame for analysis.

In [None]:
xom <- xom.raw
oil <- oil.raw
names(xom) <- paste("WIKI.XOM -",names(xom))
names(oil) <- paste("NSE.OIL -",names(oil))
names(xom)[1] <-"Date"
names(oil)[1] <-"Date"
raw_data = merge(xom, oil)
names(raw_data)
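
As a small illustration of what `merge` does here, the sketch below uses two made-up data frames (`a`, `b`, and their values are assumptions for the example, not Quandl data). With no `by` argument, `merge` joins on the column names the two frames share, which is why both `Date` columns were given the same name above, and it keeps only dates present in both.

```r
# Toy data frames that overlap on two of their three dates
a <- data.frame(Date = as.Date(c("2015-01-02", "2015-01-05", "2015-01-06")),
                A.Open = c(10, 11, 12))
b <- data.frame(Date = as.Date(c("2015-01-05", "2015-01-06", "2015-01-07")),
                B.Open = c(20, 21, 22))

# Joins on the only shared column name, Date; unmatched dates are dropped
both <- merge(a, b)
print(both)
```

If unmatched dates should be kept instead, `merge(a, b, all = TRUE)` fills the gaps with `NA`.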

Peek at the first few rows of the merged data.

In [None]:
head(raw_data)

### Common Quant Task: Massaging The Data

Here we create a function that cleans up the column names in our data, giving us better-looking plot labels and simpler plot specifications.

In [None]:
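clean.quandl.name <- function(x) {
    # Turn e.g. "WIKI.XOM - Adj. Close" into "XOM.Adj.Close"; leave "Date" as is
    cleaned <- x
    if (x != 'Date') {
        tryCatch({
            # Split "DB.TICKER - Column" into the dataset code and the column label
            parts <- stringr::str_split(x, " - ", n=2)
            # Drop the database prefix, keeping only the ticker
            first.parts <- stringr::str_split(parts[[1]][[1]], "\\.", n=2)
            cleaned <- paste(first.parts[[1]][[2]], parts[[1]][[2]], sep=".")
            cleaned <- stringr::str_replace_all(cleaned, " ", "")
        },
        # On failure, report the offending name and fall back to the original
        error = function(e) {cat(paste0("Err on ", x, "\n"))}
        )
    }
    stringr::str_trim(cleaned)
}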
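
For comparison, the same transformation can be sketched with a single regular expression in base R. This is an alternative, not the function used in the rest of the notebook; `clean.name.regex` is a hypothetical name, and it assumes names follow the `DB.TICKER - Column` pattern built above.

```r
# Hypothetical regex-based variant: vectorised over a character vector
clean.name.regex <- function(x) {
    # Capture the ticker (after the first dot) and the column label (after " - ");
    # names that do not match the pattern, such as "Date", pass through unchanged
    out <- sub("^[^.]+\\.([^ ]+) - (.*)$", "\\1.\\2", x)
    # Remove any remaining spaces, e.g. "Adj. Close" becomes "Adj.Close"
    gsub(" ", "", out)
}

example.names <- c("Date", "WIKI.XOM - Open", "WIKI.XOM - Adj. Close")
cleaned.names <- clean.name.regex(example.names)
print(cleaned.names)
```

Because `sub` and `gsub` are vectorised, this version can be applied to `names(raw_data)` directly, without `lapply`.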

Now we can use the _clean.quandl.name_ function to make a better set of column names.

In [None]:
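# vapply returns a plain character vector, which is what names<- expects
fixed.names <- vapply(names(raw_data), clean.quandl.name, character(1), USE.NAMES=FALSE)
# Print one cleaned name per line
cat(fixed.names, sep="\n")
renamed_data <- raw_data
names(renamed_data) <- fixed.names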

### Examining The Data

Fit a standard linear model of the XOM open price against the oil open price.

In [None]:
modl <- lm(XOM.Open~OIL.Open, data=renamed_data)
summary(modl)
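
The slope and its t-statistic can also be pulled out of the fit programmatically rather than read off the printed summary. The sketch below uses made-up toy data (`toy` and its values are assumptions for illustration, not the XOM/OIL series) with a deliberately negative slope.

```r
# Toy data: y falls by roughly 0.5 per unit of x, with small alternating noise
toy <- data.frame(x = 1:10)
toy$y <- 3 - 0.5 * toy$x + 0.1 * rep(c(1, -1), 5)

toy.fit <- lm(y ~ x, data = toy)

# coef(summary(...)) is a matrix with one row per term and columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
coefs <- coef(summary(toy.fit))
slope <- coefs["x", "Estimate"]
t.stat <- coefs["x", "t value"]
cat("slope:", slope, " t-statistic:", t.stat, "\n")
```

For the notebook's own model, the corresponding row is named after the predictor, so the slope would be `coef(summary(modl))["OIL.Open", "Estimate"]`.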

#### Our linear fit has a *great* t-statistic.  It must be a superb description of the data!

But isn't it suspicious that the slope is negative rather than positive? Let's take a closer look.

In [None]:
gp <- ggplot(renamed_data, aes(x=OIL.Open, y=XOM.Open))+geom_point()+geom_smooth(method=lm)

In [None]:
print(gp)