In [26]:
library('RODBC')
library('quantmod')
library('PerformanceAnalytics')
library('forecast')
source("/usr/common/config.R")
lcon <- odbcDriverConnect(sprintf("Driver={SQL Server};Server=%s;Database=%s;Uid=%s;Pwd=%s;", ldbserver, ldbname, ldbuser, ldbpassword), case = "nochange", believeNRows = TRUE)

In [27]:
lookbacks<-c(200, 500, 1000)
startDate<-as.Date("1997-01-01")
indexName<-'NIFTY 50'

prices<-sqlQuery(lcon, sprintf("select TIME_STAMP, PX_CLOSE from BHAV_INDEX
                                    where TIME_STAMP >='%s'
                                    and INDEX_NAME='%s'", startDate, indexName))

pXts<-xts(prices[,2], as.Date(prices[,1]))
retXts<-dailyReturn(pXts)

What is the best ARIMA fit for the entire time-series?

In [28]:
aFit<-auto.arima(retXts)
print(aFit)
aOrder<-arimaorder(aFit)
print(c(aOrder[1], aOrder[2], aOrder[3]))

Series: retXts 
ARIMA(1,0,1) with non-zero mean 

Coefficients:
          ar1     ma1   mean
      -0.3893  0.4538  6e-04
s.e.   0.1587  0.1538  2e-04

sigma^2 estimated as 0.0002378:  log likelihood=14026.2
AIC=-28044.4   AICc=-28044.39   BIC=-28018.25
[1] 1 0 1


Calculate the ARIMA fit for rolling windows of different lengths. If the series behaves the same way over time, then the selected order should be the same in every window.
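A minimal base-R sketch of the windowing idea, on a toy vector rather than the NIFTY returns. `roll_right` is a hypothetical helper written for illustration (it is not the zoo/xts `rollapply`), and it assumes right-aligned windows, so each result is stored at the last observation of its window.

```r
# Hypothetical helper: apply f to each right-aligned window of `width` points.
roll_right <- function(x, width, f) {
  n <- length(x)
  out <- rep(NA_real_, n)          # positions without a full window stay NA
  if (n >= width) {
    for (end in seq(width, n)) {
      out[end] <- f(x[(end - width + 1):end])
    }
  }
  out
}

x <- c(1, 2, 3, 4, 5, 6)           # toy data
res <- roll_right(x, 3, mean)
print(res)
```

The first `width - 1` positions have no complete window, which is why NA rows have to be dropped before the results are tabulated.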

In [29]:
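# Seed row only fixes the column layout; it is dropped after the loop.
arimaFits<-data.frame(LB=0, p=0, d=0, q=0, stringsAsFactors = FALSE)
for(i in seq_along(lookbacks)){
    lb<-as.numeric(lookbacks[i])
    # Refit auto.arima on every window of lb returns and keep only the selected (p, d, q).
    autoFitpdq<-rollapply(retXts, lb, FUN=function(x){
        aFit<-auto.arima(x)
        aOrder<-arimaorder(aFit)
        c(aOrder[1], aOrder[2], aOrder[3])
    }, by.column=FALSE)

    # Dates without a full window come back as NA and are dropped here.
    autoFitpdq<-na.omit(autoFitpdq)
    names(autoFitpdq)<-c("p", "d", "q")
    autoFitpdq$LB<-lb
    arimaFits<-rbind(arimaFits, data.frame(autoFitpdq))
}
arimaFits<-arimaFits[-1,]
arimaFits<-na.omit(arimaFits)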

print(head(arimaFits))
print(tail(arimaFits))

            LB p d q
1997-10-27 200 0 0 1
1997-10-28 200 0 0 1
1997-10-29 200 0 0 0
1997-10-30 200 0 0 0
1997-11-03 200 0 0 0
1997-11-04 200 0 0 0
              LB p d q
2017-06-062 1000 2 0 2
2017-06-072 1000 1 0 1
2017-06-082 1000 1 0 1
2017-06-092 1000 2 0 1
2017-06-122 1000 1 0 1
2017-06-132 1000 2 0 1


What is the best fit for different windows?
ARIMA(0,0,0) represents white noise.
The paper (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2979108) says that ARIMA(1,1,1) is the best fit for NIFTY. Really?
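To make the white-noise statement concrete, here is a small sketch on simulated data (not NIFTY returns), using only `stats::arima`. The seed, sample size, mean and standard deviation are arbitrary assumptions. An ARIMA(0,0,0) fit with a mean estimates nothing but a constant level, so every forecast should equal that level.

```r
set.seed(42)                                   # arbitrary seed
x <- rnorm(1000, mean = 0.0005, sd = 0.015)    # simulated white noise

fit <- arima(x, order = c(0, 0, 0))            # constant mean plus noise
mu  <- coef(fit)[["intercept"]]

# Forecasts from a white-noise model carry no information beyond the mean.
fc <- as.numeric(predict(fit, n.ahead = 5)$pred)

print(c(sample_mean = mean(x), fitted_mean = mu))
print(fc)
```

A window where (0,0,0) is selected is therefore one where past returns gave the model nothing to forecast with.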

In [50]:
arimaFits2<-arimaFits
arimaFits2$MODEL<-sprintf("%d,%d,%d", arimaFits2$p, arimaFits2$d, arimaFits2$q)
bestFit<-aggregate(cbind(count=LB) ~ MODEL+LB, arimaFits2, FUN = function(x){NROW(x)})

for(i in 1:length(lookbacks)){
    lb<-as.numeric(lookbacks[i])
    whiteNoise<-length(arimaFits2[arimaFits2$LB==lb & arimaFits2$p==0 & arimaFits2$d==0 & arimaFits2$q==0,1])
    fit111<-length(arimaFits2[arimaFits2$LB==lb & arimaFits2$p == 1 & arimaFits2$d == 1 & arimaFits2$q == 1,1])
    
    bestFitSubSet<-bestFit[bestFit$LB==lb,]
    bestModelCount<-max(bestFitSubSet[bestFitSubSet$MODEL != "0,0,0",]$count)
    bestModel<-first(bestFitSubSet[bestFitSubSet$MODEL != "0,0,0" & bestFitSubSet$count==bestModelCount,]$MODEL)
    
    total<-length(arimaFits2[arimaFits2$LB==lb,1])
    
    print(sprintf("For lookback=%d: white noise = %.2f%%, (1,1,1): %.2f%%, best: %s (%.2f%%)", lb, 100.0*whiteNoise/total, 100.0*fit111/total, bestModel, 100.0*bestModelCount/total))
    print(head(bestFitSubSet[order(bestFitSubSet$count, decreasing=T),], 5))
}


[1] "For lookback=200: white noise = 56.40%, (1,1,1): 0.00%, best: 1,0,1 (11.05%)"
   MODEL  LB count
1  0,0,0 200  2761
8  1,0,1 200   541
2  0,0,1 200   340
16 2,0,2 200   326
7  1,0,0 200   245
[1] "For lookback=500: white noise = 41.96%, (1,1,1): 0.00%, best: 1,0,1 (11.56%)"
   MODEL  LB count
41 0,0,0 500  1928
48 1,0,1 500   531
56 2,0,2 500   507
42 0,0,1 500   362
57 2,0,3 500   270
[1] "For lookback=1000: white noise = 26.96%, (1,1,1): 0.00%, best: 3,0,2 (14.90%)"
    MODEL   LB count
91  0,0,0 1000  1104
112 3,0,2 1000   610
92  0,0,1 1000   576
97  1,0,1 1000   341
104 2,0,2 1000   278
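The same per-lookback shares can also be read off a contingency table. This is a sketch on a made-up toy data frame (the `toy` rows are hypothetical, not the results above), using only base R `table` and `prop.table`.

```r
# Toy stand-in for arimaFits2: one row per window, with its lookback and model.
toy <- data.frame(
  LB    = c(200, 200, 200, 200, 500, 500),
  MODEL = c("0,0,0", "0,0,0", "1,0,1", "0,0,1", "0,0,0", "1,0,1"),
  stringsAsFactors = FALSE
)

counts <- table(LB = toy$LB, MODEL = toy$MODEL)   # windows per (lookback, model)
shares <- 100 * prop.table(counts, margin = 1)    # row-wise percentages

print(round(shares, 2))
```

Each row sums to 100, so a model that was never selected for a lookback, such as (1,1,1) here, shows up as 0 or is missing from the columns altogether.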
