## Autoregressive Time Series Model
----

When analyzing time series data, we have a sequence of values $\{x_i\}$ indexed by time, in order. The autoregressive model $AR(p)$ predicts the next value in the series with a linear regression on the $p$ previous values.
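
Written out, one common form of the $AR(p)$ model is the following, where the intercept $c$, the coefficients $\varphi_j$, and the noise term $\varepsilon_i$ are notation assumed here for illustration:

$$x_i = c + \sum_{j=1}^{p} \varphi_j \, x_{i-j} + \varepsilon_i$$

Fitting the model means estimating $c$ and $\varphi_1, \dots, \varphi_p$ from the data.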

Since we have already programmed linear regression in a previous folder, we will use scikit-learn's implementation of linear regression to fit the coefficient values.

In [93]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
class autoregressive(object):
    '''Produce an AR(p) model for time series data.'''
    
    
    def __init__(self, p):
        '''Initialize the order p.'''
        self.p = p
        
        
    def fit(self, time_data, holdout_percent):
        '''Fit the model to time series training data given as a list, numpy
        array, or pandas series. A share of the lagged rows is automatically
        held out to compute statistics on unseen data.
        holdout_percent is the percentage of the data held out for evaluating
        the model. Input an integer between 1 and 99 for this value.'''
        
        data = np.array(time_data)
        
        percent = holdout_percent/100
        
        #Values to predict
        y = data[self.p:]
        #Lagged data as features
        features = np.array([data[i:i+self.p] for i in range(len(data) - self.p)])
        #Create holdout data for evaluation (train_test_split shuffles rows by default)
        X_train, X_test, y_train, y_test = train_test_split(features, y, test_size=percent, random_state=42)
        
        #Now fit the linear regression
        self.ar_model = LinearRegression().fit(X_train, y_train)
        self.coef_, self.intercept_ = self.ar_model.coef_, self.ar_model.intercept_
        self.train_score = self.ar_model.score(X_train, y_train)
        self.test_score = self.ar_model.score(X_test, y_test)
        self.predict = self.ar_model.predict
        
        output1 = "The Autoregressive model of order {} has been fit to your time series data.\n".format(self.p)
        output2 = "    Train r-squared statistic: {}.\n".format(self.train_score)
        output3 = "     Test r-squared statistic: {}.".format(self.test_score)
        print(output1+output2+output3)
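
To make the lagged-feature step in `fit` concrete, here is a small sketch of the same construction on a made-up six-point series. The values and the order `p = 2` are assumptions chosen only for illustration.

```python
import numpy as np

p = 2  # order of the toy model (assumed for illustration)
data = np.array([10, 20, 30, 40, 50, 60])  # made-up series

# Row i holds data[i], ..., data[i+p-1], oldest value first
features = np.array([data[i:i + p] for i in range(len(data) - p)])
# The target for row i is the value right after that window
y = data[p:]

for row, target in zip(features, y):
    print(row, "->", target)
# [10 20] -> 30
# [20 30] -> 40
# [30 40] -> 50
# [40 50] -> 60
```

Each row is one regression sample, so a series of length $n$ yields $n - p$ samples. Because the oldest value comes first in each row, the first fitted coefficient multiplies the oldest lag and the last one multiplies the most recent value.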

In [94]:
x = np.arange(1000) + np.random.uniform(0, 7, size=1000)
x[:10]

array([ 4.20064476,  6.17198984,  5.31641965,  7.20035394,  9.36318397,
        9.94404053,  8.4560752 , 12.25984297, 12.56477447, 12.01135715])

In [104]:
f = autoregressive(4)
f.fit(x, 33)

The Autoregressive model of order 4 has been fit to your time series data.
    Train r-squared statistic: 0.9999387950054849.
     Test r-squared statistic: 0.9999416686076599.
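
`train_test_split` shuffles rows by default, so the held-out rows above are interleaved in time with the training rows. For time series, a common alternative is to hold out the final stretch of the series instead. A minimal sketch of that idea, assuming a hypothetical helper `chronological_split` that is not part of the class above:

```python
import numpy as np

def chronological_split(features, targets, holdout_percent):
    """Hypothetical helper: hold out the last holdout_percent percent of rows."""
    n_test = int(round(len(features) * holdout_percent / 100))
    split = len(features) - n_test
    return features[:split], features[split:], targets[:split], targets[split:]

p = 4
data = np.arange(20, dtype=float)  # made-up series for illustration
features = np.array([data[i:i + p] for i in range(len(data) - p)])
targets = data[p:]

X_train, X_test, y_train, y_test = chronological_split(features, targets, 25)
print(len(X_train), len(X_test))
```

With this split, every test target comes after every training target in time, which is closer to how the model would be used for forecasting.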


In [105]:
f.coef_

array([0.29008498, 0.19697016, 0.24111882, 0.27160835])

In [106]:
f.intercept_

2.7061646056939708

In [108]:
f.predict(np.array([[1,2,3,4],[20,19,22,24]]))

array([ 5.19997979, 24.0735119 ])
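
As a sanity check, the predictions above can be reproduced by hand as the intercept plus the dot product of each input row with the coefficients. The sketch below copies the printed `f.coef_` and `f.intercept_` values, so it should agree with `f.predict` only up to the rounding of those printed numbers.

```python
import numpy as np

# Values copied from the f.coef_ and f.intercept_ outputs above (rounded as printed)
coef = np.array([0.29008498, 0.19697016, 0.24111882, 0.27160835])
intercept = 2.7061646056939708

X_new = np.array([[1, 2, 3, 4], [20, 19, 22, 24]])

# Linear regression prediction: intercept + sum of coefficient * lagged value
manual = intercept + X_new @ coef
print(manual)
```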