## <center> Time Series Models with Multiple Linear Regression

### Outcomes
- Utilize linear regression to model time series data
- Build multiple time series models at once
- Be prepared for Mod 4 project

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

<a id="acs"></a>

In [None]:
df = pd.read_csv('australia_drug_sales.csv',index_col=0)
df.head()

In [None]:
df['time'].mod(1)

In [None]:
(df['time'].mod(1)*12)+1

<a id="settimeseries"></a>

In [None]:
df['month'] = ((df['time'].mod(1)*12)+1).round(0).astype(int)
df['year'] = df['time'].astype(int)
df['day'] = np.ones(len(df))
df.head()

In [None]:
df['date'] = pd.to_datetime(df[['year','month','day']])
df.index = pd.DatetimeIndex(df['date'])
df.drop(['time', 'month', 'year', 'day', 'date'],axis=1,inplace=True)
df.head()

<a id="exploretimeseries"></a>

In [None]:
df.plot()

In [None]:
# Import and apply seasonal_decompose()
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(df)
# Gather the trend, seasonality, and residuals 
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
# Plot gathered statistics
plt.figure(figsize=(12,8))
plt.subplot(411)
plt.plot(df, label='Original', color='blue')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend', color='blue')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal,label='Seasonality', color='blue')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals', color='blue')
plt.legend(loc='best')
plt.tight_layout()

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(df,lags=25); plt.xlim(0,25); plt.show()
plot_pacf(df,lags=25); plt.xlim(0,25); plt.ylim(-1,1);plt.show()

<a id="autoarima"></a>

In [None]:
from pmdarima.arima import auto_arima
test_period = 6*12
arima = auto_arima(df['value'][:-test_period], trace=True, error_action='ignore', suppress_warnings=True, seasonal=True, m=12)

In [None]:
arima_forecast = arima.predict(n_periods=test_period)

In [None]:
from sklearn.metrics import mean_squared_error
mean_squared_error(df['value'][-test_period:], arima_forecast)

### <center> Generating Features
<a id="regressionfeatures"></a>

- DateTime components (Year, Month, Day)
- Previous values (Yesterday's value, last week's value, etc.)
- Polynomial terms (Squared and cubed previous values)
- Interaction terms (Yesterday's value times last week's value)
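
The cells that follow build datetime components, previous values, and polynomial terms, but not interaction terms. As a rough illustration of all four feature types, here is a minimal sketch on a synthetic daily series. The `demo` frame, its column names, and the choice to drop incomplete rows with `dropna()` (the notebook itself back-fills with `bfill()`) are assumptions made for this example only.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for real data (illustrative assumption)
idx = pd.date_range("2020-01-01", periods=30, freq="D")
demo = pd.DataFrame({"value": np.arange(1.0, 31.0)}, index=idx)

# DateTime components
demo["Month"] = demo.index.month
demo["Day"] = demo.index.day

# Previous values: yesterday and one week ago
demo["Previous1"] = demo["value"].shift(1)
demo["Previous7"] = demo["value"].shift(7)

# Polynomial term: squared previous value
demo["Previous1**2"] = demo["Previous1"] ** 2

# Interaction term: yesterday's value times last week's value
demo["Previous1*Previous7"] = demo["Previous1"] * demo["Previous7"]

# The first seven rows lack a full week of history, so drop them here
demo = demo.dropna()
```

Dropping the incomplete rows shortens the training data slightly, while back-filling keeps every row at the cost of repeating the earliest observation in the first few lags.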

In [None]:
## Add year and month features
df['Year'] = df.index.year - np.min(df.index.year)
df['Month'] = df.index.month
date_features = ['Year', 'Month']
df.head()

In [None]:
## Add polynomials of datetime components
polynomial_terms = [2,3]
for feature in date_features:
    for i in polynomial_terms:
        df[feature+'**'+str(i)] = df[feature]**i
df.head()

In [None]:
## Add previous values and polynomial terms of previous values
previous_values_range = 10*12
for i in range(1,previous_values_range):
    df['Previous'+str(i)] = df['value'].shift(i).bfill()
    for j in polynomial_terms:
        df['Previous'+str(i)+'**'+str(j)] = (df['value'].shift(i).bfill())**j
df.head()

<a id="lassomodel"></a>

In [None]:
## fit lasso regression
from sklearn.linear_model import LassoLarsCV
reg = LassoLarsCV(cv=5).fit(df.drop('value',axis=1)[:-test_period], df['value'][:-test_period])

In [None]:
## view features selected (any nonzero coefficient, positive or negative)
selected_features = pd.DataFrame()
selected_features['Feature'] = df.drop('value',axis=1).columns[reg.coef_!=0]
selected_features['Coefficient'] = reg.coef_[reg.coef_!=0]
selected_features

<a id="forecasting"></a>

In [None]:
## create and populate forecast dataframe
forecast_df = df.copy()
for datetime, date in zip(df.index[-test_period:], range(len(df)-test_period, len(df))):
    values = []
    ## add datetime components
    values.append(datetime.year - np.min(df.index.year))
    values.append(datetime.month)
    ## add polynomial terms of datetime components
    for feature in date_features:
        for i in polynomial_terms:
            values.append((forecast_df[feature].iloc[date])**i)
    ## add previous values and polynomial terms of previous values
    ## (.iloc is used because `date` is an integer position, not a datetime label)
    for i in range(1,previous_values_range):
        values.append(forecast_df['value'].iloc[date-i])
        for j in polynomial_terms:
            values.append((forecast_df['value'].iloc[date-i])**j)
    ## make prediction on current datetime
    forecast = reg.predict(np.array(values).reshape(1,-1))
    ## append prediction to start of values array
    values.insert(0, forecast[0])
    ## set forecast row in dataframe
    forecast_df.loc[datetime] = values

<a id="comparison"></a>

In [None]:
print('ARIMA MSE:', mean_squared_error(df['value'][-test_period:], arima_forecast))
print('Regression MSE:', mean_squared_error(df['value'][-test_period:], forecast_df['value'][-test_period:]))

In [None]:
plt.plot(df.index[-test_period:], df['value'][-test_period:])
plt.plot(df.index[-test_period:], arima_forecast)
plt.title('ARIMA Forecast')
plt.legend(['Actual', 'ARIMA'])

In [None]:
plt.plot(df.index[-test_period:], df['value'][-test_period:])
plt.plot(df.index[-test_period:], forecast_df['value'][-test_period:])
plt.title('Regression Forecast')
plt.legend(['Actual', 'Regression'])

## <center> Residuals

In [None]:
plt.plot(df.index[:-test_period], arima.resid())
plt.title('ARIMA Residuals')

In [None]:
regression_resid = df['value'][:-test_period]-reg.predict(df.drop('value',axis=1)[:-test_period])
plt.plot(regression_resid)
plt.title('Regression Residuals')

In [None]:
## ACF and PACF of residuals
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(arima.resid(),lags=48); plt.xlim(0,48); plt.show()
plot_pacf(arima.resid(),lags=48); plt.xlim(0,48); plt.ylim(-1,1);plt.show()

In [None]:
plot_acf(regression_resid,lags=48); plt.xlim(0,48); plt.show()
plot_pacf(regression_resid,lags=48); plt.xlim(0,48); plt.ylim(-1,1);plt.show()

### <center> Another example

<a id="loaddata2"></a>

In [None]:
sunspots_df = pd.read_csv('Sunspots.csv')
sunspots_df.head()

<a id="settimeseries2"></a>

In [None]:
sunspots_df.index = pd.DatetimeIndex(sunspots_df['Month'])
sunspots_df.drop('Month', axis=1, inplace=True)
sunspots_df.head()

<a id="exploretimeseries2"></a>

In [None]:
sunspots_df.plot()

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
lags = 36
plot_acf(sunspots_df,lags=lags); plt.xlim(0,lags); plt.show()
plot_pacf(sunspots_df,lags=lags); plt.xlim(0,lags); plt.ylim(-1,1);plt.show()

<a id="autoarima2"></a>

In [None]:
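## hold out the last 20 years so ARIMA and the regression model share the same test period
test_period = 20*12
arima = auto_arima(sunspots_df['Sunspots'][:-test_period], trace=True, error_action='ignore', suppress_warnings=True, seasonal=True, m=12)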

In [None]:
arima_forecast = arima.predict(n_periods=test_period)

<a id="regressionfeatures2"></a>

In [None]:
sunspots_df['Year'] = sunspots_df.index.year - np.min(sunspots_df.index.year)
sunspots_df['Month'] = sunspots_df.index.month
date_features = ['Year', 'Month']
sunspots_df.head()

In [None]:
## Add polynomials of datetime components
polynomial_terms = [2,3]
for feature in date_features:
    for i in polynomial_terms:
        sunspots_df[feature+'**'+str(i)] = sunspots_df[feature]**i
sunspots_df.head()

In [None]:
## Add previous values and polynomial terms of previous values
previous_values_range = 10*12
for i in range(1,previous_values_range):
    sunspots_df['Previous'+str(i)] = sunspots_df['Sunspots'].shift(i).bfill()
    for j in polynomial_terms:
        sunspots_df['Previous'+str(i)+'**'+str(j)] = (sunspots_df['Sunspots'].shift(i).bfill())**j
sunspots_df.head()

<a id="lassomodel2"></a>

In [None]:
## fit lasso regression
from sklearn.linear_model import LassoLarsCV
test_period = 20*12
reg = LassoLarsCV(cv=10).fit(sunspots_df.drop('Sunspots',axis=1)[:-test_period], sunspots_df['Sunspots'][:-test_period])

In [None]:
## view features selected (any nonzero coefficient, positive or negative)
selected_features = pd.DataFrame()
selected_features['Feature'] = sunspots_df.drop('Sunspots',axis=1).columns[reg.coef_!=0]
selected_features['Coefficient'] = reg.coef_[reg.coef_!=0]
selected_features.sort_values('Coefficient')

<a id="forecasting2"></a>

In [None]:
## create and populate forecast dataframe
forecast_df = sunspots_df.copy()
for datetime, date in zip(sunspots_df.index[-test_period:], range(len(sunspots_df)-test_period, len(sunspots_df))):
    values = []
    ## add datetime components
    values.append(datetime.year - np.min(sunspots_df.index.year))
    values.append(datetime.month)
    ## add polynomial terms of datetime components
    for feature in date_features:
        for i in polynomial_terms:
            values.append((forecast_df[feature].iloc[date])**i)
    ## add previous values and polynomial terms of previous values
    ## (.iloc is used because `date` is an integer position, not a datetime label)
    for i in range(1,previous_values_range):
        values.append(forecast_df['Sunspots'].iloc[date-i])
        for j in polynomial_terms:
            values.append((forecast_df['Sunspots'].iloc[date-i])**j)
    ## make prediction on current datetime
    forecast = reg.predict(np.array(values).reshape(1,-1))
    ## append prediction to start of values array
    values.insert(0, forecast[0])
    ## set forecast row in dataframe
    forecast_df.loc[datetime] = values

<a id="comparison2"></a>

In [None]:
print('ARIMA MSE:', mean_squared_error(sunspots_df['Sunspots'][-test_period:], arima_forecast))
print('Regression MSE:', mean_squared_error(sunspots_df['Sunspots'][-test_period:], forecast_df['Sunspots'][-test_period:]))

In [None]:
plt.plot(sunspots_df.index[-test_period:], sunspots_df['Sunspots'][-test_period:])
plt.plot(sunspots_df.index[-test_period:], arima_forecast)
plt.title('ARIMA Forecast')
plt.legend(['Actual', 'ARIMA'])

In [None]:
plt.plot(sunspots_df.index[-test_period:], sunspots_df['Sunspots'][-test_period:])
plt.plot(sunspots_df.index[-test_period:], forecast_df['Sunspots'][-test_period:])
plt.title('Regression Forecast')
plt.legend(['Actual', 'Regression'])

In [None]:
plt.plot(sunspots_df.index[:-test_period], arima.resid())
plt.title('ARIMA Residuals')

In [None]:
regression_resid = sunspots_df['Sunspots'][:-test_period]-reg.predict(sunspots_df.drop('Sunspots',axis=1)[:-test_period])
plt.plot(regression_resid)
plt.title('Regression Residuals')

In [None]:
lags = 48  ## ARIMA
plot_acf(arima.resid(),lags=lags); plt.xlim(0,lags); plt.show()
plot_pacf(arima.resid(),lags=lags); plt.xlim(0,lags); plt.ylim(-1,1);plt.show()

In [None]:
lags = 48 ## Regression
plot_acf(regression_resid,lags=lags); plt.xlim(0,lags); plt.show()
plot_pacf(regression_resid,lags=lags); plt.xlim(0,lags); plt.ylim(-1,1);plt.show()

## <center> Activity

Using the data in <i>stock_data.csv</i>, your goal is to suggest the best stock for investment in order to make the biggest return if sold tomorrow. <br> <br>
Use only the 'Close' data to build time series models for each stock using linear regression.<br><br>
Use the most recent year of data to test your models on. <br> <br>

1) Store separate dataframes for each stock in a list.<br><br>
2) For each dataframe in the list, generate features from the datetime data.<br><br>
3) Build a LassoLars model for each stock.<br><br>
4) For each stock, forecast each date in the test data in a stepwise fashion.<br><br>
5) Compare the predictions to the actual test data.<br><br>
<b>Bonus:</b> Determine the stock which is predicted to gain the most in value over the next day.
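
As a starting point for steps 2 to 5, the stepwise forecast can be sketched as a reusable function. This is only one possible shape, not a reference solution. The helper names `make_lag_features` and `stepwise_forecast`, the synthetic random-walk series, the tickers `AAA` and `BBB`, the use of a dict rather than a list, and the `alpha=0.01` setting are all assumptions made for illustration, and nothing here reads <i>stock_data.csv</i>.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoLars


def make_lag_features(series, n_lags):
    """Build columns Previous1..Previous{n_lags} from a Series (hypothetical helper)."""
    return pd.DataFrame(
        {f"Previous{i}": series.shift(i) for i in range(1, n_lags + 1)}
    )


def stepwise_forecast(series, n_lags, test_period, model):
    """Fit on the training span, then forecast the test span one step at a time."""
    train = series.iloc[:-test_period]
    X = make_lag_features(train, n_lags).dropna()
    y = train.loc[X.index]
    model.fit(X.to_numpy(), y.to_numpy())

    history = list(train.to_numpy())
    preds = []
    for _ in range(test_period):
        # Most recent value first, matching Previous1, Previous2, ...
        row = np.array(history[-1:-n_lags - 1:-1]).reshape(1, -1)
        pred = float(model.predict(row)[0])
        preds.append(pred)
        history.append(pred)  # later steps see forecasts, not actual values
    return pd.Series(preds, index=series.index[-test_period:])


# Synthetic random-walk "stocks" (illustrative only, not stock_data.csv)
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-03", periods=200, freq="B")
stocks = {
    name: pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)
    for name in ["AAA", "BBB"]
}

test_period = 20
forecasts = {
    name: stepwise_forecast(s, n_lags=5, test_period=test_period,
                            model=LassoLars(alpha=0.01))
    for name, s in stocks.items()
}

# Mean squared error of each stock's forecast against its held-out values
mse = {
    name: float(np.mean((stocks[name].iloc[-test_period:] - forecasts[name]) ** 2))
    for name in stocks
}
```

Passing the model in as an argument keeps the loop independent of the estimator, so the same function could be reused with `LassoLarsCV` or with the richer feature set built earlier in this notebook, provided the feature order at prediction time matches the order used for fitting.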

1. Australian Corticosteroid Sales <br>
    1.1 [Reading in data](#acs) <br>
    1.2 [Setting up time series](#settimeseries)<br>
    1.3 [Exploring time series](#exploretimeseries)<br>
    1.4 [Fitting auto-ARIMA](#autoarima)<br>
    1.5 [Generating regression features](#regressionfeatures)<br>
    1.6 [Fitting Lasso model](#lassomodel)<br>
    1.7 [Forecasting with regression model](#forecasting)<br>
    1.8 [Comparing ARIMA and regression](#comparison)<br>
2. Sunspots Data<br>
    2.1 [Reading in data](#loaddata2)<br>
    2.2 [Setting up time series](#settimeseries2)<br>
    2.3 [Exploring time series](#exploretimeseries2)<br>
    2.4 [Fitting auto-ARIMA](#autoarima2)<br>
    2.5 [Generating regression features](#regressionfeatures2)<br>
    2.6 [Fitting Lasso model](#lassomodel2)<br>
    2.7 [Forecasting with regression model](#forecasting2)<br>
    2.8 [Comparing ARIMA and regression](#comparison2)<br>

In [None]:
## Create list of dataframes

In [None]:
## Iterate through dataframes, performing feature generation and storing dataframe in new list

In [None]:
## Fit a LassoLars regression model to each stock's data

In [None]:
## For each model, iterate through the test dates, 
## calculating and generating values and forecasting the next value

In [None]:
## Compare each model's predictions to true data

In [None]:
## Predict next day stock values for each stock, determine the biggest increase in stock value