# ISLR-Python: Ch5 Applied 7

- [Load Weekly Dataset](#Load-Weekly-Dataset)
- [A. Fit Logistic Model for Market Direction](#A.-Fit-Logistic-Model-for-Market-Direction)
- [B. Fit Logistic Model Leaving Out One Observation](#B.-Fit-Logistic-Model-Leaving-Out-One-Observation)
- [C. Use One-Out Model to Predict Unused Observation](#C.-Use-One-Out-Model-to-Predict-Unused-Observation)
- [D-E. Perform LOOCV](#D-E.-Perform-LOOCV)

In [1]:
## perform imports and set-up
import numpy as np
import pandas as pd
import scipy
import statsmodels.api as sm

from matplotlib import pyplot as plt

%matplotlib inline
plt.style.use('ggplot') # emulate pretty r-style plots

# print numpy arrays with precision 4
np.set_printoptions(precision=4)

## Load Weekly Dataset

In [2]:
df = pd.read_csv('../data/Weekly.csv', true_values=['Up'], false_values=['Down'])
df.head()

Unnamed: 0,Year,Lag1,Lag2,Lag3,Lag4,Lag5,Volume,Today,Direction
0,1990,0.816,1.572,-3.936,-0.229,-3.484,0.154976,-0.27,False
1,1990,-0.27,0.816,1.572,-3.936,-0.229,0.148574,-2.576,False
2,1990,-2.576,-0.27,0.816,1.572,-3.936,0.159837,3.514,True
3,1990,3.514,-2.576,-0.27,0.816,1.572,0.16163,0.712,True
4,1990,0.712,3.514,-2.576,-0.27,0.816,0.153728,1.178,True


## A. Fit Logistic Model for Market Direction

When we loaded the dataframe we assert that 'Up' directions be encoded as true and 'Down' market directions be encoded as False. We now fit a logistic model to the data using the lag1 and lag2 variables as the predictors and the direction as the response class.

In [5]:
# Construct Design matrix #
###########################
X = sm.add_constant(df[['Lag1', 'Lag2']])
y = df.Direction

# Model and Fit #
#################
results = sm.Logit(y,X).fit()
print(results.summary())

Optimization terminated successfully.
         Current function value: 0.683297
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:              Direction   No. Observations:                 1089
Model:                          Logit   Df Residuals:                     1086
Method:                           MLE   Df Model:                            2
Date:                Wed, 27 Jul 2016   Pseudo R-squ.:                0.005335
Time:                        14:52:03   Log-Likelihood:                -744.11
converged:                       True   LL-Null:                       -748.10
                                        LLR p-value:                   0.01848
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.2212      0.061      3.599      0.000         0.101     0.342
Lag1          -0.0387      0.

## B. Fit Logistic Model Leaving Out One Observation

We are working our way slowly up to a function that will perform LOOCV. As a first step we will leave out the 1st observation and try to predict Market Direction for this observation from the remaining observations.

In [13]:
# Construct Design matrix #
###########################
X_oo = sm.add_constant(df[['Lag1', 'Lag2']]).loc[1:]
y_oo = df.Direction.loc[1:]

# Model and Fit #
#################
results_oo = sm.Logit(y_oo,X_oo).fit(disp = 0)

 ## C. Use One-Out Model to Predict Unused Observation

In [18]:
y_oo_predicted = results_oo.predict(X.loc[0])
print(y_oo_predicted > 0.5)

[ True]


So the model incorrectly classified the 1st observation as 'Up' when in fact it is 'Down'.

## D-E. Perform LOOCV

#### In this exercise we are asked to perform LOOCV by hand; that is compute in a for loop (1) leave out the ith observation, (2) construct model and fit (3) predict the market direction for the left out observation (4) compute the error rate of the model's predictions

In [32]:
y_predictions = np.array([])

for obs in range(len(df)):
    indices = df.index
    # use list slicing and concatenate to generate a list without obs
    X_train = sm.add_constant(df[['Lag1','Lag2']]).loc(np.concatenate((indices[:obs],indices[obs+1:]),axis=0))
    y_train = df['Direction'].loc(np.concatenate((indices[:obs], indices[obs+1:]),axis=0))
    
    # fit the model on the training observation
    result = sm.Logit(y,X).fit(disp=0)
    
    # predict market direction for the left out obs and append
    y_predictions = np.append(y_predictions, result.predict(sm.add_constant(df[['Lag1', 'Lag2']]).loc[obs]))
    
# Compare the y_predictions with the actual market directions to get Error Rate
y_predictions = (y_predictions > 0.5)
print('LOOCV Error Rate =', np.mean(y_predictions != df.Direction.values))

LOOCV Error Rate = 0.444444444444


In Chapter 4, "Classification", we fit a logistic regression model in Applied question 10 using all of the variables as predictors we determined the error rate to be 43%. This value is close to that previous finding. We noted in that model that Lag2 was the most significant variable p=0.03.