# Introduction to Machine Learning, UZH 2018, Group Project
### Group 2: Barbara Capl, Mathias Lüthi, Pamela Matias, Stefanie Rentsch
##       
# 3. Classification / Prediction 
# A. with Multiple Logistic Regression

In this section we use the feature matrices and response vectors whose features were selected in chapter 2.  

#### We use two different versions (created in chapter 1, features selected in chapter 2):
Version 1: Feature Matrix consists only of the ratios  
Version 2: Feature Matrix consists of ratios + dummy variables for seasonality + other market data
#### We classify and predict with Simple and Multiple Logistic Regression


In [1]:
# hide unnecessary warnings ("deprecation" of packages etc.)
import warnings
warnings.filterwarnings('ignore')

# Load Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn import metrics

plt.style.use("seaborn-whitegrid")
%matplotlib inline 

## 3.0. SETTINGS:

### (1) Choose the Dataset Version you want

##### Reduced Feature Matrix (Features pre-selected)
VERSION = 1.1; Reduced Feature Matrix with only ratios  
VERSION = 2.1; Reduced Feature Matrix with ratios + seasonality + other market data

In [2]:
### Choose which dataset version you want the selection of features and the prediction to be based on 
VERSION = 1.1
"""
INSERT NUMBER 1.1 or 2.1
"""


# Define the sel_version variable for later use and print text according to your choice
if VERSION == 1.1:
    sel_version = 'based on reduced Dataset with only Ratios as predictive Features.'
elif VERSION == 2.1:
    sel_version = 'based on reduced Dataset with Ratios + Seasonality + other Market Data as predictive Features.'
else: raise ValueError('VERSION must be either 1.1 or 2.1')
print('You chose VERSION '+str(VERSION)+' as working dataset. '+'\n'+'The following Classification/ Prediction will be therefore ' 
          + sel_version)

You chose VERSION 1.1 as working dataset. 
The following Classification/ Prediction will be therefore based on reduced Dataset with only Ratios as predictive Features.


### (2) Choose the method with which the features were pre-selected
SELECTION = 'RF'; Features pre-selected with Random Forest Classifier  
SELECTION = 'PCA'; Features pre-selected with Principal Component Analysis (PCA)

In [3]:
### Choose whether you want the datasets with features selected with RF or PCA
SELECTION = 'PCA'
"""
INSERT 'RF' OR 'PCA'
"""


# Define the sel_feat variable for easier printing
if SELECTION == 'RF':
    sel_feat = 'Random Forest (RF)'
elif SELECTION == 'PCA':
    sel_feat = 'Principal Component Analysis (PCA)'
else: raise ValueError('SELECTION must be either RF or PCA')
briefing = ('You chose dataset VERSION '+str(VERSION)+' and SELECTION method '+str(SELECTION)+'.'+'\n'+'Features therefore pre-selected with '+str(sel_feat)+'.')
#print(sel_feat)
print('You chose SELECTION method '+str(sel_feat)+'.')

You chose SELECTION method Principal Component Analysis (PCA).


### (3) SUMMARY OF SETTINGS

In [4]:
print(briefing, '\n')
print('VERSION '+str(VERSION)+' is '+str(sel_version),'\n')
print('You are now done with the Settings. You can run the whole Code now by Default.')

You chose dataset VERSION 1.1 and SELECTION method PCA.
Features therefore pre-selected with Principal Component Analysis (PCA). 

VERSION 1.1 is based on reduced Dataset with only Ratios as predictive Features. 

You are now done with the Settings. You can run the whole Code now by Default.


## 3.1. Preparation

### 3.1.1. Import the Response Vector and the Feature Matrix

In [5]:
# Import data (already split into train/test sets, with features selected in chapter 2)
if VERSION not in (1.1, 2.1) or SELECTION not in ('RF', 'PCA'):
    raise ValueError('VERSION must be either 1.1 or 2.1, SELECTION must be either RF or PCA')

# Folder and file suffix depend on the selection method, the file prefix on the version
folder = {'RF': 'features_selected_randomforest', 'PCA': 'features_selected_pca'}[SELECTION]
suffix = {'RF': 'f', 'PCA': 'p'}[SELECTION]
path = 'Data/generated_splits/' + folder + '/'
v = str(int(VERSION))  # 1.1 -> '1', 2.1 -> '2'

X_train_s = pd.read_csv(path + 'X' + v + '_train_' + suffix + '.csv', sep=',', header=0)
X_test_s = pd.read_csv(path + 'X' + v + '_test_' + suffix + '.csv', sep=',', header=0)
y_train_s = pd.read_csv(path + 'y' + v + '_train_' + suffix + '.csv', sep=',', header=0)
y_test_s = pd.read_csv(path + 'y' + v + '_test_' + suffix + '.csv', sep=',', header=0)

### 3.1.2. Print out Shape and Form of Feature Matrix and Response Vector


### Train Set

In [6]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# print properties and head of datasets
print('Shape (rows, columns) of Feature Matrix X (Train) ' + '=' + str(X_train_s.shape),'\n')
print('Feature Matrix X (Train) with Selected Features')
display(X_train_s[0:3])
print("")
print('Response Vector y (Train) after Feature Selection')
display(y_train_s[0:3])
print("")

Features Selected with Principal Component Analysis (PCA)
Version 1.1; based on reduced Dataset with only Ratios as predictive Features. 

Shape (rows, columns) of Feature Matrix X (Train) =(2836, 20) 

Feature Matrix X (Train) with Selected Features


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,-1.992491,-3.854058,-1.358215,0.890327,-1.684515,0.354582,0.985876,-1.014961,-0.450949,-1.154523,-0.575385,1.231744,-0.001642,0.184264,0.428175,-0.200128,-0.589276,1.031207,-0.245835,0.915196
1,3.623365,1.688617,-3.632058,1.759622,-2.073377,-0.236955,0.596057,-0.452289,0.272625,-0.160948,0.767959,-0.566013,0.264339,-1.212195,-0.071803,-0.456456,0.379755,-0.558656,-0.338545,0.588471
2,3.970356,-6.469188,0.335661,-1.871439,0.619278,-1.121073,-0.659667,-0.419784,1.375519,0.225101,-0.627626,1.031381,-2.009648,0.295378,-1.021045,-0.206973,-1.234521,1.231399,0.074544,-1.060889



Response Vector y (Train) after Feature Selection


Unnamed: 0,0
0,0
1,1
2,0





### Test Set

In [7]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# print properties and head of datasets
print('Shape (rows, columns) of Feature Matrix X (Test) ' + '=' + str(X_test_s.shape),'\n')
print('Feature Matrix X (Test) with Selected Features')
display(X_test_s[0:3])
print("")
print('Response Vector y (Test) after Feature Selection')
display(y_test_s[0:3])
print("")

Features Selected with Principal Component Analysis (PCA)
Version 1.1; based on reduced Dataset with only Ratios as predictive Features. 

Shape (rows, columns) of Feature Matrix X (Test) =(710, 20) 

Feature Matrix X (Test) with Selected Features


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,-4.398996,-0.251344,1.228421,1.32904,1.541763,-0.887472,-0.672034,0.404906,-0.127137,1.289256,0.782215,-0.317004,-0.155777,1.665708,-0.545484,-0.595885,-0.311746,0.256147,-0.102956,0.596696
1,4.547781,-2.110889,1.2803,0.45822,0.2571,0.256832,-0.347967,-0.575476,-0.683411,0.455061,0.746221,0.845377,1.345798,0.91998,-0.666636,-0.541558,-0.264367,0.821406,-0.328575,-0.660287
2,-2.589346,-3.531489,-0.962613,0.668408,-1.82295,0.761607,1.144378,-0.859776,0.626761,-1.481275,-0.90521,-0.126703,-0.411628,0.844952,-0.330264,0.022627,-0.027291,0.077469,0.641487,-0.394016



Response Vector y (Test) after Feature Selection


Unnamed: 0,0
0,1
1,0
2,1





## 3.2. Simple Logistic Regression (statsmodels) (SLM)
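
For orientation: a simple logistic regression models the probability of UP (=1) as the logistic function of an intercept plus one coefficient times the feature value, which is what a prediction for a row `[1, x]` evaluates. The following is a minimal sketch using only the standard library. The helper name `prob_up` and the coefficient values are illustrative assumptions, not estimates from the project data.

```python
import math

def prob_up(intercept, slope, x):
    """Probability of UP (=1) under a simple logistic model (hypothetical helper)."""
    linear = intercept + slope * x          # linear predictor b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-linear))  # logistic (sigmoid) function

# Assumed, made-up coefficients purely for illustration
b0, b1 = 0.1, -0.5
for x in (-2.0, 0.0, 2.0):
    print(x, round(prob_up(b0, b1, x), 4))
```

With a negative slope, as assumed here, larger feature values give a lower probability of UP; a linear predictor of 0 corresponds to a probability of exactly 0.5.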

### 3.2.1. Preparation and fitting (on Training Set), define BEST FEATURE (SLM)

In [8]:
# best feature selection taking the column name
colNms = X_train_s.columns.values
display(colNms[0])

'0'

In [9]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# Run Simple Logistic Regression with the most important feature from feature selection
# Assign "best_feature" to matrix X and the response to y, according to the chosen feature selection method
pca_error = ('ERROR: PCA best feature not defined (PCA is unlabeled)! Thus no Simple Regression available.'+'\n'+'Proceed to Multiple Regression in chapter 3.3.')
if VERSION not in (1.1, 2.1):
    raise ValueError('VERSION must be either 1.1 or 2.1')
if SELECTION == 'RF':
    best_feature = colNms[0]
    logReg = sm.Logit(endog = y_train_s, exog = sm.add_constant(X_train_s[best_feature])).fit()
elif SELECTION == 'PCA':
    print(pca_error)
else: raise ValueError('SELECTION must be either RF or PCA')

Features Selected with Principal Component Analysis (PCA)
Version 1.1; based on reduced Dataset with only Ratios as predictive Features. 

ERROR: PCA best feature not defined (PCA is unlabeled)! Thus no Simple Regression available.
Proceed to Multiple Regression in chapter 3.3.


### 3.2.2. Summary (SLM)

In [10]:
# Workaround solution for error ("AttributeError: module 'scipy.stats' has no attribute 'chisqprob'")
from scipy import stats
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)

In [11]:
# LogReg with only one feature as exogenous variable
if SELECTION == 'RF':
    print(logReg.summary(),'\n')
    print('logReg pvalues: '+'\n')
    print(logReg.pvalues)
elif SELECTION == 'PCA':
    print(pca_error)
else: raise ValueError('SELECTION must be either RF or PCA')

ERROR: PCA best feature not defined (PCA is unlabeled)! Thus no Simple Regression available.
Proceed to Multiple Regression in chapter 3.3.


### 3.2.3. Assessing Output (SLM)

### Hypothesis testing / Confidence Interval

In [12]:
significance_level = 0.01

if SELECTION == 'RF':
    print(str(int(100 - significance_level*100)) + '% Confidence Interval (Significance Level ' 
          + str(int(significance_level*100)) + '%)')
    display(logReg.conf_int(alpha=significance_level))
elif SELECTION == 'PCA':
    print(pca_error)
else: raise ValueError('SELECTION must be either RF or PCA')

ERROR: PCA best feature not defined (PCA is unlabeled)! Thus no Simple Regression available.
Proceed to Multiple Regression in chapter 3.3.


### 3.2.4. Prediction (SLM)

### I: In-sample Prediction of probability for returns going UP in the next period (predict y_train)


#### Prediction of whole Response Vector (Train) based on all available values of the single chosen feature (Train)

In [13]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# X must include 1 in first column for intercept
# we wish to get the probability of 'UP' (=1) for the whole training set
if SELECTION == 'RF':
    pred_train_all = logReg.predict(sm.add_constant(X_train_s[best_feature]))
    print('Predicted probability of price going UP for whole Feature Train Set is: ')
    display(pred_train_all.head(3))
    print('Actual Response Vector y_train is:')
    display(y_train_s.head(3))
elif SELECTION == 'PCA':
    print(pca_error)
else: raise ValueError('SELECTION must be either RF or PCA')

Features Selected with Principal Component Analysis (PCA)
Version 1.1; based on reduced Dataset with only Ratios as predictive Features. 

ERROR: PCA best feature not defined (PCA is unlabeled)! Thus no Simple Regression available.
Proceed to Multiple Regression in chapter 3.3.


### II: New-sample Prediction of probability for returns going UP in the next period (predict y_test)

#### ONE prediction for ONE specific chosen value of the predictive variable


In [14]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# X must include 1 in first column for intercept
# we wish to get the probability of 'UP' (=1) for one manually chosen value of the best feature
if VERSION not in (1.1, 2.1):
    raise ValueError('VERSION must be either 1.1 or 2.1')
if SELECTION == 'RF':
    # best_feature only exists when the features were selected with RF
    print('Chosen best feature = ' + str(best_feature))
    # Choose the value of the best feature manually
    # Here: median of the feature values plus one third of that median
    bfv = np.median(X_train_s[best_feature]) + np.median(X_train_s[best_feature])/3
    #bfv = np.mean(X_train_s[best_feature])
    #bfv = np.median(X_train_s[best_feature])
    best_feature_value = bfv
    print('Chosen value of best feature = ' + str(best_feature_value),'\n')
    pred_test_one = logReg.predict([1, best_feature_value])
    ratio_response_train = y_train_s.sum() / y_train_s.size
    print('Predicted probability of price going UP with chosen ' + str(best_feature) + ' value is: '
          + str("%.4f" % round(float(pred_test_one*100),4)) + '%'+'\n')
    print('Actual Ratio of "UP" (Train)  =  ' + str("%.4f" % round(float(ratio_response_train*100),4)) + '%')
elif SELECTION == 'PCA':
    print(pca_error)
else: raise ValueError('SELECTION must be either RF or PCA')

Features Selected with Principal Component Analysis (PCA)
Version 1.1; based on reduced Dataset with only Ratios as predictive Features. 

ERROR: PCA best feature not defined (PCA is unlabeled)! Thus no Simple Regression available.
Proceed to Multiple Regression in chapter 3.3.

#### Prediction of whole Response Vector (Test) based on all available values of the single chosen feature (Test)

In [None]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# we wish to get the probability of 'UP' (=1) for the whole test set
if SELECTION == 'RF':
    pred_test_all = logReg.predict(sm.add_constant(X_test_s[best_feature]))
    print('Predicted probability of price going UP for whole Feature Test Set is: '+'\n')
    display(pred_test_all.head(3))
    print('Actual Response Vector y_test is:')
    display(y_test_s.head(3))
elif SELECTION == 'PCA':
    print(pca_error)
else: raise ValueError('SELECTION must be either RF or PCA')

### 3.2.5. Plot Results for Training Set (SLM)

In [None]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# Plot scatter and log.Reg
if SELECTION == 'RF':
    # best_feature and logReg only exist when the features were selected with RF
    print('Chosen best feature = ' + str(best_feature))
    display(logReg.predict())

    # Transfer the best_feature column and the prediction for the response vector into a new dataframe "res"
    res = pd.DataFrame()
    res['best_feature'] = X_train_s[best_feature]
    res['probability'] = logReg.predict()
    # Sort results by values of the best_feature column
    res = res.sort_values('best_feature')

    # Plot
    plt.figure(figsize =(8,5))
    plt.title(str(best_feature) + ' vs. probability of returns going UP in the next period')
    plt.scatter(X_train_s[best_feature], y_train_s, marker ='.')
    plt.plot(res.best_feature, res.probability, c = 'k')
    plt.axhline(y=0, color = "gray", linestyle = "dashed")
    plt.axhline(y=1, color = "gray", linestyle = "dashed")
    plt.ylabel("Probability of UP (=1)", fontsize =12)
    plt.xlabel(str(best_feature), fontsize =12)

elif SELECTION == 'PCA':
    print(pca_error)

else: raise ValueError('SELECTION must be either RF or PCA')

###   
###   
## 3.3. Multiple Logistic Regression with n pre-selected features (MLR1)

### 3.3.1. Preparation and fitting (on Training Set) (MLR1)

In [None]:
## Multiple Log. Regression (with all n best features chosen in Chapter 2 in the feature selection process)
logReg_m = sm.Logit(endog = y_train_s, exog = sm.add_constant(X_train_s)).fit() 

### 3.3.2. Summary (MLR1)

In [None]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')
print("Multiple Logistic Regression with all selected features"+'\n')

# Workaround solution for error ("AttributeError: module 'scipy.stats' has no attribute 'chisqprob'")
from scipy import stats
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)

# Run Multiple Logistic Regression
print(logReg_m.summary().tables[0])
print(logReg_m.summary().tables[1])

### 3.3.3. Assessing Output (MLR1)

### Hypothesis testing / Confidence Interval

In [None]:
significance_level = 0.01

# Print Confidence Interval with Title
print(str(int(100 - significance_level*100)) + '% Confidence Interval (Significance Level ' 
      + str(int(significance_level*100)) + '%)')
display(logReg_m.conf_int(alpha=significance_level))
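
The intervals returned by `conf_int` can be read as Wald intervals: the estimate plus or minus a standard normal quantile times the standard error. The following minimal sketch shows that arithmetic under the assumption of a normal approximation. The helper name `wald_conf_int` and the numbers are hypothetical and not taken from the fitted model.

```python
from statistics import NormalDist

def wald_conf_int(coef, std_err, alpha=0.01):
    """Two-sided (1 - alpha) Wald interval for one coefficient (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    return coef - z * std_err, coef + z * std_err

# Made-up estimate and standard error, for illustration only
lower, upper = wald_conf_int(coef=0.30, std_err=0.10, alpha=0.01)
print(round(lower, 4), round(upper, 4))
# A coefficient is significant at level alpha if the interval excludes 0
print('significant:', not (lower <= 0.0 <= upper))
```

A smaller `alpha` widens the interval, so a feature that is significant at the 5% level is not necessarily significant at the 1% level used here.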

### 3.3.4. Prediction (MLR1)
Multiple Logistic Regression 1 (all features pre-selected in Chapter 2 with the chosen SELECTION method)

### I: In-sample Prediction of probability for returns going UP in the next period (predict y_train)

#### For whole Training Set


In [None]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# Get the probability of 'UP' (=1) for the whole training set
pred_train_all = logReg_m.predict(sm.add_constant(X_train_s))

# Print Prediction and Response Vector, with Title
print('Predicted probabilities of price going UP for whole Feature Set (Train) are: ')
display(pred_train_all[0:3])
print("")
print('Response Vector (Train): ')
display(y_train_s[0:3])

### II: New-sample Prediction of probability for returns going UP in the next period (predict y_test)

#### For whole Test Set


In [None]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# Get the probability of 'UP' (=1) for the whole test set
pred_test_all = logReg_m.predict(sm.add_constant(X_test_s))

# Print Prediction and Response Vector, with Title
print('Predicted probability of price going UP for whole Feature Set (Test) is: ')
display(pred_test_all.head(3))
print("")
print('Response Vector (Test): ')
display(y_test_s.head(3))
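
The predicted probabilities can be turned into class labels (UP = 1, DOWN = 0) with a cut-off and then compared with the response vector. The following is a minimal sketch using only the standard library. The cut-off of 0.5, the helper names and the example values are assumptions for illustration and not results of this project.

```python
def to_labels(probabilities, threshold=0.5):
    """Map probabilities of UP to class labels 1 (UP) or 0 (hypothetical helper)."""
    return [1 if p >= threshold else 0 for p in probabilities]

def accuracy(y_true, y_pred):
    """Share of predictions that match the actual labels."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

# Made-up probabilities and labels, for illustration only
example_probs = [0.62, 0.41, 0.55, 0.30]
example_y = [1, 0, 0, 0]
example_pred = to_labels(example_probs)
print(example_pred, accuracy(example_y, example_pred))
```

Applied to the notebook's objects, this would mean passing `pred_test_all` to `to_labels` and comparing the result with the values in `y_test_s`; `sklearn.metrics.accuracy_score` provides the same measure.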

###   
###   
## 3.4. Multiple Logistic Regression with only significant features (MLR2)

Apply another multiple logistic regression to a reduced dataset that contains only the features that were significant in logReg_m (above).

### 3.4.1. Extract significant features (MLR2)

In [None]:
# Extract significant features with a significance level (alpha) of 0.05
# The p-values are labelled with the column names; drop the intercept ('const', added by sm.add_constant)
# so that features are selected by name and a significant intercept cannot shift the selection
pvalues = logReg_m.pvalues.drop('const', errors='ignore')
print('p-values of the features (without intercept):')
display(pvalues)

# extract significant features
sign_features = pvalues[pvalues < 0.05].index.tolist()
print('Features that were significant in the previous MLR in chapter 3.3.:')
print(sign_features)

### 3.4.2. Preparation and fitting (on Training Set) (MLR2)

In [None]:
## Multiple Log. Regression (with significant features from logReg_m above)
# Assign features to X and response vector y (identical for RF and PCA)
logReg_mm = sm.Logit(endog = y_train_s, exog=sm.add_constant(X_train_s[sign_features])).fit()

### 3.4.3. Summary (MLR2)

In [None]:
# Summary is identical for RF and PCA
print("Multiple Logistic Regression with selected significant features"+'\n')
print(logReg_mm.summary().tables[0])
print(logReg_mm.summary().tables[1])

### 3.4.4. Assessing Output (MLR2)

### Hypothesis testing / Confidence Interval

In [None]:
significance_level = 0.01

# Print Confidence Interval with Title
print(str(int(100 - significance_level*100)) + '% Confidence Interval (Significance Level ' 
      + str(int(significance_level*100)) + '%)')
display(logReg_mm.conf_int(alpha=significance_level))

### 3.4.5. Prediction (MLR2)
Multiple Logistic Regression 2 (only the features that were significant in MLR1)

### I: In-sample Prediction of probability for returns going UP in the next period (predict y_train)

#### For whole Training Set


In [None]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# Get the probability of 'UP' (=1) for the whole training set
pred_train_all = logReg_mm.predict(sm.add_constant(X_train_s[sign_features]))

# Print Prediction and Response Vector, with Title
print('Predicted probabilities of price going UP for whole Feature Set (Train) are: ')
display(pred_train_all[0:3])
print("")
print('Response Vector (Train): ')
display(y_train_s[0:3])

### II: New-sample Prediction of probability for returns going UP in the next period (predict y_test)

#### For whole Test Set


In [None]:
# print status
print('Features Selected with ' + str(sel_feat))
print('Version ' + str(VERSION) + '; ' + str(sel_version),'\n')

# Get the probability of 'UP' (=1) for the whole test set
pred_test_all = logReg_mm.predict(sm.add_constant(X_test_s[sign_features]))

# Print Prediction and Response Vector, with Title
print('Predicted probability of price going UP for whole Feature Set (Test) is: ')
display(pred_test_all.head(3))
print("")
print('Response Vector (Test): ')
display(y_test_s.head(3))