### Super Simple Script on Multivariate Linear Regression ###

Multivariate regression, as used here, just means regressing one variable against several other variables at once.

A good example in finance is the Arbitrage Pricing Theory (APT).
The CAPM, by comparison, assumes that a stock's return depends on only one factor (the market return), so the regression there is a single-variable linear regression.

The APT goes a step further and assumes that the stock return can be driven by multiple factors, so it uses a multivariate linear regression model:

$$ E[R_i] = \alpha_i + \beta_{i, 1}F_1 + ... + \beta_{i, n}F_n $$

where $R_i$ is the return on stock $i$, $F_1, \dots, F_n$ are the factors to regress against, $\alpha_i$ is the intercept, and each $\beta_{i, j}$ is the stock's sensitivity to factor $F_j$.

Our objective is to find the values of $\alpha_i$ and the various $\beta$s.
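To make that objective concrete, here is a minimal sketch that builds returns from a known intercept and known betas, then recovers them by least squares. Everything in it (the seed, `true_alpha`, `true_betas`, the noise level) is an assumption made up for illustration, and it uses `np.linalg.lstsq` rather than the statsmodels route taken in the rest of this script.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, only so the sketch is repeatable
n_obs = 200
true_alpha = 0.5                         # hypothetical intercept
true_betas = np.array([1.0, -2.0, 0.3])  # hypothetical factor sensitivities

# Simulate factors and returns that follow the linear model plus a little noise.
factors = rng.normal(size=(n_obs, true_betas.size))
noise = rng.normal(scale=0.01, size=n_obs)
returns = true_alpha + factors @ true_betas + noise

# Prepend a column of ones so the first coefficient plays the role of alpha.
design = np.column_stack([np.ones(n_obs), factors])
estimates, *_ = np.linalg.lstsq(design, returns, rcond=None)

alpha_hat, betas_hat = estimates[0], estimates[1:]
print(alpha_hat, betas_hat)
```

Because the noise is small, the estimates should land very close to the values used to generate the data.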

Let's first create a random set of values:

In [1]:
import numpy as np
import pandas as pd

num_periods = 100
num_factors = 8

# one row per period: column 0 becomes the stock return, the rest are raw factor draws
all_values = np.random.random((num_periods, num_factors + 1))

Just to show what the data looks like in pandas, we start a DataFrame with the stock returns:

In [2]:
multifactor_data = pd.DataFrame(data=all_values[:,0], index=range(num_periods), columns=['StockReturns'])

Let's create factors that are positively correlated with the variable we are predicting by multiplying each raw factor by the StockReturns variable.

In [3]:
for i in range(1, num_factors+1):
    label = 'Factor' + str(i)
    # scale each raw factor draw by the stock return so the two move together
    multifactor_data[label] = all_values[:,i] * all_values[:,0]
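One way to check that the multiplication trick really produces correlated factors is to look at the correlation of each factor with StockReturns. The sketch below rebuilds the same kind of data from scratch so that it runs on its own; the fixed seed and the names `values`, `data` and `factor_corr` are assumptions for illustration, so the numbers will not match the table in this script.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable
num_periods, num_factors = 100, 8
values = rng.random((num_periods, num_factors + 1))

# Same construction as in the script: each factor is a raw draw times the return.
data = pd.DataFrame({'StockReturns': values[:, 0]})
for i in range(1, num_factors + 1):
    data['Factor' + str(i)] = values[:, i] * values[:, 0]

# Correlation of every factor with the target, dropping the target's own entry.
factor_corr = data.corr()['StockReturns'].drop('StockReturns')
print(factor_corr.round(2))
```

Every entry should come out clearly positive, which is what makes the regression further down fit so well.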

In [4]:
multifactor_data.head(10)

,StockReturns,Factor1,Factor2,Factor3,Factor4,Factor5,Factor6,Factor7,Factor8
0,0.104148,0.001463,0.055067,0.00668,0.035604,0.024595,0.024984,0.06815,0.042042
1,0.322437,0.006423,0.096316,0.261771,0.290816,0.160277,0.288845,0.244392,0.067183
2,0.807019,0.331176,0.427496,0.121126,0.281866,0.109936,0.077227,0.65856,0.530451
3,0.375065,0.324686,0.08492,0.219045,0.143188,0.152693,0.137051,0.058964,0.169739
4,0.596416,0.047848,0.207737,0.579259,0.554828,0.474295,0.57074,0.400263,0.554946
5,0.690272,0.424033,0.447409,0.120427,0.401322,0.38558,0.210282,0.108569,0.435577
6,0.887546,0.809754,0.122643,0.663487,0.141914,0.04767,0.363444,0.307023,0.522531
7,0.285732,0.153735,0.016488,0.251658,0.156171,0.256327,0.228095,0.254842,0.062886
8,0.464955,0.105131,0.433098,0.285241,0.21287,0.016585,0.162213,0.127166,0.448148
9,0.004636,0.001139,0.001633,0.004559,0.000626,0.000147,0.003393,0.002923,0.00236


In [5]:
y_values = multifactor_data['StockReturns'].values
x_values = multifactor_data.iloc[:,1:].values

In [6]:
import statsmodels.api as sm
x_values = sm.add_constant(x_values) # include an intercept
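With its default arguments, `sm.add_constant` prepends a column of ones to the design matrix, which is what lets the model estimate an intercept. The sketch below shows the same idea in plain NumPy on a tiny made-up matrix (`x_demo` is hypothetical); it is an equivalent for illustration, not the statsmodels implementation.

```python
import numpy as np

# Hypothetical toy design matrix: 3 observations, 2 factors.
x_demo = np.array([[0.1, 0.2],
                   [0.3, 0.4],
                   [0.5, 0.6]])

# Prepend a column of ones; its coefficient becomes the intercept.
x_with_const = np.column_stack([np.ones(len(x_demo)), x_demo])

print(x_with_const.shape)  # one extra column compared with x_demo
print(x_with_const[:, 0])  # the intercept column
```

Since the ones go in the first column, the first fitted coefficient is the intercept and the remaining ones follow the original factor order.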

In [7]:
results = sm.OLS(y_values, x_values).fit()

In [8]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.865
Model:                            OLS   Adj. R-squared:                  0.853
Method:                 Least Squares   F-statistic:                     72.80
Date:                Sun, 16 Sep 2018   Prob (F-statistic):           3.29e-36
Time:                        19:03:06   Log-Likelihood:                 81.026
No. Observations:                 100   AIC:                            -144.1
Df Residuals:                      91   BIC:                            -120.6
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0796      0.022      3.624      0.0
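Some of the headline numbers in the summary can be reproduced by hand from the other printed values, which is a useful sanity check on how to read the table. The sketch below plugs the printed R-squared, log-likelihood, observation count and model degrees of freedom into the usual textbook definitions of adjusted R-squared, AIC and BIC; the variable names are made up for illustration.

```python
import math

# Values copied from the printed summary.
r_squared = 0.865
log_likelihood = 81.026
n_obs = 100
df_model = 8               # number of factors, excluding the intercept
n_params = df_model + 1    # factors plus the intercept

# Adjusted R-squared penalises R-squared for the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - df_model - 1)

# Information criteria built from the log-likelihood and the parameter count.
aic = -2 * log_likelihood + 2 * n_params
bic = -2 * log_likelihood + n_params * math.log(n_obs)

print(round(adj_r_squared, 3), round(aic, 1), round(bic, 1))
```

Rounded to the precision shown in the summary, these agree with the printed Adj. R-squared, AIC and BIC.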

In [9]:
print(results.params)

[0.07963282 0.15905534 0.22078298 0.30905536 0.16536714 0.15993752
 0.19818225 0.24396488 0.22337066]
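Because `x_values` was passed in as a plain NumPy array, the parameters come back unlabelled. Their order follows the columns of the design matrix: the constant first, then Factor1 to Factor8. A small sketch that attaches labels to the values printed above (the names `labels` and `labelled_params` are made up for illustration):

```python
import pandas as pd

# Parameter values copied from the printed output above.
params = [0.07963282, 0.15905534, 0.22078298, 0.30905536, 0.16536714,
          0.15993752, 0.19818225, 0.24396488, 0.22337066]

# The intercept comes first, followed by the factors in column order.
labels = ['const'] + ['Factor' + str(i) for i in range(1, 9)]
labelled_params = pd.Series(params, index=labels)

print(labelled_params)
```

Here `const` is the estimate of $\alpha_i$ and the remaining entries are the estimated $\beta$s.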
