# Multicollinearity tests
Here we measure the severity of multicollinearity in our models. The model base functions are described below:
$$
\begin{equation*}
\begin{split}
Lin(j) & = \theta_0 + \theta_1 p_j + \theta_2 q_j + \theta_3 r_j\\
Sq(j) & = Lin(j) + \theta_4 p_j^2 + \theta_5 q_j^2 + \theta_6 r_j^2 + \theta_7 p_j q_j\\
Cub(j) & = Sq(j) + \theta_8 p_j^3 + \theta_{9} q_j^3 + \theta_{10} r_j^3 + \theta_{11} p_j^2 q_j + \theta_{12} p_j q_j^2 + \theta_{13} (p_j q_j)^2\\
Qua(j) & = Cub(j) + \theta_{14} p_j^4 + \theta_{15} q_j^4 + \theta_{16} r_j^4 + \theta_{17} p_j^3 q_j + \theta_{18} p_j q_j^3 + \theta_{19} (p_j q_j)^3\\
\end{split}
\end{equation*}
$$
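
For reference, the severity measure used in the rest of this section is the variance inflation factor (VIF). This is the standard textbook definition, not something specific to this data set: for feature $x_i$, let $R_i^2$ be the coefficient of determination obtained by regressing $x_i$ on all the other features of the model. Then
$$
\mathrm{VIF}_i = \frac{1}{1 - R_i^2}
$$
Because $0 \le R_i^2 < 1$, a VIF is at least 1 in exact arithmetic, and it grows without bound as $x_i$ approaches a linear combination of the other features. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity.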

In [1]:
import pandas as pd

In [2]:
data_set = pd.read_csv('score-distribution-2.csv', names=['p', 'q', 'r', 'score'])
data_set.head()

Unnamed: 0,p,q,r,score
0,7955,32,2643,0.032182
1,20005,2,3085,0.046431
2,132,1,3092,0.040886
3,36,4,3100,0.039235
4,9887,1,3119,0.040176


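## Convert from seconds to hours
I suspect that the negative VIF values observed for $p^2$ and $q^2$ came from limited floating-point precision, since a VIF cannot be negative in exact arithmetic. Converting `p` and `r` from seconds to hours shrinks the magnitude of the higher-order terms, which might solve it.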

In [3]:
data_set['p'] = data_set['p'] / 3600
data_set['r'] = data_set['r'] / 3600
data_set.head()

Unnamed: 0,p,q,r,score
0,2.209722,32,0.734167,0.032182
1,5.556944,2,0.856944,0.046431
2,0.036667,1,0.858889,0.040886
3,0.01,4,0.861111,0.039235
4,2.746389,1,0.866389,0.040176
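
Rescaling a column by a constant does not change its VIF in exact arithmetic, so the unit conversion can only help through numerical conditioning. A minimal sketch of that effect follows, using synthetic data whose ranges only loosely imitate the rows shown above. The helper `quadratic_design` and the random ranges are assumptions for illustration, not part of the original analysis.

```python
import numpy as np

def quadratic_design(p, q, r):
    """Stack the quadratic-model columns p, q, r, p^2, q^2, r^2, pq."""
    return np.column_stack([p, q, r, p**2, q**2, r**2, p * q])

# Synthetic stand-in data (assumed ranges, NOT the real score distribution)
rng = np.random.default_rng(0)
n = 1000
p_sec = rng.uniform(0, 20000, size=n)          # "seconds"
q = rng.integers(1, 33, size=n).astype(float)  # small counts
r_sec = rng.uniform(2600, 3200, size=n)        # "seconds"

# Condition number of the design matrix before and after the unit change
cond_seconds = np.linalg.cond(quadratic_design(p_sec, q, r_sec))
cond_hours = np.linalg.cond(quadratic_design(p_sec / 3600, q, r_sec / 3600))
print(f"seconds: {cond_seconds:.3e}")
print(f"hours:   {cond_hours:.3e}")
```

On this synthetic data the matrix built from hours has a much smaller condition number, which leaves the least-squares fits behind each VIF less exposed to rounding error. Whether that fully explains the negative values is still a hypothesis.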


## Linear

In [4]:
from statsmodels.stats.outliers_influence import variance_inflation_factor

In [5]:
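# the independent variables set; copy it so that later cells can add
# columns without triggering pandas' SettingWithCopyWarning
X = data_set[['p', 'q', 'r']].copy()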

def vif_test(X):
    # VIF dataframe
    vif_data = pd.DataFrame()
    vif_data["feature"] = X.columns

    # calculating VIF for each feature
    vif_data["VIF"] = [variance_inflation_factor(X.values, i)
                              for i in range(len(X.columns))]
    print(vif_data)

vif_test(X)

  feature       VIF
0       p  1.328593
1       q  1.287783
2       r  1.216593
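
One caveat: `variance_inflation_factor` regresses each column only on the columns it is given, and `X` has no constant column, so the auxiliary regressions above are fitted without an intercept. The sketch below shows the intercept-based variant with plain NumPy on synthetic data. The function name `vif_with_intercept` and the demo columns are hypothetical, and its numbers will generally differ from the tables in this section.

```python
import numpy as np

def vif_with_intercept(values, names):
    """Return {name: VIF}, regressing each column on the others plus an intercept."""
    values = np.asarray(values, dtype=float)
    n_rows = values.shape[0]
    result = {}
    for i, name in enumerate(names):
        target = values[:, i]
        others = np.delete(values, i, axis=1)
        # Assumption of this sketch: prepend a column of ones as the intercept
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        # VIF = 1 / (1 - R^2) with R^2 = 1 - ss_res / ss_tot
        result[name] = ss_tot / ss_res
    return result

# Synthetic demo: "c" is nearly a copy of "a", "b" is independent
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
demo_vif = vif_with_intercept(np.column_stack([a, b, c]), ["a", "b", "c"])
print(demo_vif)
```

With the notebook's data this could be called as `vif_with_intercept(X.values, X.columns)` to check whether the conclusions change once an intercept is included.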


## Quadratic

In [6]:
X['p^2'] = X['p']**2
X['q^2'] = X['q']**2
X['r^2'] = X['r']**2
X['pq'] = X['p']*X['q']
vif_test(X)

  feature       VIF
0       p  3.753965
1       q  9.384029
2       r  5.046083
3     p^2  2.520770
4     q^2  8.143353
5     r^2  4.029256
6      pq  3.764781


## Cubic

In [7]:
X['p^3'] = X['p']**3
X['q^3'] = X['q']**3
X['r^3'] = X['r']**3
X['p^2q'] = X['p']**2 * X['q']
X['pq^2'] = X['p'] * X['q']**2
X['(pq)^2'] = (X['p']*X['q'])**2
vif_test(X)

   feature         VIF
0        p   12.332851
1        q   37.313305
2        r   19.957551
3      p^2   45.999222
4      q^2  267.415940
5      r^2   72.771584
6       pq  125.382358
7      p^3   26.051335
8      q^3  152.387636
9      r^3   31.637796
10    p^2q   67.857908
11    pq^2  174.528194
12  (pq)^2   88.215992


## Quartic

In [8]:
X['p^4'] = X['p']**4
X['q^4'] = X['q']**4
X['r^4'] = X['r']**4
X['p^3q'] = X['p']**3 * X['q']
X['pq^3'] = X['p'] * X['q']**3
X['(pq)^3'] = (X['p']*X['q'])**3
vif_test(X)

   feature           VIF
0        p     40.817033
1        q    104.123457
2        r     56.524822
3      p^2    430.451124
4      q^2   2841.919021
5      r^2    623.116102
6       pq    545.328957
7      p^3   1301.603746
8      q^3  10583.406425
9      r^3   1693.612845
10    p^2q    863.787873
11    pq^2   2148.213272
12  (pq)^2   1365.560786
13     p^4    448.833803
14     q^4   3585.263718
15     r^4    528.991699
16    p^3q    109.862027
17    pq^3    519.405391
18  (pq)^3    176.758730
