# Variance Inflation Factor (VIF)

VIF measures how much the variance of an estimated regression coefficient is inflated by correlation among the features of the model. A high VIF means the feature is strongly correlated with the other features (more precisely, with a linear combination of them), which signals multicollinearity.

A VIF above 5 is commonly treated as a sign of problematic multicollinearity (some references use 10 as the cutoff). Features with high VIF should be removed to improve the stability of the model, usually one at a time, because dropping one feature changes the VIF of those that remain.
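The removal step described above can be sketched as a loop that drops the worst feature and recomputes. This is a minimal illustration, not the notebook's own method: the helper names `vif_all` and `drop_high_vif` and the synthetic data are hypothetical, and the VIFs are taken from the diagonal of the inverse correlation matrix, which matches the regression-based definition when an intercept is included.

```python
import numpy as np

def vif_all(X):
    """VIF of every column: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    # Note: a perfectly collinear column makes corr singular and inv() fails.
    return np.diag(np.linalg.inv(corr))

def drop_high_vif(X, names, threshold=5.0):
    """Drop the column with the largest VIF until all VIFs are <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = vif_all(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)  # remove one feature, then recompute
        names.pop(worst)
    return X, names

# Hypothetical data: "c" is almost a copy of "a", "b" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
X_kept, kept = drop_high_vif(np.column_stack([a, b, c]), ["a", "b", "c"])
print(kept)
```

Only one of the two near-duplicate columns needs to go. Once it is removed, the VIF of the other falls back toward 1, which is why the loop recomputes after every drop instead of removing everything above the threshold at once.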


***VIF:*** The higher the VIF, the stronger the multicollinearity between the feature and the others.

***Tolerance:*** The reciprocal of VIF (1/VIF). A low tolerance means the feature is strongly collinear with the other variables.
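Both quantities come from the same auxiliary regression: if R_j² is the R² obtained by regressing feature j on all the other features, then VIF_j = 1 / (1 − R_j²) and Tolerance_j = 1 − R_j². The sketch below computes them directly with NumPy on synthetic data; the function name `vif_from_r2` and the data are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def vif_from_r2(X, j):
    """VIF of column j, from regressing it on the other columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # intercept + other features
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return 1.0 / (1.0 - r2)

# Hypothetical data: "c" is almost a copy of "a", "b" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
X = np.column_stack([a, b, c])

vifs = [vif_from_r2(X, j) for j in range(X.shape[1])]
tolerances = [1.0 / v for v in vifs]
print(vifs)
print(tolerances)
```

In this setup the first and third columns get large VIFs (and tolerances near 0), while the independent middle column stays close to 1.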

In [2]:
!pip install statsmodels

Collecting statsmodels
  Downloading statsmodels-0.14.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.2 kB)
Collecting patsy>=0.5.6 (from statsmodels)
  Downloading patsy-0.5.6-py2.py3-none-any.whl.metadata (3.5 kB)
Downloading statsmodels-0.14.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.8 MB)
Downloading patsy-0.5.6-py2.py3-none-any.whl (233 kB)
Installing collected packages: patsy, statsmodels
Successfully installed patsy-0.5.6 statsmodels-0.14.4


In [11]:
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.preprocessing import StandardScaler

In [12]:
dataTest = pd.read_csv("subject/Test_knight.csv")
dataTest.head()

Unnamed: 0,Sensitivity,Hability,Strength,Power,Agility,Dexterity,Awareness,Prescience,Reactivity,Midi-chlorien,...,Recovery,Evade,Stims,Sprint,Combo,Delay,Attunement,Empowered,Burst,Grasping
0,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
1,18.25,19.98,119.6,1040.0,0.09463,0.109,0.1127,0.074,0.1794,0.05742,...,22.88,27.66,153.2,1606.0,0.1442,0.2576,0.3784,0.1932,0.3063,0.08368
2,14.68,20.13,94.74,684.5,0.09867,0.072,0.07395,0.05259,0.1586,0.05922,...,19.07,30.88,123.4,1138.0,0.1464,0.1871,0.2914,0.1609,0.3029,0.08216
3,13.54,14.36,87.46,566.3,0.09779,0.08129,0.06664,0.04781,0.1885,0.05766,...,15.11,19.26,99.7,711.2,0.144,0.1773,0.239,0.1288,0.2977,0.07259
4,15.34,14.26,102.5,704.4,0.1073,0.2135,0.2077,0.09756,0.2521,0.07032,...,18.07,19.08,125.1,980.9,0.139,0.5954,0.6305,0.2393,0.4667,0.09946


In [14]:
# Standardize the features. Rescaling alone does not change VIF, but centering
# matters here because variance_inflation_factor does not add an intercept.
scaler = StandardScaler()
data_scaled = scaler.fit_transform(dataTest)

# Build a DataFrame from the standardized features
data_scaled_df = pd.DataFrame(data_scaled, columns=dataTest.columns)
print(data_scaled_df)

     Sensitivity  Hability  Strength     Power   Agility  Dexterity  \
0      -0.740658  0.148792 -0.572707 -0.731155  3.069239   3.139630   
1       1.081470  0.061769  1.048193  0.989469 -0.106314   0.056468   
2       0.129054  0.094402  0.089231  0.054033  0.161688  -0.595774   
3      -0.175078 -1.160906 -0.191591 -0.256990  0.103311  -0.432008   
4       0.305131 -1.182662  0.388569  0.106396  0.734176   1.898609   
..           ...       ...       ...       ...       ...        ...   
166    -1.273424  1.780475 -1.279392 -1.033231 -0.995230  -0.988705   
167     0.105044  0.649175  0.152879 -0.018065 -0.763050   0.479543   
168     1.793781  1.173489  1.950837  1.797286  0.906652   2.076653   
169     0.641278  1.823986  0.612301  0.510831 -0.774991  -0.061641   
170    -1.717084  1.053832 -1.716827 -1.270839 -2.892468  -1.096060   

     Awareness  Prescience  Reactivity  Midi-chlorien  ...  Recovery  \
0     1.891994    1.366958    2.603467       4.306019  ... -0.273193   
1  

In [15]:
# Function to compute VIF and Tolerance
def calculate_vif(data):
    vif_data = pd.DataFrame()
    vif_data["Feature"] = data.columns
    vif_data["VIF"] = [variance_inflation_factor(data.values, i) for i in range(data.shape[1])]
    vif_data["Tolerance"] = 1 / vif_data["VIF"]
    return vif_data
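One caveat about `calculate_vif`: to my understanding, `variance_inflation_factor` regresses each column on the others exactly as given and does not add an intercept, so uncentered data can produce inflated values even when the features are unrelated. The NumPy sketch below mimics that setup under this assumption; `vif_no_intercept` and the synthetic data are hypothetical and only illustrate the effect.

```python
import numpy as np

def vif_no_intercept(X, j):
    """VIF from the uncentered R^2 of a regression with no intercept term."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ coef
    r2_uncentered = 1.0 - (resid @ resid) / (y @ y)
    return 1.0 / (1.0 - r2_uncentered)

# Two independent features that share a large mean (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=1.0, size=(500, 2))
X_centered = X - X.mean(axis=0)

vif_raw = vif_no_intercept(X, 0)                # inflated by the shared offset
vif_centered = vif_no_intercept(X_centered, 0)  # close to 1
print(vif_raw, vif_centered)
```

Passing standardized (and therefore centered) data, as this notebook does, avoids the problem. Adding a constant column to the design matrix before computing VIF is the other common remedy.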

In [16]:
# Compute the initial VIF, before removing any features
vif_data = calculate_vif(data_scaled_df)

# Display VIF and Tolerance
print("Initial VIF (no features removed):")
print(vif_data)

Initial VIF (no features removed):
          Feature          VIF  Tolerance
0     Sensitivity  4282.499725   0.000234
1        Hability    18.559728   0.053880
2        Strength  4003.652047   0.000250
3           Power   477.730519   0.002093
4         Agility    13.713180   0.072923
5       Dexterity    57.554993   0.017375
6       Awareness    96.794082   0.010331
7      Prescience    95.728833   0.010446
8      Reactivity     5.860520   0.170633
9   Midi-chlorien    25.555332   0.039131
10          Slash   160.894576   0.006215
11           Push     8.206184   0.121859
12           Pull   133.100410   0.007513
13     Lightsaber    94.559187   0.010575
14       Survival     9.293929   0.107597
15        Repulse    41.250309   0.024242
16     Friendship    33.496386   0.029854
17       Blocking    12.589272   0.079433
18     Deflection     8.977914   0.111384
19           Mass    15.562145   0.064258
20       Recovery  1367.001571   0.000732
21          Evade    31.477268  