### Multicollinearity
- Last update: 2026-02-10
- Professor: Marvin Padilla
- Teaching Assistant: Facundo Cabral

Symptoms that suggest multicollinearity:
- A very high R-squared together with failure to reject the individual significance hypotheses (insignificant t statistics)
- Small changes in the data produce large changes in the OLS estimates
- Coefficients have signs opposite to those expected, or implausible magnitudes
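
The first two symptoms can be reproduced on simulated data. The sketch below is not part of the class dataset: it uses only NumPy, and the names `x1_sim`, `x2_sim`, the helper `ols_sim` and the seed are assumptions chosen for illustration. Two nearly identical regressors give a high R-squared, large standard errors on the individual coefficients, and estimates that move when a few observations are dropped.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 50

# x2_sim is x1_sim plus a tiny amount of noise, so the regressors are nearly collinear
x1_sim = rng.normal(size=n_sim)
x2_sim = x1_sim + rng.normal(scale=0.01, size=n_sim)
y_sim = 1.0 + 2.0 * x1_sim + 3.0 * x2_sim + rng.normal(scale=0.5, size=n_sim)

X_sim = np.column_stack([np.ones(n_sim), x1_sim, x2_sim])

def ols_sim(X, y):
    """Return OLS coefficients, standard errors and R-squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, r2

beta_full, se_full, r2_full = ols_sim(X_sim, y_sim)
print("R-squared:", round(r2_full, 4))
print("t statistics:", np.round(beta_full / se_full, 2))

# Symptom 2: re-estimate after dropping the first five observations
beta_sub, _, _ = ols_sim(X_sim[5:], y_sim[5:])
print("full sample  :", np.round(beta_full, 2))
print("without 5 obs:", np.round(beta_sub, 2))
```

The sum of the two slope estimates stays close to 5 because the data identify the combined effect well; only the split between the two regressors is poorly determined.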

In [2]:
# Libraries
import pandas as pd
import numpy as np
import wooldridge as woo
import statsmodels.api as sm
import matplotlib.pyplot as plt

In [3]:
# Specific libraries
from statsmodels.stats.outliers_influence import variance_inflation_factor
from scipy.stats import chi2

In [4]:
# Read the dataset:
url = "https://github.com/facundocabralvaldivia/econometrics_classes/raw/main/data/gujarati/t10.7_Multicollinearity.csv"
df = pd.read_csv(url, sep=";")

In [5]:
# Create the variables and the regressor matrix
df['logYd'] = np.log(df['Yd'])
df['logW'] = np.log(df['W'])
X = df[['logYd','logW','I']]
Xc = sm.add_constant(X)
Xc.head()

Unnamed: 0,const,logYd,logW,I
0,1.0,6.94235,8.550011,-10.3509
1,1.0,6.993933,8.571825,-4.7198
2,1.0,6.999057,8.631834,1.0441
3,1.0,7.083975,8.658609,0.4073
4,1.0,7.112327,8.713756,-5.2831


In [6]:
# Create the dependent variable and fit the model
df['logC'] = np.log(df['C'])
y = df['logC']
modelo = sm.OLS(y, Xc).fit()
print(modelo.summary())

                            OLS Regression Results                            
Dep. Variable:                   logC   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 3.783e+04
Date:                Wed, 11 Feb 2026   Prob (F-statistic):           7.12e-84
Time:                        14:16:23   Log-Likelihood:                 164.59
No. Observations:                  54   AIC:                            -321.2
Df Residuals:                      50   BIC:                            -313.2
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.4677      0.043    -10.933      0.0

In [7]:
print(f"R-squared: {modelo.rsquared:.6f}")
print(f"R-squared Adj.: {modelo.rsquared_adj:.6f}")

R-squared: 0.999560
R-squared Adj.: 0.999533


### Test 1: Variance Inflation Factor (VIF)
As a rule of thumb, a VIF above 5 is a sign of multicollinearity.
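
For reference, the VIF of regressor $X_j$ comes from the auxiliary regression of $X_j$ on the remaining regressors (with a constant), whose coefficient of determination is written $R_j^2$ here. The notation $SST_j$ for the total sum of squares of $X_j$ is an assumption of this note, not something defined elsewhere in the notebook.

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)} = \frac{\sigma^2}{SST_j}\,\mathrm{VIF}_j
$$

The threshold of 5 therefore corresponds to $R_j^2 = 0.8$: the other regressors explain 80% of the variation in $X_j$, and the variance of $\hat\beta_j$ is five times what it would be if $X_j$ were uncorrelated with them.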

Method 1: VIF from auxiliary regressions

In [8]:
# Auxiliary regression and VIF for logYd
XlogYd = Xc.drop(columns=['logYd']).copy()
modeloLogYd = sm.OLS(df['logYd'], XlogYd).fit()
RsquaredLogYd = modeloLogYd.rsquared
VIFLogYd = 1/(1-RsquaredLogYd)
VIFLogYd

np.float64(35.02078950304796)

In [9]:
# Auxiliary regression and VIF for logW
XlogW = Xc.drop(columns=['logW']).copy()
modeloLogW = sm.OLS(df['logW'], XlogW).fit()
RsquaredLogW = modeloLogW.rsquared
VIFLogW = 1/(1-RsquaredLogW)
VIFLogW

np.float64(35.56244997528693)

In [10]:
# Auxiliary regression and VIF for I
XI = Xc.drop(columns=['I']).copy()
modeloI = sm.OLS(df['I'], XI).fit()
RsquaredI = modeloI.rsquared
VIFI = 1/(1-RsquaredI)
VIFI

np.float64(1.6276523936140925)

Method 2: Using the VIF function

In [11]:
from statsmodels.stats.outliers_influence import variance_inflation_factor as vif

In [12]:
# Table of VIFs (the const row is usually ignored; only the regressor rows are read)
VIF = pd.DataFrame()
VIF['Variable'] = Xc.columns
VIF['VIF'] = [vif(Xc, i) for i in range(Xc.shape[1])]
VIF

Unnamed: 0,Variable,VIF
0,const,693.876454
1,logYd,35.02079
2,logW,35.56245
3,I,1.627652
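
A third route, shown here only as a cross-check, reads the VIFs off the diagonal of the inverse correlation matrix of the regressors. The sketch runs on simulated data with NumPy alone; `vif_from_correlation`, `vif_from_regression` and `X_vif_demo` are hypothetical names introduced for this example.

```python
import numpy as np

def vif_from_correlation(X):
    """VIF of each column of X (no constant column), taken from the
    diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def vif_from_regression(X, j):
    """VIF of column j from the auxiliary regression on the other columns."""
    n_obs = X.shape[0]
    others = np.column_stack([np.ones(n_obs), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ beta
    r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

# Simulated regressors: the first two share a common factor, the third is independent
rng = np.random.default_rng(1)
common = rng.normal(size=200)
X_vif_demo = np.column_stack([
    common + rng.normal(scale=0.2, size=200),
    common + rng.normal(scale=0.2, size=200),
    rng.normal(size=200),
])

vifs_corr = vif_from_correlation(X_vif_demo)
vifs_reg = np.array([vif_from_regression(X_vif_demo, j) for j in range(3)])
print(np.round(vifs_corr, 3))
print(np.round(vifs_reg, 3))
```

Both functions should agree up to floating-point error, which makes the matrix version a convenient check when there are many regressors.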


### Test 2: Condition Index
- CI < 10: no problem
- 10 < CI < 30: moderate multicollinearity
- CI > 30: severe multicollinearity

In [13]:
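# Condition index: square root of the ratio between the largest and
# smallest eigenvalues of X'X
# (X is used here without the constant column and without rescaling)
XtX = X.T @ X
autovalores = np.linalg.eigvals(XtX)
lambdaMax = np.max(autovalores)
lambdaMin = np.min(autovalores)
indiceCondicion = np.sqrt(lambdaMax / lambdaMin)
indiceCondicion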

np.float64(143.95248002323854)
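
The eigenvalues of $X'X$ depend on the units of each column, so a common variant of this diagnostic first scales every column (including the constant) to unit length. The sketch below shows that variant on simulated data. It is an alternative under stated assumptions, not the computation used in the cell above, and `condition_index` and `X_ci_demo` are hypothetical names.

```python
import numpy as np

def condition_index(X, add_constant=True):
    """Largest over smallest singular value of the column-scaled design matrix.

    This equals sqrt(lambda_max / lambda_min) of X'X after each column
    has been divided by its Euclidean norm.
    """
    X = np.asarray(X, dtype=float)
    if add_constant:
        X = np.column_stack([np.ones(X.shape[0]), X])
    X_scaled = X / np.linalg.norm(X, axis=0)
    singular_values = np.linalg.svd(X_scaled, compute_uv=False)
    return singular_values.max() / singular_values.min()

# Simulated regressors that share a common factor
rng = np.random.default_rng(2)
common = rng.normal(size=200)
X_ci_demo = np.column_stack([
    common + rng.normal(scale=0.2, size=200),
    common + rng.normal(scale=0.2, size=200),
])

ci = condition_index(X_ci_demo)
# Changing the units of the columns leaves the scaled index unchanged
ci_rescaled = condition_index(X_ci_demo * np.array([1000.0, 0.01]))
print(ci, ci_rescaled)
```

With this scaling the index no longer changes when a regressor is expressed in different units, which makes the 10 and 30 thresholds easier to compare across datasets.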

### Test 3: Farrar-Glauber
##### Chi-square test
H0: The X's are orthogonal (no multicollinearity)  
Ha: The X's are not orthogonal (multicollinearity is present)
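
The statistic is built from the determinant of the correlation matrix of the regressors, written $\lvert R \rvert$ here, with $n$ observations and $k$ regressors (constant excluded):

$$
\chi^2_{calc} = -\left[\,n - 1 - \frac{2k + 5}{6}\,\right]\ln\lvert R\rvert
\;\sim\; \chi^2_{k(k-1)/2}
$$

If the regressors are orthogonal, $\lvert R\rvert = 1$ and the statistic is zero. As the regressors approach exact collinearity, $\lvert R\rvert \to 0$ and the statistic grows without bound. With $n = 54$ and $k = 3$, the factor in brackets is $54 - 1 - 11/6 \approx 51.17$ and the test has $3$ degrees of freedom.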

In [14]:
from scipy.stats import chi2

In [15]:
corrM = X.corr()
detR  = np.linalg.det(corrM)
n = X.shape[0]
k = X.shape[1]

# Significance level
alpha = 0.05
# Degrees of freedom
DoF = k*(k-1)/2
# Critical chi-square value
chi2Crit = chi2.ppf(1 - alpha, DoF)
# Computed chi-square statistic
chi2Calc = -(n-1-(2*k+5)/6)*np.log(detR)
# p-value
pValue = 1 - chi2.cdf(chi2Calc, DoF)

FGChiCuadradoTest = pd.DataFrame({
    'Chi-square statistic' : [chi2Calc],
    'Chi-square critical value' : [chi2Crit],
    'p-value' : [pValue]
}, index=['Results']).T
FGChiCuadradoTest

Unnamed: 0,Results
Chi-square statistic,206.866544
Chi-square critical value,7.814728
p-value,0.0


Since $\chi^2_{calc} > \chi^2_{crit}$, we reject H0 and conclude that there is evidence of multicollinearity in the model.

##### F test
H0: $X_i$ is not collinear with the other regressors  
Ha: $X_i$ is collinear with the other regressors
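
The statistic for each regressor uses the $R_j^2$ of its auxiliary regression, the same quantity that enters the VIF:

$$
F_j = \frac{R_j^2 / (k - 1)}{(1 - R_j^2) / (n - k)}
= (\mathrm{VIF}_j - 1)\,\frac{n - k}{k - 1}
\;\sim\; F_{\,k-1,\; n-k}
$$

The numerator has $k - 1$ degrees of freedom and the denominator has $n - k$. As a check against the VIF reported earlier for logYd, $(35.0208 - 1) \times 51 / 2 \approx 867.53$.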

Method 1: Compute the tests one at a time

In [16]:
from scipy.stats import f

In [22]:
FLogYd = (RsquaredLogYd/(1-RsquaredLogYd)) * (n-k)/(k-1)
# f.cdf takes (x, dfn, dfd): numerator df = k-1, denominator df = n-k
pValueLogYdF = 1 - f.cdf(FLogYd, k-1, n-k)
FGFTestLogYd = pd.DataFrame({
    'F statistic' : [FLogYd],
    'p-value' : [pValueLogYdF]
}, index=['Results']).T
FGFTestLogYd

Unnamed: 0,Results
F statistic,867.530132
p-value,0.0


In [21]:
FLogW = (RsquaredLogW/(1-RsquaredLogW)) * (n-k)/(k-1)
# Numerator df = k-1, denominator df = n-k
pValueLogW = 1 - f.cdf(FLogW, k-1, n-k)
FGFTestLogW = pd.DataFrame({
    'F statistic' : [FLogW],
    'p-value' : [pValueLogW]
}, index=['Results']).T
FGFTestLogW

Unnamed: 0,Results
F statistic,881.342474
p-value,0.0


In [20]:
FI = (RsquaredI/(1-RsquaredI)) * (n-k)/(k-1)
# Numerator df = k-1, denominator df = n-k
pValueI = 1 - f.cdf(FI, k-1, n-k)
FGFTestI = pd.DataFrame({
    'F statistic' : [FI],
    'p-value' : [pValueI]
}, index=['Results']).T
FGFTestI

Unnamed: 0,Results
F statistic,16.005136
p-value,0.000004


Method 2: Loop over the auxiliary R-squared values

In [24]:
RsquaredList = [RsquaredLogYd, RsquaredLogW, RsquaredI]
for Rsquared in RsquaredList:
    FTest = (Rsquared/(1-Rsquared)) * (n-k)/(k-1)
    # Numerator df = k-1, denominator df = n-k
    pValueF = 1 - f.cdf(FTest, k-1, n-k)
    FGFTest = pd.DataFrame({
        'F statistic' : [FTest],
        'p-value' : [pValueF]
    }, index=['Results']).T
    display(FGFTest)

Unnamed: 0,Results
F statistic,867.530132
p-value,0.0


Unnamed: 0,Results
F statistic,881.342474
p-value,0.0


Unnamed: 0,Results
F statistic,16.005136
p-value,0.000004


##### Student's t test
H0: $X_i$ and $X_j$ are not collinear (they do not generate multicollinearity)  
Ha: $X_i$ and $X_j$ are collinear (they generate multicollinearity)  

In [27]:
from scipy.stats import t

In [29]:
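results = []
# Loop over every pair of regressors (i < j)
for i in range(len(X.columns)):
    for j in range(i + 1, len(X.columns)):
        xi = X.columns[i]
        xj = X.columns[j]

        # Simple (pairwise) correlation between the two regressors
        r_ij = X[xi].corr(X[xj])

        # Note: the statistic is scaled with n - k, while the p-value
        # uses n - 2 degrees of freedom; the two should agree
        t_stat = (r_ij * np.sqrt(n - k)) / np.sqrt(1 - r_ij**2)
        p_value = 2 * (1 - t.cdf(abs(t_stat), n - 2))

        results.append([xi, xj, r_ij, t_stat, p_value])

FG_T_Test = pd.DataFrame(
    results,
    columns=['X_i', 'X_j', 'r_ij', 't_stat', 'p_value']
)

FG_T_Test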

Unnamed: 0,X_i,X_j,r_ij,t_stat,p_value
0,logYd,logW,0.985618,41.652206,0.0
1,logYd,I,0.613239,5.544265,9.995196e-07
2,logW,I,0.620939,5.657129,6.66152e-07
