# *Leave-one-out (LOO)* standard deviation estimator

Consider a data set $(x_{1i}, \ldots, x_{6i})$ and $y_i$, for $i = 1, \ldots, n$. From these data we can estimate the linear regression model:

$$ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + \beta_6 x_{6i} + \varepsilon_i $$

which can be written in matrix form as $y = X\beta + \varepsilon$, where $\beta$ is the vector of model parameters, $X$ is the $n \times 7$ design matrix whose first column is all ones, and $\varepsilon$ is the error term.
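As a minimal sketch of what fitting this model means in matrix form: the least-squares estimate $\hat\beta$ minimizes $\lVert y - X\beta \rVert^2$ and, when $X$ has full column rank, equals $(X^\top X)^{-1} X^\top y$. The snippet below illustrates this on noise-free toy data. The names `X_demo`, `beta_demo`, `y_demo` and the fixed seed are assumptions made for the illustration and are not part of the notebook.

```python
import numpy as np

# Hypothetical toy data (not the notebook's data): intercept column plus 6 regressors.
rng = np.random.default_rng(0)
n_demo = 50
X_demo = np.column_stack((np.ones(n_demo), rng.random((n_demo, 6))))
beta_demo = np.array([1.0, 0.3, 0.5, 0.7, 2.1, 1.5, 1.8])

# Noise-free response, so least squares should recover beta_demo up to rounding.
y_demo = X_demo @ beta_demo

# Solve min ||y - X b||^2; lstsq avoids forming (X^T X)^{-1} explicitly.
beta_hat, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

# The normal equations give the same answer when X has full column rank.
beta_normal = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)
```

In practice `np.linalg.lstsq` is usually preferred over solving the normal equations directly, because it is better conditioned numerically.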


In [47]:
import numpy as np
import statsmodels.formula.api as sm
import pandas as pd

In [60]:
n = 50

In [61]:
# Three integer-valued regressors in [0, 20); these become columns x1, x2, x3
x3 = np.random.randint(0, 20, (n, 3))

In [62]:
# Three continuous regressors, uniform on [0, 10); these become columns x4, x5, x6
x4 = 10*np.random.random((n, 3))

In [63]:
# Design matrix: a column of ones (intercept) followed by the six regressors
X = np.column_stack((np.ones(n), x3, x4))
X.shape

(50, 7)

In [64]:
beta = np.array([1., 0.3, 0.5, 0.7, 2.1, 1.5, 1.8]).reshape((7, 1))
beta

array([[1. ],
       [0.3],
       [0.5],
       [0.7],
       [2.1],
       [1.5],
       [1.8]])

In [65]:
# Additive noise is uniform on [-5, 0), i.e. its mean is -2.5 (not zero-mean)
y = np.matmul(X, beta) + 5*(np.random.random((n, 1)) - 1)
y

array([[34.82125672],
       [44.80589205],
       [31.08735857],
       [37.99315444],
       [21.41242039],
       [47.3197688 ],
       [26.7119935 ],
       [32.95108779],
       [47.69022077],
       [35.34568792],
       [45.11883709],
       [40.07752994],
       [22.12878336],
       [37.33331689],
       [40.72323415],
       [43.22083647],
       [29.71760437],
       [39.79408285],
       [45.29352448],
       [14.33655517],
       [33.4533468 ],
       [44.46526101],
       [18.44277459],
       [30.5968086 ],
       [41.19098272],
       [34.20199091],
       [30.63295535],
       [29.99844424],
       [45.96918599],
       [28.75448255],
       [38.9279507 ],
       [32.10801135],
       [13.39371941],
       [39.63161144],
       [57.12383568],
       [45.04025704],
       [30.47677385],
       [17.44466137],
       [44.69783568],
       [42.23580397],
       [21.57174446],
       [39.13058875],
       [36.78494408],
       [42.99411474],
       [48.99101108],
       [23

In [66]:
data = np.column_stack((X, y))
data

array([[1.00000000e+00, 1.60000000e+01, 1.00000000e+00, 3.00000000e+00,
        5.71817495e+00, 5.74296768e+00, 3.62430396e+00, 3.48212567e+01],
       [1.00000000e+00, 5.00000000e+00, 1.90000000e+01, 0.00000000e+00,
        7.21603202e+00, 3.05135780e+00, 9.01730965e+00, 4.48058921e+01],
       [1.00000000e+00, 1.10000000e+01, 3.00000000e+00, 5.00000000e+00,
        5.35482202e+00, 5.25215933e-01, 8.02328755e+00, 3.10873586e+01],
       [1.00000000e+00, 1.90000000e+01, 1.80000000e+01, 0.00000000e+00,
        1.50374100e+00, 3.07021641e+00, 9.53886385e+00, 3.79931544e+01],
       [1.00000000e+00, 1.90000000e+01, 1.30000000e+01, 0.00000000e+00,
        2.81754392e+00, 3.08919454e+00, 4.85237186e-01, 2.14124204e+01],
       [1.00000000e+00, 3.00000000e+00, 0.00000000e+00, 1.10000000e+01,
        6.93762736e+00, 8.65858590e+00, 8.20375946e+00, 4.73197688e+01],
       [1.00000000e+00, 1.30000000e+01, 1.90000000e+01, 3.00000000e+00,
        1.52413420e+00, 6.58181466e-01, 4.70675807e+00, 2.

In [67]:
np.savetxt('data.txt', data)
np.savetxt('data.csv', data, delimiter=',')
np.save('data', data)
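The three calls above write the same array in plain-text, CSV, and binary `.npy` form (`np.save` appends the `.npy` extension when it is missing). As a hedged sketch of how each file could be read back, the example below round-trips a small stand-in array through a temporary directory. The array `demo` and the temporary paths are assumptions for the illustration, not the notebook's files.

```python
import os
import tempfile

import numpy as np

# Hypothetical small array standing in for the notebook's `data`.
demo = np.arange(6, dtype=float).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, 'data.txt')
    csv_path = os.path.join(tmp, 'data.csv')
    npy_path = os.path.join(tmp, 'data')  # np.save adds '.npy' to this name

    # Write the array in the same three formats used in the notebook.
    np.savetxt(txt_path, demo)
    np.savetxt(csv_path, demo, delimiter=',')
    np.save(npy_path, demo)

    # Read each file back with the matching loader.
    from_txt = np.loadtxt(txt_path)
    from_csv = np.loadtxt(csv_path, delimiter=',')
    from_npy = np.load(npy_path + '.npy')
```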

In [68]:
df = pd.DataFrame(data, columns=['1s','x1','x2','x3','x4','x5','x6','y'])
df = df.iloc[:, 1:]
df

     x1    x2    x3        x4        x5        x6          y
0  16.0   1.0   3.0  5.718175  5.742968  3.624304  34.821257
1   5.0  19.0   0.0  7.216032  3.051358  9.017310  44.805892
2  11.0   3.0   5.0  5.354822  0.525216  8.023288  31.087359
3  19.0  18.0   0.0  1.503741  3.070216  9.538864  37.993154
4  19.0  13.0   0.0  2.817544  3.089195  0.485237  21.412420
5   3.0   0.0  11.0  6.937627  8.658586  8.203759  47.319769
6  13.0  19.0   3.0  1.524134  0.658181  4.706758  26.711993
7  13.0   0.0   0.0  9.663613  3.302648  2.049603  32.951088
8   2.0   8.0  15.0  8.862070  0.699296  9.240629  47.690221
9   4.0  12.0  11.0  9.520994  1.558143  0.291961  35.345688


In [69]:
result = sm.ols(formula='y ~ x1 + x2 + x3 + x4 + x5 + x6', data=df).fit()
print(result.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.982
Model:                            OLS   Adj. R-squared:                  0.979
Method:                 Least Squares   F-statistic:                     390.5
Date:                Thu, 16 Jan 2020   Prob (F-statistic):           8.07e-36
Time:                        18:02:13   Log-Likelihood:                -87.836
No. Observations:                  50   AIC:                             189.7
Df Residuals:                      43   BIC:                             203.1
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.6917      0.943     -0.733      0.4

In [70]:
result.params

Intercept   -0.691715
x1           0.285130
x2           0.473372
x3           0.671312
x4           2.074063
x5           1.527403
x6           1.733444
dtype: float64

In [71]:
beta

array([[1. ],
       [0.3],
       [0.5],
       [0.7],
       [2.1],
       [1.5],
       [1.8]])
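The estimated slopes are close to the true values in `beta`, but the estimated intercept (about -0.69) is noticeably below 1. One likely contributor is the noise term `5*(np.random.random((n, 1)) - 1)`, which is uniform on [-5, 0) and has mean -2.5, so the fitted intercept targets roughly 1 - 2.5 = -1.5 rather than 1. The sketch below illustrates this on a much larger simulated sample. The seed, the sample size `n_big`, and the use of `np.linalg.lstsq` in place of statsmodels are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
n_big = 20000                    # assumed size, much larger than the notebook's n = 50

# Same design as the notebook: intercept, 3 integer and 3 continuous regressors.
x_int = rng.integers(0, 20, (n_big, 3))
x_cont = 10 * rng.random((n_big, 3))
X_big = np.column_stack((np.ones(n_big), x_int, x_cont))
beta_true = np.array([1.0, 0.3, 0.5, 0.7, 2.1, 1.5, 1.8])

# Same noise recipe as the notebook: uniform on [-5, 0), mean -2.5.
noise = 5 * (rng.random(n_big) - 1)
y_big = X_big @ beta_true + noise

# Ordinary least squares fit.
beta_fit, *_ = np.linalg.lstsq(X_big, y_big, rcond=None)

# Difference between fitted and true intercept; expected to be near -2.5.
intercept_shift = beta_fit[0] - beta_true[0]
```

If zero-mean noise is intended, replacing `- 1` with `- 0.5` in the noise term would center it on zero.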