## Linear Models

In [1]:
import numpy as np
import pandas as pd

from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression, Lars, ElasticNet, Lasso, Ridge, BayesianRidge

In [2]:
pd.set_option('display.float_format', lambda x: "{:,.2f}".format(x))

In [4]:
dc_scores = {} # mean cross-validation score per model
diabetes = load_diabetes() # dataset
df = pd.DataFrame(data=diabetes["data"], columns=diabetes["feature_names"])
df["target"] = diabetes["target"]

In [5]:
df.head()

,age,sex,bmi,bp,s1,s2,s3,s4,s5,s6,target
0,0.04,0.05,0.06,0.02,-0.04,-0.03,-0.04,-0.0,0.02,-0.02,151.0
1,-0.0,-0.04,-0.05,-0.03,-0.01,-0.02,0.07,-0.04,-0.07,-0.09,75.0
2,0.09,0.05,0.04,-0.01,-0.05,-0.03,-0.03,-0.0,0.0,-0.03,141.0
3,-0.09,-0.04,-0.01,-0.04,0.01,0.02,-0.04,0.03,0.02,-0.01,206.0
4,0.01,-0.04,-0.04,0.02,0.0,0.02,0.01,-0.0,-0.03,-0.05,135.0


In [6]:
tgt = "target" # y
ls_pred = [x for x in df.columns if x not in [tgt]] # X

X = df[ls_pred]
y = df[tgt]

print(df.shape)

(442, 11)


In [9]:
df.sample(5)

,age,sex,bmi,bp,s1,s2,s3,s4,s5,s6,target
219,-0.09,-0.04,-0.04,-0.02,-0.07,-0.07,0.01,-0.04,0.0,-0.03,185.0
351,-0.09,0.05,-0.04,-0.03,-0.08,-0.07,-0.01,-0.04,-0.06,-0.04,71.0
79,-0.1,-0.04,-0.04,-0.03,0.0,0.02,0.01,-0.0,-0.07,-0.03,113.0
145,-0.04,-0.04,0.13,0.06,-0.03,-0.03,0.01,-0.04,-0.02,-0.05,259.0
271,0.04,0.05,0.01,0.04,-0.04,-0.02,-0.04,-0.0,-0.02,0.01,127.0


* Target: a measure of disease progression one year after baseline.

- Columns:
  - age: age in years,
  - sex: M/F,
  - bmi: body mass index,
  - bp: average blood pressure,
  - s1: tc, total serum cholesterol,
  - s2: ldl, low-density lipoproteins,
  - s3: hdl, high-density lipoproteins,
  - s4: tch, total cholesterol / HDL,
  - s5: ltg, possibly log of serum triglycerides level,
  - s6: glu, blood sugar level

## Linear regression

In [11]:
linreg = LinearRegression()
linreg.fit(X, y)

In [12]:
ls_res = cross_val_score( estimator=linreg, X=X, y=y, cv=4, n_jobs=-1, scoring="r2" )
print(ls_res)

[0.37459248 0.49678312 0.50950026 0.55755577]


In [13]:
print( "MEAN: ", np.mean(ls_res))
print( "STD:  ", np.std(ls_res))
print( "Intercept: ", linreg.intercept_)
print( "Coef:      ", linreg.coef_ )

MEAN:  0.4846079075127147
STD:   0.06744006563276969
Intercept:  152.13348416289597
Coef:       [ -10.0098663  -239.81564367  519.84592005  324.3846455  -792.17563855
  476.73902101  101.04326794  177.06323767  751.27369956   67.62669218]


In [14]:
dc_scores.update({str(linreg).split("(")[0]: np.mean(ls_res)})
dc_scores

{'LinearRegression': np.float64(0.4846079075127147)}

## LARS regression

__Least Angle Regression__

The name "Least Angle" comes from a geometric interpretation of the procedure: at step $k$, the direction $u_k$ along which the fit moves makes the smallest (and equal) angle with each of the predictors in the active set $A_k$.
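The equal-angle property can be checked numerically. The sketch below is not scikit-learn's implementation; it only builds the equiangular direction for a small, randomly generated active set. It assumes unit-norm columns and that every active predictor is positively correlated with the residual, so no sign flips are needed. The names `X_active`, `gram`, and `u` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical active set: 3 predictors, scaled to unit norm.
X_active = rng.normal(size=(100, 3))
X_active /= np.linalg.norm(X_active, axis=0)

# Equiangular direction: u = X_A w, with w proportional to (X_A^T X_A)^-1 1.
gram = X_active.T @ X_active
ones = np.ones(X_active.shape[1])
gram_inv_ones = np.linalg.solve(gram, ones)
scale = 1.0 / np.sqrt(ones @ gram_inv_ones)  # makes u a unit vector
u = X_active @ (scale * gram_inv_ones)

# Columns and u all have unit norm, so these dot products are cosines.
cosines = X_active.T @ u
print("cosines:", cosines)
print("norm of u:", np.linalg.norm(u))
```

All three cosines come out equal, which is the "equal angle" in the name: the direction does not favor any one of the active predictors.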

In [16]:
larsreg = Lars()
larsreg.fit(X, y)

In [17]:
ls_res = cross_val_score(estimator=larsreg, X=X, y=y, cv=4, n_jobs=-1, scoring="r2")
print(ls_res)

[ 0.37459248  0.50439581  0.50950026 -1.03421759]


In [18]:
print( "MEAN: ", np.mean(ls_res))
print( "STD:  ", np.std(ls_res))
print( "Intercept: ", larsreg.intercept_)
print( "Coef:      ", larsreg.coef_ )

MEAN:  0.0885677401975642
STD:   0.6504910108465571
Intercept:  152.13348416289597
Coef:       [ -10.0098663  -239.81564367  519.84592005  324.3846455  -792.17563855
  476.73902101  101.04326794  177.06323767  751.27369956   67.62669218]


In [19]:
dc_scores.update({str(larsreg).split("(")[0]: np.mean(ls_res)})
dc_scores

{'LinearRegression': np.float64(0.4846079075127147),
 'Lars': np.float64(0.0885677401975642)}

## Ridge regression

The main idea of ridge regression is to fit a line to the data while avoiding overfitting, which it does by introducing a small amount of bias into the estimate.

Unlike ordinary least squares, ridge regression minimizes the squared error plus $\lambda \cdot \text{slope}^2$, where $\lambda$ sets the severity of the penalty and can take any value from $0$ to $\infty$. The larger $\lambda$ is, the closer the slope of the fitted line gets to $0$, approaching it asymptotically. One way to find the best $\lambda$ is cross-validation. In scikit-learn, $\lambda$ is the `alpha` parameter of `Ridge`.
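The shrinking effect of $\lambda$ can be seen with the closed-form ridge solution. This is a minimal sketch on synthetic data, with no intercept, and it is not the solver scikit-learn uses. The helper `ridge_coefficients` and the data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=50)

def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution without intercept: (X^T X + lam*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = [np.linalg.norm(ridge_coefficients(X, y, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:>7}: ||coef|| = {norm:.4f}")
```

With `lam=0` the result is the ordinary least-squares fit. As `lam` grows, the coefficient norm keeps shrinking toward zero without ever reaching it.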


In [20]:
#Ridge?

In [21]:
ridgereg = Ridge(alpha=0)
ridgereg.fit(X, y)

In [22]:
ls_res = cross_val_score(estimator=ridgereg, X=X, y=y, cv=4, n_jobs=-1, scoring="r2")
print(ls_res)

[0.37459248 0.49678312 0.50950026 0.55755577]


In [23]:
print( "MEAN: ", np.mean(ls_res))
print( "STD:  ", np.std(ls_res))
print( "Intercept: ", ridgereg.intercept_)
print( "Coef:      ", ridgereg.coef_ )

MEAN:  0.48460790751271476
STD:   0.06744006563276939
Intercept:  152.13348416289597
Coef:       [ -10.0098663  -239.81564367  519.84592005  324.3846455  -792.17563855
  476.73902101  101.04326794  177.06323767  751.27369956   67.62669218]


In [24]:
for i in range(0, 1000, 10):
    ridgereg = Ridge(alpha=i)
    ridgereg.fit(X, y)
    ls_res = cross_val_score(estimator=ridgereg, X=X, y=y, cv=4, n_jobs=-1, scoring="r2")
    print(i,"{:,.2%}".format(np.mean(ls_res)),"{:,.2f}".format(np.std(ls_res)))

0 48.46% 0.07
10 12.25% 0.04
20 5.60% 0.04
30 2.84% 0.04
40 1.33% 0.04
50 0.38% 0.04
60 -0.27% 0.05
70 -0.75% 0.05
80 -1.12% 0.05
90 -1.40% 0.05
100 -1.63% 0.05
110 -1.82% 0.05
120 -1.98% 0.05
130 -2.12% 0.05
140 -2.24% 0.05
150 -2.34% 0.05
160 -2.43% 0.05
170 -2.51% 0.05
180 -2.58% 0.05
190 -2.64% 0.05
200 -2.70% 0.05
210 -2.75% 0.05
220 -2.80% 0.05
230 -2.84% 0.05
240 -2.88% 0.05
250 -2.92% 0.05
260 -2.95% 0.05
270 -2.98% 0.05
280 -3.01% 0.05
290 -3.04% 0.05
300 -3.07% 0.05
310 -3.09% 0.05
320 -3.11% 0.05
330 -3.13% 0.05
340 -3.15% 0.05
350 -3.17% 0.05
360 -3.19% 0.05
370 -3.21% 0.05
380 -3.22% 0.05
390 -3.24% 0.05
400 -3.25% 0.05
410 -3.26% 0.05
420 -3.28% 0.05
430 -3.29% 0.05
440 -3.30% 0.05
450 -3.31% 0.05
460 -3.32% 0.05
470 -3.33% 0.05
480 -3.34% 0.05
490 -3.35% 0.05
500 -3.36% 0.05
510 -3.37% 0.05
520 -3.38% 0.05
530 -3.39% 0.05
540 -3.40% 0.05
550 -3.40% 0.05
560 -3.41% 0.05
570 -3.42% 0.05
580 -3.42% 0.05
590 -3.43% 0.05
600 -3.44% 0.05
610 -3.44% 0.05
620 -3.45% 0.05
630 -3.

In [25]:
ridgereg = Ridge(alpha=0.07)
ridgereg.fit(X, y)

In [26]:
print( "Intercept: ", ridgereg.intercept_)
print( "Coef:      ", ridgereg.coef_ )

Intercept:  152.13348416289602
Coef:       [  -0.99570177 -215.4519048   500.35762995  307.48457479 -108.33179939
  -56.91498712 -183.32156567  114.17338294  464.03605979   82.45654586]


In [27]:
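# Note: ls_res still holds the scores from the last iteration of the sweep above (alpha=990),
# not from the alpha=0.07 model refit in In [25], so that is the score stored here.
dc_scores.update({str(ridgereg).split("(")[0]: np.mean(ls_res)})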
dc_scores

{'LinearRegression': np.float64(0.4846079075127147),
 'Lars': np.float64(0.0885677401975642),
 'Ridge': np.float64(-0.03583937996264125)}

## LASSO regression

Unlike ridge regression, lasso regression minimizes the squared error plus $\lambda \cdot |\text{slope}|$, where $\lambda$ sets the severity of the penalty and can take any value from $0$ to $\infty$. The larger the value, the closer the slope of the fitted line gets to $0$, and it can reach exactly $0$. One way to find the best $\lambda$ is cross-validation.

This model can drop unnecessary variables from the dataset by setting their coefficients to exactly zero, which makes models simpler and easier to interpret.
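To see why an absolute-value penalty can remove variables while a squared penalty cannot, it helps to look at a special case. The sketch below assumes an orthonormal design ($X^T X = I$), chosen only because both estimators then have closed forms in terms of the least-squares coefficients. Minimizing $\frac{1}{2}\|y - Xb\|^2 + \lambda\|b\|_1$ gives soft-thresholding, and minimizing $\frac{1}{2}\|y - Xb\|^2 + \frac{\lambda}{2}\|b\|^2$ gives a uniform rescaling. The values in `ols_coef` are made up.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink each entry toward zero by lam; entries within lam of zero become zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical least-squares coefficients under an orthonormal design.
ols_coef = np.array([3.0, -0.4, 0.05, -2.0])
lam = 0.5

lasso_coef = soft_threshold(ols_coef, lam)   # L1 penalty
ridge_coef = ols_coef / (1.0 + lam)          # squared (L2) penalty

print("lasso:", lasso_coef)
print("ridge:", ridge_coef)
print("zeros in lasso:", int(np.sum(lasso_coef == 0)))
print("zeros in ridge:", int(np.sum(ridge_coef == 0)))
```

The two small coefficients are set to zero by the lasso rule, while the ridge rule only makes every coefficient smaller.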

In [28]:
#Lasso?

In [29]:
lassreg = Lasso(alpha=0.029)
lassreg.fit(X, y)

In [30]:
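# Note: this call scores larsreg rather than lassreg (apparently a typo),
# so the scores printed below repeat the Lars results.
ls_res = cross_val_score(estimator=larsreg, X=X, y=y, cv=4, n_jobs=-1, scoring="r2")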
print(ls_res)

[ 0.37459248  0.50439581  0.50950026 -1.03421759]


In [31]:
print( "MEAN: ", np.mean(ls_res))
print( "STD:  ", np.std(ls_res))
print( "Intercept: ", lassreg.intercept_)
print( "Coef:      ", lassreg.coef_ )

MEAN:  0.0885677401975642
STD:   0.6504910108465571
Intercept:  152.13348416289602
Coef:       [  -0.         -211.78335379  524.52050384  305.65345911 -148.66758543
   -0.         -189.02013244   52.12535668  522.19452766   59.57142179]


In [32]:
for i in range(0, 1000, 10):
    lasso = Lasso(alpha=i/10000)
    lasso.fit(X, y)
    ls_res = cross_val_score(estimator = lasso, X=X, y=y, cv=4, n_jobs=-1, scoring="r2")
    print(i/10000, "{:,.2%}".format(np.mean(ls_res)), "{:,.2f}".format(np.std(ls_res)))

0.0 48.46% 0.07
0.001 48.47% 0.07
0.002 48.47% 0.07
0.003 48.47% 0.07
0.004 48.46% 0.07
0.005 48.45% 0.07
0.006 48.46% 0.07
0.007 48.47% 0.07
0.008 48.48% 0.07
0.009 48.49% 0.07
0.01 48.49% 0.07
0.011 48.49% 0.07
0.012 48.49% 0.07
0.013 48.48% 0.07
0.014 48.47% 0.07
0.015 48.47% 0.07
0.016 48.48% 0.07
0.017 48.48% 0.07
0.018 48.48% 0.07
0.019 48.49% 0.07
0.02 48.49% 0.07
0.021 48.50% 0.07
0.022 48.50% 0.07
0.023 48.50% 0.07
0.024 48.51% 0.07
0.025 48.51% 0.07
0.026 48.51% 0.07
0.027 48.51% 0.07
0.028 48.51% 0.07
0.029 48.51% 0.07
0.03 48.50% 0.07
0.031 48.50% 0.07
0.032 48.50% 0.07
0.033 48.50% 0.07
0.034 48.50% 0.07
0.035 48.49% 0.07
0.036 48.49% 0.07
0.037 48.49% 0.07
0.038 48.48% 0.07
0.039 48.48% 0.07
0.04 48.48% 0.07
0.041 48.47% 0.07
0.042 48.47% 0.07
0.043 48.47% 0.07
0.044 48.47% 0.07
0.045 48.46% 0.07
0.046 48.46% 0.07
0.047 48.46% 0.07
0.048 48.46% 0.07
0.049 48.45% 0.07
0.05 48.45% 0.07
0.051 48.45% 0.07
0.052 48.45% 0.07
0.053 48.44% 0.07
0.054 48.44% 0.07
0.055 48.44% 0.07

In [33]:
lassreg = Lasso(alpha=0.07)
lassreg.fit(X, y)

In [34]:
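# Note: ls_res still holds the scores from the last iteration of the sweep above (alpha=0.099),
# not from the alpha=0.07 model refit in In [33], so that is the score stored here.
dc_scores.update({str(lassreg).split("(")[0]: np.mean(ls_res)})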
dc_scores

{'LinearRegression': np.float64(0.4846079075127147),
 'Lars': np.float64(0.0885677401975642),
 'Ridge': np.float64(-0.03583937996264125),
 'Lasso': np.float64(0.4803803502423855)}

## Elastic net

Lasso and ridge regression work well for small datasets. The task gets harder with a large number of predictor variables, because we do not know in advance which of them will be useful and which unnecessary. This is where the elastic net comes in, as a general case that includes both lasso and ridge.

As in the two previous methods, the elastic net minimizes the squared error plus $\lambda_1 \cdot |\text{slope}| + \lambda_2 \cdot \text{slope}^2$. Likewise, the best combination of the two parameters ($\lambda_1$ and $\lambda_2$) is found by cross-validation.

This method is particularly good when the predictors are correlated.


$$\text{sum of squared residuals} + \lambda_1 \left(|\text{variable}_1| + \dots + |\text{variable}_x|\right) + \lambda_2 \left(\text{variable}_1^2 + \dots + \text{variable}_x^2\right)$$
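scikit-learn's `ElasticNet` does not take $\lambda_1$ and $\lambda_2$ directly. It takes `alpha` and `l1_ratio`, and its documented objective is $\frac{1}{2n}\|y - Xw\|_2^2 + \text{alpha} \cdot \text{l1\_ratio} \cdot \|w\|_1 + 0.5 \cdot \text{alpha} \cdot (1 - \text{l1\_ratio}) \cdot \|w\|_2^2$. The sketch below maps one parametrization to the other and evaluates the penalty term. The helper names and the coefficient vector are illustrative, and the resulting $\lambda$ values are relative to that $\frac{1}{2n}$ scaling of the squared error.

```python
import numpy as np

def elastic_net_penalty(coef, lambda_1, lambda_2):
    """lambda_1 * sum(|coef|) + lambda_2 * sum(coef^2)."""
    coef = np.asarray(coef, dtype=float)
    return lambda_1 * np.sum(np.abs(coef)) + lambda_2 * np.sum(coef ** 2)

def lambdas_from_sklearn(alpha, l1_ratio):
    """Map ElasticNet's (alpha, l1_ratio) to (lambda_1, lambda_2)."""
    return alpha * l1_ratio, 0.5 * alpha * (1.0 - l1_ratio)

coef = [2.0, -1.0, 0.0]  # made-up coefficients

lasso_only = lambdas_from_sklearn(alpha=0.1, l1_ratio=1.0)
mostly_ridge = lambdas_from_sklearn(alpha=0.1, l1_ratio=0.001)

print("l1_ratio=1.0   ->", lasso_only, elastic_net_penalty(coef, *lasso_only))
print("l1_ratio=0.001 ->", mostly_ridge, elastic_net_penalty(coef, *mostly_ridge))
```

With `l1_ratio=.001`, the value used in this notebook, almost all of the penalty weight falls on the squared term, so the model behaves much more like ridge than like lasso.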

In [35]:
#ElasticNet?

In [36]:
elasnet = ElasticNet(alpha=0, l1_ratio=.001)
elasnet.fit(X, y)

In [37]:
ls_res = cross_val_score(estimator = elasnet, X=X, y=y, cv=4, n_jobs=-1, scoring="r2")
ls_res

array([0.37459248, 0.49678312, 0.50950026, 0.55755577])

In [38]:
print( "MEAN: ", np.mean(ls_res))
print( "STD:  ", np.std(ls_res))
print( "Intercept: ", elasnet.intercept_)
print( "Coef:      ", elasnet.coef_ )

MEAN:  0.4846079074940278
STD:   0.06744006526476029
Intercept:  152.13348416289597
Coef:       [ -10.00986622 -239.81564354  519.84592031  324.38464537 -792.17561159
  476.73899989  101.04325589  177.0632342   751.27368961   67.62669221]


In [39]:
for i in range(0, 1000, 10):
    elasnet = ElasticNet(alpha=i/10000,l1_ratio=.001)
    elasnet.fit(X, y)
    ls_res = cross_val_score(estimator = elasnet, X=X, y=y, cv=4, n_jobs=-1, scoring="r2")
    print(i/10000, "{:,.2%}".format(np.mean(ls_res)), "{:,.2f}".format(np.std(ls_res)))

0.0 48.46% 0.07
0.001 46.33% 0.07
0.002 43.14% 0.07
0.003 40.16% 0.06
0.004 37.51% 0.06
0.005 35.16% 0.06
0.006 33.07% 0.06
0.007 31.19% 0.06
0.008 29.50% 0.05
0.009 27.96% 0.05
0.01 26.56% 0.05
0.011 25.28% 0.05
0.012 24.11% 0.05
0.013 23.02% 0.05
0.014 22.01% 0.05
0.015 21.08% 0.05
0.016 20.21% 0.05
0.017 19.40% 0.05
0.018 18.65% 0.05
0.019 17.94% 0.05
0.02 17.27% 0.05
0.021 16.64% 0.05
0.022 16.05% 0.05
0.023 15.50% 0.05
0.024 14.97% 0.05
0.025 14.47% 0.04
0.026 14.00% 0.04
0.027 13.55% 0.04
0.028 13.12% 0.04
0.029 12.71% 0.04
0.03 12.32% 0.04
0.031 11.95% 0.04
0.032 11.60% 0.04
0.033 11.26% 0.04
0.034 10.94% 0.04
0.035 10.63% 0.04
0.036 10.33% 0.04
0.037 10.04% 0.04
0.038 9.77% 0.04
0.039 9.51% 0.04
0.04 9.25% 0.04
0.041 9.01% 0.04
0.042 8.77% 0.04
0.043 8.55% 0.04
0.044 8.33% 0.04
0.045 8.12% 0.04
0.046 7.91% 0.04
0.047 7.72% 0.04
0.048 7.52% 0.04
0.049 7.34% 0.04
0.05 7.16% 0.04
0.051 6.99% 0.04
0.052 6.82% 0.04
0.053 6.66% 0.04
0.054 6.50% 0.04
0.055 6.35% 0.04
0.056 6.20% 0.04


In [40]:
elasnet = ElasticNet(alpha=0, l1_ratio=.007)
elasnet.fit(X, y)

In [41]:
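# Note: ls_res still holds the scores from the last iteration of the sweep above
# (alpha=0.099, l1_ratio=.001), not from the model refit in In [40], so that is the score stored here.
dc_scores.update({str(elasnet).split("(")[0]: np.mean(ls_res)})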
dc_scores

{'LinearRegression': np.float64(0.4846079075127147),
 'Lars': np.float64(0.0885677401975642),
 'Ridge': np.float64(-0.03583937996264125),
 'Lasso': np.float64(0.4803803502423855),
 'ElasticNet': np.float64(0.0233658370775591)}

## Bayesian regression

From the Bayesian point of view, linear regression is formulated with probability distributions instead of point estimates. The target variable is not estimated as a single value; it is assumed to be drawn from a probability distribution.

The goal of Bayesian linear regression is not to find the single "best" value of the model parameters, but to determine their posterior distribution. Not only is the response generated from a probability distribution, the model parameters are also assumed to come from a distribution. The posterior probability of the model parameters depends on the training inputs and outputs.

In problems where data are limited, or where we have prior knowledge we want to use in the model, Bayesian linear regression can incorporate that prior information and express our uncertainty.
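As a rough illustration of "a distribution over the parameters", the sketch below computes the Gaussian posterior of the weights for a linear model with a zero-mean Gaussian prior and a known noise level. It is a simplification: scikit-learn's `BayesianRidge` also estimates the prior and noise precisions from the data, which is not attempted here. The data, `prior_precision`, and `noise_precision` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
true_coef = np.array([1.5, -2.0])
noise_precision = 25.0  # 1 / noise variance, assumed known
prior_precision = 1.0   # prior on the weights: N(0, I / prior_precision)

def posterior(X, y):
    """Posterior mean and covariance of the weights (no intercept)."""
    n_features = X.shape[1]
    precision = prior_precision * np.eye(n_features) + noise_precision * X.T @ X
    covariance = np.linalg.inv(precision)
    mean = noise_precision * covariance @ X.T @ y
    return mean, covariance

results = {}
for n_samples in (5, 500):
    X = rng.normal(size=(n_samples, 2))
    y = X @ true_coef + rng.normal(scale=noise_precision ** -0.5, size=n_samples)
    mean, covariance = posterior(X, y)
    results[n_samples] = (mean, np.sqrt(np.diag(covariance)))
    print(n_samples, "mean:", mean.round(3), "std:", results[n_samples][1].round(3))
```

With 5 samples the posterior standard deviations are comparatively wide. With 500 they are much narrower and the posterior mean sits close to the coefficients that generated the data, which is how the model expresses uncertainty when data are limited.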

In [42]:
#BayesianRidge?

In [43]:
bayreg = BayesianRidge()
bayreg.fit(X, y)

In [44]:
ls_res = cross_val_score(estimator=bayreg, X=X, y=y, cv=4, n_jobs=-1, scoring="r2")
print(ls_res)

[0.37316514 0.49264262 0.52052578 0.54899138]


In [45]:
print( "MEAN: ", np.mean(ls_res))
print( "STD:  ", np.std(ls_res))
print( "Intercept: ", bayreg.intercept_)
print( "Coef:      ", bayreg.coef_ )

MEAN:  0.48383122881684953
STD:   0.06692712133407915
Intercept:  152.13348416289602
Coef:       [  -4.23356256 -226.32799122  513.47304015  314.90385885 -182.28434068
   -4.36854822 -159.20103916  114.63541259  506.82345986   76.25617559]


In [46]:
dc_scores.update({str(bayreg).split("(")[0]: np.mean(ls_res)})
dc_scores

{'LinearRegression': np.float64(0.4846079075127147),
 'Lars': np.float64(0.0885677401975642),
 'Ridge': np.float64(-0.03583937996264125),
 'Lasso': np.float64(0.4803803502423855),
 'ElasticNet': np.float64(0.0233658370775591),
 'BayesianRidge': np.float64(0.48383122881684953)}

## Results

In [47]:
resul = pd.DataFrame(columns=[])
alfas = pd.DataFrame(columns=[])

In [48]:
for model in [linreg, larsreg, ridgereg, lassreg, elasnet, bayreg]:
    resul[str(model).split("(")[0]] = model.coef_
    alfas[str(model).split("(")[0]] = [model.intercept_]

In [49]:
resul["features"] = ls_pred
resul = resul.set_index("features")
alfas["features"] = ["intercepto"]
alfas = alfas.set_index("features")
alfas

features,LinearRegression,Lars,Ridge,Lasso,ElasticNet,BayesianRidge
intercepto,152.13,152.13,152.13,152.13,152.13,152.13


In [50]:
resul = pd.concat([resul,alfas])
resul

features,LinearRegression,Lars,Ridge,Lasso,ElasticNet,BayesianRidge
age,-10.01,-10.01,-1.0,-0.0,-10.01,-4.23
sex,-239.82,-239.82,-215.45,-178.57,-239.82,-226.33
bmi,519.85,519.85,500.36,519.97,519.85,513.47
bp,324.38,324.38,307.48,287.16,324.38,314.9
s1,-792.18,-792.18,-108.33,-80.69,-792.18,-182.28
s2,476.74,476.74,-56.91,-0.0,476.74,-4.37
s3,101.04,101.04,-183.32,-217.69,101.04,-159.2
s4,177.06,177.06,114.17,0.0,177.06,114.64
s5,751.27,751.27,464.04,500.79,751.27,506.82
s6,67.63,67.63,82.46,45.23,67.63,76.26


__NOTES:__

* When cross-validating to find the best alpha, look for the peak: the highest mean score reached before the score starts to drop. The alpha at that peak is taken as the best parameter for the model.
* Lasso generally works better with a smaller penalty than Ridge.
* In the elastic net, a large L1 ratio makes convergence slower.
* With alpha set to 0, Ridge and Lasso reduce to ordinary linear regression. The elastic net does too when both of its penalties are zero.
* Smaller coefficients indicate that a variable matters less.
* Larger coefficients indicate that a variable is important and has a large impact on the prediction. Comparing coefficients this way relies on the predictors sharing a common scale, as they do in this dataset.
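The first note, picking the alpha at the peak of the mean cross-validation score, can be written down directly. This is a self-contained sketch on synthetic data with a closed-form ridge fit and hand-rolled 4-fold cross-validation. It does not reproduce the notebook's numbers, and `ridge_fit`, `r2`, and `mean_cv_r2` are illustrative helpers rather than scikit-learn functions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 120, 8
X = rng.normal(size=(n_samples, n_features))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=2.0, size=n_samples)

def ridge_fit(X, y, lam):
    """Closed-form ridge fit with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_cv_r2(X, y, lam, n_folds=4):
    """Mean R^2 over contiguous, unshuffled folds."""
    scores = []
    for test_idx in np.array_split(np.arange(len(y)), n_folds):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        coef, intercept = ridge_fit(X[train_mask], y[train_mask], lam)
        scores.append(r2(y[test_idx], X[test_idx] @ coef + intercept))
    return np.mean(scores)

lambdas = np.array([0.0, 0.1, 1.0, 10.0, 100.0, 1000.0])
mean_scores = np.array([mean_cv_r2(X, y, lam) for lam in lambdas])
best_lambda = lambdas[np.argmax(mean_scores)]  # alpha at the peak

for lam, score in zip(lambdas, mean_scores):
    print(f"lambda={lam:>7}: mean R^2 = {score:.4f}")
print("best lambda:", best_lambda)
```

Keeping the scores in an array and taking `np.argmax` avoids reading the peak off a printed list. It also keeps the chosen value separate from whatever the loop variable held on its last iteration.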