This dataset contains ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) for 442 diabetes patients, along with the response of interest: a quantitative measure of disease progression over one year.

In [224]:
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split 
from sklearn.neural_network import MLPRegressor
from sklearn import metrics
from sklearn.tree import DecisionTreeRegressor
import numpy as np
import random


In [225]:
diabetes = load_diabetes(return_X_y=False, as_frame=True)
diabetes.data

Unnamed: 0,age,sex,bmi,bp,s1,s2,s3,s4,s5,s6
0,0.038076,0.050680,0.061696,0.021872,-0.044223,-0.034821,-0.043401,-0.002592,0.019908,-0.017646
1,-0.001882,-0.044642,-0.051474,-0.026328,-0.008449,-0.019163,0.074412,-0.039493,-0.068330,-0.092204
2,0.085299,0.050680,0.044451,-0.005671,-0.045599,-0.034194,-0.032356,-0.002592,0.002864,-0.025930
3,-0.089063,-0.044642,-0.011595,-0.036656,0.012191,0.024991,-0.036038,0.034309,0.022692,-0.009362
4,0.005383,-0.044642,-0.036385,0.021872,0.003935,0.015596,0.008142,-0.002592,-0.031991,-0.046641
...,...,...,...,...,...,...,...,...,...,...
437,0.041708,0.050680,0.019662,0.059744,-0.005697,-0.002566,-0.028674,-0.002592,0.031193,0.007207
438,-0.005515,0.050680,-0.015906,-0.067642,0.049341,0.079165,-0.028674,0.034309,-0.018118,0.044485
439,0.041708,0.050680,-0.015906,0.017282,-0.037344,-0.013840,-0.024993,-0.011080,-0.046879,0.015491
440,-0.045472,-0.044642,0.039062,0.001215,0.016318,0.015283,-0.028674,0.026560,0.044528,-0.025930


### Define and train the regressor

In [226]:
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, test_size=0.3, random_state=1)
value = X_train.values[5]
print(value)

[ 0.03444337  0.05068012 -0.00189471 -0.01255635  0.03833367  0.01371725
  0.0780932  -0.03949338  0.00455189 -0.09634616]


### Define the regressor

In [227]:
mlp = MLPRegressor(max_iter=9000,
                   hidden_layer_sizes=(7,),  # one hidden layer with 7 units
                   activation='logistic',
                   learning_rate_init=0.01,
                   verbose=True
                   )
mlp.fit(X_train, y_train)
print("Number of iterations: " + str(mlp.n_iter_))

Iteration 1, loss = 15080.27861115
Iteration 2, loss = 15064.91411256
Iteration 3, loss = 15049.72229094
Iteration 4, loss = 15034.84472198
Iteration 5, loss = 15019.91886830
Iteration 6, loss = 15005.06914773
Iteration 7, loss = 14990.25076967
Iteration 8, loss = 14975.39174351
Iteration 9, loss = 14960.72717824
Iteration 10, loss = 14945.69406400
Iteration 11, loss = 14930.91869130
Iteration 12, loss = 14915.93390001
Iteration 13, loss = 14900.97273196
Iteration 14, loss = 14885.90616925
Iteration 15, loss = 14870.77314652
Iteration 16, loss = 14855.57691842
Iteration 17, loss = 14840.11315834
Iteration 18, loss = 14824.65319730
Iteration 19, loss = 14808.88126980
Iteration 20, loss = 14793.15761765
Iteration 21, loss = 14777.20020408
Iteration 22, loss = 14761.38915388
Iteration 23, loss = 14744.83585237
Iteration 24, loss = 14728.41876562
Iteration 25, loss = 14711.80983400
Iteration 26, loss = 14694.91044051
Iteration 27, loss = 14677.81763630
Iteration 28, loss = 14660.43290959
...
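The log above starts with a loss near 15,000 that drops by about 15 per iteration, which is consistent with predictions starting near zero while the targets are in the tens to hundreds. One possible variation, which is not part of the original notebook, is to standardize the target before training and map predictions back afterwards. The sketch below assumes scikit-learn's `TransformedTargetRegressor` and `StandardScaler`; the `random_state=0` value and the name `scaled_mlp` are illustrative choices, and no particular improvement is claimed.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Same data and split as the notebook (test_size=0.3, random_state=1).
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Wrap the same network so that y is standardized during fit and
# predictions are transformed back to the original units.
scaled_mlp = TransformedTargetRegressor(
    regressor=MLPRegressor(
        max_iter=9000,
        hidden_layer_sizes=(7,),
        activation="logistic",
        learning_rate_init=0.01,
        random_state=0,  # assumption: fixed seed for repeatable runs
    ),
    transformer=StandardScaler(),
)
scaled_mlp.fit(X_train, y_train)

# Test-set MSE, in the original target units squared.
scaled_pred = scaled_mlp.predict(X_test)
scaled_mse = mean_squared_error(y_test, scaled_pred)
print(scaled_mse)
```

The resulting MSE can be compared directly with the value the notebook prints for the unscaled network, since both are measured on the same test split.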



### Evaluate the resulting error

In [228]:
y_pred = mlp.predict(X_test)
error = metrics.mean_squared_error(y_test, y_pred)
print(error)

3145.328157935265
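The mean squared error is expressed in squared target units, which makes it hard to read against individual predictions. As a small sketch, taking the square root of the value printed above gives the root mean squared error in the same units as the target:

```python
import math

mse = 3145.328157935265  # test-set MSE printed by the cell above
rmse = math.sqrt(mse)    # back in the units of the target variable
print(f"RMSE: {rmse:.2f}")  # prints RMSE: 56.08
```

For scale, the real target values shown elsewhere in this notebook range from 44.0 to 263.0, so an error of about 56 units is large relative to the values being predicted.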


In [229]:
# Re-create the same split (random_state=1 reproduces the split used above).
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, test_size=0.3, random_state=1)
# Draw 10 distinct row positions. Note that the rows come from the training set.
size = len(X_train)
val = np.random.choice(size, size=10, replace=False)
train_random = []
output_random = []
for x in val:
    train_random.append(X_train.iloc[x].tolist())
    output_random.append(y_train.iloc[x].tolist())

output_random


[80.0, 244.0, 75.0, 201.0, 163.0, 84.0, 77.0, 104.0, 263.0, 44.0]
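The cell above draws its ten rows from `X_train` with an unseeded generator, while the exercise in this notebook asks for rows from the test set. A minimal sketch of a reusable sampler follows; the function name `sample_rows`, its `seed` parameter, and the stand-in arrays are hypothetical and only illustrate the idea.

```python
import numpy as np


def sample_rows(X, y, n=10, seed=0):
    """Return n distinct rows of X, the matching entries of y, and their positions."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)  # seeded, so reruns pick the same rows
    # Sample positions from the full range 0..len(X)-1 without repetition.
    idx = rng.choice(len(X), size=n, replace=False)
    return X[idx], y[idx], idx


# Stand-in arrays with the same layout as a feature matrix and target vector.
X_demo = np.arange(40, dtype=float).reshape(20, 2)
y_demo = np.arange(20, dtype=float)

rows, targets, idx = sample_rows(X_demo, y_demo, n=10, seed=0)
print(rows.shape, targets.shape)  # prints (10, 2) (10,)
```

In the notebook this could be called as `sample_rows(X_test, y_test)`, so that the sampled patients were not seen during training.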

In [230]:
for x, y in zip(train_random, output_random):
    y_predict = mlp.predict(np.array(x).reshape(1, -1))
    print(f"Predicted Y: {y_predict[0]}, actual value: {y}")

Predicted Y: 95.54386410688107, actual value: 80.0
Predicted Y: 187.16729324546588, actual value: 244.0
Predicted Y: 68.59103617617845, actual value: 75.0
Predicted Y: 103.5758267120701, actual value: 201.0
Predicted Y: 235.0784826457964, actual value: 163.0
Predicted Y: 118.26669163391236, actual value: 84.0
Predicted Y: 162.59897440174845, actual value: 77.0
Predicted Y: 85.86216393904871, actual value: 104.0
Predicted Y: 287.57305175411295, actual value: 263.0
Predicted Y: 95.99456441922646, actual value: 44.0


### Test

Predict and compare the result for any 10 values from the test set.
- What is the difference between the predicted value and the real one? What does this mean for the patient?

There is a gap between the real and predicted values, and in the ten samples above it goes in both directions. When the prediction is higher than the real value, the model exaggerates the disease progression, so the patient's actual condition is better than the prediction suggests.

- Is this model accurate enough to be used with new patients? Why?

No. The predictions differ from the real values by enough to cause errors in a patient's diagnosis: the test mean squared error of about 3145 corresponds to a root mean squared error of roughly 56 units.

- Predict the same values using a decision tree. Which method fits the data better?

A decision tree fits the data better, since its predictions are closer to the real values for most of these samples.



In [239]:
regression = DecisionTreeRegressor(max_depth=4)
regression = regression.fit(X_train, y_train)

# Compare both models on the same ten sampled rows.
for x, y in zip(train_random, output_random):
    y_predict = mlp.predict(np.array(x).reshape(1, -1))
    y_pre = regression.predict(np.array(x).reshape(1, -1))
    print(f"Network: {y_predict[0]} || Tree: {y_pre[0]} || actual value: {y}")


Network: 95.54386410688107 || Tree: 110.45833333333333 || actual value: 80.0
Network: 187.16729324546588 || Tree: 216.875 || actual value: 244.0
Network: 68.59103617617845 || Tree: 78.78947368421052 || actual value: 75.0
Network: 103.5758267120701 || Tree: 156.87234042553192 || actual value: 201.0
Network: 235.0784826457964 || Tree: 216.875 || actual value: 163.0
Network: 118.26669163391236 || Tree: 220.59375 || actual value: 84.0
Network: 162.59897440174845 || Tree: 110.45833333333333 || actual value: 77.0
Network: 85.86216393904871 || Tree: 110.45833333333333 || actual value: 104.0
Network: 287.57305175411295 || Tree: 271.0 || actual value: 263.0
Network: 95.99456441922646 || Tree: 78.78947368421052 || actual value: 44.0
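To make the comparison less dependent on reading ten rows by eye, the printed values can be summarized with a mean absolute error per model. The sketch below simply copies the numbers from the output above; the variable names are illustrative. Keep in mind that these ten rows were drawn from the training set, which both models were fitted on, so this does not measure performance on unseen patients.

```python
# Values copied from the output above (network, tree, real).
network = [95.54386410688107, 187.16729324546588, 68.59103617617845,
           103.5758267120701, 235.0784826457964, 118.26669163391236,
           162.59897440174845, 85.86216393904871, 287.57305175411295,
           95.99456441922646]
tree = [110.45833333333333, 216.875, 78.78947368421052, 156.87234042553192,
        216.875, 220.59375, 110.45833333333333, 110.45833333333333,
        271.0, 78.78947368421052]
real = [80.0, 244.0, 75.0, 201.0, 163.0, 84.0, 77.0, 104.0, 263.0, 44.0]

# Absolute error of each model on each sample.
net_err = [abs(p - r) for p, r in zip(network, real)]
tree_err = [abs(p - r) for p, r in zip(tree, real)]

mae_net = sum(net_err) / len(real)
mae_tree = sum(tree_err) / len(real)
# Number of samples where the tree is strictly closer to the real value.
tree_wins = sum(t < n for t, n in zip(tree_err, net_err))

print(f"MAE network: {mae_net:.2f}")  # prints MAE network: 46.29
print(f"MAE tree: {mae_tree:.2f}")    # prints MAE tree: 37.87
print(f"Tree closer in {tree_wins} of {len(real)} samples")  # 8 of 10
```

On these ten samples the tree has the lower mean absolute error and is closer in eight cases, although its largest single miss (about 137 units on the sample with real value 84.0) is worse than any of the network's.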
