This dataset records house prices per unit area. For each property it contains the date of the last transaction, the age of the house, the distance to the nearest metro (MRT) station, the number of nearby convenience stores, and the latitude and longitude.

In [114]:
import pandas as pd
from sklearn.model_selection import train_test_split 
from sklearn.neural_network import MLPRegressor
from sklearn import metrics
from sklearn.tree import DecisionTreeRegressor 
import numpy as np

In [115]:
realstate = pd.read_csv("real_state.csv")
realstate.head()

| | No | X1 transaction date | X2 house age | X3 distance to the nearest MRT station | X4 number of convenience stores | X5 latitude | X6 longitude | Y house price of unit area |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2012.9166667 | 32.0 | 84.87882 | 10 | 24.98298 | 121.54024 | 37.9 |
| 1 | 2 | 2012.9166667 | 19.5 | 306.5947 | 9 | 24.98034 | 121.53951 | 42.2 |
| 2 | 3 | 2013.5833333 | 13.3 | 561.9845 | 5 | 24.98746 | 121.54391 | 47.3 |
| 3 | 4 | 2013.5 | 13.3 | 561.9845 | 5 | 24.98746 | 121.54391 | 54.8 |
| 4 | 5 | 2012.8333333 | 5.0 | 390.5684 | 5 | 24.97937 | 121.54245 | 43.1 |


In [116]:
realstate.drop(
    columns=[
        'No',
        'X1 transaction date',
        'X5 latitude',
        'X6 longitude'
    ], inplace=True
)

data = realstate.values[:, :3]
data

array([[ 32.     ,  84.87882,  10.     ],
       [ 19.5    , 306.5947 ,   9.     ],
       [ 13.3    , 561.9845 ,   5.     ],
       ...,
       [ 18.8    , 390.9696 ,   7.     ],
       [  8.1    , 104.8101 ,   5.     ],
       [  6.5    ,  90.45606,   9.     ]])

In [117]:
data_columns = list(realstate.columns.values[:3])
target = realstate.values[:, 3]
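
The positional slices above only work as long as the column order stays exactly as it is after the drop. A minimal sketch of the same selection by column name follows. It uses a small hand-made frame holding the first three rows of the table above; the names `df`, `target_col`, `feature_cols`, `X` and `y` are illustrative assumptions, not objects from this notebook.

```python
import pandas as pd

# Tiny stand-in frame with the first three rows shown by realstate.head();
# only the columns that remain after the drop are included.
df = pd.DataFrame({
    "X2 house age": [32.0, 19.5, 13.3],
    "X3 distance to the nearest MRT station": [84.87882, 306.5947, 561.9845],
    "X4 number of convenience stores": [10, 9, 5],
    "Y house price of unit area": [37.9, 42.2, 47.3],
})

target_col = "Y house price of unit area"
feature_cols = [c for c in df.columns if c != target_col]

# Selecting by name keeps features and target correct even if columns are reordered.
X = df[feature_cols].to_numpy(dtype=float)
y = df[target_col].to_numpy(dtype=float)

print(X.shape, y.shape)
```

Selecting by name also makes it obvious which column is the target when the frame is changed later.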

### Define and train the regressor

In [118]:
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3, random_state=1)
value = X_train
print(value)

[[2.750000e+01 3.940173e+02 7.000000e+00]
 [1.770000e+01 4.516419e+02 8.000000e+00]
 [1.640000e+01 2.893248e+02 5.000000e+00]
 [1.250000e+01 5.619845e+02 5.000000e+00]
 [3.730000e+01 5.878877e+02 8.000000e+00]
 [9.100000e+00 1.402016e+06 0.000000e+00]
 [3.600000e+00 3.838624e+02 5.000000e+00]
 [1.720000e+01 3.905684e+02 5.000000e+00]
 [3.700000e+00 5.779615e+02 6.000000e+00]
 [1.100000e+00 1.935845e+02 6.000000e+00]
 [1.330000e+01 3.360532e+02 5.000000e+00]
 [1.330000e+01 4.922313e+02 5.000000e+00]
 [1.300000e+01 4.922313e+02 5.000000e+00]
 [3.500000e+00 7.573377e+02 3.000000e+00]
 [6.500000e+00 9.045606e+01 9.000000e+00]
 [3.350000e+01 1.978671e+06 2.000000e+00]
 [2.560000e+01 4.519690e+03 0.000000e+00]
 [3.000000e+01 1.013341e+06 5.000000e+00]
 [3.170000e+01 1.159454e+06 0.000000e+00]
 [4.100000e+00 3.128963e+02 5.000000e+00]
 [3.800000e+00 3.838624e+02 5.000000e+00]
 [8.900000e+00 1.406430e+03 0.000000e+00]
 [4.000000e+00 2.147376e+06 3.000000e+00]
 [9.900000e+00 2.791726e+02 7.0000

In [119]:
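mlp = MLPRegressor(
    max_iter=1000,
    hidden_layer_sizes=(7,),  # one hidden layer with 7 units; (7) without the comma is just the int 7
    activation='logistic',
    learning_rate_init=0.01,
    verbose=True,  # print the training loss at every iteration
)
mlp.fit(X_train, y_train)
print("Number of iterations: " + str(mlp.n_iter_))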

Iteration 1, loss = 789.61329709
Iteration 2, loss = 785.87915522
Iteration 3, loss = 782.12045791
Iteration 4, loss = 778.36496247
Iteration 5, loss = 774.66064336
Iteration 6, loss = 770.95666651
Iteration 7, loss = 767.24184151
Iteration 8, loss = 763.55636474
Iteration 9, loss = 759.87256199
Iteration 10, loss = 756.21213358
Iteration 11, loss = 752.51260440
Iteration 12, loss = 748.84052085
Iteration 13, loss = 745.18005236
Iteration 14, loss = 741.49063089
Iteration 15, loss = 737.61661682
Iteration 16, loss = 733.51871465
Iteration 17, loss = 729.66596717
Iteration 18, loss = 725.27980489
Iteration 19, loss = 719.78060593
Iteration 20, loss = 712.68796558
Iteration 21, loss = 703.57004690
Iteration 22, loss = 695.48737894
Iteration 23, loss = 691.06322509
Iteration 24, loss = 686.69277262
Iteration 25, loss = 682.22100622
Iteration 26, loss = 677.78602863
Iteration 27, loss = 673.36417418
Iteration 28, loss = 669.03333371
Iteration 29, loss = 664.64449287
Iteration 30, loss = 66

In [120]:
y_pred = mlp.predict(X_test)
# Mean squared error on the held-out test set
error = metrics.mean_squared_error(y_test, y_pred)
print(error)

230.4795669758888
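
An MSE is expressed in squared target units, which makes the number above hard to read on its own. A rough sketch of two ways to put it in context: take the square root to get an RMSE in the target's own units, and compare against a baseline that always predicts the training mean. The MSE value is the one printed above; the `*_demo` arrays are made-up illustrations, not this notebook's data.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

# RMSE for the MSE printed above, in the same units as the target.
mse_mlp = 230.4795669758888
rmse_mlp = np.sqrt(mse_mlp)
print(f"RMSE: {rmse_mlp:.2f}")

# Baseline on made-up targets: always predict the mean of the training targets.
y_train_demo = np.array([30.0, 40.0, 50.0])
y_test_demo = np.array([35.0, 45.0])
X_train_demo = np.zeros((len(y_train_demo), 1))  # the baseline ignores the features
X_test_demo = np.zeros((len(y_test_demo), 1))

baseline = DummyRegressor(strategy="mean").fit(X_train_demo, y_train_demo)
baseline_mse = mean_squared_error(y_test_demo, baseline.predict(X_test_demo))
print(f"Baseline MSE: {baseline_mse:.2f}")
```

The RMSE here comes out at about 15.2 price units. Fitting the same kind of baseline on the real `X_train` and `y_train` would show whether the network does any better than predicting the mean.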


### Evaluate the resulting error

In [122]:
# Same split as before (same random_state), so X_train and y_train are unchanged.
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3, random_state=1)
train_random = []
output_random = []
size = len(X_train)
# Draw 10 distinct row indices; note that the rows come from the training split.
val = np.random.choice(size, size=10, replace=False)
for x in val:
    train_random.append(X_train[x].tolist())
    output_random.append(y_train[x].tolist())

output_random

[34.3, 19.2, 21.8, 23.5, 50.5, 34.1, 31.3, 41.0, 32.9, 46.8]
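
The cell above draws its ten rows from `X_train`. A minimal sketch of drawing them from a held-out split instead, with a seeded generator so the same rows come back on every run. The synthetic arrays, the seed, and the `*_demo`, `X_te`, `X_sample` names are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 50 rows, 3 features (not this notebook's dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(50, 3))
y_demo = rng.uniform(10, 60, size=50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

# Pick 10 distinct row indices from the test split; every index is eligible.
idx = rng.choice(len(X_te), size=10, replace=False)
X_sample = X_te[idx]
y_sample = y_te[idx]

print(X_sample.shape, y_sample.shape)
```

Indexing the NumPy arrays with `idx` directly replaces the append loop and keeps features and targets aligned.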

In [123]:
# Predict all 10 sampled rows in one call and compare with the actual values.
predictions = mlp.predict(train_random)
for y_predict, y in zip(predictions, output_random):
    print(f"Predicted Y: {y_predict}, actual value: {y}")

Predicted Y: 37.84632516275544, actual value: 34.3
Predicted Y: 37.84632516275544, actual value: 19.2
Predicted Y: 37.84632516275544, actual value: 21.8
Predicted Y: 37.84632516275544, actual value: 23.5
Predicted Y: 37.84632516275544, actual value: 50.5
Predicted Y: 37.84632516275544, actual value: 34.1
Predicted Y: 37.84632516275544, actual value: 31.3
Predicted Y: 37.84632516275544, actual value: 41.0
Predicted Y: 37.84632516275544, actual value: 32.9
Predicted Y: 37.84632516275544, actual value: 46.8
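
Every prediction above is the same number, which suggests the network has collapsed to a near-constant output. One plausible cause is unscaled inputs: in the printed `X_train` the distance column runs from tens to millions while the other two columns stay in the tens, and a logistic hidden layer saturates easily on such values. A hedged sketch of the usual remedy, standardizing the features inside a pipeline, follows. The synthetic data, the seed, and `max_iter=2000` are assumptions, and this has not been run on the notebook's dataset.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: three features on very different scales (assumption).
rng = np.random.default_rng(0)
n = 300
age = rng.uniform(0, 40, n)
distance = rng.uniform(50, 6000, n)
stores = rng.integers(0, 11, n).astype(float)
X_demo = np.column_stack([age, distance, stores])
# Made-up target: cheaper when older and farther away, pricier with more stores.
y_demo = 45 - 0.3 * age - 0.005 * distance + 1.5 * stores + rng.normal(0, 2, n)

# The scaler's mean and variance are learned when the pipeline is fitted,
# then reused unchanged at prediction time.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(7,), activation="logistic",
                 learning_rate_init=0.01, max_iter=2000, random_state=0),
)
model.fit(X_demo, y_demo)

preds = model.predict(X_demo[:10])
print(np.round(preds, 1))
```

With the scaler inside the pipeline, calling `model.predict` on raw rows applies the same transformation automatically, so the test split never influences the scaling statistics.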


### Test

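Predict and compare the results for 10 arbitrary values from the test set.
- How do the predicted values differ from the actual ones?

The network returns the same value (about 37.85) for every sample, so its error grows with the distance between the actual price and that constant. It overestimates the price for 7 of the 10 sampled houses and underestimates it for the other 3.

- Predict the same values using a decision tree. Which method fits the data better?

The decision tree fits the data better: its predictions vary with the inputs and land much closer to the actual house prices than the network's. On the 10 sampled rows the mean absolute error is roughly 5.7 for the tree versus 9.3 for the network. Because those rows were drawn from the training split, both figures are measured on data the models have already seen.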

In [124]:
regression = DecisionTreeRegressor(max_depth=4)
regression = regression.fit(X_train, y_train)

# Predict the same 10 samples with both models and print them side by side.
for x, y in zip(train_random, output_random):
    y_predict = mlp.predict(np.array(x).reshape(1, -1))
    y_pre = regression.predict(np.array(x).reshape(1, -1))
    print(f"Network: {y_predict[0]} || Tree: {y_pre[0]} || actual value: {y}")


Network: 37.84632516275544 || Tree: 31.650000000000002 || actual value: 34.3
Network: 37.84632516275544 || Tree: 20.7 || actual value: 19.2
Network: 37.84632516275544 || Tree: 26.981081081081083 || actual value: 21.8
Network: 37.84632516275544 || Tree: 25.06 || actual value: 23.5
Network: 37.84632516275544 || Tree: 40.382716049382715 || actual value: 50.5
Network: 37.84632516275544 || Tree: 31.650000000000002 || actual value: 34.1
Network: 37.84632516275544 || Tree: 40.382716049382715 || actual value: 31.3
Network: 37.84632516275544 || Tree: 55.25 || actual value: 41.0
Network: 37.84632516275544 || Tree: 40.382716049382715 || actual value: 32.9
Network: 37.84632516275544 || Tree: 44.524137931034495 || actual value: 46.8
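
Ten rows are a small basis for choosing between two models. A sketch of scoring both regressors on an entire held-out split with MSE and MAE follows. The data are synthetic, the helper `evaluate` is a hypothetical name introduced here, and the hyperparameters are assumptions rather than tuned values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor


def evaluate(model, X_test, y_test):
    """Return (MSE, MAE) of an already fitted model on held-out data."""
    y_hat = model.predict(X_test)
    return mean_squared_error(y_test, y_hat), mean_absolute_error(y_test, y_hat)


# Synthetic stand-in data with features already in [0, 1] (assumption).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(200, 3))
y_demo = 10 + 20 * X_demo[:, 0] - 15 * X_demo[:, 1] + 5 * X_demo[:, 2]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

models = {
    "tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "mlp": MLPRegressor(hidden_layer_sizes=(7,), learning_rate_init=0.01,
                        max_iter=2000, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)                       # fit on the training split only
    results[name] = evaluate(model, X_te, y_te)  # score on unseen rows
    print(f"{name}: MSE={results[name][0]:.2f}, MAE={results[name][1]:.2f}")
```

Reporting both metrics is a deliberate choice: MSE penalizes large misses more heavily, while MAE reads directly in price units.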
