# **ARTIFICIAL NEURAL NETWORKS: REGRESSION**

This project develops a Machine Learning model to predict the median value of houses in Boston.

The data come from Kaggle:

https://www.kaggle.com/schirmerchad/bostonhoustingmlnd

In [5]:
import numpy as np
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [6]:
df = pd.read_csv('/content/drive/MyDrive/machine_learning/housing.csv',
                    sep=',', encoding='iso-8859-1')

In [7]:
df.head()

|   | RM | LSTAT | PTRATIO | MEDV |
|---|---|---|---|---|
| 0 | 6.575 | 4.98 | 15.3 | 504000.0 |
| 1 | 6.421 | 9.14 | 17.8 | 453600.0 |
| 2 | 7.185 | 4.03 | 17.8 | 728700.0 |
| 3 | 6.998 | 2.94 | 18.7 | 701400.0 |
| 4 | 7.147 | 5.33 | 18.7 | 760200.0 |


**Predictor attributes**

RM: the average number of rooms among homes in the neighborhood.

LSTAT: the percentage of homeowners in the neighborhood considered "lower class" (working class).

PTRATIO: the ratio of students to teachers in primary and secondary schools in the neighborhood.

**Target variable**

MEDV: the median value of the homes.

In [8]:
df.shape

(489, 4)

In [9]:
independente = df.iloc[:, 0:3].values
independente

array([[ 6.575,  4.98 , 15.3  ],
       [ 6.421,  9.14 , 17.8  ],
       [ 7.185,  4.03 , 17.8  ],
       ...,
       [ 6.976,  5.64 , 21.   ],
       [ 6.794,  6.48 , 21.   ],
       [ 6.03 ,  7.88 , 21.   ]])

In [10]:
independente.shape

(489, 3)

In [11]:
dependente = df.iloc[:, 3].values

In [12]:
dependente.shape

(489,)
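The positional slices above depend on the column order of the CSV. As a minimal sketch, the same split can be written with column names, which keeps working if the columns are reordered. The small `df_demo` frame below is a stand-in built from the first rows printed by `df.head()`; the names `df_demo`, `feature_cols`, `target_col`, `X_demo` and `y_demo` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Stand-in for housing.csv: the first three rows shown by df.head().
df_demo = pd.DataFrame({
    "RM": [6.575, 6.421, 7.185],
    "LSTAT": [4.98, 9.14, 4.03],
    "PTRATIO": [15.3, 17.8, 17.8],
    "MEDV": [504000.0, 453600.0, 728700.0],
})

feature_cols = ["RM", "LSTAT", "PTRATIO"]  # predictor attributes
target_col = "MEDV"                        # target variable

X_demo = df_demo[feature_cols].to_numpy()  # 2-D array, one column per feature
y_demo = df_demo[target_col].to_numpy()    # 1-D array of house values

print(X_demo.shape, y_demo.shape)
```

On this stand-in frame, selecting by name yields the same arrays as `df_demo.iloc[:, 0:3].values` and `df_demo.iloc[:, 3].values`.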

## **TRAINING**

In [13]:
from sklearn.model_selection import train_test_split
x_treino, x_teste, y_treino, y_teste = train_test_split(independente, dependente, test_size = 0.3, random_state = 0)

In [14]:
x_treino.shape, x_teste.shape

((342, 3), (147, 3))
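The 342/147 split follows from how `train_test_split` rounds: with `test_size = 0.3` and 489 rows, the test set gets the ceiling of 146.7 and the training set gets the remainder. The sketch below checks this on placeholder arrays of the same length; `X_fake` and `y_fake` are hypothetical stand-ins, not the housing data.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 489                    # row count reported by df.shape
X_fake = np.zeros((n_rows, 3))  # placeholder features with the same shape
y_fake = np.zeros(n_rows)       # placeholder target

X_tr, X_te, y_tr, y_te = train_test_split(
    X_fake, y_fake, test_size=0.3, random_state=0
)

n_test_expected = math.ceil(0.3 * n_rows)    # test size is rounded up
n_train_expected = n_rows - n_test_expected  # the rest goes to training

print(X_tr.shape, X_te.shape)
print(n_train_expected, n_test_expected)
```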

In [15]:
from sklearn.neural_network import MLPRegressor

In [16]:
redes = MLPRegressor(hidden_layer_sizes=(100, 100), activation='relu', verbose=True, max_iter=2000,
                    solver='lbfgs', random_state = 12)

In [17]:
redes.fit(x_treino, y_treino)

MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=2000, random_state=12,
             solver='lbfgs', verbose=True)

In [18]:
redes.n_layers_

4
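The value 4 counts the input layer, the two hidden layers and the output layer, so `n_layers_` equals the number of hidden layers plus two. A minimal sketch on synthetic data (the arrays `X_toy` and `y_toy` and both model names are made up for illustration) shows the same rule for two and three hidden layers:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic regression problem with three features, as in the housing data.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(40, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5])

net_two_hidden = MLPRegressor(
    hidden_layer_sizes=(8, 8), solver="lbfgs", max_iter=200, random_state=0
).fit(X_toy, y_toy)

net_three_hidden = MLPRegressor(
    hidden_layer_sizes=(4, 4, 4), solver="lbfgs", max_iter=200, random_state=0
).fit(X_toy, y_toy)

# n_layers_ = input layer + hidden layers + output layer
print(net_two_hidden.n_layers_, net_three_hidden.n_layers_)

# There is one weight matrix between each pair of consecutive layers.
print(len(net_two_hidden.coefs_), len(net_three_hidden.coefs_))
```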

In [19]:
redes.score(x_treino, y_treino)

0.8536909521067109

## **TEST**

In [20]:
redes.score(x_teste, y_teste)

0.8178252425873315

In [21]:
previsoes_teste = redes.predict(x_teste)

## **METRICS**

In [22]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [23]:
# Mean absolute error (MAE)
mean_absolute_error(y_teste, previsoes_teste)

54889.90145604481

In [24]:
# Root mean squared error (RMSE)
np.sqrt(mean_squared_error(y_teste, previsoes_teste))

72702.15789942915
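Both metrics are expressed in the same unit as MEDV. As a small sketch of what the two calls compute, the block below reproduces MAE and RMSE by hand with NumPy. The target values are the first three entries of the test set; the predictions in `y_pred_demo` are invented for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true_demo = np.array([417900.0, 632100.0, 281400.0])  # sample targets
y_pred_demo = np.array([410000.0, 670000.0, 330000.0])  # hypothetical predictions

errors = y_true_demo - y_pred_demo

mae_manual = np.mean(np.abs(errors))         # average absolute deviation
rmse_manual = np.sqrt(np.mean(errors ** 2))  # square root of the mean squared error

mae_sklearn = mean_absolute_error(y_true_demo, y_pred_demo)
rmse_sklearn = np.sqrt(mean_squared_error(y_true_demo, y_pred_demo))

print(mae_manual, mae_sklearn)
print(rmse_manual, rmse_sklearn)
```

RMSE is never smaller than MAE because squaring gives large errors more weight, which matches the pattern in the results above.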

### **Cross-Validation**

In [25]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [26]:
# Splitting the data into folds
kfold = KFold(n_splits = 12, shuffle=True, random_state = 5)

In [27]:
# Creating the model
from sklearn.neural_network import MLPRegressor
modelo = MLPRegressor(hidden_layer_sizes=(100, 100), activation='relu', verbose=True, max_iter=2000,
                    solver='lbfgs', random_state = 12)
resultado = cross_val_score(modelo, independente, dependente, cv = kfold)
resultado

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


array([0.89287907, 0.87189736, 0.81416127, 0.78567585, 0.80046946,
       0.80653329, 0.56896982, 0.79833847, 0.74399243, 0.68651471,
       0.85013494, 0.5445834 ])

In [28]:
# Mean score across folds (cross_val_score returns R^2 for regressors, not accuracy)
print("Mean R^2: %.2f%%" % (resultado.mean() * 100.0))

Mean R^2: 76.37%
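The convergence warnings above suggest scaling the data. One possible way to do that inside cross-validation, sketched here under assumptions, is to wrap the scaler and the network in a pipeline so the scaler is refit on each training fold, and to report the standard deviation alongside the mean. The data below are synthetic (`X_syn`, `y_syn`), and the use of `make_pipeline` and `TransformedTargetRegressor` is a suggestion, not the approach taken in this notebook.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with three features and a target on a house-price scale.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(120, 3))
y_syn = (450000 + 90000 * X_syn[:, 0] - 60000 * X_syn[:, 1]
         + rng.normal(scale=20000, size=120))

# Features are standardized inside the pipeline; the target is standardized
# and automatically inverse-transformed by TransformedTargetRegressor.
model_scaled = TransformedTargetRegressor(
    regressor=make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(6, 6, 6), solver="lbfgs",
                     max_iter=500, random_state=12),
    ),
    transformer=StandardScaler(),
)

kfold_demo = KFold(n_splits=5, shuffle=True, random_state=5)
scores = cross_val_score(model_scaled, X_syn, y_syn, cv=kfold_demo)

print("Mean R^2: %.2f, std: %.2f" % (scores.mean(), scores.std()))
```

Because scaling happens inside each fold, no statistics from the held-out fold leak into training.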


## **RESULTS**

**SIMPLE LINEAR REGRESSION:** R^2 = 0.57/0.60; RMSE = 99315.5; Cross-Validation R^2: 55.97%

**MULTIPLE LINEAR REGRESSION:** R^2 = 0.73/0.68; RMSE = 96087.3; Cross-Validation R^2: 69.25%

**POLYNOMIAL REGRESSION:** R^2 = 0.59/0.54; RMSE = 114670.6.

**SVR REGRESSION:** R^2 = 0.87/0.81; RMSE = 73422.7. Cross-Validation R^2: 82.37%.

**DECISION TREE REGRESSION:** R^2 = 0.91/0.83; RMSE = 71114.5. Cross-Validation R^2: 74.60%.

**RANDOM FOREST REGRESSION:** R^2 = 0.92/0.85; RMSE = 66729.3. Cross-Validation R^2: 82.85%.

**XGBOOST REGRESSION:** R^2 = 0.93/0.84; RMSE = 67788.8. Cross-Validation R^2: 83.22%.

**LIGHTGBM REGRESSION:** R^2 = 0.88/0.82; RMSE = 71906.4. Cross-Validation R^2: 82.38%.

**CATBOOST REGRESSION:** R^2 = 0.90/0.84; RMSE = 69053.3. Cross-Validation R^2: 83.40%.

**NEURAL NETWORK REGRESSION:** R^2 = 0.88/0.83; RMSE = 69717.4. Cross-Validation R^2: 77.15%. Scaled.

## **Standardization**

In [29]:
from sklearn.preprocessing import StandardScaler
x_scaler = StandardScaler()
x_treino_scaler = x_scaler.fit_transform(x_treino)

In [30]:
x_treino_scaler

array([[ 0.05327517, -0.70150711, -0.05467118],
       [ 1.12799963, -0.44487061, -0.52922816],
       [ 0.60711128, -0.79792304,  0.230063  ],
       ...,
       [-0.33111532, -0.36121561, -0.33940537],
       [-0.31699486,  0.84398345, -0.29194967],
       [-0.33268427, -0.38815536, -0.90887374]])
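`StandardScaler` subtracts each column's mean and divides by its population standard deviation, both learned from the data passed to `fit`. The sketch below checks this against a manual computation on four rows taken from the training features; the names `X_small`, `scaler_demo` and `X_new` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A few rows of training features (RM, LSTAT, PTRATIO).
X_small = np.array([
    [6.266, 7.90, 18.4],
    [6.951, 9.71, 17.4],
    [6.619, 7.22, 19.0],
    [6.021, 10.30, 17.8],
])

scaler_demo = StandardScaler().fit(X_small)
X_small_scaled = scaler_demo.transform(X_small)

# Manual z-score: NumPy's default std (ddof=0) matches StandardScaler.
X_manual = (X_small - X_small.mean(axis=0)) / X_small.std(axis=0)

# New data must reuse the statistics learned from the training rows.
X_new = np.array([[5.834, 8.47, 21.0]])
X_new_scaled = scaler_demo.transform(X_new)

print(np.allclose(X_small_scaled, X_manual))
print(X_new_scaled)
```

This is why the test set is passed through `transform` only: calling `fit_transform` on it would standardize it with its own mean and standard deviation instead of the training ones.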

In [31]:
y_scaler = StandardScaler()
y_treino_scaler = y_scaler.fit_transform(y_treino.reshape(-1,1))

In [32]:
y_treino_scaler

array([[-1.05925606e-02],
       [ 6.46900118e-01],
       [ 2.85923746e-01],
       [-1.13728667e-01],
       [ 1.44111599e-01],
       [-7.84113359e-01],
       [-1.24822584e+00],
       [-2.81324840e-01],
       [-1.39512694e-01],
       [-1.01616960e+00],
       [ 2.00056152e+00],
       [ 1.21414870e+00],
       [ 1.27860877e+00],
       [-1.52404707e-01],
       [ 9.04740384e-01],
       [ 2.29945267e-03],
       [-2.81324840e-01],
       [ 2.73031732e-01],
       [-2.81324840e-01],
       [ 2.73031732e-01],
       [ 4.27735892e-01],
       [ 2.60139719e-01],
       [-1.06773765e+00],
       [-1.17087376e+00],
       [ 3.49603506e+00],
       [ 9.04740384e-01],
       [-2.07331469e+00],
       [ 1.69895626e-01],
       [ 4.27735892e-01],
       [ 2.76119030e+00],
       [-3.45784907e-01],
       [ 2.73031732e-01],
       [-1.48028208e+00],
       [ 1.57003612e-01],
       [-1.91080747e-01],
       [-1.01616960e+00],
       [ 3.37491799e-01],
       [-2.34845740e-02],
       [ 1.9

In [33]:
x_teste_scaler = x_scaler.transform(x_teste)
x_teste_scaler

array([[-6.24507256e-01, -6.20687880e-01,  1.17917695e+00],
       [ 9.56985082e-01, -8.43295235e-01, -2.61727885e+00],
       [-1.30072075e+00,  1.98112421e+00, -1.81053199e+00],
       [ 9.72674490e-01,  9.82935809e-01,  7.99531373e-01],
       [ 2.90185237e-01, -5.72479917e-01, -3.39405369e-01],
       [ 2.72926888e-01,  9.46070896e-01,  7.99531373e-01],
       [-2.08519115e+00,  2.33134087e+00, -1.81053199e+00],
       [-1.88341711e-01, -2.51777567e-02,  7.99531373e-01],
       [-1.08325729e-01, -2.13755962e-01, -2.44493974e-01],
       [-4.18976010e-01,  1.39296468e-01,  1.17917695e+00],
       [-1.41892805e-02,  1.26651206e+00,  7.99531373e-01],
       [-6.19800434e-01,  4.03022379e-01,  7.99531373e-01],
       [-2.99736508e-01, -7.29864736e-01,  5.14797187e-01],
       [-2.51664988e+00,  3.05162455e+00,  7.99531373e-01],
       [ 7.76556889e-01, -4.85989161e-01,  1.13172126e+00],
       [-7.45315699e-01,  6.32719141e-01,  1.27408835e+00],
       [ 7.36714050e-02, -1.24429444e-01

In [34]:
y_teste_scaler = y_scaler.transform(y_teste.reshape(-1,1))
y_teste_scaler

array([[-2.29756787e-01],
       [ 1.08522857e+00],
       [-1.06773765e+00],
       [ 7.50036225e-01],
       [ 1.18327572e-01],
       [-5.90733160e-01],
       [-1.27400987e+00],
       [ 1.18327572e-01],
       [-3.63765873e-02],
       [-2.68432827e-01],
       [-1.48028208e+00],
       [-1.48028208e+00],
       [-1.65296720e-01],
       [-4.87597053e-01],
       [ 7.50036225e-01],
       [-7.84113359e-01],
       [-2.16864774e-01],
       [ 2.08571666e-01],
       [ 9.25435458e-02],
       [ 1.13679662e+00],
       [ 1.20125669e+00],
       [ 1.44620494e+00],
       [-1.89282650e+00],
       [-8.79446405e-02],
       [-5.39165106e-01],
       [ 3.13505869e+00],
       [ 1.84585736e+00],
       [ 2.65805419e+00],
       [ 2.29945267e-03],
       [-6.21606139e-02],
       [-8.09897386e-01],
       [ 2.29945267e-03],
       [-1.52404707e-01],
       [ 2.52913406e+00],
       [-6.21606139e-02],
       [ 2.58070211e+00],
       [-2.42648800e-01],
       [-1.14508973e+00],
       [ 8.2

In [35]:
redes = MLPRegressor(hidden_layer_sizes=(6,6,6), activation='relu', verbose=True, max_iter=1500,
                    solver='lbfgs', random_state = 12)

In [36]:
redes.fit(x_treino_scaler, y_treino_scaler.ravel())

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


MLPRegressor(hidden_layer_sizes=(6, 6, 6), max_iter=1500, random_state=12,
             solver='lbfgs', verbose=True)

In [37]:
redes.n_layers_

5

In [38]:
redes.score(x_treino_scaler, y_treino_scaler)

0.8878832063281447

**TEST**

In [39]:
redes.score(x_teste_scaler, y_teste_scaler)

0.8253823445831391

In [40]:
previsoes_teste_scaler = redes.predict(x_teste_scaler)

In [41]:
previsoes_teste_scaler

array([-2.79009165e-01,  1.32394960e+00, -7.67865923e-01, -8.73478907e-01,
        1.97571627e-01, -9.28368206e-01, -8.11042827e-01, -2.17872279e-01,
       -5.00237481e-02, -4.56693640e-01, -1.17498866e+00, -5.89204783e-01,
        6.52844207e-02, -1.29749796e+00,  8.77587620e-02, -7.61596489e-01,
       -7.12537690e-02,  6.49821804e-01, -1.13740984e-01,  1.18025811e+00,
        1.42467227e+00,  1.62690796e+00, -1.25759132e+00,  1.27222850e-01,
        9.77156047e-03,  2.87652159e+00,  1.84662380e+00,  2.82101544e+00,
       -3.79894239e-01, -1.30881031e-01, -8.50843275e-01, -2.46670900e-01,
        1.09719967e-01,  1.55036148e+00, -8.46596221e-02,  2.96554405e+00,
       -2.83467074e-02, -1.54867050e+00,  9.14054942e-02,  5.60368334e-01,
       -1.15571763e+00, -2.58440758e-01, -1.13871506e-01,  1.17969245e-01,
        5.05797245e-02, -4.32434035e-01,  1.66500867e-01, -8.26679337e-01,
        1.18009296e+00, -1.51548367e-01,  1.57464075e+00, -2.57920674e-02,
        7.55298472e-01, -

## **METRICS**

**Reverting the transformation**

In [42]:
previsoes_teste_scaler

array([-2.79009165e-01,  1.32394960e+00, -7.67865923e-01, -8.73478907e-01,
        1.97571627e-01, -9.28368206e-01, -8.11042827e-01, -2.17872279e-01,
       -5.00237481e-02, -4.56693640e-01, -1.17498866e+00, -5.89204783e-01,
        6.52844207e-02, -1.29749796e+00,  8.77587620e-02, -7.61596489e-01,
       -7.12537690e-02,  6.49821804e-01, -1.13740984e-01,  1.18025811e+00,
        1.42467227e+00,  1.62690796e+00, -1.25759132e+00,  1.27222850e-01,
        9.77156047e-03,  2.87652159e+00,  1.84662380e+00,  2.82101544e+00,
       -3.79894239e-01, -1.30881031e-01, -8.50843275e-01, -2.46670900e-01,
        1.09719967e-01,  1.55036148e+00, -8.46596221e-02,  2.96554405e+00,
       -2.83467074e-02, -1.54867050e+00,  9.14054942e-02,  5.60368334e-01,
       -1.15571763e+00, -2.58440758e-01, -1.13871506e-01,  1.17969245e-01,
        5.05797245e-02, -4.32434035e-01,  1.66500867e-01, -8.26679337e-01,
        1.18009296e+00, -1.51548367e-01,  1.57464075e+00, -2.57920674e-02,
        7.55298472e-01, -

In [43]:
previsoes_teste_inverse = y_scaler.inverse_transform(previsoes_teste_scaler.reshape(-1,1))

In [44]:
previsoes_teste_inverse

array([[409877.20388823],
       [670985.63789757],
       [330246.57011611],
       [313043.10758694],
       [487508.18673451],
       [304102.10477261],
       [323213.41737586],
       [419835.88585231],
       [447176.99285273],
       [380933.90482812],
       [263929.71799006],
       [359348.95970549],
       [465959.7189231 ],
       [243973.98731669],
       [469620.59915949],
       [331267.80781037],
       [443718.80188535],
       [561175.91787095],
       [436797.99364137],
       [647579.50817296],
       [687392.51015981],
       [720334.99374618],
       [250474.44252241],
       [476048.9655771 ],
       [456917.14320416],
       [923886.49237391],
       [756124.84658509],
       [914845.00907664],
       [393443.87808465],
       [434006.02484127],
       [316730.26072148],
       [415144.83391841],
       [473197.89376288],
       [707866.21868829],
       [441535.10172826],
       [938387.49804902],
       [450707.99956037],
       [203060.10373119],
       [4702

In [45]:
y_teste

array([ 417900.,  632100.,  281400.,  577500.,  474600.,  359100.,
        247800.,  474600.,  449400.,  411600.,  214200.,  214200.,
        428400.,  375900.,  577500.,  327600.,  420000.,  489300.,
        470400.,  640500.,  651000.,  690900.,  147000.,  441000.,
        367500.,  966000.,  756000.,  888300.,  455700.,  445200.,
        323400.,  455700.,  430500.,  867300.,  445200.,  875700.,
        415800.,  268800.,  590100.,  497700.,  231000.,  315000.,
        388500.,  449400.,  413700.,  352800.,  453600.,  306600.,
        898800.,  514500.,  743400.,  474600.,  600600.,  304500.,
        661500.,  489300.,  422100.,  184800.,  525000.,  249900.,
        407400.,  361200.,  428400.,  392700.,  428400.,  472500.,
        258300.,  550200.,  346500.,  199500.,  302400.,  611100.,
        396900.,  585900.,  279300.,  483000.,  462000.,  218400.,
        518700.,  420000.,  392700.,  980700.,  455700.,  514500.,
        480900.,  520800.,  485100.,  525000.,  390600.,  5691

In [46]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [47]:
# Mean absolute error (MAE)
mean_absolute_error(y_teste, previsoes_teste_inverse)

53801.85330918682

In [48]:
# Root mean squared error (RMSE)
np.sqrt(mean_squared_error(y_teste, previsoes_teste_inverse))

71178.24536738635
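Reverting the target scaling is what puts MAE and RMSE back in the unit of MEDV. Since standardization is a linear map, an RMSE computed on the scaled target equals the original-unit RMSE divided by the scaler's `scale_`, while R^2 is unchanged. A minimal sketch with a few target values from this notebook and invented prediction offsets (all `*_demo` names are hypothetical):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

y_train_demo = np.array([453600.0, 560700.0, 501900.0, 436800.0, 478800.0])
y_test_demo = np.array([417900.0, 632100.0, 281400.0])

y_scaler_demo = StandardScaler().fit(y_train_demo.reshape(-1, 1))
y_test_scaled_demo = y_scaler_demo.transform(y_test_demo.reshape(-1, 1)).ravel()

# Hypothetical predictions on the scaled target (offsets are made up).
pred_scaled_demo = y_test_scaled_demo + np.array([0.1, -0.2, 0.3])
pred_original_demo = y_scaler_demo.inverse_transform(
    pred_scaled_demo.reshape(-1, 1)
).ravel()

rmse_scaled = np.sqrt(mean_squared_error(y_test_scaled_demo, pred_scaled_demo))
rmse_original = np.sqrt(mean_squared_error(y_test_demo, pred_original_demo))

# The two RMSE values differ exactly by the target's standard deviation.
print(rmse_original, rmse_scaled * y_scaler_demo.scale_[0])

# R^2 is the same on either scale.
print(r2_score(y_test_scaled_demo, pred_scaled_demo),
      r2_score(y_test_demo, pred_original_demo))
```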

**Reverting the transformation**

In [49]:
x_treino_inverse = x_scaler.inverse_transform(x_treino_scaler)

In [50]:
x_treino_inverse

array([[ 6.266,  7.9  , 18.4  ],
       [ 6.951,  9.71 , 17.4  ],
       [ 6.619,  7.22 , 19.   ],
       ...,
       [ 6.021, 10.3  , 17.8  ],
       [ 6.03 , 18.8  , 17.9  ],
       [ 6.02 , 10.11 , 16.6  ]])

In [51]:
x_treino

array([[ 6.266,  7.9  , 18.4  ],
       [ 6.951,  9.71 , 17.4  ],
       [ 6.619,  7.22 , 19.   ],
       ...,
       [ 6.021, 10.3  , 17.8  ],
       [ 6.03 , 18.8  , 17.9  ],
       [ 6.02 , 10.11 , 16.6  ]])

In [52]:
y_treino_inverse = y_scaler.inverse_transform(y_treino_scaler)

In [53]:
y_treino_inverse

array([[ 453600.],
       [ 560700.],
       [ 501900.],
       [ 436800.],
       [ 478800.],
       [ 327600.],
       [ 252000.],
       [ 409500.],
       [ 432600.],
       [ 289800.],
       [ 781200.],
       [ 653100.],
       [ 663600.],
       [ 430500.],
       [ 602700.],
       [ 455700.],
       [ 409500.],
       [ 499800.],
       [ 409500.],
       [ 499800.],
       [ 525000.],
       [ 497700.],
       [ 281400.],
       [ 264600.],
       [1024800.],
       [ 602700.],
       [ 117600.],
       [ 483000.],
       [ 525000.],
       [ 905100.],
       [ 399000.],
       [ 499800.],
       [ 214200.],
       [ 480900.],
       [ 424200.],
       [ 289800.],
       [ 510300.],
       [ 451500.],
       [ 766500.],
       [ 510300.],
       [ 369600.],
       [ 466200.],
       [ 312900.],
       [ 407400.],
       [ 273000.],
       [ 392700.],
       [ 294000.],
       [ 556500.],
       [ 178500.],
       [ 728700.],
       [ 157500.],
       [ 367500.],
       [ 327

In [54]:
x_teste_inverse = x_scaler.inverse_transform(x_teste_scaler)

In [55]:
x_teste_inverse

array([[ 5.834,  8.47 , 21.   ],
       [ 6.842,  6.9  , 13.   ],
       [ 5.403, 26.82 , 14.7  ],
       [ 6.852, 19.78 , 20.2  ],
       [ 6.417,  8.81 , 17.8  ],
       [ 6.406, 19.52 , 20.2  ],
       [ 4.903, 29.29 , 14.7  ],
       [ 6.112, 12.67 , 20.2  ],
       [ 6.163, 11.34 , 18.   ],
       [ 5.965, 13.83 , 21.   ],
       [ 6.223, 21.78 , 20.2  ],
       [ 5.837, 15.69 , 20.2  ],
       [ 6.041,  7.7  , 19.6  ],
       [ 4.628, 34.37 , 20.2  ],
       [ 6.727,  9.42 , 20.9  ],
       [ 5.757, 17.31 , 21.2  ],
       [ 6.279, 11.97 , 18.7  ],
       [ 6.51 ,  7.39 , 14.7  ],
       [ 5.807, 16.03 , 18.6  ],
       [ 6.739,  4.69 , 15.2  ],
       [ 7.327, 11.25 , 13.   ],
       [ 7.135,  4.45 , 17.   ],
       [ 4.519, 36.98 , 20.2  ],
       [ 5.85 ,  8.77 , 19.2  ],
       [ 5.569, 15.1  , 19.2  ],
       [ 7.645,  3.01 , 14.9  ],
       [ 7.333,  7.79 , 13.   ],
       [ 7.61 ,  3.11 , 14.7  ],
       [ 6.395, 13.27 , 20.2  ],
       [ 6.019, 12.92 , 19.2  ],
       [ 6

In [56]:
y_teste_inverse = y_scaler.inverse_transform(y_teste_scaler)

In [57]:
y_teste_inverse

array([[ 417900.],
       [ 632100.],
       [ 281400.],
       [ 577500.],
       [ 474600.],
       [ 359100.],
       [ 247800.],
       [ 474600.],
       [ 449400.],
       [ 411600.],
       [ 214200.],
       [ 214200.],
       [ 428400.],
       [ 375900.],
       [ 577500.],
       [ 327600.],
       [ 420000.],
       [ 489300.],
       [ 470400.],
       [ 640500.],
       [ 651000.],
       [ 690900.],
       [ 147000.],
       [ 441000.],
       [ 367500.],
       [ 966000.],
       [ 756000.],
       [ 888300.],
       [ 455700.],
       [ 445200.],
       [ 323400.],
       [ 455700.],
       [ 430500.],
       [ 867300.],
       [ 445200.],
       [ 875700.],
       [ 415800.],
       [ 268800.],
       [ 590100.],
       [ 497700.],
       [ 231000.],
       [ 315000.],
       [ 388500.],
       [ 449400.],
       [ 413700.],
       [ 352800.],
       [ 453600.],
       [ 306600.],
       [ 898800.],
       [ 514500.],
       [ 743400.],
       [ 474600.],
       [ 600

In [58]:
y_teste

array([ 417900.,  632100.,  281400.,  577500.,  474600.,  359100.,
        247800.,  474600.,  449400.,  411600.,  214200.,  214200.,
        428400.,  375900.,  577500.,  327600.,  420000.,  489300.,
        470400.,  640500.,  651000.,  690900.,  147000.,  441000.,
        367500.,  966000.,  756000.,  888300.,  455700.,  445200.,
        323400.,  455700.,  430500.,  867300.,  445200.,  875700.,
        415800.,  268800.,  590100.,  497700.,  231000.,  315000.,
        388500.,  449400.,  413700.,  352800.,  453600.,  306600.,
        898800.,  514500.,  743400.,  474600.,  600600.,  304500.,
        661500.,  489300.,  422100.,  184800.,  525000.,  249900.,
        407400.,  361200.,  428400.,  392700.,  428400.,  472500.,
        258300.,  550200.,  346500.,  199500.,  302400.,  611100.,
        396900.,  585900.,  279300.,  483000.,  462000.,  218400.,
        518700.,  420000.,  392700.,  980700.,  455700.,  514500.,
        480900.,  520800.,  485100.,  525000.,  390600.,  5691