# Module: Supervised Learning
## Artificial Neural Networks (ANN)

# Lesson objectives
- Learn the fundamentals and formulation of artificial neural networks, focusing on the multilayer perceptron.
- Analyze some of the most important hyperparameters of these models.
- Understand how the backpropagation algorithm works.
- Learn some types of attribute normalization.

# Introduction

## Inspiration
Artificial neural networks (ANN) are inspired by the way neurons work in animal brains. Their development drew on neurological research such as Donald Hebb's work on learning and animal behavior.

<center>
    <img src="figures/brain-1.jpg" width="700"/>
</center>

## Categories and types of ANN
Feedforward networks: information travels "forward", from the input to the output
- Perceptron
- Multilayer perceptron
- Convolutional networks

Radial basis function networks: use a distance criterion with respect to a center

Recurrent networks: information propagates forward and also backward through feedback connections

Modular networks: made up of several "modules" formed by different networks

# Formulation

## Perceptron
The perceptron is the basic unit of several types of ANN. It computes a linear combination of the attributes (plus a bias term $w_0$) and applies an activation function $f$ to the result.

<center>
    <img src="figures/perceptron.png" width="500"/>
</center>

\begin{split}
\hat{y}(w) = f \left( w_0 + \sum_{i=1}^{n} w_{i} x_{i} \right)
\end{split}
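As a minimal sketch, the formula can be evaluated directly with NumPy. The inputs, weights, and the two activation functions (a step and a logistic) are hypothetical choices for illustration only.

```python
import numpy as np

def perceptron(x, w, w0, f):
    """Output of a single perceptron: f(w0 + sum_i w_i * x_i)."""
    return f(w0 + np.dot(w, x))

# Hypothetical attributes, weights and bias
x = np.array([1.0, 2.0, -1.0])
w = np.array([0.5, -0.25, 1.0])
w0 = 0.1

step = lambda z: 1.0 if z >= 0 else 0.0           # step activation
logistic = lambda z: 1.0 / (1.0 + np.exp(-z))     # logistic activation

z = w0 + np.dot(w, x)   # linear combination: 0.1 + (0.5 - 0.5 - 1.0) = -0.9
print(z, perceptron(x, w, w0, step), perceptron(x, w, w0, logistic))
```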

## Multilayer perceptron

Stacking several perceptrons in several layers forms a network called a multilayer perceptron (MLP).

<center>
    <img src="figures/mlp-2.png" width="600"/>
</center>

The first layer receives the attributes (input layer) and the last layer is the output layer, which produces the target variables.

The intermediate layers are called hidden layers because their values are usually not observed directly.

<center>
    <img src="figures/ann1.png" width="500"/>
</center>

In vector notation, a network with $k$ hidden layers applies the following transformations (bias terms omitted; $\mathbf{W}_{p}$ is the weight matrix of layer $p$ and $f$ is applied element-wise):

Input layer to hidden layer:
\begin{split}
    \mathbf{h}_{1} = f \left( \mathbf{W}_{1}^{T} \cdot \mathbf{x}  \right)
\end{split}

Hidden layer to hidden layer:
\begin{split}
    \mathbf{h}_{p+1} = f \left( \mathbf{W}_{p+1}^{T} \cdot \mathbf{h}_{p}  \right)  & \quad 1 \le p \le k-1
\end{split}

Hidden layer to output layer:
\begin{split}
    \mathbf{y} = f \left( \mathbf{W}_{k+1}^{T} \cdot \mathbf{h}_{k}  \right)
\end{split}
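These transformations can be sketched as a loop over the weight matrices. This is a minimal NumPy illustration, assuming a hypothetical architecture (4 inputs, two hidden layers of 5 and 3 units, 2 outputs), random weights, no biases, and the same $f$ (here `tanh`) in every layer, exactly as in the equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, weights, f):
    """Forward pass h_{p+1} = f(W_{p+1}^T h_p), starting from h_0 = x.

    weights is the list W_1, ..., W_{k+1}; W_p has shape
    (units in layer p-1, units in layer p). Biases are omitted.
    """
    h = x
    for W in weights:
        h = f(W.T @ h)
    return h

# Hypothetical sizes: input, hidden layer 1, hidden layer 2 (k = 2), output
sizes = [4, 5, 3, 2]
weights = [rng.normal(size=(n_in, n_out)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=4)
y = forward(x, weights, np.tanh)
print(y.shape)  # (2,)
```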

## Activation functions

The choice of activation function depends on the application and on the design of the ANN.

For regression problems, where the target variable is a real value, the **identity** function is usually used in the output layer:

\begin{split}
    f(x) = x
\end{split}   

For classification problems, a **logistic**-type function is usually used:

\begin{split}
    f(x) = \frac{1}{1 + e^{-x}}
\end{split}

This logistic function is also called the **sigmoid**.

The **hyperbolic tangent** is also commonly used, particularly in hidden layers:

\begin{split}
    f(x) = \tanh (x)
\end{split}

The rectified linear unit (**ReLU**) is used in many applications, particularly in hidden layers, and is especially versatile for approximating data with non-linear behavior:

\begin{split}
    f(x) = \Bigg\{
    \begin{array}{ c c }
    x  & \quad \textrm{if } x \ge 0 \\
    0  & \quad \textrm{if } x < 0
  \end{array}
\end{split}
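The four activations above can be written element-wise in NumPy as a quick reference sketch. The input values are arbitrary; the test also checks the identity $\tanh(x) = 2\,\sigma(2x) - 1$, which relates the hyperbolic tangent to the logistic function.

```python
import numpy as np

def identity(z):
    return z

def logistic(z):
    # sigmoid: 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    # z if z >= 0, else 0 (element-wise maximum with 0)
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
for f in (identity, logistic, tanh, relu):
    print(f.__name__, f(z))
```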

The **softmax** function generalizes the logistic function to classifiers with more than 2 classes ($k$ classes):

\begin{split}
    \sigma(x_{i}) = \frac{ e^{x_{i}} }{ \sum_{j=1}^{k} e^{x_{j}}}
\end{split}

This function is usually applied to the last layer of the network, and its outputs are interpreted as a probability distribution over the classes.

<br><center>
    <img src="figures/softmax.png" width="500"/>
</center>
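A minimal NumPy sketch of softmax follows. Subtracting the maximum score is a common numerical safeguard (it cancels in the ratio, so the result is unchanged) that avoids overflow in `np.exp`; the scores are hypothetical. With two scores $[z, 0]$, softmax gives back the logistic function of $z$.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, shifted by the max score for stability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])   # hypothetical last-layer outputs, k = 3
p = softmax(scores)
print(p, p.sum())                     # probabilities that add up to 1

print(softmax([1000.0, 1000.0]))      # [0.5 0.5], no overflow

# k = 2 recovers the logistic function: softmax([z, 0])[0] = 1 / (1 + e^{-z})
print(softmax([0.7, 0.0])[0], 1 / (1 + np.exp(-0.7)))
```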

# Training

Training an ANN consists of "learning" appropriate weights for all the connections between neurons in all layers.

This is done by optimizing (minimizing) a cost function with some numerical method.

## Cost functions

For continuous target variables (regression), the **mean squared error** cost function is commonly used, where $\hat{y}_{i}(w)$ is the network's prediction for observation $i$ out of $m$:

\begin{split}
    S(w) = \frac{1}{m} \sum_{i=1}^{m} \left( y_{i} - \hat{y}_{i}(w) \right)^{2}
\end{split}

For binary target variables (classification), the **logistic** cost function can be used:

\begin{split}
    S(w) = - \sum_{i=1}^{m} \left[ y_{i} \log \left[ \hat{y}_{i}(w) \right] + (1 - y_{i}) \log \left[ 1 - \hat{y}_{i}(w) \right] \right]
\end{split}

In both cases, regularization can (and should) also be considered. In scikit-learn's MLP models, the `alpha` parameter controls an L2 penalty on the weights.
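The two costs can be sketched in NumPy, together with an L2 penalty added to either one. The penalty form $\frac{\alpha}{2}\sum w^2$ is an assumption for illustration (scikit-learn's internal scaling of `alpha` may differ), and all targets, predictions and weights are hypothetical.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: (1/m) * sum_i (y_i - y_hat_i)^2."""
    return np.mean((y - y_hat) ** 2)

def logistic_cost(y, y_hat, eps=1e-12):
    """Logistic cost: -sum_i [y_i log(y_hat_i) + (1 - y_i) log(1 - y_hat_i)]."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def l2_penalty(weights, alpha):
    """Assumed L2 regularization term: (alpha / 2) * sum of all squared weights."""
    return 0.5 * alpha * sum(np.sum(W ** 2) for W in weights)

# Hypothetical targets, predictions and weight matrices
y_reg, y_hat_reg = np.array([3.0, -0.5, 2.0]), np.array([2.5, 0.0, 2.0])
y_bin, p_bin = np.array([1, 0, 1]), np.array([0.9, 0.2, 0.6])
weights = [np.ones((2, 2)), np.ones(3)]

alpha = 0.01
print(mse(y_reg, y_hat_reg) + l2_penalty(weights, alpha))
print(logistic_cost(y_bin, p_bin) + l2_penalty(weights, alpha))
```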

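For multiclass classification with $k$ classes, the **cross-entropy** cost function is common. For one observation whose class is encoded as a one-hot vector ($y_i = 1$ for the true class and $0$ otherwise), it is the following, and the total cost sums it over all observations:

\begin{split}
    S(w) = - \sum_{i=1}^{k} y_{i} \log [\hat{y}_{i}(w)]
\end{split}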

Combined with a softmax output layer, this cost generalizes the logistic cost function. The lower the probability predicted for the true class, the higher the cost (and vice versa).
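A short NumPy sketch makes both statements concrete: a low probability for the true class gives a high cost, and with $k = 2$ the cross-entropy reduces to the logistic cost. The scores and the probability `q` are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(y_onehot, y_hat, eps=1e-12):
    """Cross-entropy for one observation: -sum_i y_i log(y_hat_i)."""
    return -np.sum(y_onehot * np.log(np.clip(y_hat, eps, 1.0)))

# One observation whose true class is class 0 out of k = 3 (hypothetical scores)
y = np.array([1.0, 0.0, 0.0])
confident_right = softmax(np.array([3.0, 0.0, 0.0]))  # high probability for class 0
confident_wrong = softmax(np.array([0.0, 3.0, 0.0]))  # low probability for class 0
cost_right = cross_entropy(y, confident_right)
cost_wrong = cross_entropy(y, confident_wrong)
print(cost_right, cost_wrong)  # lower probability for the true class -> higher cost

# k = 2 with one-hot y = [0, 1]: same value as the logistic cost with y = 1
q = 0.7
binary_ce = cross_entropy(np.array([0.0, 1.0]), np.array([1 - q, q]))
logistic = -(1 * np.log(q) + (1 - 1) * np.log(1 - q))
print(binary_ce, logistic)
```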

<br><center>
    <img src="figures/entropy-1.png" width="900"/>
</center>

## Backpropagation

An algorithm called backpropagation is what lets the ANN "learn" the optimal weight values: it computes the gradient of the cost function with respect to every weight, and a variant of gradient descent uses those gradients to update the weights.

Typical procedure:
- The weights are initialized randomly
- When an attribute vector enters the network, the values of the hidden and output layers are computed going forward (forward propagation)
- The gradients of the cost function with respect to the weights are computed from the output layer back to the input layer using the chain rule
- As a result, the weights are adjusted from back to front (backpropagation)
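The four steps can be sketched in NumPy for a toy network with one `tanh` hidden layer, a linear output and no biases. The data, layer sizes and learning rate are hypothetical; the gradients are derived by hand with the chain rule, and the test checks them against finite differences.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: m = 20 observations, 2 attributes, real-valued target
X = rng.normal(size=(20, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

# Step 1: random initialization (4 hidden tanh units, linear output)
W1 = rng.normal(scale=0.5, size=(2, 4))
W2 = rng.normal(scale=0.5, size=4)

def loss_and_grads(W1, W2):
    # Step 2: forward propagation
    H = np.tanh(X @ W1)          # hidden layer, shape (20, 4)
    y_hat = H @ W2               # output layer, shape (20,)
    err = y_hat - y
    loss = 0.5 * np.mean(err ** 2)
    # Step 3: gradients from the output back to the input (chain rule)
    d_yhat = err / len(y)                       # dS/dy_hat
    gW2 = H.T @ d_yhat                          # dS/dW2
    dA = np.outer(d_yhat, W2) * (1 - H ** 2)    # dS/d(pre-activation), tanh' = 1 - tanh^2
    gW1 = X.T @ dA                              # dS/dW1
    return loss, gW1, gW2

# Step 4: adjust the weights with gradient descent (learning rate L)
L = 0.1
losses = []
for _ in range(200):
    loss, gW1, gW2 = loss_and_grads(W1, W2)
    losses.append(loss)
    W1 -= L * gW1
    W2 -= L * gW2

print(losses[0], losses[-1])  # the cost decreases during training
```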


**Example**

Consider the following ANN:

<center>
    <img src="figures/backpropagation-1a.png" width="500"/>
</center>

The equations relating the layers are as follows (identity activations and no bias terms):

\begin{split}
    h_1 = w_1 x_1 + w_3 x_2
\end{split}

\begin{split}
    h_2 = w_2 x_1 + w_4 x_2
\end{split}

\begin{split}
    \hat{y} = w_5 h_1 + w_6 h_2
\end{split}

We will use a squared-error cost function (the factor $\frac{1}{2}$ simplifies the derivative):

\begin{split}
    S(w) = \frac{1}{2} \sum_{i=1}^{m} (\hat{y}_{i} - y_{i})^{2}
\end{split}

Initially, the weights are set randomly:

<br><center>
    <img src="figures/backpropagation-2.png" width="500"/>
</center>

Suppose the training set contains the observation $(x_1, x_2)$ = (8, 5) with target variable $y$ = 3.

First we perform forward propagation to compute the values of the hidden layer and of the output layer (the prediction).

<center>
    <img src="figures/backpropagation-3.png" width="500"/>
</center>

Note that the prediction differs from the target by $\hat{y}-y$ = -0.76.

We update the weights by computing the gradients from the output back towards the input. We assume a learning rate $L=0.01$ and start with $w_6$:

\begin{split}
    \frac{\partial S}{\partial w_6} = \frac{\partial S}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_6} = (\hat{y} - y) h_2
\end{split}

\begin{split}
    w_6' = w_6 - L \frac{\partial S}{\partial w_6} = w_6 - L (\hat{y} - y) h_2
\end{split}

Since $\hat{y} - y = -0.76$ is negative and $h_2 = 4.9$, the weight increases:

\begin{split}
    w_6' = 0.2 + 0.01 \cdot 0.76 \cdot 4.9 = 0.23724
\end{split}

For $w_5$ the procedure is analogous and gives:

\begin{split}
    w_5' = 0.61596
\end{split}

Now we keep moving backwards through the ANN and update $w_4$:

\begin{split}
    \frac{\partial S}{\partial w_4} = \frac{\partial S}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial h_2}  \cdot\frac{\partial h_2}{\partial w_4} = (\hat{y} - y) w_6 x_2
\end{split}

\begin{split}
    w_4' = w_4 - L \frac{\partial S}{\partial w_4} = w_4 - L (\hat{y} - y) w_6 x_2
\end{split}

\begin{split}
    w_4' = 0.5 + 0.01 \cdot 0.76 \cdot 0.2 \cdot 5 = 0.5076
\end{split}

For $w_1$, $w_2$ and $w_3$ the procedure is analogous (each gradient uses the weight $w_5$ or $w_6$ on the path to the output, taken before the update) and gives:

\begin{split}
    w_3' = 0.1228
\end{split}

\begin{split}
    w_2' = 0.31216
\end{split}

\begin{split}
    w_1' = 0.23648
\end{split}

After adjusting the weights with this observation, the ANN goes from this:

<br><center>
    <img src="figures/backpropagation-2.png" width="500"/>
</center>

to this:

<br><center>
    <img src="figures/backpropagation-4a.png" width="500"/>
</center>

We feed the same observation $(x_1, x_2)$ = (8, 5) into the ANN again to see whether the prediction improved. The correct value is $y$ = 3 and the previous prediction was $\hat{y}$ = 2.24.

<br><center>
    <img src="figures/backpropagation-5.png" width="500"/>
</center>

<br>The new prediction is $\hat{y} \approx$ 2.738. The weight update improved the ANN's performance!
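The whole update step can be checked with a few lines of plain Python. Only $w_4 = 0.5$ and $w_6 = 0.2$ are written in the text; the initial values of $w_1$, $w_2$, $w_3$ and $w_5$ below are an assumption, inferred so that they reproduce the reported $h_2 = 4.9$, $\hat{y} = 2.24$, $w_1' = 0.23648$ and $w_5' = 0.61596$, and should be compared with the initial-weights figure.

```python
# Initial weights (w1, w2, w3, w5 inferred from the reported values; see lead-in)
w = {1: 0.2, 2: 0.3, 3: 0.1, 4: 0.5, 5: 0.6, 6: 0.2}
x1, x2, y = 8.0, 5.0, 3.0
L = 0.01  # learning rate

def forward(w):
    """Forward propagation with identity activations and no biases."""
    h1 = w[1] * x1 + w[3] * x2
    h2 = w[2] * x1 + w[4] * x2
    return h1, h2, w[5] * h1 + w[6] * h2

h1, h2, y_hat = forward(w)
delta = y_hat - y  # dS/dy_hat for S = 1/2 (y_hat - y)^2

# Chain-rule gradients, all evaluated with the weights *before* the update
grads = {
    6: delta * h2,
    5: delta * h1,
    4: delta * w[6] * x2,
    2: delta * w[6] * x1,
    3: delta * w[5] * x2,
    1: delta * w[5] * x1,
}
w_new = {i: w[i] - L * g for i, g in grads.items()}

print(h1, h2, y_hat, delta)
print({i: round(w_new[i], 5) for i in sorted(w_new)})
print(forward(w_new)[2])  # new prediction, closer to y = 3
```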

# Multilayer perceptron in Scikit-learn

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html
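As a quick sketch (assuming a recent scikit-learn version), both estimators expose the fitted weight matrices and the output activation they selected, which links the API to the formulation above. The datasets and the single 6-neuron hidden layer are illustrative choices; a `ConvergenceWarning` may appear because the data are not scaled.

```python
from sklearn.datasets import load_diabetes, load_wine
from sklearn.neural_network import MLPClassifier, MLPRegressor

X_c, y_c = load_wine(return_X_y=True)      # 13 attributes, 3 classes
X_r, y_r = load_diabetes(return_X_y=True)  # 10 attributes, real-valued target

# One hidden layer with 6 neurons in both models (illustrative choice)
clf = MLPClassifier(hidden_layer_sizes=(6,), max_iter=500, random_state=0).fit(X_c, y_c)
reg = MLPRegressor(hidden_layer_sizes=(6,), max_iter=500, random_state=0).fit(X_r, y_r)

# Output-layer activation chosen automatically by scikit-learn
print(clf.out_activation_)  # 'softmax' (multiclass)
print(reg.out_activation_)  # 'identity' (regression)

# coefs_[p] is the weight matrix between layer p and layer p + 1
print([W.shape for W in clf.coefs_])  # [(13, 6), (6, 3)]
print(clf.n_layers_)                  # 3: input, hidden, output
```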

# "Wine recognition" example

- Multilayer perceptron for classification
- Influence of hyperparameters
- Cross-validation with grid search

<center>
    <img src="figures/wine.jpg" width="300"/>
</center>

https://scikit-learn.org/dev/datasets/toy_dataset.html#wine-recognition-dataset


```python
from sklearn import datasets
import pandas as pd
from IPython.display import display  # built into Jupyter; needed outside notebooks

X, y = datasets.load_wine(return_X_y=True, as_frame=True)  # load attributes "X" and target "y"
X.insert(0, "target", y)
display(X)  # data set
```

| | target | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 0 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 |
| 2 | 0 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
| 3 | 0 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 |
| 4 | 0 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 173 | 2 | 13.71 | 5.65 | 2.45 | 20.5 | 95.0 | 1.68 | 0.61 | 0.52 | 1.06 | 7.70 | 0.64 | 1.74 | 740.0 |
| 174 | 2 | 13.40 | 3.91 | 2.48 | 23.0 | 102.0 | 1.80 | 0.75 | 0.43 | 1.41 | 7.30 | 0.70 | 1.56 | 750.0 |
| 175 | 2 | 13.27 | 4.28 | 2.26 | 20.0 | 120.0 | 1.59 | 0.69 | 0.43 | 1.35 | 10.20 | 0.59 | 1.56 | 835.0 |
| 176 | 2 | 13.17 | 2.59 | 2.37 | 20.0 | 120.0 | 1.65 | 0.68 | 0.53 | 1.46 | 9.30 | 0.60 | 1.62 | 840.0 |


```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = datasets.load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=11)

clf = MLPClassifier(random_state=4)  # default hyperparameters
clf.fit(X_train, y_train)

y_train_predict = clf.predict(X_train)
y_test_predict = clf.predict(X_test)

print('Training accuracy: ', accuracy_score(y_train, y_train_predict))
print('Test accuracy: ', accuracy_score(y_test, y_test_predict))
```

```
Training accuracy:  0.5639097744360902
Test accuracy:  0.5111111111111111
```


```python
clf = MLPClassifier(hidden_layer_sizes=(6, 2), activation='tanh', solver='sgd', alpha=0.00001, random_state=4)
clf.fit(X_train, y_train)

y_train_predict = clf.predict(X_train)
y_test_predict = clf.predict(X_test)

print('Training accuracy: ', accuracy_score(y_train, y_train_predict))
print('Test accuracy: ', accuracy_score(y_test, y_test_predict))
```

```
Training accuracy:  0.39849624060150374
Test accuracy:  0.4
```
```python
# Comparison with logistic regression
from sklearn.linear_model import LogisticRegression

logregre = LogisticRegression()
logregre.fit(X_train, y_train)

y_train_predict = logregre.predict(X_train)
y_test_predict = logregre.predict(X_test)

print('Training accuracy = ', accuracy_score(y_train, y_train_predict))
print('Test accuracy = ', accuracy_score(y_test, y_test_predict))
```

```
Training accuracy =  0.9774436090225563
Test accuracy =  0.9111111111111111
```

```
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
```


```python
from sklearn.model_selection import GridSearchCV

# () means no hidden layer; (6) is the integer 6, i.e. one hidden layer with 6 neurons
parameters = {'hidden_layer_sizes':[(), (6), (9, 3)], 'activation':('identity', 'relu', 'tanh'), 'solver':('sgd', 'adam', 'lbfgs'), 'alpha':[0.0001, 0.01, 1.]}
clf = MLPClassifier(random_state=4)
gridcv = GridSearchCV(clf, parameters, scoring='accuracy', cv=5)
gridcv.fit(X_train, y_train)
```

```
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
```

```python
print(gridcv.best_index_)
print(gridcv.best_params_)
```

```
5
{'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': 6, 'solver': 'lbfgs'}
```


```python
# Parameters, rank and mean cross-validation accuracy of every combination
mydict = {'params':gridcv.cv_results_['params'], 'rank_test_score':gridcv.cv_results_['rank_test_score'], 'mean_test_score':gridcv.cv_results_['mean_test_score']}
mydata = pd.DataFrame.from_dict(mydict)
display(mydata)
```

| | params | rank_test_score | mean_test_score |
|---|---|---|---|
| 0 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 64 | 0.368091 |
| 1 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 34 | 0.398291 |
| 2 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 18 | 0.531624 |
| 3 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 34 | 0.398291 |
| 4 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 34 | 0.398291 |
| ... | ... | ... | ... |
| 76 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 12 | 0.592877 |
| 77 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 7 | 0.797151 |
| 78 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 34 | 0.398291 |
| 79 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 8 | 0.639316 |


```python
display(mydata[mydata.rank_test_score == 1])

print(mydata.iloc[20]['params'])
print(mydata.iloc[47]['params'])
print(mydata.iloc[5]['params'])
```

| | params | rank_test_score | mean_test_score |
|---|---|---|---|
| 5 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 1 | 0.933048 |

```
{'activation': 'identity', 'alpha': 1.0, 'hidden_layer_sizes': (), 'solver': 'lbfgs'}
{'activation': 'relu', 'alpha': 1.0, 'hidden_layer_sizes': (), 'solver': 'lbfgs'}
{'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': 6, 'solver': 'lbfgs'}
```


```python
clf = MLPClassifier(hidden_layer_sizes=(6), activation='identity', solver='lbfgs', alpha=0.0001, random_state=4)
clf.fit(X_train, y_train)

y_train_predict = clf.predict(X_train)
y_test_predict = clf.predict(X_test)

print('Training accuracy: ', accuracy_score(y_train, y_train_predict))
print('Test accuracy: ', accuracy_score(y_test, y_test_predict))
```

```
Training accuracy:  0.9624060150375939
Test accuracy:  0.9111111111111111
```

```
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
```


# Other aspects of training

## Number of iterations (or epochs)

In quasi-Newton solvers such as L-BFGS, the number of iterations is the number of times an initial solution is refined. These solvers approximate Newton's method which, to minimize a function $f$, looks for a zero of its derivative with the update:

\begin{split}
    x_{n+1} = x_{n} - \frac{f'(x_{n})}{f''(x_{n})}
\end{split}
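To show what an "iteration" means here, this is a minimal sketch of plain Newton's method on a hypothetical one-dimensional function, $f(x) = x - \ln x$ (minimum at $x = 1$). L-BFGS itself does not compute $f''$ exactly; it approximates the curvature from previous gradients, so this is only an illustration of the update rule.

```python
def newton_minimize(df, d2f, x0, tol=1e-12, max_iter=50):
    """Newton's method for a 1-D minimum: x_{n+1} = x_n - f'(x_n) / f''(x_n).

    Returns the estimate and the number of iterations performed.
    """
    x = x0
    for n in range(1, max_iter + 1):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:  # stop when the update becomes negligible
            return x, n
    return x, max_iter

# Hypothetical example: f(x) = x - ln(x)
df = lambda x: 1 - 1 / x     # f'(x)
d2f = lambda x: 1 / x ** 2   # f''(x)
x_min, n_iter = newton_minimize(df, d2f, x0=0.5)
print(x_min, n_iter)
```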

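In gradient-descent solvers (Adam or SGD), the iterations are called epochs.

An epoch is one full pass over the training set, so the number of epochs is the number of times each observation (attributes, target) is used to train the ANN. In scikit-learn's MLP models with the `sgd` or `adam` solvers, `max_iter` counts epochs.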

<center>
    <img src="figures/epochs.png" width="600"/>
</center>

## Number of batches

In gradient-descent solvers, the weights can be updated in 3 ways, depending on how the training set is split:

- Point by point (stochastic)
- Full batch
- Mini-batches

With full batches or mini-batches, the gradients are computed jointly for several observations at once: the observations are stacked into a matrix (one row per observation), so the computations operate on matrices instead of single attribute vectors.

Regardless of how the training set is split, each strategy can be run for several epochs.
<center>
    <img src="figures/batches.png" width="800"/>
</center>
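A minimal NumPy sketch of the three strategies, assuming a hypothetical linear-regression problem instead of a full ANN so that the loop structure stays visible: `batch_size=1` is stochastic, `batch_size=len(X)` is full batch, and anything in between is mini-batch. With the same number of epochs, smaller batches perform more weight updates.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, epochs, lr=0.05, seed=0):
    """Linear regression trained by gradient descent on (mini-)batches.

    Returns the weights and the number of weight updates performed.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    updates = 0
    for _ in range(epochs):                 # one epoch = one pass over all observations
        order = rng.permutation(m)          # reshuffle the observations every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]         # batch matrix: one row per observation
            grad = Xb.T @ (Xb @ w - yb) / len(idx)  # gradient of 1/2 * mean squared error
            w -= lr * grad
            updates += 1
    return w, updates

# Hypothetical noiseless data generated from known weights
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

results = {}
for bs in (1, 10, 100):  # stochastic, mini-batch, full batch
    results[bs] = minibatch_gd(X, y, batch_size=bs, epochs=50)
    print(bs, results[bs][1], np.round(results[bs][0], 3))
```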

**Reflection: why is it necessary to train with several epochs and mini-batches?**

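## Data normalization

It is generally good practice to normalize the data before training a model.

This avoids distortions caused by the different numeric ranges of the attributes: attributes with large values would otherwise dominate the weighted sums and the gradients, which also slows down the convergence of gradient-based solvers.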

<center>
    <img src="figures/normalize-1.png" width="800"/>
</center>


**Some Scikit-learn scalers**:

**StandardScaler** removes the mean and divides by the standard deviation of each attribute:

\begin{align}
    \tilde{\mathbf{x_i}} = \frac{\mathbf{x_i} - \bar{x_i}}{S_{x_{i}}}
\end{align}

**MaxAbsScaler** divides by the maximum absolute value of each attribute:

\begin{align}
    \tilde{\mathbf{x_i}} = \frac{\mathbf{x_i}}{ \text{max} |\mathbf{x_i}| }
\end{align}

Other scalers include **MinMaxScaler** and **RobustScaler**.
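A short sketch on the wine data checks that the scikit-learn scalers match the formulas above. Note that `StandardScaler` uses the population standard deviation (`ddof=0`); `MinMaxScaler` maps each attribute to $[0, 1]$ and `RobustScaler` centers on the median and scales by the interquartile range.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, RobustScaler, StandardScaler

X, _ = load_wine(return_X_y=True)

# StandardScaler: (x - mean) / std, per attribute
X_std = StandardScaler().fit_transform(X)
print(np.allclose(X_std, (X - X.mean(axis=0)) / X.std(axis=0)))  # True

# MaxAbsScaler: x / max|x|, per attribute
X_maxabs = MaxAbsScaler().fit_transform(X)
print(np.allclose(X_maxabs, X / np.abs(X).max(axis=0)))  # True

X_minmax = MinMaxScaler().fit_transform(X)   # each attribute mapped to [0, 1]
X_robust = RobustScaler().fit_transform(X)   # median and interquartile range

# Raw maxima differ by orders of magnitude (e.g. hue vs. proline)
print(X.max(axis=0).round(2))
```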


```python
from sklearn.model_selection import GridSearchCV
import pandas as pd

parameters = {'max_iter':[200, 10000], 'hidden_layer_sizes':[(), (6), (9, 3)], 'activation':('identity', 'relu', 'tanh'), 'solver':('sgd', 'adam', 'lbfgs'), 'alpha':[0.0001, 0.01, 1.]}
clf = MLPClassifier(random_state=4)
gridcv = GridSearchCV(clf, parameters, scoring='accuracy', cv=5)
gridcv.fit(X_train, y_train)
```


ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION

  ret = a @ b
  tmp = X - X.max(axis=1)[:, np.newaxis]
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_resul

ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_r

  ret = a @ b
  ret = a @ b
  tmp = X - X.max(axis=1)[:, np.newaxis]
  ret = a @ b
  ret = a @ b
  tmp = X - X.max(axis=1)[:, np.newaxis]
  ret = a @ b
  tmp = X - X.max(axis=1)[:, np.newaxis]
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
  ret = a @ b
  ret = a @ b
  tmp = X - X.max(axis=1)[:, np.newaxis]
  ret = a @ b
  ret = a @ b
  tmp = X - X.max(axis=1)[:, np.newaxis]
  ret = a @ b
  ret = a @ b
  tmp = X - X.max(axis=1)[:, np.newaxis]
  ret = a @ b
  ret = a @ b
  tmp = X - X.max(axis=1)[:, np.newaxis]
  ret = a @ b
  tmp = X - X.max(axis=1)[:, np.newaxis]
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/

  ret = a @ b
  tmp = X - X.max(axis=1)[:, np.newaxis]
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html

In [47]:
print(gridcv.best_index_)
print(gridcv.best_params_)

101
{'activation': 'relu', 'alpha': 1.0, 'hidden_layer_sizes': 6, 'max_iter': 10000, 'solver': 'lbfgs'}
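In recent scikit-learn versions, `best_index_` is the position in `cv_results_` of the first candidate whose `rank_test_score` is 1, and `best_params_` is the `params` entry at that position. This matters when several candidates tie for first place. The minimal sketch below checks that relationship. It uses scikit-learn's wine dataset and a small grid as stand-ins (assumptions: the notebook's data loading and full grid are outside this section).

```python
import warnings

import numpy as np
from sklearn.datasets import load_wine
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): the wine dataset, scaled with training statistics
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_train = StandardScaler().fit(X_train).transform(X_train)

# Small illustrative grid (not the notebook's grid)
param_grid = {
    "activation": ["relu", "tanh"],
    "alpha": [1e-4, 1.0],
    "hidden_layer_sizes": [(6,), (9, 3)],
}
gridcv = GridSearchCV(
    MLPClassifier(solver="lbfgs", max_iter=2000, random_state=4),
    param_grid,
    cv=5,
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    gridcv.fit(X_train, y_train)

ranks = gridcv.cv_results_["rank_test_score"]
tied_best = np.flatnonzero(ranks == 1)  # all candidates sharing the best mean score
print("Candidates with rank 1:", tied_best)
print("best_index_:", gridcv.best_index_)
# best_params_ is the params dict stored at best_index_
print(gridcv.cv_results_["params"][gridcv.best_index_] == gridcv.best_params_)
```

When there are ties, the lowest index among them is reported, which would be consistent with the notebook reporting 101 while rows 155 and 161 share rank 1.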


In [49]:
mydict = {'params':gridcv.cv_results_['params'], 'rank_test_score':gridcv.cv_results_['rank_test_score'], 'mean_test_score':gridcv.cv_results_['mean_test_score']}
mydata = pd.DataFrame.from_dict(mydict)
display(mydata)

| | params | rank_test_score | mean_test_score |
|---|---|---|---|
| 0 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 130 | 0.368091 |
| 1 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 70 | 0.398291 |
| 2 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 38 | 0.531624 |
| 3 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 130 | 0.368091 |
| 4 | {'activation': 'identity', 'alpha': 0.0001, 'h... | 70 | 0.398291 |
| ... | ... | ... | ... |
| 157 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 25 | 0.639316 |
| 158 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 12 | 0.844160 |
| 159 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 70 | 0.398291 |
| 160 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 11 | 0.886040 |
| 161 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 1 | 0.969801 |
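The `params` column above is truncated when displayed. `cv_results_` also stores one `param_<name>` column per hyperparameter, so converting the whole dictionary to a DataFrame gives a table where each hyperparameter is readable on its own. A minimal sketch, again assuming the wine dataset and a small illustrative grid as stand-ins:

```python
import warnings

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption), scaled with training statistics only
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_train = StandardScaler().fit(X_train).transform(X_train)

gridcv = GridSearchCV(
    MLPClassifier(solver="lbfgs", max_iter=2000, random_state=4),
    {"activation": ["relu", "tanh"], "alpha": [1e-4, 1.0],
     "hidden_layer_sizes": [(6,), (9, 3)]},
    cv=5,
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    gridcv.fit(X_train, y_train)

results = pd.DataFrame(gridcv.cv_results_)
# One column per hyperparameter instead of a single truncated dict
param_cols = [c for c in results.columns if c.startswith("param_")]
summary = results[param_cols + ["mean_test_score", "std_test_score", "rank_test_score"]]
summary = summary.sort_values("rank_test_score")
print(summary.to_string())
```

Including `std_test_score` shows how much the score varies across folds, which can help decide between candidates that tie on the mean.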


In [50]:
display(mydata[mydata.rank_test_score == 1])
print(mydata.iloc[38]['params'])
print(mydata.iloc[41]['params'])
print(mydata.iloc[92]['params'])
print(mydata.iloc[95]['params'])
print(mydata.iloc[146]['params'])
print(mydata.iloc[149]['params'])
print(mydata.iloc[161]['params'])

| | params | rank_test_score | mean_test_score |
|---|---|---|---|
| 101 | {'activation': 'relu', 'alpha': 1.0, 'hidden_l... | 1 | 0.969801 |
| 155 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 1 | 0.969801 |
| 161 | {'activation': 'tanh', 'alpha': 1.0, 'hidden_l... | 1 | 0.969801 |


{'activation': 'identity', 'alpha': 1.0, 'hidden_layer_sizes': (), 'max_iter': 200, 'solver': 'lbfgs'}
{'activation': 'identity', 'alpha': 1.0, 'hidden_layer_sizes': (), 'max_iter': 10000, 'solver': 'lbfgs'}
{'activation': 'relu', 'alpha': 1.0, 'hidden_layer_sizes': (), 'max_iter': 200, 'solver': 'lbfgs'}
{'activation': 'relu', 'alpha': 1.0, 'hidden_layer_sizes': (), 'max_iter': 10000, 'solver': 'lbfgs'}
{'activation': 'tanh', 'alpha': 1.0, 'hidden_layer_sizes': (), 'max_iter': 200, 'solver': 'lbfgs'}
{'activation': 'tanh', 'alpha': 1.0, 'hidden_layer_sizes': (), 'max_iter': 10000, 'solver': 'lbfgs'}
{'activation': 'tanh', 'alpha': 1.0, 'hidden_layer_sizes': (9, 3), 'max_iter': 10000, 'solver': 'lbfgs'}
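Several of the printed candidates use `hidden_layer_sizes: ()`, that is, a network with no hidden layer. The inputs then connect directly to the softmax output, so the model is essentially a multinomial logistic regression with an L2 penalty, and the `activation` parameter, which only applies to hidden layers, has no effect. The sketch below checks this by fitting three such models that differ only in `activation` (the wine dataset and the split are assumptions). If this holds for the notebook's data too, it would explain why rows 38, 92 and 146 (and 41, 95 and 149) should obtain the same cross-validation score.

```python
import warnings

import numpy as np
from sklearn.datasets import load_wine
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): wine dataset, 13 features and 3 classes
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_train_s = StandardScaler().fit(X_train).transform(X_train)

models = {}
for act in ["identity", "relu", "tanh"]:
    # hidden_layer_sizes=() -> input layer connected directly to the output layer
    clf = MLPClassifier(hidden_layer_sizes=(), activation=act, solver="lbfgs",
                        alpha=1.0, max_iter=10000, random_state=4)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        clf.fit(X_train_s, y_train)
    models[act] = clf

ref = models["identity"]
print("Layers (input + output):", ref.n_layers_)
print("Weight matrix shapes:", [W.shape for W in ref.coefs_])
for act, clf in models.items():
    # Same initialization and same loss -> same learned weights
    print(f"{act:>8}: same weights as identity -> {np.allclose(clf.coefs_[0], ref.coefs_[0])}")
```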


In [54]:
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier

# Best configuration from the grid search, trained on the unscaled attributes
clf = MLPClassifier(hidden_layer_sizes=(6,), activation='relu', solver='lbfgs', alpha=1.0, max_iter=10000, random_state=4)
clf.fit(X_train, y_train)

y_train_predict = clf.predict(X_train)
y_test_predict = clf.predict(X_test)

print('Training accuracy:', accuracy_score(y_train, y_train_predict))
print('Test accuracy:', accuracy_score(y_test, y_test_predict))

Training accuracy: 0.9924812030075187
Test accuracy: 0.9111111111111111


In [55]:
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Same configuration as before, now trained on the standardized attributes
clf = MLPClassifier(hidden_layer_sizes=(6,), activation='relu', solver='lbfgs', alpha=1.0, max_iter=10000, random_state=4)
clf.fit(X_train_scaled, y_train)

y_train_predict = clf.predict(X_train_scaled)
y_test_predict = clf.predict(X_test_scaled)

print('Training accuracy:', accuracy_score(y_train, y_train_predict))
print('Test accuracy:', accuracy_score(y_test, y_test_predict))

Training accuracy: 1.0
Test accuracy: 0.9777777777777777
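`StandardScaler` learns the per-attribute mean $\mu$ and standard deviation $\sigma$ from the data passed to `fit` and then applies $z = (x - \mu)/\sigma$ to any data passed to `transform`. Min-max normalization, $x' = (x - x_{\min})/(x_{\max} - x_{\min})$, is another common option. The sketch below reproduces both by hand with statistics taken from the training set only; the wine dataset and the split are stand-ins (assumptions).

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Stand-in data (assumption): scikit-learn's wine dataset
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardization: mean and std estimated on the training set only
scaler = StandardScaler().fit(X_train)
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0)
X_test_manual = (X_test - mu) / sigma
print("StandardScaler matches (x - mu) / sigma:",
      np.allclose(scaler.transform(X_test), X_test_manual))

# Min-max normalization: min and max also come from the training set
mm = MinMaxScaler().fit(X_train)
x_min = X_train.min(axis=0)
x_max = X_train.max(axis=0)
X_test_mm = mm.transform(X_test)
print("MinMaxScaler matches (x - min) / (max - min):",
      np.allclose(X_test_mm, (X_test - x_min) / (x_max - x_min)))
# Test values may fall slightly outside [0, 1] because min/max are training values
print("Test range after min-max:", X_test_mm.min(), X_test_mm.max())
```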


**Reflection: why is the normalization object fitted only on the training set and not on all of the data?**

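# Chaining steps with a pipeline

The scikit-learn `Pipeline` class chains preprocessing steps (such as scaling) and a final estimator into a single estimator, so that they are fitted and applied together: https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html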
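A minimal sketch of a pipeline that combines the scaler with the MLP and tunes it with `GridSearchCV`. Because the scaler is a step of the pipeline, it is refitted on the training portion of each cross-validation fold, and at prediction time the test data is transformed with training statistics. Step names (`scaler`, `mlp`), the grid and the wine dataset are illustrative assumptions; parameters of a step are addressed as `<step name>__<parameter>`.

```python
import warnings

from sklearn.datasets import load_wine
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): raw, unscaled wine attributes
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(solver="lbfgs", max_iter=2000, random_state=4)),
])

# "<step name>__<parameter>" selects a parameter of a pipeline step
param_grid = {
    "mlp__hidden_layer_sizes": [(6,), (9, 3)],
    "mlp__activation": ["relu", "tanh"],
    "mlp__alpha": [1e-4, 1.0],
}
gridcv = GridSearchCV(pipe, param_grid, cv=5)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    gridcv.fit(X_train, y_train)  # scaler refitted inside every training fold

print(gridcv.best_params_)
# predict() scales X_test with statistics learned from X_train only
test_acc = accuracy_score(y_test, gridcv.predict(X_test))
print("Test accuracy:", test_acc)
```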

# Summary
- Multilayer perceptron neural networks approximate the target variable by composing several layers of linear combinations of the attributes, each followed by an activation function.
- These models are usually trained with the backpropagation algorithm, which propagates the error from the output layer back toward the input and uses it to update the connection weights.
- Hyperparameter tuning is essential for these models, since many more hyperparameters influence their performance (number and size of hidden layers, activation function, regularization `alpha`, solver, number of iterations).
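To make the summary's description of backpropagation concrete, the sketch below runs one forward pass and one backward pass through a hypothetical 3-4-2 network (tanh hidden layer, softmax output, cross-entropy loss) on random toy data. It checks one analytic gradient against a finite difference and then applies a single plain gradient-descent step. This is an illustration only, not scikit-learn's implementation, which also adds the L2 penalty `alpha` and, with `solver='lbfgs'`, uses a quasi-Newton optimizer instead of plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 8 samples, 3 attributes, 2 classes
X = rng.normal(size=(8, 3))
y = rng.integers(0, 2, size=8)
Y = np.eye(2)[y]  # one-hot targets

# Weights of a 3-4-2 network (one hidden layer with 4 units)
W1 = rng.normal(scale=0.5, size=(3, 4))
b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 2))
b2 = np.zeros(2)


def forward(W1, b1, W2, b2):
    """Forward pass: mean cross-entropy loss plus cached activations."""
    a1 = np.tanh(X @ W1 + b1)                  # hidden layer: linear combination + tanh
    z2 = a1 @ W2 + b2                          # output layer: linear combination
    z2 = z2 - z2.max(axis=1, keepdims=True)    # shift for numerical stability
    p = np.exp(z2) / np.exp(z2).sum(axis=1, keepdims=True)  # softmax
    loss = -np.mean(np.sum(Y * np.log(p), axis=1))
    return loss, a1, p


def backward(W2, a1, p):
    """Backward pass: gradients of the loss with respect to weights and biases."""
    d2 = (p - Y) / X.shape[0]          # output error (softmax + cross-entropy)
    gW2, gb2 = a1.T @ d2, d2.sum(axis=0)
    d1 = (d2 @ W2.T) * (1 - a1 ** 2)   # error propagated back through tanh
    gW1, gb1 = X.T @ d1, d1.sum(axis=0)
    return gW1, gb1, gW2, gb2


loss, a1, p = forward(W1, b1, W2, b2)
gW1, gb1, gW2, gb2 = backward(W2, a1, p)

# Compare one analytic gradient with a central finite difference
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (forward(W1_plus, b1, W2, b2)[0] - forward(W1_minus, b1, W2, b2)[0]) / (2 * eps)
print("analytic:", gW1[0, 0], "numeric:", numeric)

# One plain gradient-descent update of all parameters
lr = 0.1
W1, b1 = W1 - lr * gW1, b1 - lr * gb1
W2, b2 = W2 - lr * gW2, b2 - lr * gb2
loss_after = forward(W1, b1, W2, b2)[0]
print("loss before:", loss, "after:", loss_after)
```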