## Instrucciones generales 

1. Forme un grupo de **máximo tres estudiantes**
1. Versione su trabajo usando un **repositorio <font color="red">privado</font> de github**. Agregue a sus compañeros y a su profesor (usuario github: phuijse) en la pestaña *Settings/Manage access*. No se aceptarán consultas si la tarea no está en github. No se evaluarán tareas que no estén en github.
1. Se evaluará el **resultado, la profundidad de su análisis y la calidad/orden de sus códigos** en base al último commit antes de la fecha y hora de entrega". Se bonificará a quienes muestren un método de trabajo incremental y ordenado según el histórico de *commits*
1. Sean honestos, ríganse por el [código de ética de la ACM](https://www.acm.org/about-acm/code-of-ethics-in-spanish)

## Integrantes:

**Manuel Jaramillo, Christian Matzner, Felipe Salazar**

# Mi primera Red Neuronal Bayesiana

Las redes neuronales son modelos del estado del arte para hacer regresión y clasificación con datos complejos

Generalmente estos modelos requieren de una gran cantidad de datos para poder entrenarlos de forma efectiva y sin que se sobreajusten. Sin embargo, en algunos problemas los datos disponibles son simplemente muy escasos o muy difíciles de obtener. Adicionalmente, no es directo tomar decisiones en base al modelo, y se requiere un paso adicional de calibración. ¿Cómo podemos confiar en las decisiones del modelo?

Podemos intentar solucionar estos problemas escribiendo la red neuronal como un modelo bayesiano y aprender el posterior de sus parámetros con un método de Markov Chain Monte Carlo (siempre y cuando el modelo sea simple). 

Incorporando priors el modelo estará regularizado y en lugar de estimadores puntuales tendremos la distribución a posteriori completa. Esta rica información extra nos permite medir la confianza del modelo sobre sus predicciones (el modelo sabe cuando no sabe) facilitando la tarea de calibración. 



## Formulación clásica

En esta tarea se pide que programen un modelo de red neuronal para clasificación de datos bidimensionales, de dos clases, con una capa oculta y con función de activación sigmoidal

Sea el conjunto de datos y etiquetas

$$
\mathcal{D} = \{(x, y)^{(i)}, i=1,2,\ldots,N\} \quad x^{(i)} \in \mathbb{R}^2,  y^{(i)} \in \{0, 1\}
$$

Consideremos ahora una tupla en particular $(X, Y)$. La salida de la capa oculta en notación matricial es

$$
Z = \text{sigmoide}( W_Z X + B_Z)
$$

donde $W_Z \in \mathbb{R}^{M \times 2}$, $B_Z \in \mathbb{R}^{M}$ y  $M$ es el tamaño de la capa oculta

La salida de la capa visible (última capa) en notación matricial es

$$
Y = \text{sigmoide}( W_Y Z + B_Y)
$$

donde $W_Y \in \mathbb{R}^{1 \times M}$, $B_Y \in \mathbb{R}$

La función sigmoide se define como

$$
\text{sigmoide}(x) = \frac{1}{1+ e^{-x}}
$$

Luego $Z$ es un vector de largo $M$ con valores en $[0, 1]$ e $Y$ es un escalar con valor en el rango $[0, 1]$

## Formulación bayesiana

Para darle un toque bayesiano a este modelo debemos

- Definir priors para $W_Z$, $B_Z$, $W_Y$ y $B_Y$. Se pide que utilice priors **normales con media cero y desviación estándar diez**.
- Definir una verosimilitud para le problema. Dado que el problema es de clasificación binaria, utilice una distribución de **Bernoulli** con $p=Y$
- Considere los datos $X$ como una variable determínista. 

## Indicaciones

Utilice

- El atributo `shape` para darle la dimensión correcta a cada variable cada uno
- El atributo `observed` para asignar las etiquetas reales a esta variable aleatoria observada
- `pm.Data` para la variable independiente
- `theano.tensor.sigmoid` para calcular la función sigmoide
- `A.dot(B)` para calcular el producto matricial entre `A` y `B`




## Instrucciones específicas

- Considere el dataset sintético `two-moons` que se muestra a continuación. Se pide que realice dos experimentos variando el valor de `n_samples`, primero a $100$ y finalmente a $10$
- Implemente el modelo de red neuronal bayesiana en `pymc3` dejando $M$ como un argumento. Para cada valor de `n_samples` entrene tres modelos con $M=1$, $M=3$ y $M=10$
- Seleccione y calibre un algoritmo de MCMC para entrenar este modelo. Justifique y respalde sus decisiones en base al comportamiento de las trazas, al estadístico Gelman-Rubin y a la función de autocorrelación
- Estudie el posterior de los parámetros y evalue el posterior predictivo sobre los datos de prueba. Muestre graficamente la media y la varianza del posterior predictivo en el espacio de los datos. Haga observaciones y comparaciones entre los 6 casos (3 valores de $M$ y 2 valores de `n_samples`)

## 1) Utilizando n_samples de valores 10 y 100.

### 1.1) n_sample = 10

In [1]:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons

n_samples=10
x, y = make_moons(n_samples, # Varie este parámetro
                  shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

fig, ax = plt.subplots(figsize=(6, 3), tight_layout=True)
ax.scatter(x[y==0, 0], x[y==0, 1], marker='o')
ax.scatter(x[y==1, 0], x[y==1, 1], marker='x')

x1, x2 = np.meshgrid(np.linspace(-3, 3, 100), 
                     np.linspace(-3, 3, 100))
x_test = np.vstack([x1.ravel(), x2.ravel()]).T

<IPython.core.display.Javascript object>

### 1.2) n_sample = 100

In [2]:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
import warnings

n_samples=100
x, y = make_moons(n_samples, # Varie este parámetro
                  shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

fig, ax = plt.subplots(figsize=(6, 3), tight_layout=True)
ax.scatter(x[y==0, 0], x[y==0, 1], marker='o')
ax.scatter(x[y==1, 0], x[y==1, 1], marker='x')

x1, x2 = np.meshgrid(np.linspace(-3, 3, 100), 
                     np.linspace(-3, 3, 100))
x_test = np.vstack([x1.ravel(), x2.ravel()]).T

<IPython.core.display.Javascript object>

## 2) Red neuronal bayesiana para M = 1, 3 y 10

In [116]:
#Modelo pymc3
import pymc3 as pm
import theano.tensor as T
from theano.tensor.nnet import sigmoid
mu=0
def modelo(M):
    with pm.Model() as bayes_reg:
        #Considerar los datos X como una variable deterministica
        X_data = pm.Data("x", x)
        # Se definen los prior
        Wz = pm.Normal(name='Wz', mu=0, sd=10, shape=(2,M))
        Bz = pm.Normal(name='Bz', mu=0, sd=10, shape=(M))
        Wy = pm.Normal(name='Wy', mu=0, sd=10, shape=(M,1))
        By = pm.Normal(name='By', mu=0, sd=10, shape=(1))

        Z=pm.Deterministic('Z', sigmoid(T.dot(x,Wz)+Bz))
        Y=pm.Deterministic('Y', sigmoid(T.dot(Z,Wy)+By))

        #Verosimilitud
        Y_obs = pm.Bernoulli('Y_obs', p=Y, observed=y)
    return bayes_reg

### 2.1) n_samples = 10

#### 2.1.1) M = 1

In [117]:
M=1
n_samples=10

In [118]:
#Entrenamiento de modelo con MCMC
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modelo(M):
    trace1_10 = pm.sample(draws=100, tune=500, chains=2, cores=2, step=pm.NUTS())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [By, Wy, Bz, Wz]
Sampling 2 chains, 0 divergences: 100%|██████████| 1200/1200 [00:03<00:00, 328.79draws/s]
The acceptance probability does not match the target. It is 0.6623066835269523, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The number of effective samples is smaller than 10% for some parameters.


In [6]:
pm.traceplot(trace1_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"


In [7]:
pm.plot_posterior(trace1_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);


<IPython.core.display.Javascript object>

In [8]:
pm.plots.autocorrplot(trace1_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E02BD71088>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02BE86D08>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02BEB9EC8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02BEF27C8>]],
      dtype=object)

In [9]:
pm.summary(trace1_10, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
"Wz[0,0]",0.332,9.955,-19.595,17.581,0.95,0.7,110.0,102.0,113.0,90.0,1.0
"Wz[1,0]",-0.729,10.442,-19.82,18.337,0.856,0.704,149.0,111.0,149.0,110.0,1.05
Bz[0],-0.319,10.473,-20.317,18.318,0.764,0.824,188.0,81.0,192.0,179.0,0.99
"Wy[0,0]",-4.139,8.315,-18.976,9.138,0.919,0.652,82.0,82.0,88.0,183.0,1.01
By[0],-10.465,6.291,-22.766,-1.148,0.662,0.469,90.0,90.0,89.0,115.0,1.04


**Análisis:** Al saber que entrenaremos un modelo con 10 datos y solo 1 neurona, podemos inferir que la preducción no sera la mas óptima por la falta de información, esto se ve claramente reflejado en un comienzo, cuando los graficos de las trazas muestran un comportamiento bastante errático, en comparación a lo que sería una comportamiento aceptable como lo es el similar al ruido blanco. Por otro lado, aún podemos ver notables correlaciones, lo que nos dice que el modelo aun no es aceptable. Por último, si vemos la última tabla (sumary), la variable r_hat nos muestra valores entre 1.01 y 1.04, lo que nos dice que el modelo esta dentro de los parametros de aceptados pero, al no haber alguno en 1.00 y si hay cercano a 1.05 podemos decir claramente que este modelo que se puede mejorar.

#### 2.1.2) M = 3

In [10]:
M = 3

In [11]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modelo(M):
    trace3_10 = pm.sample(draws=100, tune=500, chains=2, cores=4, step=pm.NUTS())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [By, Wy, Bz, Wz]
Sampling 2 chains, 0 divergences: 100%|██████████| 1200/1200 [00:07<00:00, 151.00draws/s]
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.


In [12]:
pm.traceplot(trace3_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"


In [13]:
pm.summary(trace3_10, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
"Wz[0,0]",-0.58,8.692,-16.681,15.352,0.578,0.602,227.0,105.0,224.0,187.0,1.0
"Wz[0,1]",0.158,10.037,-15.96,23.346,0.795,0.725,159.0,96.0,168.0,105.0,1.02
"Wz[0,2]",-0.96,11.197,-22.706,20.118,0.721,0.941,241.0,71.0,232.0,122.0,1.01
"Wz[1,0]",0.035,9.024,-17.872,14.558,0.603,0.635,224.0,102.0,233.0,137.0,1.0
"Wz[1,1]",-1.063,11.804,-21.152,21.594,1.16,0.958,104.0,77.0,107.0,83.0,1.0
"Wz[1,2]",0.2,9.233,-21.561,17.183,0.676,0.631,186.0,108.0,184.0,152.0,1.0
Bz[0],-0.247,10.7,-21.413,18.184,0.728,0.84,216.0,82.0,223.0,180.0,1.05
Bz[1],-0.779,10.714,-18.68,18.61,0.813,0.812,174.0,88.0,174.0,154.0,1.01
Bz[2],-0.023,8.968,-17.792,14.97,0.757,0.635,140.0,100.0,142.0,154.0,1.05
"Wy[0,0]",-3.327,8.921,-19.914,12.352,0.805,0.571,123.0,123.0,122.0,132.0,1.01


In [14]:
pm.plots.autocorrplot(trace3_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E02A3BC648>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02A49E708>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02A587908>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02A618308>]],
      dtype=object)

In [15]:
pm.plot_posterior(trace3_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

<IPython.core.display.Javascript object>

**Análisis:** Aún que hemos aumentado la cantidad de neuronas de 1 a 3, la verdad no podemos ver alguna mejora notable, los posteriosrs siguen irregulares, las trazas siguen lejos de llevar forma de ruido blanco, sigue existiendo bastante correlacion y por último el r_hat varía desde 1.00 hasta 1.07, lo que demuestra que si bien hay parametros que estan clavados en 1.00, hay otros que escapan de lo admisible.

#### 2.1.3) M = 10

In [16]:
M = 10

In [17]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modelo(M):
    trace10_10 = pm.sample(draws=100, tune=500, chains=2, cores=4, step=pm.NUTS())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [By, Wy, Bz, Wz]
Sampling 2 chains, 0 divergences: 100%|██████████| 1200/1200 [00:17<00:00, 68.67draws/s] 
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The number of effective samples is smaller than 25% for some parameters.


In [18]:
pm.traceplot(trace10_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_d

In [19]:
pm.summary(trace10_10, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
"Wz[0,0]",0.12,10.5,-21.358,18.274,0.924,0.746,129.0,100.0,128.0,80.0,1.01
"Wz[0,1]",1.134,9.874,-14.056,26.465,0.651,0.692,230.0,103.0,221.0,138.0,1.0
"Wz[0,2]",-0.684,8.624,-18.001,15.954,0.836,0.685,106.0,80.0,106.0,117.0,1.02
"Wz[0,3]",0.496,10.036,-16.711,18.583,0.583,0.651,296.0,119.0,356.0,165.0,1.01
"Wz[0,4]",0.181,9.847,-16.746,20.093,0.916,0.688,116.0,103.0,110.0,123.0,1.01
"Wz[0,5]",-0.943,10.025,-21.649,15.08,0.656,0.736,234.0,93.0,246.0,137.0,1.0
"Wz[0,6]",0.83,9.759,-18.823,16.526,0.866,0.662,127.0,109.0,124.0,122.0,1.0
"Wz[0,7]",-0.175,10.023,-19.913,19.433,0.64,0.781,245.0,83.0,235.0,153.0,0.99
"Wz[0,8]",-0.013,9.758,-16.938,17.627,0.778,0.58,157.0,142.0,151.0,162.0,1.0
"Wz[0,9]",-0.184,9.791,-20.62,15.18,0.676,0.665,210.0,109.0,206.0,154.0,1.01


In [20]:
pm.plots.autocorrplot(trace10_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E02EB5D3C8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02EB4DD08>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02EC21A08>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02EC555C8>]],
      dtype=object)

In [21]:
pm.plot_posterior(trace10_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

<IPython.core.display.Javascript object>

**Análisis:** Podemos ver una leve mejora al aplicar 10 neuronas en la capa, reflejando unas trazas un poco mas estables, un r_hat un poco mas cercano a 1.00 y la correlacion se comienza a estabilizar, sin embargo los posterior siguen muy irregulares.

### 2.2) n_samples = 100

#### 2.2.1) M = 1

In [107]:
M = 1
n_samples=100


In [108]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modelo(M):
    trace1_100 = pm.sample(draws=100, tune=500, chains=2, cores=4, step=pm.NUTS())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [By, Wy, Bz, Wz]
Sampling 2 chains, 8 divergences: 100%|██████████| 1200/1200 [00:47<00:00, 25.37draws/s]
The acceptance probability does not match the target. It is 0.9797705045601213, but should be close to 0.8. Try to increase the number of tuning steps.
There were 8 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The number of effective samples is smaller than 25% for some parameters.


In [109]:
pm.traceplot(trace1_100, figsize=(6, 3), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

ImportError: cannot import name '_cov' from 'arviz.utils' (/home/felipe/anaconda3/lib/python3.8/site-packages/arviz/utils.py)

In [25]:
pm.summary(trace1_100, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
"Wz[0,0]",0.289,9.99,-15.482,21.053,0.9,0.706,123.0,101.0,137.0,131.0,1.03
"Wz[1,0]",-1.218,9.705,-17.79,17.075,0.828,0.62,137.0,123.0,139.0,134.0,1.0
Bz[0],-0.971,10.472,-23.69,13.431,1.059,0.835,98.0,79.0,106.0,74.0,1.0
"Wy[0,0]",-3.706,7.804,-16.818,11.021,0.86,0.705,82.0,62.0,84.0,97.0,1.03
By[0],-11.26,5.248,-22.002,-4.111,0.594,0.439,78.0,72.0,92.0,103.0,1.01


In [26]:
pm.plots.autocorrplot(trace1_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E02E1C8FC8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02E1B8908>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02E2B0C08>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E030D6EAC8>]],
      dtype=object)

In [27]:
pm.plot_posterior(trace1_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

<IPython.core.display.Javascript object>

**Respuesta:** Ahora estamos utilizando diez veces mas datos que en el apartado 2.1, sin embargo al comparar con los resultados obtenidos con 1 neurona, vemos un peor desempeño en todos los sentidos, un r_hat que varia entre 1.02 y 1.09, una correlación bastante marcada, posterior muy irregular y unas trazas que no definen alguna forma.

#### 2.2.2) M = 3

In [28]:
M = 3

In [29]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modelo(M):
    trace3_100 = pm.sample(draws=100, tune=500, chains=2, cores=4, step=pm.NUTS())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [By, Wy, Bz, Wz]
Sampling 2 chains, 4 divergences: 100%|██████████| 1200/1200 [00:09<00:00, 123.01draws/s]
There were 2 divergences after tuning. Increase `target_accept` or reparameterize.
There were 2 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The number of effective samples is smaller than 25% for some parameters.


In [30]:
pm.traceplot(trace3_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"


In [31]:
pm.summary(trace3_100, var_names=['Wz', 'Bz', 'Wy', 'By'])

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
"Wz[0,0]",-0.444,9.62,-22.119,16.8,0.593,0.702,263.0,95.0,239.0,194.0,1.0
"Wz[0,1]",1.5,10.381,-20.949,19.826,1.026,1.129,102.0,43.0,116.0,45.0,1.02
"Wz[0,2]",-0.121,9.407,-19.414,14.542,0.822,0.582,131.0,131.0,132.0,147.0,1.0
"Wz[1,0]",-0.329,9.873,-20.941,16.761,0.771,0.759,164.0,85.0,160.0,124.0,1.0
"Wz[1,1]",-0.042,9.428,-17.563,14.605,1.285,0.914,54.0,54.0,50.0,123.0,1.05
"Wz[1,2]",-0.339,9.297,-17.834,14.991,0.865,0.651,116.0,102.0,110.0,96.0,1.01
Bz[0],0.158,11.74,-22.518,18.325,1.619,1.151,53.0,53.0,63.0,66.0,1.03
Bz[1],0.582,10.174,-18.494,19.557,0.792,0.686,165.0,111.0,168.0,73.0,1.02
Bz[2],-1.123,10.544,-21.911,18.989,0.98,0.728,116.0,105.0,117.0,129.0,1.01
"Wy[0,0]",-3.846,8.425,-18.486,12.069,0.894,0.634,89.0,89.0,87.0,123.0,1.0


In [32]:
pm.plots.autocorrplot(trace3_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E02F72FAC8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E030B19AC8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E030913C88>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E02F720208>]],
      dtype=object)

In [33]:
pm.plot_posterior(trace3_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

<IPython.core.display.Javascript object>

**Respuesta:** Podemos ver una mejora en algunos parámetros, un posterior con forma bastante mas regular, una correlación bastante menos marcada, sin embargo, aún no podemos ver una traza que tenga forma de ruido blanco, o un r_hat que lotre estar muy cerca de 1.00.

#### 2.2.3) M = 10

In [35]:
M = 10

In [36]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modelo(M):
    trace10_100 = pm.sample(draws=100, tune=500, chains=2, cores=4, step=pm.NUTS())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [By, Wy, Bz, Wz]
Sampling 2 chains, 0 divergences: 100%|██████████| 1200/1200 [00:17<00:00, 69.10draws/s]
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.


In [37]:
pm.traceplot(trace10_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_d

In [38]:
pm.summary(trace10_100, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
"Wz[0,0]",-0.421,9.919,-22.055,15.154,0.718,0.57,191.0,152.0,194.0,187.0,1.02
"Wz[0,1]",0.167,8.879,-14.718,17.169,0.693,0.515,164.0,149.0,168.0,171.0,1.01
"Wz[0,2]",-0.403,9.88,-19.773,17.463,0.731,0.594,183.0,139.0,182.0,142.0,1.01
"Wz[0,3]",0.869,11.187,-20.07,21.803,0.813,0.951,189.0,70.0,191.0,106.0,1.02
"Wz[0,4]",1.207,10.78,-18.989,22.172,1.01,0.77,114.0,99.0,118.0,74.0,1.06
"Wz[0,5]",1.145,9.273,-21.43,14.731,0.719,0.635,166.0,107.0,163.0,129.0,1.0
"Wz[0,6]",-0.851,9.379,-17.041,16.912,0.711,0.796,174.0,70.0,197.0,121.0,1.01
"Wz[0,7]",0.953,9.324,-13.742,20.738,0.684,0.817,186.0,66.0,195.0,73.0,1.0
"Wz[0,8]",0.976,10.321,-16.891,18.788,0.882,0.67,137.0,119.0,134.0,135.0,1.01
"Wz[0,9]",0.312,8.884,-16.37,16.929,0.967,1.055,84.0,36.0,92.0,46.0,1.03


In [39]:
pm.plots.autocorrplot(trace10_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E031AB0608>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E031AF7888>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E0317BE0C8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E031665CC8>]],
      dtype=object)

In [40]:
pm.plot_posterior(trace10_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

<IPython.core.display.Javascript object>

**Respuesta:** Podemos observar que los resultados no son muy diferentes al resto, sin embargo, podemos notar una pequeña mejora respecto a los anteriores, de manera que el r_hat mas alto es de 1.03 y hay muchos en 1.00, las trazas estan comenzando a tomar una forma de ruido blanco. Por otro lado, los posteriors siguen bastante irregulares y no logramos eliminar hasta niveles aceptables la correlación.

## 3) Seleccione y calibre un algoritmo de MCMC para entrenar este modelo. Justifique y respalde sus decisiones en base al comportamiento de las trazas, al estadístico Gelman-Rubin y a la función de autocorrelación

### Modelo utilizado
##### El modelo utilizado es NUTS ya que en comparación a  Metropolis Hastings....
**Tu tienes que justificar tu elección de NUTS, puedes hacerlo mediante una comparaicón con Metropolis por ejemplo** $Palabras$ $del$ $profe$

In [42]:
def modeloHamilton(M):
    with pm.Model() as bayes_reg:
        #Considerar los datos X como una variable deterministica
        X_data = pm.Data("x", x)
        # Se definen los prior
        Wz = pm.Normal(name='Wz', mu=0, sd=10, shape=())
        Bz = pm.Normal(name='Bz', mu=0, sd=10, shape=())
        Wy = pm.Normal(name='Wy', mu=0, sd=10, shape=())
        By = pm.Normal(name='By', mu=0, sd=10, shape=())

        Z=sigmoid(T.dot(x,Wz)+Bz)
        Y=sigmoid(T.dot(Z,Wy)+By)

        #Verosimilitud
        Y_obs = pm.Bernoulli('Y_obs', p=Y, observed=Y)
        return bayes_reg

### Número de Muestras igual a 10

In [43]:
n_samples=10

#### M = 1

In [44]:
M = 1

In [45]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modeloHamilton(M):
    traceH1_10 = pm.sample(draws=100, tune=500, chains=2, cores=2, step=pm.Metropolis())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Metropolis: [By]
>Metropolis: [Wy]
>Metropolis: [Bz]
>Metropolis: [Wz]
Sampling 2 chains, 0 divergences: 100%|██████████| 1200/1200 [00:01<00:00, 693.64draws/s]
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The number of effective samples is smaller than 10% for some parameters.


In [46]:
pm.traceplot(traceH1_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"


In [47]:
pm.summary(traceH1_10, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
Wz,-0.28,9.958,-19.014,17.993,1.709,1.219,34.0,34.0,43.0,53.0,1.11
Bz,1.703,9.357,-15.22,19.041,2.449,1.767,15.0,15.0,14.0,59.0,1.1
Wy,-4.737,7.066,-18.022,7.642,1.338,0.956,28.0,28.0,28.0,58.0,1.09
By,-7.913,6.722,-17.709,6.866,1.134,0.808,35.0,35.0,39.0,13.0,1.08


In [48]:
pm.plots.autocorrplot(traceH1_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E033C140C8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E033C0E4C8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E032A19888>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E034DC4D08>]],
      dtype=object)

In [49]:
pm.plot_posterior(traceH1_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

**Análisis:** Podemos observar que el comportamiento de las trazas es bastante errático, muy alejado de lo que sería un ruido blanco. De la misma forma podemos ver los comportamientos de los posterior, tomando una forma amorfa. Al observar la tabla de resumen podemos notar que lo r_hat estan bastante lejos del 1.00 e incluso están por arriba del 1.05, lo que nos confirma que es un modelo muy deficiente. Si lo comparamos con el otro algoritmo de MCMC (NUTS()), podemos ver que se comporta de manera mas deficiente en todos los aspectos estudiados.

#### M = 3

In [50]:
M = 3

In [51]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modeloHamilton(M):
    traceH3_10 = pm.sample(draws=100, tune=500, chains=2, cores=2, step=pm.Metropolis())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Metropolis: [By]
>Metropolis: [Wy]
>Metropolis: [Bz]
>Metropolis: [Wz]
Sampling 2 chains, 0 divergences: 100%|██████████| 1200/1200 [00:01<00:00, 702.17draws/s] 
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The number of effective samples is smaller than 10% for some parameters.


In [52]:
pm.traceplot(traceH3_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

  _, axes = plt.subplots(len(plotters), 2, squeeze=False, figsize=figsize, **backend_kwargs)


<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"


In [53]:
pm.summary(traceH3_10, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
Wz,-2.547,10.118,-20.814,11.339,1.455,1.035,48.0,48.0,50.0,61.0,1.04
Bz,-2.366,9.438,-16.107,16.805,1.09,0.934,75.0,52.0,79.0,86.0,1.01
Wy,-2.862,9.539,-18.544,16.035,1.886,1.349,26.0,26.0,22.0,60.0,1.12
By,-13.098,7.396,-25.056,-1.879,1.865,1.344,16.0,16.0,17.0,20.0,1.11


In [54]:
pm.plots.autocorrplot(traceH3_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E034F23848>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E034709A88>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E034FB0388>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E031A00B08>]],
      dtype=object)

In [55]:
pm.plot_posterior(traceH3_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

#### M = 10

In [56]:
M = 10

In [57]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modeloHamilton(M):
    traceH10_10 = pm.sample(draws=100, tune=500, chains=2, cores=2, step=pm.Metropolis())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Metropolis: [By]
>Metropolis: [Wy]
>Metropolis: [Bz]
>Metropolis: [Wz]
Sampling 2 chains, 0 divergences: 100%|██████████| 1200/1200 [00:01<00:00, 707.97draws/s]
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The number of effective samples is smaller than 10% for some parameters.


In [58]:
pm.traceplot(traceH10_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

  _, axes = plt.subplots(len(plotters), 2, squeeze=False, figsize=figsize, **backend_kwargs)


<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"


In [59]:
pm.summary(traceH10_10, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
Wz,1.731,9.62,-16.186,16.483,1.315,0.935,54.0,54.0,54.0,72.0,1.05
Bz,0.456,9.897,-17.745,17.878,1.479,1.053,45.0,45.0,50.0,63.0,1.06
Wy,-5.636,7.815,-17.989,8.287,2.3,1.669,12.0,12.0,13.0,45.0,1.14
By,-9.662,6.396,-23.712,-1.343,1.117,0.831,33.0,30.0,38.0,64.0,1.08


In [60]:
pm.plots.autocorrplot(traceH10_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E034F40F88>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E031954A08>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E034F30148>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E034668B48>]],
      dtype=object)

In [61]:
pm.plot_posterior(traceH10_10, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

**Análisis:** Podemos observar que al aumentar el número de meustras, el algoritmo no logra modelarse a un punto aceptable, teniendo un r_hat desde 1.04 hasta 1.10, bastante correlacion en general, unos posteriors deficientes y trazas erráticas. Se concluye entonces que con 1 sola neurona, el algoritmo **Metrópolis()** es peor que el **NUTS()**.

### Numero de muestras igual a 100

In [62]:
n_samples=100

#### M = 1

In [63]:
M = 1

In [64]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modeloHamilton(M):
    traceH1_100 = pm.sample(draws=100, tune=500, chains=2, cores=2, step=pm.Metropolis())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Metropolis: [By]
>Metropolis: [Wy]
>Metropolis: [Bz]
>Metropolis: [Wz]
Sampling 2 chains, 0 divergences: 100%|██████████| 1200/1200 [00:01<00:00, 636.94draws/s] 
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The number of effective samples is smaller than 10% for some parameters.


In [65]:
pm.traceplot(traceH1_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

  _, axes = plt.subplots(len(plotters), 2, squeeze=False, figsize=figsize, **backend_kwargs)


<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"


In [66]:
pm.summary(traceH1_100, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
Wz,-0.135,7.629,-14.335,12.786,1.268,0.904,36.0,36.0,37.0,36.0,1.09
Bz,0.659,10.828,-14.492,20.048,5.353,4.094,4.0,4.0,5.0,17.0,1.42
Wy,-1.547,10.443,-17.55,19.906,3.718,2.733,8.0,8.0,8.0,16.0,1.18
By,-11.221,6.357,-25.312,-0.942,1.469,1.055,19.0,19.0,20.0,21.0,1.11


In [67]:
pm.plots.autocorrplot(traceH1_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E037F0D888>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E037ECA888>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E037F965C8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E038382908>]],
      dtype=object)

In [68]:
pm.plot_posterior(traceH1_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

**Análisis:** Podemos ver una mejora respecto al modelo entrenada con solo 10 samples, el r_hat va de 1.01 hasta 1.07 y los posteriors comienzan a tener forma, sin embargo, hay mucha correlacion a esta altura y las trazas siguen teniendo un comportamiento deficiente. Al compararlo con **NUTS()** podemos seguir afirmando que **Metrópolis()** sigue en desventaja notablemente.


#### M = 3

In [69]:
M = 3

In [70]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modeloHamilton(M):
    traceH3_100 = pm.sample(draws=100, tune=500, chains=2, cores=2, step=pm.Metropolis())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Metropolis: [By]
>Metropolis: [Wy]
>Metropolis: [Bz]
>Metropolis: [Wz]
Sampling 2 chains, 0 divergences: 100%|██████████| 1200/1200 [00:01<00:00, 628.93draws/s]
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The number of effective samples is smaller than 10% for some parameters.


In [71]:
pm.traceplot(traceH3_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

  _, axes = plt.subplots(len(plotters), 2, squeeze=False, figsize=figsize, **backend_kwargs)


<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"


In [72]:
pm.summary(traceH3_100, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
Wz,2.74,9.323,-17.133,17.919,1.453,1.035,41.0,41.0,42.0,66.0,1.05
Bz,3.87,10.653,-11.832,24.552,3.436,2.507,10.0,10.0,11.0,15.0,1.17
Wy,-4.299,8.816,-20.77,10.034,2.957,2.275,9.0,8.0,13.0,11.0,1.15
By,0.865,17.542,-21.68,29.85,10.328,8.203,3.0,3.0,4.0,26.0,1.58


In [73]:
pm.plots.autocorrplot(traceH3_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E03B257BC8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E03B1EBD88>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E03B0E5448>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E03B219D48>]],
      dtype=object)

In [74]:
pm.plot_posterior(traceH3_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

#### M = 10

In [75]:
M = 10

In [76]:
x, y = make_moons(n_samples, shuffle=True, noise=0.2, random_state=123456)
x = (x - np.mean(x, keepdims=True))/np.std(x, keepdims=True)

with modeloHamilton(M):
    traceH10_100 = pm.sample(draws=100, tune=500, chains=2, cores=2, step=pm.Metropolis())

Only 100 samples in chain.
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Metropolis: [By]
>Metropolis: [Wy]
>Metropolis: [Bz]
>Metropolis: [Wz]
Sampling 2 chains, 0 divergences: 100%|██████████| 1200/1200 [00:02<00:00, 586.80draws/s] 
The rhat statistic is larger than 1.2 for some parameters.
The number of effective samples is smaller than 10% for some parameters.


In [77]:
pm.traceplot(traceH10_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'], combined=True);

  _, axes = plt.subplots(len(plotters), 2, squeeze=False, figsize=figsize, **backend_kwargs)


<IPython.core.display.Javascript object>

  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"
  "Argument backend_kwargs has not effect in matplotlib.plot_dist"


In [78]:
pm.summary(traceH10_100, var_names=['Wz', 'Bz', 'Wy', 'By']).round(3)

Unnamed: 0,mean,sd,hpd_3%,hpd_97%,mcse_mean,mcse_sd,ess_mean,ess_sd,ess_bulk,ess_tail,r_hat
Wz,-0.033,9.494,-24.43,13.159,3.318,2.435,8.0,8.0,8.0,44.0,1.21
Bz,4.578,8.734,-10.983,21.396,1.707,1.22,26.0,26.0,27.0,69.0,1.08
Wy,-4.082,9.07,-17.752,15.266,2.098,1.507,19.0,19.0,19.0,30.0,1.07
By,-10.122,5.702,-21.261,-2.59,0.844,0.601,46.0,46.0,45.0,56.0,1.03


In [79]:
pm.plots.autocorrplot(traceH10_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By'])

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001E03B167B48>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E038400D08>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E0360B5F08>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000001E039BA8788>]],
      dtype=object)

In [80]:
pm.plot_posterior(traceH10_100, figsize=(9, 4), var_names=['Wz', 'Bz', 'Wy', 'By']);

  fig, ax = plt.subplots(rows, cols, **backend_kwargs)


<IPython.core.display.Javascript object>

Hint: Si `model_preds` es el posterior predictivo en el conjunto de test donde la primera dimensión son las muestras y la segunda dimensión los ejemplos, podemos graficar la media de ese posterior como:

In [None]:
pm.sample_posterior_predictive?

In [None]:
#Modelo pymc3
import pymc3 as pm
import theano.tensor as T
from theano.tensor.nnet import sigmoid
mu=0
def modelo(M):
    with pm.Model() as bayes_reg:
        #Considerar los datos X como una variable deterministica
        X_data = pm.Data("x", x)
        # Se definen los prior
        Wz = pm.Normal(name='Wz', mu=0, sd=10, shape=(2,M))
        Bz = pm.Normal(name='Bz', mu=0, sd=10, shape=(M))
        Wy = pm.Normal(name='Wy', mu=0, sd=10, shape=(M,1))
        By = pm.Normal(name='By', mu=0, sd=10, shape=(1))
        #Verosimilitud
        Y_obs = pm.Bernoulli('Y_obs', p=Y, observed=y)
    return bayes_reg

In [137]:
with modelo(M):
    pm.set_data({"x": x_test})
    posterior_predictive = pm.sample_posterior_predictive(trace1_10, samples=100, 
                                                          var_names=['Y_obs'])

100%|██████████| 100/100 [00:00<00:00, 657.12it/s]


In [139]:
posterior_predictive['Y_obs'].shape

(100, 10, 10)

In [91]:
fig, ax = plt.subplots(figsize=(6, 3), tight_layout=True)
cmap = ax.pcolormesh(x1, x2, np.mean(posterior_predictive, axis=0).reshape(len(x1), len(x2)), 
                     cmap=plt.cm.RdBu_r, shading='gouraud', vmin=0, vmax=1)
plt.colorbar(cmap, ax=ax)
ax.scatter(x[y==0, 0], x[y==0, 1], c='k', marker='o')
ax.scatter(x[y==1, 0], x[y==1, 1], c='k', marker='x')

<IPython.core.display.Javascript object>

ValueError: cannot reshape array of size 20000 into shape (100,100)