<a href="https://colab.research.google.com/github/gibranfp/CursoAprendizajeProfundo/blob/2023-1/notebooks/3a_unidad_recurrente_basica_numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basic recurrent unit
In this notebook we implement a basic recurrent unit with NumPy and compare it with the class that PyTorch provides.

In [1]:
from collections import Counter

import numpy as np

np.random.seed(123)  # fix the seed so the sampled weights are reproducible

## Data
To test our basic recurrent unit, we build the tensors corresponding to two texts.

In [2]:
textos = ['un bolillo para el susto', 
          'tembló el 19 de septiembre']
n_docs = len(textos)

To do so, we generate lists of words by splitting each text into substrings (_tokens_) separated by a space.

In [3]:
tokens = [t.split() for t in textos]
max_sec = max(len(t) for t in tokens)  # longest sequence (both texts have 5 tokens)
print(tokens[0])
print(tokens[1])

['un', 'bolillo', 'para', 'el', 'susto']
['tembló', 'el', '19', 'de', 'septiembre']


We build the full vocabulary, assigning an index to each word, and convert the lists of words into lists of indices.

In [4]:
voc = Counter([p for t in tokens for p in t])

i2p = {i:p for i,(p,f) in enumerate(voc.most_common())}
p2i = {p:i for i,(p,f) in enumerate(voc.most_common())}

tam_voc = len(i2p)

indsecs = np.array([[p2i[p] for p in t] for t in tokens])

print(tam_voc)
print(indsecs)

9
[[1 2 3 0 4]
 [5 0 6 7 8]]
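The two dictionaries are meant to be inverses of each other. The following minimal sketch rebuilds them with the standard library only and checks the round trip from words to indices and back. It assumes that `i2p` maps index to word and `p2i` maps word to index; `decodificado` is a name introduced here for illustration.

```python
from collections import Counter

textos = ['un bolillo para el susto',
          'tembló el 19 de septiembre']
tokens = [t.split() for t in textos]
voc = Counter(p for t in tokens for p in t)

# most_common() orders by frequency, so the repeated word 'el' gets index 0
p2i = {p: i for i, (p, _) in enumerate(voc.most_common())}
i2p = {i: p for p, i in p2i.items()}  # inverse mapping: index -> word

# Encode the words as indices and decode them back
indsecs = [[p2i[p] for p in t] for t in tokens]
decodificado = [[i2p[i] for i in sec] for sec in indsecs]

print(indsecs)
print(decodificado == tokens)
```

If the round trip fails, one of the two mappings is not the inverse of the other.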


We compute the 1-of-K (one-hot) representation of each index in the lists that represent the texts.

In [5]:
ohesecs = np.zeros((n_docs, max_sec, tam_voc))

for i in range(ohesecs.shape[0]):
  for j in range(ohesecs.shape[1]):
    ohesecs[i, j, indsecs[i, j]] = 1

print(ohesecs)

[[[0. 1. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 1. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 1. 0. 0. 0. 0. 0.]
  [1. 0. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 1. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0. 1. 0. 0. 0.]
  [1. 0. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 1. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 1. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 1.]]]
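As an alternative to the double loop, the same encoding can be obtained by indexing an identity matrix with the array of indices. The sketch below is self-contained and hard-codes the index array printed earlier; `ohe_bucle` and `ohe_vec` are names introduced here for the comparison.

```python
import numpy as np

indsecs = np.array([[1, 2, 3, 0, 4],
                    [5, 0, 6, 7, 8]])
tam_voc = 9

# Loop version, as in the cell above
ohe_bucle = np.zeros((*indsecs.shape, tam_voc))
for i in range(indsecs.shape[0]):
    for j in range(indsecs.shape[1]):
        ohe_bucle[i, j, indsecs[i, j]] = 1

# Vectorized version: row k of the identity matrix is the one-hot vector for index k
ohe_vec = np.eye(tam_voc)[indsecs]

print(ohe_vec.shape)
print(np.array_equal(ohe_bucle, ohe_vec))
```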


## Operation of a basic recurrent cell

We initialize the weights and biases of our basic recurrent unit by sampling from a uniform distribution $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k = \frac{1}{\text{state size}}$.

In [6]:
tam_edo = 5

h = np.zeros((tam_edo, 1))

k = 1 / tam_edo
Wih = np.random.uniform(-np.sqrt(k), np.sqrt(k), size=(tam_voc, tam_edo))
Whh = np.random.uniform(-np.sqrt(k), np.sqrt(k), size=(tam_edo, tam_edo))
print(Wih)
print(Whh)

# W = np.random.uniform(-np.sqrt(k), np.sqrt(k), size=(tam_edo + tam_voc, tam_edo))
b = np.random.uniform(-np.sqrt(k), np.sqrt(k), size=(tam_edo, 1))

# print(W)
print(b)
print(h)

[[ 0.17572738 -0.19128279 -0.24431149  0.04589732  0.19629901]
 [-0.06877567  0.43000857  0.16531674 -0.01705503 -0.09649303]
 [-0.14026585  0.20486829 -0.05494265 -0.39383606 -0.09119199]
 [ 0.21286956 -0.28398803 -0.29028477  0.02822041  0.02846746]
 [ 0.12021187  0.3125413   0.20075895  0.09930245  0.19895941]
 [-0.15835036 -0.12361998 -0.24304876 -0.18450777  0.11714861]
 [-0.36483243 -0.05929947 -0.06183822 -0.00564822 -0.06633941]
 [-0.16791867 -0.06587339  0.35185796  0.3972688   0.00164277]
 [ 0.11086689 -0.34380136 -0.16342483 -0.07618175  0.32763687]]
[[-0.22319951 -0.01517462  0.43429788  0.01742802  0.10097593]
 [-0.33932004  0.29188809  0.09217978  0.04031005 -0.1406363 ]
 [-0.17519969 -0.07421759  0.16216033  0.33581881  0.00932202]
 [ 0.15143885  0.07686399  0.11171709  0.15624664  0.30620038]
 [-0.37280174  0.2358451  -0.22927176 -0.2734953   0.06480747]]
[[-0.36160572]
 [ 0.34464679]
 [ 0.11381494]
 [ 0.19982967]
 [-0.43278719]]
[[0.]
 [0.]
 [0.]
 [0.]
 [0.]]
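The sampling range can be sanity-checked directly. With a state size of 5, every weight and bias should lie within $\pm\sqrt{1/5} \approx 0.447$. The sketch below uses a local `np.random.default_rng` generator, which is an assumption of this example and not the global seed used by the notebook, so the sampled values differ from the ones printed above.

```python
import numpy as np

rng = np.random.default_rng(123)  # assumption: local generator, not the notebook's global seed
tam_voc, tam_edo = 9, 5
k = 1 / tam_edo
limite = np.sqrt(k)

Wih = rng.uniform(-limite, limite, size=(tam_voc, tam_edo))
Whh = rng.uniform(-limite, limite, size=(tam_edo, tam_edo))
b = rng.uniform(-limite, limite, size=(tam_edo, 1))

# Report the shape of each array and whether all entries respect the bound
for nombre, arr in [('Wih', Wih), ('Whh', Whh), ('b', b)]:
    print(nombre, arr.shape, bool(np.abs(arr).max() <= limite))
```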


We compute the state $\mathbf{h}^{[1]}$ from an all-zeros initial state $\mathbf{h}^{[0]}$ and the 1-of-K vector $\mathbf{x}^{[1]}$ of the first word of the first text.
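
Written out, and assuming the layout used in the code (`Wih` of shape vocabulary size × state size, `Whh` of shape state size × state size, `b` a column vector), each step evaluates the recurrence below. The symbols $\mathbf{W}_{ih}$, $\mathbf{W}_{hh}$ and $\mathbf{b}$ are notation introduced here for the arrays `Wih`, `Whh` and `b`.

$$
\mathbf{h}^{[t]} = \tanh\left(\mathbf{W}_{ih}^\top \mathbf{x}^{[t]} + \mathbf{W}_{hh}^\top \mathbf{h}^{[t-1]} + \mathbf{b}\right)
$$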

In [7]:
print(h)
print(np.concatenate((ohesecs[0, 4].reshape(-1, 1), h), axis=0))

# xh = np.concatenate((ohesecs[0, 0].reshape(-1, 1), h), axis=0)
# h = np.tanh(W.T @ xh + b)

x = ohesecs[0, 0].reshape(-1, 1)
h = np.tanh(Wih.T @ x + Whh.T @ h + b)

print(x)
print(h)

[[0.]
 [0.]
 [0.]
 [0.]
 [0.]]
[[0.]
 [0.]
 [0.]
 [0.]
 [1.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]]
[[0.]
 [1.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]]
[[-0.40563999]
 [ 0.64962831]
 [ 0.27210124]
 [ 0.18076618]
 [-0.48483069]]
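Because the input is one-hot and the initial state is zero, this first step reduces to selecting one row of `Wih` and adding the bias. The sketch below copies row 1 of `Wih` and the vector `b` from the output printed above and should reproduce the state just shown up to rounding. `fila_wih`, `h1` and `h1_mat` are names introduced here.

```python
import numpy as np

# Row 1 of Wih and the bias b, copied from the printed output above
fila_wih = np.array([-0.06877567, 0.43000857, 0.16531674, -0.01705503, -0.09649303])
b = np.array([-0.36160572, 0.34464679, 0.11381494, 0.19982967, -0.43278719])

# With h[0] = 0 and a one-hot x, Wih.T @ x is just the selected row of Wih
h1 = np.tanh(fila_wih + b)
print(h1)

# Same result through the explicit matrix product
tam_voc, tam_edo = 9, 5
Wih = np.zeros((tam_voc, tam_edo))
Wih[1] = fila_wih  # the other rows do not matter for this input
x = np.zeros((tam_voc, 1))
x[1] = 1
h1_mat = np.tanh(Wih.T @ x + b.reshape(-1, 1))
print(np.allclose(h1_mat.reshape(-1), h1))
```

Seen this way, the input weight matrix acts as a lookup table with one row per vocabulary word.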


In the same way, we compute the state $\mathbf{h}^{[2]}$ from $\mathbf{h}^{[1]}$ and the input vector $\mathbf{x}^{[2]}$.

In [8]:
# xh = np.concatenate((ohesecs[0, 1].reshape(-1, 1), h), axis=0)
# h = np.tanh(W.T @ xh + b)

x = ohesecs[0, 1].reshape(-1, 1)
h = np.tanh(Wih.T @ x + Whh.T @ h + b)

print(h)

[[-0.43926211]
 [ 0.5543532 ]
 [ 0.11751747]
 [ 0.07717661]
 [-0.55793778]]


We obtain the state $\mathbf{h}^{[3]}$ from $\mathbf{h}^{[2]}$ and $\mathbf{x}^{[3]}$.

In [9]:
# xh = np.concatenate((ohesecs[0, 2].reshape(-1, 1), h), axis=0)
# h = np.tanh(W.T @ xh + b)

x = ohesecs[0, 2].reshape(-1, 1)
h = np.tanh(Wih.T @ x + Whh.T @ h + b)

print(h)

[[-0.03967669]
 [ 0.09447425]
 [-0.15917709]
 [ 0.41931214]
 [-0.49152439]]


We do the same to compute $\mathbf{h}^{[4]}$ from $\mathbf{h}^{[3]}$ and $\mathbf{x}^{[4]}$.

In [10]:
# xh = np.concatenate((ohesecs[0, 3].reshape(-1, 1), h), axis=0)
# h = np.tanh(W.T @ xh + b)

x = ohesecs[0, 3].reshape(-1, 1)
h = np.tanh(Wih.T @ x + Whh.T @ h + b)

print(h)

[[ 0.06545583]
 [ 0.10922462]
 [-0.0052946 ]
 [ 0.3759502 ]
 [-0.15740617]]


Finally, we obtain $\mathbf{h}^{[5]}$ from $\mathbf{h}^{[4]}$ and $\mathbf{x}^{[5]}$.

In [11]:
# xh = np.concatenate((ohesecs[0, 4].reshape(-1, 1), h), axis=0)
# h = np.tanh(W.T @ xh + b)

x = ohesecs[0, 4].reshape(-1, 1)
h = np.tanh(Wih.T @ x + Whh.T @ h + b)

print(h)

[[-0.17471234]
 [ 0.59167715]
 [ 0.40557182]
 [ 0.38395347]
 [-0.1368496 ]]
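The five steps above repeat the same update, so they can be written as a loop over time. The following sketch is self-contained and samples its own weights with a local generator (an assumption of this example), so its states differ numerically from the ones printed above. `estados` is a name introduced here to collect the state at every step.

```python
import numpy as np

rng = np.random.default_rng(123)  # assumption: local generator, values differ from the notebook
tam_voc, tam_edo = 9, 5
limite = np.sqrt(1 / tam_edo)
Wih = rng.uniform(-limite, limite, size=(tam_voc, tam_edo))
Whh = rng.uniform(-limite, limite, size=(tam_edo, tam_edo))
b = rng.uniform(-limite, limite, size=(tam_edo, 1))

indices = [1, 2, 3, 0, 4]     # first text as vocabulary indices
X = np.eye(tam_voc)[indices]  # one-hot inputs, shape (5, 9)

h = np.zeros((tam_edo, 1))    # h[0]
estados = [h]
for x_t in X:
    x = x_t.reshape(-1, 1)
    h = np.tanh(Wih.T @ x + Whh.T @ h + b)  # same update as in the cells above
    estados.append(h)

print(len(estados))
print(estados[-1].reshape(-1))
```

Keeping every intermediate state is convenient when each time step needs its own output. When only the final state matters, the list can be dropped.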


## `CeldaRecurrente` class
We put everything into a class that uses a single weight matrix and concatenates the input and the state.

In [12]:
class CeldaRecurrente:
  def __init__(self, tam_ent, tam_edo):
    self.input_size = tam_ent
    self.hidden_size = tam_edo

    # One matrix for the concatenated [input; state] vector,
    # sized from the constructor arguments rather than the global tam_voc
    k = 1 / tam_edo
    self.W = np.random.uniform(-np.sqrt(k), np.sqrt(k), 
                               size=(tam_ent + tam_edo, tam_edo))
    self.b = np.random.uniform(-np.sqrt(k), np.sqrt(k), 
                               size=(tam_edo, 1))

  def __call__(self, x, h):
    # Stack the input on top of the previous state as a single column vector
    xh = np.concatenate((x.reshape(-1, 1), h), axis=0)
    return np.tanh(self.W.T @ xh + self.b)
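
The single-matrix form computes the same update as the two-matrix form when the single matrix is the vertical stack of the input and state weights. The sketch below checks this numerically. The weights come from a local generator and `h_junta` and `h_separada` are names introduced here, so this is an illustration under those assumptions and not part of the class.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: arbitrary weights for the check
tam_voc, tam_edo = 9, 5
Wih = rng.uniform(-1, 1, size=(tam_voc, tam_edo))
Whh = rng.uniform(-1, 1, size=(tam_edo, tam_edo))
b = rng.uniform(-1, 1, size=(tam_edo, 1))

x = np.zeros((tam_voc, 1))
x[3] = 1                                    # one-hot input
h = rng.uniform(-1, 1, size=(tam_edo, 1))   # arbitrary previous state

# Stacking order must match the concatenation order: input first, then state
W = np.vstack((Wih, Whh))                   # shape (tam_voc + tam_edo, tam_edo)
xh = np.concatenate((x, h), axis=0)

h_junta = np.tanh(W.T @ xh + b)
h_separada = np.tanh(Wih.T @ x + Whh.T @ h + b)
print(np.allclose(h_junta, h_separada))
```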

We run the cell on our data.

In [13]:
def ejecuta_celda(celda, X, h0):
  # Compute the states for every input of every example
  for i in range(X.shape[0]):
    print(f'Ejemplo {i}')
    h = h0 # each example starts from the initial state (zeros here)
    print(f'\tEstado 0: {h.reshape(-1)}')
    for t in range(X.shape[1]):
      print(f'\tEntrada {t}: {X[i, t]}')
      h = celda(X[i, t], h)
      print(f'\tEstado {t + 1}: {h.reshape(-1)}')

rec = CeldaRecurrente(tam_voc, tam_edo)
ejecuta_celda(rec, ohesecs, np.zeros((tam_edo, 1)))

Ejemplo 0
	Estado 0: [0. 0. 0. 0. 0.]
	Entrada 0: [0. 1. 0. 0. 0. 0. 0. 0. 0.]
	Estado 1: [ 0.06457034  0.36966199 -0.34090955  0.0871176   0.59390562]
	Entrada 1: [0. 0. 1. 0. 0. 0. 0. 0. 0.]
	Estado 2: [ 0.37687605 -0.01841305 -0.55755929  0.03592867  0.03430022]
	Entrada 2: [0. 0. 0. 1. 0. 0. 0. 0. 0.]
	Estado 3: [ 0.37782777  0.62019359 -0.57183215  0.23668518  0.27575848]
	Entrada 3: [1. 0. 0. 0. 0. 0. 0. 0. 0.]
	Estado 4: [ 0.38757733  0.14095459 -0.63622935 -0.12647092  0.29847672]
	Entrada 4: [0. 0. 0. 0. 1. 0. 0. 0. 0.]
	Estado 5: [ 0.38611973 -0.08070906 -0.43639092 -0.181639   -0.0025051 ]
Ejemplo 1
	Estado 0: [0. 0. 0. 0. 0.]
	Entrada 0: [0. 0. 0. 0. 0. 1. 0. 0. 0.]
	Estado 1: [ 0.23408101  0.34992792 -0.63886244 -0.14249657  0.14321179]
	Entrada 1: [1. 0. 0. 0. 0. 0. 0. 0. 0.]
	Estado 2: [ 0.39434801  0.29752595 -0.60259723 -0.21394585  0.23697586]
	Entrada 2: [0. 0. 0. 0. 0. 0. 1. 0. 0.]
	Estado 3: [ 0.47203922  0.471456   -0.30887382  0.39635923  0.03528528]
	Entrada 3: 

We define another class that uses two weight matrices, one for the state and one for the input.

In [14]:
class CeldaRecurrenteSep:
  def __init__(self, tam_ent, tam_edo):
    self.input_size = tam_ent
    self.hidden_size = tam_edo

    # Separate input and state weights, sized from the constructor
    # arguments rather than the global tam_voc
    k = 1 / tam_edo
    self.Wih = np.random.uniform(-np.sqrt(k), np.sqrt(k), 
                                 size=(tam_ent, tam_edo))
    self.Whh = np.random.uniform(-np.sqrt(k), np.sqrt(k), 
                                 size=(tam_edo, tam_edo))
    self.b = np.random.uniform(-np.sqrt(k), np.sqrt(k), 
                               size=(tam_edo, 1))

  def __call__(self, x, h):
    x = x.reshape(-1, 1)
    # Whh is square, so it is applied here without transposing,
    # unlike Whh.T in the step-by-step cells above
    return np.tanh(self.Wih.T @ x + self.Whh @ h + self.b)

recsep = CeldaRecurrenteSep(tam_voc, tam_edo)
ejecuta_celda(recsep, ohesecs, np.zeros((tam_edo, 1)))

Ejemplo 0
	Estado 0: [0. 0. 0. 0. 0.]
	Entrada 0: [0. 1. 0. 0. 0. 0. 0. 0. 0.]
	Estado 1: [-0.3982426   0.5261976   0.11085984  0.49932932  0.31027864]
	Entrada 1: [0. 0. 1. 0. 0. 0. 0. 0. 0.]
	Estado 2: [-0.45470333  0.64515945 -0.39988937  0.00306154  0.34062492]
	Entrada 2: [0. 0. 0. 1. 0. 0. 0. 0. 0.]
	Estado 3: [-0.30991417  0.05287202 -0.31029001  0.45370778  0.22976188]
	Entrada 3: [1. 0. 0. 0. 0. 0. 0. 0. 0.]
	Estado 4: [-0.52933642  0.70165203 -0.49938087  0.23449275  0.56333901]
	Entrada 4: [0. 0. 0. 0. 1. 0. 0. 0. 0.]
	Estado 5: [-0.47183156 -0.05972636 -0.44712403  0.52465584  0.49640892]
Ejemplo 1
	Estado 0: [0. 0. 0. 0. 0.]
	Entrada 0: [0. 0. 0. 0. 0. 1. 0. 0. 0.]
	Estado 1: [-0.38518985  0.06325393 -0.5072375   0.27552021  0.29727162]
	Entrada 1: [1. 0. 0. 0. 0. 0. 0. 0. 0.]
	Estado 2: [-0.58125671  0.64915703 -0.56846544  0.17926301  0.53243155]
	Entrada 2: [0. 0. 0. 0. 0. 0. 1. 0. 0.]
	Estado 3: [-0.58466958 -0.04429241 -0.78651842  0.50221207  0.16619711]
	Entrada 3: 

## Basic recurrent unit in PyTorch
The basic recurrent cell is defined in PyTorch's `RNNCell` class. We instantiate this class and run the cell on our data.
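
Since each implementation samples its own random weights, the printed states differ between the NumPy classes and `RNNCell`. One way to compare them exactly is to read the parameters out of an `RNNCell` and evaluate the update in NumPy. The sketch below does this for a single step. It relies on `RNNCell` storing `weight_ih` with shape (state size, input size) and keeping two bias vectors, `bias_ih` and `bias_hh`. The names `celda_th`, `h_np` and `h_th` are introduced here, and the input is passed with an explicit batch dimension of 1.

```python
import numpy as np
import torch as th

th.manual_seed(0)
tam_voc, tam_edo = 9, 5
celda_th = th.nn.RNNCell(tam_voc, tam_edo)

# RNNCell keeps weights as (hidden, input) and (hidden, hidden), plus two biases
W_ih = celda_th.weight_ih.detach().numpy()
W_hh = celda_th.weight_hh.detach().numpy()
b_ih = celda_th.bias_ih.detach().numpy()
b_hh = celda_th.bias_hh.detach().numpy()

x = np.zeros(tam_voc, dtype=np.float32)
x[1] = 1.0                                  # one-hot input
h = np.zeros(tam_edo, dtype=np.float32)     # initial state

# NumPy version of the update; no transpose is needed with this layout
h_np = np.tanh(W_ih @ x + b_ih + W_hh @ h + b_hh)

# PyTorch version, with a batch dimension of 1
with th.no_grad():
    h_th = celda_th(th.from_numpy(x).unsqueeze(0), th.from_numpy(h).unsqueeze(0))
h_th = h_th.squeeze(0).numpy()

print(np.allclose(h_np, h_th, atol=1e-5))
```

Relative to the NumPy classes above, the PyTorch weight layout is transposed and the single bias `b` corresponds to the sum `bias_ih + bias_hh`.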

In [15]:
import torch as th

recth = th.nn.RNNCell(tam_voc, tam_edo)
oheth = th.from_numpy(ohesecs).type(th.float32)

ejecuta_celda(recth, oheth, th.zeros((tam_edo)))

Ejemplo 0
	Estado 0: tensor([0., 0., 0., 0., 0.])
	Entrada 0: tensor([0., 1., 0., 0., 0., 0., 0., 0., 0.])
	Estado 1: tensor([ 0.1306,  0.0095,  0.3533, -0.0299,  0.1718],
       grad_fn=<ReshapeAliasBackward0>)
	Entrada 1: tensor([0., 0., 1., 0., 0., 0., 0., 0., 0.])
	Estado 2: tensor([-0.0537,  0.1493,  0.5209,  0.1047, -0.1407],
       grad_fn=<ReshapeAliasBackward0>)
	Entrada 2: tensor([0., 0., 0., 1., 0., 0., 0., 0., 0.])
	Estado 3: tensor([-0.0263, -0.0422,  0.5343,  0.2975,  0.3832],
       grad_fn=<ReshapeAliasBackward0>)
	Entrada 3: tensor([1., 0., 0., 0., 0., 0., 0., 0., 0.])
	Estado 4: tensor([-0.4039, -0.2840,  0.1887, -0.3228,  0.1632],
       grad_fn=<ReshapeAliasBackward0>)
	Entrada 4: tensor([0., 0., 0., 0., 1., 0., 0., 0., 0.])
	Estado 5: tensor([-0.4188, -0.4205,  0.5763, -0.3623,  0.0517],
       grad_fn=<ReshapeAliasBackward0>)
Ejemplo 1
	Estado 0: tensor([0., 0., 0., 0., 0.])
	Entrada 0: tensor([0., 0., 0., 0., 0., 1., 0., 0., 0.])
	Estado 1: tensor([0.2786, 0.1614