<a href="https://colab.research.google.com/github/fjme95/calculo-optimizacion/blob/main/Semana%206/BPTT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

All of the code comes from https://github.com/devnag/tensorflow-bptt

Medium post: https://medium.com/@devnag/a-simple-design-pattern-for-recurrent-deep-learning-in-tensorflow-37aba4e2fd6b

# Dependencies


In [1]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [2]:
import tensorflow as tf
import numpy as np
from pprint import pprint

import math
import random
import sys

# See https://medium.com/@devnag/

# Backpropagation Through Time

In [3]:
class BPTT(object):
    """
    Convenience design pattern for handling simple recurrent graphs, implementing backpropagation through time.
    See https://medium.com/@devnag/
    Typical usage:
    - Graph building
        - Define a function that takes a BPTT object and the depth flag (will be BPTT.DEEP or BPTT.SHALLOW)
              and builds your computational graph; should return any I/O placeholders in an array.
          - Use get_past_variable() to define a name (string) and pass in a constant value (numpy).
          - Use name_variable() to name (string) the same value for the current loop, for the future.
    - Unrolling
        - bp.generate_graphs() will take the function above and the desired BPTT depth and provide the
            sequence of stitched DAGs.
    - Training
        - generate_feed_dict() on the relevant depth (BPTT.DEEP) with the array data to be fed into the
           I/O placeholders that your custom graph function returned. This will also include the working
           state for the recurrent variables (whether the starting constants or state from the last loop).
           Must also include a count of the number of I/O slots.
        - generate_output_definitions() will provide an array of variables that must be fetched to extract state.
        - save_output_state() will take the results and save for the next loop.
    - Inference
        - Same three functions as in training, but use BPTT.SHALLOW instead.
        - Can optionally call copy_state_forward() before inference if you want to start with the final training state.
    """

    DEEP = "deep"
    SHALLOW = "shallow"
    MODEL_NAME = "unrolled_model"
    LOOP_SCOPE = "unroll"

    def __init__(self):
        """
        Initialize the name dictionaries (state, placeholders, constants, etc)
        """
        self.graph_dict = {}

        # Name -> Constants: Starting values (typically np.arrays). Shared between shallow/deep, used in run-time
        self.starting_constants = {}
        # Name -> State: np.arrays reflecting state between run-times (starting from C)
        self.state = {self.DEEP: {}, self.SHALLOW: {}}
        # Name -> Variables: Py variables passed through during build-time
        self.vars = {self.DEEP: {}, self.SHALLOW: {}}
        # Name -> Placeholder: Placeholders: to inject state, set during build-time
        self.placeholders = {self.DEEP: {}, self.SHALLOW: {}}

        self.current_depth = self.DEEP

    def get_past_variable(self, variable_name, starting_value):
        """
        Get-or-set a recurrent variable from the past (time t-1)
        :param variable_name: A unique (to this object) string representing this variable.
        :param starting_value: A constant that can be fed into a placeholder eventually
        :return: A variable (representing the value at t-1) that can be computed on to generate current value (at t)
        """

        if variable_name not in self.placeholders[self.current_depth]:
            # First time being called
            self.starting_constants[variable_name] = starting_value

            # First initial state is the constant np.array sent in
            self.state[self.current_depth][variable_name] = starting_value

            # Define a mirror placeholder with same type/shape
            self.placeholders[self.current_depth][variable_name] = tf.placeholder(starting_value.dtype,
                                                                                  shape=starting_value.shape)
            # Set current (starting) variable as that placeholder, to be filled in later
            self.vars[self.current_depth][variable_name] = self.placeholders[self.current_depth][variable_name]

        # Return the pyvariable: placeholder the first time, pydescendant on later calls
        return self.vars[self.current_depth][variable_name]

    def name_variable(self, variable_name, v):
        """
        Set/assign a recurrent variable for the current time (time t)
        :param variable_name: A unique (to this object) string, must have been used in a get_past_variable() call
        :param v: A Tensorflow variable representing the current value of this variable (at t)
        :return: v, unchanged, for easy in-line usage
        """
        assert variable_name in self.vars[self.current_depth], \
            "Tried to set variable name that was never defined with get_past_variable()"
        self.vars[self.current_depth][variable_name] = v
        return v

    def generate_graphs(self, func, num_loops=10):
        """
        Generate the two graphs -- the deep (unrolled) connected graphs and the shallow/simple graph.
        :param func: A function which takes the BPTT object and the depth_type (BPTT.{DEEP,SHALLOW}), returns
                    array of I/O placeholders.
        :param num_loops: The desired number of loops to unroll
        :return: A dictionary of the two graphs (deep+shallow).
        """
        # Scoping -- generate the deep/unrolled graph (training)
        self.current_depth = self.DEEP
        with tf.variable_scope(self.MODEL_NAME, reuse=False):
            self.graph_dict[self.DEEP] = self.unroll(func, self.DEEP, num_loops)

        # Now, generate the shallow graph (inference)
        self.current_depth = self.SHALLOW
        with tf.variable_scope(self.MODEL_NAME, reuse=True):
            # Shallow is depth 1, but sharing all variables with deep graph above
            self.graph_dict[self.SHALLOW] = self.unroll(func, self.SHALLOW, 1)

        # pprint(self.graph_dict)
        return self.graph_dict

    def unroll(self, func, depth_type, num_loops):
        """
        Given the graph-generating function, unroll to the desired depth.
        :param func: A function which takes the BPTT object and the depth_type (BPTT.{DEEP,SHALLOW}), returns
                    array of I/O placeholders.
        :param depth_type: The depth_type (BPTT.{DEEP,SHALLOW})
        :param num_loops: The desired number of loops to unroll
        :return: A list of the graphs, connected by variables.
        """
        frames = []
        for loop in range(num_loops):
            # Scoping on top of each depth
            # We need 'False' for the first time and 'True' for all others
            with tf.variable_scope(self.LOOP_SCOPE, reuse=(loop != 0)):
                frames.append(func(self, depth_type))

        return frames

    def generate_feed_dict(self, depth_type, data_array, num_settable):
        """
        Generate a feed dictionary; takes in an array of the data that will be inserted into the unrolled
        placeholders.
        :param depth_type: The depth_type (BPTT.{DEEP,SHALLOW})
        :param data_array: An array of arrays of data to insert into the unrolled placeholders
        :param num_settable: How many elements of the data_array to use.
        :return: A dictionary to feed into tf.Session().run()
        """
        frames = self.graph_dict[depth_type]
        d = {}

        # Recurrent: Auto-defined placeholders / current variables
        for variable_name in self.placeholders[depth_type]:
            d[self.placeholders[depth_type][variable_name]] = self.state[depth_type][variable_name]

        # User-provided data to unroll/insert into the placeholders
        for frame_index in range(len(frames)):       # Unroll index
            for var_index in range(num_settable):    # Variable index
                frame_var = frames[frame_index][var_index]
                d[frame_var] = np.reshape(data_array[var_index][frame_index],
                                          frame_var.get_shape())
        return d

    def copy_state_forward(self):
        """
        Copy the working state from the DEEP pipeline to the SHALLOW pipeline
        """
        for key in self.state[self.DEEP]:
            self.state[self.SHALLOW][key] = np.copy(self.state[self.DEEP][key])

    def generate_output_definitions(self, depth_type):
        """
        Generate the desired output variables to fetch from the graph run
        :param depth_type: The depth_type (BPTT.{DEEP,SHALLOW})
        :return: An array of variables to add to the fetch list
        """
        d = self.vars[depth_type]
        # Define consistent sort order by the variable names
        return [d[k] for k in sorted(d.keys())]

    def save_output_state(self, depth_type, arr):
        """
        Save the working state for the next run (will be available in generate_feed_dict() in the next loop)
        :param depth_type: The depth_type (BPTT.{DEEP,SHALLOW})
        :param arr: An array of values (returned by tf.Session.run()) which map to generate_output_definitions()
        """
        d = self.state[depth_type]
        sorted_names = sorted(d.keys())
        assert len(sorted_names) == len(arr), \
            "Sent in the wrong number of variables (%s) to update state (%s)" % (len(arr), len(sorted_names))
        for variable_index in range(len(sorted_names)):
            variable_name = sorted_names[variable_index]
            # Saved for next time.
            self.state[depth_type][variable_name] = arr[variable_index]

The BPTT class defined in the code block above stores the states of the recurrent network, as well as its input and output placeholders. It also builds the "unrolled" graph of the recurrent network from a function that we supply. The example shows this in more detail.
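The state-threading idea can be sketched without TensorFlow. The class below is a hypothetical, simplified analogue (the names `TinyStateThreader` and `run_window` are illustrative and not part of the original code): it runs the step function eagerly on plain Python values, whereas the real BPTT class builds the graph once and feeds the saved state in through placeholders.

```python
class TinyStateThreader:
    """Hypothetical, framework-free analogue of the BPTT helper."""

    def __init__(self):
        self.state = {}    # name -> value saved between windows
        self.current = {}  # name -> value at the current time step

    def get_past_variable(self, name, starting_value):
        # First call ever: seed the variable with its starting value
        if name not in self.current:
            self.state.setdefault(name, starting_value)
            self.current[name] = self.state[name]
        return self.current[name]

    def name_variable(self, name, value):
        # Record the value at time t so the next step sees it as t-1
        assert name in self.current, "call get_past_variable() first"
        self.current[name] = value
        return value

    def run_window(self, step_fn, inputs):
        # Start from the saved state, apply step_fn once per input,
        # then save the final state for the next window
        self.current = dict(self.state)
        outputs = [step_fn(self, x) for x in inputs]
        self.state = dict(self.current)
        return outputs


def running_sum(threader, x):
    # A toy "recurrent cell": new total = previous total + input
    total_past = threader.get_past_variable("total", 0.0)
    return threader.name_variable("total", total_past + x)


threader = TinyStateThreader()
first = threader.run_window(running_sum, [1.0, 2.0, 3.0])
second = threader.run_window(running_sum, [4.0, 5.0])
print(first, second)  # [1.0, 3.0, 6.0] [10.0, 15.0]
```

The second window continues from the total left by the first one, which is the role `save_output_state()` and `generate_feed_dict()` play between calls to `sess.run()`.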

# Usage example

We start by defining the parameters used for the input and output data, the model, the optimizer, and training.

In [4]:
# Data parameters: simple one-number-at-a-time for now
input_dimensions = 1
output_dimensions = 1
batch_size = 1

# Model parameters
rnn_width = 3
m = 0.0
s = 0.5
init = tf.random_normal_initializer(m, s)
noise_m = 0.0
noise_s = 0.03

# Optimization parameters
learning_rate = 0.05
beta1 = 0.95
beta2 = .999
epsilon = 1e-3
momentum = 0.4
gradient_clipping = 4.0
unroll_depth = 10
max_reset_loops = 20

# Training parameters
num_training_loops = 3000
num_inference_loops = 100
num_inference_warmup_loops = 1900


In [5]:
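def build_rnn_layer(bp, depth_type, layer_index, raw_x, width):
    """
    Build an RNN layer following https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
    depth_type is accepted to match the call in build_single_rnn_frame(); it is not used inside this function.
    """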
    global init, noise_m, noise_s
    # Define variable names
    h_name = "hidden-%s" % layer_index  
    # raw_x is [input_size, 1]
    input_size = raw_x.get_shape()[0].value
    # Why so serious? Introduce a little anarchy. Upset the established order...
    x = raw_x + tf.random_normal(raw_x.get_shape(), noise_m, noise_s)

    with tf.variable_scope("rnn_layer_%s" % layer_index):

        # Define shapes for all the weights/biases, limited to just this layer (not shared with other layers)
        # Sizes are 'input_size' when mapping x and 'width' otherwise
        W_ax = tf.get_variable("W_ax", [width, input_size], initializer=init)
        W_aa = tf.get_variable("W_aa", [width, width], initializer=init) 
        b_a =  tf.get_variable("b_a",  [width, 1], initializer=init)
        
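        # Output projection. W_ya is [width, width], so o has shape [width, 1] rather than
        # [output_dimensions, 1]; it broadcasts against the [1, 1] target when the loss is computed.
        W_ya = tf.get_variable("W_ya", [width, width], initializer=init)
        b_y =  tf.get_variable("b_y",  [width, 1], initializer=init)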

        # Retrieve the previous roll-depth's data, with starting random data if first roll-depth.
        h_past = bp.get_past_variable(h_name, np.float32(np.random.normal(m, s, [width, 1])))

        h = bp.name_variable(h_name,  tf.tanh(tf.matmul(W_aa, h_past) + tf.matmul(W_ax, x) + b_a))
        o = tf.tanh(tf.matmul(W_ya, h) + b_y)

    return [o]
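Written out, the layer above computes the following recurrence. The symbols follow the variable names in the code; the time subscripts are added here for readability, with $h_{t-1}$ standing for `h_past` and $x_t$ for the noised input `x`.

```latex
\begin{aligned}
h_t &= \tanh\left(W_{aa}\, h_{t-1} + W_{ax}\, x_t + b_a\right) \\
o_t &= \tanh\left(W_{ya}\, h_t + b_y\right)
\end{aligned}
```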

In [6]:
def build_single_rnn_frame(bp, depth_type):
    global init, input_dimensions, output_dimensions, batch_size, rnn_width

    # I/O DATA
    input_placeholder = tf.placeholder(tf.float32, shape=(input_dimensions, batch_size))
    output_placeholder = tf.placeholder(tf.float32, shape=(output_dimensions, batch_size))

    last_output = input_placeholder
    for layer_index in range(1):
        [o]= build_rnn_layer(bp, depth_type, layer_index, last_output, rnn_width)
        last_output = o

    output_result = o

    # return array of whatever you want, but I/O placeholders FIRST.
    return [input_placeholder, output_placeholder, output_result]

In [7]:
def palindrome(step):
    """
    Turn sequential integers into a palindromic sequence (so look-ahead mapping is not a function, but requires state)
    """
    return (5.0 - abs(float(step % 10) - 5.0)) / 10.0
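A quick framework-free check shows why this sequence needs state. The sketch below repeats `palindrome()` so that it runs on its own; `successors` and `training_window` are illustrative names, and `training_window` mirrors how the training loop slices inputs and look-ahead targets.

```python
from collections import defaultdict


def palindrome(step):
    # Same definition as in the cell above
    return (5.0 - abs(float(step % 10) - 5.0)) / 10.0


# One full period: rises from 0.0 to 0.5, then falls back
sequence = [palindrome(t) for t in range(11)]

# Map each value to the set of values that can follow it
successors = defaultdict(set)
for t in range(10):
    successors[palindrome(t)].add(palindrome(t + 1))


def training_window(step, unroll_depth=10):
    # Inputs are x_t, targets are x_{t+1}, for one unrolled window
    start = step * unroll_depth
    steps = range(start, start + unroll_depth)
    return ([palindrome(t) for t in steps],
            [palindrome(t + 1) for t in steps])


inputs, targets = training_window(0)
print(sequence)
print(sorted(successors[0.4]))  # two possible successors for the same input
```

An input of 0.4 can be followed by either 0.5 (on the way up) or 0.3 (on the way down), so a stateless mapping from input to next value cannot be correct for both cases. Because the sequence has period 10 and the windows start at multiples of `unroll_depth`, every window holds the same values when `unroll_depth` is 10.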

In [8]:
bp = None
sess = None
graphs = None
done = False
tf.reset_default_graph()

# Loop until you get out of a local minimum or you hit max reset loops
for reset_loop_index in range(max_reset_loops):

    # Clean any previous loops
    if reset_loop_index > 0:
        tf.reset_default_graph()

    # Generate unrolled+shallow graphs
    bp = BPTT()
    graphs = bp.generate_graphs(build_single_rnn_frame, unroll_depth)

    # Define loss and clip gradients
    error_vec = [[o - p] for [i, p, o] in graphs[bp.DEEP]]
    loss = tf.reduce_mean(tf.square(error_vec))
    optimizer = tf.train.AdamOptimizer(learning_rate, beta1, beta2, epsilon)
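    grads = optimizer.compute_gradients(loss)
    # Clip each gradient elementwise; skip variables that receive no gradient
    clipped_grads = [(tf.clip_by_value(grad, -gradient_clipping, gradient_clipping), var)
                     for grad, var in grads if grad is not None]
    # Train with the clipped gradients; optimizer.minimize(loss) would ignore the clipping
    train = optimizer.apply_gradients(clipped_grads)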

    # Boilerplate initialization
    init_op = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init_op)
    reset = False

    print("=== Training the unrolled model (reset loop %s) ===" % (reset_loop_index))

    for step in range(num_training_loops):
        # 1.) Generate the dictionary of I/O placeholder data
        start_index = step * unroll_depth
        in_data = np.array([palindrome(x) for x in range(start_index, start_index + unroll_depth)], dtype=np.float32)
        out_data = np.array([palindrome(x+1) for x in range(start_index, start_index + unroll_depth)], dtype=np.float32)

        # 2a.) Generate the working state to send in, along with data to insert into unrolled placeholders
        frame_dict = bp.generate_feed_dict(bp.DEEP, [in_data, out_data], 2)

        # 2b.) Define the output (training/loss) that we'd like to see (optional)
        session_out = [train, loss] + [o for [i, p, o] in graphs[bp.DEEP]]   # calculated output

        # 3.) Define state variables to pull out as well.
        state_vars = bp.generate_output_definitions(bp.DEEP)
        session_out.extend(state_vars)

        # 4.) Execute the graph
        results = sess.run(session_out, feed_dict=frame_dict)

        # 5.) Extract the state for next training loop; need to make sure we have right part of result array
        bp.save_output_state(bp.DEEP, results[-len(state_vars):])  # for simple RNN

        # 6.) Show training progress; reset graph if loss is stagnant.
        if (step % 100) == 0:
            print(results[0])
            print("Loss: %s => %s (output: %s)" % (step, results[1], [str(x) for x in results[2:-len(state_vars)]]))
            sys.stdout.flush()

            if step >= 1000 and (results[1] > 0.01):
                print("\nResetting; loss (%s) is stagnating after 1k rounds...\n" % (results[1]))
                reset = True
                break  # To next reset loop

    if not reset:
        break

print("=== Evaluating on shallow model ===")

# Copy final deep state from the training loop above to the shallow state.
bp.copy_state_forward()
[in_ph, out_ph, out_out] = graphs[bp.SHALLOW][0]

# Evaluate one step at a time, and burn in first.
for step in range(num_inference_loops + num_inference_warmup_loops):
    # 1.) Convert step to the palindromic sequence (current and look-ahead-by-one)
    in_value = palindrome(step)
    expected_out_value = palindrome(step+1)

    # 2.) Generate the feed dictionary to send in, both I/O data and recurrent variables
    frame_dict = bp.generate_feed_dict(bp.SHALLOW, np.array([[in_value]], np.float32), 1)

    # 3.) Define state variables to pull out
    session_out = [out_out]
    state_vars = bp.generate_output_definitions(bp.SHALLOW)
    session_out.extend(state_vars)

    # 4.) Execute the graph
    results = sess.run(session_out, feed_dict=frame_dict)

    # 5.) Extract/save state variables for the next loop
    bp.save_output_state(bp.SHALLOW, results[-len(state_vars):])

    # 6.) How we doin'?
    if step >= num_inference_warmup_loops:
        print("%s: %s => (%s) %s actual vs %s expected (diff: %s)" %
              (step, in_value, np.round(results[0][0][0], 1), results[0][0][0], expected_out_value, expected_out_value - results[0][0][0]))
        sys.stdout.flush()

=== Training the unrolled model (reset loop 0) ===
None
Loss: 0 => 0.19637738 (output: ['[[-0.72890705]\n [-0.32281178]\n [-0.69416434]]', '[[-0.5443011 ]\n [-0.24014583]\n [-0.4530898 ]]', '[[-0.33188686]\n [-0.1299187 ]\n [-0.13634363]]', '[[-0.19915637]\n [-0.09391376]\n [ 0.12934706]]', '[[-0.03247888]\n [-0.01130863]\n [ 0.4305343 ]]', '[[0.07504684]\n [0.02062082]\n [0.62217516]]', '[[0.15336551]\n [0.11282057]\n [0.69603074]]', '[[0.07344776]\n [0.07460824]\n [0.6447511 ]]', '[[0.03427105]\n [0.09117195]\n [0.5999252 ]]', '[[-0.07985406]\n [ 0.05318728]\n [ 0.46335745]]'])
None
Loss: 100 => 0.006473334 (output: ['[[0.10213082]\n [0.08570457]\n [0.079606  ]]', '[[0.19977358]\n [0.18664253]\n [0.18466932]]', '[[0.23653129]\n [0.22355193]\n [0.22104402]]', '[[0.3333786 ]\n [0.31994775]\n [0.32064646]]', '[[0.36226377]\n [0.34880826]\n [0.3486868 ]]', '[[0.43517402]\n [0.42330736]\n [0.42542332]]', '[[0.33010107]\n [0.31216863]\n [0.31081223]]', '[[0.2623094 ]\n [0.24255002]\n [0.23
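The training cell builds its clipped gradients with `tf.clip_by_value`, which bounds each element independently and does not rescale by the gradient norm. A minimal framework-free sketch of that behaviour follows; the helper names and the toy gradient values are hypothetical, and plain lists stand in for tensors.

```python
def clip_by_value(values, low, high):
    # Bound every element to the interval [low, high]
    return [min(max(v, low), high) for v in values]


def clip_gradients(grads_and_vars, limit):
    # Clip each gradient elementwise; drop variables without a gradient
    return [(clip_by_value(grad, -limit, limit), name)
            for grad, name in grads_and_vars
            if grad is not None]


# Toy (gradient, variable name) pairs; the values are made up
toy_grads = [([5.0, -7.5, 0.3], "W_aa"), (None, "unused")]
clipped = clip_gradients(toy_grads, 4.0)
print(clipped)  # [([4.0, -4.0, 0.3], 'W_aa')]
```

Elementwise clipping can change the direction of the gradient vector, since large components shrink while small ones are left alone.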

In [9]:

# def build_lstm_layer(bp, depth_type, layer_index, raw_x, width):
#     """
#     Build a single LSTM layer (Graves 2013); can be stacked, but send in sequential layer_indexes to scope properly.
#     """
#     global init, noise_m, noise_s
#     # Define variable names
#     h_name = "hidden-%s" % layer_index  # Really the 'output' of the LSTM layer
#     c_name = "cell-%s" % layer_index
#     # raw_x is [input_size, 1]
#     input_size = raw_x.get_shape()[0].value
#     # Why so serious? Introduce a little anarchy. Upset the established order...
#     x = raw_x + tf.random_normal(raw_x.get_shape(), noise_m, noise_s)

#     with tf.variable_scope("lstm_layer_%s" % layer_index):

#         # Define shapes for all the weights/biases, limited to just this layer (not shared with other layers)
#         # Sizes are 'input_size' when mapping x and 'width' otherwise
#         W_xi = tf.get_variable("W_xi", [width, input_size], initializer=init)
#         W_hi = tf.get_variable("W_hi", [width, width], initializer=init)
#         W_ci = tf.get_variable("W_ci", [width, width], initializer=init)
#         b_i =  tf.get_variable("b_i",  [width, 1], initializer=init)
#         W_xf = tf.get_variable("W_xf", [width, input_size], initializer=init)
#         W_hf = tf.get_variable("W_hf", [width, width], initializer=init)
#         W_cf = tf.get_variable("W_cf", [width, width], initializer=init)
#         b_f =  tf.get_variable("b_f",  [width, 1], initializer=init)
#         W_xc = tf.get_variable("W_xc", [width, input_size], initializer=init)
#         W_hc = tf.get_variable("W_hc", [width, width], initializer=init)
#         b_c =  tf.get_variable("b_c",  [width, 1], initializer=init)
#         W_xo = tf.get_variable("W_xo", [width, input_size], initializer=init)
#         W_ho = tf.get_variable("W_ho", [width, width], initializer=init)
#         W_co = tf.get_variable("W_co", [width, width], initializer=init)
#         b_o =  tf.get_variable("b_o",  [width, 1], initializer=init)

#         # Retrieve the previous roll-depth's data, with starting random data if first roll-depth.
#         h_past = bp.get_past_variable(h_name, np.float32(np.random.normal(m, s, [width, 1])))
#         c_past = bp.get_past_variable(c_name, np.float32(np.random.normal(m, s, [width, 1])))

#         # Build graph - looks almost like Alex Graves wrote it!
#         i = tf.sigmoid(tf.matmul(W_xi, x) + tf.matmul(W_hi, h_past) + tf.matmul(W_ci, c_past) + b_i)
#         f = tf.sigmoid(tf.matmul(W_xf, x) + tf.matmul(W_hf, h_past) + tf.matmul(W_cf, c_past) + b_f)
#         c = bp.name_variable(c_name, tf.multiply(f, c_past) + tf.multiply(i, tf.tanh(tf.matmul(W_xc, x) + tf.matmul(W_hc, h_past) + b_c)))
#         o = tf.sigmoid(tf.matmul(W_xo, x) + tf.matmul(W_ho, h_past) + tf.matmul(W_co, c) + b_o)
#         h = bp.name_variable(h_name, tf.multiply(o, tf.tanh(c)))

#     return [c, h]


# def build_dual_lstm_frame(bp, depth_type):
#     """
#     Build a dual-layer LSTM followed by standard sigmoid/linear mapping
#     """
#     global init, input_dimensions, output_dimensions, batch_size, lstm_width

#     # I/O DATA
#     input_placeholder = tf.placeholder(tf.float32, shape=(input_dimensions, batch_size))
#     output_placeholder = tf.placeholder(tf.float32, shape=(output_dimensions, batch_size))

#     last_output = input_placeholder
#     for layer_index in range(2):
#         [_, h] = build_lstm_layer(bp, depth_type, layer_index, last_output, lstm_width)
#         last_output = h

#     W = tf.get_variable("W", [1, lstm_width], initializer=init)
#     b = tf.get_variable("b", [1,1], initializer=init)
#     output_result = tf.sigmoid(tf.matmul(W, last_output) + b)

#     # return array of whatever you want, but I/O placeholders FIRST.
#     return [input_placeholder, output_placeholder, output_result]


# def build_single_lstm_frame(bp, depth_type):
#     """
#     Build a single-layer LSTM followed by standard sigmoid/linear mapping
#     """
#     global init, input_dimensions, output_dimensions, batch_size, lstm_width

#     # I/O DATA
#     input_placeholder = tf.placeholder(tf.float32, shape=(input_dimensions, batch_size))
#     output_placeholder = tf.placeholder(tf.float32, shape=(output_dimensions, batch_size))

#     last_output = input_placeholder
#     for layer_index in range(1):
#         # build_lstm_layer returns [c, h]; use the hidden output h, as in the dual-layer frame
#         [_, h] = build_lstm_layer(bp, depth_type, layer_index, last_output, lstm_width)
#         last_output = h

#     W = tf.get_variable("W", [1, lstm_width], initializer=init)
#     b = tf.get_variable("b", [1,1], initializer=init)
#     output_result = tf.sigmoid(tf.matmul(W, last_output) + b)

#     # return array of whatever you want, but I/O placeholders FIRST.
#     return [input_placeholder, output_placeholder, output_result]
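For reference, the commented-out `build_lstm_layer` corresponds to the equations below. They are transcribed from the code, with $\sigma$ for `tf.sigmoid` and $\odot$ for the elementwise `tf.multiply`; the time subscripts are added here, and the cell-to-gate weights $W_{ci}$, $W_{cf}$, $W_{co}$ are full matrices as in the code.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi}\, x_t + W_{hi}\, h_{t-1} + W_{ci}\, c_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf}\, x_t + W_{hf}\, h_{t-1} + W_{cf}\, c_{t-1} + b_f\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{xc}\, x_t + W_{hc}\, h_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo}\, x_t + W_{ho}\, h_{t-1} + W_{co}\, c_t + b_o\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```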


# Interesting links

- On LSTMs: http://turing.iimas.unam.mx/~ivanvladimir/slides/rpyaa/11_lstm.html#/18
- More on LSTMs: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
- LSTM paper from 2013 (Graves): https://arxiv.org/pdf/1308.0850.pdf