<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"></ul></div>

# Gated Recurrent Units (GRU)

In a coventional <a href='https://emmanuel-arize.github.io/datascience-blog/deeplearning/deep-learning/2021/05/06/RNN.html' target="_blank">  recurrent neural network</a>, during the backpropagation phase, in which the error signal (gradients) are backpropagated through time, the recurrent hidden layers (weight matrix associated with the layers) are subject to repeated multiplications as determined by as the number of timesteps (length of the sequence), and this might result in numerical instability for lengthy sequence. For lengthy sequence, small weights tends to lead to a situation known as <b>vanishing gradients</b> 
where the error signal propagating backwards gets so small that learning either becomes very slow or stops working altogether (error signals fowing backwards in time tend to vanish). <b>Conversely </b>larger weights tends to lead to a situation where the error signal is so large that it can cause learning to diverge , a situation known as <b>exploding gradients</b>.

To read more on exploding and vanishing gradients have a look at this papers
<br/>
<a href='https://arxiv.org/pdf/1211.5063v1.pdf' target="_blank">Understanding the exploding gradient problem</a><br/>
<a href='https://www.semanticscholar.org/paper/Learning-long-term-dependencies-with-gradient-is-Bengio-Simard/d0be39ee052d246ae99c082a565aba25b811be2d' target="_blank">Learning long-term dependencies with gradient descent is difficult</a><br/> 

<a href='https://www.bioinf.jku.at/publications/older/2304.pdf' target="_blank">THE VANISHING GRADIENT PROBLEM DURING LEARNING RECURRENT NEURAL NETS AND PROBLEM SOLUTIONS</a><br/>




The vanishing and exploding gradients problem, limit the ability of conventional RNNs in modeling sequences with long range contextual dependencies and to address these issues, more complex network architectures known as Gated Neural Networks (GNNs) have been designed to help mitigate this problem by introducing “gates”  to control the flow of information into and out of the  network layers. There are several GNNs but in this tutorial we were learn about a notable example  known as <a href='https://arxiv.org/pdf/1406.1078v3.pdf' target='_blank'>Gated Recurrent Unit or GRU (Cho et al., 2014)</a>) which is similar to LSTM but with fewer parameters than LSTM, as it lacks an output gate and faster to train due to the simpler architecture. 

<img id='GRU' src="./images/GRU.png" /><span id='GRU'>Figure 1</span>
<a href='https://en.wikipedia.org/wiki/File:Gated_Recurrent_Unit,_base_type.svg'>Source</a>

From <a href='#GRU'>Figure 1</a> ***GRU has two gates, a reset $(r_{t})$ and update gates $(z_{t})$ ***. Tthe reset gate determines how to combine the new input with the previous hidden state. let assume we have a minibatch of inputs $X_{t} \in R^{n×d}$ where each row of $X_{t}$ corresponds to one example at time step ***t*** from the sequence and the hidden state of the previous time step as $h_{t−1} \in R^{n×h}$. Given an input, the first step of the GRU model is for the reset gate to decide whether to ignore the previous hidden state or not. With a reset gate value close to 0, the previous hidden state is dimmed irrelevant and the hidden state is forced to ignore the previous hidden state and reset with the current input. The reset gate is defined as


$$ r_{t}=\sigma(W_{xr}X_{t}+U_{hr}h_{t-1}+b_{r} )$$

where $W_{xr}$, $U_{hr}$ are weight paramaters, $b_r$ the bias term and $\sigma$ is the sigmoid activation function.



The ***update gate*** is defined as

$$ z_{t}=\sigma(W_{xz}X_{t}+U_{hz}h_{t-1}+b_{z})$$

and it controls how information from the previous hidden state are carried over to the current hidden state

Let now examine how the reset and update gates are integrated into the hidden state

<b>The Hidden State</b> is computed by
$$ h_{t}=z_{t} \odot h_{t-1}+ (1-z_{t})\odot \bar h_{t} $$

where

$$\bar h_{t}=\phi(W_{h}X_{t}+U_{h}(r_{t} \odot h_{t-1})+b_{z})$$

is known as the ***Candidate Hidden State*** , the operator $\odot$ denotes the Hadamard product and the update gate **$z_t$** decides whether the hidden state is to be updated with
the new Candidate Hidden State $\bar h$  . 



$$ z_{t}=\sigma(W_{xz}X_{t}+U_{hz}h_{t-1}+b_{z} \rightarrow update \ gate \ vector$$


$$ r_{t}=\sigma(W_{xr}X_{t}+U_{hr}h_{t-1}+b_{r} \rightarrow reset \ gate \ vector $$



$$\bar h_{t}=\phi(W_{h}X_{t}+U_{h}(r_{t} \odot h_{t-1})+b_{z}) \rightarrow candidate\ hidden\ state$$


$$ h_{t}=z_{t} \odot h_{t-1}+ (1-z_{t})\odot \bar h_{t} \rightarrow hidden \ state $$




# multi-class classification on Stack Overflow questions
This tutorial showed how to train a multi-class classifier to predict the tag of a programming question on Stack Overflow.

In [1]:
import matplotlib.pyplot as plt
import os
import re
import shutil
import string
import tensorflow as tf
from tensorflow import keras as K
from tensorflow.keras import layers
from tensorflow.keras import losses
from tensorflow.keras import preprocessing
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

In [2]:
#uncomment to download the data
# url='http://storage.googleapis.com/download.tensorflow.org/data/stack_overflow_16k.tar.gz'
# dataset=tf.keras.utils.get_file('stack_overflow',origin=url,untar=True,cache_dir='./data',
#                                 cache_subdir='stackoverflow')

In [3]:
def load_dir(data):
    dataset_dir=os.path.join(os.path.dirname('.'),data)
    stackoverflow=os.path.join(os.path.dirname(dataset_dir),'stackoverflow/')
    train_dir=os.path.join(os.path.dirname(stackoverflow),'train')
    test_dir=os.path.join(os.path.dirname(stackoverflow),'test')
    return train_dir, test_dir    

In [4]:
train_dir,test_dir=load_dir('data/')

In [5]:
os.listdir(train_dir),os.listdir(test_dir)

(['csharp', 'java', 'javascript', 'python'],
 ['csharp', 'java', 'javascript', 'python'])

In [6]:
batch_size=100
train_data=preprocessing.text_dataset_from_directory(directory=train_dir,subset='training',
                                                    validation_split=0.15,batch_size=batch_size,
                                                    seed=20)
batch_size=10
val_data=preprocessing.text_dataset_from_directory(directory=train_dir,subset='validation',
                                                    validation_split=0.15,seed=20
                                                     )
test_data=preprocessing.text_dataset_from_directory(directory=test_dir,batch_size=batch_size)

Found 8000 files belonging to 4 classes.
Using 6800 files for training.
Found 8000 files belonging to 4 classes.
Using 1200 files for validation.
Found 8000 files belonging to 4 classes.


In [7]:
for i,label in enumerate(train_data.class_names):
    print('index' ,i," for the label corresponds to ", label)

index 0  for the label corresponds to  csharp
index 1  for the label corresponds to  java
index 2  for the label corresponds to  javascript
index 3  for the label corresponds to  python


In [8]:
for x,y in train_data.take(3):
    for i in range(1):
        X=(x.numpy()[i])
        print(x.numpy()[i])
        print(' \n \n')
        print(train_data.class_names[i])
        print(' \n \n ')

b'"blank clean up of (local) objects referenced in ""delayed"" functions does blank (pure, not jquery, if it matters) know to clear up/free/release from the last reference to an object in a ""delayed"" function called from a timer or event?..take the following code:..function myinitfunc().{.  var myinitobj = new object();.  myinitobj.properties = lotsofstuff;..  var mydelayedinitfunc = function ().  {.    dosomethingwith(myinitobj);.    // i shall not be accessing myinitobj again now..  };..  // let\'s say, *one* of the following:.  settimeout(mydelayedinitfunc, 1000);.  window.addeventlistener(\'load\', mydelayedinitfunc);.  document.addeventlistener(\'domcontentloaded\', mydelayedinitfunc);.}...note that mydelayedinitfunc() is deliberately accessing variable myinitobj, which is local to myinitfunc()...in, say, http://blank.info/tutorial/memory-leaks it states ""functions used in settimeout/setinterval are also referenced internally and tracked until complete, then cleaned up"".  does

The first time the dataset is iterated over, its elements will be cached
either in the specified file or in memory. Subsequent iterations will
use the cached data.


In [9]:
import re

In [10]:
train_data=train_data.cache().prefetch(tf.data.AUTOTUNE)
val_data=val_data.cache().prefetch(tf.data.AUTOTUNE)
train_data=test_data.cache().prefetch(tf.data.AUTOTUNE)

In [11]:
def remove_br(input_data):
    lowercase = tf.strings.lower(input_data)
    lowercase=tf.strings.strip(lowercase)    
    stripped_html = tf.strings.regex_replace(lowercase,"<[^>]+>" , '')
    return tf.strings.regex_replace(stripped_html,
                                  '[%s]' % re.escape(string.punctuation), '')
max_features = 10000  # Maximum vocab size.
max_tokens=150

encode_input=TextVectorization(standardize=remove_br,
                               max_tokens=max_features,output_mode='int',
                               output_sequence_length=max_tokens
                                )
#

In [12]:
encode_input.adapt(train_data.map(lambda x,y:x))


In [13]:
embedded_dim=16
class GRU(K.models.Model):
    def __init__(self):
        super(GRU,self).__init__()
        self.embedd=K.layers.Embedding(input_dim=max_features,output_dim=embedded_dim,
                                       input_length=max_tokens)
        self.gru=K.layers.GRU(32)
        self.f=K.layers.Flatten()
        self.dense=K.layers.Dense(4,activation='softmax')
        self.drop=K.layers.Dropout(0.3)
    def call(self,x):
        encoder=encode_input(x)
        embedd=self.embedd(encoder)
        gru=self.gru(embedd)
        f=self.f(gru)
        drop=self.drop(f)
        output=self.dense(drop)
        return output
gru=GRU()
# since the labels for each class are integers we will use  'sparse_categorical_crossentropy'
# as the loss function
gru.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['acc'])

In [14]:
history=gru.fit(train_data,batch_size=batch_size,epochs=20,validation_data=val_data)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [15]:
loss,acc=gru.evaluate(test_data)



according the text data the answer for the 1st text is python , 2nd is javascript 3rd is java and
4th=pyton

In [16]:
sample=["variables keep changing back to their original value inside a while loop i am doing the mitx 6.00.01x course and i am on the second problem set on the 3rd problem and i am stuck. .my code:  ..    balance = 320000.    annualinterestrate = 0.2.    monthlyinterestrate = (annualinterestrate) / 12.0.    monthlyfixedpayment = 0.    empbalance = balance.    lowerbound = round((balance)/12,2).    upperbound = (balance*(1+monthlyinterestrate)**12)/12.    monthlyfixedpayment = round( ( (lowerbound+upperbound)/2) ,2).    while tempbalance != 0: .        monthlyfixedpayment = round( ( (lowerbound+upperbound)/2) ,2)  .        for m in range(12) :.            tempbalance -= monthlyfixedpayment .            tempbalance += (monthlyinterestrate)*(tempbalance).            tempbalance = round(tempbalance,2) .        if tempbalance &gt; 0:.            lowerbound = round(monthlyfixedpayment,2).            tempbalance = balance.        elif tempbalance &lt; 0: .            upperbound = round(monthlyfixedpayment,2).            tempbalance = balance..    print('lowest payment: ' + str(round(monthlyfixedpayment,2)))...my code uses bisection search to generate the monthlyfixedpayment but after i get to the lines at the end that changes the upperbound or lowerbound values and then start the loop again, the lowerbound and upperbound values reset to their values to the ones outside the loop. does anyone knows how to prevent this?",
        "how pass window handler from one page to another? (blank) i have a very strange problem , please donâ€™t ask me why do i need thisâ€¦.i have a page1. page1 has a link which opens new window (page2) using  window.open function..chatwindow is a handler of child window with returns from window.open function..now i'm moving from page1 to page3 (by link &lt;a href=""...."" target=""_self""&gt;some text&lt;/a&gt;). and i need to check on the page3 if page2 is close or open..how to pass handler chatwindow from page1 to page3?..thank you in advance!",
        "what is the difference between text and string? in going through the blankfx tutorial i've run into the text, and it's being used where i would have thought a string would be used. is the only difference between..string foo = new string(""bat"");...and..text bar = new text(""bat"");...that bar cannot be edited, or are there other differences that i haven't been able to find in my research?",
        "idiomatic blank iterating and adding to a dict i'm running through a string, creating all substrings of size 10, and adding them to a dict. this is my code,..sequence_map = {}.for i in range(len(s)):.    sub = s[i:i+10].    if sub in sequence_map:.       sequence_map[sub] += 1.    else:.       sequence_map[sub] = 1...is there a way to do this more blankically?..also how do i do the reverse blankically, as in interating through the dict and composing a list where value is equal to something?..[k for k, v in sequence_map.items()]"
]

In [17]:
def pred(result):
    for i in result:
        if i==0:
            print('csharp')
        elif i==1:
            print('java')
        elif i==2:
            print('javascript')
        elif i==3:
            print('python')
    

In [18]:
result=tf.argmax(gru.predict(sample)).numpy()
result


array([1, 2, 1, 0], dtype=int64)

In [19]:
pred(result)

java
javascript
java
csharp


<p> <b>References:</b></p>
<a href='https://arxiv.org/pdf/1406.1078v3.pdf' target='_blank'>Gated Recurrent Unit or GRU (Cho et al., 2014)</a>

<a href='https://d2l.ai/chapter_recurrent-modern/gru.html' target='_blank'
title="Dive Into Deep Learning Chapter 9">Gated Recurrent Units (GRU)</a>

<a href='https://en.wikipedia.org/wiki/Gated_recurrent_unit' target='_blank'
title="wikipedia">Gated Recurrent Units</a>

