# Bidirectional Recurrent Neural Networks

Until now, most of the discussion on sequence learning has been pure language modelling, where the underlying data has an inherent directionality that is sensibly reflected as an inductive bias in the model structure. However, it's not guaranteed that this is always the case.

For example, SMILES strings in chemistry have a canonical ordering, but there isn't any inherent directionality in them, so it would make sense to train on both the backwards and forwards directions.

Ideally, we'd train on both the forwards and backwards directions simultaneously. 

There's a simple technique for this: implement two unidirectional RNN layers that run in opposite directions over the same input. For the first RNN, the first input is $X_1$ and the last is $X_T$, while for the other RNN the first input is $X_T$ and the last is $X_1$. The output at each time step is then simply the concatenation of the two RNNs' outputs for that time step.
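Written out, the two passes and the concatenation look like the following. The notation here is an assumption, not something defined earlier in these notes: $\phi$ is the hidden-layer activation, the superscripts $(f)$ and $(b)$ mark the separate parameters of the forward and backward RNNs, and $\overrightarrow{H}_t$, $\overleftarrow{H}_t$ are their hidden states at time step $t$.

$$
\begin{aligned}
\overrightarrow{H}_t &= \phi\left(X_t W_{xh}^{(f)} + \overrightarrow{H}_{t-1} W_{hh}^{(f)} + b_h^{(f)}\right), \\
\overleftarrow{H}_t &= \phi\left(X_t W_{xh}^{(b)} + \overleftarrow{H}_{t+1} W_{hh}^{(b)} + b_h^{(b)}\right), \\
H_t &= \left[\overrightarrow{H}_t,\ \overleftarrow{H}_t\right].
\end{aligned}
$$

The forward state at step $t$ depends on step $t-1$, the backward state depends on step $t+1$, and $H_t$ has twice the width of either one.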

In [1]:
import torch
from torch import nn
from d2l import torch as d2l

In [2]:
class BiRNNScratch(d2l.Module):
    def __init__(self, num_inputs, num_hiddens, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()

        # Separate forward and backward RNNs, each with its own parameters
        self.f_rnn = d2l.RNNScratch(num_inputs, num_hiddens, sigma)
        self.b_rnn = d2l.RNNScratch(num_inputs, num_hiddens, sigma)

        self.num_hiddens *= 2
    
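    def forward(self, inputs, Hs=None):
        # Hs optionally holds the initial (forward, backward) hidden states
        f_H, b_H = Hs if Hs is not None else (None, None)

        # Forward RNN reads X_1..X_T, backward RNN reads X_T..X_1
        f_outputs, f_h = self.f_rnn(inputs, f_H)
        b_outputs, b_h = self.b_rnn(reversed(inputs), b_H)

        # Re-reverse the backward outputs so both lists line up by time step,
        # then concatenate along the feature dimension
        outputs = [torch.cat((f, b), -1) for f, b in zip(f_outputs, reversed(b_outputs))]

        # Return the final hidden states of both directions, not the initial ones
        return outputs, (f_h, b_h)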

## Concise implementation

In [4]:
class BiGRU(d2l.RNN):
    def __init__(self, num_inputs, num_hiddens):
        d2l.Module.__init__(self)
        self.save_hyperparameters()

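        # bidirectional=True makes nn.GRU run both directions and
        # concatenate their outputs for us
        self.rnn = nn.GRU(num_inputs, num_hiddens, bidirectional=True)
        # Outputs are twice as wide because of the concatenation
        self.num_hiddens *= 2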
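
As a sanity check on what `bidirectional=True` does, here is a minimal sketch that uses plain PyTorch without `d2l`. It copies the weights of a bidirectional `nn.GRU` into two unidirectional ones, runs the second over the time-reversed input, and compares the concatenated result with the built-in output. The sizes `T`, `B`, `D`, `H` are arbitrary values chosen for illustration, and the input is assumed to be laid out as `(time, batch, features)`.

```python
import torch
from torch import nn

torch.manual_seed(0)
T, B, D, H = 5, 3, 4, 6  # time steps, batch size, input size, hidden size (arbitrary)

bi = nn.GRU(D, H, bidirectional=True)
fwd = nn.GRU(D, H)
bwd = nn.GRU(D, H)

# Copy the built-in layer's parameters into the two unidirectional layers.
# The backward direction's parameters carry a "_reverse" suffix.
with torch.no_grad():
    for name in ("weight_ih_l0", "weight_hh_l0", "bias_ih_l0", "bias_hh_l0"):
        getattr(fwd, name).copy_(getattr(bi, name))
        getattr(bwd, name).copy_(getattr(bi, name + "_reverse"))

X = torch.randn(T, B, D)

with torch.no_grad():
    f_out, _ = fwd(X)                         # reads X_1 .. X_T
    b_out, _ = bwd(torch.flip(X, dims=[0]))   # reads X_T .. X_1
    b_out = torch.flip(b_out, dims=[0])       # realign with time steps 1 .. T
    manual = torch.cat((f_out, b_out), dim=-1)
    builtin, h_n = bi(X)

print(manual.shape)                                # (T, B, 2 * H)
print(torch.allclose(manual, builtin, atol=1e-5))  # do the two versions agree?
```

The final state `h_n` has one slice per direction: `h_n[0]` is the forward state after the last time step and `h_n[1]` is the backward state after the first time step.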

In bidirectional RNNs, the hidden state at each time step is determined jointly by the forward pass (the inputs before it) and the backward pass (the inputs after it). They're very costly to train due to the long gradient chains.
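
To see the dependence on both directions, a small sketch (plain PyTorch, arbitrary illustrative sizes) can perturb only the last input and look at the output for the first time step. The forward half should be unaffected, since it has only seen $X_1$, while the backward half should change, since it has already read the whole sequence from the end.

```python
import torch
from torch import nn

torch.manual_seed(0)
T, B, D, H = 5, 2, 3, 4  # time steps, batch size, input size, hidden size (arbitrary)

gru = nn.GRU(D, H, bidirectional=True)

X = torch.randn(T, B, D)
X_changed = X.clone()
X_changed[-1] += 1.0  # perturb only the last time step

with torch.no_grad():
    out, _ = gru(X)
    out_changed, _ = gru(X_changed)

# Compare the outputs at the first time step, split into the two halves
forward_same = torch.allclose(out[0, :, :H], out_changed[0, :, :H])
backward_same = torch.allclose(out[0, :, H:], out_changed[0, :, H:])
print(forward_same, backward_same)
```

This is also why a bidirectional model can't be used directly for next-token prediction: the backward half at every step has already seen the future inputs.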
