Skip to content

mineshmathew/pyTorch_RNN_Examples

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Sequence Learning using RNNs in Pytorch

Few sequence learning problems using RNNs are introduced here. Maintained mostly from a learning persepctive.
If you are looking for reusable, optimal solutions this repo need not suit your requirements
I will try to add more explanatory figures, and write-up in the future

Table of Contents

Memorize Kth Integer in a Sequence

The problem here is to memorize kth digit in a sequence of digits of varibale length. Say if the sequence is 9,7,0,5,6 and k=3, network should output 0. This is a sequence to one problem where input digits are represented suing one hot vectors of length=10 (since there are 10 digits)

Binary Strings Addition

This problem introduces a simple sequence to sequence learning example where the task is to make an RNN learn the binary addition. Problem is modelled as a seq2seq, where input and output sequences are aligned.

Corresponding bits from the input strings forms a 2 - element input vector at each time- step and the target or desired bit is the corresponding bit from the output binary string. Prior to this we make sure that input sequences are padded with a 'zero' on the left if the output string has one extra bit. The sequence is processed from the Least significant bit (LSB) to the Most Significant Bit(MSB) since thats how the addition is done.

Network has 2 input nodes, 10 or 20 hidden LSTM nodes and a single output node followed by a Sigmoid activation. MSE loss is computed at each each time step between the output activation at the output node and the desired bit at that time step. At the last time step, sum of the losses is calcuated and then the error is backropagated.

OCR using RNN and CRNN

OCR is probably one of the simplest vision problems to introduce the utility of RNN architecures for seq2seq learning without alignment. 'Without alignment' means that, you dont have a corresponding target at each time step and the input and output lengths may vary. All you know is that you have an input sequence of features and you need to map it to an output sequence of class labels.

Here OCR is modelled as a seq2seq learning, where a sequence of image features are extracted from the input word or line image and the output is a sequence of characters or Unicodes.

CTC loss is employed to cacluate the cost here.

This example also showcases two variants of the soltuion:-

  • one using raw pixel values as input features for the RNN
  • and in the second case a convolutional stack is added at the head of the network to extract more robust features from the input image. Then a sequence of convolutional features are supplied to the RNN. This model is generally referred by the name -CRNN (The first paper which introduced such a model for text recognition is this one

About

Sequence learning using RNNs in Pytorch

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published