georgeretsi/HTR-ctc

HTR-ctc

PyTorch implementation of Handwritten Text Recognition (HTR) using CTC loss on the IAM dataset.

A newer, updated version of this repo, which uses the built-in PyTorch CTC loss and extra modules, can be found here.

Selected Features:

  • The dataset is saved in a '.pt' file after the initial preprocessing, so that later loading is faster.
  • The loader supports both word-level and line-level segmentation (change the loader parameters in train_htr.py).
    E.g. IAMLoader('train', level='line', fixed_size=(128, None)) or IAMLoader('train', level='word', fixed_size=(128, None))
  • Image resizing is set through the loader, specifically its fixed_size argument. If the width is None, the resize operation keeps the aspect ratio and scales the image to the specified height (e.g. 128). This produces images of different widths, which cannot be stacked into a fixed-size batch. To handle this, the network is updated every K single-image iterations (e.g. set batch_size = 1 and iter_size = 16 in train_code/config.py). If a fixed size is selected across all dimensions, e.g. IAMLoader('train', level='line', fixed_size=(128, 1024)), a regular batch size can be used (e.g. batch_size = 16 and iter_size = 1).
  • The model architecture can be modified by changing the cnn_cfg and rnn_cfg variables in train_code/config.py. The CNN consists of multiple stacks of ResBlocks, and the default setting cnn_cfg = [(2, 32), 'M', (4, 64), 'M', (6, 128), 'M', (2, 256)] is interpreted as follows: the first stack consists of 2 ResBlocks with 32 output channels, the second of 4 ResBlocks with 64 output channels, etc. The 'M' denotes a max-pooling operation with kernel size and stride equal to 2. The CNN backbone is topped by an RNN head, which produces the final character predictions. The recurrent network is a bidirectional LSTM and its basic configuration is given by the variable rnn_cfg. The default setting rnn_cfg = (256, 1) corresponds to a single-layer LSTM with a hidden size of 256.
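
To make the cnn_cfg notation concrete, here is a minimal sketch that expands a configuration list into a flat sequence of layer descriptions. The helper describe_cnn_cfg is hypothetical and not part of this repo. The single input channel is an assumption (grayscale images), and the downsampling factor counts only the 'M' entries, so any extra striding in the actual network is not reflected.

```python
def describe_cnn_cfg(cnn_cfg, in_channels=1):
    """Expand a cnn_cfg list into layer descriptions (hypothetical helper).

    Returns (layers, out_channels, downsample), where downsample is the
    spatial reduction factor contributed by the 'M' entries alone.
    """
    layers = []
    channels = in_channels  # assumption: grayscale input
    downsample = 1
    for item in cnn_cfg:
        if item == 'M':
            # max-pooling with kernel size 2 and stride 2
            layers.append(('maxpool', 2, 2))
            downsample *= 2
        else:
            num_blocks, out_channels = item
            for _ in range(num_blocks):
                layers.append(('resblock', channels, out_channels))
                channels = out_channels
    return layers, channels, downsample


cnn_cfg = [(2, 32), 'M', (4, 64), 'M', (6, 128), 'M', (2, 256)]
layers, out_channels, downsample = describe_cnn_cfg(cnn_cfg)
num_resblocks = sum(1 for layer in layers if layer[0] == 'resblock')
print(num_resblocks, out_channels, downsample)
```

With the default setting this gives 14 ResBlocks, 256 output channels and a pooling factor of 8, so a 128-pixel-high input would be reduced to a height of 16 by the pooling layers alone.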

Example:
python train_htr.py -lr 1e-3 -gpu 0
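
The batch_size = 1 / iter_size = 16 setting described in the features list amounts to gradient accumulation. The sketch below illustrates the idea only and is not the repo's training loop: the linear model and MSE loss are stand-ins for the HTR network and CTC loss, and dividing the loss by iter_size is an assumption about how the gradients are averaged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(8, 4)        # stand-in for the HTR network
criterion = nn.MSELoss()       # stand-in for the CTC loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

iter_size = 16                 # number of single-image steps per update
num_samples = 64
updates = 0

optimizer.zero_grad()
for step in range(1, num_samples + 1):
    # Each "image" has a different width, so samples cannot be stacked.
    width = torch.randint(5, 20, (1,)).item()
    x = torch.randn(width, 8)
    y = torch.zeros(width, 4)

    # Scale the loss so the accumulated gradient is an average (assumption).
    loss = criterion(model(x), y) / iter_size
    loss.backward()            # gradients accumulate in the .grad buffers

    if step % iter_size == 0:
        optimizer.step()       # one weight update per iter_size samples
        optimizer.zero_grad()
        updates += 1
```

Here 64 single-image steps result in 4 optimizer updates, which approximates training with a batch size of 16.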

Note: Local paths to the IAM dataset (https://fki.tic.heia-fr.ch/databases/iam-handwriting-database) are hardcoded in iam_data_loader/iam_config.py, so adjust them to your local setup.

The initial code was developed for the split known as IAM-C (see this paper and this repo for more details).

Developed with PyTorch 0.4.1 and the warpctc_pytorch library (https://github.com/SeanNaren/warp-ctc).
A newer version that uses the built-in CTC loss of PyTorch (>1.0) is planned.
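
For reference, a minimal sketch of calling the built-in torch.nn.CTCLoss, which is available in PyTorch 1.0 and later. It is not taken from this repo. The tensor shapes, the number of classes and the use of index 0 as the blank label are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T, N, C = 50, 2, 80  # time steps, batch size, classes including blank (assumed)

# Network output: log-probabilities of shape (T, N, C).
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

# Padded target label sequences; index 0 is reserved for the blank.
targets = torch.randint(1, C, (N, 10), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.tensor([10, 7], dtype=torch.long)

ctc = nn.CTCLoss(blank=0, reduction='mean', zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

Unlike warpctc_pytorch, which takes raw activations, the built-in loss expects log-probabilities, hence the log_softmax call.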
