An attention based method for recognition of offline handwritten Urdu text

Tayaba Anjum and Nazar Khan

ICFHR 2020

Abstract

Compared to scripts derived from Latin, recognition of handwritten scripts derived from Arabic is a complex task due to their two-dimensional structure, context-dependent character shapes, large number of ligatures, overlapping characters, and placement of diacritics. While significant work exists for Latin and Arabic scripts, very few attempts have been made for offline handwritten Urdu script. In this paper, we introduce a large, annotated dataset of handwritten Urdu sentences. We also present a methodology for the recognition of offline handwritten Urdu text lines. A deep-learning-based encoder/decoder framework with an attention mechanism is used to handle the two-dimensional text structure. While existing approaches report only character-level accuracy, the proposed model improves on the BLSTM-based state of the art by a factor of 2 in terms of character-level accuracy and by a factor of 37 in terms of word-level accuracy. Incorporating attention before a recurrent decoding framework helps the model look at appropriate locations before classifying the next character and therefore results in higher word-level accuracy.
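The attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the released code or the paper's exact formulation: the dot-product scoring, the function names `softmax` and `attend`, and the toy feature vectors are all assumptions chosen for illustration. It only shows the general idea of weighting encoder locations by their relevance to the current decoder state before the next character is predicted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_features):
    """One soft-attention step (hypothetical helper, dot-product scoring assumed).

    decoder_state: list of floats of length d (current decoder hidden state)
    encoder_features: list of length-d vectors, one per image location
    Returns (weights, context): attention weights over locations and the
    weighted sum of encoder features.
    """
    # Score each image location against the decoder state.
    scores = [
        sum(q * k for q, k in zip(decoder_state, feat))
        for feat in encoder_features
    ]
    # Normalise scores into a distribution over locations.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the location features.
    dim = len(encoder_features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, encoder_features))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three image locations with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [0.0, 2.0]
weights, context = attend(state, features)
```

In this toy input the second and third locations match the decoder state equally well, so they receive equal weight and the first location receives less. A recurrent decoder would consume `context` together with its previous output to classify the next character.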

Project page

http://faculty.pucit.edu.pk/nazarkhan/work/urdu_ohtr/index.html

Dataset

https://drive.google.com/file/d/1itd147PcQYpduO1-jvm0HAq5k7HFbwfV/view?usp=sharing

Code

Tentative release by end of 2020.
