nazar-khan/urdu_ohtr

An attention based method for recognition of offline handwritten Urdu text

Tayaba Anjum and Nazar Khan

ICFHR 2020

Abstract

Compared to derivatives of the Latin script, recognition of handwritten derivatives of the Arabic script is a complex task due to their two-dimensional structure, context-dependent character shapes, large number of ligatures, overlapping characters, and placement of diacritics. While significant work exists for Latin and Arabic scripts, very few attempts have been made for offline handwritten Urdu script. In this paper, we introduce a large, annotated dataset of handwritten Urdu sentences. We also present a methodology for the recognition of offline handwritten Urdu text lines. A deep learning based encoder/decoder framework with an attention mechanism is used to handle the two-dimensional text structure. While existing approaches report only character level accuracy, the proposed model improves on the BLSTM-based state of the art by a factor of 2 in terms of character level accuracy and by a factor of 37 in terms of word level accuracy. Incorporating attention before a recurrent decoding framework helps the model look at appropriate locations before classifying the next character and therefore results in higher word level accuracy.
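Until the code is released, the idea of "looking at appropriate locations before classifying the next character" can be illustrated with a generic attention step. The sketch below is not the authors' implementation: the function names (`softmax`, `attend`), the toy feature vectors, and the use of scaled dot-product scoring are all assumptions chosen for brevity. It only shows how a decoder state can be turned into weights over encoder locations and a weighted context vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """One attention step (hypothetical helper, not from the paper).

    features: list of encoder feature vectors, one per image location
    query:    current decoder state, same length as each feature vector
    Returns (weights, context): one weight per location, and the
    weighted sum of the feature vectors.
    """
    scale = math.sqrt(len(query))
    # Score each location by its scaled dot product with the decoder state.
    scores = [
        sum(f_i * q_i for f_i, q_i in zip(f, query)) / scale
        for f in features
    ]
    weights = softmax(scores)
    dim = len(features[0])
    # The context vector is what a decoder would use to predict the next character.
    context = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three image locations with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(features, query)
```

In this toy input, the first and third locations align with the query and receive equal, larger weights than the second. A recurrent decoder would repeat this step for every output character, updating the query each time.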

Project page

http://faculty.pucit.edu.pk/nazarkhan/work/urdu_ohtr/index.html

Dataset

https://drive.google.com/file/d/1itd147PcQYpduO1-jvm0HAq5k7HFbwfV/view?usp=sharing

Code

Tentative release by end of 2020.
