shubhampateliitm/Awesome-Indic-Scene-Text-Dataset


Awesome Indic Scene Text Dataset

Awesome Indic Scene Text Dataset is a project to collect scene text images in Indian languages. We are currently focusing on collecting images in the following languages (scripts):

  1. Bengali
  2. Gujarati
  3. Hindi
  4. Kannada
  5. Malayalam
  6. Oriya
  7. Punjabi
  8. Tamil
  9. Telugu
  10. Urdu

More Indic languages may be added to the list in the future.

Introduction

Scene text detection and recognition is an active area of research, popularized by the Robust Reading Competition launched by ICDAR. Since its inception in 2003, the competition has been held again in 2005, 2011, 2013, 2015, and 2017. However, the datasets introduced in the competition focus mainly on English and are limited in size (until the introduction of MS-COCO-Text). Such datasets do not suit the Indian environment, which is linguistically very diverse. We are therefore starting this project to collect high-quality Indic text images in large quantities, so that they can be annotated and converted into a well-defined, publicly available dataset.

Application

Detecting text in photographs has various applications. For example, it can help blind people in everyday activities such as locating objects in their surroundings, which can in turn assist them when shopping in a mall, finding out the name of a product, or obtaining other pieces of information.

Image courtesy: IBN Lokmat.

Similarly, translating the text in a photograph into a particular language can be of great help to foreign as well as local tourists, since India is a diverse country where the language changes from state to state. It can help them navigate, make purchases, and explore shops without language being a barrier.


Image courtesy: Rediff.com.

Many other applications are possible.

What do scene text images look like?

Here are some sample images:

Scene text can be found in many places, such as:

  1. Name boards on shops.
  2. Sign boards/signage in streets, museums, railway stations, malls, and other such public places.
  3. Billboards.
  4. Hoardings.
  5. Names on buildings.

How can you contribute?

You can send photographs you have captured that contain scene text in any of the following ways:

  1. Mail the images to indicSceneText@gmail.com, using the language the images contain as the subject (e.g. Tamil, Telugu, Hindi).
  2. You can also share the photos using Google Photos. Add the photos to an album and name the album after the language (e.g. Hindi) that appears in the majority of the images.
  3. Or you can contribute directly to the specific language folder. Here are some tutorials for that:

    https://gist.github.com/MarcDiethelm/7303312, https://akrabat.com/the-beginners-guide-to-contributing-to-a-github-project/
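For the third option, a small helper can make it easier to place photos in the right folder of a local clone before committing. The following is only a minimal sketch, not part of this project: the function name `stage_images`, the accepted image extensions, and the assumption that each language folder is named exactly as in the list above are all hypothetical, so check the actual folder names in the repository first.

```python
import shutil
from pathlib import Path

# Languages listed in this README. That the repository folders use exactly
# these names is an assumption; adjust to match the real folder names.
LANGUAGES = {
    "Bengali", "Gujarati", "Hindi", "Kannada", "Malayalam",
    "Oriya", "Punjabi", "Tamil", "Telugu", "Urdu",
}

# Assumed set of image file extensions to pick up.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def stage_images(photo_dir, repo_dir, language):
    """Copy image files from photo_dir into repo_dir/<language>.

    Returns the list of newly created file paths. Files that already
    exist in the target folder are skipped, never overwritten.
    """
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")

    target = Path(repo_dir) / language
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(Path(photo_dir).iterdir()):
        if not src.is_file() or src.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore sub-folders and non-image files
        dest = target / src.name
        if dest.exists():
            continue  # keep whatever is already in the repository
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

After running something like `stage_images("my_photos", "Awesome-Indic-Scene-Text-Dataset", "Hindi")` (both paths here are placeholders), the copied files can be committed and submitted as a pull request as described in the tutorials above.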
