[datasets] Add a synthetic recognition dataset #262

fg-mindee · 2021-05-17T08:44:03Z

The library would greatly benefit from synthetic data. It could be very helpful for both character classification and text recognition.

Here are suggestions for augmentations operating on a discrete spectrum:

choice of character in the vocab
choice of font family
choice of font style (bold, italic, etc)
The last two would not be relevant for handwritten characters.

And those operating on a continuous spectrum: all common image classification augmentations (rotation, color modifications, etc.)
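
To make the discrete / continuous split concrete, here is a minimal sketch of how one sample's parameters could be drawn. It is not docTR code: `GlyphSpec`, `sample_glyph_spec`, the font names, and the value ranges are all assumptions made for illustration, and rendering is deliberately left out.

```python
import random
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class GlyphSpec:
    """Parameters describing one synthetic character sample (hypothetical)."""
    char: str
    font_family: str
    font_style: str
    rotation_deg: float
    brightness: float


# Placeholder choices; a real dataset would take these from its own config.
VOCAB = string.ascii_letters + string.digits
FONT_FAMILIES = ("DejaVu Sans", "Liberation Serif")
FONT_STYLES = ("regular", "bold", "italic")


def sample_glyph_spec(rng, vocab=VOCAB, families=FONT_FAMILIES,
                      styles=FONT_STYLES, max_rotation=10.0):
    """Draw one spec: discrete choices first, then continuous ones."""
    return GlyphSpec(
        # Discrete spectrum: pick one item from a finite set.
        char=rng.choice(vocab),
        font_family=rng.choice(families),
        font_style=rng.choice(styles),
        # Continuous spectrum: draw from a range (assumed bounds).
        rotation_deg=rng.uniform(-max_rotation, max_rotation),
        brightness=rng.uniform(0.8, 1.2),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    for _ in range(3):
        print(sample_glyph_spec(rng))
```

For handwritten characters, the two font fields would simply be dropped, as noted above.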

felixdittrich92 · 2021-11-22T09:53:23Z

@fg-mindee @charlesmindee
Following #640, how do you plan to implement this, or rather, what do you expect it to generate?
My closed PR implemented a word-image generator with the corresponding word labels, where you can specify:

language (vocab)
image size
background (noise / white)
font (family / size)
Also planned: rotation / distortion / skew / blur

I loosely based this on TRDG.
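
As a rough illustration of the options listed above, the sketch below pairs a background canvas with a random word label. It is not the code from the closed PR: `WordGenConfig`, `make_background`, `generate_samples`, and every default value are hypothetical, and drawing the text with the chosen font is left as a comment.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class WordGenConfig:
    """Hypothetical options mirroring the list above."""
    vocab: str = "abcdefghijklmnopqrstuvwxyz"
    image_size: Tuple[int, int] = (32, 128)  # (height, width)
    background: str = "white"  # "white" or "noise"
    font_family: str = "DejaVu Sans"  # placeholder name
    font_size: int = 24
    min_chars: int = 1
    max_chars: int = 8


def make_background(cfg: WordGenConfig, rng: random.Random) -> List[List[int]]:
    """Return a grayscale canvas as rows of 0-255 pixel values."""
    height, width = cfg.image_size
    if cfg.background == "white":
        return [[255] * width for _ in range(height)]
    if cfg.background == "noise":
        return [[rng.randint(0, 255) for _ in range(width)] for _ in range(height)]
    raise ValueError(f"unknown background: {cfg.background!r}")


def generate_samples(cfg: WordGenConfig, num_samples: int,
                     seed: int = 0) -> Iterator[Tuple[List[List[int]], str]]:
    """Yield (canvas, label) pairs; the label is a random word from the vocab."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        length = rng.randint(cfg.min_chars, cfg.max_chars)
        label = "".join(rng.choice(cfg.vocab) for _ in range(length))
        canvas = make_background(cfg, rng)
        # A real generator would draw `label` onto `canvas` here, using
        # cfg.font_family and cfg.font_size, then apply any distortions.
        yield canvas, label


if __name__ == "__main__":
    for canvas, label in generate_samples(WordGenConfig(background="noise"), 3):
        print(len(canvas), len(canvas[0]), label)
```

Keeping the label generation separate from rendering means the same labels can be reused with different fonts or backgrounds.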
An example:

label: "sweat"

wdyt? :)

fg-mindee added the help wanted (Extra attention is needed) and module: datasets (Related to doctr.datasets) labels May 17, 2021

fg-mindee added this to the 0.3.0 milestone May 17, 2021

charlesmindee added the side-project (Long-term issue which is particularly independent of other issues) label Jun 1, 2021

fg-mindee modified the milestones: 0.3.0, 0.4.0 Jul 1, 2021

This was referenced Aug 13, 2021

feat: Added character generator dataset #412

Merged

Require detailed explanation on few points #411

Closed

fg-mindee added the topic: text recognition (Related to the task of text recognition) label Aug 25, 2021

fg-mindee modified the milestones: 0.4.0, 0.4.1 Sep 20, 2021

fg-mindee modified the milestones: 0.4.1, 1.0.0, 0.5.0 Oct 30, 2021

felixdittrich92 mentioned this issue Nov 19, 2021

[WIP] Add synthetic word generator #640

Closed

3 tasks

fg-mindee removed the help wanted (Extra attention is needed) label Dec 26, 2021

fg-mindee self-assigned this Dec 26, 2021

fg-mindee mentioned this issue Dec 26, 2021

feat: Added WordGenerator dataset #760

Merged

fg-mindee closed this as completed in #760 Dec 27, 2021
