
ValueError: '\xe2' is not in list when I try to train with the --color parameter #110

Closed
kulkarnivishal opened this issue Sep 9, 2018 · 9 comments

Comments

@kulkarnivishal

Hi @emedvedev

I get this error after the first few steps when I train with the --color parameter (channels set to 3). Could you please help?
The error log is below:
[screenshot of error log]

@emedvedev (Owner)

This error means that your image labels contain a character that's not in the model's charmap. Check your labels, because by default the supported charset is digits plus uppercase letters only:

```
CHARMAP = ['', '', ''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')
```
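As a quick sanity check, here is a small sketch (not part of the aocr codebase) that finds which label characters fall outside a charmap like the one above; the failing lookup in the traceback is a `list.index` call, which raises `ValueError` for exactly these characters:

```python
# Hypothetical helper to spot label characters missing from the charmap.
# CHARMAP mirrors the default shown above; the three leading empty
# strings stand in for the special-token slots.
CHARMAP = ['', '', ''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')

def missing_chars(label):
    """Return the set of characters in `label` that are not in CHARMAP."""
    return {ch for ch in label if ch not in CHARMAP}

print(missing_chars('HELLO-123'))  # the '-' would trigger the ValueError
print(missing_chars('HELLO123'))   # empty set: this label is safe
```

Running this over all labels before training would surface the offending characters up front instead of failing mid-training.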

@kulkarnivishal (Author)

Thank you for the prompt response. I used the --full-ascii and --no-force-uppercase flags as well; I assumed --full-ascii covers all characters. Am I missing something?

@emedvedev (Owner)

--full-ascii covers the ASCII range (uppercase, lowercase, symbols, etc.). There's no flag for covering the entire Unicode range; you'd have to manually modify the CHARMAP (at the line I mentioned above). I'm not sure about the performance in that case, though, so you might want to consider modifying the dataset labels instead, unless Unicode plays a significant part in them.
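One way to do that manual modification, sketched under the assumption that you replace the CHARMAP line mentioned above, is to derive the charmap from the characters that actually occur in your labels rather than hand-typing the list:

```python
# Sketch: build a charmap from the dataset's own labels instead of
# hand-editing the list. The three leading empty strings mimic the
# special-token slots in the default CHARMAP shown earlier; this is an
# assumption about the expected layout, not code from the aocr repo.
def build_charmap(labels):
    chars = sorted({ch for label in labels for ch in label})
    return ['', '', ''] + chars

labels = ['H\xc9LLO', 'WORLD-42']  # includes a non-ASCII 'É'
charmap = build_charmap(labels)

# Every character in every label now has an index in the charmap.
assert all(ch in charmap for label in labels for ch in label)
```

The trade-off is that a charmap built this way only covers characters seen in the training labels, so any new character at inference time would still be out of vocabulary.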

@kulkarnivishal (Author)

Thank you for the reply. I am able to train the model now, but I see a very strange issue. I added synthetic images (generated using GANs) and a few cropped COCO images to the training data: about 1M synthetic images, plus Synth90k, plus about 60k COCO images. The training loss is improving, but when I test the model, prediction performs very poorly: it just prints "cccc", "aaaaa", etc.
Am I doing something wrong?

Here's the training log:
[screenshot of training log]

@emedvedev (Owner)

Everything looks correct, so I can't say for sure. It really depends on your dataset, the separation of training/testing data, and a whole bunch of other factors.

This might, of course, be a fault in the code or the model itself. In that case, once you pin down the issue, please submit a detailed report or a PR; that would be much appreciated if the aocr code is indeed the problem.

@aosetrov

Hi @kulkarnivishal,
Did you manage to solve this problem? If yes, please share a hint.

@kulkarnivishal (Author)

Not really, although I continued training for three more weeks and the results look better, though still not great. The main issue I'm facing is predicting symbols: no matter how I train, the inference always gets them wrong.

@aosetrov

You could try manually changing the dictionary instead of using the flags. At the same time, I have no idea how you managed to add non-ASCII-labeled targets to the training-set .tfrecords file; when I tried to do that, it kept throwing an encoding error.

@kulkarnivishal (Author)

You mean manually adding symbols instead of using the --full-ascii flag?
And I used Python's string.printable to filter out non-ASCII characters.
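That filtering step can be sketched as follows; this is a minimal illustration of the `string.printable` approach mentioned above, not the exact preprocessing code used here:

```python
import string

# Keep only characters in Python's string.printable (ASCII letters,
# digits, punctuation, and whitespace), dropping everything else
# before the labels are written out to the tfrecords file.
PRINTABLE = set(string.printable)

def to_ascii(label):
    """Strip non-printable/non-ASCII characters from a label."""
    return ''.join(ch for ch in label if ch in PRINTABLE)

print(to_ascii('caf\xe9 HELLO'))  # -> 'caf HELLO'
```

Note that this silently drops the non-ASCII characters rather than transliterating them, so "café" becomes "caf"; depending on the dataset, mapping accented characters to their ASCII equivalents first might preserve more label information.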
