
ValueError: '\xe2' is not in list when I try to train with the --color parameter #110

Closed
kulkarnivishal opened this issue Sep 9, 2018 · 9 comments

Comments

@kulkarnivishal

Hi @emedvedev

I get this error after the first few steps when I train with the --color parameter (channels set to 3). Could you please help?
The error log is below:
[screenshot of error log]

@emedvedev (Owner)

This error means that your image labels contain a character that's not in the model's charmap. Check your labels, because by default the supported charset is digits plus uppercase letters only:

```
CHARMAP = ['', '', ''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')
```
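As a quick sanity check, here is a small sketch (not part of the aocr codebase) that finds which label characters fall outside a charmap like the one above; the failing lookup in the traceback is a `list.index` call, which raises `ValueError` for exactly these characters:

```python
# Hypothetical helper to spot label characters missing from the charmap.
# CHARMAP mirrors the default shown above; the three leading empty
# strings stand in for the special-token slots.
CHARMAP = ['', '', ''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')

def missing_chars(label):
    """Return the set of characters in `label` that are not in CHARMAP."""
    return {ch for ch in label if ch not in CHARMAP}

print(missing_chars('HELLO-123'))  # the '-' would trigger the ValueError
print(missing_chars('HELLO123'))   # empty set: this label is safe
```

Running this over all labels before training would surface the offending characters up front instead of failing mid-training.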

@kulkarnivishal (Author)

Thank you for the prompt response. I used the --full-ascii and --no-force-uppercase flags as well; I assumed --full-ascii covers all characters. Am I missing something?

@emedvedev (Owner)

--full-ascii covers the ASCII range (uppercase, lowercase, symbols, etc.). There's no flag for covering the entire Unicode range; you'd have to manually modify the CHARMAP (at the line I mentioned above). I'm not sure about the performance in that case, though, so you might want to consider modifying the dataset labels instead, unless Unicode plays a significant part in them.
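One way to do that manual modification, sketched under the assumption that you replace the CHARMAP line mentioned above, is to derive the charmap from the characters that actually occur in your labels rather than hand-typing the list:

```python
# Sketch: build a charmap from the dataset's own labels instead of
# hand-editing the list. The three leading empty strings mimic the
# special-token slots in the default CHARMAP shown earlier; this is an
# assumption about the expected layout, not code from the aocr repo.
def build_charmap(labels):
    chars = sorted({ch for label in labels for ch in label})
    return ['', '', ''] + chars

labels = ['H\xc9LLO', 'WORLD-42']  # includes a non-ASCII 'É'
charmap = build_charmap(labels)

# Every character in every label now has an index in the charmap.
assert all(ch in charmap for label in labels for ch in label)
```

The trade-off is that a charmap built this way only covers characters seen in the training labels, so any new character at inference time would still be out of vocabulary.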

@kulkarnivishal (Author)

Thank you for the reply. I am able to train the model now, but I see a very strange issue. I added synthetic images (generated using GANs) and a few cropped COCO images to the training data: about 1M synthetic images, plus Synth90k, plus about 60k COCO images. The training loss is improving, but when I test the model, prediction performs very poorly: it just prints "cccc", "aaaaa", etc.
Am I doing something wrong?

Here's the training log:
[screenshot of training log]

@emedvedev (Owner)

Everything looks correct, so I can't say for sure. It really depends on your dataset, the separation of training/testing data, and a whole bunch of other factors.

This might, of course, be a fault in the code or the model itself. In that case, once you pin down the issue, please submit a detailed report or a PR; that would be much appreciated if the aocr code is indeed the problem.

@aosetrov

Hi @kulkarnivishal,
Did you manage to solve this problem? If yes, please share a hint.

@kulkarnivishal (Author)

Not really, although I continued training for three more weeks and the results look better, though still not great. The main issue I'm facing is predicting symbols: no matter how I train, the inference always gets them wrong.

@aosetrov

You could try manually changing the dictionary instead of using the flags. At the same time, I have no idea how you managed to add non-ASCII-labeled targets to the training-set .tfrecords file; when I tried to do that, it kept throwing an encoding error.

@kulkarnivishal (Author)

You mean manually adding symbols instead of using the --full-ascii flag?
And I used Python's string.printable to filter out non-ASCII characters.
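That filtering step can be sketched as follows; this is a minimal illustration of the `string.printable` approach mentioned above, not the exact preprocessing code used here:

```python
import string

# Keep only characters in Python's string.printable (ASCII letters,
# digits, punctuation, and whitespace), dropping everything else
# before the labels are written out to the tfrecords file.
PRINTABLE = set(string.printable)

def to_ascii(label):
    """Strip non-printable/non-ASCII characters from a label."""
    return ''.join(ch for ch in label if ch in PRINTABLE)

print(to_ascii('caf\xe9 HELLO'))  # -> 'caf HELLO'
```

Note that this silently drops the non-ASCII characters rather than transliterating them, so "café" becomes "caf"; depending on the dataset, mapping accented characters to their ASCII equivalents first might preserve more label information.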
