Error when training on Synth 90k #31

Closed
thisismohitgupta opened this issue Oct 23, 2017 · 18 comments

@thisismohitgupta

2017-10-22 23:07:17.471187: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Invalid JPEG data, size 1024

The error comes from this line:

image = tf.image.decode_png(img, channels=1)

@emedvedev
Owner

Thanks for the report! Could you please provide the image in the dataset that errors out? Does this error appear from the beginning, or at some point in the middle of the training process?

@thisismohitgupta
Author

It happens in the middle, roughly around 1300-1500 steps with a batch size of 512. I tried hard, but I could not identify the problematic image. Please help.

@emedvedev
Owner

It's pretty much impossible to help unless I know what the image is, unfortunately. You can try inserting some debugging line that would output the list of images in the batch, and then try to narrow it down to a particular one, or just add a catch that would ignore a failed batch and continue training. Might be that your dataset is corrupted.
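For example, a standalone pass over the dataset with Pillow can usually find undecodable files before training ever starts. This is only a sketch: it assumes a Synth90k-style annotation file with "relative/path.jpg label" per line under a dataset root, so DATA_ROOT and ANNOTATION below are placeholders for your setup.

# Standalone sanity check (sketch): list images that cannot be decoded.
import os
from PIL import Image

DATA_ROOT = '/mnt/ramdisk/max/90kDICT32px'
ANNOTATION = os.path.join(DATA_ROOT, 'annotation_train.txt')

broken = []
with open(ANNOTATION) as f:
    for line in f:
        path = os.path.join(DATA_ROOT, line.split()[0])
        try:
            Image.open(path).load()  # force a full decode, not just the header
        except Exception as err:
            broken.append(path)
            print('Cannot decode {}: {}'.format(path, err))

print('{} broken images found'.format(len(broken)))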

@tumusudheer

Hi @emedvedev,

I've faced similar errors while training on my own data. I debugged which images were causing them, and when I used ImageMagick's convert command to turn those images into grayscale, the commands worked fine.

I think the issue is with this line here, which converts the image bytes to grayscale:

image = tf.image.decode_png(img, channels=1)

How about changing it to:

rgb_image = tf.image.decode_png(img,  channels=3)
image = tf.image.rgb_to_grayscale(rgb_image)
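To double-check the change on a specific suspect file, something like the following should work. It is only a sketch, assuming TF 1.x (which this project uses); the file path is a placeholder.

# Sketch (TF 1.x): feed the raw bytes of a suspect image through the proposed ops.
import tensorflow as tf

path = '/path/to/suspect_image.jpg'  # placeholder
with open(path, 'rb') as f:
    img_bytes = f.read()

img = tf.placeholder(tf.string)
rgb_image = tf.image.decode_png(img, channels=3)
image = tf.image.rgb_to_grayscale(rgb_image)

with tf.Session() as sess:
    decoded = sess.run(image, feed_dict={img: img_bytes})
    print('decoded shape:', decoded.shape)  # (height, width, 1) on success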

@emedvedev
Owner

@tumusudheer thanks for investigating the issue! Can you confirm that the proposed change works with a "broken" image?

@thisismohitgupta you can try applying the proposed patch and re-training your model. Please tell me if it helps!

@thisismohitgupta
Author

@emedvedev it didn't work for me.
@tumusudheer how did you debug 9M images? Any pointers?

@thisismohitgupta
Author

Adding the following lines here solves the problem:

try:
    image = Image.open(IO(img)).convert('RGB')
except Exception:
    continue
if self.max_width and (image.size[0] <= self.max_width):
...

@emedvedev
Owner

Well, then we're just silently skipping broken images, which isn't very good. Is there any way to make the image reading/conversion more bulletproof so that we wouldn't skip anything?

@tumusudheer

Hi @emedvedev,

I trained with my proposed change yesterday, and I just verified the results. They are good.

This change worked for me:

rgb_image = tf.image.decode_png(img, channels=3)
image = tf.image.rgb_to_grayscale(rgb_image)

While skipping images, we can also create a log file that lists all the broken ones. After preparing the training data, people can check what is wrong with the images in the log file and try to fix or verify them.
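A rough sketch of that idea, assuming the skip happens inside the Python generator where the raw bytes and the file path are in scope; the names below are illustrative, not the actual variables in data_gen.py:

# Sketch: skip undecodable images but record them in a log file.
import logging
from io import BytesIO
from PIL import Image

skip_log = logging.getLogger('aocr.skipped_images')
skip_log.addHandler(logging.FileHandler('skipped_images.log'))
skip_log.setLevel(logging.INFO)

def load_or_skip(img, path):
    """Return a grayscale PIL image, or None (and log the path) if decoding fails."""
    try:
        return Image.open(BytesIO(img)).convert('L')
    except Exception as err:
        skip_log.info('skipping %s: %s', path, err)
        return None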

@emedvedev
Owner

Sweet! Would you mind opening a PR with the change, then? If you want, you can also implement image skipping there; that would be great.

@tumusudheer

Hi @emedvedev,
Sure, I'll send a PR with my changes.

@lmolhw5252

Hi, I ran into a problem when trying to train.
Caused by op 'IteratorGetNext', defined at:
File "/home/user/anaconda3/bin/aocr", line 11, in
sys.exit(main())
File "/home/user/PycharmProjects/attention-ocr-master/aocr/main.py", line 308, in main
num_epoch=parameters.num_epoch
File "/home/user/PycharmProjects/attention-ocr-master/aocr/model/model.py", line 347, in train
for batch in s_gen.gen(self.batch_size):
File "/home/user/PycharmProjects/attention-ocr-master/aocr/util/data_gen.py", line 54, in gen
images, labels, comments = iterator.get_next()
File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/contrib/data/python/ops/dataset_ops.py", line 304, in get_next
name=name))
File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 379, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1204, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): ./home/user/Dataset/CAPTCHAs/training.tfrecords
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?], [?], [?]], output_types=[DT_STRING, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/cpu:0"]]

Have you met this problem before? I don't know how to figure it out.

@emedvedev
Owner

NotFoundError (see above for traceback): ./home/user/Dataset/CAPTCHAs/training.tfrecords

Your dataset path is incorrect. I'd say it's that dot in the beginning. :)
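A quick way to see what that leading dot does (just a sketch; the paths are the ones from your traceback):

# The leading dot makes the path relative to the current working directory.
import os

bad_path = './home/user/Dataset/CAPTCHAs/training.tfrecords'
good_path = '/home/user/Dataset/CAPTCHAs/training.tfrecords'

print(os.path.abspath(bad_path))   # resolves somewhere under the current directory
print(os.path.isfile(good_path))   # should print True if the file is really there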

@MBleeker

MBleeker commented Feb 13, 2018

Hi Guys,

I've encountered the same problem, but for some reason neither solution is working for me. So I set the batch size to 1 and checked which image caused the problem. It is:

mnt/ramdisk/max/90kDICT32px/2194/2/334_EFFLORESCENT_24742.jpg

But there could be more...

Cheers,
Maurits

@emedvedev
Owner

@MBleeker Hi there! Just checking: have you set the max-prediction parameter while training and testing? From an earlier issue:

The max-prediction parameter is set to 8 by default, so it'll error out on labels longer than 8 characters. Just set it to whatever makes sense for you in the CLI when you run the training subcommand.

If it's set correctly, then could you provide the full log of your run?

@MBleeker

MBleeker commented Feb 16, 2018

Hi @emedvedev,

I found the problem already. I had never used setup.py before... I did not know about the .egg files. The updates I made were therefore not used while running the code. It is working now. There are several corrupted images.

About the bias terms we discussed in #70: I added them, and the results do not seem to be significantly better, but not worse either (the only problem is that you cannot use previously trained models anymore, because the variables are not stored in the checkpoint).

Did you try this code with a different set of hyperparameters than the defaults? Any different results?

Cheers,
Maurits

@emedvedev
Owner

About the bias terms we discussed in #70: I added them, and the results do not seem to be significantly better, but not worse either (the only problem is that you cannot use previously trained models anymore, because the variables are not stored in the checkpoint).

If there's no visible benefit, I'd rather maintain backward compatibility, but if you find that bias terms do have a significant benefit with some datasets, please submit a PR; I'd really appreciate it!

Did you try this code with a different set of hyperparameters than the defaults? Any different results?

I tried to tweak it a little, but it mostly just depends on the dataset. I find the defaults sensible, but maybe someone else will have something to add here, too. :)

@emedvedev
Owner

I'll close the issue since the original problem has been fixed, so if anyone else has issues with Synth90k, just open a new one. :)
