Can't understand the error: Premature end of JPEG data #96

kulkarnivishal · 2018-06-18T15:15:40Z

Hi,

I am trying to train the model on the MJSynth (Synth90k) synthetic text data. After successfully creating the tfrecords, I started training, passing the max prediction length and max image width as command-line arguments. After about 10,000 steps the process crashes with "Premature end of JPEG data". Please help. This is the exact error:

2018-06-15 19:39:40,006 root INFO Step 10607: 0.288s, loss: 0.220111, perplexity: 1.246215.
2018-06-15 19:39:40,304 root INFO Step 10608: 0.288s, loss: 0.390393, perplexity: 1.477562.
2018-06-15 19:39:40.334325: E tensorflow/core/lib/jpeg/jpeg_mem.cc:307] Premature end of JPEG data. Stopped at line 0/31
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/bin/aocr", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/aocr/main.py", line 252, in main
    num_epoch=parameters.num_epoch
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/aocr/model/model.py", line 364, in train
    result = self.step(batch, self.forward_only)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/aocr/model/model.py", line 445, in step
    outputs = self.sess.run(output_feed, input_feed)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid JPEG data or crop window, data size 1024
[[Node: map/while/DecodePng = DecodePng[channels=1, dtype=DT_UINT8, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
[[Node: map/while/cond/cond/resize_images/ExpandDims/_862 = _Send[T=DT_UINT8, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_42253_map/while/cond/cond/resize_images/ExpandDims", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]

Caused by op u'map/while/DecodePng', defined at:
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/bin/aocr", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/aocr/main.py", line 246, in main
    channels=parameters.channels,
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/aocr/model/model.py", line 122, in __init__
    self.img_data = tf.map_fn(self._prepare_image, self.img_data, dtype=tf.float32)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 413, in map_fn
    swap_memory=swap_memory)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3202, in while_loop
    result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2940, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2877, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 403, in compute
    packed_fn_values = fn(packed_values)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/aocr/model/model.py", line 465, in _prepare_image
    img = tf.image.decode_png(image, channels=self.channels)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/ops/gen_image_ops.py", line 1058, in decode_png
    name=name)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Invalid JPEG data or crop window, data size 1024
[[Node: map/while/DecodePng = DecodePng[channels=1, dtype=DT_UINT8, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
[[Node: map/while/cond/cond/resize_images/ExpandDims/_862 = _Send[T=DT_UINT8, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_42253_map/while/cond/cond/resize_images/ExpandDims", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]

Best regards,
Vishal

@emedvedev

kulkarnivishal · 2018-06-19T18:14:23Z

I get the same error even if I change the number of channels to 3 (passing --color as a CLI parameter) and convert the images to grayscale. I thought a PR for this had already been submitted, as per #31.

emedvedev · 2018-06-20T03:22:00Z

@kulkarnivishal #92 should've fixed this, but it's not in pip yet, so if you did pip install aocr, try running the version in master instead. Otherwise, see #91 for how to skip the affected images.
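Until the fix reaches pip, another option is to screen out truncated files before building the tfrecords. Below is a minimal, stdlib-only sketch; it is not aocr's code, and the helper names looks_complete and filter_readable are hypothetical. It only checks that a JPEG ends with its end-of-image marker (FF D9) or that a PNG ends with its IEND chunk, which catches truncation like the "Premature end of JPEG data" above but not every kind of corruption.

```python
# Sketch: detect truncated JPEG/PNG files with a cheap end-marker check.
JPEG_SOI = b"\xff\xd8"               # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"               # JPEG end-of-image marker
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_IEND = b"IEND\xaeB`\x82"         # IEND chunk type followed by its fixed CRC


def looks_complete(data):
    """Return True if data looks like a non-truncated JPEG or PNG."""
    if data.startswith(JPEG_SOI):
        # Tolerate zero-byte padding after the end marker.
        return data.rstrip(b"\x00").endswith(JPEG_EOI)
    if data.startswith(PNG_SIGNATURE):
        return data.endswith(PNG_IEND)
    return False  # unknown format or empty file: treat as unreadable


def filter_readable(paths):
    """Split image paths into (kept, skipped) lists."""
    kept, skipped = [], []
    for path in paths:
        try:
            with open(path, "rb") as f:
                data = f.read()
        except (IOError, OSError):
            skipped.append(path)  # missing or unreadable file
            continue
        (kept if looks_complete(data) else skipped).append(path)
    return kept, skipped
```

For example, kept, skipped = filter_readable(image_paths) before writing the annotations file, so only the kept paths end up in the tfrecords.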

kulkarnivishal · 2018-06-20T03:41:51Z

Got it working, thanks.
For fear of running into another error, I started a run with 1 epoch 6 hours ago, and it's still not done. What is the ideal number of epochs to train for? 1000 seems like a lot.

emedvedev · 2018-06-20T03:45:13Z

Run until you can get acceptable (whatever that means for your case) accuracy on the test dataset. It also depends a lot on the GPU — if yours isn't fast enough, try training in the cloud with something like Google ML Engine.
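In other words, the epoch count works as an upper bound rather than a target. A minimal, framework-agnostic sketch of that stopping rule follows; train_one_epoch and evaluate are hypothetical callables standing in for your training pass and test-set evaluation, and the patience-based stop is an added assumption, not an aocr option.

```python
def train_until(train_one_epoch, evaluate, target_accuracy, max_epochs, patience=3):
    """Train until test accuracy reaches target_accuracy, fails to improve for
    `patience` consecutive evaluations, or max_epochs is reached.

    Returns (epochs_run, best_accuracy).
    """
    best = 0.0
    stale = 0  # evaluations since the last improvement
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        accuracy = evaluate()  # accuracy on the held-out test set, 0..1
        if accuracy > best:
            best, stale = accuracy, 0
        else:
            stale += 1
        if best >= target_accuracy or stale >= patience:
            return epoch, best
    return max_epochs, best
```

What counts as an acceptable target_accuracy is up to your use case; the point is to stop on test-set results rather than on a fixed epoch count.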


kulkarnivishal · 2018-06-20T03:46:43Z

Thank you, I appreciate your help. I can close this issue now.

kulkarnivishal · 2018-06-25T18:30:18Z

Hi @emedvedev

I don't think this issue is fixed yet. I tried training and testing with the code from master. It breaks when it cannot identify an image file (there are a lot of these in Synth90k), raising an IOError.
Here's the error:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/bin/aocr", line 11, in <module>
    load_entry_point('aocr==0.7.4', 'console_scripts', 'aocr')()
  File "build/bdist.linux-x86_64/egg/aocr/main.py", line 261, in main
  File "build/bdist.linux-x86_64/egg/aocr/model/model.py", line 293, in test
  File "build/bdist.linux-x86_64/egg/aocr/util/data_gen.py", line 70, in gen
  File "/home/ubuntu/anaconda3/envs/tensorflow_p27/lib/python2.7/site-packages/Pillow-5.1.0-py2.7-linux-x86_64.egg/PIL/Image.py", line 2590, in open
    % (filename if filename else fp))
IOError: cannot identify image file <StringIO.StringIO instance at 0x7f736586ff80>

For now I am skipping the images I can't read in data_gen.py; a fix in master would be helpful.

Best,
Vishal

emedvedev · 2018-06-27T05:15:12Z

@kulkarnivishal ah, I see. That's a similar error, just in a different place. There's not much we can do aside from skipping when the image source is corrupt, but if you could submit a PR with graceful exception handling (like #92, but for data_gen.py:70), that would actually be super helpful.
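A PR along those lines might wrap the decode step in a try/except and skip the offending record. The sketch below is a hypothetical, self-contained version of that pattern, not the actual data_gen.py code: decode stands in for whatever opens the image, and skipped records are logged instead of crashing the generator.

```python
import logging


def decode_or_skip(records, decode):
    """Yield (image, label) pairs, skipping records whose image fails to decode.

    records: iterable of (raw_bytes, label) pairs.
    decode:  callable turning raw bytes into an image; expected to raise
             IOError/OSError/ValueError on corrupt input.
    """
    skipped = 0
    for raw, label in records:
        try:
            image = decode(raw)
        except (IOError, OSError, ValueError) as exc:
            skipped += 1
            logging.warning("Skipping unreadable image (%s)", exc)
            continue
        yield image, label
    if skipped:
        logging.info("Skipped %d unreadable images", skipped)
```

With Pillow, decode could be lambda raw: Image.open(io.BytesIO(raw)).convert("L"). Image.open is lazy, so the convert call forces a full decode and a truncated file fails inside the try block rather than later.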

kulkarnivishal · 2018-06-27T06:29:16Z

Sure. Also, can you help me run prediction on an image? For example, I'd like to pass an image as a command-line argument and get OCR output for all the text in it. Any pointers on how to implement this would be great.

emedvedev · 2018-06-27T06:35:10Z

aocr predict gets a list of filenames from stdin and runs prediction on each one. So something like cat filenames.txt | aocr predict should work for you, given the model is properly trained.
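If the list of images is built in Python, the same stdin interface can be driven with subprocess. A minimal sketch (Python 3.7+ for capture_output and text); build_stdin and run_predict are hypothetical helpers, and any extra aocr arguments your setup needs would go in extra_args, left empty here. Where aocr reports its results (stdout or its log output) is not confirmed here, so the sketch returns the whole CompletedProcess.

```python
import subprocess


def build_stdin(paths):
    """One filename per line, the input format aocr predict reads from stdin."""
    return "".join(str(p) + "\n" for p in paths)


def run_predict(paths, command=("aocr", "predict"), extra_args=()):
    """Pipe filenames into the predict command; return the CompletedProcess."""
    return subprocess.run(
        list(command) + list(extra_args),
        input=build_stdin(paths),
        capture_output=True,  # keep both stdout and stderr for inspection
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
```

This is the Python equivalent of cat filenames.txt | aocr predict; inspect result.stdout and result.stderr for the predictions.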

If by "all the text" you mean that there are a lot of different areas with text in your images, I would suggest running them through a text detection tool to isolate specific areas first, and only then applying OCR.
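Once a detector has produced boxes, the cropping step is plain array slicing. A sketch assuming the image is a NumPy array of shape (height, width) or (height, width, channels) and that boxes arrive as hypothetical (x, y, w, h) tuples in pixels; the detector itself is not shown. Each crop could then be encoded and passed to aocr one word at a time.

```python
import numpy as np


def crop_regions(image, boxes):
    """Return one sub-array per (x, y, w, h) box, clipped to the image bounds."""
    height, width = image.shape[:2]
    crops = []
    for x, y, w, h in boxes:
        x0, y0 = max(0, int(x)), max(0, int(y))
        x1, y1 = min(width, int(x + w)), min(height, int(y + h))
        if x1 > x0 and y1 > y0:  # drop boxes that fall entirely outside
            crops.append(image[y0:y1, x0:x1].copy())
    return crops
```

Note that NumPy indexes rows (y) before columns (x), which is why the slice is image[y0:y1, x0:x1].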

kulkarnivishal · 2018-06-27T06:38:44Z

Thanks for the reply. But what if I want to feed an image directly, say an image containing just one word? Any pointers on how to go about it?

emedvedev · 2018-06-27T06:39:55Z

That would be echo "my-image.png" | aocr predict

kulkarnivishal · 2018-06-27T06:44:40Z

Awesome, that works for testing, thanks. However, what if I want to use it as a package within Python code, where I read the image as a NumPy array and run OCR on that? It would work like this: pass in an image object and get back a string, similar to the Tesseract Python wrappers. Where can I start? Sorry for the silly question. This would also be a good feature for aocr, I suppose.

emedvedev · 2018-06-27T06:46:55Z

Then you can just import the module and run aocr similarly to how predict is run in __main__.py:

text, probability = model.predict(img_file_data)
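A small wrapper along those lines is sketched below. It assumes model is an aocr model instance already constructed and restored the way __main__.py does it for predict (constructor arguments are not shown), and that model.predict takes the encoded image file bytes and returns (text, probability), as in the line above. The NumPy path re-encodes the array to PNG with OpenCV's cv2.imencode, which is an assumption about your setup; all three helper names are hypothetical.

```python
def ocr_bytes(model, img_file_data):
    """Run aocr prediction on encoded image bytes (PNG/JPEG file contents)."""
    text, probability = model.predict(img_file_data)
    return text, probability


def ocr_file(model, path):
    """Read an image file from disk and run prediction on its raw bytes."""
    with open(path, "rb") as img_file:
        return ocr_bytes(model, img_file.read())


def ocr_array(model, image):
    """Encode a NumPy image array as PNG in memory, then run prediction."""
    import cv2  # assumption: OpenCV is installed; imported lazily
    ok, buffer = cv2.imencode(".png", image)
    if not ok:
        raise ValueError("could not encode image as PNG")
    return ocr_bytes(model, buffer.tobytes())
```

Encoding back to PNG keeps the model's input identical to what predict already receives from files, at the cost of one in-memory encode per image.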

kulkarnivishal · 2018-06-27T06:48:39Z

Wow, you've saved me a lot of time. Thank you so much.

emedvedev · 2018-06-27T06:53:32Z

You're welcome! I'm going to close this issue, but do feel free to open a new one if you run into other problems, or submit a PR for better error handling in data_gen.py if you feel like working on it. That would be much appreciated!

kulkarnivishal · 2018-11-20T03:02:26Z

Hi, is there a way I can read the image with OpenCV instead of PIL's Image.open(IO(img))? I'm referring to how the image is read in data_gen.py.

emedvedev · 2018-11-24T12:18:32Z

You could probably use cv2.imdecode instead of Image.open and then use .shape to determine the size. Any particular reason you'd want to do that?

kulkarnivishal closed this as completed Jun 20, 2018

kulkarnivishal reopened this Jun 25, 2018

emedvedev closed this as completed Jun 27, 2018
