DataLossError (see above for traceback): inflate() failed with error -3: incorrect header check #14

Open
anmol4210 opened this issue Aug 8, 2019 · 5 comments

Comments

@anmol4210

Caused by op 'IteratorGetNext_3', defined at:
File "bin/iterate/table_adjacency_parsing.py", line 31, in <module>
trainer.train()
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/python/iterators/table_adjacency_parsing_iterator.py", line 48, in train
model.initialize(training=True)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/python/models/basic_model.py", line 73, in initialize
self.validation_feeds = self.validation_reader.get_feeds()
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/python/readers/image_words_reader.py", line 68, in get_feeds
vertex_features, vertex_text, image, global_features, adj_cells, adj_rows, adj_cols = iterator.get_next()
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 414, in get_next
output_shapes=self._structure._flat_shapes, name=name)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1685, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()

DataLossError (see above for traceback): inflate() failed with error -3: incorrect header check
[[node IteratorGetNext_3 (defined at /media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/python/readers/image_words_reader.py:68) ]]
[[node IteratorGetNext_3 (defined at /media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/python/readers/image_words_reader.py:68) ]]

@nnnnwang

nnnnwang commented Feb 5, 2020

Did you solve it? I ran into the same problem.

@Sky9222

Sky9222 commented Feb 12, 2020

You can use the following code to find the corrupted tfrecords:

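import glob
import os

import tensorflow as tf

# glob does not expand "~", so expand it explicitly before matching.
train_files = sorted(glob.glob(os.path.expanduser('~/*.tfrecord')))
# TF 1.x API; this check assumes the files are meant to be GZIP-compressed.
compression = tf.python_io.TFRecordCompressionType.GZIP
options = tf.python_io.TFRecordOptions(compression)
total_images = 0
print("validation started!")
for idx, file in enumerate(train_files):
    try:
        # Iterating over every record forces the whole file to be decompressed.
        total_images += sum(1 for _ in tf.python_io.tf_record_iterator(file, options))
        print("{}: {} is ok".format(idx, file))
    except Exception as e:
        print("{}: {} is corrupted".format(idx, file))
        print(e)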

@oksidgy

oksidgy commented Feb 18, 2020

Download the files separately rather than as an archive.

@zdmwang

zdmwang commented Jul 7, 2020

Did you solve it? I ran into the same problem.

@mtchibozo

mtchibozo commented Jul 12, 2020

To solve this problem, modify the image_words_reader.py file so that the dataset is defined with:
dataset = tf.data.TFRecordDataset(file_paths, compression_type=None)
instead of
dataset = tf.data.TFRecordDataset(file_paths, compression_type='GZIP')

With the original version of the code, I believe the reader assumes the tfrecords are GZIP compressed when in reality they have already been decompressed. The data then does not start with a gzip header, which is what "incorrect header check" is reporting.
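If you are not sure whether your tfrecords are still compressed, a quick check is to look at the first two bytes of each file: a gzip stream always begins with 0x1f 0x8b. The sketch below is only a heuristic and is not part of TIES. The helper name detect_compression_type is made up for illustration, and an uncompressed record file could in rare cases start with the same two bytes.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def detect_compression_type(path):
    """Return 'GZIP' if the file starts with the gzip magic bytes, else ''.

    Hypothetical helper: the return values match the strings accepted by
    the compression_type argument of tf.data.TFRecordDataset, where ''
    means no compression.
    """
    with open(path, "rb") as f:
        return "GZIP" if f.read(2) == GZIP_MAGIC else ""


# Possible use in image_words_reader.py (assumes all files share one format):
# dataset = tf.data.TFRecordDataset(
#     file_paths, compression_type=detect_compression_type(file_paths[0]))
```

This avoids hard-coding either 'GZIP' or None, so the same reader should work whether or not the download was decompressed along the way.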
