
Issue in preprocessing the dev set of natural QA #3

Closed
abaheti95 opened this issue Jan 24, 2019 · 3 comments

@abaheti95

Hi,
I was trying to download and preprocess the Natural Questions data. I followed the instructions and the download succeeded. However, I get the following error when preprocessing the dev set:

python -m language.question_answering.preprocessing.create_nq_short_pipeline_examples   --input_pattern=$NQ_DATA_DIR/dev/nq-dev-*.jsonl.gz   --output_dir=$NQ_DATA_DIR/dev
I0124 13:27:44.925077 4667721152 tf_logging.py:115] Converting input 0 files: []
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/user/Reseach/QA_and_Dialog/language/language/question_answering/preprocessing/create_nq_short_pipeline_examples.py", line 109, in <module>
    app.run(main)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/Users/user/Reseach/QA_and_Dialog/language/language/question_answering/preprocessing/create_nq_short_pipeline_examples.py", line 102, in main
    pool = multiprocessing.Pool(num_threads)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 167, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1

What am I doing wrong here?

@kentonl
Collaborator

kentonl commented Jan 24, 2019

It looks like it couldn't find any input files matching $NQ_DATA_DIR/dev/nq-dev-*.jsonl.gz to process.

Did you download the data using the first few commands?

export DATA_DIR=data
mkdir -p $DATA_DIR
gsutil -m cp -r gs://natural_questions $DATA_DIR
export NQ_DATA_DIR=$DATA_DIR/natural_questions/v1.0
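The ValueError is a downstream symptom. The log line "Converting input 0 files: []" shows that nothing matched the input pattern, and the traceback shows multiprocessing.Pool(num_threads) rejecting a pool with fewer than one process. The pool size therefore appears to depend on the number of matched files. Below is a minimal sketch of a check that fails earlier with a clearer message. Assumptions: resolve_nq_inputs is a hypothetical helper, not part of the repository; Python's glob module stands in for whatever globbing the script actually does; and the file pattern simply mirrors the --input_pattern used in this thread.

```python
import glob
import os


def resolve_nq_inputs(split="dev"):
    """Hypothetical helper: return sorted NQ shard paths for `split`, or fail early.

    Mirrors the --input_pattern from this thread; not the script's real code.
    """
    data_dir = os.environ.get("NQ_DATA_DIR")
    if not data_dir:
        # Variables exported in one terminal are not visible in a new one.
        raise RuntimeError("NQ_DATA_DIR is not set in this shell; re-run the export commands.")
    pattern = os.path.join(data_dir, split, "nq-%s-*.jsonl.gz" % split)
    paths = sorted(glob.glob(pattern))
    if not paths:
        # An empty list here is what led to multiprocessing.Pool being asked
        # for fewer than 1 process in the traceback above.
        raise RuntimeError("No input files match %s" % pattern)
    return paths
```

Running the same check from the shell before launching the preprocessing script catches the "new terminal, lost variable" case right away.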

@abaheti95
Author

You were right: I had opened a new terminal, so $NQ_DATA_DIR was no longer set. Now, however, I'm getting a decoding error:

python -m language.question_answering.preprocessing.create_nq_short_pipeline_examples   --input_pattern=$NQ_DATA_DIR/dev/nq-dev-*.jsonl.gz   --output_dir=$NQ_DATA_DIR/dev
I0124 14:34:15.836303 4424058304 tf_logging.py:115] Converting input 5 files: ['/Users/user/Reseach/QA_and_Dialog/Datasets/natural_questions/v1.0/dev/nq-dev-00.jsonl.gz', '/Users/user/Reseach/QA_and_Dialog/Datasets/natural_questions/v1.0/dev/nq-dev-01.jsonl.gz', '/Users/user/Reseach/QA_and_Dialog/Datasets/natural_questions/v1.0/dev/nq-dev-04.jsonl.gz', '/Users/user/Reseach/QA_and_Dialog/Datasets/natural_questions/v1.0/dev/nq-dev-03.jsonl.gz', '/Users/user/Reseach/QA_and_Dialog/Datasets/natural_questions/v1.0/dev/nq-dev-02.jsonl.gz']
I0124 14:34:15.851707 4424058304 tf_logging.py:115] Converting examples in /Users/user/Reseach/QA_and_Dialog/Datasets/natural_questions/v1.0/dev/nq-dev-01.jsonl.gz to tf.Examples.
I0124 14:34:15.851823 4424058304 tf_logging.py:115] Converting examples in /Users/user/Reseach/QA_and_Dialog/Datasets/natural_questions/v1.0/dev/nq-dev-04.jsonl.gz to tf.Examples.
I0124 14:34:15.851601 4424058304 tf_logging.py:115] Converting examples in /Users/user/Reseach/QA_and_Dialog/Datasets/natural_questions/v1.0/dev/nq-dev-00.jsonl.gz to tf.Examples.
I0124 14:34:15.851979 4424058304 tf_logging.py:115] Converting examples in /Users/user/Reseach/QA_and_Dialog/Datasets/natural_questions/v1.0/dev/nq-dev-03.jsonl.gz to tf.Examples.
I0124 14:34:15.854323 4424058304 tf_logging.py:115] Converting examples in /Users/user/Reseach/QA_and_Dialog/Datasets/natural_questions/v1.0/dev/nq-dev-02.jsonl.gz to tf.Examples.
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/Users/user/Reseach/QA_and_Dialog/language/language/question_answering/preprocessing/create_nq_short_pipeline_examples.py", line 88, in _create_short_answer_examples
    for i, tf_example in enumerate(_generate_tf_examples(input_file)):
  File "/Users/user/Reseach/QA_and_Dialog/language/language/question_answering/preprocessing/create_nq_short_pipeline_examples.py", line 52, in _generate_tf_examples
    for line in input_file:
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 374, in readline
    return self._buffer.readline(size)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 406, in _read_gzip_header
    magic = self._fp.read(2)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 91, in read
    self.file.read(size-self._length+read)
  File "/Users/user/Reseach/QA_and_Dialog/language/venv/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 132, in read
    pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
  File "/Users/user/Reseach/QA_and_Dialog/language/venv/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 100, in _prepare_value
    return compat.as_str_any(val)
  File "/Users/user/Reseach/QA_and_Dialog/language/venv/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 107, in as_str_any
    return as_str(value)
  File "/Users/user/Reseach/QA_and_Dialog/language/venv/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 80, in as_text
    return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/user/Reseach/QA_and_Dialog/language/language/question_answering/preprocessing/create_nq_short_pipeline_examples.py", line 109, in <module>
    app.run(main)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/Users/user/Reseach/QA_and_Dialog/language/language/question_answering/preprocessing/create_nq_short_pipeline_examples.py", line 105, in main
    pool.map(_create_short_answer_examples, input_paths)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
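The specific byte in this message is informative. Every gzip stream begins with the two magic bytes 0x1f 0x8b. Failing on 0x8b at position 1 therefore means the gzip header itself was being decoded as UTF-8 text. The traceback supports this: gzip's _read_gzip_header reads 2 bytes through TensorFlow's file_io, which passes them through compat.as_str_any. The short snippet below reproduces the same error using only the standard library. The sample JSON payload is made up for illustration.

```python
import gzip

# Every gzip stream begins with the two magic bytes 0x1f 0x8b.
compressed = gzip.compress(b'{"question_text": "example"}\n')
print(compressed[:2])  # b'\x1f\x8b'

try:
    # Decoding the raw header bytes as UTF-8 text fails on the second byte,
    # the same failure shown in the traceback above.
    compressed[:2].decode("utf-8")
except UnicodeDecodeError as err:
    print(err)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```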

@kentonl
Collaborator

kentonl commented Jan 24, 2019

This seems to be a Python 3 compatibility issue. It should be fixed in the latest version.
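For readers who hit the same error in their own code, the general remedy is to hand gzip a binary-mode file handle and decode each decompressed line explicitly. The sketch below illustrates that pattern. It is not the actual upstream patch: iter_jsonl_gz is a hypothetical name, and it uses the built-in open instead of the TensorFlow file layer that appears in the traceback.

```python
import gzip
import json


def iter_jsonl_gz(path):
    """Hypothetical helper: yield one parsed JSON object per line of a .jsonl.gz file.

    The file is opened in binary mode ("rb") so gzip receives raw bytes;
    each decompressed line is then decoded from UTF-8 explicitly.
    """
    with open(path, "rb") as raw, gzip.GzipFile(fileobj=raw) as lines:
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line.decode("utf-8"))
```

Usage would look like `for example in iter_jsonl_gz("nq-dev-00.jsonl.gz"): ...`. Assuming the script obtains its handle from TensorFlow's gfile API, the analogous change would be to open that handle with mode "rb" instead of "r" before wrapping it in gzip.GzipFile.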

@kentonl kentonl closed this as completed Jan 29, 2019