Segmentation fault error when using tensorflow dataset #450

Closed
bothbossandboss opened this issue Apr 13, 2020 · 3 comments

@bothbossandboss

Description

Hi all, I'm building a dataset using TensorFlow and Trax in an Ubuntu Docker container, but I hit a segmentation fault.
When I run the same code without importing trax, there is no error. Please help me.

Environment information (Dockerfile)

FROM tensorflow/tensorflow:latest-gpu-py3

RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get install -y less wget git
# Work around an error with matplotlib + trax
RUN apt-get install -y python3-cairocffi python3-gi gir1.2-gtk-3.0

RUN pip install -U pip
RUN pip install -U six
RUN pip install -U matplotlib==3.1.3
RUN pip install --upgrade https://storage.googleapis.com/jax-releases/cuda101/jaxlib-0.1.43-cp36-none-linux_x86_64.whl
RUN pip install --upgrade jax
WORKDIR /tmp/docker_works
RUN git clone https://github.com/google/trax.git
WORKDIR /tmp/docker_works/trax
RUN sed -i '1s/^/import tensorflow\n/' ./trax/models/research/bert.py
RUN sed -i -e "s/from tensorflow.train import load_checkpoint//g" ./trax/models/research/bert.py
RUN sed -i -e "s/load_checkpoint/tensorflow.train.load_checkpoint/g" ./trax/models/research/bert.py
RUN python setup.py install
WORKDIR /tmp/docker_works

For bugs: reproduction and error logs

code

import matplotlib as mpl
mpl.use('Agg')  # select the non-interactive Agg backend
import trax  # commenting out this import makes the segfault go away
import faulthandler
faulthandler.enable()  # dump a Python traceback on a fatal signal
import pickle
import random
import numpy as np
import tensorflow as tf

if __name__ == "__main__":
    with tf.io.TFRecordWriter('./data/tmp.tfrecord') as writer:  # ./data must already exist
        for i in range(10):
            example = tf.train.Example(features=tf.train.Features(
                feature={'input_ids': tf.train.Feature(int64_list=tf.train.Int64List(value=range(10))),
                         'labels': tf.train.Feature(int64_list=tf.train.Int64List(value=range(10)))
                         }
            ))
            writer.write(example.SerializeToString())
    dataset = tf.data.TFRecordDataset('./data/tmp.tfrecord')  # the segfault is raised here
    print(dataset)

Error logs:

log with trax

$ python script/sample_with_trax.py
Fatal Python error: Segmentation fault

Current thread 0x00007eff32052740 (most recent call first):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/context.py", line 1081 in _initialize_physical_devices
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/context.py", line 815 in config
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/context.py", line 496 in ensure_initialized
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py", line 95 in convert_to_eager_tensor
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py", line 266 in _constant_impl
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py", line 258 in constant
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py", line 317 in _constant_tensor_conversion_function
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1302 in convert_to_tensor
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/readers.py", line 55 in _create_or_validate_filenames_dataset
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/readers.py", line 316 in __init__
  File "script/sample_with_trax.py", line 20 in <module>
Segmentation fault (core dumped)

log without trax (comment out import trax)

$ python script/sample_with_trax.py
<TFRecordDatasetV2 shapes: (), types: tf.string>
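
One way to narrow down which import triggers the crash is to run each variant of the script in a fresh interpreter and check whether the child process was killed by SIGSEGV. The following is a minimal sketch, not part of the original report; the helper names `run_isolated` and `crashed_with_segfault` are hypothetical, and the signal check assumes a POSIX system such as the Ubuntu image above.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=300):
    """Run `snippet` in a fresh Python interpreter and return its exit code."""
    # -X faulthandler makes the child print a Python traceback if it crashes.
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return result.returncode


def crashed_with_segfault(returncode):
    """On POSIX, a child killed by signal N reports a return code of -N."""
    return returncode == -signal.SIGSEGV
```

For example, comparing `run_isolated("import tensorflow as tf; tf.data.TFRecordDataset('./data/tmp.tfrecord')")` with the same snippet prefixed by `import trax; ` should show whether the trax import alone is enough to change the outcome.
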
@juneoh

juneoh commented Apr 22, 2020

I was able to reproduce this with the following Dockerfile:

FROM tensorflow/tensorflow:1.15.2-gpu-py3

RUN pip install matplotlib==3.2.1 trax==1.2.4

I haven't found out why yet, but manually pinning the tensor2tensor version makes the segfault go away:

pip install matplotlib==3.2.1 tensor2tensor==1.15.5 trax==1.2.4

Perhaps some weird behavior in pip (19.3.1)?

Issue #374 might also be related.
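
To see what pip actually resolved in each image, it may help to print the installed versions of the packages named above and diff the two outputs. This is a rough sketch rather than anything from the thread; `installed_versions` is a hypothetical helper, and `importlib.metadata` requires Python 3.8+, so the Python 3.6 images used here would need the `importlib_metadata` backport (handled by the fallback import).

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport for older interpreters


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Running `installed_versions(["matplotlib", "tensor2tensor", "trax"])` in both the crashing and the working image would show whether the explicit pin changed which versions ended up installed.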

@lukaszkaiser
Contributor

Could downgrading the gym package be the key here? (It looks like some gym/TF interaction, but why oh why?)

@lukaszkaiser
Contributor

In other issues with the same bug, downgrading gym seems to have helped, so I'm closing this for now.
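
For anyone investigating further, a sketch like the one below can list which installed distributions declare a dependency on gym, which may hint at why pinning another package changes the gym version that gets installed. It is an illustration only: `requirement_name` and `dependents_of` are hypothetical helpers, the name normalization is deliberately simplistic, and `importlib.metadata` needs Python 3.8+ (or the `importlib_metadata` backport).

```python
import re

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport for older interpreters


def requirement_name(requirement):
    """Extract the bare project name from a requirement string such as 'gym>=0.1'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    # Simplified normalization: lowercase and treat '_' like '-'.
    return match.group(1).lower().replace("_", "-") if match else None


def dependents_of(target):
    """Return sorted (distribution, requirement) pairs that depend on `target`."""
    target = target.lower().replace("_", "-")
    found = []
    for dist in metadata.distributions():
        for requirement in dist.requires or []:
            if requirement_name(requirement) == target:
                found.append((dist.metadata["Name"] or "<unknown>", requirement))
    return sorted(found)
```

Calling `dependents_of("gym")` inside the container prints each dependent package together with the version constraint it places on gym.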
