
Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis. #7

Closed
tomas-wood opened this issue Oct 22, 2018 · 9 comments


@tomas-wood

tomas-wood commented Oct 22, 2018

Getting this error when I run blocks_test.py, modules_test.py, and utils_tf_test.py.

2018-10-22 14:07:06.293160: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:241] Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis. Error: Pack node (data_dicts_to_graphs_tuple/stack) axis attribute is out of bounds: 0

Was using tensorflow version 1.13.0-dev20181022.
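For context on what the failing stage does: judging by its name, RemoveStackStridedSliceSameAxis rewrites a stack followed by a strided slice along that same new axis back into the original operand. A minimal sketch of that identity in NumPy terms (illustration only, not the actual grappler code):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

# Stacking two arrays along a new axis and then slicing that same
# axis back out returns the original operand unchanged; the grappler
# stage rewrites this stack-then-slice pattern into a direct use of
# the input.
stacked = np.stack([a, b], axis=0)   # shape (2, 2)
assert np.array_equal(stacked[0], a)
assert np.array_equal(stacked[1], b)
print("stack/slice round-trip OK")
```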

@IMBurbank

How are you running the tests? What environment? What commands?

They all pass in my dev environment.

I ran the tests as follows:

Clone and enter repo

git clone https://github.com/deepmind/graph_nets.git
cd graph_nets/

I use docker images with all the dependencies included so I don't have to worry about system incompatibilities or version conflicts. If you have docker, you can try the Graph Nets images I'm currently hosting to see if it's an issue with your local dev environment.

# CPU Image
docker run --rm -u $(id -u):$(id -g) -p 8888:8888 -v $(pwd):/my-devel -it imburbank/graph_nets bash -l

# GPU image
docker run --rm --runtime=nvidia --user $(id -u):$(id -g) -p 8888:8888 -v $(pwd):/my-devel -it imburbank/graph_nets:latest-gpu bash -l

Then I ran each test

python graph_nets/tests/blocks_test.py
python graph_nets/tests/modules_test.py
python graph_nets/tests/utils_tf_test.py
...etc.

@tomas-wood
Author

Hi @IMBurbank thank you for commenting.

I was just cd-ing into graph_nets/tests and running python blocks_test.py after installing. I'm pulling your docker images right now and will try them out. Alright, I tried them out and it got pretty ugly.

I realized I had SSH'd into the wrong machine and had installed the TensorFlow binaries through pip instead of building them myself with bazel as I always do. Though in this case it seems like the new pip-installed TensorFlow binary isn't sending me the mangled stack traces my own build is.

Running with my own compiled binaries (no docker, no conda env, just Ubuntu 16.04) gave me something similar to your docker image.

2018-10-22 14:49:38.296960: I tensorflow/stream_executor/stream.cc:1960] stream 0x8cda6960 did not wait for stream: 0x18423e90
2018-10-22 14:49:38.296978: I tensorflow/stream_executor/stream.cc:4793] stream 0x8cda6960 did not memcpy host-to-device; source: 0x7fc8f2c00000
2018-10-22 14:49:38.297064: F tensorflow/core/common_runtime/gpu/gpu_util.cc:339] CPU->GPU Memcpy failed
*** Received signal 6 ***
*** BEGIN MANGLED STACK TRACE ***
/usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so(+0x6ba3ee)[0x7fd10b6163ee]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fd15e6f4390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7fd15e34e428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fd15e35002a]
/usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so(+0x4fadaa7)[0x7fd110eddaa7]
/usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so(+0x5f75ff)[0x7fd10b5535ff]
/usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so(_ZN5Eigen26NonBlockingThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x241)[0x7fd10b5ee581]
/usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x37)[0x7fd10b5ec317]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7fd11d65fc80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fd15e6ea6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fd15e42041d]
*** END MANGLED STACK TRACE ***

*** Begin stack trace ***
	tensorflow::CurrentStackTrace[abi:cxx11]()
	
	
	gsignal
	abort
	
	
	Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int)
	std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&)
	
	
	clone
*** End stack trace ***

Aborted (core dumped)

It looks like it couldn't find BLAS with your docker image, and in my local environment I'm having trouble getting data from the CPU to the GPU because of a misbehaving stream.

@IMBurbank

As long as Docker is working, your local installations of python, conda, bazel, tensorflow, etc. won't matter. Everything needed to run the tests is already in the container environments.

Let's start with CPU (I'm not sure if you have GPU configured).

  1. Make sure you're in your normal local environment at a location where you can download the graph_nets repository.

  2. Clone a fresh version of graph_nets to make sure tests are passing with the current build.

git clone https://github.com/deepmind/graph_nets.git

  3. Enter the graph_nets project directory.

cd graph_nets/

  4. Run the CPU docker image with a bash command to enter the container.

docker run --rm -u $(id -u):$(id -g) -p 8888:8888 -v $(pwd):/my-devel -it imburbank/graph_nets bash -l

  5. In that same terminal, so that you're using the container environment, run the tests.

python graph_nets/tests/blocks_test.py
python graph_nets/tests/modules_test.py
python graph_nets/tests/utils_tf_test.py

This will not use your locally-compiled tensorflow. The tests should pass. From there, you may be able to work on isolating the problem in your local dev environment.

I would recommend trying the tests on your local dev system with a standard tensorflow package and seeing if they pass. If they do, move to the next link in the chain with your compiled tensorflow.

@tomas-wood
Author

I'll try out your CPU version, but I have the GPU configured. My locally installed tensorflow-r1.10 build works on the GPU: all tests pass, and I've run lots of code with it. If it's causing the problem, I'm only seeing it when trying to run the tests in graph_nets. I also know how docker works. I'm not a complete idiot (just a touch, now and then, for character).

Looks like your CPU binaries work. Doesn't really do me a bit of good, but they work. Kudos.

The thing you recommend, trying the tests on my local dev system with standard TensorFlow (no GPU) installed with pip, is what produced the Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis. errors I first reported.
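If the warning itself is the blocker, one possible workaround while the upstream bug is fixed is to turn off the arithmetic optimizer entirely in the session config. This is a TF 1.x config fragment sketched from the RewriterConfig proto, not verified against that exact nightly:

```python
import tensorflow as tf
from tensorflow.core.protobuf import rewriter_config_pb2

# Disable the whole ArithmeticOptimizer pass (which contains the
# RemoveStackStridedSliceSameAxis stage) via grappler's RewriterConfig.
rewrite_options = rewriter_config_pb2.RewriterConfig(
    arithmetic_optimization=rewriter_config_pb2.RewriterConfig.OFF)
graph_options = tf.GraphOptions(rewrite_options=rewrite_options)
config = tf.ConfigProto(graph_options=graph_options)

# Sessions created with this config skip the arithmetic rewrite pass.
sess = tf.Session(config=config)
```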

@tomas-wood
Author

Okay, I figured it out: it's related to this issue. Merci!

@IMBurbank

IMBurbank commented Oct 22, 2018

To run the GPU version, follow the exact same steps again, but swap in the GPU image at the docker run step:

Run the GPU docker image with a bash command to enter the container.

docker run --rm --runtime=nvidia --user $(id -u):$(id -g) -p 8888:8888 -v $(pwd):/my-devel -it imburbank/graph_nets:latest-gpu bash -l

That should duplicate a standard environment running tensorflow_gpu, tensorflow_probability_gpu, graph_nets and the standard dependencies.


I see you got it worked out. Cheers!

@tomas-wood
Author

I'm still using nvidia-docker because I'm trapped in the past lol

@abh2424

abh2424 commented Mar 22, 2019

I am facing a similar error while running my object detection Python file. I have completed all the steps given above by @IMBurbank, but the error is still the same.

What is the top-level directory of the model you are using: ./models/research
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Raspbian OS (Linux)
TensorFlow installed from (source or binary): binary (pip3)
TensorFlow version (use command below): 1.13.1
Bazel version (if compiling from source): 0.8.0
CUDA/cuDNN version: none
GPU model and memory: CPU only

Please help me @IMBurbank

@cutemuggle

I am facing a similar error while running my object detection Python file. Could you please tell me how to solve it? Thanks a lot! @abh2424 @tomas-wood
