This repository has been archived by the owner on Jan 16, 2023. It is now read-only.

running R2D2 without Docker #11

Closed
turmeric-blend opened this issue Mar 28, 2020 · 13 comments

@turmeric-blend

I'm trying to run seed_rl (R2D2) without Docker on Ubuntu 18.04. I've tried to decouple the files from Docker as much as I can. When I run r2d2_main.py in learner mode from the terminal with
python atari/r2d2_main.py --run_mode=learner --logtostderr --pdb_post_mortem --num_actors=2

I get this error:

Traceback (most recent call last):
  File "atari/r2d2_main.py", line 27, in <module>
    from seed_rl.agents.r2d2 import learner
  File "/home/dave/Documents/AI/2020_seed_rl/seed_rl/agents/r2d2/learner.py", line 38, in <module>
    from seed_rl import grpc
  File "/home/dave/Documents/AI/2020_seed_rl/seed_rl/grpc/__init__.py", line 21, in <module>
    from seed_rl.grpc.python.ops import *  
  File "/home/dave/Documents/AI/2020_seed_rl/seed_rl/grpc/python/ops.py", line 25, in <module>
    from seed_rl.grpc.python.ops_wrapper import gen_grpc_ops
  File "/home/dave/Documents/AI/2020_seed_rl/seed_rl/grpc/python/ops_wrapper.py", line 25, in <module>
    gen_grpc_ops = tf.load_op_library(os.path.join(tf.compat.v1.resource_loader.get_data_files_path(), '../grpc_cc.so'))
  File "/home/dave/anaconda3/envs/dave/lib/python3.7/site-packages/tensorflow_core/python/framework/load_library.py", line 57, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/dave/Documents/AI/2020_seed_rl/seed_rl/grpc/python/../grpc_cc.so: undefined symbol: _ZN10tensorflow14DataTypeStringENS_8DataTypeE

The only change I made to r2d2_main.py is adding

import sys
sys.path.insert(1, '/home/dave/Documents/AI/2020_seed_rl/')

so that the seed_rl package can be imported.
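A slightly more portable variant of the same workaround derives the directory from the script's own location instead of hardcoding it. This is a minimal sketch that assumes r2d2_main.py stays in seed_rl/atari/, so the directory to add is three levels above the script file, matching the paths in the traceback above; `seed_rl_parent_dir` is a hypothetical helper name.

```python
import os
import sys


def seed_rl_parent_dir(script_path):
    """Return the directory that contains the seed_rl package.

    Assumes the script lives at <parent>/seed_rl/atari/<script>.py, so the
    parent is three directory levels above the script file itself.
    """
    atari_dir = os.path.dirname(os.path.abspath(script_path))
    seed_rl_dir = os.path.dirname(atari_dir)
    return os.path.dirname(seed_rl_dir)


if __name__ == "__main__":
    # Insert after the script's own directory (index 0), like the snippet above,
    # so that `from seed_rl... import ...` resolves to this checkout.
    sys.path.insert(1, seed_rl_parent_dir(__file__))
```

Placed at the top of r2d2_main.py, this keeps working if the checkout is moved to another directory.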

@lespeholt
Collaborator

Do you build the grpc library in Docker? If not, try that.
If you already do: I have seen this error before with specific versions of TF, so make sure you use exactly the same TF version.

@turmeric-blend
Author

@lespeholt

My TF version is 2.1.0 which is the same.

I think my issue is with the grpc library as when I ran this simple example I got the same error.

However, as I am quite new to Docker and grpc, I am not quite sure how to 'build' grpc from docker (even after reading up on grpc and docker).

The file structure of grpc in seed_rl seems quite different from those examples given by the tutorial and example repository. The following gave me a lot of confusion:

  1. There is no generated _pb2_grpc.py file, as described in the tutorials.
  2. There is a grpc.cc file and a grpc_cc.so file, which is unusual when using gRPC from Python.
  3. I am looking for a way to run R2D2 without Docker, so building grpc in Docker does not seem ideal.

Above all, do we really need the C++ and .so files to use gRPC? Is there a way to do it like the tutorial examples (gRPC from Python) without those files and without Docker?

Thanks for your patience, I really appreciate all the help I can get! (:

@lespeholt
Collaborator

We are using C++ grpc, not Python grpc.

Are you currently using the prebuilt .so file (i.e. not trying to build the grpc library yourself)?

@turmeric-blend
Author

turmeric-blend commented Mar 30, 2020

@lespeholt

Are you currently using the prebuilt .so file (i.e. not trying to build the grpc library yourself)?

Yes. Both seed_rl R2D2 (without Docker), run with
python atari/r2d2_main.py --run_mode=learner --logtostderr --pdb_post_mortem --num_actors=2
and this simple example use the existing .so file from the repository.

We are using C++ grpc, not Python grpc.

Are there any advantages to running C++ instead of Python?

@lespeholt
Collaborator

I'm not sure what goes wrong for you; the following works in Docker:

FROM ubuntu:18.04

RUN apt-get update && apt-get install -y tmux libsm6 libxext6 libxrender-dev python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install tensorflow==2.1.0

This should be fairly close to what you're doing.

@turmeric-blend
Author

OK @lespeholt, I will look into running without Docker again.

Also, is it possible to use gRPC Python instead of gRPC C++? Do you know if there would be any slowdown in speed/bandwidth, or any features that can't be implemented with gRPC Python?

@lespeholt
Collaborator

Using Python grpc would be significantly slower than C++ and the custom batching.
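To illustrate why batching matters here, below is a minimal, hypothetical sketch of server-side dynamic batching in plain Python: concurrent single requests from several actors are queued, and one worker thread drains up to `max_batch` of them and answers them all with a single call to a batched function (standing in for one model inference). It is not seed_rl's implementation; the thread only says seed_rl uses C++ gRPC with custom batching. `DynamicBatcher` and its parameters are made-up names.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class DynamicBatcher:
    """Hypothetical sketch: merges concurrent single requests into batched calls."""

    def __init__(self, batched_fn, max_batch=4, timeout_s=0.01):
        self._batched_fn = batched_fn  # must map a list of inputs to one output per input
        self._max_batch = max_batch
        self._timeout_s = timeout_s
        self._requests = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def call(self, x):
        """Blocking call made by each actor thread; returns the output for x."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((x, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def _serve(self):
        while True:
            # Wait for one request, then keep collecting until the batch is full
            # or no new request arrives within timeout_s.
            batch = [self._requests.get()]
            while len(batch) < self._max_batch:
                try:
                    batch.append(self._requests.get(timeout=self._timeout_s))
                except queue.Empty:
                    break
            try:
                outputs = self._batched_fn([x for x, _ in batch])
                replies = [(True, out) for out in outputs]
            except Exception as exc:  # forward the failure to every waiting caller
                replies = [(False, exc)] * len(batch)
            for (_, reply), result in zip(batch, replies):
                reply.put(result)


if __name__ == "__main__":
    sizes = []

    def double_all(xs):
        sizes.append(len(xs))  # record how many requests each batched call served
        return [2 * x for x in xs]

    batcher = DynamicBatcher(double_all, max_batch=4)
    with ThreadPoolExecutor(max_workers=8) as pool:  # 8 concurrent "actors"
        print(list(pool.map(batcher.call, range(8))))
    print(sizes)
```

Each batched call does the per-request overhead (here a Python function call, in practice a forward pass on the accelerator) once for several actors; doing the queueing and dispatch in Python adds interpreter overhead per request, which is one plausible reason a C++ implementation is faster.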

@lespeholt lespeholt mentioned this issue Apr 23, 2020
@galdl

galdl commented Jul 30, 2020

Reviving this. I'm trying to do the same, since profiling with nvprof apparently doesn't work well inside Docker; I'm getting segmentation faults.

I'm getting a very similar error: tensorflow.python.framework.errors_impl.NotFoundError: /home/nvidia/PycharmProjects/seed_rl/grpc/python/../grpc_cc.so: undefined symbol: _ZN10tensorflow8OpKernel11TraceStringEPNS_15OpKernelContextEb

Is there a solution proposed here? I'm not sure I understood. The suggestion to compile grpc within Docker isn't relevant for me, right? I'm not using Docker...

@lespeholt
Collaborator

You can still compile grpc inside Docker, copy the resulting file out, and then not use Docker at all when you run the training.
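The workflow described above can be scripted. This minimal sketch assumes you have already built an image that contains the compiled grpc_cc.so: it creates a stopped container from the image, copies the file out with `docker cp`, and removes the container. The container name `grpc_cc_export` is a hypothetical placeholder, and the image name and the path of the library inside the image are left as parameters; check the Dockerfiles for where the library is actually built.

```python
import subprocess


def build_copy_commands(image, path_in_image, dest, container="grpc_cc_export"):
    """Return the docker CLI invocations that copy one file out of an image.

    `container` is a hypothetical name for the temporary container.
    """
    return [
        ["docker", "create", "--name", container, image],  # created, never started
        ["docker", "cp", f"{container}:{path_in_image}", dest],
        ["docker", "rm", container],
    ]


def copy_file_out_of_image(image, path_in_image, dest):
    create, copy, remove = build_copy_commands(image, path_in_image, dest)
    subprocess.run(create, check=True)
    try:
        subprocess.run(copy, check=True)
    finally:
        subprocess.run(remove, check=True)  # always clean up the temporary container
```

Call `copy_file_out_of_image(<image>, <path of grpc_cc.so in the image>, <seed_rl/grpc/ on the host>)` once after building; afterwards training runs without Docker.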

@zhuliwen

How can I run this code without docker?

I did it successfully, here I want to share my experience.

1. First, create a virtual environment using conda or virtualenv (my Python version is 3.6.7) and install the following packages:

absl-py==0.9.0
appdirs==1.4.4
asn1crypto==0.24.0
astunparse==1.6.3
atari-py==0.2.6
cachetools==4.1.1
certifi==2020.6.20
cffi==1.14.0
chardet==3.0.4
cloudpickle==1.3.0
cryptography==2.1.4
decorator==4.4.2
distlib==0.3.1
filelock==3.0.12
future==0.18.2
gast==0.3.3
google-auth==1.18.0
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.30.0
gym==0.17.2
h5py==2.10.0
idna==2.10
importlib-metadata==1.7.0
importlib-resources==3.0.0
Keras-Preprocessing==1.1.2
keyring==10.6.0
keyrings.alt==3.0
Markdown==3.2.2
numpy==1.19.0
oauthlib==3.1.0
opencv-python==4.3.0.36
opt-einsum==3.2.1
Pillow==7.2.0
protobuf==3.12.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycairo==1.19.1
pycparser==2.20
pyglet==1.5.0
PyGObject==3.36.1
requests==2.24.0
requests-oauthlib==1.3.0
rsa==4.6
scipy==1.4.1
SecretStorage==2.3.1
six==1.15.0
tensorboard==2.2.2
tensorboard-plugin-wit==1.7.0
tensorflow-estimator==2.2.0
tensorflow-gpu==2.2.0
tensorflow-probability==0.9.0
termcolor==1.1.0
urllib3==1.25.9
Werkzeug==1.0.1
wrapt==1.12.1
zipp==3.1.0

2. Second, configure the CUDA 10.1 environment:

conda install cudatoolkit==10.1
conda install cudnn -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/linux-64/

3. Third, add the parent directory of seed_rl to PYTHONPATH (run this from inside the seed_rl folder):

export PYTHONPATH=$(dirname "$PWD"):$PYTHONPATH

4. Fourth, create 5 tmux windows named learner, actor0, actor1, actor2 and actor3.

Run the following commands in these 5 windows respectively:

python3 atari/r2d2_main.py --run_mode=learner --logtostderr --pdb_post_mortem  --num_actors=4
CUDA_VISIBLE_DEVICES='' python3 atari/r2d2_main.py --run_mode=actor --logtostderr --pdb_post_mortem  --num_actors=4 --task=0
CUDA_VISIBLE_DEVICES='' python3 atari/r2d2_main.py --run_mode=actor --logtostderr --pdb_post_mortem  --num_actors=4 --task=1
CUDA_VISIBLE_DEVICES='' python3 atari/r2d2_main.py --run_mode=actor --logtostderr --pdb_post_mortem  --num_actors=4 --task=2
CUDA_VISIBLE_DEVICES='' python3 atari/r2d2_main.py --run_mode=actor --logtostderr --pdb_post_mortem  --num_actors=4 --task=3
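If you prefer not to manage tmux windows by hand, the same five processes can be started from one Python script. This is a hypothetical launcher, not part of seed_rl: it builds the same command lines as above (run it from the seed_rl folder with PYTHONPATH set as in step 3), hides the GPU from the actors via CUDA_VISIBLE_DEVICES='', and waits for all processes. It uses `sys.executable` so every process runs in the same virtual environment as the launcher.

```python
import os
import subprocess
import sys


def build_commands(num_actors, script="atari/r2d2_main.py", python=sys.executable):
    """Return (argv, extra_env) pairs: one learner followed by num_actors actors."""
    # --pdb_post_mortem mirrors the commands above; consider dropping it here, since
    # all processes share one terminal and a post-mortem debugger would compete for stdin.
    flags = ["--logtostderr", "--pdb_post_mortem", f"--num_actors={num_actors}"]
    commands = [([python, script, "--run_mode=learner"] + flags, {})]
    for task in range(num_actors):
        argv = [python, script, "--run_mode=actor"] + flags + [f"--task={task}"]
        commands.append((argv, {"CUDA_VISIBLE_DEVICES": ""}))  # actors see no GPU
    return commands


def launch(num_actors=4):
    """Start every process, then wait for all of them to exit."""
    procs = [
        subprocess.Popen(argv, env=dict(os.environ, **extra_env))
        for argv, extra_env in build_commands(num_actors)
    ]
    for proc in procs:
        proc.wait()
```

Calling `launch(4)` corresponds to the five tmux commands; the trade-off is that all logs are interleaved in one terminal instead of one window per process.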

Note: make sure that the grpc_cc.so file (9.4M) is present in the grpc folder.
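A quick check before launching, sketched under the assumption that the library sits at seed_rl/grpc/grpc_cc.so (the path that ops_wrapper.py resolves in the traceback at the top of this thread): the hypothetical helper below reports whether the file exists and its size in MiB, so a missing or unexpectedly small file shows up immediately. The 1 MiB threshold is an arbitrary sanity bound, not a value from seed_rl.

```python
import os


def grpc_lib_status(seed_rl_dir, min_mib=1.0):
    """Return (path, size_mib, looks_ok) for seed_rl/grpc/grpc_cc.so.

    min_mib is an arbitrary sanity threshold; this thread reports ~9.4M for a working build.
    """
    path = os.path.join(seed_rl_dir, "grpc", "grpc_cc.so")
    if not os.path.isfile(path):
        return path, 0.0, False
    size_mib = os.path.getsize(path) / (1024 * 1024)
    return path, size_mib, size_mib >= min_mib


if __name__ == "__main__":
    import sys

    path, size_mib, ok = grpc_lib_status(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{path}: {size_mib:.1f} MiB {'OK' if ok else 'MISSING OR TOO SMALL'}")
```

A plausible size does not rule out the undefined-symbol errors reported earlier, which the thread attributes to a TensorFlow version mismatch.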

That's all, hope you can succeed!

@lespeholt
Collaborator

@zhuliwen thanks!

@galdl

galdl commented Aug 12, 2020

Excellent, I'll give it a try. Thanks a lot!

@omurammm

FYI, as this repository is updated, you should use the matching versions. They are specified in the Dockerfiles:
https://github.com/google-research/seed_rl/blob/master/docker/Dockerfile.grpc
https://github.com/google-research/seed_rl/blob/master/docker/Dockerfile.atari

Now, you need to use

tensorflow-gpu==2.4.1
tensorflow-probability==0.11.0

5 participants