
examples/openai_gym_async.py broken #253

Closed
sdab opened this issue Dec 4, 2017 · 7 comments

Comments


sdab commented Dec 4, 2017

When I run the example as stated in the file's documentation:
python examples/openai_gym_async.py Pong-ram-v0 -a examples/configs/vpg.json -n examples/configs/mlp2_network.json -e 50000 -m 2000 -W 3

the workers fail with:
TypeError: Input 'value' of 'Assign' Op has type int32 that does not match type float32 of argument 'ref'.

This is at current HEAD with tensorflow version 1.4.0.


sdab commented Dec 4, 2017

It looks like global_variables and local_variables do not match up in the following section [1]. The error above comes from local_init_op: a global variable of one dtype is assigned to a local variable of a different dtype (a minimal sketch of this failure mode follows the reference below).

  1. https://github.com/reinforceio/tensorforce/blob/35e35253cea26e869a7ef75907b558c5672b824d/tensorforce/models/model.py#L297-L302
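For context, here is a minimal, self-contained sketch of the failure mode (plain TensorFlow 1.x, illustrative variable names only, not actual tensorforce code): when the two collections are zipped in different orders, an int32 value can end up being assigned to a float32 variable, which raises exactly the TypeError above at graph-construction time.

import tensorflow as tf

# "Global" variables and their "local" copies, deliberately listed in
# different orders to mimic non-deterministic collection ordering.
global_step = tf.Variable(0, dtype=tf.int32, name='global/step')
global_weights = tf.Variable(tf.zeros([2, 2]), dtype=tf.float32, name='global/weights')
global_variables = [global_step, global_weights]

local_step = tf.Variable(0, dtype=tf.int32, name='local/step')
local_weights = tf.Variable(tf.zeros([2, 2]), dtype=tf.float32, name='local/weights')
local_variables = [local_weights, local_step]

# Mirrors the local_init_op construction referenced in [1]. Because the orders
# differ, zip pairs local/weights (float32) with global/step (int32), and
# TensorFlow raises:
#   TypeError: Input 'value' of 'Assign' Op has type int32 that does not
#   match type float32 of argument 'ref'.
local_init_op = tf.group(*(
    local_var.assign(value=global_var)
    for local_var, global_var in zip(local_variables, global_variables)
))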


michaelschaarschmidt commented Dec 4, 2017

Thanks for raising this, we will look into it this week. We just moved a lot of code from Python to TensorFlow, so there are a few hiccups. 0.3.2 should be stable for A3C (hopefully).

@michaelschaarschmidt

Could you post a full stacktrace?


sdab commented Dec 5, 2017

Sure, the script opens up a server with several workers. The server seems fine; it's the workers that get the following stack trace:
CUDA_VISIBLE_DEVICES= /usr/bin/python /home/ubuntu/git/tensorforce/examples/openai_gym_async.py Pong-ram-v0 --agent-config /home/ubuntu/git/tensorforce/examples/configs/vpg.json --network-spec /home/ubuntu/git/tensorforce/examples/configs/mlp2_network.json --num-workers 3 --child --task-index 0 --episodes 50000 --max-episode-timesteps 2000
Traceback (most recent call last):
  File "/home/ubuntu/git/tensorforce/examples/openai_gym_async.py", line 233, in <module>
    main()
  File "/home/ubuntu/git/tensorforce/examples/openai_gym_async.py", line 191, in main
    network_spec=network_spec
  File "/home/ubuntu/git/tensorforce/tensorforce/agents/agent.py", line 250, in from_spec
    kwargs=kwargs
  File "/home/ubuntu/git/tensorforce/tensorforce/util.py", line 173, in get_object
    return obj(*args, **kwargs)
  File "/home/ubuntu/git/tensorforce/tensorforce/agents/vpg_agent.py", line 144, in __init__
    keep_last_timestep=keep_last_timestep
  File "/home/ubuntu/git/tensorforce/tensorforce/agents/batch_agent.py", line 61, in __init__
    batched_observe=batched_observe
  File "/home/ubuntu/git/tensorforce/tensorforce/agents/agent.py", line 97, in __init__
    self.model = self.initialize_model()
  File "/home/ubuntu/git/tensorforce/tensorforce/agents/vpg_agent.py", line 169, in initialize_model
    gae_lambda=self.gae_lambda
  File "/home/ubuntu/git/tensorforce/tensorforce/models/pg_model.py", line 86, in __init__
    entropy_regularization=entropy_regularization,
  File "/home/ubuntu/git/tensorforce/tensorforce/models/distribution_model.py", line 74, in __init__
    reward_preprocessing_spec=reward_preprocessing_spec
  File "/home/ubuntu/git/tensorforce/tensorforce/models/model.py", line 119, in __init__
    self.setup()
  File "/home/ubuntu/git/tensorforce/tensorforce/models/model.py", line 302, in setup
    local_init_op = tf.group(*(local_var.assign(value=global_var) for local_var, global_var in zip(local_variables, global_variables)))
  File "/home/ubuntu/git/tensorforce/tensorforce/models/model.py", line 302, in <genexpr>
    local_init_op = tf.group(*(local_var.assign(value=global_var) for local_var, global_var in zip(local_variables, global_variables)))
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 573, in assign
    return state_ops.assign(self._variable, value, use_locking=use_locking)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 276, in assign
    validate_shape=validate_shape)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 57, in assign
    use_locking=use_locking, name=name)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 546, in _apply_op_helper
    inferred_from[input_arg.type_attr]))
TypeError: Input 'value' of 'Assign' Op has type int32 that does not match type float32 of argument 'ref'.

@michaelschaarschmidt

Thanks, will have a look soon


slundell commented Dec 6, 2017

This is the same issue I mentioned on Gitter.

@michaelschaarschmidt

Variables were not being sorted in the optimizer, which caused non-deterministic assignments (a sketch of the sorting idea follows the log below). Now running for me with the latest commit:

[2017-12-08 20:41:47,332] Making new env: CartPole-v0
2017-12-08 20:41:51.882975: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-08 20:41:51.883023: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-08 20:41:51.883041: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-08 20:41:51.883055: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-12-08 20:41:51.894743: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12222}
2017-12-08 20:41:51.894799: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12223, 1 -> 127.0.0.1:12224}
2017-12-08 20:41:51.895239: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:12223
2017-12-08 20:41:52.690747: I tensorflow/core/distributed_runtime/master_session.cc:998] Start master session 0bc448dfc7fb6aa8 with config:
[2017-12-08 20:41:52,776] Starting distributed agent for OpenAI Gym 'CartPole-v0'
[2017-12-08 20:41:52,776] Config:
[2017-12-08 20:41:52,776] {u'optimizer': {u'learning_rate': 0.01, u'type': u'adam'}, u'baseline': None, u'entropy_regularization': None, u'batch_size': 4000, u'gae_lambda': None, u'discount': 0.99, 'distributed_spec': {'device': '/job:worker/task:0', 'parameter_server': False, 'task_index': 0, 'cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1101c9ed0>}, u'baseline_optimizer': None, u'baseline_mode': None, u'type': u'vpg_agent'}
[2017-12-08 20:41:52,823] Finished episode 1 after overall 13 timesteps. Steps Per Second 275.237636607
[2017-12-08 20:41:52,823] Episode reward: 13.0
[2017-12-08 20:41:52,823] Average of last 500 rewards: 13.0
[2017-12-08 20:41:52,823] Average of last 100 rewards: 13.0
[2017-12-08 20:41:52,876] Finished episode 2 after overall 31 timesteps. Steps Per Second 310.739675742
[2017-12-08 20:41:52,876] Episode reward: 18.0
[2017-12-08 20:41:52,876] Average of last 500 rewards: 15.5
[2017-12-08 20:41:52,877] Average of last 100 rewards: 15.5
[2017-12-08 20:41:52,959] Finished episode 3 after overall 55 timesteps. Steps Per Second 301.204784039
[2017-12-08 20:41:52,959] Episode reward: 24.0
[2017-12-08 20:41:52,959] Average of last 500 rewards: 18.3333333333
[2017-12-08 20:41:52,959] Average of last 100 rewards: 18.3333333333
[2017-12-08 20:41:53,054] Finished episode 4 after overall 83 timesteps. Steps Per Second 298.711660685
[2017-12-08 20:41:53,054] Episode reward: 28.0
[2017-12-08 20:41:53,054] Average of last 500 rewards: 20.75
[2017-12-08 20:41:53,054] Average of last 100 rewards: 20.75
[2017-12-08 20:41:53,101] Finished episode 5 after overall 108 timesteps. Steps Per Second 331.898111927
[2017-12-08 20:41:53,101] Episode reward: 25.0
[2017-12-08 20:41:53,102] Average of last 500 rewards: 21.6
[2017-12-08 20:41:53,102] Average of last 100 rewards: 21.6
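For reference, a minimal sketch of the sorting idea described above (illustrative only, assuming TensorFlow 1.x and a hypothetical helper, not the actual tensorforce commit): sorting both collections by variable name before zipping makes the pairing deterministic, so each local variable is initialized from the global variable it actually corresponds to.

import tensorflow as tf

def build_local_init_op(local_variables, global_variables):
    # Hypothetical helper: sort each collection by variable name so that the
    # zip below pairs corresponding variables regardless of creation order.
    local_sorted = sorted(local_variables, key=lambda v: v.name)
    global_sorted = sorted(global_variables, key=lambda v: v.name)
    assert len(local_sorted) == len(global_sorted)
    return tf.group(*(
        local_var.assign(value=global_var)
        for local_var, global_var in zip(local_sorted, global_sorted)
    ))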
