InvalidArgumentError on terminal observe call #639

Closed
connorlbark opened this issue Dec 20, 2019 · 21 comments

@connorlbark

connorlbark commented Dec 20, 2019

Perhaps something is wrong with my code, but almost half the time when the episode ends, I get an assertion error when I run observe on my PPO agent:

Traceback (most recent call last):
  File "ll.py", line 208, in <module>
    main()
  File "ll.py", line 181, in main
    agent.give_reward(reward, terminal)
  File "ll.py", line 123, in give_reward
    self.agent.observe(reward=reward, terminal=terminal)
  File "c:\users\connor\desktop\tensorforce\tensorforce\agents\agent.py", line 534, in observe
    terminal=terminal, reward=reward, parallel=[parallel], **kwargs
  File "c:\users\connor\desktop\tensorforce\tensorforce\core\module.py", line 578, in fn
    fetched = self.monitored_session.run(fetches=fetches, feed_dict=feed_dict)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\six.py", line 696, in reraise
    raise value
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [] [Condition x <= y did not hold element-wise:x (baseline-network-state.observe/baseline-network-state.core_observe/baseline-network-state.core_experience/memory.enqueue/strided_slice:0) = ] [18243] [y (baseline-network-state.observe/baseline-network-state.core_observe/baseline-network-state.core_experience/memory.enqueue/sub_2:0) = ] [17999]
         [[{{node Assert}}]]

My original theory was that I was accidentally calling observe again after setting terminal=True and before resetting the agent, or some other abuse of observe, but I prevented that from happening in my code, so I don't believe that's the case. Also, the episode runs completely fine, and I get through thousands of calls to observe without ever running into any issues. It's only when terminal=True that it seems to occur.

Running on Windows 10 x64, with tensorflow-gpu v2.0.0 on an RTX 2070, Tensorforce installed from GitHub at commit 827febc.

@AlexKuhnle
Member

Hi,
a likely reason is that there is a mismatch between memory size, max-episode-timesteps and the actual episode length. For many configurations, only a terminal observe triggers an actual TensorFlow call (to avoid unnecessary overhead), but internal memory and buffer sizes need to be created statically, so they need to know in advance how long an episode will be. Is it possible that this is happening here? What are the values for memory, update and max_episode_timesteps, which agent are you using, and does it happen that an episode runs longer than specified?
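For illustration, here is a minimal sketch of the kind of guard this question is getting at, assuming a hand-rolled act/observe loop like the one described later in the thread (get_initial_state and step_game are hypothetical stand-ins for the game hooks, not Tensorforce API):

# Sketch: cap a hand-rolled act/observe loop at the declared episode length.
# get_initial_state() and step_game() are hypothetical placeholders for the game hooks.
MAX_EPISODE_TIMESTEPS = 18000  # must match max_episode_timesteps passed to Agent.create

state = get_initial_state()
for timestep in range(MAX_EPISODE_TIMESTEPS):
    action = agent.act(states=state)
    state, reward, done = step_game(action)
    # Force a terminal observe at the cap, so the statically sized memory and
    # buffers never see an episode longer than was declared.
    terminal = done or timestep == MAX_EPISODE_TIMESTEPS - 1
    agent.observe(reward=reward, terminal=terminal)
    if terminal:
        break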

@connorlbark
Author

That definitely sounds correct -- entirely possible it's running over.

I'm not sure when I'll be able to test it, but I will get back to you asap

@connorlbark
Author

Okay, I hadn't set memory, so according to the docs it should have been the minimum. But I just increased it to well over what it needs to be, and it appears to be working. Thanks. Kind of a confusing error, though.

@AlexKuhnle
Member

Great. Yes, you're right, exception messages have to be improved. Can you post the agent and environment specs, just to double-check?

@connorlbark
Author

connorlbark commented Dec 24, 2019

Yep!

Here's the call I make to create the agent:

        self.agent = agents.Agent.create(
            agent='ppo',
            # Automatically configured network
            network=dict(type='auto', depth=5, size=128, internal_rnn=64),
            # MDP structure
            states=dict(type='float', shape=(54, )),
            actions=dict(type='int', num_values=13),
            max_episode_timesteps=18000, memory=500000, 
            # Optimization
            batch_size=3, update_frequency=3, learning_rate=1e-3, subsampling_fraction=0.33,
            optimization_steps=10,
            # Reward estimation
            likelihood_ratio_clipping=0.2, discount=0.99, estimate_terminal=False,
            # Critic
            critic_network=dict(type='auto', depth=5, size=128, internal_rnn=64),
            critic_optimizer=dict(optimizer='adam', multi_step=10, learning_rate=1e-3),
            # Preprocessing
            preprocessing=None,
            # Exploration
            exploration=0.0, variable_noise=0.05,
            # Regularization
            l2_regularization=0.2, entropy_regularization=0.0,
            # TensorFlow etc
            name='agent', device=None, parallel_interactions=1, seed=None, execution=None, saver=None,
            summarizer=None, recorder=None
        )

Most of the hyperparameters are arbitrarily set for me to quickly debug and make sure everything is working.

The environment itself isn't an OpenAI Gym or anything like that; it's just me calling act+observe directly, up to a max of 60 times/sec, based on data scraped from a game's frame data.

On the topic of obscure exception messages, when it's training, it seems to get through all of the updates (takes quite a few minutes & resources), but eventually it will crash with a message like:

Traceback (most recent call last):
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1553534] = 4765250 is not in [0, 74857)
         [[{{node inner-optimizer.step/GatherV2_1}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ll.py", line 211, in <module>
    main()
  File "ll.py", line 184, in main
    agent.give_reward(reward, terminal)
  File "ll.py", line 123, in give_reward
    self.agent.observe(reward=reward, terminal=terminal)
  File "c:\users\connor\desktop\tensorforce\tensorforce\agents\agent.py", line 534, in observe
    terminal=terminal, reward=reward, parallel=[parallel], **kwargs
  File "c:\users\connor\desktop\tensorforce\tensorforce\core\module.py", line 578, in fn
    fetched = self.monitored_session.run(fetches=fetches, feed_dict=feed_dict)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\six.py", line 696, in reraise
    raise value
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1553534] = 4765250 is not in [0, 74857)
         [[{{node inner-optimizer.step/GatherV2_1}}]]

Increasing memory and changing the batch size had no effect.

@AlexKuhnle
Member

That's weird; the index it's complaining about is completely off. The exception message is from TensorFlow; ideally Tensorforce should catch these earlier (and generally provide more meaningful messages than it does right now).

While I don't see how this would cause the error, a general comment: memory does not need to be set, and if it is set, to be safe it needs to be around (batch-size + 1) * max-episode-timesteps. Or maybe it is related: did you say that increasing the memory size fixed the problem above?
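To make that rule of thumb concrete, here is a minimal sketch using the hyperparameters from the agent config posted above (the exact values are only illustrative):

from tensorforce.agents import Agent

# Rule of thumb from the discussion: with episode-based updates the memory
# must be able to hold (batch_size + 1) complete episodes.
batch_size = 3
max_episode_timesteps = 18000
safe_memory = (batch_size + 1) * max_episode_timesteps  # = 72000 timesteps

agent = Agent.create(
    agent='ppo',
    states=dict(type='float', shape=(54,)),
    actions=dict(type='int', num_values=13),
    max_episode_timesteps=max_episode_timesteps,
    memory=safe_memory,  # or omit memory entirely and let the agent pick a minimum
    batch_size=batch_size, update_frequency=3, learning_rate=1e-3,
)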

@connorlbark
Author

connorlbark commented Dec 24, 2019

Yes, when I set the memory way higher than it needed to be, 500000, it worked fine (I've run it quite a few times with the higher memory, and it has only crashed with the error from my comment above). I just unset it to make sure, and it went back to the original error, so memory is definitely the issue there. It didn't matter which batch size I chose (crashed on 1, crashed on 3, etc.).

@connorlbark
Author

connorlbark commented Dec 24, 2019

Okay, so as it turns out, the first error was just a mistake on my part. I assumed the game capped at 60 fps, but it caps at 120, so it was getting about twice as many calls as expected. When I fixed that, I was able to comment out the memory argument, but the error when training remained:

Traceback (most recent call last):
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1587572] = 4626839 is not in [0, 76496)
         [[{{node inner-optimizer.step/GatherV2_1}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ll.py", line 214, in <module>
    main()
  File "ll.py", line 187, in main
    agent.give_reward(reward, terminal)
  File "ll.py", line 123, in give_reward
    self.agent.observe(reward=reward, terminal=terminal)
  File "c:\users\connor\desktop\tensorforce\tensorforce\agents\agent.py", line 534, in observe
    terminal=terminal, reward=reward, parallel=[parallel], **kwargs
  File "c:\users\connor\desktop\tensorforce\tensorforce\core\module.py", line 578, in fn
    fetched = self.monitored_session.run(fetches=fetches, feed_dict=feed_dict)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\six.py", line 696, in reraise
    raise value
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1587572] = 4626839 is not in [0, 76496)
         [[{{node inner-optimizer.step/GatherV2_1}}]]

This, again, might be a setup error on my part, but I can open a new issue for this if you'd like.

@AlexKuhnle
Member

Let's discuss it here, at least until we identify what the problem is in more detail.

Do you make sure the episode terminates after at most 18000 timesteps? Maybe you can also run your config on CartPole or similar, to see whether the same exception occurs, in which case I could reproduce it.

@AlexKuhnle reopened this Dec 26, 2019
@connorlbark
Author

I recreated it with this exact code:

import tensorflow as tf
from tensorforce.agents import Agent
from tensorforce.environments import Environment
from tensorforce.execution import Runner
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

environment = Environment.create(environment='gym', level='CartPole-v1')

agent = Agent.create(
        agent='ppo',
        # Automatically configured network
        network=dict(type='auto', depth=5, size=128, internal_rnn=64),
        # MDP structure
        environment=environment,
        #states=dict(type='float', shape=(54, )),
        #actions=dict(type='int', num_values=13),
        #max_episode_timesteps=60000, #memory=500000, 
        # Optimization
        batch_size=3, update_frequency=3, learning_rate=1e-3, subsampling_fraction=0.33,
        optimization_steps=10,
        # Reward estimation
        likelihood_ratio_clipping=0.2, discount=0.99, estimate_terminal=False,
        # Critic
        critic_network=dict(type='auto', depth=5, size=128, internal_rnn=64),
        critic_optimizer=dict(optimizer='adam', multi_step=10, learning_rate=1e-3),
        # Preprocessing
        preprocessing=None,
        # Exploration
        exploration=0.0, variable_noise=0.05,
        # Regularization
        l2_regularization=0.2, entropy_regularization=0.0,
        # TensorFlow etc
        name='agent', device=None, parallel_interactions=1, seed=None, execution=None, saver=None,
        summarizer=None, recorder=None
    )

runner = Runner(agent=agent, environment=environment)

runner.run(num_episodes=300)
runner.close()

Outputs:

WARNING:tensorflow:From C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Episodes:   1%|▎                          | 3/300 [00:05, reward=11.00, ts/ep=11, sec/ep=0.03, ms/ts=3.1, agent=100.0%]Traceback (most recent call last):
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 212 is not in [0, 65)
         [[{{node inner-optimizer.step/GatherV2_5}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 41, in <module>
    runner.run(num_episodes=300)
  File "c:\users\connor\desktop\tensorforce\tensorforce\execution\runner.py", line 241, in run
    if not self.run_episode(environment=self.environment, evaluation=self.evaluation):
  File "c:\users\connor\desktop\tensorforce\tensorforce\execution\runner.py", line 353, in run_episode
    updated = self.agent.observe(terminal=terminal, reward=reward)
  File "c:\users\connor\desktop\tensorforce\tensorforce\agents\agent.py", line 534, in observe
    terminal=terminal, reward=reward, parallel=[parallel], **kwargs
  File "c:\users\connor\desktop\tensorforce\tensorforce\core\module.py", line 578, in fn
    fetched = self.monitored_session.run(fetches=fetches, feed_dict=feed_dict)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\six.py", line 696, in reraise
    raise value
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 212 is not in [0, 65)
         [[{{node inner-optimizer.step/GatherV2_5}}]]
Episodes:   1%|▎                          | 3/300 [00:07, reward=11.00, ts/ep=11, sec/ep=0.03, ms/ts=3.1, agent=100.0%]

Does this have something to do with the fact that I'm running it on Windows? Like I mentioned on Gitter, it means I can't have tfa; I'm not sure if that's causing any problems.

@connorlbark
Author

connorlbark commented Dec 26, 2019

Update: Removing the RNN seems to have completely eliminated the error.

Like so:

agent = Agent.create(
        agent='ppo',
        # Automatically configured network
        network=dict(type='auto', depth=5, size=128),
        # MDP structure
        environment=environment,
        #states=dict(type='float', shape=(54, )),
        #actions=dict(type='int', num_values=13),
        #max_episode_timesteps=60000, #memory=500000, 
        # Optimization
        batch_size=3, update_frequency=3, learning_rate=1e-3, subsampling_fraction=0.33,
        optimization_steps=10,
        # Reward estimation
        likelihood_ratio_clipping=0.2, discount=0.99, estimate_terminal=False,
        # Critic
        critic_network=dict(type='auto', depth=5, size=128, internal_rnn=64),
        critic_optimizer=dict(optimizer='adam', multi_step=10, learning_rate=1e-3),
        # Preprocessing
        preprocessing=None,
        # Exploration
        exploration=0.0, variable_noise=0.05,
        # Regularization
        l2_regularization=0.2, entropy_regularization=0.0,
        # TensorFlow etc
        name='agent', device=None, parallel_interactions=1, seed=None, execution=None, saver=None,
        summarizer=None, recorder=None
    )

@AlexKuhnle
Member

Thanks, that's helpful, I will look into this asap.

@AlexKuhnle
Member

Okay, I had a look at this now, and unfortunately it turns out to be a rather annoying problem that I had so far not realised: the subsampling-optimizer that is part of the PPO optimization randomly subsamples from the batch, wrongly assuming the same shape everywhere, but in the case of RNNs each timestep state actually involves a sequence of preceding states as well (which are then processed by the RNN into a single embedding per timestep). It should be possible to fix this, but it will certainly take a few days. In the meantime, you could consider just removing the subsampling-optimizer here (by commenting out this line). This shouldn't really affect performance overall (in theory at least), but it may negatively affect runtime and memory usage.

Thanks for catching and reporting this problem!

@connorlbark
Author

Awesome, glad to help!

@qZhang88

qZhang88 commented Jan 3, 2020

I have encountered the same kind of error, but I haven't used any RNN.

Traceback (most recent call last):
  File "1v1_ppo.py", line 215, in <module>
    main()
  File "1v1_ppo.py", line 179, in main
    updated = agent.observe(terminal=terminal, reward=reward)
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorforce\agents\agent.py", line 534, in observe
    terminal=terminal, reward=reward, parallel=[parallel], **kwargs
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorforce\core\module.py", line 578, in fn
    fetched = self.monitored_session.run(fetches=fetches, feed_dict=feed_dict)
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\six.py", line 696, in reraise
    raise value
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "C:\Users\starsee-fanpw\.conda\envs\battle\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [] [Condition x <= y did not hold element-wise:x (red_0.observe/red_0.core_observe/red_0.core_update/update-batch-size.value/Identity:0) = ] [10] [y (red_0.observe/red_0.core_observe/red_0.core_update/memory.retrieve_episodes/assert_less_equal/ReadVariableOp:0) = ] [9]
         [[{{node Assert}}]]

The agent was created using the following code:

agent = Agent.create(
    agent='ppo',
    # Basic
    states=dict(type='float', shape=(args.num_states,)),
    actions=dict(type='int', num_values=args.num_actions),
    max_episode_timesteps=args.max_timestep_per_episode,
    memory=10000,
    # Automatically configured network
    network='auto',
    # Optimization
    batch_size=10, update_frequency=2, learning_rate=1e-3,
    subsampling_fraction=0.2, optimization_steps=5,
    # Reward estimation
    likelihood_ratio_clipping=0.2, discount=0.99, estimate_terminal=False,
    # Critic
    critic_network='auto',
    critic_optimizer=dict(optimizer='adam', multi_step=10, learning_rate=1e-3),
    # Preprocessing
    preprocessing=None,
    # Exploration
    exploration=0.0, variable_noise=0.0,
    # Regularization
    l2_regularization=0.0, entropy_regularization=0.0,
    # TensorFlow etc
    name=name, device=None, parallel_interactions=1, seed=None, execution=None,
    saver=None, summarizer=None, recorder=None
  )

@connorlbark
Author

Hey @qZhang88, try making sure your max_episode_timesteps is a high enough value. That happened to me when the number of timesteps actually observed was greater than max_episode_timesteps.

@qZhang88

qZhang88 commented Jan 3, 2020

Hey @qZhang88, try making sure your max_episode_timesteps is a high enough value. That happened to me when the number of timesteps actually observed was greater than max_episode_timesteps.

Will try it, but I found it might be related to batch size: it only ran 11 episodes (counting from 0 to 10), and when episode 10 ended, the error occurred. I also tried setting the batch size to 4, and it works fine with no error.

@connorlbark
Author

Make sure your memory is greater than (batch_size + 1) * max_episode_timesteps. If you increased the batch size without increasing memory=10000, you may have gone over. You could also remove the memory=10000 line and let the agent set its memory automatically based on batch_size.
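As a quick back-of-the-envelope check against the config above (plain arithmetic, not library code):

# With the values from the Agent.create call above:
batch_size = 10
memory = 10000
# The memory must hold (batch_size + 1) complete episodes, so the longest
# episode it can safely accommodate is roughly:
max_safe_episode_length = memory // (batch_size + 1)  # = 909 timesteps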

@AlexKuhnle
Member

@porgull's comments should help. Generally, one needs to make sure that the memory is big enough to fit enough timesteps, which in the case of episode mode would be (batch-size + 1) * max-episode-timesteps. Moreover, one needs to make sure that the environment terminates within max-episode-timesteps. If you use environment = Environment.create(..., max_episode_timesteps=?) and agent = Agent.create(..., environment=environment), and you don't explicitly specify memory, it should definitely all work (but of course there are cases which require more customization).
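For example, a minimal sketch of that recommended pattern, based on the CartPole repro earlier in the thread (the 500-timestep cap and the hyperparameter values are only illustrative):

from tensorforce.agents import Agent
from tensorforce.environments import Environment
from tensorforce.execution import Runner

# Let the environment wrapper enforce the episode length, and let the agent
# derive its memory size from it instead of setting memory by hand.
environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500,
)
agent = Agent.create(
    agent='ppo', environment=environment,
    batch_size=10, update_frequency=2, learning_rate=1e-3,
)

runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=300)
runner.close()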

@qZhang88

qZhang88 commented Jan 4, 2020

Thanks, guys @AlexKuhnle and @porgull, it is indeed the memory problem.

Now I understand the error message here: update-batch-size.value should be <= memory.retrieve_episodes.

tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [] [Condition x <= y did not hold element-wise:x (red_0.observe/red_0.core_observe/red_0.core_update/update-batch-size.value/Identity:0) = ] [10] [y (red_0.observe/red_0.core_observe/red_0.core_update/memory.retrieve_episodes/assert_less_equal/ReadVariableOp:0) = ] [9]
         [[{{node Assert}}]]

@AlexKuhnle
Member

@porgull, I think the problem you mentioned should be fixed in the latest commit (I could only run a few tests right now, so I hope I didn't introduce another problem :-).

I also tried to improve / introduce exception messages which hopefully explain the problem a bit better.
