I discovered while digging through the code that a state value called previous_state of the DQN algorithm (and possibly some other agents) is being cached in the act() and action_distribution methods of the class.

From the little digging that I did, it seems to be related to the side panel of the rendering, which displays extra information about the attention heads of the controlled vehicles.

The issue is that, when there is more than one controlled vehicle, it gets redefined n+1 times per act() call, where n is the number of vehicles: once as the tuple of observations of all agents, and then once per agent's observation, until it ends up holding the observation of the last controlled vehicle.
Snippet from rl_agents/agents/deep_q_network/abstract.py:
```python
def act(self, state, step_exploration_time=True):
    """
        Act according to the state-action value model and an exploration policy

    :param state: current state
    :param step_exploration_time: step the exploration schedule
    :return: an action
    """
    self.previous_state = state  # <========== HERE =============
    if step_exploration_time:
        self.exploration_policy.step_time()
    # Handle multi-agent observations
    # TODO: it would be more efficient to forward a batch of states
    if isinstance(state, tuple):
        return tuple(self.act(agent_state, step_exploration_time=False) for agent_state in state)

    # Single-agent setting
    values = self.get_state_action_values(state)
    self.exploration_policy.update(values)
    return self.exploration_policy.sample()
```
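To make the n+1 redefinitions concrete, here is a small self-contained sketch that mimics the recursion above (MockAgent is just an illustration, not part of rl_agents):

```python
# Minimal sketch of the caching pattern in act(): previous_state is assigned on
# every call, including the recursive per-agent calls in the multi-agent branch.
class MockAgent:
    def __init__(self):
        self.previous_state = None
        self.assignments = []  # record every value assigned to previous_state

    def act(self, state, step_exploration_time=True):
        self.previous_state = state          # cached on every (recursive) call
        self.assignments.append(state)
        if isinstance(state, tuple):
            # multi-agent: recurse once per agent, re-assigning previous_state each time
            return tuple(self.act(s, step_exploration_time=False) for s in state)
        return 0                             # dummy action for the single-agent branch

agent = MockAgent()
agent.act(("obs_agent_0", "obs_agent_1", "obs_agent_2"))
print(len(agent.assignments))   # 4, i.e. n + 1 assignments for n = 3 controlled vehicles
print(agent.previous_state)     # 'obs_agent_2': the observation of the last controlled vehicle
```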
This does not seem like the most pressing issue, but I am noting it here in case anyone has a good idea on how to deal with it, or can give a clearer explanation of why this variable is important, as I only gave one example of its usefulness.
Thanks!
previous_state is currently only used to render information about the agent's decision-making process, in particular when we need access to internal information rather than just the output Q-values / action probabilities, e.g. the attention scores. It's then easier to forward the model again than to store and maintain all of these intermediate outputs.

- we typically expect a single state, so as to render the decision of a single agent
- when I introduced a multi-agent mode, I decided to keep rendering a single agent for simplicity and readability
- so the act() method is called iteratively for all agents, and only the last one is used for rendering
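If we wanted to avoid the redundant assignments, one possible tweak (an untested sketch, not the current implementation) would be to cache previous_state only in the single-agent branch. The end result for rendering stays the same, since previous_state still ends up holding the last agent's observation:

```python
def act(self, state, step_exploration_time=True):
    if step_exploration_time:
        self.exploration_policy.step_time()
    # Handle multi-agent observations: recurse per agent without caching the tuple
    if isinstance(state, tuple):
        return tuple(self.act(agent_state, step_exploration_time=False)
                     for agent_state in state)

    # Single-agent setting: only now cache the state used for rendering,
    # so multi-agent calls assign previous_state n times instead of n + 1
    self.previous_state = state
    values = self.get_state_action_values(state)
    self.exploration_policy.update(values)
    return self.exploration_policy.sample()
```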