Action Discrete(5) and reward in "simple_tag" env #11
Comments
I have the same question. What does Discrete mean, an integer?
Agreed. If you OpenAI folks could release a simple example of random agents in every environment, it would be a great relief. I hope there will also be an explanation of the action space and how to take actions in the different environments, since it's quite confusing. Thank you.
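(For what it's worth, a Discrete space is just a finite set of integer actions. A minimal sketch using the standard gym API, nothing specific to this repo:)

```python
from gym.spaces import Discrete

space = Discrete(5)        # the actions are the integers 0, 1, 2, 3, 4
print(space.n)             # 5
print(space.sample())      # a random int in [0, 5)
print(space.contains(3))   # True
```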
@Northernwolf, I'm not a maintainer/author, but I was playing around with it this morning and I think I have a simple example you can use to give every agent in the environment a random action. It works for any of these environments; just replace 'simple_push' with the one you want:

```python
from make_env import make_env
import numpy as np

env = make_env('simple_push')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        agent_actions = []
        for i, agent in enumerate(env.world.agents):
            # This is a Discrete space:
            # https://github.com/openai/gym/blob/master/gym/spaces/discrete.py
            agent_action_space = env.action_space[i]
            # sample() returns an int in [0, agent_action_space.n)
            action = agent_action_space.sample()
            # The environment expects a vector of length agent_action_space.n
            # containing 0 or 1 for each action, 1 meaning take this action
            action_vec = np.zeros(agent_action_space.n)
            action_vec[action] = 1
            agent_actions.append(action_vec)
        # Each of these is a list parallel to env.world.agents, as is agent_actions
        observation, reward, done, info = env.step(agent_actions)
        print(observation)
        print(reward)
        print(done)
        print(info)
        print()
```

Hope it helps!
Hi
I really wish they had specified whether the expected vector is one-hot encoded or just probabilities of taking each action. It is very unclear from the documentation :((
The action for each agent is Discrete(5); however, in practice it behaves like a Box(5) within (-1, 1). The code here

```python
agent.action.u[0] += action[0][1] - action[0][2]
agent.action.u[1] += action[0][3] - action[0][4]
```

is used to get `p_force` and then `p_vel`. So what does `action[0][0]` do?
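For concreteness, here is a minimal sketch of how that branch appears to treat the 5-dim vector, assuming a [no-op, +x, -x, +y, -y] layout; the helper name `apply_discrete_action`, the index layout, and the sensitivity scaling below are my paraphrase, not the environment's actual code.

```python
import numpy as np

def apply_discrete_action(action_vec, sensitivity=5.0):
    """Illustrative only: turn a 5-dim action vector into a 2-D control force.

    Assumed layout: [no-op, +x, -x, +y, -y]. Index 0 never contributes,
    which is consistent with agent.action.u reading only indices 1-4.
    """
    u = np.zeros(2)
    u[0] = action_vec[1] - action_vec[2]   # net x force
    u[1] = action_vec[3] - action_vec[4]   # net y force
    return u * sensitivity                 # the env scales u by the agent's acceleration

print(apply_discrete_action(np.array([0., 1., 0., 0., 0.])))  # one-hot "+x" -> [5. 0.]
print(apply_discrete_action(np.array([1., 0., 0., 0., 0.])))  # "no-op" slot -> [0. 0.]
```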
The reward of adversary agents at each step is based on `is_collision`, which turns out to give the same reward to every adversary agent, even when we include the penalty in the `shape = True` case. How is that different from `self.shared_reward = True` in `environment.py`?
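For context, a simplified sketch of why every adversary sees the same number: both the shaping term and the collision term loop over all (good agent, adversary) pairs, so the value never depends on which adversary is asking. The function name, distance threshold, and constants below are a paraphrase under that assumption, not a verbatim copy of `simple_tag.py`.

```python
import numpy as np

def adversary_reward_sketch(adversary_pos, good_positions, adversary_positions, shape=False):
    """Simplified paraphrase: the sums run over *all* adversaries, so the result
    is identical for every adversary at a given step."""
    rew = 0.0
    if shape:
        # distance-based penalty, accumulated over all adversaries
        for adv in adversary_positions:
            rew -= 0.1 * min(np.linalg.norm(g - adv) for g in good_positions)
    for g in good_positions:
        for adv in adversary_positions:
            if np.linalg.norm(g - adv) < 0.1:   # stand-in for is_collision()
                rew += 10.0
    return rew  # note: adversary_pos is never used, which is exactly the point

good = [np.array([0.0, 0.0])]
advs = [np.array([0.05, 0.0]), np.array([1.0, 1.0])]
# Both adversaries get the same reward, shaped or not:
print(adversary_reward_sketch(advs[0], good, advs, shape=True))
print(adversary_reward_sketch(advs[1], good, advs, shape=True))
```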
I don't mean to complain, I just wonder how it works. I'd appreciate it if you could answer me.