Action Discrete(5) and reward in "simple_tag" env #11

Closed
namidairo777 opened this issue Nov 11, 2017 · 5 comments

Comments

@namidairo777

namidairo777 commented Nov 11, 2017

  1. The action space for each agent is Discrete(5). However, in practice the action is a Box(5) with values in (-1, 1).
    The code here
    agent.action.u[0] += action[0][1] - action[0][2]
    agent.action.u[1] += action[0][3] - action[0][4]
    is used to compute p_force and then p_vel. So what does action[0][0] do? (See the sketch at the end of this comment.)

  2. The per-step reward of each adversary agent is based on is_collision, which turns out to be the same reward for every adversary, even when we include the penalty in the shape = True case.
    How is this different from self.shared_reward = True in environment.py?

I don't mean to complain, I just wonder how it works.
I'd appreciate it if you could answer.
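
For context, here is my rough paraphrase of the branch in _set_action (environment.py) that the quoted lines come from, with the default settings (discrete_action_space = True, discrete_action_input = False). The stand-in classes are only there to make the snippet runnable; this is my reading of the code, not the exact source:

import numpy as np

# Hypothetical stand-ins for the real Agent/Action objects in core.py
class _Action: pass
class _Agent:
    def __init__(self): self.action = _Action()

def set_action_sketch(agent, action, dim_p=2):
    # 'action' is a list whose first entry is the 5-dim vector for this agent
    agent.action.u = np.zeros(dim_p)
    # action[0][0] is never read in this branch, so it seems to be a "no-op"/"stay" slot;
    # indices 1/2 push along +x/-x, and indices 3/4 along +y/-y
    agent.action.u[0] += action[0][1] - action[0][2]
    agent.action.u[1] += action[0][3] - action[0][4]
    return agent.action.u

# e.g. a one-hot "move along +x" action:
print(set_action_sketch(_Agent(), [np.array([0, 1, 0, 0, 0])]))  # [1. 0.]

If that reading is right, action[0][0] only matters in the discrete_action_input branch, where action 0 means "apply no force".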

@xuemei-ye

xuemei-ye commented Nov 12, 2017

I have the same question. What does Discrete mean, an integer?
And in the experiments in the paper, how is the action used, and what does the network output?
What is the dimensionality of the action, and how is it used to change the agent's position and velocity? apply_action_force and apply_environment_force in core.py change them, but how does this work when the action is continuous? (My rough reading of core.py is pasted below.)
I have many other questions from reproducing your experiments, and I have emailed you @ryan-lowe. I wish you could give an example of how to use the platform. Thank you!
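
Here is my rough understanding of what core.py does with the force, paraphrased from World.step / integrate_state (so the constants and details may not match the source exactly):

import numpy as np

# Paraphrase of how the force derived from the action becomes movement:
# apply_action_force / apply_environment_force build p_force, then the state is integrated.
def integrate_state_sketch(p_pos, p_vel, p_force, mass=1.0, damping=0.25, dt=0.1):
    p_vel = p_vel * (1 - damping)          # velocity decay
    p_vel = p_vel + (p_force / mass) * dt  # force changes velocity
    p_pos = p_pos + p_vel * dt             # velocity changes position
    return p_pos, p_vel

pos, vel = integrate_state_sketch(np.zeros(2), np.zeros(2), np.array([1.0, 0.0]))
print(pos, vel)  # the action's force nudges the agent along +x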

@Haoxiang-Wang

Agreed. If you OpenAI guys could release a simple example of random agents in all environments, it would be a great relief. I hope there will be an explanation of the action space and how to take actions in the different environments, since it's quite confusing. Thank you.

@tebba-von-mathenstein

@Northernwolf, I'm not a maintainer/author, but I was playing around with it this morning and I think I have a simple example you can use to give all agents in the environment a random action in any of these environments. Just replace make_env('simple_push') with the name of the scenario you want to watch:

from make_env import make_env
import numpy as np

env = make_env('simple_push')

for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        agent_actions = []
        for i, agent in enumerate(env.world.agents):
            # This is a Discrete
            # https://github.com/openai/gym/blob/master/gym/spaces/discrete.py
            agent_action_space = env.action_space[i]

            # sample() returns an int from 0 to agent_action_space.n - 1
            action = agent_action_space.sample()

            # Environment expects a vector with length == agent_action_space.n
            # containing 0 or 1 for each action, 1 meaning take this action
            action_vec = np.zeros(agent_action_space.n)
            action_vec[action] = 1
            agent_actions.append(action_vec)

        # Each of these is a vector parallel to env.world.agents, as is agent_actions
        observation, reward, done, info = env.step(agent_actions)
        print(observation)
        print(reward)
        print(done)
        print(info)
        print()

Hope it helps!

@ryan-lowe
Contributor

Hi
I apologize for taking so long to get to this. We've finally released some code for training agents on this domain, which is publicly available here: https://github.com/openai/maddpg
-Ryan

@maxmax1992

I really wish they had specified whether the expected vector is one-hot encoded or just the probabilities of taking each action. It is very unclear from the documentation :((
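
From what I can tell (and I may be misreading _set_action in environment.py), the vector is simply treated as continuous, so both interpretations "work" but give different force magnitudes; a one-hot vector just produces a unit push along one axis. A quick sketch using the same formula quoted at the top of this issue:

import numpy as np

# act is the 5-dim per-agent action vector; index 0 is effectively ignored here
def force_from_action(act):
    return np.array([act[1] - act[2], act[3] - act[4]])

print(force_from_action(np.array([0., 1., 0., 0., 0.])))       # one-hot       -> [1. 0.]
print(force_from_action(np.array([0.2, 0.5, 0.1, 0.1, 0.1])))  # probabilities -> [0.4 0. ]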
