Action Discrete(5) and reward in "simple_tag" env #11
Comments
I have the same question. What does Discrete mean, an integer?
Agreed. If you OpenAI folks could release a simple example of random agents in every environment, it would be a great relief. I hope there will also be an explanation of the action space and how to take actions in the different environments, since it's quite confusing. Thank you.
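(For what it's worth, a Discrete space is just a finite set of integer actions. A minimal sketch using the standard gym API, nothing specific to this repo:)

```python
from gym.spaces import Discrete

space = Discrete(5)        # the actions are the integers 0, 1, 2, 3, 4
print(space.n)             # 5
print(space.sample())      # a random int in [0, 5)
print(space.contains(3))   # True
```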
@Northernwolf, I'm not a maintainer/author, but I was playing around with it this morning and I think I have a simple example you can use to give every agent in the environment a random action. It works for any of these environments; just replace 'simple_push' with the one you want:

```python
from make_env import make_env
import numpy as np

env = make_env('simple_push')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        agent_actions = []
        for i, agent in enumerate(env.world.agents):
            # This is a Discrete space:
            # https://github.com/openai/gym/blob/master/gym/spaces/discrete.py
            agent_action_space = env.action_space[i]
            # sample() returns an int in [0, agent_action_space.n)
            action = agent_action_space.sample()
            # The environment expects a vector of length agent_action_space.n
            # containing 0 or 1 for each action, 1 meaning take this action
            action_vec = np.zeros(agent_action_space.n)
            action_vec[action] = 1
            agent_actions.append(action_vec)
        # Each of these is a list parallel to env.world.agents, as is agent_actions
        observation, reward, done, info = env.step(agent_actions)
        print(observation)
        print(reward)
        print(done)
        print(info)
        print()
```

Hope it helps!
Hi
I really wish they had specified whether the expected vector is one-hot encoded or just probabilities of taking each action. It is very unclear from the documentation :((
The action for each agent is Discrete(5); however, in practice it behaves like a Box(5) within (-1, 1). The code here

```python
agent.action.u[0] += action[0][1] - action[0][2]
agent.action.u[1] += action[0][3] - action[0][4]
```

is used to get `p_force` and then `p_vel`. So what does `action[0][0]` do?
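For concreteness, here is a minimal sketch of how that branch appears to treat the 5-dim vector, assuming a [no-op, +x, -x, +y, -y] layout; the helper name `apply_discrete_action`, the index layout, and the sensitivity scaling below are my paraphrase, not the environment's actual code.

```python
import numpy as np

def apply_discrete_action(action_vec, sensitivity=5.0):
    """Illustrative only: turn a 5-dim action vector into a 2-D control force.

    Assumed layout: [no-op, +x, -x, +y, -y]. Index 0 never contributes,
    which is consistent with agent.action.u reading only indices 1-4.
    """
    u = np.zeros(2)
    u[0] = action_vec[1] - action_vec[2]   # net x force
    u[1] = action_vec[3] - action_vec[4]   # net y force
    return u * sensitivity                 # the env scales u by the agent's acceleration

print(apply_discrete_action(np.array([0., 1., 0., 0., 0.])))  # one-hot "+x" -> [5. 0.]
print(apply_discrete_action(np.array([1., 0., 0., 0., 0.])))  # "no-op" slot -> [0. 0.]
```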
The reward of adversary agents at each step is based on `is_collision`, which turns out to give the same reward to every adversary agent, even when we include the penalty in the `shape = True` case. How is that different from `self.shared_reward = True` in `environment.py`?
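For context, a simplified sketch of why every adversary sees the same number: both the shaping term and the collision term loop over all (good agent, adversary) pairs, so the value never depends on which adversary is asking. The function name, distance threshold, and constants below are a paraphrase under that assumption, not a verbatim copy of `simple_tag.py`.

```python
import numpy as np

def adversary_reward_sketch(adversary_pos, good_positions, adversary_positions, shape=False):
    """Simplified paraphrase: the sums run over *all* adversaries, so the result
    is identical for every adversary at a given step."""
    rew = 0.0
    if shape:
        # distance-based penalty, accumulated over all adversaries
        for adv in adversary_positions:
            rew -= 0.1 * min(np.linalg.norm(g - adv) for g in good_positions)
    for g in good_positions:
        for adv in adversary_positions:
            if np.linalg.norm(g - adv) < 0.1:   # stand-in for is_collision()
                rew += 10.0
    return rew  # note: adversary_pos is never used, which is exactly the point

good = [np.array([0.0, 0.0])]
advs = [np.array([0.05, 0.0]), np.array([1.0, 1.0])]
# Both adversaries get the same reward, shaped or not:
print(adversary_reward_sketch(advs[0], good, advs, shape=True))
print(adversary_reward_sketch(advs[1], good, advs, shape=True))
```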
I don't mean to complain, I just wonder how it works. I'd appreciate it if you could answer me.