Ghost obstacles hurt #66
Very interesting! Can you please check if
To reproduce, you could simply log every observation of every episode and check them with the algorithm above. My code was checking on the fly during training.
@kidzik I train mainly with the remote environment, so by the time I wrote the previous post that wasn't possible. I will try that on a local machine with a modified env to collect env_desc. I don't think it's a problem with the obstacle list itself, though, since the obstacle observations are generated by comparing pelvis_x with the obstacle list. It might be a bug that corrupted pelvis_x or the calculation of obstacle_relative.
That's true, I meant it rather as a sanity check, and also to check which of the obstacles is wrong (if it's always the first or the last one, it may speed up debugging). One important thing here is the logic of the obstacle sensor. The current semantics are as follows (besides the error in the first observation):

as implemented here: https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/run.py#L110-L120. Note that the second point makes it a little messy: the agent is likely to see an obstacle with a negative relative distance. Moreover, if one obstacle was covering another, it may show up after passing the first one. The actual list of obstacles is created here: https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/run.py#L228-L262, and I don't see any reason why it could be other than Yet, the example you gave is still not covered by this case.
Thank you. I know the logic of obstacle generation very well (we all do :) ). which will result in undercounting (the client will never 'see' one of the obstacles throughout the 1000 steps) in some cases, this one for example: That is not strictly a bug, since the unseen obstacle does not affect the agent's performance; it just makes the counting inaccurate.
Yes, that's the reason I mentioned the exact procedure here. As you say, it should not affect performance much, though it might affect counting.
On FourObstacleError I made a dump: dump.zip
I'll paste all the problems I found.
Update: the messed-up observations are likely my own problem; this issue could be closed.

```python
def reset(self, difficulty=2, seed=None):
    super(RunEnv, self).reset()
    self.istep = 0
    self.last_state = self.get_observation()
    self.setup(difficulty, seed)
    self.current_state = self.last_state
    return self.last_state
```

Here last_state is obtained before setup(), so last_state does not represent the actual model state after reset(). The two lines should be swapped.
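A minimal sketch of the ordering fix described above, with setup() called before get_observation(). The class and its helpers here are stand-ins for illustration, not the actual osim-rl implementation:

```python
# Hypothetical minimal sketch: setup() must run BEFORE get_observation(),
# so that last_state reflects the freshly configured model.
# FakeRunEnv is a stand-in, not the real RunEnv.

class FakeRunEnv:
    def __init__(self):
        self.istep = 0
        self.configured = False

    def setup(self, difficulty, seed):
        # stand-in for (re)generating the model and obstacles
        self.configured = True

    def get_observation(self):
        # the observation depends on the configured model
        return {"configured": self.configured}

    def reset(self, difficulty=2, seed=None):
        self.istep = 0
        self.setup(difficulty, seed)              # configure first
        self.last_state = self.get_observation()  # then observe
        self.current_state = self.last_state
        return self.last_state

env = FakeRunEnv()
obs = env.reset()
# obs now reflects the post-setup model state
```

With this ordering, the first observation returned by reset() reflects the freshly generated obstacles instead of the stale pre-reset state.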
Great! At this point, it's a duplicate of #53, so I'm closing this one. |
I tried to log everything during training, especially the positions of the obstacles my agent has seen.
Pseudocode:
my actual python implementation at: https://github.com/ctmakro/stanford-osrl/blob/master/observation_processor.py#L196-L215
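The check linked above can be sketched roughly as follows. The error names match this post; the tolerance, the no-obstacle sentinel value, and the three-obstacle limit are assumptions on my part, not taken from the actual implementation:

```python
# Rough sketch of the obstacle-tracking sanity check. Error names match
# the post; TOL, NO_OBSTACLE, and MAX_OBSTACLES are assumed values.

NO_OBSTACLE = 100.0   # relative distance reported when nothing is ahead
MAX_OBSTACLES = 3     # assumed obstacle count per episode
TOL = 1e-3            # tolerance for matching a previously seen obstacle

class NewObstacleAreCloserToOriginError(Exception):
    pass

class ObstaclesTooMuchError(Exception):
    pass

def update_have_seen(have_seen, pelvis_x, obstacle_relative):
    """Track absolute obstacle positions; raise on impossible observations."""
    if obstacle_relative >= NO_OBSTACLE:
        return have_seen  # sentinel: nothing ahead
    absolute_x = pelvis_x + obstacle_relative
    # already-known obstacle (within tolerance)?
    for seen_x in have_seen:
        if abs(seen_x - absolute_x) < TOL:
            return have_seen
    # a genuinely new obstacle must lie beyond all previously seen ones
    if have_seen and absolute_x < max(have_seen):
        raise NewObstacleAreCloserToOriginError(absolute_x)
    have_seen.append(absolute_x)
    if len(have_seen) > MAX_OBSTACLES:
        raise ObstaclesTooMuchError(have_seen)
    return have_seen
```

Calling update_have_seen on every step's observation accumulates the buffer; a ghost obstacle then shows up as one of the two exceptions.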
Problem description:
The first observation (from reset()) does not contain the closest obstacle; instead it reads [100, 0, 0], meaning no obstacle is ahead. I have already explicitly handled that case in my code. I'm not sure whether that's fixed on master.
Sometimes ghost obstacles come out of nowhere. They exist for only about one frame, then disappear. It doesn't happen very often, roughly once every 50-100 episodes. With my algorithm above, those situations result in either a NewObstacleAreCloserToOriginError or an ObstaclesTooMuchError.
I believe there's no bug in my algorithm, because when the errors are not raised, my agent can run through the environment and score more than 15 points. If this were a bug on my side, ghost obstacles would come up more often and get my agent killed more often.
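For reference, a tiny helper for detecting the [100, 0, 0] "no obstacle ahead" reading mentioned above; the tolerance is an assumption:

```python
# Detecting the [100, 0, 0] "no obstacle ahead" sentinel described above.
# The sentinel value comes from the observations; the tolerance is assumed.

NO_OBSTACLE_X = 100.0

def is_no_obstacle(triplet, tol=1e-9):
    """True if the obstacle triplet is the 'nothing ahead' sentinel."""
    relative_x, y, radius = triplet
    return abs(relative_x - NO_OBSTACLE_X) < tol
```

On the first observation from reset(), this sentinel appears even when an obstacle is actually ahead, so there it should be treated as "unknown" rather than as "no obstacle".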
When logging the error I also log the current state of the obstacle buffer (the variable have_seen, as shown above). Here are three samples from my console:

1. The absolute x positions of the obstacles (I call them balls in my code), as shown above, are [1.36, 2.51, 3.20, 4.84]: four obstacles in the buffer.
2. The absolute x position of the incoming obstacle is 1.01, less than the one already in the buffer (1.84). This should only happen if my agent fell backwards and hit an obstacle he hadn't seen before, which he definitely should have seen, because it's the first obstacle.
3. Same as above, except this time not only is the new obstacle (at 1.83) closer to the origin than the old ones (at [2.51, 3.20, 4.84]), but we also have four obstacles in total.