
Ghost obstacles hurt #66

Closed
ctmakro opened this issue Aug 30, 2017 · 11 comments

@ctmakro (Contributor) commented Aug 30, 2017

I tried to log everything during training, especially the positions of the obstacles my agent has seen.

Pseudocode:

have_seen = []  # absolute x positions of obstacles seen so far

for observation in episode:
    # reconstruct the obstacle's absolute position from its relative reading
    obstacle_absolute = obstacle_relative + pelvis_x
    if obstacle_absolute not in have_seen:
        if any(obstacle_absolute < x for x in have_seen):
            raise NewObstacleAreCloserToOriginError
        have_seen.append(obstacle_absolute)
        if len(have_seen) > 3:
            raise ObstaclesTooMuchError
    # else: this obstacle was already seen

My actual Python implementation is at https://github.com/ctmakro/stanford-osrl/blob/master/observation_processor.py#L196-L215

Problem description:

  1. The first observation (from reset()) does not contain the closest obstacle. Instead it reads [100,0,0], meaning no obstacle is ahead. I have already handled that case explicitly in my code. I'm not sure if that's fixed on master.

  2. Sometimes ghost obstacles come out of nowhere. They exist for only about one frame, then disappear. Not very often, though: roughly once every 50-100 episodes. With my algorithm above, those situations result in either NewObstacleAreCloserToOriginError or ObstaclesTooMuchError.

  3. I believe there's no bug in my algorithm, because when the errors are not raised, my agent can run through the environment and score >15 points. If this were a bug on my side, ghost obstacles would come up more often and kill my agent more often.

  4. When logging the error I also log the current state of the obstacle buffer (the variable have_seen shown above). Here are three samples from my console:

[[1.361221778390644, -0.011292036961017608, 0.05461492299694686], [2.5194703733694803, -0.0017294513769069337, 0.27494896696570115], [3.2016361046625708, -0.012949957370803934, 0.06392466839510719], [4.840752657985693, -0.0036897576500632386, 0.14875511812424147]]
(@ step 179)What the fuck you just did! Why num of balls became greater than 3!!!
(agent) something wrong on step(). episode teminates now
Traceback (most recent call last):
  File "D:\GitHub\stanford-osrl\ddpg2.py", line 385, in play
    observation, reward, done, _info = env.step(action_out) # take long time
  File "D:\GitHub\stanford-osrl\multi.py", line 37, in step
    o = self.obg(oo)
  File "D:\GitHub\stanford-osrl\multi.py", line 21, in obg
    processed_observation, self.old_observation = go(plain_obs, self.old_observation, step=self.stepcount)
  File "D:\GitHub\stanford-osrl\observation_processor.py", line 219, in generate_observation
    addball_if_new()
  File "D:\GitHub\stanford-osrl\observation_processor.py", line 213, in addball_if_new
    raise Exception('ball number greater than 3.')
Exception: ball number greater than 3.
ball number greater than 3.

The absolute x positions of the obstacles (I call them balls in my code), as shown above, are [1.36, 2.51, 3.20, 4.84]: four obstacles, one more than the maximum of three.

1.0187754964007407 [[1.8462540880564555, 0.008053394783150494, 0.052140473594560394]]
(@ step 28)Damn! new ball closer than existing balls.
(agent) something wrong on step(). episode teminates now
Traceback (most recent call last):
  File "D:\GitHub\stanford-osrl\ddpg2.py", line 385, in play
    observation, reward, done, _info = env.step(action_out) # take long time
  File "D:\GitHub\stanford-osrl\multi.py", line 37, in step
    o = self.obg(oo)
  File "D:\GitHub\stanford-osrl\multi.py", line 21, in obg
    processed_observation, self.old_observation = go(plain_obs, self.old_observation, step=self.stepcount)
  File "D:\GitHub\stanford-osrl\observation_processor.py", line 219, in generate_observation
    addball_if_new()
  File "D:\GitHub\stanford-osrl\observation_processor.py", line 203, in addball_if_new
    raise Exception('new ball closer than the old ones.')
Exception: new ball closer than the old ones.
new ball closer than the old ones.

The absolute x position of the incoming obstacle is 1.01, less than the one already in the buffer, which is at 1.84. That should only happen if my agent fell backwards and hit an obstacle it hadn't seen before, but it must have seen this one, because it's the first obstacle.

ep 369 / 200000 times: 1 noise_level 0.05
1.8385624314724573 [[2.5194703733694803, -0.0017294513769069337, 0.27494896696570115], [3.2016361046625708, -0.012949957370803934, 0.06392466839510719], [4.840752657985693, -0.0036897576500632386, 0.14875511812424147]]
(@ step 175)Damn! new ball closer than existing balls.
(agent) something wrong on step(). episode teminates now
Traceback (most recent call last):
  File "D:\GitHub\stanford-osrl\ddpg2.py", line 385, in play
    observation, reward, done, _info = env.step(action_out) # take long time
  File "D:\GitHub\stanford-osrl\multi.py", line 37, in step
    o = self.obg(oo)
  File "D:\GitHub\stanford-osrl\multi.py", line 21, in obg
    processed_observation, self.old_observation = go(plain_obs, self.old_observation, step=self.stepcount)
  File "D:\GitHub\stanford-osrl\observation_processor.py", line 219, in generate_observation
    addball_if_new()
  File "D:\GitHub\stanford-osrl\observation_processor.py", line 203, in addball_if_new
    raise Exception('new ball closer than the old ones.')
Exception: new ball closer than the old ones.
new ball closer than the old ones.

Same as above, except this time not only is the new obstacle (at 1.83) closer to the origin than the old ones (at [2.51, 3.20, 4.84]), but we also have four obstacles in total.

@kidzik (Member) commented Aug 30, 2017

Very interesting! Can you please check if env.env_desc['obstacles'] gives the same/similar obstacles? The exact description of the current environment should be there.
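
A minimal sketch of that sanity check (assuming each entry of env_desc['obstacles'] is an (x, y, radius) triple, which may differ from the actual layout):

def matches_env_desc(env, have_seen, tol=1e-6):
    # have_seen: [x_absolute, y, radius] triples reconstructed from observations
    true_xs = sorted(o[0] for o in env.env_desc['obstacles'])
    seen_xs = sorted(ball[0] for ball in have_seen)
    # every reconstructed obstacle should match some ground-truth x position
    return all(any(abs(s - t) < tol for t in true_xs) for s in seen_xs)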

@ctmakro (Contributor, Author) commented Aug 30, 2017

To reproduce, you could simply log every observation of every episode and check them with the algorithm above. My code was checking on the fly during training.

@ctmakro (Contributor, Author) commented Aug 30, 2017

@kidzik I train mainly with the remote environment, so by the time I wrote the previous post that wasn't possible. I will try it on a local machine with a modified env to collect env_desc.

I think it might not be a problem with the obstacle list itself, though, since the obstacle observations are generated by comparing pelvis_x with the obstacle list. It might be a bug that corrupts pelvis_x or the calculation of obstacle_relative.

@kidzik (Member) commented Aug 30, 2017

That's true. I meant it rather as a sanity check, and also to check which of the obstacles is wrong (if it's always the first or the last one, that may speed up debugging).

One important thing here is the logic of the obstacle sensor. The current semantics are as follows (besides the error in the first observation):

  • if all obstacles are ahead, return the relative position of the first one, i.e. obstacle_x - pelvis_x
  • if pelvis_x has passed an obstacle, that is pelvis_x > obstacle_x + obstacle_radius, return the relative position of the next obstacle, i.e. obstacle_x - pelvis_x
  • if there are no obstacles ahead, return [100,0,0]

as implemented here https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/run.py#L110-L120

Note that the second point makes it a little messy: the agent is likely to see an obstacle with a negative relative distance. Moreover, if one obstacle was covering another, the second may only show up after passing the first one.
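
A minimal sketch of those semantics, not the actual run.py code (the function name and the (x, y, radius) tuple layout are assumptions):

def obstacle_sensor(pelvis_x, obstacles):
    # obstacles: (x, y, radius) tuples, sorted by x at episode start
    for x, y, radius in obstacles:
        if pelvis_x <= x + radius:
            # first obstacle not yet passed; the relative distance
            # goes negative while the pelvis is over the obstacle
            return [x - pelvis_x, y, radius]
    return [100, 0, 0]  # no obstacles ahead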

The actual list of obstacles is created here: https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/run.py#L228-L262 and I don't see any reason why there could be more than 3, since the code is straightforward. If it's a problem on the osim-rl side, it should rather be in the sensing part.

Yet the example you gave is still not covered by any of these cases.

@ctmakro (Contributor, Author) commented Aug 30, 2017

Thank you. I know the logic of obstacle generation very well (we all do :) ).
There's one problem regarding the sensor, though: you sort the list of obstacles at the beginning, then assume they will be detected in that order when iterating through the list: https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/run.py#L256-L257
You sort them by obstacle_x, but while iterating you compare pelvis_x with obstacle_x + radius, which may not be in the same ascending order as obstacle_x.

In some cases this results in undercounting: the client never 'sees' one of the obstacles throughout the 1000 steps. This one, for example:

[image: example obstacle configuration]

This is not strictly a bug, since the unseen obstacle does not affect the agent's performance; it just makes the counting inaccurate.
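
To illustrate with the sensor sketch above (hypothetical numbers): obstacle A (x=2.0, radius=0.5) ends at 2.5, obstacle B (x=2.2, radius=0.1) ends at 2.3, so once the pelvis is past A it is already past B, and B is never reported:

obstacles = [(2.0, 0.0, 0.5), (2.2, 0.0, 0.1)]  # sorted by obstacle_x

for pelvis_x in [1.0, 2.4, 2.6]:
    print(pelvis_x, obstacle_sensor(pelvis_x, obstacles))
# 1.0 -> [1.0, 0.0, 0.5]     A reported ahead
# 2.4 -> [~-0.4, 0.0, 0.5]   still A, negative relative distance
# 2.6 -> [100, 0, 0]         B was skipped without ever being reported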

@kidzik (Member) commented Aug 30, 2017

Yes, that's the reason I mentioned the exact procedure here. As you say, it should not affect performance much, though it might affect counting.
Either way, it still doesn't explain 4 obstacles...

@ctmakro (Contributor, Author) commented Aug 30, 2017

On the FourObstacleError I made a dump:
[image: screenshot of the dump]

dump.zip
I'm currently examining it.

@ctmakro (Contributor, Author) commented Aug 30, 2017

I'll paste all the problems I found.

  1. psoas isn't constant within the episode:
    psoas | psoas
    -- | --
    1.094567 | 0.796778
    0.992841 | 0.991891
    0.992841 | 0.991891
    0.992841 | 0.991891
    0.992841 | 0.991891
    0.992841 | 0.991891
    0.992841 | 0.991891
    0.992841 | 0.991891
    0.992841 | 0.991891
    0.992841 | 0.991891
    0.992841 | 0.991891
    0.992841 | 0.991891
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951
    0.940272 | 1.144951

@ctmakro (Contributor, Author) commented Aug 30, 2017

I tried to plot it, and this is strange:
[image: plot of the logged values]

It seems the data I collected from the environment contains two consecutive episodes rather than one. I guess there could be something seriously wrong in my parallelization code, so now I'll go check my own code and see whether it is indeed my problem.

@ctmakro (Contributor, Author) commented Aug 30, 2017

Update: the messed-up observations are likely my own problem; this issue can be closed.
Still, the values in the first observation will always be wrong, due to osim-rl's implementation:
https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/run.py#L57-L63

    def reset(self, difficulty=2, seed=None):
        super(RunEnv, self).reset()
        self.istep = 0
        self.last_state = self.get_observation()
        self.setup(difficulty, seed)
        self.current_state = self.last_state
        return self.last_state

last_state is obtained before setup(), so it does not represent the actual model state after reset(). The two lines should really be switched.
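
A minimal sketch of that fix, i.e. the same method with only the two lines swapped:

    def reset(self, difficulty=2, seed=None):
        super(RunEnv, self).reset()
        self.istep = 0
        self.setup(difficulty, seed)              # set up obstacles etc. first
        self.last_state = self.get_observation()  # then read the actual state
        self.current_state = self.last_state
        return self.last_state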

@kidzik (Member) commented Aug 30, 2017

Great! At this point, it's a duplicate of #53, so I'm closing this one.

@kidzik closed this as completed Aug 30, 2017