
Reproducing Figure 11 and reporting success rate #357

Closed
rasoolfa opened this issue Jan 9, 2022 · 9 comments


rasoolfa commented Jan 9, 2022

Hi all and @avnishn,

I've been trying to reproduce the results from Figure 11 in https://arxiv.org/pdf/1910.10897.pdf using https://github.com/rlworkgroup/garage/blob/08492007d6e2d9ead9beb83a8a4247e52019ac7d/metaworld_examples/sac_metaworld.py and the hyper-parameters reported in Table 3. Should I use Table 3 for the hyper-parameters?

One thing that is not clear to me is how the success rate is reported. I noticed that env.step returns 'success', but I want to verify that this is what is reported in the paper. Here is the code that I use to report results (random actions are used for simplicity):

from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
env_cls = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE['hammer-v2-goal-observable']
eval_env = env_cls(seed=0)
eval_env.seed(0)
avg_reward = 0
success_rate = 0
num_evals = 2

for _ in range(num_evals):
    obs = eval_env.reset()
    done = False
    stp = 0
    while not done and stp < eval_env.max_path_length:
        obs, reward, done, info = eval_env.step(eval_env.action_space.sample())
        avg_reward += reward
        stp += 1
        if 'success' in info:
            success_rate += info['success']
avg_reward /= num_evals
success_rate /= num_evals

Is this the right way to report the success rate like Figure 11?
Thanks for your help.
Rasool

@rasoolfa (Author)

I should add that my results are much worse than the reported results.
Any help would be highly appreciated. Thanks.


avnishn commented Jan 12, 2022

Hi @rasoolfa,

sorry for the late response.

This is the correct way of computing success:

num_evals = 10
num_successful_eval_trajectories = 0

for _ in range(num_evals):
    obs = eval_env.reset()
    done = False
    success_curr_time_step = False
    stp = 0
    while not done and stp < eval_env.max_path_length:
        obs, reward, done, info = eval_env.step(eval_env.action_space.sample())
        stp += 1
        success_curr_time_step |= bool(info['success'])  # episode counts as a success if this is ever 1.0
    num_successful_eval_trajectories += int(success_curr_time_step)

success_rate = num_successful_eval_trajectories/num_evals
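(A side note: for reproducing Figure 11, the actions would come from the trained SAC policy rather than eval_env.action_space.sample(); the random actions above are just a stand-in to show the success bookkeeping.)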

Neither the code nor the environments have changed, so it's unlikely that a performance regression happened, unless one was caused by one of the dependencies (e.g. an upgraded version of torch; I used 1.8).

Thanks,
@avnishn


rasoolfa commented Jan 12, 2022

Thanks @avnishn
Can you also please comment on the following questions?

  1. So the value of 'success' in info, i.e. info = {'success': val}, is not important then and should be ignored?
    In addition, with the approach you mentioned, wouldn't the success always be 100%, since "success" is always present in "info" (see below)? I've checked 8 different environments and all of them always return an "info" dict containing the "success" flag.

"info" contains the followings:

{'success': 0.0,
 'near_object': 0.919886337009916,
 'grasp_success': False,
 'grasp_reward': 0.018269567161941357,
 'in_place_reward': 0.07328687668107409,
 'obj_to_target': 0,
 'unscaled_reward': 0.43810542967701377}
  2. Also, should I use Table 3 as a reference for the hyper-parameters?

  3. ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE should be used to create an env for single-task experiments, is that right? e.g.

from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
env_cls = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE['hammer-v2-goal-observable']
eval_env = env_cls(seed=0)
eval_env.seed(0)

I appreciate your help.


avnishn commented Jan 12, 2022

  1. Whoops, sorry. I edited my answer; I gave you the wrong answer the first time.
  2. Yes, and they should be the same as the hparams in the launcher that you linked.
  3. Yes.

@rasoolfa (Author)

Thanks again for your help.


avnishn commented Jan 13, 2022

No problem! I'm gonna go ahead and close this for now, but if you have more questions, I'd recommend joining our Slack community (link in the README), where a lot of questions like these have been answered. Of course, feel free to post here again if you'd like.

avnishn closed this as completed Jan 13, 2022
@rasoolfa (Author)

Thanks again @avnishn. One last thing: do you happen to have learning curves or log files for these experiments (i.e. Figure 11)? I just want to compare, as I still can't reproduce the paper's results.


krzentner commented Jan 23, 2022

To be clear, since the above conversation was unclear to me: in MetaWorld, an episode is considered successful if info['success'] ever becomes 1.0 during that episode. SuccessRate therefore needs to be computed across many episodes to be meaningful.
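For concreteness, here is a minimal self-contained sketch of that definition (the evaluate_success_rate helper is illustrative, not part of Meta-World's API; random actions stand in for a trained policy, so the reported rate will be near zero):

from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE

def evaluate_success_rate(env, num_episodes=50, policy=None):
    # An episode counts as successful if info['success'] ever becomes 1.0 during it.
    successes = 0
    for _ in range(num_episodes):
        obs = env.reset()
        episode_success = False
        for _ in range(env.max_path_length):
            action = env.action_space.sample() if policy is None else policy(obs)
            obs, reward, done, info = env.step(action)
            episode_success |= bool(info['success'])
            if done:
                break
        successes += int(episode_success)
    return successes / num_episodes

env = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE['hammer-v2-goal-observable'](seed=0)
print(evaluate_success_rate(env, num_episodes=10))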

@AnukritiSinghh

I wanted to know: is success_rate calculated during the evaluation phase, and is that what is reported in the paper?
