
The MuJoCo results reported in the paper might be heavily influenced by the env version #42

linprophet opened this issue May 8, 2022 · 0 comments

Hello there,

Recently, we reproduced some offline reinforcement learning experiments and found that the Decision Transformer paper cites the CQL results from the original CQL paper. The problem is that DT uses version-2 MuJoCo environments (hopper-v2, walker2d-v2), while the original CQL uses version-0 environments (hopper-v0, walker2d-v0), and the reward scale differs between these versions. So we ran DT and CQL in the same environments (hopper-v2, walker2d-v2), and CQL outperformed DT on almost every task (except hopper-replay). So I wonder:
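To make the reward-scale difference concrete, here is a minimal sketch of how one could compare per-trajectory returns between the -v0 and -v2 variants of the same dataset. It assumes `gym` and `d4rl` are installed and that both dataset versions are available locally; `episode_returns` is a hypothetical helper, not part of either codebase:

```python
import gym
import numpy as np

import d4rl  # noqa: F401 -- importing d4rl registers the offline datasets with gym


def episode_returns(dataset_name):
    """Sum rewards per trajectory in a D4RL dataset (hypothetical helper)."""
    env = gym.make(dataset_name)
    data = env.get_dataset()
    rewards = data['rewards']
    terminals = data['terminals']
    # Some older dataset versions may lack the 'timeouts' key.
    timeouts = data.get('timeouts', np.zeros_like(terminals))
    returns, ep_ret = [], 0.0
    for r, done in zip(rewards, np.logical_or(terminals, timeouts)):
        ep_ret += r
        if done:
            returns.append(ep_ret)
            ep_ret = 0.0
    return np.array(returns)


for name in ['hopper-medium-v0', 'hopper-medium-v2']:
    rets = episode_returns(name)
    print(f'{name}: mean return {rets.mean():.1f} over {len(rets)} episodes')
```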

  1. Did you take the environment version into consideration when reporting the results?
  2. Referring to #16 (how to get the score of an expert policy and some other details): the score is normalized against an expert policy using https://github.com/rail-berkeley/d4rl/blob/master/d4rl/infos.py (see the sketch after this list). However, the results I obtain with the official code are far from the results reported in the paper. Did I miss some key component in the DT code?
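For reference, the normalization from infos.py maps a raw return onto a 0-100 scale where 0 is a random policy and 100 is the reference expert. Below is a minimal sketch, assuming the installed d4rl version includes reference scores for the -v2 datasets; `normalized_score` is a hypothetical helper shown alongside d4rl's built-in `get_normalized_score`, which returns a 0-1 fraction rather than a percentage:

```python
import gym

import d4rl  # noqa: F401 -- registers the offline datasets with gym
from d4rl.infos import REF_MIN_SCORE, REF_MAX_SCORE


def normalized_score(dataset_name, raw_return):
    """Map a raw episode return to the 0-100 scale used in the papers:
    0 = random policy, 100 = the reference expert policy from infos.py."""
    lo, hi = REF_MIN_SCORE[dataset_name], REF_MAX_SCORE[dataset_name]
    return 100.0 * (raw_return - lo) / (hi - lo)


# d4rl's built-in helper performs the same computation as a 0-1 fraction:
env = gym.make('hopper-medium-v2')
print(env.get_normalized_score(3000.0) * 100.0)
print(normalized_score('hopper-medium-v2', 3000.0))
```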

Looking forward to your reply!

Best Wishes
