
maximization bias #29

Open
mikelty opened this issue Aug 17, 2020 · 1 comment

Comments


mikelty commented Aug 17, 2020

Hello, I'm not sure whether this is actually an issue, but I've been looking at your implementation for half an hour, and I think there might be a maximization bias in it. Specifically, you use the same set of experience to update both Q-tables, whereas the paper says that two independently trained Q-tables benefit training.
I've tested this idea on a similar code base and its owner has agreed with my view so far. I've also opened a Stack Overflow question here. Could you comment on this? I plan to test this implementation as well.
Thanks in advance.
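
For reference, here is a minimal sketch of the tabular double Q-learning update (van Hasselt, 2010) that I have in mind; the names and the random 50/50 split are illustrative, not taken from this repo:

```python
import random
from collections import defaultdict

def double_q_update(q_a, q_b, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # Each transition updates only one of the two tables, chosen at random,
    # so the tables are trained on (roughly) independent sets of experience.
    if random.random() < 0.5:
        q_upd, q_eval = q_a, q_b
    else:
        q_upd, q_eval = q_b, q_a
    # Pick the greedy next action under the table being updated...
    a_star = max(actions, key=lambda act: q_upd[(s_next, act)])
    # ...but evaluate its value with the *other* table, which decorrelates
    # the action selection from the value estimate and removes the bias.
    target = r + gamma * q_eval[(s_next, a_star)]
    q_upd[(s, a)] += alpha * (target - q_upd[(s, a)])

# Usage: q_a = defaultdict(float); q_b = defaultdict(float)
```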

haarnoja (Owner) commented

Hi,

We indeed use the same data to update both of the Q-functions. I haven't tested splitting the data and using different sets for the different Q's, but my guess is that it wouldn't make much difference in terms of maximization bias. My reasoning is that we evaluate the Q-functions (both for the TD target and the policy target) at actions that are not part of the data but are instead sampled from the current policy. For those actions, given a seen state, the Q-values are less correlated, since the Q's were never trained on those particular actions, which reduces the maximization bias. We've observed that this can make a big difference in practice, especially in higher-dimensional tasks.
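
To make that concrete, here is a rough sketch of how the shared batch and the policy-sampled actions enter the TD target. This is an illustration rather than the exact code in this repo; `q1_target`, `q2_target`, `policy_sample`, and `log_prob` are hypothetical stand-ins, and `alpha` is the entropy temperature:

```python
import numpy as np

def soft_td_targets(batch, q1_target, q2_target, policy_sample, log_prob,
                    gamma=0.99, alpha=0.2):
    s, a, r, s_next, done = batch      # the same replay batch feeds both Q updates
    a_next = policy_sample(s_next)     # fresh actions from the *current* policy,
                                       # not actions stored in the replay buffer
    # Both Q's are evaluated at these policy-sampled actions, where their
    # errors are less correlated, and the minimum is taken to be conservative.
    q_min = np.minimum(q1_target(s_next, a_next),
                       q2_target(s_next, a_next))
    # Entropy-regularized (soft) value of s'; both Q's regress to this target.
    v_next = q_min - alpha * log_prob(s_next, a_next)
    return r + gamma * (1.0 - done) * v_next
```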

I hope this answers your question!

Tuomas
