Conceptual question about DQN when reward is always -1 #19

Open
keithmgould opened this issue Jan 25, 2019 · 0 comments
keithmgould commented Jan 25, 2019

Given that the OpenAI Gym environment MountainCar-v0 ALWAYS returns -1.0 as a reward (even when the goal is achieved), I don't understand how DQN with experience replay converges. Yet I know it does, because I have working code (basically your awesome code) that proves it.

It is my understanding that ultimately there needs to be a "sparse reward" that is found. Yet as far as I can see from the OpenAI Gym code, there is never any reward other than -1. It feels more like a "no reward" environment.
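For reference, here is a quick way to confirm this (a minimal sketch, assuming the classic Gym step API where `env.step()` returns `(obs, reward, done, info)`):

```python
import gym

# Sketch assuming the classic Gym API (gym < 0.26).
env = gym.make("MountainCar-v0")
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    print(reward)  # prints -1.0 on every step, including the final one
```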

What almost answers my question, but in fact does not: when the task is completed quickly, the return (sum of rewards) of the episode is larger. So if the car never finds the flag, the return is -1000; if the car finds the flag quickly, the return might be -200. The reason this does not answer my question is that with DQN and experience replay, those returns (-1000, -200) are never present in the replay memory. All the memory holds are tuples of the form (state, action, reward, next_state), and of course tuples are pulled from memory at random, not episode-by-episode.
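To make that concrete, this is my understanding of the standard DQN target computed from each replay tuple (a minimal sketch; the `q_network` callable and a stored `done` flag are assumptions on my part, though most replay implementations store `done`):

```python
import numpy as np

GAMMA = 0.99  # assumed discount factor

def dqn_target(reward, next_state, done, q_network):
    """Standard DQN bootstrap target for one replay tuple.

    Note that the episode return never appears here: the target is
    just the one-step reward plus the discounted value estimate of
    the next state, zeroed out on terminal transitions.
    """
    if done:
        return reward                                 # terminal: just -1.0
    return reward + GAMMA * np.max(q_network(next_state))
```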

If reaching the flag yielded a reward of +1 (or +100, etc.), things would make more sense to me...

So, I don't see anything in the memory that indicates that the episode was performed well.

And thus, I have no idea why this DQN code is working for MountainCar.

PS: I asked this question on your blog too (as a comment). Apologies for duplication -- I'm not sure where you look and don't look :)
