Fix disappearing probability mass #2

Merged
merged 1 commit into from Apr 7, 2018

@Kaixhin (Contributor) commented Apr 4, 2018

Closes #1. I'm not sure this is the best way to deal with the problem, but it seems to cover a few cases. Let me know if you have a better solution.

To give an example of the issue: the update relies on qa_probs * (u.float() - b) and qa_probs * (b - l.float()). If any entry of b is integer-valued (e.g. in a terminal state, where all probability mass is concentrated in one location), then l and u both equal b there, so both parts of the update turn into qa_probs * 0 and that probability mass disappears. The network then tries to match a vector of 0s with its softmax output, which will obviously cause problems. Due to the nature of this edge case, I believe it has a worse effect on environments with more terminal transitions.
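
To make the edge case concrete, here is a minimal pure-Python sketch of the projection step. It is not the code from this PR: the function name `project`, the `fix` flag, and the choice to widen the bracket when `l == u` are assumptions for illustration, and the real update works on tensors rather than Python lists. It also assumes `l` and `u` are the floor and ceiling of `b`.

```python
import math

def project(probs, b_values, n_atoms, fix=True):
    """Spread each probability onto the two atoms that bracket its fractional index b."""
    m = [0.0] * n_atoms
    for p, b in zip(probs, b_values):
        l, u = math.floor(b), math.ceil(b)
        if fix and l == u:
            # b is an exact atom index, so (u - b) and (b - l) would both be 0.
            # Widen the bracket by one atom so the two weights sum to 1 again.
            if u > 0:
                l -= 1
            else:
                u += 1
        m[l] += p * (u - b)  # weight for the lower atom
        m[u] += p * (b - l)  # weight for the upper atom
    return m

probs = [0.25, 0.25, 0.25, 0.25]
terminal_b = [2.0] * 4  # hypothetical terminal state: every atom maps to index 2

print(sum(project(probs, terminal_b, 4, fix=False)))  # all mass is lost
print(project(probs, terminal_b, 4, fix=True))        # mass lands on atom 2
```

With `fix=False` the projected target sums to 0 whenever `b` is integer-valued, which is the vector of 0s described above. With `fix=True` the target remains a valid distribution.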

@floringogianu floringogianu merged commit e2425d1 into floringogianu:master Apr 7, 2018
@floringogianu (Owner) commented

Thank you for the pull request! Sorry for the rather late reply.
