No more Q states #30
Conversation
Oh, I also rearranged the main setup logic for Teacher so that it's more likely to immediately crash if something doesn't work, rather than, say, do rollouts and then crash.
Hmmm, https://www.google.com/search?q="q+state"+mdp However, I think you're right that keeping the actions and observations explicitly separated until they are fed into the neural net is an improvement. It should make the system more accessible to newcomers. Have you tested that this PR doesn't cause any performance regressions on MuJoCo?
rl_teacher/teach.py (outdated)

    segement_alt_act = self.segment_alt_act_placeholder

    # A vanilla multi-layer perceptron maps a (state, action) pair to a reward (Q-value)
    mlp = FullyConnectedMLP(self.obs_shape, self.act_shape)
Good call moving mlp to a local variable 👌
I haven't tested on MuJoCo, as my license expired and I wasn't able to renew. :(
Made some small changes and am running regression tests currently.
I've never seen a "Q-State" outside of Teacher. Q functions take (state, action) pairs. Storing the concatenation of states and actions isn't particularly elegant and only saves us a tiny amount of computation.
Furthermore, concatenating states and actions only really works when they're the same rank. For Atari environments or other non-MuJoCo environments this will not always be the case. We can extend support for more environments by pulling them apart and adding some checks for environment dimensionality (see the sketch below).
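To make the rank argument concrete, here is a minimal sketch of the kind of per-environment dimensionality check that keeping observations and actions separate would allow. This is not code from the PR; check_env_shapes and the example shapes are hypothetical and purely illustrative.

    def check_env_shapes(obs_shape, act_shape):
        """Hypothetical check: decide how to combine obs and act based on their ranks."""
        if len(obs_shape) == 1 and len(act_shape) == 1:
            # MuJoCo-style: both are flat vectors, so a plain MLP over the pair works
            # (this is the only case where concatenating into a "Q-state" makes sense).
            return "mlp"
        if len(obs_shape) == 3 and len(act_shape) == 1:
            # Atari-style: the observation is an image (H, W, C) while the action is a
            # flat vector, so concatenation is meaningless; the observation needs its
            # own (e.g. convolutional) encoder before being combined with the action.
            return "conv_then_combine"
        raise ValueError("Unsupported shapes: obs %s, act %s" % (obs_shape, act_shape))

    # Example usage (shapes are illustrative, not taken from any real environment config):
    print(check_env_shapes((11,), (3,)))        # MuJoCo-like vectors -> "mlp"
    print(check_env_shapes((84, 84, 4), (6,)))  # Atari-like image obs -> "conv_then_combine"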