GaussianLSTMPolicy with model #677
Codecov Report
@@ Coverage Diff @@
## master #677 +/- ##
==========================================
+ Coverage 64.08% 64.62% +0.53%
==========================================
Files 159 161 +2
Lines 9770 9924 +154
Branches 1293 1303 +10
==========================================
+ Hits 6261 6413 +152
- Misses 3182 3183 +1
- Partials 327 328 +1
@@ -0,0 +1,295 @@
"""GaussianLSTMPolicy with GaussianLSTMModel."""
from akro.tf import Box
import akro.tf
161c2fb to d20932e
I only have minor comments. Is the *LSTMModel always a single-layer recurrent unit now?
Returns:
action (numpy.ndarray): Predicted action.
agent_info (dict[numpy.ndarray]): Mean and log std of the
Maybe be explicit about the contents of the dict, e.g.
https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/batch_polopt.py#L85
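For illustration, a docstring that spells out the dict contents might look like the sketch below. This is not the code from the PR: `get_action` here is a toy stand-in with no LSTM behind it, and the key names `mean` and `log_std` are assumptions inferred from the "Mean and log std" wording in the diff.

```python
import math
import random


def get_action(observation):
    """Sample an action for one observation (toy stand-in, not the PR code).

    Args:
        observation (list[float]): Observation from the environment.

    Returns:
        tuple: ``(action, agent_info)`` where

        * action (list[float]): Sampled action.
        * agent_info (dict): Distribution parameters, with keys:

            * mean (list[float]): Mean of the action distribution.
            * log_std (list[float]): Log standard deviation of the
              action distribution.

    """
    # Placeholder parameters: a real policy would compute these from a model.
    mean = [0.0 for _ in observation]
    log_std = [0.0 for _ in observation]  # std = exp(0) = 1
    action = [random.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)]
    # Key names are assumed for this sketch.
    return action, dict(mean=mean, log_std=log_std)
```

Listing each key with its type lets a caller see what `agent_info` contains without reading the implementation.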
Returns:
actions (numpy.ndarray): Predicted actions.
agent_infos (dict[numpy.ndarray]): Mean and log std of the
Be explicit about the keys in the dict, e.g. https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/batch_polopt.py#L85
Returns:
action (numpy.ndarray): Predicted action.
agent_info (dict[numpy.ndarray]): Mean and log std of the
Can you please make it multi-layer recurrent? A single layer sometimes does not work well for hard problems.
@zhanpenghe I think another PR would be more appropriate.
Sure
File an issue then.
This PR refactors GaussianLSTMPolicy using garage.tf.models.Model. It adds two classes: GaussianLSTMModel and GaussianLSTMPolicyWithModel.
Added GaussianLSTMModel and GaussianLSTMPolicyWithModel.
Added test for PPO with GaussianLSTMPolicyWithModel.
Besides testing the functionality of GaussianLSTMPolicyWithModel
in test_gaussian_lstm_policy_with_model.py, the transition from the
old policy (GaussianLSTMPolicy) to the new policy
(GaussianLSTMPolicyWithModel) is also tested in
test_gaussian_lstm_policy_with_model_transit.py, to make sure
the two have the same API.
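
As a rough sketch of what a "same API" check can look like, the snippet below compares the public method signatures of two classes. It is not the test added in this PR: `OldPolicy`, `NewPolicy`, `public_api`, and `assert_same_api` are hypothetical names, and the toy classes stand in for GaussianLSTMPolicy and GaussianLSTMPolicyWithModel without using TensorFlow.

```python
import inspect


class OldPolicy:
    """Hypothetical stand-in for the old policy."""

    def get_action(self, observation):
        return 0.0, dict(mean=0.0, log_std=0.0)

    def get_actions(self, observations):
        n = len(observations)
        return [0.0] * n, dict(mean=[0.0] * n, log_std=[0.0] * n)

    def reset(self, dones=None):
        pass


class NewPolicy:
    """Hypothetical stand-in for the refactored policy."""

    def get_action(self, observation):
        return 0.0, dict(mean=0.0, log_std=0.0)

    def get_actions(self, observations):
        n = len(observations)
        return [0.0] * n, dict(mean=[0.0] * n, log_std=[0.0] * n)

    def reset(self, dones=None):
        pass


def public_api(cls):
    """Map each public method name of cls to its call signature."""
    return {
        name: inspect.signature(member)
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith('_')
    }


def assert_same_api(old_cls, new_cls):
    """Fail if new_cls lacks, or changes the signature of, an old method."""
    old_api, new_api = public_api(old_cls), public_api(new_cls)
    missing = set(old_api) - set(new_api)
    assert not missing, 'missing methods: {}'.format(sorted(missing))
    for name, signature in old_api.items():
        assert signature == new_api[name], (
            'signature of {} changed: {} -> {}'.format(
                name, signature, new_api[name]))


assert_same_api(OldPolicy, NewPolicy)
```

A signature check like this only catches missing methods and changed signatures. Comparing the outputs of both policies on identical inputs and weights would be a stronger test.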