Implement NN Policy Learner #78
Conversation
@Kurorororo [must]
offline.py [nits]
tests/policy/test_offline.py [nits]
maybe assert np.allclose(action_dist.sum(1), np.ones((context_test.shape[0], len_list)))
examples/opl/evaluate_off_policy_learners.py [imo]
["random_policy", "ipw_learner", f"nn_policy_learner (with {ope_estimator})"] is more interpretable.
@usaito Thank you for the review!
@Kurorororo Thanks! I have just a few minor comments.
policy/offline.py [imo&ask]
examples/README.md
I think … is better at describing what you implemented than the current form below (this may be confusing because we also have many OPE-related examples).
I updated the README.md.
@usaito These arguments should not be default arguments, but they have to be; this is because …
@Kurorororo Got it. Thanks! Then, how about removing …
@usaito …
@Kurorororo Thanks! I'll merge this PR.
This PR implements NNPolicyLearner. It uses a neural network whose objective function is an OPE estimator. To realize this, estimate_policy_value_tensor is implemented in each OPE estimator. However, the replay method and Switch-DR cannot be used as objectives because they are non-differentiable.
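To illustrate the idea, here is a hedged sketch of what such a tensor-valued estimator could look like for IPW; the method name estimate_policy_value_tensor is from this PR, but the signature below is an assumption, not the PR's exact code:

```python
import torch

def estimate_policy_value_tensor(
    reward: torch.Tensor,             # observed rewards, shape (n_rounds,)
    pscore: torch.Tensor,             # behavior-policy propensities, shape (n_rounds,)
    action_match_prob: torch.Tensor,  # evaluation-policy probs of the logged actions
) -> torch.Tensor:
    # inverse-propensity-weighted value estimate; staying in torch (no .item()
    # or numpy round-trips) keeps the estimate differentiable, so it can serve
    # directly as the training objective of the policy network
    importance_weight = action_match_prob / pscore
    return (importance_weight * reward).mean()
```

This also explains the exclusion above: the replay method and Switch-DR involve hard indicator or threshold functions, so gradients cannot flow through their estimates.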
An example script examples/opl/evaluate_off_policy_learners.py and a notebook opl.ipynb are also created. torch >= 1.7.1 is added to the dependencies.
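A minimal usage sketch, assuming the API described in this PR; the exact constructor arguments (in particular off_policy_objective) may differ from the merged code:

```python
from obp.dataset import SyntheticBanditDataset
from obp.ope import InverseProbabilityWeighting
from obp.policy import NNPolicyLearner

# synthetic logged bandit feedback for illustration
dataset = SyntheticBanditDataset(n_actions=10, dim_context=5, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=1000)

# the learner trains a neural network by maximizing the policy value
# estimated by the given OPE estimator (here, IPW)
nn_policy = NNPolicyLearner(
    n_actions=dataset.n_actions,
    dim_context=5,
    off_policy_objective=InverseProbabilityWeighting().estimate_policy_value_tensor,
)
nn_policy.fit(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    pscore=bandit_feedback["pscore"],
)
action_dist = nn_policy.predict(context=bandit_feedback["context"])
```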