Feature: Implementing Continuous OPE Estimators #113

Merged: 15 commits, Aug 14, 2021

usaito
Contributor

@usaito usaito commented Jul 6, 2021

new feature

estimators_continuous.py

reference
Nathan Kallus and Angela Zhou.
"Policy Evaluation and Optimization with Continuous Treatments", AISTATS, 2018.

meta_continuous.py

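  • implement ContinuousOffPolicyEvaluation, which streamlines OPE with continuous actions. It works as follows.
# Import paths are assumed from the obp package layout; adjust them if the modules differ.
from obp.dataset import SyntheticContinuousBanditDataset
from obp.ope import ContinuousOffPolicyEvaluation
from obp.ope import KernelizedInverseProbabilityWeighting as KernelizedIPW

# (1) Synthetic Data Generation
dataset = SyntheticContinuousBanditDataset(dim_context=5)
bandit_feedback = dataset.obtain_batch_bandit_feedback(
    n_rounds=10000, min_action_value=-10, max_action_value=10,
)

# (2) Off-Policy Evaluation
# action_by_evaluation_policy holds the continuous actions chosen by the
# evaluation policy for each round; it must be computed beforehand.
ope = ContinuousOffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[KernelizedIPW(kernel="epanechnikov", bandwidth=0.02)],
)
estimated_policy_value = ope.estimate_policy_values(
    action_by_evaluation_policy=action_by_evaluation_policy,
)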
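
For readers unfamiliar with the estimator named in the snippet above, here is a minimal NumPy sketch of kernelized inverse probability weighting in the spirit of Kallus and Zhou (2018). It is not the code added in this PR: the function names, argument names, and toy data are assumptions made for illustration only.

```python
import numpy as np


def epanechnikov_kernel(u: np.ndarray) -> np.ndarray:
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)


def kernelized_ipw(
    reward: np.ndarray,
    action_by_behavior_policy: np.ndarray,
    pscore: np.ndarray,
    action_by_evaluation_policy: np.ndarray,
    bandwidth: float,
) -> float:
    """Hypothetical helper: average of K((pi(x_i) - a_i) / h) * r_i / (h * q_i)."""
    # Scaled distance between the evaluation-policy action and the logged action.
    u = (action_by_evaluation_policy - action_by_behavior_policy) / bandwidth
    # The kernel replaces the indicator used for discrete actions;
    # pscore is the behavior policy's density at the logged action.
    weights = epanechnikov_kernel(u) / (bandwidth * pscore)
    return float(np.mean(weights * reward))


# Toy check (assumed setup): behavior policy is uniform on [-1, 1],
# so its density is 0.5 everywhere, and every reward equals 1.
rng = np.random.default_rng(0)
n_rounds = 200_000
logged_actions = rng.uniform(-1.0, 1.0, size=n_rounds)
toy_estimate = kernelized_ipw(
    reward=np.ones(n_rounds),
    action_by_behavior_policy=logged_actions,
    pscore=np.full(n_rounds, 0.5),
    action_by_evaluation_policy=np.zeros(n_rounds),
    bandwidth=0.2,
)
```

With a constant reward of 1, the estimate should be close to 1 whenever the kernel window fits inside the support of the behavior policy. A smaller bandwidth reduces bias but increases variance, since fewer logged actions fall inside the window.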

tests

1 and 2 include some performance tests of the continuous OPE estimators using synthetic data as well as input checks.
4 checks whether the kernel functions satisfy the conditions described on page 9 of the following lecture slides: http://ibis.t.u-tokyo.ac.jp/suzuki/lecture/2015/dataanalysis/L9.pdf
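
A sketch of what such a kernel test could look like, assuming the conditions in question are the usual ones for a kernel function (non-negative, integrates to one, zero mean, finite positive second moment). Whether these match the cited slide exactly is an assumption, and the kernel set and helper name below are illustrative rather than taken from the PR.

```python
import numpy as np

# Illustrative kernel definitions; only "epanechnikov" is named in this PR.
KERNELS = {
    "gaussian": lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi),
    "epanechnikov": lambda u: 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0),
    "triangular": lambda u: (1.0 - np.abs(u)) * (np.abs(u) <= 1.0),
    "cosine": lambda u: (np.pi / 4.0) * np.cos(np.pi * u / 2.0) * (np.abs(u) <= 1.0),
}


def kernel_moments(kernel, limit: float = 10.0, n_points: int = 200_001):
    """Hypothetical helper: Riemann-sum approximations of the kernel's
    total mass, first moment, and second moment on [-limit, limit]."""
    u = np.linspace(-limit, limit, n_points)
    du = u[1] - u[0]
    k = kernel(u)
    mass = float(np.sum(k) * du)
    first_moment = float(np.sum(u * k) * du)
    second_moment = float(np.sum(u ** 2 * k) * du)
    return mass, first_moment, second_moment, bool(np.all(k >= 0.0))


moments = {name: kernel_moments(fn) for name, fn in KERNELS.items()}
```

Each entry of `moments` can then be compared against the expected values with a small numerical tolerance, for example in a parametrized test.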

  • fix descriptions (mostly English expressions) of docstrings

@usaito usaito changed the base branch from master to continuous-dataset July 6, 2021 14:12
@usaito usaito changed the title [WIP] feature: Continuous OPE Estimators Feature: Continuous OPE Estimators Jul 7, 2021
@usaito usaito changed the title Feature: Continuous OPE Estimators Feature: Implementing Continuous OPE Estimators Jul 8, 2021
@nomuramasahir0
Contributor

nomuramasahir0 commented Aug 14, 2021

I'm just letting you know:
IDE (PyCharm) generates a non-default argument error, though I do not find any specific runtime error.
[Screenshot: Screen Shot 2021-08-14 at 13 28 24]

If you would like to avoid this error, it might be better to inherit from BaseContinuousOffPolicyEstimator rather than KernelizedInverseProbabilityWeighting.
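
One possible explanation for a warning without a runtime error, sketched with hypothetical classes: if the estimators are dataclasses (an assumption here, as is the field layout), a subclass that re-declares inherited fields keeps their original order, so Python accepts it even though a static inspection may flag a non-default field after a defaulted one. Adding a genuinely new required field does fail at class creation.

```python
from dataclasses import dataclass, fields


@dataclass
class ParentEstimator:
    # Hypothetical stand-in for a parent estimator with one defaulted field.
    kernel: str
    bandwidth: float
    estimator_name: str = "parent"


@dataclass
class ChildEstimator(ParentEstimator):
    # Re-declared fields keep the positions they had in the parent,
    # so no non-default field ends up after a defaulted one.
    kernel: str
    bandwidth: float
    estimator_name: str = "child"


child = ChildEstimator(kernel="epanechnikov", bandwidth=0.02)
field_order = [f.name for f in fields(ChildEstimator)]


def define_broken_child():
    """Defining a new required field after the inherited default raises TypeError."""

    @dataclass
    class BrokenChild(ParentEstimator):
        new_required_field: int  # appended after estimator_name, which has a default

    return BrokenChild


try:
    define_broken_child()
    raised_type_error = False
except TypeError:
    raised_type_error = True
```

Inheriting from a base class that declares no defaulted fields, as suggested above, sidesteps the question entirely because the subclass then controls the full field order.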

@nomuramasahir0
Contributor

Other than the above one point, LGTM!

@usaito
Contributor Author

usaito commented Aug 14, 2021

@nmasahiro Thanks!

@usaito usaito merged commit 1c8233a into continuous-dataset Aug 14, 2021