
Releases: joshuaspear/offline_rl_ope

v7.0.0

26 Jul 12:51
  • Altered the ISEstimator and OPEEstimatorBase APIs to depend on EmpiricalMeanDenomBase and WeightDenomBase
    • EmpiricalMeanDenomBase and WeightDenomBase separately define functions over the dataset-level value and weights, and over the individual trajectory weights, respectively. This allows a far greater number of estimators to be implemented flexibly
  • Added api/StandardEstimators for IS and DR to allow for 'plug-and-play' analysis
  • Altered the discrete torch propensity model to use a softmax output. This requires modelling both classes for binary classification but improves the generalisability of the code
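The split between the two denominator abstractions can be illustrated with a minimal sketch. The class and function bodies below are simplified stand-ins written for illustration, not the library's actual signatures:

```python
from abc import ABC, abstractmethod

import numpy as np


class EmpiricalMeanDenomBase(ABC):
    """Denominator used when averaging weighted returns over the dataset."""
    @abstractmethod
    def __call__(self, weights: np.ndarray) -> float: ...


class WeightDenomBase(ABC):
    """Denominator applied to each individual trajectory's weights."""
    @abstractmethod
    def __call__(self, weights: np.ndarray) -> np.ndarray: ...


class CountDenom(EmpiricalMeanDenomBase):
    # Vanilla IS: divide by the number of trajectories.
    def __call__(self, weights: np.ndarray) -> float:
        return float(weights.shape[0])


class SumWeightDenom(EmpiricalMeanDenomBase):
    # Weighted (self-normalised) IS: divide by the total weight.
    def __call__(self, weights: np.ndarray) -> float:
        return float(weights.sum())


def is_estimate(returns: np.ndarray, weights: np.ndarray,
                denom: EmpiricalMeanDenomBase) -> float:
    # returns, weights: per-trajectory arrays of shape (n_trajectories,)
    return float((weights * returns).sum() / denom(weights))


rets = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, 1.0, 2.5])
vanilla = is_estimate(rets, w, CountDenom())       # 10.0 / 3
weighted = is_estimate(rets, w, SumWeightDenom())  # 10.0 / 4
```

Swapping the denominator object turns vanilla IS into weighted IS without touching the estimator itself, which is the kind of flexibility the release note describes.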

v6.0.0

17 Jul 16:31
  • Updated the PropensityModels structure for sklearn and added a helper class for compatibility with torch
  • Full runtime typechecking with jaxtyping
  • Fixed bug with IS methods where the average was being taken twice
  • Significantly simplified API, especially integrating Policy classes with propensity models
  • Generalised d3rlpy API to allow for wrapping continuous policies with D3RlPyTorchAlgoPredict
  • Added explicit stochastic policies for d3rlpy
  • Introduced 'policy_func' which is any function/method which outputs type Union[TorchPolicyReturn, NumpyPolicyReturn]
  • Simplified and unified ISCallback in d3rlpy/api using PolicyFactory
  • Added 'premade' doubly robust estimators for vanilla DR, weighted DR, per-decision DR and weighted per-decision DR
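In this sense, a 'policy_func' is simply a callable returning the probabilities a policy assigns to the logged actions. A minimal sketch, using a plain numpy array as a stand-in for NumpyPolicyReturn (the lookup-table helper is invented for illustration; a real policy_func would query a trained model):

```python
import numpy as np


def make_policy_func(prob_table: dict):
    # prob_table maps (state, action) -> probability; hypothetical
    # stand-in for a trained propensity/evaluation model.
    def policy_func(states: np.ndarray, actions: np.ndarray) -> np.ndarray:
        # Return the probability assigned to each logged (state, action).
        return np.array([prob_table[(s, a)] for s, a in zip(states, actions)])
    return policy_func


pf = make_policy_func({(0, 1): 0.9, (1, 0): 0.2})
probs = pf(np.array([0, 1]), np.array([1, 0]))  # array([0.9, 0.2])
```

Because the estimators only need this callable interface, any function or method with the right return type can be plugged in, regardless of the underlying framework.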

v5.0.0

01 Mar 10:46
10308ac
  • Correctly implemented per-decision weighted importance sampling
  • Expanded the different types of weights that can be implemented based on:
    • http://proceedings.mlr.press/v48/jiang16.pdf: Per-decision weights are defined as the average weight at a given timepoint. This results in a different denominator for different timepoints and is implemented with WISWeightNorm(avg_denom=True)
    • https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1079&context=cs_faculty_pubs: Per-decision weights are defined as the sum of discounted weights across all timesteps. This is implemented with WISWeightNorm(discount=discount_value)
    • Combinations of different weights can easily be implemented, for example 'average discounted weights' via WISWeightNorm(discount=discount_value, avg_denom=True); however, these do not necessarily have backing in the literature
  • The EffectiveSampleSize metric optionally returns NaN if all weights are 0
  • Bug fixes:
    • Fixed a bug when running on CUDA where tensors were not being pushed to the CPU
    • Improved static typing

v4.0.0

23 Feb 11:46
cde25c1
  • Various bug fixes (see release log in README.md)
  • Predefined propensity models including:
    • Generic feedforward MLP for continuous and discrete action spaces built in PyTorch
    • XGBoost for continuous and discrete action spaces built in sklearn
    • Both the PyTorch and sklearn models can handle sparse discrete action spaces, i.e., a propensity model can be exposed to 'new' actions provided the full action space definition is supplied when the propensity model is trained
  • Metrics pattern with:
    • Effective sample size calculation
    • Proportion of valid weights i.e., the mean proportion of weights between a min and max value across trajectories
  • Refactored the BehavPolicy class to accept a 'policy_func' that aligns with the other policy classes
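The two metrics can be sketched as follows. The effective sample size formula is the standard Kish estimate; the function names and the nan_if_all_zero flag are illustrative stand-ins, not the library's API:

```python
import numpy as np


def effective_sample_size(weights: np.ndarray,
                          nan_if_all_zero: bool = True) -> float:
    # Kish effective sample size: (sum w)^2 / sum(w^2).
    # Optionally returns NaN when every weight is zero, matching the
    # behaviour added to the EffectiveSampleSize metric in v5.0.0.
    if nan_if_all_zero and not np.any(weights):
        return float("nan")
    return float(weights.sum() ** 2 / (weights ** 2).sum())


def prop_valid_weights(weights: np.ndarray,
                       w_min: float, w_max: float) -> float:
    # Proportion of weights falling inside [w_min, w_max].
    return float(((weights >= w_min) & (weights <= w_max)).mean())


w = np.array([0.5, 1.0, 1.5, 4.0])
ess = effective_sample_size(w)            # 49 / 19.5, roughly 2.51
frac = prop_valid_weights(w, 0.0, 2.0)    # 3 of 4 weights in range
```

A low effective sample size or a small proportion of in-range weights signals that the importance-sampling estimate is dominated by a few trajectories and should be treated with caution.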