Off-Policy Learning with Deficient Support

This repository contains several off-policy learning algorithms for the setting of deficient support, i.e., when the logging policy assigns zero probability to some actions. The code accompanies the paper "Off-policy Bandits with Deficient Support" [ACM] [arXiv], in which we first characterize how deficient support affects existing algorithms and then propose different estimators that address the problem.
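To make the problem concrete, here is a toy calculation showing why deficient support matters for IPS. The numbers are hypothetical and the sketch is written with rewards rather than losses. Because actions with zero logging probability never appear in the data, the expected IPS estimate equals the true value minus the sum, over unsupported actions, of pi(a|x) r(x, a). This is an illustrative sketch, not code from this repository.

```python
import random

# Hypothetical single-context bandit with 4 actions (illustrative numbers only).
REWARDS = [1.0, 0.0, 0.5, 1.0]     # r(x, a): expected reward of each action
LOGGING = [0.5, 0.5, 0.0, 0.0]     # pi_0(a|x): actions 2 and 3 are never logged
TARGET = [0.25, 0.25, 0.25, 0.25]  # pi(a|x): the policy we want to evaluate


def true_value(policy, rewards):
    """V(pi) = sum_a pi(a|x) r(x, a)."""
    return sum(p * r for p, r in zip(policy, rewards))


def expected_ips(target, logging, rewards):
    """E_{a ~ pi_0}[r(x, a) pi(a|x) / pi_0(a|x)]; actions with pi_0 = 0 contribute nothing."""
    return sum(q * r * (p / q) for p, q, r in zip(target, logging, rewards) if q > 0)


def unsupported_term(target, logging, rewards):
    """Sum over actions with pi_0(a|x) = 0 of pi(a|x) r(x, a): the part IPS misses."""
    return sum(p * r for p, q, r in zip(target, logging, rewards) if q == 0)


def sampled_ips(target, logging, rewards, n, seed=0):
    """Monte Carlo IPS estimate from n actions drawn from the logging policy."""
    rng = random.Random(seed)
    actions = rng.choices(range(len(logging)), weights=logging, k=n)
    return sum(rewards[a] * target[a] / logging[a] for a in actions) / n


if __name__ == "__main__":
    print(true_value(TARGET, REWARDS))                 # 0.625
    print(expected_ips(TARGET, LOGGING, REWARDS))      # 0.25
    print(unsupported_term(TARGET, LOGGING, REWARDS))  # 0.375
    print(sampled_ips(TARGET, LOGGING, REWARDS, 100_000))  # close to 0.25, not 0.625
```

The gap between the true value (0.625) and what IPS converges to (0.25) is exactly the unsupported term (0.375), and collecting more logged data does not shrink it.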

If you find any module of this repository helpful for your own research, please consider citing the KDD '20 paper below. Thanks!

  author = {Noveen Sachdeva and Yi Su and Thorsten Joachims},
  title = {Off-policy Bandits with Deficient Support},
  booktitle = {ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)},
  year = {2020}

Code Author: Noveen Sachdeva (

Requirements

  • Python3
  • PyTorch >= 0.4.0
  • tensorboardX

Data Setup

Data pre-processing is controlled by six hyper-parameters, which you may need to edit in the file.

$ ./

The above command will:

  • Download the CIFAR-10 dataset
  • Train a logging policy (Default: Train ResNet20 on 35K out of 50K images for 2 epochs)
  • Create bandit feedback data with train/test/val splits (Default: softmax with temperature t=4 over the logging policy's outputs; actions with probability < 0.01 are clipped)
  • Train a regression function (Default: ResNet20 until convergence on the bandit feedback dataset using all features)
  • Create an auxiliary data file (used only when running the efficient approximation of the conservative_extrapolation method)
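The bandit-feedback step above can be sketched as follows for a single image: apply a temperature softmax to the logging model's logits, zero out actions below the clipping threshold (this is what creates deficient support), sample an action, and log its propensity. This is a hypothetical, dependency-free sketch. Renormalizing after clipping, the 0/1 reward for predicting the correct label, and all function names are assumptions for illustration, not the repository's implementation.

```python
import math
import random


def temperature_softmax(logits, t=4.0):
    """Softmax of logits / t; a larger temperature t flattens the distribution."""
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def clip_and_renormalize(probs, min_prob=0.01):
    """Give zero probability to actions below min_prob, then renormalize (assumed behavior).

    The zeroed actions are exactly the ones outside the logging policy's support.
    """
    kept = [p if p >= min_prob else 0.0 for p in probs]
    total = sum(kept)
    if total == 0:
        raise ValueError("min_prob removed every action")
    return [p / total for p in kept]


def make_bandit_sample(logits, label, rng, t=4.0, min_prob=0.01):
    """Convert one supervised example into a logged bandit record (hypothetical format)."""
    probs = clip_and_renormalize(temperature_softmax(logits, t), min_prob)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    reward = 1.0 if action == label else 0.0  # assumed 0/1 reward for a correct prediction
    return {"action": action, "propensity": probs[action], "reward": reward, "probs": probs}


if __name__ == "__main__":
    rng = random.Random(0)
    record = make_bandit_sample([10.0, 0.0, -10.0], label=0, rng=rng)
    # The third action (softmax probability of about 0.006 at t=4) is clipped to 0.0.
    print(record["probs"])
```

Storing the propensity alongside each record is what later allows importance-weighted estimators to be computed from the logged data alone.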

Run Instructions

  • Edit the file that lists all config parameters, including which off-policy learning algorithm to run. Currently supported models:

    • IPS
    • IPS w/ prune_unsupported = True
    • IPS w/ imputed = negative
    • RegressionExtrapolation
    • PolicyRestriction
  • Finally, run the following command:

$ cd code;
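The model list above can be read against the following minimal sketch of three ideas: plain IPS, IPS augmented with a reward model for actions outside the logging policy's support (the idea suggested by the name RegressionExtrapolation), and restricting a policy to the logging policy's support (suggested by PolicyRestriction). The function names, the sample layout, and the renormalization used for restriction are assumptions for illustration; the repository's actual estimators and training objectives may differ.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical sample layout: (context, logged action, pi_0(action|context), observed reward).
Sample = Tuple[object, int, float, float]
# A policy maps a context to a list of action probabilities.
Policy = Callable[[object], Sequence[float]]
RewardModel = Callable[[object, int], float]


def ips(samples: List[Sample], target: Policy) -> float:
    """Plain IPS: mean of r * pi(a|x) / pi_0(a|x) over the logged samples."""
    return sum(r * target(x)[a] / p0 for x, a, p0, r in samples) / len(samples)


def reward_extrapolation(samples: List[Sample], target: Policy, logging: Policy,
                         reward_hat: RewardModel) -> float:
    """IPS plus a model-based estimate of the reward on unsupported actions."""
    correction = 0.0
    for x, _, _, _ in samples:
        pi, pi0 = target(x), logging(x)
        # Add pi(a|x) * r_hat(x, a) for every action the logging policy never takes in x.
        correction += sum(pi[a] * reward_hat(x, a) for a in range(len(pi)) if pi0[a] == 0)
    return ips(samples, target) + correction / len(samples)


def restrict_policy(target: Policy, logging: Policy) -> Policy:
    """Return a policy that renormalizes `target` onto the support of `logging`."""
    def restricted(x: object) -> List[float]:
        pi, pi0 = target(x), logging(x)
        masked = [p if q > 0 else 0.0 for p, q in zip(pi, pi0)]
        total = sum(masked)
        if total == 0:
            # The target puts no mass on supported actions: fall back to uniform over the support.
            masked = [1.0 if q > 0 else 0.0 for q in pi0]
            total = sum(masked)
        return [p / total for p in masked]
    return restricted


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.5, 1.0]
    logging_policy = lambda x: [0.5, 0.5, 0.0, 0.0]
    target_policy = lambda x: [0.25, 0.25, 0.25, 0.25]
    # Two logged samples from a single context, one per supported action.
    data = [(0, 0, 0.5, rewards[0]), (0, 1, 0.5, rewards[1])]
    print(ips(data, target_policy))  # 0.25
    print(reward_extrapolation(data, target_policy, logging_policy, lambda x, a: rewards[a]))  # 0.625
    print(ips(data, restrict_policy(target_policy, logging_policy)))  # 0.5
```

With reward_hat equal to the true rewards, the extrapolated estimate recovers the target policy's value (0.625) in this toy example, and plain IPS is unbiased for the restricted policy because that policy places no mass outside the support. Passing a constant pessimistic reward_hat instead gives a conservative, imputation-style estimate.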


If you have a new approach to off-policy learning under deficient support, please feel free to send a pull request with your algorithm, and I'll be happy to merge it into this repository.