nick-jhlee/logistic_bandit

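Code for the following two papers by Junghyun Lee, Se-Young Yun, and Kwang-Sung Jun: "A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits" (arXiv:2407.13977) and "Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion" (AISTATS 2024).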

This is forked from https://github.com/criteo-research/logistic_bandit.

If you use this repository or cite our papers, please use the following BibTeX entries:

@article{lee2024glm,
	title={{A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits}},
	author={Lee, Junghyun and Yun, Se-Young and Jun, Kwang-Sung},
	journal={arXiv preprint arXiv:2407.13977},
	url={https://arxiv.org/abs/2407.13977},
	year={2024}
}

@InProceedings{lee2024logistic,
  title = 	 {{Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion}},
  author =       {Lee, Junghyun and Yun, Se-Young and Jun, Kwang-Sung},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {4474--4482},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://arxiv.org/pdf/2310.18554.pdf},
  url = 	 {https://arxiv.org/abs/2310.18554},
}

Install

Clone the repository and run:

$ pip install .

After changing the source code, run pip install . again to update the installed package.

Usage

This code implements the following bandit algorithms (oldest to newest):

(*) only applicable to linear bandits
(**) only applicable to logistic bandits
(***) only applicable to bounded GLBs

Experiments can be run in several logistic bandit (i.e., structured Bernoulli feedback) environments, such as static or time-varying finite arm sets, or infinite arm sets (e.g., the unit ball). Note that the Thompson Sampling (TS) variant is only available for GLOC, OL2M, GLM-UCB, and ada-OFU-ECOLog-TS. For the first three algorithms, TS is used automatically when the arm set is the unit ball.
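To make the setting concrete, here is a minimal sketch of a logistic bandit environment with a static finite arm set, using only the Python standard library. It is not the repository's implementation: the names sigmoid, LogisticBanditEnv, pull, and regret are hypothetical and chosen for illustration, and the uniformly random policy is just a baseline.

```python
import math
import random


def sigmoid(z):
    """Logistic link mu(z) = 1 / (1 + exp(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class LogisticBanditEnv:
    """Hypothetical environment: Bernoulli rewards with mean sigmoid(<arm, theta_star>)."""

    def __init__(self, theta_star, arms, seed=0):
        self.theta_star = theta_star
        self.arms = arms
        self.rng = random.Random(seed)
        # Best achievable expected reward over the arm set.
        self.best_mean = max(self.mean(a) for a in arms)

    def mean(self, arm):
        return sigmoid(dot(arm, self.theta_star))

    def pull(self, arm):
        """Sample a reward in {0, 1} for the chosen arm."""
        return 1 if self.rng.random() < self.mean(arm) else 0

    def regret(self, arm):
        """Instantaneous (pseudo-)regret of playing `arm`."""
        return self.best_mean - self.mean(arm)


env = LogisticBanditEnv(
    theta_star=[3.0, 0.0],
    arms=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
)

# Uniformly random play: a baseline whose regret grows linearly in the horizon.
policy_rng = random.Random(1)
cumulative_regret = 0.0
for t in range(1000):
    arm = policy_rng.choice(env.arms)
    reward = env.pull(arm)  # the feedback a learning algorithm would observe
    cumulative_regret += env.regret(arm)
print(round(cumulative_regret, 2))
```

A learning algorithm would replace the random choice with a rule based on the observed (arm, reward) pairs; the regret curve is the running sum accumulated in the loop.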

Single experiment

Single experiments (one algorithm in one environment) can be run via scripts/run_example.py. The script instantiates the algorithm and environment specified in scripts/configs/example_config.py and plots the regret.

Reproducing the experiments

The results in the paper can be obtained via scripts/run_all.py. This script runs experiments for every config file in scripts/configs/generated_configs/ and stores the results in scripts/logs/.

Plot results

Regret curves

You can use scripts/plot_regret.py to plot the regret curve for a specific parameter norm S. This script plots regret curves for all logs in scripts/logs/ that match the indicated dimension and parameter norm.

usage: plot_regret.py [-h] [-d [D]] [-hz [HZ]] [-ast [AST]] [-pn [PN]]

Plot regret curves

optional arguments:
  -h, --help  show this help message and exit
  -d [D]      Dimension (default: 2)
  -hz [HZ]    Horizon length (default: 4000)
  -ast [AST]  Dimension (default: tv_discrete)
  -pn [PN]    Parameter norm (default: 9.0)
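For readers who want to see how these flags parse, the following is a rough argparse sketch that reproduces the interface above. It is not the script's actual source; the argument types are assumptions. The help text for -ast is written here as "Arm set type" (matching generate_configs.py), which is also an assumption, since the printed help labels it "Dimension".

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the plot_regret.py usage shown above."""
    parser = argparse.ArgumentParser(
        prog="plot_regret.py",
        description="Plot regret curves",
        # Appends "(default: ...)" to each help string.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # nargs="?" makes the value optional, which renders as "[-d [D]]" in the usage line.
    parser.add_argument("-d", nargs="?", type=int, default=2, help="Dimension")
    parser.add_argument("-hz", nargs="?", type=int, default=4000, help="Horizon length")
    parser.add_argument("-ast", nargs="?", type=str, default="tv_discrete", help="Arm set type")
    parser.add_argument("-pn", nargs="?", type=float, default=9.0, help="Parameter norm")
    return parser


# Equivalent to running: python plot_regret.py -d 3 -pn 5
args = build_parser().parse_args(["-d", "3", "-pn", "5"])
print(args.d, args.hz, args.ast, args.pn)
```

Flags that are omitted fall back to the defaults listed in the help text, so only the values that differ from the defaults need to be passed.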

Confidence sets

You can use scripts/plot_confidence.py to plot confidence sets. This script plots confidence sets for all logs in scripts/S=*.

usage: plot_confidence.py [-h] [-ast [AST]] [-pn [PN]] [-Nconfidence [N]]

Plot confidence sets for all algorithms

optional arguments:
  -h, --help          show this help message and exit
  -ast [AST]          Dimension (default: tv_discrete)
  -pn [PN]            Parameter norm (default: 9.0)
  -Nconfidence [N]    Number of discretizations (per axis) for confidence set plot (default: 2000)
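The role of -Nconfidence can be illustrated with a small sketch: in dimension 2, a confidence set can be drawn by discretizing each axis into N points and testing every grid point for membership. The example below assumes a generic set of the form "logistic loss within beta of the best grid point, intersected with the ball of radius S". The functions log_loss and confidence_grid, the threshold beta, and the toy data are all hypothetical; the confidence sets actually plotted by the script are defined in the papers and may differ.

```python
import math


def log_loss(theta, data):
    """Negative log-likelihood of logistic regression on (arm, reward) pairs."""
    total = 0.0
    for x, r in data:
        z = x[0] * theta[0] + x[1] * theta[1]
        # log(1 + exp(z)) computed stably, minus r * z.
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - r * z
    return total


def confidence_grid(data, S, beta, n):
    """Grid points theta with ||theta|| <= S whose loss is within beta of the grid minimum."""
    # n points per axis, evenly spaced on [-S, S].
    axis = [-S + 2.0 * S * i / (n - 1) for i in range(n)]
    feasible = [(a, b) for a in axis for b in axis if math.hypot(a, b) <= S]
    losses = {theta: log_loss(theta, data) for theta in feasible}
    best = min(losses.values())
    return [theta for theta, loss in losses.items() if loss - best <= beta]


# Toy (arm, reward) pairs; the values are illustrative only.
data = [((1.0, 0.0), 1), ((1.0, 0.0), 1), ((0.0, 1.0), 0), ((0.0, 1.0), 1)]
small = confidence_grid(data, S=3.0, beta=0.5, n=41)
large = confidence_grid(data, S=3.0, beta=2.0, n=41)
print(len(small), len(large))
```

The cost grows quadratically in the number of points per axis, which is the trade-off behind the default of 2000: a finer grid gives a smoother boundary but takes longer to evaluate.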

Plotting everything together

You can use scripts/plot_total.py to plot regret curves and confidence sets together (Figure 1 of Lee et al. 2024b).

usage: plot_total.py [-h] [-d [D]] [-hz [HZ]] [-ast [AST]] [-Nconfidence [N]]

Plot regret curves

optional arguments:
  -h, --help  show this help message and exit
  -d [D]      Dimension (default: 2)
  -hz [HZ]    Horizon length (default: 4000)
  -ast [AST]  Dimension (default: tv_discrete)
  -Nconfidence [N]    Number of discretizations (per axis) for confidence set plot (default: 2000)

Generating configs

You can automatically generate config files with scripts/generate_configs.py.

usage: generate_configs.py [-h] [-dims DIMS [DIMS ...]] [-pn PN [PN ...]] [-algos ALGOS [ALGOS ...]] [-r [R]] [-hz [HZ]] [-ast [AST]] [-ass [ASS]] [-fl [FL]]

Automatically creates configs, stored in configs/generated_configs/

optional arguments:
  -h, --help            show this help message and exit
  -dims DIMS [DIMS ...]
                        Dimension (default: None)
  -pn PN [PN ...]       Parameter norm (||theta_star||) (default: None)
  -algos ALGOS [ALGOS ...]
                        List of algorithms. (default: None)
                        ('EMK', 'RS-GLinCB', 'OFUGLB-e', 'OFUGLB', 'OFULogPlus', 'adaECOLog', 'OFULog-r', 'LogUCB1', 'OL2M', 'GLOC', 'GLM-UCB')
  -r [R]                # of independent runs (default: 10)
  -hz [HZ]              Horizon, normalized (later multiplied by sqrt(dim)) (default: 2000)
  -ast [AST]            Arm set type. Must be either fixed_discrete, tv_discrete or ball (default: tv_discrete)
  -ass [ASS]            Arm set size, normalized (later multiplied by dim) (default: 10)
  -fl [FL]              Failure level, must be in (0,1) (default: 0.05)
  -plotconfidence [N]   To plot the confidence set at the end or not (default: False)
  -Nconfidence [N]      Number of discretizations (per axis) for confidence set plot

For instance, running python generate_configs.py -dims 2 -pn 3 4 5 -algos GLM-UCB GLOC OL2M adaECOLog generates configs in dimension 2 for GLM-UCB, GLOC, OL2M, and adaECOLog, for environments (with all other settings left at their defaults) with ground-truth parameter norm 3, 4, and 5.
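As a rough illustration of what this command enumerates, the sketch below builds one config per (dimension, parameter norm, algorithm) combination and applies the normalizations described in the help text. It is not the script's source: the function config_grid and the dictionary keys are hypothetical, and truncating the scaled horizon with int() is an assumption.

```python
import itertools
import math


def config_grid(dims, param_norms, algos, hz=2000, ass=10):
    """Enumerate one hypothetical config dict per (dim, norm, algorithm) combination."""
    configs = []
    for dim, pn, algo in itertools.product(dims, param_norms, algos):
        configs.append({
            "dim": dim,
            "param_norm": pn,
            "algo_name": algo,
            # Horizon is given normalized and scaled by sqrt(dim); int() is an assumption.
            "horizon": int(hz * math.sqrt(dim)),
            # Arm set size is given normalized and scaled by dim.
            "arm_set_size": ass * dim,
        })
    return configs


configs = config_grid([2], [3, 4, 5], ["GLM-UCB", "GLOC", "OL2M", "adaECOLog"])
print(len(configs))  # 1 dimension x 3 norms x 4 algorithms = 12
print(configs[0]["horizon"], configs[0]["arm_set_size"])
```

With the defaults -hz 2000 and -ass 10, dimension 2 gives a horizon of about 2000 * sqrt(2) rounds and 20 arms per arm set.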
