Bandit Environments

A series of n-armed bandit environments for OpenAI Gym.

Environments

  • BanditTwoArmedDeterministicFixed-v0: Simplest case, where one arm always pays and the other never does
  • BanditTwoArmedHighLowFixed-v0: Stochastic version with a large difference in payout probability between the two arms
  • BanditTwoArmedHighHighFixed-v0: Stochastic version with a small difference in payout probability between the two arms, where both are good
  • BanditTwoArmedLowLowFixed-v0: Stochastic version with a small difference in payout probability between the two arms, where both are bad
  • BanditTwoArmedUniform-v0: Stochastic version in which both arms pay between 0 and 1
  • BanditTenArmedRandomFixed-v0: 10-armed bandit with random probabilities assigned to payouts
  • BanditTenArmedRandomRandom-v0: 10-armed bandit with random values assigned to both payout probabilities and rewards
  • BanditTenArmedUniformDistributedReward-v0: 10-armed bandit that always pays out, with the reward selected from a uniform distribution
  • BanditTenArmedGaussian-v0: 10-armed bandit described on page 30 of Reinforcement Learning: An Introduction (Sutton and Barto)
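
The descriptions above separate two questions: whether an arm pays out, and how much it pays. As a rough illustration only (this is not the package's own code), a single pull can be modeled as a Bernoulli draw on a payout probability followed by a fixed reward. The names `pull`, `p_dist`, and `r_dist`, and the 0.8/0.2 probabilities, are hypothetical and chosen just to make the sketch concrete.

```python
import random


def pull(arm, p_dist, r_dist, rng=random):
    """Return r_dist[arm] with probability p_dist[arm], otherwise 0."""
    return r_dist[arm] if rng.random() < p_dist[arm] else 0


# Deterministic two-armed case: one arm always pays, the other never does.
det_p = [1.0, 0.0]
det_r = [1, 1]
rewards_arm0 = [pull(0, det_p, det_r) for _ in range(100)]
rewards_arm1 = [pull(1, det_p, det_r) for _ in range(100)]

# Stochastic two-armed case with a large gap (illustrative probabilities).
rng = random.Random(0)  # seeded so repeated runs give the same draws
stoch_p = [0.8, 0.2]
stoch_r = [1, 1]
n_pulls = 10_000
mean_arm0 = sum(pull(0, stoch_p, stoch_r, rng) for _ in range(n_pulls)) / n_pulls
mean_arm1 = sum(pull(1, stoch_p, stoch_r, rng) for _ in range(n_pulls)) / n_pulls
```

Under this model, the "Fixed" variants would correspond to hand-picked probabilities and the "Random" variants to probabilities (or rewards) drawn at construction time, though the exact values the package uses are not stated here.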

Installation

git clone git@github.com:mimoralea/gym-bandits.git
cd gym-bandits
pip install .

or:

pip install git+https://github.com/mimoralea/gym-bandits#egg=gym-bandits

In your gym environment

import gym
import gym_bandits  # importing the package registers the bandit environments with gym
env = gym.make("BanditTenArmedGaussian-v0")
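
Once an environment has been created, an agent interacts with it by choosing an arm and observing the reward. The following is a minimal epsilon-greedy sketch, not part of this package. To stay runnable without the package installed, it uses a hypothetical `StandInBandit` class that assumes the classic Gym interface, in which `step` returns four values `(observation, reward, done, info)`. The payout probabilities and the names `StandInBandit` and `epsilon_greedy` are assumptions made for illustration.

```python
import random


class StandInBandit:
    """Hypothetical stand-in with a classic Gym-style reset/step interface."""

    def __init__(self, p_dist, seed=0):
        self.p_dist = p_dist
        self.rng = random.Random(seed)

    def reset(self):
        return 0  # a bandit has a single dummy state

    def step(self, action):
        reward = 1 if self.rng.random() < self.p_dist[action] else 0
        # Assumption: each pull is treated as a one-step episode (done=True).
        return 0, reward, True, {}


def epsilon_greedy(env, n_arms, n_pulls=2000, epsilon=0.1, seed=0):
    """Estimate each arm's value while mostly exploiting the best estimate."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for _ in range(n_pulls):
        env.reset()
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            action = max(range(n_arms), key=lambda a: values[a])  # exploit
        _, reward, _, _ = env.step(action)
        counts[action] += 1
        # Incremental update of the sample mean for the chosen arm.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = epsilon_greedy(StandInBandit([0.8, 0.2]), n_arms=2)
best_arm = max(range(2), key=lambda a: values[a])
```

With the package installed, an environment returned by `gym.make` could presumably be passed in place of the stand-in, provided the installed Gym version uses the four-value `step` return assumed here.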