In [4]:
!pip install gym



Reinforcement learning is learning how to map situations to actions so as to maximize a numerical reward signal. Gym is a toolkit for developing and comparing reinforcement learning algorithms.

In [0]:
import gym
import random
import numpy as np

Play one game of blackjack with random actions

In [6]:
env = gym.make("Blackjack-v0")
state = env.reset()
memory = []
for _ in range(10):
  action = env.action_space.sample() 
  state, reward, done, info = env.step(action)
  memory.append((state,action,reward,done))
  if done:
    break

  result = entry_point.load(False)


Calling env.step gives us an observation, reward, a boolean indicating whether the episode has finished

In [7]:
memory

[((12, 4, False), 1, 0, False),
 ((13, 4, False), 1, 0, False),
 ((13, 4, False), 0, -1.0, True)]

**States**


 The observation is a 3-tuple of: 
the players current sum,
the dealer's one showing card (1-10 where 1 is ace),
and whether or not the player holds a usable ace (0 or 1).


**Actions**

In [8]:
env.action_space
# Stay = 0
# Hit = 1


Discrete(2)

**Rewards**

In [9]:
# Win = 1
# Loss = -1

def compute_avg_reward(memory):
  rewards = [r[2] for r in memory]
  return sum(rewards)/len(memory)

compute_avg_reward(memory)

-0.3333333333333333

Now lets play 100 games of random blackjack. We'll keep track of our score.

In [10]:
env = gym.make("Blackjack-v0")
state = env.reset()
memory = []
episodes = 100
for e in range(episodes):
  for _ in range(10):
    action = env.action_space.sample() 
    state, reward, done, info = env.step(action)
    memory.append((state,action,reward,done))
    if done:
      break

  result = entry_point.load(False)


In [11]:
rewards = [r[2] for r in memory]
sum(rewards)

-100.0

Let's try building a simple agent.

In [0]:
class RuleBasedAgent():
  
  def __init__(self,epsilon):
    self.epsilon = epsilon
  
  def act(self,state):
    if state[0]>=17:
      return 0
    else:
      return 1

In [13]:
env = gym.make("Blackjack-v0")

memory = []
agent = RuleBasedAgent(.1)
episodes = 100
for e in range(episodes):
  state = env.reset()
  for _ in range(10):
    action = agent.act(state) 
    state, reward, done, info = env.step(action)
    memory.append((state,action,reward,done))
    if done:
      break

  result = entry_point.load(False)


In [14]:
compute_avg_reward(memory)

-0.056962025316455694

Other environments include Atari Video games, a physics simulator called MuJuCo, and a robot arm simulator. They also provide tools to create your own environment provide a common API. We have avoided the question of reinforcement learning algorithms themselves but if you are interested in learning more OpenAI provides a series of tutorials called spinning up: https://spinningup.openai.com/en/latest/

Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto defined the discipline. They recently released a second edition versions of which are available for free online.