<img src="images/taxi.png" width="500">

Let's assume Smartcab is the only vehicle in this parking lot. We can break up the parking lot into a 5x5 grid, which gives us 25 possible taxi locations. These 25 locations are one part of our state space. Notice the current location state of our taxi is coordinate (3, 1).

You'll also notice there are four (4) locations where we can pick up and drop off a passenger: R, G, Y, B, or [(0,0), (0,4), (4,0), (4,3)] in (row, col) coordinates. Our illustrated passenger is at location Y and wishes to go to location R.

When we also account for one (1) additional passenger state, being inside the taxi, we can combine every taxi position with every passenger location and every destination to get the total number of states for our taxi environment: there are four (4) destinations and five (4 + 1) passenger locations.


<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Imports" data-toc-modified-id="Imports-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Generating-environment" data-toc-modified-id="Generating-environment-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Generating environment</a></span><ul class="toc-item"><li><span><a href="#States" data-toc-modified-id="States-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>States</a></span></li><li><span><a href="#Actions" data-toc-modified-id="Actions-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Actions</a></span></li></ul></li><li><span><a href="#Generating-Random-Policy" data-toc-modified-id="Generating-Random-Policy-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Generating Random Policy</a></span></li><li><span><a href="#Function-to-show-the-environment" data-toc-modified-id="Function-to-show-the-environment-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Function to show the environment</a></span></li></ul></div>

# Imports

In [2]:
import gym
import numpy as np
import random

# Generating environment

In [3]:
env = gym.make("Taxi-v3").env

env.render()

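+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+

(Colors are lost in this text export: the passenger location R is shown in blue, the destination G in magenta, and the taxi as the yellow-highlighted cell at row 3, column 3.)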



## States


So, our taxi environment has 5 × 5 × 5 × 4 = 500 total possible states:

* 5 × 5 taxi positions in the grid world.
* 5 passenger locations (4 pickup points + inside the cab).
* 4 possible destinations.
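
The count above can be checked, and a single state index unpacked, with a few lines of plain Python. This is a sketch that assumes the ordering (taxi row, taxi column, passenger location, destination) that Taxi-v3 uses when it packs a state into one integer, with locations indexed R=0, G=1, Y=2, B=3 and 4 meaning "inside the taxi". The helper names `encode_state` and `decode_state` are illustrative and are not part of gym.

```python
# Sizes of each state component, taken from the list above.
N_ROWS, N_COLS, N_PASSENGER_LOCS, N_DESTINATIONS = 5, 5, 5, 4


def encode_state(taxi_row, taxi_col, passenger_loc, destination):
    """Pack the four state components into a single integer index."""
    index = taxi_row
    index = index * N_COLS + taxi_col
    index = index * N_PASSENGER_LOCS + passenger_loc
    index = index * N_DESTINATIONS + destination
    return index


def decode_state(index):
    """Inverse of encode_state: recover the four components."""
    index, destination = divmod(index, N_DESTINATIONS)
    index, passenger_loc = divmod(index, N_PASSENGER_LOCS)
    taxi_row, taxi_col = divmod(index, N_COLS)
    return taxi_row, taxi_col, passenger_loc, destination


total_states = N_ROWS * N_COLS * N_PASSENGER_LOCS * N_DESTINATIONS
print(total_states)               # 500

# Illustration: taxi at (3, 1), passenger at Y (index 2), destination R (index 0).
print(encode_state(3, 1, 2, 0))   # 328
print(decode_state(328))          # (3, 1, 2, 0)
```

The illustrated state therefore maps to index 328, which is the state inspected and used as the starting point later in this notebook.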

## Actions

The agent encounters one of the 500 states and takes an action. In our case, an action is either a move in one direction or a decision to pick up or drop off a passenger.

In other words, we have six possible actions:

    0 = south
    1 = north
    2 = east
    3 = west
    4 = pickup
    5 = dropoff

In [4]:
print("Action Space {}".format(env.action_space))
print("State Space {}".format(env.observation_space))

Action Space Discrete(6)
State Space Discrete(500)


In [7]:
env.P[328]

{0: [(1.0, 428, -1, False)],
 1: [(1.0, 228, -1, False)],
 2: [(1.0, 348, -1, False)],
 3: [(1.0, 328, -1, False)],
 4: [(1.0, 328, -10, False)],
 5: [(1.0, 328, -10, False)]}
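
The dictionary above is the transition table for state 328. Each key is an action, and each value is a list of `(probability, next_state, reward, done)` tuples, following gym's convention for discrete environments. As a minimal sketch, the snippet below copies that output into a literal (so it runs without gym) and builds one readable line per action; `ACTION_NAMES` and `describe_transitions` are illustrative names, not gym API.

```python
# Action indices as listed in the Actions section.
ACTION_NAMES = ["south", "north", "east", "west", "pickup", "dropoff"]

# Copied verbatim from the env.P[328] output above.
transitions_328 = {
    0: [(1.0, 428, -1, False)],
    1: [(1.0, 228, -1, False)],
    2: [(1.0, 348, -1, False)],
    3: [(1.0, 328, -1, False)],
    4: [(1.0, 328, -10, False)],
    5: [(1.0, 328, -10, False)],
}


def describe_transitions(transitions):
    """Return one human-readable line per (action, outcome) pair."""
    lines = []
    for action, outcomes in transitions.items():
        for probability, next_state, reward, done in outcomes:
            lines.append(
                f"{ACTION_NAMES[action]:>7}: p={probability}, "
                f"next_state={next_state}, reward={reward}, done={done}"
            )
    return lines


for line in describe_transitions(transitions_328):
    print(line)
```

Read this way, the table shows that south, north, and east move the taxi at a cost of -1, west leaves the state at 328 (the move is blocked) while still costing -1, and pickup or dropoff cost -10 because neither is legal from this position.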

# Generating Random Policy

In [8]:
env.s = 328  # set environment to illustration's state

epochs = 0
penalties, reward = 0, 0

frames = [] # for animation

done = False

while not done:
    # sample() returns a random action from the set of all possible actions.
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)

    if reward == -10:
        penalties += 1
    
    # Store each rendered frame as a dict in the frames list for animation
    frames.append({
        'frame': env.render(mode='ansi'),
        'state': state,
        'action': action,
        'reward': reward
    })

    epochs += 1
    
    
print("Timesteps taken: {}".format(epochs))
print("Penalties incurred: {}".format(penalties))

Timesteps taken: 1723
Penalties incurred: 518


# Function to show the environment

In [10]:
from IPython.display import clear_output
from time import sleep

def print_frames(frames):
    for i, frame in enumerate(frames):
        clear_output(wait=True)
        print(frame['frame'])
        print(f"Timestep: {i + 1}")
        print(f"State: {frame['state']}")
        print(f"Action: {frame['action']}")
        print(f"Reward: {frame['reward']}")
        sleep(.1)

In [11]:
print_frames(frames)

+---------+
|[35m[34;1m[43mR[0m[0m[0m: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (Dropoff)

Timestep: 1723
State: 0
Action: 5
Reward: 20
