## Variation on the Taxi-Grid Environment

### Motivation

Our project is based on the Taxi gym environment (https://gym.openai.com/envs/Taxi-v3/), a basic "grid-world" environment used for research in artificial intelligence and reinforcement learning.

In [1]:
import gym

env = gym.make("Taxi-v3").env

env.reset()
env.render()

+---------+
|[34;1mR[0m: | : :[35mG[0m|
| : | : : |
| : : : : |
| | : | : |
|Y| :[43m [0m|B: |
+---------+



This is the original taxi grid environment, which is available as part of the OpenAI gym module. The environment models a taxi that moves within the grid world with the goal of picking up a passenger and bringing her to her destination.

Our modifications to the original environment were created with four main goals:  
**First**, the original taxi grid environment is a single-agent environment. We were interested in allowing multi-agent reinforcement learning, so we created an option to use multiple taxis.  
**Second**, by adding a fuel constraint, we can test whether agents are able to take a long-term fuel constraint into account, and the environment may be useful for research on limited-resource settings.  
**Third**, our environment can display taxi collisions and either allow or restrict them.  
**Fourth**, we added a fuel-type constraint on top of the fuel constraint (we can model gas/fuel taxis with suitable fuel stations).

### Installation

To work with our custom environments, the first thing we need to do is install them. The environments are packaged together in a module called "multitaxienv", available on GitHub, so it is easy to install all of them at once.

In [2]:
!pip install git+https://github.com/sarah-keren/MutliTaxiEnv.git --upgrade

Collecting git+https://github.com/sarah-keren/MutliTaxiEnv.git
  Cloning https://github.com/sarah-keren/MutliTaxiEnv.git to /tmp/pip-req-build-r3uofinr
  Running command git clone -q https://github.com/sarah-keren/MutliTaxiEnv.git /tmp/pip-req-build-r3uofinr
Building wheels for collected packages: MultiTaxiEnv
  Building wheel for MultiTaxiEnv (setup.py) ... done
  Created wheel for MultiTaxiEnv: filename=MultiTaxiEnv-0.2-cp36-none-any.whl size=11042 sha256=4276509999855777277fc12b9947e5e0959f932039121f2ca297a1ffff03d204
  Stored in directory: /tmp/pip-ephem-wheel-cache-1u1ngwdm/wheels/a7/8b/8f/2c7828f791585f08f092f39fb14023ccf64d14a1a3d68143ef
Successfully built MultiTaxiEnv
Installing collected packages: MultiTaxiEnv
Successfully installed MultiTaxiEnv-0.2


You'll note that we have only one class file. To support all the variations of our environment, we use the parameters of our class's `__init__`.

After installing the environment, run the following lines:

In [1]:
%load_ext autoreload
%autoreload 2
from multitaxienv.taxi_environment import TaxiEnv

### Initializing Environment - Hyper-Parameters of the Class

In [38]:
env = TaxiEnv(num_taxis = 2, num_passengers = 2, max_fuel = None,
                 taxis_capacity = None, collision_sensitive_domain = False,
                 fuel_type_list = None, option_to_stand_by = True)

1. *num_taxis* - the number of taxis; default is 2, and any integer $\geq 1$ is allowed.
2. *num_passengers* - the number of passengers; same default and range as *num_taxis*.
3. *max_fuel* - a list whose i'th element is the maximum fuel of taxi number i. Each taxi starts with its maximum fuel. Default is **None**, which sets the fuel limit to np.inf (no fuel model).
4. *taxis_capacity* - a list with the passenger limit of each taxi; default is 1 for each taxi.
5. *collision_sensitive_domain* - Boolean specifying whether collisions are shown and affect the domain (True) or are ignored (False).
6. *fuel_type_list* - the fuel type ('**F**uel' / '**G**as') of each taxi; default is 'F'.
7. *option_to_stand_by* - whether taxis can stand in place (True) or not (False); default is True.
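
To make the defaults in the list above concrete, here is a minimal sketch of how `None` arguments could be expanded into one value per taxi. The helper `expand_taxi_params` is hypothetical and is not part of multitaxienv; it only mirrors the defaults described above (unlimited fuel, capacity 1, fuel type 'F').

```python
import math


def expand_taxi_params(num_taxis=2, max_fuel=None, taxis_capacity=None,
                       fuel_type_list=None):
    """Hypothetical helper: turn None defaults into one entry per taxi."""
    if num_taxis < 1:
        raise ValueError("num_taxis must be at least 1")
    params = {
        # None means "no fuel model", i.e. unlimited fuel for every taxi.
        "max_fuel": [math.inf] * num_taxis if max_fuel is None else list(max_fuel),
        # None means every taxi carries at most one passenger.
        "taxis_capacity": [1] * num_taxis if taxis_capacity is None else list(taxis_capacity),
        # None means every taxi uses fuel type 'F'.
        "fuel_type_list": ["F"] * num_taxis if fuel_type_list is None else list(fuel_type_list),
    }
    for name, values in params.items():
        if len(values) != num_taxis:
            raise ValueError(f"{name} needs exactly {num_taxis} entries")
    return params


print(expand_taxi_params(num_taxis=2, max_fuel=[5, 5]))
# {'max_fuel': [5, 5], 'taxis_capacity': [1, 1], 'fuel_type_list': ['F', 'F']}
```

The length check is a design choice of this sketch: a list that does not match *num_taxis* is rejected early rather than silently padded.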

In [3]:
env = TaxiEnv()
env.reset()
env.s = 1022
env.render()

+---------+
|[34;1mX[0m: |F:[41m_[0m:[35m[35mX[0m[0m|
| : | : : |
| : : : : |
| | : | : |
|[43m[34;1mX[0m[0m| :G|X: |
+---------+
Taxi1-YELLOW: Fuel: inf, Location: (4,0), Collided: False
Taxi2-RED: Fuel: inf, Location: (0,3), Collided: False
Passenger1: Location: (4, 0), Destination: (0, 4)
Passenger2: Location: (0, 0), Destination: (0, 4)


Here we initialized a domain with the default values. As we can see:  
We have **2 taxis**, represented by the highlighted boxes; the description printed below the map shows which taxi has which color.  
We also have an **np.inf** fuel limit.  
There are also **2 passengers** at the X-marked positions, with their destination at the X colored in *magenta*.

Note that for each taxi we record whether it has collided or not.

### Action Space of the Environment

In [4]:
env.get_available_actions_dictionary()[1]

{0: 'south',
 1: 'north',
 2: 'east',
 3: 'west',
 4: 'pickup',
 5: 'dropoff',
 6: 'turn_engine_on',
 7: 'turn_engine_off',
 8: 'standby',
 9: 'refuel'}

**get_available_actions_dictionary()** returns a tuple where:  
1. 1st element - the indexes of the actions that the specific initialized domain supports.
2. 2nd element - the dictionary above, which maps every action index to its name.  

*Note that in the no-fuel model (induced by max_fuel = np.inf) action 9 is unavailable, and without the option to stand by actions 6, 7 and 8 are unavailable.*
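
The rule in the note above can be sketched as a small filter over the action dictionary. This is only an illustration of the rule as stated in this notebook; `available_action_indexes` is a hypothetical function, not the environment's own implementation.

```python
# Action names copied from the dictionary printed above.
ALL_ACTIONS = {
    0: 'south', 1: 'north', 2: 'east', 3: 'west', 4: 'pickup',
    5: 'dropoff', 6: 'turn_engine_on', 7: 'turn_engine_off',
    8: 'standby', 9: 'refuel',
}


def available_action_indexes(fuel_limited, option_to_stand_by):
    """Hypothetical sketch: drop the actions that the chosen model rules out."""
    excluded = set()
    if not fuel_limited:
        excluded.add(9)               # no fuel model: refuel is unavailable
    if not option_to_stand_by:
        excluded.update({6, 7, 8})    # no standby: engine and standby actions are unavailable
    return [index for index in sorted(ALL_ACTIONS) if index not in excluded]


print(available_action_indexes(fuel_limited=False, option_to_stand_by=True))
# [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

With the default parameters (no fuel limit, standby allowed) the sketch yields indexes 0 through 8.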

In [5]:
env.get_available_actions_dictionary()[0]

[0, 1, 2, 3, 4, 5, 6, 7, 8]

The first element of the returned tuple is the action space available to us on the initialized domain.  
**These indexes are the actions we send to the domain when we want to make a step!**

Here we send the action "east" for each of our 2 taxis.  
We get the environment's next state, which is built as: **taxis locations, current_fuel_list, passengers_start_locations, destinations, passengers_status**.

In [6]:
state, reward, done = env.step([2, 2])
print("The next state is: " + str(state) + ", the reward for the last action is: " + str(reward) + ", and the episode is "+ {True: "", False: "not"}[done]  + " done.")

The next state is: [[[4, 0], [0, 4]], [inf, inf], [[4, 0], [0, 0]], [[0, 4], [0, 4]], [0, 0]], the reward for the last action is: [-20, -1], and the episode is not done.
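
Since the state is a plain nested list, it can help to give its five parts names. The sketch below assumes the ordering described above (taxis locations, fuel, passenger start locations, destinations, passenger status); the `TaxiState` tuple and `unpack_state` function are illustrative names, not part of the environment.

```python
from collections import namedtuple

# Field order follows the state layout described in this notebook.
TaxiState = namedtuple(
    "TaxiState",
    ["taxi_locations", "fuel", "passenger_starts", "destinations", "passenger_status"],
)


def unpack_state(state):
    """Hypothetical helper: wrap the 5-part state list in a named tuple."""
    if len(state) != 5:
        raise ValueError("expected a state with 5 components")
    return TaxiState(*state)


# The state printed by the step above.
state = [[[4, 0], [0, 4]], [float("inf"), float("inf")],
         [[4, 0], [0, 0]], [[0, 4], [0, 4]], [0, 0]]
parsed = unpack_state(state)
print(parsed.taxi_locations)   # [[4, 0], [0, 4]]
print(parsed.destinations)     # [[0, 4], [0, 4]]
```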


Generally, to perform an action, we use the step function. It returns a tuple which includes the next state, the reward and whether the episode has ended or not.  
In this environment, the episode ends when either:
1. *all passengers reached their destinations*.
2. *all taxis are out of fuel*.
3. *all taxis collided*.  
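
The three termination conditions listed above can be written as a single check. This is a sketch under assumptions: the inputs are simplified per-passenger and per-taxi lists (which must be non-empty), and `episode_done` is a hypothetical name rather than the environment's internal logic.

```python
def episode_done(passengers_delivered, fuel_levels, collided):
    """Hypothetical check of the three end-of-episode conditions.

    passengers_delivered: list of bool, one per passenger
    fuel_levels:          list of numbers, one per taxi
    collided:             list of bool, one per taxi
    """
    all_delivered = all(passengers_delivered)
    all_out_of_fuel = all(fuel <= 0 for fuel in fuel_levels)
    all_collided = all(collided)
    return all_delivered or all_out_of_fuel or all_collided


print(episode_done([True, True], [3, 2], [False, False]))    # True
print(episode_done([False, True], [3, 0], [True, False]))    # False
```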

The ***rewards*** are given in config.py as below:

In [None]:
taxi_env_rewards = dict(
    step=-1,
    no_fuel=-20,
    bad_pickup=-15,
    bad_dropoff=-15,
    bad_refuel=-10,
    pickup=-1,
    standby_engine_off=-1,
    turn_engine_on=-1,
    turn_engine_off=-1,
    standby_engine_on=-1,
    intermediate_dropoff=-10,
    final_dropoff=100,
    hit_wall=-20,
    collision=-30,
)
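
To see how these values add up over an episode, the sketch below sums the rewards for a made-up sequence of events for a single taxi. It assumes that each step produces exactly one of the events in the table; both the event sequence and the `episode_return` helper are invented for illustration.

```python
# Reward table copied from the cell above.
taxi_env_rewards = dict(
    step=-1, no_fuel=-20, bad_pickup=-15, bad_dropoff=-15, bad_refuel=-10,
    pickup=-1, standby_engine_off=-1, turn_engine_on=-1, turn_engine_off=-1,
    standby_engine_on=-1, intermediate_dropoff=-10, final_dropoff=100,
    hit_wall=-20, collision=-30,
)


def episode_return(events):
    """Hypothetical helper: total reward for a list of event names."""
    return sum(taxi_env_rewards[event] for event in events)


# Invented example: four moves, one pickup, one wall hit, one successful delivery.
events = ["step", "step", "pickup", "step", "hit_wall", "step", "final_dropoff"]
print(episode_return(events))  # 75
```

Note how a single wall hit (-20) costs as much as twenty ordinary steps, so avoiding invalid moves matters more than shaving a few steps off the route.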

Now let's try to navigate to the fuel station:

In [72]:
env.render()

+---------+
|X: |F: :X|
| : | : : |
| : : : : |
| | :[41m_[0m| : |
|[35m[35mX[0m[0m| :G|[34;1m[34;1mX[0m[0m:[43m_[0m|
+---------+
  (east ,east)
Taxi1-YELLOW: Fuel: inf, Location: (4,4), Collided: False
Taxi2-RED: Fuel: inf, Location: (3,2), Collided: False
Passenger1: Location: (4, 3), Destination: (4, 0)
Passenger2: Location: (4, 3), Destination: (4, 0)


In [73]:
env.step([1, 0])
env.render()

+---------+
|X: |F: :X|
| : | : : |
| : : : : |
| | : | :[43m_[0m|
|[35m[35mX[0m[0m| :[41mG[0m|[34;1m[34;1mX[0m[0m: |
+---------+
  (north ,south)
Taxi1-YELLOW: Fuel: inf, Location: (3,4), Collided: False
Taxi2-RED: Fuel: inf, Location: (4,2), Collided: False
Passenger1: Location: (4, 3), Destination: (4, 0)
Passenger2: Location: (4, 3), Destination: (4, 0)


**Now let's do the same thing with a limited-fuel mode.**  
Notice in the map description how the fuel status changes for the taxi that we move.

In [8]:
env_limitted_fuel = TaxiEnv(num_taxis = 2, num_passengers = 2, max_fuel = [5, 5],
                            taxis_capacity = None, collision_sensitive_domain = False,
                            fuel_type_list = None, option_to_stand_by = True)

In [9]:
env_limitted_fuel.reset()
env_limitted_fuel.render()

+---------+
|[35m[35mX[0m[0m: |F:[43m_[0m:[34;1mX[0m|
|[41m_[0m: | : : |
| : : : : |
| | : | : |
|[34;1mX[0m| :G|X: |
+---------+
Taxi1-YELLOW: Fuel: 5, Location: (0,3), Collided: False
Taxi2-RED: Fuel: 5, Location: (1,0), Collided: False
Passenger1: Location: (0, 4), Destination: (0, 0)
Passenger2: Location: (4, 0), Destination: (0, 0)


We will now navigate to the fuel station and refuel.
Notice how the fuel of each taxi changes as it moves.

In [10]:
env_limitted_fuel.step([3, 0])  # west, south
env_limitted_fuel.render()

+---------+
|[35m[35mX[0m[0m: |[43mF[0m: :[34;1mX[0m|
| : | : : |
|[41m_[0m: : : : |
| | : | : |
|[34;1mX[0m| :G|X: |
+---------+
  (west ,south)
Taxi1-YELLOW: Fuel: 4, Location: (0,2), Collided: False
Taxi2-RED: Fuel: 4, Location: (2,0), Collided: False
Passenger1: Location: (0, 4), Destination: (0, 0)
Passenger2: Location: (4, 0), Destination: (0, 0)


In [11]:
env_limitted_fuel.step([9, 1])  # refuel, north
env_limitted_fuel.render()

+---------+
|[35m[35mX[0m[0m: |[43mF[0m: :[34;1mX[0m|
|[41m_[0m: | : : |
| : : : : |
| | : | : |
|[34;1mX[0m| :G|X: |
+---------+
  (refuel ,north)
Taxi1-YELLOW: Fuel: 4, Location: (0,2), Collided: False
Taxi2-RED: Fuel: 3, Location: (1,0), Collided: False
Passenger1: Location: (0, 4), Destination: (0, 0)
Passenger2: Location: (4, 0), Destination: (0, 0)


In [12]:
env_limitted_fuel.map_at_location(env_limitted_fuel.state[0][0])

'F'
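
The lookup above returns the map character under taxi 1. A plausible reading of the fuel-type constraint is that a taxi may only refuel on a station whose letter matches its own fuel type. The sketch below encodes that reading; it is an assumption about the rule, and `can_refuel` and `refuel` are hypothetical names, not the environment's methods.

```python
STATION_CHARS = ("F", "G")  # station letters that appear on the maps in this notebook


def can_refuel(map_char, taxi_fuel_type):
    """Assumed rule: the taxi stands on a station matching its fuel type."""
    return map_char in STATION_CHARS and map_char == taxi_fuel_type


def refuel(current_fuel, max_fuel, map_char, taxi_fuel_type):
    """Assumed behaviour: a valid refuel restores the tank to max_fuel."""
    if can_refuel(map_char, taxi_fuel_type):
        return max_fuel
    return current_fuel


print(can_refuel("F", "F"))      # True
print(can_refuel("G", "F"))      # False
print(refuel(4, 5, "F", "F"))    # 5
```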

In [None]:
env_limitted_fuel.fuel_type_list

['F', 'F']

In [14]:
env_limitted_fuel.step([1, 0])
env_limitted_fuel.render()

In [None]:
env_limitted_fuel.step([1, 0])
env_limitted_fuel.render()

In [None]:
state, reward, done = env_limitted_fuel.step([1, 9])
env_limitted_fuel.render()

In [63]:

+---------+
|[35m[35mX[0m[0m:[43m_[0m|F: :X|
| : | : : |
| : : : : |
| | : | : |
|[34;1m[34;1mX[0m[0m| :[41mG[0m|X: |
+---------+
  (north ,south)
Taxi1-YELLOW: Fuel: 1, Location: (0,1), Collided: False
Taxi2-RED: Fuel: 2, Location: (4,2), Collided: False
Passenger1: Location: (4, 0), Destination: (0, 0)
Passenger2: Location: (4, 0), Destination: (0, 0)


In [64]:
state, reward, done = env_limitted_fuel.step([1, 9])
env_limitted_fuel.render()

+---------+
|[35m[35mX[0m[0m:[43m_[0m|F: :X|
| : | : : |
| : : : : |
| | : | : |
|[34;1m[34;1mX[0m[0m| :[41mG[0m|X: |
+---------+
  (north ,refuel)
Taxi1-YELLOW: Fuel: 1, Location: (0,1), Collided: False
Taxi2-RED: Fuel: 2, Location: (4,2), Collided: False
Passenger1: Location: (4, 0), Destination: (0, 0)
Passenger2: Location: (4, 0), Destination: (0, 0)


Here, since we have refuelled, we are back at full. Another thing to note is that the state of the environment is encoded as a single number. However, sometimes it is useful to decode what that number actually means (for example, when using deep Q-learning). We can do that using the decode function.

In [8]:
# Decode the integer-encoded state into its components.
x, y, pass_loc, pass_dest, fuel = list(env.decode(state))
print(f"The coordinates of the taxi are currently: ({x},{y})")
print(f"The index of the passenger location is: {pass_loc}, while the index of the passenger destination is: {pass_dest}")
print(f"Currently, the fuel level of the taxi is: {fuel}")

The coordinates of the taxi are currently: (0,2)
The index of the passenger location is: 3, while the index of the passenger destination is: 0
Currently, the fuel level of the taxi is: 8


You can also input your own map as a list of strings and specify the maximum fuel. The map must be formatted in the same way as above, with '+' at the corners and '-' and '|' specifying the boundaries. You can have as many destinations and fuel stations as you want!

In [9]:
custom_map = [
    '+---------------+',
    '| : :X| :F: : : |',
    '|X: : | : | :X| |',
    '| : : : : : : | |',
    '| :X:F| :X| : :X|',
    '+---------------+',
]
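
Before passing a custom map to the environment, it can be worth checking that it follows the format described above. The validator below is a sketch, not part of the package: `validate_map` is a hypothetical helper, and the grid-size formula assumes that cells alternate with separator characters, as in the maps shown in this notebook.

```python
def validate_map(rows):
    """Hypothetical check of the border format; returns (n_rows, n_cols) of the grid."""
    if len(rows) < 3:
        raise ValueError("a map needs a top border, a bottom border and at least one row")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")
    for border in (rows[0], rows[-1]):
        # Borders look like '+-----+': '+' at the corners, '-' in between.
        if border[0] != "+" or border[-1] != "+" or set(border[1:-1]) - {"-"}:
            raise ValueError("top and bottom borders must be '+' corners joined by '-'")
    for row in rows[1:-1]:
        if row[0] != "|" or row[-1] != "|":
            raise ValueError("every grid row must start and end with '|'")
    # Each cell is one character followed by one separator (':' or '|').
    return len(rows) - 2, (width - 1) // 2


custom_map = [
    '+---------------+',
    '| : :X| :F: : : |',
    '|X: : | : | :X| |',
    '| : : : : : : | |',
    '| :X:F| :X| : :X|',
    '+---------------+',
]
print(validate_map(custom_map))  # (4, 8)
```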

In [10]:
env = OneTaxiFuelEnv(max_fuel=6, map=custom_map)
env.reset()
env.render()

+---------------+
| : :X| :F: : : |
|X: : | : | :[35mX[0m| |
| : : : : : : | |
| :[34;1mX[0m:F|[43m [0m:X| : :X|
+---------------+
Fuel: 6



This concludes the demonstration for the single taxi with fuel environment.

### Multiple taxis without fuel

This environment supports multiple taxis and multiple passengers, which you can specify when initializing. The default initialization is two taxis and one passenger. The goal is to deliver all passengers to their destinations. Each taxi can only carry one passenger at a time, but can drop off passengers anywhere.

In [11]:
env = MultiTaxiEnv(num_taxis=3, num_passengers=2)
env.reset()
env.state = [[[2, 2], [0, 1], [2, 1]], [[4, 0], [0, 0]], [[0, 0], [4, 3]], [0, 0]]
env.render()

+---------+
|[35m[34;1mX[0m[0m:[41m_[0m| : :X|
| : | : : |
| :[47m_[0m:[43m_[0m: : |
| | : | : |
|[34;1mX[0m| : |[35mX[0m: |
+---------+
Taxi1: Location: (2,2)
Taxi2: Location: (0,1)
Taxi3: Location: (2,1)
Passenger1: Location: (4, 0), Destination: (0, 0)
Passenger2: Location: (0, 0), Destination: (4, 3)


Each state is recorded as a list of lists:
1. a list of locations of each taxi, formatted [row, col]
2. a list of passenger starting locations as coordinates
3. a list of passenger destinations as coordinates
4. a list of current passenger locations as integers, 0 means not picked up, -1 means reached destination, and positive number specifies which taxi they are in

The i'th entry in each of the passenger lists corresponds to the same passenger.
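
The integer encoding of the passenger status in item 4 can be turned into readable text with a small helper. This is a sketch of the encoding as described above; `describe_passenger` is a hypothetical name.

```python
def describe_passenger(status):
    """Translate the integer passenger status described above into text."""
    if status == 0:
        return "waiting for pickup"
    if status == -1:
        return "delivered"
    if status > 0:
        return f"in taxi {status}"
    raise ValueError(f"unknown passenger status: {status}")


# Invented status list: passenger 1 is waiting, passenger 2 rides in taxi 2.
statuses = [0, 2]
for i, status in enumerate(statuses, start=1):
    print(f"Passenger{i}: {describe_passenger(status)}")
```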



In [12]:
print(env.state)

[[[2, 2], [0, 1], [2, 1]], [[4, 0], [0, 0]], [[0, 0], [4, 3]], [0, 0]]


Actions are input as a list with one action per taxi. Each taxi has the same actions as in the single-taxi environment above, except for refueling. Thus the action space of each taxi is (0,1,2,3,4,5,6), where 0,1,2,3 are move south, north, east, west respectively, 4 is pick up passenger, 5 is drop off passenger, and 6 is standby.
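
Writing joint actions as raw integers is error-prone, so a name-to-index lookup can help. The mapping below follows the numbering given in the paragraph above; the `joint_action` helper is hypothetical and only builds the list that would be passed to `step`.

```python
# Index of each action, as listed in the paragraph above.
ACTION_INDEX = {
    "south": 0, "north": 1, "east": 2, "west": 3,
    "pickup": 4, "dropoff": 5, "standby": 6,
}


def joint_action(names):
    """Hypothetical helper: one action name per taxi -> list of action indexes."""
    return [ACTION_INDEX[name] for name in names]


# Taxis 1 and 3 move south, taxi 2 moves west.
print(joint_action(["south", "west", "south"]))  # [0, 3, 0]
```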

Let's move taxis 1 and 3 south, and taxi 2 west.

In [13]:
state, reward, done, _ = env.step([0,3,0])
print("Now the reward is given as the reward for each individual taxi: " + str(reward))

env.render()

Now the reward is given as the reward for each individual taxi: [-1, -1, -1]
+---------+
|[35m[41m[34;1mX[0m[0m[0m: | : :X|
| : | : : |
| : : : : |
| |[47m_[0m:[43m_[0m| : |
|[34;1mX[0m| : |[35mX[0m: |
+---------+
  (South ,West ,South)
Taxi1: Location: (3,2)
Taxi2: Location: (0,0)
Taxi3: Location: (3,1)
Passenger1: Location: (4, 0), Destination: (0, 0)
Passenger2: Location: (0, 0), Destination: (4, 3)


To demonstrate the ability to drop off passengers anywhere, we will pick up passenger 2 with taxi 2 while telling the other taxis to drop off. Then we will move taxi 2 south twice and drop off.

In [14]:
actions = [[5,4,5],[5,0,5],[5,0,5],[5,5,5]]
for action in actions:
    env.step(action)
    env.render()

+---------+
|[35m[41;1mX[0m[0m: | : :X|
| : | : : |
| : : : : |
| |[47m_[0m:[43m_[0m| : |
|[34;1mX[0m| : |[35mX[0m: |
+---------+
  (Dropoff ,Pickup ,Dropoff)
Taxi1: Location: (3,2)
Taxi2: Location: (0,0)
Taxi3: Location: (3,1)
Passenger1: Location: (4, 0), Destination: (0, 0)
Passenger2: Location: Taxi2, Destination: (4, 3)
+---------+
|[35mX[0m: | : :X|
|[41;1m [0m: | : : |
| : : : : |
| |[47m_[0m:[43m_[0m| : |
|[34;1mX[0m| : |[35mX[0m: |
+---------+
  (Dropoff ,South ,Dropoff)
Taxi1: Location: (3,2)
Taxi2: Location: (1,0)
Taxi3: Location: (3,1)
Passenger1: Location: (4, 0), Destination: (0, 0)
Passenger2: Location: Taxi2, Destination: (4, 3)
+---------+
|[35mX[0m: | : :X|
| : | : : |
|[41;1m [0m: : : : |
| |[47m_[0m:[43m_[0m| : |
|[34;1mX[0m| : |[35mX[0m: |
+---------+
  (Dropoff ,South ,Dropoff)
Taxi1: Location: (3,2)
Taxi2: Location: (2,0)
Taxi3: Location: (3,1)
Passenger1: Location: (4, 0), Destination: (0, 0)
Passenger2: Location: Taxi2, Desti

We can see that passenger 2's location was taxi 2, and at the end they are at the new location (2,0). Now let's pick them up with taxi 3.

In [15]:
actions = [[5,5,1],[5,5,3],[5,5,4],[5,5,2]]
for action in actions:
    env.step(action)
    env.render()

+---------+
|[35mX[0m: | : :X|
| : | : : |
|[41m[34;1m [0m[0m:[47m_[0m: : : |
| | :[43m_[0m| : |
|[34;1mX[0m| : |[35mX[0m: |
+---------+
  (Dropoff ,Dropoff ,North)
Taxi1: Location: (3,2)
Taxi2: Location: (2,0)
Taxi3: Location: (2,1)
Passenger1: Location: (4, 0), Destination: (0, 0)
Passenger2: Location: (2, 0), Destination: (4, 3)
+---------+
|[35mX[0m: | : :X|
| : | : : |
|[47m[41m[34;1m [0m[0m[0m: : : : |
| | :[43m_[0m| : |
|[34;1mX[0m| : |[35mX[0m: |
+---------+
  (Dropoff ,Dropoff ,West)
Taxi1: Location: (3,2)
Taxi2: Location: (2,0)
Taxi3: Location: (2,0)
Passenger1: Location: (4, 0), Destination: (0, 0)
Passenger2: Location: (2, 0), Destination: (4, 3)
+---------+
|[35mX[0m: | : :X|
| : | : : |
|[41m[47;1m [0m[0m: : : : |
| | :[43m_[0m| : |
|[34;1mX[0m| : |[35mX[0m: |
+---------+
  (Dropoff ,Dropoff ,Pickup)
Taxi1: Location: (3,2)
Taxi2: Location: (2,0)
Taxi3: Location: (2,0)
Passenger1: Location: (4, 0), Destination: (0, 0)
Passenger2: Loc

Now taxi 3 has picked up passenger 2 and can transport them around. This simple example illustrates a new capability that wasn't available in the single-taxi environment, where the taxi could only drop off at the destination.

We can also choose to input our own custom map.

In [16]:
custom_map = [
    '+---------------+',
    '| : :X| : : : : |',
    '|X: : | : | :X| |',
    '| : : : : : : | |',
    '| :X: | :X| : :X|',
    '+---------------+',
]

In [17]:
env = MultiTaxiEnv(map=custom_map)
env.reset()
env.render()

+---------------+
| : :X| : : : : |
|X: : | : | :X| |
| : :[43m_[0m: : : : | |
| :[34;1mX[0m: | :[35mX[0m| : :[41mX[0m|
+---------------+
Taxi1: Location: (2,2)
Taxi2: Location: (3,7)
Passenger1: Location: (3, 1), Destination: (3, 4)


Below we will explore the most interesting environment.

### Multiple taxis with fuel

The most complex environment is the multiple-taxi environment with fuel. This environment supports an arbitrary number of taxis with fuel and an arbitrary number of passengers. By default, the environment is initialized with two taxis and one passenger, and the maximum (and starting) fuel of each taxi is 8. However, we are able to change all of those parameters as above. This is essentially just the MultiTaxi environment with fuel constraints, so we are still able to drop off passengers anywhere.

In [18]:
env = MultiTaxiFuelEnv(num_taxis=2, num_passengers=2, max_fuel=8)
env.reset()
env.render()

+---------+
|[43mX[0m: |F: :[34;1mX[0m|
| : | : : |
| : : : : |
| | : | :[41m_[0m|
|[34;1mX[0m| :G|[35m[35mX[0m[0m: |
+---------+
Taxi1: Fuel: 8, Location: (0,0)
Taxi2: Fuel: 8, Location: (3,4)
Passenger1: Location: (4, 0), Destination: (4, 3)
Passenger2: Location: (0, 4), Destination: (4, 3)


Initializing the environment, we have the location and fuel values for each of the taxis, as well as the location and destination values for each passenger. In the multiple-passenger environment, the episode does not end until each passenger is delivered to their destination.

Each state is recorded as a list of lists, as in the MultiTaxi environment:
1. a list of locations of each taxi, formatted [row, col]
2. a list of integers specifying fuel levels for each taxi
3. a list of passenger starting locations as coordinates
4. a list of passenger destinations as coordinates
5. a list of current passenger locations as integers, 0 means not picked up, -1 means reached destination, and positive number specifies which taxi they are in

In [19]:
print(env.state)

[[[0, 0], [3, 4]], [8, 8], [[4, 0], [0, 4]], [[4, 3], [4, 3]], [0, 0]]


Like the MultiTaxi environment, actions are now lists of individual actions, and the actions for each individual taxi are the same as in the original one-taxi fuel environment (8 possible actions). For example, suppose that we want taxi 1 to go north (action 1) and taxi 2 to go south (action 0).

In [20]:
state, reward, done, _ = env.step([1,0])
print("Now the reward is given as the reward for each individual taxi: " + str(reward))

env.render()

Now the reward is given as the reward for each individual taxi: [-20, -1]
+---------+
|[43mX[0m: |F: :[34;1mX[0m|
| : | : : |
| : : : : |
| | : | : |
|[34;1mX[0m| :G|[35m[35mX[0m[0m:[41m_[0m|
+---------+
  (North ,South)
Taxi1: Fuel: 8, Location: (0,0)
Taxi2: Fuel: 7, Location: (4,4)
Passenger1: Location: (4, 0), Destination: (4, 3)
Passenger2: Location: (0, 4), Destination: (4, 3)


Now, we will try a random solution, and see how long it takes to finish an episode. In particular, at every time step, each of the taxis will choose a random action. We can easily do this using the env.action_space.sample() function, which returns a random sample of the action space corresponding to a random action by each taxi.

In [21]:
env.reset()
epochs = 0
penalties, reward = 0, 0

frames = [] # for animation

done = False

while not done:
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)

    if reward[0] == -10 or reward[1] == -10:
        penalties += 1
    
    # Put each rendered frame into dict for animation
    frames.append({
        'frame': env.render(mode='ansi'),
        'state': state,
        'action': action,
        'reward': reward
        }
    )

    epochs += 1
    
    
print("Timesteps taken: {}".format(epochs))
print("Penalties incurred: {}".format(penalties))

Timesteps taken: 16855
Penalties incurred: 15645


Here, as we have seen, the episode takes a very long time to finish, and the taxis incur many penalties, meaning that they have tried many invalid moves. We can see this in action by replaying the frames of that episode.
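
Besides replaying the episode, the recorded frames can be summarised by counting how often each reward value occurred. The sketch below works on a list shaped like the `frames` list built above, with one `'reward'` list per step; the three example frames are invented for illustration and are not taken from the run above.

```python
from collections import Counter


def summarize_rewards(frames):
    """Count how often each per-taxi reward value appears across all frames."""
    counts = Counter()
    for frame in frames:
        counts.update(frame["reward"])  # one reward per taxi in each frame
    return counts


# Invented frames with the same 'reward' key as the frames recorded above.
example_frames = [
    {"reward": [-1, -1]},
    {"reward": [-20, -10]},
    {"reward": [-10, -1]},
]
counts = summarize_rewards(example_frames)
for value in sorted(counts):
    print(f"reward {value}: {counts[value]} time(s)")
```

A breakdown like this separates the different kinds of penalties, whereas the loop above only counts steps in which some taxi received exactly -10.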

In [None]:
from IPython.display import clear_output
from time import sleep

def print_frames(frames):
    for i, frame in enumerate(frames):
        clear_output(wait=True)
        print(frame['frame'])
        print(f"Timestep: {i + 1}")
        print(f"State: {frame['state']}")
        print(f"Action: {frame['action']}")
        print(f"Reward: {frame['reward']}")
        sleep(.1)
        
print_frames(frames)

+---------+
|X: |[34;1mF[0m: :[43mX[0m|
| : | : :[41m_[0m|
| :[34;1m [0m: : : |
| | : | : |
|[35mX[0m| :G|[35mX[0m: |
+---------+
  (East ,North)
Taxi1: Fuel: 0, Location: (0,4)
Taxi2: Fuel: 0, Location: (1,4)
Passenger1: Location: (0, 2), Destination: (4, 0)
Passenger2: Location: (2, 1), Destination: (4, 3)

Timestep: 3885
State: [[[3, 3], [4, 3]], [0, 0], [[2, 1], [3, 3]], [[4, 0], [4, 3]], [-1, -1]]
Action: [2 1]
Reward: [-20, -10]
