## Variation on the Taxi-Grid Environment

The multi-taxi environment was created by Kevin Huang and Howie Guo from Harvard College and Ofir Abu from Hebrew University. We thank them for their hard work.

### Motivation

Our project is based on the taxi gym environment (https://gym.openai.com/envs/Taxi-v3/), a basic "grid-world" environment used for research in artificial intelligence and reinforcement learning.

In [None]:
import gym

env = gym.make("Taxi-v3").env

env.reset()
env.render()

+---------+
|[35mR[0m: | : :G|
| : | : : |
| : : : : |
| | : | :[43m [0m|
|Y| : |[34;1mB[0m: |
+---------+



This is the original taxi grid environment, which is available as part of the OpenAI gym module. The environment models a taxi that moves within the grid world with the goal of picking up a passenger and bringing her to her destination.

Our modifications to the original environment were created with four main goals:  
**First**, the original taxi grid environment is a single-agent environment. We were interested in multi-agent reinforcement learning, so we added an option to use multiple taxis.  
**Second**, by adding a fuel constraint, we can test whether agents are able to plan around a long-term fuel limit, and the environment can be used in research on limited resources.  
**Third**, our environment can display taxi collisions and either allow or restrict them.  
**Fourth**, on top of the fuel constraint we added a fuel-type constraint (we can model gas/fuel taxis with matching fuel stations).

### Installation

To work with our custom environments, we first need to install them. The environments are packaged together in a module called "multitaxienv", which is hosted on GitHub, so all of them can be installed at once.

In [None]:
!pip install git+https://github.com/sarah-keren/MutliTaxiEnv.git --upgrade

Collecting git+https://github.com/sarah-keren/MutliTaxiEnv.git
  Cloning https://github.com/sarah-keren/MutliTaxiEnv.git to /tmp/pip-req-build-ycshx639
  Running command git clone -q https://github.com/sarah-keren/MutliTaxiEnv.git /tmp/pip-req-build-ycshx639
Building wheels for collected packages: MultiTaxiEnv
  Building wheel for MultiTaxiEnv (setup.py) ... done
  Created wheel for MultiTaxiEnv: filename=MultiTaxiEnv-0.2-cp36-none-any.whl size=8117 sha256=0f23baee05dae777a93e6d5fa72f07c32cb3bad72ed34cec5b18dd021f6b7598
  Stored in directory: /tmp/pip-ephem-wheel-cache-h0cf1sdl/wheels/a7/8b/8f/2c7828f791585f08f092f39fb14023ccf64d14a1a3d68143ef
Successfully built MultiTaxiEnv
Installing collected packages: MultiTaxiEnv
  Found existing installation: MultiTaxiEnv 0.2
    Uninstalling MultiTaxiEnv-0.2:
      Successfully uninstalled MultiTaxiEnv-0.2
Successfully installed MultiTaxiEnv-0.2


You'll note that we have only one class file. To support all the variations of our environment, we use the parameters of our class's `__init__`.

After installing the environment, run the following cell:

In [None]:
# used for debugging purposes
%load_ext autoreload
%autoreload 2
from multitaxienv.taxi_environment import TaxiEnv

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Initializing Environment - Hyper-Parameters of the Class

In [None]:
env = TaxiEnv(num_taxis = 2, num_passengers = 2, max_fuel = None,
                 taxis_capacity = None, collision_sensitive_domain = False,
                 fuel_type_list = None, option_to_stand_by = True)

1. *num_taxis* - the number of taxis. Default is 2, but you can select any positive integer.
2. *num_passengers* - the number of passengers. As with *num_taxis*, the default is 2 and any positive integer is allowed.
3. *max_fuel* - a list where the i-th element is the maximum fuel of taxi i. Each taxi starts with its maximum fuel. Default is **None**, which sets the fuel limit to np.inf (no fuel model).
4. *taxis_capacity* - a list with the passenger limit of each taxi. Default is 1 for each taxi.
5. *collision_sensitive_domain* - a Boolean that specifies whether collisions are shown and affect the domain (True) or are ignored (False).
6. *fuel_type_list* - the fuel type ('**F**uel' or '**G**as') of each taxi. Default is 'F'.
7. *option_to_stand_by* - whether taxis can stand in place (True) or not (False). Default is True.
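
The defaults listed above can be pictured with a small helper. This is only an illustrative sketch: `resolve_taxi_defaults` is a hypothetical function that is not part of the multitaxienv package, and it assumes the defaults behave exactly as the list describes (infinite fuel, capacity 1 and fuel type 'F' for every taxi).

```python
import math


def resolve_taxi_defaults(num_taxis=2, max_fuel=None,
                          taxis_capacity=None, fuel_type_list=None):
    """Hypothetical helper: expand each None into one value per taxi."""
    if num_taxis < 1:
        raise ValueError("num_taxis must be a positive integer")
    settings = {
        "max_fuel": [math.inf] * num_taxis if max_fuel is None else list(max_fuel),
        "taxis_capacity": [1] * num_taxis if taxis_capacity is None else list(taxis_capacity),
        "fuel_type_list": ["F"] * num_taxis if fuel_type_list is None else list(fuel_type_list),
    }
    # Every per-taxi list must have exactly one entry per taxi.
    for name, values in settings.items():
        if len(values) != num_taxis:
            raise ValueError(f"{name} needs one entry per taxi")
    return settings


defaults = resolve_taxi_defaults()
limited = resolve_taxi_defaults(num_taxis=2, max_fuel=[5, 5])
print(defaults)
print(limited["max_fuel"])
```

The point of the sketch is that every list-valued parameter has one entry per taxi, so a two-taxi domain with limited fuel needs a two-element *max_fuel* list.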

In [None]:
env = TaxiEnv()
env.reset()
env.s = 1022
env.render()

+---------+
|[35mX[0m:[43m_[0m|F: :[34;1m[34;1mX[0m[0m|
| : | : : |
| :[41m_[0m: : : |
| | : | : |
|[35mX[0m| :G|X: |
+---------+
Taxi1-YELLOW: Fuel: inf, Location: (0,1), Collided: False
Taxi2-RED: Fuel: inf, Location: (2,1), Collided: False
Passenger1: Location: (0, 4), Destination: (0, 0)
Passenger2: Location: (0, 4), Destination: (4, 0)


Here we initialized a domain with the default values. As we can see:  
We have **2 taxis**, drawn as the highlighted `_` cells; the description printed below the map tells which taxi has which color.  
The fuel limit is **np.inf**, i.e. there is no fuel constraint.  
There are also **2 passengers**, located at X-marked positions; their destinations are the X cells colored in *magenta*.

Note that for each taxi we also record whether it has collided or not.

### Action Space of the Environment

In [None]:
env.get_available_actions_dictionary()[1]

{0: 'south',
 1: 'north',
 2: 'east',
 3: 'west',
 4: 'pickup',
 5: 'dropoff',
 6: 'turn_engine_on',
 7: 'turn_engine_off',
 8: 'standby',
 9: 'refuel'}

**get_available_actions_dictionary()** returns a tuple where:  
1. 1st element - the action indexes that the specific initialized domain supports.
2. 2nd element - the dictionary above, which maps every action index to its name.  

*Note that without a fuel model (max_fuel = np.inf, the default) action 9 (refuel) is unavailable, and without the option to stand by (option_to_stand_by = False) actions 6, 7 and 8 are unavailable.*

In [None]:
env.get_available_actions_dictionary()[0]

[0, 1, 2, 3, 4, 5, 6, 7, 8]

The first element returned is the action space available to us on the initialized domain.  
**These indexes are the actions we send to the domain when we want to take a step!**
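
The filtering rule from the note above can be sketched in plain Python. `available_action_indexes` is a hypothetical stand-in written for illustration; the real logic lives inside `get_available_actions_dictionary()`, and this sketch only assumes it behaves as the note says.

```python
import math

ACTION_NAMES = {0: 'south', 1: 'north', 2: 'east', 3: 'west',
                4: 'pickup', 5: 'dropoff', 6: 'turn_engine_on',
                7: 'turn_engine_off', 8: 'standby', 9: 'refuel'}


def available_action_indexes(max_fuel=None, option_to_stand_by=True):
    """Hypothetical re-creation of the rule: drop refuel without a fuel
    model, drop the engine/standby actions without the standby option."""
    has_fuel_model = max_fuel is not None and any(f != math.inf for f in max_fuel)
    indexes = []
    for index, name in ACTION_NAMES.items():
        if name == 'refuel' and not has_fuel_model:
            continue
        if name in ('turn_engine_on', 'turn_engine_off', 'standby') and not option_to_stand_by:
            continue
        indexes.append(index)
    return indexes


default_actions = available_action_indexes()
fuel_actions = available_action_indexes(max_fuel=[5, 5])
minimal_actions = available_action_indexes(option_to_stand_by=False)
print(default_actions)
print(fuel_actions)
print(minimal_actions)
```

With the default arguments the sketch yields indexes 0 through 8, which matches the list printed by the cell above.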

Here we send the action "east" for each of our 2 taxis.  
We get back the environment's next state, which is structured as: **taxis locations, current_fuel_list, passengers_start_locations, destinations, passengers_status**.

In [None]:
state, reward, done = env.step([2, 2])
status = "done" if done else "not done"
print(f"The next state is: {state}, the reward for the last action is: {reward}, and the episode is {status}.")

The next state is: [[[0, 1], [2, 2]], [inf, inf], [[0, 4], [0, 4]], [[0, 0], [4, 0]], [0, 0]], the reward for the last action is: [-20, -1], and the episode is not done.
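
To make the printed state easier to read, its five components can be given names. The snippet below is a sketch that copies the state from the output above; the `TaxiState` namedtuple is a hypothetical convenience, not something the environment provides.

```python
import math
from collections import namedtuple

# Field order follows the state layout described in the text.
TaxiState = namedtuple("TaxiState", [
    "taxis_locations", "current_fuel_list", "passengers_start_locations",
    "destinations", "passengers_status",
])

# State copied from the step output shown above (inf written as math.inf).
state = [[[0, 1], [2, 2]], [math.inf, math.inf],
         [[0, 4], [0, 4]], [[0, 0], [4, 0]], [0, 0]]
named = TaxiState(*state)

for i, (loc, fuel) in enumerate(zip(named.taxis_locations, named.current_fuel_list), start=1):
    print(f"Taxi{i}: location={tuple(loc)}, fuel={fuel}")
for i, (start, dest) in enumerate(zip(named.passengers_start_locations, named.destinations), start=1):
    print(f"Passenger{i}: start={tuple(start)}, destination={tuple(dest)}")
```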


In general, to perform an action we use the step function. It returns a tuple that includes the next state, the reward, and whether the episode has ended.  
In this environment, the episode ends when either:
1. *all passengers reached their destinations*.
2. *all taxis are out of fuel*.
3. *all taxis collided*.  
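
The three end conditions can be written as a single predicate. This is a minimal sketch under stated assumptions: `episode_is_done` and its arguments are hypothetical, and the environment's own bookkeeping may represent passenger arrival, fuel and collisions differently.

```python
import math


def episode_is_done(passengers_arrived, fuel_levels, collided):
    """Hypothetical check mirroring the three end conditions listed above."""
    all_delivered = all(passengers_arrived)
    all_out_of_fuel = all(fuel <= 0 for fuel in fuel_levels)
    all_collided = all(collided)
    return all_delivered or all_out_of_fuel or all_collided


# Nothing has happened yet: the episode continues.
running = episode_is_done([False, False], [math.inf, math.inf], [False, False])
# Every passenger has arrived: the episode ends.
delivered = episode_is_done([True, True], [3, 0], [False, False])
# One taxi still has fuel, so running out of fuel does not end the episode yet.
one_empty = episode_is_done([False, False], [0, 2], [False, False])
print(running, delivered, one_empty)
```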

The ***rewards*** are defined in config.py as follows:

In [None]:
taxi_env_rewards = dict(
    step=-1,
    no_fuel=-20,
    bad_pickup=-15,
    bad_dropoff=-15,
    bad_refuel=-10,
    pickup=-1,
    standby_engine_off=-1,
    turn_engine_on=-1,
    turn_engine_off=-1,
    standby_engine_on=-1,
    intermediate_dropoff=-10,
    final_dropoff=100,
    hit_wall=-20,
    collision=-30,
)
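
As a worked example of how these values add up, the sketch below sums the rewards of a short sequence of events for one taxi. It assumes, for illustration only, that each action of a taxi maps to exactly one entry of the table; `total_reward` is a hypothetical helper and the event sequences are made up.

```python
taxi_env_rewards = dict(
    step=-1, no_fuel=-20, bad_pickup=-15, bad_dropoff=-15, bad_refuel=-10,
    pickup=-1, standby_engine_off=-1, turn_engine_on=-1, turn_engine_off=-1,
    standby_engine_on=-1, intermediate_dropoff=-10, final_dropoff=100,
    hit_wall=-20, collision=-30,
)


def total_reward(events):
    """Sum the table entries for a list of event names (hypothetical helper)."""
    return sum(taxi_env_rewards[event] for event in events)


# Two moves, a pickup, one more move, then a successful final dropoff.
good_trip = ["step", "step", "pickup", "step", "final_dropoff"]
# Driving into a wall, one move, then a pickup where no passenger is waiting.
bad_trip = ["hit_wall", "step", "bad_pickup"]

print(total_reward(good_trip))  # 96
print(total_reward(bad_trip))   # -36
```

Under this reading, the reward list `[-20, -1]` returned by the earlier step would correspond to a `hit_wall` for the first taxi and a plain `step` for the second.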

Now let's try to navigate to the fuel station:

In [None]:
env.render()

+---------+
|[35mX[0m:[43m_[0m|F: :[34;1m[34;1mX[0m[0m|
| : | : : |
| : :[41m_[0m: : |
| | : | : |
|[35mX[0m| :G|X: |
+---------+
  (east ,east)
Taxi1-YELLOW: Fuel: inf, Location: (0,1), Collided: False
Taxi2-RED: Fuel: inf, Location: (2,2), Collided: False
Passenger1: Location: (0, 4), Destination: (0, 0)
Passenger2: Location: (0, 4), Destination: (4, 0)


In [None]:
env.step([1, 0])  # north, south
env.render()

+---------+
|[35mX[0m:[43m_[0m|F: :[34;1m[34;1mX[0m[0m|
| : | : : |
| : : : : |
| | : | : |
|[35mX[0m| :[41mG[0m|X: |
+---------+
  (north ,south)
Taxi1-YELLOW: Fuel: inf, Location: (0,1), Collided: False
Taxi2-RED: Fuel: inf, Location: (4,2), Collided: False
Passenger1: Location: (0, 4), Destination: (0, 0)
Passenger2: Location: (0, 4), Destination: (4, 0)


### **Now let's play with a limited fuel model**  
Notice:  
1. The fuel status in the description below the map changes for each taxi that moves.
2. A taxi can refuel only at stations with a fuel type matching the one defined for it in `__init__`.
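
The fuel-type rule in point 2 can be sketched as follows. Both functions are hypothetical, and the sketch assumes that a station drawn as `F` serves taxis of type 'F' and a station drawn as `G` serves taxis of type 'G', and that a successful refuel restores the maximum fuel.

```python
STATION_SYMBOLS = ("F", "G")


def can_refuel(taxi_fuel_type, cell_symbol):
    """True only on a station whose symbol matches the taxi's fuel type."""
    return cell_symbol in STATION_SYMBOLS and cell_symbol == taxi_fuel_type


def refuel(current_fuel, max_fuel, taxi_fuel_type, cell_symbol):
    """Return the fuel after a refuel attempt: full on a matching station,
    unchanged everywhere else."""
    if can_refuel(taxi_fuel_type, cell_symbol):
        return max_fuel
    return current_fuel


print(refuel(2, 5, "F", "F"))  # matching station
print(refuel(2, 5, "F", "G"))  # wrong fuel type
print(refuel(2, 5, "F", " "))  # not a station at all
```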

In [None]:
env_limitted_fuel = TaxiEnv(num_taxis = 2, num_passengers = 2, max_fuel = [5, 5],
                            taxis_capacity = None, collision_sensitive_domain = False,
                            fuel_type_list = None, option_to_stand_by = True)

In [None]:
env_limitted_fuel.reset()
env_limitted_fuel.render()

+---------+
|X: |F: :[35mX[0m|
| : | : : |
| :[41m_[0m: : : |
| | : | : |
|[35m[34;1mX[0m[0m|[43m_[0m:G|[34;1mX[0m: |
+---------+
Taxi1-YELLOW: Fuel: 5, Location: (4,1), Collided: False
Taxi2-RED: Fuel: 5, Location: (2,1), Collided: False
Passenger1: Location: (4, 3), Destination: (4, 0)
Passenger2: Location: (4, 0), Destination: (0, 4)


We will now navigate to the fuel station and refuel.
Notice how the fuel of each taxi changes as it moves.

In [None]:
env_limitted_fuel.step([0, 3])  # south, west
env_limitted_fuel.step([0, 3])  # south, west
env_limitted_fuel.render()

+---------+
|X: |F: :[35mX[0m|
| : | : : |
|[41m_[0m: : : : |
| | : | : |
|[35m[34;1mX[0m[0m|[43m_[0m:G|[34;1mX[0m: |
+---------+
  (south ,west)
Taxi1-YELLOW: Fuel: 5, Location: (4,1), Collided: False
Taxi2-RED: Fuel: 4, Location: (2,0), Collided: False
Passenger1: Location: (4, 3), Destination: (4, 0)
Passenger2: Location: (4, 0), Destination: (0, 4)


In [None]:
env_limitted_fuel.step([2, 9])  # east, refuel
env_limitted_fuel.render()

+---------+
|[35m[34;1mX[0m[0m: |[41mF[0m: :[35m[34;1mX[0m[0m|
| : | : : |
| : : : : |
| | : | : |
|X| :G|X:[43m_[0m|
+---------+
  (east ,refuel)
Taxi1-YELLOW: Fuel: 3, Location: (4,4), Collided: False
Taxi2-RED: Fuel: 5, Location: (0,2), Collided: False
Passenger1: Location: (0, 4), Destination: (0, 0)
Passenger2: Location: (0, 0), Destination: (0, 4)


In [None]:
env_limitted_fuel.step([1, 0])  # north, south
env_limitted_fuel.render()

+---------+
|[35mX[0m: |F: :X|
| : | : : |
| : : : : |
| |[41m_[0m:[43m_[0m| : |
|[35mX[0m| :G|[34;1m[34;1mX[0m[0m: |
+---------+
  (north ,south)
Taxi1-YELLOW: Fuel: 4, Location: (3,2), Collided: False
Taxi2-RED: Fuel: 2, Location: (3,1), Collided: False
Passenger1: Location: (4, 3), Destination: (0, 0)
Passenger2: Location: (4, 3), Destination: (4, 0)


In [None]:
state, reward, done = env_limitted_fuel.step([1, 9])  # north, refuel
env_limitted_fuel.render()

+---------+
|[35mX[0m: |F: :X|
| : | : : |
| : :[43m_[0m: : |
| |[41m_[0m: | : |
|[35mX[0m| :G|[34;1m[34;1mX[0m[0m: |
+---------+
  (north ,refuel)
Taxi1-YELLOW: Fuel: 3, Location: (2,2), Collided: False
Taxi2-RED: Fuel: 2, Location: (3,1), Collided: False
Passenger1: Location: (4, 3), Destination: (0, 0)
Passenger2: Location: (4, 3), Destination: (4, 0)


A successful refuel brings a taxi back to its maximum fuel, as in the earlier output where the red taxi stands on the F station with fuel 5. In the last output the red taxi is at (3,1), which is not a fuel station, so the refuel action has no effect and its fuel stays at 2.

Another thing to note is that the state of the environment is encoded using a single number. Sometimes it is useful to decode what that number actually means (for example, when using deep Q-learning). We can do that using the decode function.

### Using a Different Map  
You can also pass your own map as a list of strings and specify the maximum fuel. The map must be formatted in the same way as above, with '+' at the corners and '-' and '|' marking the boundaries. You can have as many destinations and fuel stations as you want!

In [None]:
custom_map = [
    '+---------------+',
    '| : :X| :F: : : |',
    '|X: : | : | :X| |',
    '| : : : : : : | |',
    '| :X:F| :X| : :X|',
    '+---------------+',
]
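
Before passing a custom map to the environment, it can help to sanity-check its layout. The function below is a hypothetical check, not part of multitaxienv. It assumes what the maps in this notebook suggest: all rows have equal length, and cells sit at the odd column positions, separated by ':' or '|'.

```python
custom_map = [
    '+---------------+',
    '| : :X| :F: : : |',
    '|X: : | : | :X| |',
    '| : : : : : : | |',
    '| :X:F| :X| : :X|',
    '+---------------+',
]


def check_map(domain_map):
    """Hypothetical layout check; returns how many X, F and G cells exist."""
    width = len(domain_map[0])
    if any(len(row) != width for row in domain_map):
        raise ValueError("all rows must have the same length")
    for border in (domain_map[0], domain_map[-1]):
        if border[0] != '+' or border[-1] != '+' or set(border[1:-1]) != {'-'}:
            raise ValueError("top and bottom rows must look like +---+")
    for row in domain_map[1:-1]:
        if row[0] != '|' or row[-1] != '|':
            raise ValueError("inner rows must start and end with '|'")
    # Assumption: grid cells are the characters at odd column indexes.
    cells = ''.join(row[1:-1:2] for row in domain_map[1:-1])
    return {symbol: cells.count(symbol) for symbol in 'XFG'}


counts = check_map(custom_map)
print(counts)
```

For the map above the check reports six X cells, two F stations and no G station.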

In [None]:
env_new_map = TaxiEnv(domain_map=custom_map)
env_new_map.reset()
env_new_map.render()

+---------------+
| : :[35m[35mX[0m[0m|[43m_[0m:F: : : |
|[34;1mX[0m: : | : | :[34;1mX[0m| |
| : : : : : :[41m_[0m| |
| :X:F| :X| : :X|
+---------------+
Taxi1-YELLOW: Fuel: inf, Location: (0,3), Collided: False
Taxi2-RED: Fuel: inf, Location: (2,6), Collided: False
Passenger1: Location: (1, 0), Destination: (0, 2)
Passenger2: Location: (1, 6), Destination: (0, 2)


### Driving Passengers

Here we'll show how a taxi can pick up a passenger and drop them off. Things to notice:  
1. The passenger's location changes when they are picked up and dropped off.
2. A taxi can only carry as many passengers as its capacity allows.
3. A passenger can be dropped off at any free cell on the map.
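
The capacity rule can be illustrated with a short sketch. `try_pickup` is a hypothetical function, and it additionally assumes that a pickup only succeeds when the taxi stands on the passenger's cell, as in the original Taxi environment.

```python
def try_pickup(onboard, capacity, taxi_location, passenger_location, passenger_id):
    """Return the passengers on board after a pickup attempt (hypothetical rule)."""
    same_cell = tuple(taxi_location) == tuple(passenger_location)
    has_room = len(onboard) < capacity
    if same_cell and has_room and passenger_id not in onboard:
        return onboard + [passenger_id]
    return list(onboard)


# Taxi is one cell away from the passenger: nothing happens.
missed = try_pickup([], 1, (1, 4), (0, 4), passenger_id=1)
# Taxi is on the passenger's cell and has room: the passenger boards.
picked = try_pickup([], 1, (0, 4), (0, 4), passenger_id=1)
# Taxi is already full (capacity 1): the second passenger stays behind.
full = try_pickup([1], 1, (4, 0), (4, 0), passenger_id=2)
print(missed, picked, full)
```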

In [None]:
passengers_example_env = TaxiEnv(num_taxis = 2, num_passengers = 2, max_fuel = [5, 5],
                            taxis_capacity = None, collision_sensitive_domain = False,
                            fuel_type_list = None, option_to_stand_by = True)

In [None]:
passengers_example_env.reset()
passengers_example_env.render()

+---------+
|[35mX[0m: |F: :[34;1mX[0m|
| : | : :[41m_[0m|
| : : : :[43m_[0m|
| | : | : |
|[34;1mX[0m| :G|[35mX[0m: |
+---------+
Taxi1-YELLOW: Fuel: 5, Location: (2,4), Collided: False
Taxi2-RED: Fuel: 5, Location: (1,4), Collided: False
Passenger1: Location: (0, 4), Destination: (4, 3)
Passenger2: Location: (4, 0), Destination: (0, 0)


In [None]:
passengers_example_env.step([1, 1])  # north, north
passengers_example_env.render()

+---------+
|[35mX[0m: |F: :[41m[34;1mX[0m[0m|
| : | : :[43m_[0m|
| : : : : |
| | : | : |
|[34;1mX[0m| :G|[35mX[0m: |
+---------+
  (north ,north)
Taxi1-YELLOW: Fuel: 4, Location: (1,4), Collided: False
Taxi2-RED: Fuel: 4, Location: (0,4), Collided: False
Passenger1: Location: (0, 4), Destination: (4, 3)
Passenger2: Location: (4, 0), Destination: (0, 0)


In [None]:
passengers_example_env.step([4, 1])  # pickup, north
passengers_example_env.render()

+---------+
|[35mX[0m: |F: :[41m[34;1mX[0m[0m|
| : | : :[43m_[0m|
| : : : : |
| | : | : |
|[34;1mX[0m| :G|[35mX[0m: |
+---------+
  (pickup ,north)
Taxi1-YELLOW: Fuel: 4, Location: (1,4), Collided: False
Taxi2-RED: Fuel: 4, Location: (0,4), Collided: False
Passenger1: Location: (0, 4), Destination: (4, 3)
Passenger2: Location: (4, 0), Destination: (0, 0)


In [None]:
passengers_example_env.step([0, 1])  # south, north
passengers_example_env.render()

+---------+
|[35mX[0m: |F: :[41m[34;1mX[0m[0m|
| : | : : |
| : : : :[43m_[0m|
| | : | : |
|[34;1mX[0m| :G|[35mX[0m: |
+---------+
  (south ,north)
Taxi1-YELLOW: Fuel: 3, Location: (2,4), Collided: False
Taxi2-RED: Fuel: 4, Location: (0,4), Collided: False
Passenger1: Location: (0, 4), Destination: (4, 3)
Passenger2: Location: (4, 0), Destination: (0, 0)


In [None]:
passengers_example_env.step([5, 1])  # dropoff, north
passengers_example_env.render()

+---------+
|[35mX[0m: |F: :[41m[34;1mX[0m[0m|
| : | : : |
| : : : :[43m_[0m|
| | : | : |
|[34;1mX[0m| :G|[35mX[0m: |
+---------+
  (dropoff ,north)
Taxi1-YELLOW: Fuel: 3, Location: (2,4), Collided: False
Taxi2-RED: Fuel: 4, Location: (0,4), Collided: False
Passenger1: Location: (0, 4), Destination: (4, 3)
Passenger2: Location: (4, 0), Destination: (0, 0)


### Collisions - Allow or Enforce?

By using the hyper-parameter *collision_sensitive_domain* in `__init__`, we can choose how to treat collisions.
Below we demonstrate one collision scenario in the environment, but the important things to remember are:  
1. When taxis can stand by and the current step is about to cause a collision, **the later taxi to take action is forced to stand by**.
2. When all taxis have collided, the episode is done.

The collision status is printed in every taxi's status line.
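
One way to picture the first rule is to resolve the taxis' moves one after another. The sketch below is a simplified guess at that behavior, not the environment's actual implementation: `resolve_moves` is hypothetical, it ignores walls, and it treats "forced to stand by" as "stays in its current cell".

```python
def resolve_moves(locations, targets):
    """Hypothetical sequential resolution: a taxi whose target cell is
    already occupied by another taxi stays where it is."""
    current = [tuple(loc) for loc in locations]
    for i, target in enumerate(targets):
        target = tuple(target)
        occupied = {loc for j, loc in enumerate(current) if j != i}
        if target not in occupied:
            current[i] = target
    return current


# Both taxis try to enter cell (0, 1); the first one gets it,
# the later one is held in place.
result = resolve_moves([(0, 0), (0, 2)], [(0, 1), (0, 1)])
print(result)
```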

In [None]:
import numpy as np
collision_env_example = TaxiEnv(num_taxis = 3, num_passengers = 2, max_fuel = [np.inf]*3,
                            taxis_capacity = None, collision_sensitive_domain = True,
                            fuel_type_list = None, option_to_stand_by = False)

In [None]:
collision_env_example.reset()
collision_env_example.render()

+---------+
|X: |[43mF[0m: :[35mX[0m|
| : | : : |
| : : :[47m_[0m: |
| | : | :[41m_[0m|
|[35m[34;1mX[0m[0m| :G|[34;1mX[0m: |
+---------+
Taxi1-YELLOW: Fuel: inf, Location: (0,2), Collided: False
Taxi2-RED: Fuel: inf, Location: (3,4), Collided: False
Taxi3-WHITE: Fuel: inf, Location: (2,3), Collided: False
Passenger1: Location: (4, 3), Destination: (4, 0)
Passenger2: Location: (4, 0), Destination: (0, 4)


In [None]:
collision_env_example.step([0, 1, 0])  # south, north, south
collision_env_example.render()

+---------+
|X: |F: :[35mX[0m|
| : |[43m_[0m: : |
| : : : :[41m_[0m|
| | : |[47m_[0m: |
|[35m[34;1mX[0m[0m| :G|[34;1mX[0m: |
+---------+
  (south ,north ,south)
Taxi1-YELLOW: Fuel: inf, Location: (1,2), Collided: False
Taxi2-RED: Fuel: inf, Location: (2,4), Collided: False
Taxi3-WHITE: Fuel: inf, Location: (3,3), Collided: False
Passenger1: Location: (4, 3), Destination: (4, 0)
Passenger2: Location: (4, 0), Destination: (0, 4)


In [None]:
collision_env_example.step([0, 0, 0])  # all south
collision_env_example.step([0, 0, 0])  # all south
collision_env_example.render()

+---------+
|X: |F: :[35mX[0m|
| : | : : |
| : : : : |
| | :[43m_[0m| : |
|[35m[34;1mX[0m[0m| :G|[47m[34;1mX[0m[0m:[41m_[0m|
+---------+
  (south ,south ,south)
Taxi1-YELLOW: Fuel: inf, Location: (3,2), Collided: False
Taxi2-RED: Fuel: inf, Location: (4,4), Collided: False
Taxi3-WHITE: Fuel: inf, Location: (4,3), Collided: False
Passenger1: Location: (4, 3), Destination: (4, 0)
Passenger2: Location: (4, 0), Destination: (0, 4)
