<a href="https://colab.research.google.com/github/sarah-keren/MultiTaxiEnv/blob/master/notebooks/MultiTaxiOriginalEnv.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


##Variation on the Taxi-Grid Environment
The multi-taxi environment was created by Kevin Huang and Howie Guo from Harvard College and Ofir Abu from Hebrew University. Many thanks to them for their hard work.

###Motivation
Our project is based on the taxi gym environment (https://gym.openai.com/envs/Taxi-v3/), a basic "grid-world" environment used for research in artificial intelligence and reinforcement learning.

In [1]:
import gym

env = gym.make("Taxi-v3").env

env.reset()
env.render()

+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+




This is the original taxi grid environment, which is available as part of the OpenAI gym module. The environment models a taxi that moves within the gridworld with the goal of picking up a passenger and bringing her to her destination.

Our modifications to the original environment were created with four main goals:

**First**, the original taxi grid environment is a single-agent environment. We were interested in allowing multi-agent reinforcement learning, so we created an option to use multiple taxis.

**Second**, by adding a fuel constraint, we can test whether agents are able to account for the long-term fuel limit, and the environment can be used in research on limited resources.

**Third**, our environment can display taxi collisions and either allow or restrict them.

**Fourth**, we added a fuel-type constraint on top of the fuel constraint (we can model gas/fuel taxis with suitable fuel stations).

###Installation
To work with our custom environments, the first step is to install them. As the environments are packaged together into a module called "multitaxienv", which can be found on GitHub, it is easy to install all of them at once.

In [2]:
!git clone https://github.com/sarah-keren/MultiTaxiEnv.git

Cloning into 'MultiTaxiEnv'...
remote: Enumerating objects: 333, done.
remote: Counting objects: 100% (92/92), done.
remote: Compressing objects: 100% (70/70), done.
remote: Total 333 (delta 44), reused 58 (delta 19), pack-reused 241
Receiving objects: 100% (333/333), 152.41 KiB | 2.00 MiB/s, done.
Resolving deltas: 100% (192/192), done.



You'll note that we have only one class file. To support all the variations of our environment, we use the parameters of our class's init method.

After installing the environment, run the following lines:

In [3]:
%cd MultiTaxiEnv
from taxi_environment import TaxiEnv

/content/MultiTaxiEnv
+---------+
|X:_|F: :X|
| : | : : |
| : : : :_|
| | : | : |
|X| :G|X: |
+---------+
Taxi1-YELLOW: Fuel: 100, Location: (0,1), Collided: False
Taxi2-RED: Fuel: 100, Location: (2,4), Collided: False
Passenger1: Location: (0, 4), Destination: (4, 3)
Passenger2: Location: (0, 0), Destination: (4, 0)
Done: False, {'taxi_1': False, 'taxi_2': False, '__all__': False}
Passengers Status's: [2, 2]


###Initializing Environment - Hyper-Parameters of the Class

In [4]:
# Alternative with two passengers and the default map:
# env = TaxiEnv(num_taxis=2, num_passengers=2, max_fuel=None,
#               taxis_capacity=None, collision_sensitive_domain=True,
#               fuel_type_list=None, option_to_stand_by=False)
env = TaxiEnv(num_taxis=2, num_passengers=1, max_fuel=None,
              domain_map=None, taxis_capacity=None, collision_sensitive_domain=True,
              fuel_type_list=None, option_to_stand_by=False)



1.   num_taxis - default is 2, but you can select any integer $\geq 1$.
2.   num_passengers - same as num_taxis.
3.   max_fuel - a list where the i-th element is the maximum fuel of taxi number i. Each taxi starts with its maximum fuel. Default is None, which sets the fuel limit to np.inf (no fuel model).
4.   taxis_capacity - a list with the passenger limit of each taxi; default is 1 for each taxi.
5.   collision_sensitive_domain - Boolean specifying whether collisions are shown and affect the domain (True) or are ignored (False).
6.   fuel_type_list - the fuel type ('Fuel' / 'Gas') of each taxi; default is 'F'.
7.   option_to_stand_by - whether taxis can stand in place (True) or not (False); default is True.
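
Since several of these parameters are per-taxi lists, it can help to check their lengths before constructing the environment. The following is a minimal sketch: build_taxi_env_kwargs is a hypothetical helper that is not part of the repository, and the fuel-type values 'F' and 'G' are an assumption based on the station letters on the map.

```python
def build_taxi_env_kwargs(num_taxis, max_fuel=None, taxis_capacity=None,
                          fuel_type_list=None):
    """Collect TaxiEnv keyword arguments (hypothetical helper).

    Checks that every per-taxi list that is given has one entry per taxi.
    """
    per_taxi = {
        "max_fuel": max_fuel,
        "taxis_capacity": taxis_capacity,
        "fuel_type_list": fuel_type_list,
    }
    for name, values in per_taxi.items():
        if values is not None and len(values) != num_taxis:
            raise ValueError(f"{name} needs {num_taxis} entries, got {len(values)}")
    return {"num_taxis": num_taxis, **per_taxi}


# 'F' and 'G' are assumed fuel-type values, not confirmed by the repository.
kwargs = build_taxi_env_kwargs(2, max_fuel=[30, 50], taxis_capacity=[1, 2],
                               fuel_type_list=["F", "G"])
# With the repository cloned, the result could be passed on as:
# env = TaxiEnv(num_passengers=2, **kwargs)
```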



In [5]:
env = TaxiEnv()
env.reset()
env.s = 1022
env.render()

+---------+
|X: |F: :X|
| :_| : : |
| : : : : |
| | : | : |
|X| :G|X: |
+---------+
Taxi1-YELLOW: Fuel: 100, Location: (4,3), Collided: False
Taxi2-RED: Fuel: 100, Location: (1,1), Collided: False
Passenger1: Location: (0, 0), Destination: (0, 4)
Done: False, {'taxi_1': False, 'taxi_2': False, '__all__': False}
Passengers Status's: [2]


Here we initialized a domain with the default values. As we can see:
We have **2 taxis**, represented by the highlighted boxes; the description printed below the map shows which taxi has which color.
We also have an **np.inf** fuel limit.
There are also **2 passengers** at the X-marked positions, with destinations at the X colored in magenta.

Note that for each taxi we record whether it has collided.

###Action Space of the Environment

In [6]:
env.get_available_actions_dictionary()[1]

{0: 'south',
 1: 'north',
 2: 'east',
 3: 'west',
 4: 'pickup',
 5: 'dropoff',
 6: 'refuel',
 7: 'turn_engine_on',
 8: 'turn_engine_off',
 9: 'standby',
 10: 'refuel'}

**get_available_actions_dictionary()** returns a tuple where:

1.  1st element - the indexes of the available actions that the specific initialized domain supports.
2.  2nd element - the dictionary above, which specifies the names of all actions and their indexes.

Note that in the no-fuel model (induced by max_fuel = np.inf) and in the model without the standby option, we won't be able to use actions 9 and 6, 7, 8, respectively.

In [7]:
env.get_available_actions_dictionary()[0]

[0, 1, 2, 3, 4, 5, 7, 8, 9, 10]

The first element returned gives the action space available to us on the initialized domain.
**Those indexes are the actions we send to the domain when we want to make a step!**
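
To avoid hard-coding indexes, the two returned elements can be combined into a name lookup. This is a small sketch that copies the dictionary and the index list printed above; action_index is a hypothetical helper, and the per-taxi dictionary keys "taxi_1" and "taxi_2" follow the names the environment prints.

```python
# Copied from the outputs above.
ACTION_NAMES = {0: "south", 1: "north", 2: "east", 3: "west", 4: "pickup",
                5: "dropoff", 6: "refuel", 7: "turn_engine_on",
                8: "turn_engine_off", 9: "standby", 10: "refuel"}
AVAILABLE = [0, 1, 2, 3, 4, 5, 7, 8, 9, 10]


def action_index(name, names=ACTION_NAMES, available=AVAILABLE):
    """Return the first available index whose action name matches (hypothetical helper)."""
    for index in available:
        if names[index] == name:
            return index
    raise ValueError(f"action {name!r} is not available in this domain")


# One action per taxi, keyed by taxi name.
joint_action = {"taxi_1": action_index("east"), "taxi_2": action_index("east")}
```

Because only available indexes are searched, a name that appears twice in the dictionary (such as "refuel") resolves to the index that the current domain supports.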

Here we send the action "east" for each of our 2 taxis.
We get the environment's next state, which is built as: **taxis locations, current_fuel_list, passengers_start_locations, destinations, passengers_status**.

In [14]:
state, reward, done, _ = env.step({"taxi_1": 2, "taxi_2": 2})
print(f"taxi_1 next state is: {state['taxi_1']}, taxi_2 next state is: {state['taxi_2']}")
print(f"the reward for the last action is: {reward['taxi_1']}, {reward['taxi_2']} for the two taxis respectively")
print(f"and the episode is {'' if done['__all__'] else 'not '}done.")

taxi_1 next state is: [[ 4  4  0  0 99  0  0  0  0  4  2]], taxi_2 next state is: [[  1   1   0   0 100   0   0   0   0   4   2]]
the reward for the last action is: -2, -2 for the two taxis respectively
and the episode is not done.
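
The printed vector for taxi_1 can be unpacked using the field order listed above. The sketch below assumes that each location is a (row, column) pair, that fuel has one entry per taxi, and that the fields are simply concatenated in that order; decode_state is a hypothetical helper and not part of the repository.

```python
def decode_state(vector, num_taxis, num_passengers):
    """Split a flat state vector into named fields (assumed layout, see lead-in)."""
    values = [int(v) for v in vector]
    expected = 3 * num_taxis + 5 * num_passengers
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")

    pos = 0

    def take(count):
        # Consume the next `count` values from the vector.
        nonlocal pos
        chunk = values[pos:pos + count]
        pos += count
        return chunk

    def pairs(flat):
        # Group a flat list into (row, column) tuples.
        return [tuple(flat[i:i + 2]) for i in range(0, len(flat), 2)]

    return {
        "taxi_locations": pairs(take(2 * num_taxis)),
        "fuel": take(num_taxis),
        "passenger_starts": pairs(take(2 * num_passengers)),
        "destinations": pairs(take(2 * num_passengers)),
        "passenger_status": take(num_passengers),
    }


# taxi_1's vector as printed above (2 taxis, 1 passenger).
decoded = decode_state([4, 4, 0, 0, 99, 0, 0, 0, 0, 4, 2],
                       num_taxis=2, num_passengers=1)
```

Under these assumptions the passenger start (0, 0), destination (0, 4) and status 2 agree with the rendered description; the zeros in the second taxi's slots are reproduced exactly as printed.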


Generally, to perform an action we use the step function. It returns a tuple that includes the next state, the reward, and whether the episode has ended.
In this environment, the episode ends when any of the following holds:

1.  all passengers reached their destinations.
2.  all taxis are out of fuel.
3.  all taxis collided.
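
The three conditions can be written as a single predicate. This is only an illustrative sketch with hypothetical argument names; the environment computes its own done flags internally and may do so differently.

```python
def episode_done(passengers_arrived, fuel_levels, collided):
    """Return True if any of the three listed end conditions holds.

    passengers_arrived: one bool per passenger
    fuel_levels:        remaining fuel per taxi
    collided:           one bool per taxi
    """
    all_arrived = all(passengers_arrived)
    all_out_of_fuel = all(fuel <= 0 for fuel in fuel_levels)
    all_collided = all(collided)
    return all_arrived or all_out_of_fuel or all_collided


# One taxi still has fuel and one passenger is still waiting: not done.
still_running = episode_done([True, False], [10, 0], [True, False])
```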

The ***rewards*** are given in config.py as below:

In [15]:
taxi_env_rewards = dict(
    step=-1,
    no_fuel=-20,
    bad_pickup=-15,
    bad_dropoff=-15,
    bad_refuel=-10,
    pickup=-1,
    standby_engine_off=-1,
    turn_engine_on=-1,
    turn_engine_off=-1,
    standby_engine_on=-1,
    intermediate_dropoff=-10,
    final_dropoff=100,
    hit_wall=-20,
    collision=-30,
)
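
As a rough illustration of how these values add up over a trip, the sketch below sums table entries for a sequence of event names. It assumes each event contributes its table value independently; tally is a hypothetical helper, and the environment itself may combine or scale rewards differently.

```python
# Copied from the table above.
taxi_env_rewards = dict(
    step=-1, no_fuel=-20, bad_pickup=-15, bad_dropoff=-15, bad_refuel=-10,
    pickup=-1, standby_engine_off=-1, turn_engine_on=-1, turn_engine_off=-1,
    standby_engine_on=-1, intermediate_dropoff=-10, final_dropoff=100,
    hit_wall=-20, collision=-30,
)


def tally(events, table=taxi_env_rewards):
    """Sum the table values for a list of event names (hypothetical helper)."""
    return sum(table[event] for event in events)


# Two moves, a pickup, one more move, then a successful final dropoff.
trip = ["step", "step", "pickup", "step", "final_dropoff"]
total = tally(trip)
```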

Now let's try to navigate to the fuel station:



In [18]:
env.render()

+---------+
|X: |F: :X|
| :_| : : |
| : : : : |
| | : | : |
|X| :G|X:_|
+---------+
  (east ,east)
Taxi1-YELLOW: Fuel: 99, Location: (4,4), Collided: False
Taxi2-RED: Fuel: 100, Location: (1,1), Collided: False
Passenger1: Location: (0, 0), Destination: (0, 4)
Done: False, {'taxi_1': False, 'taxi_2': False, '__all__': False}
Passengers Status's: [2]


In [19]:
env.step({"taxi_1": 1, "taxi_2": 0})
env.render()

+---------+
|X: |F: :X|
| : | : : |
| :_: : : |
| | : | :_|
|X| :G|X: |
+---------+
  (north ,south)
Taxi1-YELLOW: Fuel: 98, Location: (3,4), Collided: False
Taxi2-RED: Fuel: 99, Location: (2,1), Collided: False
Passenger1: Location: (0, 0), Destination: (0, 4)
Done: False, {'taxi_1': False, 'taxi_2': False, '__all__': False}
Passengers Status's: [2]


###Using a Different Map
You can also input your own map as a list of strings and specify the maximum fuel. The map must be formatted in the same way as above, with '+' at the corners and '-' and '|' marking the boundaries. You can have as many destinations and fuel stations as you want!

In [20]:
custom_map = [
    '+---------------+',
    '| : :X| :F: : : |',
    '|X: : | : | :X| |',
    '| : : : : : : | |',
    '| :X:F| :X| : :X|',
    '+---------------+',
]
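
Before passing a map to the environment, its format can be checked with a few lines of plain Python. The sketch below is an assumption-based illustration, not repository code: check_map and find_cells are hypothetical helpers, and they assume that grid cell (row, column) sits at string position 2 * column + 1 of map row row + 1, which matches the locations the environment prints.

```python
custom_map = [
    '+---------------+',
    '| : :X| :F: : : |',
    '|X: : | : | :X| |',
    '| : : : : : : | |',
    '| :X:F| :X| : :X|',
    '+---------------+',
]


def check_map(domain_map):
    """Validate the border format and return (rows, columns) of the grid."""
    width = len(domain_map[0])
    if any(len(row) != width for row in domain_map):
        raise ValueError("all rows must have the same length")
    border = "+" + "-" * (width - 2) + "+"
    if domain_map[0] != border or domain_map[-1] != border:
        raise ValueError("first and last rows must look like '+---+'")
    for row in domain_map[1:-1]:
        if row[0] != "|" or row[-1] != "|":
            raise ValueError("inner rows must start and end with '|'")
    return len(domain_map) - 2, (width - 1) // 2


def find_cells(domain_map, symbol):
    """List the (row, column) grid cells that contain the given symbol."""
    return [(r, c)
            for r, row in enumerate(domain_map[1:-1])
            for c, ch in enumerate(row[1::2])  # cells sit at odd string positions
            if ch == symbol]


grid_shape = check_map(custom_map)
fuel_stations = find_cells(custom_map, "F")
```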

In [22]:
env_new_map = TaxiEnv(domain_map=custom_map)
env_new_map.reset()
env_new_map.render()

+---------------+
| : :X| :F: : : |
|X: : | : | :X| |
| : : : : : : | |
| :X:F| :X| : :X|
+---------------+
Taxi1-YELLOW: Fuel: 100, Location: (3,1), Collided: False
Taxi2-RED: Fuel: 100, Location: (3,7), Collided: False
Passenger1: Location: (1, 6), Destination: (0, 2)
Done: False, {'taxi_1': False, 'taxi_2': False, '__all__': False}
Passengers Status's: [2]
