# Quick start

## Install

Uncomment the following cells:

In [None]:
#!git clone https://github.com/ricgama/maenvs4vrp_beta.git

In [None]:
# When using Colab
# %cd maenvs4vrp_beta/
# ! pip install -e .
#%cd maenvs4vrp/notebooks/

## Basic usage

Let's explore the library using the CVRPTW environment as an example. Our API structure is inspired by [PettingZoo](https://pettingzoo.farama.org/), following the Agent Environment Cycle (AEC) philosophy. We have also been greatly influenced by [Flatland's](https://flatland.aicrowd.com/intro.html) environment library, and we chose to adopt some of its design principles.

In [None]:
from maenvs4vrp.environments.cvrptw.env import Environment
from maenvs4vrp.environments.cvrptw.env_agent_selector import AgentSelector
from maenvs4vrp.environments.cvrptw.observations import Observations
from maenvs4vrp.environments.cvrptw.instances_generator import InstanceGenerator
from maenvs4vrp.environments.cvrptw.env_agent_reward import DenseReward
%load_ext autoreload
%autoreload 2

In [None]:
gen = InstanceGenerator(batch_size = 8)
obs = Observations()
sel = AgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

In [None]:
td = env.reset(batch_size = 8, num_agents=4, num_nodes=16)
td

In [None]:
while not td["done"].all():  
    td = env.sample_action(td) # this is where we insert our policy
    td = env.step(td)
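
In the loop above, `env.sample_action(td)` stands in for a policy. As a rough illustration of what a simple heuristic policy could compute, the sketch below picks the nearest feasible node from a list of coordinates and a boolean feasibility mask. It is independent of the library: the helper `nearest_feasible_node` and the `coords` and `mask` inputs are hypothetical, and how such a choice would be written back into `td` is not shown here.

```python
import math

def nearest_feasible_node(position, coords, mask):
    """Return the index of the closest node whose mask entry is True.

    Hypothetical helper for illustration only; it does not use the library's API.
    Returns None when no node is feasible.
    """
    best_idx, best_dist = None, math.inf
    for idx, (xy, feasible) in enumerate(zip(coords, mask)):
        if not feasible:
            continue  # skip nodes the agent is not allowed to visit
        dist = math.dist(position, xy)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

# Toy data: node 0 plays the role of the depot, node 2 is infeasible.
coords = [(0.0, 0.0), (1.0, 0.0), (0.2, 0.1), (0.5, 0.5)]
mask = [True, True, False, True]
action = nearest_feasible_node((0.3, 0.1), coords, mask)
```

A learned policy would replace the distance rule with a model's output, but the masking step would typically stay the same.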

## Quick walkthrough

Let's now go through the library's building blocks, exploring their functionalities.

### Instance generation

We can generate instances using one of the two available classes, `InstanceGenerator` and `BenchmarkInstanceGenerator`:

In [None]:
from maenvs4vrp.environments.cvrptw.instances_generator import InstanceGenerator
from maenvs4vrp.environments.cvrptw.benchmark_instances_generator import BenchmarkInstanceGenerator

#### Randomly generated instances

Random instances are generated following:

Li, S., Yan, Z., & Wu, C. (2021). [Learning to delegate for large-scale vehicle routing](https://proceedings.neurips.cc/paper/2021/hash/dc9fa5f217a1e57b8a6adeb065560b38-Abstract.html). Advances in Neural Information Processing Systems, 34, 26198-26211.

In [None]:
generator = InstanceGenerator()

In [None]:
instance = generator.sample_instance(num_agents=2, num_nodes=10)

In [None]:
instance.keys()

In [None]:
instance

It's possible to load a set of pre-generated instances to be used as validation/test sets. For example:

In [None]:
set_of_instances = set(generator.get_list_of_benchmark_instances()['servs_100_agents_25']['validation'])

In [None]:
generator = InstanceGenerator(instance_type='validation', set_of_instances=set_of_instances)

In [None]:
instance = generator.sample_instance()

Let's check the instance dict keys:

In [None]:
instance.keys()

In [None]:
instance['name']

#### Benchmark instances

To narrow the current gap between the test beds used for algorithm benchmarking in the RL and OR communities, the library allows straightforward integration of classical OR benchmark instances. Let's see which benchmark instances are available for the CVRPTW:

In [None]:
BenchmarkInstanceGenerator.get_list_of_benchmark_instances()

Ok! Now we instantiate the `generator`, selecting two of them:

In [None]:
generator = BenchmarkInstanceGenerator(instance_type='Solomon', set_of_instances={'C101', 'C102'})

In [None]:
instance_c101 = generator.get_instance('C101')

In [None]:
instance_c101.keys()

In [None]:
instance_c101['name']

In [None]:
instance_c101['num_agents']

In [None]:
instance_c101['num_nodes']

By customizing the arguments of the `.sample_instance` method, it is possible to sample a sub-instance of the original instance:

In [None]:
instance = generator.sample_instance(num_agents=3, num_nodes=8)

In [None]:
instance['name']

In [None]:
instance['num_agents']

In [None]:
instance['num_nodes']

For the CVRPTW, setting `random_sample=False` samples the first `n` services of the instance (see the [Transportation Optimization Portal](https://www.sintef.no/projectweb/top/vrptw) for more details about `first n` Solomon benchmark instances):

In [None]:
instance = generator.sample_instance(num_agents=3, num_nodes=8, sample_type='random')

In [None]:
instance['name']
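
The difference between a `first n` sub-instance and a randomly sampled one can be sketched without the library. The helper below is hypothetical: it assumes services are numbered `1..num_total_services`, that index `0` is the depot, and that the depot is always kept. It is not the library's implementation.

```python
import random

def sub_instance_indices(num_total_services, n_services, random_sample=True, seed=None):
    """Pick node indices for a sub-instance; index 0 (the depot) is always kept.

    Hypothetical helper: services are assumed to be numbered 1..num_total_services.
    """
    services = list(range(1, num_total_services + 1))
    if random_sample:
        rng = random.Random(seed)  # local generator, so the global RNG is untouched
        chosen = sorted(rng.sample(services, n_services))
    else:
        chosen = services[:n_services]  # "first n" convention
    return [0] + chosen

first_n = sub_instance_indices(100, 7, random_sample=False)
random_n = sub_instance_indices(100, 7, random_sample=True, seed=0)
```

Here `first_n` always contains the depot followed by services 1 to 7, while `random_n` contains the depot and seven distinct services drawn from the full set.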

### Observations

Observation features, which will be available to the active agent while it interacts with the environment, are handled by the `Observations` class.

In [None]:
from maenvs4vrp.environments.cvrptw.observations import Observations

In [None]:
obs = Observations()

The class has a `default_feature_list` attribute where the default configuration dictionary is defined.

In [None]:
obs.default_feature_list

In addition, five lists detail the features available in the class: `POSSIBLE_NODES_STATIC_FEATURES`, `POSSIBLE_NODES_DYNAMIC_FEATURES`, `POSSIBLE_SELF_FEATURES`, `POSSIBLE_AGENTS_FEATURES`, `POSSIBLE_GLOBAL_FEATURES`. For example:

In [None]:
obs.POSSIBLE_NODES_STATIC_FEATURES

In [None]:
obs.POSSIBLE_GLOBAL_FEATURES

When instantiating the `Observations` class, we can pass a feature-list dictionary specifying which features will be available to the agent:

In [None]:
import yaml

In [None]:
feature_list = yaml.safe_load("""
    nodes_static:
        x_coordinate_min_max:
            feat: x_coordinate_min_max
            norm: min_max
        y_coordinate_min_max:
            feat: y_coordinate_min_max
            norm: min_max
        tw_low_mm:
            feat: tw_low
            norm: min_max
        tw_high:
            feat: tw_high
            norm: min_max

    nodes_dynamic:
        - time2open_div_end_time
        - time2close_div_end_time
        - time2open_after_step_div_end_time
        - time2close_after_step_div_end_time
        - fract_time_after_step_div_end_time

    agent:
        - x_coordinate_min_max
        - y_coordinate_min_max
        - frac_current_time
        - frac_current_load

    other_agents:
        - x_coordinate_min_max
        - y_coordinate_min_max
        - frac_current_time
        - frac_current_load
        - dist2agent_div_end_time
    
    global:
        - frac_demands
        - frac_fleet_load_capacity
        - frac_done_agents
        - frac_not_done_nodes
        - frac_used_agents
""")

In [None]:
obs = Observations(feature_list)
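
Several static node features above request `norm: min_max`. Assuming this refers to standard min-max scaling, $x' = (x - \min) / (\max - \min)$, the effect on a feature such as `tw_low` can be sketched as follows. The function `min_max_normalize` and the sample values are hypothetical, and the library's internal computation may differ.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

# Toy time-window lower bounds for four nodes.
tw_low = [0.0, 50.0, 120.0, 200.0]
scaled = min_max_normalize(tw_low)
```

After scaling, the smallest value maps to `0.0` and the largest to `1.0`, which keeps features with different units on a comparable range.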

We can test these observations on the environment:

In [None]:
gen = InstanceGenerator(batch_size=8)
obs = Observations(feature_list)
sel = AgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

In [None]:
td = env.reset(batch_size = 8, num_agents=4, num_nodes=16)

In [None]:
td_observation = env.observe()

In [None]:
td_observation

### Agent Selector class

In [None]:
from maenvs4vrp.environments.cvrptw.env_agent_selector import AgentSelector, SmallestTimeAgentSelector

With the `AgentSelector` class, the same agent is selected until it returns to the depot. Afterward, the selector picks the next active agent and repeats the process until all agents are done:

In [None]:
gen = InstanceGenerator(batch_size = 1)
obs = Observations()
sel = AgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

td = env.reset()

while not td["done"].all():  
    td = env.sample_action(td) # this is where we insert our policy
    td = env.step(td)
    step = env.env_nsteps
    cur_agent_idx = td['cur_agent_idx']
    print(f'env step number: {step}, active agent name: {cur_agent_idx}')
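
The selection rule described for `AgentSelector` can be illustrated with a toy sketch that does not depend on the library. The helper `next_agent_sequential` and the `done` and `moves_left` lists are hypothetical; the sketch assumes an agent counts as done once it has returned to the depot.

```python
def next_agent_sequential(current, done):
    """Keep `current` while it is active; otherwise return the next active agent.

    `done[i]` is True once agent i has returned to the depot.
    Returns None when every agent is done. Hypothetical helper, not the library's code.
    """
    if not done[current]:
        return current
    n = len(done)
    for offset in range(1, n):
        candidate = (current + offset) % n
        if not done[candidate]:
            return candidate
    return None

# Toy episode: each agent makes two moves, the second one ending at the depot.
done = [False, False, False]
moves_left = [2, 2, 2]
order = []
agent = 0
while agent is not None:
    order.append(agent)
    moves_left[agent] -= 1
    if moves_left[agent] == 0:
        done[agent] = True
    agent = next_agent_sequential(agent, done)
```

In this toy run each agent completes its whole route before the next one starts, so the agents act in contiguous blocks.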

With the `SmallestTimeAgentSelector` class, the agent with the shortest travel time is selected at each step, until all agents have finished:

In [None]:
gen = InstanceGenerator(batch_size = 1)
obs = Observations()
sel = SmallestTimeAgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

td = env.reset()

while not td["done"].all():  
    td = env.sample_action(td) # this is where we insert our policy
    td = env.step(td)
    step = env.env_nsteps
    cur_agent_idx = td['cur_agent_idx']
    print(f'env step number: {step}, active agent name: {cur_agent_idx}')
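
For comparison, a smallest-time rule can be sketched in plain Python. This is a toy under stated assumptions: "shortest travel time" is taken to mean the smallest accumulated time so far among active agents, ties go to the lowest index, and the helper `next_agent_smallest_time` and the `travel` data are hypothetical rather than part of the library.

```python
def next_agent_smallest_time(current_times, done):
    """Return the active agent with the smallest elapsed time, or None if all are done."""
    active = [i for i, finished in enumerate(done) if not finished]
    if not active:
        return None
    return min(active, key=lambda i: current_times[i])

# Toy data: each agent travels two legs with the given durations, then is done.
travel = [[3.0, 3.0], [1.0, 4.0], [2.0, 1.0]]
times = [0.0, 0.0, 0.0]
done = [False, False, False]
legs_taken = [0, 0, 0]
order = []
agent = next_agent_smallest_time(times, done)
while agent is not None:
    order.append(agent)
    times[agent] += travel[agent][legs_taken[agent]]
    legs_taken[agent] += 1
    if legs_taken[agent] == len(travel[agent]):
        done[agent] = True
    agent = next_agent_smallest_time(times, done)
```

Unlike the sequential rule, the active agent changes from step to step here, so the routes of different agents are built in an interleaved fashion.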