# Quick start

### Install maenvs4vrp 

Uncomment the following cells:

In [None]:
#!git clone https://github.com/ricgama/maenvs4vrp.git

In [None]:
# When using Colab
# %cd maenvs4vrp/
# ! pip install -e .
#%cd maenvs4vrp/notebooks/

## Basic usage

Let's explore the library using the CVRPTW environment as an example. Our API structure is inspired by [PettingZoo](https://pettingzoo.farama.org/) and follows its Agent Environment Cycle (AEC) philosophy. We have also been greatly influenced by the [Flatland](https://flatland.aicrowd.com/intro.html) environment library and have adopted some of its design principles.

Data is exchanged with the environment through `TensorDict` objects, as in TorchRL and rl4co.

In [9]:
from maenvs4vrp.environments.cvrptw.env import Environment
from maenvs4vrp.environments.cvrptw.env_agent_selector import AgentSelector
from maenvs4vrp.environments.cvrptw.observations import Observations
from maenvs4vrp.environments.cvrptw.instances_generator import InstanceGenerator
from maenvs4vrp.environments.cvrptw.env_agent_reward import DenseReward
%load_ext autoreload
%autoreload 2

In [10]:
gen = InstanceGenerator(batch_size = 8)
obs = Observations()
sel = AgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

In [11]:
td = env.reset(batch_size = 8, num_agents=4, num_nodes=16)
td

TensorDict(
    fields={
        agent_step: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int32, is_shared=False),
        cur_agent_idx: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        done: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False),
        observations: TensorDict(
            fields={
                action_mask: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.bool, is_shared=False),
                agent_obs: Tensor(shape=torch.Size([8, 6]), device=cpu, dtype=torch.float32, is_shared=False),
                agents_mask: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.bool, is_shared=False),
                global_obs: Tensor(shape=torch.Size([8, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                node_dynamic_obs: Tensor(shape=torch.Size([8, 16, 8]), device=cpu, dtype=torch.float32, is_shared=False),
                node_static_obs: Tensor(shape=torch.Siz

In [12]:
while not td["done"].all():  
    td = env.sample_action(td) # this is where we insert our policy
    td = env.step(td)
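
The comment in the loop above marks the point where a policy would choose the next action. As a rough, library-independent illustration, the sketch below picks a uniformly random feasible node from a boolean action mask. The helper name `sample_masked_action` is hypothetical, and plain Python lists stand in for the batched tensors; it is not the library's own sampling code.

```python
import random

def sample_masked_action(action_mask, rng):
    """Pick a random index whose mask entry is True (hypothetical helper)."""
    feasible = [i for i, allowed in enumerate(action_mask) if allowed]
    if not feasible:
        raise ValueError("no feasible action available")
    return rng.choice(feasible)

rng = random.Random(0)                   # seeded for reproducibility
mask = [True, False, False, True, True]  # toy mask: nodes 0, 3 and 4 are feasible
action = sample_masked_action(mask, rng)
print(action)
```

A learned policy would replace the uniform choice with a distribution over the feasible nodes, while still respecting the mask.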

## Quick walkthrough

Let's now go through the library's building blocks, exploring their functionalities.

### Instance generation

We can generate instances using one of the two available classes, `InstanceGenerator` and `BenchmarkInstanceGenerator`:

In [13]:
from maenvs4vrp.environments.cvrptw.instances_generator import InstanceGenerator
from maenvs4vrp.environments.cvrptw.benchmark_instances_generator import BenchmarkInstanceGenerator

#### Randomly generated instances

Random instances are generated following:

Li, S., Yan, Z., & Wu, C. (2021). [Learning to delegate for large-scale vehicle routing](https://proceedings.neurips.cc/paper/2021/hash/dc9fa5f217a1e57b8a6adeb065560b38-Abstract.html). Advances in Neural Information Processing Systems, 34, 26198-26211.

In [14]:
generator = InstanceGenerator()

In [15]:
instance = generator.sample_instance(num_agents=2, num_nodes=10)

In [16]:
instance.keys()

dict_keys(['name', 'num_nodes', 'num_agents', 'data'])

In [17]:
instance

{'name': 'random_instance',
 'num_nodes': 10,
 'num_agents': 2,
 'data': TensorDict(
     fields={
         capacity: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.float32, is_shared=False),
         coords: Tensor(shape=torch.Size([1, 10, 2]), device=cpu, dtype=torch.float32, is_shared=False),
         demands: Tensor(shape=torch.Size([1, 10]), device=cpu, dtype=torch.float32, is_shared=False),
         depot_idx: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.int64, is_shared=False),
         end_time: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
         is_depot: Tensor(shape=torch.Size([1, 10]), device=cpu, dtype=torch.bool, is_shared=False),
         service_time: Tensor(shape=torch.Size([1, 10]), device=cpu, dtype=torch.float32, is_shared=False),
         start_time: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
         tw_high: Tensor(shape=torch.Size([1, 10]), device=cpu, dtype=torch.f

It's possible to load a set of pre-generated instances, to be used as validation/test sets. For example:

In [21]:
generator.get_list_of_benchmark_instances()

{'servs_100_agents_25': {'validation': ['cvrptw/data/generated\\servs_100_agents_25\\validation/generated_val_servs_100_agents_25_0',
   'cvrptw/data/generated\\servs_100_agents_25\\validation/generated_val_servs_100_agents_25_1',
   'cvrptw/data/generated\\servs_100_agents_25\\validation/generated_val_servs_100_agents_25_10',
   'cvrptw/data/generated\\servs_100_agents_25\\validation/generated_val_servs_100_agents_25_11',
   'cvrptw/data/generated\\servs_100_agents_25\\validation/generated_val_servs_100_agents_25_12',
   'cvrptw/data/generated\\servs_100_agents_25\\validation/generated_val_servs_100_agents_25_13',
   'cvrptw/data/generated\\servs_100_agents_25\\validation/generated_val_servs_100_agents_25_14',
   'cvrptw/data/generated\\servs_100_agents_25\\validation/generated_val_servs_100_agents_25_15',
   'cvrptw/data/generated\\servs_100_agents_25\\validation/generated_val_servs_100_agents_25_16',
   'cvrptw/data/generated\\servs_100_agents_25\\validation/generated_val_servs_100_

In [22]:
set_of_instances = set(generator.get_list_of_benchmark_instances()['servs_100_agents_25']['validation'])

In [23]:
generator = InstanceGenerator(instance_type='validation', set_of_instances=set_of_instances)

In [None]:
instance = generator.sample_instance()

Let's check instance dict keys:

In [None]:
instance.keys()

In [None]:
instance['name']

#### Benchmark instances

To narrow the current gap between the test beds used for algorithm benchmarking in the RL and OR communities, the library allows a straightforward integration of classical OR benchmark instances. Let's see which benchmark instances are available for the CVRPTW:

In [None]:
BenchmarkInstanceGenerator.get_list_of_benchmark_instances()

Now we instantiate the `generator`, selecting two of them:

In [None]:
generator = BenchmarkInstanceGenerator(instance_type='Solomon', set_of_instances={'C101', 'C102'})

In [None]:
instance_c101 = generator.get_instance('C101')

In [None]:
instance_c101.keys()

In [None]:
instance_c101['name']

In [None]:
instance_c101['num_agents']

In [None]:
instance_c101['num_nodes']

By customizing the arguments of the `.sample_instance` method, it is possible to sample a sub-instance of the original instance:

In [None]:
instance = generator.sample_instance(num_agents=3, num_nodes=8)

In [None]:
instance['name']

In [None]:
instance['num_agents']

In [None]:
instance['num_nodes']

For the CVRPTW, setting `random_sample=False` samples the first `n` services of the instance (see the [Transportation Optimization Portal](https://www.sintef.no/projectweb/top/vrptw) for more details about `first n` Solomon benchmark instances):

In [None]:
instance = generator.sample_instance(num_agents=3, num_nodes=8, random_sample=False)

In [None]:
instance['name']
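
To make the difference between the two sampling modes concrete, here is a small stand-alone sketch. It assumes a sub-instance is built by choosing a subset of service ids; `pick_services` and the toy id list are hypothetical and are not part of the library.

```python
import random

def pick_services(service_ids, n, random_sample=True, seed=None):
    """Choose n services: a random subset, or the first n (hypothetical helper)."""
    if n > len(service_ids):
        raise ValueError("cannot pick more services than the instance has")
    if random_sample:
        # Random subset, sorted so the original ordering is preserved.
        return sorted(random.Random(seed).sample(service_ids, n))
    # "First n" convention: keep the leading n services of the instance.
    return list(service_ids[:n])

services = list(range(1, 101))  # toy ids for a 100-service instance
first_five = pick_services(services, 5, random_sample=False)
random_five = pick_services(services, 5, random_sample=True, seed=0)
print(first_five)   # [1, 2, 3, 4, 5]
print(random_five)
```

The `first n` mode is deterministic, which makes results comparable across runs and with published results on truncated benchmark instances.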

### Observations

The observation features that are available to the active agent while it interacts with the environment are handled by the `Observations` class.

In [None]:
from maenvs4vrp.environments.cvrptw.observations import Observations

In [None]:
obs = Observations()

The class has a `default_feature_list` attribute where the default configuration dictionary is defined.

In [None]:
obs.default_feature_list

The class also exposes five lists detailing the features it can provide: `POSSIBLE_NODES_STATIC_FEATURES`, `POSSIBLE_NODES_DYNAMIC_FEATURES`, `POSSIBLE_SELF_FEATURES`, `POSSIBLE_AGENTS_FEATURES`, `POSSIBLE_GLOBAL_FEATURES`. For example:

In [None]:
obs.POSSIBLE_NODES_STATIC_FEATURES

In [None]:
obs.POSSIBLE_GLOBAL_FEATURES

While instantiating the `Observations` class, we can pass through a feature list dictionary specifying which features will be available for the agent:

In [None]:
import yaml

In [None]:
feature_list = yaml.safe_load("""
    nodes_static:
        x_coordinate_min_max:
            feat: x_coordinate_min_max
            norm: min_max
        y_coordinate_min_max:
            feat: y_coordinate_min_max
            norm: min_max
        tw_low_mm:
            feat: tw_low
            norm: min_max
        tw_high:
            feat: tw_high
            norm: min_max

    nodes_dynamic:
        - time2open_div_end_time
        - time2close_div_end_time
        - time2open_after_step_div_end_time
        - time2close_after_step_div_end_time
        - fract_time_after_step_div_end_time

    agent:
        - x_coordinate_min_max
        - y_coordinate_min_max
        - frac_current_time
        - frac_current_load

    other_agents:
        - x_coordinate_min_max
        - y_coordinate_min_max
        - frac_current_time
        - frac_current_load
        - dist2agent_div_end_time
    
    global:
        - frac_demands
        - frac_fleet_load_capacity
        - frac_done_agents
        - frac_not_done_nodes
        - frac_used_agents
""")

In [None]:
obs = Observations(feature_list)
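
The `norm: min_max` entries in the configuration above presumably request min-max scaling of the corresponding feature. The following is a minimal sketch of that normalization on a plain list of values; `min_max_normalize` and the toy time-window values are hypothetical, and the library's own implementation may differ (for example, in how it handles batches or constant features).

```python
def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] range (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

tw_low = [0.0, 30.0, 60.0, 120.0]  # toy time-window opening times
scaled = min_max_normalize(tw_low)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```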

We can test these observations on the environment:

In [None]:
gen = InstanceGenerator(batch_size = 8)
obs = Observations(feature_list)
sel = AgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

In [None]:
td = env.reset(batch_size = 8, num_agents=4, num_nodes=16)

In [None]:
td_observation = env.observe()

In [None]:
td_observation

### Agent Iterator class

As in [PettingZoo](https://pettingzoo.farama.org/), the Agent Selector class incorporates the iterator method `agent_iter`, which returns the next active agent in the environment. It is fully customizable; currently, the `AgentSelector` and `SmallesttimeAgentSelector` classes are available.

In [None]:
from maenvs4vrp.environments.cvrptw.env_agent_selector import AgentSelector, SmallesttimeAgentSelector

With `AgentSelector` class, the selection steps through the active agents in a circular fashion, until no more active agents are available:

In [None]:
gen = InstanceGenerator(batch_size = 1)
obs = Observations()
sel = AgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

td = env.reset()

while not td["done"].all():  
    td = env.sample_action(td) # this is where we insert our policy
    td = env.step(td)
    step = env.env_nsteps
    cur_agent_idx = td['cur_agent_idx']
    print(f'env step number: {step}, active agent name: {cur_agent_idx}')
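
The circular selection described above can be pictured with a small stand-alone sketch. It assumes each agent has a known number of moves left before it becomes inactive, which is a simplification; `round_robin_order` is a hypothetical helper and not the library's implementation.

```python
def round_robin_order(remaining_moves):
    """Return the order in which agents act under circular selection.

    remaining_moves[i] is the number of moves agent i still has to make;
    agents with no moves left are skipped (hypothetical toy model).
    """
    remaining = list(remaining_moves)
    order = []
    agent = 0
    while any(r > 0 for r in remaining):
        if remaining[agent] > 0:
            order.append(agent)
            remaining[agent] -= 1
        agent = (agent + 1) % len(remaining)  # wrap around to the first agent
    return order

order = round_robin_order([2, 1, 3])
print(order)  # [0, 1, 2, 0, 2, 2]
```

Once agents 0 and 1 are finished, only agent 2 remains active, so it is selected on every remaining step.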

With the `SmallesttimeAgentSelector` class, the same agent is selected until it returns to the depot. Afterward, the next active agent is selected, and the process repeats until all agents are done:

In [None]:
gen = InstanceGenerator(batch_size = 1)
obs = Observations()
sel = SmallesttimeAgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

td = env.reset()

while not td["done"].all():  
    td = env.sample_action(td) # this is where we insert our policy
    td = env.step(td)
    step = env.env_nsteps
    cur_agent_idx = td['cur_agent_idx']
    print(f'env step number: {step}, active agent name: {cur_agent_idx}')
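
For comparison with circular selection, the sketch below illustrates only the behaviour described in the text: one agent keeps acting until its route is finished, then the next agent takes over. It uses a toy model where each agent has a fixed number of moves, and `sequential_order` is a hypothetical helper. How `SmallesttimeAgentSelector` actually decides which agent comes next is not modelled here.

```python
def sequential_order(remaining_moves):
    """Return the acting order when each agent finishes before the next starts.

    remaining_moves[i] is the number of moves agent i makes before
    returning to the depot (hypothetical toy model).
    """
    order = []
    for agent, moves in enumerate(remaining_moves):
        order.extend([agent] * moves)  # the same agent acts until it is done
    return order

order = sequential_order([2, 1, 3])
print(order)  # [0, 0, 1, 2, 2, 2]
```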