# Quick start

## Basic usage

Let's explore the library using the CVRPTW environment as an example. Our API structure is inspired by [PettingZoo](https://pettingzoo.farama.org/), following the Agent Environment Cycle (AEC) philosophy. We have also been greatly influenced by the [Flatland](https://flatland.aicrowd.com/intro.html) environment library, and we chose to adopt some of its design principles.

Environment states and observations are exchanged as `TensorDict` objects, a data structure also used by TorchRL and RL4CO.

In [1]:
from vrpmaenvs.environments.cvrptw.env import Environment
from vrpmaenvs.environments.cvrptw.env_agent_selector import AgentSelector
from vrpmaenvs.environments.cvrptw.observations import Observations
from vrpmaenvs.environments.cvrptw.instances_generator import InstanceGenerator
from vrpmaenvs.environments.cvrptw.env_agent_reward import DenseReward
%load_ext autoreload
%autoreload 2

In [2]:
gen = InstanceGenerator(batch_size = 8)
obs = Observations()
sel = AgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

In [3]:
td = env.reset(batch_size = 8, num_agents=4, num_nodes=16)
td

True


TensorDict(
    fields={
        cur_agent_idx: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        done: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False),
        observations: TensorDict(
            fields={
                action_mask: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.bool, is_shared=False),
                agent_obs: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.float32, is_shared=False),
                agents_mask: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.bool, is_shared=False),
                global_obs: Tensor(shape=torch.Size([8, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                node_dynamic_obs: Tensor(shape=torch.Size([8, 16, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                node_static_obs: Tensor(shape=torch.Size([8, 16, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                other_agents_obs: Te

In [4]:
while not td["done"].all():  
    td = env.sample_action(td) # this is where we insert our policy
    td = env.step(td)
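
The comment in the loop above marks the place where a policy would replace `env.sample_action`. As a minimal sketch of the simplest such policy, the snippet below picks a random feasible node per batch element from an action mask. It uses plain Python lists instead of tensors so it runs without the library. `sample_masked_action` is a hypothetical helper, and it assumes that `True` in the mask marks a feasible node. How the chosen action is written back into `td` is not shown here.

```python
import random


def sample_masked_action(action_mask, rng):
    """Pick one feasible node index per batch element.

    action_mask: list of lists of bools with shape [batch, num_nodes].
    Assumption: True marks a node the active agent may visit.
    """
    actions = []
    for row in action_mask:
        feasible = [idx for idx, allowed in enumerate(row) if allowed]
        if not feasible:
            raise ValueError("no feasible action for this batch element")
        actions.append(rng.choice(feasible))
    return actions


rng = random.Random(0)  # seeded generator for repeatable sampling
mask = [
    [True, False, True, False],   # nodes 0 and 2 are feasible
    [False, False, False, True],  # only node 3 is feasible
]
actions = sample_masked_action(mask, rng)
print(actions)
```

A learned policy would replace the uniform choice with a distribution over the same feasible indices.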

## Quick walkthrough

Let's now go through the library's building blocks, exploring their functionalities.

### Instance generation

We can generate instances using one of the two available classes, `InstanceGenerator` and `BenchmarkInstanceGenerator`:

In [5]:
from vrpmaenvs.environments.cvrptw.instances_generator import InstanceGenerator
from vrpmaenvs.environments.cvrptw.benchmark_instances_generator import BenchmarkInstanceGenerator

#### Randomly generated instances

Random instances are generated following:

Li, S., Yan, Z., & Wu, C. (2021). [Learning to delegate for large-scale vehicle routing](https://proceedings.neurips.cc/paper/2021/hash/dc9fa5f217a1e57b8a6adeb065560b38-Abstract.html). Advances in Neural Information Processing Systems, 34, 26198-26211.

In [6]:
generator = InstanceGenerator()

In [7]:
instance = generator.sample_instance(num_agents=2, num_nodes=10)

True


In [8]:
instance.keys()

dict_keys(['name', 'num_nodes', 'num_agents', 'data'])

In [9]:
instance

{'name': 'random_instance',
 'num_nodes': 10,
 'num_agents': 2,
 'data': TensorDict(
     fields={
         capacity: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.float32, is_shared=False),
         coords: Tensor(shape=torch.Size([1, 10, 2]), device=cpu, dtype=torch.float32, is_shared=False),
         demands: Tensor(shape=torch.Size([1, 10]), device=cpu, dtype=torch.float32, is_shared=False),
         depot_idx: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.int64, is_shared=False),
         end_time: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
         is_depot: Tensor(shape=torch.Size([1, 10]), device=cpu, dtype=torch.bool, is_shared=False),
         service_time: Tensor(shape=torch.Size([1, 10]), device=cpu, dtype=torch.float32, is_shared=False),
         start_time: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
         tw_high: Tensor(shape=torch.Size([1, 10]), device=cpu, dtype=torch.f

It's possible to load a set of pre-generated instances, to be used as validation/test sets. For example:

In [10]:
set_of_instances = set(generator.get_list_of_benchmark_instances()['validation'])

In [11]:
generator = InstanceGenerator(instance_type='validation', set_of_instances=set_of_instances)

In [14]:
instance = generator.sample_instance()

generated_val_servs_25_agents_10_38


Let's check the keys of the instance dict:

In [15]:
instance.keys()

dict_keys(['name', 'num_nodes', 'num_agents', 'data'])

In [16]:
instance['name']

'generated_val_servs_25_agents_10_38'

#### Benchmark instances

In order to narrow the current gap between the test beds used for algorithm benchmarking in the RL and OR communities, the library allows a straightforward integration of classical OR benchmark instances. Let's see which benchmark instances are available for the CVRPTW:

In [17]:
BenchmarkInstanceGenerator.get_list_of_benchmark_instances()

{'Solomon': ['R103',
  'C101',
  'C102',
  'C103',
  'C104',
  'C105',
  'C106',
  'C107',
  'C108',
  'C109',
  'C201',
  'C202',
  'C203',
  'C204',
  'C205',
  'C206',
  'C207',
  'C208',
  'R101',
  'R102',
  'R104',
  'R105',
  'R106',
  'R107',
  'R108',
  'R109',
  'R110',
  'R111',
  'R112',
  'R201',
  'R202',
  'R203',
  'R204',
  'R205',
  'R206',
  'R207',
  'R208',
  'R209',
  'R210',
  'R211',
  'RC101',
  'RC102',
  'RC103',
  'RC104',
  'RC105',
  'RC106',
  'RC107',
  'RC108',
  'RC201',
  'RC202',
  'RC203',
  'RC204',
  'RC205',
  'RC206',
  'RC207',
  'RC208'],
 'Homberger': ['C1_10_1',
  'C1_10_10',
  'C1_10_2',
  'C1_10_3',
  'C1_10_4',
  'C1_10_5',
  'C1_10_6',
  'C1_10_7',
  'C1_10_8',
  'C1_10_9',
  'C1_2_1',
  'C1_2_10',
  'C1_2_2',
  'C1_2_3',
  'C1_2_4',
  'C1_2_5',
  'C1_2_6',
  'C1_2_7',
  'C1_2_8',
  'C1_4_1',
  'C1_4_10',
  'C1_4_2',
  'C1_4_3',
  'C1_4_4',
  'C1_4_5',
  'C1_4_6',
  'C1_4_7',
  'C1_4_8',
  'C1_4_9',
  'C1_6_1',
  'C1_6_10',
  'C1_6_2',
 

Now we instantiate the `generator`, selecting two of them:

In [18]:
generator = BenchmarkInstanceGenerator(instance_type='Solomon', set_of_instances={'C101', 'C102'})

In [19]:
instance_c101 = generator.get_instance('C101')

In [20]:
instance_c101.keys()

dict_keys(['name', 'num_agents', 'num_nodes', 'data', 'n_digits'])

In [21]:
instance_c101['name']

'C101'

In [22]:
instance_c101['num_agents']

25

In [24]:
instance_c101['num_nodes']

101
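
For readers unfamiliar with the underlying benchmark files, the sketch below parses the plain-text layout in which Solomon-style instances are commonly distributed (a fleet header followed by one row per node, the depot first). It is not the library's loader. `parse_solomon`, the output field names, and the three-row `DEMO` excerpt are illustrative assumptions.

```python
def parse_solomon(text):
    """Parse Solomon-style instance text into a plain dictionary."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name = lines[0]

    # The row after the "NUMBER CAPACITY" header holds fleet size and capacity.
    fleet_row = next(i for i, ln in enumerate(lines) if ln.startswith("NUMBER")) + 1
    num_agents, capacity = (int(tok) for tok in lines[fleet_row].split())

    # Node rows follow the "CUST NO." column header; the first row is the depot.
    first_node = next(i for i, ln in enumerate(lines) if ln.startswith("CUST NO.")) + 1
    columns = ("x", "y", "demand", "tw_low", "tw_high", "service_time")
    nodes = []
    for ln in lines[first_node:]:
        values = [float(tok) for tok in ln.split()]
        nodes.append(dict(zip(columns, values[1:])))  # values[0] is the node id

    return {
        "name": name,
        "num_agents": num_agents,
        "num_nodes": len(nodes),
        "capacity": capacity,
        "data": nodes,
    }


# Illustrative excerpt, truncated to a depot and two services.
SAMPLE = """DEMO

VEHICLE
NUMBER     CAPACITY
  25         200

CUSTOMER
CUST NO.  XCOORD.   YCOORD.    DEMAND   READY TIME  DUE DATE   SERVICE TIME

    0      40         50          0          0       1236          0
    1      45         68         10        912        967         90
    2      45         70         30        825        870         90
"""

instance = parse_solomon(SAMPLE)
print(instance["name"], instance["num_agents"], instance["num_nodes"])
```

In this layout the depot is counted as a node, which matches a 100-customer instance reporting 101 nodes.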

By customizing the arguments of the `.sample_instance` method, it is possible to sample a sub-instance of the original instance:

In [26]:
instance = generator.sample_instance(num_agents=3, num_nodes=8)

In [27]:
instance['name']

'C101_samp'

In [28]:
instance['num_agents']

3

In [30]:
instance['num_nodes']

8

For the CVRPTW, setting `random_sample=False` samples the first `n` services of the instance (see the [Transportation Optimization Portal](https://www.sintef.no/projectweb/top/vrptw) for more details about the `first n` Solomon benchmark instances):

In [32]:
instance = generator.sample_instance(num_agents=3, num_nodes=8, random_sample=False)

In [33]:
instance['name']

'C101_samp'
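
The difference between the two sampling modes can be sketched with plain index selection. The helper below is hypothetical and is not the library's implementation. It assumes that `num_nodes` counts the depot and that the depot is always kept.

```python
import random


def sample_node_subset(num_total, num_nodes, depot_idx=0, random_sample=True, rng=None):
    """Choose which node indices of a larger instance to keep.

    Assumption: the depot is always kept and counts towards num_nodes.
    """
    if not 1 <= num_nodes <= num_total:
        raise ValueError("num_nodes must be between 1 and num_total")
    services = [idx for idx in range(num_total) if idx != depot_idx]
    if random_sample:
        rng = rng or random.Random()
        chosen = rng.sample(services, num_nodes - 1)  # without replacement
    else:
        chosen = services[: num_nodes - 1]  # "first n" services
    return [depot_idx] + sorted(chosen)


first_n = sample_node_subset(101, 8, random_sample=False)
random_n = sample_node_subset(101, 8, random_sample=True, rng=random.Random(0))
print(first_n)
print(random_n)
```

With `random_sample=False` the selection is deterministic, which makes results comparable with published `first n` benchmarks.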

### Observations

Observation features, which are available to the active agent while it interacts with the environment, are handled by the `Observations` class.

In [34]:
from vrpmaenvs.environments.cvrptw.observations import Observations

In [35]:
obs = Observations()

The class has a `default_feature_list` attribute where the default configuration dictionary is defined.

In [36]:
obs.default_feature_list

{'nodes_static': {'x_coordinate_min_max': {'feat': 'x_coordinate_min_max',
   'norm': None},
  'y_coordinate_min_max': {'feat': 'y_coordinate_min_max', 'norm': None}},
 'nodes_dynamic': ['time2open_div_end_time',
  'time2close_div_end_time',
  'arrive2node_div_end_time'],
 'agent': ['x_coordinate_min_max',
  'y_coordinate_min_max',
  'frac_current_time',
  'frac_current_load'],
 'other_agents': ['frac_current_time',
  'frac_current_load',
  'frac_feasible_nodes'],
 'global': ['frac_demands', 'frac_fleet_load_capacity', 'frac_done_agents']}

In addition, five lists detail the features available in the class: `POSSIBLE_NODES_STATIC_FEATURES`, `POSSIBLE_NODES_DYNAMIC_FEATURES`, `POSSIBLE_SELF_FEATURES`, `POSSIBLE_AGENTS_FEATURES`, `POSSIBLE_GLOBAL_FEATURES`. For example:

In [37]:
obs.POSSIBLE_NODES_STATIC_FEATURES

['x_coordinate',
 'y_coordinate',
 'tw_low',
 'tw_high',
 'demand',
 'service_time',
 'tw_high_minus_tw_low_div_max_dur',
 'x_coordinate_min_max',
 'y_coordinate_min_max',
 'is_depot']

In [38]:
obs.POSSIBLE_GLOBAL_FEATURES

['min_agent_current_time_div_end_time',
 'frac_demands',
 'frac_fleet_load_capacity',
 'frac_done_agents',
 'frac_not_done_nodes',
 'frac_used_agents']
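
Several feature names above carry a `_min_max` suffix. Assuming this refers to ordinary min-max scaling (the exact formula used by the library is not shown here), a minimal sketch of that transformation on a list of coordinates is:

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


x_coords = [40.0, 45.0, 42.0, 20.0]  # illustrative x coordinates
scaled = min_max_scale(x_coords)
print(scaled)
```

Scaling per instance keeps node features in a comparable range across instances of different spatial extent.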

When instantiating the `Observations` class, we can pass a feature list dictionary specifying which features will be available to the agent:

In [39]:
import yaml

In [40]:
feature_list = yaml.safe_load("""
    nodes_static:
        x_coordinate_min_max:
            feat: x_coordinate_min_max
            norm: min_max
        y_coordinate_min_max:
            feat: y_coordinate_min_max
            norm: min_max
        tw_low_mm:
            feat: tw_low
            norm: min_max
        tw_high:
            feat: tw_high
            norm: min_max

    nodes_dynamic:
        - time2open_div_end_time
        - time2close_div_end_time
        - time2open_after_step_div_end_time
        - time2close_after_step_div_end_time
        - fract_time_after_step_div_end_time

    agent:
        - x_coordinate_min_max
        - y_coordinate_min_max
        - frac_current_time
        - frac_current_load

    other_agents:
        - x_coordinate_min_max
        - y_coordinate_min_max
        - frac_current_time
        - frac_current_load
        - dist2agent_div_end_time
    
    global:
        - frac_demands
        - frac_fleet_load_capacity
        - frac_done_agents
        - frac_not_done_nodes
        - frac_used_agents
""")

In [41]:
obs = Observations(feature_list)
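
Before passing a custom feature list, it can help to check the requested names against the corresponding `POSSIBLE_*` list. The sketch below does this for the global features, using the list printed earlier. `unknown_features` is a hypothetical helper, and how the library itself reacts to an unknown name is not covered here.

```python
# Copied from the `obs.POSSIBLE_GLOBAL_FEATURES` output shown earlier.
POSSIBLE_GLOBAL_FEATURES = [
    "min_agent_current_time_div_end_time",
    "frac_demands",
    "frac_fleet_load_capacity",
    "frac_done_agents",
    "frac_not_done_nodes",
    "frac_used_agents",
]


def unknown_features(requested, possible):
    """Return the requested feature names that are not in the possible list."""
    possible_set = set(possible)
    return [name for name in requested if name not in possible_set]


# "frac_done_nodes" is a deliberate misspelling of "frac_not_done_nodes".
requested_global = ["frac_demands", "frac_done_agents", "frac_done_nodes"]
missing = unknown_features(requested_global, POSSIBLE_GLOBAL_FEATURES)
print(missing)
```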

We can now inspect the observations on the environment. Note that the cell below re-creates `obs` with the default feature list:

In [43]:
gen = InstanceGenerator(batch_size = 8)
obs = Observations()
sel = AgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

In [45]:
td = env.reset(batch_size = 8, num_agents=4, num_nodes=16)

In [47]:
td_observation = env.observe()

In [48]:
td_observation

TensorDict(
    fields={
        action_mask: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.bool, is_shared=False),
        agent_obs: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        agents_mask: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.bool, is_shared=False),
        global_obs: Tensor(shape=torch.Size([8, 3]), device=cpu, dtype=torch.float32, is_shared=False),
        node_dynamic_obs: Tensor(shape=torch.Size([8, 16, 3]), device=cpu, dtype=torch.float32, is_shared=False),
        node_static_obs: Tensor(shape=torch.Size([8, 16, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        other_agents_obs: Tensor(shape=torch.Size([8, 4, 3]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([8]),
    device=cpu,
    is_shared=False)

### Agent Iterator class

As in [PettingZoo](https://pettingzoo.farama.org/), the agent selector class provides the iterator method `agent_iter`, which returns the next active agent in the environment. It is fully customizable, and currently the `AgentSelector` and `SmallesttimeAgentSelector` classes are available.

In [52]:
from vrpmaenvs.environments.cvrptw.env_agent_selector import AgentSelector, SmallesttimeAgentSelector

With the `AgentSelector` class, the same agent is selected until it returns to the depot. The next active agent is then selected, and the process repeats until all agents are done:

In [54]:
gen = InstanceGenerator(batch_size = 1)
obs = Observations()
sel = AgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

td = env.reset()

while not td["done"].all():  
    td = env.sample_action(td) # this is where we insert our policy
    td = env.step(td)
    step = td['step']
    cur_agent_idx = td['cur_agent_idx']
    print(f'env step number: {step}, active agent name: {cur_agent_idx}')

env step number: tensor([[1]]), active agent name: tensor([[0]])
env step number: tensor([[2]]), active agent name: tensor([[0]])
env step number: tensor([[3]]), active agent name: tensor([[0]])
env step number: tensor([[4]]), active agent name: tensor([[0]])
env step number: tensor([[5]]), active agent name: tensor([[1]])
env step number: tensor([[6]]), active agent name: tensor([[1]])
env step number: tensor([[7]]), active agent name: tensor([[1]])
env step number: tensor([[8]]), active agent name: tensor([[1]])
env step number: tensor([[9]]), active agent name: tensor([[2]])
env step number: tensor([[10]]), active agent name: tensor([[2]])
env step number: tensor([[11]]), active agent name: tensor([[2]])
env step number: tensor([[12]]), active agent name: tensor([[3]])
env step number: tensor([[13]]), active agent name: tensor([[3]])
env step number: tensor([[14]]), active agent name: tensor([[4]])
env step number: tensor([[15]]), active agent name: tensor([[4]])
env step number: te

With the `SmallesttimeAgentSelector` class, the selection moves to a different active agent at every step, as the output below shows, until no more active agents are available:

In [55]:
gen = InstanceGenerator(batch_size = 1)
obs = Observations()
sel = SmallesttimeAgentSelector()
rew = DenseReward()

env = Environment(instance_generator_object=gen,  
                  obs_builder_object=obs,
                  agent_selector_object=sel,
                  reward_evaluator=rew,
                  seed=0)

td = env.reset()

while not td["done"].all():  
    td = env.sample_action(td) # this is where we insert our policy
    td = env.step(td)
    step = td['step']
    cur_agent_idx = td['cur_agent_idx']
    print(f'env step number: {step}, active agent name: {cur_agent_idx}')

env step number: tensor([[1]]), active agent name: tensor([[1]])
env step number: tensor([[2]]), active agent name: tensor([[2]])
env step number: tensor([[3]]), active agent name: tensor([[3]])
env step number: tensor([[4]]), active agent name: tensor([[4]])
env step number: tensor([[5]]), active agent name: tensor([[5]])
env step number: tensor([[6]]), active agent name: tensor([[6]])
env step number: tensor([[7]]), active agent name: tensor([[7]])
env step number: tensor([[8]]), active agent name: tensor([[8]])
env step number: tensor([[9]]), active agent name: tensor([[9]])
env step number: tensor([[10]]), active agent name: tensor([[10]])
env step number: tensor([[11]]), active agent name: tensor([[11]])
env step number: tensor([[12]]), active agent name: tensor([[12]])
env step number: tensor([[13]]), active agent name: tensor([[13]])
env step number: tensor([[14]]), active agent name: tensor([[14]])
env step number: tensor([[15]]), active agent name: tensor([[15]])
env step numb
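
The two selection patterns visible in the outputs above (staying with one agent until it is done, and moving to another agent at every step) can be reproduced with a toy model in which each agent needs a fixed number of steps. This is only a sketch of the orderings and not the library's selector classes. Both function names and the `steps_needed` input are hypothetical.

```python
def finish_one_agent_first(steps_needed):
    """Keep selecting the same agent until it is done, then move to the next."""
    order = []
    for agent, steps in enumerate(steps_needed):
        order.extend([agent] * steps)
    return order


def cycle_active_agents(steps_needed):
    """Visit agents round-robin, skipping agents that have already finished."""
    remaining = list(steps_needed)
    order = []
    while any(left > 0 for left in remaining):
        for agent, left in enumerate(remaining):
            if left > 0:
                order.append(agent)
                remaining[agent] -= 1
    return order


steps_needed = [3, 1, 2]  # agent 0 needs 3 steps, agent 1 needs 1, agent 2 needs 2
sequential = finish_one_agent_first(steps_needed)
circular = cycle_active_agents(steps_needed)
print(sequential)
print(circular)
```

Both orderings perform the same total number of steps. They differ only in which agent is active at each step, which is what a custom selector controls.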