# Multi-Tasking Environments

### Install

Uncomment the following cells:

In [None]:
#!git clone https://github.com/ricgama/maenvs4vrp.git

In [None]:
# When using Colab
#%cd maenvs4vrp
#%mv maenvs4vrp/ repo_temp/
#%mv repo_temp/ ..
#%cd ..
#%cp maenvs4vrp/setup.py repo_temp/
#%rm -r maenvs4vrp
#%mv repo_temp/ maenvs4vrp/
#%cd maenvs4vrp/
#!pip install .

Multi-tasking environments support simulating multiple VRP variants within a single environment structure, whereas every other environment simulates only one variant.

Variants can either be sampled at random across the batch, so that a single batch contains several different VRP problems, or picked from the list of supported variants.

Supported variants are combinations of a set of attributes, which can be enabled or disabled.

At the moment, **MAENVS4VRP** offers four multi-tasking environments: one base environment and three generalizations.

The supported environments are:

* MTVRP: Base environment.
* GMTVRP: MTVRP generalization with support for online scenarios.
* MTDVRP: MTVRP generalization with multiple depots.
* GMTDVRP: MTVRP generalization with support for online scenarios and multiple depots.

MTVRP base environment is adapted from [RouteFinder](https://github.com/ai4co/routefinder) environments.

## Supported VRP Variants

The supported VRP variants are listed in the following table. The generalizations build on these base variants and add extra features.

|Variants         |    Capacity     |   Open Routes   |    Backhaul     | Mixed Problems  | Duration Limits |  Time Windows   |
|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
|CVRP             |        ✓        |                 |                 |                 |                 |                 |
|OVRP             |        ✓        |        ✓        |                 |                 |                 |                 |
|VRPB             |        ✓        |                 |        ✓        |                 |                 |                 |
|VRPL             |        ✓        |                 |                 |                 |        ✓        |                 |
|VRPTW            |        ✓        |                 |                 |                 |                 |        ✓        |
|OVRPTW           |        ✓        |        ✓        |                 |                 |                 |        ✓        |
|OVRPB            |        ✓        |        ✓        |        ✓        |                 |                 |                 |
|OVRPL            |        ✓        |        ✓        |                 |                 |        ✓        |                 |
|VRPBL            |        ✓        |                 |        ✓        |                 |        ✓        |                 |
|VRPBTW           |        ✓        |                 |        ✓        |                 |                 |        ✓        |
|VRPLTW           |        ✓        |                 |                 |                 |        ✓        |        ✓        |
|OVRPBL           |        ✓        |        ✓        |        ✓        |                 |        ✓        |                 |
|OVRPBTW          |        ✓        |        ✓        |        ✓        |                 |                 |        ✓        |
|OVRPLTW          |        ✓        |        ✓        |                 |                 |        ✓        |        ✓        |
|VRPBLTW          |        ✓        |                 |        ✓        |                 |        ✓        |        ✓        |
|OVRPBLTW         |        ✓        |        ✓        |        ✓        |                 |        ✓        |        ✓        |
|VRPMB            |        ✓        |                 |        ✓        |        ✓        |                 |                 |
|OVRPMB           |        ✓        |        ✓        |        ✓        |        ✓        |                 |                 |
|VRPMBL           |        ✓        |                 |        ✓        |        ✓        |        ✓        |                 |
|VRPMBTW          |        ✓        |                 |        ✓        |        ✓        |                 |        ✓        |
|OVRPMBL          |        ✓        |        ✓        |        ✓        |        ✓        |        ✓        |                 |
|OVRPMBTW         |        ✓        |        ✓        |        ✓        |        ✓        |                 |        ✓        |
|VRPMBLTW         |        ✓        |                 |        ✓        |        ✓        |        ✓        |        ✓        |
|OVRPMBLTW        |        ✓        |        ✓        |        ✓        |        ✓        |        ✓        |        ✓        |
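The naming pattern in the table can be reproduced with a few lines of Python. The sketch below is only an illustration of how the 24 labels follow from the attribute flags; `variant_name` and `all_variants` are hypothetical helpers and are not part of MAENVS4VRP.

```python
from itertools import product


def variant_name(open_routes, backhaul, mixed, duration_limits, time_windows):
    """Build a variant label from attribute flags (capacity is always enabled)."""
    if mixed and not backhaul:
        raise ValueError("mixed problems require backhaul")
    suffix = ""
    if backhaul:
        suffix += "MB" if mixed else "B"
    if duration_limits:
        suffix += "L"
    if time_windows:
        suffix += "TW"
    if not open_routes and not suffix:
        return "CVRP"  # capacity is the only enabled attribute
    return ("O" if open_routes else "") + "VRP" + suffix


def all_variants():
    """Enumerate every label: 2 (open) x 3 (no backhaul, B, MB) x 2 (L) x 2 (TW)."""
    names = []
    for open_routes, haul, limits, windows in product(
        (False, True), ("none", "B", "MB"), (False, True), (False, True)
    ):
        names.append(
            variant_name(open_routes, haul != "none", haul == "MB", limits, windows)
        )
    return names


print(len(all_variants()))
print(variant_name(False, True, False, True, False))
```

The second call builds the label for a closed-route problem with backhaul and duration limits, which corresponds to the `VRPBL` row of the table.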

## MTVRP

Let's explore this base environment:

In [132]:
from maenvs4vrp.environments.mtvrp.env import Environment
from maenvs4vrp.environments.mtvrp.env_agent_selector import AgentSelector
from maenvs4vrp.environments.mtvrp.observations import Observations
from maenvs4vrp.environments.mtvrp.instances_generator import InstanceGenerator
from maenvs4vrp.environments.mtvrp.env_agent_reward import DenseReward


In [133]:
gen = InstanceGenerator()
obs = Observations()
sel = AgentSelector()
rew = DenseReward()

In [134]:
env = Environment(
    instance_generator_object=gen,
    obs_builder_object=obs,
    agent_selector_object=sel,
    reward_evaluator=rew
)

### Sample Random Variants

By default, when ``variant_preset`` is not specified, ``env.reset()`` samples random variants across batches.

If ``use_combinations`` is ``True``, attributes are randomly sampled and can be combined within the same instance. Otherwise, each instance in the batch gets only one attribute.
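The difference between the two sampling modes can be pictured with a small standalone sketch. It is not the library's implementation: the function name `sample_attribute_flags`, the attribute labels, and the 0.5 sampling probability are all assumptions made for illustration.

```python
import random

# Illustrative attribute labels (assumed, not taken from the library).
ATTRIBUTES = ("open_routes", "backhauls", "distance_limits", "time_windows")


def sample_attribute_flags(batch_size, use_combinations, rng=None):
    """Return one dict of boolean attribute flags per instance in the batch."""
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        if use_combinations:
            # Each attribute is switched on independently (0.5 is an assumed probability).
            flags = {name: rng.random() < 0.5 for name in ATTRIBUTES}
        else:
            # Exactly one attribute is switched on for this instance.
            chosen = rng.choice(ATTRIBUTES)
            flags = {name: name == chosen for name in ATTRIBUTES}
        batch.append(flags)
    return batch


for flags in sample_attribute_flags(4, use_combinations=False, rng=random.Random(0)):
    print([name for name, enabled in flags.items() if enabled])
```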

In [135]:
td = env.reset(batch_size=4, num_agents=2, num_nodes=6, use_combinations=True)

TensorDict ``env.td_state`` includes all of the problem's parameters.

In [136]:
env.td_state

TensorDict(
    fields={
        agents: TensorDict(
            fields={
                active_agents_mask: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.bool, is_shared=False),
                capacity: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                cum_ttime: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                cur_node: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.int64, is_shared=False),
                cur_step: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.int32, is_shared=False),
                cur_time: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                cur_ttime: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                feasible_nodes: Tensor(shape=torch.Size([4, 2, 6]), device=cpu, dtype=torch.bool, is_shared=False),
                route_length: Tenso

You can check which attributes are enabled for each instance in the batch.

Backhauls:

In [137]:
env.td_state['has_backhauls']

tensor([[ True],
        [ True],
        [False],
        [False]])

In [138]:
env.td_state['linehaul_demands']

tensor([[0., 4., 8., 0., 1., 1.],
        [0., 3., 1., 0., 0., 8.],
        [0., 4., 2., 4., 6., 9.],
        [0., 9., 8., 5., 5., 4.]])

In [139]:
env.td_state['backhaul_demands']

tensor([[0., 0., 0., 5., 0., 0.],
        [0., 0., 0., 3., 8., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]])

Distance Limits:

In [140]:
env.td_state['has_distance_limits']

tensor([[ True],
        [False],
        [False],
        [ True]])

In [141]:
env.td_state['distance_limits']

tensor([[2.2786],
        [   inf],
        [   inf],
        [2.4822]])

Open Routes:

In [142]:
env.td_state['has_open_routes']

tensor([[True],
        [True],
        [True],
        [True]])

Time Windows:

In [143]:
env.td_state['has_time_windows']

tensor([[ True],
        [False],
        [ True],
        [False]])

In [144]:
env.td_state['time_windows']

tensor([[[0.0000, 4.6000],
         [1.4379, 1.6248],
         [3.6609, 3.8444],
         [2.4875, 2.6786],
         [1.2890, 1.4818],
         [0.6775, 0.8762]],

        [[0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf]],

        [[0.0000, 4.6000],
         [2.5068, 2.6907],
         [1.9209, 2.1087],
         [2.4964, 2.6811],
         [3.6757, 3.8600],
         [2.4038, 2.6037]],

        [[0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf]]])
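In the outputs above, a disabled attribute shows up as an unbounded value: the distance limit is `inf` and the time windows are `[0, inf]` wherever the matching flag is `False`. The plain-Python sketch below checks that pattern on values copied from those outputs; `limits_match_flags` is a hypothetical helper, not a library function.

```python
import math

# Values copied from the cell outputs above (one entry per instance).
has_distance_limits = [True, False, False, True]
distance_limits = [2.2786, math.inf, math.inf, 2.4822]

has_time_windows = [True, False, True, False]
# Closing time of the first node's time window in each instance.
first_window_close = [4.6, math.inf, 4.6, math.inf]


def limits_match_flags(flags, limits):
    """True when a limit is finite exactly where its flag is enabled."""
    return all(flag == math.isfinite(limit) for flag, limit in zip(flags, limits))


print(limits_match_flags(has_distance_limits, distance_limits))
print(limits_match_flags(has_time_windows, first_window_close))
```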

### Sample Variant Preset from Presets List

Let's set ``variant_preset`` to ``vrpbl``.

If ``use_combinations`` is ``True``, Backhaul and Distance Limits are not taken into account for the variant.

Otherwise, every instance in the batch is a VRPBL instance.

In [145]:
td = env.reset(batch_size=4, num_agents=4, num_nodes=6, use_combinations=False, variant_preset='vrpbl')

In [146]:
env.td_state['has_backhauls']

tensor([[True],
        [True],
        [True],
        [True]])

In [147]:
env.td_state['has_distance_limits']

tensor([[True],
        [True],
        [True],
        [True]])

In [148]:
env.td_state['has_open_routes']

tensor([[False],
        [False],
        [False],
        [False]])

In [149]:
env.td_state['has_time_windows']

tensor([[False],
        [False],
        [False],
        [False]])

### Problem Simulation Cycle

The problem simulation cycle has two parts:
* ``sample_action()``: An action is randomly sampled for the agent according to the action mask present in ``env.td_state``.
* ``step()``: The environment's parameters are updated according to the agents' actions.

The simulation ends when every entry of ``td['done']`` is ``True``.
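The same sample-then-step cycle can be tried without the library on a toy problem. The sketch below is a deliberately simplified stand-in: `ToyEnv` is a hypothetical class with one agent per instance and a mask of unvisited nodes. It mirrors only the loop structure, not the behaviour of the MAENVS4VRP environment.

```python
import random


class ToyEnv:
    """Toy stand-in: one agent per instance visits every node exactly once."""

    def __init__(self, batch_size, num_nodes, seed=0):
        self.rng = random.Random(seed)
        self.nsteps = 0
        # Action mask: True means the node can still be visited.
        self.mask = [[True] * num_nodes for _ in range(batch_size)]

    def reset(self):
        size = len(self.mask)
        return {"done": [False] * size, "action": [None] * size}

    def sample_action(self, td):
        # Pick a random feasible node for each instance, or None if none is left.
        td["action"] = [
            self.rng.choice([n for n, ok in enumerate(row) if ok]) if any(row) else None
            for row in self.mask
        ]
        return td

    def step(self, td):
        # Apply the sampled actions and update the done flags.
        for row, action in zip(self.mask, td["action"]):
            if action is not None:
                row[action] = False
        td["done"] = [not any(row) for row in self.mask]
        self.nsteps += 1
        return td


env = ToyEnv(batch_size=4, num_nodes=6)
td = env.reset()
while not all(td["done"]):
    td = env.sample_action(td)
    td = env.step(td)
print(env.nsteps, td["done"])
```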

In [150]:
td['done']

tensor([False, False, False, False])

In [151]:
while not td["done"].all():
    td = env.sample_action(td)
    td = env.step(td)
    step = env.env_nsteps
    cur_agent_idx = td['cur_agent_idx']
    print(f'env step number: {step}, active agent name: {cur_agent_idx}')

env step number: 1, active agent name: tensor([[1],
        [0],
        [0],
        [0]])
env step number: 2, active agent name: tensor([[1],
        [0],
        [0],
        [0]])
env step number: 3, active agent name: tensor([[1],
        [0],
        [0],
        [1]])
env step number: 4, active agent name: tensor([[2],
        [1],
        [1],
        [1]])
env step number: 5, active agent name: tensor([[2],
        [2],
        [1],
        [1]])
env step number: 6, active agent name: tensor([[2],
        [2],
        [2],
        [2]])
env step number: 7, active agent name: tensor([[3],
        [3],
        [3],
        [3]])
env step number: 8, active agent name: tensor([[0],
        [3],
        [0],
        [0]])
env step number: 9, active agent name: tensor([[0],
        [0],
        [0],
        [0]])


In [152]:
td['done']

tensor([True, True, True, True])

## GMTVRP

In the standard MTVRP setup, the vehicle’s linehaul and backhaul loads aren’t known until the episode ends, which isn’t practical for modeling real‑time operations. To address this, we created the GMTVRP environment, where each vehicle’s load is specified up front at the start of the episode. Let’s dive in:


In [153]:
from maenvs4vrp.environments.gmtvrp.env import Environment
from maenvs4vrp.environments.gmtvrp.env_agent_selector import SmallestTimeAgentSelector
from maenvs4vrp.environments.gmtvrp.observations import Observations
from maenvs4vrp.environments.gmtvrp.instances_generator import InstanceGenerator
from maenvs4vrp.environments.gmtvrp.env_agent_reward import DenseReward
%load_ext autoreload
%autoreload 2


**Note:** If we want to simulate an online scenario, we have to make sure to instantiate ``SmallestTimeAgentSelector`` as our Agent Selector class.
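Judging only by its name, a smallest-time selector hands the next decision to the active agent whose clock is furthest behind, which is what an online setting needs. The sketch below illustrates that idea for a single instance; `select_smallest_time_agent` is a hypothetical function and makes no claim about how ``SmallestTimeAgentSelector`` is actually implemented.

```python
def select_smallest_time_agent(cur_time, active):
    """Return the index of the active agent with the smallest current time.

    cur_time: list of floats, one per agent.
    active:   list of bools, True if the agent can still act.
    Returns None when no agent is active.
    """
    candidates = [idx for idx, is_active in enumerate(active) if is_active]
    if not candidates:
        return None
    return min(candidates, key=lambda idx: cur_time[idx])


# Agent 2 has the smallest time but is inactive, so agent 1 is chosen.
print(select_smallest_time_agent([3.0, 1.5, 0.2], [True, True, False]))
```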

In [154]:
gen = InstanceGenerator()
obs = Observations()
sel = SmallestTimeAgentSelector()
rew = DenseReward()

In [155]:
env = Environment(
    instance_generator_object=gen,
    obs_builder_object=obs,
    agent_selector_object=sel,
    reward_evaluator=rew
)

### Initial Load

As previously stated, in MTVRP simulations the agents' load is implicit. To support online scenarios, the initial load must be defined from the beginning of the route. This can be done in two ways:
* Sampling a random initial load through the ``env.sample_initial_load()`` method.
* Defining a custom initial load in ``env.reset()``.

In both cases, ``env.set_initial_load()`` must be called afterwards, because it applies the initial load stored under the ``td['initial_load']`` key.
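The two options can be pictured with plain Python lists of shape batch × agents. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the uniform distribution is an assumption, since the text only says that sampled values lie between 0 and the maximum capacity.

```python
import random


def sample_random_initial_load(capacity, num_agents, rng=None):
    """Draw one load per agent in [0, capacity] (uniform draw is an assumption)."""
    rng = rng or random.Random()
    return [[rng.uniform(0.0, cap) for _ in range(num_agents)] for cap in capacity]


def broadcast_initial_load(value, batch_size, num_agents):
    """Give every agent in every instance the same custom initial load."""
    return [[float(value)] * num_agents for _ in range(batch_size)]


capacity = [30.0, 30.0, 30.0, 30.0]  # one capacity per instance
random_load = sample_random_initial_load(capacity, num_agents=2, rng=random.Random(0))
custom_load = broadcast_initial_load(20, batch_size=4, num_agents=2)
print(custom_load)
```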

### Sample Initial Load

This method samples values between 0 and the agents' maximum capacity.

In [156]:
td = env.reset(batch_size=4, num_agents=2, num_nodes=6, use_combinations=True)

In [157]:
env.td_state['capacity']

tensor([[30.],
        [30.],
        [30.],
        [30.]])

In [158]:
td = env.sample_initial_load(td)

In [159]:
td['initial_load']

tensor([[21.6385, 21.7317],
        [21.6329,  7.6351],
        [18.1970, 26.6878],
        [ 9.6124,  4.6757]])

In [160]:
td = env.set_initial_load(td)

In [161]:
env.td_state['agents']['cur_linehaul_load']

tensor([[21.6385, 21.7317],
        [21.6329,  7.6351],
        [18.1970, 26.6878],
        [ 9.6124,  4.6757]])

### Set Custom Initial Load

A custom initial load is passed as an argument to ``env.reset()``.

In [162]:
td = env.reset(batch_size=4, num_agents=2, num_nodes=6, use_combinations=True, initial_load=20)

In [163]:
td['initial_load']

tensor([[20., 20.],
        [20., 20.],
        [20., 20.],
        [20., 20.]])

In [164]:
td = env.set_initial_load(td)

In [165]:
env.td_state['agents']['cur_linehaul_load']

tensor([[20., 20.],
        [20., 20.],
        [20., 20.],
        [20., 20.]])

## MTDVRP

Let's now explore multi-depot environments:

In [166]:
from maenvs4vrp.environments.mtdvrp.env import Environment
from maenvs4vrp.environments.mtdvrp.env_agent_selector import AgentSelector
from maenvs4vrp.environments.mtdvrp.observations import Observations
from maenvs4vrp.environments.mtdvrp.instances_generator import InstanceGenerator
from maenvs4vrp.environments.mtdvrp.env_agent_reward import DenseReward
%load_ext autoreload
%autoreload 2


In [167]:
gen = InstanceGenerator()
obs = Observations()
sel = AgentSelector()
rew = DenseReward()

In [168]:
env = Environment(
    instance_generator_object=gen,
    obs_builder_object=obs,
    agent_selector_object=sel,
    reward_evaluator=rew
)

### Multiple Depots

To include multiple depots in the simulation, they must be defined in ``env.reset()``. The total number of agents will be the product of the number of depots and the defined number of agents.

Ex: If ``num_depots = 3`` and ``num_agents = 5``, the total number of agents will be ``15``. That means each depot will have ``5`` agents associated with it.

Agents are assigned to depots sequentially: the first agent is assigned to depot 0, the second agent to depot 1, and so on. Every agent must start and end its route at its own depot.
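The sequential assignment described above amounts to a round-robin over the depots. The following sketch reproduces it with a hypothetical helper, `assign_depots`, which is not a library function.

```python
def assign_depots(num_agents, num_depots):
    """Return the depot index of every agent, assigned in round-robin order."""
    total_agents = num_agents * num_depots
    return [agent % num_depots for agent in range(total_agents)]


depots = assign_depots(num_agents=5, num_depots=3)
print(len(depots))
print(depots)
```

With ``num_agents=5`` and ``num_depots=3``, this gives 15 agents, and each depot index appears five times.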

In [169]:
td = env.reset(batch_size=3, num_agents=5, num_depots=3, num_nodes=24, use_combinations=True)

In [170]:
env.td_state['depot_idx']

tensor([[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]])

In [171]:
env.td_state['agents']['depot_idx']

tensor([[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])

After a simulation is run, each agent's sequence of actions must start and end at its own depot.

In [172]:
while not td["done"].all():
    td = env.sample_action(td)
    td = env.step(td)

In [173]:
env.td_state['solution']['agents']

tensor([[ 0,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  3,  3,  4,
          4,  4,  4,  4,  4,  4,  5,  5,  6,  6,  7,  8,  9, 10, 11, 12, 13, 14],
        [ 0,  0,  0,  0,  1,  1,  1,  2,  2,  3,  3,  3,  4,  4,  4,  5,  5,  5,
          5,  6,  6,  6,  7,  7,  8,  8,  9,  9,  9, 10, 11, 11, 12, 12, 13, 14],
        [ 0,  0,  0,  1,  1,  1,  2,  2,  2,  2,  3,  3,  3,  4,  4,  4,  5,  5,
          6,  6,  7,  7,  7,  8,  8,  8,  9, 10, 10, 11, 11, 11, 12, 13, 14,  0]])

In [174]:
env.td_state['solution']['actions']

tensor([[ 0, 15, 16, 14, 13, 20,  9,  1, 21,  4, 22,  3, 18,  6,  2, 11,  0, 10,
         17,  5,  8,  7, 19,  1, 23,  2, 12,  0,  1,  2,  0,  1,  2,  0,  1,  2],
        [18,  4, 23,  0, 10, 20,  1, 11,  2,  8, 19,  0,  7,  5,  1,  6, 22, 16,
          2, 12, 15,  0, 21,  1, 13,  2,  3,  9,  0,  1, 14,  2, 17,  0,  1,  2],
        [16, 14,  0, 12,  8,  1,  9, 10, 22,  2,  7,  5,  0, 13,  4,  1, 18,  2,
         21,  0, 20, 17,  1,  3, 15,  2,  0, 23,  1,  6, 19,  2,  0,  1,  2,  0]])
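The depot constraint can be checked directly on the printed solution. The sketch below copies the first batch row of the two tensors above into plain lists and verifies that the last node visited by each agent is its own depot. It assumes the round-robin depot assignment (agent index modulo the number of depots) shown earlier; `last_node_per_agent` is a hypothetical helper.

```python
# First batch row of env.td_state['solution']['agents'] and ['actions'].
agents = [0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4,
          4, 4, 4, 4, 4, 4, 5, 5, 6, 6, 7, 8, 9, 10, 11, 12, 13, 14]
actions = [0, 15, 16, 14, 13, 20, 9, 1, 21, 4, 22, 3, 18, 6, 2, 11, 0, 10,
           17, 5, 8, 7, 19, 1, 23, 2, 12, 0, 1, 2, 0, 1, 2, 0, 1, 2]


def last_node_per_agent(agents, actions):
    """Map each agent index to the last node it visited."""
    last = {}
    for agent, node in zip(agents, actions):
        last[agent] = node  # later steps overwrite earlier ones
    return last


num_depots = 3
last = last_node_per_agent(agents, actions)
# Each agent's depot is its index modulo the number of depots (assumed assignment).
ends_at_depot = all(node == agent % num_depots for agent, node in last.items())
print(ends_at_depot)
```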

## GMTDVRP

Here, it's possible to combine an online scenario and multiple depots.

In [175]:
from maenvs4vrp.environments.gmtdvrp.env import Environment
from maenvs4vrp.environments.gmtdvrp.env_agent_selector import AgentSelector
from maenvs4vrp.environments.gmtdvrp.observations import Observations
from maenvs4vrp.environments.gmtdvrp.instances_generator import InstanceGenerator
from maenvs4vrp.environments.gmtdvrp.env_agent_reward import DenseReward
%load_ext autoreload
%autoreload 2


In [176]:
gen = InstanceGenerator()
obs = Observations()
sel = AgentSelector()
rew = DenseReward()

In [177]:
env = Environment(
    instance_generator_object=gen,
    obs_builder_object=obs,
    agent_selector_object=sel,
    reward_evaluator=rew
)

In [178]:
td = env.reset(batch_size=3, num_agents=5, num_depots=3, num_nodes=24, use_combinations=True, initial_load=15)

In [179]:
td = env.set_initial_load(td)

In [180]:
env.td_state['agents']['cur_linehaul_load']

tensor([[15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15.,
         15.],
        [15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15.,
         15.],
        [15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15.,
         15.]])

In [181]:
env.td_state['depot_idx']

tensor([[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]])

In [182]:
env.td_state['agents']['depot_idx']

tensor([[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])

## Acknowledgements

* [RouteFinder](https://github.com/ai4co/routefinder): check out their paper,
["RouteFinder: Towards Foundation Models for Vehicle Routing Problems"](https://arxiv.org/abs/2406.15007).