## Investigate simulator

In [2]:
from maro.simulator import Env, get_available_envs, DecisionMode
from maro.simulator.scenarios.ecr.common import Action, DecisionEvent
from maro.simulator.graph import SnapshotList

## Hello world
The **MARO** simulator provides a **Gym**-like interface: `env.step` drives the simulation and returns a reward, the next decision event, and a done flag.

In [3]:
env = Env(scenario='ecr', topology='5p_ssddd_l0.0', max_tick=10)

for ep in range(2):
    _, decision_event, is_done = env.step(None)

    while not is_done:
        print(f'ep: {ep}, decision event: {decision_event}')
        dummy_action = Action(decision_event.vessel_idx, decision_event.port_idx, 0)
        reward, decision_event, is_done = env.step(dummy_action)

    env.reset()
    

ep: 0, decision event: DecisionEvent(tick=7, port_idx=1, vessel_idx=0, action_scope=ActionScope {load: 20000, discharge: 0 })
ep: 0, decision event: DecisionEvent(tick=7, port_idx=2, vessel_idx=1, action_scope=ActionScope {load: 20000, discharge: 0 })
ep: 0, decision event: DecisionEvent(tick=7, port_idx=0, vessel_idx=2, action_scope=ActionScope {load: 12000, discharge: 0 })
ep: 0, decision event: DecisionEvent(tick=7, port_idx=3, vessel_idx=3, action_scope=ActionScope {load: 16000, discharge: 0 })
ep: 0, decision event: DecisionEvent(tick=7, port_idx=4, vessel_idx=4, action_scope=ActionScope {load: 16000, discharge: 0 })
ep: 0, decision event: DecisionEvent(tick=7, port_idx=0, vessel_idx=5, action_scope=ActionScope {load: 12000, discharge: 0 })
ep: 1, decision event: DecisionEvent(tick=7, port_idx=1, vessel_idx=0, action_scope=ActionScope {load: 20000, discharge: 0 })
ep: 1, decision event: DecisionEvent(tick=7, port_idx=2, vessel_idx=1, action_scope=ActionScope {load: 20000, discharg

## The workflow of ECR problem
In the **ECR** scenario, the <span style="color:green">**agent**</span> decides how many empty containers to load onto or discharge from each vessel.
![alt text](./images/ecr_workflow.png)



## Get all supported environments
Internally, **MARO** uses a two-level hierarchical namespace to identify an environment.
- `scenario` denotes a specific resource optimization domain
- `topology` denotes a typical topology structure of its scenario

You can use the `get_available_envs` function to list all supported environments.

In [4]:
# get all supported environments
get_available_envs()

[{'scenario': 'ecr', 'topology': '22p_global_l0.7'},
 {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.0'},
 {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.3'},
 {'scenario': 'ecr', 'topology': '5p_ssddd_l0.6'},
 {'scenario': 'ecr', 'topology': '22p_global_l0.5'},
 {'scenario': 'ecr', 'topology': '5p_ssddd_l0.5'},
 {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.4'},
 {'scenario': 'ecr', 'topology': '22p_global_l0.2'},
 {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.5'},
 {'scenario': 'ecr', 'topology': '5p_ssddd_l0.2'},
 {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.6'},
 {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.8'},
 {'scenario': 'ecr', 'topology': '5p_ssddd_l0.8'},
 {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.2'},
 {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.7'},
 {'scenario': 'ecr', 'topology': '5p_ssddd_l0.3'},
 {'scenario': 'ecr', 'topology': '4p_ssdd_l0.7'},
 {'scenario': 'ecr', 'topology': '4p_ssdd_l0.6'},
 {'scenario': 'ecr', 'topology': '4p_ssdd_l0.5'},
 {'scenario': 'ecr',
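
Since each entry in the returned list is a dict with `scenario` and `topology` keys, the two-level namespace can be rebuilt with a small helper. The sketch below is illustrative only: `group_by_scenario` is a hypothetical name, and the sample list is copied from the first entries of the output above rather than obtained by calling **MARO**.

```python
from collections import defaultdict

def group_by_scenario(envs):
    """Group a list of {'scenario': ..., 'topology': ...} dicts by scenario."""
    grouped = defaultdict(list)
    for entry in envs:
        grouped[entry['scenario']].append(entry['topology'])
    # Sort topologies so the listing is stable regardless of input order.
    return {scenario: sorted(topologies) for scenario, topologies in grouped.items()}

# Sample entries copied from the output above (not a live call to MARO).
sample_envs = [
    {'scenario': 'ecr', 'topology': '22p_global_l0.7'},
    {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.0'},
    {'scenario': 'ecr', 'topology': '5p_ssddd_l0.6'},
]

topologies_by_scenario = group_by_scenario(sample_envs)
print(topologies_by_scenario)
```

With a live installation, the same helper could presumably be fed the result of `get_available_envs()` directly.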

## Interact with environment
The **MARO** simulator targets multi-agent, continuing resource optimization scenarios, so the *continuing* and *multi-agent* aspects are the two main drivers of its interface design.
### Initialize an environment
Because the **MARO** simulator targets continuing scenarios, `max_tick` is a required parameter when initializing a new environment, in addition to `scenario` and `topology`.

For *multi-agent* interaction, the **MARO** simulator provides two decision modes for handling decision events that occur in the same tick, whether on the same agent or on different agents:
- `DecisionMode.Sequential`: the related agents decide one after another; each agent decides based on the result of the previous agent's decision.
- `DecisionMode.Joint`: all related agents decide at the same time, based on the same state.
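
The difference between the two modes can be pictured with a toy model. The sketch below does not use **MARO** internals: `decide_sequential`, `decide_joint`, and the "take half of what is observed" policy are hypothetical names chosen for illustration, and the shared state is just a single integer.

```python
def take_half(available):
    """Hypothetical policy: each agent asks for half of what it observes."""
    return available // 2

def decide_sequential(available, n_agents, policy):
    """Each agent observes the state left behind by the previous agent."""
    decisions = []
    for _ in range(n_agents):
        amount = policy(available)
        decisions.append(amount)
        available -= amount  # the next agent sees the updated state
    return decisions

def decide_joint(available, n_agents, policy):
    """All agents observe the same state and decide at the same time."""
    return [policy(available) for _ in range(n_agents)]

sequential = decide_sequential(100, 3, take_half)
joint = decide_joint(100, 3, take_half)
print(sequential)  # [50, 25, 12]
print(joint)       # [50, 50, 50]
```

In this toy model the joint decisions can over-commit the shared resource, because no agent sees what the others chose. How a real environment resolves such conflicts is outside the scope of the sketch.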


In [5]:
# initialize a 10-tick ecr environment with the 5p_ssddd_l0.0 topology
env = Env(scenario='ecr', topology='5p_ssddd_l0.0', max_tick=10, decision_mode=DecisionMode.Sequential)

### Get environment basic information
**MARO** is a *multi-agent* setting, so different decision events must be handled by different agents. For performance reasons, the **MARO** simulator manages its own memory, and all interfaces identify an entity by an integer index. To ease debugging, the simulator also provides a mapping from `index(int)` to `name(string)`.

The **MARO** simulator uses a general data model for resource optimization problems. The **ECR** scenario uses two kinds of resource nodes:

- `STATIC`: an abstraction of the port, which is a resource repository. It is also the decision component, called the *agent* in the **RL** context.
- `DYNAMIC`: an abstraction of the vessel, which is a resource container that sails between ports.


In [6]:
agent_idx_list = env.agent_idx_list
print(f'{env.name} decision agent index list: {agent_idx_list}')
name_mapping = env.node_name_mapping
print(f'{env.name} internal idx2name mapping', name_mapping)
# TODO: get config

ecr:5p_ssddd_l0.0 decision agent index list: [0, 1, 2, 3, 4]
ecr:5p_ssddd_l0.0 internal idx2name mapping {'static': {0: 'transfer_port_001', 1: 'supply_port_001', 2: 'supply_port_002', 3: 'demand_port_001', 4: 'demand_port_002'}, 'dynamic': {0: 'rt1_vessel_001', 1: 'rt1_vessel_002', 2: 'rt1_vessel_003', 3: 'rt2_vessel_001', 4: 'rt2_vessel_002', 5: 'rt2_vessel_003'}}
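
The mapping is a plain nested dict, so turning the indices of a decision event into readable names is a lookup. A minimal sketch, assuming the mapping has exactly the shape printed above; `describe` is a hypothetical helper, and the dict is copied from the output rather than read from a live `env`.

```python
# Mapping copied from the output above (normally read from env.node_name_mapping).
name_mapping = {
    'static': {0: 'transfer_port_001', 1: 'supply_port_001', 2: 'supply_port_002',
               3: 'demand_port_001', 4: 'demand_port_002'},
    'dynamic': {0: 'rt1_vessel_001', 1: 'rt1_vessel_002', 2: 'rt1_vessel_003',
                3: 'rt2_vessel_001', 4: 'rt2_vessel_002', 5: 'rt2_vessel_003'},
}

def describe(port_idx, vessel_idx, mapping):
    """Return a readable label for a (port, vessel) index pair."""
    port = mapping['static'][port_idx]      # ports are static nodes
    vessel = mapping['dynamic'][vessel_idx]  # vessels are dynamic nodes
    return f'{vessel} at {port}'

label = describe(port_idx=1, vessel_idx=0, mapping=name_mapping)
print(label)  # rt1_vessel_001 at supply_port_001
```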


### Get environment observation
Unlike the `step` function in **Gym**, the **MARO** simulator's `step` function does not return an environment observation directly. Instead, the agent gets a `snapshot_list` from the `Env` instance, which holds environment snapshots for all ticks that have elapsed. This is convenient for *state shaping*. Because the simulator manages its own memory for performance, `snapshot_list` is backed by a block of `C` memory.
#### Get all supported attributes list

In [7]:
env.snapshot_list.attributes

[{'name': 'empty', 'slot length': 1},
 {'name': 'full', 'slot length': 1},
 {'name': 'early_discharge', 'slot length': 1},
 {'name': 'capacity', 'slot length': 1},
 {'name': 'remaining_space', 'slot length': 1},
 {'name': 'on_shipper', 'slot length': 1},
 {'name': 'on_consignee', 'slot length': 1},
 {'name': 'booking', 'slot length': 1},
 {'name': 'shortage', 'slot length': 1},
 {'name': 'acc_booking', 'slot length': 1},
 {'name': 'acc_shortage', 'slot length': 1},
 {'name': 'fulfillment', 'slot length': 1},
 {'name': 'acc_fulfillment', 'slot length': 1},
 {'name': 'route_idx', 'slot length': 1},
 {'name': 'last_loc_idx', 'slot length': 1},
 {'name': 'next_loc_idx', 'slot length': 1},
 {'name': 'past_stop_list', 'slot length': 4},
 {'name': 'past_stop_tick_list', 'slot length': 4},
 {'name': 'future_stop_list', 'slot length': 3},
 {'name': 'future_stop_tick_list', 'slot length': 3},
 {'name': 'full_on_ports', 'slot length': 25},
 {'name': 'full_on_vessels', 'slot length': 30},
 {'name'

#### Get attribute value
Attribute values are queried through a slice-style interface. The returned value is a flattened 1-dimensional float32 numpy array whose length is `len(tick list) * len(node id list) * len(attribute list) * len(slot list)`.
- Get static node attribute value: `snapshot_list.static_nodes[[tick] or tick: [node id] or node id: ([attribute name] or attribute name, [attribute slot] or attribute slot)]`
- Get dynamic node attribute value: `snapshot_list.dynamic_nodes[[tick] or tick: [node id] or node id: ([attribute name] or attribute name, [attribute slot] or attribute slot)]`
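
Because the result is flat, it usually has to be reshaped before use. The sketch below assumes a tick-major layout (all nodes of the first tick, then all nodes of the second tick, and so on), which is inferred from the sample output of the cell below and not from the **MARO** documentation. `unflatten` is a hypothetical helper written in plain Python so that it runs without numpy or **MARO**.

```python
def unflatten(flat, n_ticks, n_nodes):
    """Split a flat tick-major result into one row per tick.

    Assumes a single attribute and a single slot, so the flat length
    is n_ticks * n_nodes.
    """
    if len(flat) != n_ticks * n_nodes:
        raise ValueError('flat length does not match n_ticks * n_nodes')
    return [flat[t * n_nodes:(t + 1) * n_nodes] for t in range(n_ticks)]

# 'empty' values of all 5 static nodes over 8 ticks, copied from the
# output of the cell below.
flat = [
    19000, 20000, 20000, 19500, 19500, 18000, 20000, 20000, 19000, 19000,
    17000, 20000, 20000, 18500, 18500, 16000, 20000, 20000, 18000, 18000,
    15000, 20000, 20000, 17500, 17500, 14000, 20000, 20000, 17000, 17000,
    13000, 20000, 20000, 16500, 16500, 12000, 20000, 20000, 16000, 16000,
]

rows = unflatten(flat, n_ticks=8, n_nodes=5)
node0_series = [row[0] for row in rows]  # 'empty' of static node 0 per tick
print(node0_series)
```

The recovered series for node 0 matches the per-node query shown in the same output. On an actual numpy array, `flat.reshape(n_ticks, n_nodes)` should give the same rows under the same layout assumption.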
 

In [14]:
env = Env(scenario='ecr', topology='5p_ssddd_l0.0', max_tick=100)
_, decision_event, is_done = env.step(None)

ports = env.snapshot_list.static_nodes
vessels = env.snapshot_list.dynamic_nodes
matrix = env.snapshot_list.matrix

while not is_done:
    dummy_action = Action(decision_event.vessel_idx, decision_event.port_idx, 0)
    reward, decision_event, is_done = env.step(dummy_action)
    print(f'decision event: {decision_event}')
    print('empty value of static node 0 at all ticks:', ports[:0: ('empty', 0)])
    print('empty value of static node 1 at all ticks:', ports[:1: ('empty', 0)])
    print('empty value of static node 2 at all ticks:', ports[:2: ('empty', 0)])
    print('empty value of static node 3 at all ticks:', ports[:3: ('empty', 0)])
    print('empty value of static node 4 at all ticks:', ports[:4: ('empty', 0)])
    print('empty value of all static nodes at all ticks:', ports[:: ('empty', 0)])
    print('full value of static node 0 at all ticks:', ports[:0: ('full', 0)])
    print('full value of static node 1 at all ticks:', ports[:1: ('full', 0)])
    print('full value of static node 2 at all ticks:', ports[:2: ('full', 0)])
    print('full value of static node 3 at all ticks:', ports[:3: ('full', 0)])
    print('full value of static node 4 at all ticks:', ports[:4: ('full', 0)])
    print('full value of all static nodes at all ticks:', ports[:: ('full', 0)]) 
    print('empty, full values of all static nodes at all ticks:', ports[:: (['empty', 'full'], 0)])
    
    # dynamic nodes are vessels, so query them with vessel_idx (not port_idx)
    predict_stop_id = vessels[decision_event.tick: decision_event.vessel_idx: ('future_stop_list', [0,1,2])]
    predict_arrival_tick = vessels[decision_event.tick: decision_event.vessel_idx: ('future_stop_tick_list', [0,1,2])]
    print(f'vessel {decision_event.vessel_idx} next 3 stops after tick {decision_event.tick}:', predict_stop_id)
    print(f'vessel {decision_event.vessel_idx} next 3 stop ticks after tick {decision_event.tick}:', predict_arrival_tick)
    
    # get the full container distribution (row is source port, column is destination port)
    full_on_ports = matrix[decision_event.tick: "full_on_ports"]
    print("full containers on port:")
    # x->port, y->port, v->ready for load full container number
    print(full_on_ports.reshape(5, 5))
    
    # get full containers on vessels (row is vessel, column is destination port)
    full_on_vessels = matrix[decision_event.tick: "full_on_vessels"]
    print("full containers on vessels")
    # x->vessel, y->port, v->full container number
    print(full_on_vessels.reshape(6, 5))
    
    # get predicted vessel plans (initially, the tick for a currently arriving vessel is the current tick; it changes to the planned tick after an action is taken)
    vessel_plans = matrix[decision_event.tick: "vessel_plans"] 
    # -1 means the vessel cannot reach that port
    print("vessel plans")
    # x->vessel, y->port, v->vessel next arrive time
    print(vessel_plans.reshape(6, 5))
    break

decision event: DecisionEvent(tick=7, port_idx=2, vessel_idx=1, action_scope=ActionScope {load: 20000, discharge: 0 })
empty value of static node 0 at all ticks: [19000. 18000. 17000. 16000. 15000. 14000. 13000. 12000.]
empty value of static node 1 at all ticks: [20000. 20000. 20000. 20000. 20000. 20000. 20000. 20000.]
empty value of static node 2 at all ticks: [20000. 20000. 20000. 20000. 20000. 20000. 20000. 20000.]
empty value of static node 3 at all ticks: [19500. 19000. 18500. 18000. 17500. 17000. 16500. 16000.]
empty value of static node 4 at all ticks: [19500. 19000. 18500. 18000. 17500. 17000. 16500. 16000.]
empty value of all static nodes at all ticks: [19000. 20000. 20000. 19500. 19500. 18000. 20000. 20000. 19000. 19000.
 17000. 20000. 20000. 18500. 18500. 16000. 20000. 20000. 18000. 18000.
 15000. 20000. 20000. 17500. 17500. 14000. 20000. 20000. 17000. 17000.
 13000. 20000. 20000. 16500. 16500. 12000. 20000. 20000. 16000. 16000.]
full value of static node 0 at all ticks: [  

### Drive environment move forward
The **MARO** simulator provides a **Gym**-like interface: the agent interacts with the simulator through the `step` interface. To keep the interface unified and simple, `step` is used both to `start` the environment and to `take action`.

- `start`: Tell the environment to start running; the environment runs until it reaches the next decision event.
```python
_, decision_event, is_done = env.step(None)
```
- `DecisionEvent`: Provides the information the agent needs to make a decision, such as the index of the vessel that has just arrived, the port index, and the action scope.
- `Action`: The action generated by the agent and applied to the environment.
- `take action`: Provide an action for the current decision event; it is applied to the environment.
```python
dummy_action = Action(decision_event.vessel_idx, decision_event.port_idx, 0)
reward, decision_event, is_done = env.step(dummy_action)
```
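
The `start` and `take action` steps can be wrapped into one reusable loop. The sketch below is not part of **MARO**: `ToyEnv`, `Event`, `ToyAction`, `run_episode`, and `no_op` are hypothetical stand-ins that only mimic the `(reward, decision_event, is_done)` contract described above, so that the loop runs without the simulator installed.

```python
from collections import namedtuple

# Hypothetical stand-ins so the sketch runs without MARO installed.
Event = namedtuple('Event', ['vessel_idx', 'port_idx'])
ToyAction = namedtuple('ToyAction', ['vessel_idx', 'port_idx', 'quantity'])

class ToyEnv:
    """Stub that follows the same step() contract: (reward, event, is_done)."""

    def __init__(self, events):
        self._events = list(events)
        self._cursor = 0

    def step(self, action):
        # Passing None starts the run; anything else answers the pending event.
        reward = None if action is None else 0
        if self._cursor >= len(self._events):
            return reward, None, True
        event = self._events[self._cursor]
        self._cursor += 1
        return reward, event, False

def run_episode(env, policy):
    """Start the environment, then answer every decision event with policy(event)."""
    actions = []
    _, event, is_done = env.step(None)        # start
    while not is_done:
        action = policy(event)
        actions.append(action)
        _, event, is_done = env.step(action)  # take action
    return actions

def no_op(event):
    # Same dummy action as in the text: move 0 containers.
    return ToyAction(event.vessel_idx, event.port_idx, 0)

taken = run_episode(ToyEnv([Event(0, 1), Event(1, 2)]), no_op)
print(taken)
```

Keeping the policy as a plain function of the decision event makes it easy to swap the no-op for a learned agent later, while the loop itself stays unchanged.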

In [9]:
env.reset()
_, decision_event, is_done = env.step(None)
print(decision_event)
dummy_action = Action(decision_event.vessel_idx, decision_event.port_idx, 0)
print(dummy_action)
reward, decision_event, is_done = env.step(dummy_action)
print(decision_event)

DecisionEvent(tick=7, port_idx=1, vessel_idx=0, action_scope=ActionScope {load: 20000, discharge: 0 })
Action {quantity: 0, port: 1, vessel: 0 }
DecisionEvent(tick=7, port_idx=2, vessel_idx=1, action_scope=ActionScope {load: 20000, discharge: 0 })
