# A Simple Tutorial for the MARO Supply Chain Scenario

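```python
import os
import sys

# Make the MARO source tree importable from this notebook.
sys.path.append("/maro")
# Repository root: two directory levels above the notebook's working directory.
root_dir = os.path.abspath('../..')
```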

```python
root_dir
```

```text
'/maro'
```

## Simple random policy example

The simple random example shows the interface of the Supply Chain Simulator and illustrates how to interact with it. As you can see in line 72 of file [*examples/supply_chain/simple_random_example.py*](https://github.com/microsoft/maro/blob/sc_tutorial/examples/supply_chain/simple_random_example.py#L72), we can pass `ManufactureAction` and `ConsumerAction` to `Env` and call `step()` to advance the simulation. Run the simple example with:

```sh
python examples/supply_chain/simple_random_example.py
```

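```python
# Jupyter shell escape: IPython substitutes {root_dir} before running the command.
!python {root_dir}/examples/supply_chain/simple_random_example.py
```

```text
{'sold': array([100., 100.]), 'demand': array([10095.,  9969.]), 'sold/demand': array([0.00990589, 0.0100311 ])}
```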
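The `sold/demand` entry is the element-wise ratio of the two arrays above, so under the random policy only about 1% of demand is met. The short sketch below recomputes it with NumPy from the printed values; it assumes the two positions in each array refer to the same two entities, which the output suggests but does not state.

```python
import numpy as np

# Values copied from the printed output: units sold and units demanded per entity.
sold = np.array([100.0, 100.0])
demand = np.array([10095.0, 9969.0])

# Element-wise ratio: the fraction of demand that was actually satisfied.
fill_rate = sold / demand
print(fill_rate)
```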


## Interaction with Non-RL policy

The complex example leverages the RL workflow in MARO, and its code supports many configurations. The simpler ones are listed in file [*examples/supply_chain/rl/config.py*](https://github.com/microsoft/maro/blob/sc_tutorial/examples/supply_chain/rl/config.py). The basic ones you may need are:

- `ALGO`: The algorithm to use. "DQN" and "PPO" are RL algorithms, "EOQ" is a rule-based algorithm, and "BSP" is the base-stock policy, an operations-research (OR) algorithm.
- `TOPOLOGY`: "plant" and "super_vendor" are toy topologies. The "SCI(_XX)" topologies can be used once you add them under the directory *maro/simulator/scenarios/supply_chain/topologies*.
- `PLOT_RENDER`: Whether to render figures of important metrics during the experiment.
- `EXP_NAME`: The experiment name. Experiment logs are saved under the log path, in a folder named after `EXP_NAME`.
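As a minimal sketch, assuming *config.py* defines these settings as plain module-level constants (check the actual file), a configuration for the rule-based run could look like the block below. The `RL_ALGOS`/`NON_RL_ALGOS` sets, the sanity check, and the `EXP_NAME` naming scheme are illustrative additions, not part of the original file.

```python
# Settings as described above; values chosen for the EOQ example.
ALGO = "EOQ"            # "DQN" / "PPO" (RL), "EOQ" (rule-based), "BSP" (base-stock, OR)
TOPOLOGY = "plant"      # toy topologies: "plant", "super_vendor"
PLOT_RENDER = False     # whether to render metric figures during the experiment
EXP_NAME = f"{TOPOLOGY}_{ALGO}"  # hypothetical naming scheme for the log folder

RL_ALGOS = {"DQN", "PPO"}
NON_RL_ALGOS = {"EOQ", "BSP"}

# Fail fast on a typo before launching a long experiment.
assert ALGO in RL_ALGOS | NON_RL_ALGOS, f"Unknown ALGO: {ALGO!r}"
needs_training = ALGO in RL_ALGOS
print(EXP_NAME, needs_training)  # plant_EOQ False
```

Deriving `EXP_NAME` from the topology and algorithm keeps log folders of different runs apart without editing two settings each time.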

Setting `ALGO = "EOQ"` runs the simulation with the rule-based policy. Since a non-RL policy does not require any training, we can run it in *evaluate_only* mode:

```sh
python examples/rl/run_rl_example.py examples/rl/supply_chain.yml --evaluate_only
```

```python
# Jupyter shell escape: IPython substitutes {root_dir} before running the command.
!python {root_dir}/examples/rl/run_rl_example.py {root_dir}/examples/rl/supply_chain.yml --evaluate_only
```

```text
Traceback (most recent call last):
  File "/maro/maro/rl/workflows/main.py", line 204, in <module>
    module = importlib.import_module(os.path.basename(scenario_path))
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'rl'
```
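The traceback is raised by `importlib.import_module(os.path.basename(scenario_path))` in *maro/rl/workflows/main.py*: the scenario directory is imported by its base name (`rl` here), which succeeds only if a directory containing `rl` is on `sys.path`. The self-contained sketch below reproduces that Python import rule with a throwaway package in a temporary directory. It illustrates the mechanism only; it does not establish why the path was missing in this particular run, and the `try_import` helper is hypothetical.

```python
import importlib
import os
import sys
import tempfile


def try_import(scenario_path: str) -> bool:
    """Mimic main.py: import the scenario package by its directory base name."""
    try:
        importlib.import_module(os.path.basename(scenario_path))
        return True
    except ModuleNotFoundError:
        return False


with tempfile.TemporaryDirectory() as parent:
    # Hypothetical layout <parent>/rl/__init__.py, standing in for examples/supply_chain/rl.
    scenario_path = os.path.join(parent, "rl")
    os.makedirs(scenario_path)
    open(os.path.join(scenario_path, "__init__.py"), "w").close()

    before = try_import(scenario_path)  # parent directory not on sys.path yet
    sys.path.insert(0, os.path.dirname(scenario_path))
    importlib.invalidate_caches()
    after = try_import(scenario_path)  # now the package can be found

    # Clean up so the demo leaves the interpreter state unchanged.
    sys.path.remove(parent)
    sys.modules.pop("rl", None)

print(before, after)  # False True
```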


## Interaction with RL policy

If you want to try a trainable RL policy, you may also need to adjust the training workflow in file [*examples/rl/supply_chain.yml*](https://github.com/microsoft/maro/blob/sc_tutorial/examples/rl/supply_chain.yml). The basic settings you may need are:

- `num_episodes` in line 15: Number of episodes to run. Each episode is one cycle of roll-out and policy training.
- `eval_schedule` in line 17: Interval between two evaluations. `eval_schedule: 5` means the policy is evaluated every 5 episodes.
- `interval` in line 31: Interval between two consecutive dumps of the policy network.
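Assuming both `eval_schedule` and `interval` are plain integers counted in episodes (the description of `eval_schedule: 5` suggests this; the unit of `interval` is not stated), the episodes at which evaluation and policy dumps happen can be worked out as below. The `scheduled_episodes` helper is hypothetical, not part of MARO, and the workflow's exact rules (for example, whether the final episode is always evaluated) may differ.

```python
from typing import List


def scheduled_episodes(num_episodes: int, every: int) -> List[int]:
    """Hypothetical helper: 1-based episodes at which an every-N action fires."""
    if every <= 0:
        raise ValueError("the interval must be a positive integer")
    return list(range(every, num_episodes + 1, every))


# Example values only, not taken from supply_chain.yml.
num_episodes = 20
print(scheduled_episodes(num_episodes, every=5))  # eval_schedule: 5 -> [5, 10, 15, 20]
print(scheduled_episodes(num_episodes, every=8))  # interval: 8 -> [8, 16]
```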

Setting `ALGO = "PPO"` in *config.py* runs the simulation with a PPO-based policy. Since an RL policy requires training, run the workflow in training mode, i.e. without `--evaluate_only`:

```sh
python examples/rl/run_rl_example.py examples/rl/supply_chain.yml
```

```python
# Jupyter shell escape: IPython substitutes {root_dir} before running the command.
!python {root_dir}/examples/rl/run_rl_example.py {root_dir}/examples/rl/supply_chain.yml
```
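Outside Jupyter, the `!python ...` cells can be reproduced with Python's standard `subprocess` module. The sketch below builds the command from the rule stated above: non-RL policies ("EOQ", "BSP") run with `--evaluate_only`, while RL policies ("DQN", "PPO") run in training mode. The `build_command` helper and its `/maro` default are assumptions for illustration. Note that the `algo` argument only selects the flag; the algorithm itself is still chosen by `ALGO` in *config.py*, so the two must agree.

```python
import os
import sys
from typing import List

RL_ALGOS = {"DQN", "PPO"}


def build_command(algo: str, root_dir: str = "/maro") -> List[str]:
    """Hypothetical helper: assemble the run_rl_example.py command for an algorithm."""
    cmd = [
        sys.executable,
        os.path.join(root_dir, "examples", "rl", "run_rl_example.py"),
        os.path.join(root_dir, "examples", "rl", "supply_chain.yml"),
    ]
    if algo not in RL_ALGOS:
        cmd.append("--evaluate_only")  # rule-based / OR policies need no training
    return cmd


print(build_command("EOQ"))
# To actually launch it:
# import subprocess; subprocess.run(build_command("PPO"), check=True)
```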

## More complex configuration

More complex solution configurations are gathered in file [*examples/supply_chain/rl/rl_component_bundle.py*](https://github.com/microsoft/maro/blob/sc_tutorial/examples/supply_chain/rl/rl_component_bundle.py); the ones you may care about are:

- `get_agent2policy` in line 67: the mapping from each entity id in the scenario to its policy alias.
- `get_policy_creator` in line 84: the concrete policy created for each policy alias.
- `get_trainer_creator` in line 97: the trainer used for policy training, which depends on the chosen algorithm.
- `get_device_mapping` in line 109: the mapping from the policy alias to the training device.
- `get_policy_trainer_mapping` in line 135: the mapping from the policy alias to the trainer alias.
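How these five mappings fit together can be pictured with plain dictionaries. In the sketch below, every entity id, alias, and factory is a hypothetical placeholder, and the real functions in *rl_component_bundle.py* return MARO policy and trainer objects rather than dicts. The point is only that the mappings must reference each other consistently: every alias used by an entity has a policy, every trained policy points to an existing trainer, and training devices are assigned only to trained policies (an assumption based on the "training device" wording).

```python
from typing import Callable, Dict, List

# All entity ids and aliases below are hypothetical placeholders.
agent2policy: Dict[int, str] = {101: "ppo_consumer", 102: "ppo_consumer", 201: "eoq_manufacturer"}

# Policy alias -> factory; plain dicts stand in for real policy objects.
policy_creator: Dict[str, Callable[[str], dict]] = {
    "ppo_consumer": lambda name: {"name": name, "trainable": True},
    "eoq_manufacturer": lambda name: {"name": name, "trainable": False},
}
trainer_creator: Dict[str, Callable[[str], dict]] = {"ppo": lambda name: {"trainer": name}}
device_mapping: Dict[str, str] = {"ppo_consumer": "cpu"}          # policy alias -> device
policy_trainer_mapping: Dict[str, str] = {"ppo_consumer": "ppo"}  # policy alias -> trainer alias


def check_bundle() -> List[str]:
    """Return dangling references between the mappings (empty if consistent)."""
    problems = []
    for entity, alias in agent2policy.items():
        if alias not in policy_creator:
            problems.append(f"entity {entity} uses unknown policy {alias!r}")
    for alias, trainer in policy_trainer_mapping.items():
        if alias not in policy_creator:
            problems.append(f"trainer mapping references unknown policy {alias!r}")
        if trainer not in trainer_creator:
            problems.append(f"policy {alias!r} maps to unknown trainer {trainer!r}")
    for alias in device_mapping:
        if alias not in policy_trainer_mapping:
            problems.append(f"device assigned to untrained policy {alias!r}")
    return problems


policies = {alias: create(alias) for alias, create in policy_creator.items()}
print(check_bundle())  # []
```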

In addition, the **state shaping**, **action shaping** and **reward shaping** logic is defined in file [*examples/supply_chain/rl/env_sampler.py*](https://github.com/microsoft/maro/blob/sc_tutorial/examples/supply_chain/rl/env_sampler.py), while [*examples/supply_chain/rl/rl_agent_state.py*](https://github.com/microsoft/maro/blob/sc_tutorial/examples/supply_chain/rl/rl_agent_state.py) and [*examples/supply_chain/rl/or_agent_state.py*](https://github.com/microsoft/maro/blob/sc_tutorial/examples/supply_chain/rl/or_agent_state.py) are used by the **state shaping** logic.
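To make the three shaping roles concrete, the sketch below shows one possible version of each for a single hypothetical entity. The observation fields, the "multiples of average demand" action scheme, and the revenue-minus-cost reward are all assumptions chosen for illustration; they are not the logic implemented in *env_sampler.py*.

```python
from typing import Dict, List

# Hypothetical per-entity observation; the real env_sampler.py reads many more fields.
obs: Dict[str, float] = {"inventory": 40.0, "in_transit": 10.0, "capacity": 100.0,
                         "price": 5.0, "unit_cost": 3.0, "holding_cost": 0.1}
demand_history: List[float] = [12.0, 8.0, 10.0]
last_sold = 10.0


def shape_state(o: Dict[str, float], history: List[float]) -> List[float]:
    """State shaping: turn raw numbers into a small feature vector scaled by capacity."""
    cap = o["capacity"]
    avg_demand = sum(history) / len(history)
    return [o["inventory"] / cap, o["in_transit"] / cap, avg_demand / cap]


def shape_action(action_index: int, avg_demand: float) -> int:
    """Action shaping: map a discrete policy output to an order quantity."""
    return int(round(action_index * avg_demand))


def shape_reward(sold: float, ordered: int, inventory: float, o: Dict[str, float]) -> float:
    """Reward shaping: revenue minus purchase and holding costs (an assumed form)."""
    return sold * o["price"] - ordered * o["unit_cost"] - inventory * o["holding_cost"]


state = shape_state(obs, demand_history)
order_qty = shape_action(2, sum(demand_history) / len(demand_history))
reward = shape_reward(last_sold, order_qty, obs["inventory"], obs)
print(state, order_qty, reward)
```

With these numbers the average demand is 10, so action index 2 orders 20 units and the reward is 10 × 5 − 20 × 3 − 40 × 0.1 = −14.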
