# Single Echelon Inventory Optimization Problem

## Topology definition

As a showcase, we create a topology with 1 DC and 3 Stores. Products flow from the DC to the Stores, and there is no direct connection between the Stores. The full definition can be found in [maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml).

For each problem definition, three things must be set in the *config.yml* file:
  1. The full SKU list in this problem. Starting at [line 40](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml#L40), we define 3 products named *product_0*, *product_1*, and *product_2*:
  ```yaml
      skus:
        - id: 10
          name: product_0
        - id: 11
          name: product_1
        - id: 12
          name: product_2
  ```
  2. The facility list in this problem. Starting at [line 48](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml#L48), we define 1 DC and 3 Stores. According to the problem statement, the DC's storage capacity is unlimited ([line 54](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml#L54)) and free of charge ([line 55](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml#L55)), while the holding cost ([line 88](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml#L88)) and lost-sale cost ([line 102](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml#L102)) can be configured independently for each product in each store. Because the forecasted demand is provided for each product in each store at each time step, we also specify the file path of the forecasted demand data ([line 114](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml#L114)) and the column names of the provided information ([line 32](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml#L32)):
  ```yaml
      facilities:
        - name: DC
          definition_ref: DistributionCentre
          children:
            storage:
              config:
                capacity: null  # null indicates infinite storage capacity.
                unit_storage_cost: 0
          ...
        - name: Store_1
          definition_ref: StoreFacility
          children:
            storage:
              config:
                -
                  id: 0
                  capacity: 500
                  unit_storage_cost: 0.1  # NOTE: Holding cost for product_0
                -
                  ...
          skus:
            "product_0":
              price: 5
              sub_storage_id: 0
              backlog_ratio: 0.1  # NOTE: LS = price * backlog_ratio
              ...
            ...
          config:
            file_path:  maro/simulator/scenarios/supply_chain/topologies/single_echelon/store_1_demand.csv
            ...
        -
          ...
  ```
  3. The product flow together with the lead time information, i.e., the *topology* setting. This information is organized as (store, product, upstream facility it can order from). For example, to indicate that *Store_1* can order *product_0* from *DC* with a lead time of *1 day*, we define the topology relationship as shown at [line 202](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml#L202):
  ```yaml
      topology:
        "Store_1":
          "product_0":
            "DC":
              train:
                vlt: 1
                cost: 0
        ...
  ```
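
To make the (store, product, upstream facility) organization concrete, here is a minimal sketch that flattens such a nested mapping into one tuple per route. The dictionary is a hand-written mirror of the YAML fragment above, and `list_routes` is a hypothetical helper for illustration, not part of MARO:

```python
# Hand-written mirror of the `topology` fragment above (illustration only).
topology = {
    "Store_1": {
        "product_0": {
            "DC": {"train": {"vlt": 1, "cost": 0}},
        },
    },
}


def list_routes(topology):
    """Flatten the nested mapping into (store, product, upstream, vehicle, vlt, cost) tuples."""
    routes = []
    for store, products in topology.items():
        for product, upstreams in products.items():
            for upstream, vehicles in upstreams.items():
                for vehicle, params in vehicles.items():
                    routes.append(
                        (store, product, upstream, vehicle, params["vlt"], params["cost"])
                    )
    return routes


routes = list_routes(topology)
for route in routes:
    print(route)
```

Each tuple reads as "this store can order this product from this upstream facility, using this vehicle type, with this `vlt` and cost".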

## Constraint Support

Constraints *3. Day end inventory calculation* and *4. Inventory continuity constraint* are already embedded in the existing facilities as common logic. The *1. Supply constraint* and *2. Labour constraint* can be supported in 2 ways:
1. Inherit from the Product Unit of the DC facility in use and add the logic there;
2. Add action shaping logic outside the simulator to meet these 2 constraints. We choose this approach for the showcase. The full action shaping logic can be found in [examples/supply_chain/single_echelon/env_sampler.py](https://github.com/microsoft/maro/blob/sc_single_echelon/examples/supply_chain/single_echelon/env_sampler.py), and the extra shaping logic for these 2 constraints can be found at [line 612](https://github.com/microsoft/maro/blob/sc_single_echelon/examples/supply_chain/single_echelon/env_sampler.py#L612). Note that we use constant values for *SC* and *LC* here; they can easily be replaced with dynamic ones.
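
As a rough illustration of the second approach, the sketch below scales requested order quantities down until two caps hold. It is not the logic in `env_sampler.py`. It assumes, purely for illustration, that *SC* caps the total daily quantity ordered of each product and that *LC* caps the total daily quantity across all products. The constant values and the `shape_orders` helper are hypothetical:

```python
from typing import Dict, Tuple

SC = 100  # hypothetical constant: assumed per-product daily supply cap
LC = 150  # hypothetical constant: assumed daily labour cap over all products

Orders = Dict[Tuple[str, str], int]  # (store, product) -> requested quantity


def shape_orders(orders: Orders, sc: int = SC, lc: int = LC) -> Orders:
    """Scale requested quantities down proportionally so both assumed caps hold."""
    shaped = dict(orders)

    # Step 1: assumed supply constraint, applied product by product.
    for product in {product for _, product in shaped}:
        keys = [key for key in shaped if key[1] == product]
        total = sum(shaped[key] for key in keys)
        if total > sc:
            for key in keys:
                # Floor division keeps the scaled sum at or below the cap.
                shaped[key] = shaped[key] * sc // total

    # Step 2: assumed labour constraint, applied to the overall total.
    total = sum(shaped.values())
    if total > lc:
        for key in shaped:
            shaped[key] = shaped[key] * lc // total

    return shaped


requested = {
    ("Store_1", "product_0"): 80,
    ("Store_2", "product_0"): 80,
    ("Store_1", "product_1"): 60,
}
shaped = shape_orders(requested)
print(shaped)
```

Because the caps are plain function arguments, swapping the constants for dynamic per-day values only requires passing different `sc` and `lc` values on each call.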

## Action Shaping with Initial Guess

In the current [action shaping implementation](https://github.com/microsoft/maro/blob/sc_single_echelon/examples/supply_chain/single_echelon/env_sampler.py#L569), the order quantity is defined as a multiple of the historical observed mean demand, so the action output by the policy is the multiplier. Further, with the *or_action* provided as the *initial guess*, the RL output can be treated as an adjustment relative to the *or_action*/*initial guess*:
```py
  # Consumer action
  if issubclass(self._entity_dict[agent_id].class_type, ConsumerUnit):
      if isinstance(self._policy_dict[self._agent2policy[agent_id]], RLPolicy):
          baseline_action = np.array(self._agent_state_dict[agent_id][-OR_NUM_CONSUMER_ACTIONS:])
          or_action = np.where(baseline_action == 1.0)[0][0]
          action_idx = max(0, int(action[0] - 1 + or_action))
      else:
          action_idx = action[0]

      product_unit_id: int = self._unit2product_unit[entity_id]
      action_quantity = int(
          int(action_idx) * max(1.0, self._cur_metrics["products"][product_unit_id]["demand_mean"]),
      )
```
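
To see the arithmetic in isolation, here is a standalone sketch of the same mapping outside the sampler class. It assumes, as in the excerpt, that the last `OR_NUM_CONSUMER_ACTIONS` entries of the agent state are a one-hot encoding of the OR action. The value 5 for that constant, the sample state, and the `shaped_quantity` helper are made up for illustration:

```python
OR_NUM_CONSUMER_ACTIONS = 5  # hypothetical size of the one-hot OR action block


def shaped_quantity(state, rl_action, demand_mean):
    """Map an RL action to an order quantity, using the OR action as the initial guess."""
    baseline_action = list(state[-OR_NUM_CONSUMER_ACTIONS:])
    # Index of the one-hot entry; plain-Python stand-in for np.where(...)[0][0].
    or_action = baseline_action.index(1.0)
    # RL action 1 keeps the OR action; 0 steps one below it, 2 steps one above it.
    action_idx = max(0, int(rl_action - 1 + or_action))
    return int(action_idx * max(1.0, demand_mean))


# Two unrelated state features followed by a one-hot OR action with index 2.
state = [0.3, 0.7, 0.0, 0.0, 1.0, 0.0, 0.0]

for rl_action in (0, 1, 2):
    print(rl_action, shaped_quantity(state, rl_action, demand_mean=12.5))
```

With a mean demand of 12.5 and an OR action index of 2, the three RL actions map to multipliers 1, 2, and 3, so the policy only has to learn a small correction around the initial guess.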

**Note that,** depending on your needs, the action space could be defined as a quantity instead of a multiplier. The RL output would then be the absolute quantity delta relative to the initial guess.

## Optimization Objective Support

The balance calculation is triggered tick by tick, unit by unit, and facility by facility, as you can see in [examples/supply_chain/common/balance_calculator.py](https://github.com/microsoft/maro/blob/sc_single_echelon/examples/supply_chain/common/balance_calculator.py#L412). Since we already set the unit costs of the parts that we don't care about to 0 in the topology config file ([maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml](https://github.com/microsoft/maro/blob/sc_single_echelon/maro/simulator/scenarios/supply_chain/topologies/single_echelon/config.yml)), the calculated balance contains only *product profit*, *lost-sale cost*, and *holding cost*.
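
The three remaining terms can be sketched for a single product in a single store at one tick as follows. This is a simplified illustration, not the logic in `balance_calculator.py`. It assumes that product profit reduces to `price * sold` once the other unit costs are 0, uses the relation `LS = price * backlog_ratio` noted in the config comments above, and treats `unit_storage_cost` as the per-unit holding cost. The `step_balance` helper and the sample quantities are hypothetical:

```python
def step_balance(price, backlog_ratio, unit_storage_cost, sold, unmet_demand, end_inventory):
    """Simplified one-tick balance: profit minus lost-sale cost minus holding cost."""
    product_profit = price * sold  # assumption: no other costs enter the profit term
    lost_sale_cost = price * backlog_ratio * unmet_demand
    holding_cost = unit_storage_cost * end_inventory
    return product_profit - lost_sale_cost - holding_cost


# Values for product_0 in Store_1 taken from the config fragment above;
# the sold, unmet, and inventory quantities are made up.
stockout_day = step_balance(5, 0.1, 0.1, sold=40, unmet_demand=10, end_inventory=0)
overstock_day = step_balance(5, 0.1, 0.1, sold=30, unmet_demand=0, end_inventory=20)
print(stockout_day, overstock_day)
```

The two sample days show the trade-off the policy has to balance: ordering too little incurs lost-sale cost, while ordering too much incurs holding cost.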

## Try Running the RL Training Process on the Single Echelon Problem

To demonstrate the training process, we shorten the number of training episodes and their duration. To configure them, tune the parameters in [examples/rl/sc_single_echelon.yml](https://github.com/microsoft/maro/blob/sc_single_echelon/examples/rl/sc_single_echelon.yml) and [examples/supply_chain/single_echelon/config.py](https://github.com/microsoft/maro/blob/sc_single_echelon/examples/supply_chain/single_echelon/config.py).

```py
import os
os.chdir("../..")
```

```py
! python examples/rl/run_rl_example.py examples/rl/sc_single_echelon.yml
```

```
Loading data from maro/simulator/scenarios/supply_chain/topologies/single_echelon/store_1_demand_preprocessed.csv
200it [00:00, 170396.26it/s]
Loading data from maro/simulator/scenarios/supply_chain/topologies/single_echelon/store_2_demand_preprocessed.csv
200it [00:00, 182083.96it/s]
Loading data from maro/simulator/scenarios/supply_chain/topologies/single_echelon/store_3_demand_preprocessed.csv
300it [00:00, 188508.04it/s]
Loading data from maro/simulator/scenarios/supply_chain/topologies/single_echelon/store_1_demand_preprocessed.csv
200it [00:00, 190433.78it/s]
Loading data from maro/simulator/scenarios/supply_chain/topologies/single_echelon/store_2_demand_preprocessed.csv
200it [00:00, 145711.45it/s]
Loading data from maro/simulator/scenarios/supply_chain/topologies/single_echelon/store_3_demand_preprocessed.csv
300it [00:00, 197844.53it/s]
15:57:49 | MAIN | INFO | Start training workflow.
15:57:49 | Total number of policy-related agents / entities: 7 / 33
15:57:49 | MAIN | INFO |
```