Merged
432 commits
55db382
fixed a bug
Nov 10, 2020
7149362
fixed lint issues
Nov 10, 2020
eadf021
revised ac based on new model abstraction
Nov 11, 2020
29175a3
added load/dump functions to LearningModel
Nov 11, 2020
a8f9087
fixed a bug
Nov 11, 2020
9788114
fixed a bug
Nov 11, 2020
0c75673
fixed lint issues
Nov 11, 2020
471009c
Merge branch 'v0.2_embedded_optim' into v0.2_pg
Nov 11, 2020
312d12d
merged with v0.2_embedded_optims
Nov 11, 2020
1baad5e
refined DQN docstrings
Nov 11, 2020
e7c8a2a
removed load/dump functions from DQN
ysqyang Nov 11, 2020
2312337
minor changes
Nov 11, 2020
80f13f6
added task validator
Nov 11, 2020
0abc408
fixed decorator use
Nov 11, 2020
d53885e
fixed a typo
Nov 11, 2020
4fbc372
fixed a bug
Nov 11, 2020
7c0a43e
revised ac, pg and ppo based on new abstraction
Nov 11, 2020
acc9034
revised
Nov 11, 2020
7b875e3
fixed lint issues
Nov 11, 2020
eeaf930
Merge branch 'v0.2_embedded_optim' into v0.2_pg
Nov 11, 2020
ea8b1f7
changed LearningModel's step() to take a single loss
Nov 12, 2020
bf5d347
Merge branch 'v0.2_embedded_optim' into v0.2_pg
Nov 12, 2020
06a0acb
revised learning model design
Nov 12, 2020
3d07ce8
revised example
Nov 12, 2020
1369241
fixed a bug
Nov 12, 2020
ef01678
fixed a bug
Nov 12, 2020
01cbc2d
fixed a bug
Nov 12, 2020
6dc27d7
fixed a bug
Nov 12, 2020
18f82e2
fixed merge conflicts
Nov 13, 2020
a891732
added decorator utils to algorithm
Nov 13, 2020
13de76e
fixed a bug
Nov 13, 2020
2dee56c
renamed core_model to model
Nov 13, 2020
aed936b
fixed a bug
Nov 13, 2020
fc4b2ff
1. fixed lint formatting issues; 2. refined learning model docstrings
Nov 13, 2020
04d2033
rm trailing whitespaces
Nov 13, 2020
1641a28
changes to pg algorithm based on new rl toolkit features
Nov 13, 2020
1b364bc
added decorator for choose_action
Nov 13, 2020
d94d9ad
fixed a bug
Nov 13, 2020
4a0f89b
fixed a bug
Nov 13, 2020
26d09cf
fixed version-related issues
Nov 13, 2020
b1a77ac
renamed add_zeroth_dim decorator to expand_dim
Nov 13, 2020
56b3d9d
overhauled exploration abstraction
Nov 16, 2020
12d474a
Merge remote-tracking branch 'origin/v0.2' into v0.2_explorer
Nov 16, 2020
885f9b3
fixed a bug
Nov 16, 2020
2c3cddd
fixed a bug
Nov 16, 2020
cca18a1
fixed a bug
Nov 16, 2020
5cde319
added exploration related methods to abs_agent
Nov 16, 2020
a0a497a
fixed a bug
Nov 16, 2020
ab6934b
fixed a bug
Nov 16, 2020
6df170f
fixed a bug
Nov 16, 2020
09a0122
fixed a bug
Nov 16, 2020
4200ef3
fixed a bug
Nov 16, 2020
7f47c03
fixed a bug
Nov 16, 2020
07e06e0
separated learning with exploration schedule and without
Nov 17, 2020
ca15318
small fixes
Nov 17, 2020
0b7ac92
moved explorer logic to actor side
Nov 17, 2020
d331b48
fixed a bug
Nov 17, 2020
3e811b3
fixed a bug
Nov 17, 2020
bdd3a7d
fixed a bug
Nov 17, 2020
6d3e7fd
fixed a bug
Nov 17, 2020
ce4b34a
fixed some merge conflicts with v0.2_explorer
Nov 18, 2020
2713f99
added merged with v0.2_embedded_optims
Nov 18, 2020
a304f4c
removed unwanted param from simple agent manager
Nov 18, 2020
250a957
fixed merge conflicts
Nov 18, 2020
719d524
fixed some small conflicts
Nov 18, 2020
065edb9
small fixes
Nov 18, 2020
dbdb25c
revised code based on revised abstractions
Nov 18, 2020
d33d81e
fixed some bugs
Nov 18, 2020
368fa36
fixed a bug
Nov 18, 2020
d0e3fa3
fixed a bug
Nov 18, 2020
adcfd47
fixed a bug
Nov 18, 2020
275a278
fixed a bug
Nov 18, 2020
e9a6eca
fixed a bug
Nov 18, 2020
4ae2f87
fixed a bug
Nov 18, 2020
c6c1221
added shared_module property to LearningModel
Nov 18, 2020
4daa253
added shared_module property to LearningModel
Nov 18, 2020
30abf8b
Merge branch 'v0.2_embedded_optim' into v0.2_pg
Nov 18, 2020
6f4ebbe
fixed a bug with k-step return in AC
Nov 18, 2020
fe45444
fixed a bug
Nov 18, 2020
926f469
fixed a bug
Nov 18, 2020
3eb024e
merged pg, ac and ppo examples
Nov 18, 2020
19e272e
fixed a bug
Nov 18, 2020
00429e4
fixed a bug
Nov 18, 2020
9abbe38
fixed naming for ppo
Nov 18, 2020
699e895
renamed some variables in PPO
Nov 18, 2020
41efa70
added ActionWithLogProbability return type for PO-type algorithms
Nov 19, 2020
694d2b0
fixed a bug
Nov 19, 2020
7db7100
fixed a bug
Nov 19, 2020
c499534
fixed lint issues
Nov 19, 2020
c98f562
revised __getstate__ for LearningModel
Nov 19, 2020
a4b419b
fixed a bug
Nov 19, 2020
3e74a20
added soft_update function to learningModel
Nov 19, 2020
e972f75
fixed a bug
Nov 19, 2020
0e13ab2
revised learningModel
Nov 19, 2020
81a0341
rm __getstate__ and __setstate__ from LearningModel
Nov 19, 2020
708eae1
Merge branch 'v0.2_embedded_optim' into v0.2_pg
Nov 19, 2020
f552ae9
added noise explorer
Nov 20, 2020
b8fea59
fixed merge conflicts
Nov 20, 2020
f78bee0
formatting
Nov 20, 2020
5d01149
fixed formatting
Nov 20, 2020
c6e1c19
Merge branch 'v0.2_explorer' into v0.2_pg
Nov 20, 2020
05a7215
fixed conflicts
Nov 20, 2020
9e010fe
removed unnecessary comma
Nov 23, 2020
c774441
removed unnecessary comma
Nov 23, 2020
e7522fd
removed unnecessary comma
Nov 23, 2020
704b2ad
fixed PR comments
Nov 23, 2020
330e4f0
Merge remote-tracking branch 'origin/v0.2' into v0.2_explorer
Nov 23, 2020
bb8527e
Merge remote-tracking branch 'origin/v0.2' into v0.2_embedded_optim
Nov 23, 2020
eedd63e
fixed merge conflicts
Nov 23, 2020
ffc98b9
Merge remote-tracking branch 'origin/v0.2' into v0.2_pg
Nov 23, 2020
c2a04eb
fixed merge conflicts
Nov 23, 2020
f97a302
removed unwanted exception and imports
Nov 24, 2020
18e3366
removed unwanted exception and imports
Nov 24, 2020
4f612f5
fixed a bug
Nov 24, 2020
f1fdda3
Merge branch 'v0.2_explorer' into v0.2_embedded_optim
Nov 24, 2020
2383d48
Merge branch 'v0.2_explorer' into v0.2_pg
Nov 24, 2020
85557a8
fixed PR comments
Nov 25, 2020
e7fde35
fixed a bug
Nov 25, 2020
c3ea72f
fixed a bug
Nov 25, 2020
a0dbad7
fixed a bug
Nov 25, 2020
f719fa8
fixed a bug
Nov 25, 2020
451f645
fixed a bug
Nov 25, 2020
16f55d2
fixed a bug
Nov 25, 2020
f4c7d35
fixed lint issue
Nov 25, 2020
1469b77
fixed a bug
Nov 25, 2020
e379f71
fixed lint issue
Nov 25, 2020
aae44d5
fixed conflicts
Nov 25, 2020
5ce1680
fixed naming
Nov 25, 2020
71b4cda
Merge branch 'v0.2_explorer' into v0.2_embedded_optim
Nov 25, 2020
bbf140b
combined exploration param generation and early stopping in scheduler
Nov 27, 2020
1e4f982
fixed a bug
Nov 27, 2020
9bc8430
fixed a bug
Nov 27, 2020
d15a761
fixed a bug
Nov 27, 2020
fa33a91
fixed a bug
Nov 27, 2020
fa34480
fixed a bug
Nov 27, 2020
8030edd
fixed lint issues
Nov 27, 2020
2a1cbc7
fixed conflicts and renamed LearningModel to LearningModuleManager
Nov 27, 2020
7011d64
fixed lint issue
Nov 27, 2020
5ce4fb8
fixed merge conflicts
Nov 27, 2020
39e99a3
moved logger inside scheduler
Nov 28, 2020
13f1843
fixed a bug
Nov 28, 2020
27c64fa
fixed a bug
Nov 28, 2020
54bb17e
fixed a bug
Nov 28, 2020
f304a98
fixed merge conflicts
Nov 28, 2020
c210605
fixed lint issues
Nov 28, 2020
e4b4e65
fixed lint issue
Nov 28, 2020
d094903
removed epsilon parameter from choose_action
Nov 30, 2020
4fc1cf6
removed epsilon parameter from choose_action
Nov 30, 2020
bf680a4
Merge remote-tracking branch 'origin/v0.2' into v0.2_embedded_optim
Dec 3, 2020
c68c556
Merge remote-tracking branch 'origin/v0.2' into v0.2_explorer
Dec 3, 2020
1b7c31e
fixed merge conflicts
Dec 3, 2020
5b40972
changed agent manager's train parameter to experience_by_agent
Dec 3, 2020
7ac67c3
fixed some PR comments
Dec 11, 2020
92702a6
renamed zero_grad to zero_gradients in LearningModule
Dec 14, 2020
4b02278
fixed some PR comments
Dec 15, 2020
752b8b4
bug fix
Dec 15, 2020
089b857
bug fix
Dec 15, 2020
d9eb634
bug fix
Dec 15, 2020
2112398
Merge remote-tracking branch 'origin/v0.2' into v0.2_explorer
Dec 15, 2020
f839e4f
conflict fix
Dec 16, 2020
94e68d6
conflict fix
Dec 16, 2020
278a86b
conflict fix
Dec 16, 2020
fd67efd
removed explorer abstraction from agent
Dec 22, 2020
5b5d7bc
Merge remote-tracking branch 'origin/v0.2' into v0.2_explorer
Dec 22, 2020
5f7f73f
fixed conflicts
Dec 22, 2020
b29fd13
added DEVICE env variable as first choice for torch device
Dec 23, 2020
7cb9388
refined dqn example
Dec 23, 2020
f8dd9bd
Merge remote-tracking branch 'origin/v0.2' into v0.2_explorer
Dec 23, 2020
ec3e3f4
Merge branch 'v0.2_explorer' into v0.2_embedded_optim
Dec 23, 2020
236c9fc
fixed lint issues
Dec 23, 2020
ecef225
Merge branch 'v0.2_explorer' into v0.2_embedded_optim
Dec 23, 2020
0209a98
removed unwanted import in cim example
Dec 23, 2020
804fd08
updated cim-dqn notebook
Dec 23, 2020
9dd71d6
simplified scheduler
Dec 24, 2020
3addbff
Merge remote-tracking branch 'origin/v0.2' into v0.2_explorer
Dec 24, 2020
87beb2b
fixed conflicts
Dec 24, 2020
bc7b841
edited notebook according to merged scheduler changes
Dec 24, 2020
8ad6378
refined dimension check for learning module manager and removed num_a…
Dec 24, 2020
d4fe0db
bug fix for cim example
Dec 24, 2020
9fe7590
added notebook output
Dec 24, 2020
e697820
fixed conflicts
Dec 24, 2020
b4ef3f2
updated cim PO example code according to changes in maro/rl
Dec 24, 2020
ccf14c0
removed early stopping from CIM dqn example
Dec 24, 2020
af32156
fixed conflicts
Dec 24, 2020
f454bab
fixed conflicts
Dec 24, 2020
a662914
combined ac and ppo and simplified example code and config
Dec 24, 2020
382108e
removed early stopping from cim example config
Dec 24, 2020
13f7f04
Merge branch 'v0.2_explorer' into v0.2_embedded_optim
Dec 24, 2020
604d4d9
Merge branch 'v0.2_embedded_optim' into v0.2_pg
Dec 24, 2020
407fb9a
moved decorator logic inside algorithms
Dec 25, 2020
be9cbeb
Merge remote-tracking branch 'origin/v0.2' into v0.2_explorer
Dec 25, 2020
61e869e
renamed early_stopping_callback to early_stopping_checker
Dec 25, 2020
d1272eb
Merge remote-tracking branch 'origin/v0.2' into v0.2_embedded_optim
Dec 25, 2020
b55c5d8
small refinement
Dec 25, 2020
deb4032
fix conflicts
Dec 25, 2020
d70074a
put PG and AC under PolicyOptimization class and refined examples acc…
Dec 25, 2020
afdd054
fixed lint issues
Dec 25, 2020
53d2ee4
removed action_dim from noise explorer classes and added some shape c…
Dec 27, 2020
7570e1d
Merge branch 'v0.2_explorer' into v0.2_embedded_optim
Dec 27, 2020
7f25ae5
Merge branch 'v0.2_embedded_optim' into v0.2_pg
Dec 27, 2020
ee37435
modified NoiseExplorer's __call__ logic to batch processing
Dec 27, 2020
9eefa5d
Merge branch 'v0.2_explorer' into v0.2_embedded_optim
Dec 27, 2020
e3d57e6
Merge branch 'v0.2_embedded_optim' into v0.2_pg
Dec 27, 2020
56cacfd
made NoiseExplorer's __call__ return type np array
Dec 27, 2020
caea7c8
Merge branch 'v0.2_explorer' into v0.2_embedded_optim
Dec 27, 2020
f2c2f24
Merge branch 'v0.2_embedded_optim' into v0.2_pg
Dec 27, 2020
c981ffe
renamed update to set_parameters in explorer
Dec 28, 2020
39a2946
fixed old naming in test_grass
Dec 28, 2020
ba4f7de
Merge branch 'v0.2_explorer' into v0.2_embedded_optim
Dec 28, 2020
e05f6a6
Merge branch 'v0.2_embedded_optim' into v0.2_pg
Dec 28, 2020
000184a
fixed merge conflicts
Dec 28, 2020
233a5c1
Merge remote-tracking branch 'origin/v0.2' into v0.2_pg
Dec 29, 2020
48e284a
moved optimizer options to LearningModel
Dec 30, 2020
0d3da6b
typo fix
Dec 30, 2020
b22fd44
fixed lint issues
Dec 30, 2020
ebf49e2
updated notebook
Dec 30, 2020
8df04f6
Merge remote-tracking branch 'origin/v0.2' into v0.2_learning_model_r…
Dec 31, 2020
d4da693
fixed conflicts
Dec 31, 2020
d708143
updated cim example for policy optimization
Dec 31, 2020
da171e2
typo fix
Dec 31, 2020
97c9bf5
typo fix
Dec 31, 2020
4f5df18
typo fix
Dec 31, 2020
4472d7c
typo fix
Dec 31, 2020
c65e2b3
misc edits
Dec 31, 2020
fba8a53
fixed conflicts
Dec 31, 2020
3524d74
fixed conflicts
Jan 4, 2021
3126e48
fixed conflicts
Jan 5, 2021
238d432
minor edits to rl_toolkit.rst
Jan 5, 2021
806ec1d
checked out docs from master
Jan 5, 2021
89f1d7c
fixed typo in k-step shaper
Jan 5, 2021
3ec712c
fixed lint issues
Jan 5, 2021
3d4758a
bug fix in store
Jan 5, 2021
8f8ee61
lint issue fix
Jan 5, 2021
0d669f5
changed default max_ep to 100 for policy_optimization algos
Jan 5, 2021
d4bb9d5
vis doc update to master (#244)
Meroy9819 Jan 5, 2021
36af4f4
Merge remote-tracking branch 'origin/master' into v0.2_pg
Jan 6, 2021
166ae35
bug fix related to np array divide (#245)
ysqyang Jan 6, 2021
a11a009
Merge remote-tracking branch 'origin/master' into v0.2_pg
Jan 6, 2021
97fab8c
Master.simple bike (#250)
Jinyu-W Jan 11, 2021
6b615d3
simple bike repositioning article: formula updated
Jan 11, 2021
b2947cf
Merge remote-tracking branch 'origin/master' into v0.2_pg
Jan 12, 2021
70969f2
fixed conflicts
Jan 12, 2021
6e76179
checked out docs/source from v0.2
Jan 12, 2021
1f82cde
aligned with v0.2
Jan 12, 2021
baa9871
rm unwanted import
Jan 12, 2021
a5705b1
added references in policy_optimization.py
Jan 15, 2021
f469e6c
Merge remote-tracking branch 'origin/v0.2' into v0.2_pg
Jan 15, 2021
79b9190
fixed lint issues
Jan 15, 2021
91167d5
Merge remote-tracking branch 'origin/v0.2' into v0.2_pg
Jan 19, 2021
12e38e4
Merge branch 'v0.2' into v0.2_pg
ysqyang Jan 21, 2021
2 changes: 1 addition & 1 deletion examples/cim/dqn/components/config.py
@@ -2,7 +2,7 @@
# Licensed under the MIT license.

"""
-This file is used to load config and convert it into a dotted dictionary.
+This file is used to load the configuration and convert it into a dotted dictionary.
"""

import io
1 change: 0 additions & 1 deletion examples/cim/dqn/single_process_launcher.py
@@ -38,7 +38,6 @@ def launch(config):

    # Step 4: Create an actor and a learner to start the training process.
    scheduler = TwoPhaseLinearParameterScheduler(config.main_loop.max_episode, **config.main_loop.exploration)

    actor = SimpleActor(env, agent_manager)
    learner = SimpleLearner(
        agent_manager, actor, scheduler,
22 changes: 22 additions & 0 deletions examples/cim/policy_optimization/README.md
@@ -0,0 +1,22 @@
# Overview

The CIM problem is one of the quintessential use cases of MARO. The example can
be run with a set of scenario configurations that can be found under
maro/simulator/scenarios/cim. General experimental parameters (e.g., type of
topology, type of algorithm to use, number of training episodes) can be configured
through config.yml. Each RL formulation has a dedicated folder, e.g., dqn, and
all algorithm-specific parameters can be configured through
the config.py file in that folder.

## Single-host Single-process Mode

To run the CIM example using the DQN algorithm under single-host mode, go to
examples/cim/dqn and run single_process_launcher.py. You may play around with
the configuration if you want to try out different settings.

## Distributed Mode

The examples/cim/dqn/components folder contains dist_learner.py and dist_actor.py
for distributed training. For debugging purposes, we provide a script that
simulates distributed mode using multi-processing. Simply go to examples/cim/dqn
and run multi_process_launcher.py to start the learner and actor processes.
14 changes: 14 additions & 0 deletions examples/cim/policy_optimization/components/__init__.py
@@ -0,0 +1,14 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from .action_shaper import CIMActionShaper
from .agent_manager import POAgentManager, create_po_agents
from .experience_shaper import TruncatedExperienceShaper
from .state_shaper import CIMStateShaper

__all__ = [
    "CIMActionShaper",
    "POAgentManager", "create_po_agents",
    "TruncatedExperienceShaper",
    "CIMStateShaper"
]
33 changes: 33 additions & 0 deletions examples/cim/policy_optimization/components/action_shaper.py
@@ -0,0 +1,33 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from maro.rl import ActionShaper
from maro.simulator.scenarios.cim.common import Action


class CIMActionShaper(ActionShaper):
    def __init__(self, action_space):
        super().__init__()
        self._action_space = action_space
        self._zero_action_index = action_space.index(0)

    def __call__(self, model_action, decision_event, snapshot_list):
        scope = decision_event.action_scope
        tick = decision_event.tick
        port_idx = decision_event.port_idx
        vessel_idx = decision_event.vessel_idx

        port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
        vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
        early_discharge = snapshot_list["vessels"][tick:vessel_idx: "early_discharge"][0]
        assert 0 <= model_action < len(self._action_space)

        if model_action < self._zero_action_index:
            actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
        elif model_action > self._zero_action_index:
            plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
            actual_action = round(plan_action) if plan_action > 0 else round(self._action_space[model_action] * scope.discharge)
        else:
            actual_action = 0

        return Action(vessel_idx, port_idx, actual_action)
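The index-to-quantity mapping in `CIMActionShaper.__call__` can be tried out in isolation. The following is a minimal sketch, not the MARO implementation: `shape_action` is a hypothetical stand-alone function, the snapshot lookups are replaced by plain numeric arguments, and the reading of negative values as "move empties from port to vessel" and positive values as "discharge to port" is an assumption inferred from the bounds used in the code above.

```python
def shape_action(model_action, action_space, port_empty, vessel_remaining_space,
                 discharge, early_discharge):
    """Map a discrete action index to a signed container quantity (hypothetical helper)."""
    zero_index = action_space.index(0)
    fraction = action_space[model_action]

    if model_action < zero_index:
        # Negative fraction: take a share of the port's empties,
        # but never more than the vessel can still hold.
        return max(round(fraction * port_empty), -vessel_remaining_space)
    if model_action > zero_index:
        # Positive fraction: plan against the total discharge, net of what was
        # already discharged early; fall back to the plain scope if the plan is not positive.
        plan = fraction * (discharge + early_discharge) - early_discharge
        return round(plan) if plan > 0 else round(fraction * discharge)
    return 0


# A five-step action space for illustration; the example config uses 21 steps.
space = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(shape_action(0, space, port_empty=10, vessel_remaining_space=6, discharge=8, early_discharge=2))
print(shape_action(3, space, port_empty=10, vessel_remaining_space=6, discharge=8, early_discharge=2))
```

The first call is clipped by the vessel's remaining space, while the second follows the planned discharge.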
83 changes: 83 additions & 0 deletions examples/cim/policy_optimization/components/agent_manager.py
@@ -0,0 +1,83 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import numpy as np
import torch.nn as nn
from torch.optim import Adam, RMSprop

from maro.rl import (
AbsAgent, ActorCritic, ActorCriticConfig, FullyConnectedBlock, LearningModel, NNStack,
OptimizerOptions, PolicyGradient, PolicyOptimizationConfig, SimpleAgentManager
)
from maro.utils import set_seeds


class POAgent(AbsAgent):
    def train(self, states: np.ndarray, actions: np.ndarray, log_action_prob: np.ndarray, rewards: np.ndarray):
        self._algorithm.train(states, actions, log_action_prob, rewards)


def create_po_agents(agent_id_list, config):
    input_dim, num_actions = config.input_dim, config.num_actions
    set_seeds(config.seed)
    agent_dict = {}
    for agent_id in agent_id_list:
        actor_net = NNStack(
            "actor",
            FullyConnectedBlock(
                input_dim=input_dim,
                output_dim=num_actions,
                activation=nn.Tanh,
                is_head=True,
                **config.actor_model
            )
        )

        if config.type == "actor_critic":
            critic_net = NNStack(
                "critic",
                FullyConnectedBlock(
                    input_dim=config.input_dim,
                    output_dim=1,
                    activation=nn.LeakyReLU,
                    is_head=True,
                    **config.critic_model
                )
            )

            hyper_params = config.actor_critic_hyper_parameters
            hyper_params.update({"reward_discount": config.reward_discount})
            learning_model = LearningModel(
                actor_net, critic_net,
                optimizer_options={
                    "actor": OptimizerOptions(cls=Adam, params=config.actor_optimizer),
                    "critic": OptimizerOptions(cls=RMSprop, params=config.critic_optimizer)
                }
            )
            algorithm = ActorCritic(
                learning_model, ActorCriticConfig(critic_loss_func=nn.SmoothL1Loss(), **hyper_params)
            )
        else:
            learning_model = LearningModel(
                actor_net,
                optimizer_options=OptimizerOptions(cls=Adam, params=config.actor_optimizer)
            )
            algorithm = PolicyGradient(learning_model, PolicyOptimizationConfig(config.reward_discount))

        agent_dict[agent_id] = POAgent(name=agent_id, algorithm=algorithm)

    return agent_dict


class POAgentManager(SimpleAgentManager):
    def train(self, experiences_by_agent: dict):
        for agent_id, exp in experiences_by_agent.items():
            if not isinstance(exp, list):
                exp = [exp]
            for trajectory in exp:
                self.agent_dict[agent_id].train(
                    trajectory["state"],
                    trajectory["action"],
                    trajectory["log_action_probability"],
                    trajectory["reward"]
                )
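`POAgentManager.train` accepts, for each agent, either a single trajectory dict or a list of them. The sketch below illustrates that contract with plain NumPy and no MARO dependency. `group_by_agent` and `iter_trajectories` are hypothetical helper names: the first mirrors the boolean-mask grouping used by the experience shaper in this PR, the second mirrors the single-item-to-list normalization in `train`.

```python
import numpy as np


def group_by_agent(agent_ids, **fields):
    """Split parallel per-step arrays into one dict per agent (hypothetical helper)."""
    agent_ids = np.asarray(agent_ids)
    arrays = {key: np.asarray(values) for key, values in fields.items()}
    return {
        str(agent_id): {key: arr[agent_ids == agent_id] for key, arr in arrays.items()}
        for agent_id in np.unique(agent_ids)
    }


def iter_trajectories(experiences_by_agent):
    """Yield (agent_id, trajectory) pairs, treating a lone dict as a one-element list."""
    for agent_id, exp in experiences_by_agent.items():
        trajectories = exp if isinstance(exp, list) else [exp]
        for trajectory in trajectories:
            yield agent_id, trajectory


grouped = group_by_agent(["0", "1", "0"], action=[3, 7, 5], reward=[1.0, 2.0, 3.0])
for agent_id, trajectory in iter_trajectories(grouped):
    print(agent_id, trajectory["action"].tolist(), trajectory["reward"].tolist())
```

Accepting both shapes presumably lets the same manager consume output from a single actor as well as lists collected from several actors.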
19 changes: 19 additions & 0 deletions examples/cim/policy_optimization/components/config.py
@@ -0,0 +1,19 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""
This file is used to load the configuration and convert it into a dotted dictionary.
"""

import io
import os
import yaml


CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../config.yml")
with io.open(CONFIG_PATH, "r") as in_file:
    config = yaml.safe_load(in_file)

DISTRIBUTED_CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../distributed_config.yml")
with io.open(DISTRIBUTED_CONFIG_PATH, "r") as in_file:
    distributed_config = yaml.safe_load(in_file)
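The module docstring mentions a "dotted dictionary"; in this PR the conversion itself happens in the launcher through `maro.utils.convert_dottable`. As a rough illustration of the idea only, the sketch below uses a hypothetical `DottedDict` class and `to_dotted` function. It makes no claim about how MARO implements the conversion.

```python
class DottedDict(dict):
    """A dict whose keys can also be read as attributes (hypothetical helper)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


def to_dotted(obj):
    """Recursively wrap nested dicts so that cfg.a.b works alongside cfg["a"]["b"]."""
    if isinstance(obj, dict):
        return DottedDict({key: to_dotted(value) for key, value in obj.items()})
    if isinstance(obj, list):
        return [to_dotted(value) for value in obj]
    return obj


raw = {"env": {"state_shaping": {"look_back": 7, "max_ports_downstream": 2}},
       "agents": {"num_actions": 21}}
cfg = to_dotted(raw)
cfg["agents"]["input_dim"] = 171  # item assignment still works, as in the launcher
print(cfg.env.state_shaping.look_back, cfg.agents.input_dim)
```

Because the wrapper is still a `dict`, it can be unpacked with `**`, which is how the shapers and network blocks in this example receive their settings.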
51 changes: 51 additions & 0 deletions examples/cim/policy_optimization/components/experience_shaper.py
@@ -0,0 +1,51 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from collections import defaultdict

import numpy as np

from maro.rl import ExperienceShaper


class TruncatedExperienceShaper(ExperienceShaper):
    def __init__(self, *, time_window: int, time_decay_factor: float, fulfillment_factor: float,
                 shortage_factor: float):
        super().__init__(reward_func=None)
        self._time_window = time_window
        self._time_decay_factor = time_decay_factor
        self._fulfillment_factor = fulfillment_factor
        self._shortage_factor = shortage_factor

    def __call__(self, trajectory, snapshot_list):
        agent_ids = np.asarray(trajectory.get_by_key("agent_id"))
        states = np.asarray(trajectory.get_by_key("state"))
        actions = np.asarray(trajectory.get_by_key("action"))
        log_action_probabilities = np.asarray(trajectory.get_by_key("log_action_probability"))
        rewards = np.fromiter(
            map(self._compute_reward, trajectory.get_by_key("event"), [snapshot_list] * len(trajectory)),
            dtype=np.float32
        )
        return {agent_id: {
            "state": states[agent_ids == agent_id],
            "action": actions[agent_ids == agent_id],
            "log_action_probability": log_action_probabilities[agent_ids == agent_id],
            "reward": rewards[agent_ids == agent_id],
        }
            for agent_id in set(agent_ids)}

    def _compute_reward(self, decision_event, snapshot_list):
        start_tick = decision_event.tick + 1
        end_tick = decision_event.tick + self._time_window
        ticks = list(range(start_tick, end_tick))

        # calculate tc reward
        future_fulfillment = snapshot_list["ports"][ticks::"fulfillment"]
        future_shortage = snapshot_list["ports"][ticks::"shortage"]
        decay_list = [self._time_decay_factor ** i for i in range(end_tick - start_tick)
                      for _ in range(future_fulfillment.shape[0]//(end_tick-start_tick))]

        tot_fulfillment = np.dot(future_fulfillment, decay_list)
        tot_shortage = np.dot(future_shortage, decay_list)

        # The built-in float is used because the np.float alias was removed in NumPy 1.24.
        return float(self._fulfillment_factor * tot_fulfillment - self._shortage_factor * tot_shortage)
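The reward computed in `_compute_reward` is a time-decayed sum of future fulfillment minus a time-decayed sum of future shortage, where every port at the same tick shares one decay weight. The sketch below restates that arithmetic on small arrays. `truncated_reward` is a hypothetical name, and the inputs are assumed to be shaped `(num_ticks, num_ports)` rather than the flat vector that the snapshot list query returns.

```python
import numpy as np


def truncated_reward(fulfillment, shortage, time_decay_factor,
                     fulfillment_factor=1.0, shortage_factor=1.0):
    """Decay-weighted fulfillment minus decay-weighted shortage over a tick window."""
    fulfillment = np.asarray(fulfillment, dtype=np.float64)
    shortage = np.asarray(shortage, dtype=np.float64)
    num_ticks = fulfillment.shape[0]

    # One weight per tick: decay**0 for the first tick in the window, decay**1 for the next, ...
    decay = time_decay_factor ** np.arange(num_ticks)

    total_fulfillment = float(decay @ fulfillment.sum(axis=1))
    total_shortage = float(decay @ shortage.sum(axis=1))
    return fulfillment_factor * total_fulfillment - shortage_factor * total_shortage


# Two ticks, two ports, decay 0.5:
# fulfillment = (1 + 1) + 0.5 * (2 + 0) = 3, shortage = (0 + 1) + 0.5 * 0 = 1
print(truncated_reward([[1, 1], [2, 0]], [[0, 1], [0, 0]], time_decay_factor=0.5))
```

Summing over ports before applying the weights is equivalent to repeating each weight once per port, which is what the nested list comprehension for `decay_list` does.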
30 changes: 30 additions & 0 deletions examples/cim/policy_optimization/components/state_shaper.py
@@ -0,0 +1,30 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import numpy as np

from maro.rl import StateShaper

PORT_ATTRIBUTES = ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"]
VESSEL_ATTRIBUTES = ["empty", "full", "remaining_space"]


class CIMStateShaper(StateShaper):
    def __init__(self, *, look_back, max_ports_downstream):
        super().__init__()
        self._look_back = look_back
        self._max_ports_downstream = max_ports_downstream
        self._dim = (look_back + 1) * (max_ports_downstream + 1) * len(PORT_ATTRIBUTES) + len(VESSEL_ATTRIBUTES)

    def __call__(self, decision_event, snapshot_list):
        tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
        ticks = [tick - rt for rt in range(self._look_back - 1)]
        future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
        port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): PORT_ATTRIBUTES]
        vessel_features = snapshot_list["vessels"][tick: vessel_idx: VESSEL_ATTRIBUTES]
        state = np.concatenate((port_features, vessel_features))
        return str(port_idx), state

    @property
    def dim(self):
        return self._dim
50 changes: 50 additions & 0 deletions examples/cim/policy_optimization/config.yml
@@ -0,0 +1,50 @@
env:
  scenario: "cim"
  topology: "toy.4p_ssdd_l0.0"
  durations: 1120
  state_shaping:
    look_back: 7
    max_ports_downstream: 2
  experience_shaping:
    time_window: 100
    fulfillment_factor: 1.0
    shortage_factor: 1.0
    time_decay_factor: 0.97
main_loop:
  max_episode: 100
  early_stopping:
    warmup_ep: 20
    last_k: 5
    perf_threshold: 0.95  # minimum performance (fulfillment ratio) required to trigger early stopping
    perf_stability_threshold: 0.1  # stability is measured by the maximum of abs(perf_(i+1) - perf_i) / perf_i
                                   # over the last k episodes (where perf is short for performance). This value must
                                   # be below this threshold to trigger early stopping
agents:
  seed: 1024  # for reproducibility
  type: "actor_critic"  # "actor_critic" or "policy_gradient"
  num_actions: 21
  actor_model:
    hidden_dims:
      - 256
      - 128
      - 64
    softmax_enabled: true
    batch_norm_enabled: false
  actor_optimizer:
    lr: 0.001
  critic_model:
    hidden_dims:
      - 256
      - 128
      - 64
    softmax_enabled: false
    batch_norm_enabled: true
  critic_optimizer:
    lr: 0.001
  reward_discount: .0
  actor_critic_hyper_parameters:
    train_iters: 10
    actor_loss_coefficient: 0.1
    k: 1
    lam: 0.0
    # clip_ratio: 0.8
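The comments on `perf_threshold` and `perf_stability_threshold` describe an early-stopping rule in words. One possible reading is sketched below; it is not the checker used by MARO. `should_stop_early` is a hypothetical function, and two details are assumptions: that the performance threshold must hold for every one of the last `last_k` episodes, and that no check happens before `warmup_ep` episodes have been recorded.

```python
def should_stop_early(perf_history, warmup_ep, last_k, perf_threshold, perf_stability_threshold):
    """Return True once recent performance is both high enough and stable enough."""
    if len(perf_history) < max(warmup_ep, last_k):
        return False  # still warming up, or not enough episodes to judge

    recent = perf_history[-last_k:]
    if min(recent) < perf_threshold:
        return False  # assumption: every recent episode must reach the threshold

    # Stability: largest relative change between consecutive episodes, |p[i+1] - p[i]| / p[i].
    changes = [abs(nxt - cur) / cur for cur, nxt in zip(recent, recent[1:]) if cur != 0]
    return max(changes, default=0.0) < perf_stability_threshold


# Values taken from the early_stopping section of the config above.
history = [0.5] * 20 + [0.96, 0.97, 0.96, 0.97, 0.96]
print(should_stop_early(history, warmup_ep=20, last_k=5,
                        perf_threshold=0.95, perf_stability_threshold=0.1))
```

Here the last five fulfillment ratios all exceed 0.95 and change by about one percent between episodes, so the rule fires.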
46 changes: 46 additions & 0 deletions examples/cim/policy_optimization/dist_actor.py
@@ -0,0 +1,46 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import os

import numpy as np

from maro.simulator import Env
from maro.rl import AgentManagerMode, SimpleActor, ActorWorker
from maro.utils import convert_dottable

from components import CIMActionShaper, CIMStateShaper, POAgentManager, TruncatedExperienceShaper, create_po_agents


def launch(config):
    config = convert_dottable(config)
    env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
    agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
    state_shaper = CIMStateShaper(**config.env.state_shaping)
    action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.num_actions)))
    experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)

    config["agents"]["input_dim"] = state_shaper.dim
    agent_manager = POAgentManager(
        name="cim_actor",
        mode=AgentManagerMode.INFERENCE,
        agent_dict=create_po_agents(agent_id_list, config.agents),
        state_shaper=state_shaper,
        action_shaper=action_shaper,
        experience_shaper=experience_shaper,
    )
    proxy_params = {
        "group_name": os.environ["GROUP"],
        "expected_peers": {"learner": 1},
        "redis_address": ("localhost", 6379)
    }
    actor_worker = ActorWorker(
        local_actor=SimpleActor(env=env, agent_manager=agent_manager),
        proxy_params=proxy_params
    )
    actor_worker.launch()


if __name__ == "__main__":
    from components.config import config
    launch(config)