Merged
72 commits
95c6413
refine readme
Oct 1, 2020
ddd15a5
Merge branch 'v0.1' of https://github.com/microsoft/maro into v0.1
ArthurJiang Oct 1, 2020
de6f89d
Merge branch 'master' into v0.1
ArthurJiang Oct 7, 2020
b3c01ca
feat: refine data push/pull (#138)
romickid Oct 8, 2020
e9f96cb
add fall back function in weather download (#112)
zhawan Oct 8, 2020
080714c
Merge branch 'master' into v0.1
ArthurJiang Oct 8, 2020
1fa9854
Merge branch 'master' into v0.1
ArthurJiang Oct 8, 2020
ef7f870
Merge branch 'master' into v0.1
ArthurJiang Oct 10, 2020
5bfc5eb
added example docs (#136)
ysqyang Oct 12, 2020
11714ea
Merge branch 'master' into v0.1
ArthurJiang Oct 12, 2020
71c7e5b
Merge branch 'v0.1' of https://github.com/microsoft/maro into v0.1
ArthurJiang Oct 12, 2020
1c9e60d
Merge branch 'master' into v0.1
ArthurJiang Oct 12, 2020
73ee225
switch the key and value of handler_dict in decorator (#144)
kaiqli Oct 12, 2020
77b8288
V0.1 annotation (#147)
Jinyu-W Oct 13, 2020
0b11548
Merge branch 'master' into v0.1
ArthurJiang Oct 13, 2020
b4fd7ad
Event payload details for env.summary (#156)
Jinyu-W Oct 16, 2020
5a0e622
Merge branch 'master' into v0.2
Oct 16, 2020
1f6d5a7
Merge branch 'master' into v0.2
Jinyu-W Oct 20, 2020
dd2bc2b
Merge branch 'v0.1' into v0.2
ArthurJiang Oct 27, 2020
c6170bc
V0.2 online lp for citi bike (#159)
Jinyu-W Oct 29, 2020
f878f80
V0.2 rl toolkit refinement (#165)
ysqyang Oct 30, 2020
577bb1c
merge with master
Nov 4, 2020
2625427
Merge branch 'master' into v0.2
Nov 5, 2020
977891f
merge master into this branch; update according to isort
Nov 5, 2020
fb26bbf
update according to flake8
Nov 5, 2020
5a0d6e3
V0.2 Logical operator overloading for EarlyStoppingChecker (#178)
ysqyang Nov 6, 2020
f7bd11a
V0.2 skip connection (#176)
ysqyang Nov 6, 2020
9afaac4
fixed a bug in learner's test() (#193)
ysqyang Nov 13, 2020
0d0d03d
V0.2 double dqn (#188)
ysqyang Nov 13, 2020
298c155
V0.2 feature predefined image (#183)
romickid Nov 16, 2020
fdd7736
V0.2 feature proxy rejoin (#158)
kaiqli Nov 23, 2020
c89fc69
V0.2 feature cli windows (#203)
romickid Dec 2, 2020
c2ae29f
EventBuffer refine (#197)
chaosddp Dec 11, 2020
b86bb5f
V0.2 merge master (#214)
ysqyang Dec 16, 2020
e5e4d61
merge master
Dec 16, 2020
11e9b86
typo fix
Dec 16, 2020
5b9e1fc
Bug fix: event buffer issue that cause Actions cannot be passed into …
chaosddp Dec 16, 2020
6854e80
fix flake8 style problem
Dec 16, 2020
c0999d3
V0.2 feature refine mode namings (#212)
romickid Dec 18, 2020
65b2f24
V0.2 vis new (#210)
micli Dec 19, 2020
5ff0895
V0.2 local host process (#221)
kaiqli Dec 22, 2020
259d0cc
V0.2 grass on premises (#220)
micli Dec 23, 2020
8630dd6
V0.2 vm scheduling scenario (#189)
kyu-kuanwei Dec 23, 2020
7684f1d
Resolve none action problem (#224)
kyu-kuanwei Dec 23, 2020
0903cd6
V0.2 vm_scheduling notebook (#223)
kyu-kuanwei Dec 23, 2020
675fd9b
Update process mode docs and fixed on premises (#226)
kaiqli Dec 24, 2020
2315bbd
V0.2 Add github workflow integration (#222)
romickid Dec 25, 2020
97d7dbc
V0.2 explorer (#198)
ysqyang Dec 28, 2020
b1492f1
V0.2 embedded optim (#191)
ysqyang Dec 28, 2020
a5ecb00
V0.2 VM scheduling docs (#228)
kyu-kuanwei Dec 28, 2020
7594a9b
v0.2 VM Scheduling docs refinement (#231)
kyu-kuanwei Dec 29, 2020
f2afa75
V0.2 store refinement (#234)
ysqyang Dec 30, 2020
b1fd644
Fix bug (#237)
kyu-kuanwei Dec 31, 2020
2b5bbdb
V0.2 rl toolkit doc (#235)
ysqyang Jan 4, 2021
7ac91c9
Merge V0.2 vis into V0.2 (#233)
Meroy9819 Jan 4, 2021
068a7eb
V0.2 docs process mode (#230)
kaiqli Jan 4, 2021
4da525b
V0.2 learning model refinement (#236)
ysqyang Jan 4, 2021
82e2534
Update vm docs (#241)
kyu-kuanwei Jan 4, 2021
1fcc160
V0.2 info update (#240)
ArthurJiang Jan 4, 2021
951eaaf
Merge branch 'master' into v0.2
Jan 4, 2021
b30ca0e
Fix typo (#242)
chaosddp Jan 4, 2021
ad02915
fix
Jan 4, 2021
bf08afb
Merge branch 'v0.2' of https://github.com/microsoft/maro into v0.2
ArthurJiang Jan 5, 2021
e8e66db
Merge branch 'master' into v0.2
ArthurJiang Jan 5, 2021
6a4c403
Merge branch 'master' into v0.2
ArthurJiang Jan 5, 2021
8a4c563
syntax fix (#253)
ysqyang Jan 13, 2021
d51b192
V0.2 vm oversubscription (#246)
Jan 18, 2021
39b1159
V0.2 vm scheduling decision event (#257)
Jan 19, 2021
6246ae2
V0.2 PG, K-step and lambda return utils (#155)
ysqyang Jan 21, 2021
97e4fa8
V0.2 backend dynamic node support (#172)
chaosddp Jan 22, 2021
4f7529c
V0.2 vm oversub docs (#256)
Jan 25, 2021
e505420
Merge branch 'master' into v0.2
Jan 25, 2021
1 change: 1 addition & 0 deletions .github/linters/tox.ini
@@ -19,6 +19,7 @@ exclude =
.github,
scripts,
tests,
maro/backends/*.cpp
setup.py

max-line-length = 120
3 changes: 2 additions & 1 deletion .github/workflows/build_wheel.yml
@@ -34,7 +34,8 @@ jobs:

- name: Compile cython files
run: |
cython ./maro/backends/backend.pyx ./maro/backends/np_backend.pyx ./maro/backends/raw_backend.pyx ./maro/backends/frame.pyx -3 -E FRAME_BACKEND=NUMPY,NODES_MEMORY_LAYOUT=ONE_BLOCK -X embedsignature=True
python ./scripts/code_gen.py
cython ./maro/backends/backend.pyx ./maro/backends/np_backend.pyx ./maro/backends/raw_backend.pyx ./maro/backends/frame.pyx --cplus -3 -E NODES_MEMORY_LAYOUT=ONE_BLOCK -X embedsignature=True

- name: Build wheel on Windows and macOS
if: runner.os == 'Windows' || runner.os == 'macOS'
3 changes: 2 additions & 1 deletion .github/workflows/deploy_docker_image.yml
@@ -29,7 +29,8 @@ jobs:
- name: Build image
run: |
pip install -r ./maro/requirements.build.txt
cython ./maro/backends/backend.pyx ./maro/backends/np_backend.pyx ./maro/backends/raw_backend.pyx ./maro/backends/frame.pyx -3 -E FRAME_BACKEND=NUMPY,NODES_MEMORY_LAYOUT=ONE_BLOCK -X embedsignature=True
python ./scripts/code_gen.py
cython ./maro/backends/backend.pyx ./maro/backends/np_backend.pyx ./maro/backends/raw_backend.pyx ./maro/backends/frame.pyx --cplus -3 -E NODES_MEMORY_LAYOUT=ONE_BLOCK -X embedsignature=True
cat ./maro/__misc__.py | grep __version__ | egrep -o [0-9].[0-9].[0-9,a-z]+ | { read version; docker build -f ./docker_files/cpu.play.df . -t ${{ secrets.DOCKER_HUB_USERNAME }}/maro:cpu -t ${{ secrets.DOCKER_HUB_USERNAME }}/maro:latest -t ${{ secrets.DOCKER_HUB_USERNAME }}/maro:cpu-$version; }

- name: Login docker hub
3 changes: 2 additions & 1 deletion .github/workflows/deploy_gh_pages.yml
@@ -32,7 +32,8 @@ jobs:

- name: Compile cython files
run: |
cython ./maro/backends/backend.pyx ./maro/backends/np_backend.pyx ./maro/backends/raw_backend.pyx ./maro/backends/frame.pyx -3 -E FRAME_BACKEND=NUMPY,NODES_MEMORY_LAYOUT=ONE_BLOCK -X embedsignature=True
python ./scripts/code_gen.py
cython ./maro/backends/backend.pyx ./maro/backends/np_backend.pyx ./maro/backends/raw_backend.pyx ./maro/backends/frame.pyx --cplus -3 -E NODES_MEMORY_LAYOUT=ONE_BLOCK -X embedsignature=True

- name: Build maro inplace
run: |
3 changes: 2 additions & 1 deletion .github/workflows/test.yml
@@ -30,7 +30,8 @@ jobs:

- name: Compile cython files
run: |
cython ./maro/backends/backend.pyx ./maro/backends/np_backend.pyx ./maro/backends/raw_backend.pyx ./maro/backends/frame.pyx -3 -E FRAME_BACKEND=NUMPY,NODES_MEMORY_LAYOUT=ONE_BLOCK -X embedsignature=True
python ./scripts/code_gen.py
cython ./maro/backends/backend.pyx ./maro/backends/np_backend.pyx ./maro/backends/raw_backend.pyx ./maro/backends/frame.pyx --cplus -3 -E NODES_MEMORY_LAYOUT=ONE_BLOCK -X embedsignature=True

- name: Build maro inplace
run: |
67 changes: 55 additions & 12 deletions docs/source/key_components/data_model.rst
@@ -8,6 +8,14 @@ the backend language for improving the execution reference. What's more,
the backend store is pluggable: users can choose a backend
implementation based on their actual performance requirements and device limitations.

Currently there are two data model backend implementations: static and dynamic.
The static implementation uses NumPy as its data store and does not support dynamic
attribute lengths; its advantage is that its memory size is the same as its
declaration.
The dynamic implementation is hand-crafted C++.
It supports dynamic (list) attributes and takes more memory than the static implementation,
but it is faster for querying snapshot states and accessing attributes.
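
The trade-off described above can be illustrated with a toy sketch in plain Python. It does not use MARO's actual backends: ``FixedSlotAttribute`` and ``ListAttribute`` are hypothetical names introduced only for illustration, under the assumption that a static attribute preallocates exactly the declared number of slots while a dynamic list attribute starts empty and grows on demand.

```python
from array import array


class FixedSlotAttribute:
    """Toy stand-in (hypothetical) for a static attribute: slot count fixed at declaration."""

    def __init__(self, slots: int):
        # Preallocate exactly `slots` C ints, all zero; the size never changes.
        self._data = array("i", [0] * slots)

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, slot: int) -> int:
        return self._data[slot]

    def __setitem__(self, slot: int, value: int) -> None:
        # Raises IndexError when `slot` is outside the declared range.
        self._data[slot] = value


class ListAttribute:
    """Toy stand-in (hypothetical) for a dynamic list attribute: starts with 0 slots."""

    def __init__(self):
        self._data = array("i")

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, slot: int) -> int:
        return self._data[slot]

    def append(self, value: int) -> None:
        # The slot count grows by one on every append.
        self._data.append(value)


static_attr = FixedSlotAttribute(3)
static_attr[0] = 42

list_attr = ListAttribute()
for value in (7, 8, 9, 10):
    list_attr.append(value)
```

The fixed-size variant keeps its memory footprint predictable, while the growable variant trades that predictability for flexibility, which mirrors the choice between the two backends at a very small scale.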

Key Concepts
------------

@@ -28,6 +36,12 @@ As shown in the figure above, there are some key concepts in the data model:
The ``slot`` number can indicate the attribute values (e.g. the three different
container types in CIM scenario) or the detailed categories (e.g. the ten specific
products in the `Use Case <#use-case>`_ below). By default, the ``slot`` value is one.
With the dynamic backend implementation, an attribute can be marked as ``is_list`` or ``is_const`` to identify
it as a list attribute or a const attribute, respectively.
A list attribute's default slot number is 0, and it can grow on demand up to a maximum of 2^32 slots.
A const attribute is designed for a value that will not change after initialization,
e.g. the capacity of a port/station. The value is shared between frames and will not be copied
when taking a snapshot.
* **Frame** is the collection of all nodes in the environment. The historical frames
present the aggregated state of the environment during a specific period, while
the current frame hosts the latest state of the environment at the current time point.
@@ -41,6 +55,7 @@ Use Case

.. code-block:: python

from maro.backends.backend import AttributeType
from maro.backends.frame import node, NodeAttribute, NodeBase, FrameNode, FrameBase

TOTAL_PRODUCT_CATEGORIES = 10
@@ -51,8 +66,8 @@ Use Case

@node("warehouse")
class Warehouse(NodeBase):
inventories = NodeAttribute("i", TOTAL_PRODUCT_CATEGORIES)
shortages = NodeAttribute("i", TOTAL_PRODUCT_CATEGORIES)
inventories = NodeAttribute(AttributeType.Int, TOTAL_PRODUCT_CATEGORIES)
shortages = NodeAttribute(AttributeType.Int, TOTAL_PRODUCT_CATEGORIES)

def __init__(self):
self._init_inventories = [100 * (i + 1) for i in range(TOTAL_PRODUCT_CATEGORIES)]
@@ -65,9 +80,9 @@ Use Case

@node("store")
class Store(NodeBase):
inventories = NodeAttribute("i", TOTAL_PRODUCT_CATEGORIES)
shortages = NodeAttribute("i", TOTAL_PRODUCT_CATEGORIES)
sales = NodeAttribute("i", TOTAL_PRODUCT_CATEGORIES)
inventories = NodeAttribute(AttributeType.Int, TOTAL_PRODUCT_CATEGORIES)
shortages = NodeAttribute(AttributeType.Int, TOTAL_PRODUCT_CATEGORIES)
sales = NodeAttribute(AttributeType.Int, TOTAL_PRODUCT_CATEGORIES)

def __init__(self):
self._init_inventories = [10 * (i + 1) for i in range(TOTAL_PRODUCT_CATEGORIES)]
@@ -86,7 +101,8 @@ Use Case

def __init__(self):
# If the actual number of frames exceeds the total snapshot number, the oldest snapshots are replaced on a rolling basis.
super().__init__(enable_snapshot=True, total_snapshot=TOTAL_SNAPSHOT)
# Select the backend implementation that fits your requirements: pass either "static" or "dynamic" as backend_name.
super().__init__(enable_snapshot=True, total_snapshot=TOTAL_SNAPSHOT, backend_name="static/dynamic")

* The operations on the retail frame.

@@ -139,19 +155,34 @@ All supported data types for the attribute of the node:
* - Attribute Data Type
- C Type
- Range
* - i2
- int16_t
* - Attribute.Byte
- char
- -128 .. 127
* - Attribute.UByte
- unsigned char
- 0 .. 255
* - Attribute.Short (i2)
- short
- -32,768 .. 32,767
* - i, i4
* - Attribute.UShort
- unsigned short
- 0 .. 65,535
* - Attribute.Int (i4)
- int32_t
- -2,147,483,648 .. 2,147,483,647
* - i8
* - Attribute.UInt (i4)
- uint32_t
- 0 .. 4,294,967,295
* - Attribute.Long (i8)
- int64_t
- -9,223,372,036,854,775,808 .. 9,223,372,036,854,775,807
* - f
* - Attribute.ULong (i8)
- uint64_t
- 0 .. 18,446,744,073,709,551,615
* - Attribute.Float (f)
- float
- -3.4E38 .. 3.4E38
* - d
* - Attribute.Double (d)
- double
- -1.7E308 .. 1.7E308

@@ -216,3 +247,15 @@ For better data access, we also provide some advanced features, including:

# Query attribute by frame index list.
states = test_nodes_snapshots[[0, 1, 2]: 0: "int_attribute"]

# The shape of the queried states differs between the static and dynamic implementations.
# The static implementation returns a 1-dim numpy array, as the shape is known from the query parameters.
# The dynamic implementation returns a 4-dim numpy array with shape (ticks, node_indices, attributes, slots).
# Usually we can just flatten the states from the dynamic implementation to match the static implementation,
# except for list attributes.
# A list attribute query supports only one tick, one node index and one attribute name, and cannot be mixed with normal attributes.
states = test_nodes_snapshots[0: 0: "list_attribute"]

# With the dynamic implementation, we can also get const attributes, which are shared across the snapshot list, even without
# any snapshot (one tick still needs to be provided for padding).
states = test_nodes_snapshots[0: [0, 1]: ["const_attribute", "const_attribute_2"]]
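
The relationship between the 4-dim and the flattened 1-dim result described in the comments above can be checked with plain nested lists. This is only a sketch that assumes row-major ordering over (ticks, node_indices, attributes, slots); ``flatten_states`` and ``flat_index`` are hypothetical helpers, not part of MARO.

```python
from itertools import chain


def flatten_states(states_4d):
    """Flatten a nested list shaped (ticks, nodes, attributes, slots) in row-major order."""
    per_node = chain.from_iterable(states_4d)   # iterate over the nodes of every tick
    per_attr = chain.from_iterable(per_node)    # iterate over the attributes of every node
    return list(chain.from_iterable(per_attr))  # iterate over the slots of every attribute


def flat_index(tick, node, attr, slot, shape):
    """Position of states_4d[tick][node][attr][slot] inside the flattened list."""
    _, n_nodes, n_attrs, n_slots = shape
    return ((tick * n_nodes + node) * n_attrs + attr) * n_slots + slot


# Made-up data: 2 ticks, 2 nodes, 1 attribute, 3 slots.
shape = (2, 2, 1, 3)
states_4d = [
    [[[t * 100 + n * 10 + s for s in range(3)]] for n in range(2)]
    for t in range(2)
]
flat = flatten_states(states_4d)
```

With this layout, ``flat[flat_index(1, 0, 0, 2, shape)]`` and ``states_4d[1][0][0][2]`` refer to the same value, which is why flattening is usually enough for non-list attributes.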
54 changes: 27 additions & 27 deletions docs/source/key_components/rl_toolkit.rst
@@ -63,19 +63,19 @@ Learner and Actor
Scheduler
---------

A ``Scheduler`` is the driver of an episodic learning process. The learner uses the scheduler to repeat the
rollout-training cycle a set number of episodes. For algorithms that require explicit exploration (e.g.,
A ``Scheduler`` is the driver of an episodic learning process. The learner uses the scheduler to repeat the
rollout-training cycle a set number of episodes. For algorithms that require explicit exploration (e.g.,
DQN and DDPG), there are two types of schedules that a learner may follow:

* Static schedule, where the exploration parameters are generated using a pre-defined function of episode
number. See ``LinearParameterScheduler`` and ``TwoPhaseLinearParameterScheduler`` provided in the toolkit
for example.
* Static schedule, where the exploration parameters are generated using a pre-defined function of episode
number. See ``LinearParameterScheduler`` and ``TwoPhaseLinearParameterScheduler`` provided in the toolkit
for example.
* Dynamic schedule, where the exploration parameters for the next episode are determined based on the performance
history. Such a mechanism is possible in our abstraction because the scheduler provides a ``record_performance``
interface that allows it to keep track of roll-out performances.
interface that allows it to keep track of roll-out performances.

Optionally, an early stopping checker may be registered if one wishes to terminate training when certain performance
requirements are satisfied, possibly before reaching the prescribed number of episodes.
Optionally, an early stopping checker may be registered if one wishes to terminate training when certain performance
requirements are satisfied, possibly before reaching the prescribed number of episodes.
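
A static schedule with optional early stopping can be sketched in a few lines of plain Python. This is not the toolkit's ``LinearParameterScheduler``; ``linear_epsilon``, ``run_schedule``, ``fake_rollout`` and the stopping rule are hypothetical names and values chosen only to illustrate the idea, assuming the exploration parameter is a linear function of the episode number.

```python
from typing import Callable, List, Optional


def linear_epsilon(episode: int, max_episode: int, start: float, end: float) -> float:
    """Exploration rate as a pre-defined linear function of the episode number."""
    if max_episode <= 1:
        return end
    fraction = episode / (max_episode - 1)
    return start + (end - start) * fraction


def run_schedule(
    max_episode: int,
    start: float,
    end: float,
    rollout: Callable[[float], float],
    should_stop: Optional[Callable[[List[float]], bool]] = None,
) -> List[float]:
    """Repeat the rollout cycle, recording performance and stopping early if requested."""
    history: List[float] = []
    for episode in range(max_episode):
        epsilon = linear_epsilon(episode, max_episode, start, end)
        history.append(rollout(epsilon))  # record the roll-out performance
        if should_stop is not None and should_stop(history):
            break  # performance requirement met before the last episode
    return history


def fake_rollout(epsilon: float) -> float:
    # Stand-in for a real rollout: performance improves as exploration decreases.
    return 1.0 - epsilon


history = run_schedule(5, 1.0, 0.0, fake_rollout, should_stop=lambda h: h[-1] >= 0.5)
```

A dynamic schedule would differ only in how the next parameter value is chosen: it would be computed from ``history`` rather than from the episode number alone.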

Agent Manager
-------------
@@ -125,11 +125,11 @@ scenario agnostic.
Algorithm
---------

The algorithm is the kernel abstraction of the RL formulation for a real-world problem. Our abstraction
decouples algorithm and model so that an algorithm can exist as an RL paradigm independent of the inner
workings of the models it uses to generate actions or estimate values. For example, the actor-critic
The algorithm is the kernel abstraction of the RL formulation for a real-world problem. Our abstraction
decouples algorithm and model so that an algorithm can exist as an RL paradigm independent of the inner
workings of the models it uses to generate actions or estimate values. For example, the actor-critic
algorithm does not need to concern itself with the structures and optimizing schemes of the actor and
critic models. This decoupling is achieved by the ``LearningModel`` abstraction described below.
critic models. This decoupling is achieved by the ``LearningModel`` abstraction described below.


.. image:: ../images/rl/algorithm.svg
@@ -153,18 +153,18 @@ Block, NNStack and LearningModel
--------------------------------

MARO provides an abstraction for the underlying models used by agents to form policies and estimate values.
The abstraction consists of a 3-level hierarchy formed by ``AbsBlock``, ``NNStack`` and ``LearningModel`` from
The abstraction consists of a 3-level hierarchy formed by ``AbsBlock``, ``NNStack`` and ``LearningModel`` from
the bottom up, all of which subclass torch's nn.Module. An ``AbsBlock`` is the smallest structural
unit of an NN-based model. For instance, the ``FullyConnectedBlock`` provided in the toolkit represents a stack
of fully connected layers with features like batch normalization, drop-out and skip connection. An ``NNStack`` is
a composite network that consists of one or more such blocks, each with its own set of network features.
The complete model as used directly by an ``Algorithm`` is represented by a ``LearningModel``, which consists of
one or more task stacks as "heads" and an optional shared stack at the bottom (which serves to produce representations
as input to all task stacks). It also contains one or more optimizers responsible for applying gradient steps to the
trainable parameters within each stack, which is the smallest trainable unit from the perspective of a ``LearningModel``.
The assignment of optimizers is flexible: it is possible to freeze certain stacks while optimizing others. Such an
abstraction presents a unified interface to the algorithm, regardless of how many individual models it requires and how
complex the model architecture might be.
unit of an NN-based model. For instance, the ``FullyConnectedBlock`` provided in the toolkit represents a stack
of fully connected layers with features like batch normalization, drop-out and skip connection. An ``NNStack`` is
a composite network that consists of one or more such blocks, each with its own set of network features.
The complete model as used directly by an ``Algorithm`` is represented by a ``LearningModel``, which consists of
one or more task stacks as "heads" and an optional shared stack at the bottom (which serves to produce representations
as input to all task stacks). It also contains one or more optimizers responsible for applying gradient steps to the
trainable parameters within each stack, which is the smallest trainable unit from the perspective of a ``LearningModel``.
The assignment of optimizers is flexible: it is possible to freeze certain stacks while optimizing others. Such an
abstraction presents a unified interface to the algorithm, regardless of how many individual models it requires and how
complex the model architecture might be.
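
The composition described above (blocks inside stacks, an optional shared stack feeding several task heads) can be sketched without any deep learning library. The names ``make_scale_block``, ``ToyStack`` and ``ToyModel`` are hypothetical and are not MARO classes; the sketch assumes each block is just a function on a list of floats and leaves out optimizers entirely.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_scale_block(factor: float) -> Block:
    """Smallest structural unit in this toy: multiply every input by a factor."""
    return lambda xs: [factor * x for x in xs]


class ToyStack:
    """A composite of one or more blocks applied in order."""

    def __init__(self, blocks: List[Block]):
        self.blocks = blocks

    def __call__(self, xs: Vector) -> Vector:
        for block in self.blocks:
            xs = block(xs)
        return xs


class ToyModel:
    """One or more task stacks ("heads") on top of an optional shared stack."""

    def __init__(self, task_stacks: Dict[str, ToyStack], shared_stack: Optional[ToyStack] = None):
        self.task_stacks = task_stacks
        self.shared_stack = shared_stack

    def __call__(self, xs: Vector) -> Dict[str, Vector]:
        # The shared stack produces the representation consumed by every head.
        features = self.shared_stack(xs) if self.shared_stack is not None else xs
        return {name: stack(features) for name, stack in self.task_stacks.items()}


shared = ToyStack([make_scale_block(2.0), make_scale_block(0.5)])
model = ToyModel(
    {"actor": ToyStack([make_scale_block(3.0)]), "critic": ToyStack([make_scale_block(-1.0)])},
    shared_stack=shared,
)
outputs = model([1.0, 2.0])
```

The caller sees a single callable that returns one output per head, regardless of how many stacks sit underneath, which is the unified interface the paragraph refers to.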

.. image:: ../images/rl/learning_model.svg
:target: ../images/rl/learning_model.svg
@@ -196,11 +196,11 @@ And performing one gradient step is simply:
Explorer
--------

MARO provides an abstraction for exploration in RL. Some RL algorithms such as DQN and DDPG require
explicit exploration, the extent of which is usually determined by a set of parameters whose values
MARO provides an abstraction for exploration in RL. Some RL algorithms such as DQN and DDPG require
explicit exploration, the extent of which is usually determined by a set of parameters whose values
are generated by the scheduler. The ``AbsExplorer`` class is designed to cater to these needs. Simple
exploration schemes, such as ``EpsilonGreedyExplorer`` for discrete action space and ``UniformNoiseExplorer``
and ``GaussianNoiseExplorer`` for continuous action space, are provided in the toolkit.
exploration schemes, such as ``EpsilonGreedyExplorer`` for discrete action space and ``UniformNoiseExplorer``
and ``GaussianNoiseExplorer`` for continuous action space, are provided in the toolkit.
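
For intuition, epsilon-greedy exploration over a discrete action space can be sketched as follows. This is a generic illustration, not the toolkit's ``EpsilonGreedyExplorer``; the function name ``epsilon_greedy`` and its signature are assumptions made for this sketch.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: Optional[random.Random] = None) -> int:
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Explore: uniform choice over all action indices.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest estimated value.
    return max(range(len(q_values)), key=lambda action: q_values[action])


q_values = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0)
explored = [epsilon_greedy(q_values, epsilon=1.0, rng=random.Random(seed)) for seed in range(10)]
```

The value of ``epsilon`` is the kind of exploration parameter that a scheduler would generate for each episode.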

As an example, the exploration for DQN may be carried out with the aid of an ``EpsilonGreedyExplorer``:
