Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry pick from sc_refinement #527

Merged
merged 2 commits into from
May 18, 2022
Merged

Cherry pick from sc_refinement #527

merged 2 commits into from
May 18, 2022

Conversation

lihuoran
Copy link
Contributor

Description

Linked issue(s)/Pull request(s)

Type of Change

  • Non-breaking bug fix
  • Breaking bug fix
  • New feature
  • Test
  • Doc update
  • Docker update

Related Component

  • Simulation toolkit
  • RL toolkit
  • Distributed toolkit

Has Been Tested

  • OS:
    • Windows
    • Mac OS
    • Linux
  • Python version:
    • 3.6
    • 3.7
  • Key information snapshot(s):

Needs Follow Up Actions

  • New release package
  • New docker image

Checklist

  • Add/update the related comments
  • Add/update the related tests
  • Add/update the related documentations
  • Update the dependent downstream modules usage

@lihuoran lihuoran requested review from Jinyu-W and v-heli May 18, 2022 01:57
@Jinyu-W Jinyu-W merged commit 10b9c02 into v0.3 May 18, 2022
@Jinyu-W Jinyu-W deleted the cherry_pick_sub_branch branch May 18, 2022 07:13
Jinyu-W added a commit that referenced this pull request Jun 1, 2022
…sponding updates. (#539)

* refined proxy coding style

* updated images and refined doc

* updated images

* updated CIM-AC example

* refined proxy retry logic

* call policy update only for AbsCorePolicy

* add limitation of AbsCorePolicy in Actor.collect()

* refined actor to return only experiences for policies that received new experiences

* fix MsgKey issue in rollout_manager

* fix typo in learner

* call exit function for parallel rollout manager

* update supply chain example distributed training scripts

* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prorotype

* reformat render

* fix supply chain business engine action type problem

* reset supply chain example render figsize from 4 to 3

* Add render to all modes of supply chain example

* fix or policy typos

* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes

* refined parallel policy manager

* updated rl/__init__/py

* fixed lint issues and CIM local learner bugs

* deleted unwanted supply_chain test files

* revised default config for cim-dqn

* removed test_store.py as it is no longer needed

* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms

* updated figures

* removed unwanted import

* refactored CIM-DQN example

* added MultiProcessRolloutManager and MultiProcessTrainingManager

* updated doc

* lint issue fix

* lint issue fix

* fixed import formatting

* [Feature] Prioritized Experience Replay (#355)

* added prioritized experience replay

* deleted unwanted supply_chain test files

* fixed import order

* import fix

* fixed lint issues

* fixed import formatting

* added note in docstring that rank-based PER has yet to be implemented

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* rm AbsDecisionGenerator

* small fixes

* bug fix

* reorganized training folder structure

* fixed lint issues

* fixed lint issues

* policy manager refined

* lint fix

* restructured CIM-dqn sync code

* added policy version index and used it as a measure of experience staleness

* lint issue fix

* lint issue fix

* switched log_dir and proxy_kwargs order

* cim example refinement

* eval schedule sorted only when it's a list

* eval schedule sorted only when it's a list

* update sc env wrapper

* added docker scripts for cim-dqn

* refactored example folder structure and added workflow templates

* fixed lint issues

* fixed lint issues

* fixed template bugs

* removed unused imports

* refactoring sc in progress

* simplified cim meta

* fixed build.sh path bug

* template refinement

* deleted obsolete svgs

* updated learner logs

* minor edits

* refactored templates for easy merge with async PR

* added component names for rollout manager and policy manager

* fixed incorrect position to add last episode to eval schedule

* added max_lag option in templates

* formatting edit in docker_compose_yml script

* moved local learner and early stopper outside sync_tools

* refactored rl toolkit folder structure

* refactored rl toolkit folder structure

* moved env_wrapper and agent_wrapper inside rl/learner

* refined scripts

* fixed typo in script

* changes needed for running sc

* removed unwanted imports

* config change for testing sc scenario

* changes for perf testing

* Asynchronous Training (#364)

* remote inference code draft

* changed actor to rollout_worker and updated init files

* removed unwanted import

* updated inits

* more async code

* added async scripts

* added async training code & scripts for CIM-dqn

* changed async to async_tools to avoid conflict with python keyword

* reverted unwanted change to dockerfile

* added doc for policy server

* addressed PR comments and fixed a bug in docker_compose_yml.py

* fixed lint issue

* resolved PR comment

* resolved merge conflicts

* added async templates

* added proxy.close() for actor and policy_server

* fixed incorrect position to add last episode to eval schedule

* reverted unwanted changes

* added missing async files

* rm unwanted echo in kill.sh

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword

* added missing policy version increment in LocalPolicyManager

* refined rollout manager recv logic

* removed a debugging print

* added sleep in distributed launcher to avoid hanging

* updated api doc and rl toolkit doc

* refined dynamic imports using importlib

* 1. moved policy update triggers to policy manager; 2. added version control in policy manager

* fixed a few bugs and updated cim RL example

* fixed a few more bugs

* added agent wrapper instantiation to workflows

* added agent wrapper instantiation to workflows

* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet

* fixed incorrect get_ac_policy signature for CIM

* moved exploration inside core policy

* added state to exploration call to support context-dependent exploration

* separated non_rl_policy_index and rl_policy_index in workflows

* modified sc example code according to workflow changes

* modified sc example code according to workflow changes

* added replay_agent_ids parameter to get_env_func for RL examples

* fixed a few bugs

* added maro/simulator/scenarios/supply_chain as bind mount

* added post-step, post-collect, post-eval and post-update callbacks

* fixed lint issues

* fixed lint issues

* moved instantiation of policy manager inside simple learner

* fixed env_wrapper get_reward signature

* minor edits

* removed get_eperience kwargs from env_wrapper

* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows

* added rollout exp disribution option in RL examples

* removed unwanted files

* 1. made logger internal in learner; 2 removed logger creation in abs classes

* checked out supply chain test files from v0.2_sc

* 1. added missing model.eval() to choose_action; 2.added entropy features to AC

* fixed a bug in ac entropy

* abbreviated coefficient to coeff

* removed -dqn from job name in rl example config

* added tmp patch to dev.df

* renamed image name for running rl examples

* added get_loss interface for core policies

* added policy manager in rl_toolkit.rst

* 1. env_wrapper bug fix; 2. policy manager update logic refinement

* refactored policy and algorithms

* policy interface redesigned

* refined policy interfaces

* fixed typo

* fixed bugs in refactored policy interface

* fixed some bugs

* refactoring in progress

* policy interface and policy manager redesigned

* 1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts

* fixed bug in distributed policy manager

* fixed lint issues

* fixed lint issues

* added scipy in setup

* 1. trimmed rollout manager code; 2. added option to docker scripts

* updated api doc for policy manager

* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script

* 1. simplified rl example structure; 2. fixed lint issues

* further rl toolkit code simplifications

* more numpy-based optimization in RL toolkit

* moved replay buffer inside policy

* bug fixes

* numpy optimization and associated refactoring

* extracted shaping logic out of env_sampler

* fixed bug in CIM shaping and lint issues

* preliminary implemetation of parallel batch inference

* fixed bug in ddpg transition recording

* put get_state, get_env_actions, get_reward back in EnvSampler

* simplified exploration and core model interfaces

* bug fixes and doc update

* added improve() interface for RLPolicy for single-thread support

* fixed simple policy manager bug

* updated doc, rst, notebook

* updated notebook

* fixed lint issues

* fixed entropy bugs in ac.py

* reverted to simple policy manager as default

* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit

* fixed lint issues and updated rl toolkit images

* removed obsolete images

* added back agent2policy for general workflow use

* V0.2 rl refinement dist (#377)

* Support `slice` operation in ExperienceSet

* Support naive distributed policy training by proxy

* Dynamically allocate trainers according to number of experience

* code check

* code check

* code check

* Fix a bug in distributed trianing with no gradient

* Code check

* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy

* 1.call allocate_trainer() at first of update(); 2.refine according to code review

* Code check

* Refine code with new interface

* Update docs of PolicyManger and ExperienceSet

* Add images for rl_toolkit docs

* Update diagram of PolicyManager

* Refine with new interface

* Extract allocation strategy into `allocation_strategy.py`

* add `distributed_learn()` in policies for data-parallel training

* Update doc of RL_toolkit

* Add gradient workers for data-parallel

* Refine code and update docs

* Lint check

* Refine by comments

* Rename `trainer` to `worker`

* Rename `distributed_learn` to `learn_with_data_parallel`

* Refine allocator and remove redundant code in policy_manager

* remove arugments in allocate_by_policy and so on

* added checkpointing for simple and multi-process policy managers

* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager

* added missing set_state and get_state for CIM policies

* removed blank line

* updated RL workflow README

* Integrate `data_parallel` arguments into `worker_allocator` (#402)

* 1. simplified workflow config; 2. added comments to CIM shaping

* lint issue fix

* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading

* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows

* refined READEME for CIM

* VM scheduling with RL (#375)

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* added DQN

* added get_experiences func for ac in vm scheduling

* added post_step callback to env wrapper

* moved Aiming's tracking and plotting logic into callbacks

* added eval env wrapper

* renamed AC config variable name for VM

* vm scheduling RL code finished

* updated README

* fixed various bugs and hard coding for vm_scheduling

* uncommented callbacks for VM scheduling

* Minor revision for better code style

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* vm example refactoring

* fixed bugs in vm_scheduling

* removed unwanted files from cim dir

* reverted to simple policy manager as default

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* resolved rebase conflicts

* fixed bugs in vm_scheduling

* added get_state and set_state to vm_scheduling policy models

* updated README for vm_scheduling with RL

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* SC refinement (#397)

* Refine test scripts & pending_order_daily logic

* Refactor code for better code style: complete type hint, correct typos, remove unused items.

Refactor code for better code style: complete type hint, correct typos, remove unused items.

* Polish test_supply_chain.py

* update import format

* Modify vehicle steps logic & remove outdated test case

* Optimize imports

* Optimize imports

* Lint error

* Lint error

* Lint error

* Add SupplyChainAction

* Lint error

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* refined workflow scripts

* fixed bug in ParallelAgentWrapper

* 1. fixed lint issues; 2. refined main script in workflows

* lint issue fix

* restored default config for rl example

* Update rollout.py

* refined env var processing in policy manager workflow

* added hasattr check in agent wrapper

* updated docker_compose_yml.py

* Minor refinement

* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)

* Prepare to merge master_mirror

* Lint error

* Minor

* Merge latest master into v0.3 (#426)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](#352) and [314](#314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* Minor

* Remove docs/source/examples/multi_agent_dqn_cim.rst

* Update .gitignore

* Update .gitignore

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>

* Change `Env.set_seed()` logic (#456)

* Change Env.set_seed() logic

* Redesign CIM reset logic; fix lint issues;

* Lint

* Seed type assertion

* Remove all SC related files (#473)

* RL Toolkit V3 (#471)

* added daemon=True for multi-process rollout, policy manager and inference

* removed obsolete files

* [REDO][PR#406]V0.2 rl refinement taskq (#408)

* Add a usable task_queue

* Rename some variables

* 1. Add ; 2. Integrate  related files; 3. Remove

* merge `data_parallel` and `num_grad_workers` into `data_parallelism`

* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.

* Move `grad_worker` into marl/rl/workflows

* 1.Merge data_parallel and num_workers into data_parallelism in config; 2.Assign recently used workers as possible in task_queue.

* Refine code and update docs of `TaskQueue`

* Support priority for tasks in `task_queue`

* Update diagram of policy manager and task queue.

* Add configurable `single_task_limit` and correct docstring about `data_parallelism`

* Fix lint errors in `supply chain`

* RL policy redesign (V2) (#405)

* Drafi v2.0 for V2

* Polish models with more comments

* Polish policies with more comments

* Lint

* Lint

* Add developer doc for models.

* Add developer doc for policies.

* Remove policy manager V2 since it is not used and out-of-date

* Lint

* Lint

* refined messy workflow code

* merged 'scenario_dir' and 'scenario' in rl config

* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods

* 1. temporarily renamed RLPolicy from polivy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2

* merged cim and cim_v2

* lint issue fix

* refined logging logic

* lint issue fix

* reversed unwanted changes

* .

.

.

.

ReplayMemory & IndexScheduler

ReplayMemory & IndexScheduler

.

MultiReplayMemory

get_actions_with_logps

EnvSampler on the road

EnvSampler

Minor

* LearnerManager

* Use batch to transfer data & add SHAPE_CHECK_FLAG

* Rename learner to trainer

* Add property for policy._is_exploring

* CIM test scenario for V3. Manual test passed. Next step: run it, make it works.

* env_sampler.py could run

* env_sampler refine on the way

* First runnable version done

* AC could run, but the result is bad. Need to check the logic

* Refine abstract method & shape check error info.

* Docs

* Very detailed compare. Try again.

* AC done

* DQN check done

* Minor

* DDPG, not tested

* Minors

* A rough draft of MAAC

* Cannot use CIM as the multi-agent scenario.

* Minor

* MAAC refinement on the way

* Remove ActionWithAux

* Refine batch & memory

* MAAC example works

* Reproduce-able fix. Policy share between env_sampler and trainer_manager.

* Detail refinement

* Simplify the user configed workflow

* Minor

* Refine example codes

* Minor polishment

* Migrate rollout_manager to V3

* Error on the way

* Redesign torch.device management

* Rl v3 maddpg (#418)

* Add MADDPG trainer

* Fit independent critics and shared critic modes.

* Add a new property: num_policies

* Lint

* Fix a bug in `sum(rewards)`

* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.

* Rename maddpg in examples.

* Preparation for data parallel (#420)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* Rename train worker to train ops; add placeholder for abstract methods;

* Lint

Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>

* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* dsitributed training pipeline draft

* added temporary test files for review purposes

* Several code style refinements (#451)

* Polish rl_v3/utils/

* Polish rl_v3/distributed/

* Polish rl_v3/policy_trainer/abs_trainer.py

* fixed merge conflicts

* unified sync and async interfaces

* refactored rl_v3; refinement in progress

* Finish the runnable pipeline under new design

* Remove outdated files; refine class names; optimize imports;

* Lint

* Minor maddpg related refinement

* Lint

Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Miner bug fix

* Coroutine-related bug fix ("get_policy_state") (#452)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* deleted unwanted folder

* removed unwanted changes

* resolved PR452 comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Quick fix

* Redesign experience recording logic (#453)

* Two not important fix

* Temp draft. Prepare to WFH

* Done

* Lint

* Lint

* Calculating advantages / returns (#454)

* V1.0

* Complete DDPG

* Rl v3 hanging issue fix (#455)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* Final test & format. Ready to merge.

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* Rl v3 parallel rollout (#457)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* load balancing dispatcher

* added parallel rollout

* lint

* Tracker variable type issue; rename to env_sampler_creator;

* Rl v3 parallel rollout follow ups (#458)

* AbsWorker & AbsDispatcher

* Pass env idx to AbsTrainer.record() method, and let the trainer to decide how to record experiences sampled from different worlds.

* Fix policy_creator reuse bug

* Format code

* Merge AbsTrainerManager & SimpleTrainerManager

* AC test passed

* Lint

* Remove AbsTrainer.build() method. Put all initialization operations into __init__

* Redesign AC preprocess batches logic

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* MADDPG performance bug fix (#459)

* Fix MARL (MADDPG) terminal recording bug; some other minor refinements;

* Restore Trainer.build() method

* Calculate latest action in the get_actor_grad method in MADDPG.

* Share critic bug fix

* Rl v3 example update (#461)

* updated vm_scheduling example and cim notebook

* fixed bugs in vm_scheduling

* added local train method

* bug fix

* modified async client logic to fix hidden issue

* reverted to default config

* fixed PR comments and some bugs

* removed hardcode

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Done (#462)

* Rl v3 load save (#463)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL Toolkit data parallelism revamp & config utils (#464)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

* 1. fixed data parallelism issue; 2. added config validator; 3. refactored cli local

* 1. fixed rollout exit issue; 2. refined config

* removed config file from example

* fixed lint issues

* fixed lint issues

* added main.py under examples/rl

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL doc string (#465)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* Rl config doc (#467)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* resolved more PR comments

* typo fix

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL online doc (#469)

* Model, policy, trainer

* RL workflows and env sampler doc in RST (#468)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* rewriting rl toolkit rst

* resolved more PR comments

* typo fix

* updated rst

Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: Default <huo53926@126.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Finish docs/source/key_components/rl_toolkit.rst

* API doc

* RL online doc image fix (#470)

* resolved some PR comments

* fix

* fixed PR comments

* added numfig=True setting in conf.py for sphinx

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Resolve PR comments

* Add example github link

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Rl v3 pr comment resolution (#474)

* added load/save feature

* 1. resolved pr comments; 2. reverted maro/cli/k8s

* fixed some bugs

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* RL renaming v2 (#476)

* Change all Logger in RL to LoggerV2

* TrainerManager => TrainingManager

* Add Trainer suffix to all algorithms

* Finish docs

* Update interface names

* Minor fix

* Cherry pick latest RL (#498)

* Cherry pick

* Remove SC related files

* Cherry pick RL changes from `sc_refinement` (latest commit: `2a4869`) (#509)

* Cherry pick RL changes from sc_refinement (2a4869)

* Limit time display precision

* RL incremental refactor (#501)

* Refactor rollout logic. Allow multiple sampling in one epoch, so that we can generate more data for training.

AC & PPO for continuous action policy; refine AC & PPO logic.

Cherry pick RL changes from GYM-DDPG

Cherry pick RL changes from GYM-SAC

Minor error in doc string

* Add min_n_sample in template and parser

* Resolve PR comments. Fix a minor issue in SAC.

* RL component bundle (#513)

* CIM passed

* Update workers

* Refine annotations

* VM passed

* Code formatting.

* Minor import loop issue

* Pass batch in PPO again

* Remove Scenario

* Complete docs

* Minor

* Remove segment

* Optimize logic in RLComponentBundle

* Resolve PR comments

* Move 'post methods from RLComponenetBundle to EnvSampler

* Add method to get mapping of available tick to frame index (#415)

* add method to get mapping of available tick to frame index

* fix lint issue

* fix naming issue

* Cherry pick from sc_refinement (#527)

* Cherry pick from sc_refinement

* Cherry pick from sc_refinement

* Refine `terminal` / `next_agent_state` logic (#531)

* Optimize RL toolkit

* Fix bug in terminal/next_state generation

* Rewrite terminal/next_state logic again

* Minor renaming

* Minor bug fix

* Resolve PR comments

* Merge master into v0.3 (#536)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](#352) and [314](#314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](ipython/ipython@7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add & sort requirements.dev.txt

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove random_config.py

* Remove test_trajectory_utils.py

* Pass tests

* Update rl docs

* Remove python 3.6 in test

* Update docs

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wang.Jinyu <jinywan@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: GQ.Chen <675865907@qq.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Jinyu-W added a commit that referenced this pull request Jun 9, 2022
* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](#352) and [314](#314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](ipython/ipython@7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update github woorkflow config

* MARO v0.3: a new design of RL Toolkit, CLI refactorization, and corresponding updates. (#539)

* refined proxy coding style

* updated images and refined doc

* updated images

* updated CIM-AC example

* refined proxy retry logic

* call policy update only for AbsCorePolicy

* add limitation of AbsCorePolicy in Actor.collect()

* refined actor to return only experiences for policies that received new experiences

* fix MsgKey issue in rollout_manager

* fix typo in learner

* call exit function for parallel rollout manager

* update supply chain example distributed training scripts

* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prorotype

* reformat render

* fix supply chain business engine action type problem

* reset supply chain example render figsize from 4 to 3

* Add render to all modes of supply chain example

* fix or policy typos

* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes

* refined parallel policy manager

* updated rl/__init__/py

* fixed lint issues and CIM local learner bugs

* deleted unwanted supply_chain test files

* revised default config for cim-dqn

* removed test_store.py as it is no longer needed

* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms

* updated figures

* removed unwanted import

* refactored CIM-DQN example

* added MultiProcessRolloutManager and MultiProcessTrainingManager

* updated doc

* lint issue fix

* lint issue fix

* fixed import formatting

* [Feature] Prioritized Experience Replay (#355)

* added prioritized experience replay

* deleted unwanted supply_chain test files

* fixed import order

* import fix

* fixed lint issues

* fixed import formatting

* added note in docstring that rank-based PER has yet to be implemented

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* rm AbsDecisionGenerator

* small fixes

* bug fix

* reorganized training folder structure

* fixed lint issues

* fixed lint issues

* policy manager refined

* lint fix

* restructured CIM-dqn sync code

* added policy version index and used it as a measure of experience staleness

* lint issue fix

* lint issue fix

* switched log_dir and proxy_kwargs order

* cim example refinement

* eval schedule sorted only when it's a list

* eval schedule sorted only when it's a list

* update sc env wrapper

* added docker scripts for cim-dqn

* refactored example folder structure and added workflow templates

* fixed lint issues

* fixed lint issues

* fixed template bugs

* removed unused imports

* refactoring sc in progress

* simplified cim meta

* fixed build.sh path bug

* template refinement

* deleted obsolete svgs

* updated learner logs

* minor edits

* refactored templates for easy merge with async PR

* added component names for rollout manager and policy manager

* fixed incorrect position to add last episode to eval schedule

* added max_lag option in templates

* formatting edit in docker_compose_yml script

* moved local learner and early stopper outside sync_tools

* refactored rl toolkit folder structure

* refactored rl toolkit folder structure

* moved env_wrapper and agent_wrapper inside rl/learner

* refined scripts

* fixed typo in script

* changes needed for running sc

* removed unwanted imports

* config change for testing sc scenario

* changes for perf testing

* Asynchronous Training (#364)

* remote inference code draft

* changed actor to rollout_worker and updated init files

* removed unwanted import

* updated inits

* more async code

* added async scripts

* added async training code & scripts for CIM-dqn

* changed async to async_tools to avoid conflict with python keyword

* reverted unwanted change to dockerfile

* added doc for policy server

* addressed PR comments and fixed a bug in docker_compose_yml.py

* fixed lint issue

* resolved PR comment

* resolved merge conflicts

* added async templates

* added proxy.close() for actor and policy_server

* fixed incorrect position to add last episode to eval schedule

* reverted unwanted changes

* added missing async files

* rm unwanted echo in kill.sh

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword

* added missing policy version increment in LocalPolicyManager

* refined rollout manager recv logic

* removed a debugging print

* added sleep in distributed launcher to avoid hanging

* updated api doc and rl toolkit doc

* refined dynamic imports using importlib

* 1. moved policy update triggers to policy manager; 2. added version control in policy manager

* fixed a few bugs and updated cim RL example

* fixed a few more bugs

* added agent wrapper instantiation to workflows

* added agent wrapper instantiation to workflows

* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet

* fixed incorrect get_ac_policy signature for CIM

* moved exploration inside core policy

* added state to exploration call to support context-dependent exploration

* separated non_rl_policy_index and rl_policy_index in workflows

* modified sc example code according to workflow changes

* modified sc example code according to workflow changes

* added replay_agent_ids parameter to get_env_func for RL examples

* fixed a few bugs

* added maro/simulator/scenarios/supply_chain as bind mount

* added post-step, post-collect, post-eval and post-update callbacks

* fixed lint issues

* fixed lint issues

* moved instantiation of policy manager inside simple learner

* fixed env_wrapper get_reward signature

* minor edits

* removed get_eperience kwargs from env_wrapper

* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows

* added rollout exp disribution option in RL examples

* removed unwanted files

* 1. made logger internal in learner; 2 removed logger creation in abs classes

* checked out supply chain test files from v0.2_sc

* 1. added missing model.eval() to choose_action; 2.added entropy features to AC

* fixed a bug in ac entropy

* abbreviated coefficient to coeff

* removed -dqn from job name in rl example config

* added tmp patch to dev.df

* renamed image name for running rl examples

* added get_loss interface for core policies

* added policy manager in rl_toolkit.rst

* 1. env_wrapper bug fix; 2. policy manager update logic refinement

* refactored policy and algorithms

* policy interface redesigned

* refined policy interfaces

* fixed typo

* fixed bugs in refactored policy interface

* fixed some bugs

* refactoring in progress

* policy interface and policy manager redesigned

* 1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts

* fixed bug in distributed policy manager

* fixed lint issues

* fixed lint issues

* added scipy in setup

* 1. trimmed rollout manager code; 2. added option to docker scripts

* updated api doc for policy manager

* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script

* 1. simplified rl example structure; 2. fixed lint issues

* further rl toolkit code simplifications

* more numpy-based optimization in RL toolkit

* moved replay buffer inside policy

* bug fixes

* numpy optimization and associated refactoring

* extracted shaping logic out of env_sampler

* fixed bug in CIM shaping and lint issues

* preliminary implemetation of parallel batch inference

* fixed bug in ddpg transition recording

* put get_state, get_env_actions, get_reward back in EnvSampler

* simplified exploration and core model interfaces

* bug fixes and doc update

* added improve() interface for RLPolicy for single-thread support

* fixed simple policy manager bug

* updated doc, rst, notebook

* updated notebook

* fixed lint issues

* fixed entropy bugs in ac.py

* reverted to simple policy manager as default

* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit

* fixed lint issues and updated rl toolkit images

* removed obsolete images

* added back agent2policy for general workflow use

* V0.2 rl refinement dist (#377)

* Support `slice` operation in ExperienceSet

* Support naive distributed policy training by proxy

* Dynamically allocate trainers according to number of experience

* code check

* code check

* code check

* Fix a bug in distributed trianing with no gradient

* Code check

* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy

* 1.call allocate_trainer() at first of update(); 2.refine according to code review

* Code check

* Refine code with new interface

* Update docs of PolicyManger and ExperienceSet

* Add images for rl_toolkit docs

* Update diagram of PolicyManager

* Refine with new interface

* Extract allocation strategy into `allocation_strategy.py`

* add `distributed_learn()` in policies for data-parallel training

* Update doc of RL_toolkit

* Add gradient workers for data-parallel

* Refine code and update docs

* Lint check

* Refine by comments

* Rename `trainer` to `worker`

* Rename `distributed_learn` to `learn_with_data_parallel`

* Refine allocator and remove redundant code in policy_manager

* remove arugments in allocate_by_policy and so on

* added checkpointing for simple and multi-process policy managers

* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager

* added missing set_state and get_state for CIM policies

* removed blank line

* updated RL workflow README

* Integrate `data_parallel` arguments into `worker_allocator` (#402)

* 1. simplified workflow config; 2. added comments to CIM shaping

* lint issue fix

* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading

* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows

* refined READEME for CIM

* VM scheduling with RL (#375)

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* added DQN

* added get_experiences func for ac in vm scheduling

* added post_step callback to env wrapper

* moved Aiming's tracking and plotting logic into callbacks

* added eval env wrapper

* renamed AC config variable name for VM

* vm scheduling RL code finished

* updated README

* fixed various bugs and hard coding for vm_scheduling

* uncommented callbacks for VM scheduling

* Minor revision for better code style

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* vm example refactoring

* fixed bugs in vm_scheduling

* removed unwanted files from cim dir

* reverted to simple policy manager as default

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* resolved rebase conflicts

* fixed bugs in vm_scheduling

* added get_state and set_state to vm_scheduling policy models

* updated README for vm_scheduling with RL

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* SC refinement (#397)

* Refine test scripts & pending_order_daily logic

* Refactor code for better code style: complete type hint, correct typos, remove unused items.

Refactor code for better code style: complete type hint, correct typos, remove unused items.

* Polish test_supply_chain.py

* update import format

* Modify vehicle steps logic & remove outdated test case

* Optimize imports

* Optimize imports

* Lint error

* Lint error

* Lint error

* Add SupplyChainAction

* Lint error

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* refined workflow scripts

* fixed bug in ParallelAgentWrapper

* 1. fixed lint issues; 2. refined main script in workflows

* lint issue fix

* restored default config for rl example

* Update rollout.py

* refined env var processing in policy manager workflow

* added hasattr check in agent wrapper

* updated docker_compose_yml.py

* Minor refinement

* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)

* Prepare to merge master_mirror

* Lint error

* Minor

* Merge latest master into v0.3 (#426)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](#352) and [314](#314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* Minor

* Remove docs/source/examples/multi_agent_dqn_cim.rst

* Update .gitignore

* Update .gitignore

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>

* Change `Env.set_seed()` logic (#456)

* Change Env.set_seed() logic

* Redesign CIM reset logic; fix lint issues;

* Lint

* Seed type assertion

* Remove all SC related files (#473)

* RL Toolkit V3 (#471)

* added daemon=True for multi-process rollout, policy manager and inference

* removed obsolete files

* [REDO][PR#406]V0.2 rl refinement taskq (#408)

* Add a usable task_queue

* Rename some variables

* 1. Add ; 2. Integrate  related files; 3. Remove

* merge `data_parallel` and `num_grad_workers` into `data_parallelism`

* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.

* Move `grad_worker` into marl/rl/workflows

* 1.Merge data_parallel and num_workers into data_parallelism in config; 2.Assign recently used workers as possible in task_queue.

* Refine code and update docs of `TaskQueue`

* Support priority for tasks in `task_queue`

* Update diagram of policy manager and task queue.

* Add configurable `single_task_limit` and correct docstring about `data_parallelism`

* Fix lint errors in `supply chain`

* RL policy redesign (V2) (#405)

* Drafi v2.0 for V2

* Polish models with more comments

* Polish policies with more comments

* Lint

* Lint

* Add developer doc for models.

* Add developer doc for policies.

* Remove policy manager V2 since it is not used and out-of-date

* Lint

* Lint

* refined messy workflow code

* merged 'scenario_dir' and 'scenario' in rl config

* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods

* 1. temporarily renamed RLPolicy from polivy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2

* merged cim and cim_v2

* lint issue fix

* refined logging logic

* lint issue fix

* reversed unwanted changes

* .

.

.

.

ReplayMemory & IndexScheduler

ReplayMemory & IndexScheduler

.

MultiReplayMemory

get_actions_with_logps

EnvSampler on the road

EnvSampler

Minor

* LearnerManager

* Use batch to transfer data & add SHAPE_CHECK_FLAG

* Rename learner to trainer

* Add property for policy._is_exploring

* CIM test scenario for V3. Manual test passed. Next step: run it, make it works.

* env_sampler.py could run

* env_sampler refine on the way

* First runnable version done

* AC could run, but the result is bad. Need to check the logic

* Refine abstract method & shape check error info.

* Docs

* Very detailed compare. Try again.

* AC done

* DQN check done

* Minor

* DDPG, not tested

* Minors

* A rough draft of MAAC

* Cannot use CIM as the multi-agent scenario.

* Minor

* MAAC refinement on the way

* Remove ActionWithAux

* Refine batch & memory

* MAAC example works

* Reproduce-able fix. Policy share between env_sampler and trainer_manager.

* Detail refinement

* Simplify the user configed workflow

* Minor

* Refine example codes

* Minor polishment

* Migrate rollout_manager to V3

* Error on the way

* Redesign torch.device management

* Rl v3 maddpg (#418)

* Add MADDPG trainer

* Fit independent critics and shared critic modes.

* Add a new property: num_policies

* Lint

* Fix a bug in `sum(rewards)`

* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.

* Rename maddpg in examples.

* Preparation for data parallel (#420)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* Rename train worker to train ops; add placeholder for abstract methods;

* Lint

Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>

* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* dsitributed training pipeline draft

* added temporary test files for review purposes

* Several code style refinements (#451)

* Polish rl_v3/utils/

* Polish rl_v3/distributed/

* Polish rl_v3/policy_trainer/abs_trainer.py

* fixed merge conflicts

* unified sync and async interfaces

* refactored rl_v3; refinement in progress

* Finish the runnable pipeline under new design

* Remove outdated files; refine class names; optimize imports;

* Lint

* Minor maddpg related refinement

* Lint

Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Miner bug fix

* Coroutine-related bug fix ("get_policy_state") (#452)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* deleted unwanted folder

* removed unwanted changes

* resolved PR452 comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Quick fix

* Redesign experience recording logic (#453)

* Two not important fix

* Temp draft. Prepare to WFH

* Done

* Lint

* Lint

* Calculating advantages / returns (#454)

* V1.0

* Complete DDPG

* Rl v3 hanging issue fix (#455)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* Final test & format. Ready to merge.

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* Rl v3 parallel rollout (#457)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* load balancing dispatcher

* added parallel rollout

* lint

* Tracker variable type issue; rename to env_sampler_creator;

* Rl v3 parallel rollout follow ups (#458)

* AbsWorker & AbsDispatcher

* Pass env idx to AbsTrainer.record() method, and let the trainer to decide how to record experiences sampled from different worlds.

* Fix policy_creator reuse bug

* Format code

* Merge AbsTrainerManager & SimpleTrainerManager

* AC test passed

* Lint

* Remove AbsTrainer.build() method. Put all initialization operations into __init__

* Redesign AC preprocess batches logic

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* MADDPG performance bug fix (#459)

* Fix MARL (MADDPG) terminal recording bug; some other minor refinements;

* Restore Trainer.build() method

* Calculate latest action in the get_actor_grad method in MADDPG.

* Share critic bug fix

* Rl v3 example update (#461)

* updated vm_scheduling example and cim notebook

* fixed bugs in vm_scheduling

* added local train method

* bug fix

* modified async client logic to fix hidden issue

* reverted to default config

* fixed PR comments and some bugs

* removed hardcode

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Done (#462)

* Rl v3 load save (#463)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL Toolkit data parallelism revamp & config utils (#464)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

* 1. fixed data parallelism issue; 2. added config validator; 3. refactored cli local

* 1. fixed rollout exit issue; 2. refined config

* removed config file from example

* fixed lint issues

* fixed lint issues

* added main.py under examples/rl

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL doc string (#465)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* Rl config doc (#467)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* resolved more PR comments

* typo fix

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL online doc (#469)

* Model, policy, trainer

* RL workflows and env sampler doc in RST (#468)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* rewriting rl toolkit rst

* resolved more PR comments

* typo fix

* updated rst

Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: Default <huo53926@126.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Finish docs/source/key_components/rl_toolkit.rst

* API doc

* RL online doc image fix (#470)

* resolved some PR comments

* fix

* fixed PR comments

* added numfig=True setting in conf.py for sphinx

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Resolve PR comments

* Add example github link

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Rl v3 pr comment resolution (#474)

* added load/save feature

* 1. resolved pr comments; 2. reverted maro/cli/k8s

* fixed some bugs

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* RL renaming v2 (#476)

* Change all Logger in RL to LoggerV2

* TrainerManager => TrainingManager

* Add Trainer suffix to all algorithms

* Finish docs

* Update interface names

* Minor fix

* Cherry pick latest RL (#498)

* Cherry pick

* Remove SC related files

* Cherry pick RL changes from `sc_refinement` (latest commit: `2a4869`) (#509)

* Cherry pick RL changes from sc_refinement (2a4869)

* Limit time display precision

* RL incremental refactor (#501)

* Refactor rollout logic. Allow multiple sampling in one epoch, so that we can generate more data for training.

AC & PPO for continuous action policy; refine AC & PPO logic.

Cherry pick RL changes from GYM-DDPG

Cherry pick RL changes from GYM-SAC

Minor error in doc string

* Add min_n_sample in template and parser

* Resolve PR comments. Fix a minor issue in SAC.

* RL component bundle (#513)

* CIM passed

* Update workers

* Refine annotations

* VM passed

* Code formatting.

* Minor import loop issue

* Pass batch in PPO again

* Remove Scenario

* Complete docs

* Minor

* Remove segment

* Optimize logic in RLComponentBundle

* Resolve PR comments

* Move 'post methods from RLComponenetBundle to EnvSampler

* Add method to get mapping of available tick to frame index (#415)

* add method to get mapping of available tick to frame index

* fix lint issue

* fix naming issue

* Cherry pick from sc_refinement (#527)

* Cherry pick from sc_refinement

* Cherry pick from sc_refinement

* Refine `terminal` / `next_agent_state` logic (#531)

* Optimize RL toolkit

* Fix bug in terminal/next_state generation

* Rewrite terminal/next_state logic again

* Minor renaming

* Minor bug fix

* Resolve PR comments

* Merge master into v0.3 (#536)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](#352) and [314](#314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](ipython/ipython@7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add & sort requirements.dev.txt

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove random_config.py

* Remove test_trajectory_utils.py

* Pass tests

* Update rl docs

* Remove python 3.6 in test

* Update docs

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wang.Jinyu <jinywan@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: GQ.Chen <675865907@qq.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Logger bug hotfix (#543)

* Rename param

* Rename param

* Quick fix in env_data_process

* frame data precision issue fix (#544)

* fix frame precision issue

* add .xmake to .gitignore

* update frame precision lost warning message

* add assert to frame precision checking

* typo fix

* add TODO for future Long data type issue fix

* Minor cleaning

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jinyu Wang <jinyu@RL4Inv.l1ea1prscrcu1p4sa0eapum5vc.bx.internal.cloudapp.net>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: GQ.Chen <675865907@qq.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Jinyu-W added a commit that referenced this pull request Dec 27, 2022
…ndle (#569)

* updated images and refined doc

* updated images

* updated CIM-AC example

* refined proxy retry logic

* call policy update only for AbsCorePolicy

* add limitation of AbsCorePolicy in Actor.collect()

* refined actor to return only experiences for policies that received new experiences

* fix MsgKey issue in rollout_manager

* fix typo in learner

* call exit function for parallel rollout manager

* update supply chain example distributed training scripts

* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prorotype

* reformat render

* fix supply chain business engine action type problem

* reset supply chain example render figsize from 4 to 3

* Add render to all modes of supply chain example

* fix or policy typos

* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes

* refined parallel policy manager

* updated rl/__init__/py

* fixed lint issues and CIM local learner bugs

* deleted unwanted supply_chain test files

* revised default config for cim-dqn

* removed test_store.py as it is no longer needed

* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms

* updated figures

* removed unwanted import

* refactored CIM-DQN example

* added MultiProcessRolloutManager and MultiProcessTrainingManager

* updated doc

* lint issue fix

* lint issue fix

* fixed import formatting

* [Feature] Prioritized Experience Replay (#355)

* added prioritized experience replay

* deleted unwanted supply_chain test files

* fixed import order

* import fix

* fixed lint issues

* fixed import formatting

* added note in docstring that rank-based PER has yet to be implemented

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* rm AbsDecisionGenerator

* small fixes

* bug fix

* reorganized training folder structure

* fixed lint issues

* fixed lint issues

* policy manager refined

* lint fix

* restructured CIM-dqn sync code

* added policy version index and used it as a measure of experience staleness

* lint issue fix

* lint issue fix

* switched log_dir and proxy_kwargs order

* cim example refinement

* eval schedule sorted only when it's a list

* eval schedule sorted only when it's a list

* update sc env wrapper

* added docker scripts for cim-dqn

* refactored example folder structure and added workflow templates

* fixed lint issues

* fixed lint issues

* fixed template bugs

* removed unused imports

* refactoring sc in progress

* simplified cim meta

* fixed build.sh path bug

* template refinement

* deleted obsolete svgs

* updated learner logs

* minor edits

* refactored templates for easy merge with async PR

* added component names for rollout manager and policy manager

* fixed incorrect position to add last episode to eval schedule

* added max_lag option in templates

* formatting edit in docker_compose_yml script

* moved local learner and early stopper outside sync_tools

* refactored rl toolkit folder structure

* refactored rl toolkit folder structure

* moved env_wrapper and agent_wrapper inside rl/learner

* refined scripts

* fixed typo in script

* changes needed for running sc

* removed unwanted imports

* config change for testing sc scenario

* changes for perf testing

* Asynchronous Training (#364)

* remote inference code draft

* changed actor to rollout_worker and updated init files

* removed unwanted import

* updated inits

* more async code

* added async scripts

* added async training code & scripts for CIM-dqn

* changed async to async_tools to avoid conflict with python keyword

* reverted unwanted change to dockerfile

* added doc for policy server

* addressed PR comments and fixed a bug in docker_compose_yml.py

* fixed lint issue

* resolved PR comment

* resolved merge conflicts

* added async templates

* added proxy.close() for actor and policy_server

* fixed incorrect position to add last episode to eval schedule

* reverted unwanted changes

* added missing async files

* rm unwanted echo in kill.sh

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword

* added missing policy version increment in LocalPolicyManager

* refined rollout manager recv logic

* removed a debugging print

* added sleep in distributed launcher to avoid hanging

* updated api doc and rl toolkit doc

* refined dynamic imports using importlib

* 1. moved policy update triggers to policy manager; 2. added version control in policy manager

* fixed a few bugs and updated cim RL example

* fixed a few more bugs

* added agent wrapper instantiation to workflows

* added agent wrapper instantiation to workflows

* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet

* fixed incorrect get_ac_policy signature for CIM

* moved exploration inside core policy

* added state to exploration call to support context-dependent exploration

* separated non_rl_policy_index and rl_policy_index in workflows

* modified sc example code according to workflow changes

* modified sc example code according to workflow changes

* added replay_agent_ids parameter to get_env_func for RL examples

* fixed a few bugs

* added maro/simulator/scenarios/supply_chain as bind mount

* added post-step, post-collect, post-eval and post-update callbacks

* fixed lint issues

* fixed lint issues

* moved instantiation of policy manager inside simple learner

* fixed env_wrapper get_reward signature

* minor edits

* removed get_eperience kwargs from env_wrapper

* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows

* added rollout exp disribution option in RL examples

* removed unwanted files

* 1. made logger internal in learner; 2 removed logger creation in abs classes

* checked out supply chain test files from v0.2_sc

* 1. added missing model.eval() to choose_action; 2.added entropy features to AC

* fixed a bug in ac entropy

* abbreviated coefficient to coeff

* removed -dqn from job name in rl example config

* added tmp patch to dev.df

* renamed image name for running rl examples

* added get_loss interface for core policies

* added policy manager in rl_toolkit.rst

* 1. env_wrapper bug fix; 2. policy manager update logic refinement

* refactored policy and algorithms

* policy interface redesigned

* refined policy interfaces

* fixed typo

* fixed bugs in refactored policy interface

* fixed some bugs

* refactoring in progress

* policy interface and policy manager redesigned

* 1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts

* fixed bug in distributed policy manager

* fixed lint issues

* fixed lint issues

* added scipy in setup

* 1. trimmed rollout manager code; 2. added option to docker scripts

* updated api doc for policy manager

* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script

* 1. simplified rl example structure; 2. fixed lint issues

* further rl toolkit code simplifications

* more numpy-based optimization in RL toolkit

* moved replay buffer inside policy

* bug fixes

* numpy optimization and associated refactoring

* extracted shaping logic out of env_sampler

* fixed bug in CIM shaping and lint issues

* preliminary implemetation of parallel batch inference

* fixed bug in ddpg transition recording

* put get_state, get_env_actions, get_reward back in EnvSampler

* simplified exploration and core model interfaces

* bug fixes and doc update

* added improve() interface for RLPolicy for single-thread support

* fixed simple policy manager bug

* updated doc, rst, notebook

* updated notebook

* fixed lint issues

* fixed entropy bugs in ac.py

* reverted to simple policy manager as default

* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit

* fixed lint issues and updated rl toolkit images

* removed obsolete images

* added back agent2policy for general workflow use

* V0.2 rl refinement dist (#377)

* Support `slice` operation in ExperienceSet

* Support naive distributed policy training by proxy

* Dynamically allocate trainers according to number of experience

* code check

* code check

* code check

* Fix a bug in distributed trianing with no gradient

* Code check

* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy

* 1.call allocate_trainer() at first of update(); 2.refine according to code review

* Code check

* Refine code with new interface

* Update docs of PolicyManger and ExperienceSet

* Add images for rl_toolkit docs

* Update diagram of PolicyManager

* Refine with new interface

* Extract allocation strategy into `allocation_strategy.py`

* add `distributed_learn()` in policies for data-parallel training

* Update doc of RL_toolkit

* Add gradient workers for data-parallel

* Refine code and update docs

* Lint check

* Refine by comments

* Rename `trainer` to `worker`

* Rename `distributed_learn` to `learn_with_data_parallel`

* Refine allocator and remove redundant code in policy_manager

* remove arugments in allocate_by_policy and so on

* added checkpointing for simple and multi-process policy managers

* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager

* added missing set_state and get_state for CIM policies

* removed blank line

* updated RL workflow README

* Integrate `data_parallel` arguments into `worker_allocator` (#402)

* 1. simplified workflow config; 2. added comments to CIM shaping

* lint issue fix

* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading

* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows

* refined READEME for CIM

* VM scheduling with RL (#375)

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* added DQN

* added get_experiences func for ac in vm scheduling

* added post_step callback to env wrapper

* moved Aiming's tracking and plotting logic into callbacks

* added eval env wrapper

* renamed AC config variable name for VM

* vm scheduling RL code finished

* updated README

* fixed various bugs and hard coding for vm_scheduling

* uncommented callbacks for VM scheduling

* Minor revision for better code style

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* vm example refactoring

* fixed bugs in vm_scheduling

* removed unwanted files from cim dir

* reverted to simple policy manager as default

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* resolved rebase conflicts

* fixed bugs in vm_scheduling

* added get_state and set_state to vm_scheduling policy models

* updated README for vm_scheduling with RL

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* SC refinement (#397)

* Refine test scripts & pending_order_daily logic

* Refactor code for better code style: complete type hint, correct typos, remove unused items.

Refactor code for better code style: complete type hint, correct typos, remove unused items.

* Polish test_supply_chain.py

* update import format

* Modify vehicle steps logic & remove outdated test case

* Optimize imports

* Optimize imports

* Lint error

* Lint error

* Lint error

* Add SupplyChainAction

* Lint error

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* refined workflow scripts

* fixed bug in ParallelAgentWrapper

* 1. fixed lint issues; 2. refined main script in workflows

* lint issue fix

* restored default config for rl example

* Update rollout.py

* refined env var processing in policy manager workflow

* added hasattr check in agent wrapper

* updated docker_compose_yml.py

* Minor refinement

* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)

* Prepare to merge master_mirror

* Lint error

* Minor

* Merge latest master into v0.3 (#426)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* Minor

* Remove docs/source/examples/multi_agent_dqn_cim.rst

* Update .gitignore

* Update .gitignore

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>

* Change `Env.set_seed()` logic (#456)

* Change Env.set_seed() logic

* Redesign CIM reset logic; fix lint issues;

* Lint

* Seed type assertion

* Remove all SC related files (#473)

* RL Toolkit V3 (#471)

* added daemon=True for multi-process rollout, policy manager and inference

* removed obsolete files

* [REDO][PR#406]V0.2 rl refinement taskq (#408)

* Add a usable task_queue

* Rename some variables

* 1. Add ; 2. Integrate  related files; 3. Remove

* merge `data_parallel` and `num_grad_workers` into `data_parallelism`

* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.

* Move `grad_worker` into marl/rl/workflows

* 1.Merge data_parallel and num_workers into data_parallelism in config; 2.Assign recently used workers as possible in task_queue.

* Refine code and update docs of `TaskQueue`

* Support priority for tasks in `task_queue`

* Update diagram of policy manager and task queue.

* Add configurable `single_task_limit` and correct docstring about `data_parallelism`

* Fix lint errors in `supply chain`

* RL policy redesign (V2) (#405)

* Drafi v2.0 for V2

* Polish models with more comments

* Polish policies with more comments

* Lint

* Lint

* Add developer doc for models.

* Add developer doc for policies.

* Remove policy manager V2 since it is not used and out-of-date

* Lint

* Lint

* refined messy workflow code

* merged 'scenario_dir' and 'scenario' in rl config

* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods

* 1. temporarily renamed RLPolicy from polivy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2

* merged cim and cim_v2

* lint issue fix

* refined logging logic

* lint issue fix

* reversed unwanted changes

* .

.

.

.

ReplayMemory & IndexScheduler

ReplayMemory & IndexScheduler

.

MultiReplayMemory

get_actions_with_logps

EnvSampler on the road

EnvSampler

Minor

* LearnerManager

* Use batch to transfer data & add SHAPE_CHECK_FLAG

* Rename learner to trainer

* Add property for policy._is_exploring

* CIM test scenario for V3. Manual test passed. Next step: run it, make it works.

* env_sampler.py could run

* env_sampler refine on the way

* First runnable version done

* AC could run, but the result is bad. Need to check the logic

* Refine abstract method & shape check error info.

* Docs

* Very detailed compare. Try again.

* AC done

* DQN check done

* Minor

* DDPG, not tested

* Minors

* A rough draft of MAAC

* Cannot use CIM as the multi-agent scenario.

* Minor

* MAAC refinement on the way

* Remove ActionWithAux

* Refine batch & memory

* MAAC example works

* Reproduce-able fix. Policy share between env_sampler and trainer_manager.

* Detail refinement

* Simplify the user configed workflow

* Minor

* Refine example codes

* Minor polishment

* Migrate rollout_manager to V3

* Error on the way

* Redesign torch.device management

* Rl v3 maddpg (#418)

* Add MADDPG trainer

* Fit independent critics and shared critic modes.

* Add a new property: num_policies

* Lint

* Fix a bug in `sum(rewards)`

* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.

* Rename maddpg in examples.

* Preparation for data parallel (#420)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* Rename train worker to train ops; add placeholder for abstract methods;

* Lint

Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>

* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* dsitributed training pipeline draft

* added temporary test files for review purposes

* Several code style refinements (#451)

* Polish rl_v3/utils/

* Polish rl_v3/distributed/

* Polish rl_v3/policy_trainer/abs_trainer.py

* fixed merge conflicts

* unified sync and async interfaces

* refactored rl_v3; refinement in progress

* Finish the runnable pipeline under new design

* Remove outdated files; refine class names; optimize imports;

* Lint

* Minor maddpg related refinement

* Lint

Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Miner bug fix

* Coroutine-related bug fix ("get_policy_state") (#452)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* deleted unwanted folder

* removed unwanted changes

* resolved PR452 comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Quick fix

* Redesign experience recording logic (#453)

* Two not important fix

* Temp draft. Prepare to WFH

* Done

* Lint

* Lint

* Calculating advantages / returns (#454)

* V1.0

* Complete DDPG

* Rl v3 hanging issue fix (#455)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* Final test & format. Ready to merge.

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* Rl v3 parallel rollout (#457)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* load balancing dispatcher

* added parallel rollout

* lint

* Tracker variable type issue; rename to env_sampler_creator;

* Rl v3 parallel rollout follow ups (#458)

* AbsWorker & AbsDispatcher

* Pass env idx to AbsTrainer.record() method, and let the trainer to decide how to record experiences sampled from different worlds.

* Fix policy_creator reuse bug

* Format code

* Merge AbsTrainerManager & SimpleTrainerManager

* AC test passed

* Lint

* Remove AbsTrainer.build() method. Put all initialization operations into __init__

* Redesign AC preprocess batches logic

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* MADDPG performance bug fix (#459)

* Fix MARL (MADDPG) terminal recording bug; some other minor refinements;

* Restore Trainer.build() method

* Calculate latest action in the get_actor_grad method in MADDPG.

* Share critic bug fix

* Rl v3 example update (#461)

* updated vm_scheduling example and cim notebook

* fixed bugs in vm_scheduling

* added local train method

* bug fix

* modified async client logic to fix hidden issue

* reverted to default config

* fixed PR comments and some bugs

* removed hardcode

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Done (#462)

* Rl v3 load save (#463)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL Toolkit data parallelism revamp & config utils (#464)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

* 1. fixed data parallelism issue; 2. added config validator; 3. refactored cli local

* 1. fixed rollout exit issue; 2. refined config

* removed config file from example

* fixed lint issues

* fixed lint issues

* added main.py under examples/rl

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL doc string (#465)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* Rl config doc (#467)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* resolved more PR comments

* typo fix

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL online doc (#469)

* Model, policy, trainer

* RL workflows and env sampler doc in RST (#468)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* rewriting rl toolkit rst

* resolved more PR comments

* typo fix

* updated rst

Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: Default <huo53926@126.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Finish docs/source/key_components/rl_toolkit.rst

* API doc

* RL online doc image fix (#470)

* resolved some PR comments

* fix

* fixed PR comments

* added numfig=True setting in conf.py for sphinx

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Resolve PR comments

* Add example github link

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Rl v3 pr comment resolution (#474)

* added load/save feature

* 1. resolved pr comments; 2. reverted maro/cli/k8s

* fixed some bugs

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* RL renaming v2 (#476)

* Change all Logger in RL to LoggerV2

* TrainerManager => TrainingManager

* Add Trainer suffix to all algorithms

* Finish docs

* Update interface names

* Minor fix

* Cherry pick latest RL (#498)

* Cherry pick

* Remove SC related files

* Cherry pick RL changes from `sc_refinement` (latest commit: `2a4869`) (#509)

* Cherry pick RL changes from sc_refinement (2a4869)

* Limit time display precision

* RL incremental refactor (#501)

* Refactor rollout logic. Allow multiple sampling in one epoch, so that we can generate more data for training.

AC & PPO for continuous action policy; refine AC & PPO logic.

Cherry pick RL changes from GYM-DDPG

Cherry pick RL changes from GYM-SAC

Minor error in doc string

* Add min_n_sample in template and parser

* Resolve PR comments. Fix a minor issue in SAC.

* RL component bundle (#513)

* CIM passed

* Update workers

* Refine annotations

* VM passed

* Code formatting.

* Minor import loop issue

* Pass batch in PPO again

* Remove Scenario

* Complete docs

* Minor

* Remove segment

* Optimize logic in RLComponentBundle

* Resolve PR comments

* Move 'post methods from RLComponenetBundle to EnvSampler

* Add method to get mapping of available tick to frame index (#415)

* add method to get mapping of available tick to frame index

* fix lint issue

* fix naming issue

* Cherry pick from sc_refinement (#527)

* Cherry pick from sc_refinement

* Cherry pick from sc_refinement

* Refine `terminal` / `next_agent_state` logic (#531)

* Optimize RL toolkit

* Fix bug in terminal/next_state generation

* Rewrite terminal/next_state logic again

* Minor renaming

* Minor bug fix

* Resolve PR comments

* Merge master into v0.3 (#536)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add & sort requirements.dev.txt

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Merge master into v0.3 (#545)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update github woorkflow config

* MARO v0.3: a new design of RL Toolkit, CLI refactorization, and corresponding updates. (#539)

* refined proxy coding style

* updated images and refined doc

* updated images

* updated CIM-AC example

* refined proxy retry logic

* call policy update only for AbsCorePolicy

* add limitation of AbsCorePolicy in Actor.collect()

* refined actor to return only experiences for policies that received new experiences

* fix MsgKey issue in rollout_manager

* fix typo in learner

* call exit function for parallel rollout manager

* update supply chain example distributed training scripts

* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prorotype

* reformat render

* fix supply chain business engine action type problem

* reset supply chain example render figsize from 4 to 3

* Add render to all modes of supply chain example

* fix or policy typos

* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes

* refined parallel policy manager

* updated rl/__init__/py

* fixed lint issues and CIM local learner bugs

* deleted unwanted supply_chain test files

* revised default config for cim-dqn

* removed test_store.py as it is no longer needed

* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms

* updated figures

* removed unwanted import

* refactored CIM-DQN example

* added MultiProcessRolloutManager and MultiProcessTrainingManager

* updated doc

* lint issue fix

* lint issue fix

* fixed import formatting

* [Feature] Prioritized Experience Replay (#355)

* added prioritized experience replay

* deleted unwanted supply_chain test files

* fixed import order

* import fix

* fixed lint issues

* fixed import formatting

* added note in docstring that rank-based PER has yet to be implemented

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* rm AbsDecisionGenerator

* small fixes

* bug fix

* reorganized training folder structure

* fixed lint issues

* fixed lint issues

* policy manager refined

* lint fix

* restructured CIM-dqn sync code

* added policy version index and used it as a measure of experience staleness

* lint issue fix

* lint issue fix

* switched log_dir and proxy_kwargs order

* cim example refinement

* eval schedule sorted only when it's a list

* eval schedule sorted only when it's a list

* update sc env wrapper

* added docker scripts for cim-dqn

* refactored example folder structure and added workflow templates

* fixed lint issues

* fixed lint issues

* fixed template bugs

* removed unused imports

* refactoring sc in progress

* simplified cim meta

* fixed build.sh path bug

* template refinement

* deleted obsolete svgs

* updated learner logs

* minor edits

* refactored templates for easy merge with async PR

* added component names for rollout manager and policy manager

* fixed incorrect position to add last episode to eval schedule

* added max_lag option in templates

* formatting edit in docker_compose_yml script

* moved local learner and early stopper outside sync_tools

* refactored rl toolkit folder structure

* refactored rl toolkit folder structure

* moved env_wrapper and agent_wrapper inside rl/learner

* refined scripts

* fixed typo in script

* changes needed for running sc

* removed unwanted imports

* config change for testing sc scenario

* changes for perf testing

* Asynchronous Training (#364)

* remote inference code draft

* changed actor to rollout_worker and updated init files

* removed unwanted import

* updated inits

* more async code

* added async scripts

* added async training code & scripts for CIM-dqn

* changed async to async_tools to avoid conflict with python keyword

* reverted unwanted change to dockerfile

* added doc for policy server

* addressed PR comments and fixed a bug in docker_compose_yml.py

* fixed lint issue

* resolved PR comment

* resolved merge conflicts

* added async templates

* added proxy.close() for actor and policy_server

* fixed incorrect position to add last episode to eval schedule

* reverted unwanted changes

* added missing async files

* rm unwanted echo in kill.sh

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword

* added missing policy version increment in LocalPolicyManager

* refined rollout manager recv logic

* removed a debugging print

* added sleep in distributed launcher to avoid hanging

* updated api doc and rl toolkit doc

* refined dynamic imports using importlib

* 1. moved policy update triggers to policy manager; 2. added version control in policy manager

* fixed a few bugs and updated cim RL example

* fixed a few more bugs

* added agent wrapper instantiation to workflows

* added agent wrapper instantiation to workflows

* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet

* fixed incorrect get_ac_policy signature for CIM

* moved exploration inside core policy

* added state to exploration call to support context-dependent exploration

* separated non_rl_policy_index and rl_policy_index in workflows

* modified sc example code according to workflow changes

* modified sc example code according to workflow changes

* added replay_agent_ids parameter to get_env_func for RL examples

* fixed a few bugs

* added maro/simulator/scenarios/supply_chain as bind mount

* added post-step, post-collect, post-eval and post-update callbacks

* fixed lint issues

* fixed lint issues

* moved instantiation of policy manager inside simple learner

* fixed env_wrapper get_reward signature

* minor edits

* removed get_eperience kwargs from env_wrapper

* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows

* added rollout exp disribution option in RL examples

* removed unwanted files

* 1. made logger internal in learner; 2 removed logger creation in abs classes

* checked out supply chain test files from v0.2_sc

* 1. added missing model.eval() to choose_action; 2.added entropy features to AC

* fixed a bug in ac entropy

* abbreviated coefficient to coeff

* removed -dqn from job name in rl example config

* added tmp patch to dev.df

* renamed image name for running rl examples

* added get_loss interface for core policies

* added policy manager in rl_toolkit.rst

* 1. env_wrapper bug fix; 2. policy manager update logic refinement

* refactored policy and algorithms

* policy interface redesigned

* refined policy interfaces

* fixed typo

* fixed bugs in refactored policy interface

* fixed some bugs

* refactoring in progress

* policy interface and policy manager redesigned

* 1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts

* fixed bug in distributed policy manager

* fixed lint issues

* fixed lint issues

* added scipy in setup

* 1. trimmed rollout manager code; 2. added option to docker scripts

* updated api doc for policy manager

* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script

* 1. simplified rl example structure; 2. fixed lint issues

* further rl toolkit code simplifications

* more numpy-based optimization in RL toolkit

* moved replay buffer inside policy

* bug fixes

* numpy optimization and associated refactoring

* extracted shaping logic out of env_sampler

* fixed bug in CIM shaping and lint issues

* preliminary implemetation of parallel batch inference

* fixed bug in ddpg transition recording

* put get_state, get_env_actions, get_reward back in EnvSampler

* simplified exploration and core model interfaces

* bug fixes and doc update

* added improve() interface for RLPolicy for single-thread support

* fixed simple policy manager bug

* updated doc, rst, notebook

* updated notebook

* fixed lint issues

* fixed entropy bugs in ac.py

* reverted to simple policy manager as default

* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit

* fixed lint issues and updated rl toolkit images

* removed obsolete images

* added back agent2policy for general workflow use

* V0.2 rl refinement dist (#377)

* Support `slice` operation in ExperienceSet

* Support naive distributed policy training by proxy

* Dynamically allocate trainers according to number of experience

* code check

* code check

* code check

* Fix a bug in distributed trianing with no gradient

* Code check

* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy

* 1.call allocate_trainer() at first of update(); 2.refine according to code review

* Code check

* Refine code with new interface

* Update docs of PolicyManger and ExperienceSet

* Add images for rl_toolkit docs

* Update diagram of PolicyManager

* Refine with new interface

* Extract allocation strategy into `allocation_strategy.py`

* add `distributed_learn()` in policies for data-parallel training

* Update doc of RL_toolkit

* Add gradient workers for data-parallel

* Refine code and update docs

* Lint check

* Refine by comments

* Rename `trainer` to `worker`

* Rename `distributed_learn` to `learn_with_data_parallel`

* Refine allocator and remove redundant code in policy_manager

* remove arugments in allocate_by_policy and so on

* added checkpointing for simple and multi-process policy managers

* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager

* added missing set_state and get_state for CIM policies

* removed blank line

* updated RL workflow README

* Integrate `data_parallel` arguments into `worker_allocator` (#402)

* 1. simplified workflow config; 2. added comments to CIM shaping

* lint issue fix

* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading

* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows

* refined READEME for CIM

* VM scheduling with RL (#375)

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* added DQN

* added get_experiences func for ac in vm scheduling

* added post_step callback to env wrapper

* moved Aiming's tracking and plotting logic into callbacks

* added eval env wrapper

* renamed AC config variable name for VM

* vm scheduling RL code finished

* updated README

* fixed various bugs and hard coding for vm_scheduling

* uncommented callbacks for VM scheduling

* Minor revision for better code style

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* vm example refactoring

* fixed bugs in vm_scheduling

* removed unwanted files from cim dir

* reverted to simple policy manager as default

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* resolved rebase conflicts

* fixed bugs in vm_scheduling

* added get_state and set_state to vm_scheduling policy models

* updated README for vm_scheduling with RL

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* SC refinement (#397)

* Refine test scripts & pending_order_daily logic

* Refactor code for better code style: complete type hint, correct typos, remove unused items.

Refactor code for better code style: complete type hint, correct typos, remove unused items.

* Polish test_supply_chain.py

* update import format

* Modify vehicle steps logic & remove outdated test case

* Optimize imports

* Optimize imports

* Lint error

* Lint error

* Lint error

* Add SupplyChainAction

* Lint error

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* refined workflow scripts

* fixed bug in ParallelAgentWrapper

* 1. fixed lint issues; 2. refined main script in workflows

* lint issue fix

* restored default config for rl example

* Update rollout.py

* refined env var processing in policy manager workflow

* added hasattr check in agent wrapper

* updated docker_compose_yml.py

* Minor refinement

* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)

* Prepare to merge master_mirror

* Lint error

* Minor

* Merge latest master into v0.3 (#426)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* Minor

* Remove docs/source/examples/multi_agent_dqn_cim.rst

* Update .gitignore

* Update .gitignore

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>

* Change `Env.set_seed()` logic (#456)

* Change Env.set_seed() logic

* Redesign CIM reset logic; fix lint issues;

* Lint

* Seed type assertion

* Remove all SC related files (#473)

* RL Toolkit V3 (#471)

* added daemon=True for multi-process rollout, policy manager and inference

* removed obsolete files

* [REDO][PR#406]V0.2 rl refinement taskq (#408)

* Add a usable task_queue

* Rename some variables

* 1. Add ; 2. Integrate  related files; 3. Remove

* merge `data_parallel` and `num_grad_workers` into `data_parallelism`

* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.

* Move `grad_worker` into marl/rl/workflows

* 1.Merge data_parallel and num_workers into data_parallelism in config; 2.Assign recently used workers as possible in task_queue.

* Refine code and update docs of `TaskQueue`

* Support priority for tasks in `task_queue`

* Update diagram of policy manager and task queue.

* Add configurable `single_task_limit` and correct docstring about `data_parallelism`

* Fix lint errors in `supply chain`

* RL policy redesign (V2) (#405)

* Drafi v2.0 for V2

* Polish models with more comments

* Polish policies with more comments

* Lint

* Lint

* Add developer doc for models.

* Add developer doc for policies.

* Remove policy manager V2 since it is not used and out-of-date

* Lint

* Lint

* refined messy workflow code

* merged 'scenario_dir' and 'scenario' in rl config

* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods

* 1. temporarily renamed RLPolicy from polivy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2

* merged cim and cim_v2

* lint issue fix

* refined logging logic

* lint issue fix

* reversed unwanted changes

* .

.

.

.

ReplayMemory & IndexScheduler

ReplayMemory & IndexScheduler

.

MultiReplayMemory

get_actions_with_logps

EnvSampler on the road

EnvSampler

Minor

* LearnerManager

* Use batch to transfer data & add SHAPE_CHECK_FLAG

* Rename learner to trainer

* Add property for policy._is_exploring

* CIM test scenario for V3. Manual test passed. Next step: run it, make it works.

* env_sampler.py could run

* env_sampler refine on the way

* First runnable version done

* AC could run, but the result is bad. Need to check the logic

* Refine abstract method & shape check error info.

* Docs

* Very detailed compare. Try again.

* AC done

* DQN check done

* Minor

* DDPG, not tested

* Minors

* A rough draft of MAAC

* Cannot use CIM as the multi-agent scenario.

* Minor

* MAAC refinement on the way

* Remove ActionWithAux

* Refine batch & memory

* MAAC example works

* Reproduce-able fix. Policy share between env_sampler and trainer_manager.

* Detail refinement

* Simplify the user configed workflow

* Minor

* Refine example codes

* Minor polishment

* Migrate rollout_manager to V3

* Error on the way

* Redesign torch.device management

* Rl v3 maddpg (#418)

* Add MADDPG trainer

* Fit independent critics and shared critic modes.

* Add a new property: num_policies

* Lint

* Fix a bug in `sum(rewards)`

* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.

* Rename maddpg in examples.

* Preparation for data parallel (#420)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* Rename train worker to train ops; add placeholder for abstract methods;

* Lint

Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>

* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* dsitributed training pipeline draft

* added temporary test files for review purposes

* Several code style refinements (#451)

* Polish rl_v3/utils/

* Polish rl_v3/distributed/

* Polish rl_v3/policy_trainer/abs_trainer.py

* fixed merge conflicts

* unified sync and async interfaces

* refactored rl_v3; refinement in progress

* Finish the runnable pipeline under new design

* Remove outdated files; refine class names; optimize imports;

* Lint

* Minor maddpg related refinement

* Lint

Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Miner bug fix

* Coroutine-related bug fix ("get_policy_state") (#452)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* deleted unwanted folder

* removed unwanted changes

* resolved PR452 comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Quick fix

* Redesign experience recording logic (#453)

* Two not important fix

* Temp draft. Prepare to WFH

* Done

* Lint

* Lint

* Calculating advantages / returns (#454)

* V1.0

* Complete DDPG

* Rl v3 hanging issue fix (#455)

* fixed rebase conflicts

* ren…
Jinyu-W added a commit that referenced this pull request Jan 16, 2023
* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* RL component bundle (#513)

* CIM passed

* Update workers

* Refine annotations

* VM passed

* Code formatting.

* Minor import loop issue

* Pass batch in PPO again

* Remove Scenario

* Complete docs

* Minor

* Remove segment

* Optimize logic in RLComponentBundle

* Resolve PR comments

* Move 'post methods from RLComponenetBundle to EnvSampler

* Add method to get mapping of available tick to frame index (#415)

* add method to get mapping of available tick to frame index

* fix lint issue

* fix naming issue

* Cherry pick from sc_refinement (#527)

* Cherry pick from sc_refinement

* Cherry pick from sc_refinement

* Refine `terminal` / `next_agent_state` logic (#531)

* Optimize RL toolkit

* Fix bug in terminal/next_state generation

* Rewrite terminal/next_state logic again

* Minor renaming

* Minor bug fix

* Resolve PR comments

* Merge master into v0.3 (#536)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add & sort requirements.dev.txt

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update github woorkflow config

* MARO v0.3: a new design of RL Toolkit, CLI refactorization, and corresponding updates. (#539)

* refined proxy coding style

* updated images and refined doc

* updated images

* updated CIM-AC example

* refined proxy retry logic

* call policy update only for AbsCorePolicy

* add limitation of AbsCorePolicy in Actor.collect()

* refined actor to return only experiences for policies that received new experiences

* fix MsgKey issue in rollout_manager

* fix typo in learner

* call exit function for parallel rollout manager

* update supply chain example distributed training scripts

* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prorotype

* reformat render

* fix supply chain business engine action type problem

* reset supply chain example render figsize from 4 to 3

* Add render to all modes of supply chain example

* fix or policy typos

* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes

* refined parallel policy manager

* updated rl/__init__/py

* fixed lint issues and CIM local learner bugs

* deleted unwanted supply_chain test files

* revised default config for cim-dqn

* removed test_store.py as it is no longer needed

* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms

* updated figures

* removed unwanted import

* refactored CIM-DQN example

* added MultiProcessRolloutManager and MultiProcessTrainingManager

* updated doc

* lint issue fix

* lint issue fix

* fixed import formatting

* [Feature] Prioritized Experience Replay (#355)

* added prioritized experience replay

* deleted unwanted supply_chain test files

* fixed import order

* import fix

* fixed lint issues

* fixed import formatting

* added note in docstring that rank-based PER has yet to be implemented

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* rm AbsDecisionGenerator

* small fixes

* bug fix

* reorganized training folder structure

* fixed lint issues

* fixed lint issues

* policy manager refined

* lint fix

* restructured CIM-dqn sync code

* added policy version index and used it as a measure of experience staleness

* lint issue fix

* lint issue fix

* switched log_dir and proxy_kwargs order

* cim example refinement

* eval schedule sorted only when it's a list

* eval schedule sorted only when it's a list

* update sc env wrapper

* added docker scripts for cim-dqn

* refactored example folder structure and added workflow templates

* fixed lint issues

* fixed lint issues

* fixed template bugs

* removed unused imports

* refactoring sc in progress

* simplified cim meta

* fixed build.sh path bug

* template refinement

* deleted obsolete svgs

* updated learner logs

* minor edits

* refactored templates for easy merge with async PR

* added component names for rollout manager and policy manager

* fixed incorrect position to add last episode to eval schedule

* added max_lag option in templates

* formatting edit in docker_compose_yml script

* moved local learner and early stopper outside sync_tools

* refactored rl toolkit folder structure

* refactored rl toolkit folder structure

* moved env_wrapper and agent_wrapper inside rl/learner

* refined scripts

* fixed typo in script

* changes needed for running sc

* removed unwanted imports

* config change for testing sc scenario

* changes for perf testing

* Asynchronous Training (#364)

* remote inference code draft

* changed actor to rollout_worker and updated init files

* removed unwanted import

* updated inits

* more async code

* added async scripts

* added async training code & scripts for CIM-dqn

* changed async to async_tools to avoid conflict with python keyword

* reverted unwanted change to dockerfile

* added doc for policy server

* addressed PR comments and fixed a bug in docker_compose_yml.py

* fixed lint issue

* resolved PR comment

* resolved merge conflicts

* added async templates

* added proxy.close() for actor and policy_server

* fixed incorrect position to add last episode to eval schedule

* reverted unwanted changes

* added missing async files

* rm unwanted echo in kill.sh

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword

* added missing policy version increment in LocalPolicyManager

* refined rollout manager recv logic

* removed a debugging print

* added sleep in distributed launcher to avoid hanging

* updated api doc and rl toolkit doc

* refined dynamic imports using importlib

* 1. moved policy update triggers to policy manager; 2. added version control in policy manager

* fixed a few bugs and updated cim RL example

* fixed a few more bugs

* added agent wrapper instantiation to workflows

* added agent wrapper instantiation to workflows

* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet

* fixed incorrect get_ac_policy signature for CIM

* moved exploration inside core policy

* added state to exploration call to support context-dependent exploration

* separated non_rl_policy_index and rl_policy_index in workflows

* modified sc example code according to workflow changes

* modified sc example code according to workflow changes

* added replay_agent_ids parameter to get_env_func for RL examples

* fixed a few bugs

* added maro/simulator/scenarios/supply_chain as bind mount

* added post-step, post-collect, post-eval and post-update callbacks

* fixed lint issues

* fixed lint issues

* moved instantiation of policy manager inside simple learner

* fixed env_wrapper get_reward signature

* minor edits

* removed get_eperience kwargs from env_wrapper

* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows

* added rollout exp disribution option in RL examples

* removed unwanted files

* 1. made logger internal in learner; 2 removed logger creation in abs classes

* checked out supply chain test files from v0.2_sc

* 1. added missing model.eval() to choose_action; 2.added entropy features to AC

* fixed a bug in ac entropy

* abbreviated coefficient to coeff

* removed -dqn from job name in rl example config

* added tmp patch to dev.df

* renamed image name for running rl examples

* added get_loss interface for core policies

* added policy manager in rl_toolkit.rst

* 1. env_wrapper bug fix; 2. policy manager update logic refinement

* refactored policy and algorithms

* policy interface redesigned

* refined policy interfaces

* fixed typo

* fixed bugs in refactored policy interface

* fixed some bugs

* refactoring in progress

* policy interface and policy manager redesigned

* 1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts

* fixed bug in distributed policy manager

* fixed lint issues

* fixed lint issues

* added scipy in setup

* 1. trimmed rollout manager code; 2. added option to docker scripts

* updated api doc for policy manager

* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script

* 1. simplified rl example structure; 2. fixed lint issues

* further rl toolkit code simplifications

* more numpy-based optimization in RL toolkit

* moved replay buffer inside policy

* bug fixes

* numpy optimization and associated refactoring

* extracted shaping logic out of env_sampler

* fixed bug in CIM shaping and lint issues

* preliminary implemetation of parallel batch inference

* fixed bug in ddpg transition recording

* put get_state, get_env_actions, get_reward back in EnvSampler

* simplified exploration and core model interfaces

* bug fixes and doc update

* added improve() interface for RLPolicy for single-thread support

* fixed simple policy manager bug

* updated doc, rst, notebook

* updated notebook

* fixed lint issues

* fixed entropy bugs in ac.py

* reverted to simple policy manager as default

* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit

* fixed lint issues and updated rl toolkit images

* removed obsolete images

* added back agent2policy for general workflow use

* V0.2 rl refinement dist (#377)

* Support `slice` operation in ExperienceSet

* Support naive distributed policy training by proxy

* Dynamically allocate trainers according to number of experience

* code check

* code check

* code check

* Fix a bug in distributed trianing with no gradient

* Code check

* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy

* 1.call allocate_trainer() at first of update(); 2.refine according to code review

* Code check

* Refine code with new interface

* Update docs of PolicyManger and ExperienceSet

* Add images for rl_toolkit docs

* Update diagram of PolicyManager

* Refine with new interface

* Extract allocation strategy into `allocation_strategy.py`

* add `distributed_learn()` in policies for data-parallel training

* Update doc of RL_toolkit

* Add gradient workers for data-parallel

* Refine code and update docs

* Lint check

* Refine by comments

* Rename `trainer` to `worker`

* Rename `distributed_learn` to `learn_with_data_parallel`

* Refine allocator and remove redundant code in policy_manager

* remove arugments in allocate_by_policy and so on

* added checkpointing for simple and multi-process policy managers

* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager

* added missing set_state and get_state for CIM policies

* removed blank line

* updated RL workflow README

* Integrate `data_parallel` arguments into `worker_allocator` (#402)

* 1. simplified workflow config; 2. added comments to CIM shaping

* lint issue fix

* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading

* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows

* refined READEME for CIM

* VM scheduling with RL (#375)

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* added DQN

* added get_experiences func for ac in vm scheduling

* added post_step callback to env wrapper

* moved Aiming's tracking and plotting logic into callbacks

* added eval env wrapper

* renamed AC config variable name for VM

* vm scheduling RL code finished

* updated README

* fixed various bugs and hard coding for vm_scheduling

* uncommented callbacks for VM scheduling

* Minor revision for better code style

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* vm example refactoring

* fixed bugs in vm_scheduling

* removed unwanted files from cim dir

* reverted to simple policy manager as default

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* resolved rebase conflicts

* fixed bugs in vm_scheduling

* added get_state and set_state to vm_scheduling policy models

* updated README for vm_scheduling with RL

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* SC refinement (#397)

* Refine test scripts & pending_order_daily logic

* Refactor code for better code style: complete type hint, correct typos, remove unused items.

Refactor code for better code style: complete type hint, correct typos, remove unused items.

* Polish test_supply_chain.py

* update import format

* Modify vehicle steps logic & remove outdated test case

* Optimize imports

* Optimize imports

* Lint error

* Lint error

* Lint error

* Add SupplyChainAction

* Lint error

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* refined workflow scripts

* fixed bug in ParallelAgentWrapper

* 1. fixed lint issues; 2. refined main script in workflows

* lint issue fix

* restored default config for rl example

* Update rollout.py

* refined env var processing in policy manager workflow

* added hasattr check in agent wrapper

* updated docker_compose_yml.py

* Minor refinement

* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)

* Prepare to merge master_mirror

* Lint error

* Minor

* Merge latest master into v0.3 (#426)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* Minor

* Remove docs/source/examples/multi_agent_dqn_cim.rst

* Update .gitignore

* Update .gitignore

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>

* Change `Env.set_seed()` logic (#456)

* Change Env.set_seed() logic

* Redesign CIM reset logic; fix lint issues;

* Lint

* Seed type assertion

* Remove all SC related files (#473)

* RL Toolkit V3 (#471)

* added daemon=True for multi-process rollout, policy manager and inference

* removed obsolete files

* [REDO][PR#406]V0.2 rl refinement taskq (#408)

* Add a usable task_queue

* Rename some variables

* 1. Add ; 2. Integrate  related files; 3. Remove

* merge `data_parallel` and `num_grad_workers` into `data_parallelism`

* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.

* Move `grad_worker` into marl/rl/workflows

* 1.Merge data_parallel and num_workers into data_parallelism in config; 2.Assign recently used workers as possible in task_queue.

* Refine code and update docs of `TaskQueue`

* Support priority for tasks in `task_queue`

* Update diagram of policy manager and task queue.

* Add configurable `single_task_limit` and correct docstring about `data_parallelism`

* Fix lint errors in `supply chain`

* RL policy redesign (V2) (#405)

* Drafi v2.0 for V2

* Polish models with more comments

* Polish policies with more comments

* Lint

* Lint

* Add developer doc for models.

* Add developer doc for policies.

* Remove policy manager V2 since it is not used and out-of-date

* Lint

* Lint

* refined messy workflow code

* merged 'scenario_dir' and 'scenario' in rl config

* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods

* 1. temporarily renamed RLPolicy from polivy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2

* merged cim and cim_v2

* lint issue fix

* refined logging logic

* lint issue fix

* reversed unwanted changes

* .

.

.

.

ReplayMemory & IndexScheduler

ReplayMemory & IndexScheduler

.

MultiReplayMemory

get_actions_with_logps

EnvSampler on the road

EnvSampler

Minor

* LearnerManager

* Use batch to transfer data & add SHAPE_CHECK_FLAG

* Rename learner to trainer

* Add property for policy._is_exploring

* CIM test scenario for V3. Manual test passed. Next step: run it, make it works.

* env_sampler.py could run

* env_sampler refine on the way

* First runnable version done

* AC could run, but the result is bad. Need to check the logic

* Refine abstract method & shape check error info.

* Docs

* Very detailed compare. Try again.

* AC done

* DQN check done

* Minor

* DDPG, not tested

* Minors

* A rough draft of MAAC

* Cannot use CIM as the multi-agent scenario.

* Minor

* MAAC refinement on the way

* Remove ActionWithAux

* Refine batch & memory

* MAAC example works

* Reproduce-able fix. Policy share between env_sampler and trainer_manager.

* Detail refinement

* Simplify the user configed workflow

* Minor

* Refine example codes

* Minor polishment

* Migrate rollout_manager to V3

* Error on the way

* Redesign torch.device management

* Rl v3 maddpg (#418)

* Add MADDPG trainer

* Fit independent critics and shared critic modes.

* Add a new property: num_policies

* Lint

* Fix a bug in `sum(rewards)`

* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.

* Rename maddpg in examples.

* Preparation for data parallel (#420)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* Rename train worker to train ops; add placeholder for abstract methods;

* Lint

Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>

* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* dsitributed training pipeline draft

* added temporary test files for review purposes

* Several code style refinements (#451)

* Polish rl_v3/utils/

* Polish rl_v3/distributed/

* Polish rl_v3/policy_trainer/abs_trainer.py

* fixed merge conflicts

* unified sync and async interfaces

* refactored rl_v3; refinement in progress

* Finish the runnable pipeline under new design

* Remove outdated files; refine class names; optimize imports;

* Lint

* Minor maddpg related refinement

* Lint

Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Miner bug fix

* Coroutine-related bug fix ("get_policy_state") (#452)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* deleted unwanted folder

* removed unwanted changes

* resolved PR452 comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Quick fix

* Redesign experience recording logic (#453)

* Two not important fix

* Temp draft. Prepare to WFH

* Done

* Lint

* Lint

* Calculating advantages / returns (#454)

* V1.0

* Complete DDPG

* Rl v3 hanging issue fix (#455)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* Final test & format. Ready to merge.

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* Rl v3 parallel rollout (#457)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* load balancing dispatcher

* added parallel rollout

* lint

* Tracker variable type issue; rename to env_sampler_creator;

* Rl v3 parallel rollout follow ups (#458)

* AbsWorker & AbsDispatcher

* Pass env idx to AbsTrainer.record() method, and let the trainer to decide how to record experiences sampled from different worlds.

* Fix policy_creator reuse bug

* Format code

* Merge AbsTrainerManager & SimpleTrainerManager

* AC test passed

* Lint

* Remove AbsTrainer.build() method. Put all initialization operations into __init__

* Redesign AC preprocess batches logic

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* MADDPG performance bug fix (#459)

* Fix MARL (MADDPG) terminal recording bug; some other minor refinements;

* Restore Trainer.build() method

* Calculate latest action in the get_actor_grad method in MADDPG.

* Share critic bug fix

* Rl v3 example update (#461)

* updated vm_scheduling example and cim notebook

* fixed bugs in vm_scheduling

* added local train method

* bug fix

* modified async client logic to fix hidden issue

* reverted to default config

* fixed PR comments and some bugs

* removed hardcode

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Done (#462)

* Rl v3 load save (#463)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL Toolkit data parallelism revamp & config utils (#464)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

* 1. fixed data parallelism issue; 2. added config validator; 3. refactored cli local

* 1. fixed rollout exit issue; 2. refined config

* removed config file from example

* fixed lint issues

* fixed lint issues

* added main.py under examples/rl

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL doc string (#465)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* Rl config doc (#467)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* resolved more PR comments

* typo fix

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL online doc (#469)

* Model, policy, trainer

* RL workflows and env sampler doc in RST (#468)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* rewriting rl toolkit rst

* resolved more PR comments

* typo fix

* updated rst

Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: Default <huo53926@126.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Finish docs/source/key_components/rl_toolkit.rst

* API doc

* RL online doc image fix (#470)

* resolved some PR comments

* fix

* fixed PR comments

* added numfig=True setting in conf.py for sphinx

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Resolve PR comments

* Add example github link

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Rl v3 pr comment resolution (#474)

* added load/save feature

* 1. resolved pr comments; 2. reverted maro/cli/k8s

* fixed some bugs

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* RL renaming v2 (#476)

* Change all Logger in RL to LoggerV2

* TrainerManager => TrainingManager

* Add Trainer suffix to all algorithms

* Finish docs

* Update interface names

* Minor fix

* Cherry pick latest RL (#498)

* Cherry pick

* Remove SC related files

* Cherry pick RL changes from `sc_refinement` (latest commit: `2a4869`) (#509)

* Cherry pick RL changes from sc_refinement (2a4869)

* Limit time display precision

* RL incremental refactor (#501)

* Refactor rollout logic. Allow multiple sampling in one epoch, so that we can generate more data for training.

AC & PPO for continuous action policy; refine AC & PPO logic.

Cherry pick RL changes from GYM-DDPG

Cherry pick RL changes from GYM-SAC

Minor error in doc string

* Add min_n_sample in template and parser

* Resolve PR comments. Fix a minor issue in SAC.

* RL component bundle (#513)

* CIM passed

* Update workers

* Refine annotations

* VM passed

* Code formatting.

* Minor import loop issue

* Pass batch in PPO again

* Remove Scenario

* Complete docs

* Minor

* Remove segment

* Optimize logic in RLComponentBundle

* Resolve PR comments

* Move 'post methods from RLComponenetBundle to EnvSampler

* Add method to get mapping of available tick to frame index (#415)

* add method to get mapping of available tick to frame index

* fix lint issue

* fix naming issue

* Cherry pick from sc_refinement (#527)

* Cherry pick from sc_refinement

* Cherry pick from sc_refinement

* Refine `terminal` / `next_agent_state` logic (#531)

* Optimize RL toolkit

* Fix bug in terminal/next_state generation

* Rewrite terminal/next_state logic again

* Minor renaming

* Minor bug fix

* Resolve PR comments

* Merge master into v0.3 (#536)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add & sort requirements.dev.txt

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove random_config.py

* Remove test_trajectory_utils.py

* Pass tests

* Update rl docs

* Remove python 3.6 in test

* Update docs

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wang.Jinyu <jinywan@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: GQ.Chen <675865907@qq.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Logger bug hotfix (#543)

* Rename param

* Rename param

* Quick fix in env_data_process

* frame data precision issue fix (#544)

* fix frame precision issue

* add .xmake to .gitignore

* update frame precision lost warning message

* add assert to frame precision checking

* typo fix

* add TODO for future Long data type issue fix

* Merge master into v0.3 (#545)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update github woorkflow config

* MARO v0.3: a new design of RL Toolkit, CLI refactorization, and corresponding updates. (#539)

* refined proxy coding style

* updated images and refined doc

* updated images

* updated CIM-AC example

* refined proxy retry logic

* call policy update only for AbsCorePolicy

* add limitation of AbsCorePolicy in Actor.collect()

* refined actor to return only experiences for policies that received new experiences

* fix MsgKey issue in rollout_manager

* fix typo in learner

* call exit function for parallel rollout manager

* update supply chain example distributed training scripts

* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prorotype

* reformat render

* fix supply chain business engine action type problem

* reset supply chain example render figsize from 4 to 3

* Add render to all modes of supply chain example

* fix or policy typos

* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes

* refined parallel policy manager

* updated rl/__init__/py

* fixed lint issues and CIM local learner bugs

* deleted unwanted supply_chain test files

* revised default config for cim-dqn

* removed test_store.py as it is no longer needed

* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms

* updated figures

* removed unwanted import

* refactored CIM-DQN example

* added MultiProcessRolloutManager and MultiProcessTrainingManager

* updated doc

* lint issue fix

* lint issue fix

* fixed import formatting

* [Feature] Prioritized Experience Replay (#355)

* added prioritized experience replay

* deleted unwanted supply_chain test files

* fixed import order

* import fix

* fixed lint issues

* fixed import formatting

* added note in docstring that rank-based PER has yet to be implemented

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* rm AbsDecisionGenerator

* small fixes

* bug fix

* reorganized training folder structure

* fixed lint issues

* fixed lint issues

* policy manager refined

* lint fix

* restructured CIM-dqn sync code

* added policy version index and used it as a measure of experience staleness

* lint issue fix

* lint issue fix

* switched log_dir and proxy_kwargs order

* cim example refinement

* eval schedule sorted only when it's a list

* eval schedule sorted only when it's a list

* update sc env wrapper

* added docker scripts for cim-dqn

* refactored example folder structure and added workflow templates

* fixed lint issues

* fixed lint issues

* fixed template bugs

* removed unused imports

* refactoring sc in progress

* simplified cim meta

* fixed build.sh path bug

* template refinement

* deleted obsolete svgs

* updated learner logs

* minor edits

* refactored templates for easy merge with async PR

* added component names for rollout manager and policy manager

* fixed incorrect position to add last episode to eval schedule

* added max_lag option in templates

* formatting edit in docker_compose_yml script

* moved local learner and early stopper outside sync_tools

* refactored rl toolkit folder structure

* refactored rl toolkit folder structure

* moved env_wrapper and agent_wrapper inside rl/learner

* refined scripts

* fixed typo in script

* changes needed for running sc

* removed unwanted imports

* config change for testing sc scenario

* changes for perf testing

* Asynchronous Training (#364)

* remote inference code draft

* changed actor to rollout_worker and updated init files

* removed unwanted import

* updated inits

* more async code

* added async scripts

* added async training code & scripts for CIM-dqn

* changed async to async_tools to avoid conflict with python keyword

* reverted unwanted change to dockerfile

* added doc for policy server

* addressed PR comments and fixed a bug in docker_compose_yml.py

* fixed lint issue

* resolved PR comment

* resolved merge conflicts

* added async templates

* added proxy.close() for actor and policy_server

* fixed incorrect position to add last episode to eval schedule

* reverted unwanted changes

* added missing async files

* rm unwanted echo in kill.sh

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword

* added missing policy version increment in LocalPolicyManager

* refined rollout manager recv logic

* removed a debugging print

* added sleep in distributed launcher to avoid hanging

* updated api doc and rl toolkit doc

* refined dynamic imports using importlib

* 1. moved policy update triggers to policy manager; 2. added version control in policy manager

* fixed a few bugs and updated cim RL example

* fixed a few more bugs

* added agent wrapper instantiation to workflows

* added agent wrapper instantiation to workflows

* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet

* fixed incorrect get_ac_policy signature for CIM

* moved exploration inside core policy

* added state to exploration call to support context-dependent exploration

* separated non_rl_policy_index and rl_policy_index in workflows

* modified sc example code according to workflow changes

* modified sc example code according to workflow changes

* added replay_agent_ids parameter to get_env_func for RL examples

* fixed a few bugs

* added maro/simulator/scenarios/supply_chain as bind mount

* added post-step, post-collect, post-eval and post-update callbacks

* fixed lint issues

* fixed lint issues

* moved instantiation of policy manager inside simple learner

* fixed env_wrapper get_reward signature

* minor edits

* removed get_eperience kwargs from env_wrapper

* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows

* added rollout exp disribution option in RL examples

* removed unwanted files

* 1. made logger internal in learner; 2 removed logger creation in abs classes

* checked out supply chain test files from v0.2_sc

* 1. added missing model.eval() to choose_action; 2.added entropy features to AC

* fixed a bug in ac entropy

* abbreviated coefficient to coeff

* removed -dqn from job name in rl example config

* added tmp patch to dev.df

* renamed image name for running rl examples

* added get_loss interface for core policies

* added policy manager in rl_toolkit.rst

* 1. env_wrapper bug fix; 2. policy manager update logic refinement

* refactored policy and algorithms

* policy interface redesigned

* refined policy interfaces

* fixed typo

* fixed bugs in refactored policy interface

* fixed some bugs

* refactoring in progress

* policy interface and policy manager redesigned

* 1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts

* fixed bug in distributed policy manager

* fixed lint issues

* fixed lint issues

* added scipy in setup

* 1. trimmed rollout manager code; 2. added option to docker scripts

* updated api doc for policy manager

* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script

* 1. simplified rl example structure; 2. fixed lint issues

* further rl toolkit code simplifications

* more numpy-based optimization in RL toolkit

* moved replay buffer inside policy

* bug fixes

* numpy optimization and associated refactoring

* extracted shaping logic out of env_sampler

* fixed bug in CIM shaping and lint issues

* preliminary implemetation of parallel batch inference

* fixed bug in ddpg transition recording

* put get_state, get_env_actions, get_reward back in EnvSampler

* simplified exploration and core model interfaces

* bug fixes and doc update

* added improve() interface for RLPolicy for single-thread support

* fixed simple policy manager bug

* updated doc, rst, notebook

* updated notebook

* fixed lint issues

* fixed entropy bugs in ac.py

* reverted to simple policy manager as default

* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit

* fixed lint issues and updated rl toolkit images

* removed obsolete images

* added back agent2policy for general workflow use

* V0.2 rl refinement dist (#377)

* Support `slice` operation in ExperienceSet

* Support naive distributed policy training by proxy

* Dynamically allocate trainers according to number of experience

* code check

* code check

* code check

* Fix a bug in distributed trianing with no gradient

* Code check

* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy

* 1.call allocate_trainer() at first of update(); 2.refine according to code review

* Code check

* Refine code with new interface

* Update docs of PolicyManger and ExperienceSet

* Add images for rl_toolkit docs

* Update diagram of PolicyManager

* Refine with new interface

* Extract allocation strategy into `allocation_strategy.py`

* add `distributed_learn()` in policies for data-parallel training

* Update doc of RL_toolkit

* Add gradient workers for data-parallel

* Refine code and update docs

* Lint check

* Refine by comments

* Rename `trainer` to `worker`

* Rename `distributed_learn` to `learn_with_data_parallel`

* Refine allocator and remove redundant code in policy_manager

* remove arugments in allocate_by_policy and so on

* added checkpointing for simple and multi-process policy managers

* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager

* added missing set_state and get_state for CIM policies

* removed blank line

* updated RL workflow README

* Integrate `data_parallel` arguments into `worker_allocator` (#402)

* 1. simplified workflow config; 2. added comments to CIM shaping

* lint issue fix

* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading

* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows

* refined READEME for CIM

* VM scheduling with RL (#375)

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* added DQN

* added get_experiences func for ac in vm scheduling

* added post_step callback to env wrapper

* moved Aiming's tracking and plotting logic into callbacks

* added eval env wrapper

* renamed AC config variable name for VM

* vm scheduling RL code finished

* updated README

* fixed various bugs and hard coding for vm_scheduling

* uncommented callbacks for VM scheduling

* Minor revision for better code style

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* vm example refactoring

* fixed bugs in vm_scheduling

* removed unwanted files from cim dir

* reverted to simple policy manager as default

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* resolved rebase conflicts

* fixed bugs in vm_scheduling

* added get_state and set_state to vm_scheduling policy models

* updated README for vm_scheduling with RL

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* SC refinement (#397)

* Refine test scripts & pending_order_daily logic

* Refactor code for better code style: complete type hint, correct typos, remove unused items.

Refactor code for better code style: complete type hint, correct typos, remove unused items.

* Polish test_supply_chain.py

* update import format

* Modify vehicle steps logic & remo…
Jinyu-W added a commit that referenced this pull request Mar 30, 2023
…588)

* call policy update only for AbsCorePolicy

* add limitation of AbsCorePolicy in Actor.collect()

* refined actor to return only experiences for policies that received new experiences

* fix MsgKey issue in rollout_manager

* fix typo in learner

* call exit function for parallel rollout manager

* update supply chain example distributed training scripts

* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prorotype

* reformat render

* fix supply chain business engine action type problem

* reset supply chain example render figsize from 4 to 3

* Add render to all modes of supply chain example

* fix or policy typos

* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes

* refined parallel policy manager

* updated rl/__init__/py

* fixed lint issues and CIM local learner bugs

* deleted unwanted supply_chain test files

* revised default config for cim-dqn

* removed test_store.py as it is no longer needed

* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms

* updated figures

* removed unwanted import

* refactored CIM-DQN example

* added MultiProcessRolloutManager and MultiProcessTrainingManager

* updated doc

* lint issue fix

* lint issue fix

* fixed import formatting

* [Feature] Prioritized Experience Replay (#355)

* added prioritized experience replay

* deleted unwanted supply_chain test files

* fixed import order

* import fix

* fixed lint issues

* fixed import formatting

* added note in docstring that rank-based PER has yet to be implemented

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* rm AbsDecisionGenerator

* small fixes

* bug fix

* reorganized training folder structure

* fixed lint issues

* fixed lint issues

* policy manager refined

* lint fix

* restructured CIM-dqn sync code

* added policy version index and used it as a measure of experience staleness

* lint issue fix

* lint issue fix

* switched log_dir and proxy_kwargs order

* cim example refinement

* eval schedule sorted only when it's a list

* eval schedule sorted only when it's a list

* update sc env wrapper

* added docker scripts for cim-dqn

* refactored example folder structure and added workflow templates

* fixed lint issues

* fixed lint issues

* fixed template bugs

* removed unused imports

* refactoring sc in progress

* simplified cim meta

* fixed build.sh path bug

* template refinement

* deleted obsolete svgs

* updated learner logs

* minor edits

* refactored templates for easy merge with async PR

* added component names for rollout manager and policy manager

* fixed incorrect position to add last episode to eval schedule

* added max_lag option in templates

* formatting edit in docker_compose_yml script

* moved local learner and early stopper outside sync_tools

* refactored rl toolkit folder structure

* refactored rl toolkit folder structure

* moved env_wrapper and agent_wrapper inside rl/learner

* refined scripts

* fixed typo in script

* changes needed for running sc

* removed unwanted imports

* config change for testing sc scenario

* changes for perf testing

* Asynchronous Training (#364)

* remote inference code draft

* changed actor to rollout_worker and updated init files

* removed unwanted import

* updated inits

* more async code

* added async scripts

* added async training code & scripts for CIM-dqn

* changed async to async_tools to avoid conflict with python keyword

* reverted unwanted change to dockerfile

* added doc for policy server

* addressed PR comments and fixed a bug in docker_compose_yml.py

* fixed lint issue

* resolved PR comment

* resolved merge conflicts

* added async templates

* added proxy.close() for actor and policy_server

* fixed incorrect position to add last episode to eval schedule

* reverted unwanted changes

* added missing async files

* rm unwanted echo in kill.sh

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword

* added missing policy version increment in LocalPolicyManager

* refined rollout manager recv logic

* removed a debugging print

* added sleep in distributed launcher to avoid hanging

* updated api doc and rl toolkit doc

* refined dynamic imports using importlib

* 1. moved policy update triggers to policy manager; 2. added version control in policy manager

* fixed a few bugs and updated cim RL example

* fixed a few more bugs

* added agent wrapper instantiation to workflows

* added agent wrapper instantiation to workflows

* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet

* fixed incorrect get_ac_policy signature for CIM

* moved exploration inside core policy

* added state to exploration call to support context-dependent exploration

* separated non_rl_policy_index and rl_policy_index in workflows

* modified sc example code according to workflow changes

* modified sc example code according to workflow changes

* added replay_agent_ids parameter to get_env_func for RL examples

* fixed a few bugs

* added maro/simulator/scenarios/supply_chain as bind mount

* added post-step, post-collect, post-eval and post-update callbacks

* fixed lint issues

* fixed lint issues

* moved instantiation of policy manager inside simple learner

* fixed env_wrapper get_reward signature

* minor edits

* removed get_eperience kwargs from env_wrapper

* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows

* added rollout exp disribution option in RL examples

* removed unwanted files

* 1. made logger internal in learner; 2 removed logger creation in abs classes

* checked out supply chain test files from v0.2_sc

* 1. added missing model.eval() to choose_action; 2.added entropy features to AC

* fixed a bug in ac entropy

* abbreviated coefficient to coeff

* removed -dqn from job name in rl example config

* added tmp patch to dev.df

* renamed image name for running rl examples

* added get_loss interface for core policies

* added policy manager in rl_toolkit.rst

* 1. env_wrapper bug fix; 2. policy manager update logic refinement

* refactored policy and algorithms

* policy interface redesigned

* refined policy interfaces

* fixed typo

* fixed bugs in refactored policy interface

* fixed some bugs

* refactoring in progress

* policy interface and policy manager redesigned

* 1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts

* fixed bug in distributed policy manager

* fixed lint issues

* fixed lint issues

* added scipy in setup

* 1. trimmed rollout manager code; 2. added option to docker scripts

* updated api doc for policy manager

* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script

* 1. simplified rl example structure; 2. fixed lint issues

* further rl toolkit code simplifications

* more numpy-based optimization in RL toolkit

* moved replay buffer inside policy

* bug fixes

* numpy optimization and associated refactoring

* extracted shaping logic out of env_sampler

* fixed bug in CIM shaping and lint issues

* preliminary implemetation of parallel batch inference

* fixed bug in ddpg transition recording

* put get_state, get_env_actions, get_reward back in EnvSampler

* simplified exploration and core model interfaces

* bug fixes and doc update

* added improve() interface for RLPolicy for single-thread support

* fixed simple policy manager bug

* updated doc, rst, notebook

* updated notebook

* fixed lint issues

* fixed entropy bugs in ac.py

* reverted to simple policy manager as default

* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit

* fixed lint issues and updated rl toolkit images

* removed obsolete images

* added back agent2policy for general workflow use

* V0.2 rl refinement dist (#377)

* Support `slice` operation in ExperienceSet

* Support naive distributed policy training by proxy

* Dynamically allocate trainers according to number of experience

* code check

* code check

* code check

* Fix a bug in distributed trianing with no gradient

* Code check

* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy

* 1.call allocate_trainer() at first of update(); 2.refine according to code review

* Code check

* Refine code with new interface

* Update docs of PolicyManger and ExperienceSet

* Add images for rl_toolkit docs

* Update diagram of PolicyManager

* Refine with new interface

* Extract allocation strategy into `allocation_strategy.py`

* add `distributed_learn()` in policies for data-parallel training

* Update doc of RL_toolkit

* Add gradient workers for data-parallel

* Refine code and update docs

* Lint check

* Refine by comments

* Rename `trainer` to `worker`

* Rename `distributed_learn` to `learn_with_data_parallel`

* Refine allocator and remove redundant code in policy_manager

* remove arugments in allocate_by_policy and so on

* added checkpointing for simple and multi-process policy managers

* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager

* added missing set_state and get_state for CIM policies

* removed blank line

* updated RL workflow README

* Integrate `data_parallel` arguments into `worker_allocator` (#402)

* 1. simplified workflow config; 2. added comments to CIM shaping

* lint issue fix

* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading

* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows

* refined READEME for CIM

* VM scheduling with RL (#375)

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* added DQN

* added get_experiences func for ac in vm scheduling

* added post_step callback to env wrapper

* moved Aiming's tracking and plotting logic into callbacks

* added eval env wrapper

* renamed AC config variable name for VM

* vm scheduling RL code finished

* updated README

* fixed various bugs and hard coding for vm_scheduling

* uncommented callbacks for VM scheduling

* Minor revision for better code style

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* vm example refactoring

* fixed bugs in vm_scheduling

* removed unwanted files from cim dir

* reverted to simple policy manager as default

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* resolved rebase conflicts

* fixed bugs in vm_scheduling

* added get_state and set_state to vm_scheduling policy models

* updated README for vm_scheduling with RL

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* SC refinement (#397)

* Refine test scripts & pending_order_daily logic

* Refactor code for better code style: complete type hint, correct typos, remove unused items.

Refactor code for better code style: complete type hint, correct typos, remove unused items.

* Polish test_supply_chain.py

* update import format

* Modify vehicle steps logic & remove outdated test case

* Optimize imports

* Optimize imports

* Lint error

* Lint error

* Lint error

* Add SupplyChainAction

* Lint error

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* refined workflow scripts

* fixed bug in ParallelAgentWrapper

* 1. fixed lint issues; 2. refined main script in workflows

* lint issue fix

* restored default config for rl example

* Update rollout.py

* refined env var processing in policy manager workflow

* added hasattr check in agent wrapper

* updated docker_compose_yml.py

* Minor refinement

* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)

* Prepare to merge master_mirror

* Lint error

* Minor

* Merge latest master into v0.3 (#426)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* Minor

* Remove docs/source/examples/multi_agent_dqn_cim.rst

* Update .gitignore

* Update .gitignore

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>

* Change `Env.set_seed()` logic (#456)

* Change Env.set_seed() logic

* Redesign CIM reset logic; fix lint issues;

* Lint

* Seed type assertion

* Remove all SC related files (#473)

* RL Toolkit V3 (#471)

* added daemon=True for multi-process rollout, policy manager and inference

* removed obsolete files

* [REDO][PR#406]V0.2 rl refinement taskq (#408)

* Add a usable task_queue

* Rename some variables

* 1. Add ; 2. Integrate  related files; 3. Remove

* merge `data_parallel` and `num_grad_workers` into `data_parallelism`

* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.

* Move `grad_worker` into marl/rl/workflows

* 1.Merge data_parallel and num_workers into data_parallelism in config; 2.Assign recently used workers as possible in task_queue.

* Refine code and update docs of `TaskQueue`

* Support priority for tasks in `task_queue`

* Update diagram of policy manager and task queue.

* Add configurable `single_task_limit` and correct docstring about `data_parallelism`

* Fix lint errors in `supply chain`

* RL policy redesign (V2) (#405)

* Drafi v2.0 for V2

* Polish models with more comments

* Polish policies with more comments

* Lint

* Lint

* Add developer doc for models.

* Add developer doc for policies.

* Remove policy manager V2 since it is not used and out-of-date

* Lint

* Lint

* refined messy workflow code

* merged 'scenario_dir' and 'scenario' in rl config

* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods

* 1. temporarily renamed RLPolicy from polivy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2

* merged cim and cim_v2

* lint issue fix

* refined logging logic

* lint issue fix

* reversed unwanted changes

* .

.

.

.

ReplayMemory & IndexScheduler

ReplayMemory & IndexScheduler

.

MultiReplayMemory

get_actions_with_logps

EnvSampler on the road

EnvSampler

Minor

* LearnerManager

* Use batch to transfer data & add SHAPE_CHECK_FLAG

* Rename learner to trainer

* Add property for policy._is_exploring

* CIM test scenario for V3. Manual test passed. Next step: run it, make it works.

* env_sampler.py could run

* env_sampler refine on the way

* First runnable version done

* AC could run, but the result is bad. Need to check the logic

* Refine abstract method & shape check error info.

* Docs

* Very detailed compare. Try again.

* AC done

* DQN check done

* Minor

* DDPG, not tested

* Minors

* A rough draft of MAAC

* Cannot use CIM as the multi-agent scenario.

* Minor

* MAAC refinement on the way

* Remove ActionWithAux

* Refine batch & memory

* MAAC example works

* Reproduce-able fix. Policy share between env_sampler and trainer_manager.

* Detail refinement

* Simplify the user configed workflow

* Minor

* Refine example codes

* Minor polishment

* Migrate rollout_manager to V3

* Error on the way

* Redesign torch.device management

* Rl v3 maddpg (#418)

* Add MADDPG trainer

* Fit independent critics and shared critic modes.

* Add a new property: num_policies

* Lint

* Fix a bug in `sum(rewards)`

* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.

* Rename maddpg in examples.

* Preparation for data parallel (#420)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* Rename train worker to train ops; add placeholder for abstract methods;

* Lint

Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>

* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* dsitributed training pipeline draft

* added temporary test files for review purposes

* Several code style refinements (#451)

* Polish rl_v3/utils/

* Polish rl_v3/distributed/

* Polish rl_v3/policy_trainer/abs_trainer.py

* fixed merge conflicts

* unified sync and async interfaces

* refactored rl_v3; refinement in progress

* Finish the runnable pipeline under new design

* Remove outdated files; refine class names; optimize imports;

* Lint

* Minor maddpg related refinement

* Lint

Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Miner bug fix

* Coroutine-related bug fix ("get_policy_state") (#452)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* deleted unwanted folder

* removed unwanted changes

* resolved PR452 comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Quick fix

* Redesign experience recording logic (#453)

* Two not important fix

* Temp draft. Prepare to WFH

* Done

* Lint

* Lint

* Calculating advantages / returns (#454)

* V1.0

* Complete DDPG

* Rl v3 hanging issue fix (#455)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* Final test & format. Ready to merge.

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* Rl v3 parallel rollout (#457)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training + cli code move

* fixed bugs

* added retry logic to client

* 1. refactored CIM with various algos; 2. lint

* lint

* added type hint

* removed some logs

* lint

* Make main.py more IDE friendly

* Make main.py more IDE friendly

* Lint

* load balancing dispatcher

* added parallel rollout

* lint

* Tracker variable type issue; rename to env_sampler_creator;

* Rl v3 parallel rollout follow ups (#458)

* AbsWorker & AbsDispatcher

* Pass env idx to AbsTrainer.record() method, and let the trainer to decide how to record experiences sampled from different worlds.

* Fix policy_creator reuse bug

* Format code

* Merge AbsTrainerManager & SimpleTrainerManager

* AC test passed

* Lint

* Remove AbsTrainer.build() method. Put all initialization operations into __init__

* Redesign AC preprocess batches logic

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* MADDPG performance bug fix (#459)

* Fix MARL (MADDPG) terminal recording bug; some other minor refinements;

* Restore Trainer.build() method

* Calculate latest action in the get_actor_grad method in MADDPG.

* Share critic bug fix

* Rl v3 example update (#461)

* updated vm_scheduling example and cim notebook

* fixed bugs in vm_scheduling

* added local train method

* bug fix

* modified async client logic to fix hidden issue

* reverted to default config

* fixed PR comments and some bugs

* removed hardcode

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Done (#462)

* Rl v3 load save (#463)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL Toolkit data parallelism revamp & config utils (#464)

* added load/save feature

* fixed some bugs

* reverted unwanted changes

* lint

* fixed PR comments

* 1. fixed data parallelism issue; 2. added config validator; 3. refactored cli local

* 1. fixed rollout exit issue; 2. refined config

* removed config file from example

* fixed lint issues

* fixed lint issues

* added main.py under examples/rl

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL doc string (#465)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* Rl config doc (#467)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* resolved more PR comments

* typo fix

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* RL online doc (#469)

* Model, policy, trainer

* RL workflows and env sampler doc in RST (#468)

* First rough draft

* Minors

* Reformat

* Lint

* Resolve PR comments

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* Rl type specific env getter (#466)

* 1. type-sensitive env variable getter; 2. updated READMEs for examples

* fixed bugs

* fixed bugs

* bug fixes

* lint

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Example bug fix

* Optimize parser.py

* Resolve PR comments

* added detailed doc

* lint

* wording refined

* resolved some PR comments

* rewriting rl toolkit rst

* resolved more PR comments

* typo fix

* updated rst

Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: Default <huo53926@126.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Finish docs/source/key_components/rl_toolkit.rst

* API doc

* RL online doc image fix (#470)

* resolved some PR comments

* fix

* fixed PR comments

* added numfig=True setting in conf.py for sphinx

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Resolve PR comments

* Add example github link

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

* Rl v3 pr comment resolution (#474)

* added load/save feature

* 1. resolved pr comments; 2. reverted maro/cli/k8s

* fixed some bugs

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* RL renaming v2 (#476)

* Change all Logger in RL to LoggerV2

* TrainerManager => TrainingManager

* Add Trainer suffix to all algorithms

* Finish docs

* Update interface names

* Minor fix

* Cherry pick latest RL (#498)

* Cherry pick

* Remove SC related files

* Cherry pick RL changes from `sc_refinement` (latest commit: `2a4869`) (#509)

* Cherry pick RL changes from sc_refinement (2a4869)

* Limit time display precision

* RL incremental refactor (#501)

* Refactor rollout logic. Allow multiple sampling in one epoch, so that we can generate more data for training.

AC & PPO for continuous action policy; refine AC & PPO logic.

Cherry pick RL changes from GYM-DDPG

Cherry pick RL changes from GYM-SAC

Minor error in doc string

* Add min_n_sample in template and parser

* Resolve PR comments. Fix a minor issue in SAC.

* RL component bundle (#513)

* CIM passed

* Update workers

* Refine annotations

* VM passed

* Code formatting.

* Minor import loop issue

* Pass batch in PPO again

* Remove Scenario

* Complete docs

* Minor

* Remove segment

* Optimize logic in RLComponentBundle

* Resolve PR comments

* Move 'post methods from RLComponenetBundle to EnvSampler

* Add method to get mapping of available tick to frame index (#415)

* add method to get mapping of available tick to frame index

* fix lint issue

* fix naming issue

* Cherry pick from sc_refinement (#527)

* Cherry pick from sc_refinement

* Cherry pick from sc_refinement

* Refine `terminal` / `next_agent_state` logic (#531)

* Optimize RL toolkit

* Fix bug in terminal/next_state generation

* Rewrite terminal/next_state logic again

* Minor renaming

* Minor bug fix

* Resolve PR comments

* Merge master into v0.3 (#536)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add & sort requirements.dev.txt

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Merge master into v0.3 (#545)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* add branch v0.3 to github workflow

* update github test workflow

* Update requirements.dev.txt (#444)

Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact.

* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)

Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update github woorkflow config

* MARO v0.3: a new design of RL Toolkit, CLI refactorization, and corresponding updates. (#539)

* refined proxy coding style

* updated images and refined doc

* updated images

* updated CIM-AC example

* refined proxy retry logic

* call policy update only for AbsCorePolicy

* add limitation of AbsCorePolicy in Actor.collect()

* refined actor to return only experiences for policies that received new experiences

* fix MsgKey issue in rollout_manager

* fix typo in learner

* call exit function for parallel rollout manager

* update supply chain example distributed training scripts

* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prorotype

* reformat render

* fix supply chain business engine action type problem

* reset supply chain example render figsize from 4 to 3

* Add render to all modes of supply chain example

* fix or policy typos

* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes

* refined parallel policy manager

* updated rl/__init__/py

* fixed lint issues and CIM local learner bugs

* deleted unwanted supply_chain test files

* revised default config for cim-dqn

* removed test_store.py as it is no longer needed

* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms

* updated figures

* removed unwanted import

* refactored CIM-DQN example

* added MultiProcessRolloutManager and MultiProcessTrainingManager

* updated doc

* lint issue fix

* lint issue fix

* fixed import formatting

* [Feature] Prioritized Experience Replay (#355)

* added prioritized experience replay

* deleted unwanted supply_chain test files

* fixed import order

* import fix

* fixed lint issues

* fixed import formatting

* added note in docstring that rank-based PER has yet to be implemented

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* rm AbsDecisionGenerator

* small fixes

* bug fix

* reorganized training folder structure

* fixed lint issues

* fixed lint issues

* policy manager refined

* lint fix

* restructured CIM-dqn sync code

* added policy version index and used it as a measure of experience staleness

* lint issue fix

* lint issue fix

* switched log_dir and proxy_kwargs order

* cim example refinement

* eval schedule sorted only when it's a list

* eval schedule sorted only when it's a list

* update sc env wrapper

* added docker scripts for cim-dqn

* refactored example folder structure and added workflow templates

* fixed lint issues

* fixed lint issues

* fixed template bugs

* removed unused imports

* refactoring sc in progress

* simplified cim meta

* fixed build.sh path bug

* template refinement

* deleted obsolete svgs

* updated learner logs

* minor edits

* refactored templates for easy merge with async PR

* added component names for rollout manager and policy manager

* fixed incorrect position to add last episode to eval schedule

* added max_lag option in templates

* formatting edit in docker_compose_yml script

* moved local learner and early stopper outside sync_tools

* refactored rl toolkit folder structure

* refactored rl toolkit folder structure

* moved env_wrapper and agent_wrapper inside rl/learner

* refined scripts

* fixed typo in script

* changes needed for running sc

* removed unwanted imports

* config change for testing sc scenario

* changes for perf testing

* Asynchronous Training (#364)

* remote inference code draft

* changed actor to rollout_worker and updated init files

* removed unwanted import

* updated inits

* more async code

* added async scripts

* added async training code & scripts for CIM-dqn

* changed async to async_tools to avoid conflict with python keyword

* reverted unwanted change to dockerfile

* added doc for policy server

* addressed PR comments and fixed a bug in docker_compose_yml.py

* fixed lint issue

* resolved PR comment

* resolved merge conflicts

* added async templates

* added proxy.close() for actor and policy_server

* fixed incorrect position to add last episode to eval schedule

* reverted unwanted changes

* added missing async files

* rm unwanted echo in kill.sh

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword

* added missing policy version increment in LocalPolicyManager

* refined rollout manager recv logic

* removed a debugging print

* added sleep in distributed launcher to avoid hanging

* updated api doc and rl toolkit doc

* refined dynamic imports using importlib

* 1. moved policy update triggers to policy manager; 2. added version control in policy manager

* fixed a few bugs and updated cim RL example

* fixed a few more bugs

* added agent wrapper instantiation to workflows

* added agent wrapper instantiation to workflows

* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet

* fixed incorrect get_ac_policy signature for CIM

* moved exploration inside core policy

* added state to exploration call to support context-dependent exploration

* separated non_rl_policy_index and rl_policy_index in workflows

* modified sc example code according to workflow changes

* modified sc example code according to workflow changes

* added replay_agent_ids parameter to get_env_func for RL examples

* fixed a few bugs

* added maro/simulator/scenarios/supply_chain as bind mount

* added post-step, post-collect, post-eval and post-update callbacks

* fixed lint issues

* fixed lint issues

* moved instantiation of policy manager inside simple learner

* fixed env_wrapper get_reward signature

* minor edits

* removed get_eperience kwargs from env_wrapper

* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows

* added rollout exp disribution option in RL examples

* removed unwanted files

* 1. made logger internal in learner; 2 removed logger creation in abs classes

* checked out supply chain test files from v0.2_sc

* 1. added missing model.eval() to choose_action; 2.added entropy features to AC

* fixed a bug in ac entropy

* abbreviated coefficient to coeff

* removed -dqn from job name in rl example config

* added tmp patch to dev.df

* renamed image name for running rl examples

* added get_loss interface for core policies

* added policy manager in rl_toolkit.rst

* 1. env_wrapper bug fix; 2. policy manager update logic refinement

* refactored policy and algorithms

* policy interface redesigned

* refined policy interfaces

* fixed typo

* fixed bugs in refactored policy interface

* fixed some bugs

* refactoring in progress

* policy interface and policy manager redesigned

* 1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts

* fixed bug in distributed policy manager

* fixed lint issues

* fixed lint issues

* added scipy in setup

* 1. trimmed rollout manager code; 2. added option to docker scripts

* updated api doc for policy manager

* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script

* 1. simplified rl example structure; 2. fixed lint issues

* further rl toolkit code simplifications

* more numpy-based optimization in RL toolkit

* moved replay buffer inside policy

* bug fixes

* numpy optimization and associated refactoring

* extracted shaping logic out of env_sampler

* fixed bug in CIM shaping and lint issues

* preliminary implemetation of parallel batch inference

* fixed bug in ddpg transition recording

* put get_state, get_env_actions, get_reward back in EnvSampler

* simplified exploration and core model interfaces

* bug fixes and doc update

* added improve() interface for RLPolicy for single-thread support

* fixed simple policy manager bug

* updated doc, rst, notebook

* updated notebook

* fixed lint issues

* fixed entropy bugs in ac.py

* reverted to simple policy manager as default

* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit

* fixed lint issues and updated rl toolkit images

* removed obsolete images

* added back agent2policy for general workflow use

* V0.2 rl refinement dist (#377)

* Support `slice` operation in ExperienceSet

* Support naive distributed policy training by proxy

* Dynamically allocate trainers according to number of experience

* code check

* code check

* code check

* Fix a bug in distributed trianing with no gradient

* Code check

* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy

* 1.call allocate_trainer() at first of update(); 2.refine according to code review

* Code check

* Refine code with new interface

* Update docs of PolicyManger and ExperienceSet

* Add images for rl_toolkit docs

* Update diagram of PolicyManager

* Refine with new interface

* Extract allocation strategy into `allocation_strategy.py`

* add `distributed_learn()` in policies for data-parallel training

* Update doc of RL_toolkit

* Add gradient workers for data-parallel

* Refine code and update docs

* Lint check

* Refine by comments

* Rename `trainer` to `worker`

* Rename `distributed_learn` to `learn_with_data_parallel`

* Refine allocator and remove redundant code in policy_manager

* remove arugments in allocate_by_policy and so on

* added checkpointing for simple and multi-process policy managers

* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager

* added missing set_state and get_state for CIM policies

* removed blank line

* updated RL workflow README

* Integrate `data_parallel` arguments into `worker_allocator` (#402)

* 1. simplified workflow config; 2. added comments to CIM shaping

* lint issue fix

* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading

* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows

* refined READEME for CIM

* VM scheduling with RL (#375)

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* added DQN

* added get_experiences func for ac in vm scheduling

* added post_step callback to env wrapper

* moved Aiming's tracking and plotting logic into callbacks

* added eval env wrapper

* renamed AC config variable name for VM

* vm scheduling RL code finished

* updated README

* fixed various bugs and hard coding for vm_scheduling

* uncommented callbacks for VM scheduling

* Minor revision for better code style

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* vm example refactoring

* fixed bugs in vm_scheduling

* removed unwanted files from cim dir

* reverted to simple policy manager as default

* added part of vm scheduling RL code

* refined vm env_wrapper code style

* vm scheduling RL code finished

* added config.py for vm scheduing

* resolved rebase conflicts

* fixed bugs in vm_scheduling

* added get_state and set_state to vm_scheduling policy models

* updated README for vm_scheduling with RL

Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* SC refinement (#397)

* Refine test scripts & pending_order_daily logic

* Refactor code for better code style: complete type hint, correct typos, remove unused items.

Refactor code for better code style: complete type hint, correct typos, remove unused items.

* Polish test_supply_chain.py

* update import format

* Modify vehicle steps logic & remove outdated test case

* Optimize imports

* Optimize imports

* Lint error

* Lint error

* Lint error

* Add SupplyChainAction

* Lint error

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* refined workflow scripts

* fixed bug in ParallelAgentWrapper

* 1. fixed lint issues; 2. refined main script in workflows

* lint issue fix

* restored default config for rl example

* Update rollout.py

* refined env var processing in policy manager workflow

* added hasattr check in agent wrapper

* updated docker_compose_yml.py

* Minor refinement

* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)

* Prepare to merge master_mirror

* Lint error

* Minor

* Merge latest master into v0.3 (#426)

* update docker hub init (#367)

* update docker hub init

* replace personal account with maro-team

* update hello files for CIM

* update docker repository name

* update docker file name

* fix bugs in notebook, rectify docs

* fix doc build issue

* remove docs from playground; fix citibike lp example Event issue

* update the exampel for vector env

* update vector env example

* update README due to PR comments

* add link to playground above MARO installation in README

* fix some typos

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* update package version

* update README for package description

* update image links for pypi package description

* update image links for pypi package description

* change the input topology schema for CIM real data mode (#372)

* change the input topology schema for CIM real data mode

* remove unused importing

* update test config file correspondingly

* add Exception for env test

* add cost factors to cim data dump

* update CimDataCollection field name

* update field name of data collection related code

* update package version

* adjust interface to reflect actual signature (#374)

Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>

* update dataclasses requirement to setup

* fix: fixing spelling grammarr

* fix: fix typo spelling code commented and data_model.rst

* Fix Geo vis IP address & SQL logic bugs. (#383)

Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).

* Fix the "Wrong future stop tick predictions" bug (#386)

* Propose my new solution

Refine to the pre-process version

.

* Optimize import

* Fix reset random seed bug (#387)

* update the reset interface of Env and BE

* Try to fix reset routes generation seed issue

* Refine random related logics.

* Minor refinement

* Test check

* Minor

* Remove unused functions so far

* Minor

Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* update package version

* Add _init_vessel_plans in business_engine.reset (#388)

* update package version

* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Refine `event_buffer/` module (#389)

* Core & Business Engine code refinement (#392)

* First version

* Optimize imports

* Add typehint

* Lint check

* Lint check

* add higher python version (#398)

* add higher python version

* update pytorch version

* update torchvision version

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* CIM scenario refinement (#400)

* Cim scenario refinement (#394)

* CIM refinement

* Fix lint error

* Fix lint error

* Cim test coverage (#395)

* Enrich tests

* Refactor CimDataGenerator

* Refactor CIM parsers

* Minor refinement

* Fix lint error

* Fix lint error

* Fix lint error

* Minor refactor

* Type

* Add two test file folders. Make a slight change to CIM BE.

* Lint error

* Lint error

* Remove unnecessary public interfaces of CIM BE

* Cim disable auto action type detection (#399)

* Haven't been tested

* Modify document

* Add ActionType checking

* Minor

* Lint error

* Action quantity should be a position number

* Modify related docs & notebooks

* Minor

* Change test file name. Prepare to merge into master.

* .

* Minor test patch

* Add `clear()` function to class `SimRandom` (#401)

* Add SimRandom.clear()

* Minor

* Remove commented codes

* Lint error

* update package version

* Minor

* Remove docs/source/examples/multi_agent_dqn_cim.rst

* Update .gitignore

* Update .gitignore

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>

* Change `Env.set_seed()` logic (#456)

* Change Env.set_seed() logic

* Redesign CIM reset logic; fix lint issues;

* Lint

* Seed type assertion

* Remove all SC related files (#473)

* RL Toolkit V3 (#471)

* added daemon=True for multi-process rollout, policy manager and inference

* removed obsolete files

* [REDO][PR#406]V0.2 rl refinement taskq (#408)

* Add a usable task_queue

* Rename some variables

* 1. Add ; 2. Integrate  related files; 3. Remove

* merge `data_parallel` and `num_grad_workers` into `data_parallelism`

* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.

* Move `grad_worker` into marl/rl/workflows

* 1.Merge data_parallel and num_workers into data_parallelism in config; 2.Assign recently used workers as possible in task_queue.

* Refine code and update docs of `TaskQueue`

* Support priority for tasks in `task_queue`

* Update diagram of policy manager and task queue.

* Add configurable `single_task_limit` and correct docstring about `data_parallelism`

* Fix lint errors in `supply chain`

* RL policy redesign (V2) (#405)

* Drafi v2.0 for V2

* Polish models with more comments

* Polish policies with more comments

* Lint

* Lint

* Add developer doc for models.

* Add developer doc for policies.

* Remove policy manager V2 since it is not used and out-of-date

* Lint

* Lint

* refined messy workflow code

* merged 'scenario_dir' and 'scenario' in rl config

* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods

* 1. temporarily renamed RLPolicy from polivy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2

* merged cim and cim_v2

* lint issue fix

* refined logging logic

* lint issue fix

* reversed unwanted changes

* .

.

.

.

ReplayMemory & IndexScheduler

ReplayMemory & IndexScheduler

.

MultiReplayMemory

get_actions_with_logps

EnvSampler on the road

EnvSampler

Minor

* LearnerManager

* Use batch to transfer data & add SHAPE_CHECK_FLAG

* Rename learner to trainer

* Add property for policy._is_exploring

* CIM test scenario for V3. Manual test passed. Next step: run it, make it works.

* env_sampler.py could run

* env_sampler refine on the way

* First runnable version done

* AC could run, but the result is bad. Need to check the logic

* Refine abstract method & shape check error info.

* Docs

* Very detailed compare. Try again.

* AC done

* DQN check done

* Minor

* DDPG, not tested

* Minors

* A rough draft of MAAC

* Cannot use CIM as the multi-agent scenario.

* Minor

* MAAC refinement on the way

* Remove ActionWithAux

* Refine batch & memory

* MAAC example works

* Reproduce-able fix. Policy share between env_sampler and trainer_manager.

* Detail refinement

* Simplify the user configed workflow

* Minor

* Refine example codes

* Minor polishment

* Migrate rollout_manager to V3

* Error on the way

* Redesign torch.device management

* Rl v3 maddpg (#418)

* Add MADDPG trainer

* Fit independent critics and shared critic modes.

* Add a new property: num_policies

* Lint

* Fix a bug in `sum(rewards)`

* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.

* Rename maddpg in examples.

* Preparation for data parallel (#420)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* Rename train worker to train ops; add placeholder for abstract methods;

* Lint

Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>

* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)

* Preparation for data parallel

* Minor refinement & lint fix

* Lint

* Lint

* rename atomic_get_batch_grad to get_batch_grad

* Fix a unexpected commit

* distributed maddpg

* Add critic worker

* Minor

* Data parallel related minorities

* Refine code structure for trainers & add more doc strings

* Revert a unwanted change

* Use TrainWorker to do the actual calculations.

* Some minor redesign of the worker's abstraction

* Add set/get_policy_state_dict back

* Refine set/get_policy_state_dict

* Polish policy trainers

move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg

* Rl v3 data parallel grad worker (#432)

* Fit new `trainer_worker` in `grad_worker` and `task_queue`.

* Add batch dispatch

* Add `tensor_dict` for task submit interface

* Move `_remote_learn` to `AbsTrainWorker`.

* Complement docstring for task queue and trainer.

* dsitributed training pipeline draft

* added temporary test files for review purposes

* Several code style refinements (#451)

* Polish rl_v3/utils/

* Polish rl_v3/distributed/

* Polish rl_v3/policy_trainer/abs_trainer.py

* fixed merge conflicts

* unified sync and async interfaces

* refactored rl_v3; refinement in progress

* Finish the runnable pipeline under new design

* Remove outdated files; refine class names; optimize imports;

* Lint

* Minor maddpg related refinement

* Lint

Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Miner bug fix

* Coroutine-related bug fix ("get_policy_state") (#452)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* deleted unwanted folder

* removed unwanted changes

* resolved PR452 comments

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Quick fix

* Redesign experience recording logic (#453)

* Two not important fix

* Temp draft. Prepare to WFH

* Done

* Lint

* Lint

* Calculating advantages / returns (#454)

* V1.0

* Complete DDPG

* Rl v3 hanging issue fix (#455)

* fixed rebase conflicts

* renamed get_policy_func_dict to policy_creator

* unified worker interfaces

* recovered some files

* dist training …
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants