feat: refine data push/pull#138
Conversation
Codecov Report
@@ Coverage Diff @@
## v0.1 #138 +/- ##
=======================================
Coverage 72.36% 72.36%
=======================================
Files 87 87
Lines 4089 4089
=======================================
Hits 2959 2959
Misses 1130 1130
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
Jinyu-W
left a comment
There was a problem hiding this comment.
- update with the base v0.1
- check the test status of this PR
Codecov Report
@@ Coverage Diff @@
## v0.1 #138 +/- ##
=======================================
Coverage 72.36% 72.36%
=======================================
Files 87 87
Lines 4089 4089
=======================================
Hits 2959 2959
Misses 1130 1130
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
| class TestGrass(unittest.TestCase): | ||
| """Tests for Grass Mode. | ||
|
|
||
| Tests should be executed in specific order, |
There was a problem hiding this comment.
import unittest
unittest.TestLoader.sortTestMethodsUsing = NoneThis one may also work for this case.
ref: https://stackoverflow.com/questions/4095319/unittest-tests-order/22317851#22317851
There was a problem hiding this comment.
apparently, a better way is to define a sorting function to determine the strategy we want to order, and that will incur some redundant workload. So we are not going to change it at this time.
* refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Romic Huang <romic.kid@gmail.com>
* refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com>
* refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
…#174) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.1 issues in CLI (#157) * fix: add docs for AzCopy * style: refine code style * fix: change stdout format to previous version * test: add dqn tests in grass/k8s mode && add "maro k8s job list" * docs: add scripts for installing AzCopy * v0.1 update citi bike doc and data version (#160) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * update data pipeline docs and data version * update for comments * update for comments * update for lint * update for lint * update for lint * V0.1revert data version to 0.1 (#164) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * update data pipeline docs and data version * update for comments * update for comments * update for lint * update for lint * update for lint * revert data version to 0.1 * initalize gnn network for CIM problem (#60) * initalize gnn netowrk for ECR problem * fix import path error and remove useless function * add the annotation for gnn and state shaper * rename to cim * modify topology 22p * polish coding style in cim.gnn * run pass after polishing * change the name of simplegat to simple_transformer * fix typo and suggestions * fix format * fix code bugs due to refactoring mistake * refine the code format * refine again * refine the format * refine again * refine again * fix syntax bug * refine again * remove useless comments * remove unnecessary blank line * rename pending_event to decision_event * remove some code faced to the future. * change topology * fix bugs in learner: no training logic executed * revert the topology changes Co-authored-by: wenshi-113 <wenshi@microsoft.com> * V0.1 store unpicklable bug fix (#166) * fixed unpicklable store bug * fixed a bug * fixed a bug * fixed a bug * fixed lint formatting * fixed a lint formatting issue * fixed a bug * renamed dump_experience_store to dump_experience_pool * fixed a PR comment Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.1 confusing component naming fix (#167) * confusing component naming fix * fixed a bug * fixed a bug * fixed lint formatting issues * fixed a PR comment * replaced 'actor_worker' with 'actor' * fixed a minor formatting issue * fixed a minor formatting issue * fixed a minor formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * version updated to 0.1.2a0 * modified according to flake8 * update according to lint checker * update import format according to lint checker * update according to lint checker * isort config added * add isort config file path in lint * update import according to lint isort * update import according to lint isort Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: WesleyStone <4548088+wesley-stone@users.noreply.github.com> Co-authored-by: wenshi-113 <wenshi@microsoft.com>
* refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.1 issues in CLI (#157) * fix: add docs for AzCopy * style: refine code style * fix: change stdout format to previous version * test: add dqn tests in grass/k8s mode && add "maro k8s job list" * docs: add scripts for installing AzCopy * v0.1 update citi bike doc and data version (#160) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * update data pipeline docs and data version * update for comments * update for comments * update for lint * update for lint * update for lint * V0.1revert data version to 0.1 (#164) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * update data pipeline docs and data version * update for comments * update for comments * update for lint * update for lint * update for lint * revert data version to 0.1 * initalize gnn network for CIM problem (#60) * initalize gnn netowrk for ECR problem * fix import path error and remove useless function * add the annotation for gnn and state shaper * rename to cim * modify topology 22p * polish coding style in cim.gnn * run pass after polishing * change the name of simplegat to simple_transformer * fix typo and suggestions * fix format * fix code bugs due to refactoring mistake * refine the code format * refine again * refine the format * refine again * refine again * fix syntax bug * refine again * remove useless comments * remove unnecessary blank line * rename pending_event to decision_event * remove some code faced to the future. * change topology * fix bugs in learner: no training logic executed * revert the topology changes Co-authored-by: wenshi-113 <wenshi@microsoft.com> * V0.1 store unpicklable bug fix (#166) * fixed unpicklable store bug * fixed a bug * fixed a bug * fixed a bug * fixed lint formatting * fixed a lint formatting issue * fixed a bug * renamed dump_experience_store to dump_experience_pool * fixed a PR comment Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.1 confusing component naming fix (#167) * confusing component naming fix * fixed a bug * fixed a bug * fixed lint formatting issues * fixed a PR comment * replaced 'actor_worker' with 'actor' * fixed a minor formatting issue * fixed a minor formatting issue * fixed a minor formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * version updated to 0.1.2a0 * modified according to flake8 * update according to lint checker * update import format according to lint checker * update according to lint checker * isort config added * add isort config file path in lint * update import according to lint isort * update import according to lint isort * skip file added for isort linter * filter files option added for isort * V0.1 reformat imports with isort (#184) * init auto-isort * re-format maro init * redis added to isort known third party * update according to lint isort * update according to lint isort * update according to lint * update import path for communication module * update import path for cli scripts module * update according to lint * filter file option added for isort in CONTRIBUTING * update import path for cli utils and communication * update import path for tests * remove filter file option in CONTRIBUTING Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * update according to isort Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: WesleyStone <4548088+wesley-stone@users.noreply.github.com> Co-authored-by: wenshi-113 <wenshi@microsoft.com>
* refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * update param name * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * doc update * new link * image update * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * image change * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
* fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * update param name * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * doc update * new link * image update * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * image change * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * bug fix related to np array divide (#245) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Master.simple bike (#250) * notebook for simple bike repositioning added * add simple rule-based algorithms * unify input * add policy based on statistics * update be for simple bike scenario to fit latest event buffer changes (#247) * change rendered graph * figures updated * change notebook * matplot updated * figures updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: wesley <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * simple bike repositioning article: formula updated * checked out docs/source from v0.2 * aligned with v0.2 * rm unwanted import * added references in policy_optimization.py * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com> Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
* refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space …
…297) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. *…
#324) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. …
#324) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. …
* ilp for vm scheduling runnable * remove code for debug * add simple time consumption analysis to README * update comment, add logger * change to offline interface, 1 formulation for 1 tick * fix the bug caused by wrong time index in ILP constraints * fix ILP bug caused by limited pm resource * 10k bug fixed * ilp for vm scheduling runnable * remove code for debug * add simple time consumption analysis to README * update comment, add logger * change to offline interface, 1 formulation for 1 tick * fix the bug caused by wrong time index in ILP constraints * fix ILP bug caused by limited pm resource * 10k bug fixed * V0.2 merged: MARO admin tool added, backend bug fixed, CLI doc updated (#324) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixe…
* ilp for vm scheduling runnable * remove code for debug * add simple time consumption analysis to README * update comment, add logger * change to offline interface, 1 formulation for 1 tick * fix the bug caused by wrong time index in ILP constraints * fix ILP bug caused by limited pm resource * 10k bug fixed * ilp for vm scheduling runnable * remove code for debug * add simple time consumption analysis to README * update comment, add logger * change to offline interface, 1 formulation for 1 tick * fix the bug caused by wrong time index in ILP constraints * fix ILP bug caused by limited pm resource * 10k bug fixed * V0.2 merged: MARO admin tool added, backend bug fixed, CLI doc updated (#324) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixe…
* V0.2 offline ILP method for vm scheduling (#275) * ilp for vm scheduling runnable * remove code for debug * add simple time consumption analysis to README * update comment, add logger * change to offline interface, 1 formulation for 1 tick * fix the bug caused by wrong time index in ILP constraints * fix ILP bug caused by limited pm resource * 10k bug fixed * ilp for vm scheduling runnable * remove code for debug * add simple time consumption analysis to README * update comment, add logger * change to offline interface, 1 formulation for 1 tick * fix the bug caused by wrong time index in ILP constraints * fix ILP bug caused by limited pm resource * 10k bug fixed * V0.2 merged: MARO admin tool added, backend bug fixed, CLI doc updated (#324) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ …
…365) * V0.2 offline ILP method for vm scheduling (#275) * ilp for vm scheduling runnable * remove code for debug * add simple time consumption analysis to README * update comment, add logger * change to offline interface, 1 formulation for 1 tick * fix the bug caused by wrong time index in ILP constraints * fix ILP bug caused by limited pm resource * 10k bug fixed * ilp for vm scheduling runnable * remove code for debug * add simple time consumption analysis to README * update comment, add logger * change to offline interface, 1 formulation for 1 tick * fix the bug caused by wrong time index in ILP constraints * fix ILP bug caused by limited pm resource * 10k bug fixed * V0.2 merged: MARO admin tool added, backend bug fixed, CLI doc updated (#324) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher …
No description provided.