Update features in v0.2 into branch master to release a new version by Jinyu-W · Pull Request #297 · microsoft/maro

Jinyu-W · 2021-03-21T12:28:40Z

Description

Update the master branch with the v0.2 features: backend support, the VM scheduling scenario, the RL toolkit, the distributed toolkit, and the visualization tool.

Linked issue(s)/Pull request(s)

Backend related:
- Vector env support (Add vector env support #266)
- Add slot filter functions for node attribute (Add slot filter functions for node attribute #273)
- Refine joint decision sequential action mode (Refine joint decision sequential action mode #219)
- Add dynamic node support (V0.2 backend dynamic node support #172)
VM scheduling scenario related:
- Add oversubscription feature (V0.2 vm oversubscription #246), (V0.2 vm oversub docs #256)
- Add hierarchical VM region architecture support (V0.2 vm region support #258)
- Fix bug of auto-downloading data (V0.2 vm scheduling decision event #257)
- Pricing model and energy model update (Add the price model #286)
- Rule-based algorithm examples (v0.2_rule_based_algorithm #255), (V0.2 rule based algorithm readme #282)
RL toolkit related:
- Add DDPG algorithm (V0.2 ddpg #252)
- Merge algorithm with agent (V0.2 merge algorithm into agent #259)
- Distributed framework update (V0.2_refactored_distributed_framework #206)
CLI related:
- CLI refactoring (V0.2 cli refactoring #227)
- Feature: Add a CLI command to create a new project (Feature: Add a cli command to support create new project. #279)
- Add Env-Geographic visualization tool, CIM hello as example (Maro geo vis #291), (Maro Geographic Tool Doc Update #294), (Maro geo vis Data Update #295), (Maro Dashboard Vis Doc Update #298)
- CLI visualization support and maro grass local mode (CLI visualization support and maro grass local mode #277)

Type of Change

Related Component

Simulation toolkit
RL toolkit
Distributed toolkit

Has Been Tested

OS:
- Windows
- Mac OS
- Linux
Python version:
- 3.6
- 3.7
Key information snapshot(s):

Needs Follow Up Actions

New release package
New docker image

Checklist

Add/update the related comments
Add/update the related tests
Add/update the related documentation
Update the dependent downstream modules usage

* feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review

* fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check

* added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments

* refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com>
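The commits above mention logical operator overloading for the early stopping checker and a mean value checker. A minimal sketch of how such checkers could be combined is given here. The class and function names (`EarlyStoppingChecker`, `mean_value_checker`, `plateau_checker`) are hypothetical and are not taken from the MARO code base; only the idea of composing checkers with `&` and `|` comes from the commit messages.

```python
from statistics import mean
from typing import Callable, Sequence


class EarlyStoppingChecker:
    """Wraps a predicate over a metric history (hypothetical, not MARO's class)."""

    def __init__(self, predicate: Callable[[Sequence[float]], bool]):
        self._predicate = predicate

    def __call__(self, history: Sequence[float]) -> bool:
        return self._predicate(history)

    def __and__(self, other: "EarlyStoppingChecker") -> "EarlyStoppingChecker":
        # Stop only when both checkers agree.
        return EarlyStoppingChecker(lambda h: self(h) and other(h))

    def __or__(self, other: "EarlyStoppingChecker") -> "EarlyStoppingChecker":
        # Stop as soon as either checker fires.
        return EarlyStoppingChecker(lambda h: self(h) or other(h))


def mean_value_checker(window: int, threshold: float) -> EarlyStoppingChecker:
    """Fires once the mean of the last `window` metrics reaches `threshold`."""
    def predicate(history: Sequence[float]) -> bool:
        return len(history) >= window and mean(history[-window:]) >= threshold
    return EarlyStoppingChecker(predicate)


def plateau_checker(window: int, tolerance: float) -> EarlyStoppingChecker:
    """Fires once the last `window` metrics vary by at most `tolerance`."""
    def predicate(history: Sequence[float]) -> bool:
        if len(history) < window:
            return False
        recent = history[-window:]
        return max(recent) - min(recent) <= tolerance
    return EarlyStoppingChecker(predicate)


# Stop when performance is both high and stable over the last 3 episodes.
should_stop = mean_value_checker(3, 0.9) & plateau_checker(3, 0.05)
```

Overloading `&` and `|` keeps the combined condition itself a checker, so it can be passed wherever a single checker is expected.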

* replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
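The commits above add dueling and double DQN features. The standard formulas behind both can be sketched in plain Python as below. This is an illustration of the textbook techniques only, not the MARO implementation (which is built on PyTorch); the function names and the list-based inputs are assumptions made for the example.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    done: bool,
) -> float:
    """Double DQN target: the online network picks the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

With `next_q_online = [0.1, 0.9]` and `next_q_target = [4.0, 2.0]`, the double DQN target bootstraps from action 1 (value 2.0), whereas a vanilla DQN target would take the maximum of the target network (4.0). Decoupling selection from evaluation is what reduces the overestimation bias.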

* feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>

* merged algorithm with agent * bug fixes * fix * bug fixes * fixed lint issues and renamed models to model * removed exp pool type spec in AbsAgent * fixed lint issues * dqn exp pool bug fix * minor issues * updated notebooks and examples according to rl toolkit changes * updated images * moved exp pool init inside DQN * renamed column_based_store to simple_store * fixed lint issues * fixed lint issues * lint issue fix * lint issue fix * fixed bugs in test_store * typo fix * minor edits * lint issue fix * 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. removed input_dim and output_dim properties from LearningModel * updated notebook * removed simple agent manager * fixed lint issues * fixed lint issues * bug fix * refined LearningModel * updated cim example doc * lint issue fix * small refinements * replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms * refactored gnn example and added single-process script * removed obsolete files from gnn * lint issue fix * formatting * 1. moved early stopping logic inside scheduler; 2. added scheduler options for optimizers in learning-model * minor formatting fixes * refinement * rm unwanted import * add List typing in schedular * lint issue fix * removed redundant parameters for GNNBasedACModel * restored duration to 1120 Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wesley <Wenlei.Shi@microsoft.com>

* 1st version * make vectorenv can import under module root * allow outside control which environment to push, so we do not need to control the tick for each environments * remove comment * lint fixing * add test for vector env, correct the batch number * lint fixing * reduce parameters * Update vector env ut to test if support raw backend * correct comments on hello * fix review comments, cim actiontype wip * add a compatiable way to handle ActionType for cim scenario * lint fix * correct the action type to handle previous action * add doc string for wrappers Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* rule_based_algorithm * revise_the_code_by_aiming_hao * revise_the_code_by_aiming_hao * use the np.argmin * Update best_fit.py fix the "np not defined" * refine the code * fix the error * refine the code * fix the error * fix the error * refine the code * remove the history * refine the code * update first_fit * Refine the code Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>
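The rule-based commits above refer to first-fit and best-fit placement (the latter using `np.argmin`). A minimal standard-library sketch of the two rules follows, assuming each physical machine is described only by its number of free cores. The function names and the list-of-free-cores representation are illustrative assumptions, not the API of the MARO examples.

```python
from typing import List, Optional


def first_fit(free_cores: List[int], requested: int) -> Optional[int]:
    """Return the first machine index with enough free cores, or None."""
    for pm_id, free in enumerate(free_cores):
        if free >= requested:
            return pm_id
    return None


def best_fit(free_cores: List[int], requested: int) -> Optional[int]:
    """Return the machine that leaves the fewest cores unused, or None."""
    candidates = [
        (free - requested, pm_id)
        for pm_id, free in enumerate(free_cores)
        if free >= requested
    ]
    if not candidates:
        return None
    # min() plays the role of argmin over the leftover capacity;
    # ties are broken by the lower machine index.
    return min(candidates)[1]
```

For `free_cores = [8, 2, 4, 16]` and a 4-core request, first-fit picks machine 0 while best-fit picks machine 2, which is the tightest placement.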

* add where filter for general usage * test for general filter * simpler comparison for attribute * filter on raw * fix array fetch bug * ut for base comparison * lint fix * remove unused variables * update ignore
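The commits above add a `where` filter and simpler comparisons for node attributes. The general idea, filtering the slots of an attribute and getting back the matching slot indices, can be sketched as below. `SlotView` is a hypothetical stand-in written for this example; it does not reproduce the backend's actual classes or return types.

```python
import operator
from typing import Callable, List, Sequence


class SlotView:
    """Hypothetical view over the slots of one node attribute."""

    def __init__(self, values: Sequence[float]):
        self._values = list(values)

    def where(self, predicate: Callable[[float], bool]) -> List[int]:
        """Return the indices of the slots for which `predicate` holds."""
        return [i for i, v in enumerate(self._values) if predicate(v)]

    def _compare(self, op: Callable[[float, float], bool], other: float) -> List[int]:
        return self.where(lambda v: op(v, other))

    # Comparison operators are shortcuts for the common where() cases.
    def __lt__(self, other: float) -> List[int]:
        return self._compare(operator.lt, other)

    def __le__(self, other: float) -> List[int]:
        return self._compare(operator.le, other)

    def __gt__(self, other: float) -> List[int]:
        return self._compare(operator.gt, other)

    def __ge__(self, other: float) -> List[int]:
        return self._compare(operator.ge, other)


slots = SlotView([3, 0, 7, 0])
occupied = slots > 0                      # indices of non-empty slots
empty = slots.where(lambda v: v == 0)     # general predicate form
```

Returning indices rather than values lets the caller use the result to read or update the same slots of other attributes.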

* Region init * Add region, zone, cluster * Fix bug * Add update parent id * Update PM config * Update number * Fix import order * Fix bug * Modify config * Add cluster attribute * Refine naming * Fix bug * Modify 336k config * Update region * Update config * Update pm config * pylint * Add comment * Update based on PR comment * Modify config and zone class * Add unit test * Update region part * Update pylint * Modify unit test * Refactor region structure * Add comment and fix style * Fix machine num bugs * Modify config * Fix style * Fix bugs and add empty machine attributes * Add update upper level metrics * Update config * Fix lint style * Modify doc strings * Fix amount counter * Update unit test * fix lint style * Update the ids init * Init total and empty machine num * Update lint style * Fix snapshot attributes initial state * Update config * add topologies for over-subscription and multi-cluster to be compatible with the previous topologies * Add simulation result * Move readme * Add overload results Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add README.md and refine the bin_packing algorithm * refine round_robin and bin_packing * Update README.md * Refine the code and README.md * Refine the bin_packing and round_robin * Refine the code Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>

* maro project new * remove maro project run * add get_metrics to template * add license * more comments * lint issue fix * linting issue fix * fix linting issue * linting issue fix * remove unused code gen * include template files * fix incorrect comment * include topologies for vm_scheduling scenario * rename to PositiveNumberValidator * refine command line comment * refine topology command comment * add a simple doc for new command * fix incorrect value for dummy frame * correct issues in docs * more comments on set_state * doc issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * refactor: extract reusable methods to GrassExecutor * feat: refine validation.py and add docstrings * fix: add remote prefix to ssh function * style: refine logging output * fix: extract param 'vm_name' * fix: linting errors * feat: add NodeStatus and ContainerStatus at executors * feat: use master_node_size as the size of build_node_image_vm * fix: refine comments * feat: add "state" key for node_details * fix: linting errors * fix: deployment error when ssh_port is the default port * refactor: extract utils/*.py in scripts * style: single quote to double quote * refactor: refine folder structure of scripts * fix: linting errors * fix: add executable to fix error initialization * refactor: use SubProcess to execute commands in scripts * refactor: refine script namings * refactor: extract utils/*.py and systemd/*.service in agents * feat: refine Exception structure, add SubProcess class in agents * feat: use psutil to get resource details, move resource details initialization to agents * fix: linting errors * feat: use docker sdk in node_agent * feat: extract RedisExecutor in agents * test: remove image when tearing down * feat: add LoadImageAgent * feat: move node status update to agents * refactor: move utils folder to upper level in scripts * feat: add node_api_server, refine agents folder structure * fix: linting errors * refactor: refine folder structure in grass/lib * refactor: build DeploymentValidator class * refactor: create DetailsReader, DetailsWriter, delete sync mode * refactor: rename DockerManager to DockerController * refactor: rename RedisManager to RedisController * refactor: rename AzureExecutor to AzureController * refactor: create NameCreator * refactor: create PathConvertor * refactor: rename checkers to details_validity_wrapper * refactor: rename lock to operation_lock_wrapper * refactor: create FileSynchronizer * refactor: create redis instance in RedisController * feat: add master_api_server, move job related scripts to api_server * refactor: move node related scripts to api_server * fix: use "DELETE" instead of "DEL" as http method * refactor: use mapping names instead of namings like "sths_details" * feat: move master related scripts to api_server * feat: move containers related scripts to api_server * fix: add gracefully wait for remote_start_master_services * feat: move image_files related scripts to api_server * fix: improper test in the training stage * refactor: use local variable "URL_PREFIX" directly, add 's' in node_api_client * refactor: refine namings in services * feat: move clean related scripts to api_server * refactor: delete "public_key" field * feat: build MasterApiClient * refactor: delete sync_mkdir * feat: refine locks in node_details * feat: build DockerController for grass/utils * refactor: rename Extractor to Controller * feat: move schedule related components to api_server * fix: incorrect allocation when starting batch jobs * fix: missing field "containers" in job_details * feat: add delete_job in master_api_server * feat: add logger in agents * fix: no "resources" field when scale up node at the very beginning * feat: use Process back instead of Thread in node_agent * feat: add 'v1' prefix to api_servers' urls * refactor: move lib/aks under lib/clouds * refactor: move lib/k8s_configs to lib/configs, move aks related configs to clouds/aks, delete volumn mount in redis * feat: extract K8sExecutor * fix: add one more searching layer of pakcage_data at maro.cli.k8s * refactor: move lib/configs/nvidia to lib/clouds/aks, make create() as a staticmethod at k8s mode * refactor: move id init to standardize_create_deployment in grass/azure mode * fix: use GlobalParams instead of hard-coded data * feat: build K8sDetailsReader, K8sDetailsWriter * feat: use k8s sdk to replace subprocess call * refactor: delete redundant vars * refactor: move more methods to K8sExecutor * test: use legal naming in tests/cli/k8s * refactor: refine logging messages * refactor: make create() as a staticmethod at grass/azure mode, refine logging messages * feat: build ArmTemplateParameterBuilder in K8sAzureExecutor * refactor: remove redundant params * refactor: rename /clouds to /modes * refactor: refine structures and logging messages in GrassExecutor * feat: add 'PENDING' to NodeStatus * feat: refine build_job_details for create schedule in grass/azure * feat: refine build_job_details for create schedule in k8s/aks * add grass local mode (non-pass) * feat: use node_join schema in grass/azure * refactor: replace /.maro with /.maro-shared, replace admin_username with node_username, remove redundant snippets in /grass/lib/scirpts * refactor: add 'ssh', 'api_server' into master_details and node_details * refactor: move master runtine params initialization into api_server * refactor: refine namings * feat: reconstruct grass/on-premises with new schema * refactor: delete field 'user' in grass_azure_create * refactor: rename 'blueprints_v1' to 'blueprints' * refactor: move some GlobalPaths to subfolders * Update grass local mode, run pass * refactor: replace 'connection' field with 'master' or 'node' * refactor: move start_service scripts to init_master.py * refactor: rename grass/master/release to grass/master/delete_master * refactor: load local_details in node services, refine script namings * refactor: move invocations of start_node and stop node to api server * fix: add missing imports * refactor: rename SubProcess to Subprocess * refactor: delete field 'user' in k8s_aks_create * add resource class * refactor: refine folder structures in /.maro/clusters/cluster * refactor: move /logs to /clusters/{cluster_name} * refactor: refine filenames * fix: export DEBIAN_FRONTEND=noninteractive to reduce irrelevant warnings * refactor: refine code structures, delete redundant code * refactor: change /{cluster_name}/details.yml to /{cluster_name}/cluster_details.yml * feat: add rsa+aes data encryption on dev-master communication * fix: change MasterApiClient to RedisController in node-related services and scripts * refactor: remove all "{cluster_name}" in redis keys * refactor: extract init_master and create_user to GrassExecutor * test: refine tests in grass/azure and k8s/aks * refactor: refine ArmTemplateParameterBuilder * add cli visible agent * feat: change the order of installation in init_build_node_image_vm.py * fix: add user/admin_id to grass_on_premises_create.yml * fix: change outdated container names * feat: add standardize_join_cluster_deployment in grass/on-premises * feat: add init_node_runtime_env in join_cluster.py * refactor: refine code structure in join_cluster.py * test: add TestGrassOnPremises * refactor: refine ARM templates * fix: linting errors * fix: test requirements error * fix: arm linting errors * refactor: late import in grass, k8s * style: refine load_parser_grass * style: refine load_parser_k8s * add jobstate and resource usage support * add local visible test * docs: update orchestrations * fix: fix get_job_logs * docs: add docs for GrassAzureExecutor, GrassExecutor * docs: add docs for GrassOnPremisesExecutor * docs: add docs for /grass/scripts * docs: add docs for /grass/services * docs: add docs for /grass/utils * docs: add docs for k8s * grass mode visible pass * grass local mode run pass * fixed pylint * Update resource, rm GPUtil depend * Update CLI local mode visible * grass local mode pass * add redis clear and pylint fixed * rm job status in grass azure mode * fix bug * fixed merge issue * fixed lin * update by pr comments * fixed isort issue * fixed stop bug * fixed local agent and cmp issue * fixed pending job cannot killed * add mount in Grass local mode * add resource check interval in redis Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com>

* streamit with questdb * script to import current dump data, except attention file, use influxdb line protocol for batch sending. * refine the interface to flatten dictionary * add messagetype.file to upload file later * correct tag name * correct the way to initial streamit, make it possible to use it any where after start * add data collecting in cim business engine * streamit client refactoring * fix import issue * update cim hello world, with a commented code to enable vis data streaming * fix metric replace bug * refactor the type checking code * V0.2 remove props from be (#269) * Fix bug * fix bu * Master vm doc - data preparation (#285) * Update vm docs * Update docs * Update data preparation docs * Update * Update docs Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * maro geo vis * add new line * doc update * lint refine * lint update * lint updata * lint update * lint update * lint update * code revert * add declare * code revert * add new line * add comment * delete surplus * delete core * lint update * lint update * lint update * lint update * specify version * lint update * specify docker version * import sort * backend revert * Delete test.py * format refact * doc update * import orders * change import orders * change import orders * add version of http-server * add specified port * delete print * lint update * lint update * lint update * update doc * dependecy update * update business engine * business engine * business engine update Co-authored-by: chaosyu <chaos.you@gmail.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Kuan Wei Yu <v-kyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* streamit with questdb * script to import current dump data, except attention file, use influxdb line protocol for batch sending. * refine the interface to flatten dictionary * add messagetype.file to upload file later * correct tag name * correct the way to initial streamit, make it possible to use it any where after start * add data collecting in cim business engine * streamit client refactoring * fix import issue * update cim hello world, with a commented code to enable vis data streaming * fix metric replace bug * refactor the type checking code * maro geo vis * add new line * doc update * lint refine * lint update * lint updata * lint update * lint update * lint update * code revert * add declare * code revert * add new line * add comment * delete surplus * delete core * lint update * lint update * lint update * lint update * specify version * lint update * specify docker version * import sort * backend revert * Delete test.py * format refact * doc update * import orders * change import orders * change import orders * add version of http-server * add specified port * delete print * lint update * lint update * lint update * update doc * dependecy update * update business engine * business engine * business engine update * doc update * delete irelevant file * update data Co-authored-by: chaosyu <chaos.you@gmail.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* added some more logs for dist RL * bug fix * fixed a typo * bug fix * refined logs * set session_id to None for exit message * add setup/clear/template for maro process * changed to internal logger for actor and learner * removed redundant component name from internal logs * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * fixed typos * update ProcessInternalError * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * updated notebook * 1. removed external loggers from cim example; 2. fixed batch inference bugs * removed actor_trainer mode and refactored * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * fixed conflicts * fixed typos * removed stale imports * fixed stale naming * removed dist_topologies folder * refined session id logic * bug fix * refactored * distributed RL refinement * refined * small bug fix * fixed lint issues * fixed lint issues * removed unwanted file * fixed a typo * gnn refactoring in progress * merged algorithm with agent * bug fixes * fix * bug fixes * fixed lint issues and renamed models to model * removed unwanted files * fixed merge conflicts * removed exp pool type spec in AbsAgent * fixed lint issues * changed to a single gnn agent * dqn exp pool bug fix * minor issues * removed GNNAgentManager * updated notebooks and examples according to rl toolkit changes * updated images * moved exp pool init inside DQN * renamed column_based_store to simple_store * mroe gnn refactoring * fixed lint issues * fixed lint issues * lint issue fix * lint issue fix * fixed bugs in test_store * typo fix * minor edits * lint issue fix * finished single process gnn * fixed bugs * 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. removed input_dim and output_dim properties from LearningModel * updated notebook * removed simple agent manager * fixed lint issues * fixed lint issues * bug fix * bug fixes * refined LearningModel * modified gnn example based on latest rl toolkit changes * updated cim example doc * lint issue fix * small refinements * refactored GNN example * replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms * refactored gnn example and added single-process script * removed obsolete files from gnn * lint issue fix * formatting * checked out gnn files from origin/v0.2 * refactored distributed rl toolkit * finished distributed rl refactoring and updated dqn example and notebook * merged request_rollout with collect * some refinement * refactored examples * distributed rl revamping complete * bug and formatting fixes * bug fixes * hid proxy instantiation inside dist components * small refinement * refined distributed RL and updated docs * updated docs and notebook * rm unwanted imports * added missing files * rm unwanted files * lint issue fix * bug fix * example doc update * rm agent_manager.svg * updated images * updated image file name in doc * revamped cim example code structure * added missing file * restored default training config for dqn and ac-gnn * added default loss function for actor-critic * rm unwanted import * updated README for cim/ac * removed log_p param for PolicyGradient train() * added READMEs for CIM * renamed ac-gnn to ac_gnn * updated README for CIM and added set_seeds to multi-process dqn * init * remove unit, make it same as logic * init by sku, world sku * init by sku, world sku * remove debug code * correct snapshot number issue * rename logic to unit, make it meaningful * add facility base * refine naming * refine the code, more comment to make it easy to read * add supplier facility, logic not tested yet * fix bug in facility initialize, add consumerunit not completed * refactoring the facilities in world config * add consumer for warehouse facility * add upstream topology, and save it state * add mapping from id to data model index * logic without reward of consumer * bug fix * seller unit * use tcod for path finding * retailer facility * bug fix, show seller demands in example * add a interactive and renderable env wrapper to later debugging * move font to subfolder with lisence to make it more clearly * add more details for node mapping * dispatch action by unit id * merge the frame changes to support data model inherit * add action for consumer, so that we can push the requirement * add unit id and facility in state for unit, add storage id for manufacture unit to simple the state retrieving * show manufacture related debug info step by step * add bom info for debug * add x,y to facility, bug fix * fix bugs in transport and distribution unit, correct the path finding issue * show vehicle movement in screen * remove completed todo * fix vehicle location issue, make all units and data model class from configs * show more states * fix slot number bug for dynamic backend * rename suppliers to manufactures * add missing file * remove code config, use yml instead * add 2 different step modes * update changes * rename manufacture * add action for manufacture unit * more attribute for states * add balance sheet * rename attribute to unify the feature name * reverted experimental changes in dqn learner * updated notebook * rm supply chain code * lint issue fix * lint issue fix * added missing file * added general rollout workflow and trajectory class * refactored * more refactoring * checked out backend from v0.2 * checked out setup.py from v0.2 Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> Co-authored-by: chaosyu <chaos.you@gmail.com>

* Add the price model
* fix the error
* Refine the energy consumption
* Fix the error
* Delete business_engine_20210225104622.py
* Delete
* Delete the history file
* Delete common_20210205152100.py
* Delete common_20210302150646.py
* Refine the code
* Refine the code
* Refine the code
* Delete history files
* Fix the error
* Fix the error
* Fix the error
* Fix the error
* Fix the error
* Fix the error
* refine the code
* Refine the code
* Delete the history file
* Fix the error
* Fix the error
* Fix the error
* Refine the code
* fix the error
* fix the error
* fix the error
* Refine the code
* Add toy files
* Refine the code
* Refine the code
* Add file
* Refine the code

Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

codecov · 2021-03-21T12:28:58Z

Codecov Report

Merging #297 (ed0ff75) into master (9b1aca9) will increase coverage by 3.92%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #297      +/-   ##
==========================================
+ Coverage   64.37%   68.29%   +3.92%     
==========================================
  Files         113      115       +2     
  Lines        5553     6123     +570     
==========================================
+ Hits         3575     4182     +607     
+ Misses       1978     1941      -37
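The percentages in the diff above can be reproduced from the Hits and Lines rows. The sketch below is a check of that arithmetic only; the helper names are hypothetical, and the use of truncation (rather than rounding) to two decimals is an assumption inferred from the reported figures, not taken from Codecov's documentation.

```python
def coverage_basis_points(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (assumed, see lead-in)."""
    return hits * 10_000 // lines


def fmt(basis_points: int) -> str:
    """Render hundredths of a percent as a string such as '64.37%'."""
    return f"{basis_points // 100}.{basis_points % 100:02d}%"


# Figures copied from the coverage diff: (hits, lines) for master and for #297.
base = coverage_basis_points(3575, 5553)
head = coverage_basis_points(4182, 6123)
delta = head - base

print(fmt(base), fmt(head), fmt(delta))

# Misses are simply lines minus hits.
assert 5553 - 3575 == 1978
assert 6123 - 4182 == 1941
```

Integer arithmetic is used so that the difference between the two percentages is exact and does not pick up floating-point noise.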

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `68.29% <ø> (+3.92%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| maro/utils/logger.py | `44.27% <0.00%> (-20.62%)` | ⬇️ |
| maro/communication/message.py | `67.39% <0.00%> (-19.46%)` | ⬇️ |
| maro/simulator/scenarios/cim/business_engine.py | `78.54% <0.00%> (-7.28%)` | ⬇️ |
| maro/simulator/scenarios/citi_bike/common.py | `64.81% <0.00%> (-6.02%)` | ⬇️ |
| maro/rl/exploration/epsilon_greedy_explorer.py | `47.05% <0.00%> (-2.95%)` | ⬇️ |
| maro/communication/registry_table.py | `91.53% <0.00%> (-2.73%)` | ⬇️ |
| maro/rl/agent/abs_agent.py | `52.00% <0.00%> (-1.34%)` | ⬇️ |
| maro/communication/driver/zmq_driver.py | `21.18% <0.00%> (-0.56%)` | ⬇️ |
| maro/communication/proxy.py | `22.69% <0.00%> (-0.41%)` | ⬇️ |
| maro/utils/utils.py | `48.80% <0.00%> (ø)` | |

... and 56 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9b1aca9...ed0ff75.

Meroy9819 · 2021-03-22T03:33:58Z

* streamit with questdb
* script to import current dump data, except attention file, use influxdb line protocol for batch sending.
* refine the interface to flatten dictionary
* add messagetype.file to upload file later
* correct tag name
* correct the way to initial streamit, make it possible to use it any where after start
* add data collecting in cim business engine
* streamit client refactoring
* fix import issue
* update cim hello world, with a commented code to enable vis data streaming
* fix metric replace bug
* refactor the type checking code
* maro geo vis
* add new line
* doc update
* lint refine
* lint update
* lint updata
* lint update
* lint update
* lint update
* code revert
* add declare
* code revert
* add new line
* add comment
* delete surplus
* delete core
* lint update
* lint update
* lint update
* lint update
* specify version
* lint update
* specify docker version
* import sort
* backend revert
* Delete test.py
* format refact
* doc update
* import orders
* change import orders
* change import orders
* add version of http-server
* add specified port
* delete print
* lint update
* lint update
* lint update
* update doc
* dependecy update
* update business engine
* business engine
* business engine update
* doc update
* delete irelevant file
* update data
* doc update

Co-authored-by: chaosyu <chaos.you@gmail.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* fixed internal logger dumplicated output
* delete unused import
* fixed isort
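The streamit commits listed above mention flattening a dictionary and sending batches in InfluxDB line protocol. As a rough illustration of that idea only, here is a minimal sketch; `flatten_dict` and `to_line_protocol` are hypothetical names, not MARO's actual streamit interface, and escaping of spaces, commas and quotes is deliberately left out.

```python
from typing import Any, Dict


def flatten_dict(nested: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dictionaries into one level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_dict(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def format_field(value: Any) -> str:
    """Format one field value: booleans, integers (with `i` suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return f'"{value}"'


def to_line_protocol(measurement: str, tags: Dict[str, str], fields: Dict[str, Any], timestamp: int) -> str:
    """Build one line: measurement,tag=value field=value timestamp (no escaping)."""
    tag_part = "".join(f",{k}={v}" for k, v in tags.items())
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in flatten_dict(fields).items())
    return f"{measurement}{tag_part} {field_part} {timestamp}"


# Hypothetical sample data, for illustration only.
line = to_line_protocol("port", {"name": "a"}, {"cargo": {"empty": 3, "ratio": 0.5}}, 1)
print(line)
```

A batch would then be several such lines joined with newlines before sending.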

Arthur Jiang and others added 30 commits October 1, 2020 15:08

refine readme

95c6413

Merge branch 'v0.1' of https://github.com/microsoft/maro into v0.1

ddd15a5

Merge branch 'master' into v0.1

de6f89d

feat: refine data push/pull (#138)

b3c01ca

* feat: refine data push/pull
* test: add cli provision testing
* fix: style fix
* fix: add necessary comments
* fix: from code review

Merge branch 'master' into v0.1

080714c

Merge branch 'master' into v0.1

1fa9854

Merge branch 'master' into v0.1

ef7f870

added example docs (#136)

5bfc5eb

* added example docs
* added citibike greedy example doc
* modified citibike doc
* fixed PR comments
* fixed more PR comments
* fixed small formatting issue

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

Merge branch 'master' into v0.1

11714ea

Merge branch 'v0.1' of https://github.com/microsoft/maro into v0.1

71c7e5b

Merge branch 'master' into v0.1

1c9e60d

Merge branch 'master' into v0.1

0b11548

Merge branch 'master' into v0.2

5a0e622

Merge branch 'master' into v0.2

1f6d5a7

Merge branch 'v0.1' into v0.2

dd2bc2b

merge with master

577bb1c

Merge branch 'master' into v0.2

2625427

merge master into this branch; update according to isort

977891f

update according to flake8

fb26bbf

fixed a bug in learner's test() (#193)

9afaac4

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

ysqyang and others added 19 commits February 1, 2021 16:57

delete duplicated rule based algorithms for VM scheduling

b66a5bd

Add slot filter functions for node attribute (#273)

8b0f8c9

* add where filter for general usage
* test for general filter
* simpler comparison for attribute
* filter on raw
* fix array fetch bug
* ut for base comparison
* lint fix
* remove unused variables
* update ignore

Merge branch 'master' into v0.2

ab7b5eb

Fix coding style (#284)

f2c82fb

Merge branch 'master' into v0.2

7195f83

add vm_scheduling meta into package data

68096f5

Merge branch 'v0.2' of github.com:microsoft/maro into v0.2

b02e565

Jinyu-W requested review from chaosddp and ysqyang March 21, 2021 12:28

Jinyu-W changed the title from "Update features in v0.2 into branch master" to "Update features in v0.2 into branch master to release a new version" Mar 21, 2021

ysqyang previously approved these changes Mar 22, 2021

fixed internal logger dumplicated output (#299)

ed0ff75

* fixed internal logger dumplicated output
* delete unused import
* fixed isort

Jinyu-W dismissed ysqyang’s stale review via ed0ff75 March 22, 2021 06:42

chaosddp approved these changes Mar 22, 2021

Jinyu-W merged commit cee5277 into master Mar 22, 2021

Jinyu-W merged 99 commits into master from v0.2

11 participants
