trpo_try #567
Conversation
@@ -1,7 +1,7 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.

-from .rl_component_bundle import rl_component_bundle
+from rl_component_bundle import rl_component_bundle
Revert this change, otherwise run_rl_example.py won't work.
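For context, a rough sketch of the import behaviour this refers to (assuming run_rl_example.py loads the scenario folder as the package examples.cim.rl, as the other hunks below suggest):

```python
# Relative import: resolves against the enclosing package (examples.cim.rl),
# which is how run_rl_example.py imports the scenario components.
from .rl_component_bundle import rl_component_bundle

# An absolute import such as `from rl_component_bundle import ...` only works
# when the scenario directory itself is on sys.path (e.g. the file is executed
# directly as a script), so the example runner cannot resolve it.
```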
@@ -8,7 +8,7 @@
 from maro.rl.rollout import AbsEnvSampler, CacheElement
 from maro.simulator.scenarios.cim.common import Action, ActionType, DecisionEvent

-from .config import action_shaping_conf, port_attributes, reward_shaping_conf, state_shaping_conf, vessel_attributes
+from config import action_shaping_conf, port_attributes, reward_shaping_conf, state_shaping_conf, vessel_attributes
Revert this change, otherwise run_rl_example.py won't work.
from .algorithms.ppo import get_ppo, get_ppo_policy
from examples.cim.rl.config import action_num, algorithm, env_conf, reward_shaping_conf, state_dim
from examples.cim.rl.env_sampler import CIMEnvSampler
from algorithms.ac import get_ac, get_ac_policy
Revert this change, otherwise run_rl_example.py won't work.
@@ -383,3 +383,5 @@ def to_device(self, device: torch.device) -> None:
     def _to_device_impl(self, device: torch.device) -> None:
         """Implementation of `to_device`."""
         raise NotImplementedError
+
+
Remove unnecessary blank lines (you may run pre-commit run --all to do auto-formatting).
@@ -6,6 +6,7 @@
 from .dqn import DQNParams, DQNTrainer
 from .maddpg import DiscreteMADDPGParams, DiscreteMADDPGTrainer
 from .ppo import DiscretePPOWithEntropyTrainer, PPOParams, PPOTrainer
+from .trpo import *
Do not use from XXX import *, since it is ambiguous; import the needed names explicitly instead.
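One possible explicit form; TRPOParams and TRPOTrainer are assumed names here, so use whatever the new trpo module actually defines:

```python
# Import the TRPO symbols by name so what this package re-exports stays explicit.
# TRPOParams / TRPOTrainer are assumptions about the new module's API.
from .trpo import TRPOParams, TRPOTrainer
```

If the package keeps an __all__ list, the same names can be appended there as well.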
# mask -> one 1 per action
batch = self._get_batch()
# trpo_main.update_params(batch)
for _ in range(self._params.grad_iters):
According to this pseudo-code, we should update the actor first, then update the critic?
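Roughly what that ordering would look like; _update_actor and _update_critic are placeholder names for the two update paths visible in this diff, and the batch/grad_iters usage mirrors the reviewed code:

```python
def train_step(self) -> None:
    batch = self._get_batch()
    # Per the referenced pseudo-code: one policy (actor) update first...
    self._update_actor(batch)
    # ...then several critic fitting iterations on the same batch.
    for _ in range(self._params.grad_iters):
        self._update_critic(batch)
```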
Args:
    batch (TransitionBatch): Batch.
"""
self._v_critic_net.step(self._get_critic_loss(batch))
Call self._v_critic_net.train() before updating the critic net.
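That is, something along these lines, assuming the critic net follows the usual train()/step() pattern:

```python
self._v_critic_net.train()  # switch the critic to training mode first
self._v_critic_net.step(self._get_critic_loss(batch))
```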
""" | ||
loss = self._get_actor_loss(batch) | ||
|
||
self._policy.train_step(loss) |
Call self._policy.train() before updating the policy.
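Likewise here, a small sketch of the suggested order:

```python
loss = self._get_actor_loss(batch)

self._policy.train()  # put the policy into training mode before the update
self._policy.train_step(loss)
```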
def _get_actor_loss(self, batch: TransitionBatch):
    assert isinstance(self._policy, DiscretePolicyGradient) or isinstance(self._policy, ContinuousRLPolicy)
    self._policy.train()
    rewards = ndarray_to_tensor(batch.rewards)
Specify the device in ndarray_to_tensor.
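For example (assuming ndarray_to_tensor takes a device argument and the trainer stores its device as self._device):

```python
# Convert on the trainer's device so later tensor ops do not mix devices.
rewards = ndarray_to_tensor(batch.rewards, device=self._device)
```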
prev_value = 0
prev_advantage = 0

for i in reversed(range(rewards.size(0))):
Should this be done in preprocess_batch?
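A hypothetical sketch of what moving this into a preprocessing hook could look like; the method name, the batch fields (rewards, values, terminals) and the hyperparameters (reward_discount, lam) are assumptions rather than the actual MARO API, and TransitionBatch is taken to be the type already imported in this module:

```python
import numpy as np


def _preprocess_batch(self, batch: TransitionBatch) -> TransitionBatch:
    """Compute GAE-style advantages once, before the gradient iterations."""
    rewards = np.asarray(batch.rewards)      # per-step rewards, shape (T,)
    values = np.asarray(batch.values)        # critic estimates V(s_t), shape (T,)
    terminals = np.asarray(batch.terminals)  # episode-end flags, shape (T,)

    advantages = np.zeros_like(rewards, dtype=np.float64)
    prev_value, prev_advantage = 0.0, 0.0
    for i in reversed(range(len(rewards))):
        not_done = 1.0 - float(terminals[i])
        # delta_t = r_t + gamma * V(s_{t+1}) * (1 - done) - V(s_t)
        delta = rewards[i] + self._reward_discount * prev_value * not_done - values[i]
        # A_t = delta_t + gamma * lambda * (1 - done) * A_{t+1}
        advantages[i] = delta + self._reward_discount * self._lam * prev_advantage * not_done
        prev_value, prev_advantage = values[i], advantages[i]

    batch.advantages = advantages
    return batch
```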
Description
Linked issue(s)/Pull request(s)
Type of Change
Related Component
Has Been Tested
Needs Follow Up Actions
Checklist