
Refine RL todos #1332

Merged
merged 12 commits into main from huoran/fix_rl_todos on Nov 10, 2022
Conversation

lihuoran (Contributor)

Description

  • Refine the return value of collect_data_loop: make it more structured and add more type annotations. Refine the report generation logic in the RL backtest accordingly.
  • Decouple Interpreter from Env. Each interpreter now has its own tick counter, so it no longer needs to get this information from Env. Remove CollectDataEnvWrapper since it is no longer needed (a sketch of this idea follows this list).
  • Move SAOEStateAdapter and related functions from qlib/rl/order_execution/state.py to qlib/rl/order_execution/strategy.py. (This was supposed to be done in PR 1316, but we somehow missed it.)
  • Update the related test files.
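
A minimal sketch of the decoupling idea from the second bullet, assuming an interpreter that keeps its own tick counter via reset()/step(); the class and attribute names here are illustrative and are not qlib's actual implementation:

```python
# Minimal sketch (illustrative names, not qlib's actual classes): the interpreter
# owns its tick counter instead of asking the Env for it.
class TickingInterpreterSketch:
    def __init__(self) -> None:
        self.env = None      # still assigned by the env wrapper, but not needed for ticking
        self._cur_step = 0   # interpreter-owned tick counter

    def reset(self) -> None:
        # Called at the start of each episode.
        self._cur_step = 0

    def step(self) -> None:
        # Called once per environment step to advance the interpreter's own clock.
        self._cur_step += 1

    def interpret(self, simulator_state: dict) -> dict:
        # Hypothetical observation: expose the current step alongside the raw state.
        return {"cur_step": self._cur_step, **simulator_state}
```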

Motivation and Context

How Has This Been Tested?

  • Pass the tests by running pytest qlib/tests/test_all_pipeline.py from the directory above qlib.
  • If you are adding a new feature, test it with your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
  2. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

super().__init__()

for obj in [state_interpreter, action_interpreter, reward_fn, aux_info_collector]:
for obj in [reward_fn, aux_info_collector]:
Collaborator:

In the original design, (1) the state interpreter, (2) the action interpreter, (3) the reward function, and (4) the auxiliary info collector are born equal and treated equally in the EnvWrapper. This change breaks that philosophy. Extra discussion might be needed for this.

Collaborator:

Another reason that motivated the previous design is that I've seen some algorithms that need extra communication between components. For example, the reward function might rely on extra states calculated by the state interpreter to speed up its own calculation. Giving all of them access to the env wrapper makes such "hacking" possible (see the sketch below).
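
As an illustration of the kind of cross-component "hacking" described here, a hypothetical sketch (none of these names are qlib's actual API): the state interpreter caches an intermediate result on the shared env wrapper, and the reward function reuses it instead of recomputing.

```python
import numpy as np


def expensive_feature_extraction(state: np.ndarray) -> np.ndarray:
    # Stand-in for a costly computation needed by both the interpreter and the reward.
    return np.cumsum(state)


class CachingStateInterpreterSketch:
    def __init__(self, env) -> None:
        self.env = env  # shared env wrapper (here: any object we can attach attributes to)

    def interpret(self, simulator_state: np.ndarray) -> np.ndarray:
        features = expensive_feature_extraction(simulator_state)
        self.env.cached_features = features  # stash for other components
        return features


class CacheAwareRewardSketch:
    def __init__(self, env) -> None:
        self.env = env

    def reward(self, simulator_state: np.ndarray) -> float:
        # Reuse the interpreter's intermediate result if it is available.
        features = getattr(self.env, "cached_features", None)
        if features is None:
            features = expensive_feature_extraction(simulator_state)
        return float(features[-1])
```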

Contributor Author:

Understood. I reverted this change so that interpreters have env again; it's just that env won't be used at the moment.

def reset(self, outer_trade_decision: BaseTradeDecision = None, **kwargs: Any) -> None:
super().reset(outer_trade_decision=outer_trade_decision, **kwargs)

# In backtest, env.reset() needs to be manually called since there is no outer trainer to call it
Collaborator:

Why can't we simply create a DummyEnvWrapper here?

Contributor Author:

In my understanding, we don't need a DummyEnvWrapper, at least for now.

return pd.DatetimeIndex(ret)


def fill_missing_data(
Collaborator:

Hopefully #1333 can help remove this function.

qlib/backtest/backtest.py (resolved)
qlib/backtest/backtest.py (resolved)
qlib/rl/data/native.py (outdated, resolved)
@@ -35,12 +31,19 @@ class Interpreter:
states by calling ``self.env.register_state()``, but it's not planned for first iteration.
"""

def __init__(self) -> None:
Collaborator:

It feels odd to maintain these states in the Interpreter; we may have a discussion later about whether there is a better design.

Collaborator:

I think it is more reasonable to get the current step from the state.

Collaborator:

Agree. I think it's better to have them in EnvWrapperStatus because it's designed for this purpose.
But it would require adding DummyEnv back.

@@ -148,7 +148,11 @@ def __init__(
action_space=action_space,
)
if weight_file is not None:
load_weight(self, weight_file)
loaded_weight = torch.load(weight_file, map_location="cpu")
if "vessel" in loaded_weight:
Collaborator:

It seems that loading the weight makes some assumptions about the loaded data rather than simply loading it directly. It looks like a reusable feature.

Would it be better to create a standalone function for it?

Contributor Author:

Sounds good, but I didn't find a good way to do this. Any suggestions?

Collaborator:

@lihuoran
If I understand it correctly, it tries to load a policy state dict from a dumped Trainer.
Do you think it is a good idea to create a static utility function for Trainer with a name like get_policy_state_dict(ckpt_path)?
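
A rough sketch of such a utility, assuming the checkpoint layout implied by the diff above (the "vessel"/"policy" keys are an assumption; the actual helper added later may differ):

```python
from pathlib import Path
from typing import Union

import torch


def get_policy_state_dict(ckpt_path: Union[str, Path]) -> dict:
    """Load a checkpoint and, if it is a full Trainer dump, pull out the policy weights.

    The "vessel"/"policy" key layout is assumed from the diff above, not confirmed.
    """
    state_dict = torch.load(ckpt_path, map_location="cpu")
    if "vessel" in state_dict:
        state_dict = state_dict["vessel"]["policy"]
    return state_dict
```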

# In backtest, env.step() needs to be manually called since there is no outer trainer to call it
if self._backtest:
self._env.step(None)
self._state_interpreter.step()
Collaborator:

It is weird to help the interpreter maintain its state.
Would it be better to create a cur_step attribute in the state returned by the simulator? (A sketch of this idea follows.)
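
A sketch of this suggestion, assuming a dataclass-style simulator state (the field and class names are illustrative, not qlib's actual SAOEState):

```python
from dataclasses import dataclass


@dataclass
class SimulatorStateSketch:
    cur_step: int       # maintained by the simulator itself
    position: float     # e.g. remaining quantity to execute (illustrative)


class StepAwareInterpreterSketch:
    def interpret(self, state: SimulatorStateSketch) -> dict:
        # The interpreter simply reads the step count from the state
        # instead of maintaining its own counter.
        return {"cur_step": state.cur_step, "position": state.position}
```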

@@ -196,6 +179,8 @@ def reset(self, **kwargs: Any) -> ObsType:
)

self.simulator.env = cast(EnvWrapper, weakref.proxy(self))
self.state_interpreter.reset()
Collaborator:

Please refer to the other discussions about the interpreter.

@@ -230,6 +215,8 @@ def step(self, policy_action: PolicyActType, **kwargs: Any) -> Tuple[ObsType, fl

# Use the converted action to update the simulator
self.simulator.step(action)
self.state_interpreter.step()
Collaborator:

Please refer to the other discussions about the interpreter.

@@ -183,8 +183,7 @@ def test_interpreter() -> None:
order = get_order()
simulator = get_simulator(order)
interpreter_action = CategoricalActionInterpreter(values=NUM_EXECUTION)
interpreter_action.env = CollectDataEnvWrapper()
interpreter_action.env.reset()
interpreter_action.reset()
Collaborator:

Please refer to the other discussions about the interpreter.

interpreter_action.env.reset()
interpreter_action_twap.env.reset()
interpreter_action.reset()
interpreter_action_twap.reset()
Collaborator:

Please refer to the other discussions about the interpreter.

@you-n-g (Collaborator) commented Nov 7, 2022:

Further discussion may be required to support cur_step in simulator_qlib.
We may have to maintain a cur_step with a different meaning in trade_calendar.
Ask me if you need help.

@you-n-g merged commit 3579484 into main on Nov 10, 2022
@lihuoran deleted the huoran/fix_rl_todos branch on November 29, 2022 at 03:59
@you-n-g added the "enhancement" (New feature or request) label on Dec 9, 2022
qianyun210603 pushed a commit to qianyun210603/qlib that referenced this pull request Mar 23, 2023
* Refine several todos

* CI issues

* Remove Dropna limitation of `quote_df` in Exchange  (microsoft#1334)

* Remove Dropna limitation of `quote_df` of Exchange

* Impreove docstring

* Fix type error when expression is specified (microsoft#1335)

* Refine fill_missing_data()

* Remove several TODO comments

* Add back env for interpreters

* Change Literal import

* Resolve PR comments

* Move  to SAOEState

* Add Trainer.get_policy_state_dict()

* Mypy issue

Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>