
Refine RL todos #1332

Merged
merged 12 commits into main from huoran/fix_rl_todos on Nov 10, 2022
Conversation

lihuoran (Contributor)

Description

  • Refine the return value of collect_data_loop: make it more structured and add more type annotations. Refine the report generation logic in the RL backtest accordingly.
  • Decouple Interpreter from Env. Each interpreter now has its own tick counter, so it no longer needs to get this information from Env. Remove CollectDataEnvWrapper since it is no longer needed (a sketch of this idea follows this list).
  • Move SAOEStateAdapter and related functions from qlib/rl/order_execution/state.py to qlib/rl/order_execution/strategy.py. (This was supposed to be done in PR 1316, but we somehow missed it.)
  • Update the related test files.
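
A minimal sketch of the decoupling idea from the second bullet, assuming an interpreter that keeps its own tick counter via reset()/step(); the class and attribute names here are illustrative and are not qlib's actual implementation:

```python
# Minimal sketch (illustrative names, not qlib's actual classes): the interpreter
# owns its tick counter instead of asking the Env for it.
class TickingInterpreterSketch:
    def __init__(self) -> None:
        self.env = None      # still assigned by the env wrapper, but not needed for ticking
        self._cur_step = 0   # interpreter-owned tick counter

    def reset(self) -> None:
        # Called at the start of each episode.
        self._cur_step = 0

    def step(self) -> None:
        # Called once per environment step to advance the interpreter's own clock.
        self._cur_step += 1

    def interpret(self, simulator_state: dict) -> dict:
        # Hypothetical observation: expose the current step alongside the raw state.
        return {"cur_step": self._cur_step, **simulator_state}
```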

Motivation and Context

How Has This Been Tested?

  • Pass the tests by running pytest qlib/tests/test_all_pipeline.py from the directory above qlib.
  • If you are adding a new feature, test it with your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
  2. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

super().__init__()

for obj in [state_interpreter, action_interpreter, reward_fn, aux_info_collector]:
for obj in [reward_fn, aux_info_collector]:
Collaborator:

In the original design, (1) the state interpreter, (2) the action interpreter, (3) the reward function, and (4) the auxiliary info collector are born equal and treated equally in the EnvWrapper. This change breaks that philosophy. Extra discussion might be needed for this.

Collaborator:

Another reason that motivated the previous design is that I've seen some algorithms that need extra communication between components. For example, the reward function might rely on extra states calculated by the state interpreter to speed up its own calculation. Giving all of them access to the env wrapper makes such "hacking" possible (see the sketch below).
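
As an illustration of the kind of cross-component "hacking" described here, a hypothetical sketch (none of these names are qlib's actual API): the state interpreter caches an intermediate result on the shared env wrapper, and the reward function reuses it instead of recomputing.

```python
import numpy as np


def expensive_feature_extraction(state: np.ndarray) -> np.ndarray:
    # Stand-in for a costly computation needed by both the interpreter and the reward.
    return np.cumsum(state)


class CachingStateInterpreterSketch:
    def __init__(self, env) -> None:
        self.env = env  # shared env wrapper (here: any object we can attach attributes to)

    def interpret(self, simulator_state: np.ndarray) -> np.ndarray:
        features = expensive_feature_extraction(simulator_state)
        self.env.cached_features = features  # stash for other components
        return features


class CacheAwareRewardSketch:
    def __init__(self, env) -> None:
        self.env = env

    def reward(self, simulator_state: np.ndarray) -> float:
        # Reuse the interpreter's intermediate result if it is available.
        features = getattr(self.env, "cached_features", None)
        if features is None:
            features = expensive_feature_extraction(simulator_state)
        return float(features[-1])
```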

Contributor Author:

Understood. I reverted this change so that interpreters have env again; it's just that env won't be used at the moment.

def reset(self, outer_trade_decision: BaseTradeDecision = None, **kwargs: Any) -> None:
super().reset(outer_trade_decision=outer_trade_decision, **kwargs)

# In backtest, env.reset() needs to be manually called since there is no outer trainer to call it
Collaborator:

Why can't we simply create a DummyEnvWrapper here?

Contributor Author:

In my understanding, we don't need a DummyEnvWrapper, at least for now.

return pd.DatetimeIndex(ret)


def fill_missing_data(
Collaborator:

Hopefully #1333 can help remove this function.

qlib/backtest/backtest.py (resolved)
qlib/backtest/backtest.py (resolved)
qlib/rl/data/native.py (outdated, resolved)
@@ -35,12 +31,19 @@ class Interpreter:
states by calling ``self.env.register_state()``, but it's not planned for first iteration.
"""

def __init__(self) -> None:
Collaborator:

It feels odd to maintain these states in the Interpreter; we may have a discussion later about whether there is a better design.

Collaborator:

I think it is more reasonable to get the current step from the state.

Collaborator:

Agree. I think it's better to have them in EnvWrapperStatus because it's designed for this purpose.
But it would require adding DummyEnv back.

@@ -148,7 +148,11 @@ def __init__(
action_space=action_space,
)
if weight_file is not None:
load_weight(self, weight_file)
loaded_weight = torch.load(weight_file, map_location="cpu")
if "vessel" in loaded_weight:
Collaborator:

It seems that loading the weight makes some assumptions about the loaded data rather than simply loading it directly. It looks like a reusable feature.

Would it be better to create a standalone function for it?

Contributor Author:

Sounds good, but I didn't find a good way to do this. Any suggestions?

Collaborator:

@lihuoran
If I understand it correctly, it tries to load a policy state dict from a dumped Trainer.
Do you think it is a good idea to create a static utility function for Trainer with a name like get_policy_state_dict(ckpt_path)?
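
A rough sketch of such a utility, assuming the checkpoint layout implied by the diff above (the "vessel"/"policy" keys are an assumption; the actual helper added later may differ):

```python
from pathlib import Path
from typing import Union

import torch


def get_policy_state_dict(ckpt_path: Union[str, Path]) -> dict:
    """Load a checkpoint and, if it is a full Trainer dump, pull out the policy weights.

    The "vessel"/"policy" key layout is assumed from the diff above, not confirmed.
    """
    state_dict = torch.load(ckpt_path, map_location="cpu")
    if "vessel" in state_dict:
        state_dict = state_dict["vessel"]["policy"]
    return state_dict
```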

# In backtest, env.step() needs to be manually called since there is no outer trainer to call it
if self._backtest:
self._env.step(None)
self._state_interpreter.step()
Collaborator:

It is weird to help the interpreter maintain its state.
Would it be better to create a cur_step attribute in the state returned by the simulator? (A sketch of this idea follows.)
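
A sketch of this suggestion, assuming a dataclass-style simulator state (the field and class names are illustrative, not qlib's actual SAOEState):

```python
from dataclasses import dataclass


@dataclass
class SimulatorStateSketch:
    cur_step: int       # maintained by the simulator itself
    position: float     # e.g. remaining quantity to execute (illustrative)


class StepAwareInterpreterSketch:
    def interpret(self, state: SimulatorStateSketch) -> dict:
        # The interpreter simply reads the step count from the state
        # instead of maintaining its own counter.
        return {"cur_step": state.cur_step, "position": state.position}
```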

@@ -196,6 +179,8 @@ def reset(self, **kwargs: Any) -> ObsType:
)

self.simulator.env = cast(EnvWrapper, weakref.proxy(self))
self.state_interpreter.reset()
Collaborator:

Please refer to the other discussions about the interpreter.

@@ -230,6 +215,8 @@ def step(self, policy_action: PolicyActType, **kwargs: Any) -> Tuple[ObsType, fl

# Use the converted action to update the simulator
self.simulator.step(action)
self.state_interpreter.step()
Collaborator:

Please refer to the other discussions about the interpreter.

@@ -183,8 +183,7 @@ def test_interpreter() -> None:
order = get_order()
simulator = get_simulator(order)
interpreter_action = CategoricalActionInterpreter(values=NUM_EXECUTION)
interpreter_action.env = CollectDataEnvWrapper()
interpreter_action.env.reset()
interpreter_action.reset()
Collaborator:

Please refer to the other discussions about the interpreter.

interpreter_action.env.reset()
interpreter_action_twap.env.reset()
interpreter_action.reset()
interpreter_action_twap.reset()
Collaborator:

Please refer to the other discussions about the interpreter.

@you-n-g (Collaborator) commented Nov 7, 2022:

Further discussion may be required to support cur_step in simulator_qlib.
We may have to maintain a cur_step with a different meaning in trade_calendar.
Ask me if you need help.

@you-n-g merged commit 3579484 into main on Nov 10, 2022
@lihuoran deleted the huoran/fix_rl_todos branch on November 29, 2022 at 03:59
@you-n-g added the "enhancement" (New feature or request) label on Dec 9, 2022
qianyun210603 pushed a commit to qianyun210603/qlib that referenced this pull request Mar 23, 2023
* Refine several todos

* CI issues

* Remove Dropna limitation of `quote_df` in Exchange  (microsoft#1334)

* Remove Dropna limitation of `quote_df` of Exchange

* Impreove docstring

* Fix type error when expression is specified (microsoft#1335)

* Refine fill_missing_data()

* Remove several TODO comments

* Add back env for interpreters

* Change Literal import

* Resolve PR comments

* Move  to SAOEState

* Add Trainer.get_policy_state_dict()

* Mypy issue

Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>