Question about abnormal validation reward of PPO baseline in order execution code

Specifically, when I run the baseline PPO strategy, the reward on the validation set is the same every time. After checking the training log, I found that the model takes the "sell all" action every time. What might be causing this, and how should I fix it?
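One way to confirm that the policy has collapsed onto a single action is to count the actions it takes over the validation episodes. The helper below is a hypothetical sketch, not part of Qlib. It assumes the actions have already been pulled out of the log as discrete integer indices. Which index means "sell all" depends on the action interpreter configuration. An entropy close to 0 means the policy is effectively deterministic, which would be consistent with a validation reward that never changes.

```python
import math
from collections import Counter
from typing import Sequence


def action_collapse_report(actions: Sequence[int]) -> dict:
    """Summarize how concentrated a sequence of discrete actions is.

    Hypothetical diagnostic helper (not a Qlib API). Returns the most
    frequent action, its share of all steps, and the empirical entropy
    in nats. An entropy near 0 means the policy almost always picks
    the same action.
    """
    if not actions:
        raise ValueError("actions must be non-empty")
    counts = Counter(actions)
    total = len(actions)
    top_action, top_count = counts.most_common(1)[0]
    # Empirical Shannon entropy of the action distribution.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return {"top_action": top_action, "top_share": top_count / total, "entropy": entropy}
```

For example, `action_collapse_report([4, 4, 4, 4])` reports a top share of 1.0 and an entropy of 0, i.e. a fully collapsed policy.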

The code is almost unchanged, except for modifications to `workflow.py` and `order_gen.py`. I made them because running the original code kept throwing this error: `TypeError: Cannot compare Timestamp with datetime.date. Use ts == pd.Timestamp(date) or ts.date() == date instead.`
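For reference, this error is raised by pandas when a `pd.Timestamp` is order-compared with a plain `datetime.date`. A common fix is to coerce both sides to `pd.Timestamp` before comparing. The actual changes to `workflow.py` and `order_gen.py` are not shown here, so the helpers below (`to_timestamp`, `filter_orders_by_date`) and the `date` column name are hypothetical and only illustrate the pattern:

```python
import datetime

import pandas as pd


def to_timestamp(value) -> pd.Timestamp:
    """Coerce a date, datetime, string, or Timestamp to a midnight pd.Timestamp.

    Hypothetical helper: normalize() drops any intraday time component.
    """
    return pd.Timestamp(value).normalize()


def filter_orders_by_date(orders: pd.DataFrame, start, end) -> pd.DataFrame:
    """Keep rows whose 'date' column (assumed name) lies in [start, end], inclusive."""
    # to_datetime turns a column of datetime.date objects into datetime64,
    # so both sides of the comparison are pandas timestamps.
    dates = pd.to_datetime(orders["date"])
    mask = (dates >= to_timestamp(start)) & (dates <= to_timestamp(end))
    return orders.loc[mask]
```

If a patch like this treats dates differently from the original code (for example by dropping intraday times or shifting the range boundaries), it could change which orders end up in the validation set. It may be worth checking that the generated orders look as expected before digging into the PPO training itself.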

<img width="1518" height="1106" alt="Image" src="https://github.com/user-attachments/assets/7f0e21f1-e9f2-4344-acb8-9e9f42e1b4a0" />

Question about abnormal validation reward of PPO baseline in order execution code #2211