Question about abnormal validation reward of PPO baseline in order execution code #2211

@1486419653

Description

Specifically, when I run the baseline PPO strategy, the reward on the validation set is identical every time. After checking the training log, I found that the model takes a "sell all" action every time. What might be causing this, and how should I fix it?

The code is almost unchanged. I only modified workflow.py and order_gen.py, because running the original code kept throwing this error: TypeError: Cannot compare Timestamp with datetime.date. Use ts == pd.Timestamp(date) or ts.date() == date instead.
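For reference, the error message itself names two ways to make the comparison valid. The snippet below is only a minimal sketch of those two options on made-up values. It is not the actual change made to workflow.py or order_gen.py, and the helper name `to_timestamp` is hypothetical.

```python
import datetime

import pandas as pd


def to_timestamp(value):
    """Coerce a datetime.date, datetime.datetime, or date string to pd.Timestamp.

    Hypothetical helper for illustration; it is not part of the original code.
    """
    return pd.Timestamp(value)


# Example values chosen for illustration only.
ts = pd.Timestamp("2020-01-02 09:30:00")
day = datetime.date(2020, 1, 2)

# Option 1: lift the date to a Timestamp (midnight of that day), then compare.
after_midnight = ts >= to_timestamp(day)

# Option 2: drop the time of day from the Timestamp and compare plain dates.
same_day = ts.date() == day

print(after_midnight, same_day)
```

Option 1 compares against midnight of the given day, while Option 2 ignores the time of day entirely, so the two are not interchangeable wherever intraday ordering matters.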



Labels: question (Further information is requested)
