Specifically, when I run the baseline PPO strategy, the reward on the validation set stays exactly the same on every evaluation. After checking the training log, I found that the model takes the "sell all" action every time. What might be causing this, and how should I fix it?
The code is almost unchanged. I only modified workflow.py and order_gen.py, because the original code kept throwing this error: TypeError: Cannot compare Timestamp with datetime.date. Use ts == pd.Timestamp(date) or ts.date() == date instead.
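
For reference, here is a minimal sketch of the kind of workaround the error message suggests: convert both sides to pd.Timestamp before comparing. The helper names (to_timestamp, same_day, is_before) are hypothetical and only for illustration; they are not from workflow.py or order_gen.py, and my actual changes may differ in detail.

```python
import datetime

import pandas as pd


def to_timestamp(value):
    """Coerce a date, datetime, string, or Timestamp to pd.Timestamp (hypothetical helper)."""
    return pd.Timestamp(value)


def same_day(a, b):
    """Compare two date-like values by calendar day, ignoring the time of day."""
    return to_timestamp(a).normalize() == to_timestamp(b).normalize()


def is_before(a, b):
    """Return True if a is strictly earlier than b, after coercing both to Timestamp."""
    return to_timestamp(a) < to_timestamp(b)


ts = pd.Timestamp("2020-01-02 09:30:00")
day = datetime.date(2020, 1, 2)

print(same_day(ts, day))   # both sides are Timestamps, so no mixed-type comparison
print(is_before(day, ts))  # midnight of the same day is earlier than 09:30
```

If a fix like this coerces the wrong side or drops the time of day where it matters, the date filters could silently select different data than the original code intended, so that is one thing worth ruling out.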
