
The reward function encourages short episodes #40

Open
MHamza-Y opened this issue Nov 13, 2021 · 2 comments

Comments


MHamza-Y commented Nov 13, 2021

I am trying to train a reinforcement learning agent to control the basal rate using the provided gym environment. The problem with the reward function is that it encourages episodes to be as short as possible. I have tried different algorithms and hyperparameter variations, but the policy always learns to output either 0 or the maximum basal value, ending the episode early to avoid accumulating more penalty over a long episode. Can the reward function be improved somehow?
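To make the reported effect concrete, here is a minimal sketch of the arithmetic, assuming a constant negative reward on every step. That constant penalty is a simplification for illustration, not the simulator's actual default reward, and `episode_return` is a hypothetical helper.

```python
def episode_return(step_reward, n_steps, gamma=1.0):
    """Discounted sum of a constant per-step reward over n_steps."""
    return sum(step_reward * gamma ** t for t in range(n_steps))


# Assumed per-step penalty of -1.0 (illustrative only).
short_episode = episode_return(-1.0, n_steps=10)
long_episode = episode_return(-1.0, n_steps=100)

# The shorter episode has the higher (less negative) return, so a
# return-maximizing policy is pushed toward ending the episode early.
print(short_episode, long_episode)
```

Under this assumption, any action that terminates the episode sooner looks better to the agent than keeping the patient alive and in range for longer.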


lorenzobrigato commented Nov 29, 2022

Are there any updates or solutions for this? I am also seeing similar behavior across different algorithms and hyperparameters.

Owner

jxx123 commented Dec 7, 2022

The documentation has a section showing how to use a custom reward function, https://github.com/jxx123/simglucose#openai-gym-usage, which is exactly what you need for tuning the reward function.

The default reward function is not intended to give you a well-shaped reward (especially a long-term reward); you are expected to define your own reward function.

But for posterity, it would be nice if anyone could share their insights and their carefully designed reward functions here. I could collect them and put them in the documentation for visibility (crediting you by name, of course).
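As a starting point, a custom reward along these lines could look like the sketch below. It assumes the reward function receives a sequence of recent blood glucose readings in mg/dL, as in the linked README section. The name `in_range_reward` and the 70/180 thresholds are illustrative choices, not part of the library.

```python
def in_range_reward(bg_last_hour, low=70.0, high=180.0):
    """Reward computed from the most recent blood glucose reading (mg/dL).

    Time spent in range earns a positive reward, so surviving longer
    while in range increases the return instead of decreasing it.
    """
    bg = bg_last_hour[-1]
    if bg < low:
        return -2.0  # hypoglycemia is penalized more heavily (assumed weighting)
    if bg > high:
        return -1.0  # hyperglycemia
    return 1.0       # in target range


# Passing the function to the environment is done at registration time.
# The call below follows the linked README section as I understand it
# (assumption: check the README for the exact keyword names); it is left
# commented out so this sketch runs without gym or simglucose installed.
#
# from gym.envs.registration import register
# register(
#     id='simglucose-adolescent2-v0',
#     entry_point='simglucose.envs:T1DSimEnv',
#     kwargs={'patient_name': 'adolescent#002',
#             'reward_fun': in_range_reward},
# )
```

Because in-range steps are rewarded positively, a policy that keeps the patient in range for a long episode collects more return than one that terminates early. Whether this is enough in practice depends on the algorithm and on how episodes terminate.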
