Bug fix for SAC-discrete. #60

toshikwa · 2020-09-17T23:42:59Z

Hi, thank you for your great work :)

I fixed some bugs related to #54 and #56.
I tested it with CartPole.py and training converged more stably.

Changes

Fix min_qf_next_target to correctly compute the expectation over the policy.
Fix policy_loss in the same way.
Fix max_probability_action to take the argmax for each sample rather than over the entire batch. (It doesn't actually affect the algorithm, though.)
Fix the device used by Replay_Memory so that SAC-Discrete can be trained on CPU.
Fix errors caused by PyTorch updates (torch>=1.4.0 now works!).
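
For readers following the first three items, here is a minimal sketch of the arithmetic they describe, written with plain Python lists instead of PyTorch tensors so it runs without dependencies. It is not the code from this PR: the function names soft_state_value, policy_loss, and argmax_per_sample are hypothetical, and the formulas assume the usual SAC-Discrete definitions, where the soft value of a state is the sum over actions of pi(a|s) * (min(Q1, Q2)(s, a) - alpha * log pi(a|s)).

```python
import math


def soft_state_value(probs, q1, q2, alpha):
    """Expectation over the policy for ONE state.

    Computes sum_a pi(a|s) * (min(Q1, Q2)(s, a) - alpha * log pi(a|s)).
    The sum runs over actions, weighted by the action probabilities.
    """
    return sum(
        p * (min(a, b) - alpha * math.log(p))
        for p, a, b in zip(probs, q1, q2)
        if p > 0.0  # assumption: treat 0 * log(0) as 0
    )


def policy_loss(batch_probs, batch_q1, batch_q2, alpha):
    """Batch mean of sum_a pi(a|s) * (alpha * log pi(a|s) - min(Q1, Q2)(s, a)).

    Per state this is the negative of soft_state_value; only the outer
    average is taken over the batch.
    """
    per_state = [
        -soft_state_value(p, a, b, alpha)
        for p, a, b in zip(batch_probs, batch_q1, batch_q2)
    ]
    return sum(per_state) / len(per_state)


def argmax_per_sample(batch_probs):
    """Index of the most probable action for each sample in the batch."""
    return [max(range(len(p)), key=p.__getitem__) for p in batch_probs]


# Hypothetical batch of two states with two actions each.
batch_probs = [[0.9, 0.1], [0.25, 0.75]]
batch_q1 = [[1.0, 2.0], [0.0, 4.0]]
batch_q2 = [[1.5, 1.0], [1.0, 3.0]]

values = [
    soft_state_value(p, a, b, alpha=0.2)
    for p, a, b in zip(batch_probs, batch_q1, batch_q2)
]
loss = policy_loss(batch_probs, batch_q1, batch_q2, alpha=0.2)
greedy_actions = argmax_per_sample(batch_probs)
```

In tensor code the same idea would presumably be a sum over the action dimension followed by a mean over the batch dimension, and an argmax taken along the action dimension (for example torch.argmax with an explicit dim) so that each sample gets its own index.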

Thanks :)

p-christ · 2020-09-18T08:00:10Z

thanks a lot

toshikwa · 2020-09-19T20:58:16Z

Thank you for merging.
Could you close the relevant issues (#54, #56)?

Also, I noticed that this repo has become somewhat outdated and has a few bugs.
May I fix them and clean up the code?

Thanks :)

p-christ · 2020-09-20T07:22:59Z

Sure, I will close them. Yes, please do clean up the code and open a new pull request!


toshikwa added 3 commits September 18, 2020 08:20

fix bugs

fbc84b8

fix device to properly calculate SAC-Discrete on cpu

bc6ee5f

fix errors of SAC and SAC-Discrete caused by torch>=1.4.0

8cc5c59

This was referenced Sep 18, 2020

Sac Discrete Error #56

Closed

Mean of expectation in SAC_discrete.py possibly wrong? #54

Closed

p-christ merged commit 8a0dd27 into p-christ:master Sep 18, 2020
