Adapting A2C and deep policy gradient methods to Discrete envs #20
Comments
Hi, the only problematic part is remembering to cast the action back to long, since it is converted to a float tensor inside the `_loss` method (or to change that line in the A2C code).
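A minimal sketch of the cast being described (the variable names here are illustrative, not the library's actual internals): once the action tensor has been converted to float, it has to be cast back to long before it can be used as an index, e.g. to gather per-action values.

```python
import torch

# Hypothetical example: the sampled discrete action arrives as float32
# after the library's internal conversion.
action = torch.tensor([1.0, 0.0, 1.0])
q = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])

# Cast back to long so the action can be used as a gather index.
idx = action.long().unsqueeze(1)
chosen = q.gather(1, idx).squeeze(1)
print(chosen)  # tensor([0.9000, 0.8000, 0.7000])
```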
Hi, thanks for the response! Do you know if there is a deep RL method in your library, apart from DQN, that supports finite actions? If there isn't, I will make the necessary changes to A2C.
Mostly what is missing is the policy; after that you should be almost done. All variants of DQN (double, averaged, categorical) support finite actions, but no actor-critic method currently does. You could also try fitted Q-iteration with deep networks, although that algorithm works better with extra trees.
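The missing piece, a policy over finite actions, could be sketched roughly as follows. This is not the library's interface, just an illustration of a categorical policy that samples integer action indices instead of Gaussian samples (the class and method names are assumptions):

```python
import torch
from torch.distributions import Categorical

# Hypothetical categorical policy for finite action spaces: a network
# outputs one logit per action, and actions are sampled from a
# Categorical distribution rather than a Gaussian.
class DiscretePolicy(torch.nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = torch.nn.Linear(state_dim, n_actions)

    def draw_action(self, state):
        logits = self.net(state)
        dist = Categorical(logits=logits)
        return dist.sample()  # long tensor of action indices

policy = DiscretePolicy(state_dim=4, n_actions=2)  # CartPole-like sizes
action = policy.draw_action(torch.zeros(1, 4))
print(action.dtype)  # torch.int64
```

Sampling from `Categorical` already yields a long tensor, which sidesteps the float-vs-long cast discussed above.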
Describe the bug
Hi, I am getting the following error. I'm trying to use the A2C algorithm, which samples actions from a Gaussian distribution when given a state. The code seems to expect torch.float32, and I have checked that my inputs are indeed torch.float32, not long, so I'm not sure what to do. I know the algorithm runs, as I have run it before, but I get this error when I try to use it in an environment with discrete actions.
Here is the library’s draw_action function:
and
Final error:
Help is much appreciated!
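For context, the kind of dtype mismatch being reported can be reproduced in isolation (this is an assumption about the cause, not the library's actual traceback): a discrete action held in a float tensor fails wherever PyTorch expects an integer index.

```python
import torch

# Assumed reproduction: a discrete action stored as float32 cannot be
# used where PyTorch requires an int64 (long) index tensor.
q = torch.zeros(3, 2)
float_action = torch.tensor([[0.0], [1.0], [0.0]])  # float32 action indices

try:
    q.gather(1, float_action)  # gather requires an int64 index
    raised = False
except RuntimeError as e:
    raised = True
    print("RuntimeError:", e)

# Casting back to long resolves the error.
values = q.gather(1, float_action.long())
print(values.shape)  # torch.Size([3, 1])
```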
System information (please complete the following information):
Additional context
I'm working with the Gym 'CartPole-v0' environment.
I need these deep policy gradient methods to work for discrete environments as I am testing something for my research. Any advice on this?
Help is greatly appreciated!