I think "with torch.no_grad():" is needed when calculating critic loss #16

Closed
dbsxdbsx opened this issue Oct 11, 2020 · 2 comments

@dbsxdbsx

In sac_v2_lstm.py, the code below calculates the target Q value without `with torch.no_grad():`:

    # Training Q Function
    # I think `with torch.no_grad():` is needed here
    predict_target_q1, _ = self.target_soft_q_net1(next_state, new_next_action, action, hidden_out)
    predict_target_q2, _ = self.target_soft_q_net2(next_state, new_next_action, action, hidden_out)
    target_q_min = torch.min(predict_target_q1, predict_target_q2) - self.alpha * next_log_prob
    target_q_value = reward + (1 - done) * gamma * target_q_min  # if done==1, only reward

    q_value_loss1 = self.soft_q_criterion1(predicted_q_value1, target_q_value.detach())  # detach: no gradients for the variable
    q_value_loss2 = self.soft_q_criterion2(predicted_q_value2, target_q_value.detach())

For some simple environments the algorithm does eventually converge, but I am not sure about more complex environments. As far as I know, the parameters used to compute target_q_value should not contribute any gradient to the model. A sketch of the change is below.
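
For reference, a minimal sketch of what the suggested change could look like, reusing the variable names from the snippet above (the surrounding training loop and networks are assumed to exist as in the repository):

    # Sketch: compute the target Q value under no_grad so that no graph is
    # built for the target networks; the .detach() calls then become redundant.
    with torch.no_grad():
        predict_target_q1, _ = self.target_soft_q_net1(next_state, new_next_action, action, hidden_out)
        predict_target_q2, _ = self.target_soft_q_net2(next_state, new_next_action, action, hidden_out)
        target_q_min = torch.min(predict_target_q1, predict_target_q2) - self.alpha * next_log_prob
        target_q_value = reward + (1 - done) * gamma * target_q_min  # if done==1, only reward

    q_value_loss1 = self.soft_q_criterion1(predicted_q_value1, target_q_value)
    q_value_loss2 = self.soft_q_criterion2(predicted_q_value2, target_q_value)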

@quantumiracle (Owner)

Hi,
Isn't that just what target_q_value.detach() is achieving?
Best,
zihan

@quantumiracle (Owner)

Check here.
I think there is no difference since there is only one variable (target_q_value) involved here.
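
To illustrate that point, here is a small standalone check (toy tensors, not the repository's code) showing that the online parameters receive the same gradient whether the target is detached afterwards or computed under no_grad; the practical difference is only that no_grad avoids building the graph for the target computation:

    import torch

    q_param = torch.tensor([1.0], requires_grad=True)       # stands in for an online-Q parameter
    target_param = torch.tensor([2.0], requires_grad=True)  # stands in for a target-network parameter

    pred_q = q_param * 3.0  # stands in for the online Q estimate

    # Variant 1: compute the target normally, then detach it in the loss.
    target_1 = target_param * 5.0
    loss_1 = (pred_q - target_1.detach()) ** 2

    # Variant 2: compute the target under no_grad (no graph is built at all).
    with torch.no_grad():
        target_2 = target_param * 5.0
    loss_2 = (pred_q - target_2) ** 2

    g1 = torch.autograd.grad(loss_1, q_param)[0]
    g2 = torch.autograd.grad(loss_2, q_param)[0]
    print(torch.allclose(g1, g2))  # True: identical gradients for the online parameter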
