Inconsistent description of AMSGrad with code #142323

@Tony-Y

Description

📚 The doc issue

pytorch/torch/optim/adam.py

Lines 469 to 476 in 0bd7b7a

if amsgrad:
    # Maintains the maximum of all 2nd moment running avg. till now
    torch.maximum(max_exp_avg_sqs[i], exp_avg_sq, out=max_exp_avg_sqs[i])
    # Use the max. for normalizing running avg. of gradient
    denom = (max_exp_avg_sqs[i].sqrt() / bias_correction2_sqrt).add_(eps)
else:
    denom = (exp_avg_sq.sqrt() / bias_correction2_sqrt).add_(eps)

In the code, the bias correction term $1-\beta_2^t$ is applied after the max operation. However, the documentation describes it as being applied before the max operation:
[image: documented Adam/AMSGrad algorithm, where $\widehat{v_t} = v_t / (1-\beta_2^t)$ is computed before the max operation]
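A toy sketch of why the ordering matters (illustrative values and variable names, not taken from adam.py): once the running maximum spans several steps, correcting before the max and correcting after the max give different denominators, because the stored maximum is accumulated over differently scaled quantities.

```python
import math
import torch

beta2, eps = 0.999, 1e-8
grads = [torch.tensor([1.0]), torch.tensor([0.1]), torch.tensor([0.1])]

exp_avg_sq = torch.zeros(1)        # v_t, shared by both variants
max_uncorrected = torch.zeros(1)   # running max of v_t (as in the code)
max_corrected = torch.zeros(1)     # running max of v_t / (1 - beta2**t) (as in the docs)

for step, grad in enumerate(grads, start=1):
    exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * grad * grad
    bias_correction2 = 1 - beta2 ** step

    # Code ordering: max over the uncorrected averages, bias correction applied afterwards.
    max_uncorrected = torch.maximum(max_uncorrected, exp_avg_sq)
    denom_code = max_uncorrected.sqrt() / math.sqrt(bias_correction2) + eps

    # Documented ordering: bias-correct first, then take the max.
    max_corrected = torch.maximum(max_corrected, exp_avg_sq / bias_correction2)
    denom_doc = max_corrected.sqrt() + eps

    print(step, denom_code.item(), denom_doc.item())
```

With this gradient sequence the two denominators agree at the first step and then diverge, since the documented ordering keeps the bias-inflated early maximum while the code re-applies the current, larger correction factor to the historical maximum.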

Suggest a potential alternative/fix

[image: amsgrad algo notes — suggested rewrite of the documented algorithm]
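For reference, one way the AMSGrad update could be stated so that it matches the implemented ordering (a sketch, not proposed documentation wording) is:

$$
v^{\max}_t = \max\left(v^{\max}_{t-1},\, v_t\right), \qquad
\theta_t = \theta_{t-1} - \gamma\,\frac{\widehat{m}_t}{\sqrt{v^{\max}_t / (1-\beta_2^t)} + \epsilon}
$$

where $v_t$ is the uncorrected second-moment running average (`exp_avg_sq`) and $\widehat{m}_t = m_t/(1-\beta_1^t)$; this corresponds to `(max_exp_avg_sqs[i].sqrt() / bias_correction2_sqrt).add_(eps)` in the code.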

cc @svekars @brycebortree @sekyondaMeta @AlannaBurke @vincentqb @jbschlosser @albanD @janeyx99 @crcrpar

Labels: module: docs, module: optimizer, triaged
