The documentation says:
"If you call retain_grad() on a non-leaf node, it results in a no-op."
This is misleading or incomplete.
Calling retain_grad() on a non-leaf tensor with requires_grad=True is not a no-op: it causes that tensor's .grad to be populated during backward().
retain_grad() does something other than retain gradients in only two cases:
- On a leaf tensor, it is a no-op.
- On a tensor with requires_grad=False, it raises a RuntimeError.
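
A minimal sketch that reproduces the three cases, assuming a recent PyTorch release. The tensor names (x, y, z, w) and the `raised` flag are illustrative only and do not come from the documentation.

```python
import torch

# Leaf tensor created directly by the user.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Non-leaf tensor: the result of an operation on x.
y = x * 2
y.retain_grad()  # not a no-op: y.grad will be populated by backward()

# Leaf case: this call changes nothing, x.grad is populated either way.
x.retain_grad()

z = y.sum()
z.backward()

print(y.is_leaf, y.grad)  # non-leaf, gradient retained
print(x.is_leaf, x.grad)  # leaf, gradient populated as usual

# requires_grad=False case: retain_grad() raises instead of silently passing.
w = torch.tensor([1.0, 2.0, 3.0])
try:
    w.retain_grad()
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```

Without the y.retain_grad() call, y.grad would be None after backward(), which is why the quoted sentence seems to have the leaf and non-leaf cases swapped.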