Add python API for get_gradients() method. #28926

Closed

Conversation

@pritamdamania87 (Contributor) commented on Oct 30, 2019

Stack from ghstack:

The get_gradients method was a pybind-only method with no user-facing documentation.

I've moved this method to our Python distributed autograd API and ensured that we have appropriate docs for it.

Differential Revision: [D18234443](https://our.internmc.facebook.com/intern/diff/D18234443/)

pritamdamania87 pushed a commit that referenced this pull request Oct 30, 2019
ghstack-source-id: 92941094
Pull Request resolved: #28926

@pietern (Contributor) left a comment

Not a comment on the PR itself.

Why is this map keyed by tensors? Reason I ask: in autograd, backward() causes the grads to be accumulated in the .grad member field of tensors with requires_grad=True. If you want access to the grads without accumulating them (i.e., computing them without side effects), you can use the torch.autograd.grad function. Similarly, I would expect dist_autograd.backward to accumulate grads in the .grad member fields.
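For context, a minimal single-process sketch of the distinction being drawn here, using only the standard torch.autograd API: backward() accumulates into the .grad field as a side effect, while torch.autograd.grad returns the gradients without touching it.

```python
import torch

x = torch.randn(3, requires_grad=True)

# backward() accumulates the result into x.grad as a side effect.
(x * 2).sum().backward()
print(x.grad)  # tensor([2., 2., 2.])

# torch.autograd.grad computes and returns gradients without touching x.grad.
(gx,) = torch.autograd.grad((x * 3).sum(), x)
print(gx)      # tensor([3., 3., 3.])
print(x.grad)  # still tensor([2., 2., 2.])
```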

@pritamdamania87 (Contributor, Author)

@pietern The reason we don't accumulate grads on the .grad field is that multiple autograd passes from different trainers would end up stepping on each other. That is why we accumulate the grads in the autograd context instead. We had a discussion with @albanD and @ezyang, and it made sense to also provide an option to accumulate gradients on the .grad field: #27641.
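To make the accumulation model concrete, here is a sketch of how the documented API fits together. It mirrors the example in the current torch.distributed.autograd docs (the backward signature shown takes the context_id, as in current releases) and assumes the RPC framework can be initialized, which distributed autograd requires even on a single worker.

```python
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc

# Distributed autograd requires the RPC framework, even with one worker.
# Assumes MASTER_ADDR/MASTER_PORT are set for the default init_method.
rpc.init_rpc("worker0", rank=0, world_size=1)

with dist_autograd.context() as context_id:
    t1 = torch.rand((3, 3), requires_grad=True)
    t2 = torch.rand((3, 3), requires_grad=True)
    loss = (t1 + t2).sum()

    # Gradients are accumulated in this context, not in t1.grad / t2.grad,
    # so concurrent backward passes from different trainers don't collide.
    dist_autograd.backward(context_id, [loss])

    grads = dist_autograd.get_gradients(context_id)
    print(grads[t1])  # gradient of loss w.r.t. t1
    print(t1.grad)    # None: the .grad field is left untouched

rpc.shutdown()
```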

@ezyang (Contributor) commented on Oct 31, 2019

You don't have to do this to attach documentation. Take a look at what we do in torch/_tensor_docs.py.
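For reference, the pattern in torch/_tensor_docs.py looks roughly like this (simplified): the docstring is attached after the fact to the already-bound C method, so no Python wrapper (and no extra per-call overhead) is introduced. It is not meant to be re-run against an installed torch, where these docstrings are already attached; _add_docstr raises if the target already has a docstring, which is the failure pritamdamania87 hits further down this thread.

```python
import torch
from torch._C import _add_docstr as add_docstr

# Attach a docstring directly to the bound C method; no `def` wrapper needed.
# Raises if torch.Tensor.abs already carries a docstring (as it does in any
# installed torch), so this only illustrates the pattern.
add_docstr(torch.Tensor.abs,
           r"""
abs() -> Tensor

See :func:`torch.abs`
""")
```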

@pritamdamania87 (Contributor, Author)

@ezyang If we have a handful of methods, wouldn't it be better to attach the documentation as we've done in this PR? That way it's easier for developers navigating the code to quickly see the documentation inline.
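A hedged sketch of the approach under discussion at this point in the PR (the binding name `_get_gradients` is assumed purely for illustration, not the actual symbol): a thin Python def forwards to the pybind function and carries the user-facing docstring inline with the rest of the Python API.

```python
# Illustrative only: `_get_gradients` is an assumed name for the underlying
# pybind binding, not the real symbol.
from torch._C._distributed_autograd import _get_gradients  # assumed binding name


def get_gradients(context_id):
    """
    Retrieves a map from Tensor to the gradient for that Tensor accumulated
    in the provided distributed autograd context.

    Arguments:
        context_id (int): The autograd context id to retrieve gradients for.

    Returns:
        A Dict[Tensor, Tensor] mapping each Tensor to its gradient.
    """
    return _get_gradients(context_id)
```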

@ezyang (Contributor) commented on Nov 1, 2019

Making Python wrappers for C functions adds nontrivial overhead; it's one of the main reasons we've done it this way for regular torch functions. So in this case we decided to give up "there is always a def ... for every Python function" in the name of performance.
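A rough way to see the overhead being described: time a bound C function called directly versus through a do-nothing Python wrapper. The exact numbers vary by machine; the point is only that the wrapper adds a fixed per-call cost.

```python
import timeit

import torch


def wrapped_sigmoid(t):
    # Thin Python wrapper that only forwards to the bound function.
    return torch.sigmoid(t)


t = torch.randn(8)
print("direct :", timeit.timeit(lambda: torch.sigmoid(t), number=100_000))
print("wrapped:", timeit.timeit(lambda: wrapped_sigmoid(t), number=100_000))
```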

@pritamdamania87 (Contributor, Author)

I agree this makes a lot of sense for the tensor methods, although in this case isn't each method called only once per forward and backward pass? Do you feel the performance overhead would still be significant?

@ezyang (Contributor) commented on Nov 1, 2019

Yes, you might be fine. So the request to do it the other way is more of a hygiene thing (so that someone else doesn't come along and copy what you did here for a case where the performance does matter).

@pritamdamania87 (Contributor, Author)

@ezyang So it looks like add_docstr doesn't work, since pybind is probably already setting the docstring, and as a result we fail here:

"method '%s' already has a docstring", m->d_method->ml_name);

@albanD (Collaborator) commented on Nov 6, 2019

@pritamdamania87 If I remember correctly, if you pass a char* as the 3rd argument of the module's def() method, it will be used as the docstring for the function. We already do that, for example, for the cpp extensions here.
Would that work for you?
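A small, self-contained way to see the mechanism being described (assumes a working C++ toolchain): torch.utils.cpp_extension generates pybind11 def() calls for the listed functions, and a per-function string becomes the Python-visible docstring of the resulting binding.

```python
from torch.utils.cpp_extension import load_inline

cpp_source = "int add_one(int x) { return x + 1; }"

# Passing a dict for `functions` maps each function name to its docstring,
# which cpp_extension forwards as the extra argument to pybind11's def().
module = load_inline(
    name="docstring_demo",
    cpp_sources=cpp_source,
    functions={"add_one": "Adds one to the input; this string ends up in __doc__."},
)

print(module.add_one.__doc__)  # includes the docstring passed above
print(module.add_one(41))      # 42
```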

pritamdamania87 pushed a commit that referenced this pull request Nov 8, 2019
Pull Request resolved: #28926
ghstack-source-id: 93558845

@pritamdamania87 (Contributor, Author)

@albanD @ezyang Moved the docs to the pybind initialization. Could you take another look? Thanks!

@albanD (Collaborator) left a comment

LGTM

@facebook-github-bot (Contributor)

This pull request has been merged in 17b0ab4.
