Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning on ProcessGroup and ProcessGroup::Work APIs #46220

Closed
wants to merge 12 commits into from

Conversation

agolynski
Copy link
Contributor

@agolynski agolynski commented Oct 12, 2020

Stack from ghstack:

Differential Revision: D24294437

[ghstack-poisoned]
@facebook-github-bot facebook-github-bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Oct 12, 2020
@agolynski agolynski changed the title warning Add warning on ProcessGroup and ProcessGroup::Work APIs Oct 12, 2020
@dr-ci
Copy link

dr-ci bot commented Oct 12, 2020

💊 CI failures summary and remediations

As of commit 1a36843 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

codecov.io: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 32 times.

agolynski added a commit that referenced this pull request Oct 12, 2020
ghstack-source-id: 70fae3b9745be76053f12a847873e8f6a9e1900b
Pull Request resolved: #46220
Copy link
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @agolynski, I might not convey this clearly. The goal is that, whenever users call certain APIs on Work, we print the warning message. We will need to add things like TORCH_WARN_ONCE to impacted APIs. The impacted ones are those that won't be available in the new Future API. At least including the following ones.

.def("is_success", &::c10d::ProcessGroup::Work::isSuccess)
.def("exception", &::c10d::ProcessGroup::Work::exception)
.def("source_rank", &::c10d::ProcessGroup::Work::sourceRank)

.def("synchronize", &::c10d::ProcessGroup::Work::synchronize)

@wanchaol @gmagogsfm please comment if there are more impacted APIs.

@codecov
Copy link

codecov bot commented Oct 13, 2020

Codecov Report

Merging #46220 into gh/agolynski/10/base will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@                  Coverage Diff                  @@
##           gh/agolynski/10/base   #46220   +/-   ##
=====================================================
  Coverage                 68.33%   68.33%           
=====================================================
  Files                       410      410           
  Lines                     53795    53795           
=====================================================
  Hits                      36760    36760           
  Misses                    17035    17035           
Impacted Files Coverage Δ
torch/distributed/distributed_c10d.py 27.29% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2e5ae4...1a36843. Read the comment docs.

@mrshenli
Copy link
Contributor

@agolynski

Raised the concern that APIs like source_rank is also used by other features like torch.distributed.recv, so adding a warning message will impact those as well. To avoid this, can we do sth like (this can also apply to other APIs):

  1. provide both source_rank and _source_rank APIs.
  2. change our code to use _source_rank
  3. let source_rank print warning

Copy link
Contributor

@wanchaol wanchaol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, minor comments inline. Thanks for doing this!

@@ -974,15 +974,48 @@ that adds a prefix to each key inserted to the store.

shared_ptr_class_<::c10d::ProcessGroup::Work>(module, "Work")
.def("is_completed", &::c10d::ProcessGroup::Work::isCompleted)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we deprecate this as well? This is called done in future, see

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is risky, as we did expose this API in our API doc. If you wanna change this, please mention it where it is defined in https://github.com/pytorch/pytorch/blob/master/docs/source/distributed.rst as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undone

@@ -157,6 +160,12 @@ class ProcessGroup {
return size_;
}

// *************************************************************************
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add this comment in the beginning of ProcessGroup class instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

@agolynski
Copy link
Contributor Author

Hey @agolynski, I might not convey this clearly. The goal is that, whenever users call certain APIs on Work, we print the warning message. We will need to add things like TORCH_WARN_ONCE to impacted APIs. The impacted ones are those that won't be available in the new Future API. At least including the following ones.

.def("is_success", &::c10d::ProcessGroup::Work::isSuccess)
.def("exception", &::c10d::ProcessGroup::Work::exception)
.def("source_rank", &::c10d::ProcessGroup::Work::sourceRank)

.def("synchronize", &::c10d::ProcessGroup::Work::synchronize)

@wanchaol @gmagogsfm please comment if there are more impacted APIs.

done

@agolynski
Copy link
Contributor Author

@agolynski

Raised the concern that APIs like source_rank is also used by other features like torch.distributed.recv, so adding a warning message will impact those as well. To avoid this, can we do sth like (this can also apply to other APIs):

  1. provide both source_rank and _source_rank APIs.
  2. change our code to use _source_rank
  3. let source_rank print warning

done

Copy link
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stamp to unblock, but please fix the error I mentioned inline.

"deprecated, please ping "
"https://github.com/pytorch/pytorch/issues/46291 "
"if you see this warning");
return work.isSuccess();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be isCompleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 979 to 982
TORCH_WARN_ONCE("ProcessGroup::Work::is_completed API is being "
"deprecated, please ping "
"https://github.com/pytorch/pytorch/issues/46291 "
"if you see this warning");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what clang-format gives you? The devserver clang-format gives me this:

            TORCH_WARN_ONCE(
                "ProcessGroup::Work::is_completed API is being "
                "deprecated, please ping "
                "https://github.com/pytorch/pytorch/issues/46291 "
                "if you see this warning");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And ditto for other APIs as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reformatted via vscode clang-format

Comment on lines 976 to 1010
.def(
"is_completed",
[](::c10d::ProcessGroup::Work& work) -> bool {
TORCH_WARN_ONCE("ProcessGroup::Work::is_completed API is being "
"deprecated, please ping "
"https://github.com/pytorch/pytorch/issues/46291 "
"if you see this warning");
return work.isSuccess();
})
.def(
"is_success",
[](::c10d::ProcessGroup::Work& work) -> bool {
TORCH_WARN_ONCE("ProcessGroup::Work::is_success API is being "
"deprecated, please ping "
"https://github.com/pytorch/pytorch/issues/46291 "
"if you see this warning");
return work.isSuccess();
})
.def(
"exception",
[](::c10d::ProcessGroup::Work& work) -> std::exception_ptr {
TORCH_WARN_ONCE("ProcessGroup::Work::exception API is being "
"deprecated, please ping "
"https://github.com/pytorch/pytorch/issues/46291 "
"if you see this warning");
return work.exception();
})
.def(
"source_rank",
[](::c10d::ProcessGroup::Work& work) -> int {
TORCH_WARN_ONCE("ProcessGroup::Work::source_rank API is being "
"deprecated, please ping "
"https://github.com/pytorch/pytorch/issues/46291 "
"if you see this warning");
return work.sourceRank();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like preprocessor macro, but it seem to be ideal to avoid string literal duplication, i.e. use something like

#define _DEPRECATED_DEF(name, method) \
  def( name, [](::c10d::ProcessGroup::Work& work) -> int { \
  TORCH_WARN_ONCE("ProcessGroup::Work::" name " API is being " \
                                        "deprecated, please ping " \
                                        "https://github.com/pytorch/pytorch/issues/46291 " \
                                        "if you see this warning"); \
   return work.method(); \
   })
Suggested change
.def(
"is_completed",
[](::c10d::ProcessGroup::Work& work) -> bool {
TORCH_WARN_ONCE("ProcessGroup::Work::is_completed API is being "
"deprecated, please ping "
"https://github.com/pytorch/pytorch/issues/46291 "
"if you see this warning");
return work.isSuccess();
})
.def(
"is_success",
[](::c10d::ProcessGroup::Work& work) -> bool {
TORCH_WARN_ONCE("ProcessGroup::Work::is_success API is being "
"deprecated, please ping "
"https://github.com/pytorch/pytorch/issues/46291 "
"if you see this warning");
return work.isSuccess();
})
.def(
"exception",
[](::c10d::ProcessGroup::Work& work) -> std::exception_ptr {
TORCH_WARN_ONCE("ProcessGroup::Work::exception API is being "
"deprecated, please ping "
"https://github.com/pytorch/pytorch/issues/46291 "
"if you see this warning");
return work.exception();
})
.def(
"source_rank",
[](::c10d::ProcessGroup::Work& work) -> int {
TORCH_WARN_ONCE("ProcessGroup::Work::source_rank API is being "
"deprecated, please ping "
"https://github.com/pytorch/pytorch/issues/46291 "
"if you see this warning");
return work.sourceRank();
._DEPRECATED_DEF("is_completed", isCompleted)
._DEPRECATED_DEF("is_success", isSuccess)
._DEPRECATED_DEF("exception", exception)
._DEPRECATED_DEF("source_rank", sourceRank)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is constexpr + fmt::format

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a macro

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and switched to fmt::format since clang-tidy doesn't like macros

@@ -972,17 +972,44 @@ that adds a prefix to each key inserted to the store.
py::call_guard<py::gil_scoped_release>());
#endif

#define PROCESS_GROUP_DEPRECATION_WARNING(api_method) \
TORCH_WARN_ONCE(#api_method \
"API is being deprecated, please ping " \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, do you need a space before "API"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

agolynski added a commit that referenced this pull request Oct 14, 2020
ghstack-source-id: 278069176ca85555d2d3bc458ca399bee5721a54
Pull Request resolved: #46220

Add warnings on deprecated ProcesssGroup::Work functionality

ghstack-source-id: 278069176ca85555d2d3bc458ca399bee5721a54
Pull Request resolved: #46294

fix formatting

ghstack-source-id: 278069176ca85555d2d3bc458ca399bee5721a54
Pull Request resolved: #46295
agolynski added a commit that referenced this pull request Oct 14, 2020
ghstack-source-id: ea13ae5b2b37f929408cff5ffb18995649342ef0
Pull Request resolved: #46220

Add warnings on deprecated ProcesssGroup::Work functionality

ghstack-source-id: ea13ae5b2b37f929408cff5ffb18995649342ef0
Pull Request resolved: #46294

fix formatting

ghstack-source-id: ea13ae5b2b37f929408cff5ffb18995649342ef0
Pull Request resolved: #46295
@agolynski
Copy link
Contributor Author

Clang-tidy is broken now, fix is on the way:
#46315

@mrshenli
Copy link
Contributor

Clang-tidy is broken now, fix is on the way: #46315

Isn't the clang-tidy failure caused by this PR?

[
  {
    path: 'torch/csrc/distributed/c10d/init.cpp',
    start_line: 975,
    end_line: 975,
    start_column: 9,
    end_column: 9,
    annotation_level: 'failure',
    message: "[cppcoreguidelines-macro-usage] warning: function-like macro 'PROCESS_GROUP_DEPRECATION_WARNING' used; consider a 'constexpr' template function"
  }
]

@agolynski
Copy link
Contributor Author

Clang-tidy is broken now, fix is on the way: #46315

Isn't the clang-tidy failure caused by this PR?

[
  {
    path: 'torch/csrc/distributed/c10d/init.cpp',
    start_line: 975,
    end_line: 975,
    start_column: 9,
    end_column: 9,
    annotation_level: 'failure',
    message: "[cppcoreguidelines-macro-usage] warning: function-like macro 'PROCESS_GROUP_DEPRECATION_WARNING' used; consider a 'constexpr' template function"
  }
]

it should be fixed now (before tidy run)

agolynski added a commit that referenced this pull request Oct 14, 2020
ghstack-source-id: f5427d315d18dc2585d68a394f36409602bbc505
Pull Request resolved: #46220
agolynski added a commit that referenced this pull request Oct 14, 2020
ghstack-source-id: f5427d315d18dc2585d68a394f36409602bbc505
Pull Request resolved: #46220
@facebook-github-bot
Copy link
Contributor

@gmagogsfm merged this pull request in e7e919f.

malfet pushed a commit that referenced this pull request Oct 15, 2020
ghstack-source-id: f5427d315d18dc2585d68a394f36409602bbc505
Pull Request resolved: #46220
@facebook-github-bot facebook-github-bot deleted the gh/agolynski/10/head branch October 18, 2020 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merged oncall: distributed Add this issue/PR to distributed oncall triage queue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants