
Conversation

@lamroger (Contributor) commented Jan 11, 2024

@pytorch-bot pytorch-bot bot added the release notes: distributed (c10d) release notes category label Jan 11, 2024
pytorch-bot bot commented Jan 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/117224

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 3c8a1a3 with merge base 5046b49:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Jan 11, 2024
@lamroger (Contributor, Author):

I'll write tests tomorrow


 @_exception_logger
-def all_gather_into_tensor(output_tensor, input_tensor, group=None, async_op=False):
+def all_gather_into_tensor(output_tensor, input_tensor, group: ProcessGroup = None, async_op=False):
Contributor (Author):

I matched the type in the doc below but lmk if this is wrong

Collaborator:

I don't think we should add a single ProcessGroup type to this API; the other args like output_tensor and input_tensor don't have types either

Contributor (Author):

The docstring below specifies this:

Args:
        output_tensor (Tensor): Output tensor to accommodate tensor elements
            from all ranks. It must be correctly sized to have one of the
            following forms:
            (i) a concatenation of all the input tensors along the primary
            dimension; for definition of "concatenation", see ``torch.cat()``;
            (ii) a stack of all the input tensors along the primary dimension;
            for definition of "stack", see ``torch.stack()``.
            Examples below may better explain the supported output forms.
        input_tensor (Tensor): Tensor to be gathered from current rank.
            Different from the ``all_gather`` API, the input tensors in this
            API must have the same size across all ranks.
        group (ProcessGroup, optional): The process group to work on. If None,
            the default process group will be used.

I'm cool with taking it out, but wondering if there's a reason why this method shouldn't have type annotations.
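
For illustration, a sketch of what a fully annotated signature matching that docstring could look like (not the change this PR makes; note that because the default is None, Optional[ProcessGroup] is the strictly correct annotation):

    from typing import Optional

    import torch
    from torch.distributed import ProcessGroup

    def all_gather_into_tensor(
        output_tensor: torch.Tensor,
        input_tensor: torch.Tensor,
        group: Optional[ProcessGroup] = None,  # None selects the default process group
        async_op: bool = False,
    ):
        ...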

-    group,  # TODO add a type,
     output_tensor: torch.Tensor,
     input_tensor: torch.Tensor,
+    group: ProcessGroup = None,
Collaborator:

Functional collectives don't only accept a ProcessGroup, so please don't add a type annotation with ProcessGroup

Contributor (Author):

Sounds good - thanks for the heads up. I think explicit typing is still helpful but that can be in another PR

Collaborator:

Oh, I didn't mean we shouldn't add types just because the other fields don't have them; I think we should add RANK_TYPES like the other APIs in this file.
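
For context, RANK_TYPES is a Union alias in that file covering the several group-like inputs the functional collectives accept. The sketch below is an approximation of its shape, not the canonical definition, and the exact members may differ by PyTorch version:

    from typing import List, Tuple, Union

    import torch.distributed as dist
    from torch.distributed.device_mesh import DeviceMesh

    # Approximate shape of the RANK_TYPES alias used by the functional
    # collectives APIs in this file; the members here are an assumption.
    RANK_TYPES = Union[
        List[int],               # a flat list of ranks
        List[List[int]],         # rank lists for several groups
        dist.ProcessGroup,       # an explicit process group
        DeviceMesh,              # an N-D device mesh
        Tuple[DeviceMesh, int],  # a mesh plus a mesh dimension
    ]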


@lamroger lamroger marked this pull request as ready for review January 11, 2024 17:59
@lamroger lamroger requested a review from wanchaol January 11, 2024 17:59
@lamroger (Contributor, Author):

Hi @wanchaol - I'm on a MacBook, so probably not the best ticket to pick up for local testing. For the test, I mostly adapted an existing test and switched it to use kwargs. LMK if there's a better way or an example I can follow. Thanks!
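
For readers following along, a minimal sketch of the kind of kwargs-based check being described; the helper name and setup here are hypothetical, and PyTorch's real distributed tests run under its multi-process test harness:

    import torch
    import torch.distributed as dist

    def _check_all_gather_into_tensor_kwargs(world_size: int) -> None:
        # Hypothetical helper; assumes a process group is already initialized.
        rank = dist.get_rank()
        inp = torch.full((2,), float(rank))
        out = torch.empty(world_size * 2)
        # The point of the PR: the keyword-argument form should behave the
        # same as the positional form, including under the dynamo rewrite.
        dist.all_gather_into_tensor(output_tensor=out, input_tensor=inp, async_op=False)
        expected = torch.cat([torch.full((2,), float(r)) for r in range(world_size)])
        assert torch.equal(out, expected)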

@colesbury colesbury added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jan 11, 2024
@wconstab (Contributor) left a comment:

LGTM, thanks for adding the test. IIUC you matched the status quo for type annotations, but if you want to improve the file, the type for process group should be what wanchao said.

@wanchaol (Collaborator) left a comment:

lgtm, thanks for addressing comments.

@lamroger (Contributor, Author) commented Jan 15, 2024

Hi @wconstab @wanchaol @voznesenskym - thanks for the reviews! If it looks good, could someone help merge for me? I'm not authorized. Not sure if there are more steps. Thanks!

@wconstab (Contributor) commented Jan 17, 2024

You just need to use pytorchbot to help you merge. There is probably a wiki about it; let me ask it for help to find out where it is.

@pytorchbot help

pytorch-bot bot commented Jan 17, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'help' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'close')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,close} ...

Try @pytorchbot --help for more info.

@wconstab (Contributor):

@pytorchbot --help

pytorch-bot bot commented Jan 17, 2024

PyTorchBot Help

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,close} ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. Some commands may be used on issues as specified below.

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label,drci,close}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR
    drci                Update Dr. CI
    close               Close a PR

Merge

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditing purposes, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
                        
                        Please use `-f` as last resort, prefer `--ignore-current` to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.
  -i, --ignore-current  Merge while ignoring the currently failing jobs.  Behaves like -f if there are no pending jobs.
  -ic                   Old flag for --ignore-current. Deprecated in favor of -i.
  -r [{viable/strict,main}], --rebase [{viable/strict,main}]
                        Rebase the PR to re run checks before merging.  Accepts viable/strict or main as branch options and will default to viable/strict if not specified.

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting, will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the stable viable/strict branch of pytorch.
Repeat contributors may use this command to rebase their PR.

optional arguments:
  -s, --stable          [DEPRECATED] Rebase onto viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds label to a PR or Issue [Can be used on Issues]

positional arguments:
  labels  Labels to add to given Pull Request or Issue [Can be used on Issues]

Dr CI

usage: @pytorchbot drci 

Update Dr. CI. Updates the Dr. CI comment on the PR in case it's gotten out of sync with actual CI results.

Close

usage: @pytorchbot close

Close a PR [Can be used on issues]

@wconstab (Contributor):

anyway I will attempt to merge
@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 17, 2024
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@lamroger (Contributor, Author):

> anyway I will attempt to merge
> @pytorchbot merge

Ah appreciate it


Labels

ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
open source
release notes: distributed (c10d) (release notes category)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

funccol collectives rewrite in dynamo does not work w/ kwargs
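
For context, a hedged sketch of that failure mode: under torch.compile, dynamo rewrites c10d collectives into functional collectives, and before this PR the rewrite only handled positional arguments, so a keyword-argument call like the one below could break (identifiers are illustrative):

    import torch
    import torch.distributed as dist

    @torch.compile
    def gather(out: torch.Tensor, inp: torch.Tensor) -> torch.Tensor:
        # Keyword-argument form that the rewrite previously mishandled.
        dist.all_gather_into_tensor(output_tensor=out, input_tensor=inp)
        return out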

6 participants