rohan-varma commented May 3, 2021

Stack from ghstack:

Closes #45145
Closes #45067

Fixes the flaky tests #45145 and #45067.

The root cause is that not all remote events are guaranteed to be children of
the record function remote event: other events can sometimes be profiled under
the hood, as in the issue described in #43868.

We fix this by verifying that the set of child events on the remote end matches
the set of child events on the local end, without requiring any specific events
to be logged.
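
To illustrate the idea, here is a minimal sketch in plain Python. It is not the
actual test code: `Event`, `REMOTE_SEP`, `child_names`, and `children_match` are
hypothetical names made up for this example, and the assumption that remote
event names carry a prefix ending in a separator is for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical separator between a remote prefix and the operator name.
REMOTE_SEP = "#remote_op: "


@dataclass
class Event:
    """Hypothetical stand-in for a profiled event with nested children."""
    name: str
    children: List["Event"] = field(default_factory=list)


def strip_remote_prefix(name: str) -> str:
    # Keep only the part after the separator; names without it are unchanged.
    return name.split(REMOTE_SEP, 1)[-1]


def child_names(event: Event) -> Set[str]:
    # Collect the names of direct children, ignoring order and duplicates.
    return {strip_remote_prefix(child.name) for child in event.children}


def children_match(local_event: Event, remote_event: Event) -> bool:
    # Compare the two sets instead of asserting that specific ops were logged.
    return child_names(local_event) == child_names(remote_event)


# Usage: the same children in a different order still match.
local = Event("block", [Event("aten::add"), Event("aten::zeros")])
remote = Event(
    REMOTE_SEP + "block",
    [Event(REMOTE_SEP + "aten::zeros"), Event(REMOTE_SEP + "aten::add")],
)
assert children_match(local, remote)
```

Comparing sets means the check does not depend on the order of the children or
on a hard-coded list of expected operators. It only requires that both ends
report the same children.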

Tested by running the test 1000+ times and verifying that it passed. Will also test on a CI box before landing.

Differential Revision: D28166602

facebook-github-bot commented May 3, 2021

💊 CI failures summary and remediations

As of commit 5a4ea57 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-scanned failure(s)

1 failure not recognized by patterns:

| Job | Step | Action |
| --- | --- | --- |
| GitHub Actions render_test_results | Output Test Results (Click Me) | 🔁 rerun |

This comment was automatically generated by Dr. CI.

facebook-github-bot added the "oncall: distributed" and "cla signed" labels May 3, 2021
rohan-varma added a commit that referenced this pull request May 3, 2021
ghstack-source-id: 128017866
Pull Request resolved: #57517

rohan-varma added a commit that referenced this pull request May 4, 2021
Pull Request resolved: #57517

ghstack-source-id: 128020820

rohan-varma added a commit that referenced this pull request May 5, 2021
Pull Request resolved: #57517

ghstack-source-id: 128200041

facebook-github-bot commented

This pull request has been merged in 69e64b2.

facebook-github-bot deleted the gh/rohan-varma/307/head branch May 9, 2021 14:17
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Pull Request resolved: pytorch#57517

ghstack-source-id: 128200041

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D28166602

fbshipit-source-id: 8145857da4642aef31f360b20db00f4328abe2ca