[ReduceOp] ameliorate custom __eq__ #90088

Closed
wants to merge 2 commits into from

crcrpar (Collaborator) commented Dec 2, 2022

Improve the completeness of ReduceOp.__eq__.

A follow-up should also support the equality operator when the first operand is a RedOpType and the second is a ReduceOp.
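To illustrate why the operand order matters, here is a minimal pure-Python sketch. The two classes below are hypothetical stand-ins, not the real `torch.distributed` bindings, and the sketch assumes that the enum-like type answers `False` for foreign types instead of returning `NotImplemented`. Under that assumption Python never tries the reflected `ReduceOp.__eq__`, so only the wrapper-on-the-left comparison succeeds.

```python
class RedOpType:
    """Hypothetical stand-in for the enum-like type; not the real binding."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Assumption: a strict comparison that answers False for foreign
        # types instead of returning NotImplemented.
        if isinstance(other, RedOpType):
            return self.value == other.value
        return False

    def __hash__(self):
        return hash(self.value)


class ReduceOp:
    """Hypothetical wrapper around a RedOpType, loosely modeled on this PR."""

    def __init__(self, op):
        self.op = op

    def __eq__(self, other):
        # Compare against another wrapper or against a bare RedOpType.
        if isinstance(other, ReduceOp):
            return self.op == other.op
        if isinstance(other, RedOpType):
            return self.op == other
        # Let Python fall back to its default handling for anything else.
        return NotImplemented

    def __hash__(self):
        return hash(self.op)


SUM, MAX = RedOpType(0), RedOpType(1)

wrapper_vs_wrapper = ReduceOp(SUM) == ReduceOp(SUM)  # handled by ReduceOp.__eq__
wrapper_vs_enum = ReduceOp(SUM) == SUM               # handled by ReduceOp.__eq__
enum_vs_wrapper = SUM == ReduceOp(SUM)               # handled by RedOpType.__eq__
different_ops = ReduceOp(SUM) == ReduceOp(MAX)
```

In this sketch the first two comparisons are true, while `enum_vs_wrapper` is false because the left operand's `__eq__` decides the result before the wrapper is consulted. That asymmetry is the gap the follow-up is meant to close.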

Fixes #90072

cc @kwen2501 @pritamdamania87

Still needs to make ReduceOp.__eq__ and RedOpType.__eq__ symmetric, so that the result does not depend on operand order

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>

pytorch-bot bot commented Dec 2, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90088

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f50e4ff:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

crcrpar (Collaborator, Author) commented Dec 5, 2022

The failure doesn't look related.

2022-12-02T22:33:34.7404375Z ======================================================================
2022-12-02T22:33:34.7404634Z ERROR [0.004s]: test_dispatch_symbolic_meta_outplace_all_strides_linalg_lu_factor_cuda_float32 (__main__.TestMetaCUDA)
2022-12-02T22:33:34.7404904Z ----------------------------------------------------------------------
2022-12-02T22:33:34.7405040Z Traceback (most recent call last):
2022-12-02T22:33:34.7405393Z   File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2206, in setUp
2022-12-02T22:33:34.7405505Z     set_rng_seed(SEED)
2022-12-02T22:33:34.7405845Z   File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1346, in set_rng_seed
2022-12-02T22:33:34.7405966Z     torch.manual_seed(seed)
2022-12-02T22:33:34.7406270Z   File "/opt/conda/lib/python3.10/site-packages/torch/random.py", line 40, in manual_seed
2022-12-02T22:33:34.7406408Z     torch.cuda.manual_seed_all(seed)
2022-12-02T22:33:34.7406949Z   File "/opt/conda/lib/python3.10/site-packages/torch/cuda/random.py", line 113, in manual_seed_all
2022-12-02T22:33:34.7407226Z     _lazy_call(cb, seed_all=True)
2022-12-02T22:33:34.7407929Z   File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 176, in _lazy_call
2022-12-02T22:33:34.7408117Z     callable()
2022-12-02T22:33:34.7408646Z   File "/opt/conda/lib/python3.10/site-packages/torch/cuda/random.py", line 111, in cb
2022-12-02T22:33:34.7408902Z     default_generator.manual_seed(seed)
2022-12-02T22:33:34.7409313Z RuntimeError: CUDA error: device-side assert triggered
2022-12-02T22:33:34.7409793Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-12-02T22:33:34.7410097Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-12-02T22:33:34.7410125Z 
2022-12-02T22:33:34.7410592Z ----------------------------------------------------------------------
2022-12-02T22:33:34.7410787Z Ran 17244 tests in 915.139s
2022-12-02T22:33:34.7410812Z 
2022-12-02T22:33:34.7411087Z FAILED (errors=1, skipped=9033, expected failures=3)

kwen2501 (Contributor) left a comment

Thanks for the fix and for adding the tests. LGTM.

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
crcrpar (Collaborator, Author) commented Dec 6, 2022

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Dec 6, 2022
pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@crcrpar crcrpar deleted the reduceop_eq branch December 6, 2022 05:54
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Improve the completeness of `ReduceOp.__eq__`.

A follow-up should also support the equality operator when the first operand is a `RedOpType` and the second is a `ReduceOp`.

Fixes pytorch#90072

Pull Request resolved: pytorch#90088
Approved by: https://github.com/kwen2501
Labels: ciflow/trunk, Merged, open source, release notes: distributed (c10d)
Development

Successfully merging this pull request may close these issues.

[ReduceOP] Type bug since Torch 1.13
4 participants