Upstream master bump 0513 #77471
Conversation
* Allow cast from Int to Int32 type (see the sketch after this list)
* Update test_binary_ops with scalar tests
* Add integral scalars to optional cast exception list
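As context for the Int-to-Int32 cast, here is a minimal eager-mode sketch (plain PyTorch, not the fuser IR) of the promotion rule the fuser has to match: a Python int scalar is carried as a 64-bit Int, yet it must not widen an Int32 tensor operand, so the scalar is cast down instead.

```python
import torch

# A Python int scalar participates in the op as a 64-bit "Int", but
# type promotion keeps the tensor's "Int32" dtype: the scalar is
# cast down to Int32 rather than the tensor being widened to Int64.
t = torch.ones(4, dtype=torch.int32)
out = t + 2
print(out.dtype)  # torch.int32
```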
... and reuse in lower_allocation pass.
Fixes the assertion from pytorch#1325 on our devel branch.
1. Update alias information after graph mutation.
2. Patch unsqueeze: i. support negative dimensions; ii. fix the range check (see the example after this list).
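For reference, the eager-mode unsqueeze semantics the patch mirrors: a negative dim d is normalized to d + ndim + 1, so the valid range for a 2-D tensor is [-3, 2].

```python
import torch

t = torch.rand(2, 3)
# Negative dims are normalized as d + ndim + 1, so for a 2-D tensor
# the valid range is [-3, 2]; -1 appends a trailing axis.
print(torch.unsqueeze(t, -1).shape)  # torch.Size([2, 3, 1])
print(torch.unsqueeze(t, -3).shape)  # torch.Size([1, 2, 3])
# torch.unsqueeze(t, 3) would raise an error: dim out of range.
```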
Fixing a few smaller issues here and there:
* Expose a Python API to toggle single-node fusion.
* Expose a Python API to toggle horizontal fusion (needed to avoid a pointwise scheduler failure when fusing outputs of different shapes/ranks).
* Add shape-expression shortcut support for native_dropout (bug reported by AOTAutograd).
* Fix the device check to avoid fusing nodes whose inputs live on different devices. Long term this should be supported, but it is disabled for now to avoid the assert (e.g. a scalar CPU tensor can be operated on with CUDA tensors, a feature from TensorIterator); see the sketch below.
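The device-check caveat refers to a real TensorIterator behavior, shown below in eager mode: a zero-dim CPU tensor may mix with CUDA tensors, which is exactly the pattern the stricter fusion guard now declines to fuse.

```python
import torch

if torch.cuda.is_available():
    cuda_t = torch.rand(4, device="cuda")
    cpu_scalar = torch.tensor(2.0)  # zero-dim CPU tensor
    # TensorIterator allows this cross-device mix because the CPU
    # operand is a zero-dim "scalar" tensor; the fuser's device check
    # conservatively refuses to fuse such nodes for now.
    print(cuda_t * cpu_scalar)  # result lives on the CUDA device
```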
Summary: Pull Request resolved: pytorch#69964
Things added in this PR that require review:
1. cuLaunchCooperativeKernel driver API added in aten/src/ATen/cuda/detail/LazyNVRTC.cpp and aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h
nvfuser code update:
1. Perf tuning on the codegen scheduler that improves performance.
2. Permutation support has been extended beyond contiguous/channels-last (the improvements can be observed on the pointwise benchmark).
Things reverted from local changes:
1. aten::gelu with approximation
2. Local changes that are upstreamed in PR pytorch#68804
Pull Request resolved: pytorch#69428
Reviewed By: ngimel
Differential Revision: D33073817
Pulled By: wconstab
fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb
* Refactor War Sync Insertion Pass (pytorch#1339)
* Remove kir::Expr::scope_ (pytorch#1341)
* Fusion IR Refactor (pytorch#1343)
* Refactor KIR Step 1 - Remove kir::Node (pytorch#1347)
* Refactor KIR Step 2 - TMP IrUtils change (pytorch#1348)
* Refactor KIR Step 3 - Remove kir::Expr and kir::Val (pytorch#1349)
* Refactor KIR Step 4 - Remove kir::Bool,Double,Int,NamedScalar (pytorch#1350)
* Refactor KIR Step 5 - Remove kir::IterDomain/TensorDomain/TensorView (pytorch#1351)
* Refactor KIR Step 6 - Remove kir::UnaryOp/BinaryOp/TernaryOp/ReductionOp/WelfordOp/BroadcastOp (pytorch#1352)
* Refactor KIR Step 7 - Remove kir dispatch (pytorch#1353)
* Refactor KIR Step 8 - Clean up lower_utils (pytorch#1355)
* Refactor KIR Step 9 - lower_utils ir_utils::applyReplacements (pytorch#1354)
* Refactor KIR Step 10 - Remove kir_printer in favor of io_stream (pytorch#1356)
This PR relaxes the constraint so that arbitrary padding sizes can be used as long as output domains don't get larger than input domains.
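As a rough eager-mode analogy (F.pad rather than the fuser's internal padded domains): mixed positive and negative padding is fine so long as the output ends up no larger than the input.

```python
import torch
import torch.nn.functional as F

t = torch.rand(1, 1, 8)
# Pad one element on the left and trim one on the right: the output
# domain stays the same size as the input domain, so this kind of
# arbitrary padding is acceptable under the relaxed constraint.
out = F.pad(t, (1, -1))
print(out.shape)  # torch.Size([1, 1, 8])
```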
* force segmenting of unconnected graphs
* derive heuristic on empty groups
* add test
* lint
* handle aliased output in batchnorm
* empty tensor
* lint and comment
* clang-format
* check reference tv available in pointwise scheduler
* comment
* clean up test and check utils
* Have Kernel Inherit IrContainer (pytorch#1375)
* Kernel<-Fusion Step 1 - Convert ExprSort to StmtSort (pytorch#1376)
* Kernel<-Fusion Step 2 - Mutator refactor (pytorch#1377)
* Kernel<-Fusion Step 3 - Debug print for expr_eval and type promotion fix (pytorch#1379)
* Kernel<-Fusion Step 4 - Have kernel inherit Fusion (pytorch#1380)
* Kernel<-Fusion Step 5 - Move lowering passes into their own files (pytorch#1382)
* Kernel<-Fusion Step 6 - Remove kir::IrBuilder (pytorch#1383)
* Kernel<-Fusion Step 7 - Remove kir functions from ComputeAtMap (pytorch#1384)
* Kernel<-Fusion Step 8 - Clean up [lower/executor] utils (pytorch#1387)
* Kernel<-Fusion Step 9 - Remove TensorView::fuserTv (pytorch#1388)
* Kernel<-Fusion Step 10 - Remove lowerVal/lowerExpr (pytorch#1389)
* Kernel<-Fusion Step 11 - Finish cleaning up kir (pytorch#1390)
Adds TensorView::doubleBuffer(). See the new tests for how it is used. For an overview of the lowering algorithm, please see lower_double_buffer.h.
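A conceptual sketch in plain Python (not the generated CUDA; load and compute here are hypothetical callables) of the loop structure double buffering produces: a prologue fills the first buffer, then each iteration consumes one buffer while the next chunk is prefetched into the other.

```python
def double_buffered_loop(load, compute, n):
    """Conceptual double buffering: two buffers alternate roles, so
    the prefetch of chunk i+1 can overlap the compute on chunk i."""
    bufs = [load(0), None]  # prologue: fill buffer 0 before the loop
    for i in range(n):
        if i + 1 < n:
            bufs[(i + 1) % 2] = load(i + 1)  # prefetch next chunk
        compute(bufs[i % 2])                 # consume current chunk

# Hypothetical usage: load chunk i as i * 10, "compute" by printing.
double_buffered_loop(lambda i: i * 10, print, 5)
```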
1. Extend buildShapeExpression for squeeze_copy/unsqueeze_copy ops.
2. Patch the broadcastSizes insertion point for buildShapeExpression to avoid the graph::copy() linter assert.
3. Add tests.
4. Support no-op squeeze (squeezing on a dimension that's not size-1); see the sketch after this list.
TODO (in follow-up PRs):
1. Extend buildShapeExpression to view_copy and reshape_copy as well.
2. Refactor broadcastSizesExpression to allow graceful failure instead of a hard assert.
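The no-op squeeze case matches eager semantics, sketched below: squeezing a dimension whose size is not 1 must leave the shape (and therefore the emitted shape expression) unchanged.

```python
import torch

# Squeezing a non-size-1 dimension is a no-op; only a size-1
# dimension is actually removed.
print(torch.squeeze(torch.rand(2, 3), 0).shape)  # torch.Size([2, 3])
print(torch.squeeze(torch.rand(1, 3), 0).shape)  # torch.Size([3])
```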
@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge this (Initiating merge automatically since Phabricator Diff has merged)
Merge failed due to list index out of range
cc @seemethere, do you know anything about this? It landed internally.
@pytorchbot force merge this |
Merge failed due to list index out of range
Looks like this PR is so massive that our GHF tooling can't actually handle it correctly; investigating how to unblock.
Hey @jjsjann123. |
Merged manually with the following modifications to trymerge.py:

```diff
diff --git a/.github/scripts/trymerge.py b/.github/scripts/trymerge.py
index f482358a85..067ca14184 100755
--- a/.github/scripts/trymerge.py
+++ b/.github/scripts/trymerge.py
@@ -521,10 +521,12 @@ class GitHubPR:
         return authors

     def get_committer_login(self, num: int = 0) -> str:
+        return "jjsjann123"
         return self._fetch_authors()[num][0]

     def get_committer_author(self, num: int = 0) -> str:
-        return self._fetch_authors()[num][1]
+        return "jjsjann123 <jiej@nvidia.com>"
+        # return self._fetch_authors()[num][1]

     def get_checkrun_conclusions(self) -> Dict[str, str]:
         """ Returns list of checkrun / conclusions """
```

Ran using the following commands:
This is the last time we will be merging a PR of 20k lines.
Summary: Updating nvfuser code base. This should fix the indexing issue observed in pytorch/vision#6015. Running tests locally as well. Will update the description here at a later point.
bypass-github-export-checks
Pull Request resolved: #77471
Reviewed By: malfet, seemethere
Differential Revision: D36393120
Pulled By: eellison
fbshipit-source-id: 876f2d066e8e54b5d076de66ad1811f6970be1c8
Enable NVFuser in OSS. Retry of #77213, because it was breaking torchvision tests. Fix in #77471 has been verified by jjsjann123.
Pull Request resolved: #77579
Approved by: https://github.com/eellison, https://github.com/malfet, https://github.com/atalman, https://github.com/seemethere
Summary: Enable NVFuser in OSS. Retry of #77213, because it was breaking torchvision tests. Fix in #77471 has been verified by jjsjann123.
Pull Request resolved: #77579
Approved by: https://github.com/eellison, https://github.com/malfet, https://github.com/atalman, https://github.com/seemethere
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/38bc10ae25c6fd2f445926fdee148ac19a4a1c08
Reviewed By: seemethere
Differential Revision: D36552636
Pulled By: seemethere
fbshipit-source-id: 3ee5eb9ad5ee2638ef75105a366d90db54b0b436
Cherry-picked from pytorch#77471, fixing the MSVC build lambda bug.
Updating nvfuser code base. This should fix the indexing issue observed in pytorch/vision#6015. Running tests locally as well. Will update the description here at a later point.
@bypass-github-export-checks
Pull Request resolved: pytorch/pytorch#77471
Approved by: https://github.com/seemethere, https://github.com/eellison
Updating nvfuser code base.
This should fix the indexing issue observed in pytorch/vision#6015.
Running tests locally as well. Will update the description here at a later point.
@bypass-github-export-checks