
Upstream master bump 0513 #77471

Closed
wants to merge 794 commits

Conversation

jjsjann123
Collaborator

@jjsjann123 jjsjann123 commented May 14, 2022

Updating nvfuser code base.

This should fix the indexing issue observed in pytorch/vision#6015.

Running tests locally as well. Will update the description here at a later point

@bypass-github-export-checks

csarofeen and others added 30 commits December 15, 2021 12:31
* Allow cast from Int to Int32 type
* Update test_binary_ops with scalar tests
* Add integral scalars to optional cast exception list
Fixes the assertion from pytorch#1325 on our devel branch.

1. update alias information after graph mutation
2. patch unsqueeze: i. support negative dimension; ii. fixing range check
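The negative-dimension support and range check for unsqueeze can be sketched in plain Python (a hypothetical helper mirroring the patched behavior, not the actual nvfuser code):

```python
def normalize_unsqueeze_dim(dim: int, ndim: int) -> int:
    """Map a possibly-negative unsqueeze dim into [0, ndim].

    For an input of rank ndim, unsqueeze accepts dims in
    [-(ndim + 1), ndim], since the output has ndim + 1 dimensions.
    """
    if dim < -(ndim + 1) or dim > ndim:
        raise IndexError(
            f"unsqueeze dim {dim} out of range for rank-{ndim} input"
        )
    return dim + ndim + 1 if dim < 0 else dim

# A rank-3 input accepts dims -4..3; -1 inserts before the last slot.
print(normalize_unsqueeze_dim(-1, 3))  # 3
print(normalize_unsqueeze_dim(-4, 3))  # 0
```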
Fixing a few smaller issues here and there:

* Exposing a python API to switch single-node fusion;
* Exposing a python API to switch horizontal fusion (needed to avoid a PW scheduler failure on fusions with outputs of different shapes/ranks);
* Adding shape-expression short-cut support for native_dropout (bug reported by AOTAutograd);
* Fixing the device check to avoid fusing nodes with inputs on different devices. Long term we should support this, but it is disabled for now to avoid an assert (e.g. a scalar CPU tensor can be operated on with CUDA tensors, a feature from TensorIterator).
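The conservative device check described above amounts to requiring that all inputs sit on one CUDA device. A minimal sketch (hypothetical helper operating on device strings, not the actual partitioner code):

```python
def can_fuse(input_devices):
    """Reject fusion unless all inputs are on the same CUDA device.

    CPU scalar tensors could legally mix with CUDA tensors via
    TensorIterator, but that case is conservatively rejected here
    to avoid the assert, matching the fix described above.
    """
    devices = set(input_devices)
    return len(devices) == 1 and next(iter(devices)).startswith("cuda")

print(can_fuse(["cuda:0", "cuda:0"]))  # True
print(can_fuse(["cuda:0", "cpu"]))     # False: mixed-device, disabled for now
```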
Summary:
Pull Request resolved: pytorch#69964

Things added in this PR that require review:
1. cuLaunchCooperativeKernel driver API added
aten/src/ATen/cuda/detail/LazyNVRTC.cpp
aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h

nvfuser code update:
1. perf tuning of the codegen scheduler that improves performance.
2. permutation support has been extended beyond contiguous/channels-last. (The improvements can be observed on the PW benchmark.)
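Extending permutation support beyond contiguous/channels-last essentially means recovering an axis order from arbitrary strides instead of pattern-matching two known layouts. An illustrative sketch (assumed helper name, not nvfuser's implementation):

```python
def stride_order_permutation(strides):
    """Return axes ordered from outermost (largest stride) to innermost.

    Contiguous NCHW strides such as (24, 12, 4, 1) yield (0, 1, 2, 3);
    channels-last strides such as (24, 1, 8, 2) yield (0, 2, 3, 1);
    any other layout falls out of the same computation.
    """
    return tuple(sorted(range(len(strides)), key=lambda i: -strides[i]))

print(stride_order_permutation((24, 12, 4, 1)))  # (0, 1, 2, 3) contiguous
print(stride_order_permutation((24, 1, 8, 2)))   # (0, 2, 3, 1) channels-last
```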

Things reverted from local changes:
1. aten::gelu with approximation
2. local changes that were upstreamed in PR pytorch#68804

Pull Request resolved: pytorch#69428

Reviewed By: ngimel

Differential Revision: D33073817

Pulled By: wconstab

fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb
* Refactor War Sync Insertion Pass (pytorch#1339)
* Remove kir::Expr::scope_ (pytorch#1341)
* Fusion IR Refactor (pytorch#1343)
* Refactor KIR Step 1 - Remove kir::Node (pytorch#1347)
* Refactor KIR Step 2 - TMP IrUtils change (pytorch#1348)
* Refactor KIR Step 3 - Remove kir::Expr and kir::Val. (pytorch#1349)
* Refactor KIR Step 4 - Remove kir::Bool,Double,Int,NamedScalar. (pytorch#1350)
* Refactor KIR Step 5 - Remove kir::IterDomain/TensorDomain/TensorView (pytorch#1351)
* Refactor KIR Step 6 - Remove kir::UnaryOp/BinaryOp/TernaryOp/ReductionOp/WelfordOp/BroadcastOp. (pytorch#1352)
* Refactor KIR Step 7 - Remove kir dispatch (pytorch#1353)
* Refactor KIR Step 8 - Clean up lower_utils (pytorch#1355)
* Refactor KIR Step 9 - lower_utils ir_utils::applyReplacements. (pytorch#1354)
* Refactor KIR Step 10 - Remove kir_printer in favor of io_stream (pytorch#1356)
 This PR relaxes the constraint so that arbitrary padding sizes can be used as long as output domains don't get larger than input domains.

* Implement alias_copy operations only for CudaFusionGroup to support fallback path
* Remove alias (a) annotation from alias_copy schema
* force segment un-connected graphs

* derive heuristic on empty groups

* add test

* lint

* handled aliased output in batchnorm

* empty tensor

* lint and comment

* clang format

* check reference tv available in pointwise scheduler

* comment

* cleanup test and check utils
* Have Kernel Inherit IrContainer (pytorch#1375)
* Kernel<-Fusion Step 1 - Convert ExprSort to StmtSort (pytorch#1376)
* Kernel<-Fusion Step 2 - Mutator refactor (pytorch#1377)
* Kernel<-Fusion Step 3 - Debug print for expr_eval and type promotion fix (pytorch#1379)
* Kernel<-Fusion Step 4 - Have kernel inherit Fusion (pytorch#1380)
* Kernel<-Fusion Step 5 - Move lowering passes into their own files (pytorch#1382)
* Kernel<-Fusion Step 6 - Remove kir::IrBuilder (pytorch#1383)
* Kernel<-Fusion Step 7 - Remove kir functions from ComputeAtMap (pytorch#1384)
* Kernel<-Fusion Step 8 - Clean up [lower/executor] utils (pytorch#1387)
* Kernel<-Fusion Step 9 - Remove TensorView::fuserTv (pytorch#1388)
* Kernel<-Fusion Step 10 - Remove lowerVal/lowerExpr (pytorch#1389)
* Kernel<-Fusion Step 11 - Finish cleaning up kir (pytorch#1390)
Adds TensorView::doubleBuffer(). See the new tests for how it is used.

For an overview of the lowering algorithm, please see lower_double_buffer.h.
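Double buffering overlaps the load of the next tile with compute on the current one. A language-agnostic sketch of the transformed loop structure (illustrative only; in the generated kernel the "load" would be an asynchronous copy overlapping compute, see lower_double_buffer.h for the real algorithm):

```python
def double_buffered_sum(data, tile):
    """Process `data` in tiles with two alternating buffers.

    Structure: prologue preloads tile 0; each iteration prefetches
    tile i+1 into the other buffer while "computing" on tile i.
    """
    tiles = [data[i:i + tile] for i in range(0, len(data), tile)]
    bufs = [None, None]
    bufs[0] = tiles[0]  # prologue: preload the first tile
    total = 0
    for i in range(len(tiles)):
        if i + 1 < len(tiles):
            bufs[(i + 1) % 2] = tiles[i + 1]  # prefetch next tile
        total += sum(bufs[i % 2])             # compute on current tile
    return total

print(double_buffered_sum(list(range(10)), 4))  # 45
```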
1. extend buildShapeExpression for squeeze_copy/unsqueeze_copy ops.
2. patching broadcastSizes insertion point for buildShapeExpression to avoid graph::copy() linter assert.
3. adding tests
4. supports no-op squeeze (squeezing a dimension that's not size-1)

TODO (in follow up PRs):
1. extend buildShapeExpression to view_copy and reshape_copy as well
2. refactor broadcastSizesExpression to allow graceful failure instead of hard assert
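The squeeze/unsqueeze shape rules being encoded by buildShapeExpression can be sketched in plain Python (hypothetical helpers mirroring the symbolic shape expressions, not the actual graph code):

```python
def squeeze_shape(sizes, dim):
    """squeeze is a no-op when the target dimension is not size-1."""
    dim = dim % len(sizes)  # support negative dims
    if sizes[dim] != 1:
        return list(sizes)              # no-op squeeze
    return sizes[:dim] + sizes[dim + 1:]

def unsqueeze_shape(sizes, dim):
    """unsqueeze inserts a size-1 dimension; dim may be negative."""
    ndim = len(sizes)
    if dim < 0:
        dim += ndim + 1
    return sizes[:dim] + [1] + sizes[dim:]

print(squeeze_shape([4, 1, 3], 1))  # [4, 3]
print(squeeze_shape([4, 2, 3], 1))  # [4, 2, 3] (no-op)
print(unsqueeze_shape([4, 3], -1))  # [4, 3, 1]
```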
@facebook-github-bot
Contributor

@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@pytorchbot merge this

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge failed due to list index out of range
Raised by https://github.com/pytorch/pytorch/actions/runs/2347398630

@eellison
Contributor

Merge failed due to list index out of range
Raised by https://github.com/pytorch/pytorch/actions/runs/2347398630

cc @seemethere, do you know anything about this? It landed internally.

@eellison
Contributor

  File ".github/scripts/trymerge.py", line 923, in main
    pr.merge_into(repo, dry_run=args.dry_run, force=args.force, comment_id=args.comment_id)
  File ".github/scripts/trymerge.py", line 695, in merge_into
    repo._run_git("commit", f"--author=\"{self.get_author()}\"", "-m", msg)
  File ".github/scripts/trymerge.py", line 571, in get_author
    authors = self.get_authors()
  File ".github/scripts/trymerge.py", line 566, in get_authors
    rc[self.get_committer_login(idx)] = self.get_committer_author(idx)
  File ".github/scripts/trymerge.py", line 527, in get_committer_author
    return self._fetch_authors()[num][1]

@seemethere
Member

@pytorchbot force merge this

@pytorchmergebot
Collaborator

Merge failed due to list index out of range
Raised by https://github.com/pytorch/pytorch/actions/runs/2347479349

@seemethere
Member

Looks like this PR is so massive that our GHF tooling can't actually handle it correctly; investigating how to unblock.

@github-actions

Hey @jjsjann123.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@seemethere
Member

seemethere commented May 18, 2022

Merged manually with the following modifications to trymerge.py:

diff --git a/.github/scripts/trymerge.py b/.github/scripts/trymerge.py
index f482358a85..067ca14184 100755
--- a/.github/scripts/trymerge.py
+++ b/.github/scripts/trymerge.py
@@ -521,10 +521,12 @@ class GitHubPR:
         return authors

     def get_committer_login(self, num: int = 0) -> str:
+        return "jjsjann123"
         return self._fetch_authors()[num][0]

     def get_committer_author(self, num: int = 0) -> str:
-        return self._fetch_authors()[num][1]
+        return "jjsjann123 <jiej@nvidia.com>"
+        # return self._fetch_authors()[num][1]

     def get_checkrun_conclusions(self) -> Dict[str, str]:
         """ Returns list of checkrun / conclusions """

Ran using the following commands:

python3 .github/scripts/trymerge.py --dry-run 77471
git push

@seemethere
Member

This is the last time we will be merging a PR of 20k lines

facebook-github-bot pushed a commit that referenced this pull request May 18, 2022
Summary:
Updating nvfuser code base.

This should fix the indexing issue observed in pytorch/vision#6015.

Running tests locally as well. Will update the description here at a later point

bypass-github-export-checks

Pull Request resolved: #77471

Reviewed By: malfet, seemethere

Differential Revision: D36393120

Pulled By: eellison

fbshipit-source-id: 876f2d066e8e54b5d076de66ad1811f6970be1c8
pytorchmergebot pushed a commit that referenced this pull request May 20, 2022
Enable NVFuser in OSS.

Retry of #77213, because it was breaking torchvision tests.

Fix in #77471 has been verified by jjsjann123

Pull Request resolved: #77579

Approved by: https://github.com/eellison, https://github.com/malfet, https://github.com/atalman, https://github.com/seemethere
atalman pushed a commit that referenced this pull request May 20, 2022
facebook-github-bot pushed a commit that referenced this pull request May 23, 2022
Summary:
Enable NVFuser in OSS.

Retry of #77213, because it was breaking torchvision tests.

Fix in #77471 has been verified by jjsjann123

Pull Request resolved: #77579

Approved by: https://github.com/eellison, https://github.com/malfet, https://github.com/atalman, https://github.com/seemethere

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/38bc10ae25c6fd2f445926fdee148ac19a4a1c08

Reviewed By: seemethere

Differential Revision: D36552636

Pulled By: seemethere

fbshipit-source-id: 3ee5eb9ad5ee2638ef75105a366d90db54b0b436
jjsjann123 added a commit to csarofeen/pytorch that referenced this pull request May 25, 2022
cherry-picked from pytorch#77471
fixing an MSVC build bug with lambdas
@jjsjann123 jjsjann123 deleted the upstream_master_bump_0513 branch May 26, 2022 23:23
jjsjann123 added a commit to jjsjann123/nvfuser that referenced this pull request Oct 29, 2022
Updating nvfuser code base.

This should fix the indexing issue observed in pytorch/vision#6015.

Running tests locally as well. Will update the description here at a later point

@bypass-github-export-checks
Pull Request resolved: pytorch/pytorch#77471
Approved by: https://github.com/seemethere, https://github.com/eellison
jjsjann123 added a commit to jjsjann123/nvfuser that referenced this pull request Nov 10, 2022
Labels
ciflow/trunk (trigger trunk jobs on your pull request), cla signed, oncall: jit (add this issue/PR to the JIT oncall triage queue), open source, triaged (this issue has been looked at by a team member and prioritized into an appropriate module), with-ssh