Merge OpenAI Triton commit b0f8332
#1725
Merged
Conversation
This makes its interface more similar to `do_bench`, so it is easier to switch between the two.
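To illustrate why matching interfaces makes switching easy, here is a minimal sketch of a `do_bench`-style benchmarking helper. The function name and details are hypothetical, not Triton's actual implementation; it uses wall-clock time rather than CUDA events, so it is only an illustration of the call shape.

```python
import statistics
import time


def do_bench_sketch(fn, warmup=5, rep=20):
    """Run fn `warmup` times untimed, then `rep` timed iterations,
    and return the median time per call in milliseconds.
    Illustrative only; Triton's real do_bench uses device timing."""
    for _ in range(warmup):
        fn()
    times_ms = []
    for _ in range(rep):
        t0 = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(times_ms)


# Because both benchmarking helpers take a plain callable, swapping
# one for the other at a call site is a one-line change.
median_ms = do_bench_sketch(lambda: sum(range(10_000)))
print(f"median: {median_ms:.4f} ms")
```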
… (#4401) This would currently do nothing, silently. Raising an error here prevents that behavior.
… (#4187) … instead of being a separate class
….if as live (#4404) Summary: When scf.if is marked as live in ForOpDeadArgElim, we should mark its condition as live too. Without this fix, the scf.if in the test case added in this patch would be incorrectly removed.
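The fix above amounts to a standard rule in liveness analysis: marking an op live must also mark the values it depends on (here, the scf.if condition). A minimal worklist sketch, with hypothetical names rather than MLIR's actual API:

```python
def mark_live(op, operands_of, live):
    """Worklist liveness: marking an op live also marks its operands
    (transitively), so dead-code cleanup cannot drop a value that a
    live op still needs. `operands_of` maps op -> list of operands."""
    work = [op]
    while work:
        cur = work.pop()
        if cur in live:
            continue
        live.add(cur)
        work.extend(operands_of.get(cur, []))
    return live


# Example: an scf.if that is live keeps its condition live too.
live = mark_live("scf.if", {"scf.if": ["cond"]}, set())
print(sorted(live))
```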
The core Triton team is small, and we receive many PRs (thank you!). To help us review your code more quickly, **if you are a new contributor (fewer than 3 merged PRs) we ask that you complete the following tasks and include the filled-out checklist in your PR description.** Complete these tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them.

- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [ ] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [ ] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because `FILL THIS IN`.
- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)
PyTorch 2.4 was officially released on July 24, 2024: https://pytorch.org/blog/pytorch2-4/. This ensures the CI picks up the latest PyTorch features.
This PR first promotes common infrastructure in `lib/Dialect/TritonGPU/Transforms/Pipeliner` so that it can be included by other target backends. No other changes have been made to the lib/include directories.

Second, the `tritonamdgpu-stream-pipeline` pass has been completely revamped based on code from `lib/Dialect/TritonGPU/Transforms/Pipeliner/MatmulLoopPipeline.cpp`, using similar scheduling passes to compute multi-stage pipelines. Some of this code could be consolidated further into the CoarseSchedule class (or perhaps a derived LoopScheduler class). This modulo scheduler collects `tt.load` ops and generates local_storage and management ops for the ramp-up stage (stage 0), then collects all uses of the loads for stage 1. Multi-buffering is introduced when num_stages exceeds the max distance between a load and its uses. Buffering may be in shared memory for `tt.dot` uses, or in registers for all other uses. The current implementation does not support peeling the last iteration when the loop bound is dynamic.

Lastly, the `tritonamdgpu-reorder-instructions` pass has been enhanced to move `tt.load` ops as early as possible within their region. This includes loop bodies, as well as function entry blocks for the ramp-up case. This pass will also move `triton_gpu.local_store` ops as early as possible if their source is not directly from a `tt.load`. In this way, a multi-buffered pipeline will overlap in this order:

1. `tt.load` buffer+2
2. `triton_gpu.local_store` buffer+1
3. `tt.dot` buffer+0

Co-authored-by: Lei Zhang <antiagainst@gmail.com>
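The buffer+2 / buffer+1 / buffer+0 overlap above can be sketched as a cyclic buffer assignment: in steady state, iteration i loads into one slot while earlier-issued data is staged and consumed from the previous slots. This is a hypothetical illustration of the schedule's shape, not the pass's actual data structures:

```python
NUM_BUFFERS = 3  # assumption: num_stages yields triple buffering


def pipeline_schedule(num_iters, num_buffers=NUM_BUFFERS):
    """Return, per loop iteration, which buffer slot each stage
    touches in a multi-buffered software pipeline: the load runs two
    slots ahead of the dot, the local_store one slot ahead."""
    schedule = []
    for i in range(num_iters):
        schedule.append({
            "load": (i + 2) % num_buffers,         # fetch ahead
            "local_store": (i + 1) % num_buffers,  # stage into shared memory
            "dot": i % num_buffers,                # consume
        })
    return schedule


# In steady state the three stages never touch the same slot, so they
# can overlap without a hazard.
for step in pipeline_schedule(4):
    print(step)
```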
…408) This commit changes the AccelerateAMDMatmul pass to use common target utils to avoid duplication. While here, it also cleans up `using` namespace declarations and symbols.
I'm adding Windows support to XLA and this PR updates `f2reduce.cpp` so that it compiles successfully on Windows.
Fixed the FLOPS viewer (which was not showing before, since flops have a width). Co-authored-by: Jokeren <robinho364@gmail.com>
…ersion (#4383) This is the first PR that replaces the old distributed-to-distributed layout conversion with one based on linear layouts. We tried to match the original conversion mechanism as closely as possible for now, but will later try to improve its memory usage, reduce bank conflicts, and promote generalizability. TODOs after this PR:

1. Remove the old code
2. Implement conversion within warps
3. Implement DotOpLayout conversion
4. Avoid bank conflicts using swizzling instead of padding
5. Update comments and revisit barriers for reduce/atomic operations

Co-authored-by: Justin Lebar <justin.lebar@gmail.com>
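As background for the linear-layout approach mentioned above: a linear layout maps indices to coordinates linearly over GF(2), i.e. by XOR-combining a basis vector for every set bit of the input. The sketch below is a simplified, hypothetical model of that idea, not Triton's LinearLayout class:

```python
def apply_linear_layout(bases, index):
    """Map an input index to an output coordinate by XOR-combining
    the basis vector for each set bit of the index. Linearity over
    GF(2): f(a ^ b) == f(a) ^ f(b). Simplified illustration only."""
    out = 0
    for bit, basis in enumerate(bases):
        if (index >> bit) & 1:
            out ^= basis
    return out


# Identity bases give the identity map; swapping the low two bases
# swaps the low two bits of every index.
print(apply_linear_layout([1, 2, 4], 5))  # bits 0 and 2 set: 1 ^ 4 = 5
print(apply_linear_layout([2, 1, 4], 1))  # bit 0 now maps to 2
```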
chengjunlu approved these changes on Jul 30, 2024.
whitneywhtsang changed the title from "Merge OpenAI Triton commit 7b617bc" to "Merge OpenAI Triton commit b0f8332" on Jul 30, 2024.
This PR changes the Triton base from a51de76 to b0f8332 (Jul 29).
Pass rate: 98.49%
Please do not squash and merge this PR.