[NVGPU] Fix nvdsl examples #156830
Conversation
Thanks for bringing this to our attention! I looked into this a bit, and it does look like this crash is occurring in the … @durga4github @abhilash1910, do you have any idea why this might be happening?
Taking a look at the codegen, thanks for highlighting. The IR does not seem incorrect at first glance, though.
Context: Highlighted in #156830, this is an ISel lowering issue in the NVPTX backend for the prefetch.tensormap intrinsic. It is caused by an unchecked pattern rewrite during the infer-address-space pass. This intrinsic is valid only for the const, param, and generic address spaces; any other address space is invalid. Currently, the intrinsic gets incorrectly rewritten to target AS(1) when the pointer argument of the intrinsic comes in as a kernel function argument. This patch therefore adds a check for the valid address spaces before rewriting. cc @durga4github FYI: @Wolfram70 @rupprecht @castigli
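The fix described above amounts to guarding the rewrite: the intrinsic's pointer operand may only be moved to an inferred address space that the intrinsic actually accepts. A minimal Python sketch of that guard logic (the address-space numbers and the `maybe_rewrite_address_space` helper are illustrative stand-ins, not the actual NVPTX C++ code):

```python
# Illustrative NVPTX-style address-space numbers (stand-ins, not authoritative):
GENERIC, GLOBAL, CONST, PARAM = 0, 1, 4, 101

# prefetch.tensormap is only valid for const, param, and generic pointers.
VALID_PREFETCH_TENSORMAP_AS = {GENERIC, CONST, PARAM}

def maybe_rewrite_address_space(current_as: int, inferred_as: int) -> int:
    """Return the address space to use after infer-address-space.

    Only rewrite when the inferred target is one the intrinsic
    accepts; otherwise keep the original address space.
    """
    if inferred_as in VALID_PREFETCH_TENSORMAP_AS:
        return inferred_as
    return current_as
```

Before the patch, the pass effectively always picked the inferred space, so a kernel argument would be rewritten to AS(1), producing an intrinsic call that instruction selection cannot handle.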
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-gpu Author: Giacomo Castiglioni (castigli) Changes: This PR aims at fixing the nvdsl examples, which got a bit out of sync since they are not tested in the CI. The fixed bugs relate to the following PRs:
There is one remaining bug that I think #153134 introduced. When running Ch4 and Ch5, the nvvm.prefetch tensormap intrinsic leads to the following error on sm_90a: LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.prefetch.tensormap
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: mlir-opt before.mlir --gpu-module-to-binary
1. Running pass 'Function Pass Manager' on module 'LLVMDialectModule'.
2. Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on function '@gemm_multistage_kernel'
... Perhaps @Wolfram70 or @grypp could help me out with the last bug? Full diff: https://github.com/llvm/llvm-project/pull/156830.diff 3 Files Affected:
diff --git a/mlir/test/Examples/NVGPU/Ch5.py b/mlir/test/Examples/NVGPU/Ch5.py
index f98cfd758a75f..91c346c837dda 100644
--- a/mlir/test/Examples/NVGPU/Ch5.py
+++ b/mlir/test/Examples/NVGPU/Ch5.py
@@ -156,7 +156,7 @@ def producer_loop(
):
phase = const(True, ty=T.bool())
- for iv, phase in scf.for_(0, (K // TILE_K), 1, [phase]):
+ for iv, phase, _ in scf.for_(0, (K // TILE_K), 1, [phase]):
stage = iv % num_stages
# Wait MMA to be done
mbar_mma[stage].try_wait(phase)
diff --git a/mlir/test/Examples/NVGPU/tools/nvdsl.py b/mlir/test/Examples/NVGPU/tools/nvdsl.py
index 90dbb2355e1c8..d4c50fc9bc28d 100644
--- a/mlir/test/Examples/NVGPU/tools/nvdsl.py
+++ b/mlir/test/Examples/NVGPU/tools/nvdsl.py
@@ -84,8 +84,7 @@ def arrive(self, txcount: int = 0, predicate=None):
self.mbar_group_op, txcount_op, self.id_op, predicate=predicate
)
else:
- nvgpu.mbarrier_arrive(
- ir.Type.parse("!nvgpu.mbarrier.token"), self.mbar_group_op, self.id_op
+ nvgpu.mbarrier_arrive(self.mbar_group_op, self.id_op
)
def try_wait(self, phase: bool = False, ticks: int = 10000000):
@@ -144,7 +143,7 @@ def create_descriptor(self, device_ptr):
device_ptr,
)
self.tma_descriptor = nvgpu.TmaCreateDescriptorOp(
- tma_descriptor_ty, device_unranked_memref, map(const, self.tma_box_shape)
+ tma_descriptor_ty, device_unranked_memref, list(map(const, self.tma_box_shape))
)
return self.tma_descriptor.result
@@ -156,7 +155,7 @@ def load(self, dest, mbarrier: Mbarriers, coords=[0], predicate=None):
dest,
mbarrier.mbar_group_op,
self.tma_descriptor,
- coordinates=map(const, coords),
+ coordinates=list(map(const, coords)),
mbarId=mbarrier.id_op,
predicate=predicate,
)
diff --git a/mlir/test/Examples/NVGPU/tools/nvgpucompiler.py b/mlir/test/Examples/NVGPU/tools/nvgpucompiler.py
index 1c9cc74fcd169..4b661f8df6a9f 100644
--- a/mlir/test/Examples/NVGPU/tools/nvgpucompiler.py
+++ b/mlir/test/Examples/NVGPU/tools/nvgpucompiler.py
@@ -35,9 +35,11 @@ def compile(self, module: ir.Module):
def jit(self, module: ir.Module) -> execution_engine.ExecutionEngine:
"""Wraps the module in a JIT execution engine."""
- return execution_engine.ExecutionEngine(
+ ee = execution_engine.ExecutionEngine(
module, opt_level=self.opt_level, shared_libs=self.shared_libs
)
+ ee.initialize()
+ return ee
def compile_and_jit(self, module: ir.Module) -> execution_engine.ExecutionEngine:
"""Compiles and jits the module."""
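Two of the hunks above replace `map(const, …)` with `list(map(const, …))`. In Python 3, `map` returns a lazy, one-shot iterator, so an op builder that needs the length of the operand list, or iterates it more than once, must be given a materialized list. A standalone illustration (the `const` helper here is a hypothetical stand-in for nvdsl's, which wraps a value in an MLIR constant op):

```python
def const(x):
    # Stand-in for nvdsl's `const` helper; here it just tags the value.
    return ("const", x)

shape = (64, 64)
lazy = map(const, shape)

# A map object is exhausted after a single pass:
materialized = list(lazy)
assert materialized == [("const", 64), ("const", 64)]
assert list(lazy) == []  # already consumed, second pass yields nothing

# Passing list(map(...)) gives the op builder a real sequence
# that supports len() and repeated iteration.
coords = list(map(const, (0, 0)))
assert len(coords) == 2
```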
Could a solution be to momentarily revert to inline PTX? [edit] This was resolved in #159253.
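For reference, the nvgpucompiler.py hunk above changes `jit()` to keep a handle to the engine so `initialize()` can run before the engine is returned, instead of returning the constructor result directly. A sketch of that call-order pattern using a dummy engine class (the real `ExecutionEngine` is MLIR's Python binding; this stand-in only mirrors the shape of the fix):

```python
class FakeExecutionEngine:
    """Hypothetical stand-in for mlir.execution_engine.ExecutionEngine."""

    def __init__(self):
        self.initialized = False

    def initialize(self):
        # In the real binding this runs module initialization work
        # before any invocation; here we only record that it ran.
        self.initialized = True

def jit(module=None):
    # Mirror of the patched jit(): construct, initialize, then return,
    # rather than returning the freshly constructed engine directly.
    ee = FakeExecutionEngine()
    ee.initialize()
    return ee

engine = jit()
assert engine.initialized
```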