[mlir][sparse] Calling mlir-opt with sparsifier's dumped pass pipeline for GPU codegen does not emit GPU code #91774

ggeorgakoudis · 2024-05-10T17:33:21Z

First, call mlir-opt with the sparsifier pipeline to generate GPU code and dump the pass pipeline (I'm omitting the MLIR input file; the reproducers contain it):

mlir-opt --sparsifier="enable-runtime-library=false parallelization-strategy=dense-outer-loop gpu-triple=nvptx64-nvidia-cuda gpu-chip=sm_80 gpu-features=+ptx71 gpu-format=llvm" --dump-pass-pipeline

Reproducer: https://godbolt.org/z/7cP8z1qcY

Then use the dumped pipeline directly in mlir-opt:

mlir-opt -pass-pipeline="builtin.module(func.func(linalg-generalize-named-ops),func.func(linalg-fuse-elementwise-ops),sparsification-and-bufferization,sparse-storage-specifier-to-llvm,func.func(canonicalize{  max-iterations=10 max-num-rewrites=-1 region-simplify=true test-convergence=false top-down=true}),func.func(finalizing-bufferize),sparse-gpu-codegen{enable-runtime-library=true num-threads=1024},gpu.module(strip-debuginfo),gpu.module(convert-scf-to-cf),gpu.module(convert-gpu-to-nvvm{has-redux=false index-bitwidth=0 use-bare-ptr-memref-call-conv=false}),func.func(convert-linalg-to-loops),func.func(convert-vector-to-scf{full-unroll=false lower-tensors=false target-rank=1}),func.func(expand-realloc{emit-deallocs=true}),func.func(convert-scf-to-cf),expand-strided-metadata,lower-affine,convert-vector-to-llvm{enable-amx=false enable-arm-neon=false enable-arm-sve=false enable-x86vector=false force-32bit-vector-indices=true reassociate-fp-reductions=false},finalize-memref-to-llvm{index-bitwidth=0 use-aligned-alloc=false use-generic-functions=false},func.func(convert-complex-to-standard),func.func(arith-expand{include-bf16=false}),func.func(convert-math-to-llvm{approximate-log1p=true}),convert-math-to-libm,convert-complex-to-libm,convert-vector-to-llvm{enable-amx=false enable-arm-neon=false enable-arm-sve=false enable-x86vector=false force-32bit-vector-indices=true reassociate-fp-reductions=false},convert-complex-to-llvm,convert-vector-to-llvm{enable-amx=false enable-arm-neon=false enable-arm-sve=false enable-x86vector=false force-32bit-vector-indices=true reassociate-fp-reductions=false},convert-func-to-llvm{index-bitwidth=0 use-bare-ptr-memref-call-conv=false},nvvm-attach-target{O=2 chip=sm_80 fast=false features=+ptx71 ftz=false  module= triple=nvptx64-nvidia-cuda},gpu-to-llvm{gpu-binary-annotation=gpu.binary use-bare-pointers-for-host=false use-bare-pointers-for-kernels=false},gpu-module-to-binary{format=llvm  opts= toolkit=},reconcile-unrealized-casts)"

Reproducer: https://godbolt.org/z/1zz64j895

I expected to see the same GPU codegen as when calling the sparsifier directly, but the output does not contain GPU code.
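One difference is visible just from comparing the two commands above: the sparsifier was invoked with `enable-runtime-library=false`, while the dumped pipeline prints `sparse-gpu-codegen{enable-runtime-library=true num-threads=1024}` and lists `sparsification-and-bufferization` with no options at all. Whether that mismatch explains the missing GPU code is not established here. The snippet below is only a minimal sketch for pulling the printed options of a pass out of a dumped pipeline string so they can be checked against the flags that were passed; the helper name `pass_options` is hypothetical and not part of MLIR.

```python
import re


def pass_options(pipeline, pass_name):
    """Return the key=value options printed for the first occurrence of
    pass_name in a textual pass pipeline, or {} if none are printed."""
    match = re.search(re.escape(pass_name) + r"\{([^}]*)\}", pipeline)
    if match is None:
        return {}
    options = {}
    for token in match.group(1).split():
        key, _, value = token.partition("=")
        options[key] = value
    return options


# Fragment copied from the dumped pipeline in this report.
dumped = (
    "sparsification-and-bufferization,sparse-storage-specifier-to-llvm,"
    "sparse-gpu-codegen{enable-runtime-library=true num-threads=1024},"
    "gpu.module(strip-debuginfo)"
)

gpu_opts = pass_options(dumped, "sparse-gpu-codegen")
sparsification_opts = pass_options(dumped, "sparsification-and-bufferization")

print("sparse-gpu-codegen:", gpu_opts)
print("sparsification-and-bufferization:", sparsification_opts)
```

Running the same helper over the full dumped string gives a quick way to spot any other pass whose printed options differ from what was requested on the `--sparsifier` command line.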

llvmbot · 2024-05-10T17:33:36Z

@llvm/issue-subscribers-mlir-sparse

Author: Giorgis Georgakoudis (ggeorgakoudis)

llvmbot · 2024-05-10T17:33:36Z

@llvm/issue-subscribers-mlir-gpu

Author: Giorgis Georgakoudis (ggeorgakoudis)

ggeorgakoudis added the mlir:gpu and mlir:sparse (Sparse compiler in MLIR) labels on May 10, 2024

aartbik assigned PeimingLiu and aartbik on May 10, 2024
