Conversation

@ivanradanov
Collaborator

These options enabled me to cross-compile an aarch64 object file, which I was then able to link and run natively on aarch64.

@ivanradanov ivanradanov requested a review from wsmoses February 3, 2022 02:20
Member

@wsmoses wsmoses left a comment


LGTM. Could you add a test that the LLVM output is the correct arch?
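A test along those lines could be a lit-style FileCheck test on the emitted triple. The RUN line below is a hypothetical sketch based on the flags used in this thread, not an actual Polygeist test:

```cpp
// RUN: mlir-clang %s --function=* -emit-llvm -S -target aarch64-unknown-linux-gnu | FileCheck %s
// CHECK: target triple = "aarch64-unknown-linux-gnu"
int main() { return 0; }
```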

@ivanradanov
Collaborator Author

It actually never seems to be the correct arch when compiling a .cu file, at least on my end. A bug, perhaps?

```console
(cmd)$ cat test-no-kernel.cu; /scr0/ivan/src/Polygeist/build//bin/mlir-clang --function=* --cuda-lower --cpuify="distribute" -resource-dir=/scr0/ivan/src/Polygeist/mlir-build//lib/clang/14.0.0/ --cuda-gpu-arch=sm_60 --cuda-path=/opt/cuda-10.2/ -c test-no-kernel.cu -o test.ll -emit-llvm -S; cat test.ll
int main() {
        return 0;
}
; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

declare i8* @malloc(i64)

declare void @free(i8*)

define i32 @main() !dbg !3 {
  ret i32 0
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}

!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "mlir", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "LLVMDialectModule", directory: "/")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = distinct !DISubprogram(name: "main", linkageName: "main", scope: null, file: !4, line: 2, type: !5, scopeLine: 2, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!4 = !DIFile(filename: "test-no-kernel.cu", directory: "/home/ivan/src/rodinia/cuda/bfs")
!5 = !DISubroutineType(types: !6)
!6 = !{}
```

The generated module has target triple = "nvptx64-nvidia-cuda". Or is this intended?

When the file is a .cpp file, it works as expected:

```console
(ins)$ cat test.cpp; /scr0/ivan/src/Polygeist/build//bin/mlir-clang --function=* --cuda-lower --cpuify="distribute" -resource-dir=/scr0/ivan/src/Polygeist/mlir-build//lib/clang/14.0.0/ --cuda-gpu-arch=sm_60 --cuda-path=/opt/cuda-10.2/ -c test.cpp -o test.ll -emit-llvm -S -target aarch64-unknown-linux-gnu -mcpu=a64fx; cat test.ll
int main() {
        return 0;
}
warning: argument unused during compilation: '--cuda-gpu-arch=sm_60'
; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

declare i8* @malloc(i64)

declare void @free(i8*)

define i32 @main() !dbg !3 {
  ret i32 0
}
(ins)$ cat test.cpp; /scr0/ivan/src/Polygeist/build//bin/mlir-clang --function=* --cuda-lower --cpuify="distribute" -resource-dir=/scr0/ivan/src/Polygeist/mlir-build//lib/clang/14.0.0/ --cuda-gpu-arch=sm_60 --cuda-path=/opt/cuda-10.2/ -c test.cpp -o test.ll -emit-llvm -S; cat test.ll
int main() {
        return 0;
}
warning: argument unused during compilation: '--cuda-gpu-arch=sm_60'
; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare i8* @malloc(i64)

declare void @free(i8*)

define i32 @main() !dbg !3 {
  ret i32 0
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}

!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "mlir", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "LLVMDialectModule", directory: "/")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = distinct !DISubprogram(name: "main", linkageName: "main", scope: null, file: !4, line: 2, type: !5, scopeLine: 2, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!4 = !DIFile(filename: "test.cpp", directory: "/home/ivan/src/rodinia/cuda/bfs")
!5 = !DISubroutineType(types: !6)
!6 = !{}
```

@wsmoses
Member

wsmoses commented Feb 3, 2022

Part of that is that when compiling modules of two types, it uses the triple of the one that was compiled first. I'd probably override it if it is set (and in the CUDA-lowering case drop the device triple after the modules are merged).

@ivanradanov
Collaborator Author

ivanradanov commented Feb 8, 2022

I made it so that the triple and data layout do not get overridden when the compilation job's target is nvptx* (which is when the device code is compiled), unless that is the only type of compilation job we have. Does this work?
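For illustration, the selection rule described here can be sketched roughly as follows; the `Job` type and function names are hypothetical, not Polygeist's actual implementation:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the rule: a job targeting nvptx* is a device
// compilation, and its triple should not override the module's triple
// unless device jobs are the only jobs present.
struct Job {
  std::string triple;
};

static bool isDeviceJob(const Job &j) {
  return j.triple.rfind("nvptx", 0) == 0; // triple starts with "nvptx"
}

std::string pickModuleTriple(const std::vector<Job> &jobs) {
  for (const Job &j : jobs)
    if (!isDeviceJob(j))
      return j.triple; // a host job's triple wins over any device job's
  // Only device jobs exist: fall back to the device triple.
  return jobs.empty() ? std::string() : jobs.front().triple;
}
```

With this rule, a mixed host/device compilation keeps the host triple regardless of which module was compiled first, which matches the behavior requested above.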

@wsmoses
Member

wsmoses commented Feb 8, 2022

Seems reasonable to me

@wsmoses wsmoses merged commit dce289f into llvm:main Feb 8, 2022