Invalid PTX error when using NVPTX `syncthreads` intrinsic #54115

Closed
bheisler opened this Issue Sep 11, 2018 · 2 comments

bheisler commented Sep 11, 2018

If a crate calls the `nvptx_syncthreads` platform intrinsic and is compiled with the NVPTX LLVM backend, rustc generates an invalid PTX file. Basic example:

test.rs:

```rust
#![feature(lang_items)]
#![feature(no_core, platform_intrinsics)]
#![crate_type = "lib"]
#![no_core]

#[lang = "copy"]
trait Copy {}

#[lang = "freeze"]
trait Freeze {}

#[lang = "sized"]
trait Sized {}

extern "platform-intrinsic" {
    pub fn nvptx_syncthreads();
}

#[no_mangle]
pub unsafe fn foo() {
    nvptx_syncthreads();
}
```

nvptx64-nvidia-cuda.json:

```json
{
    "arch": "nvptx64",
    "cpu": "sm_20",
    "data-layout": "e-i64:64-v16:16-v32:32-n16:32:64",
    "linker": "ptx-linker",
    "linker-flavor": "ld",
    "linker-is-gnu": true,
    "dll-prefix": "",
    "dll-suffix": ".ptx",
    "dynamic-linking": true,
    "llvm-target": "nvptx64-nvidia-cuda",
    "max-atomic-width": 0,
    "os": "cuda",
    "obj-is-bitcode": true,
    "panic-strategy": "abort",
    "target-endian": "little",
    "target-pointer-width": "64",
    "target-c-int-width": "32"
}
```

Command:

```
rustc --target nvptx64-nvidia-cuda test.rs -O --emit=asm
```

This produces the following (invalid) PTX assembly file:

```
//
// Generated by LLVM NVPTX Back-End
//

.version 3.2
.target sm_20
.address_size 64

        // .globl       foo
.extern .func llvm.cuda.syncthreads
()
;

.visible .func foo()
{


        { // callseq 0, 0
        .reg .b32 temp_param_reg;
        call.uni
        llvm.cuda.syncthreads,
        (
        );
        } // callseq 0
        ret;

}
```

Notice that the file declares `llvm.cuda.syncthreads` as an external function and then calls it. I believe it should lower to a single instruction instead (though I'm not sure which one).
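One way to spot this failure mode in generated PTX is to look for `.extern .func` declarations whose name starts with `llvm.`: an intrinsic that LLVM actually lowered should not show up as a declared function at all. The helper below is a hypothetical text scan (`leaked_intrinsics` is not part of rustc, LLVM, or ptx-linker), a sketch that assumes `.extern .func` and the callee name share a line, as they do in the output above.

```rust
/// Hypothetical helper: returns the names of LLVM intrinsics that were emitted
/// as `.extern .func` declarations instead of being lowered to PTX instructions.
/// Assumes `.extern .func` and the callee name appear on the same line.
fn leaked_intrinsics(ptx: &str) -> Vec<&str> {
    ptx.lines()
        .filter_map(|line| {
            let mut tokens = line.split_whitespace();
            if tokens.next()? == ".extern" && tokens.next()? == ".func" {
                // Skip an optional return-parameter list and take the llvm.* name.
                tokens.find(|t| t.starts_with("llvm."))
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    // Excerpt of the invalid output reported in this issue.
    let buggy = ".extern .func llvm.cuda.syncthreads\n()\n;\n\n.visible .func foo()\n";
    // Excerpt of the output expected once the intrinsic is lowered.
    let fixed = ".visible .func foo()\n{\n\tbar.sync \t0;\n\tret;\n}\n";
    println!("{:?}", leaked_intrinsics(buggy)); // prints ["llvm.cuda.syncthreads"]
    println!("{:?}", leaked_intrinsics(fixed)); // prints []
}
```

Since it only does a line-oriented token scan, it is a quick sanity check on `--emit=asm` output rather than a PTX parser.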

See also denzp/rust-ptx-linker#19

peterhj (Contributor) commented Nov 6, 2018

For a few LLVM versions now, `llvm.cuda.syncthreads` has been renamed to `llvm.nvvm.barrier0`. On nightly, the following Rust source:

```rust
#![feature(lang_items)]
#![feature(no_core)]
#![feature(link_llvm_intrinsics)]
#![crate_type = "lib"]
#![no_core]

#[lang = "copy"]
trait Copy {}

#[lang = "freeze"]
trait Freeze {}

#[lang = "sized"]
trait Sized {}

#[allow(improper_ctypes)]
extern "C" {
    #[link_name = "llvm.nvvm.barrier0"]
    fn syncthreads() -> ();
}

#[no_mangle]
pub unsafe fn foo() {
    syncthreads();
}
```

should, when compiled with your target JSON and command, yield the following assembly:

```
//
// Generated by LLVM NVPTX Back-End
//

.version 3.2
.target sm_20
.address_size 64

	// .globl	foo

.visible .func foo()
{


	bar.sync 	0;
	ret;

}
```

Also note that I just submitted a PR with this change to https://github.com/rust-lang-nursery/stdsimd, so if `core` works with the NVPTX backend (it does now, right?), I'd prefer to use the syncthreads function from `core::arch` instead of the platform intrinsic, which seems to be more or less deprecated.
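The `bar.sync 0` instruction in the corrected output above is the block-wide barrier behind CUDA's `__syncthreads()`: every thread in the block waits at barrier 0 until all of them have arrived. As a rough host-side analogy only (OS threads standing in for GPU threads, with none of the GPU's scheduling or shared-memory semantics), `std::sync::Barrier` gives the same write, wait, then read pattern; `rotate_via_barrier` is a hypothetical name chosen for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Barrier;
use std::thread;

/// Hypothetical host-side analogy of a block-wide barrier: `n` OS threads each
/// write their own slot, wait at a shared barrier, then read their neighbour's slot.
fn rotate_via_barrier(n: usize) -> Vec<usize> {
    let shared: Vec<AtomicUsize> = (0..n).map(|_| AtomicUsize::new(0)).collect();
    let barrier = Barrier::new(n);
    thread::scope(|s| {
        let handles: Vec<_> = (0..n)
            .map(|tid| {
                let (shared, barrier) = (&shared, &barrier);
                s.spawn(move || {
                    // Phase 1: each thread writes only its own slot.
                    shared[tid].store(tid * 10, Ordering::Relaxed);
                    // Like `bar.sync 0`: no thread continues until all `n` have arrived,
                    // and `wait` synchronizes the threads, so earlier writes are visible.
                    barrier.wait();
                    // Phase 2: the neighbour's slot is guaranteed to be written by now.
                    shared[(tid + 1) % n].load(Ordering::Relaxed)
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    println!("{:?}", rotate_via_barrier(4)); // prints [10, 20, 30, 0]
}
```

Without the `wait()` call, a thread could read its neighbour's slot before it was written and observe the initial `0`, which is the race `__syncthreads()` exists to prevent on the GPU.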

jonas-schievink (Member) commented Jan 27, 2019

The original problem seems to be caused by an incorrect intrinsic declaration. As @peterhj showed, fixing the bad declaration generates correct code, so I'll close this (please leave a comment if this is not correct and I'll reopen).
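The corrected declaration hinges on `#[link_name]`: the Rust identifier (`syncthreads`) is decoupled from the name handed to LLVM (`llvm.nvvm.barrier0`), and with `#![feature(link_llvm_intrinsics)]` a name starting with `llvm.` is treated by LLVM as an intrinsic rather than an external symbol. The attribute itself is not NVPTX-specific. As a host-side sketch, it can bind the C library's `abs` under an arbitrary Rust name (`c_absolute_value` and `absolute` are hypothetical names); the sketch assumes a toolchain recent enough to accept `unsafe extern` blocks (Rust 1.82 or later).

```rust
use std::os::raw::c_int;

// The C standard library's `abs`, bound under a different Rust-side name.
// `#[link_name]` selects the symbol the declaration refers to, independent of
// the Rust identifier, just as `#[link_name = "llvm.nvvm.barrier0"]` does above.
unsafe extern "C" {
    #[link_name = "abs"]
    fn c_absolute_value(x: c_int) -> c_int;
}

/// Safe wrapper. `abs(INT_MIN)` is undefined behaviour in C, so that input is rejected.
fn absolute(x: c_int) -> Option<c_int> {
    if x == c_int::MIN {
        return None;
    }
    // SAFETY: `abs` has no other preconditions for a representable `int`.
    Some(unsafe { c_absolute_value(x) })
}

fn main() {
    println!("{:?}", absolute(-5)); // prints Some(5)
}
```

On a host target the symbol resolves against libc at link time; on NVPTX, the `llvm.` prefix is what turns the same mechanism into an intrinsic call that lowers to `bar.sync 0`.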

Regarding support for `core` on NVPTX: I was able to build `core` using cargo-xbuild, but for some reason the compiler can't find `core::arch::nvptx`, so my setup must be wrong somehow.
