laanwj edited this page Sep 13, 2010 · 14 revisions

Cubin Utilities


The Cubin Utilities package currently consists of the following utilities:

  • decuda is a disassembler for the NVIDIA CUDA binary (.cubin) format. It shows the native instructions generated for the G8x and G9x architectures. It can also help in finding bottlenecks, as you can see which parts of your algorithm compile to a large number of actual instructions. It has an option to generate output that is compatible with cudasm, which makes it possible to hand-optimize kernels.

  • cudasm is an assembler for the NVIDIA G8x architecture of Graphics Processing Units (GPUs). It allows writing and optimizing code specifically for the G8x and G9x series, and provides a (basic) independent toolchain for this hardware. It takes a text file with assembly instructions as input and produces a .cubin file as output. Note that cudasm is in the very early stages of development and not yet fully usable.

As NVIDIA is unwilling to provide any information, the raw cubin instructions remained a mystery for quite some time. After a lot of experimentation, I appear to have figured out most of them. I acquired all the information by differential analysis of the cubin files produced by ptxas and by extensive experimentation on the results. It is not based on reverse engineering of hardware or software.
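The idea behind differential analysis can be sketched in a few lines of Python: change one thing in the input, assemble again, and look at which bits of the encoded instruction moved. This is only an illustration of the general technique, not code from decuda. The helper name `changed_bits` and the two 64-bit words are hypothetical and do not correspond to real G8x encodings.

```python
def changed_bits(word_a, word_b, width=64):
    """Return the bit positions (0 = least significant) where two words differ."""
    diff = word_a ^ word_b  # XOR leaves a 1 wherever the two encodings disagree
    return [bit for bit in range(width) if (diff >> bit) & 1]


# Hypothetical instruction words, invented for illustration only.
base = 0x1000000000000001
variant = 0x1000000000000401

print(changed_bits(base, variant))  # [10]
```

Repeating this for many small variations of one PTX instruction is one way to guess which bit fields hold the opcode, the registers and the immediates.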

I tried to mimic PTX instructions as closely as possible in the output. For an overview of the PTX assembly language, see PTX_ISA_x.x.PDF in the doc directory of the NVIDIA CUDA toolkit (which can be found on the download page of NVIDIA CUDA).

cudasm in particular is a relatively new program, so it probably has some bugs. Let me know if you find any problems or have questions.


See the README file in the source package.


Here is a small example of the output of decuda.

Example kernel:

__global__ void my_kernel(int *x)
{
    *x = 0x123456;
}

Corresponding assembly code:

.entry my_kernel
.lmem 0
.smem 24
.reg 2
.bar 0
  mov.b32 $r0, s[0x0010]
  mov.b32 $r1, 0x00123456
  mov.end.u32 g[$r0], $r1
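To see how many actual instructions a kernel compiles to, a listing like the one above can be post-processed with a small script. The following is a minimal sketch, not part of decuda. It assumes, based only on this sample, that lines starting with a dot are directives and all other non-empty lines are instructions, and that the listing holds a single kernel. The function name `summarize` is hypothetical.

```python
LISTING = """\
.entry my_kernel
.lmem 0
.smem 24
.reg 2
.bar 0
  mov.b32 $r0, s[0x0010]
  mov.b32 $r1, 0x00123456
  mov.end.u32 g[$r0], $r1
"""


def summarize(listing):
    """Split a decuda-style listing into directives and instruction lines."""
    directives = {}
    instructions = []
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("."):
            # Directive: ".name value" (the value is kept as a string)
            parts = line[1:].split(None, 1)
            directives[parts[0]] = parts[1] if len(parts) > 1 else ""
        else:
            instructions.append(line)
    return directives, instructions


directives, instructions = summarize(LISTING)
print("%s: %d instructions" % (directives["entry"], len(instructions)))
```

The directive values are stored as plain strings, without interpreting them, since this page does not document their meaning.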


  • Python 2.4


Starting with version 0.4.0, the assembler (cudasm) is included.

latest (tar.gz)
latest (zip)

See the ChangeLog (also included in the distribution) for an overview of changes for each version.


The project switched over to GitHub in September 2009. The source can be downloaded using anonymous git with

$ git clone git://github.com/laanwj/decuda.git



Wladimir J. van der Laan