ASM generation tool for GAS/NASM/MASM with Xbyak-like syntax in Python.
This file provides an Xbyak-like DSL to generate ASM code for GAS/NASM/MASM. i.e., A static version of Xbyak
- gas : GNU Assembler
- nasm : Netwide Assembler (NASM)
- masm : Microsoft Macro Assembler
There are several samples in the sample/
directory.
import sys
sys.path.append('../')
from s_xbyak import *
def main():
parser = getDefaultParser()
param = parser.parse_args()
init(param)
segment('text')
with FuncProc('add2'):
with StackFrame(2) as sf:
x = sf.p[0]
y = sf.p[1]
lea(rax, ptr(x + y))
term()
if __name__ == '__main__':
main()
Commentaries:
-
getDefaultParser()
parses some options.-win
: use Win64 ABI (default : AMD64 ABI)-m mode
: mode = gas/nasm/masm (default : nasm)
-
param
must have the following keys.win : bool
mode : str
-
segment('text')
- Declare that the code starts here.
-
FuncProc('add2')
- Declare that the function
add2
starts here.
- Declare that the function
-
StackFrame(2)
- Declare that the function has two integer-type arguments
- Remark : The current version supports only integer-(pointer)-type and the max number is four.
sf.p[0]
: The register corresponding to the 1st argument.sf.p[1]
: The register corresponding to the 2nd argument.
-
lea(rax, ptr(x + y))
s_xbyak
usesptr(...)
instead ofptr[...]
.
-
ret()
is automatically inserted when theStackFrame
ends. -
term()
- Terminates code generation.
$ python3 add.py -m gas > add_s.S
.text
.global PRE(add2)
PRE(add2):
TYPE(add2)
lea (%rdi,%rsi,1), %rax
ret
SIZE(add2)
PRE
,TYPE
,SIZE
are macros to absorb OS differences.
For Linux/Intel macOS
python3 add.py -m nasm
segment .text
_global add2
lea rax, [rdi+rsi]
ret
For Windows
> python3 add.py -m nasm -win
segment .text
export add2
_global add2
lea rax, [rcx+rdx]
ret
$ python3 add.py -m masm
_text segment
add2 proc export
lea rax, qword ptr [rcx+rdx]
ret
add2 endp
_text ends
end
def main():
parser = getDefaultParser()
param = parser.parse_args()
init(param)
segment('data')
global_('g_x')
dd_(123)
segment('text')
with FuncProc('inc_and_add'):
with StackFrame(1) as sf:
inc(dword(rip+'g_x'))
y = sf.p[0]
mov(eax, ptr(rip+'g_x'))
add(rax, y)
term()
Commentaries:
segment('data')
- Declare that the data starts here.
global_('g_x')
- The memory named
g_x
can be accessed from the other files.
- The memory named
dd_(123)
- Put
123
as 32-bit integer.
- Put
(rip+'g_x')
- Use
rip+
to access by relative addressing.
- Use
inc(dword(...))
- Use
qword(64-bit)/dword(32-bit)/word(16-bit)/byte(8-bit)
instead ofptr
to specify the memory size.
- Use
def main():
parser = getDefaultParser()
param = parser.parse_args()
init(param)
segment('text')
with FuncProc('add_avx'):
with StackFrame(4) as sf:
pz = sf.p[0]
px = sf.p[1]
py = sf.p[2]
n = sf.p[3]
lpL = Label()
L(lpL)
vmovups(xmm0, ptr(px))
vaddps(xmm0, xmm0, ptr(py))
vmovups(ptr(pz), xmm0)
add(px, 16)
add(py, 16)
add(pz, 16)
sub(n, 4)
jnz(lpL)
term()
This is a tiny sample of a label and AVX. This is not fast code.
Commentaries:
lpL = Label()
- Define a label instance.
L(lpL)
- Set lpL here.
jnz(lpL)
- Jump to lpL if non-zero.
Most of the mnemonics are the same as defined in the Intel manual except for and_
, or_
, xor_
, not_
, in_
, out_
, int_
.
lpL = Label()
nextL = Label()
L(lpL) # lpL is set here
ja(nextL)
jmp(lpL)
L(nextL)
Use db_
, dw_
, dd_
, dq_
.
makeLabel('varX')
dq_(12345)
mov(rax, ptr(rip+'varX'))`
- Merge-masking
vaddps(xmm1 | k1, xmm2, xmm3)
vmovups(ptr(rax+rcx*4+123)|k1, zmm0)
- Zero-masking
vsubps(ymm0 | k4 | T_z, ymm1, ymm2)
- Broadcast
vmulps(zmm0, zmm1, ptr_b(rax))
ptr_b
is converted to{1toX}
according to the mnemonics.
- Rounding
vdivps(zmm0, zmm1, zmm2|T_rz_sae)
- Suppress all exceptions
vmaxss(xmm1, xmm2, xmm3|T_sae)
- Distinguish
m128
andm256
vcvtpd2dq(xmm16, xword (eax+32))
#m128
vcvtpd2dq(xmm0, yword (eax+32))
#m256
vcvtpd2dq(xmm21, ptr_b (eax+32))
#m128
+ broadcastvcvtpd2dq(xmm19, yword_b (eax+32))
#m256
+ broadcast
MITSUNARI Shigeo(herumi@nifty.com)