Skip to content

Commit

Permalink
Doc updates
Browse files Browse the repository at this point in the history
  • Loading branch information
campagnola committed Jan 22, 2015
1 parent 33612bb commit f8054b7
Show file tree
Hide file tree
Showing 4 changed files with 348 additions and 67 deletions.
182 changes: 152 additions & 30 deletions docs/asm.rst
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
PyCCA Assembler
==============
===============


The core of pycca is an x86 assembly compiler that allows the creation and
Expand All @@ -8,20 +8,63 @@ of pycca's assembler is tested to generate identical output to the GNU
assembler for all supported instructions.


Assembly Language
Assembly language
-----------------

PyCCA's assembler uses a syntax and instruction mnemonics very similar to the
intel assembly syntax. Instructions consist of a mnemonic (instruction name)
followed by whitespace and a number of comma-separated operands::
intel / NASM assembly syntax. Instructions consist of a mnemonic (instruction
name) followed by whitespace and a number of comma-separated operands::
push ebp
sub esp, 32
mov eax, dword ptr [edx + ecx*8 + 12]
label: # Comments follow a hash
push ebp
sub esp, 32
mov eax, dword ptr [edx + ecx*8 + 12]
jmp label

Many assembler examples found on the internet use the ATT syntax, which
Note: Many assembler examples found on the internet use the AT&T syntax, which
prefixes register names with '%' and reverses the order of operands (the intel
syntax puts the destination operand first; ATT puts the source operand first).
syntax puts the destination operand first; AT&T puts the source operand first).

Operands may be one of four types:

* The name of a register (see :ref:`all registers <registers>`).
* An "immediate" integer data value. These may be signed or unsigned and are
evaluated as python expressions, so "0xFF" and "0b1101" are also accepted
syntaxes.
* The name of a label declared elsewhere in the code (these are ultimately
compiled as immediate values pointing to the address of the label
declaration).
* A pointer to data in memory. Pointers provide both a memory address and
the size of data they point to.

In x86, memory addresses are specified as the sum of a base register, a scaled
offset register, and an integer displacement::
address = base + offset*scale + displacement

Where ``scale`` may be 1, 2, 4, or 8, and addresses may contain any combination
of these three elements. Memory operands are written with square brackets
surrounding the address expression. For example:

================================ ==============================================
Memory operand Description
================================ ==============================================
[rax] Pointer to address stored in register rax
[eax + ebx*2] Address calculated as eax + ebx*2
[0x1000] Pointer to address 0x1000
[rax + rbx + 8] Address calculated as rax + rbx + 8
word ptr [rbp - 0x10] Pointer to 2 byte data beginning at rbp - 0x10
qword ptr [eax] pointer to 8 byte data beginning at eax
================================ ==============================================

On 64 bit architectures, it is only valid to use 64 or 32 bit registers for
memory addresses. On 32 bit architectures, it is only valid to use 32 or 16 bit
registers for memory addresses. Note: the allowed expression forms for 16 bit
addresses are very limited and are not covered here.


Building assembly from Python objects
-------------------------------------

It is also possible to write assembly code as a list of Instruction instances.
This has the advantage of avoiding the parsing stage and facilitating
Expand All @@ -30,6 +73,7 @@ dynamically-generated assembly code::
from pycca.asm import *
code = [
label('start'),
push(ebp),
mov(ebp, esp),
push(dword([ebp+12])),
Expand All @@ -39,8 +83,21 @@ dynamically-generated assembly code::
mov(esp, ebp),
pop(ebp),
ret(ret_byts),
jmp('start'),
]

Thanks to similarities in the NASM and Python syntaxes, there are only minor
differences in this approach:

* Instructions are classes and thus require parentheses to instantiate
* Use pointer size functions like ``dword(address)`` instead of ``dword ptr``.
* Labels are also objects and they are referenced by their string name
(see ``label`` and ``jmp`` lines above).


Compiling Python functions from assembly
----------------------------------------

Executing this code is only a matter of compiling it into a ctypes function and
providing the return and argument types::
Expand All @@ -56,6 +113,7 @@ For lists of supported :ref:`instructions <instructions>` and
:ref:`registers <registers>`, see the :ref:`asm_api_ref`.



Differences with GNU-AS
-----------------------

Expand Down Expand Up @@ -115,6 +173,7 @@ To add new instructions:
Note that advanced CPU extensions such as SSE2 and AVX are not yet supported.



.. _asm_api_ref:

Assembly API Reference
Expand All @@ -140,27 +199,55 @@ Supported registers
All registers may be accessed as attributes of the ``pycca.asm`` or
``pycca.asm.register`` modules.

+-------+---------------+--------------------------------+-----------------+
|arch | 32 / 64 | 64 only | 32/64 |
+-------+----+----+-----+-----+------+------+------+-----+-----+-----+-----+
|size |8 |16 |32 |64 |8 |16 |32 |64 |80 |64 |128 |
+-------+----+----+-----+-----+------+------+------+-----+-----+-----+-----+
| |al |ax |eax |rax |r8b |r8w |r8d |r8 |st(0)|mm0 |xmm0 |
+-------+----+----+-----+-----+------+------+------+-----+-----+-----+-----+
| |cl |cx |ecx |rcx |r9b |r9w |r9d |r9 |st(0)|mm1 |xmm1 |
+-------+----+----+-----+-----+------+------+------+-----+-----+-----+-----+
| |dl |dx |edx |rdx |r10b |r10w |r10d |r10 |st(0)|mm2 |xmm2 |
+-------+----+----+-----+-----+------+------+------+-----+-----+-----+-----+
| |bl |bx |ebx |rbx |r11b |r11w |r11d |r11 |st(0)|mm3 |xmm3 |
+-------+----+----+-----+-----+------+------+------+-----+-----+-----+-----+
| |ah |sp |esp |rsp |r12b |r12w |r12d |r12 |st(0)|mm4 |xmm4 |
+-------+----+----+-----+-----+------+------+------+-----+-----+-----+-----+
| |ch |bp |ebp |rbp |r13b |r13w |r13d |r13 |st(0)|mm5 |xmm5 |
+-------+----+----+-----+-----+------+------+------+-----+-----+-----+-----+
| |dh |si |esi |rsi |r14b |r14w |r14d |r14 |st(0)|mm6 |xmm6 |
+-------+----+----+-----+-----+------+------+------+-----+-----+-----+-----+
| |bh |di |edi |rdi |r15b |r15w |r15d |r15 |st(0)|mm7 |xmm7 |
+-------+----+----+-----+-----+------+------+------+-----+-----+-----+-----+
General purpose registers:

+-------+---------------+--------------------------------+
|arch | 32 / 64 | 64 only |
+-------+----+----+-----+-----+------+------+------+-----+
|size |8 |16 |32 |64 |8 |16 |32 |64 |
+-------+----+----+-----+-----+------+------+------+-----+
| |al |ax |eax |rax |r8b |r8w |r8d |r8 |
+-------+----+----+-----+-----+------+------+------+-----+
| |cl |cx |ecx |rcx |r9b |r9w |r9d |r9 |
+-------+----+----+-----+-----+------+------+------+-----+
| |dl |dx |edx |rdx |r10b |r10w |r10d |r10 |
+-------+----+----+-----+-----+------+------+------+-----+
| |bl |bx |ebx |rbx |r11b |r11w |r11d |r11 |
+-------+----+----+-----+-----+------+------+------+-----+
| |ah |sp |esp |rsp |r12b |r12w |r12d |r12 |
+-------+----+----+-----+-----+------+------+------+-----+
| |ch |bp |ebp |rbp |r13b |r13w |r13d |r13 |
+-------+----+----+-----+-----+------+------+------+-----+
| |dh |si |esi |rsi |r14b |r14w |r14d |r14 |
+-------+----+----+-----+-----+------+------+------+-----+
| |bh |di |edi |rdi |r15b |r15w |r15d |r15 |
+-------+----+----+-----+-----+------+------+------+-----+


Floating-point registers:

+-------+-----------------+
|arch | 32 / 64 |
+-------+-----+-----+-----+
|size |80 |64 |128 |
+-------+-----+-----+-----+
| |st(0)|mm0 |xmm0 |
+-------+-----+-----+-----+
| |st(1)|mm1 |xmm1 |
+-------+-----+-----+-----+
| |st(2)|mm2 |xmm2 |
+-------+-----+-----+-----+
| |st(3)|mm3 |xmm3 |
+-------+-----+-----+-----+
| |st(4)|mm4 |xmm4 |
+-------+-----+-----+-----+
| |st(5)|mm5 |xmm5 |
+-------+-----+-----+-----+
| |st(6)|mm6 |xmm6 |
+-------+-----+-----+-----+
| |st(7)|mm7 |xmm7 |
+-------+-----+-----+-----+


.. automodule:: pycca.asm.register
:members:
Expand All @@ -171,6 +258,41 @@ All registers may be accessed as attributes of the ``pycca.asm`` or
Supported instructions
----------------------

All instructions currently supported by pycca are listed below. Most
instructions accept a variety of operand types which are listed in a table
for each instruction:

============ =========================
Operand code Operand type
============ =========================
r general purpose register
r/m register or memory
imm immediate value
st(i) x87 ST register
xmmI xmm register
============ =========================

Operand codes are followed by one or more values indicating the allowed size(s)
for the operand. For example, the ``push`` instruction gives the following
table:

=============== ====== ====== ======================================
src 32-bit 64-bit description
=============== ====== ====== ======================================
r/m8 X X Push src onto stack
r/m16 X X
r/m32 X
r/m64 X
imm8/32 X X
=============== ====== ====== ======================================

This table indicates that ``push`` accepts one operand ``src`` that may be
a general purpose register, a memory address, or an immediate value. The
allowed operand sizes depend on the target architecture (for this instruction,
64 bit memory/register operands are not allowed on 32 bit architectures and
vice-versa).


.. automodule:: pycca.asm.instructions
:members:

Expand Down
2 changes: 1 addition & 1 deletion docs/cc.rst
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
PyCCA C Compiler
===============
================

PyCCA's C compiler is currently in very early development. See
``cc_examples.py`` for examples of what is currently working (for some
Expand Down
2 changes: 1 addition & 1 deletion docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
contain the root `toctree` directive.
PyCCA Documentation
==================
===================

PyCCA provides compilers for C and x86 assembly that allow optimized routines
to be compiled and executed at runtime with no external dependencies.
Expand Down

0 comments on commit f8054b7

Please sign in to comment.