# Assembler Syntax and Machine Model

## Registers

tenyr has sixteen 32-bit two's-complement signed integer registers, named `A` through `P` (case is ignored). Two of these registers, `A` and `P`, are special, while the others are general purpose (see Instruction Shorthand and Control Flow for descriptions of the special `A` and `P` registers).

## Instruction Format

Every tenyr instruction can be expressed in one of four algebraic instruction formats (type0, type1, type2, type3) :

```
Z <- X op Y + I
Z <- X op I + Y
Z <- I op X + Y
Z <- X      + I20
```

where `Z` is a register, `op` is one of the accepted arithmetic operations, `+` is addition, `X` and `Y` are registers, `I` is any immediate integer value between -2048 and 2047 inclusive, and `I20` is any immediate integer value between -524288 and 524287 inclusive (i.e., `I` and `I20` are 12-bit and 20-bit signed two's-complement integer immediates, respectively). In the first three formats, any one of the operands on the right-hand side can be left out, along with the operation that precedes it. Examples :

```
a <- a + a + 0     # type 0
b <- c * d + 3     # type 0
c <- d - e + -2    # type 0
d <- e ^ f         # type 0
e <- f - 2         # type 1
f <- 2 | g         # type 2
g <- -h            # type 1
h <- i >= 0        # type 1
i <- j < k         # type 0
j <- k             # type 0
k <- j + 0x7abcd   # type 3
l <- 0x7abcd       # type 3
m <- a + a + 0x79a # type 0
```
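The immediate ranges quoted above follow directly from the 12-bit and 20-bit two's-complement widths. The following Python sketch checks them ; the helper names `fits_signed` and `smallest_immediate_width` are made up for illustration and are not part of the tenyr tools.

```python
def fits_signed(value: int, bits: int) -> bool:
    """Return True if value fits in a bits-wide two's-complement field."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest


def smallest_immediate_width(value: int):
    """Return 12 or 20 (the two widths named in the text), or None.

    Hypothetical helper: it only models the ranges, not how the
    assembler actually chooses an instruction type.
    """
    for bits in (12, 20):
        if fits_signed(value, bits):
            return bits
    return None


if __name__ == "__main__":
    for value in (0x79A, 0x7ABCD, 524288):
        print(hex(value), smallest_immediate_width(value))
```

Under these assumptions, `0x79a` fits in a 12-bit immediate, while `0x7abcd` needs the 20-bit immediate of a type3 instruction.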

## Operations

Here are the operations that tenyr supports, with their binary encodings. Operations are encoded this way to group hardware-similar operations together ; the two operations in each row of the table below differ in the most significant bit only. For example, `&` is `0b0001` and `&~` is `0b1001`, while `+` is `0b0100` and `-` is `0b1100`.

| Encoding | Operator | Description | Encoding | Operator | Description |
|----------|----------|-------------|----------|----------|-------------|
| `0000` | `\|` | bitwise OR | `1000` | `\|~` | bitwise OR with complemented second operand |
| `0001` | `&` | bitwise AND | `1001` | `&~` | bitwise AND with complemented second operand |
| `0010` | `^` | bitwise XOR | `1010` | `^^` | pack (see below) |
| `0011` | `>>` | arithmetic right shift | `1011` | `>>>` | logical right shift |
| `0100` | `+` | signed addition | `1100` | `-` | signed subtraction |
| `0101` | `*` | signed multiplication | `1101` | `<<` | left shift |
| `0110` | `==` | bitwise equality | `1110` | `@` | test bit at position |
| `0111` | `<` | signed less-than | `1111` | `>=` | signed greater-than-or-equal |

Some of the operations merit explanation. The test operations (`<`, `>=`, `==`, and `@`) produce a result that is either `0` (false) or `-1` (true). The canonical truth value in tenyr is `-1`, not `1`. Because `-1` has every bit set, a truth value can be used directly as a bit mask ; this also explains the existence of the special `&~` and `|~` operations -- when the second operand is a truth value, the bitwise complement works as a Boolean NOT. The operations also support some syntactic sugar ; for example, `B <- ~C` is accepted by the assembler and transformed into `B <- A |~ C` or `B <- 0 |~ C`, depending on the required operation type.
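To illustrate why `-1` is a convenient truth value, here is a small Python model of a branch-free selection built from `&`, `|`, and `&~`. It is only a sketch of the arithmetic : the function names are invented for this example, and Python integers stand in for 32-bit registers.

```python
MASK32 = 0xFFFFFFFF


def to_signed(word: int) -> int:
    """Interpret the low 32 bits of word as a two's-complement value."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def less_than(x: int, y: int) -> int:
    """Model of `<` : -1 (all bits set) when true, 0 when false."""
    return -1 if x < y else 0


def and_not(x: int, y: int) -> int:
    """Model of `&~` : AND with the complement of the second operand."""
    return to_signed(x & ~y)


def select(cond: int, if_true: int, if_false: int) -> int:
    """Choose a value without branching ; cond must be 0 or -1."""
    return to_signed((if_true & cond) | and_not(if_false, cond))


if __name__ == "__main__":
    print(select(less_than(3, 5), 10, 20))
    print(select(less_than(5, 3), 10, 20))
```

With a truth value as the second operand, `and_not(-1, cond)` behaves as a Boolean NOT of `cond`, which is the property the paragraph above describes.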

The `pack` operation (represented by `^^`) concatenates the 20 least significant bits of the left operand with the 12 least significant bits of the right operand. This operation makes it easier to construct large values in registers using immediates.
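As a rough model of `pack`, the Python sketch below assumes that the left operand supplies the upper 20 bits of the result and the right operand the lower 12 bits ; that bit order is a reading of "concatenates" above, not something this page states explicitly. The function name `pack` is used here only for illustration.

```python
def pack(left: int, right: int) -> int:
    """Model of `^^` under the stated assumption about bit order.

    Keeps the 20 least significant bits of left and the 12 least
    significant bits of right, returning an unsigned 32-bit pattern.
    """
    return ((left & 0xFFFFF) << 12) | (right & 0xFFF)


if __name__ == "__main__":
    # Build a 32-bit constant from a 20-bit piece and a 12-bit piece.
    print(hex(pack(0x12345, 0x678)))
```

This is why the operation pairs well with the 20-bit and 12-bit immediates : two immediates together cover a full 32-bit word.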

tenyr works with signed two's-complement numbers ; the only operation (besides bitwise operations, which have no concept of sign) that is explicitly unsigned is `>>>`, the logical right shift. Whereas `>>` (arithmetic right shift) fills in shifted bits with the most significant bit of the word, `>>>` fills in zeros.
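The difference between the two right shifts can be modelled in Python as follows. This is a sketch that assumes 32-bit words and shift amounts from 0 to 31 ; the names `asr` and `lsr` are chosen for this example only.

```python
MASK32 = 0xFFFFFFFF


def to_signed(word: int) -> int:
    """Interpret the low 32 bits of word as a two's-complement value."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def asr(word: int, amount: int) -> int:
    """Model of `>>` : vacated bits copy the sign bit."""
    return to_signed(word) >> amount


def lsr(word: int, amount: int) -> int:
    """Model of `>>>` : vacated bits are filled with zeros."""
    return to_signed((word & MASK32) >> amount)


if __name__ == "__main__":
    print(asr(-8, 1), lsr(-8, 1))
```

For non-negative inputs the two functions agree ; they differ only when the most significant bit of the word is set.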

## Memory Operations

A memory operation looks just like a register-register operation, but with one side of the instruction dereferenced, using brackets :

```
 D  <- [E  *  4 + F] # a load into D
 E  -> [F  << 2]     # a store from E
[F] <-  2            # another kind of store, with an immediate value
```

One instruction can't have brackets on both sides of an arrow, and an immediate value cannot appear on the left side of an arrow.

## Instruction Shorthand

Although pieces of the right-hand-side of an instruction can be left out during assembly, under the covers all the pieces are still there ; the missing parts are filled in with zeros or with references to the special `A` register, which always contains `0`, even if it is written to. Therefore, each instruction in the following pairs is identical to the other one in the pair :

```
B  <-  3       ; B  <-  A  |  A + 0x00000003
C  <-  D  *  E ; C  <-  D  *  E + 0x00000000
E  <-  1  << B ; E  <-  0x00000001  << B + A
```

To see the expanded form, invoke the disassembler (`tas -d`) with the `-v` option.

## Control Flow

tenyr has no dedicated control-flow instructions ; flow is controlled by updating the `P` register, which is the program counter / instruction pointer. Reading from `P` will produce the address of the currently executing instruction, plus one. Writing to it will cause the next instruction executed to be fetched from the address written into `P`. For example, if this program starts at address 0 :

```
B <- P      # after this instruction, B contains 1
D <- 3      # after this instruction, D contains 3
P <- P - 3  # this is a loop back to the first instruction above
```

Notice that in the third instruction it was necessary to subtract 3 instead of 2, because the value in `P` was effectively the location of the next instruction that would have been executed in the absence of a control flow change.
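The behaviour of `P` in the three-instruction program above can be traced with a small Python model. It hard-codes those three instructions and assumes only what the text states : reading `P` yields the current address plus one, and writing `P` sets the next fetch address. The function `run` and its bookkeeping are illustrative, not part of the tenyr simulator.

```python
def run(steps: int):
    """Trace the example program for a number of instruction steps."""
    regs = {"B": 0, "D": 0}
    pc = 0                 # address of the instruction being executed
    trace = []
    for _ in range(steps):
        trace.append(pc)
        p = pc + 1         # value seen when the instruction reads P
        next_pc = p        # default : fall through to the next address
        if pc == 0:        # B <- P
            regs["B"] = p
        elif pc == 1:      # D <- 3
            regs["D"] = 3
        elif pc == 2:      # P <- P - 3
            next_pc = p - 3
        pc = next_pc
    return trace, regs


if __name__ == "__main__":
    print(run(6))
```

Running six steps visits addresses 0, 1, 2 and then 0, 1, 2 again, which matches the claim that subtracting 3 (not 2) returns to the first instruction.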

Under normal circumstances, the programmer is not expected to update the `P` register in such a direct fashion, but rather to use a macro like `jnzrel(reg,target)` from common.th :

```
#include "common.th"
D <- 5
C <- 10
loop_top:
C <- C - 1
N <- C > D
jnzrel(N,loop_top)
```

where `loop_top` is a label to jump to, and `jnzrel` means "jump if not zero to relative" (admittedly, this is not a very good name, because `N` needs to be `-1` not merely nonzero).

Notice that we used `>` even though this is not one of the supported operations. The assembler accepts `>` and rewrites it into a valid tenyr instruction by swapping the order of the operands and using `<` instead. An analogous transformation occurs for `<=`.
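The operand swap can be pictured with the Python sketch below. It assumes that `<=` is handled by swapping the operands and using `>=`, which is an inference from "analogous" above rather than a documented rule ; the function `rewrite_comparison` is hypothetical and does not reflect how the assembler is implemented.

```python
def rewrite_comparison(left: str, op: str, right: str):
    """Map `>` and `<=` onto the supported `<` and `>=` by swapping operands."""
    if op == ">":
        return (right, "<", left)    # x > y   is the same test as   y < x
    if op == "<=":
        return (right, ">=", left)   # x <= y  is the same test as   y >= x
    return (left, op, right)         # already a supported operation


if __name__ == "__main__":
    print(rewrite_comparison("C", ">", "D"))
```

Under this model, `N <- C > D` would be emitted as `N <- D < C`.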

## Special Instructions

All 32-bit words decode to a legal instruction of type 0, 1, 2, or 3. The token `illegal` is accepted by the assembler and encoded as `0xffffffff`, which is the type3 instruction `P <- [P - 1]`. Because reading `P` yields the address of the current instruction plus one, `P - 1` is the address of the instruction itself ; the load therefore fetches the instruction word, and the net effect is `P <- 0xffffffff`. The simulator halts before attempting to execute the instruction at address `0xffffffff`.

## Labels

Labels can be used to identify segments of code and data. A label is defined by a sequence of alphanumeric characters and underscores, where that sequence cannot look like a register name (this restriction may be relaxed in the future). A label is referred to by prefixing `@` to its name :

```
data:
top:
B <- C
D <- @data
E <- @top
```

Getting the value of `@label` directly isn't generally useful, because its value is relative to where the code or data was loaded in memory. If code was loaded in a single section at the default address of `0x1000`, one would need to add `0x1000` to `@data` to get the absolute value in memory. This is easier when using the special label `.`, which gives the offset from the beginning of the current section to the current address ; then the expression `P - (. + 1)` will be the loading offset. This is handled by the `rel()` macro from common.th. `rel()` produces a "PC-relocated" address from a label reference.
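The arithmetic behind `P - (. + 1)` can be checked with a few lines of Python. This sketch models only the formula from the paragraph above, not the `rel()` macro itself ; the function names and the sample addresses are assumptions chosen for illustration.

```python
def loading_offset(p_value: int, dot: int) -> int:
    """Recover the load address from P and the section-relative `.`.

    p_value is what the instruction reads from P (its own absolute
    address plus one) ; dot is the instruction's offset in the section.
    """
    return p_value - (dot + 1)


def absolute_address(label_offset: int, p_value: int, dot: int) -> int:
    """Turn a section-relative label value into an absolute address."""
    return label_offset + loading_offset(p_value, dot)


if __name__ == "__main__":
    # An instruction at section offset 5, loaded at 0x1000, reads P as 0x1006.
    print(hex(loading_offset(0x1006, 5)))
    print(hex(absolute_address(0x20, 0x1006, 5)))
```

The same calculation works wherever the section is loaded, which is what makes the result "PC-relocated".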

## Immediates

Immediate values in type{0,1,2} instructions are 12 bits wide, thus ranging from -2048 to 2047. In type3 instructions, they are 20 bits wide, thus ranging from -524288 to 524287. ASCII (the ASCII subset of UTF-32, really) character constants can appear in immediate expressions :

```
B <- '$'
C <- 4
```

An immediate value can also be an expression with multiple terms, as long as :

1. all of the terms are constants
2. the entire expression is enclosed in parentheses
3. a `@label` reference occurs at most once, at the outermost nesting

The result of an immediate expression is computed in the assembler, and only the resulting immediate value is written out. Many of the tenyr operations can be used in immediate expressions, too, as well as an additional one : integer division, with `/`. Operator precedence within an immediate expression follows the rules for the C language.

```
B <- B ^ ('A' ^ 'a')    # flip case of the character in B
C <- ((124 - 1) | 1)    # after this, C will contain 123
D <- (8 / 4)            # D will contain 2
E <- (16 - 8 / 4)       # E will contain 14, not 2
```
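The values in the comments above can be reproduced in Python, whose precedence for `-`, `/`, `|`, and `^` matches C for these particular expressions. This is only a cross-check of the arithmetic ; Python's `//` agrees with C integer division for the non-negative operands used here, and the variable names are illustrative.

```python
# Case-flip mask : 'A' and 'a' differ in exactly one bit.
flip_case_mask = ord('A') ^ ord('a')

# The same immediate expressions as in the assembly examples.
c_value = (124 - 1) | 1
d_value = 8 // 4
e_value = 16 - 8 // 4      # division binds tighter than subtraction

if __name__ == "__main__":
    print(hex(flip_case_mask), c_value, d_value, e_value)
    print(chr(ord('b') ^ flip_case_mask))
```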

## Directives

There are a few assembly directives to make assembly easier :

```
.word 0, 1, 2, 0x1234, 'A'  # each value is expanded to 32 bits
.word (2 + @bar), (8 / 4)   # expressions are accepted by `.word`
.utf32 "Hello, world"       # each character in its own 32-bit word
.utf32 "concat" "" "enate"  # string constants concatenate
.utf32 "concat", "enate"    # string constants store consecutively
.zero 0x14                  # this creates 0x14 = 20 zeros
.global foo                 # mark symbol visible during linking
```

Three kinds of comments are supported : C89-style comments (non-nesting) delimited by `/*` and `*/` ; C99-style comments starting with `//` ; and shell-style comments starting with `#`. The latter two types of comments extend only to the end of line, and as there is no line continuation character, every commented line must have its own `//` or `#` character. This means that `//`-comments behave differently when processed by tenyr from when processed by `cpp`, if line continuation characters are used. `#`-comments are recommended ; the others are deprecated.