-
Notifications
You must be signed in to change notification settings - Fork 9
Tagha Assembly Language Manual
This article teaches about the Tagha Assembly language that's compiled by the Tagha Assembler into a working .tbc script
In Tagha Assembly, there are single and multi-line comments.
; is for single line comments and C style /* */ multi-line comments.
In Tagha Assembly, the assembly directives start with a dollar sign $. Here's a break down of all the Tagha Assembly directives.
TBC scripts, by default, are alloted an operand stack size of 4kb.
the $opstack_size directive allows a programmer (or generating compiler) to change the default operand stack size given to scripts.
The directive only takes one argument which is either a decimal, 0x hexadecimal, 0b binary, or 0# octal argument.
Example Tagha Assembly code usage:
$opstack_size 10 ;; sets the stack size to 10 * 8 bytes (decimal)
$opstack_size 0x10 ;; sets the stack size to 16 * 8 bytes (hexadecimal)
$opstack_size 010 ;; sets the stack size to 8 * 8 bytes (octal)
$opstack_size 0b10 ;; sets the stack size to 2 * 8 bytes (binary)Used the same way as $opstack_size but for a module's call stack size.
$callstack_size 10 ;; sets the stack size to 10 * 8 bytes (decimal)
$callstack_size 0x10 ;; sets the stack size to 16 * 8 bytes (hexadecimal)
$callstack_size 010 ;; sets the stack size to 8 * 8 bytes (octal)
$callstack_size 0b10 ;; sets the stack size to 2 * 8 bytes (binary)TBC scripts utilize a memory allocator which stores the stack and data tables. After allocating the stack and data tables, what's left of the allocator can be used internally as heap data.
The $heap_size directive allows a programmer (or generating compiler) to leave extra heap memory for scripts to use.
The directive only takes one argument which is either a decimal, 0x hexadecimal, 0b binary, or 0# octal argument.
Example Tagha Assembly code usage:
$heap_size 10 ;; sets the heap size to 10 bytes (decimal)
$heap_size 0x10 ;; sets the heap size to 16 bytes (hexadecimal)
$heap_size 010 ;; sets the heap size to 8 bytes (octal)
$heap_size 0b10 ;; sets the heap size to 2 bytes (binary)Since global variables in a tbc script are designed to be accessible by the host application, global variable require to be named and defined through Tagha Assembly code.
the $global directive has different arguments depending on the type of global variable.
For strings (anonymous or named), the arguments are...
- arg 1 - the name of the string.
- arg 2 - the actual string data.
Example Tagha Assembly code usage of defining a string.
$global str1, "hello world\n" ;; escape chars are handled.For every other variable, there's 2 kinds of possible arguments: Like the string version, you must have a name for the variable as well as the byte size of the variable. for the 3rd argument (the actual data). You have two (limited) options.
-
the data for a global var must have each data count listed by size and the data and the count must be equal to the bytesize of the global var.
-
if you're making a large global variable (struct or array or something) and you only want its data to be zero, you simply put a
0as the 3rd argument without requiring to give a size.
$global i, 12, 0
$global n, 4, byte 0, byte 0, byte 0, byte 0x40
$global m, 4, half 0, half 0x4000 ;; same as what n does, including data
$global c, 4, long 0x40000000 ;; same as n does, including data
$global ptr, 8, word 0 ;; 'word' is redundant since we're only zeroing.the $native directive exists to expose the C or C++ native to the script. The directive only takes one argument which is the name of the native you want to expose.
Here's an example Tagha Assembly code that exposes malloc from stdlib.h to a script:
$native mallocthe $extern directive exists to do dynamic linking between scripts. The directive only takes one argument which is the name of the external function to link.
Here's an example of how it's used:
$extern function_nameIn Tagha Assembly, operations can only exist within function blocks. After defining the function name, you must follow the name with a beginning curly brace {. Once a function's code is finished, you must end the block with, you guessed it, an ending curly brace }.
Let's create an example by defining a function and call it main! Since main is a function, we must give it a ret opcode so that the virtual machine can destroy the execution frame after main is done calling!
;; create "main" and its code block!
main: { ;; : colon is optional here!
ret
}It's necessary at this point to bring up that the rest of this article assumes that you've read the Tagha Instruction Set article!
Alright, we've created our main function but it doesn't do jack. Now let's make it do jack.
Before we start writing code, let me introduce you to registers in Tagha.
Registers are special locations in a CPU that store data. All registers in Tagha are 64-bit and can be used for both integer, floating point, & memory based operations. Registers are vital is much of Tagha's operations as they're useful in many optimization cases.
Note: all the register names start with an r to denote that it's a register!
Tagha allows you to use up to 256 registers in a given execution frame. All that is necessary to save register space is by allocating and reducing the amount of registers available in each function execution context.
One of the most common ways to input data is through constants or immediate values. Immediate values are always encoded as 8 bytes for the sake of convenience. Just like the $opstack_size or $heap_size directive, an immediate value can be numbered as a decimal, hexadecimal, binary, or octal value!
As mentioned in the registers portion, registers data can be used as memory addresses. Dereferencing a register as a memory address has the following format:
[register +/- constant_offset]Register-Memory operations also gives you the ability to add (or subtract) an offset to the address you're dereferencing. The offset is calculated in terms of bytes (again Tagha is byte-addressable).
Here's an example where we assume r0 contains a memory address and we want to dereference it as an int16:
st2 [r0 + 0] ;; store 2 bytesConveniently, you don't need the zero since it will not change the address given
st2 [r0] ;; store 2 bytesbut you do need to do arithmetic for any number larger than 0:
st2 [r0+2]which is necessary for cases such as arrays or structs.
Now that we've gotten the 3 data operations covered, we can go back to our main example. Let's create an example where we do something like 1 + 2!
;; create "main" and its code block!
main: {
alloc 2 ;; allocate 2 registers
movi r0, 2 ;; set r0's value to 2
movi r1, 1 ;; set r1's value to 1
add r0, r1 ;; add r1 to r0, modifying r0. r0 is now "3"
ret
}So far the only type of label you've been exposed to is the function label but that's only for defining functions.
One of the most prized and valuable constructs in any programming language is the control flow controls like if-else statements, switch, and while or for loops.
In order to implement these control flow tools, we need the ability to label areas of flow!
In Tagha ASM, jump labels are defined using a . dot sign!
Here's an example:
main {
.exit
ret
}the jump label so far is pretty useless, all it does is point to where the ret opcode is.
Let's take all that we've learned and create real, usable code!
Let's create a main function in our script so the script can have a direct entry point and we'll have a secondary function called factorial:
main: {
ret
}
factorial: {
ret
}So far both functions do nothing. We'll implement factorial as a function that takes a single int32 parameter and returns an int32 as well.
the implementation of the factorial looks like this in C and Python respectively:
uint32_t factorial(const uint32_t i) {
return i<=1 ? 1 : i * factorial(i-1);
}def factorial(i:int) -> int:
if i<=1:
return i
return i * factorial(i-1)So now let's implement that in our Tagha ASM language!
Now the important part here is that since factorial calls another function (itself), we must save the link register before calling another function and then restore it once the execution frame goes back to the original one by using pushlr and poplr.
For further details on Calling Convention, please refer to the Instruction Set Arch documentation.
factorial {
pushlr ;; preserve link register.
alloc 3 ;; allot 3 registers for this function frame!
mov r0, r3 ;; r3 is previous call's r0!
;; if( i<=1 )
movi r1, 1
ule r0, r1
jz .L1
;; return 1;
mov r0, r1
jmp .L2
.L1
;; return i * factorial(i-1);
mov r2, r0 ;; int temp = i;
sub r2, r1 ;; temp -= 1;
mov r0, r2
call factorial ;; int res = factorial(temp);
mul r3, r0 ;; i * res;
.L2
poplr ;; restore link register.
redux 3 ;; put back the 3 registers we allotted.
ret
}Alright, we have a defined and working function in Tagha Assembly, how can we call it within our script's Tagha ASM code?
To actually give factorial arguments and call it from Tagha Assembly code, we must set the semkath register to a numeric value and then call factorial, here's an example.
main {
pushlr ;; since we're gonna call factorial, preserve link register.
alloc 1 ;; allocate 1 register.
movi r0, 5 ;; set that 1 reg to hold the argument of 5.
call factorial ;; call factorial.
poplr ;; we've returned from 'factorial', restore old link register.
ret
}When factorial has finished execution, the final return value will be in r0. The factorial of 5 is 120, so register r0 data will contain the 32-bit integer value of 120. Now concerning the C standard, main should be returning 0, so r0 must be set to 0 after the call to factorial is finished.