Switch branches/tags
Nothing to show
Find file
Fetching contributors…
Cannot retrieve contributors at this time
87 lines (46 sloc) 6.13 KB
= The Bottom =
Low level programming is primarily focused on telling the computer what to do. To do low level programming we need a language that both we and the computer understand which represents our instructions to the computer.
When we talk about "instructions to the computer", really we mean instructions to the CPU, which then sends instructions of various kinds to other pieces of hardware as well. Every kind of CPU has different capabilities, and different attached hardware that it supports, and thus has different instructions that we can give it. Because of this, there are many languages, even at the lowest level, which we need to use when talking to any particular CPU. In this material we will use ARMv6 assembly as accepted by the GNU assembler. An assembler is a program which takes assembly code that we type in text files, and translates it into the internal representation used by the computer. You will need a copy of the GNU assembler for ARM, or a compatible assembler, and a copy of qemu-system, which is a program we will use to simulate an ARMv6 CPU so that we may test our code on any computer. These programs are freely available on the Internet.
# TODO: material (seperate document) about setup of toolchains, etc, how to inspect registers, etc
Here is your first program:
mov r0, #42
`mov` is the name of the instruction we are giving to the computer. `r0` is the name of a register, and `42` is just the number forty-two.
What is a register? Every CPU has one or more "slots" where they can store data temporarily while working. These "slots" are called registers. When a program starts, these slots may be set to any value, since the CPU does not usually bother to make sure they have any particular value, so they may just contain random electrical noise. After some value has been put into them, then that value is in there until it is overwritten with a different value.
If you run the above program, and then inspect the contents of register 0, you will see the number 42. `mov` is the instruction that tells the computer to write a value into the a register. The value can also be specified as a register to read from:
mov r0, #42
mov r1, r0
Now both r0 and r1 are 42.
The act of writing a value into a register may also be called "assignment" or "assigning a value to a register".
The values given to an instruction are often called "arguments" or "parametres".
We can also add to a value:
mov r0, #30
add r0, r0, #12
The `add` instruction takes the values from the second and third arguments, adds them, and puts the result into the register specified by the first argument.
Of course, it doesn't matter what registers we use, as long as we are consistant:
mov r1, #30
add r1, r1, #12
Now the value 42 is in `r1` instead of in `r0`. We could also use `r1` while performing the computation, but put the result in `r0`:
mov r1, #30
add r0, r1, #12
Now `r0` is 42 again. Question: what is `r1`? It's 30, because we wrote a 30 into it with `mov` and `add` only changed the value in `r0`.
== Program Layout ==
All of the computers we will be dealing with, and indeed most computer with which you are likely to ever interact, are so-called "stored program computers". These computers have instructions to the machine stored off of the CPU at run-time in RAM. So-called "Harvard Architecture" computers have two kinds of RAM: one for instructions and one for data. We will only be dealing with so-called "Von Neumann" or "Princeton" architecture computers, which only have one kind of RAM that stores everything. Before a program starts running, the instructions that make up the program must be loaded into RAM at some location. Locations in RAM are sometimes called "addresses".
The CPU has a special register called the "Program Counter" which holds the address of the next instruction it should execute. This instruction is loaded in from RAM and then executed. When an instruction is done being loaded, the program counter is incremented by the size of one instruction (so that it now holds the address of the next instruction).
The address of the first instruction of a program is called the "entry point". To run a loaded program, the program counter (also called "pc") is loaded with the value of this entry point.
== Number Systems ==
So far all of the numbers we have seen have been written out using the arabic-numeral decimal notation you may be used to. Let's do a quick refresh of what this notation really means:
42 ≡ 4*(10^1) + 2*(10^0)
163 ≡ 1*(10^2) + 6*(10^1) + 3*(10^0)
There is no reason the base number must be ten. We might decide to use, for example, sixteen:
6A ≡ 6*(16^1) + 10*(10^0)
Or two:
10011 = 1*(2^4) + 0*(2^3) + 0*(2^2) + 1*(2^1) + 1*(2^0)
Both base-sixteen ("hexadecimal") and base-two ("binary") are fairly common in low-level programming. Binary is common because it is the high-level conceptual model of how numbers are stored in a computer. Hexadecimal is common because numbers in a computer are usually addressed as some number of "bytes" (8 bits, which is 8 binary digits) and these are nicely representex in hexadecimal (two hexadecimal digits to one byte).
== Two's Complement ==
If computers conceptually store numbers using binary, how do they store negative numbers? It varies by computer, but arithmetic instructions on the CPUs we will be considering use a system called "Two's Complement". The two's complement of an N-bit number (usually 32 for our purposes) is the result of subtracting the absolute value from 2^N. The two's complement representation of signed numbers with N bits can represent numbers from -(2^(N-1)) to (2^(N-1) - 1). Two's complement has the advantage that most arithmetic operations can operate on both negative and positive numbers in exactly the same way.
For clarity: the first bit of a negative number under two's complement is always a 1. The first bit of a positive number under two's complement is always a 0.
There is a strage side-effect of this representation. Adding 1 to the largest positive number on N bits results in the smallest negative number on N bits:
01111111 + 1 = 10000000
And vice-versa:
10000000 - 1 = 01111111
This behaviour is called "overflow".