= Branches and Conditionals =

So far, the instructions we have seen only let us give the computer a fixed-size list of things to do.  Once all of the instructions have run, the program is done.  What if we wanted to do something simple, like subtract 6 from r0 as many times as the number stored in r1?

Well, first we will need a way to run instructions more than once.  To do this, we use the `b` instruction:

	doing:
		sub r0, r0, #6
		b doing

What is this `doing:`?  Any word at the start of a line followed by a colon is a "label".  A "label" marks a position in the code so that we can refer to it elsewhere.  This is because instructions may get loaded into the computer's RAM anywhere, and so we might not know ahead of time where a given instruction is in order to jump to it.  Labels solve this problem, and also let us use nice names that make sense to us instead of obscure numbers representing locations in RAM.

If you run this program, it will never finish.  That's because the `b` instruction just causes the CPU to jump (or "branch") to the `doing` label.  So it will just keep running that subtract instruction forever.

To solve this problem, ARMv6 allows us to put a specifier on the end of almost any instruction to say that some condition must be true in order for the instruction to run.  For example, to branch only if something is equal to something else, we can use `beq` instead of just `b`, or `bne` to branch only if they are not equal.

We also need a way to say which things we are testing for equality.  The simplest instruction for doing this is `cmp`, which takes two arguments and, instead of storing a result somewhere that you can access, updates the condition flags that the CPU consults when deciding whether to run a conditional instruction.  For example, in:

	doing:
		sub r0, r0, #6
		sub r1, r1, #1
		cmp r1, #0
		bne doing

The `cmp` causes the `bne` to only execute if r1 and 0 are not equal.
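
The same pattern works with `beq`.  As a small sketch (GNU assembler syntax, where `@` starts a comment; the label `same` is made up for this example), the following subtracts 6 from r0 only when r0 and r1 differ:

	cmp r0, r1
	beq same           @ taken only when r0 equals r1
	sub r0, r0, #6     @ skipped when the branch is taken
	same:
		mov r2, r0     @ runs in both cases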

The `doing` loop keeps subtracting 6 from r0, and 1 from r1, until r1 equals 0.  What if r1 starts at 0?  Then the first subtraction (which runs before we check for equality) makes r1 wrap around to the largest value a 32-bit register can hold, and the loop runs roughly four billion times before r1 gets back to 0.  Let's fix that:

	b loopcond
	doing:
		sub r0, r0, #6
		sub r1, r1, #1
	loopcond:
		cmp r1, #0
		bne doing

Now we check the condition before starting the loop.
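
To see the loop run, here is a trace for one set of starting values, chosen arbitrarily for illustration (r0 = 20, r1 = 3).  The comments (`@` in GNU assembler syntax) show each register after every pass through the loop body:

	mov r0, #20
	mov r1, #3
	b loopcond
	doing:
		sub r0, r0, #6     @ r0: 20 -> 14 -> 8 -> 2
		sub r1, r1, #1     @ r1: 3 -> 2 -> 1 -> 0
	loopcond:
		cmp r1, #0
		bne doing          @ falls through once r1 equals 0

The loop body runs three times and leaves r0 = 2 and r1 = 0.  If r1 starts at 0 instead, the `b loopcond` goes straight to the comparison and the body never runs.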

== Pipelining ==

Many CPUs use a technique called "instruction pipelining" to increase the number of instructions they can run at one time.  This works because the circuitry used to do one thing may not be needed to do something else.  So, for example, the CPU can be fetching the next instruction from RAM while it is executing the current one.  The classic RISC pipeline steps are:

# Instruction fetch
# Instruction decode and register fetch
# Execute
# Memory access
# Register write back

Some assemblers expose the pipelined nature of their underlying processor to the programmer.  This means that instructions may not quite run exactly in the order that they are written, but may have parts run in parallel.  Normally this is not a problem, but consider the following on a pipeline-exposing assembler:

	add r5, r5, #1
	mov r6, r5

How will this be executed?

# Fetch the `add` instruction
# Decode the `add` instruction and fetch `r5`, while fetching the `mov` instruction
# Execute the `add` instruction, while decoding the `mov` instruction and fetching `r5`
# No memory access for `add`, while executing the `mov` instruction
# Write the incremented value of `r5` from `add`, while there is no memory access for `mov`
# Write the copied value to `r6` for `mov`

Do you see the problem?  `r5` is copied before the `add` is done!  The value of `r6` will be wrong.  This is called a "data hazard".
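
On such a machine, one workaround is to put other instructions between the two so that the write-back finishes first.  This is a minimal sketch, assuming the five-step pipeline above, GNU assembler syntax (`@` starts a comment), and that a register can only be read in a step after the one that writes it.  On a real machine the number of filler instructions depends on the pipeline, and ARM processors normally resolve this hazard in hardware:

	add r5, r5, #1
	nop                @ filler: does nothing
	nop
	nop                @ the add writes r5 back during this stretch
	mov r6, r5         @ r5 is now read after the write-back

Counting steps as before, the `add` writes `r5` in step 5, and the `mov` (now the fifth instruction) fetches `r5` in step 6.  Useful instructions that do not touch `r5` could take the place of the `nop`s.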

Even if the assembler does not expose the pipelined nature of the CPU to the programmer, there may still be things that need to be taken into account because of the way hazards are worked around.  For example, consider this code:

	b doing
	mov r1, #1

We do not want to execute that `mov` simultaneously with the `b`!  In fact, because of the branch, we do not want to execute the `mov` at all.  This means that the processor will have to "stall" the pipeline every time we branch.  That is, fetching of further instructions has to stop until the branch has completed (until `pc` has its new value written, the CPU does not know where to fetch the next instruction from).

All of this means that branching can be more expensive than other instructions (because it defeats pipelining).  We could rewrite our code like this:

	doing:
		cmp r1, #0
		subne r0, r0, #6
		subne r1, r1, #1
		bne doing

This way, the subtractions and the branch will never happen if r1 is equal to 0, and we have fewer branches happening (since we don't need to jump to the end to do the comparison).
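
Conditional execution can also replace a small if/else entirely.  A minimal sketch in GNU assembler syntax (`@` starts a comment); the choice of r2 as the result register is arbitrary:

	cmp r0, #0
	moveq r2, #1       @ r0 equals 0: set r2 to 1
	movne r2, #0       @ r0 is not 0: set r2 to 0

Neither `mov` changes the condition flags, so both see the result of the `cmp`, and no branch is needed.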

== Procedures ==

So now we have a program that can do something to r0 based on r1.  What if we wanted to do this many times in our program?  Would we have to copy this code everywhere?

Well, we can run this code from anywhere in our program (with `b doing`), but how would it get back to us?  Right now it would just keep running whatever instructions came next.

Maybe we could store the location to come back to in a register?  But how do we know where we are?  Well, we could use labels, like so:

	adr r14, comeback    @ put the address of the comeback label in r14
	b doing
	comeback: ...rest...

This would work, but then we have to come up with a unique label name for every place we want to do this.  Luckily, we can store the current value of the program counter:

	mov r14, pc
	b doing

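It may look as though this brings us back to the `mov` itself, so that we would just branch to `doing` again right away.  In ARM state, though, reading `pc` gives the address of the current instruction plus 8, that is, two instructions ahead (a side effect of the pipelining described above).  That is exactly the instruction after the `b`, so the two-line version already returns to the right place.  The value only needs adjusting when something sits between the `mov` and the `b`:

	mov r14, pc          @ r14 = address of this mov + 8, which is the b below
	add r14, r14, #4     @ move on one instruction, past the b
	b doing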

It turns out that wanting to do this is common enough that ARM has a special instruction for it: `bl` (or "Branch and Link").  It always stores the place we want to come back to in register 14, also called `lr` (for "Link Register").

	bl doing

Of course, we would also need to modify `doing` to come back.

	doing:
		sub r0, r0, #6
		sub r1, r1, #1
		cmp r1, #0
		bne doing
		mov pc, lr
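
Putting the pieces together, a caller might look like this.  It is a sketch in GNU assembler syntax (`@` starts a comment) with arbitrarily chosen starting values.  The `stop` label and its endless loop are made up here to keep execution from running on into `doing`, and r4 is an arbitrary place to keep the first result:

	mov r0, #20
	mov r1, #3
	bl doing           @ lr = address of the mov below
	mov r4, r0         @ we return here with r0 = 2, r1 = 0
	mov r0, #50
	mov r1, #2
	bl doing           @ lr = address of the b below
	stop:
		b stop         @ we return here with r0 = 38, r1 = 0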

Now we can set r0 and r1 from anywhere in our code, and then cause `doing` to operate on them.  But what if we didn't want `doing` to modify r1?  What if we still need that value?  Well, we could just `mov r3, r1` or something, but what if we were using r3 for something?  `doing` can't know what registers the code that calls it is using, and even we as the programmer can't know for sure how we will use it in the future.  We could be careful to use only certain registers, but what if we needed those in a new place later on?  What if we had some code that needed almost *all* of the registers?

The solution to this, as we will see, is to store things in RAM and not just in registers.