Home
This is a simple multi-core embedded processor. It contains 16 independent RISC cores, each with 512 words of local memory. Cores can also access a 1024 word shared memory segment on a round-robin basis.
Each core has 8 general-purpose registers, each of which holds a 16-bit integer value. There are also 4 condition code flags (zero, negative, carry, and overflow), which are set by arithmetic operations and control conditional branches.
The processor does not have interlocks, so some read-after-write dependencies must be handled in software:
- The processor has two "branch delay slots," which is a fancy way of saying it executes the next two instructions after a branch. NOPs can be inserted to avoid side effects, or the code can be structured to take advantage of this.
- Loads have 2 cycles of latency, and the processor does not stall automatically to wait for them. Reading a load's destination register within the two instructions that follow the load will not return the loaded value.
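The load hazard described above can be checked mechanically. The following is a minimal Python sketch, not part of the project: it scans a program and reports every place where a load result is read too early. The `(mnemonic, dest, sources)` tuple representation and the name `find_load_hazards` are assumptions made for this illustration.

```python
LOAD_LATENCY = 2  # per the text: a load result is unusable for the next two instructions


def find_load_hazards(program):
    """Return (load_index, use_index) pairs where a load result is read too early.

    `program` is a list of (mnemonic, dest, sources) tuples. This
    representation is hypothetical and exists only for the illustration.
    """
    hazards = []
    for i, (mnemonic, dest, _sources) in enumerate(program):
        if mnemonic != "load":
            continue
        # Only the instructions inside the latency window can see a stale value.
        window = program[i + 1 : i + 1 + LOAD_LATENCY]
        for j, (_m, _d, later_sources) in enumerate(window, start=i + 1):
            if dest in later_sources:
                hazards.append((i, j))
    return hazards


# The second program pads the load with two NOPs, so the use falls outside the window.
unsafe = [("load", "r3", ("r0",)), ("and", "r3", ("r3", "r3"))]
safe = [
    ("load", "r3", ("r0",)),
    ("nop", None, ()),
    ("nop", None, ()),
    ("and", "r3", ("r3", "r3")),
]
```

The same scan could be extended to branch delay slots, but that depends on what side effects the surrounding code tolerates, so it is left out here.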
Each core has a four-stage pipeline:
- Instruction fetch: Issues the address of the next instruction to local memory. Local memory is dual ported, and the instruction fetch stage has its own read port. Local memory has one cycle of latency, so the instruction is available in the next stage.
- Instruction decode: Issues register addresses to the register file and decodes the immediate operand. The register file has one cycle of latency, so the values are available in the next stage.
- Execute: Bypasses register results from later stages of the pipeline. Detects branches. Performs arithmetic. Issues read/write requests to data memory (which can be either local or shared memory). Reads from memory have one cycle of latency and are available in the next stage.
- Writeback: Selects between the arithmetic and memory results, depending on instruction type. Signals writeback to the register file.
There is a shared stall signal which, when asserted, disables the clocks to the flops between stages. The pipeline asserts the stall signal when it is attempting to access shared memory (read or write) and the core_enable signal is not asserted.
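To get a feel for the cost of a shared memory access, the stall behavior can be modeled in a few lines of Python. This is only a sketch under an assumption the text does not confirm: that `core_enable` is asserted for exactly one core per cycle, rotating through cores 0 to 15 in order. The function names are hypothetical.

```python
NUM_CORES = 16


def core_enable(core_id, cycle):
    """Assumed arbitration: core k owns the shared memory port when cycle % 16 == k."""
    return cycle % NUM_CORES == core_id


def stall_cycles(core_id, request_cycle):
    """Count the cycles a core is stalled if it requests shared memory at request_cycle."""
    cycle = request_cycle
    while not core_enable(core_id, cycle):
        cycle += 1  # pipeline is frozen while the stall signal is asserted
    return cycle - request_cycle
```

Under this assumption a shared access stalls for between 0 and 15 cycles, depending on where the request falls in the rotation.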

Format | 15 | 14 | 13 | Fields in bits 12-0 (high to low) |
---|---|---|---|---|
Arithmetic | 0 | 0 | 0 | operation, opb, opa, dest |
Load | 0 | 0 | 1 | offset, ptr, dest |
Store | 0 | 1 | 0 | offset h, src, ptr, offset l |
Addi | 0 | 1 | 1 | immediate, opa, dest |
Lui | 1 | 0 | 0 | immediate, dest |
Conditional branch | 1 | 0 | 1 | cond, offset |
Unconditional branch | 1 | 1 | 0 | link, offset |
Jump to reg | 1 | 1 | 1 | link, unused, target, unused |

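As an illustration of the Arithmetic format, here is a hedged Python sketch of an encoder. The field order comes from the format table, but the field widths are an assumption: a 4-bit operation followed by three 3-bit register fields, which is consistent with 8 registers and with operation numbers up to 13, and which fills the 13 bits below the opcode. The function name `encode_arithmetic` is hypothetical.

```python
def encode_arithmetic(operation, dest, opa, opb=0):
    """Pack an Arithmetic-format instruction into a 16-bit word.

    Assumed layout (widths are not stated in the text):
    bits 15-13 = 000, 12-9 = operation, 8-6 = opb, 5-3 = opa, 2-0 = dest.
    """
    if not 0 <= operation <= 15:
        raise ValueError("operation must fit in 4 bits")
    for reg in (dest, opa, opb):
        if not 0 <= reg <= 7:
            raise ValueError("register numbers must be in the range 0 to 7")
    return (0b000 << 13) | (operation << 9) | (opb << 6) | (opa << 3) | dest


# nop is defined as "and r0, r0, r0"; with operation 0 every field is zero.
nop_word = encode_arithmetic(operation=0, dest=0, opa=0, opb=0)
```

With this layout, the nop pseudo-instruction encodes to an all-zero word, which is a convenient property but follows only from the assumed widths.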
Name | Params | Instruction Format | Flags Affected | operation/cond/link | Description |
---|---|---|---|---|---|
and | dest, opa, opb | Arithmetic | NZ | 0 | Bitwise logical and |
or | dest, opa, opb | Arithmetic | NZ | 1 | Bitwise logical or |
shl | dest, opa | Arithmetic | CNZ | 2 | Logical shift left one position |
shr | dest, opa | Arithmetic | CNZ | 3 | Logical shift right one position |
add | dest, opa, opb | Arithmetic | CNZO | 4 | Add without carry |
sub | dest, opa, opb | Arithmetic | CNZO | 5 | Subtract without carry |
xor | dest, opa, opb | Arithmetic | NZ | 6 | Bitwise logical exclusive or |
not | dest, opa | Arithmetic | NZ | 7 | Bitwise logical not |
rol | dest, opa | Arithmetic | NZ | 10 | Rotate left (carry bit loaded into LSB) |
ror | dest, opa | Arithmetic | NZ | 11 | Rotate right (carry bit loaded into MSB) |
adc | dest, opa, opb | Arithmetic | CNZO | 12 | Add with carry in |
sbc | dest, opa, opb | Arithmetic | CNZO | 13 | Subtract with carry[borrow] in |
load | dest, [offset](ptr) | Load | | | Load word |
store | src, [offset](ptr) | Store | | | Store word |
addi | dest, opa, immediate | Addi | NZ [1] | | Add signed immediate value -31 to 31 |
lui | dest, immediate | Lui | | | Load upper immediate. Value is loaded into top 10 bits of dest. Low 6 bits are cleared. |
jump | label | Unconditional branch | | link=0 | Jump to label |
call | label | Unconditional branch | | link=1 | Call to label (return address saved in r7) |
jumpr | reg | Jump to reg | | link=0 | Jump to address in register |
callr | reg | Jump to reg | | link=1 | Call to address in register (return address saved in r7) |
bcc | label | Conditional branch | | 6 | Branch if carry flag clear |
bcs | label | Conditional branch | | 2 | Branch if carry flag set |
bzc | label | Conditional branch | | 4 | Branch if zero flag clear |
bzs | label | Conditional branch | | 0 | Branch if zero flag set |
bnc | label | Conditional branch | | 5 | Branch if negative flag clear |
bns | label | Conditional branch | | 1 | Branch if negative flag set |
boc | label | Conditional branch | | 7 | Branch if overflow flag clear |
bos | label | Conditional branch | | 3 | Branch if overflow flag set |

[1] The addi instruction should also update the C and O flags, but that is currently not hooked up.
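The cond values of the conditional branches follow a regular pattern: the low two bits select a flag (0 zero, 1 negative, 2 carry, 3 overflow) and bit 2 selects whether the branch is taken when that flag is clear rather than set. This is an observation drawn from the table values, not a statement about how the hardware decodes them. A minimal Python sketch, with the hypothetical names `branch_taken` and `COND_BY_MNEMONIC`:

```python
# cond values copied from the instruction table
COND_BY_MNEMONIC = {
    "bzs": 0, "bns": 1, "bcs": 2, "bos": 3,
    "bzc": 4, "bnc": 5, "bcc": 6, "boc": 7,
}

FLAG_BY_INDEX = ("zero", "negative", "carry", "overflow")


def branch_taken(cond, flags):
    """Decide whether a conditional branch is taken.

    `flags` maps "zero", "negative", "carry", and "overflow" to booleans.
    """
    flag_value = flags[FLAG_BY_INDEX[cond & 0b011]]  # low two bits pick the flag
    branch_if_clear = bool(cond & 0b100)             # bit 2 inverts the test
    return flag_value != branch_if_clear


flags = {"zero": True, "negative": False, "carry": False, "overflow": False}
taken = branch_taken(COND_BY_MNEMONIC["bzs"], flags)
```

Here `taken` is true because the zero flag is set, while the same flags would leave `bzc` not taken.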
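
The following pseudo-instructions are also available:

Name | Params | Description |
---|---|---|
ldi | dest, immediate | Load 16-bit immediate into register (creates lui/addi pair) |
nop | | No-operation (and r0, r0, r0) |
lea | dest, label | Load effective address of label into register |
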

    +------------------+ 0000
    | Boot ROM         |
    +------------------+ 0010
    | Local memory     |
    +------------------+ 4000
    | Shared memory    |
    +------------------+ FC00
    | Device Registers |
    +------------------+ FFFF

The processor uses a Harvard architecture, with separate buses for instructions and data. The address spaces are mostly the same, except the instruction bus cannot access shared memory or device registers.
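The memory map and the bus restriction above can be expressed as a small lookup. This Python sketch assumes that each address shown in the map is the first address of the region that follows it and that the last region ends at 0xFFFF; the names `region_of` and `instruction_bus_can_access` are hypothetical.

```python
# (first address, last address, name), read off the memory map
REGIONS = (
    (0x0000, 0x000F, "boot ROM"),
    (0x0010, 0x3FFF, "local memory"),
    (0x4000, 0xFBFF, "shared memory"),
    (0xFC00, 0xFFFF, "device registers"),
)


def region_of(address):
    """Return the name of the region that contains a 16-bit address."""
    for first, last, name in REGIONS:
        if first <= address <= last:
            return name
    raise ValueError("address is outside the 16-bit address space")


def instruction_bus_can_access(address):
    """The instruction bus cannot reach shared memory or device registers."""
    return region_of(address) in ("boot ROM", "local memory")
```

For example, an instruction fetch from 0x4000 is rejected by this model, which is the reason the boot ROM has to copy the program into local memory first.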
When the cores come out of reset, each one starts executing code at address 0 in its local address space. A small ROM bootloader at that location copies the main program from shared memory into local memory and then executes it. This is necessary because the cores cannot execute directly from shared memory.
Hardware mutexes are located at addresses 0xfffd and 0xfffe. Writing a one to a hardware mutex location attempts to acquire it, and writing a zero releases it if the writing core currently holds it. A core may read the location to determine whether it has acquired the mutex: the read returns one if so, zero if not. For example:

              ldi r0, -2        # Mutex address (0xfffe)
              ldi r1, 1         # Value to store
    spinlock: store r1, (r0)    # Try to acquire the mutex
              load r3, (r0)     # Did we get it?
              nop               # Wait two instructions for the load result
              nop
              and r3, r3, r3    # Set condition codes based on the value
              bzs spinlock      # If the result was zero (not acquired), busy wait
              nop               # Fill the two branch delay slots
              nop
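
The acquire, release, and read rules described above can be captured in a small behavioral model. The Python sketch below is an illustration only, not the hardware implementation. It assumes that a write of one is silently ignored while another core holds the mutex and that only the values zero and one are ever written; the class name `HardwareMutex` is hypothetical.

```python
class HardwareMutex:
    """Behavioral model of one hardware mutex location."""

    def __init__(self):
        self.owner = None  # core id of the current holder, or None if free

    def write(self, core_id, value):
        if value == 1:
            # Acquire attempt: succeeds only if the mutex is free (assumption).
            if self.owner is None:
                self.owner = core_id
        elif value == 0:
            # Release: honored only when the writer is the owning core.
            if self.owner == core_id:
                self.owner = None

    def read(self, core_id):
        """Return 1 if the reading core holds the mutex, otherwise 0."""
        return 1 if self.owner == core_id else 0


mutex = HardwareMutex()
mutex.write(core_id=0, value=1)  # core 0 acquires
mutex.write(core_id=1, value=1)  # core 1 tries and fails
```

In this model core 1 keeps reading zero until core 0 writes a zero, which mirrors the store, load, and branch loop a core runs while spinning.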