

Qupls3

Robert Finch

## **Table of Contents**

- Overview
  - Motivation
  - History
  - Features of Qupls3
- Programming Model
  - Register File – Visible Registers
  - Register File – Hidden Registers
  - Physical Registers
  - Code Address (Branch) Registers
  - Condition Registers – CR0 to CR7
  - Bound Registers
  - Special Purpose Registers
    - SR - Status Register (CSR 0x?004)
    - SC - Stack Canary (GPR nn)
    - [U/S/H/M]_IE (0x?004)
    - [U/S/H/M]_CAUSE (CSR 0x?006)
    - [U/S/H/M]_SCRATCH (CSR 0x?041)
    - S_ASID (CSR 0x101F)
    - S_KEYS (CSR 0x1020 to 0x1027)
    - M_CORENO (CSR 0x3001)
    - M_TICK (CSR 0x3002)
    - M_SEED (CSR 0x3003)
    - M_TCBA (CSR 0x3005)
    - M_BADADDR (CSR 0x3007)
    - M_BAD_INSTR (CSR 0x300B)
    - M_SEMA (CSR 0x300C)
    - M_BOUND (CSR 0x3010 to CSR 0x3013)
    - M_TVEC (CSR 0x3030 to 0x3034)
    - M_SR_STACK (CSR 0x3080 to CSR 0x3087)
    - M_MC_STACK (CSR 0x3090 to CSR 0x3097)
    - M_IOS – IO Select Register (CSR 0x3100)
    - M_CFGS – Configuration Space Register (CSR 0x3101)
    - M_EPC (CSR 0x3108 to 0x310F)
- Operating Modes
- Exceptions
  - External Interrupts
  - Effect on Machine Status
  - Exception Stack
  - Vector Table
  - Breakpoint Fault (0)
  - Single Step Breakpoint (1)
  - Bus Error Fault (2)
  - Address Error (3)
  - Unimplemented Instruction Fault (4)
  - Page Fault (6)
  - Instruction Trace Fault (7)
  - Stack Canary Fault (8)
  - Abort (9)
  - Interrupt (10)
  - Reset Vector (12)
  - Alternate Cause (13)
  - User / App Environment Call (16)
  - Supervisor Environment Call (17)
  - Hypervisor Environment Call (18)
  - Machine Environment Call (19)
  - TRAP (20)
  - Bound (21)
  - Reset
  - Precision
- Instruction Set
  - Overview
  - Code Alignment
  - Root Opcode
  - Destination Register Spec
  - Source Register Spec
  - Constant Field Spec
  - Table of Constant Location Bits – L, LX<sub>2</sub>
  - Instruction Format Tables
  - Compare Instruction Format
  - Branch Instruction Formats
  - Load and Store Instruction Formats
  - ALU Instruction Formats
  - Shift Instruction Formats
  - CSR Instruction Formats
  - BRK / SYS Instruction Formats
  - Macro Instruction Formats
  - MOV Instruction Format
  - Exception Triggering Instruction Formats
  - Instruction Pre/Postfixes and Modifiers Instruction Formats
  - Condition Register Manipulation Instruction Formats
  - Table of Root Opcodes
  - Instruction Descriptions
  - ABS[.] – Absolute Value
  - ADD[.] – Add
  - ADB[.] – Add Immediate to Branch Register
  - ADC[.] – Add with Carry
  - CNTLO[.] – Count Leading Ones
  - CNTLZ[.] – Count Leading Zeros
  - CNTPOP[.] – Count Population
  - CNTTZ[.] – Count Trailing Zeros
  - CSR[.] – Control and Special Registers Operations
  - LOADA[.] – Load Address
  - SBC[.] – Subtract with Carry
  - SUBF[.] – Subtract From
  - Multiply and Divide
    - DIV[.] – Signed Division
    - DIVA[.] – Address Division
    - MUL[.] – Multiply
    - MULA[.] – Multiply for Addressing
  - Shift and Rotate
    - SLL[.] – Shift Left Logical
    - SRA[.] – Shift Right Arithmetic
    - SRL[.] – Shift Right Logical
  - Logical Operations
    - AND[.] – Bitwise And
    - XOR[.] – Bitwise Exclusive Or
    - OR[.] – Bitwise Or
    - CHK – Check Register Against Bounds
  - Data Movement
    - MOVE[.] / MOVEA[.] / MOVSZ[.] / MOVZX[.] – Move Register to Register
  - Load / Store Instructions
    - Overview
    - Addressing Modes
    - LDB[.] Rn, \<ea\> – Load Byte
    - LDBZ[.] Rn, \<ea\> – Load Byte and Zero Extend
    - LDT[.] Rn, \<ea\> – Load Tetra
    - LDTZ[.] Rn, \<ea\> – Load Tetra and Zero Extend
    - LDW[.] Rn, \<ea\> – Load Wyde
    - LDWZ[.] Rn, \<ea\> – Load Wyde and Zero Extend
    - LOAD[.] Rn, \<ea\> – Load
    - STB Rn, \<ea\> – Store Byte
    - STBI Rn, \<ea\> – Store Byte Immediate
    - STORE Rn, \<ea\> – Store Register
    - STOREI N, \<ea\> – Store Immediate
    - STT Rn, \<ea\> – Store Tetra
    - STTI Rn, \<ea\> – Store Tetra Immediate
    - STW Rn, \<ea\> – Store Wyde
    - STWI Rn, \<ea\> – Store Wyde Immediate
  - Condition Register Instructions
    - CLC – Clear Carry
    - CLV – Clear Overflow
    - CRAND – Bit And
    - CRANDC – Bit And with Complement
    - CROR – Bit Or
    - CRXOR – Bit Exclusive Or
    - SEC – Set Carry
    - SEV – Set Overflow
  - Branch / Flow Control Instructions
    - Overview
    - Conditional Branch Format
    - Branch Target
    - Decrementing Branches
    - Unconditional Branches
    - B – Branch Always
    - BAND – Branch if And
    - BANDL – Branch if And and Link
    - BCC – Branch if Carry Clear
    - BCCL – Branch if Carry Clear and Link
    - BCS – Branch if Carry Set
    - BCSL – Branch if Carry Set and Link
    - BEQ – Branch if Equal
    - BEQL – Branch if Equal and Link
    - BGE – Branch if Greater Than or Equal
    - BGEL – Branch if Less Than or Equal and Link
    - BGT – Branch if Greater Than
    - BGTL – Branch if Less Than or Equal and Link
    - BL – Branch and Link
    - BLR – Branch to Link Register
    - BLRL – Branch to Link Register and Link
    - BLE – Branch if Less Than or Equal
    - BLEL – Branch if Less Than or Equal and Link
    - BLT – Branch if Less Than
    - BLTL – Branch if Less Than and Link
    - BNAND – Branch if Nand
    - BNANDL – Branch if Nand and Link
    - BNOR – Branch if Nor
    - BNORL – Branch if Nor and Link
    - BNE – Branch if Not Equal
    - BNEL – Branch if Not Equal and Link
    - BOR – Branch if Or
    - BORL – Branch if Or and Link
    - BVS – Branch if Overflow Set
    - BVSL – Branch if Overflow Set and Link
    - NOP – No Operation
  - System Instructions
    - BRK – Break
    - REX – Redirect Exception
    - SYS – System Call
    - TRAPcc – Trap if Condition Met
    - XJMP – Exchange Jump
  - Pre/Postfixes and Modifiers
    - ATOM Modifier
    - QEXT Prefix
    - PFX[ABCD] – A/B/C/D Immediate Postfix
    - PRED Modifier
- MPU Hardware
- Hardware Description
  - Caches
    - Overview
    - Instructions
    - L1 Instruction Cache
    - Fetch Rate
    - Data Cache
    - Cache Enables
    - Cache Validation
    - Un-cached Data Area
  - Return Address Stack Predictor (RSB)
  - Branch Predictor
  - Branch Target Buffer (BTB)
  - Decode Logic
  - Instruction Queue (ROB)
    - Queue Rate
    - Sequence Numbers
- Input / Output Management
  - Device Configuration Blocks
  - Reset
  - Devices Built into the CPU / MPU
  - System Devices
- External Interrupts
  - Overview
  - Interrupt Messages
  - Interrupt Controller
    - Interrupt Vector Table
  - Interrupt Group Filter
  - Interrupt Reflector
  - Interrupt Logger
- Qupls3 Memory Management
  - Overview
  - Page Table
  - PTE Format
  - Bounds
- QIC – Qupls3 Interrupt Controller
  - Overview
  - System Usage
  - Priority Resolution
  - Config Space
  - Registers
  - Base Address Fields
  - CPU Affinity Group Table
  - Interrupt Enable Bits
  - Interrupt Pending Bits
  - Interrupt Vector Table
- QIT – Qupls3 Interval Timer
  - Overview
  - System Usage
  - Config Space
  - Parameters
  - Registers
  - Programming
  - Interrupts
- FTA Bus
  - Overview
  - Bus Tags
  - Single Cycle
  - Retry
  - Signal Description
  - Requests
  - Responses
  - Om
  - Cmd
  - BTE
  - CTI
  - Blen
  - Sz
  - Segment
  - TID
  - Cache
  - Message Signaled Interrupts
- Glossary
  - ABI
  - AMO
  - Assembler
  - ATC
  - Base Pointer
  - Burst Access
  - BTB
  - Card Memory
  - Commit
  - Decimal Floating Point
  - Decode
  - Diadic
  - DUT
  - Endian
  - FIFO
  - FPGA
  - Floating Point
  - Frame Pointer
  - HDL
  - HLL
  - Instruction Bundle
  - Instruction Pointers
  - Instruction Prefix
  - Instruction Modifier
  - ISA
  - IPI
  - JIT
  - Keyed Memory
  - Linear Address
  - Machine Code
  - Milli-code
  - Monadic
  - MSI
  - Opcode
  - Operand
  - Physical Address
  - Physical Memory Attributes (PMA)
  - PIC
  - Posits
  - Program Counter
  - RAT
  - Retire
  - ROB
  - RSB
  - SIMD
  - Stack Pointer
  - Telescopic Memory
  - TLB
  - Trace Memory
  - Triadic
  - Vector Chaining
  - Vector Length (VL register)
  - Vector Mask (VM)
  - Virtual Address
  - Writeback
- Miscellaneous
  - Reference Material
  - Trademarks
- WISHBONE Compatibility Datasheet

## Overview

Qupls3 is a thirty-two-bit processor.

The processor features thirty-two 32-bit integer registers and thirty-two 32-bit floating-point registers.

## Motivation

The author desired a CPU core to experiment with cache-line constants and operating systems. He also wanted a core he could develop himself. Complexity is something the author must manage to get the project done and a flat 32-bit design is simple.

Good single thread performance is also a goal.

Having worked on Qupls for two years, the author realized that it did not have very good code density. Reasonably good code density is desirable because it is unknown where the CPU will end up. Qupls3 is the result, a mix of the best ideas from previous designs.

The CPU is also designed around the idea of using a simple compiler. Some operations, such as multiply and divide, could have been left out of the hardware and implemented in compiler-generated software, but the author wanted to keep the compiler simple. There is plenty of room for future expansion. The design could easily be adapted to 64 bits, in part anticipating that more than 4 GB of memory will be available sometime down the road. A 64-bit architecture is doable in FPGAs today, although it uses two or more times the resources of a 32-bit design.

## History

Qupls3 is a work in progress that began in March 2025. It is a major rewrite of earlier versions, including Thor, which originated from RiSC-16 by Dr. Bruce Jacob. RiSC-16 evolved from the Little Computer (LC-896) developed by Peter Chen at the University of Michigan. The author has tried to be innovative with this design while borrowing ideas from many other processing cores.

## Features of Qupls3

- Fixed 32-bit instruction length
- Four-way out-of-order superscalar operation
- Four operating modes: machine, hypervisor, supervisor, and user
- 32-bit data path
- 16 (or more) entry re-order buffer
- Separate fixed-point integer and floating-point register files
- 32 general purpose registers
- Eight condition code registers
- Dedicated loop count register
- Eight branch registers
- Register renaming to remove dependencies
- Standard suite of ALU operations: add, subtract, compare, multiply, and divide
- Arithmetic right shift with rounding
- Conditional branches with 13 effective displacement bits
- 1024-entry, three-way TLB shared between data and code

# Programming Model

## Register File – Visible Registers

| Tag      | Integer registers (bits 31 to 0) | Tag      | Floating-point registers (bits 31 to 0) |
|----------|----------------------------------|----------|-----------------------------------------|
| 0        | zero                             | 32       | f0 / zero                               |
| 1 to 8   | A0 to A7                         | 33 to 63 | f1 to f31                               |
| 9 to 18  | T0 to T9                         |          |                                         |
| 19 to 28 | S0 to S9                         |          |                                         |
| 29       | GP                               |          |                                         |
| 30       | FP                               |          |                                         |
| 31       | SP                               |          |                                         |

| Tag | Branch / Link Registers |
|-----|-------------------------|
| 72  | BR0                     |
| 73  | BR1                     |
| 74  | BR2                     |
| 75  | BR3                     |
| 76  | BR4                     |
| 77  | BR5                     |
| 78  | BR6                     |
| 79  | PC                      |

| Tag | Loop Count Register |
|-----|---------------------|
| 88  | LC                  |

| Tag | Condition Registers |
|-----|---------------------|
| 80  | CR0                 |
| 81  | CR1                 |
| 82  | CR2                 |
| 83  | CR3                 |
| 84  | CR4                 |
| 85  | CR5                 |
| 86  | CR6                 |
| 87  | CR7                 |

## Register File – Hidden Registers

| Tag | Micro-Code Support | Tag | Other Hidden Registers |
|-----|--------------------|-----|------------------------|
| 68  | MC0                | 64  | User SP                |
| 69  | MC1                | 65  | Supervisor SP          |
| 70  | MC2                | 66  | Hypervisor SP          |
| 71  | MC3                | 67  | Machine SP             |
| 89  | MLR                | 90  | CB                     |
| 91  | MPC                |     |                        |

| Bound |
|-------|
| BN0   |
| BN1   |
| BN2   |
| BN3   |

## **Physical Registers**

There are 256 general purpose physical registers in the CPU. This provides rename coverage for the 96 logical registers in the design. On average there are about 2.7 physical registers available for every logical register (256 / 96).

## Code Address (Branch) Registers

Many architectures have registers dedicated to addressing code. Almost every modern architecture has a program counter or instruction pointer register to identify the location of instructions. Many architectures also have at least one link register or return address register holding the address of the next instruction after a subroutine call. There are also dedicated branch address registers in some architectures. These are all code addressing registers.

| Regno | ABI  | Encode | ABI Usage                             |
|-------|------|--------|---------------------------------------|
| 72    | Zero | 0      | No linkage (read-only)                |
| 73    | BR1  | 1      | Link register #1                      |
| 74    | BR2  | 2      | Link register #2                      |
| 75    | BR3  | 3      | Link register #3                      |
| 76    | BR4  | 4      | Link register #4                      |
| 77    | BR5  | 5      | Link register #5                      |
| 78    | BR6  | 6      | Link register #6                      |
| 79    | PC   | 7      | Program counter reference (read-only) |

It is possible to do an indirect method call using any register.

## Condition Registers – CR0 to CR7

Register tags 80 to 87 are reserved for condition results for the compare (CMP) and branch instructions. The low order eight bits of the register typically contain a bit vector representing the results of a comparison operation.

Restricting the CMP and branch instructions to just eight registers conserves opcode bits. Since compares are almost always followed directly by branches, only a few condition registers are needed.

| 7 | 6       | 5  | 4  | 3  | 2   | 1    | 0         |
|---|---------|----|----|----|-----|------|-----------|
| ~ | OF / UN | CA | LE | LT | NOR | NAND | EQ / XNOR |

| Bit | Name      | Meaning                                                 |
|-----|-----------|---------------------------------------------------------|
| 0   | EQ / XNOR | Set if bitwise XNOR of operands is true, equal          |
| 1   | NAND      | Set if logical NAND of operands is true                 |
| 2   | NOR       | Set if logical NOR of operands is true                  |
| 3   | LT        | Set if less than                                        |
| 4   | LE        | Set if less than or equal                               |
| 5   | CA        | Carry out from operation (addition, subtraction, shift) |
| 6   | OF / UN   | Overflow status or unordered for floating-point         |
| 7   | ~         |                                                         |
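As a minimal sketch of how software (for example a simulator or debugger) might interpret a condition register value, the following Python decodes the low eight bits into the flag names from the table above. The function names `decode_cr` and `cr_flag_set` are hypothetical and are not part of the Qupls3 tool chain.

```python
# Bit positions follow the condition register table above.
# Bit 7 is unused; bit 6 doubles as "unordered" for floating-point compares.
CR_FLAGS = {0: "EQ", 1: "NAND", 2: "NOR", 3: "LT", 4: "LE", 5: "CA", 6: "OF"}


def decode_cr(value: int) -> list:
    """Return the names of the flags set in the low eight bits of a CR value."""
    return [name for bit, name in sorted(CR_FLAGS.items()) if value & (1 << bit)]


def cr_flag_set(value: int, name: str) -> bool:
    """Test a single named flag, as a conditional branch would."""
    for bit, flag in CR_FLAGS.items():
        if flag == name:
            return bool(value & (1 << bit))
    raise KeyError(name)
```

For instance, a compare of two equal operands would be expected to leave at least EQ and LE set, so a value of `0b00010001` decodes to `["EQ", "LE"]`.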

## **Bound Registers**

The bound registers (M\_BOUND) define lower and upper bounds within which a memory reference is restricted for user or applications software. The top eighteen bits of an address are compared against the values in the bounds register. All bound registers are checked in parallel. The virtual address must be less than the limit specified in the bounds register. Memory access will not be granted unless it is within range of at least one of the bound registers.

| 31 |       | 16 | 15 |      | 0 |
|----|-------|----|----|------|---|
|    | Limit |    |    | Base |   |

There are multiple bound registers to allow a program to occupy different areas of memory. For instance, a program may have separate code, data, and stack areas.
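The check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the hardware implementation: the field positions come from the register diagram (Limit in bits 31 to 16, Base in bits 15 to 0), the comparison width defaults to sixteen bits to match those field widths even though the text mentions eighteen, and the base is treated as an inclusive lower bound. The function names are hypothetical.

```python
def bound_allows(addr: int, bound: int, cmp_bits: int = 16, addr_bits: int = 32) -> bool:
    """Check one bound register against the top bits of a virtual address.

    Assumptions: base is inclusive, limit is exclusive, and cmp_bits
    upper address bits take part in the comparison.
    """
    base = bound & 0xFFFF             # bits 15..0
    limit = (bound >> 16) & 0xFFFF    # bits 31..16
    top = (addr >> (addr_bits - cmp_bits)) & ((1 << cmp_bits) - 1)
    return base <= top < limit


def access_allowed(addr: int, bounds) -> bool:
    """Access is granted if the address is in range of at least one bound register.

    Hardware checks all registers in parallel; any() models the same result.
    """
    return any(bound_allows(addr, b) for b in bounds)
```

A program with separate code and stack areas would then load one bound register per area, and an address is accepted as soon as any one of them covers it.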

## Special Purpose Registers

### SR - Status Register (CSR 0x?004)

The processor status register holds bits controlling the overall operation of the processor, along with state that needs to be saved and restored across interrupts. Individual bits can be set or cleared using the CSRRS and CSRRC instructions. Only the user interrupt enable bit is available in user mode; other bits will read as zero.

| Bit      |       | Usage                       |
|----------|-------|-----------------------------|
| 0        | uie   | User interrupt enable       |
| 1        | sie   | Supervisor interrupt enable |
| 2        | hie   | Hypervisor interrupt enable |
| 3        | mie   | Machine interrupt enable    |
| 4        | die   | Debug interrupt enable      |
| 5 to 10  | ipl   | Interrupt level             |
| 11       | ssm   | Single step mode            |
| 12       | te    | Trace enable                |
| 13 to 14 | om    | Operating mode              |
| 15 to 16 | ps    | Pointer size                |
| 17       |       | reserved                    |
| 18       | dbg   | Debug mode                  |
| 19       | mprv  | memory privilege            |
| 20 to 22 | Swstk | Software stack              |
| 23       |       | reserved                    |
| 24 to 31 | cpl   | Current privilege level     |

- CPL: the current privilege level at which the processor is operating.
- TE: indicates that trace mode is active.
- OM: the processor operating mode.
- PS: the size of pointers in use, one of 32, 64, or 128 bits.
- IPL: the interrupt mask level; 0 = all interrupts allowed, 63 = NMI only.
- MPRV: memory privilege; indicates that the previous operating mode is used for memory privileges.
- SWSTK: indicates which OS is running.
- SSM: 1 = step the processor one instruction at a time; the debugger is triggered after the instruction executes.
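To make the field layout concrete, here is a small Python sketch that extracts the status register fields using the bit ranges from the table above. The helper names `sr_field` and `decode_sr` are hypothetical; only the bit positions are taken from this document.

```python
# (low bit, high bit) for each field, from the status register table.
SR_FIELDS = {
    "uie": (0, 0), "sie": (1, 1), "hie": (2, 2), "mie": (3, 3), "die": (4, 4),
    "ipl": (5, 10), "ssm": (11, 11), "te": (12, 12), "om": (13, 14),
    "ps": (15, 16), "dbg": (18, 18), "mprv": (19, 19), "swstk": (20, 22),
    "cpl": (24, 31),
}


def sr_field(sr: int, name: str) -> int:
    """Extract one named field from a 32-bit status register value."""
    lo, hi = SR_FIELDS[name]
    width = hi - lo + 1
    return (sr >> lo) & ((1 << width) - 1)


def decode_sr(sr: int) -> dict:
    """Return every defined field as a name-to-value mapping."""
    return {name: sr_field(sr, name) for name in SR_FIELDS}
```

Because IPL occupies six bits (5 to 10), its values range from 0 to 63, which matches the "63 = NMI only" encoding described above.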

### SC - Stack Canary (GPR nn)

This special purpose register is available in the general register file as register nn. The stack canary register is used to alleviate issues resulting from buffer overflows on the stack. The canary register contains a random value which remains consistent throughout the run-time of a program. In the right conditions, the canary register is written to the stack during the function's prolog code. In the function's epilog code, the value of the canary on the stack is checked to ensure it is correct; if it is not, a check exception occurs.

### [U/S/H/M]\_IE (CSR 0x?004)

See status register.

This register contains interrupt enable bits. The register is present at all operating levels. Only enable bits at the current operating level or lower are visible and may be set or cleared. Other bits will read as zero and ignore writes. Only the lower four bits of this register are implemented. The bits have individual bit set / clear capability using the CSRRS, CSRRC instructions.

| 63 to 4 | 3   | 2   | 1   | 0   |
|---------|-----|-----|-----|-----|
| ~       | mie | hie | sie | uie |
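The visibility rule above can be sketched in Python as a mask applied on reads and writes. This sketch assumes the operating modes are numbered 0 = user, 1 = supervisor, 2 = hypervisor, 3 = machine, so that the mode number matches the position of its enable bit; that numbering and the function names are assumptions made for illustration.

```python
def visible_ie_mask(mode: int) -> int:
    """Mask of enable bits visible at the given mode (assumed 0=user .. 3=machine)."""
    return (1 << (mode + 1)) - 1


def read_ie(ie: int, mode: int) -> int:
    """Bits above the current mode read as zero."""
    return ie & visible_ie_mask(mode) & 0xF


def write_ie(ie: int, new: int, mode: int) -> int:
    """Writes to bits above the current mode are ignored."""
    mask = visible_ie_mask(mode)
    return ((ie & ~mask) | (new & mask)) & 0xF
```

With this model, supervisor-mode code that writes all ones changes only `uie` and `sie`, leaving `hie` and `mie` as they were.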

### [U/S/H/M]\_CAUSE (CSR 0x?006)

This register contains a code indicating the cause of an exception or interrupt. The break handler will examine this code to determine what to do. Only the low order 16 bits are implemented. The high order bits read as zero and are not updateable. The info field, filled in by hardware, may supply additional information related to the exception.

| 63 to 16 | 15 to 8 | 7 to 0 |
|----------|---------|--------|
| ~        | Info    | Cause  |
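A break handler would typically begin by separating the two fields. The Python sketch below shows that split using the bit positions from the diagram above; the function name `split_cause` is hypothetical.

```python
def split_cause(value: int) -> tuple:
    """Split a CAUSE register value into (cause, info).

    Cause occupies bits 7..0 and Info bits 15..8; higher bits read as zero
    and are ignored here.
    """
    cause = value & 0xFF
    info = (value >> 8) & 0xFF
    return cause, info


# Example: cause code 6 (listed as Page Fault in the exception vector list)
# with an info byte of 3.
cause, info = split_cause(0x0306)
```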

### [U/S/H/M]\_SCRATCH (CSR 0x?041)

This is a scratchpad register, useful when processing exceptions. There is a separate scratch register for each operating mode.

### S\_ASID (CSR 0x101F)

This register contains the address space identifier (ASID) or memory map index (MMI). The ASID is used in this design to select (index into) a memory map in the paging tables. Only the low order sixteen bits of the register are implemented.

#### M\_CORENO (CSR 0x3001)

This register contains a number that is externally supplied on the coreno\_i input bus to represent the hardware thread id or the core number. It should be non-zero.

#### M\_TICK (CSR 0x3002)

This register contains a tick count of the number of clock cycles that have passed since the last reset. Note that this register should not be used for precise timing as the processor's clock frequency may vary for performance and power reasons. The TIME CSR may be used for wall-clock timing as it has its own timing source.

#### M\_SEED (CSR 0x3003)

This register contains a random seed value based on an external entropy collector. The most significant bit of the state is a busy bit.

| 63..60             | 59..16         | 15..0              |
|--------------------|----------------|--------------------|
| State<sub>4</sub>  | ~<sub>44</sub> | seed<sub>16</sub>  |

| State<sub>4</sub> Bit | Meaning                                                 |
|-----------------------|---------------------------------------------------------|
| 0                     | dead                                                    |
| 1                     | test                                                    |
| 2                     | valid, the seed value is valid                          |
| 3                     | Busy, the collector is busy collecting a new seed value |
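The field layout above can be decoded as in the following C sketch. It works on a value that has already been read from M\_SEED; how the CSR is read is not shown. The helper names are hypothetical, and treating a seed as usable only when valid is set and busy is clear is an assumption, not a rule stated by this document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits within the 4-bit state field (register bits 63..60). */
enum {
    SEED_STATE_DEAD  = 1u << 0,
    SEED_STATE_TEST  = 1u << 1,
    SEED_STATE_VALID = 1u << 2,
    SEED_STATE_BUSY  = 1u << 3   /* most significant state bit, register bit 63 */
};

/* Extract the state field from a value read from M_SEED. */
static unsigned seed_state(uint64_t m_seed)
{
    return (unsigned)(m_seed >> 60) & 0xFu;
}

/* Extract the 16-bit seed from a value read from M_SEED. */
static uint16_t seed_value(uint64_t m_seed)
{
    return (uint16_t)(m_seed & 0xFFFFu);
}

/* Assumed policy: use the seed only when it is valid and the collector is idle. */
static bool seed_usable(uint64_t m_seed)
{
    unsigned st = seed_state(m_seed);
    return (st & SEED_STATE_VALID) != 0 && (st & SEED_STATE_BUSY) == 0;
}
```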

#### M\_TCBA (CSR 0x3005)

This register contains the address of the currently active task control block, in which the CPU's context will be saved on a context switch. Context switching may be done with the exchange jump XJMP instruction. Only the upper 50 bits may be set; the lower 14 bits of the address are always zero.

| 63..14                           | 13..0          |
|----------------------------------|----------------|
| Context Block Page<sub>50</sub>  | ~<sub>14</sub> |
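Because the low 14 bits are always zero, a task control block must start on a 16kB boundary. A small C sketch of the corresponding address arithmetic follows; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCB_ALIGN_BITS  14u                                   /* 16kB alignment */
#define TCB_OFFSET_MASK ((UINT64_C(1) << TCB_ALIGN_BITS) - 1)

/* True if `addr` is a legal task control block address. */
static bool tcb_is_aligned(uint64_t addr)
{
    return (addr & TCB_OFFSET_MASK) == 0;
}

/* Value the register holds after `written` is stored: the low 14 bits read as zero. */
static uint64_t tcba_effective(uint64_t written)
{
    return written & ~TCB_OFFSET_MASK;
}

/* The 50-bit context block page number held in bits 63..14. */
static uint64_t tcba_page(uint64_t tcba)
{
    return tcba >> TCB_ALIGN_BITS;
}
```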

#### M\_BADADDR (CSR 0x3007)

This register contains the address for a load / store operation that caused a memory management exception or a bus error. Note that the address of the instruction causing the exception is available in the EIP register.

#### M\_BAD\_INSTR (CSR 0x300B)

This register contains a copy of the instruction that caused the exception.

#### M\_SEMA (CSR 0x300C)

This register contains semaphores. The semaphores are shared between all cores in the MPU.

#### M\_BOUND (CSR 0x3010 to CSR 0x301F)

This set of sixteen registers establishes boundaries within which a program may operate. Only address bits 14 to 27 of the address (the memory page number) are compared to the bounds. A program address (instruction or data) which is out of bounds will cause a bounds exception. The four MSBs of the virtual memory address are not considered as they select the bounds register. The test system has 49152 pages of memory (under 1GB).

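| Reg#   | Name  | 31..30        | 29..16 | 15..14        | 13..0 |
|--------|-------|---------------|--------|---------------|-------|
| 0x3010 | PBL0  | ~<sub>2</sub> | Limit  | ~<sub>2</sub> | Base  |
| 0x3011 | PBL1  | ~<sub>2</sub> | Limit  | ~<sub>2</sub> | Base  |
| ...    | ...   | ~<sub>2</sub> |        | ~<sub>2</sub> |       |
| 0x301F | PBL15 | ~<sub>2</sub> | Limit  | ~<sub>2</sub> | Base  |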
#### M\_TVEC - CSR 0x3030 to 0x3034

These registers contain the address of the exception handler table for a given operating mode. TVEC[0] to TVEC[2] are used by the REX instruction.

A sync instruction should be used after modifying one of these registers to ensure the update is valid before continuing program execution.

| Reg#   | Register                  |
|--------|---------------------------|
| 0x3030 | TVEC[0] – user mode       |
| 0x3031 | TVEC[1] – supervisor mode |
| 0x3032 | TVEC[2] – hypervisor mode |
| 0x3033 | TVEC[3] – machine mode    |
| 0x3034 | TVEC[4] – debug           |

#### M\_SR\_STACK (CSR 0x3080 to CSR 0x3087)

This set of registers forms a stack for the status register. The status register is pushed during exception processing and popped on return from interrupt. There are only eight slots because eight is the maximum nesting depth for interrupts.

#### M\_MC\_STACK (CSR 0x3090 to CSR 0x3097)

This set of registers is a stack for the micro-code instruction register (MCIR) and the micro-code instruction pointer (MCIP). MCIR and MCIP need to be retained through exception processing.

Bits 52 to 63 of the register contain the MCIP. Bits 0 to 51 contain the MCIR.
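The packing of one stack entry can be illustrated in C. This sketch only shows the bit arithmetic implied by the layout above (MCIP in bits 63 to 52, MCIR in bits 51 to 0); the function names are hypothetical.

```c
#include <stdint.h>

#define MCIR_BITS 52u
#define MCIR_MASK ((UINT64_C(1) << MCIR_BITS) - 1)

/* Combine a 12-bit MCIP and a 52-bit MCIR into one 64-bit stack entry. */
static uint64_t mc_pack(uint16_t mcip, uint64_t mcir)
{
    return ((uint64_t)(mcip & 0xFFFu) << MCIR_BITS) | (mcir & MCIR_MASK);
}

/* Micro-code instruction pointer: bits 63..52 of the entry. */
static uint16_t mc_mcip(uint64_t entry)
{
    return (uint16_t)(entry >> MCIR_BITS);
}

/* Micro-code instruction register: bits 51..0 of the entry. */
static uint64_t mc_mcir(uint64_t entry)
{
    return entry & MCIR_MASK;
}
```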

#### M\_IOS - IO Select Register (CSR 0x3100)

The location of IO is determined by the contents of the IOS control register. The select is for a 1MB region. This address is a virtual address. The low order 13 bits of this register should be zero and are ignored.

| 63..13                            | 12..0          |
|-----------------------------------|----------------|
| Virtual Address<sub>67..20</sub>  | 0<sub>13</sub> |

#### M\_CFGS – Configuration Space Register (CSR 0x3101)

The location of configuration space is determined by the contents of the CFGS control register. The select is for a 256MB region. This address is a virtual address. The low order 12 bits of this address are assumed to be zero. The default value of this register is \$FF...FD000.

| 63..0                             |
|-----------------------------------|
| Virtual Address<sub>75..12</sub>  |

#### M\_EPC (CSR 0x3108 to 0x310F)

This set of registers contains the address stack for the program counter used in exception handling.

| Reg#   | Name |
|--------|------|
| 0x3108 | EPC0 |
| ...    | ...  |
| 0x310F | EPC7 |

## **Operating Modes**

The core operates in one of four basic modes: application/user mode, supervisor mode, hypervisor mode or machine mode. Machine mode is switched to when an interrupt or exception occurs, or when debugging is triggered. On power-up the core is running in machine mode. An RFI instruction must be executed to leave machine mode after power-up.

Most modern OSs require at least two modes of operation, a user mode, and a more secure system mode. It can be advantageous to have more operating modes as it eases the software implementation when dealing with multiple operating systems running on the same machine at the same time.

A subset of instructions is limited to machine mode.

| Mode Bits | Mode            |
|-----------|-----------------|
| 0         | User / App      |
| 1         | Supervisor      |
| 2         | Hypervisor      |
| 3         | Machine / Debug |

Each operating mode has its own vector table. Different sets of CSR registers are visible to each operating mode.

## Exceptions

## **External Interrupts**

There is little difference between an externally generated exception and an internally generated one. An externally caused exception will set the exception cause code for the currently fetched instruction. A hardware interrupt displaces the instruction at the point the interrupt occurred with a special TRAP instruction.

There are sixty-four priority interrupt levels for external interrupts. When an external interrupt occurs the mask level is set to the level of the current interrupt. A subsequent interrupt must exceed the mask level to be recognized.
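The masking rule can be written as two small C functions. This is a sketch only: the numbering of levels as 0 to 63 and the function names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* An interrupt is recognized only if its level exceeds the current mask level. */
static bool irq_recognized(uint8_t mask_level, uint8_t irq_level)
{
    return irq_level > mask_level;
}

/* Mask level after an interrupt request: raised to the level of the interrupt
 * if it was recognized, otherwise unchanged. */
static uint8_t irq_next_mask(uint8_t mask_level, uint8_t irq_level)
{
    return irq_recognized(mask_level, irq_level) ? irq_level : mask_level;
}
```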

#### **Effect on Machine Status**

The operating mode is always switched to machine mode on exception. It is up to the machine mode code to redirect the exception to a lower operating mode when desired. Further exceptions at the same or lower interrupt level are disabled automatically. Machine mode code must enable interrupts at some point.

### **Exception Stack**

The status register and instruction pointer are quickly pushed onto an internal stack when an exception occurs. This stack is at least 8 entries deep to allow for nested interrupts and multiply nested traps and exceptions. The stack pointer is also switched to one corresponding to the machine's operating mode. A hardware interrupt will also cause the stack pointer to change to one specific to the interrupt level.

#### **Vector Table**

The machine mode kernel vector is always used to locate the exception routine. The exception routine may then redirect the exception to a lower operating mode using the REX instruction. When an exception occurs the CPU just jumps to the entry in the vector table. The entry should contain a branch instruction to the exception handler.

| Cause Code | Usage                          |
|------------|--------------------------------|
| 0          | Debug Breakpoint (BRK)         |
| 1          | Debug breakpoint – single step |
| 2          | Bus Error                      |
| 3          | Address Error                  |
| 4          | Unimplemented Instruction      |
| 5          | Privilege Violation            |
| 6          | Page fault                     |
| 7          | Instruction trace              |
| 8          | Stack Canary                   |
| 9          | Abort                          |
| 10         | Interrupt                      |
| 11         | Non-maskable interrupt         |
| 12         | Reset                          |
| 13         | Alternate Cause > 31           |
| 14, 15     | Reserved                       |
| 16         | User / App Environment call    |
| 17         | Supervisor Environment call    |
| 18         | Hypervisor environment call    |
| 19         | Machine environment call       |
| 20         | Trap                           |
| 21         | Bound                          |
| 22 to 31   | reserved                       |

## Applications Usage

| Cause Code | Usage                              |
|------------|------------------------------------|
| 32 to 63   | reserved                           |
| 64         | Divide by zero                     |
| 65         | Overflow                           |
| 66         | Table Limit                        |
| 67 to 251  | Unassigned usage                   |
| 252        | Reset value of stack pointer       |
| 253        | Reset value of instruction pointer |
| 254, 255   | Reserved                           |

#### Breakpoint Fault (0)

The breakpoint instruction (BRK, opcode 0) was encountered.

#### Single Step Breakpoint (1)

This fault is raised at the end of a single step operation. Single stepping is turned off so that a debugger may begin processing.

#### Bus Error Fault (2)

The bus error fault is raised if the bus error signal was active during the bus transaction. This could be due to a bad or missing device.

#### Address Error (3)

This fault will occur if an instruction address does not have the two LSBs equal to zero.

#### Unimplemented Instruction Fault (4)

An unimplemented instruction causes this fault.

#### Page Fault (6)

The page table walker was unable to find a valid translation for the virtual address.

#### Instruction Trace Fault (7)

An instruction trace was triggered. This fault requires the Trace module to be present.

#### Stack Canary Fault (8)

This fault occurs if the stack canary was overwritten: the value fetched by a load instruction that uses the canary register did not match the value in the canary register.

#### Abort (9)

The external abort input signal was asserted.

#### Interrupt (10)

The external interrupt signal was asserted, and the interrupt level was greater than the current mask level.

#### Reset Vector (12)

This vector is the address that the processor begins running at.

#### Alternate Cause (13)

The alternate cause vector is jumped to if the cause code is greater than 31.
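The selection of a vector table entry from a cause code can be sketched in C. It assumes that cause codes 0 to 31 index the vector table directly, which the table suggests but does not state outright; the names are hypothetical.

```c
#include <stdint.h>

#define VEC_ALTERNATE_CAUSE 13u   /* entry used for cause codes above 31 */

/* Vector table entry used for a given cause code. */
static unsigned vector_index(unsigned cause)
{
    return cause > 31u ? VEC_ALTERNATE_CAUSE : cause;
}
```

With this rule a divide by zero (cause 64) enters the handler through entry 13, and the handler would then examine the cause register to tell application causes apart.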

#### User / App Environment Call (16)

This fault is triggered when a system call instruction is executed while in User / App mode.

#### Supervisor Environment Call (17)

This fault is triggered when a system call instruction is executed while in Supervisor mode.

#### Hypervisor Environment Call (18)

This fault is triggered when a system call instruction is executed while in Hypervisor mode.

#### Machine Environment Call (19)

This fault is triggered when a system call instruction is executed while in Machine mode.

#### TRAP (20)

This fault is triggered by the TRAP instruction when the trap condition is met.

#### Bound (21)

This fault is triggered if an address is out of bounds as set by the bound registers.

#### Reset

Reset is treated as an exception. The reset routine should exit using an RFI instruction. The status register should be set up appropriately for the return.

The core begins executing instructions at the address defined by the reset vector in the exception table. At reset the exception table is set to the last 256 bytes of memory \$FF...FFC00. All registers are in an undefined state.

#### Precision

Exceptions in Qupls3 are precise. They are processed according to program order of the instructions. If an exception occurs during the execution of an instruction, then an exception field is set in the pipeline buffer. The exception is processed when the instruction commits, which happens in program order. If the instruction was executed in a speculative fashion, then no exception processing will be invoked unless the instruction makes it to the commit stage.

## Task Support

## Task Control Block / Context Block Layout

Context blocks have the following memory layout:

| Word Number | Registers                         |
|-------------|-----------------------------------|
| 0           | Not used, reserved                |
| 0 to 30     | General purpose registers 0 to 30 |
| 31          | Safe stack pointer                |
| 32 to 63    | Floating-point registers          |
| 64          | User stack pointer                |
| 65          | Supervisor stack pointer          |
| 66          | Hypervisor stack pointer          |
| 67          | Machine stack pointer             |
| 68 to 71    | Micro-code temporaries #0 to #3   |
| 72 to 78    | Branch registers                  |
| 79          | Program Counter                   |
| 80 to 87    | Condition Registers               |
| 88          | Loop Counter                      |
| 89          | micro-code link register          |
| 90          | Context block address register    |
| 91          | Micro-code program counter        |
| 92 to 95    | reserved                          |
|             | CSR registers                     |
| 96          | Status register                   |
| 97          | Program Base and Limit register   |

The context block is aligned with a 16kB memory page. The processor's registers are stored beginning with the first word of the context block. The first 256 words of the context area are for CPU register file use. Space after the first 1k words of the context block may be used by the OS. Context blocks may be larger than 16kB depending on what information is stored in them. For instance, the Femtiki OS buffers the text mode video screen for the app in the context block.

## Instruction Set

#### Overview

Qupls3 has a fixed-length instruction set; every instruction is 32 bits wide. There are several different classes of instructions including arithmetic, memory operate, branch, floating-point and others.

## Code Alignment

Program code may be relocated at any tetra-byte (4 byte) address. However, within a subroutine code should be contiguous.

## **Root Opcode**

The root opcode determines the class of instructions executed. Some commonly executed instructions are also encoded at the root level to make more bits available for the instruction. The root opcode is always present in all instructions as bits zero to five of the instruction.

|   |                |                        |    |                 |                | ▼             |
|---|----------------|------------------------|----|-----------------|----------------|---------------|
| L | LX<sub>2</sub> | Immediate<sub>12</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |

## **Destination Register Spec**

Most instructions have a destination register. The register spec for the destination register is always in the same position, bits 6 to 10 of an instruction.

|   |                |                        |    |                 | ▼              |               |
|---|----------------|------------------------|----|-----------------|----------------|---------------|
| L | LX<sub>2</sub> | Immediate<sub>12</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |

## Source Register Spec

Most instructions have at least one source register. There may be as many as three source register specs. Please refer to individual instruction descriptions for the location of the source register specification fields.

|   |                |                        |    | ▼               |                |               |
|---|----------------|------------------------|----|-----------------|----------------|---------------|
| L | LX<sub>2</sub> | Immediate<sub>12</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |
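The common fields can be pulled out of a 32-bit instruction word as in the following C sketch. The opcode and Rd positions are stated in the text; the positions used for Rs1 (bits 15 to 11), Cr (bit 16), the 12-bit immediate (bits 28 to 17), LX (bits 30 to 29) and L (bit 31) are taken from the bit-number headers of the format tables and apply only to formats that follow this template. The function names are hypothetical.

```c
#include <stdint.h>

/* Root opcode: bits 5..0 of every instruction. */
static unsigned insn_opcode(uint32_t insn) { return insn & 0x3Fu; }

/* Destination register spec: bits 10..6. */
static unsigned insn_rd(uint32_t insn)     { return (insn >> 6) & 0x1Fu; }

/* First source register spec: bits 15..11 (assumed from the format tables). */
static unsigned insn_rs1(uint32_t insn)    { return (insn >> 11) & 0x1Fu; }

/* Cr bit: bit 16 (assumed from the format tables). */
static unsigned insn_cr(uint32_t insn)     { return (insn >> 16) & 0x1u; }

/* Raw 12-bit immediate field: bits 28..17, with no extension applied. */
static unsigned insn_imm12(uint32_t insn)  { return (insn >> 17) & 0xFFFu; }

/* Constant location bits: LX in bits 30..29, L in bit 31. */
static unsigned insn_lx(uint32_t insn)     { return (insn >> 29) & 0x3u; }
static unsigned insn_l(uint32_t insn)      { return insn >> 31; }
```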

## **Constant Field Spec**

Many instructions have constants associated with them. Constants may be embedded directly in the instruction, or they may occupy instruction words on the instruction cache line. Most instructions follow the same template for constants.

| ▼ | ▼              | ▼                      |    |                 |                |               |
|---|----------------|------------------------|----|-----------------|----------------|---------------|
| L | LX<sub>2</sub> | Immediate<sub>12</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |

#### Format for Reference to Cache Line:

| ▼ | ▼             | ▼             |                    |   |    |                 |                |               |
|---|---------------|---------------|--------------------|---|----|-----------------|----------------|---------------|
| 1 | 1<sub>2</sub> | ~<sub>7</sub> | Offset<sub>4</sub> | 0 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |

### Table of Constant Location Bits - L, LX<sub>2</sub>

| L | LX<sub>2</sub> | Location                                                                                                                                              |
|---|----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | ?              | Value is constant encoded directly in instruction, LX<sub>2</sub> is top two bits of the constant field in the instruction or additional opcode bits |
| 1 | 0              | Value comes from register Rs2                                                                                                                         |
| 1 | 1              | Value is 32-bit constant on the cache line                                                                                                            |
| 1 | 2              | Value is 64-bit constant on the cache line                                                                                                            |
| 1 | 3              | reserved                                                                                                                                              |

CL<sub>3</sub> is a tetra index into the cache line locating the constant; it is an offset from the address of the instruction. Constants may be placed only in the last half of a cache line; instructions must occupy the first half.
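The table of constant location bits maps onto a small decoder. The C sketch below assumes L is bit 31 and LX<sub>2</sub> is bits 30 to 29 of the instruction, as in the bit-number headers of the format tables; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Where an instruction's constant operand comes from. */
typedef enum {
    CONST_INLINE,     /* L = 0: constant encoded directly in the instruction */
    CONST_REGISTER,   /* L = 1, LX = 0: value comes from register Rs2 */
    CONST_LINE32,     /* L = 1, LX = 1: 32-bit constant on the cache line */
    CONST_LINE64,     /* L = 1, LX = 2: 64-bit constant on the cache line */
    CONST_RESERVED    /* L = 1, LX = 3 */
} const_loc;

/* Classify an instruction word by its L and LX bits. */
static const_loc constant_location(uint32_t insn)
{
    unsigned l  = insn >> 31;
    unsigned lx = (insn >> 29) & 0x3u;

    if (l == 0)
        return CONST_INLINE;   /* LX holds constant or extra opcode bits here */
    switch (lx) {
    case 0:  return CONST_REGISTER;
    case 1:  return CONST_LINE32;
    case 2:  return CONST_LINE64;
    default: return CONST_RESERVED;
    }
}
```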

## **Instruction Format Tables**

## Compare Instruction Format

|       | 31 | 30..29         | 28..17                 | 16 | 15..11           | 10..9 | 8..6             | 5..0          |
|-------|----|----------------|------------------------|----|------------------|-------|------------------|---------------|
| CMP   | L  | LX<sub>2</sub> | Immediate<sub>12</sub> | 0  | Rs1<sub>5</sub>  | 0     | CRd<sub>3</sub>  | 3<sub>6</sub> |
| CMP   | 1  | 0<sub>2</sub>  | Rs2<sub>5</sub>        | 0  | Rs1<sub>5</sub>  | 0     | CRd<sub>3</sub>  | 3<sub>6</sub> |
| CMPA  | L  | LX<sub>2</sub> | Immediate<sub>12</sub> | 0  | Rs1<sub>5</sub>  | 1     | CRd<sub>3</sub>  | 3<sub>6</sub> |
| CMPA  | 1  | 0<sub>2</sub>  | Rs2<sub>5</sub>        | 0  | Rs1<sub>5</sub>  | 1     | CRd<sub>3</sub>  | 3<sub>6</sub> |
| FCMPS | L  | LX<sub>2</sub> | Immediate<sub>12</sub> | 0  | FRs1<sub>5</sub> | 2     | CRd<sub>3</sub>  | 3<sub>6</sub> |
| FCMPS | 1  | 0<sub>2</sub>  | FRs2<sub>5</sub>       | 0  | FRs1<sub>5</sub> | 2     | CRd<sub>3</sub>  | 3<sub>6</sub> |
| FCMPD | L  | LX<sub>2</sub> | Immediate<sub>12</sub> | 1  | FRs1<sub>5</sub> | 2     | CRd<sub>3</sub>  | 3<sub>6</sub> |
| FCMPD | 1  | 0<sub>2</sub>  | FRs2<sub>5</sub>       | 1  | FRs1<sub>5</sub> | 2     | CRd<sub>3</sub>  | 3<sub>6</sub> |

## **Branch Instruction Formats**

|             | 31 | 30..29          | 28              |                              |                 | 16 | 15                    | 11             | 10            | 6               | 5              | 0 |
|-------------|----|-----------------|-----------------|------------------------------|-----------------|----|-----------------------|----------------|---------------|-----------------|----------------|---|
| B[L]        | 0  | LX<sub>2</sub>  |                 | Displacement<sub>22..3</sub> |                 |    |                       |                |               |                 | 13<sub>5</sub> |   |
| BLR[L]      | 1  | 0<sub>2</sub>   | BRs<sub>3</sub> | Limit<sub>14..3</sub>        |                 | ~  | Rs1<sub>5</sub>       |                | ~<sub>2</sub> | BRd<sub>3</sub> | 13<sub>5</sub> | L |
| BLR[L]      | 1  | 1<sub>2</sub>   | BRs<sub>3</sub> | Limit<sub>14..3</sub>        |                 |    | ~<sub>4</sub>         | CL<sub>3</sub> | ~             | BRd<sub>3</sub> | 13<sub>5</sub> | L |
| [Dcc]Bcc[L] | 0  | D<sub>12</sub>  | BRs<sub>3</sub> | Cnd<sub>4</sub>              | CRs<sub>6</sub> |    | Disp<sub>10..3</sub>  |                |               | BRd<sub>3</sub> | 12<sub>5</sub> |   |
| [Dcc]Bcc[L] | 1  | 0<sub>2</sub>   | BRs<sub>3</sub> | Cnd<sub>4</sub>              | CRs<sub>6</sub> | ~  | Rs1<sub>5</sub>       |                | ~<sub>2</sub> | BRd<sub>3</sub> | 12<sub>5</sub> | ~ |
| [Dcc]Bcc[L] | 1  | 1<sub>2</sub>   | BRs<sub>3</sub> | Cnd<sub>4</sub>              | CRs<sub>6</sub> |    | ~<sub>4</sub>         | CL<sub>3</sub> | ~             | BRd<sub>3</sub> | 12<sub>5</sub> | ~ |

## Load and Store Instruction Formats

|            |   |                |                                                     |    |                 |                  |                    |
|------------|---|----------------|-----------------------------------------------------|----|-----------------|------------------|--------------------|
| Load       | L | LX<sub>2</sub> | Displacement<sub>11..0</sub>                        |    | Rs1<sub>5</sub> | Rd<sub>5</sub>   | Opcode<sub>5</sub> |
| Indexed Ld | 1 | LX<sub>2</sub> | Disp<sub>5</sub> Sc<sub>2</sub> Rs2<sub>5</sub>     | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub>   | Opcode<sub>5</sub> |
| Store      | L | LX<sub>2</sub> | Displacement<sub>11..0</sub>                        | U  | Rs1<sub>5</sub> | Rs2<sub>5</sub>  | Opcode<sub>5</sub> |
| Indexed St | 1 | LX<sub>2</sub> | Disp<sub>5</sub> Sc<sub>2</sub> Rs2<sub>5</sub>     | U  | Rs1<sub>5</sub> | Rs3<sub>5</sub>  | Opcode<sub>5</sub> |
| Store      | L | LX<sub>2</sub> | Displacement<sub>11..0</sub>                        | U  | Rs1<sub>5</sub> | ~ CL<sub>3</sub> | Opcode<sub>5</sub> |
| Indexed St | 1 | LX<sub>2</sub> | Disp<sub>5</sub> Sc<sub>2</sub> Rs2<sub>5</sub>     | U  | Rs1<sub>5</sub> | ~ CL<sub>3</sub> | Opcode<sub>5</sub> |
## **ALU Instruction Formats**

|        | 31 | 3029            | 28                            |                  | 17                    | 16   | 15   | 11               | 10                               | 6  | 5  | 0 |
|--------|----|-----------------|-------------------------------|------------------|-----------------------|------|------|------------------|----------------------------------|----|----|---|
| ADD    | L  | $LX_2$          |                               | Immedia          | ite <sub>12</sub>     | Cr   | Rs1₅ |                  | Rd₅                              |    | 46 |   |
| ADD    | 1  | LX <sub>2</sub> | 04                            | Rs2 <sub>5</sub> | Cr                    | Rs1₅ |      | Rd               | 5                                | 46 |    |   |
| ADC    | 1  | LX <sub>2</sub> | 14                            | ~3               | Rs2₅                  | Cr   |      | Rs1 <sub>5</sub> | Rd                               | 5  | 46 | 6 |
| ABS    | 1  | LX <sub>2</sub> | 24                            | ~3               | Rs2 <sub>5</sub>      | Cr   |      | Rs1 <sub>5</sub> | Rd                               | 5  | 46 |   |
| CNTLO  | 1  | $LX_2$          | 3 <sub>4</sub> ~ <sub>3</sub> |                  | <b>~</b> <sub>5</sub> | Cr   | Rs1₅ |                  | Rd₅                              |    | 46 |   |
| CNTLZ  | 1  | $LX_2$          | 44                            | 44 ~3            |                       | Cr   | Rs1₅ |                  | $Rd_5$                           |    | 46 | 6 |
| CNTPOP | 1  | $LX_2$          | 54                            | ~3               | <b>~</b> <sub>5</sub> | Cr   | Rs1₅ |                  | Rs1 <sub>5</sub> Rd <sub>5</sub> |    | 46 | 6 |
| CNTTZ  | 1  | $LX_2$          | 64                            | ~3               | <b>~</b> <sub>5</sub> | Cr   | Rs1₅ |                  | Rs1 <sub>5</sub> Rd <sub>5</sub> |    | 46 | 6 |
| ADB    | L  | LX <sub>2</sub> |                               | Immedia          | ite <sub>12</sub>     | Cr   | ~2   | BR <sub>3</sub>  | Rd                               | 5  | 56 | 5 |
| ADB    | 1  | LX <sub>2</sub> | ~-                            | ,                | Rs2₅                  | Cr   | ~2   | BR <sub>3</sub>  | Rd                               | 5  | 56 | 6 |
| MULA   | L  | LX <sub>2</sub> | Immediate <sub>12</sub>       |                  |                       | Cr   | Rs1₅ |                  | Rs1₅ Rd₅                         |    | 66 | 3 |
| MULA   | 1  | 02              | 03                            | ~4               | Rs2 <sub>5</sub>      | Cr   | Rs1₅ |                  | Rs1 <sub>5</sub> Rd <sub>5</sub> |    | 6  | 5 |

| MILI   | 1 | _               |                   |                 | D-0                   | 0  | D-1              | D-I             |                 |
|--------|---|-----------------|-------------------|-----------------|-----------------------|----|------------------|-----------------|-----------------|
| MUL    | 1 | 02              | 1 <sub>3</sub>    | ~4              | Rs2₅                  | Cr | Rs1 <sub>5</sub> | Rd <sub>5</sub> | 66              |
| MULSA  | 1 | 02              | 23                | ~4              | Rs2₅                  | Cr | Rs1₅             | Rd₅             | 66              |
| MULH   | 1 | 02              | 43                | ~4              | Rs2₅                  | Cr | Rs1₅             | Rd₅             | 6 <sub>6</sub>  |
| AND    | L | $LX_2$          |                   | Immedia         | ite <sub>12</sub>     | Cr | Rs1₅             | Rd₅             | 86              |
| AND    | 1 | $LX_2$          | 03                | ~4              | Rs2 <sub>5</sub>      | Cr | Rs1₅             | Rd₅             | 86              |
| OR     | L | $LX_2$          |                   | Immedia         | ite <sub>12</sub>     | Cr | Rs1₅             | Rd₅             | 96              |
| OR     | 1 | LX <sub>2</sub> | 03                | ~4              | Rs2 <sub>5</sub>      | Cr | Rs1₅             | Rd₅             | 96              |
| XOR    | L | LX <sub>2</sub> |                   | Immedia         | ite <sub>12</sub>     | Cr | Rs1₅             | Rd₅             | 10 <sub>6</sub> |
| XOR    | 1 | LX <sub>2</sub> | 03                | ~4              | Rs2 <sub>5</sub>      | Cr | Rs1₅             | Rd₅             | 10 <sub>6</sub> |
| SUBF   | L | LX <sub>2</sub> |                   | Immedia         | nte <sub>12</sub>     | Cr | Rs1₅             | Rd₅             | 126             |
| SUBF   | 1 | LX <sub>2</sub> | 03                | ~4              | Rs2 <sub>5</sub>      | Cr | Rs1₅             | Rd₅             | 126             |
| SBC    | 1 | $LX_2$          | 13                | ~4              | Rs2 <sub>5</sub>      | Cr | Rs1₅             | Rd₅             | 126             |
| PTRDIF | 1 | $LX_2$          | 23                | Ui <sub>4</sub> | Rs2 <sub>5</sub>      | Cr | Rs1₅             | $Rd_5$          | 126             |
| DIVA   | L | $LX_2$          |                   | Immedia         | ite <sub>12</sub>     | Cr | Rs1₅             | Rd₅             | 14 <sub>6</sub> |
| DIVA   | 1 | LX <sub>2</sub> | 03                | ~4              | Rs2₅                  | Cr | Rs1₅             | Rd₅             | 146             |
| DIV    | 1 | LX <sub>2</sub> | 13                | ~4              | Rs2₅                  | Cr | Rs1₅             | Rd₅             | 146             |
| DIVSU  | 1 | LX <sub>2</sub> | 23                | ~4              | Rs2₅                  | Cr | Rs1₅             | Rd₅             | 146             |
| SQRT   | 1 | $LX_2$          | 33                | ~4              | <b>~</b> <sub>5</sub> | Cr | Rs1 <sub>5</sub> | Rd₅             | 146             |
| MODA   | 1 | $LX_2$          | 43                | ~4              | Rs2₅                  | Cr | Rs1 <sub>5</sub> | Rd₅             | 146             |
| MOD    | 1 | LX <sub>2</sub> | 53                | ~4              | Rs2 <sub>5</sub>      | Cr | Rs1 <sub>5</sub> | Rd <sub>5</sub> | 146             |
| LOADA  | L | LX <sub>2</sub> | Di                | isplacem        | ent <sub>110</sub>    | Cr | Rs1₅             | Rd₅             | 39 <sub>5</sub> |
| LOADA  | 1 | LX <sub>2</sub> | Disp <sub>5</sub> | Sc <sub>2</sub> | Rs2 <sub>5</sub>      | Cr | Rs1₅             | Rd₅             | 39 <sub>5</sub> |

| L | LX<sub>2</sub> | Immediate<sub>12</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | Opcode<sub>6</sub> |
|---|----------------|------------------------|----|-----------------|----------------|--------------------|
| 1 | LX<sub>2</sub> | Rs2<sub>5</sub>        | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | Opcode<sub>6</sub> |

## **Shift Instruction Formats**

|         | 31 | 30..29        | 28..17                                        | 16 | 15..11          | 10..6          | 5..0          |
|---------|----|---------------|-----------------------------------------------|----|-----------------|----------------|---------------|
| SLL     | 0  | 0<sub>2</sub> | 0<sub>3</sub> H ~ Shamt<sub>6</sub>           | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| SLL     | 1  | 0<sub>2</sub> | 0<sub>3</sub> H ~<sub>3</sub> Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| SRL     | 0  | 0<sub>2</sub> | 1<sub>3</sub> H ~ Shamt<sub>6</sub>           | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| SRL     | 1  | 0<sub>2</sub> | 1<sub>3</sub> H ~<sub>3</sub> Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| SRA     | 0  | 0<sub>2</sub> | 2<sub>3</sub> Rm<sub>2</sub> Shamt<sub>6</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| SRA     | 1  | 0<sub>2</sub> | 2<sub>3</sub> Rm<sub>2</sub> Rs2<sub>5</sub>  | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| RO[L R] | 0  | 0<sub>2</sub> | 4<sub>3</sub> L ~ Shamt<sub>6</sub>           | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| RO[L R] | 1  | 0<sub>2</sub> | 4<sub>3</sub> L ~<sub>3</sub> Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| EXT     | 0  | 3<sub>2</sub> | Me<sub>6</sub> Mb<sub>6</sub>                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| EXT[Z]  | 1  | 0<sub>2</sub> | 3<sub>3</sub> Z ~ Rs2<sub>5</sub>             | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |

## **CSR Instruction Formats**

|             | 31 | 30..29         | 28..17                                      | 16 | 15..11                        | 10..6          | 5..0          |
|-------------|----|----------------|---------------------------------------------|----|-------------------------------|----------------|---------------|
| CSRxx       | 0  | Op<sub>2</sub> | CSRno                                       | Cr | Rs1<sub>5</sub>               | Rd<sub>5</sub> | 7<sub>6</sub> |
|             | 1  | 0<sub>2</sub>  | Op<sub>2</sub> ~<sub>5</sub> Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub>               | Rd<sub>5</sub> | 7<sub>6</sub> |
| 32-bit data | 1  | 1<sub>2</sub>  | CSRno<sub>12</sub>                          |    | Op<sub>2</sub> CL<sub>3</sub> | Rd<sub>5</sub> | 7<sub>6</sub> |

## **BRK / SYS Instruction Formats**

|      | 31 | 30 29           | 28 17            | 16 | 15 11           | 10 6            | 5 0                 |
|------|----|-----------------|------------------|----|-----------------|-----------------|---------------------|
| BRK  | 0  | 0<sub>2</sub>   | 0<sub>12</sub>   |    |                 | 0<sub>5</sub>   | 0<sub>6</sub>       |
| SYS  | 0  | 0<sub>2</sub>   | 1<sub>12</sub>   | 0  | 0<sub>5</sub>   | 0<sub>5</sub>   | 0<sub>6</sub>       |
| RFI  | 0  | 0<sub>2</sub>   | 2<sub>12</sub>   | 0  | 0<sub>5</sub>   | 0<sub>5</sub>   | 0<sub>6</sub>       |
| RFI2 | 0  | 0<sub>2</sub>   | 3<sub>12</sub>   |    | 0<sub>5</sub>   | 0<sub>5</sub>   | 0<sub>6</sub>       |
|      | 1  | LX<sub>2</sub>  | Rs2<sub>5</sub>  | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub>  | Opcode<sub>6</sub>  |

## **Macro Instruction Formats**

|       | 31 | 3029 | 28              |                                     |                   | 17 | 1 | 1514            | 15 11                            | 10              | 6               | 5  | 0              |
|-------|----|------|-----------------|-------------------------------------|-------------------|----|---|-----------------|----------------------------------|-----------------|-----------------|----|----------------|
|       |    |      |                 |                                     |                   |    | 6 |                 |                                  |                 |                 |    |                |
| PUSH  | 0  | 22   | Gr <sub>2</sub> | L                                   | Rs <sub>158</sub> |    | 0 | Gr <sub>2</sub> |                                  | Rs <sub>7</sub> | 0               | 30 | ) <sub>6</sub> |
| ENTER | 0  | 32   |                 | Immediate <sub>12</sub>             |                   |    |   |                 | ~ <sub>5</sub> ~ Ns <sub>4</sub> |                 |                 | 30 | ) <sub>6</sub> |
| POP   | 0  | 22   | Gr <sub>2</sub> | Gr <sub>2</sub> L Rd <sub>158</sub> |                   |    | 0 | Gr <sub>2</sub> |                                  | Rd <sub>7</sub> | .0              | 3  | 16             |
| LEAVE | 0  | 32   |                 | Immediate <sub>12</sub>             |                   |    | 0 | 0               | ffs <sub>62</sub>                | ٧               | Nr <sub>4</sub> | 31 | 16             |

## **MOV Instruction Format**

|       | 31 | 30 28         | 27            |                   |                     | 17                 | 16 | 15 11               | 10 6               | 5 0            |
|-------|----|---------------|---------------|-------------------|---------------------|--------------------|----|---------------------|--------------------|----------------|
| MOV   | 0  | 0<sub>3</sub> |               | ~<sub>9</sub>     | Rs1<sub>6..5</sub>  | Rd<sub>6..5</sub>  | Cr | Rs1<sub>4..0</sub>  | Rd<sub>4..0</sub>  | 15<sub>6</sub> |
| MOVA  | 0  | 1<sub>3</sub> |               | ~<sub>9</sub>     | Rs1<sub>6..5</sub>  | Rd<sub>6..5</sub>  | Cr | Rs1<sub>4..0</sub>  | Rd<sub>4..0</sub>  | 15<sub>6</sub> |
| MOVSX | 0  | 2<sub>3</sub> | ~<sub>3</sub> | Uimm<sub>6</sub>  | Rs1<sub>6..5</sub>  | Rd<sub>6..5</sub>  | Cr | Rs1<sub>4..0</sub>  | Rd<sub>4..0</sub>  | 15<sub>6</sub> |
| MOVZX | 0  | 3<sub>3</sub> | ~<sub>3</sub> | Uimm<sub>6</sub>  | Rs1<sub>6..5</sub>  | Rd<sub>6..5</sub>  | Cr | Rs1<sub>4..0</sub>  | Rd<sub>4..0</sub>  | 15<sub>6</sub> |

## **Exception Triggering Instruction Formats**

|      | 31 | 30 29          | 28               | 17               | 16 | 15 11           | 10 6              | 5 0            |
|------|----|----------------|------------------|------------------|----|-----------------|-------------------|----------------|
| TRAP | 1  | 0<sub>2</sub>  | Immediate        |                  | ~  | Rs1<sub>5</sub> | Cond<sub>5</sub>  | 28<sub>6</sub> |
| TRAP | 0  | 0<sub>2</sub>  | ~<sub>7</sub>    | Rs2<sub>5</sub>  | ~  | Rs1<sub>5</sub> | Cond<sub>5</sub>  | 28<sub>6</sub> |
| CHK  | 0  | Op<sub>4</sub> | Rs3<sub>5</sub>  | Rs2<sub>5</sub>  | ~  | Rs1<sub>5</sub> | Offs<sub>5</sub>  | 29<sub>6</sub> |

## Instruction Pre/Postfixes and Modifiers Instruction Formats

|      | 31 | 3029            | 28                    | 17                  | 16                 | 15         | 11 | 10              | 6                  | 5              | 0              |
|------|----|-----------------|-----------------------|---------------------|--------------------|------------|----|-----------------|--------------------|----------------|----------------|
| ATOM | 1  | 02              | Mask <sub>12</sub>    |                     |                    | <b>~</b> 5 |    |                 | IPL <sub>5</sub>   | 60             | O <sub>6</sub> |
| QEXT | 0  | 02              | <b>~</b> <sub>7</sub> | ~                   | Rs1                | 5          |    | Rd <sub>5</sub> | 60                 | O <sub>6</sub> |                |
| PRED | 0  | 12              | Mask <sub>15</sub>    | Mask <sub>154</sub> |                    |            | 5  | ~               | Mask <sub>30</sub> | 60             | O <sub>6</sub> |
| PFX  | 0  | LX <sub>2</sub> |                       | Immedia             | nte <sub>255</sub> |            |    |                 | Wh <sub>2</sub>    | 6              | 16             |

## Condition Register Manipulation Instruction Formats

|        | 31 | 30 29         | 28            |               | 18                | 17 12             | 11 6             | 5 0            |
|--------|----|---------------|---------------|---------------|-------------------|-------------------|------------------|----------------|
| CRAND  | 0  | 0<sub>2</sub> | 0<sub>4</sub> | ~<sub>6</sub> | I                 | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CROR   | 0  | 0<sub>2</sub> | 1<sub>4</sub> | ~<sub>6</sub> | I                 | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRXOR  | 0  | 0<sub>2</sub> | 2<sub>4</sub> | ~<sub>6</sub> | I                 | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRANDC | 0  | 0<sub>2</sub> | 3<sub>4</sub> | ~<sub>6</sub> | I                 | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRAND  | 1  | 0<sub>2</sub> | 0<sub>4</sub> | ~             | CRs2<sub>6</sub>  | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CROR   | 1  | 0<sub>2</sub> | 1<sub>4</sub> | ~             | CRs2<sub>6</sub>  | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRXOR  | 1  | 0<sub>2</sub> | 2<sub>4</sub> | ~             | CRs2<sub>6</sub>  | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRANDC | 1  | 0<sub>2</sub> | 3<sub>4</sub> | ~             | CRs2<sub>6</sub>  | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRNAND | 1  | 0<sub>2</sub> | 4<sub>4</sub> | ~             | CRs2<sub>6</sub>  | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRNOR  | 1  | 0<sub>2</sub> | 5<sub>4</sub> | ~             | CRs2<sub>6</sub>  | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRXNOR | 1  | 0<sub>2</sub> | 6<sub>4</sub> | ~             | CRs2<sub>6</sub>  | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRORC  | 1  | 0<sub>2</sub> | 7<sub>4</sub> | ~             | CRs2<sub>6</sub>  | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |

# Table of Root Opcodes

|      | x000                  | x001                  | x010                 | x011                    | x100                       | x101          | x110                         | x111                       |
|------|-----------------------|-----------------------|----------------------|-------------------------|----------------------------|---------------|------------------------------|----------------------------|
| 000x | 0<br>BRK              | 1<br>Custom           | 2<br>{SHIFT}         | 3<br>CMP<br>CMPA        | 4<br>ADD                   | 5<br>ADB      | 6<br>MUL                     | 7<br>CSR                   |
| 001x | 8<br>AND              | 9<br>OR               | 10<br>XOR            | 11<br>{CR}              | 12<br>SUBF                 | 13            | 14<br>DIV                    | 15<br>MOV                  |
| 010x | 16                    | 17                    | 18                   | 19                      | 20                         | 21            | 22                           | 23                         |
| 011x | 24<br>Bc[L]<br>DBc[L] | 25<br>Bc[L]<br>DBc[L] | 26<br>B[L]<br>BLR[L] | 27<br>B[L]<br>BLR[L]    | 28<br>TRAP                 | 29<br>CHK     | 30<br>PUSH<br>PUSHF<br>ENTER | 31<br>POP<br>POPF<br>LEAVE |
| 100x | 32<br>LDB             | 33<br>LDBZ            | 34<br>LDW            | 35<br>LDWZ              | 36<br>LDT                  | 37<br>LDTZ    | 38<br>LOAD                   | 39<br>LOADA                |
| 101x | 40<br>STB             | 41<br>STBI            | 42<br>STW            | 43<br>STWI              | 44<br>STT                  | 45<br>STTI    | 46<br>STORE                  | 47<br>STOREI               |
| 110x | 48<br>LDFS            | 49<br>LDFD            | 50<br>LDFQ           | 51<br>Fence<br>Misc Mem | 52<br>STPTR                | 53<br>{BLOCK} | 54<br>{Float}                | 55                         |
| 111x | 56<br>STFS            | 57<br>STFD            | 58<br>STFQ           | 59<br>AMO               | 60<br>ATOM<br>QEXT<br>PRED | 61<br>PFX     | 62                           | 63<br>NOP                  |
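
The table can be read as a lookup keyed on a six-bit opcode: the row label supplies the upper three bits and the column label the lower three. The following is a minimal Python sketch, assuming the root opcode occupies bits 5..0 of the instruction word as the format tables suggest. The names `ROOT_OPCODES`, `root_opcode` and `table_position` are illustrative only, and the dictionary copies just a subset of the table.

```python
# Illustrative subset of the root opcode table; not a complete decoder.
ROOT_OPCODES = {
    0: "BRK", 2: "{SHIFT}", 4: "ADD", 5: "ADB", 6: "MUL", 7: "CSR",
    8: "AND", 9: "OR", 10: "XOR", 12: "SUBF", 14: "DIV", 15: "MOV",
    28: "TRAP", 29: "CHK", 39: "LOADA", 63: "NOP",
}


def root_opcode(word: int) -> int:
    """Return the 6-bit root opcode, assumed to sit in bits 5..0."""
    return word & 0x3F


def table_position(opcode: int) -> tuple:
    """Split an opcode into the (row, column) indices of the table."""
    return opcode >> 3, opcode & 0x7
```

For example, opcode 39 (LOADA) sits in row `100x` and column `x111`, which is `table_position(39) == (4, 7)`.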

# **Instruction Descriptions**

## ABS[.] – Absolute Value

## **Description:**

This instruction computes the absolute value of the sum of the two source operands in registers Rs1 and Rs2 and places the result in Rd. Condition register CR0 may be updated if the Cr bit of the instruction is set.

#### **Instruction Format: R1**

|     | 31 | 30 29         | 28            |               | 17              | 16 | 15 11           | 10 6           | 5 0           |
|-----|----|---------------|---------------|---------------|-----------------|----|-----------------|----------------|---------------|
| ABS | 1  | 0<sub>2</sub> | 2<sub>4</sub> | ~<sub>3</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |

## Operation:

Execution Units: Integer ALU #0 only

Clock Cycles: 1

Exceptions: none

## ADD[.] - Add

## **Description:**

Add two source registers Rs1 and Rs2 or Rs1 and a constant and place the sum in the destination register Rd. All register values are integers. Condition register CR0 may be updated if the Cr bit of the instruction is set.

**Instruction Format: R3** 

|     | 31 | 30 29          | 28                      |               | 17              | 16 | 15 11           | 10 6           | 5 0           |
|-----|----|----------------|-------------------------|---------------|-----------------|----|-----------------|----------------|---------------|
| ADD | L  | LX<sub>2</sub> | Immediate<sub>12</sub>  |               |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |
| ADD | 1  | 0<sub>2</sub>  | 0<sub>4</sub>           | ~<sub>3</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |

## Operation:

Rd = Rs1 + Rs2

OR

Rd = Constant + Rs1

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none
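
As an illustration of the register form above, the fields can be packed into a 32-bit word as follows. This is a sketch only: it assumes the fields in bits 28..17 are laid out in the order shown (a 4-bit function code in bits 28..25, three unused bits in 24..22, Rs2 in 21..17) and that unused bits are encoded as zero. The function name `encode_add_rr` is hypothetical.

```python
def encode_add_rr(rd: int, rs1: int, rs2: int, cr: bool = False) -> int:
    """Pack ADD Rd, Rs1, Rs2 (register form) into a 32-bit word.

    Assumed layout: bit 31 = 1, bits 30..29 = 0, bits 28..25 = 0 (function),
    bits 24..22 unused, Rs2 in 21..17, Cr in 16, Rs1 in 15..11,
    Rd in 10..6, root opcode 4 in 5..0.
    """
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError("register fields are 5 bits wide")
    return ((1 << 31)            # register form
            | (rs2 << 17)
            | (int(cr) << 16)    # update CR0 when set
            | (rs1 << 11)
            | (rd << 6)
            | 4)                 # root opcode for ADD
```

Under these assumptions `encode_add_rr(1, 2, 3)` gives `0x80061044`, and setting `cr=True` additionally sets bit 16.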

## ADB[.] - Add Immediate to Branch Register

## **Description:**

Add an immediate value to the branch register and place the result in a destination register Rd. This instruction may be used in the formation of program counter relative addresses.

#### **Instruction Format:** RI

|     | 31 | 30 29          | 28 17                   | 16 | 15 11                         | 10 6           | 5 0           |
|-----|----|----------------|-------------------------|----|-------------------------------|----------------|---------------|
| ADB | L  | LX<sub>2</sub> | Immediate<sub>12</sub>  |    | ~<sub>2</sub> BR<sub>3</sub>  | Rd<sub>5</sub> | 5<sub>6</sub> |
| ADB | 1  | LX<sub>2</sub> | Rs2<sub>5</sub>         | Cr | ~<sub>2</sub> BR<sub>3</sub>  | Rd<sub>5</sub> | 5<sub>6</sub> |

Clock Cycles: 1

Execution Units: All ALUs

Operation:

Rd = BR + immediate

**Exceptions:** 

## ADC[.] - Add with Carry

## **Description:**

Add two source registers Rs1 and Rs2 or Rs1 and a constant and the carry flag and place the sum in the destination register Rd. All register values are integers.

Condition register CR0 may be updated if the Cr bit of the instruction is set.

#### **Instruction Format: R3**

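|     | 31 | 30 29         | 28            |               | 17              | 16 | 15 11           | 10 6           | 5 0           |
|-----|----|---------------|---------------|---------------|-----------------|----|-----------------|----------------|---------------|
| ADC | 1  | 0<sub>2</sub> | 1<sub>4</sub> | ~<sub>3</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |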

## Operation:

Rd = Rs1 + Rs2 + carry

Clock Cycles: 1

Execution Units: All Integer ALUs, all FPUs

Exceptions: none
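
A typical use of add-with-carry is multi-word arithmetic. The sketch below models a 128-bit addition built from one ADD followed by one ADC. It assumes 64-bit registers and that the preceding ADD leaves its carry out in the carry flag; both are assumptions for illustration, and the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumed register width of 64 bits


def add(rs1: int, rs2: int):
    """Model ADD: return (result, carry_out)."""
    total = (rs1 & MASK64) + (rs2 & MASK64)
    return total & MASK64, total >> 64


def adc(rs1: int, rs2: int, carry: int):
    """Model ADC: Rd = Rs1 + Rs2 + carry, returning (result, carry_out)."""
    total = (rs1 & MASK64) + (rs2 & MASK64) + (carry & 1)
    return total & MASK64, total >> 64


def add128(a_lo: int, a_hi: int, b_lo: int, b_hi: int):
    """Add two 128-bit values held as (low, high) register pairs."""
    lo, carry = add(a_lo, b_lo)      # low halves, produces the carry
    hi, _ = adc(a_hi, b_hi, carry)   # high halves consume the carry
    return lo, hi
```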

## CNTLO[.] - Count Leading Ones

#### **Description:**

This instruction counts the number of consecutive one bits beginning at the most significant bit towards the least significant bit for the register Rs1 and places the count in register Rd.

#### **Instruction Format: R3**

|       | 31 | 30 29          | 28            |               | 17            | 16 | 15 11           | 10 6           | 5 0           |
|-------|----|----------------|---------------|---------------|---------------|----|-----------------|----------------|---------------|
| CNTLO | 1  | LX<sub>2</sub> | 3<sub>4</sub> | ~<sub>3</sub> | ~<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |

Operation:

Execution Units: Integer ALU #0 only

**Clock Cycles: 1** 

Exceptions: none

Notes:

## CNTLZ[.] - Count Leading Zeros

#### **Description:**

This instruction counts the number of consecutive zero bits beginning at the most significant bit towards the least significant bit for the register Rs1 and places the count in register Rd.

#### **Instruction Format: R3**

|       | 31 | 30 29          | 28            |               | 17            | 16 | 15 11           | 10 6           | 5 0           |
|-------|----|----------------|---------------|---------------|---------------|----|-----------------|----------------|---------------|
| CNTLZ | 1  | LX<sub>2</sub> | 4<sub>4</sub> | ~<sub>3</sub> | ~<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |

Operation:

Execution Units: Integer ALU #0 only

**Clock Cycles: 1** 

Exceptions: none

## CNTPOP[.] - Count Population

#### **Description:**

This instruction counts the number of bits set in source register Rs1 and places the count in destination register Rd.

#### **Instruction Format:**

|        | 31 | 30 29          | 28            |               | 17            | 16 | 15 11           | 10 6           | 5 0           |
|--------|----|----------------|---------------|---------------|---------------|----|-----------------|----------------|---------------|
| CNTPOP | 1  | LX<sub>2</sub> | 5<sub>4</sub> | ~<sub>3</sub> | ~<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |

**Operation:** 

Execution Units: Integer ALU #0

Clock Cycles: 1

Exceptions: none

Notes:

## CNTTZ[.] - Count Trailing Zeros

#### **Description:**

This instruction counts the number of consecutive zero bits beginning at the least significant bit towards the most significant bit of the value in register Rs1 and places the count in register Rd. This instruction can also be used to get the position of the first one bit from the right-hand side.

#### **Instruction Format: R3**

|       | 31 | 30 29          | 28            |               | 17            | 16 | 15 11           | 10 6           | 5 0           |
|-------|----|----------------|---------------|---------------|---------------|----|-----------------|----------------|---------------|
| CNTTZ | 1  | LX<sub>2</sub> | 6<sub>4</sub> | ~<sub>3</sub> | ~<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 4<sub>6</sub> |

Operation:

Execution Units: Integer ALU #0

**Clock Cycles: 1** 

Exceptions: none
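
The four count instructions (CNTLO, CNTLZ, CNTPOP, CNTTZ) can be summarized with a small reference model. This is a sketch assuming 64-bit registers. The results for an all-zero or all-one input (a count of 64) are an assumption, since the descriptions do not state them, and the function names simply mirror the mnemonics.

```python
WIDTH = 64                 # assumed register width
MASK = (1 << WIDTH) - 1


def cntlz(x: int) -> int:
    """Count consecutive zero bits starting at the most significant bit."""
    return WIDTH - (x & MASK).bit_length()


def cntlo(x: int) -> int:
    """Count consecutive one bits starting at the most significant bit."""
    return cntlz(~x & MASK)


def cntpop(x: int) -> int:
    """Count the number of bits set."""
    return bin(x & MASK).count("1")


def cnttz(x: int) -> int:
    """Count consecutive zero bits starting at the least significant bit."""
    x &= MASK
    if x == 0:
        return WIDTH       # assumed result for a zero input
    return (x & -x).bit_length() - 1   # isolate the lowest set bit
```

`cnttz` also gives the position of the first one bit from the right-hand side, as noted in the CNTTZ description.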

## CSR[.] - Control and Special Registers Operations

#### **Description:**

Perform an operation on a CSR specified either as a constant in the instruction or as a number in source register Rs2. The previous value of the CSR is placed in the destination register Rd. New values for the CSR may come from either the value in Rs1 or an immediate constant.

| Operation                          | Op <sub>2</sub> | Mnemonic |
|------------------------------------|-----------------|----------|
| Read CSR                           | 0               | CSRRD    |
| Write CSR                          | 1               | CSRRW    |
| Or to CSR (set bits)               | 2               | CSRRS    |
| And complement to CSR (clear bits) | 3               | CSRRC    |

**Supported Operand Sizes: N/A** 

#### **Instruction Formats:**

|             | 31 | 30 29          | 28 17                                            | 16 | 15 11                          | 10 6           | 5 0           |
|-------------|----|----------------|--------------------------------------------------|----|--------------------------------|----------------|---------------|
| CSRxx       | 0  | Op<sub>2</sub> | CSRno<sub>12</sub>                               | Cr | Rs1<sub>5</sub>                | Rd<sub>5</sub> | 7<sub>6</sub> |
|             | 1  | 0<sub>2</sub>  | Op<sub>2</sub> ~<sub>5</sub> Rs2<sub>5</sub>     | Cr | Rs1<sub>5</sub>                | Rd<sub>5</sub> | 7<sub>6</sub> |
| 32-bit data | 1  | 1<sub>2</sub>  | CSRno<sub>12</sub>                               | Cr | Op<sub>2</sub> CL<sub>3</sub>  | Rd<sub>5</sub> | 7<sub>6</sub> |

#### Notes:

The top two bits of the CSRno field correspond to the operating mode.
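
The four CSR operations can be modelled as a read-modify-write that always returns the previous value. The sketch below uses a plain dictionary as a stand-in for the CSR file and assumes 64-bit CSRs; `csr_op` and the CSR number used in the usage note are illustrative, not part of the architecture definition.

```python
MASK64 = (1 << 64) - 1  # assumed CSR width

CSRRD, CSRRW, CSRRS, CSRRC = 0, 1, 2, 3  # Op2 values from the table above


def csr_op(csrs: dict, op: int, csrno: int, src: int) -> int:
    """Apply a CSR operation and return the previous CSR value (for Rd)."""
    old = csrs.get(csrno, 0)
    src &= MASK64
    if op == CSRRW:
        csrs[csrno] = src                    # write
    elif op == CSRRS:
        csrs[csrno] = old | src              # set bits
    elif op == CSRRC:
        csrs[csrno] = old & ~src & MASK64    # clear bits
    elif op != CSRRD:
        raise ValueError("Op2 is a 2-bit field")
    return old
```

For example, starting from `csrs = {0x041: 0b1100}`, `csr_op(csrs, CSRRS, 0x041, 0b0011)` returns `0b1100` and leaves `0b1111` in the CSR.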

## LOADA[.] - Load Address

## **Description:**

This instruction computes the virtual address following the same format as a load or store instruction and places it in the destination register Rd.

#### **Instruction Format:**

|       | 31 | 30 29          | 28                             |                | 17              | 16 | 15 11           | 10 6           | 5 0            |
|-------|----|----------------|--------------------------------|----------------|-----------------|----|-----------------|----------------|----------------|
| LOADA | L  | LX<sub>2</sub> | Displacement<sub>11..0</sub>   |                |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 39<sub>6</sub> |
| LOADA | 1  | LX<sub>2</sub> | Disp<sub>5</sub>               | Sc<sub>2</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 39<sub>6</sub> |

Clock Cycles: 1

**Execution Units:** All ALUs

Operation:

Rd = Rs1 + Displacement

OR

Rd = Rs1 + Rs2 \* Scale + Displacement
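
The two address forms can be sketched as follows. The model assumes 64-bit addresses that wrap on overflow and assumes that the 2-bit Sc field selects a power-of-two scale (1, 2, 4 or 8); the actual meaning of Sc is not stated in this section, so treat that mapping as hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumed address width


def loada_disp(rs1: int, displacement: int) -> int:
    """Rd = Rs1 + Displacement."""
    return (rs1 + displacement) & MASK64


def loada_indexed(rs1: int, rs2: int, sc: int, displacement: int) -> int:
    """Rd = Rs1 + Rs2 * Scale + Displacement, with Scale = 1 << sc (assumed)."""
    if not 0 <= sc <= 3:
        raise ValueError("Sc is a 2-bit field")
    return (rs1 + rs2 * (1 << sc) + displacement) & MASK64
```

With a base of `0x1000`, an index of 3, `sc = 3` and a displacement of 16, the indexed form yields `0x1028`.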

## **Exceptions:**

## SBC[.] - Subtract with Carry

## **Description:**

Subtract two source registers Rs1 and Rs2 or Rs1 and a constant and the carry flag and place the difference in the destination register Rd. All register values are integers. Condition register CR0 may be updated if the Cr bit of the instruction is set.

#### **Instruction Format: R3**

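|     | 31 | 30 29         | 28            |               | 17              | 16 | 15 11           | 10 6           | 5 0            |
|-----|----|---------------|---------------|---------------|-----------------|----|-----------------|----------------|----------------|
| SBC | 1  | 0<sub>2</sub> | 1<sub>4</sub> | ~<sub>3</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 12<sub>6</sub> |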

## Operation:

Rd = Rs2 - Rs1 - carry

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

## SUBF[.] - Subtract From

## **Description:**

Subtract Rs1 from Rs2, or Rs1 from a constant, and place the difference in the destination register Rd. All register values are integers. Condition register CR0 may be updated if the Cr bit of the instruction is set.

#### **Instruction Format: R3**

|      | 31 | 30 29          | 28                      |               | 17              | 16 | 15 11           | 10 6           | 5 0            |
|------|----|----------------|-------------------------|---------------|-----------------|----|-----------------|----------------|----------------|
| SUBF | L  | LX<sub>2</sub> | Immediate<sub>12</sub>  |               |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 12<sub>6</sub> |
| SUBF | 1  | 0<sub>2</sub>  | 0<sub>4</sub>           | ~<sub>3</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 12<sub>6</sub> |

## Operation:

Rd = Rs2 - Rs1

OR

Rd = Constant - Rs1

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none
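
Because the operand order is reversed relative to a conventional subtract, the immediate form computes `Constant - Rs1`. One consequence is that negation needs no separate instruction. A minimal sketch, assuming 64-bit registers with wrap-around arithmetic; `subf` and `negate` are illustrative names.

```python
MASK64 = (1 << 64) - 1  # assumed register width


def subf(rs1: int, src2: int) -> int:
    """Rd = src2 - Rs1, where src2 is Rs2 or the immediate constant."""
    return (src2 - rs1) & MASK64


def negate(rs1: int) -> int:
    """Two's-complement negation expressed as SUBF with a constant of 0."""
    return subf(rs1, 0)
```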

## Multiply and Divide

## DIV[.] – Signed Division

## **Description:**

Divide the dividend in Rs1 by the divisor in Rs2 and place the quotient in the destination register Rd. All registers are integer registers. Operands are treated as signed two's-complement values.

#### **Instruction Format:**

|     | 31 | 30 29          | 28            | 17              | 16 | 15 11           | 10 6           | 5 0            |
|-----|----|----------------|---------------|-----------------|----|-----------------|----------------|----------------|
| DIV | 1  | LX<sub>2</sub> | 1<sub>3</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 14<sub>6</sub> |

### Operation:

Rd = Rs1 / Rs2

Execution Units: ALU #0 Only

**Exceptions:** DBZ

## DIVA[.] - Address Division

## **Description:**

Divide the dividend in Rs1 by the divisor in either Rs2 or an immediate constant and place the quotient in the destination register Rd. All registers are integer registers. Operands are treated as unsigned values. DIVA may be used in pointer-to-index conversions.

#### **Instruction Format:**

|      | 31 | 30 29          | 28                      | 17              | 16 | 15 11           | 10 6           | 5 0            |
|------|----|----------------|-------------------------|-----------------|----|-----------------|----------------|----------------|
| DIVA | L  | LX<sub>2</sub> | Immediate<sub>12</sub>  |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 14<sub>6</sub> |
| DIVA | 1  | LX<sub>2</sub> | 0<sub>3</sub>           | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 14<sub>6</sub> |

## Operation:

Rd = Rs1 / Rs2

OR

Rd = Rs1 / Constant

Execution Units: ALU #0 Only

Exceptions: none
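
The pointer-to-index conversion mentioned in the description amounts to dividing a byte offset by the element size. The sketch below assumes 64-bit unsigned operands. The behaviour for a zero divisor is not specified in this section, so the model simply refuses it; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumed register width


def diva(rs1: int, divisor: int) -> int:
    """Unsigned division: Rd = Rs1 / divisor (Rs2 or a constant)."""
    dividend, divisor = rs1 & MASK64, divisor & MASK64
    if divisor == 0:
        # Outside the scope of this sketch; the hardware result is not given here.
        raise ZeroDivisionError("zero divisor is not modelled")
    return dividend // divisor


def pointer_to_index(ptr: int, base: int, elem_size: int) -> int:
    """Convert an element pointer back to its array index."""
    return diva((ptr - base) & MASK64, elem_size)
```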

## MUL[.] – Multiply

## **Description:**

Multiply two source registers Rs1 and Rs2 and place the product in the destination register Rd. All registers are integer registers. Values are treated as signed integers.

**Instruction Format:** R3

|     | 31 | 30 29         | 28            | 17              | 16 | 15 11           | 10 6           | 5 0           |
|-----|----|---------------|---------------|-----------------|----|-----------------|----------------|---------------|
| MUL | 1  | 0<sub>2</sub> | 1<sub>3</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 6<sub>6</sub> |

Operation: R2

Rd = Rs1 \* Rs2

Clock Cycles: 4

**Execution Units:** All Integer ALUs

Exceptions: none

## MULA[.] - Multiply for Addressing

## **Description:**

Multiply two source registers Rs1 and Rs2 or Rs1 and an immediate constant, and place the product in the destination register Rd. All registers are integer registers. Values are treated as unsigned integers. This instruction is typically used in address calculations for arrays.

**Instruction Format: R3** 

|      | 31 | 30 29          | 28                      | 17              | 16 | 15 11           | 10 6           | 5 0           |
|------|----|----------------|-------------------------|-----------------|----|-----------------|----------------|---------------|
| MULA | L  | LX<sub>2</sub> | Immediate<sub>12</sub>  |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 6<sub>6</sub> |
| MULA | 1  | 0<sub>2</sub>  | 0<sub>3</sub>           | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 6<sub>6</sub> |

**Operation: R2** 

Rd = Rs1 \* Rs2

OR

Rd = Rs1 \* Constant

Clock Cycles: 4

**Execution Units:** All Integer ALUs

Exceptions: none
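
In an array address calculation, MULA turns an index into a byte offset that is then added to the array base. A minimal sketch, assuming 64-bit registers and that the low 64 bits of the product are kept; `mula` and `element_address` are illustrative names.

```python
MASK64 = (1 << 64) - 1  # assumed register width


def mula(rs1: int, src2: int) -> int:
    """Unsigned multiply: Rd = Rs1 * src2 (Rs2 or a constant), low 64 bits."""
    return ((rs1 & MASK64) * (src2 & MASK64)) & MASK64


def element_address(base: int, index: int, elem_size: int) -> int:
    """Address of array[index] for elements of elem_size bytes."""
    return (base + mula(index, elem_size)) & MASK64
```

For 24-byte elements at base `0x2000`, element 5 is at `0x2078`.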

## **Shift and Rotate**

## SLL[.] - Shift Left Logical

#### **Description:**

Left shift a source operand in Rs1 by a source operand value in Rs2 or a constant, and place the result in the destination register Rd. The second source operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. If the 'H' bit is set, the upper 64 bits of the result are transferred to the destination register, Rd. Condition register CR0 may be updated if the Cr bit of the instruction is set. Also, the carry bit of CR0 will be set if any bit shifted out from the high-order bits is non-zero; otherwise it will be cleared.

#### **Instruction Format: SHIFT**

|     | 31 | 30 29         | 28            |     | 17                 | 16 | 15 11           | 10 6           | 5 0           |
|-----|----|---------------|---------------|-----|--------------------|----|-----------------|----------------|---------------|
| SLL | 0  | 0<sub>2</sub> | 0<sub>3</sub> | H ~ | Shamt<sub>6</sub>  | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| SLL | 1  | 0<sub>2</sub> | 0<sub>3</sub> | H ~ | Rs2<sub>5</sub>    | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |

### Operation:

Rd = Rs1 << Rs2

OR

Rd = Rs1 << constant

**Operation Size:** 

**Execution Units:** integer ALU

Exceptions: none

Notes:

Left shift instructions are faster than multiply, which takes four clock cycles. A left shift by n multiplies the operand by 2<sup>n</sup>.

Example:
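
The behaviour described above can be sketched with a 128-bit intermediate result. The model assumes 64-bit registers and that the shift amount is taken from the low six bits of the second operand; that masking is an assumption for the register form. The function returns the destination value together with the carry bit.

```python
MASK64 = (1 << 64) - 1  # assumed register width


def sll(rs1: int, shamt: int, h: bool = False):
    """Model SLL: return (Rd, carry).

    The shift is done on a double-width value; H selects the upper 64 bits.
    Carry is set when any non-zero bit is shifted out of the high end.
    """
    shamt &= 63                          # assumed: 6-bit shift amount
    wide = (rs1 & MASK64) << shamt       # up to 128 bits
    upper = wide >> 64                   # bits shifted out of the register
    rd = upper if h else wide & MASK64
    return rd, int(upper != 0)
```

For instance, `sll(1, 4)` gives `(16, 0)`, while `sll(1 << 63, 1)` gives `(0, 1)` and the same shift with `h=True` gives `(1, 1)`.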

## SRA[.] - Shift Right Arithmetic

## **Description:**

Right shift a source operand value in Rs1 by a source operand value in Rs2 or a constant and place the sign extended result in the destination register. The result may be rounded.

#### **Instruction Format: SHIFT**

|     | 31 | 30 29         | 28            |                 | 17                 | 16 | 15 11           | 10 6           | 5 0           |
|-----|----|---------------|---------------|-----------------|--------------------|----|-----------------|----------------|---------------|
| SRA | 0  | 0<sub>2</sub> | 2<sub>3</sub> | Rm<sub>2</sub>  | Shamt<sub>6</sub>  | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| SRA | 1  | 0<sub>2</sub> | 2<sub>3</sub> | Rm<sub>2</sub>  | Rs2<sub>5</sub>    | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |

| Rm <sub>2</sub> |                                                                           |
|-----------------|---------------------------------------------------------------------------|
| 0               | Truncate                                                                  |
| 1               | Round towards zero. If the result is negative, it is rounded up.          |
| 2               | Round up, one is added to the result if there was a carry out of the LSB. |
| 3               | reserved                                                                  |

## Operation:

#### **Operation Size:**

**Execution Units:** integer ALU

Exceptions: none

Example:
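
The rounding modes can be illustrated with a small model. This sketch assumes 64-bit registers and a 6-bit shift amount. It also interprets mode 1 as adding one to a negative result only when non-zero bits were discarded, and mode 2 as adding the last bit shifted out. Both readings are assumptions drawn from the table above rather than confirmed behaviour.

```python
MASK64 = (1 << 64) - 1  # assumed register width


def to_signed(x: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def sra(rs1: int, shamt: int, rm: int = 0) -> int:
    """Model SRA with rounding mode Rm; returns the 64-bit result pattern."""
    shamt &= 63
    value = to_signed(rs1)
    result = value >> shamt                    # truncating arithmetic shift
    discarded = value & ((1 << shamt) - 1)     # bits shifted out
    if rm == 1:                                # round towards zero
        if result < 0 and discarded:
            result += 1
    elif rm == 2:                              # round up on carry out of LSB
        if shamt and (value >> (shamt - 1)) & 1:
            result += 1
    elif rm == 3:
        raise ValueError("Rm = 3 is reserved")
    return result & MASK64
```

Shifting -7 right by one gives -4 when truncating and -3 in either rounding mode under these assumptions.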

## SRL[.] - Shift Right Logical

#### **Description:**

Right shift a source operand value in Rs1 by a second source operand value in Rs2 or a constant and place the result in the destination register. If the 'H' bit is set, the lower 64-bits of the result are transferred to the destination register, Rd. Condition register CR0 may be updated if the Cr bit of the instruction is set. Also, the carry bit of CR0 will be set if any bit shifted out from the low order bits is non-zero, otherwise it will be cleared.

#### **Instruction Format: SHIFT**

|     | 31 | 30 29         | 28            |                  | 17                 | 16 | 15 11           | 10 6           | 5 0           |
|-----|----|---------------|---------------|------------------|--------------------|----|-----------------|----------------|---------------|
| SRL | 0  | 0<sub>2</sub> | 1<sub>3</sub> | H ~              | Shamt<sub>6</sub>  | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |
| SRL | 1  | 0<sub>2</sub> | 1<sub>3</sub> | H ~<sub>3</sub>  | Rs2<sub>5</sub>    | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 2<sub>6</sub> |

### Operation:

Rd = Rs1 >> Rs2

OR

Rd = Rs1 >> constant

**Operation Size:** 

**Execution Units: integer ALU** 

Exceptions: none

Example:
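
A sketch of the right shift, again using a double-width intermediate value. It assumes 64-bit registers, a 6-bit shift amount, and that the source operand occupies the upper half of the 128-bit value so that the 'H' bit returns the bits shifted out of the low end. That reading of 'H' is an assumption based on the description above.

```python
MASK64 = (1 << 64) - 1  # assumed register width


def srl(rs1: int, shamt: int, h: bool = False):
    """Model SRL: return (Rd, carry).

    Rs1 is placed in the upper half of a 128-bit value and shifted right.
    Carry is set when any non-zero bit is shifted out of the low end.
    """
    shamt &= 63                              # assumed: 6-bit shift amount
    wide = ((rs1 & MASK64) << 64) >> shamt
    lower = wide & MASK64                    # bits shifted out of the register
    rd = lower if h else wide >> 64
    return rd, int(lower != 0)
```

For example, `srl(0b1011, 2)` gives `(0b10, 1)` because the two discarded bits are non-zero, while `srl(8, 2)` gives `(2, 0)`.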

## **Logical Operations**

## AND[.] - Bitwise And

## **Description:**

Bitwise AND two source registers Rs1 and Rs2, or Rs1 and a constant, and place the result in the destination register Rd. All register values are integers. Condition register CR0 may be updated with the result of the operation compared to zero.

**Instruction Format: R3** 

|     | 31 | 30 29          | 28                      |               | 17              | 16 | 15 11           | 10 6           | 5 0           |
|-----|----|----------------|-------------------------|---------------|-----------------|----|-----------------|----------------|---------------|
| AND | L  | LX<sub>2</sub> | Immediate<sub>12</sub>  |               |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 8<sub>6</sub> |
| AND | 1  | LX<sub>2</sub> | 0<sub>3</sub>           | ~<sub>4</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 8<sub>6</sub> |

Operation:

Rd = Rs1 & Rs2

OR

Rd = Rs1 & Constant

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

## XOR[.] - Bitwise Exclusive Or

## **Description:**

Bitwise exclusively or two source registers Rs1 and Rs2 OR Rs1 and a constant and place the result in the destination register. All registers are integer registers.

#### **Instruction Format:** R3

|     | 31 | 30 29          | 28                      | 17              | 16 | 15 11           | 10 6           | 5 0            |
|-----|----|----------------|-------------------------|-----------------|----|-----------------|----------------|----------------|
| XOR | L  | LX<sub>2</sub> | Immediate<sub>12</sub>  |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 10<sub>6</sub> |
| XOR | 1  | LX<sub>2</sub> |                         | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 10<sub>6</sub> |

Operation: R3

Rd = Rs1 ^ Rs2

OR

Rd = Rs1 ^ Constant

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

## OR[.] - Bitwise Or

## **Description:**

Bitwise or two source registers Rs1 and Rs2 OR Rs1 and a constant, and place the result in the destination register Rd. All registers are integer registers.

#### **Instruction Format:**

|    | 31 | 30 29          | 28                      | 17              | 16 | 15 11           | 10 6           | 5 0           |
|----|----|----------------|-------------------------|-----------------|----|-----------------|----------------|---------------|
| OR | L  | LX<sub>2</sub> | Immediate<sub>12</sub>  |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 9<sub>6</sub> |
| OR | 1  | LX<sub>2</sub> |                         | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 9<sub>6</sub> |

## Operation:

Rd = Rs1 | Rs2

OR

Rd = Rs1 | Constant

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

## CHK - Check Register Against Bounds

#### **Description:**

A register, Rs1, is compared to two values. If the register is outside the bounds defined by Rs2 and Rs3, a bounds check exception occurs. Comparisons may be signed or unsigned, as indicated by 'S': 1 = signed, 0 = unsigned. The constant Offs<sub>5</sub> is multiplied by four, added to the program counter address of the CHK instruction, and stored on an internal stack. This allows a return to a point up to 256 bytes after the CHK. Typical values are zero or one.

#### **Instruction Format:** R2

#### **ALU Instruction Formats**

| 31 | 30 27          | 26 17                            | 16 | 15 11           | 10 6             | 5 0            |
|----|----------------|----------------------------------|----|-----------------|------------------|----------------|
| 0  | Op<sub>4</sub> | Rs3<sub>5</sub> Rs2<sub>5</sub>  | ~  | Rs1<sub>5</sub> | Offs<sub>5</sub> | 29<sub>6</sub> |

| Op<sub>4</sub> | Exception when not          |                               |
|----|-----------------------------|-------------------------------|
| 0  | Ra >= Rb and Ra < Rc        |                               |
| 1  | Ra >= Rb and Ra <= Rc       |                               |
| 2  | Ra > Rb and Ra < Rc         |                               |
| 3  | Ra > Rb and Ra <= Rc        |                               |
| 4  | Not (Ra >= Rb and Ra < Rc)  |                               |
| 5  | Not (Ra >= Rb and Ra <= Rc) |                               |
| 6  | Not (Ra > Rb and Ra < Rc)   |                               |
| 7  | Not (Ra > Rb and Ra <= Rc)  |                               |
| 8  | Ra >= CPL                   | CHKCPL – code privilege level |
| 9  | Ra <= CPL                   | CHKDPL – data privilege level |
| 10 | Ra == SC                    | Stack canary check            |

## Operation:

IF check failed

PUSH SR onto internal stack

PUSH PC plus O<sub>5</sub> * 4 onto internal stack

PC = vector at (tvec[3])

Clock Cycles: 1

**Execution Units:** Integer ALU

**Exceptions**: bounds check

Notes:

The system exception handler will typically transfer processing back to a local exception handler.
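The range comparisons selected by Op<sub>4</sub> values 0 to 7 can be sketched as follows. This is a minimal illustration only: the function name `chk_passes` is hypothetical, the Ra, Rb and Rc names follow the table above, and the signed/unsigned distinction is not modelled because Python integers are unbounded. The privilege level and stack canary checks (values 8 to 10) are left out.

```python
def chk_passes(op: int, ra: int, rb: int, rc: int) -> bool:
    """Return True when the check passes, i.e. no bounds exception is raised.

    Models Op4 values 0..7 from the table: bit 1 makes the lower bound
    strict (>), bit 0 makes the upper bound inclusive (<=), and bit 2
    inverts the whole test.
    """
    if not 0 <= op <= 7:
        raise ValueError("only Op4 values 0..7 are modelled in this sketch")
    lower_ok = ra > rb if (op & 2) else ra >= rb
    upper_ok = ra <= rc if (op & 1) else ra < rc
    in_range = lower_ok and upper_ok
    return (not in_range) if (op & 4) else in_range


if __name__ == "__main__":
    # An index of 10 against the half-open range [0, 10) fails Op4 = 0 ...
    print(chk_passes(0, 10, 0, 10))
    # ... but passes the inclusive form, Op4 = 1.
    print(chk_passes(1, 10, 0, 10))
```

When `chk_passes` returns False, the hardware would take the bounds check exception path described under Operation.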

## **Data Movement**

## MOVE[.] / MOVEA[.] / MOVSX[.] / MOVZX[.] – Move Register to Register

#### **Description:**

Move register-to-register. This instruction may move between different types of registers. Raw binary data is moved; no data conversions are applied. Some registers are accessible only in specific operating modes, and some registers are read-only. Normally, referencing the stack pointer register r31 maps to the stack pointer for the current operating mode; the MOVEA instruction may be used to disable this mapping. The MOVSX and MOVZX instructions perform moves with sign and zero extension, respectively, from the specified bit.

**Instruction Formats: MOV** 

|       | 31 | 30..28        | 27            |                  |                    | 17                | 16 | 15..11             | 10..6             | 5..0               |
|-------|----|---------------|---------------|------------------|--------------------|-------------------|----|--------------------|-------------------|--------------------|
| MOVE  | 0  | 0<sub>3</sub> | ~<sub>9</sub> |                  | Rs1<sub>6..5</sub> | Rd<sub>6..5</sub> | Cr | Rs1<sub>4..0</sub> | Rd<sub>4..0</sub> | Opcode<sub>6</sub> |
| MOVEA | 0  | 1<sub>3</sub> | ~<sub>9</sub> |                  | Rs1<sub>6..5</sub> | Rd<sub>6..5</sub> | Cr | Rs1<sub>4..0</sub> | Rd<sub>4..0</sub> | Opcode<sub>6</sub> |
| MOVSX | 0  | 2<sub>3</sub> | ~<sub>3</sub> | Uimm<sub>6</sub> | Rs1<sub>6..5</sub> | Rd<sub>6..5</sub> | Cr | Rs1<sub>4..0</sub> | Rd<sub>4..0</sub> | Opcode<sub>6</sub> |
| MOVZX | 0  | 3<sub>3</sub> | ~<sub>3</sub> | Uimm<sub>6</sub> | Rs1<sub>6..5</sub> | Rd<sub>6..5</sub> | Cr | Rs1<sub>4..0</sub> | Rd<sub>4..0</sub> | Opcode<sub>6</sub> |

**Operation: R2** 

Rt = Ra

Clock Cycles: 1

**Execution Units:** All Integer ALUs

Exceptions: none

| Ra<sub>7</sub> / Rt<sub>7</sub> | Register file                     | Mode Access | RW |
|---------------------------------|-----------------------------------|-------------|----|
| 0 to 30                         | General purpose registers 0 to 30 | USHM        | RW |
| 31                              | Safe stack pointer                | SHM         | RW |
| 32 to 63                        | Floating-point registers          | USHM        | RW |
| 64                              | User stack pointer                | USHM        | RW |
| 65                              | Supervisor stack pointer          | SHM         | RW |
| 66                              | Hypervisor stack pointer          | HM          | RW |
| 67                              | Machine stack pointer             | M           | RW |
| 68 to 71                        | Micro-code temporaries #0 to #3   | HM          | RW |
| 72 to 78                        | Branch registers                  | USHM        | RW |
| 79                              | Instruction pointer               | USHM        | R  |
| 80 to 87                        | Condition registers               | USHM        | RW |
| 88                              | Loop counter                      | USHM        | RW |
| 89                              | Micro-code link register          | HM          | RW |
| 90                              | Context block address register    | SHM         | RW |
| 91                              | Micro-code program counter        | SHM         | RW |
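The register number map above lends itself to a small lookup, for example in an assembler or simulator. The sketch below copies the rows of the table; the names `RegisterRange`, `lookup` and `can_access` are hypothetical, and it assumes the Mode Access column lists the operating modes (U, S, H, M) that are allowed to reference the register.

```python
from typing import NamedTuple, Optional


class RegisterRange(NamedTuple):
    first: int
    last: int
    name: str
    modes: str   # operating modes allowed to access the register
    access: str  # "RW" or "R"


# Rows transcribed from the MOVE register table.
REGISTER_MAP = [
    RegisterRange(0, 30, "General purpose registers 0 to 30", "USHM", "RW"),
    RegisterRange(31, 31, "Safe stack pointer", "SHM", "RW"),
    RegisterRange(32, 63, "Floating-point registers", "USHM", "RW"),
    RegisterRange(64, 64, "User stack pointer", "USHM", "RW"),
    RegisterRange(65, 65, "Supervisor stack pointer", "SHM", "RW"),
    RegisterRange(66, 66, "Hypervisor stack pointer", "HM", "RW"),
    RegisterRange(67, 67, "Machine stack pointer", "M", "RW"),
    RegisterRange(68, 71, "Micro-code temporaries #0 to #3", "HM", "RW"),
    RegisterRange(72, 78, "Branch registers", "USHM", "RW"),
    RegisterRange(79, 79, "Instruction pointer", "USHM", "R"),
    RegisterRange(80, 87, "Condition registers", "USHM", "RW"),
    RegisterRange(88, 88, "Loop counter", "USHM", "RW"),
    RegisterRange(89, 89, "Micro-code link register", "HM", "RW"),
    RegisterRange(90, 90, "Context block address register", "SHM", "RW"),
    RegisterRange(91, 91, "Micro-code program counter", "SHM", "RW"),
]


def lookup(regno: int) -> Optional[RegisterRange]:
    """Return the table row covering a 7-bit register number, if any."""
    for entry in REGISTER_MAP:
        if entry.first <= regno <= entry.last:
            return entry
    return None


def can_access(regno: int, mode: str, write: bool) -> bool:
    """Check a read or write of regno from operating mode 'U', 'S', 'H' or 'M'."""
    entry = lookup(regno)
    if entry is None or mode not in entry.modes:
        return False
    return ("W" if write else "R") in entry.access
```

For example, `can_access(79, "U", write=True)` is False because the instruction pointer is read-only, and `can_access(67, "S", write=False)` is False because the machine stack pointer is visible only in machine mode.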

## Load / Store Instructions

#### Overview

## **Addressing Modes**

Load and store instructions have two addressing modes: register indirect with displacement and scaled indexed addressing. Note that store instructions cannot update CR0.

#### Register Indirect with Displacement Format

| 0 | LX<sub>2</sub> | Displacement<sub>11..0</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | Opcode<sub>5</sub> |
|---|----------------|------------------------------|----|-----------------|----------------|--------------------|

#### Scaled Indexed with Displacement Format

For scaled indexed with displacement format the load or store address is the sum of register Rs1, scaled register Rs2, and a displacement constant found in the instruction.

#### **Instruction Format:** d[Rs1+Rs2\*]

| 1 | LX<sub>2</sub> | Disp<sub>5</sub> | Sc<sub>2</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | Opcode<sub>5</sub> |
|---|----------------|------------------|----------------|-----------------|----|-----------------|----------------|--------------------|
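The two address calculations can be sketched as below. This is an illustration under several assumptions that the text does not state: displacements are treated as signed two's complement values, the Sc<sub>2</sub> field is taken to select a scale of 1, 2, 4 or 8 (a left shift by Sc), and addresses wrap at 64 bits. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit address width


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def ea_displacement(rs1: int, disp12: int) -> int:
    """Register indirect with displacement: Rs1 + Displacement(11..0)."""
    return (rs1 + sign_extend(disp12, 12)) & MASK64


def ea_scaled_indexed(rs1: int, rs2: int, sc: int, disp5: int) -> int:
    """Scaled indexed: Rs1 + (Rs2 scaled by 2**Sc) + Disp5."""
    if not 0 <= sc <= 3:
        raise ValueError("Sc is a 2-bit field")
    return (rs1 + (rs2 << sc) + sign_extend(disp5, 5)) & MASK64


if __name__ == "__main__":
    # Element 3 of an array of 8-byte items at 0x1000, plus 4 bytes.
    print(hex(ea_scaled_indexed(0x1000, 3, 3, 4)))
```

Larger displacements would come from the extended constant mechanism selected by LX<sub>2</sub>, which this sketch does not model.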

## LDB[.] Rn, <ea> - Load Byte

#### **Description:**

Load register Rd with a byte of data from source. The source value is sign extended to the machine width.
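The difference between the sign-extending loads (LDB, LDW, LDT) and the zero-extending forms (LDBZ, LDWZ, LDTZ) can be shown with a short sketch. It assumes a 64-bit machine width and little-endian byte order, neither of which is stated in this section; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumed machine width of 64 bits


def load_zero_extended(memory: bytes, address: int, size: int) -> int:
    """Read `size` bytes and zero extend, as LDBZ / LDWZ / LDTZ would."""
    return int.from_bytes(memory[address:address + size], "little")


def load_sign_extended(memory: bytes, address: int, size: int) -> int:
    """Read `size` bytes and sign extend, as LDB / LDW / LDT would."""
    raw = load_zero_extended(memory, address, size)
    sign = 1 << (size * 8 - 1)
    # Flip and subtract the sign bit, then wrap to the register width.
    return ((raw ^ sign) - sign) & MASK64


if __name__ == "__main__":
    mem = bytes([0x80, 0x34, 0x12])
    print(hex(load_sign_extended(mem, 0, 1)))  # byte 0x80 has its top bit set
    print(hex(load_zero_extended(mem, 0, 1)))
```

A byte with its top bit set fills the upper register bits with ones under LDB and with zeros under LDBZ; values with a clear top bit load identically either way.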

#### **Instruction Formats**

| Disp       | L | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 32<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|----|-----------------|----------------|----------------|
| Indexed Ld | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 32<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

Notes:

## LDBZ[.] Rn, <ea> - Load Byte and Zero Extend

#### **Description:**

Load register Rd with a byte of data from source. The source value is zero extended to the machine width.

#### **Instruction Formats**

| Disp       | L | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 33<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|----|-----------------|----------------|----------------|
| Indexed Ld | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 33<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

## LDT[.] Rn, <ea> - Load Tetra

#### **Description:**

Load register Rd with a tetra of data from source. The source value is sign extended to the machine width.

#### **Instruction Formats**

| Disp       | L | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 36<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|----|-----------------|----------------|----------------|
| Indexed Ld | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 36<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

Notes:

## LDTZ[.] Rn, <ea> - Load Tetra and Zero Extend

#### **Description:**

Load register Rd with a tetra of data from source. The source value is zero extended to the machine width.

#### **Instruction Formats**

| Disp       | L | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 37<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|----|-----------------|----------------|----------------|
| Indexed Ld | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 37<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

## LDW[.] Rn, <ea> - Load Wyde

#### **Description:**

Load register Rd with a wyde of data from source. The source value is sign extended to the machine width.

#### **Instruction Formats**

| Disp       | L | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 34<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|----|-----------------|----------------|----------------|
| Indexed Ld | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 34<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

Notes:

## LDWZ[.] Rn, <ea> - Load Wyde and Zero Extend

#### **Description:**

Load register Rd with a wyde of data from source. The source value is zero extended to the machine width.

#### **Instruction Formats**

| Disp       | L | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 35<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|----|-----------------|----------------|----------------|
| Indexed Ld | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 35<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

## LOAD[.] Rn, <ea> - Load

## **Description:**

This is an alternate mnemonic for the LDO instruction. Load register Rd with an octa (8 bytes) of data from source.

#### **Instruction Formats**

| Disp       | L | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 38<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|----|-----------------|----------------|----------------|
| Indexed Ld | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | Cr | Rs1<sub>5</sub> | Rd<sub>5</sub> | 38<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

## STB Rn, <ea> - Store Byte

#### **Description:**

Store the lowest byte from register Rs to memory.

#### **Instruction Format**

| Disp       | 0 | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | ~ | Rs1<sub>5</sub> | Rs2<sub>5</sub> | 40<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|---|-----------------|-----------------|----------------|
| Indexed St | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | ~ | Rs1<sub>5</sub> | Rs3<sub>5</sub> | 40<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

Notes:

## STBI Rn, <ea> - Store Byte Immediate

#### **Description:**

Store a constant byte to memory. The constant is located in the last half of the cache line, offset by CL<sub>3</sub> words.

#### **Instruction Format**

| Disp       | 0 | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | ~ | Rs1<sub>5</sub> | ~<sub>2</sub> | CL<sub>3</sub> | 41<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|---|-----------------|---------------|----------------|----------------|
| Indexed St | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | ~ | Rs1<sub>5</sub> | ~<sub>2</sub> | CL<sub>3</sub> | 41<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

## STORE Rn, <ea> - Store Register

#### **Description:**

This is an alternate mnemonic for the STO instruction. Store register Rs to memory.

#### **Instruction Formats**

| Disp       | 0 | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | ~ | Rs1<sub>5</sub> | Rs2<sub>5</sub> | Opcode<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|---|-----------------|-----------------|--------------------|
| Indexed St | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | ~ | Rs1<sub>5</sub> | Rs3<sub>5</sub> | Opcode<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

Notes:

## STOREI N, <ea> - Store Immediate

#### **Description:**

This is an alternate mnemonic for the <u>STO</u> instruction. Store immediate value to memory. The immediate value is referenced as a constant on the cache line. The index to the memory containing the constant is specified by CL<sub>3</sub>. Note that the immediate constants may be located only in the second half of a cache line. There are only eight possible locations.
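One way to picture the eight constant locations is sketched below. The cache line size is not given in this section, so the 64-byte default is purely an assumption, as is the idea that the eight slots divide the second half of the line evenly and that the line is the one holding the instruction. The function name `constant_address` is hypothetical.

```python
def constant_address(pc: int, cl: int, line_size: int = 64) -> int:
    """Address of the constant selected by CL3 for an instruction at pc.

    Assumes line_size is a power of two and that the eight slots evenly
    divide the second half of the cache line containing pc.
    """
    if not 0 <= cl <= 7:
        raise ValueError("CL is a 3-bit field")
    line_base = pc & ~(line_size - 1)   # start of the containing cache line
    half = line_size // 2               # constants live in the second half
    slot_size = half // 8               # eight possible locations
    return line_base + half + cl * slot_size


if __name__ == "__main__":
    # Instruction at 0x1008 in an assumed 64-byte line starting at 0x1000.
    for cl in range(8):
        print(cl, hex(constant_address(0x1008, cl)))
```

Under these assumptions each slot is four bytes wide; a different line size simply changes `slot_size`.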

#### **Instruction Formats**

| Disp       | 0 | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | ~ | Rs1<sub>5</sub> | ~<sub>2</sub> | CL<sub>3</sub> | Opcode<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|---|-----------------|---------------|----------------|--------------------|
| Indexed St | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | ~ | Rs1<sub>5</sub> | ~<sub>2</sub> | CL<sub>3</sub> | Opcode<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

## STT Rn, <ea> - Store Tetra

## **Description:**

Store the lowest tetra (4 bytes) from register Rs2 to memory.

#### **Instruction Format**

| Disp       | 0 | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | ~ | Rs1<sub>5</sub> | Rs2<sub>5</sub> | 44<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|---|-----------------|-----------------|----------------|
| Indexed St | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | ~ | Rs1<sub>5</sub> | Rs3<sub>5</sub> | 44<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

Notes:

## STTI Rn, <ea> - Store Tetra Immediate

#### **Description:**

Store a constant tetra to memory. The constant is located in the last half of the cache line, offset by CL<sub>3</sub> words.

#### **Instruction Format**

| Disp       | 0 | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | ~ | Rs1<sub>5</sub> | ~<sub>2</sub> | CL<sub>3</sub> | 45<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|---|-----------------|---------------|----------------|----------------|
| Indexed St | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | ~ | Rs1<sub>5</sub> | ~<sub>2</sub> | CL<sub>3</sub> | 45<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

## STW Rn, <ea> - Store Wyde

#### **Description:**

Store the lowest wyde (2 bytes) from register Rs2 to memory.

#### **Instruction Format**

| Disp       | 0 | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | ~ | Rs1<sub>5</sub> | Rs2<sub>5</sub> | 42<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|---|-----------------|-----------------|----------------|
| Indexed St | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | ~ | Rs1<sub>5</sub> | Rs3<sub>5</sub> | 42<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

Notes:

## STWI Rn, <ea> - Store Wyde Immediate

#### **Description:**

Store a constant wyde to memory. The constant is located in the last half of the cache line, offset by CL<sub>3</sub> words.

#### **Instruction Format**

| Disp       | 0 | LX<sub>2</sub> | Displacement<sub>11..0</sub> |                |                 | ~ | Rs1<sub>5</sub> | ~<sub>2</sub> | CL<sub>3</sub> | 43<sub>5</sub> |
|------------|---|----------------|------------------------------|----------------|-----------------|---|-----------------|---------------|----------------|----------------|
| Indexed St | 1 | LX<sub>2</sub> | Disp<sub>5</sub>             | Sc<sub>2</sub> | Rs2<sub>5</sub> | ~ | Rs1<sub>5</sub> | ~<sub>2</sub> | CL<sub>3</sub> | 43<sub>5</sub> |

Execution Units: AGEN, MEM

**Exceptions:** 

## **Condition Register Instructions**

## CLC - Clear Carry

#### **Description:**

This is an alternate mnemonic for the CRANDC instruction where the manipulated bit in the condition register is the carry bit (bit 5 of the CR).

#### **Instruction Format: R3**

|     | 31 | 30..29        | 28..25        | 24..18          | 17..12                         | 11..6                         | 5..0           |
|-----|----|---------------|---------------|-----------------|--------------------------------|-------------------------------|----------------|
| CLC | 0  | 0<sub>2</sub> | 3<sub>4</sub> | ~<sub>6</sub> 1 | CRs1<sub>3</sub>,5<sub>3</sub> | CRd<sub>3</sub>,5<sub>3</sub> | 11<sub>6</sub> |

Operation:

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

Notes:

## CLV - Clear Overflow

#### **Description:**

This is an alternate mnemonic for the CRANDC instruction where the manipulated bit in the condition register is the overflow bit (bit 6 of the CR).

#### **Instruction Format: R3**

|     | 31 | 30..29        | 28..25        | 24..18          | 17..12                         | 11..6                         | 5..0           |
|-----|----|---------------|---------------|-----------------|--------------------------------|-------------------------------|----------------|
| CLV | 0  | 0<sub>2</sub> | 3<sub>4</sub> | ~<sub>6</sub> 1 | CRs1<sub>3</sub>,6<sub>3</sub> | CRd<sub>3</sub>,6<sub>3</sub> | 11<sub>6</sub> |

Operation:

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

## CRAND – Bit And

#### **Description:**

Bitwise AND two source condition register bits, CRs1 and CRs2, or source condition register bit CRs1 and a constant bit, and place the result in the destination condition register bit CRd.

#### **Instruction Format:** R3

|       | 31 | 30..29        | 28..25        | 24..18                      | 17..12            | 11..6            | 5..0           |
|-------|----|---------------|---------------|-----------------------------|-------------------|------------------|----------------|
| CRAND | 0  | 0<sub>2</sub> | 0<sub>4</sub> | ~<sub>6</sub>, constant bit | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRAND | 1  | 0<sub>2</sub> | 0<sub>4</sub> | ~ CRs2<sub>6</sub>          | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |

Operation:

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

## CRANDC – Bit And with Complement

#### **Description:**

Bitwise AND with complement two source condition register bits, CRs1 and CRs2, or source condition register bit CRs1 and a constant bit, and place the result in the destination condition register bit CRd. This instruction may be used to clear a specified bit in a condition register, for example, the carry bit.

#### **Instruction Format: R3**

|        | 31 | 30..29        | 28..25        | 24..18                      | 17..12            | 11..6            | 5..0           |
|--------|----|---------------|---------------|-----------------------------|-------------------|------------------|----------------|
| CRANDC | 0  | 0<sub>2</sub> | 3<sub>4</sub> | ~<sub>6</sub>, constant bit | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRANDC | 1  | 0<sub>2</sub> | 3<sub>4</sub> | ~ CRs2<sub>6</sub>          | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |

Operation:

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

## CROR – Bit Or

#### **Description:**

Bitwise OR two source condition register bits, CRs1 and CRs2, or source condition register bit CRs1 and a constant bit, and place the result in the destination condition register bit CRd. This instruction may be used to set a bit in a condition register, for example, the carry bit.

#### **Instruction Format: R3**

|      | 31 | 30..29        | 28..25        | 24..18                      | 17..12            | 11..6            | 5..0           |
|------|----|---------------|---------------|-----------------------------|-------------------|------------------|----------------|
| CROR | 0  | 0<sub>2</sub> | 1<sub>4</sub> | ~<sub>6</sub>, constant bit | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CROR | 1  | 0<sub>2</sub> | 1<sub>4</sub> | ~ CRs2<sub>6</sub>          | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |

Operation:

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

## CRXOR - Bit Exclusive Or

#### **Description:**

Bitwise exclusive OR two source condition register bits, CRs1 and CRs2, or source condition register bit CRs1 and a constant bit, and place the result in the destination condition register bit CRd. This instruction may be used to flip a bit in a condition register, for example, the carry bit.

#### **Instruction Format: R3**

|       | 31 | 30..29        | 28..25        | 24..18                      | 17..12            | 11..6            | 5..0           |
|-------|----|---------------|---------------|-----------------------------|-------------------|------------------|----------------|
| CRXOR | 0  | 0<sub>2</sub> | 2<sub>4</sub> | ~<sub>6</sub>, constant bit | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |
| CRXOR | 1  | 0<sub>2</sub> | 2<sub>4</sub> | ~ CRs2<sub>6</sub>          | CRs1<sub>6</sub>  | CRd<sub>6</sub>  | 11<sub>6</sub> |

Operation:

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none
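The four condition register bit operations can be modelled in a few lines. The sketch assumes eight 8-bit condition registers held in a list, and that each 6-bit selector (CRs1, CRs2, CRd) carries the register number in its high three bits and the bit number in its low three bits. The second operand is passed in as an already resolved bit, whether it came from CRs2 or from the constant. All names are hypothetical.

```python
from typing import Callable, Dict, List

# Second operand b is either the CRs2 bit or the constant bit.
CR_OPS: Dict[str, Callable[[int, int], int]] = {
    "CRAND": lambda a, b: a & b,
    "CRANDC": lambda a, b: a & (b ^ 1),  # AND with complement of b
    "CROR": lambda a, b: a | b,
    "CRXOR": lambda a, b: a ^ b,
}


def get_bit(crs: List[int], sel: int) -> int:
    """Read the bit named by a 6-bit selector (regno in bits 5..3, bit in 2..0)."""
    return (crs[sel >> 3] >> (sel & 7)) & 1


def set_bit(crs: List[int], sel: int, value: int) -> None:
    """Write one bit of one condition register in place."""
    reg, bit = sel >> 3, sel & 7
    crs[reg] = (crs[reg] & ~(1 << bit) & 0xFF) | ((value & 1) << bit)


def cr_bit_op(crs: List[int], op: str, crd: int, crs1: int, b: int) -> None:
    """Apply op to the CRs1 bit and operand bit b, storing the result in CRd."""
    set_bit(crs, crd, CR_OPS[op](get_bit(crs, crs1), b & 1))


if __name__ == "__main__":
    crs = [0] * 8
    carry_of_cr2 = (2 << 3) | 5                           # carry is bit 5
    cr_bit_op(crs, "CROR", carry_of_cr2, carry_of_cr2, 1)    # like SEC
    print(hex(crs[2]))
    cr_bit_op(crs, "CRANDC", carry_of_cr2, carry_of_cr2, 1)  # like CLC
    print(hex(crs[2]))
```

With a constant bit of one, CROR always sets the destination bit and CRANDC always clears it, which is how the SEC, SEV, CLC and CLV mnemonics are described.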

## SEC - Set Carry

#### **Description:**

This is an alternate mnemonic for the CROR instruction where the manipulated bit in the condition register is the carry bit (bit 5 of the CR).

#### **Instruction Format: R3**

|     | 31 | 30..29        | 28..25        | 24..18          | 17..12                         | 11..6                         | 5..0           |
|-----|----|---------------|---------------|-----------------|--------------------------------|-------------------------------|----------------|
| SEC | 0  | 0<sub>2</sub> | 1<sub>4</sub> | ~<sub>6</sub> 1 | CRs1<sub>3</sub>,5<sub>3</sub> | CRd<sub>3</sub>,5<sub>3</sub> | 11<sub>6</sub> |

Operation:

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

Notes:

## SEV - Set Overflow

#### **Description:**

This is an alternate mnemonic for the CROR instruction where the manipulated bit in the condition register is the overflow bit (bit 6 of the CR).

#### **Instruction Format: R3**

|     | 31 | 30..29        | 28..25        | 24..18          | 17..12                         | 11..6                         | 5..0           |
|-----|----|---------------|---------------|-----------------|--------------------------------|-------------------------------|----------------|
| SEV | 0  | 0<sub>2</sub> | 1<sub>4</sub> | ~<sub>6</sub> 1 | CRs1<sub>3</sub>,6<sub>3</sub> | CRd<sub>3</sub>,6<sub>3</sub> | 11<sub>6</sub> |

Operation:

Clock Cycles: 1

**Execution Units:** All Integer ALUs, all FPUs

Exceptions: none

### Branch / Flow Control Instructions

#### Overview

#### **Mnemonics**

There are mnemonics for specifying the comparison method. Floating-point comparisons prefix the branch mnemonic with 'F', as in FBEQ. There is no prefix for integer branches. For branches that decrement the loop count register LC, the mnemonic is prefixed with the decrement condition, as in 'DBNZ\_BNE'.

#### **Conditional Branch Format**

|             | 31 | 30..29         | 28               |                  | 16               | 15..11                       | 10..6                        | 5..0           |   |
|-------------|----|----------------|------------------|------------------|------------------|------------------------------|------------------------------|----------------|---|
| [Dcc]Bcc[L] | 0  | D<sub>12</sub> | BRs<sub>3</sub>  | Cnd<sub>4</sub>  | CRs<sub>6</sub>  | Disp<sub>10..3</sub>         | BRd<sub>3</sub>              | 12<sub>5</sub> | D |
| [Dcc]Bcc[L] | 1  | 0<sub>2</sub>  | BRs<sub>3</sub>  | Cnd<sub>4</sub>  | CRs<sub>6</sub>  | ~ Rs1<sub>5</sub>            | ~<sub>2</sub> BRd<sub>3</sub> | 12<sub>5</sub> |   |
| [Dcc]Bcc[L] | 1  | 1<sub>2</sub>  | BRs<sub>3</sub>  | Cnd<sub>4</sub>  | CRs<sub>6</sub>  | ~<sub>4</sub> CL<sub>3</sub> | ~ BRd<sub>3</sub>            | 12<sub>5</sub> |   |

| Field              | Purpose                                                                     |
|--------------------|-----------------------------------------------------------------------------|
| Cnd₄               | Branch condition that must be met                                           |
| CRs <sub>6</sub>   | Condition register bit to test; low 3 bits are bit number, high 3 are regno |
| BRd <sub>3</sub>   | Destination branch linkage register to store return address                 |
| BRs <sub>3</sub>   | Source branch linkage register to get address to jump to                    |
| Disp <sub>13</sub> | 13-bit displacement from BRs <sub>3</sub>                                   |

Note that extended displacements are possible. A 32 or 64-bit displacement may be selected by setting bits 29 to 31 of the instruction appropriately.

### **Predicated Execution**

Branch instructions will execute only if both the predicate and branch condition are true.

#### Conditions

Conditional branches branch to the target address only if the condition is true. The condition is determined by the status of the loop count register and the specified condition register bit state. The condition register will have typically been previously set by a compare instruction.

#### Table of Conditions

| Cnd₄   | Tested Conditions                                          | Mnemonic |
|--------|------------------------------------------------------------|----------|
| 0      | Decrement and branch if LC non-zero and condition is false | DBNZ_Bcc |
| 1      | Decrement and branch if LC zero and condition is false     | DBZ_Bcc  |
| 2      | Branch if condition false                                  | Bcc      |
| 3      | Decrement and branch if LC non-zero and condition is true  | DBNZ_Bcc |
| 4      | Decrement and branch if LC zero and condition is true      | DBZ_Bcc  |
| 5      | Branch if condition true                                   | Bcc      |
| 6      | Decrement and branch if LC non-zero                        | DBNZ     |
| 7      | Decrement and branch if LC zero                            | DBZ      |
| Others | reserved                                                   |          |
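The table of conditions can be read as a small decision function. The sketch below returns whether the branch is taken and the resulting loop count. It assumes LC is tested before it is decremented, which the table does not spell out, and it does not model wraparound of LC at the register width. The name `evaluate_branch` is hypothetical.

```python
from typing import Tuple


def evaluate_branch(cnd: int, cr_bit: int, lc: int) -> Tuple[bool, int]:
    """Return (taken, new_lc) for a Cnd4 value, a tested CR bit, and LC."""
    if not 0 <= cnd <= 7:
        raise ValueError("Cnd4 values above 7 are reserved")

    # Condition register part: 0..2 want the bit false, 3..5 want it true,
    # 6 and 7 ignore the condition register entirely.
    if cnd in (0, 1, 2):
        cond_ok = cr_bit == 0
    elif cnd in (3, 4, 5):
        cond_ok = cr_bit == 1
    else:
        cond_ok = True

    # Loop counter part: assumed to test LC before the decrement.
    if cnd in (0, 3, 6):
        lc_ok = lc != 0
    elif cnd in (1, 4, 7):
        lc_ok = lc == 0
    else:
        lc_ok = True

    decrements = cnd not in (2, 5)
    new_lc = lc - 1 if decrements else lc
    return cond_ok and lc_ok, new_lc


if __name__ == "__main__":
    # Plain "branch if condition true" leaves LC untouched.
    print(evaluate_branch(5, 1, 9))
    # A decrementing form with LC non-zero and the condition true.
    print(evaluate_branch(3, 1, 2))
```

Cnd<sub>4</sub> values 2 and 5 are the ordinary conditional branches; every other defined value also decrements LC.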

The condition register contains eight bits with the following format:

| Bit | Name      | Description                                             |
|-----|-----------|---------------------------------------------------------|
| 0   | EQ / XNOR | Set if bitwise XNOR of operands is true, equal          |
| 1   | NAND      | Set if logical NAND of operands is true                 |
| 2   | NOR       | Set if logical NOR of operands is true                  |
| 3   | LT        | Set if less than                                        |
| 4   | LE        | Set if less than or equal                               |
| 5   | CA        | Carry out from operation (addition, subtraction, shift) |
| 6   | OF/UN     | Overflow status or unordered for floating-point         |
| 7   |           |                                                         |

# **Branch Target**

#### Conditional Branches

For conditional branches, the destination address is formed as the sum of a code address (branch) register and a displacement constant specified in the instruction. Relative branches may use a displacement of up to 64 bits. The first word of the instruction carries a 13-bit displacement, giving an address range of ±4kB.
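The ±4kB figure follows from treating the 13-bit field as a signed byte displacement, which covers -4096 to +4095. A minimal sketch of the target calculation follows, assuming a two's complement displacement, byte granularity and 64-bit addresses; `branch_target` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit code addresses


def branch_target(brs_value: int, disp13: int) -> int:
    """Sum of the branch register value and a sign-extended 13-bit displacement."""
    sign = 1 << 12
    disp = ((disp13 & 0x1FFF) ^ sign) - sign  # range -4096 .. +4095
    return (brs_value + disp) & MASK64


if __name__ == "__main__":
    base = 0x10000
    print(hex(branch_target(base, 0x0FFF)))  # furthest forward target
    print(hex(branch_target(base, 0x1000)))  # furthest backward target
```

Targets outside this window need the extended displacement forms mentioned in the text.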

The destination displacement field is recommended to be at least 16-bits. It is possible to get by with a displacement as small as 12-bits before a significant percentage of branches must be implemented as two or more instructions.

## **Decrementing Branches**

Branches may decrement the loop count register by one after performing the branch comparison or logical operation. The condition field of the instruction indicates when a change should occur. Decrementing branches make use of both the flow control unit and an ALU at the same time.

#### **Unconditional Branches**

The destination displacement field is large enough to accommodate a  $\pm 2^{24}$  range or  $\pm 16$ MB. The range may be extended using extended constants to 32 or 64 bits. The destination address is formed as the sum of a code address register PC and the displacement constant. The return address may be stored in register BRd.

|        | 31 | 30..29         | 28..11                       | 10..6           | 5..0           |   |
|--------|----|----------------|------------------------------|-----------------|----------------|---|
| B[L]   | L  | LX<sub>2</sub> | Displacement<sub>23..3</sub> | BRd<sub>3</sub> | 13<sub>5</sub> | D |
| BLR[L] | 1  | 0<sub>2</sub>  | BRs<sub>3</sub>, Immediate   | BRd<sub>3</sub> | 13<sub>5</sub> | D |

## B – Branch Always

## B label[BR]

## **Description:**

This instruction always jumps to the destination address. The destination address range is $\pm 2^{24}$ bytes (±16MB). This is an alternate mnemonic for the BL instruction where the branch link register is BR0.

## Formats Supported: B

|   | 31 | 30..29         | 28..11                       | 10..6         | 5..0           |   |
|---|----|----------------|------------------------------|---------------|----------------|---|
| B | L  | LX<sub>2</sub> | Displacement<sub>23..3</sub> | 0<sub>3</sub> | 13<sub>5</sub> | D |

### Operation:

PC = PC + Constant

**Execution Units**: Flow Control

Clock Cycles: 13

Exceptions: none

## BAND - Branch if And

BAND CRs, label[BRs]

### **Description:**

Branch if the logical AND operation resulted in a true condition.

### Formats Supported: BR

|      | 31 | 30..29         | 28              |               | 16                            | 15..11               | 10..6         | 5..0           |   |
|------|----|----------------|-----------------|---------------|-------------------------------|----------------------|---------------|----------------|---|
| BAND | 0  | D<sub>12</sub> | BRs<sub>3</sub> | 2<sub>4</sub> | CRs<sub>3</sub>,1<sub>3</sub> | Disp<sub>10..3</sub> | 0<sub>3</sub> | 12<sub>5</sub> | D |

### **Clock Cycles: 13**

## BANDL - Branch if And and Link

BANDL CRs, BRd, label[BRs]

### **Description:**

Branch if the logical AND operation resulted in a true condition. Store the address of the next instruction in BRd.

### Formats Supported: BR

|       | 31 | 30..29         | 28              |               | 16                            | 15..11               | 10..6           | 5..0           |   |
|-------|----|----------------|-----------------|---------------|-------------------------------|----------------------|-----------------|----------------|---|
| BANDL | 0  | D<sub>12</sub> | BRs<sub>3</sub> | 2<sub>4</sub> | CRs<sub>3</sub>,1<sub>3</sub> | Disp<sub>10..3</sub> | BRd<sub>3</sub> | 12<sub>5</sub> | D |

## BCC - Branch if Carry Clear

BCC CRs, label[BRs]

### **Description:**

Branch if the operation resulted in no carry condition.

### Formats Supported: BR

|     | 31 | 30..29         | 28              |               | 16                            | 15..11               | 10..6         | 5..0           |   |
|-----|----|----------------|-----------------|---------------|-------------------------------|----------------------|---------------|----------------|---|
| BCC | 0  | D<sub>12</sub> | BRs<sub>3</sub> | 2<sub>4</sub> | CRs<sub>3</sub>,5<sub>3</sub> | Disp<sub>10..3</sub> | 0<sub>3</sub> | 12<sub>5</sub> | D |

### **Clock Cycles: 13**

## BCCL - Branch if Carry Clear and Link

BCCL CRs, BRd, label[BRs]

### **Description:**

Branch if the operation did not result in a carry condition. Store the address of the next instruction in BRd.

### Formats Supported: BR

|      | 31 | 30..29         | 28              |               | 16                            | 15..11               | 10..6           | 5..0           |   |
|------|----|----------------|-----------------|---------------|-------------------------------|----------------------|-----------------|----------------|---|
| BCCL | 0  | D<sub>12</sub> | BRs<sub>3</sub> | 2<sub>4</sub> | CRs<sub>3</sub>,5<sub>3</sub> | Disp<sub>10..3</sub> | BRd<sub>3</sub> | 12<sub>5</sub> | D |

## BCS - Branch if Carry Set

BCS CRs, label[BRs]

### **Description:**

Branch if the operation resulted in a carry condition.

### Formats Supported: BR

|     | 31 | 30..29         | 28              |               | 16                            | 15..11               | 10..6         | 5..0           |   |
|-----|----|----------------|-----------------|---------------|-------------------------------|----------------------|---------------|----------------|---|
| BCS | 0  | D<sub>12</sub> | BRs<sub>3</sub> | 5<sub>4</sub> | CRs<sub>3</sub>,5<sub>3</sub> | Disp<sub>10..3</sub> | 0<sub>3</sub> | 12<sub>5</sub> | D |

### **Clock Cycles: 13**

## BCSL - Branch if Carry Set and Link

BCSL CRs, BRd, label[BRs]

### **Description:**

Branch if the operation resulted in a carry condition. Store the address of the next instruction in BRd.

### Formats Supported: BR

|      | 31 | 30..29         | 28              |               | 16                            | 15..11               | 10..6           | 5..0           |   |
|------|----|----------------|-----------------|---------------|-------------------------------|----------------------|-----------------|----------------|---|
| BCSL | 0  | D<sub>12</sub> | BRs<sub>3</sub> | 5<sub>4</sub> | CRs<sub>3</sub>,5<sub>3</sub> | Disp<sub>10..3</sub> | BRd<sub>3</sub> | 12<sub>5</sub> | D |

## BEQ -Branch if Equal

BEQ CRs, label[BRs]

### **Description:**

Branch if source operands were equal as a result of a compare operation.

### Formats Supported: BR

|     | 31 | 3029            | 28               |    |        | 16 | 15  | 11                | 10 | 6  | 5   | 0 |   |
|-----|----|-----------------|------------------|----|--------|----|-----|-------------------|----|----|-----|---|---|
| BEQ | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 54 | CRs,0₃ |    | Dis | sp <sub>103</sub> |    | 03 | 125 |   | D |

#### Clock Cycles: 13

## BEQL -Branch if Equal and Link

BEQL CRs, BRd, label[BRs]

### **Description:**

Branch if source operands were equal as a result of a compare operation. Store the address of the next instruction in BRd.

### Formats Supported: BR

|      | 31 | 3029            | 28               |    | 1                  | 16 | 15  | 11                | 10 | 6    | 5   | 0 |   |
|------|----|-----------------|------------------|----|--------------------|----|-----|-------------------|----|------|-----|---|---|
| BEQL | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 54 | CRs,0 <sub>3</sub> |    | Dis | sp <sub>103</sub> |    | BRd₃ | 125 |   | D |

### **Clock Cycles: 13**

## BGE -Branch if Greater Than or Equal

BGE CRs, label[BRs]

### **Description:**

Branch if source operands were greater than or equal as a result of a compare operation.

### Formats Supported: BR

|     | 31 | 3029            | 28               |    | 1                  | 16 | 15  | 11 | 10 | 6  | 5   | 0 |   |
|-----|----|-----------------|------------------|----|--------------------|----|-----|----|----|----|-----|---|---|
| BGE | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 24 | CRs,3 <sub>3</sub> |    | Dis |    |    | 03 | 125 |   | D |

## BGEL -Branch if Greater Than or Equal and Link

BGEL CRs, BRd, label[BRs]

#### **Description:**

Branch if source operands were greater than or equal as a result of a compare operation. The destination address range is  $\pm 2^{31}$  bits. Branch register BR7 may not be used to store the return address as it is a reference to the program counter.

#### Formats Supported: BR

|      | 31 | 3029            | 28               |    | 1      | 6 | 15  | 11 | 10 | 6    | 5   | 0 |   |
|------|----|-----------------|------------------|----|--------|---|-----|----|----|------|-----|---|---|
| BGEL | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 24 | CRs,3₃ |   | Dis |    |    | BRd₃ | 125 |   | D |

#### **Clock Cycles: 13**

### BGT -Branch if Greater Than

BGT CRs, label[BRs]

#### **Description:**

Branch if source operands were greater than as a result of a compare operation.

#### Formats Supported: BR

|     | 31 | 3029            | 28               |    |        | 6 | 15  | 11                | 10 | 6  | 5   | 0 |   |
|-----|----|-----------------|------------------|----|--------|---|-----|-------------------|----|----|-----|---|---|
| BGT | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 24 | CRs,4₃ |   | Dis | sp <sub>103</sub> |    | 03 | 125 |   | D |

#### Clock Cycles: 13

### BGTL -Branch if Greater Than and Link

BGTL CRs, BRd, label[BRs]

#### **Description:**

Branch if source operands were greater than as a result of a compare operation. The destination address range is  $\pm 2^{31}$  bits. Branch register BR7 may not be used to store the return address as it is a reference to the program counter.

### Formats Supported: BR

|      | 31 | 3029            | 28   |    |                    | 6 | 15  | 11                | 10 | 6    | 5   | 0 |   |
|------|----|-----------------|------|----|--------------------|---|-----|-------------------|----|------|-----|---|---|
| BGTL | 0  | D <sub>12</sub> | BRs₃ | 24 | CRs,4 <sub>3</sub> |   | Dis | sp <sub>103</sub> |    | BRd₃ | 125 |   | D |

### BL – Branch and Link

BL BR, label[BR]

## **Description:**

This instruction always jumps to the destination address. The destination address range is  $\pm 2^{31}$  bits. Branch register BR7 may not be used to store the return address as it is a reference to the program counter.

## Formats Supported: BL

|    | 31 | 3029   | 28   | 16                     | 15 | 11 | 10 | 6                | 5               | 0 |   |
|----|----|--------|------|------------------------|----|----|----|------------------|-----------------|---|---|
| BL | L  | $LX_2$ | Disp | lacement <sub>22</sub> | .3 |    |    | BRd <sub>3</sub> | 13 <sub>5</sub> |   | D |

### Operation:

BRd = next PC

PC = PC + Constant

**Execution Units**: Flow Control

**Clock Cycles: 13** 

Exceptions: none

## BLR – Branch to Link Register

BLR label[BR]

### **Description:**

This instruction always jumps to the destination address. The destination address is formed as the contents of BRs added to a 32-bit displacement. This instruction may be used to return from a subroutine. The 'L' bit of the instruction specifies if a limit should be applied for the value in BRs.

### Formats Supported: BL

| BLR | 1 | 02 | BRs <sub>3</sub> | Limit <sub>132</sub> | ~ | Rs1₅ |                 | ~2 | 03 | 135             | L |
|-----|---|----|------------------|----------------------|---|------|-----------------|----|----|-----------------|---|
| BLR | 1 | 12 | BRs <sub>3</sub> | Limit <sub>132</sub> |   | ~4   | CL <sub>3</sub> | ٧  | 03 | 13 <sub>5</sub> | L |

## Operation:

**Execution Units:** Flow Control

**Clock Cycles: 13** 

Exceptions: none

## BLRL - Branch to Link Register and Link

BLRL BRd, label [BRs]

### **Description:**

This instruction always jumps to the destination address. The destination address is formed as the contents of BRs added to a 32-bit displacement or the contents of Rs1. The address of the next instruction is stored in a branch link register BRd. Branch register BR7 may not be used to store the return address as it is a reference to the program counter. The 'L' bit of the instruction specifies if a limit should be applied for the value in BRs.

#### Formats Supported: BL

| BLRL | 1 | 02 | BRs <sub>3</sub> | Limit <sub>132</sub> | ~ | Rs1₅ |                 | ~2 | BRd₃ | 135             | L |
|------|---|----|------------------|----------------------|---|------|-----------------|----|------|-----------------|---|
| BLRL | 1 | 12 | BRs <sub>3</sub> | Limit <sub>132</sub> |   | ~4   | CL <sub>3</sub> | ٧  | BRd₃ | 13 <sub>5</sub> | L |

## Operation:

BRd = next PC

PC = BRs + Constant<sub>32</sub>

OR

PC = BRs + Rs1

**Execution Units:** Flow Control

**Clock Cycles: 13** 

Exceptions: none
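
The operation above can be modeled in a few lines. The following Python sketch is illustrative only: the function name `blrl`, the dictionary standing in for the branch register file, the 4-byte step to the next instruction, and the choice to read BRs before BRd is written are all assumptions, not part of the Qupls3 definition. The limit check selected by the 'L' bit is not modeled.

```python
INSTR_BYTES = 4  # assumption: fixed 32-bit instructions, so next PC = PC + 4

def blrl(pc, br, brd, brs, constant=None, rs1_value=None):
    """Model BLRL: link into BRd, then jump to BRs + constant or BRs + Rs1.

    `br` is a dict mapping branch register numbers to values (hypothetical model).
    """
    if brd == 7:
        # BR7 refers to the program counter and may not hold the return address
        raise ValueError("BR7 may not be used as the link register")
    base = br[brs]                      # assumption: BRs is read before BRd is written
    offset = constant if constant is not None else rs1_value
    br[brd] = pc + INSTR_BYTES          # BRd = next PC
    return base + offset                # new PC
```

For example, with BR1 holding 0x1000, `blrl(0x400, br, 2, 1, constant=0x20)` returns 0x1020 and leaves 0x404 in BR2.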

## BLE -Branch if Less Than or Equal

BLE CRs, label[BRs]

### **Description:**

Branch if source operands were less than or equal as a result of a compare operation.

### Formats Supported: BR

|     | 31 | 3029            | 28   |    | 1      | 6 | 15  | 11               | 10 | 6  | 5   | 0 |   |
|-----|----|-----------------|------|----|--------|---|-----|------------------|----|----|-----|---|---|
| BLE | 0  | D <sub>12</sub> | BRs₃ | 54 | CRs,4₃ |   | Dis | p <sub>103</sub> |    | 03 | 125 |   | D |

### **Clock Cycles: 13**

## BLEL -Branch if Less Than or Equal and Link

BLEL CRs, BRd, label[BRs]

#### **Description:**

Branch if source operands were less than or equal as a result of a compare operation. The destination address range is  $\pm 2^{31}$  bits. Branch register BR7 may not be used to store the return address as it is a reference to the program counter.

### Formats Supported: BR

|      | 31 | 3029            | 28               |    | 1                  | 6 | 15  | 11                | 10 | 6    | 5   | 0 |   |
|------|----|-----------------|------------------|----|--------------------|---|-----|-------------------|----|------|-----|---|---|
| BLEL | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 54 | CRs,4 <sub>3</sub> |   | Dis | sp <sub>103</sub> |    | BRd₃ | 125 |   | D |

### BLT -Branch if Less Than

BLT CRs, label[BRs]

### **Description:**

Branch if source operands were less than as a result of a compare operation.

### Formats Supported: BR

|     | 31 | 3029            | 28   |    |        | 6 | 15  | 11                | 10 | 6  | 5   | 0 |   |
|-----|----|-----------------|------|----|--------|---|-----|-------------------|----|----|-----|---|---|
| BLT | 0  | D <sub>12</sub> | BRs₃ | 54 | CRs,3₃ |   | Dis | sp <sub>103</sub> |    | 03 | 125 |   | D |

### **Clock Cycles: 13**

## BLTL -Branch if Less Than and Link

BLTL CRs, BRd, label[BRs]

### **Description:**

Branch if source operands were less than as a result of a compare operation. The destination address range is  $\pm 2^{31}$  bits. Branch register BR7 may not be used to store the return address as it is a reference to the program counter.

### Formats Supported: BR

|      | 31 | 3029            | 28   |    | 1      | 6 | 15  | 11                | 10 | 6    | 5   | 0 |   |
|------|----|-----------------|------|----|--------|---|-----|-------------------|----|------|-----|---|---|
| BLTL | 0  | D <sub>12</sub> | BRs₃ | 54 | CRs,3₃ |   | Dis | Sp <sub>103</sub> |    | BRd₃ | 125 |   | D |

#### BNAND -Branch if Nand

BNAND CRs, label[BRs]

### **Description:**

Branch if the logical nand operation resulted in a true condition.

## Formats Supported: BR

|       | 31 | 3029            | 28               |    | 1      | 16 | 15  | 11                | 10 | 6  | 5   | 0 |   |
|-------|----|-----------------|------------------|----|--------|----|-----|-------------------|----|----|-----|---|---|
| BNAND | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 54 | CRs,1₃ |    | Dis | sp <sub>103</sub> |    | 03 | 125 |   | D |

### **Clock Cycles: 13**

### BNANDL -Branch if Nand and Link

BNANDL CRs, BRd, label[BRs]

### **Description:**

Branch if the logical nand operation resulted in a true condition. Store the address of the next instruction in BRd.

### Formats Supported: BR

|        | 31 | 3029            | 28               |    | 1                  | 6 | 15  | 11                | 10 | 6    | 5   | 0 |   |
|--------|----|-----------------|------------------|----|--------------------|---|-----|-------------------|----|------|-----|---|---|
| BNANDL | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 54 | CRs,1 <sub>3</sub> |   | Dis | sp <sub>103</sub> |    | BRd₃ | 125 |   | D |

### BNOR -Branch if Nor

BNOR CRs, label[BRs]

### **Description:**

Branch if the logical nor operation resulted in a true condition.

## Formats Supported: BR

|      | 31 | 3029            | 28               |    |                    | 16 | 15  | 11                | 10 | 6  | 5   | 0 |   |
|------|----|-----------------|------------------|----|--------------------|----|-----|-------------------|----|----|-----|---|---|
| BNOR | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 54 | CRs,2 <sub>3</sub> |    | Dis | sp <sub>103</sub> |    | 03 | 125 |   | D |

## Clock Cycles: 13

### BNORL -Branch if Nor and Link

BNORL CRs, BRd, label[BRs]

### **Description:**

Branch if the logical nor operation resulted in a true condition. Store the address of the next instruction in BRd.

### Formats Supported: BR

|       | 31 | 3029            | 28               |    | 1                  | 6 | 15  | 11                | 10 | 6    | 5   | 0 |   |
|-------|----|-----------------|------------------|----|--------------------|---|-----|-------------------|----|------|-----|---|---|
| BNORL | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 54 | CRs,2 <sub>3</sub> |   | Dis | sp <sub>103</sub> |    | BRd₃ | 125 |   | D |

## BNE -Branch if Not Equal

BNE CRs, label[BRs]

### **Description:**

Branch if source operands were not equal as a result of a compare operation.

### Formats Supported: BR

|     | 31 | 3029            | 28               |    | 1      | 6 | 15  | 11               | 10 | 6  | 5   | 0 |   |
|-----|----|-----------------|------------------|----|--------|---|-----|------------------|----|----|-----|---|---|
| BNE | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 24 | CRs,0₃ |   | Dis | p <sub>103</sub> |    | 03 | 125 |   | D |

#### Clock Cycles: 13

## BNEL -Branch if Not Equal and Link

BNEL CRs, BRd, label[BRs]

### **Description:**

Branch if source operands were not equal as a result of a compare operation. The destination address range is  $\pm 2^{31}$  bits. Branch register BR7 may not be used to store the return address as it is a reference to the program counter.

#### Formats Supported: BR

|      | 31 | 3029            | 28   |    | 1      | 6 | 15  | 11                | 10 | 6    | 5   | 0 |   |
|------|----|-----------------|------|----|--------|---|-----|-------------------|----|------|-----|---|---|
| BNEL | 0  | D <sub>12</sub> | BRs₃ | 24 | CRs,0₃ |   | Dis | Sp <sub>103</sub> |    | BRd₃ | 125 |   | D |

### BOR -Branch if Or

BOR CRs, label[BRs]

### **Description:**

Branch if the logical or operation resulted in a true condition.

## Formats Supported: BR

|     | 31 | 3029            | 28               |    |                    | 16 | 15  | 11                | 10 | 6  | 5   | 0 |   |
|-----|----|-----------------|------------------|----|--------------------|----|-----|-------------------|----|----|-----|---|---|
| BOR | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 24 | CRs,2 <sub>3</sub> |    | Dis | sp <sub>103</sub> |    | 03 | 125 |   | D |

### **Clock Cycles: 13**

### BORL -Branch if Or and Link

BORL CRs, BRd, label[BRs]

### **Description:**

Branch if the logical or operation resulted in a true condition. Store the address of the next instruction in BRd.

### Formats Supported: BR

|      | 31 | 3029            | 28               |    | 1                  | 6 | 15  | 11                | 10 | 6    | 5   | 0 |   |
|------|----|-----------------|------------------|----|--------------------|---|-----|-------------------|----|------|-----|---|---|
| BORL | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 24 | CRs,2 <sub>3</sub> |   | Dis | sp <sub>103</sub> |    | BRd₃ | 125 |   | D |

#### BT -Branch Table

BT label[BRs],Limit

#### **Description:**

Branch to a destination calculated as the sum of a branch register and a displacement. The branch register contains an offset from the start of the table. The offset must be greater than zero and less than the limit. If outside of these bounds, then the entry at the table limit is branched to.

#### Formats Supported: BR

| BT | 0 | D <sub>12</sub> | BRs <sub>3</sub> | Limit <sub>132</sub> | D <sub>2</sub> | Disp <sub>103</sub> |    | 0  | 306 |
|----|---|-----------------|------------------|----------------------|----------------|---------------------|----|----|-----|
| BT | 1 | 02              | BRs <sub>3</sub> | Limit <sub>132</sub> | ~2             | Rs1                 | ~2 | 03 | 306 |

**Clock Cycles: 13** 
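
The bounds rule in the description can be sketched as follows. This is a minimal Python model under stated assumptions: the name `bt_target` is hypothetical, `table_base` stands for the address selected by the displacement, and the text does not say whether the offset and limit are counted in bytes or in table entries, so the sketch simply adds them to the base.

```python
def bt_target(table_base, offset, limit):
    """Return the address branched to by a BT-style bounds-checked table branch.

    table_base: address selected by the displacement (assumed start of table)
    offset:     value held in the branch register BRs
    limit:      Limit field of the instruction
    """
    if 0 < offset < limit:
        return table_base + offset
    # out of bounds: fall back to the entry at the table limit
    return table_base + limit
```

An offset of zero, or one equal to or beyond the limit, selects the entry at the limit, which would typically hold a default handler.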

#### BTL -Branch Table and Link

BTL label[BRs],Limit

### **Description:**

Branch to a destination calculated as the sum of a branch register and a displacement. The branch register contains an offset from the start of the table. The offset must be greater than zero and less than the limit. If outside of these bounds, then the entry at the table limit is branched to. The return address is stored in a link register BRd.

#### Formats Supported: BR

| BTL | 0 | D <sub>12</sub> | BRs <sub>3</sub> | Limit <sub>132</sub> | D <sub>2</sub> | Disp <sub>103</sub> |    | BRd <sub>3</sub> | 306             |
|-----|---|-----------------|------------------|----------------------|----------------|---------------------|----|------------------|-----------------|
| BTL | 1 | 02              | BRs <sub>3</sub> | Limit <sub>132</sub> | ~2             | Rs1                 | ~2 | BRd₃             | 30 <sub>6</sub> |

### BVS -Branch if Overflow Set

BVS CRs, label[BRs]

### **Description:**

Branch if the operation resulted in an overflow condition.

## Formats Supported: BR

|     | 31 | 3029            | 28               |    |        | 6 | 15  | 11                | 10 | 6  | 5   | 0 |   |
|-----|----|-----------------|------------------|----|--------|---|-----|-------------------|----|----|-----|---|---|
| BVS | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 54 | CRs,6₃ |   | Dis | sp <sub>103</sub> |    | 03 | 125 |   | D |

### **Clock Cycles: 13**

### BVSL -Branch if Overflow Set and Link

BVSL CRs, BRd, label[BRs]

### **Description:**

Branch if the operation resulted in an overflow condition. Store the address of the next instruction in BRd.

### Formats Supported: BR

|      | 31 | 3029            | 28               |    |                    | 16 | 15  | 11               | 10 | 6    | 5   | 0 |   |
|------|----|-----------------|------------------|----|--------------------|----|-----|------------------|----|------|-----|---|---|
| BVSL | 0  | D <sub>12</sub> | BRs <sub>3</sub> | 54 | CRs,6 <sub>3</sub> |    | Dis | p <sub>103</sub> |    | BRd₃ | 125 |   | D |

# NOP – No Operation

NOP

# **Description:**

This instruction does not perform any operation. Any value for bits 17 to 28 may be used.

## **Instruction Format:**

|     | 31 | 3029   | 28                      | 17 | 16 | 15                    | 11 | 10 | 6 | 5  | 0 |
|-----|----|--------|-------------------------|----|----|-----------------------|----|----|---|----|---|
| NOP | L  | $LX_2$ | Immediate <sub>12</sub> |    | 0  | <b>~</b> <sub>5</sub> |    | 05 |   | 63 |   |

## **System Instructions**

#### BRK - Break

### **Description:**

This instruction may initiate the processor debug routine if the BRK value matches the value set in the debug control register or if the value is zero. A BRK instruction whose value is non-zero and does not match is treated as a NOP. When the break is taken, the processor enters debug mode, the cause code register is set to indicate execution of a BRK instruction, and interrupts are disabled. The program counter is loaded with the vector located using the contents of tvec[3] and instructions begin executing from there. There should be a jump instruction placed at the break vector location. The address of the BRK instruction is stored in the EPC.

The debug BRK register is set to the value specified in the instruction.

Values with the MSB set will also trigger trace.

#### **Instruction Format: SYS**

|     | 31 | 3029 | 28                  | 17 | 16 | 15 | 11 | 10 | 6 | 5 | 0 |
|-----|----|------|---------------------|----|----|----|----|----|---|---|---|
| BRK | 0  | 02   | Value <sub>12</sub> |    | 0  | 05 |    | 05 | 5 | 0 | 6 |

#### Operation:

**PUSH SR** 

**PUSH PC** 

EPC = PC

PC = vector at (tvec[3])

**Execution Units:** Branch

**Clock Cycles:** 

Exceptions: none

### **REX – Redirect Exception**

#### **Description:**

This instruction redirects an exception from an operating mode to a lower operating mode. If successful, the instruction jumps to the target exception handler and does not return. If it fails, execution continues with the next instruction.

This instruction may fail if exceptions are not enabled at the target level.

The location of the target exception handler is found in the trap vector register for that operating mode (tvec[xx]).

The cause (cause) and bad address (badaddr) registers of the originating mode are copied to the corresponding registers in the target mode.

If the 'S' bit of the instruction is set, then the privilege level will be set to either a constant in the PL<sub>8</sub> field or the value in register Rs2. Otherwise the privilege level will remain unchanged.

#### **Instruction Format: EX**

|     | 31 | 3029 | 28 |                 |    | 17               | 16 | 15 | 11         | 10 | 6 | 5  | 0  |
|-----|----|------|----|-----------------|----|------------------|----|----|------------|----|---|----|----|
| REX | 1  | 02   | 12 | Tm <sub>2</sub> |    | PL <sub>70</sub> | S  | ,  | <b>5</b>   | 31 | 5 | 28 | 86 |
| REX | 0  | 02   | 12 | Tm <sub>2</sub> | ~3 | Rs2₅             | S  | ,  | <b>-</b> 5 | 31 | 5 | 28 | 86 |

| Tm <sub>2</sub> |                                       |
|-----------------|---------------------------------------|
| 0               | redirect to user mode                 |
| 1               | redirect to supervisor mode           |
| 2               | redirect to hypervisor mode           |
| 3               | Redirect to machine mode (from debug) |

Clock Cycles: 4

**Execution Units: Branch** 

Example:

REX 1 ; redirect to supervisor handler

; If the redirection failed, exceptions were likely disabled at the target level.

; Continue processing so the target level may complete its operation.

RTE ; redirection failed (exceptions disabled ?)

## Notes:

Since all exceptions are initially handled in machine mode the machine handler must check for disabled lower mode exceptions.
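
The redirect behaviour described above can be sketched in Python. This is a loose model, not the hardware definition: the `csr` dictionary layout, the field names `ie`, `tvec`, `cause` and `badaddr`, and the 4-byte step to the next instruction are assumptions made for illustration. The optional privilege level update controlled by the 'S' bit is left out.

```python
MODES = {0: "user", 1: "supervisor", 2: "hypervisor", 3: "machine"}  # Tm field

def rex(tm, csr, source_mode, pc):
    """Model REX: return the next PC after attempting a redirect to mode `tm`.

    `csr` is a hypothetical dict: csr[mode] -> {"ie", "tvec", "cause", "badaddr"}.
    """
    target = MODES[tm]
    if not csr[target]["ie"]:
        # redirect failed: execution continues with the following instruction
        return pc + 4
    # copy cause and bad address from the originating mode to the target mode
    csr[target]["cause"] = csr[source_mode]["cause"]
    csr[target]["badaddr"] = csr[source_mode]["badaddr"]
    return csr[target]["tvec"]
```

When the function returns `pc + 4`, the machine handler falls through to the instruction after REX, which in the example above is the RTE.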

## SYS - System Call

### **Description:**

Perform a system call. Interrupts are disabled. The program counter is reset to the contents of the vector loaded from tvec[3] and instructions begin executing. There should be a jump instruction placed at the vector location. The address of the instruction following the SYS instruction is pushed onto an internal stack.

### **Instruction Format: SYS**

|     | 31 | 3029 | 28 |                     | 17                   | 16 | 15   | 11                    | 10  | 6 | 5  | 0  |
|-----|----|------|----|---------------------|----------------------|----|------|-----------------------|-----|---|----|----|
| SYS | 1  | 02   | 02 | Imme                | ediate <sub>10</sub> | 2  | Rs   | <b>1</b> <sub>5</sub> | 31  | 5 | 28 | 36 |
| SYS | 0  | 02   | 02 | ~5 Rs2 <sub>5</sub> |                      | ٧  | Rs1₅ |                       | 31₅ |   | 28 | 36 |

## Operation:

PUSH SR onto internal stack
PUSH PC + 4 onto internal stack
PC = tvec[3]

**Execution Units:** Branch

**Clock Cycles:** 

Exceptions: none

# TRAPcc – Trap if Condition Met

## **Description:**

A register, Rs1, is compared to register Rs2 or an immediate value. If the relationship between the registers matches the trap condition, then a trap exception occurs.

**Instruction Format:** R2

|      | 31 | 3029 | 28                  | 17                | 16 | 15   | 11                    | 10  | 6   | 5   | 0  |
|------|----|------|---------------------|-------------------|----|------|-----------------------|-----|-----|-----|----|
| TRAP | 1  | 02   | Immedia             | ite <sub>12</sub> | ~  | Rs   | <b>1</b> <sub>5</sub> | Cor | nd₅ | 28  | 86 |
| TRAP | 0  | 02   | ~ <sub>7</sub> Rs2₅ |                   | ~  | Rs1₅ |                       | Cor | าd₅ | 286 |    |

| Cond<sub>5</sub> | Exception when |
|-------|---------------------------|
| 0     | Rs1 == Rs2                |
| 1     | Rs1 <> Rs2                |
| 2     | Rs1 < Rs2                 |
| 3     | Rs1 <= Rs2                |
| 4     | Rs1 >= Rs2                |
| 5     | Rs1 > Rs2                 |
| 6     | Rs1 < Rs2 (unsigned)      |
| 7     | Rs1 <= Rs2 (unsigned)     |
| 8     | Rs1 >= Rs2 (unsigned)     |
| 9     | Rs1 > Rs2 (unsigned)      |
| 10    |                           |
| 31    | Always trap (system call) |
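
The condition codes in the table can be evaluated as in the following Python sketch. The function names and the 64-bit register width are assumptions made for illustration; codes not listed in the table are rejected rather than guessed at.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit registers

def to_signed(x):
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def trap_condition(cond, rs1, rs2):
    """Return True when the TRAPcc condition from the table is met."""
    s1, s2 = to_signed(rs1), to_signed(rs2)
    u1, u2 = rs1 & MASK64, rs2 & MASK64
    table = {
        0: s1 == s2, 1: s1 != s2,
        2: s1 < s2, 3: s1 <= s2, 4: s1 >= s2, 5: s1 > s2,   # signed compares
        6: u1 < u2, 7: u1 <= u2, 8: u1 >= u2, 9: u1 > u2,   # unsigned compares
        31: True,                                           # always trap
    }
    if cond not in table:
        raise ValueError("condition code not defined in the table")
    return table[cond]
```

The signed and unsigned codes differ only for operands with the top bit set: an all-ones pattern is less than 1 under code 2 but greater than 1 under code 9.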

## Operation:

IF trap condition met

PUSH SR onto internal stack
PUSH PC plus 4 onto internal stack
PC = vector at (tvec[3] + cause\*8)

Clock Cycles: 1

**Execution Units:** Integer ALU

**Exceptions**: bounds check

### Notes:

The system exception handler will typically transfer processing back to a local exception handler.

### Pre/Postfixes and Modifiers

#### **ATOM Modifier**

#### **Description:**

Treat the following sequence of instructions as an "atom". The instruction sequence is executed with interrupts set to the specified interrupt privilege level. Interrupts may be disabled or enabled for up to twelve instructions. The non-maskable interrupt may not be masked. Each bit in the mask represents a subsequent instruction.

Note that since the processor fetches instructions in groups the mask effectively applies to the group. The mask guarantees that at least as many instructions as specified will be masked, but more may be masked depending on group boundaries.

#### **Instruction Format: ATOM**

|      | 31 | 3029 | 28 17              | 16 | 15 | 12 | 11               | 6 | 5               | 0 |
|------|----|------|--------------------|----|----|----|------------------|---|-----------------|---|
| ATOM | 1  | 02   | Mask <sub>12</sub> | ~  | ~4 |    | IPL <sub>6</sub> |   | 60 <sub>6</sub> |   |
|      |    |      |                    |    |    |    |                  |   |                 |   |

#### **Assembler Syntax:**

#### Example:

ATOM "7MMMMM"
LOAD a0,[a3]
SLT t0,a0,a1
PRED t0,~t0,r0,"AAB"
STORE a2,[a3]
LDI a0,1
LDI a0,0

ATOM "6MMM"
LOAD a1,[a3]
ADD t0,a0,a1
MOV a0,a1
STORE t0,[a3]

## **QEXT Prefix**

## **Description:**

This prefix extends the register selection for quad precision. Quad precision operations need to use register pairs to contain a quad precision value. The QEXT prefix specifies the registers used to contain bits 64 to 127 of the quad precision values.

Quad precision values are calculated using the QEXT prefix before the quad precision instruction.

Note that any of 64 registers may be selected.

**Instruction Format: QEXT** 


|      | 31 | 3029 | 28             | 17   | 16 | 15               | 11 | 10 | 6 | 5  | 0                     |
|------|----|------|----------------|------|----|------------------|----|----|---|----|-----------------------|
| QEXT | 0  | 02   | ~ <sub>7</sub> | Rs2₅ | ۲  | Rs1 <sub>5</sub> |    | Rd | 5 | 60 | <b>D</b> <sub>6</sub> |

# PFX[ABCD] – A/B/C/D Immediate Postfix

#### PFXA \$1234

### **Description:**

This instruction supplies immediate constant bits five to N for the preceding instruction, allowing an N-bit constant to be used in place of a register. The first five bits of the constant are specified by the register number field of the instruction. The Wh field of the instruction specifies which register is to be replaced by the constant.

| Wh | Substitute Immediate for: |
|----|---------------------------|
| 0  | Rs1                       |
| 1  | Rs2                       |
| 2  | Rs3                       |
| 3  | Rd                        |

<sup>\*</sup>Only one postfix is supported per instruction.

#### **Instruction Format:**

|     | 31 | 3029            | 28 17                    | 16 | 15 | 8 | 7     | 6 | 5  | 0 |
|-----|----|-----------------|--------------------------|----|----|---|-------|---|----|---|
| PFX | 0  | LX <sub>2</sub> | Immediate <sub>255</sub> |    |    | W | $h_2$ | 6 | 16 |   |
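
How the constant is assembled from the two sources can be sketched in Python. The function name `pfx_constant` is hypothetical, and sign extension of the result is not modeled because the text does not describe it.

```python
def pfx_constant(reg_field, pfx_immediate):
    """Combine a 5-bit register field with postfix bits to form a constant.

    reg_field:     register number field of the host instruction (constant bits 0..4)
    pfx_immediate: immediate carried by the postfix (constant bits 5 and up)
    """
    if not 0 <= reg_field < 32:
        raise ValueError("register field is five bits wide")
    return (pfx_immediate << 5) | reg_field
```

For the constant \$1234, the register field would carry \$14 (the low five bits) and the postfix would carry \$91 (the remaining bits).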

### **PRED Modifier**

### **Description:**

Apply the predicate to following instructions according to a bit mask. The predicate may be applied to a maximum of eight instructions. If the 'Z' bit is set, target register elements are set to zero if not masked. Each byte of the predicate register contains the mask bits for the corresponding instruction.

#### **Instruction Format: PRED**

| 31 | 3029 | 28                  | 17 | 16 | 15 | 11             |   | 10  | 6                | 5  | 0  |
|----|------|---------------------|----|----|----|----------------|---|-----|------------------|----|----|
| 0  | 12   | Mask <sub>154</sub> |    | Z  | Rs | 1 <sub>5</sub> | ۲ | Mas | sk <sub>30</sub> | 60 | 06 |

| Mask Bits | Instruction in PRED Modifier Scope | Rn<sub>8</sub> Bits Tested |
|-----------|------------------------------------|----------------------------|
| 0,1       | Instruction zero                   | 0 to 7                     |
| 2,3       | Instruction one                    | 8 to 15                    |
| 4,5       | Instruction two                    | 16 to 23                   |
| 6,7       | Instruction three                  | 24 to 31                   |
| 8,9       | Instruction four                   | 32 to 39                   |
| 10,11     | Instruction five                   | 40 to 47                   |
| 12,13     | Instruction six                    | 48 to 55                   |
| 14,15     | Instruction seven                  | 56 to 63                   |

| Mask Bit | Meaning                                       |
|----------|-----------------------------------------------|
| 00       | Ignore predicate bit (always execute)         |
| 01       | reserved                                      |
| 10       | Execute only if predicate bit in Rs1 is false |
| 11       | Execute only if predicate bit in Rs1 is true  |
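
The two tables above can be combined into a small decoder. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the lower-numbered mask bit of each pair is taken as the least significant bit of the two-bit field, and a predicate byte is treated as true when any of its bits is set. None of these details is confirmed by the text.

```python
def pred_decisions(mask16, predicate):
    """Return eight booleans: whether each following instruction executes.

    mask16:    16-bit mask from the PRED instruction, two bits per instruction
    predicate: 64-bit value of Rs1, one byte per instruction
    """
    decisions = []
    for n in range(8):
        field = (mask16 >> (2 * n)) & 0b11
        pred_byte = (predicate >> (8 * n)) & 0xFF
        pred_true = pred_byte != 0            # assumption: any set bit counts as true
        if field == 0b00:
            decisions.append(True)            # ignore predicate, always execute
        elif field == 0b10:
            decisions.append(not pred_true)   # execute only if predicate is false
        elif field == 0b11:
            decisions.append(pred_true)       # execute only if predicate is true
        else:
            raise ValueError("mask value 01 is reserved")
    return decisions
```

A mask of 0b100011 requests "execute if true" for instruction zero, "always" for instruction one, and "execute if false" for instruction two.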

### **Assembler Syntax:**

After the instruction mnemonic, the register containing the predicate flags is specified. Next comes a character string containing 'A' for Ra, 'B' for Rb, or 'I' for ignore, with one character for each of the next eight instructions.

### Example:

PRED r2,"TIFIIIII"

; execute one if true, ignore one, next execute if false, one after always execute

MUL r3,r4,r5 ; executes if R2 True

ADD r6,r3,r7 ; always executes

ADD r6,r6,#1234 ; executes if R2 FALSE

DIV r3,r4,r5 ; always executes

### **MPU Hardware**

# **Hardware Description**

### Caches

#### Overview

The core has both instruction and data caches to improve performance. Both caches are single level. The cache is four-way associative. The sizes of the instruction and data caches can be read from one of the info lines returned by the CPUID instruction.

#### Instructions

Since the instruction format affects the cache design it is mentioned here. For this design instructions are of a fixed length being 32 bits in size. Specific formats are listed under the instruction set description section of this book.

#### L1 Instruction Cache

L1 is 32kB in size and made from block RAM with a single cycle of latency. L1 is organized as an odd, even pair of 256 lines of 64 bytes. The following illustration shows the L1 cache organization for Qupls3.



Note that the upper half of the cache line pair is available for each instruction so that constants may be decoded. This propagation of the cache line is not shown on the above diagram to keep it simple.

The cache is organized into odd and even lines to allow instructions to span a cache line. Two cache lines are fetched for every access; the one the instruction is located on, and the next one in case the instruction spans a line.
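
The odd/even arrangement can be illustrated with a short Python sketch. The mapping from address to bank and index shown here is an assumption chosen for simplicity (the low bit of the line number selects the bank); tags and associativity are ignored, and the function name is hypothetical.

```python
LINE_BYTES = 64        # bytes per cache line (from the text)
LINES_PER_BANK = 256   # lines in each of the odd and even banks (from the text)

def lines_for_fetch(address):
    """Return ((bank, index), (bank, index)) for the line holding `address`
    and for the line that follows it. Tags and associativity are ignored."""
    def locate(line_number):
        bank = "odd" if line_number & 1 else "even"
        index = (line_number >> 1) % LINES_PER_BANK
        return bank, index
    line_number = address // LINE_BYTES
    return locate(line_number), locate(line_number + 1)
```

Because consecutive lines always fall in different banks, both can be read in the same cycle. The two banks together hold 2 × 256 × 64 bytes, which is the 32kB stated above.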

A 256-line cache was chosen as that matches the inherent size of the block RAM component in the FPGA. It is the author's opinion that it would be better if the L1 cache were larger because it often misses due to its small size. In short, the current design is an attempt to make it easy for the tools to create a fast implementation.

Note that supporting interrupts and cache misses, a requirement for a realistic processor design, adds complexity to the instruction stream. Reading the cache ram, selecting the correct instruction word and accounting for interrupts and cache misses must all be done in a single clock cycle.

While the L1 cache has single cycle reads it requires two clock cycles to update (write) the cache. The cache line to update needs to be provided by the tag memory which is unknown until after the tag updates.

#### **Fetch Rate**

The fetch rate is four instructions per clock cycle.

### **Data Cache**

The data cache organization is somewhat simpler than that of the instruction cache. Data is cached with a single-level cache because it is not critical that the data be available within a single clock cycle, at least not for the hobby design. Some of the latency of the data cache can be hidden by the presence of non-memory instructions in the instruction queue.



The data cache is organized as 512 lines of 64 bytes (32kB) and implemented with block RAM. Access to the data cache is multicycle. The data cache may be replicated to allow more memory instructions to be processed at the same time; however, just a single cache is in use for the demo system. The policy for stores is write-through: stores always write through to memory, so the latency of a store operation depends on the external memory system. It isn't critical that the cache be able to update in a single cycle, as external memory access is bound to take many more cycles than a cache update. There is only a single write port on the data cache.

#### Cache Enables

The instruction cache is always enabled to keep hardware simpler and faster. Otherwise, an additional multiplexor and control logic would be required in the instruction stream to read from external memory.

For some operations, it may be desirable to disable the data cache so there is a data cache enable bit in control register #0. This bit may be set or cleared with one of the CSR instructions.

### Cache Validation

A cache line is automatically marked as valid when loaded. The entire cache may be invalidated using the CACHE instruction. Invalidating a single line of the cache is not currently implemented, although the ISA provides for it. The cache may also be invalidated due to a write by another core via a snoop bus.

#### Un-cached Data Area

The address range \$F...FDxxxxx is an un-cached 1MB data area. This area is reserved for I/O devices. The data cache may also be disabled in control register zero.

### Return Address Stack Predictor (RSB)

There is an address predictor for return addresses which can, in some cases, eliminate the flushing of the instruction queue when a return instruction is executed. The BLR instruction is detected in the fetch stage of the core and a predicted return address is used to fetch the instructions following the return. The return address stack predictor has a stack depth of 64 entries. On stack overflow or underflow the prediction will be wrong; however, performance will be no worse than not having a predictor. The return address stack predictor checks the address of the instruction queued following the BLR against the address fetched for the BLR instruction to make sure that the address corresponds.
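The behaviour described above can be illustrated with a small software model. This is only a sketch: the class and method names are hypothetical, and the overflow policy (overwrite the oldest entry) is an assumption, since the text states only that overflow or underflow leads to a wrong prediction.

```python
class ReturnAddressStack:
    """Fixed-depth return address predictor (depth 64 per the text)."""

    def __init__(self, depth=64):
        self.depth = depth
        self.entries = [0] * depth
        self.top = 0      # index of the next free slot (wraps around)
        self.count = 0    # number of valid entries, saturates at depth

    def push(self, return_address):
        # Assumed policy: on overflow the oldest entry is overwritten.
        self.entries[self.top] = return_address
        self.top = (self.top + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def pop(self):
        # On underflow there is nothing to predict with.
        if self.count == 0:
            return None
        self.top = (self.top - 1) % self.depth
        self.count -= 1
        return self.entries[self.top]


def prediction_correct(predicted, actual):
    # Models the check that the address queued after the return matches
    # the real return target; a mismatch means the queue is flushed.
    return predicted is not None and predicted == actual
```

A call would push the address of the instruction after the call, and the fetch stage would pop an entry when it detects a BLR.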

### **Branch Predictor**

The branch predictor is a (2, 2) correlating predictor. The branch history is maintained in a 512-entry history table. It has four read ports for predicting branch outcomes, one port for each instruction fetched. The branch predictor may be disabled by a bit in control register zero. When disabled, all branches are predicted as not taken, unless specified otherwise in the branch instruction.
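For readers unfamiliar with the term, a (2, 2) correlating predictor uses the last two branch outcomes to select one of four 2-bit saturating counters. The following is a minimal sketch of that general technique, not of the actual hardware. It assumes that the 512 entries are individual counters grouped four to a row and that rows are indexed by low-order program counter bits; neither detail is stated in the text.

```python
class CorrelatingPredictor:
    """(2, 2) predictor: two bits of global history select one of four
    2-bit saturating counters in the row chosen by the branch address."""

    def __init__(self, counters=512):
        self.rows = counters // 4                 # assumed: four counters per row
        self.table = [[1] * 4 for _ in range(self.rows)]  # start weakly not taken
        self.history = 0                          # last two outcomes, newest in bit 0

    def _row(self, pc):
        return (pc >> 2) % self.rows              # assumed index function

    def predict(self, pc):
        # Counter values 2 and 3 predict taken, 0 and 1 predict not taken.
        return self.table[self._row(pc)][self.history] >= 2

    def update(self, pc, taken):
        row = self.table[self._row(pc)]
        c = row[self.history]
        row[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & 0b11
```

Because the history selects the counter, a strictly alternating branch can be predicted correctly once both history patterns have been trained, which a single 2-bit counter cannot do.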

To conserve hardware the branch predictor uses a fifo that can queue up to four branch outcomes at the same time. Outcomes are removed from the fifo one at a time and used to update the branch history table which has only a single write port. In an earlier implementation of the branch predictor, two write ports were provided on the history table. This turned out to be relatively large compared to its usefulness.

Correctly predicting a branch turns the branch into a single cycle operation. During execution of the branch instruction the address of the following instruction queued is checked against the address depending on the branch outcome. If the address does not match what is expected, then the queue will be flushed, and new instructions loaded from the correct program path.

# Branch Target Buffer (BTB)

The core has a 1k entry branch target buffer for predicting the target address of flow control instructions where the address is calculated and potentially unknown at time of fetch. Instructions covered by the BTB include jump-and-link, interrupt return and breakpoint instructions and branches to targets contained in a register.

# **Decode Logic**

Instruction decode is distributed about the core, although some decodes take place between fetch and the instruction queue. Broad classes of instructions are decoded for the benefit of issue logic, along with register specifications, prior to instruction enqueue. Most of the decodes are done with modules under the decoder folder. Decoding typically involves reducing a wide input into a smaller number of output signals. Other decodes are done at instruction execution time with case statements.

### Placement of Instruction Decode



Limited decode takes place between fetch and queue. Between fetch and queue register specifications are decoded along with general instruction classes for the benefit of issue. A handful of additional signals (like sync) that control the overall operation of the core are also decoded. Much of the instruction decode is actually done in the functional unit. The instruction register is passed right through to the functional units in the core.

# Instruction Queue (ROB)

The instruction queue is a 32-entry re-ordering buffer (ROB). The instruction queue tracks an instruction's progress. Each instruction in the queue may be in one of several different states. The instruction queue is a circular buffer with head and tail pointers. Instructions are queued onto the tail and committed to the machine state at the head. Queue and commit take place in groups of up to four instructions.

# Instruction Queue - Re-order Buffer



The instruction queue is circular with eight slots. Each slot feeds a multiplexor which in turn feeds a functional unit. Providing arguments to the functional unit is done under the control of the issue logic. Output from the functional unit is fed back to the same queue slot that issued to the functional unit.

The queue slots are fed from the fetch buffers.

### Queue Rate

Up to four instructions may queue during the same clock cycle depending on the availability of queue slots.

### Sequence Numbers

The queue maintains a 7-bit instruction sequence number which gives other operations in the core a clue as to the order of instructions. The sequence number is assigned when an instruction queues. Branch instructions need to know when the next instruction has queued to detect branch misses. The program counter cannot be used to determine the instruction sequence because there may be a software loop at work which causes the program counter to cycle backwards even though it's really the next instruction executing.
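Since a 7-bit sequence number wraps around, ordering comparisons have to be made modulo 128. The sketch below shows one common way to do this; the function names are hypothetical and the comparison rule is an assumption rather than a description of the actual logic.

```python
SEQ_BITS = 7
SEQ_MOD = 1 << SEQ_BITS   # 128 distinct sequence numbers


def next_seq(seq):
    """Sequence number assigned to the next instruction queued."""
    return (seq + 1) % SEQ_MOD


def is_older(a, b):
    """True if sequence number a was assigned before b.

    Assumed rule: valid while the two instructions are fewer than
    SEQ_MOD // 2 apart, which holds for a 32-entry queue.
    """
    return a != b and ((b - a) % SEQ_MOD) < SEQ_MOD // 2
```

With this rule, sequence number 126 is treated as older than 2 even though it is numerically larger, which is the situation the program counter cannot resolve in a loop.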

# Input / Output Management

Before getting into memory management a word or two about I/O management is in order. Memory management depends on several I/O devices. I/O in the Qupls3 is memory mapped or MMIO. Ordinary load and store instructions are used to access I/O registers. I/O is mapped as a non-cacheable memory area.

# **Device Configuration Blocks**

I/O devices have a configuration block associated with them that allows the device to be discovered by the OS during bootup. All the device configuration blocks are located in the same 1GB region of memory in the address range \$C0000000 to \$FFFFFFFF. Each device configuration block is aligned on a 16kB boundary. There is thus a maximum of 16k device configuration blocks.

#### Reset

At reset the device configuration blocks are not accessible. They must be mapped into memory for access. However, the devices have default addresses assigned to them, so it may not be necessary to map the device control block into memory before accessing the device. The device itself also needs to be mapped into the memory space for access though.

### Devices Built into the CPU / MPU

Devices present in the CPU itself include:

| Device               | Bus | Device | Func | IRQ Priority | Config Block Address | Default Address |
|----------------------|-----|--------|------|--------------|----------------------|-----------------|
| Interrupt Controller | 0   | 6      | 0    | ~            | \$D0030000           | \$FEE2xxxx      |
| Interval Timers      | 0   | 4      | 0    | 61           | \$D0020000           | \$FEE4xxxx      |
| Memory Region Table  | 0   | 12     | 0    | ~            | \$D0060000           | \$FEEFxxxx      |

# **System Devices**

| Device              | Bus | Device | Func | IRQ | Config Block Address | Default Address |
|---------------------|-----|--------|------|-----|----------------------|-----------------|
| Interrupt Reflector | 0   |        | 0    | ~   | TBD                  | TBD             |
| Interrupt Logger    | 0   |        | 0    | 2   | TBD                  | TBD             |

Function is mapped to address bits 14 to 16

Device is mapped to address bits 17 to 21

Bus is mapped to address bits 22 to 29

# **External Interrupts**

### Overview

External interrupts are interrupts external to the CPU and are usually generated by peripheral devices. External interrupts are usually events occurring asynchronously with respect to software running on a CPU. Qupls3 external interrupts make use of message signaling. Qupls3 does not follow the MSI / MSI-X standard exactly, although it is similar. The goal of Qupls3's MSI is to be frugal with logic resources. Qupls3 MSI Interrupts are signaled by peripheral devices placing an interrupt message on the peripheral slave response bus. This reuses the response bus pathway to the processing core. Slave peripherals do not need to include bus mastering logic that is normally present with MSI-X.

# Interrupt Messages

Interrupt messages are placed on the response bus with an error status indicating an IRQ occurred. The interrupt message identifies the vector number, servicing operating mode, and servicing interrupt controller. This information is stored in a register in the peripheral. An additional 32-bit data word is present in the device to hold extended message information. Qupls3 MSI differs from MSI-X in the storage location of the extended interrupt message information. MSI-X stores this information in the interrupt table whereas Qupls3 stores it in the device. MSI-X requires the device to perform a write operation to the interrupt table, whereas Qupls3 MSI does not. MSI-X interrupts normally specify an I/O address to post to and a 32-bit data word. Unfortunately, in the Qupls3 system there are not enough bits in a 32-bit response bus to mimic MSI-X. The vector number combined with the interrupt controller number take the place of the I/O address. Additional information passed by the interrupt message (in the response address field) identifies the source of the interrupt, the desired priority level, and the software stack required for processing.

# **Interrupt Controller**

The Qupls3 interrupt controller (QIC) is a slave peripheral device that detects interrupt messages occurring on the CPU response bus. It stores the interrupt message in a priority queue. The interrupt vector for the highest priority interrupt is looked up from an internal vector table. Information in the vector determines a list of possible target CPU cores and the software stack that must be available. Either the address of the interrupt subroutine (ISR) or an instruction for the CPU to execute is provided. There may be multiple interrupt controllers in the system. Currently a six-bit controller number is present in the interrupt message, limiting the number of controllers to 62. With 62 interrupt controllers and each one servicing 62 CPU cores, a maximum of approximately 3800 CPU cores may be connected to interrupts.

The interrupt controller has some capacity to detect interrupt overruns. There is a "stuck interrupt" detector which flags an interrupt signal as being stuck if the same interrupt message is posted in a short time-frame. The queue full status flag is also available in the controller allowing software to detect if a queue is full. A full queue may also indicate a stuck interrupt.

There is more detail pertaining to QIC in the QIC device description later in this document.

### Interrupt Vector Table

The interrupt vector table is internal to the interrupt controller. The table is laid out in four sections, one for each available operating mode. There are 2048 (512 in the demo system) vectors available for each operating mode. Note there may be multiple interrupt controllers in the system, and hence multiple vector tables. Which vector table to use is identified in a device control register in the form of specifying an interrupt controller number.

# **Interrupt Group Filter**

There may be more than one CPU core connected to a QIC; up to 62 CPU cores may be connected to a QIC. Note that groups of CPU cores may be specified to handle an interrupt. There is a filter in the MPU that detects the lowest priority CPU core that is ready to handle an interrupt. The information from the QIC about the interrupt is passed to connected CPU cores.

To be ready to handle an interrupt, the current interrupt level of the CPU core must be less than that of the interrupting device, and the CPU core must be operating using the software stack appropriate for the interrupt.
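The readiness rule and the group filter can be expressed as a short model. This is a sketch under assumptions: the `Core` record and its field names are hypothetical, and "lowest priority CPU core" is interpreted here as the ready core with the lowest current interrupt level.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    irq_level: int   # current interrupt level of the core
    sw_stack: int    # index of the software stack currently in use


def is_ready(core, irq_priority, required_stack):
    # Both conditions from the text: the core's level must be below the
    # interrupt's priority and the matching software stack must be in use.
    return core.irq_level < irq_priority and core.sw_stack == required_stack


def select_core(cores, irq_priority, required_stack):
    """Return the ready core with the lowest interrupt level, or None."""
    ready = [c for c in cores if is_ready(c, irq_priority, required_stack)]
    return min(ready, key=lambda c: c.irq_level, default=None)
```

If no core in the group is ready, the sketch returns `None`; what the hardware does in that case is not modeled here.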

# Interrupt Reflector

The interrupt reflector is a peripheral device that allows a bus master to trigger an interrupt. Because interrupts are posted on the response bus for Qupls3, a bus master would not be able to trigger an interrupt directly. The reflector moves a request from the bus master request bus over to the response bus. It can then be detected by the interrupt controller. This allows IPI (inter-processor interrupts) generated by software to be used.

# Interrupt Logger

Logging of interrupts can be useful for the system. It is handy for debugging. The interrupt logger is a peripheral device that monitors the CPU response bus for interrupts (like the QIC) and logs all interrupts to a file in memory. The file can be subsequently processed for system management purposes.

# **Qupls3 Memory Management**

#### Overview

The Qupls3 CPU uses both bounds and paging to manage memory. There is only a single dedicated page mapping table shared between all programs. To prevent two programs from accessing the same map entries, bounds registers are used. Only the map entries within the bounds are accessible to the program.

### Page Table

Qupls3 uses a non-hierarchical memory mapping table which is just a single level deep. Pages mapped are 16kB in size. The table is implemented in a dedicated BRAM memory with 49152 entries. 48k entries is enough to map 768MB of memory; the upper 256MB of memory are not mapped and are dedicated to kernel use.

Each PTE is 32-bits in size. The layout of an entry in the table is as follows:

### **PTE Format**

| 31 | 30 28           | 27 | 26 | 25 24           | 23 22             | 21             | 20 18           | 17 0                |
|----|-----------------|----|----|-----------------|-------------------|----------------|-----------------|---------------------|
| V  | Rgn<sub>3</sub> | M  | A  | AVL<sub>2</sub> | CACHE<sub>2</sub> | U<sub>1</sub>  | RWX<sub>3</sub> | PPN<sub>17..0</sub> |
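As an illustration, the PTE fields can be extracted with shifts and masks. The sketch below takes the bit positions read from the format table (V in bit 31, Rgn in bits 30 to 28, M in 27, A in 26, AVL in 25 to 24, CACHE in 23 to 22, U in 21, RWX in 20 to 18, PPN in 17 to 0). The function names are hypothetical, and forming the physical address by concatenating the PPN with the 14-bit page offset is an assumption based on the 16kB page size.

```python
PAGE_SHIFT = 14  # 16kB pages


def decode_pte(pte):
    """Split a 32-bit PTE into its fields."""
    return {
        "v":     (pte >> 31) & 0x1,
        "rgn":   (pte >> 28) & 0x7,
        "m":     (pte >> 27) & 0x1,
        "a":     (pte >> 26) & 0x1,
        "avl":   (pte >> 24) & 0x3,
        "cache": (pte >> 22) & 0x3,
        "u":     (pte >> 21) & 0x1,
        "rwx":   (pte >> 18) & 0x7,
        "ppn":   pte & 0x3FFFF,
    }


def physical_address(pte, virtual_address):
    # Assumed translation: physical page number followed by the page offset.
    offset = virtual_address & ((1 << PAGE_SHIFT) - 1)
    return (decode_pte(pte)["ppn"] << PAGE_SHIFT) | offset
```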

#### **Bounds**

Base and bound registers are used to define ranges of addresses within the mapping table for a given program. An attempt to access memory outside of the allowed bounds will cause a memory protection fault. There are sixteen bounds registers available, which have typical usage for separate code, data, and stack areas. Which bounds register is used for defining an address range is selected by the upper four bits of the virtual address. Note that base and bounds apply only in user / app mode. Other modes of operation see a flat address space.

### QIC – Qupls3 Interrupt Controller

#### Overview

The Qupls3 system uses message-signaled interrupts (QMSI). QIC snoops the response bus going to the CPU core(s) for interrupt responses. Interrupt responses are stored in priority queues in the controller.

The Qupls3 interrupt controller presents an interrupt signal bus to the CPU core(s). The QIC may be used in a multi-CPU system as a shared interrupt controller. The QIC can guide the interrupt to the specified core(s). The QIC is a 64-bit slave I/O device.

### System Usage

For the demo system there is just a single interrupt controller in the system. However, there may be up to 62 interrupt controllers in a system, numbered 1 to 62. Each interrupt controller may support up to 62 CPU cores, making the total number of CPU cores processing interrupts approximately 3800. QIC supports 63 different priority levels.

The QIC registers are located at an address determined by BAR0 in the configuration space. The interrupt table is located at an address determined by BAR1.

### **Priority Resolution**

Interrupts have a fixed priority relationship, with priority 63 the highest and priority 1 the lowest. As interrupt messages are detected, they are placed in one of 63 small queues according to their priority. The QIC sends the highest priority interrupt in the queues to the CPU. Periodically, once every 64 clock cycles, interrupt priorities are inverted.
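The queueing scheme can be modeled in software as follows. This is a rough sketch only: the class name is hypothetical, the queue depth of four is an assumption (the text says only that the queues are small), dropping a message on overflow is an assumption, and the periodic priority inversion is not modeled.

```python
from collections import deque

NUM_PRIORITIES = 63   # priorities 1 (lowest) to 63 (highest)


class PriorityQueues:
    def __init__(self, depth=4):          # assumed queue depth
        self.depth = depth
        self.queues = {p: deque() for p in range(1, NUM_PRIORITIES + 1)}
        self.overflow = set()             # priorities whose queue overflowed

    def post(self, priority, message):
        q = self.queues[priority]
        if len(q) >= self.depth:
            # Assumed behaviour: the message is dropped and overflow recorded.
            self.overflow.add(priority)
            return False
        q.append(message)
        return True

    def next_interrupt(self):
        # Scan from the highest priority down; first non-empty queue wins.
        for p in range(NUM_PRIORITIES, 0, -1):
            if self.queues[p]:
                return p, self.queues[p].popleft()
        return None
```

Messages of equal priority are delivered in arrival order in this model, since each queue is first-in, first-out.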

# **Config Space**

A 256-byte config space is supported. Most of the config space is unused. The only configuration is for the I/O address of the register set.

| Regno      | Width | R/W | Moniker  | Description           |
|------------|-------|-----|----------|-----------------------|
| 000        | 32    | RO  | REG_ID   | Vendor and device ID  |
| 004        | 32    | R/W |          |                       |
| 008        | 32    | RO  |          |                       |
| 00C        | 32    | R/W |          |                       |
| 010        | 32    | R/W | REG_BAR0 | Base Address Register |
| 014        | 32    | R/W | REG_BAR1 | Base Address Register |
| 018        | 32    | R/W | REG_BAR2 | Base Address Register |
| 01C        | 32    | R/W | REG_BAR3 | Base Address Register |
| 020        | 32    | R/W | REG_BAR4 | Base Address Register |
| 024        | 32    | R/W | REG_BAR5 | Base Address Register |
| 028        | 32    | R/W |          |                       |
| 02C        | 32    | RO  |          | Subsystem ID          |
| 030        | 32    | R/W |          | Expansion ROM address |
| 034        | 32    | RO  |          |                       |
| 038        | 32    | R/W |          | Reserved              |
| 03C        | 32    | R/W |          | Interrupt             |
| 040 to 0FF | 32    | R/W |          | Capabilities area     |

REG\_BAR0 defaults to \$FEE20001 which is used to specify the address of the controller's registers in the I/O address space.

The controller will respond with a memory size request of 0MB (0xFFFFFFFF) when BAR0 is written with all ones. The controller contains its own dedicated memory and does not require memory allocated from the system.

#### Parameters

CFG\_BUS defaults to zero

CFG\_DEVICE defaults to six

CFG\_FUNC defaults to zero

Config parameters must be set correctly. CFG device and vendors default to zero.

### Registers

The QIC contains an interrupt vector table with a maximum of 2048 128-bit vectors available for each of four operating modes. (The number of vectors supported is parameterized). This vector table occupies 128kB of I/O space. An additional 522 registers are spread out through another 8k byte I/O region. All registers are 64-bit and only 64-bit accessible. The interrupt vector table is byte accessible.

| Regno | Access | Moniker | Purpose                                                                              |
|-------|--------|---------|--------------------------------------------------------------------------------------|
| 00    | RW     | UVTB    | Base address for user interrupt vector table                                         |
| 08    | RW     | SVTB    | Base address for supervisor interrupt vector table                                   |
| 10    | RW     | HVTB    | Base address for hypervisor interrupt vector table                                   |
| 18    | RW     | MVTB    | Base address for machine interrupt vector table                                      |
| 20    | RW     | VTL     | Vector table limit                                                                   |
| 28    | RW     | STAT    | Status bits:                                                                         |
|       |        |         | 0: Queue full, set if any queue is full, cleared by software if written with a zero  |
|       |        |         | 1: Set if stuck interrupt detected                                                   |
|       |        |         | 2 to 62: reserved                                                                    |
|       |        |         | 63: Set if an interrupt is being requested                                           |
| 30    | R      | QUEL    | Top output of the priority queues, bits 0 to 63                                      |
| 38    | R      | QUEH    | Top output of the priority queues, bits 64 to 127                                    |
| 40    | R      | EMP     | Queue empty status, one bit for each queue, 1=empty                                  |
| 48    | R      | OVR     | Queue overflow status, one bit for each queue, 1=overflowed                          |
| 380   | RW     | GE      | Bit 0 = global interrupt enable                                                      |
| 390   | RW     | THRES   | Interrupt threshold (0 to 63), IRQ priority must exceed this to be recognized.       |

The CPU affinity group table follows. There are 256 groups that may be set. The interrupt vector references one of these groups to determine which CPU cores should be notified of an interrupt.

| Regno | Access | Moniker | Purpose                                                 |
|-------|--------|---------|---------------------------------------------------------|
| 800   | RW     | AFNx    | CPU group, one bit for each CPU that should be notified |
| •••   | RW     |         | More CPU groups                                         |
| FF8   | RW     |         | Last CPU group                                          |

Interrupt pending and enable tables follow. There are 128 64-bit entries for each table. This is enough to cover up to 2047 interrupts for each of four operating modes. User mode is entries 0 to 31, supervisor mode is entries 32 to 63, hypervisor 64 to 95 and machine 96 to 127.

| Regno | Access | Moniker | Purpose                     |
|-------|--------|---------|-----------------------------|
| 1000  | RW     | IP      | Interrupt enable bits       |
| •••   |        |         | More IE bit registers       |
| 13F8  | RW     | IP      |                             |
| 1400  | RW     | IE      | Interrupt pending bits      |
| •••   |        |         | More interrupt pending bits |
| 17F8  | RW     | IE      |                             |
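Locating the bit for a given interrupt in either table is a matter of simple arithmetic. The sketch below assumes that vector n of a mode occupies bit n mod 64 of entry n div 64 within that mode's block of 32 entries; the bit ordering and the function name are assumptions. Offsets are given relative to the start of the table rather than as absolute register numbers.

```python
MODES = {"user": 0, "supervisor": 1, "hypervisor": 2, "machine": 3}
ENTRIES_PER_MODE = 32   # 32 entries of 64 bits for each operating mode
BITS_PER_ENTRY = 64


def locate_bit(mode, vector):
    """Return (entry index, byte offset within the table, bit number)."""
    entry = MODES[mode] * ENTRIES_PER_MODE + vector // BITS_PER_ENTRY
    return entry, entry * 8, vector % BITS_PER_ENTRY
```

For example, under these assumptions supervisor vector 70 falls in entry 33 at byte offset 0x108, bit 6.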

### **Base Address Fields**

The base address fields default to zero. The address fields are present should the controller be adapted to use main memory instead of dedicated BRAM. The address fields act as an index into the dedicated vector table for the location of the vectors for each operating mode.

### **CPU Affinity Group Table**

This table is an array of groups of CPU cores that should be notified of an interrupt. The interrupt vector selects one of these groups for the group of CPUs to notify. Note that normally only a single CPU core will ultimately be selected to process the interrupt. If bit zero of the CPU group is set, then the interrupt will be broadcast to all CPU cores in the group.

### **Interrupt Enable Bits**

The interrupt enable bit array offers a fast way to enable or disable interrupts without having to update the interrupt vector table. Both the enable bit in the enable bit array and the enable bit in the vector table must be set for an interrupt to be enabled.

### **Interrupt Pending Bits**

Writing a pending bit register clears the bit specified by the write data. If the MSB of the value written is a 1 then the corresponding interrupt is immediately triggered.

### Interrupt Vector Table

The interrupt vector table has a default address of \$FF...FECC0000 to \$FF...FECDFFFF. This address may be changed by altering the BAR1 register in the config space. The interrupt vector table has four consecutive sections to it, one for each CPU operating mode. There are a maximum of 2048 vectors available for each mode. The vector format is as follows:

| 127 112           | 111 104               | 103 101       | 100 98            | 97 | 96 | 95 0                                               |
|-------------------|-----------------------|---------------|-------------------|----|----|----------------------------------------------------|
| Data<sub>16</sub> | CPU group<sub>8</sub> | ~<sub>3</sub> | Swstk<sub>3</sub> | IE | AI | Address<sub>64</sub> or Instruction<sub>96</sub>   |

#### Field Description

AI: This field indicates whether the vector contains an address (0) or an instruction (1)

IE: This field indicates if the interrupt is disabled (0) or enabled (1)

Swstk: This field contains the index of the software stack required to process the interrupt

CPU group: This field is an index into the CPU affinity group table which identifies which processor cores are candidates to receive the interrupt.

Data: This field is populated with data from the interrupt message.
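A vector can be unpacked in software with shifts and masks. The following sketch treats the 128-bit vector as a single integer and uses the bit positions read from the format table (Data in bits 127 to 112, CPU group in 111 to 104, Swstk in 100 to 98, IE in bit 97, AI in bit 96, address or instruction in bits 95 to 0). The function name and the dictionary keys are hypothetical.

```python
def decode_vector(vec):
    """Split a 128-bit interrupt vector (as a Python int) into fields."""
    fields = {
        "data":      (vec >> 112) & 0xFFFF,
        "cpu_group": (vec >> 104) & 0xFF,
        "swstk":     (vec >> 98) & 0x7,
        "ie":        (vec >> 97) & 0x1,
        "ai":        (vec >> 96) & 0x1,
    }
    payload = vec & ((1 << 96) - 1)
    if fields["ai"]:
        fields["instruction"] = payload                 # 96-bit instruction
    else:
        fields["address"] = payload & ((1 << 64) - 1)   # 64-bit ISR address
    return fields
```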

### QIT - Qupls3 Interval Timer

#### Overview

Many systems have at least one timer. The timing device may be built into the CPU, but it is frequently a separate component. The programmable interval timer has many potential uses in the system: it can perform several different timing operations, including pulse and waveform generation, along with measurements. While it is possible to manage timing events strictly through software, doing so is quite challenging; a hardware timer handles the difficult-to-manage timing events and can supply precise timing. Timers are often grouped together in a single component, and in the test system there are two groups of four timers. The QIT is a 64-bit peripheral. While powerful, it turns out to be one of the simpler peripherals in the system.

### System Usage

One programmable timer component, which may include up to 32 timers, is used to generate the system time slice interrupt and timing controls for system garbage collection. The second timer component is used to aid the paged memory management unit. There are free timing channels on the second timer component.

Each QIT is given an 8kB memory range to respond to for I/O access. As is typical for I/O devices, part of the address range is not decoded to conserve hardware.

PIT#1 is located at \$FFFFFFFFFEE40000 to \$FFFFFFFFFEE41FFF

PIT#2 is located at \$FFFFFFFFFEE50000 to \$FFFFFFFFFEE51FFF

### **Config Space**

A 256-byte config space is supported. Most of the config space is unused. The only configuration is for the I/O address of the register set and the interrupt line used.

| Regno      | Width | R/W | Moniker  | Description           |
|------------|-------|-----|----------|-----------------------|
| 000        | 32    | RO  | REG_ID   | Vendor and device ID  |
| 004        | 32    | R/W |          |                       |
| 008        | 32    | RO  |          |                       |
| 00C        | 32    | R/W |          |                       |
| 010        | 32    | R/W | REG_BAR0 | Base Address Register |
| 014        | 32    | R/W | REG_BAR1 | Base Address Register |
| 018        | 32    | R/W | REG_BAR2 | Base Address Register |
| 01C        | 32    | R/W | REG_BAR3 | Base Address Register |
| 020        | 32    | R/W | REG_BAR4 | Base Address Register |
| 024        | 32    | R/W | REG_BAR5 | Base Address Register |
| 028        | 32    | R/W |          |                       |
| 02C        | 32    | RO  |          | Subsystem ID          |
| 030        | 32    | R/W |          | Expansion ROM address |
| 034        | 32    | RO  |          |                       |
| 038        | 32    | R/W |          | Reserved              |
| 03C        | 32    | R/W |          | Interrupt             |
| 040 to 0FF | 32    | R/W |          | Capabilities area     |

REG\_BAR0 defaults to \$FEE40001 which is used to specify the address of the controller's registers in the I/O address space. Note that for additional groups of timers REG\_BAR0 must be changed to point to a different I/O address range. Note also that the core uses only the bits determined by the address mask in the address range comparison. It is assumed that an 8kB page is required for the device, matching the MMU page size.

The controller will respond with a mask of 0xFFFFFFFF when BAR0 is written with all ones.

#### Parameters

CFG\_BUS defaults to zero

CFG\_DEVICE defaults to four

CFG\_FUNC defaults to zero

CFG\_ADDR\_MASK defaults to 0x00FF0000

CFG\_IRQ\_LINE defaults to 29

Config parameters must be set correctly. CFG device and vendors default to zero.

### **Parameters**

NTIMER: This parameter controls the number of timers present. The default is eight. The maximum is 32.

BITS: This parameter controls the number of bits in the counters. The default is 48 bits. The maximum is 64.

PIT\_ADDR: This parameter sets the I/O address that the QIT responds to. The default is \$FEE40001.

PIT\_ADDR\_ALLOC: This parameter determines which bits of the address are significant during decoding. The default is \$00FF0000 for an allocation of 64kB. To compute the address range allocation required, 'or' the value from the register with \$FF000000, complement it then add 1.
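The allocation computation described for PIT\_ADDR\_ALLOC can be worked through in a few lines. This sketch assumes a 32-bit mask register; the function name is hypothetical.

```python
def allocation_size(addr_mask):
    """Apply the stated rule: OR with $FF000000, complement, then add 1."""
    combined = (addr_mask | 0xFF000000) & 0xFFFFFFFF
    # Complement within 32 bits (assumed register width) and add one.
    return ((~combined) & 0xFFFFFFFF) + 1


print(hex(allocation_size(0x00FF0000)))  # 0x10000, i.e. 64kB
```

For the default mask of \$00FF0000 the intermediate values are \$FFFF0000 after the OR and \$0000FFFF after the complement, giving \$10000 bytes, which agrees with the 64kB allocation stated above.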

### Registers

The QIT has 134 registers addressed as 64-bit I/O cells. It occupies 2048 consecutive I/O locations. All registers are read-write except for the current counts, which are read-only. All registers are 64-bit accessible; all 64 bits must be read or written. Values written to registers do not take effect until the synchronization register is written.

Note the core may be configured to implement fewer timers in which case timers that are not implemented will read as zero and ignore writes. The core may also be configured to support fewer bits per count register in which case the unimplemented bits will read as zero and ignore writes.

| Regno     | Access | Moniker | Purpose                                      |
|-----------|--------|---------|----------------------------------------------|
| 00        | R      | CC0     | Current Count                                |
| 08        | RW     | MC0     | Max count                                    |
| 10        | RW     | OT0     | On Time                                      |
| 18        | RW     | CTRL0   | Control                                      |
| 20 to 7F8 |        |         | Groups of four registers for timer #1 to #63 |
| 800       | RW     | USTAT   | Underflow status                             |
| 808       | RZW    | SYNC    | Synchronization register                     |
| 810       | RW     | IE      | Interrupt enable                             |
| 818       | RW     | TMP     | Temporary register                           |
| 820       | RO     | OSTAT   | Output status                                |
| 828       | RW     | GATE    | Gate register                                |
| 830       | RZW    | GATEON  | Gate on register                             |
| 838       | RZW    | GATEOFF | Gate off register                            |

### Control Register

This register contains bits controlling the overall operation of the timer.

| Bit     |    | Purpose                                                                                                                                                                                                                                                                                                                                                  |
|---------|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0       | LD | setting this bit will load max count into current count, this bit automatically resets to zero.                                                                                                                                                                                                                                                          |
| 1       | CE | count enable, if 1 counting will be enabled, if 0 counting is disabled and the current count register holds its value. On counter underflow this bit will be reset to zero causing the count to halt unless auto-reload is set.                                                                                                                          |
| 2       | AR | auto-reload, if 1 the max count will automatically be reloaded into the current count register when it underflows.                                                                                                                                                                                                                                       |
| 3       | XC | external clock, if 1 the counter is clocked by an external clock source. The external clock source must be of lower frequency than the clock supplied to the PIT. The PIT contains edge detectors on the external clock source and counting occurs on the detection of a positive edge on the clock source.  This bit is forced to 0 for timers 4 to 31. |
| 4       | GE | gating enable, if 1 an external gate signal will also be required to be active high for the counter to count, otherwise if 0 the external gate is ignored. Gating the counter using the external gate may allow pulse-width measurement. This bit is forced to 0 for timers 4 to 31.                                                                     |
| 5 to 63 | 2  | not used, reserved                                                                                                                                                                                                                                                                                                                                       |

#### **Current Count**

This register reflects the current count value for the timer. The value in this register changes by counting downwards whenever a count signal is active. The current count may be automatically reloaded at underflow if the auto-reload bit (bit #2) of the control register is set. The current count may also be force loaded with the max count by setting the load bit (bit #0) of the control register.

#### Max Count

This register holds onto the maximum count for the timer. It is loaded by software and otherwise does not change. When the counter underflows the current count may be automatically reloaded from the max count register.

#### On Time

The on-time register determines the output pulse width of the timer. The timer output is low until the on-time value is reached, at which point the timer output switches high. The timer output remains high until the counter reaches zero, at which point the timer output is reset back to zero. So, the on-time reflects the length of time the timer output is high. The timer output is low for max count minus the on-time clock cycles.
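The relationship between max count, on-time, and the output pulse can be sketched as follows. The type and function names are hypothetical, and the sketch ignores any off-by-one cycle at the reload boundary, which the description does not specify.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: cycles the timer output spends low and high
 * during one pass from max count down to zero. */
typedef struct {
    uint64_t low_cycles;
    uint64_t high_cycles;
} qit_pulse_t;

static qit_pulse_t qit_pulse(uint64_t max_count, uint64_t on_time)
{
    /* Assumption: an on-time larger than max count is not meaningful. */
    assert(on_time <= max_count);

    qit_pulse_t p;
    p.high_cycles = on_time;             /* high from on-time down to zero */
    p.low_cycles  = max_count - on_time; /* low until on-time is reached   */
    return p;
}
```

For example, a max count of 1000 with an on-time of 250 gives an output that is low for 750 cycles and high for 250.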

#### **Underflow Status**

The underflow status register contains a record of which timers underflowed.

Writing the underflow register clears the underflows and disables further interrupts where bits are set in the incoming data. Interrupt processing should read the underflow register to determine which timers underflowed, then write the value back to the underflow register.
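The read-then-write-back sequence might look like the following in C. This is a sketch under assumptions: the function name is hypothetical, `qit` is assumed to point at the start of the timer's register block in non-cached I/O space, and the register numbers in the table are taken to be hexadecimal byte offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* USTAT register, as an index in 64-bit cells (offset 0x800 assumed to be
 * a byte offset from the register table). */
#define QIT_USTAT (0x800 / 8)

/* Hypothetical helper: returns a bit mask of the timers that underflowed
 * and acknowledges them by writing the same value back. */
static uint64_t qit_ack_underflows(volatile uint64_t *qit)
{
    assert(qit != NULL);
    uint64_t pending = qit[QIT_USTAT]; /* which timers underflowed   */
    qit[QIT_USTAT] = pending;          /* write back to clear them   */
    return pending;
}
```

An interrupt handler would then test each bit of the returned mask to decide which timers to service.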

### Synchronization Register

The synchronization register allows all the timers to be updated simultaneously. Values written to timer registers do not take effect until the synchronization register is written. The synchronization register must be written with a '1' bit in the bit position corresponding to the timer to update. For instance, writing all ones to the sync register will cause all timers to be updated. The synchronization register is write-only and reads as zero.
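As an illustration of the write-then-synchronize sequence, the sketch below sets up one timer for periodic operation. The function and macro names are hypothetical; the register layout (four 64-bit cells per timer, SYNC at offset 0x808) and the control bits are taken from the tables in this section, with the offsets assumed to be byte offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-timer register indices within a group of four 64-bit cells. */
enum { QIT_CC = 0, QIT_MC = 1, QIT_OT = 2, QIT_CTRL = 3 };

#define QIT_SYNC    (0x808 / 8) /* index in 64-bit cells */
#define QIT_CTRL_LD 0x1u        /* load max count into current count */
#define QIT_CTRL_CE 0x2u        /* count enable */
#define QIT_CTRL_AR 0x4u        /* auto-reload on underflow */

/* Hypothetical helper: program one timer, then make the new values take
 * effect by writing that timer's bit to the synchronization register. */
static void qit_start_periodic(volatile uint64_t *qit, unsigned timer,
                               uint64_t max_count, uint64_t on_time)
{
    assert(qit != NULL && timer < 32u);

    volatile uint64_t *t = qit + 4u * timer;
    t[QIT_MC]   = max_count;
    t[QIT_OT]   = on_time;
    t[QIT_CTRL] = QIT_CTRL_LD | QIT_CTRL_CE | QIT_CTRL_AR;

    qit[QIT_SYNC] = 1ull << timer; /* nothing changes until this write */
}
```

Several timers could be programmed first and then started together by writing a mask with more than one bit set to the synchronization register.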

#### Interrupt Enable Register

Each bit of the interrupt enable register enables the interrupt for the corresponding timer. Interrupts must also be globally enabled by the interrupt enable bit in the config space for interrupts to occur. A '1' bit enables the interrupt, a '0' bit value disables it.

#### Temporary Register

This is merely a register that may be used to hold values temporarily.

#### **Output Status**

The output status register reflects the current status of the timer outputs (high or low). This register is read-only.

#### Gate Register

The internal gate register is used to temporarily halt or resume counting for the timer corresponding to the bit position of this register. Writing a value to this register will turn on all timers where there is a '1' bit in the value and turn off all timers where there is a '0' bit in the value.

### Gate On Register

The internal gate 'on' register is used to resume counting for the timer corresponding to the bit position of this register. Writing a value to this register will turn on all timers where there is a '1' bit in the value. Where there is a '0' in the value the timer will not be affected. This register reads as zero.

### Gate Off Register

The internal gate 'off' register is used to halt counting for the timer corresponding to the bit position of this register. Writing a value to this register will turn off all timers where there is a '1' bit in the value. Where there is a '0' in the value the timer will not be affected. This register reads as zero.
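Because the gate on and gate off registers affect only the timers whose bits are '1', a single timer can be halted or resumed without a read-modify-write of the gate register. A minimal sketch, with hypothetical function names and the register offsets from the table assumed to be byte offsets:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QIT_GATEON  (0x830 / 8) /* index in 64-bit cells */
#define QIT_GATEOFF (0x838 / 8)

/* Hypothetical helper: resume counting on one timer; others unaffected. */
static void qit_resume(volatile uint64_t *qit, unsigned timer)
{
    assert(qit != NULL && timer < 32u);
    qit[QIT_GATEON] = 1ull << timer;
}

/* Hypothetical helper: halt counting on one timer; others unaffected. */
static void qit_halt(volatile uint64_t *qit, unsigned timer)
{
    assert(qit != NULL && timer < 32u);
    qit[QIT_GATEOFF] = 1ull << timer;
}
```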

### **Programming**

The PIT is a memory-mapped I/O device. The PIT is programmed using 64-bit load and store instructions (LDO and STO). Byte loads and stores (LDB, STB) may be used for control register access. The PIT must reside in the non-cached address space of the system.

### Interrupts

The core is configured to use interrupt signal #29 by default. This may be changed with the CFG\_IRQ\_LINE parameter. Interrupts may be globally disabled by writing the interrupt disable bit in the config space with a '1'. Individual interrupts may be enabled or disabled by setting the interrupt enable register in the I/O space.

### **FTA Bus**

### Overview

The FTA bus is an asynchronous bus meaning it does not wait for responses before beginning the next bus cycle. It is a request and response bus. Requests are outgoing from a bus master and incoming to a bus slave. Responses are output by a bus slave and input by a bus master. FTA bus includes standard signals for address, data, and control. These signals should be like those found on many other busses.

# **Bus Tags**

The bus has tagged transactions; there is an id tag associated with each bus transaction. The id tag contains identifiers for the core, channel, and transaction. The core is a core number for a multi-core CPU. Channel selects a particular channel in the core which may for instance be a data channel or an instruction channel. Finally, the transaction id identifies the specific transaction. Incoming responses are matched against transactions that were outgoing. For instance, a bus master may issue a burst request for four bus transactions to fill a cache line. Each transaction will have an id associated with it. When the slave receives the transactions it sends back responses for each of the four requests with ids that match those in the request. The slave does not necessarily send back responses in the same order. Transaction requests from the master may not arrive in order.

\*An id tag of all zeros is illegal – it represents the bus available state.

# Single Cycle

The bus operates on a single cycle basis. Transaction requests and responses are routed through the SoC interconnect network as the bus is available and are present for only a single clock cycle. Bus bridges may buffer the transactions for a short period of time. Generally, requests going out from masters do not need buffering as access to the bus will have been arbitrated before the bus cycle begins. Responses coming back from slaves may need to be buffered as two slaves may respond at the same time. Slaves are not required to arbitrate for the bus.

# Retry

If the bus is unavailable the retry response signal is asserted to the master. The master must retry the transaction.

# Signal Description

Following is a signal description for requests and responses for a 128-bit data version of the bus. Signal values have been chosen so that a value of zero represents a bus idle state. If nothing is on the bus it will be all zeros.

### Requests

| Signal  | Width | Description                                     |
|---------|-------|-------------------------------------------------|
| Om      | 2     | Operating mode                                  |
| Cmd     | 5     | Command for bus controller or memory controller |
| Bte     | 3     | Burst type                                      |
| Cti     | 3     | Cycle type                                      |
| Blen    | 6     | Burst length -1 (0=1 to 63=64)                  |
| Sz      | 4     | Transfer size                                   |
| Segment | 3     | Code, data, or stack                            |
| Cyc     | 1     | Bus cycle is valid                              |
| We      | 1     | Write enable                                    |
| Asid    | 16    | Address space id                                |
| Vadr    | 32/64 | Virtual address                                 |
| Padr    | 32/64 | Physical address                                |
| Sel     | 16    | Byte lane selects                               |
| Data1   | 128   | First data item                                 |
| Data2   | 128   | Second data item (for AMO operations)           |
| Tid     | 13    | Transaction id                                  |
| Csr     | 1     | Clear or set address reservation                |
| Pl      | 8     | Privilege level                                 |
| Pri     | 4     | Transaction priority (higher is better)         |
| Cache   | 4     | Transaction cacheability                        |

### Responses

| Signal | Width         | Description                                 |
|--------|---------------|---------------------------------------------|
| Tid    | 13            | Transaction id                              |
| Stall  | 1             | Stall pipeline                              |
| Next   | 1             | Advance to next transaction                 |
| Ack    | 1             | Request acknowledgement (data is available) |
| Rty    | 1             | Retry transaction                           |
| Err    | 3             | Error code                                  |
| Pri    | 4             | Transaction priority                        |
| Adr    | 32/64         | Physical address                            |
| Dat    | 32/64/128/256 | Response data                               |

### Om

Operating mode, this corresponds to the operating mode of the CPU. Some devices are limited to specific modes.

### Cmd

Command for memory controller. This is how the memory controller knows what to do with the data.

| Ordinal | Command         | Description                                           |
|---------|-----------------|-------------------------------------------------------|
| 0       | CMD_NONE        | No command                                            |
| 1       | CMD_LOAD        | Perform a sign extended data load operation           |
| 2       | CMD_LOADZ       | Perform a zero extended data load operation           |
| 3       | CMD_STORE       | Perform a data store operation                        |
| 4       | CMD_STOREPTR    | Perform a pointer store operation                     |
| 7       | CMD_LEA         | Load the effective address                            |
| 10      | CMD_DCACHE_LOAD | Perform load operation intended for data cache        |
| 11      | CMD_ICACHE_LOAD | Perform load operation intended for instruction cache |
| 13      | CMD_CACHE       | Issue a cache control command                         |
| 16      | CMD_SWAP        | AMO swap operation                                    |
| 18      | CMD_MIN         | AMO min operation                                     |
| 19      | CMD_MAX         | AMO max operation                                     |
| 20      | CMD_ADD         | AMO add operation                                     |
| 22      | CMD_ASL         | AMO left shift operation                              |
| 23      | CMD_LSR         | AMO right shift operation                             |
| 24      | CMD_AND         | AMO and operation                                     |
| 25      | CMD_OR          | AMO or operation                                      |
| 26      | CMD_EOR         | AMO exclusive or operation                            |
| 28      | CMD_MINU        | AMO unsigned minimum operation                        |
| 29      | CMD_MAXU        | AMO unsigned maximum operation                        |
| 31      | CMD_CAS         | AMO compare and swap                                  |
| Others  |                 | reserved                                              |

### BTE

Burst type extension.

| Ordinal |          |
|---------|----------|
| 0       | Linear   |
| 1       | Wrap 4   |
| 2       | Wrap 8   |
| 3       | Wrap 16  |
| 4       | Wrap 32  |
| 5       | Wrap 64  |
| 6       | Wrap 128 |
| 7       | reserved |

### CTI

Cycle type indicator.

| Ordinal | Type    | Comment                   |
|---------|---------|---------------------------|
| 0       | Classic |                           |
| 1       | fixed   | Constant data address     |
| 2       | Incr    | Incrementing data address |
| 3       | erc     | Record errors on write    |
| 4       | Irqa    | Interrupt acknowledge     |
| 7       | Eob     | End of burst              |
| others  |         | reserved                  |

Normally write cycles do not send a response back to the master. The ERC cycle type indicates that the master wants a response back from a write operation.

### Blen

Burst length, this is the number of transactions in the burst minus one. There is a maximum of 64 transactions. With a 128-bit bus this is 1024 bytes of data.
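The encoding and the byte count quoted above can be sketched in C. The function names are hypothetical; only the "length minus one" encoding and the 64-transaction limit come from the text.

```c
#include <assert.h>

/* Hypothetical helper: encode a burst of 1..64 transactions into the
 * 6-bit Blen field (number of transactions minus one). */
static unsigned fta_blen_encode(unsigned transactions)
{
    assert(transactions >= 1u && transactions <= 64u);
    return transactions - 1u;
}

/* Hypothetical helper: bytes moved by a burst for a given data bus width. */
static unsigned fta_burst_bytes(unsigned blen, unsigned bus_bits)
{
    assert(blen <= 63u && bus_bits % 8u == 0u);
    return (blen + 1u) * (bus_bits / 8u);
}
```

For a full-length burst (Blen = 63) on a 128-bit bus this gives 64 × 16 = 1024 bytes.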

### Sz

Transfer size.

| Ordinal | Name  | Transfer size                    |
|---------|-------|----------------------------------|
| 0       | Nul   | Nothing is transferred           |
| 1       | Byt   | A single byte                    |
| 2       | Wyde  | Two bytes                        |
| 3       | Tetra | Four bytes                       |
| 4       | Penta | Five bytes                       |
| 5       | Octa  | Eight bytes                      |
| 6       | Hexi  | Sixteen bytes                    |
| 10      | vect  | A vector, 64 bytes (512-bit bus) |
| Others  |       | Reserved                         |

# Segment

The memory segment associated with the transfer.

| Ordinal |          |
|---------|----------|
| 0       | data     |
| 6       | stack    |
| 7       | code     |
| others  | reserved |

### TID

Transaction ID. This is made up of three fields.

| Size | Use         |
|------|-------------|
| 6    | Core number |
| 3    | Channel     |
| 4    | Tran id     |
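A sketch of packing and unpacking the 13-bit transaction id follows. The function names are hypothetical. Placing the core number in the upper six bits matches the description in the message signaled interrupts section; the relative order of the channel and transaction id fields below it is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: build a 13-bit TID.
 * Assumed layout: [12:7] core number, [6:4] channel, [3:0] tran id. */
static uint16_t fta_tid_pack(unsigned core, unsigned channel, unsigned tranid)
{
    assert(core < 64u && channel < 8u && tranid < 16u);
    return (uint16_t)((core << 7) | (channel << 4) | tranid);
}

/* Hypothetical helper: extract the core number from a TID. */
static unsigned fta_tid_core(uint16_t tid)
{
    return (tid >> 7) & 0x3Fu;
}

/* An id tag of all zeros is illegal; it represents the bus available state. */
static bool fta_tid_is_legal(uint16_t tid)
{
    return (tid & 0x1FFFu) != 0u;
}
```

A master would match the TID of each incoming response against the TIDs of its outstanding requests, since responses are not guaranteed to return in request order.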

### Cache

Cache-ability of transaction. A transaction may be non-cacheable meaning as it progresses through the cache hierarchy it does not store data in the cache. It only stores data when it reaches the final memory destination.

| Ordinal | Name                  | Description                      |
|---------|-----------------------|----------------------------------|
| 0       | NC_NB                 | Non cacheable, non bufferable    |
| 1       | NON_CACHEABLE         |                                  |
| 2       | CACHEABLE_NB          | Cacheable, non bufferable        |
| 3       | CACHEABLE             |                                  |
| 8       | WT_NO_ALLOCATE        | Write-through without allocating |
| 9       | WT_READ_ALLOCATE      |                                  |
| 10      | WT_WRITE_ALLOCATE     |                                  |
| 11      | WT_READWRITE_ALLOCATE |                                  |
| 12      | WB_NO_ALLOCATE        | Write-back without allocating    |
| 13      | WB_READ_ALLOCATE      |                                  |
| 14      | WB_WRITE_ALLOCATE     |                                  |
| 15      | WB_READWRITE_ALLOCATE |                                  |

# Message Signaled Interrupts

FTA bus provides for message signaled interrupts. An MSI interrupt transfers the required information to an interrupt controller without needing a request for it. This trims cycle time off an interrupt request. The interrupt controller constantly snoops the CPU response bus for IRQ requests.

Up to 62 interrupt controllers may be targeted to process interrupt messages. The interrupt table located in the controller specifies which of 62 target CPU cores to notify of the interrupt. Therefore 62 × 62, or about 3800, CPU cores may be easily used for interrupt processing.

There is a response code ('IRQ') on the response bus to support message signaled interrupts. A slave may place an IRQ message on a response bus (the 'err' field) to interrupt the master.

| Signal | Description                                                                                                                                                                      |
|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ack    | This signal indicates a valid response; should be high for MSI                                                                                                                   |
| err    | Value = IRQ                                                                                                                                                                      |
| dat    | Interrupt message data. Typically 32 bits                                                                                                                                        |
| tid    | The coreno (upper 6 bits) should reflect the target core servicing the interrupt. This is an interrupt controller number. The interrupt priority is in the lower 6 bits.         |
| adr    | The 'adr' field of the response indicates the bus/device/function generating the interrupt.                                                                                      |
# Glossary

### **ABI**

An acronym for application binary interface. An ABI is a description of the interface between software and hardware, or between software modules. It includes things like the expected register usage by the compiler. Registers for which the hardware has specific requirements are noted in the ABI; for instance, r0 may always be zero or it may be a usable register. The stack pointer may need to be a specific register. A good ABI helps guarantee that software from multiple sources works together.

### AMO

AMO stands for atomic memory operation. An atomic memory operation typically reads then writes to memory in a fashion that may not be interrupted by another processor. Some examples of AMO operations are swap, add, and, and or. AMO operations are typically passed from the CPU to the memory controller and the memory controller performs the operation.

### Assembler

A program that translates mnemonics and operands into machine code OR a low-level language used by programmers to conveniently translate programs into machine code. Compilers are often capable of generating assembler code as an output.

### **ATC**

ATC stands for address translation cache. This buffer is used to cache address translations for fast memory access in a system with an MMU capable of performing address translations. The address translation cache is more commonly known as the TLB.

### **Base Pointer**

An alternate term for frame pointer. The frame or base pointer is used by high-level languages to access variables on the stack.

### **Burst Access**

A burst access is several bus accesses that occur rapidly in a row in a known sequence. If hardware supports burst access the cycle time for access to the device is drastically reduced. For instance, dynamic RAM memory access is fast for sequential burst access, and somewhat slower for random access.

### **BTB**

An acronym for Branch Target Buffer. The branch target buffer is used to improve the performance of a processing core. The BTB is a table that stores the branch target from previously executed branch instructions. A typical table may contain 1024 entries. The table is typically indexed by part of the branch address. Since the target address of a branch type instruction may not be known at fetch time, the address is speculated to be the address in the branch target buffer. This allows the machine to fetch instructions in a continuous fashion without pipeline bubbles. In many cases the calculated branch address from a previously executed instruction remains the same the next time the same instruction is executed. If the address from the BTB turns out to be incorrect, then the machine will have to flush the instruction queue or pipeline and begin fetching instructions from the correct address.

# **Card Memory**

A card memory is a memory reserved to record the location of pointer stores in a garbage collection system. The card memory is much smaller than main memory; there may be one card memory entry for a block of main memory addresses. Card memory covers memory in 128 to 512-byte sized blocks. Usually, for performance reasons, a byte is dedicated to record the pointer store status even though a bit would be adequate. The location of card memory to update is found by shifting the pointer value to the right some number of bits (7 to 9 bits) and then adding the base address of the table. The update to the card memory needs to be done with interrupts disabled.
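The card lookup described above can be sketched as follows. The function name and the byte value written are hypothetical, the table is treated as an ordinary byte array, and the disabling of interrupts around the update is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: record that a pointer was stored at 'addr'.
 * 'shift' selects the card size: 7 for 128-byte cards, 9 for 512-byte. */
static void card_mark(uint8_t *card_table, size_t table_len,
                      uintptr_t addr, unsigned shift)
{
    assert(card_table != NULL && shift >= 7u && shift <= 9u);

    size_t index = (size_t)(addr >> shift); /* shift the pointer right   */
    assert(index < table_len);
    card_table[index] = 1;                  /* base + index; one byte per card */
}
```

With 128-byte cards (a shift of 7), a pointer store to address 0x280 marks card number 5.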

### Commit

As in the commit stage of a processor. This is the stage where the processor is dedicated or committed to performing the operation. There are no prior outstanding exceptions or flow control changes to prevent the instruction from executing. The instruction may execute in the commit stage, but registers and memory are not updated until the retire stage of the processor.

### **Decimal Floating Point**

Floating point numbers encoded specially to allow processing as decimal numbers. Decimal floating point allows processing everyday decimal numbers, with rounding performed in the same manner as would be done by hand.

### Decode

The stage in a processor where instructions are decoded or broken up into simpler control signals. For instance, there is often a register file write signal that must be decoded from instructions that update the register file.

### Diadic

As in diadic instruction. An instruction with two operands.

### DUT

An acronym for Design Under Test.

### **Endian**

Computing machines are often referred to as big endian or little endian. The endianness of a machine has to do with the order in which bits and bytes are labeled. Little endian machines label bits from right to left with the lowest numbered bit at the right. Big endian machines label bits from left to right with the lowest numbered bit at the left.

### **FIFO**

An acronym standing for 'first-in first-out'. FIFO memories are used to aid data transfer when the rate of data exchange may have momentary differences. Usually when FIFOs transfer data the average data rate for input and output is the same. Data is stored in a buffer in order then retrieved from the buffer in order. UARTs often contain FIFOs.

### **FPGA**

An acronym for Field Programmable Gate Array. FPGA's consist of a large number of small RAM tables, flip-flops, and other logic. These are all connected with a programmable connection network. FPGA's are 'in the field' programmable, and usually re-programmable. An FPGA's re-programmability is typically RAM based. They are often used with configuration PROM's so they may be loaded to perform specific functions.

# **Floating Point**

A means of encoding numbers into binary code to allow processing. Floating point numbers have a range within which numbers may be processed, outside of this range the number will be marked as infinity or zero. The range is usually large enough that it is not a concern for most programs.

### Frame Pointer

A pointer to the current working area on the stack for a function. Local variables and parameters may be accessed relative to the frame pointer. As a program progresses a series of "frames" may build up on the stack. In many cases the frame pointer may be omitted, and the stack pointer used for references instead. Often a register from the general register file is used as a frame pointer.

### HDL

An acronym that stands for 'Hardware Description Language'. A hardware description language is used to describe hardware constructs at a high level.

### HLL

An acronym that stands for "High Level Language"

### Instruction Bundle

A group of instructions. It is sometimes required to group instructions together into a bundle. For instance, all instructions in a bundle may be executed simultaneously on a processor as a unit. Instructions may also need to be grouped if they are an unusual size, for example 41 bits, so that they fit evenly into memory. Typically, a bundle has some bits that are global to the bundle, such as template bits, in addition to the encoded instructions.

### **Instruction Pointers**

A processor register dedicated to addressing instructions in memory. It is also often called a program counter. The program counter got its name because it usually increments (or counts) automatically after an instruction is fetched. In early machines in some rare cases the program counter did not count in a sequential binary fashion, but instead used other forms of a counter such as a grey counter or linear feedback shift register. In some machines the program counter addresses bundles of instructions rather than individual instructions. This is common with some stack machines where multiple instructions are packed into a memory word.

### Instruction Prefix

An instruction prefix applies to the following instruction to modify its operation. An instruction prefix may be used to add more bits to a following immediate constant, or to add additional register fields for the instruction. The prefix essentially extends the number of bits available to encode instructions. An instruction prefix usually locks out interrupts between the prefix and following instruction.

### Instruction Modifier

An instruction modifier is similar to an instruction prefix except that the modifier may apply to multiple following instructions.

### ISA

An acronym for Instruction Set Architecture. The group of instructions that an architecture supports. ISA's are sometimes categorized at extreme edges as RISC or CISC. RTF64 falls somewhere in between with features of both RISC and CISC architectures.

### **IPI**

An acronym for Inter-Processor-Interrupt. An inter-processor interrupt is an interrupt sent from one processor to another.

#### JIT

An acronym standing for Just-In-Time. JIT compilers typically compile segments of a program just before usage, and hence are called JIT compilers.

# **Keyed Memory**

A memory system that has a key associated with each page to protect access to the page. A process must have a matching key in its key list in order to access the memory page. The key is often 20 bits or larger. Keys for pages are usually cached in the processor for performance reasons. The key may be part of the paging tables.

### **Linear Address**

A linear address is the resulting address from a virtual address after segmentation has been applied.

### Machine Code

A code that the processing machine is able to execute. Machine code is the lowest form of code used for processing and is not usually dealt with by programmers except in debugging cases. While it is possible to assemble machine code by hand, usually a tool called an assembler is used for this purpose.

### Milli-code

A short sequence of code that may be used to emulate a higher-level instruction. For instance, a garbage collection write barrier might be written as milli-code. Milli-code may use an alternate link register to return to obtain better performance.

### Monadic

An instruction with just a single operand.

### MSI

An acronym for Message Signaled Interrupt. A message signaled interrupt is an interrupt processed using a message sent to a CPU using in-band resources.

# Opcode

A short form for operation code, a code that determines what operation the processor is going to perform. Instructions are typically made up of opcodes and operands.

# Operand

The data that an opcode operates on, or the result produced by the operation. Operands are often located in registers. Inputs to an operation are referred to as source operands, the result of an operation is a destination operand.

### **Physical Address**

A physical address is the final address seen by the memory system after both segmentation and paging have been applied to a virtual address. One can think of a physical address as one that is "physically" wired to the memory.

# Physical Memory Attributes (PMA)

Memory usually has several characteristics associated with it. In the memory system there may be several different types of memory, rom, static ram, dynamic ram, eeprom, memory mapped I/O devices, and others. Each type of memory device is likely to have different characteristics. These characteristics are called the physical memory attributes. Physical memory attributes are associated with address ranges that the memory is located in. There may be a hardware unit dedicated to verifying software is adhering to the attributes associated with the memory range. The hardware unit is called a physical memory attributes checker (PMA checker).

### PIC

An acronym for Position Independent Code. Position independent code is code that will execute properly no matter where it is located. The code may be moved in memory without needing to be modified.

### **Posits**

An alternate representation of real numbers, proposed as a replacement for IEEE floating-point formats.

### **Program Counter**

A processor register dedicated to addressing instructions in memory. It is also often, and perhaps more aptly, called an instruction pointer. The program counter got its name because it usually increments (or counts) automatically after an instruction is fetched. In some rare cases in early machines the program counter did not count in a sequential binary fashion, but instead used another form of counter such as a Gray-code counter or a linear feedback shift register. In some machines the program counter addresses bundles of instructions rather than individual instructions. This is common with some stack machines where multiple instructions are packed into a memory word.

### **RAT**

An acronym for Register Alias Table. The RAT stores mappings of architectural registers to physical registers.
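The mapping can be sketched as a small table indexed by architectural register number. This is a simplified model under stated assumptions: the class name, the free list, and the policy of allocating a new physical register for every destination are illustrative and do not describe the Qupls3 renamer. Returning registers to the free list at retire is not modelled.

```python
class RegisterAliasTable:
    """Illustrative architectural-to-physical register mapping."""

    def __init__(self, num_arch, num_phys):
        # Initially architectural register i maps to physical register i.
        self.map = list(range(num_arch))
        # The remaining physical registers are free for allocation.
        self.free = list(range(num_arch, num_phys))

    def rename(self, dest, sources):
        """Rename one instruction; return (new dest phys reg, source phys regs)."""
        # Sources are looked up before the destination mapping is changed.
        phys_sources = [self.map[s] for s in sources]
        new_phys = self.free.pop(0)  # raises IndexError when no register is free
        self.map[dest] = new_phys
        return new_phys, phys_sources
```

A later instruction that reads the renamed register picks up the new physical register from the table, which is how the dependency between the two instructions is preserved.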

### Retire

As in retire an instruction. This is the stage in the processor at which the machine state is updated. Updates include the register file and memory. Buffers used for instruction storage are freed.

### **ROB**

An acronym for ReOrder Buffer. The re-order buffer allows instructions to execute out of order yet update the machine's state in order by tracking instruction state and variables. In FT64 the re-order buffer is a circular queue with head and tail pointers. An instruction at the head is committed to the machine's state once it is done, and then the head is advanced. New instructions are queued at the buffer's tail as long as there is room in the queue. Instructions in the queue may be processed out of the order in which they entered the queue, depending on the availability of resources (register values and functional units).
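The circular queue behaviour can be sketched as follows. This is a minimal software model, not the hardware implementation: the class and method names are hypothetical, and each entry tracks only an operation name and a done flag.

```python
class ReorderBuffer:
    """Illustrative circular queue of in-flight instructions."""

    def __init__(self, size):
        self.entries = [None] * size
        self.head = 0   # oldest instruction, next to commit
        self.tail = 0   # next free slot
        self.count = 0  # number of occupied slots

    def enqueue(self, op):
        """Queue an instruction at the tail; return its slot, or None if full."""
        if self.count == len(self.entries):
            return None
        slot = self.tail
        self.entries[slot] = {"op": op, "done": False}
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return slot

    def mark_done(self, slot):
        """Record that execution finished; this may happen in any order."""
        self.entries[slot]["done"] = True

    def commit(self):
        """Commit finished instructions from the head, in program order."""
        retired = []
        while self.count and self.entries[self.head]["done"]:
            retired.append(self.entries[self.head]["op"])
            self.entries[self.head] = None
            self.head = (self.head + 1) % len(self.entries)
            self.count -= 1
        return retired
```

An instruction that finishes early stays in the queue until every older instruction has committed, which is what keeps the machine state update in order.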

### **RSB**

An acronym that stands for return stack buffer. A buffer of addresses used to predict the return address, which increases processor performance. The RSB is usually small, typically 16 entries. When a return instruction is detected at time of fetch, the RSB is accessed to determine the address of the next instruction to fetch. Predicting the return address allows the processing core to continuously fetch instructions in a speculative fashion without bubbles in the pipeline. The return address in the RSB may be found to be incorrect during execution of the return instruction, in which case the pipeline or instruction queue will need to be flushed and instructions fetched from the proper address.
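A return stack buffer can be modelled as a small bounded stack. The sketch below is illustrative only: the names are hypothetical, and discarding the oldest entry on overflow is an assumed policy rather than a documented one.

```python
class ReturnStackBuffer:
    """Illustrative bounded stack of predicted return addresses."""

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []

    def on_call(self, return_address):
        """Push the return address when a call is fetched."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # assumed policy: discard the oldest entry
        self.stack.append(return_address)

    def predict_return(self):
        """Pop the predicted return address; None if the buffer is empty."""
        return self.stack.pop() if self.stack else None


def return_mispredicted(predicted, actual):
    """True when the pipeline must be flushed and fetch redirected."""
    return predicted != actual
```

When calls nest deeper than the buffer depth, the oldest entries are lost and the matching returns fall back to being resolved at execution.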

### **SIMD**

An acronym that stands for 'Single Instruction Multiple Data'. SIMD instructions are usually implemented with extra wide registers. The registers contain multiple data items, such as a 128-bit register containing four 32-bit numbers. The same instruction is applied to all the data items in the register at the same time. For some applications SIMD instructions can enhance performance considerably.
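The 128-bit example can be sketched in software by packing four 32-bit lanes into one integer. The function names are hypothetical, and wrapping each lane on overflow is an assumption; a real instruction set may also offer saturating variants.

```python
LANE_BITS = 32
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four 32-bit values into one 128-bit register image, lane 0 lowest."""
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & LANE_MASK) << (i * LANE_BITS)
    return reg


def unpack(reg):
    """Split a 128-bit register image into its four 32-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add(a, b):
    """Add corresponding lanes; each lane wraps independently at 32 bits."""
    return pack([x + y for x, y in zip(unpack(a), unpack(b))])
```

A carry out of one lane does not spill into the next lane, which is what distinguishes this from a single 128-bit addition.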

### Stack Pointer

A processor register dedicated to addressing stack memory. Sometimes this register is assigned by convention from the general register pool. This register may also sometimes index into a small dedicated stack memory that is not part of the main memory system. Sometimes machines have multiple stack pointers for different purposes, but they all work on the idea of a stack. For instance, in Forth machines there are typically two stacks, one for data and one for return addresses.

### **Telescopic Memory**

A memory system composed of layers in which each layer holds a more simplified summary of the data than the layer above it. At the topmost layer data is represented verbatim. At the bottom layer there may be only a single bit to represent the presence of data. Each layer of the telescopic memory uses far less memory than the layer above. A telescopic memory could be used in garbage collection systems. Normally, however, the extra overhead of updating multiple layers of memory is not warranted.
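One way to picture the layering is with summary bits, where each bit in a lower layer records whether any entry in a group of the layer above is non-zero. The grouping factor `fanout` and the function name are assumptions made for this sketch.

```python
def build_layers(data, fanout=4):
    """Return the list of layers, from verbatim data down to the final summary."""
    layers = [list(data)]
    while len(layers[-1]) > 1:
        upper = layers[-1]
        # One summary bit per group of `fanout` entries in the layer above.
        layers.append(
            [int(any(upper[i:i + fanout])) for i in range(0, len(upper), fanout)]
        )
    return layers
```

With 16 data words and a fanout of 4 this yields layers of 16, 4 and 1 entries. A scan can skip any group whose summary bit is zero, but every write to the top layer may also require updating the layers below it.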

### **TLB**

TLB stands for translation look-aside buffer. This buffer is used to store address translations for fast memory access in a system with an MMU capable of performing address translations.
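The caching role of a TLB can be sketched as a small table consulted before the page table. Everything specific here is an assumption made for illustration: the 4 KiB page size, the capacity, the first-in first-out replacement, and the dictionary standing in for a page table walk. A page that is missing from the page table is not handled.

```python
PAGE_BITS = 12  # assumed 4 KiB pages, for illustration only


class TLB:
    """Illustrative translation cache in front of a page table."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # dict: virtual page -> physical page
        self.capacity = capacity
        self.entries = {}             # dicts keep insertion order
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Translate a virtual address, filling the TLB on a miss."""
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.entries) == self.capacity:
                # Evict the entry that was inserted first.
                del self.entries[next(iter(self.entries))]
            self.entries[vpn] = self.page_table[vpn]  # stands in for a page walk
        return (self.entries[vpn] << PAGE_BITS) | offset
```

Only the first access to a page pays for the page table lookup; later accesses to the same page are served from the buffer.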

### Trace Memory

A memory that traces instructions or data. As instructions are executed the address of the executing instruction is stored in a trace memory. The trace memory may then be dumped to allow debugging of software. The trace memory may compress the storage of addresses by storing branch status (taken or not taken) for consecutive branches rather than storing all addresses. It typically requires only a single bit to store the branch status. However, even when branches are traced, periodically the entire address of the program executing is stored. Often trace buffers support tracing thousands of instructions.
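The compression scheme can be sketched as a stream of records in which most branches contribute a single taken or not-taken bit and a full address is stored periodically. The record format, the `sync_interval` parameter and the function name are hypothetical.

```python
def compress_trace(branches, sync_interval=4):
    """Compress a list of (address, taken) branch events into trace records."""
    records = []
    for i, (address, taken) in enumerate(branches):
        if i % sync_interval == 0:
            # Periodically store the entire address to resynchronise the trace.
            records.append(("addr", address))
        # Every branch contributes one status bit.
        records.append(("bit", int(taken)))
    return records
```

A debugger with access to the program image can replay the status bits from the most recent full address to recover the path that was executed.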

### Triadic

An instruction with three operands.

### **Vector Chaining**

Vector chaining is a form of pipelining used with vector processors. A CPU that supports vector chaining can begin processing additional vector instructions before previous ones are complete. The processing of vector instructions is overlapped.

### Vector Length (VL register)

The vector length register controls the maximum number of elements of a vector that are processed. The vector length register may not be set to a value greater than the number of elements supported by hardware. Vector registers often contain more elements than are required by program code. It would be wasteful to process all elements when only a few are needed. To improve the processing performance only the elements up to the vector length are examined.

### Vector Mask (VM)

A vector mask is used to restrict which elements of a vector are processed during a vector operation. A one bit in a mask register enables processing for the corresponding element; a zero bit disables it. The mask register is commonly set using a vector set operation.
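The combined effect of the vector length and the vector mask can be sketched with an element-wise add. The function name is hypothetical, and leaving unprocessed destination elements unchanged is an assumption; hardware may instead zero them.

```python
def masked_vector_add(a, b, dest, vl, mask):
    """Add a and b for elements below vl whose mask bit is one."""
    result = list(dest)
    for i in range(min(vl, len(a))):
        if (mask >> i) & 1:  # bit i of the mask enables element i
            result[i] = a[i] + b[i]
    return result
```

With a vector length of 3 and a mask of `0b1011`, elements 0 and 1 are added, element 2 is skipped by the mask, and element 3 is never examined because it lies beyond the vector length.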

### Virtual Address

The address before segmentation and paging have been applied. This is the primary type of address a program will work with. Different programs may use the same virtual address range without being concerned about data being overwritten by another program. Although the virtual address may be the same, the final physical addresses used will be different.

### Writeback

A stage in a pipelined processing core where the machine state is updated. Values are 'written back' to the register file.

### Miscellaneous

### **Trademarks**

IBM® is a registered trademark of International Business Machines Corporation. Intel® is a registered trademark of Intel Corporation. HP® is a registered trademark of Hewlett-Packard Development Company. SPARC® is a registered trademark of SPARC International, Inc.

# WISHBONE Compatibility Datasheet

The Qupls3 core now uses the FTA bus, which is not compatible with WISHBONE. Many signals serve a similar function to those on the WISHBONE bus, so they are listed here. A bus bridge is required to interface the FTA bus to WISHBONE, as WISHBONE is a synchronous bus and FTA is asynchronous.

| WISHBONE Datasheet                                                       |                                                        |                 |
|--------------------------------------------------------------------------|--------------------------------------------------------|-----------------|
| WISHBONE SoC Architecture Specification, Revision B.3                    |                                                        |                 |
|                                                                          |                                                        |                 |
| Description:                                                             | Specifications:                                        |                 |
| General Description:                                                     | Central processing unit (CPU core)                     |                 |
| Supported Cycles:                                                        | MASTER, READ / WRITE                                   |                 |
|                                                                          | MASTER, READ-MODIFY-WRITE                              |                 |
|                                                                          | MASTER, BLOCK READ / WRITE, BURST READ (FIXED ADDRESS) |                 |
| Data port, size:                                                         | 128 bit                                                |                 |
| Data port, granularity:                                                  | 8 bit                                                  |                 |
| Data port, maximum operand size:                                         | 128 bit                                                |                 |
| Data transfer ordering:                                                  | Little Endian                                          |                 |
| Data transfer sequencing:                                                | any (undefined)                                        |                 |
| Clock frequency constraints:                                             | tm_clk_i must be >= 10MHz                              |                 |
| Supported signal list and cross reference to equivalent WISHBONE signals | Signal Name:                                           | WISHBONE Equiv. |
|                                                                          | resp.ack_i                                             | ACK_I           |
|                                                                          | req.adr_o(31:0)                                        | ADR_O()         |
|                                                                          | clk_i                                                  | CLK_I           |
|                                                                          | resp.dat(127:0)                                        | DAT_I()         |
|                                                                          | req.dat(127:0)                                         | DAT_O()         |
|                                                                          | req.cyc                                                | CYC_O           |
|                                                                          | req.stb                                                | STB_O           |
|                                                                          | req.wr                                                 | WE_O            |
|                                                                          | req.sel(7:0)                                           | SEL_O           |
|                                                                          | req.cti(2:0)                                           | CTI_O           |
|                                                                          | req.bte(1:0)                                           | BTE_O           |
| Special Requirements:                                                    |                                                        |                 |