[Draw your reader in with an engaging abstract. It is typically a short summary of the document. When you're ready to add your content, just click here and start typing.]

# Thor2023

[Document subtitle]

**Robert Finch** 

# Table of Contents

| RF65000                                        | Error! Bookmark not defined. |
|------------------------------------------------|------------------------------|
| CC - Condition Codes Register Group            | 8                            |
| SR - Status Register                           | 9                            |
| AC – Application Control Register              | 10                           |
| VB – Vector Base Register                      | 12                           |
| Exceptions                                     | 14                           |
| Instruction Descriptions                       | 36                           |
| Major Opcode                                   | 36                           |
| Operand Sizes                                  | 39                           |
| Arithmetic Operations                          | 40                           |
| ABS – Absolute Value                           | 40                           |
| ADC – Add with Carry                           | Error! Bookmark not defined. |
| ADD - Addition                                 | 42                           |
| AND – Bitwise And                              | 43                           |
| BIT – Bitwise And                              | 44                           |
| CMP - Comparison                               | 46                           |
| CSR - Control and Special Registers Operations | 49                           |
| DIVS – Signed Division                         | 50                           |
| DIVU – Unsigned Division                       | 51                           |
| EOR – Bitwise Exclusive Or                     | 52                           |
| LOADQ - Load Quick Immediate                   | 53                           |
| MULS - Multiply Signed                         | 55                           |
| MULU – Unsigned Multiplication                 | 56                           |
| OR – Bitwise Or                                | 59                           |
| SUB - Subtraction                              | 60                           |
| Floating-Point Operations                      | 61                           |
| FCMP - Comparison                              | 61                           |
| Bit Manipulation Operations                    | 72                           |
| BCLR - Clear Bit                               | 76                           |
| BCHG – Change Bit                              | 77                           |
| BPCHG – Change Bit Pair                        | Error! Bookmark not defined. |
| BPCLR – Clear Bit Pair                         | Error! Bookmark not defined. |
| RPTST – Test Rit Pair                          | Error! Bookmark not defined. |

| BSET – Set Bit                           | 78                           |
|------------------------------------------|------------------------------|
| BTST – Test Bit                          | Error! Bookmark not defined. |
| Shift and Rotate Operations              | 82                           |
| ASL – Arithmetic Shift Left              | 82                           |
| ASR – Arithmetic Shift Right             | 83                           |
| LSL – Logical Shift Left                 | 84                           |
| LSR – Logical Shift Right                | 88                           |
| ROL – Rotate Left                        | 89                           |
| ROLC - Rotate Left through Carry         | Error! Bookmark not defined. |
| ROR – Rotate Right                       | 90                           |
| RORC - Rotate Right through Carry        | Error! Bookmark not defined. |
| SWAP – Swap register halves              | Error! Bookmark not defined. |
| Flow Control Instructions                | 93                           |
| ATRAP – Application Trap                 | Error! Bookmark not defined. |
| Bcc - Conditional Branch                 | 94                           |
| BRA – Unconditional Branch               | 96                           |
| BRK – Breakpoint                         | 96                           |
| BSR – Branch to Subroutine               | 96                           |
| DBcc - Decrement and Branch              | 97                           |
| JMP – Jump to Address                    | 98                           |
| JSR – Jump to Subroutine                 | 99                           |
| RTI – Return From Interrupt              | 101                          |
| ARTI - Application Return from Interrupt | Error! Bookmark not defined. |
| RTS – Return from Subroutine             | Error! Bookmark not defined. |
| TRAP – Trap                              | 102                          |
| TRAPV – Trap on Overflow                 | Error! Bookmark not defined. |
| Memory Operations                        | 103                          |
| Flag Updates                             | Error! Bookmark not defined. |
| LOAD Rn, <sea></sea>                     | 109                          |
| LOAD Reglist, <sea></sea>                | Error! Bookmark not defined. |
| PEA <ea></ea>                            | Error! Bookmark not defined. |
| STORE Rn, <dea></dea>                    | 115                          |
| STORE #imm, <dea></dea>                  | Error! Bookmark not defined. |
| STORE Reglist, <dea></dea>               | Error! Bookmark not defined. |

### **Thor2023**

### Nomenclature

The ISA refers to primitive object sizes following the convention suggested by Knuth of using Greek.

| Number of Bits |       | Instructions | Comment          |
|----------------|-------|--------------|------------------|
| 8              | byte  | LDB, STB     | UTF8 usage       |
| 16             | wyde  | LDW, STW     |                  |
| 24             | char  | LDC, STC     | UTF24 usage      |
| 32             | tetra | LDT, STT     |                  |
| 40             | penta | LDP, STP     | Instruction size |
| 64             | octa  | LDO, STO     |                  |
| 96             |       | LDN, STN     |                  |

The register used to address instructions is referred to as the instruction pointer or IP register. The instruction pointer is a synonym for instruction pointer or PC register.

# Endian

Thor2023 is a little-endian machine. The difference between big endian and little endian is in the ordering of bytes in memory. Bits are also numbered from lowest to highest for little endian and from highest to lowest for big endian.

Shown is an example of a 32-bit word in memory.

#### Little Endian:

| Address | 3 | 2 | 1 | 0 |
|---------|---|---|---|---|
| Byte    | 3 | 2 | 1 | 0 |

#### Big Endian:

| Address | 3 | 2 | 1 | 0 |
|---------|---|---|---|---|
| Byte    | 0 | 1 | 2 | 3 |

For Thor2023 the root opcode is in byte zero of the instruction and bytes are shown from right to left in increasing order. As the following table shows.

| ĺ | Address 3 | Address 2 | Address 1 | Address 0 |
|---|-----------|-----------|-----------|-----------|
| I | Byte 3    | Byte 2    | Byte 1    | Byte 0    |

| 31   | 24                 | 23   | 16               | 15  | 8                 | 7 | 5          | 4   | 0                 |
|------|--------------------|------|------------------|-----|-------------------|---|------------|-----|-------------------|
| Cons | stant <sub>8</sub> | Rasj | pec <sub>8</sub> | Rts | spec <sub>8</sub> | S | <b>Z</b> 3 | Opc | code <sub>5</sub> |

# **Block Diagram**



# Programming Model Register File

# Rn – General Purpose Registers

The register file contains 64 96-bit general purpose registers. The register file is *unified*; register may hold integer or floating-point values. The stack pointer is banked with a separate stack pointer for each operation mode.

| Regno | ABI        | ABI Usage                               |
|-------|------------|-----------------------------------------|
| 0     | 0          | Always zero                             |
| 1     | A0         | First argument / return value register  |
| 2     | A1         | Second argument / return value register |
| 3     | A2         | Third argument register                 |
| 4     | A3         | Fourth argument register                |
| 5     | T0         | Temporary register, caller save         |
| 6     | T1         | Temporary register                      |
| 7     | T2         | Temporary register                      |
| 8     | T3         | Temporary register                      |
| 9     | T4         | Temporary register                      |
| 10    | T5         | Temporary register                      |
| 11    | T6         | Temporary register                      |
| 12    | T7         | Temporary register                      |
| 13    | T8         | Temporary register                      |
| 14    | T9         | Temporary register                      |
| 15    | S0         | Saved register, register variables      |
| 16    | S1         | Saved register                          |
| 17    | S2         | Saved register                          |
| 18    | <b>S</b> 3 | Saved register                          |
| 19    | S4         | Saved register                          |
| 20    | S5         | Saved register                          |
| 21    | S6         | Saved register                          |
| 22    | S7         | Saved register                          |
| 23    | S8         | Saved register                          |
| 24    | S9         | Saved register                          |
| 25    | A4         | Fifth Argument register                 |
| 26    | A5         | Argument register                       |
| 27    | A6         | Argument register                       |
| 28    | A7         | Argument register                       |
| 29    | A8         | Argument register                       |
| 30    | A9         | Argument register                       |
| 31    |            |                                         |

| 32 | M0  | Vector mask                                             |  |
|----|-----|---------------------------------------------------------|--|
| 33 | M1  | Vector mask                                             |  |
| 34 | M2  | Vector mask                                             |  |
| 35 | M3  | Vector mask                                             |  |
| 36 | M4  | Vector mask                                             |  |
| 37 | M5  | Vector mask                                             |  |
| 38 | M6  | Vector mask                                             |  |
| 39 | M7  | Vector mask                                             |  |
| 40 |     |                                                         |  |
| 41 |     |                                                         |  |
| 42 |     |                                                         |  |
| 43 |     |                                                         |  |
| 44 |     |                                                         |  |
| 45 |     |                                                         |  |
| 46 |     |                                                         |  |
| 47 |     |                                                         |  |
| 48 |     |                                                         |  |
| 49 |     |                                                         |  |
| 50 |     |                                                         |  |
| 51 |     |                                                         |  |
| 52 | CTP | Card table pointer                                      |  |
| 53 | PC  | Program counter / Status Register                       |  |
| 54 | SC  | Stack canary; a LOAD does CCHK                          |  |
| 55 | LC  | Loop counter                                            |  |
| 56 | LR0 | Subroutine link register #0; branch subroutine specific |  |
| 57 | LR1 | Subroutine link register #1                             |  |
| 58 | LR2 | Subroutine link register #2                             |  |
| 59 | LR3 | Subroutine link register #3                             |  |
| 60 | GP1 | Global Pointer #1                                       |  |
| 61 | GP0 | Global Pointer #0                                       |  |
| 62 | FP  | Frame Pointer                                           |  |
| 63 | SP  | Stack Pointer                                           |  |
| 63 | ASP | Application Stack pointer                               |  |
| 63 | SSP | System Stack pointer                                    |  |

|  | AC | Application Control Register |  |
|--|----|------------------------------|--|
|--|----|------------------------------|--|

# Predicate Registers

The original Thor machine had 16 four-bit dedicated predicate registers. Thor 2023 by contrast stores predicate conditions in general purpose registers. Any GPR may be used to hold values used in predication. Original Thor predicates were a prefix byte containing the predicate register and condition present for every instruction. This has been superseded using the predicate instruction modifier, PRED, which allows up to eight following instructions to be predicated in the same manner. The PRED modifier is more storage efficient than predicating every instruction with predicate bits as most instructions do not require predication.

# Mask Registers (m0 to m7)

Mask registers are used to mask off vector operations so that a vector instruction doesn't perform the operation on all elements of the vector. Vector instructions (loads and stores) that don't explicitly specify a mask register assume the use of mask register zero (m0). Mask registers are a subset of the general-purpose register array, allowing instructions that operate on GPRs to operate on the mask registers.

Thor 2022 had dedicated mask registers leading to additional instructions required to manipulate them.

| Register | Tag | Usage                           |
|----------|-----|---------------------------------|
| m0       | 32  | contains all ones by convention |
| m1       | 33  |                                 |
| m2       | 34  |                                 |
| m3       | 35  |                                 |
| m4       | 36  |                                 |
| m5       | 37  |                                 |
| m6       | 38  |                                 |
| m7       | 39  |                                 |

# Vector Length (VL register)

The vector length register controls how many elements of a vector are processed. The vector length register may not be set to a value greater than the number of elements supported by hardware. After the vector length is set a SYNC instruction should be used to ensure that following instructions will see the updated version of the length register.

Vector length has register tag #87.

| 15 |   | 8 | 7 |                        | 0 |
|----|---|---|---|------------------------|---|
|    | 0 |   |   | Elements <sub>70</sub> |   |

### **Code Address Registers**

Many architectures have registers dedicated to addressing code. Almost every modern architecture has a program counter or instruction pointer register to identify the location of

instructions. Many architectures also have at least one link register or return address register holding the address of the next instruction after a subroutine call. There are also dedicated branch address registers in some architectures. These are all code addressing registers.

The original Thor lumped these registers together in a code address register array.

#### LRn – Link Registers

There are four registers in the Thor2023 architecture reserved for subroutine linkage. These registers are used to store the address of the calling instruction. They may be used to implement fast returns for several levels of subroutines or to used to call milli-code routines. The jump to subroutine, <u>JSR</u>, and branch to subroutine, <u>BSR</u>, instructions update a link register. The return from subroutine,. <u>RTS</u>, instruction may reload the program counter with an offset from the value contained in a link register. Typically, this is used to return to the next instruction.

### PC – Program Counter

This register points to the currently executing instruction. The program counter increments as instructions are fetched, unless overridden by another flow control instruction.

# **LC** - Loop Counter

The loop counter register is used in counted loops along the decrement and branch, <u>DBcc</u>, instruction.

# **SR - Status Register**

The processor status register holds bits controlling the overall operation of the processor. The status register is not accessible in user mode.

| 31  | 24 | 23 | 21 | 20 | 16 | 15 | 13 12 | 11 | 10 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1  | 0 |
|-----|----|----|----|----|----|----|-------|----|------|---|---|---|---|---|---|----|---|
| CPL |    | `  | ,  |    | ~  | T  | OM    |    | IPL  |   |   | D | D | A | R | R' |   |

CPL is the current privilege level the processor is operating at.

T indicates that trace mode is active.

OM processor operating mode.

AR: Address Range indicates the number of address bits in use. 0 = near or short (32-bit) addressing is in use. When short addressing is in use only the low order 32-bit are significant and stored or loaded to or from the stack.

IPL is the interrupt mask level

RT specifies the return type for an RTI instruction.

#### Decimal Mode

Setting the 'D' flag bit 5 in the SR register sets the processor in decimal operating mode. Arithmetic operations will use BCD numbers for both source and destination operands.

Decimal mode, 'D' flag bit 4, may also be applied to floating-point which will use decimal floating-point operations instead of binary.

# **Special Purpose Registers**

### $[U/S/H/M]_IE (0x?004)$

This register contains interrupt enable bits. The register is present at all operating levels. Only enable bits at the current operating level or lower are visible and may be set or cleared. Other bits will read as zero and ignore writes. Only the lower four bits of this register are implemented. The bits have individual bit set / clear capability using the CSRRS, CSRRC instructions.

| 63 |   | 4 | 3   | 2   | 1   | 0   |  |
|----|---|---|-----|-----|-----|-----|--|
|    | ~ |   | mie | hie | sie | uie |  |

#### [U/S/H/M]\_CAUSE (CSR- 0x?006)

This register contains a code indicating the cause of an exception or interrupt. The break handler will examine this code to determine what to do. Only the low order 16 bits are implemented. The high order bits read as zero and are not updateable.

### $[U/S/H/M]_SCRATCH - CSR 0x?041$

This is a scratchpad register. Useful when processing exceptions. There is a separate scratch register for each operating mode.

#### S\_PTBR (CSR 0x1003)

This register contains the base address of the page table, which must be a multiple of 16384. Also included in this register is table parameters depth and type. Register tag #152.

| 95 14                              | 13 12 | 11 8   | 7 6    | 5 4 | 3 | 2 1 | 0    |
|------------------------------------|-------|--------|--------|-----|---|-----|------|
| Page Table Address <sub>6714</sub> | ~2    | Levels | $AL_2$ | ~2  | S | ~   | Type |

Type: 0 = inverted page table, 1 = page table

S: 1=software managed TLB miss, 0 = hardware table walking

Levels are ignored for the inverted page table. For a normal page table gives the top entry level.

AL<sub>2</sub>: TLB entry replacement algorithm, 0=fixed,1=LRU,2=random,3=reserved

### S\_ASID (CSR 0x101F)

This register contains the address space identifier (ASID) or memory map index (MMI). The ASID is used in this design to select (index into) a memory map in the paging tables. Only the low order eight bits of the register are implemented.

### S KEYS (CSR 0x1020 to 0x1027)

These eight registers contain the collection of keys associated with the process for the memory lot system. Each key is twenty-four bits in size. All eight registers are searched in parallel for keys matching the one associated with the memory page. Keyed memory enhances the security and reliability of the system.

|      |  | 23   | 0 |
|------|--|------|---|
| 1020 |  | key0 |   |
| 1021 |  | key1 |   |
|      |  | •••  |   |
| 1027 |  | key7 |   |

#### M\_CORENO (CSR 0x3001)

This register contains a number that is externally supplied on the coreno\_i input bus to represent the hardware thread id or the core number.

### M\_TICK (CSR 0x3002)

This register contains a tick count of the number of clock cycles that have passed since the last reset. Note that this register should not be used for precise timing as the processor's clock frequency may vary for performance and power reasons. The TIME CSR may be used for wall-clock timing as it has its own timing source.

#### M\_SEED (CSR 0x3003)

This register contains a random seed value based on an external entropy collector. The most significant bit of the state is a busy bit.

| 63 60              | 59 16 | 15 0               |
|--------------------|-------|--------------------|
| State <sub>4</sub> | ~44   | seed <sub>16</sub> |

| State <sub>4</sub> Bit |                                                         |
|------------------------|---------------------------------------------------------|
| 0                      | dead                                                    |
| 1                      | test                                                    |
| 2                      | valid, the seed value is valid                          |
| 3                      | Busy, the collector is busy collecting a new seed value |

#### M\_BADADDR (CSR 0x3007)

This register contains the effective address for a load / store operation that caused a memory management exception or a bus error. Note that the address of the instruction causing the exception is available in the EIP register.

### M BAD INSTR (CSR 0x300B)

This register contains a copy of the exceptioned instruction.

#### M SEMA (CSR 0x300C)

This register contains semaphores. The semaphores are shared between all cores in the MPU.

### M\_IOS – IO Select Register (CSR 0x3100)

The location of IO is determined by the contents of the IOS control register. The select is for a 1MB region. This address is a virtual address. The low order 16 bits of this register should be zero and are ignored.

| 63 |                                 | 16 | 15 |          | 0 |
|----|---------------------------------|----|----|----------|---|
|    | Virtual Address <sub>6720</sub> |    |    | $0_{16}$ |   |

### SC - Stack Canary

This special purpose register is available in the general register file as register 54. The stack canary register is used to alleviate issues resulting from buffer overflows on the stack. The canary register contains a random value which remains consistent throughout the run-time of a program.

In the right conditions, the canary register is written to the stack during the function's prolog code. In the function's epilog code, the value of the canary on stack is checked to ensure it is correct, if not a check exception occurs.

### AV – Application Vector Table Address

This register holds the address of the applications vector table. The vector table must be 16-byte aligned.



#### VB – Vector Base Register

The vector base register provides the location of the vector table. The vector table must be octa aligned. On reset the VBR is loaded with zero. There is a separate vector base register for each operating mode.

| 63                                  | 3 | 2 | 10 |
|-------------------------------------|---|---|----|
| Vector Table Address <sub>633</sub> |   | ٧ | ~  |

# **Operating Modes**

The core operates in one of four basic modes: application/user mode, supervisor mode, hypervisor mode or machine mode. Machine mode is switched to when an interrupt or exception occurs, or when debugging is triggered. On power-up the core is running in machine mode. An RTI instruction must be executed to leave machine mode after power-up.

A subset of instructions is limited to machine mode.

| Mode Bits | Mode       |
|-----------|------------|
| 0         | User / App |
| 1         | Supervisor |
| 2         | Hypervisor |
| 3         | Machine    |

Tags

| Tag |            |                                                |  |  |  |  |  |
|-----|------------|------------------------------------------------|--|--|--|--|--|
| 0   | Untagged   | Untagged                                       |  |  |  |  |  |
| 1   | Address Po | Address Pointer – 20 bit size + 64 bit pointer |  |  |  |  |  |
|     | Subtype    |                                                |  |  |  |  |  |
|     | 0          | Unused                                         |  |  |  |  |  |
|     | 1          | Return address                                 |  |  |  |  |  |
|     | 2          | Frame Pointer                                  |  |  |  |  |  |
|     | 3          | Pointer                                        |  |  |  |  |  |
|     | 4 to 7     | Unassigned                                     |  |  |  |  |  |
|     |            |                                                |  |  |  |  |  |

| 2  | Integer 96 bits                                                   |
|----|-------------------------------------------------------------------|
| 3  | Integer 64 - bits                                                 |
| 4  | Integer 32 - bits                                                 |
| 5  | Integer 16 - bits                                                 |
| 6  | Integer 8 - bits                                                  |
| 8  | Float 96 bits                                                     |
| 9  | Float 64 bits                                                     |
| 10 | Float 32 bits                                                     |
| 11 | Float 16-bits                                                     |
| 12 | Float 8-bits                                                      |
| 16 | String Descriptor – 24 bit length, 64 bit virtual address pointer |
| 17 | Character data, three 32-bit characters                           |
| 18 | Character data, four 24-bit characters                            |
| 19 | Character data, 12 8-bit characters                               |
|    |                                                                   |
| 63 | Instructions 40-bit parcels                                       |

# **Exceptions**

# **External Interrupts**

There is little difference between an externally generated exception and an internally generated one. An externally caused exception will set the exception cause code for the currently fetched instruction.

There are eight priority interrupt levels for external interrupts. When an external interrupt occurs the mask level is set to the level of the current interrupt. A subsequent interrupt must exceed the mask level to be recognized.

### Effect on Machine Status

The operating mode is always switched to machine mode on exception. It is up to the machine mode code to redirect the exception to a lower operating mode when desired. Further exceptions at the same or lower interrupt level are disabled automatically. Machine mode code must enable interrupts at some point.

# **Exception Stack**

The status register, program counter, and predicate group register are pushed onto an internal stack when an exception occurs. This stack is at least 16 entries deep to allow for nested interrupts and multiply nested traps and exceptions.

#### **Exception Table**

| Vector   | Usage                                |
|----------|--------------------------------------|
| 0        | Reset value for system stack pointer |
| 1        | Reset value for program counter      |
| 2        | Bus Error                            |
| 3        | Address Error                        |
| 4        | Unimplemented Instruction            |
| 5        |                                      |
| 6        |                                      |
| 7        |                                      |
| 8        | Privilege Violation                  |
| 9        | Instruction trace                    |
| 10       |                                      |
| 11       | Stack Canary                         |
| 12 to 23 | reserved                             |
| 24       | Spurious interrupt                   |
| 25       | Auto vector #1                       |
| 26       | Auto vector #2                       |
| 27       | Auto vector #3                       |
| 28       | Auto vector #4                       |
| 29       | Auto vector #5                       |
| 30       | Auto vector #6                       |
| 31       | Auto vector #7                       |

| 32        | Breakpoint (BRK)   |
|-----------|--------------------|
| 33 to 63  | Trap #1 to 31      |
|           | Applications Usage |
| 64        | Divide by zero     |
| 65        | Overflow           |
| 65 to 511 | Unassigned usage   |
|           |                    |

# Reset

Reset is treated as an exception. The reset routine should exit using an RTI instruction. The status register should be setup appropriately for the return.

The core begins executing instructions at address \$00...00. All registers are in an undefined state.

# Precision

Exceptions in Thor2023 are precise. They are processed according to program order of the instructions. If an exception occurs during the execution of an instruction, then an exception field is set in the pipeline buffer. The exception is processed when the instruction commits which happens in program order. If the instruction was executed in a speculative fashion, then no exception processing will be invoked unless the instruction makes it to the commit stage.

# Memory Management Bank Swapping

About the simplest form of memory management is a single bank register that selects the active memory bank. This is the mechanism used on many early microcomputers. The bank register may be an eight bit I/O port supplying control over some number of upper address bits used to access memory.

# The Page Map

The next simplest form of memory management is a single table map of virtual to physical addresses. The page map is often located in a high speed dedicated memory. An example of a mapping table is the 74LS612 chip. It may map four address bits on the input side to twelve address bits on the output side. This allows a physical address range eight bits greater than the virtual address range. A more complicated page map is something like the MC6829 MMU. It may map 2kB pages in a 2MB physical address space for up to four different tasks.

# Regions

In any processing system there are typically several different types of storage assigned to different physical address ranges. These include memory mapped I/O, MMIO, DRAM, ROM, configuration space, and possibly others. Thor2023 has a region table that supports up to eight separate regions.

The region table is a list of region entries. Each entry has a start address, an end address, an access type field, and a pointer to the PMT, page management table. To determine legal access types, the physical address is searched for in the region table, and the corresponding access type returned. The search takes place in parallel for all eight regions.

Once the region is identified the access rights for a particular page within the region can be found from the PMT corresponding to the region.

# PMA - Physical Memory Attributes Checker

### **Overview**

The physical memory attributes checker is a hardware module that ensures that memory is being accessed correctly according to its physical attributes.

Physical memory attributes are stored in an eight-entry region table. Three bits in the PTE select an entry from this table. The operating mode of the CPU also determines which 32-bit set of attributes to apply for the memory region.

Most of the entries in the table are hard-coded and configured when the system is built. However, they may be modified at the address range \$F...F9F0xxx.

Physical memory attributes checking is applied in all operating modes.

The region table is accessible as a memory mapped IO, MMIO, device.

# **Region Table Description**

| Reg      | Bits |     |                                                                                                        |
|----------|------|-----|--------------------------------------------------------------------------------------------------------|
| 00       | 128  | Pmt | associated PMT address                                                                                 |
| 01       | 128  | cta | Card table address                                                                                     |
| 02       | 128  | at  | Four groups of 32-bit memory attributes, 1 group for each of user, supervisor, hypervisor and machine. |
| 03       | 128  |     | Not used                                                                                               |
| 04 to 1F |      |     | 7 more register sets                                                                                   |

### **PMT Address**

The PMT address specifies the location of the associated PMT.

### CTA – Card Table Address

The card table address is used during the execution of the store pointer, STPTR instruction to locate the card table.

### Attributes

| Bitno |   |                             |  |  |  |  |  |  |  |  |
|-------|---|-----------------------------|--|--|--|--|--|--|--|--|
| 0     | X | may contain executable code |  |  |  |  |  |  |  |  |
| 1     | W | may be written to           |  |  |  |  |  |  |  |  |
| 2     | R | may be read                 |  |  |  |  |  |  |  |  |
| 3     | ~ | reserved                    |  |  |  |  |  |  |  |  |
| 4-7   | С | Cache-ability bits          |  |  |  |  |  |  |  |  |
| 8-10  | G | granularity                 |  |  |  |  |  |  |  |  |
|       |   | G                           |  |  |  |  |  |  |  |  |
|       |   | 0 byte accessible           |  |  |  |  |  |  |  |  |
|       |   | 1 wyde accessible           |  |  |  |  |  |  |  |  |
|       |   | 2 tetra accessible          |  |  |  |  |  |  |  |  |
|       |   | 3 octa accessible           |  |  |  |  |  |  |  |  |
|       |   | 4 hexi accessible           |  |  |  |  |  |  |  |  |
|       |   | 5 to 7 reserved             |  |  |  |  |  |  |  |  |
| 11    | ~ | reserved                    |  |  |  |  |  |  |  |  |

| 12-14 | S | number of times to shift address to right and store for telescopic STPTR stores. |
|-------|---|----------------------------------------------------------------------------------|
| 16-23 | T | device type (rom, dram, eeprom, I/O, etc)                                        |
| 24-31 | ~ | reserved                                                                         |

# Page Management Table - PMT

### **Overview**

For the first translation of a virtual to physical address, after the physical page number is retrieved from the TLB, the region is determined, and the page management table is referenced to obtain the access rights to the page. PMT information is loaded into the TLB entry for the page translation. The PMT contains an assortment of information most of which is managed by software. Pieces of information include the key needed to access the page, the privilege level, and read-write-execute permissions for the page. The table is organized as rows of access rights table entries (PMTEs). There are as many PMTEs as there are pages of memory in the region.

For subsequent virtual to physical address translations PMT information is retrieved from the TLB.

As the page is accessed in the TLB, the TLB may update the PMT.

### Location

The page management table is in main memory and may be accessed with ordinary load and store instructions. The PMT address is specified by the region table.

The PMT is implemented as a dual-read-write port RAM that allows hardware to update it at high speed during a memory access. The current PMT can provide information for 16384 pages. These pages may be DRAM, ROM, MMIO or other types.

# **PMTE Description**

There is a wide assortment of information that goes in the page management table. To accommodate all the information an entry size of 128-bits was chosen.

Page Management Table Entry

| V | N                          | M      | ~9                | C | Е | $AL_2$ | ~16                       |  |  |  |  |  |
|---|----------------------------|--------|-------------------|---|---|--------|---------------------------|--|--|--|--|--|
|   |                            |        | ACL <sub>16</sub> |   |   |        | Share Count <sub>16</sub> |  |  |  |  |  |
|   | Access Count <sub>32</sub> |        |                   |   |   |        |                           |  |  |  |  |  |
|   |                            | $PL_8$ |                   |   |   |        | Key <sub>24</sub>         |  |  |  |  |  |

### **Access Control List**

The ACL field is a reference to an associated access control list.

### **Share Count**

The share count is the number of times the page has been shared to processes. A share count of zero means the page is free.

### **Access Count**

This part uses the term 'access count' to refer to the number of times a page is accessed. This is usually called the reference count, but that phrase is confusing because reference counting may also refer to share counts. So, the phrase 'reference count' is avoided. Some texts use the term reference count to refer to the share count. Reference counting is used in many places in software and refers to the number of times something is referenced.

Every time the page of memory is accessed, the access count of the page is incremented. Periodically the access count is aged by shifting it to the right one bit.

The access count may be used by software to help manage the presence of pages of memory.

# Key

The access key is a 24-bit value associated with the page and present in the key ring of processes. The keyset is maintained in the keys CSRs. The key size of 20 bits is a minimum size recommended for security purposes. To obtain access to the page it is necessary for the process to have a matching key OR if the key to match is set to zero in the PMTE then a key is not needed to access the page.

# **Privilege Level**

The current privilege level is compared with the privilege level of the page, and if access is not appropriate then a privilege violation occurs. For data access, the current privilege level must be at least equal to the privilege level of the page. If the page privilege level is zero anybody can access the page.

### N

indicates a conforming page of executable code. Conforming pages may execute at the current privilege level. In which case the PL field is ignored.

### M

indicates if the page was modified, written to, since the last time the M bit was cleared. Hardware sets this bit during a write cycle.

### $\mathbf{E}$

indicates if the page is encrypted.

### AL

indicates the compression algorithm used.

### C

The C indicator bit indicates if the page is compressed.

# Page Tables

### **Intro**

Page tables are part of the memory management system used map virtual addresses to real physical addresses. There are several types of page tables. Hierarchical page tables are probably the most common. Almost all page tables map only the upper bits of a virtual address, called a page. The lower bits of the virtual address are passed through without being altered. The page size often 4kB which means the low order 12-bits of a virtual address will be mapped to the same 12-bits for the physical address.

# **Hierarchical Page Tables**

Hierarchical page tables organize page tables in a multi-level hierarchy. They can map the entire virtual address range. At the topmost level a register points to a page directory, that page directory points to a page directory at a lower level until finally a page directory points to a page containing page table entries. To map an entire 64-bit virtual address range approximately five levels of tables are required.

# Paged MMU Mapping



### **Inverted Page Tables**

An inverted page table is a table used to store address translations for memory management. The idea behind an inverted page table is that there is a fixed number of pages of memory no matter how it is mapped. It should not be necessary to provide for a map of every possible address, only addresses that correspond to real pages of memory. Each page of memory can be allocated only once. It is either allocated or it is not. Compared to a non-inverted paged memory management system where tables are used to map potentially the entire address space an inverted page table uses less memory. There is typically only a single inverted page table supporting all applications

in the system. This is a different approach than a non-inverted page table which may provide separate page tables for each process.

# The Simple Inverted Page Table

The simplest inverted page table contains only a record of the virtual address mapped to the page, and the index into the table is used as the physical page number. There are only as many entries in the inverted page table as there are physical pages of memory. A translation can be made by scanning the table for a matching virtual address, then reading off the value of the table index. The attraction of an inverted page table is its small size compared to the typical hierarchical page table. Unfortunately, the simplest inverted page table is not practical when there are thousands or millions of pages of memory. It simply takes too long to scan the table. The alternative solution to scanning the table is to hash the virtual address to get a table index directly.

# Inverted Page Table

Entry number identifies physical page number



# **Hashed Page Tables**

#### Hashed Table Access

Hashes are great for providing an index value immediately. The issue with hash functions is that they are just a hash. It is possible that two different virtual address will hash to the same value. What is then needed is a way to deal with these hash collisions. There are a couple of different methods of dealing with collisions. One is to use a chain of links. The chain has each link in the chain pointing the to next page table entry to use in the event of a collision. The hash page table is slightly more complicated then as it needs to store links for hash chains. The second method is to use open addressing. Open addressing calculates the next page table entry to use in the event of a collision. The calculation may be linear, quadratic or some other function dreamed up. A linear probe simply chooses the next page table entry in succession from the previous one if no match occurred. Quadratic probing calculates the next page table entry to use based on squaring the count of misses.

# **Shared Memory**

Another memory management issue to deal with is shared memory. Sometimes applications share memory with other apps for communication purposes, and to conserve memory space where there

are common elements. With a hierarchical paged memory management system, it is easy to share memory, just modify the page table entry to point to the same physical memory as is used by another process. With an inverted page table having only a single entry for each physical page is not sufficient to support shared memory. There needs to be multiple page table entries available for some physical pages but not others because multiple virtual addresses might map to the same physical address. One solution would be to have multiple buckets to store virtual addresses in for each physical address. However, this would waste a lot of memory because much of the time only a single mapped address is needed. There must be a better solution. Rather than reading off the table index as the physical page number, the association of the virtual and physical address can be stored. Since we now need to record the physical address multiple times the simple mechanism of using the table index as the physical page number cannot be used. Instead, the physical page number needs to be stored in the table in addition to the virtual page number.

That means a table larger than the minimum is required. A minimally sized table would contain only one entry for each physical page of memory. So, to allow for shared memory the size of the table is doubled. This smells like a system configuration parameter.

# **Specifics: Thor2023 Page Tables**

# **Thor2023 Hash Page Table Setup**

#### Hash Page Table Entries - HPTE

We have determined that a page table entry needs to store both the physical page number and the virtual page number for the translations. To keep things simple, the page table stores only the information needed to perform an address translation. Other bits of information are stored in a secondary table called the page management table, PMT. The author did a significant amount of juggling around the sizes of various fields, mainly the size of the physical and virtual page numbers. Finally, the author decided on a 192-bit HPTE format.

| V | LVL/BC5                                       | RGN <sub>3</sub> | M | Α | T | S | G | $SW_2$ | CACHE <sub>4</sub> | MRWX <sub>3</sub> | HRWX <sub>3</sub> | SRWX <sub>3</sub> | URWX <sub>3</sub> |
|---|-----------------------------------------------|------------------|---|---|---|---|---|--------|--------------------|-------------------|-------------------|-------------------|-------------------|
|   | PPN <sub>310</sub>                            |                  |   |   |   |   |   |        |                    |                   |                   |                   |                   |
|   | PPN <sub>6332</sub>                           |                  |   |   |   |   |   |        |                    |                   |                   |                   |                   |
|   | VPN <sub>376</sub>                            |                  |   |   |   |   |   |        |                    |                   |                   |                   |                   |
|   | VPN <sub>6938</sub>                           |                  |   |   |   |   |   |        |                    |                   |                   |                   |                   |
|   | ~4 ASID <sub>110</sub> ~2 VPN <sub>8370</sub> |                  |   |   |   |   |   |        |                    |                   |                   |                   |                   |

#### Fields Description

| V      | 1  | translation Valid               |
|--------|----|---------------------------------|
| G      | 1  | global translation              |
| RGN    | 3  | region                          |
| PPN    | 64 | Physical page number            |
| VPN    | 84 | Virtual page number             |
| RWX    | 3  | readable, writeable, executable |
| ASID   | 12 | address space identifier        |
| LVL/BC | 5  | bounce count                    |
| M      | 1  | modified                        |
| A      | 1  | accessed                        |

| T  | 1 | PTE type (not used)   |
|----|---|-----------------------|
| S  | 1 | Shared page indicator |
| SW | 3 | OS usage              |

The page table does not include everything needed to manage pages of memory. There is additional information such as share counts and privilege levels to take care of, but this information is better managed in a separate table.

### Small Hash Page Table Entries - SHPTE

The small HPTE is used for the test system which contains only 512MB of physical RAM to conserve hardware resources. The SHPTE is 96-bits in size.

| V LVL/BC5         | RGN <sub>3</sub>    | M | Α | T | S | G | $SW_2$ | CACHE <sub>4</sub> | MRWX3               | HRWX3 | SRWX <sub>3</sub> | URWX3 |
|-------------------|---------------------|---|---|---|---|---|--------|--------------------|---------------------|-------|-------------------|-------|
| VPN <sub>50</sub> |                     |   |   |   |   |   |        | PPN250             |                     |       |                   |       |
|                   | ASID <sub>110</sub> |   |   |   |   |   |        |                    | VPN <sub>25 6</sub> |       |                   |       |

### Page Table Groups – PTG

We want the search for translations to be fast. That means being able to search in parallel. So, PTEs are stored in groups that are searched in parallel for translations. This is sometimes referred to as a clustered table approach. Access to the group should be as fast as possible. There are also hardware limits to how many entries can be searched at once while retaining a high clock rate. So, the convenient size of 1024 bits was chosen as the amount of memory to fetch.

A page table group then contains eight page-table entries. All entries in the group are searched in parallel for a match. Note that the entries are searched as the PTG is loaded, so that the PTG group load may be aborted early if a matching PTE is found before the load is finished.

| 127 |      | 0 |
|-----|------|---|
|     | PTE0 |   |
|     | PTE1 |   |
|     | PTE2 |   |
|     | PTE3 |   |
|     | PTE4 |   |
|     | PTE5 |   |
|     | PTE6 |   |
|     | PTE7 |   |

### Size of Page Table

There are several conflicting elements to deal with, with regards to the size of the page table. Ideally, the page table is small enough to fit into the block RAM resources available in the FPGA. So, about 1/3 of the block RAMs available are dedicated to MMU use. At the same time a multiple of the number of physical pages of memory should be supported to support page sharing and swapping pages to secondary storage. To support swapping pages, double the number of physical entries were chosen. To support page sharing, double that number again. Therefore, a minimum size of a page table would contain at least four times the number of physical pages for entries. By setting the size of the page table instead of the size of pages, it can be worked backwards how many pages of memory can be supported.

For a system using 512k block RAM to store PTEs. 512k / 32 = 16384 entries. 16384 / 4 = 4096 physical pages. Since the RAM size is 512MB, each page would be 512MB/4096 = 128kB. Since half the pages may be in secondary storage, 1GB of address range is available.

Since there are 16,384 entries in the table and they are grouped into groups of eight, there are 2048 PTGs. To get to a page table group fast a hash function is needed then that returns a 11-bit number.

#### **Hash Function**

The hash function needs to reduce the size of a virtual address down to a 11-bit number. The asid should be considered part of the virtual address. Including the asid an address is 76 bits. The first thing to do is to throw away the lowest fourteen bits as they pass through the MMU unaltered. We now have 62-bits to deal with. We can probably throw away some high order bits too, as a process is not likely to use the full 64-bit address range.

The hash function chosen uses the asid combined with virtual address bits 18 to 28 and bits 29 to 39. This should space out the PTEs according to the asid. Address bits 16 and 17 select one of four address ranges. the PTG supports eight PTEs. The translations where address bits 16 and 17 are involved are likely consecutive pages that would show up in the same PTG. The hash is the asid exclusively or'd with address bits 18 to 28 exclusively or'd with address bits 29 to 39.

#### **Collision Handling**

Quadratic probing of the page table is used when a collision occurs. The next PTG to search is calculated as the hash plus the square of the miss count. On the first miss the PTG at the hash plus one is searched. Next the PTG at the hash plus four is searched. After that the PTG at the hash plus nine is searched, and so on.

### Finding a Match

Once the PTG to be searched is located using the hash function, which PTE to use needs to be sorted out. The match operation must include both the virtual address bits and the asid, address space identifier, as part of the test for a match. It is possible that the same virtual address is used by two or more different address spaces, which is why it needs to be in the match.

### Locality of Reference

The page table group may be cached in the system read cache for performance. It is likely that the same PTG group will be used multiple times due to the locality of reference exhibited by running software.

### **Access Rights**

To avoid duplication of data the access rights are stored in another table called the PMT for access rights table. The first time a translation is loaded the access rights are looked-up from the PMT. A bit is set in the TLB entry indicating that the access rights are valid. On subsequent translations the access rights are not looked up, but instead they are read from values cached in the TLB.

### Location of Page Table

Thor2023's hash page table is in the physical address space at \$FFAxxxxx. It is a specially dedicated block RAM memory which has two sides. One side is updateable and readable via the load hexi-byte pair

and store hexi-byte pair LDHP, STHP instructions. The other side is updateable and readable in terms of page groups by the hash page table control logic.

# Thor 2023 Hierarchical Page Table Setup

#### Page Table Entries - PTE

For hierarchical tables the structure is like that of hashed page tables except that there is no need to store the virtual address. We know the virtual address because it is what is being translated and there is no chance of collisions unlike the hash table. The structure is 96 bits in size. This allows 1024 PTEs to fit into an 16kB page. ¼ of the 16kB page is not used. Note the size of pages in the table is a configuration parameter used to build the system.

There are two types of page table entries. The first type, T=0, is a pointer to a page of memory, the second type, T=1, is an entry that points to lower-level page tables. PTE's that point to lower-level page tables are sometimes called page table pointers, PTPs.

### Page Table Entry Format – PTE

| V | LVL/BC5             | RGN <sub>3</sub> | M | A | T | S | G | $SW_2$ | CACHE <sub>4</sub> | MRWX <sub>3</sub> | HRWX3 | SRWX <sub>3</sub> | URWX3 |
|---|---------------------|------------------|---|---|---|---|---|--------|--------------------|-------------------|-------|-------------------|-------|
|   | PPN <sub>310</sub>  |                  |   |   |   |   |   |        |                    |                   |       |                   |       |
|   | PPN <sub>6332</sub> |                  |   |   |   |   |   |        |                    |                   |       |                   |       |

### Small Page Table Entry Format – SPTE

The small PTE format is used when the physical address space is less than 46-bits in size. The small PTE occupies only 64-bits. 1024 SPTEs will fit into an 8kB page.

| V | LVL/BC5 | RGN <sub>3</sub> | M | A | T | S | G | $SW_2$ | CACHE <sub>4</sub> | MRWX <sub>3</sub> | HRWX <sub>3</sub> | SRWX <sub>3</sub> | URWX <sub>3</sub> |
|---|---------|------------------|---|---|---|---|---|--------|--------------------|-------------------|-------------------|-------------------|-------------------|
|   |         |                  |   |   |   |   |   | PPN31  | .0                 |                   |                   |                   |                   |

| Field  | Size | Purpose                                      |  |  |  |  |
|--------|------|----------------------------------------------|--|--|--|--|
| PPN    | 64   | Physical page number                         |  |  |  |  |
| URWX   | 3    | User read-write-execute override             |  |  |  |  |
| SRWX   | 3    | Supervisor read-write-execute override       |  |  |  |  |
| HRWX   | 3    | Hypervisor read-write-execute override       |  |  |  |  |
| MRWX   | 3    | Machine read-write-execute override          |  |  |  |  |
| CACHE  | 4    | Cache-ability bits                           |  |  |  |  |
| A      | 1    | 1=accessed/used                              |  |  |  |  |
| M      | 1    | 1=modified                                   |  |  |  |  |
| V      | 1    | 1 if entry is valid, otherwise 0             |  |  |  |  |
| S      | 1    | 1=shared page                                |  |  |  |  |
| G      | 1    | 1=global, ignore ASID                        |  |  |  |  |
| T      | 1    | 0=page pointer, 1= table pointer             |  |  |  |  |
| RGN    | 3    | Region table index                           |  |  |  |  |
| LVL/BC | 5    | the page table level of the entry pointed to |  |  |  |  |

### **Super Pages**

The hierarchical page table allows "super pages" to be defined. These pages bypass lower levels of page tables by using an entry at a high level to represent a block containing many pages.

Normally a PTE with LVL=0 is a pointer to an 16kB memory page. However, super-pages may be defined by specifying a page pointer with a LVL greater than zero. For instance, if T=0 and LVL=1 then the page pointed to is a super-page within an 16MB block of contiguous memory.

| T=0, LVL= | Page Size  |
|-----------|------------|
| 0         | 16 kB page |
| 1         | 16 MB page |
| 2         | 16 GB page |
| 3         | 16 TB page |
| 4         | 16 EB page |
| 5         |            |
| 6         |            |
| 7         | reserved   |

A super page pointer contains both a pointer to the block of pages and a super page length field. The length field is provided to restrict memory access to an address range between the super page pointer and the super page pointer plus the number of pages specified in the length. A typical use would be to point to the system ROM which may be several megabytes and yet shorter than the maximum size of the super page.

For example, a system ROM is located 512 MB before the end of physical memory. The ROM is only 1MB in size. So, it is desired to setup a super page pointer to the ROM and restrict access to a single megabyte. The PTE for this would look like:

| V | 15                            | RGN <sub>3</sub> | M   | Α              | 0 | S | G | $SW_2$ | ~4 | MRWX | 3 HRWX3 | SRWX <sub>3</sub> | URWX <sub>3</sub> |
|---|-------------------------------|------------------|-----|----------------|---|---|---|--------|----|------|---------|-------------------|-------------------|
|   |                               |                  | NPC | $G=0x03F_{10}$ |   |   |   |        |    |      |         |                   |                   |
|   | PPN=0xFFFFFFF <sub>6332</sub> |                  |     |                |   |   |   |        |    |      |         |                   |                   |

The PTE would be pointed to by a LVL=1 pointer resulting in a 16MB super-page size. 512MB is 32 pages before the end of memory, reflected in the value  $0x3FFFE0_{22}$  for the PPN above. There are 64 x 16kB pages in 1MB so the length field, NPG, is set to  $0x03f_{10}$ .

#### PTE Format for 16MB page

| V | 15                  | RGN <sub>3</sub> | M | Α  | 0 | S                 | G | $SW_2$ | ~4 | MRWX <sub>3</sub> | HRWX <sub>3</sub> | SRWX <sub>3</sub> | URWX <sub>3</sub> |
|---|---------------------|------------------|---|----|---|-------------------|---|--------|----|-------------------|-------------------|-------------------|-------------------|
|   |                     |                  |   | PP |   | NPG <sub>10</sub> |   |        |    |                   |                   |                   |                   |
|   | PPN <sub>6332</sub> |                  |   |    |   |                   |   |        |    |                   |                   |                   |                   |

#### PTE Format for 16GB page

| V                                     | 25                  | RGN <sub>3</sub> | M | Α | 0 | S | G | $SW_2$ | ~4 | MRWX <sub>3</sub> | HRWX <sub>3</sub> | SRWX <sub>3</sub> | URWX <sub>3</sub> |
|---------------------------------------|---------------------|------------------|---|---|---|---|---|--------|----|-------------------|-------------------|-------------------|-------------------|
| PPN <sub>3120</sub> NPG <sub>20</sub> |                     |                  |   |   |   |   |   |        |    |                   |                   |                   |                   |
|                                       | PPN <sub>6332</sub> |                  |   |   |   |   |   |        |    |                   |                   |                   |                   |

# TLB – Translation Lookaside Buffer

### **Overview**

A simple page map is limited in the translations it can perform because of its size. The solution to allowing more memory to be mapped is to use main memory to store the translations tables.

However, if every memory access required two or three additional accesses to map the address to a final target access, memory access would be quite slow, slowed down by a factor or two or three, possibly more. To improve performance, the memory mapping translations are stored in another unit called the TLB standing for Translation Lookaside Buffer. This is sometimes also called an address translation cache ATC. The TLB offers a means of address virtualization and memory protection. A TLB works by caching address mappings between a real physical address and a virtual address used by software. The TLB deals with memory organized as pages. Typically, software manages a paging table whose entries are loaded into the TLB as translations are required.

The TLB is a cache specialized for address translations. Thor2023's TLB is quite large being six-way associative with 1024 entries per way. This choice of size was based on the minimum number of block RAMs that could be used to implement the TLB. On a TLB miss the page table is searched for a translation and if found the translation is stored in one of the ways of the TLB. The way selected is determined either randomly or in a least-recently-used fashion as one of the first four ways. The last way may not be updated automatically by a page table search, it must be updated by software.

# Size / Organization

The TLB has 1024 entries per set. The size was chosen as it is the size of one block ram for 32-bit data in the FPGA. This is quite a large TLB. Many systems use smaller TLBs. Typically, systems vary between 64 and 1024 entries. There is not really a need for such a large one, however it is available.

The TLB is organized as a six-way set associative cache. The last way may only be updated by software. The last way allows translations to be stored that will not be overwritten. The first four ways may use hardware LRU replacement in addition to fixed or random replacement.

| Way | Page size  |
|-----|------------|
| 0   | 16kB pages |
| 1   | 16kB pages |
| 2   | 16kB pages |
| 3   | 16kB pages |
| 4   | 16MB pages |
| 5   | 16kB pages |

Note that 16MB pages do not need multiple ways as there are sufficient TLB entries to allow distinct entries for each 16MB page if the virtual address space is 34-bits or less.

### **TLB Entries - TLBE**

Closely related to page table entries are translation look-aside buffer, TLB, entries. TLB entries have additional fields to match against the virtual address. The count field is used to invalidate the entire TLB. Note that the least significant 10-bits of the virtual address are not stored as these bits are used as an index for the TLB entry.

| Count <sub>6</sub> | LRU <sub>3</sub> |
|--------------------|------------------|
| Count              | <b>-110</b> 3    |

| V | LVL/BC <sub>5</sub> | RGN <sub>3</sub> | M | Α | T | S | G | $SW_2$ | CACHE <sub>4</sub> | $MRWX_3$ | HRWX <sub>3</sub> | SRWX <sub>3</sub> | URWX <sub>3</sub> |
|---|---------------------|------------------|---|---|---|---|---|--------|--------------------|----------|-------------------|-------------------|-------------------|
|   | PPN <sub>310</sub>  |                  |   |   |   |   |   |        |                    |          |                   |                   |                   |
|   | PPN <sub>6332</sub> |                  |   |   |   |   |   |        |                    |          |                   |                   |                   |

|    | VPN <sub>4110</sub> |    |                      |  |  |  |  |  |  |
|----|---------------------|----|----------------------|--|--|--|--|--|--|
|    | VPN <sub>7342</sub> |    |                      |  |  |  |  |  |  |
| ~4 | ASID <sub>110</sub> | ~5 | VPN <sub>83 73</sub> |  |  |  |  |  |  |

### **Small TLB Entries - TLBE**

The small TLB is used for the test system which contains only 512MB of physical RAM to conserve hardware resources. The address ranges are more limited, 40-bits for the physical address and 70-bits for the virtual address.

| Count <sub>6</sub> | LRU <sub>3</sub> |
|--------------------|------------------|
|--------------------|------------------|

| V | LVL/BC5    | RGN <sub>3</sub> | M | A | T | S | G | $SW_2$ | CACHE <sub>4</sub> | MRWX <sub>3</sub> | HRWX <sub>3</sub> | SRWX <sub>3</sub> | URWX3 |
|---|------------|------------------|---|---|---|---|---|--------|--------------------|-------------------|-------------------|-------------------|-------|
|   | <b>~</b> 6 |                  |   |   |   |   |   |        | PPN <sub>250</sub> |                   |                   |                   |       |

| VPN <sub>4110</sub> |                     |    |   |          |  |  |  |  |  |
|---------------------|---------------------|----|---|----------|--|--|--|--|--|
| ~4                  | ASID <sub>110</sub> | PS | ~ | VPN55 42 |  |  |  |  |  |

### What is Translated?

The TLB processes addresses including both instruction and data addresses for all modes of operation. It is known as a *unified* TLB.

### **Page Size**

Because the TLB caches address translations it can get away with a much smaller page size than the page map can for a larger memory system. 4kB is a common size for many systems. There are some indications in contemporary documentation that a larger page size would be better. In this case the TLB uses 16kB. For a 512MB system (the size of the memory in the test system) there are 32768 16kB pages.

# Ways

The first four ways in the TLB are reserved for 16kB page translations. The next way, 4 is reserved for 16MB page translations. The last way is reserved for fixed translations of 16kB pages.

### **Management**

The TLB unit may be updated by either software or hardware. This is selected in the page table base register. If software miss handling is selected when a translation miss occurs, an exception is generated to allow software to update the TLB. It is left up to software to decide how to update the TLB. There may be a set of hierarchical page tables in memory, or there could be a hash table used to store translations.

# **Accessing the TLB**

A TLB entry contains too much information to be updated with a single register write. Since the information must also be updated atomically to ensure correct operation, the TLB update occurs in an indirect fashion. First holding registers are loaded with the desired values, then all the holding registers are written to the TLB in a single atomic cycle. The TLB is addressed in the physical memory space in the address range \$F...FE000xx. There are eight buckets which must be filled with TLB info using store instructions. Then address \$F...FE0007E is written to causing the TLB to be updated.

The low order bits of the bucket six determine which way to update in the TLB if the algorithm is a fixed way algorithm. Otherwise, if LRU is selected the LRU entry will be updated, otherwise a way to update will be selected randomly. The data is octa-byte aligned.

| 00       | TLBE (PTE <sub>630</sub> )      |  |  |     |  |  |                                  |   |                         |   |                  |  |
|----------|---------------------------------|--|--|-----|--|--|----------------------------------|---|-------------------------|---|------------------|--|
| 08       | TLBE (PTE <sub>9564</sub> )     |  |  |     |  |  |                                  |   |                         |   |                  |  |
| 10       | TLBE (VPN <sub>630</sub> )      |  |  |     |  |  |                                  |   |                         |   |                  |  |
| 18       | TLBE (VPN <sub>9564</sub> )     |  |  |     |  |  |                                  |   |                         |   |                  |  |
| 20       | TLB Miss Address <sub>630</sub> |  |  |     |  |  |                                  |   |                         |   |                  |  |
| 28       | ~4 Miss ASID <sub>12</sub>      |  |  | ~16 |  |  | TLB Miss Address <sub>9564</sub> |   |                         |   |                  |  |
| 30 to 68 |                                 |  |  |     |  |  |                                  |   |                         |   |                  |  |
| 70       |                                 |  |  |     |  |  | $AL_2$                           | 0 | Entry Num <sub>10</sub> | ~ | Way <sub>4</sub> |  |
| 78       | RWTRIG WTRIG RTRIG ~8 ~32       |  |  |     |  |  |                                  |   |                         |   |                  |  |

| ADR        |                     |
|------------|---------------------|
| 7C         | No operation        |
| 7D         | Read TLBE           |
| <b>7</b> E | Write TLBE          |
| <b>7</b> F | Read and Write TLBE |

### ?RWX<sub>3</sub>

If RWX3 attributes are specified non-zero, then they will override the attributes coming from the region table. Otherwise RWX attributes are determined by the region table.

### CACHE<sub>4</sub>

The cache<sub>4</sub> field is combined with the cache attributes specified in the region table. The region table takes precedence; however, if the cache<sub>4</sub> field indicates non-cache-ability then the data will not be cached.

### Example TLB Update Routine

```
_TLBMap:
       1d64
                      a0,0[sp]
       1d64
                      a1,8[sp]
       1d64
                      a2,16[sp]
       ld64
                      a3,24[sp]
       ; <lock TLB update semaphore>
       st64
                      a0,0xFFE00000
                                                           #TLBE value
       st64
                      a1,0xFFE00008
                                                           #TLBE value
                                                           #TLBE value
                      a2,0xFFE00010
       st64
       st64
                      a3,0xFFE00070
                                                           # control
       st8
                      a0.0xFFE0007E
                                                           # triggers a TLB update
       ; <unlock TLB update semaphore>
       add
                      sp,sp,32
       rts
```

# **TLB Entry Replacement Policies**

The TLB supports three algorithms for replacement of entries with new entries on a TLB miss. These are fixed replacement (0), least recently used replacement (1) and random replacement (2). The replacement method is stored in the  $AL_2$  bits of the page table base register.

For fixed replacement, the way to update must be specified by a software instruction. Least recently used replacement, LRU, selects the least recently used address translation to be overwritten. Random replacement chooses a way to replace at random.

### Flushing the TLB

The TLB maintains the address space (ASID) associated with a virtual address. This allows the TLB translations to be used without having to flush old translations from the TLB during a task switch.

### Reset

On a reset the TLB is preloaded with translations that allow access to the system ROM.

#### Global Bit

In addition to the ASID the TLB entries contain a bit that indicates that the translation is a global translation and should be present in every address space.

### Card Table

### **Overview**

Also present in the memory system is the Card table. The card table is a telescopic memory which reflects with increasing detail where in the memory system a pointer write has occurred. This is for the benefit of garbage collection systems. Card table is updated using a write barrier when a pointer value is stored to memory, or it may be updated automatically using the STPTR instruction.

# **Organization**

At the lowest level memory is divided into 256-byte card memory pages. Each card has a single byte recording whether a pointer store has taken place in the corresponding memory area. To cover a 512MB memory system 2MB card memory is required at the outermost layer. A byte is used rather than a bit to allow byte store operations to update the table directly without having to resort to multiple instructions to perform a bit-field update.

To improve the performance of scanning a hardware card table, HCT, is present which divides memory at an upper level into 8192-byte pages. The hardware card table indicates if a pointer store operation has taken place in one of the 8192-byte pages. It is then necessary to scan only cards representing the 8192-byte page rather than having to scan the entire 2MB card table. Note that this memory is organized as 2048 32-bit words. Allowing 32-bits at a time to be tested.

To further improve performance a master card table, MCT, is present which divides memory at the uppermost layer into 16-MB pages.

| Layer | Resolving Power |             |  |  |  |
|-------|-----------------|-------------|--|--|--|
| 0     | 2 MB            | 256B pages  |  |  |  |
| 1     | 64k bits        | 8kB pages   |  |  |  |
| 2     | 32 bits         | 16 MB pages |  |  |  |

There is only a single card memory in the system, used by all tasks.

### Location

Card memory must be based at physical address zero, extending up to the amount of card memory required. This is so that the address calculation of the memory update may be done with a simple right-shift operation.

# **Operation**

As a program progresses it writes pointer values to memory using the write barrier. Storing a pointer triggers an update to all the layers of card memory corresponding to the main memory location written. A bit or byte is set in each layer of the card memory system corresponding to the memory location of the pointer store.

The garbage collection system can very quickly determine where pointer stores have occurred and skip over memory that has not been modified.

# **Sample Write Barrier**

- ; Milli-code routine for garbage collect write barrier.
- ; This sequence is short enough to be used in-line.
- ; Three level card memory.
- ; a2 is a register pointing to the card table.
- ; STPTR will cause an update of the master card table, and hardware card table.

:

#### GCWriteBarrier:

STPTR a0,[a1] ; store the pointer value to memory at a1

LSR t0,a1,#8 ; compute card address ST8 r0,[a2+t0] ; clear byte in card memory

# System Memory Map

There are several components to the system which use tables in memory. These tables are statically allocated at the time the system is built. The table sizes depend on the size of main memory. The card memory table must be located at address zero. So, it is probably best to group the tables together at the low end of memory.

| Address                  | Usage                |  |
|--------------------------|----------------------|--|
| \$00000000 to \$001FFFFF | Card Table (2 MB)    |  |
| \$00210000 to \$0022FFFF | PAM (128kB 2 copies) |  |
| \$00280000 to \$0029FFFF | Key memory (128 kB)  |  |
|                          |                      |  |

# Instruction Set Overview

Thor was a variable length instruction set with instructions varying in length from one to eight bytes. Thor2023 is primarily a fixed length instruction with provision for additional instruction words used for constants. Reducing the variety of instruction sizes makes implementation of decoders more economical.

# **Instruction Descriptions**

# Opcode Maps

# **Major Opcode**

|    | 0        | 1       | 2     | 3     | 4      | 5     | 6       | 7         |
|----|----------|---------|-------|-------|--------|-------|---------|-----------|
| 0x | 0        | 1       | 2     | 3     | 4      | 5     | 6       | 7         |
|    | TRAP     |         | {R2}  | {CSR} | ADDI   | CMPI  | MULI    | DIVI      |
|    | 8        | 9       | 10    | 11    | 12     | 13    | 14      | 15        |
|    | ANDI     | ORI     | EORI  | CHK   | {FLT2} | {BIT} | {SHIFT} | FMA       |
| 1x | 16       | 17      | 18    | 19    | 20     | 21    | 22      | 23        |
|    | LOAD     | LOADZ   | STORE | BMAP  | FADDI  | FCMPI | FMULI   | FDIVI     |
|    | 24       | 25      | 26    | 27    | 28     | 29    | 30      | 31        |
|    | JSR, JMP | CMPXCHG | {AMO} | BccI  | Bcc    | DBcc  |         | PFX / NOP |

# **{R2} Operations**

|    | 0         | 1      | 2       | 3        | 4       | 5        | 6         | 7        |  |
|----|-----------|--------|---------|----------|---------|----------|-----------|----------|--|
| 0x | 0         | 1      | 2       | 3        | 4       | 5        | 6         | 7        |  |
|    | CNTLZ     |        | CNTPOP  | ABS      | ADD     | CMP      | MUL       | DIV      |  |
|    | 8         | 9      | 10      | 11       | 12      | 13       | 14        | 15       |  |
|    | AND       | OR     | EOR     |          |         | CHRNDX   | CLMUL     | SQRT     |  |
| 1x | 16        | 17     | 18      | 19       | 20      | 21       | 22        | 23       |  |
|    | DIF       | PTRDIF | REVBIT  | BMAP     |         |          | SM4ED     | SM4KS    |  |
|    | 24        | 25     | 26      | 27       | 28      | 29       | 30        | 31       |  |
|    | JMP / JSR |        | AES64DS | AES64DSM | AES64ES | AES64ESM | AES64KS1I | AES64KS2 |  |
| 2x | 32        | 33     | 34      | 35       | 36      | 37       | 38        | 39       |  |
|    | PRED      | CARRY  | VMASK   | ATOM     | ROUND   |          |           |          |  |
|    | 40        | 41     | 42      | 43       | 44      | 45       | 46        | 47       |  |
|    | V2BITS    | BITS2V | VEX     | VEINS    | VGNDX   |          |           |          |  |
| 3x | 48        | 49     | 50      | 51       | 52      | 53       | 54        | 55       |  |
|    | MIN       | MAX    | BMM     | MUX      |         | AES64IM  | SM3P0     | SM3P1    |  |
|    | 56        | 57     | 58      | 59       | 60      | 61       | 62        | 63       |  |
|    | SHA256    | SHA256 | SHA256  | SHA256   | SHA512  | SHA512   | SHA512    | SHA512   |  |
|    | SIG0      | SIG1   | SUM0    | SUM1     | SIG0    | SIG1     | SUM0      | SUM1     |  |

# ${BIT}$

|    | 0    | 1    | 2    | 3    | 4    | 5    | 6  | 7  |
|----|------|------|------|------|------|------|----|----|
| 0x | 0    | 1    | 2    | 3    | 4    | 5    | 6  | 7  |
|    | ASL  | ASR  | LSL  | LSR  | ROL  | ROR  |    |    |
|    | 8    | 9    | 10   | 11   | 12   | 13   | 14 | 15 |
|    | ASLI | ASRI | LSLI | LSRI | ROLI | RORI |    |    |

# {SHIFT}

|    | 0      | 1      | 2    | 3    | 4    | 5    | 6  | 7  |
|----|--------|--------|------|------|------|------|----|----|
| 0x | 0      | 1      | 2    | 3    | 4    | 5    | 6  | 7  |
|    | ASL    | ASR    | LSL  | LSR  | ROL  | ROR  |    |    |
|    | 8      | 9      | 10   | 11   | 12   | 13   | 14 | 15 |
|    | ASLI   | ASRI   | LSLI | LSRI | ROLI | RORI |    |    |
| 1x | 16     | 17     | 18   | 19   | 20   | 21   | 22 | 23 |
|    | VSHLV  | VSHRV  |      |      |      |      |    |    |
|    | 24     | 25     | 26   | 27   | 28   | 29   | 30 | 31 |
|    | VSHLVI | VSHRVI |      |      |      |      |    |    |

# **(FLT2) Operations**

|    | 0            | 1           | 2         | 3         | 4         | 5         | 6         | 7         |
|----|--------------|-------------|-----------|-----------|-----------|-----------|-----------|-----------|
| 0x | 0<br>FSCALEB | 1<br>(ELT1) | 2<br>EMIN | 3<br>EMAY | 4<br>EADD | 5<br>ECMD | 6<br>EMIH | 7<br>EDIV |
|    | FSCALED      | {FLT1}      | FMIN      | FMAX      | FADD      | FCMP      | FMUL      | FDIV      |
|    | 8            | 9           | 10        | 11        | 12        | 13        | 14        | 15        |
|    | FSEQ         | FSLT        | FSLE      | FSNE      |           |           | FNXT      | FREM      |
| 1x | 16           |             |           |           |           |           |           |           |
|    |              |             |           |           |           |           |           |           |
|    | 24           |             |           |           |           |           |           |           |
|    |              |             |           |           |           |           |           |           |

# **(FLT1) Operations**

|    | 0          | 1          | 2          | 3          | 4  | 5            | 6            | 7          |
|----|------------|------------|------------|------------|----|--------------|--------------|------------|
| 0x | 0          | 1          | 2<br>FOTI  | 3<br>ITOF  | 4  | 5            | 6<br>ESIGN   | 7<br>ESIC  |
|    | 8          | 9          | 10         | 110F       | 12 | 13           | FSIGN<br>14  | FSIG<br>15 |
|    | 6<br>FSQRT | FS2D       | FS2T       | FD2T       | 12 | 15           | ISNAN        | FINITE     |
| 1x | 16         | 17         | 18         | 19         | 20 | 21<br>FTRUNC | 22           | 23<br>FRES |
|    | 24         | 25<br>FD2S | 26<br>FT2S | 27<br>FT2D | 28 | 29           | 30<br>FCLASS | 31         |
| 2x | 32<br>FABS | 33         | 34<br>FNEG | 35         | 36 | 37           | 38           | 39         |
|    | 40         |            |            |            |    |              |              |            |
| 3x | 48         |            |            |            |    |              |              |            |
|    | 56         |            |            |            |    |              |              |            |

# **(AMO) Operations**

|    | 0     | 1   | 2    | 3   | 4    | 5    | 6    | 7    |
|----|-------|-----|------|-----|------|------|------|------|
| 0x | 0     | 1   | 2    | 3   | 4    | 5    | 6    | 7    |
|    | SWAP  |     | MIN  | MAX | ADD  |      | ASL  | LSR  |
|    | 8     | 9   | 10   | 11  | 12   | 13   | 14   | 15   |
|    | AND   | OR  | EOR  |     | MINU | MAXU |      | CAS  |
| 1x | 16    | 17  | 18   | 19  | 20   | 21   | 22   | 23   |
|    | SWAPI |     | MIN  | MAX | ADDI |      | ASLI | LSRI |
|    | 24    | 25  | 26   | 27  | 28   | 29   | 30   | 31   |
|    | ANDI  | ORI | EORI |     | MINU | MAXU |      | CAS  |

## **Operand Swapping**

Many instructions allow first and second source operands to be swapped. This is indicated by the swap 'S' bit in the instruction. This is particularly useful for instructions that are non-commutative like SUB and DIV.

#### Operand Swap

| Operand Order       | S |
|---------------------|---|
| Normal              | 0 |
| 1st and 2nd Swapped | 1 |

## **Operand Sizes**

Many instructions support four different operand sizes: byte, wyde, tetra and octa. The operand size is selected by suffixing the mnemonic with 'b' for byte, 'w' for wyde, 't' for tetra and 'o' for octa.

| $Sz_3$ | Ext. | Operand      |
|--------|------|--------------|
| 0      | .b   | 8-bit Byte   |
| 1      | .W   | 16-bit Wyde  |
| 2      | .t   | 32-bit Tetra |
| 3      | .0   | 64-bit Octa  |
| 4      | .c   | 24-bit       |
| 5      | .p   | 40-bit Penta |
| 6      | .n   | 96-bit       |
| 7      |      | reserved     |

# **Arithmetic Operations**

# Representations





## Int:64



## Int:40



## Int:32



## Int:16



## Int:8

## **ABS – Absolute Value**

## **Description:**

This instruction computes the absolute value of the contents of the source operand and places the result in Rt.

Supported Operand Sizes: .b, .w, .t, .o

**Integer Instruction Format: R2** 

## ABS Rt, Ra – Register direct

|   | 39  | 34   | 3332   | 31   | 30 | 29 | 28 | 23 | 22 | 21 | 16    | 15 | 14 | 9 | 8                                     | 7 5    | 4 | 0 |
|---|-----|------|--------|------|----|----|----|----|----|----|-------|----|----|---|---------------------------------------|--------|---|---|
|   | 3   | 6    | ~2     | 0    | 0  | 0  | 0  | 6  | Sa | R  | $a_6$ | St | Rt | 6 | V                                     | $Sz_3$ | 2 | 5 |
| , | Clo | ck ( | Cycles | s: 1 |    |    |    |    |    |    |       |    |    |   | · · · · · · · · · · · · · · · · · · · |        |   |   |

### **Operation:**

$$If Ra < 0$$

$$Rt = -Ra$$

$$else$$

$$Rt = Ra$$

**Execution Units:** Integer ALU #0

**Clock Cycles: 1** 

Exceptions: none

## **ADD - Addition**

## **Description:**

Add two source operands and place the sum in the target register. All registers are treated as integer registers. Arithmetic is signed twos-complement values unless the decimal mode flag is set in which case values are treated as BCD numbers. This instruction may be used with the CARRY modifier to perform extended precision addition.

**Supported Operand Sizes:** .b, .w, .t, .o

**Operation:** 

Rt = Ra + Rb or Rt = Ra + Imm

**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

**Notes:** 

**Instruction Formats:** 

ADD Rt, Ra, Rb - Register direct



ADD Rt,Ra,Imm<sub>16</sub>



## AND - Bitwise And

### **Description:**

Bitwise 'and' two source operands and place the result in the target register. The one's complement of operands may be used by setting the appropriate 'S' bit in the instruction.

Supported Operand Sizes: .b, .w, .t, .o, .c, .p, .n

**Clock Cycles: 1** 

**Operation:** 

Rt = Ra & Rb or Rt = Ra & Imm

**Instruction Formats:** 

AND Rt, Ra, Rb - Register direct



·

AND Rt,Ra,Imm<sub>16</sub>



**Clock Cycles: 1** 

**Execution Units:** All Integer ALU's

Exceptions: none

## BMAP – Byte Map

### **Description:**

First the target register is cleared, then bytes are mapped from the 12-byte source Ra into bytes in the target register. This instruction may be used to permute the bytes in register Ra and store the result in Rt. This instruction may also pack bytes, wydes or tetras. The map is determined by the low order 48-bits of register Rb or a 48-bit immediate constant. Bytes which are not mapped will end up as zero in the target register.

#### **Instruction Formats:**

#### BMAP Rt, Ra, Rb - Register direct

|   | 39  | 34   | 3332   | 31 | 30 | 29 | 28 | 23    | 22 | 21 | 16    | 15 | 14 | 9 | 8 | 7 5    | 4 | 0 |
|---|-----|------|--------|----|----|----|----|-------|----|----|-------|----|----|---|---|--------|---|---|
|   | 19  | 96   | ~2     | 0  | Vb | Sb | R  | $b_6$ | Sa | R  | $a_6$ | St | Rt | 6 | V | $Sz_3$ | 2 | 5 |
| , | Clo | ck ( | Cycles |    |    |    |    |       |    |    |       |    |    |   |   |        |   |   |

#### BMAP Rt,Ra,Imm<sub>48</sub>

| 39 |                          | 33 | 32  | 31 | 23 | 22 | 21 | 16              | 15  | 14   | 9 | 8 | 7 5             | 4  | 0  |
|----|--------------------------|----|-----|----|----|----|----|-----------------|-----|------|---|---|-----------------|----|----|
|    | ~7                       |    | S   |    | ~9 | Sa | R  | .a <sub>6</sub> | St  | Rt   | 6 | V | Sz <sub>3</sub> | 19 | 95 |
|    | Immediate <sub>310</sub> |    |     |    |    |    |    |                 |     |      |   |   |                 |    | 15 |
|    |                          |    | ~16 |    |    |    |    |                 | Imm | 4732 |   |   | 13              | 3  | 15 |

**Clock Cycles: 1** 

**Operation:** 

**Vector Operation** 

**Execution Units:** First Integer ALU

**Clock Cycles: 1** 

Exceptions: none

## **CHK – Check Register Against Bounds**

#### **Description**:

A register is compared to two values. If the register is outside of the bounds defined by Rb and an immediate value then an exception will occur. Ra must be greater than or equal to Rb and Ra must be less than the immediate.

#### **Instruction Formats:**

#### CHK Ra, Rb, Cn - Register direct



| cn <sub>3</sub> | exception when not                       |
|-----------------|------------------------------------------|
| 0               | Ra >= Rb and $Ra < Rc$                   |
| 1               | Ra >= Rb and $Ra <= Rc$                  |
| 2               | Ra > Rb and Ra < Rc                      |
| 3               | Ra > Rb and Ra <= Rc                     |
| 4               | not $(Ra \ge Rb \text{ and } Ra < Rc)$   |
| 5               | not $(Ra \ge Rb \text{ and } Ra \le Rc)$ |
| 6               | not (Ra > Rb and Ra < Rc)                |

not  $(Ra > Rb \text{ and } Ra \le Rc)$ 

#### CHKI Ra, Imm, Cn



**Clock Cycles: 1** 

| cn <sub>3</sub> | exception when not                        |
|-----------------|-------------------------------------------|
| 0               | Ra >= Rb and $Ra < Imm$                   |
| 1               | Ra >= Rb and $Ra <= Imm$                  |
| 2               | Ra > Rb and Ra < Imm                      |
| 3               | Ra > Rb and Ra <= Imm                     |
| 4               | not ( $Ra >= Rb$ and $Ra < Imm$ )         |
| 5               | not $(Ra \ge Rb \text{ and } Ra \le Imm)$ |
| 6               | not (Ra > Rb and Ra < Imm)                |
| 7               | not (Ra > Rb and Ra <= Imm)               |

Clock Cycles: 1

**Execution Units:** Integer ALU

Exceptions: bounds check

**Notes:** 

The system exception handler will typically transfer processing back to a local exception handler.

## **CLMUL – Carry-less Multiply**

## **Description**:

Compute the low order product bits of a carry-less multiply.

#### **Instruction Formats:**

#### CLMUL Rt, Ra, Rb



## **Clock Cycles: 4**

### CLMUL Rt,Ra,Imm<sub>8</sub>



Exceptions: none

**Execution Units: First Integer ALU** 

Operations

$$Rt = Ra * Rb$$

#### **Vector Operation**

for 
$$x = 0$$
 to  $VL - 1$  
$$if (Vm[x]) \ Vt[x] = Va[x] * Vb[x]$$
 
$$else \ if (z) \ Vt[x] = 0$$
 
$$else \ Vt[x] = Vt[x]$$

Exceptions: none

## **CMP - Comparison**

## **Description:**

Compare two source operands and place the result in the target register. The result is a vector identifying the relationship between the two source operands as signed and unsigned integers.

Supported Operand Sizes: .b, .w, .t, .o, .c, .p, .n

**Operation:** 

Rt = Ra? Rb or Rt = Ra? Imm or Rt = Imm? Ra

**Clock Cycles:** 1

**Execution Units:** All Integer ALU's

Exceptions: none

**Notes:** 

**Instruction Formats:** 

ADD Rt, Ra, Rb - Register direct



**Clock Cycles: 1** 

ADD Rt,Ra,Imm<sub>15</sub>



| Rt bit | Mnem.   | Meaning                        | Test |
|--------|---------|--------------------------------|------|
|        |         | Integer Compare Results        |      |
| 0      | EQ      | = equal                        |      |
| 1      | LT      | < less than                    |      |
| 2      | LE      | <= less than or equal          |      |
| 3      | LO/CS   | < unsigned less than           |      |
| 4      | LS      | <= unsigned less than or equal |      |
| 5      | AND     | And                            |      |
| 6      |         |                                |      |
| 7      |         |                                |      |
| 8      | NE      | <> not equal                   |      |
| 9      | GE      | >= greater than or equal       |      |
| 10     | GT      | > greater than                 |      |
| 11     | HS / CC | unsigned greater than or equal |      |
| 12     | HI      | unsigned greater than          |      |
| 13     | NAND    | nand                           |      |
| 14     | NOR     | Nor                            |      |
| 15     | SR      | Branch subroutine              |      |

## **CNTLZ – Count Leading Zeros**

### **Description:**

This instruction counts the number of consecutive zero bits beginning at the most significant bit towards the least significant bit.

Supported Operand Sizes: .b, .w, .t, .o

**Integer Instruction Format: R1** 

CNTLZ Rt, Ra, Rb – Register direct



**Clock Cycles: 1** 

**Operation:** 

**Execution Units:** Integer ALU #0

**Clock Cycles: 1** 

Exceptions: none

**Notes:** 

## **CNTPOP – Count Population**

### **Description:**

This instruction counts the number of bits set in a register.

Supported Operand Sizes: .b, .w, .t, .o

**Integer Instruction Format: R1** 

CNTPOP Rt, Ra, Rb - Register direct



**Operation:** 

**Execution Units:** Integer ALU #0

**Clock Cycles: 1** 

Exceptions: none

# **CSR – Control and Special Registers Operations**

## **Description:**

Perform an operation on a CSR.

| Operation                          | Op <sub>3</sub> |  |
|------------------------------------|-----------------|--|
| Read CSR                           | 0               |  |
| Write CSR                          | 1               |  |
| Or to CSR (set bits)               | 2               |  |
| And complement to CSR (clear bits) | 3               |  |
| Exclusive Or to CSR (flip bits)    | 4               |  |

**Supported Operand Sizes:** N/A

| Regno        |          |                                       |
|--------------|----------|---------------------------------------|
| \$000        | reserved | Not used                              |
| \$002        | sr       | Status register (privileged)          |
| \$120        | Tick     | Tick count (read only)                |
| \$121 Coreno |          | Core number ( read only) (privileged) |
| \$127        |          |                                       |

#### **Instruction Formats:**

OR Rt, Ra, CSR

ANDC Rt, Ra, CSR

EOR Rt, Ra, CSR

CSR Rt,Ra,#Regno<sub>12</sub>



### CSR Rt, #imm, #Regno<sub>12</sub>

| 3938 | 37   | 32               | 31 | 30 23               | 22               | 16 | 15 | 14  | 9 | 8 | 7 5             | 4 | 0 |
|------|------|------------------|----|---------------------|------------------|----|----|-----|---|---|-----------------|---|---|
| 1    | Regn | O <sub>138</sub> | S  | Regno <sub>70</sub> | Imm <sub>7</sub> |    | St | Rte | 5 | V | Op <sub>3</sub> | 3 | 5 |

## **DIVS – Signed Division**

### **Description:**

Divide source dividend operand by divisor operand and place the quotient in the target register. All registers are integer registers. Arithmetic is signed twos-complement values.

**Supported Operand Sizes:** .b, .w, .t, .o

## **Operation:**

Rt = Ra / Rb or Rt = Ra / Imm or Rt = Imm / Ra

#### **Instruction Formats:**

#### DIVS Rt, Ra, Rb - Register direct



### DIVS Rt,Ra,Imm<sub>16</sub>



Clock Cycles: 100

**Execution Units:** All Integer ALU's

Exceptions: none

## **DIVU – Unsigned Division**

### **Description:**

Divide source dividend operand by divisor operand and place the sum in the target register. All registers are integer registers. Arithmetic is unsigned twos-complement values.

Immediate mode is not available for this instruction.

Supported Operand Sizes: .b, .w, .t, .o

### **Operation:**

Rt = Ra / Rb or Rt = Ra / Imm or Rt = Imm / Ra

#### **Instruction Formats:**

#### DIVU Rt, Ra, Rb - Register direct

| 7 <sub>6</sub> 1 <sub>2</sub> 0 Vb Sb Rb <sub>6</sub> Sa Ra <sub>6</sub> St Rt <sub>6</sub> V Sz <sub>3</sub> 2 <sub>5</sub> | 39 | 34 | 3332 | 31 | 30 | 29 | 28 | 23    | 22 | 21 | 16    | 15 | 14 | 9     | 8 | 7 5    | 4 | 0  |
|------------------------------------------------------------------------------------------------------------------------------|----|----|------|----|----|----|----|-------|----|----|-------|----|----|-------|---|--------|---|----|
| 70 12 0 70 50 100 50 100 7 525 25                                                                                            | 7  | 6  | 12   | 0  | Vb | Sb | R  | $b_6$ | Sa | R  | $a_6$ | St | R  | $t_6$ | V | $Sz_3$ | 2 | ·5 |

Clock Cycles: 100

**Execution Units:** All Integer ALU's

Exceptions: none

## **EOR – Bitwise Exclusive Or**

## **Description:**

Bitwise exclusive 'or' two source operands and place the sum in the target register. All registers are integer registers. Arithmetic is signed twos-complement values.

Supported Operand Sizes: .b, .w, .t, .o, .c, .p, .n

### **Operation:**

$$Rt = Ra \wedge Rb \text{ or } Rt = Ra \wedge Imm$$

#### **Instruction Formats:**

#### EOR Rt, Ra, Rb - Register direct



### EOR Rt,Ra,Imm<sub>16</sub>



**Clock Cycles: 1** 

**Execution Units:** All Integer ALU's

Exceptions: none

## **ENOR – Bitwise Exclusive Nor**

## **Description:**

Bitwise exclusive 'nor' two source operands and place the result in the target register.

Supported Operand Sizes: .b, .w, .t, .o, .c, .p, .n

**Operation:** 

$$Rt = (Ra \land Rb) \text{ or } Rt = (Ra \land Imm)$$

**Instruction Formats:** 

ENOR Rt, Ra, Rb - Register direct



ENOR Rt,Ra,Imm<sub>16</sub>



**Clock Cycles: 1** 

**Clock Cycles: 1** 

**Execution Units:** All Integer ALU's

Exceptions: none

## **PFX – Constant Postfix**

### **Description:**

The PFX instruction postfix is used to build large constants for use in the preceding instruction as the immediate constant for the instruction. There are three postfix instructions which extend the constant from different bit locations. They should be used in the order PFX0, PFX1. A postfix may be omitted if the omitted bits match what would be included.

Postfixes are normally caught at the decode stage and do not progress further in the pipeline. They are treated as a NOP instruction.

## **Supported Operand Sizes:** N/A

#### **Instruction Format:**

This format extends the constant from bit 0 with the 32 bits specified in the instruction and sign extends the value to the width of the constant prefix buffer.

| 39                      | 8 | 7 5     | 4   | 0 |
|-------------------------|---|---------|-----|---|
| Immediate <sub>32</sub> |   | $0_{3}$ | 315 |   |

#### **Instruction Format:**

This format extends the previous constant value by 32 bits beginning at bit 32 and sign extends the value to the width of the machine.

|   | 39                      | 3 | 7 5 | 4 | 0  |
|---|-------------------------|---|-----|---|----|
| Ī | Immediate <sub>32</sub> |   | 13  | 3 | 15 |

#### **Instruction Format:**

This format extends the previous constant value by 32 bits beginning at bit 64 and sign extends the value to the width of the machine.

| 39                      | 7 5 | 4 0 |
|-------------------------|-----|-----|
| Immediate <sub>32</sub> | 23  | 315 |

## **MULS – Multiply Signed**

## **Description:**

Multiply two source operands and place the sum in the target register. All registers are treated as integer registers. Arithmetic is signed twos-complement values. The 'S' flag indicates to perform an unsigned multiply.

Supported Operand Sizes: .b, .w, .t, .o

**Operation:** 

Rt = Ra \* Rb or Rt = Ra \* Imm

**Instruction Formats:** 

MULS Rt, Ra, Rb – Register direct



Clock Cycles: 12

MULS Rt,Ra,Imm<sub>16</sub>



**Clock Cycles: 12** 

Clock Cycles: 12

**Execution Units:** All Integer ALU's

**Exceptions:** none

## **MULU – Unsigned Multiplication**

## **Description:**

Multiply two source operands and place the product in the target register. All registers are treated as integer registers. Arithmetic is signed twos-complement values. The 'S' flag indicates to perform an unsigned multiply. Unsigned multiply can be used during index calculations.

Supported Operand Sizes: .b, .w, .t, .o

**Operation:** 

Rt = Ra \* Rb or Rt = Ra \* Imm

**Instruction Formats:** 

MULU Rt, Ra, Rb - Register direct



**Clock Cycles: 12** 

MULU Rt,Ra,Imm<sub>16</sub>



**Clock Cycles: 12** 

**Execution Units:** All Integer ALU's

Exceptions: none

## NAND - Bitwise And and Invert

## **Description:**

Bitwise 'nand' two source operands and place the result in the target register.

Supported Operand Sizes: .b, .w, .t, .o, .c, .p, .n

**Clock Cycles: 1** 

**Operation:** 

 $Rt = \sim (Ra \& Rb)$ 

**Instruction Formats:** 

NAND Rt, Ra, Rb - Register direct



NAND Rt,Ra,Imm<sub>16</sub>



**Clock Cycles: 1** 

**Execution Units:** All Integer ALU's

Exceptions: none

## **NOR – Bitwise Or and Invert**

## **Description:**

Bitwise 'or' two source operands invert the result and place the result in the target register. All registers are integer registers.

Supported Operand Sizes: .b, .w, .t, .o, .c, .p, .n

**Operation:** 

$$Rt = \sim (Ra \mid Rb)$$

**Instruction Formats:** 

NOR Rt, Ra, Rb – Register direct



NOR Rt,Ra,Imm<sub>16</sub>



**Clock Cycles: 1** 

**Execution Units:** All Integer ALU's

Exceptions: none

## OR - Bitwise Or

## **Description:**

Bitwise 'or' two source operands and place the sum in the target register. All registers are integer registers. Arithmetic is signed twos-complement values.

Supported Operand Sizes: .b, .w, .t, .o, .c, .p, .n

**Operation:** 

$$Rt = Ra \mid Rb \text{ or } Rt = Ra \mid Imm$$

**Instruction Formats:** 

OR Rt, Ra, Rb – Register direct



OR Rt,Ra,Imm<sub>16</sub>



Clock Cycles: 1

**Clock Cycles: 2** 

**Execution Units:** All Integer ALU's

Exceptions: none

## **REVBIT – Reverse Bit Order**

#### **Description:**

This instruction reverses the order of bits in Ra and stores the result in Rt.

**Integer Instruction Format: R2** 

REVBIT Rt, Ra - Register direct



**Operation:** 

**Execution Units: I** 

**Clock Cycles: 1** 

Exceptions: none

**Notes:** 

## **SQRT – Square Root**

#### **Description:**

This instruction computes the square root value of the contents of the source operand and places the result in Rt.

Supported Operand Sizes: .b, .w, .t, .o

**Integer Instruction Format: R2** 

SQRT Rt, Ra – Register direct



**Clock Cycles: 1** 

**Operation:** 

Rt = SQRT(Ra)

**Execution Units:** Integer ALU #0

**Clock Cycles: 1** 

**Exceptions:** none

# Floating-Point Operations

## **Precision**

Floating point operations are always performed at the greatest precision available. Lower precision formats are available for storage.

For decimal floating-point three storage formats are supported. 96-bit triple precision, 64-bit double precision, and 32-bit single precision values.

## Representations

## **Binary Floats**

Triple Precision, Float:96

The core uses a 96-bit triple precision binary floating-point representation.

96-bit values are more compact than 128-bit ones which reduces the amount of hardware required and data being transferred. They have enough significant digits for a wide variety of applications. 64-bit values are not sufficient for some applications. The question then is how much larger of a representation to use. 80-bits is popular, offering about 19 significant digits which is good for a wide variety of applications. 96-bit floats offer about 24 significant digits.

| 95 | 94 80                  | 79                        | 0 |
|----|------------------------|---------------------------|---|
| S  | Exponent <sub>15</sub> | Significand <sub>80</sub> |   |

Double Precision, Float:64



Single Precision, Float:32

| 31 | 30 |                       | 23 | 22 |                           | 0 |
|----|----|-----------------------|----|----|---------------------------|---|
| S  |    | Exponent <sub>8</sub> |    |    | Significand <sub>23</sub> |   |

Half Precision, Float:16



### **Decimal Floats**

The core uses a 96-bit densely packed decimal triple precision floating-point representation.

| 95 | 94 90              | 89 80                  | 79                        | 0 |
|----|--------------------|------------------------|---------------------------|---|
| S  | Combo <sub>5</sub> | Exponent <sub>10</sub> | Significand <sub>80</sub> |   |

The significand stores 25 densely packed decimal digits. One whole digit before the decimal point.

The exponent is a power of ten as a binary number with an offset of 1535. Range is  $10^{-1535}$  to  $10^{1536}$ 

64-bit double precision decimal floating point:

| 63 | 62 58              | 57 50                 | 49                        | 0 |
|----|--------------------|-----------------------|---------------------------|---|
| S  | Combo <sub>5</sub> | Exponent <sub>8</sub> | Significand <sub>50</sub> |   |

The significand stores 16 DPD digits. One whole digit before the decimal point.

32-bit single precision decimal floating point:

| 31 | 30  | 26               | 25   | 20                 | 19 |                           | 0 |
|----|-----|------------------|------|--------------------|----|---------------------------|---|
| S  | Con | nbo <sub>5</sub> | Expo | onent <sub>6</sub> |    | Significand <sub>20</sub> |   |

The significand store 7 DPD digits. One whole digit before the decimal point.

## **Rounding Modes**

## Binary Float Rounding Modes

| Rm3 | Rounding Mode                               |
|-----|---------------------------------------------|
| 000 | Round to nearest ties to even               |
| 001 | Round to zero (truncate)                    |
| 010 | Round towards plus infinity                 |
| 011 | Round towards minus infinity                |
| 100 | Round to nearest ties away from zero        |
| 101 | Reserved                                    |
| 110 | Reserved                                    |
| 111 | Use rounding mode in float control register |

## **Decimal Float Rounding Modes**

|     | C                                           |
|-----|---------------------------------------------|
| Rm3 | Rounding Mode                               |
| 000 | Round ceiling                               |
| 001 | Round floor                                 |
| 010 | Round half up                               |
| 011 | Round half even                             |
| 100 | Round down                                  |
| 101 | Reserved                                    |
| 110 | Reserved                                    |
| 111 | Use rounding mode in float control register |

# **Operand Sizes**

| Sz <sub>3</sub> | Ext. | Operand       |
|-----------------|------|---------------|
| 0               |      | Reserved      |
| 1               | .h   | 16-bit half   |
| 2               | .s   | 32-bit single |
| 3               | .d   | 64-bit double |
| 4               |      | Reserved      |
| 5               |      | Reserved      |
| 6               | .t   | 96-bit triple |
| 7               |      | reserved      |

## FABS – Absolute Value

## **Description:**

This instruction computes the absolute value of the contents of the source operand and places the result in Rt. The sign bit of the value is cleared. No rounding occurs.

### **Integer Instruction Format: R1**

### FABS Rt, Ra, Rb - Register direct



### **Operation:**

FPt = Abs(FPa)

**Execution Units:** FPU #0

**Clock Cycles: 1** 

Exceptions: none

## **FADD** – Float Addition

### **Description:**

Add two source operands and place the sum in the target register. All registers values are treated as 96-bit floating-point values. An immediate value is converted to 96-bit triple precision from half, single, or double precision.

### **Supported Operand Sizes:**

#### **Operation:**

Rt = Ra + Rb or Rt = Ra + Imm

**Clock Cycles:** 8

**Execution Units:** All Integer ALU's

Exceptions: none

**Notes:** 

#### **Instruction Formats:**

#### FADD Rt, Ra, Rb – Register direct





#### FADD Rt,Ra,Imm<sub>32</sub>

| _ | 39                      | 32 | 31 | 30 | 23 | 22 | 21 | 16    | 15 | 14              | 9 | 8 | 7 5    | 4  | 0              |
|---|-------------------------|----|----|----|----|----|----|-------|----|-----------------|---|---|--------|----|----------------|
|   | ~8                      |    | S  | ~8 |    | Sa | R  | $a_6$ | St | Rt <sub>6</sub> |   | V | $Sz_3$ | 20 | ) <sub>5</sub> |
| Ī | Immediate <sub>32</sub> |    |    |    |    |    |    |       |    |                 |   |   |        |    | L <sub>5</sub> |

### FADD Rt,Ra,Imm<sub>64</sub>

| 39 | 32                       | 31 | 30 | 23  | 22    | 21                | 16 | 15 | 14                | 9 | 8      | 7 5 | 4 | 0  |  |
|----|--------------------------|----|----|-----|-------|-------------------|----|----|-------------------|---|--------|-----|---|----|--|
| ~8 |                          | S  | ~8 |     | Sa    | Ra <sub>6</sub>   |    | St | Rt <sub>6</sub> V |   | $Sz_3$ | 205 |   |    |  |
|    | Immediate <sub>310</sub> |    |    |     |       |                   |    |    |                   |   |        |     |   |    |  |
|    |                          |    |    | Imm | ediat | e <sub>6332</sub> |    |    |                   |   |        | 13  | 3 | 15 |  |

### FADD Rt,Ra,Imm<sub>64</sub>

| 39 |                           | 32 | 31 | 30 | 23         | 22     | 21                | 16    | 15 | 14                | 9 | 8 | 7 5             | 4  | 0  |
|----|---------------------------|----|----|----|------------|--------|-------------------|-------|----|-------------------|---|---|-----------------|----|----|
|    | ~8                        |    | S  | ^  | <b>~</b> 8 | Sa     | R                 | $a_6$ | St | Rt <sub>6</sub> V |   |   | Sz <sub>3</sub> | 20 | )5 |
|    | Immediate <sub>310</sub>  |    |    |    |            |        |                   |       |    |                   |   |   |                 |    |    |
|    | Immediate <sub>6332</sub> |    |    |    |            |        |                   |       |    |                   |   |   |                 |    |    |
|    |                           |    |    |    | Imm        | ediate | e <sub>9564</sub> |       |    |                   |   |   | 23              | 31 | 15 |

## **FCMP - Comparison**

### **Description:**

Compare two source operands and place the result in the target register. The result is a vector identifying the relationship between the two source operands as floating-point values. This instruction may compare against lower precision immediate values to conserve code space.

### **Supported Operand Sizes:**

#### **Operation:**

Rt = Ra? Rb or Rt = Ra? Imm or Rt = Imm? Ra

Clock Cycles: 1

**Execution Units:** All Integer ALU's

Exceptions: none

**Notes:** 

**Instruction Formats:** 

FCMP Rt, Ra, Rb - Register direct



**Clock Cycles: 1** 

#### FCMP Rt,Ra,Imm<sub>13</sub>



| Rt bit | Mnem. | Meaning                                   | Test                      |
|--------|-------|-------------------------------------------|---------------------------|
|        |       | Float Compare Results                     |                           |
| 0      | EQ    | equal                                     | !nan & eq                 |
| 1      | NE    | not equal                                 | !eq                       |
| 2      | GT    | greater than                              | !nan & !eq & !lt & !inf   |
| 3      | UGT   | Unordered or greater than                 | Nan    (!eq & !lt & !inf) |
| 4      | GE    | greater than or equal                     | Eq    (!nan & !lt & !inf) |
| 5      | UGE   | Unordered or greater than or equal        | Nan    (!lt    eq)        |
| 6      | LT    | Less than                                 | Lt & (!nan & !inf & !eq)  |
| 7      | ULT   | Unordered or less than                    | Nan   (!eq & lt)          |
| 8      | LE    | Less than or equal                        | Eq   (lt & !nan)          |
| 9      | ULE   | unordered less than or equal              | Nan   (eq   1t)           |
| 10     | GL    | Greater than or less than                 | !nan & (!eq & !inf)       |
| 11     | UGL   | Unordered or greater than or less than    | Nan   !eq                 |
| 12     | ORD   | Greater than less than or equal / ordered | !nan                      |
| 13     | UN    | Unordered                                 | Nan                       |
| 14     |       |                                           |                           |
| 15     |       |                                           |                           |

## **FDIV** – Float Division

## **Description:**

Divide two source operands and place the quotient in the target register. All registers values are treated as 96-bit floating-point values.

### **Supported Operand Sizes:**

### **Operation:**

Rt = Ra / Rb or Rt = Ra / Imm or Rt = Imm / Ra

**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

**Notes:** 

**Instruction Formats:** 

FDIV Rt, Ra, Rb – Register direct

| 39 35 | 34 32           | 31 | 30 | 29 | 28 | 23    | 22 | 21 | 16    | 15 | 14 | 9              | 8 | 7 5    | 4 ( | 0 |
|-------|-----------------|----|----|----|----|-------|----|----|-------|----|----|----------------|---|--------|-----|---|
| 75    | Rm <sub>3</sub> | 0  | Vb | Sb | R  | $b_6$ | Sa | R  | $a_6$ | St | R  | t <sub>6</sub> | V | $Sz_3$ | 125 | 5 |

Clock Cycles: 150

FDIV Rt,Ra,Imm<sub>16</sub>



## **FMUL** – **Float Multiplication**

### **Description:**

Multiply two source operands and place the product in the target register. All registers values are treated as 96-bit floating-point values.

## **Supported Operand Sizes:**

**Operation:** 

Rt = Ra \* Rb or Rt = Ra \* Imm

**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

**Notes:** 

**Instruction Formats:** 

FDIV Rt, Ra, Rb – Register direct

| 39 35 | 34 32           | 31 | 30 | 29 | 28 | 23    | 22 | 21 | 16    | 15 | 14 | 9              | 8 | 7 5    | 4 ( | 0 |
|-------|-----------------|----|----|----|----|-------|----|----|-------|----|----|----------------|---|--------|-----|---|
| 65    | Rm <sub>3</sub> | 0  | Vb | Sb | R  | $b_6$ | Sa | R  | $a_6$ | St | R  | t <sub>6</sub> | V | $Sz_3$ | 125 | 5 |

**Clock Cycles: 8** 

FDIV Rt,Ra,Imm<sub>13</sub>



## **FNEG – Negate Value**

## **Description:**

This instruction computes the negative value of the contents of the source operand and places the result in Rt. The sign bit of the value is inverted. No rounding occurs.

### **Integer Instruction Format: R1**

### FNEG Rt, Ra, Rb – Register direct



### **Operation:**

Rt = -Ra

**Execution Units:** FPU #0

**Clock Cycles: 1** 

Exceptions: none

## **FSCALEB – Scale Exponent**

## **Description:**

Add the source operand to the exponent.

**Supported Operand Sizes:** 

**Operation:** 

**Clock Cycles:** 

Execution Units: All Integer ALU's

Exceptions: none

**Notes:** 

**Instruction Formats:** 

FSCALEB Rt, Ra, Rb – Register direct



FSCALEB Rt, Ra, #Imm - Immediate



| ~16 | Immediate <sub>150</sub> | 03 | 315 |
|-----|--------------------------|----|-----|
|-----|--------------------------|----|-----|

## **FSUB** – Float Subtraction

### **Description:**

Subtract two source operands and place the difference in the target register. All registers values are treated as 88-bit floating-point values. This is an alternate mnemonic for the <u>FADD</u> instruction where the second source operand, Rb is assumed negated.

### **Supported Operand Sizes:**

**Operation:** 

$$Rt = Ra + -Rb$$
 or  $Rt = Ra + -Imm$ 

**Clock Cycles:** 8

**Execution Units:** All Integer ALU's

Exceptions: none

**Notes:** 

**Instruction Formats:** 

FSUB Rt, Ra, Rb – Register direct

| 39 35           | 34 32           | 31 | 30 |   | 28 23           |    |                 | 6 15 | 14 9            | 8 | 7 5    | 4 0 |
|-----------------|-----------------|----|----|---|-----------------|----|-----------------|------|-----------------|---|--------|-----|
| 45              | Rm <sub>3</sub> | 0  | Vb | 1 | Rb <sub>6</sub> | Sa | Ra <sub>6</sub> | St   | Rt <sub>6</sub> | V | $Sz_3$ | 125 |
| Clock Cycles: 8 |                 |    |    |   |                 |    |                 |      |                 |   |        |     |

#### FSUB Rt,Ra,Imm<sub>13</sub>



S

### **FTRUNC – Truncate Fraction**

### **Description:**

This instruction truncates off the fractional portion of the number leaving only the integer portion. No rounding occurs.

**Integer Instruction Format: R1** 

#### FTRUNC Rt, Ra, Rb - Register direct

|          |    |   |   |   |     |    |                 |    | 14 9            |   |        |     |  |
|----------|----|---|---|---|-----|----|-----------------|----|-----------------|---|--------|-----|--|
| 15       | ~3 | 0 | 0 | 0 | 216 | Sa | Ra <sub>6</sub> | St | Rt <sub>6</sub> | V | $Sz_3$ | 125 |  |
| $\alpha$ |    |   |   |   |     |    |                 |    |                 |   |        |     |  |

**Clock Cycles: 1** 

#### **Operation:**

Rt = Trunc(Ra)

**Execution Units:** FPU #0

**Clock Cycles: 1** 

Exceptions: none

# **String Operations**

# Representations

### Strings



### **UTF8 Chars**



### UTF24 Chars



### **CHRNDX – Character Index**

#### **Description:**

This instruction searches Ra, which is treated as an array of characters, for a character value specified by Rb and places the index of the character into the target register Rt. If the character is not found -1 is placed in the target register. A common use would be to search for a null byte. The index result may vary from -1 to +11 for UTF8 characters or -1 to +3 for UTF24 characters. The index of the first found byte is returned (closest to zero).

Supported Operand Sizes: .b, .c

**Instruction Formats:** 

CHRNDX Rt, Ra, Rb - Register direct

|                                                                                                                       | 39 | 34 | 33 | 32 | 31 | 30 | 29 | 24    | 23 | 22 | 21 | 16    | 15 | 14 | 13 | 8     | 7 5    | 4 | 0  |
|-----------------------------------------------------------------------------------------------------------------------|----|----|----|----|----|----|----|-------|----|----|----|-------|----|----|----|-------|--------|---|----|
|                                                                                                                       | 13 | 36 | ~  | 0  | Vb | Sb | R  | $b_6$ | Va | Sa | R  | $a_6$ | Vt | St | R  | $t_6$ | $Sz_3$ | 2 | 25 |
| 13 <sub>6</sub> ~ 0 Vb Sb Rb <sub>6</sub> Va Sa Ra <sub>6</sub> Vt St Rt <sub>6</sub> Sz <sub>3</sub> Clock Cycles: 1 |    |    |    |    |    |    |    |       |    |    |    |       |    |    |    |       |        |   |    |

CHRNDX Rt,Ra,Imm<sub>15</sub>



**Clock Cycles: 1** 

**Operation:** 

Rt = Index of (Rb in Ra)

**Execution Units:** All Integer ALU's

Exceptions: none

# Bit Manipulation Operations

### CLR - Clear Bit Field

#### **Description:**

A bit field in the source operand is cleared and the result placed in the target register. The specified bit to clear is modulo the operand size.

Supported Operand Sizes: .b, .w, .t, .o

Flag Updates: none

**Operation:** 

Rt = Ra &~bit Rb or Ra = Ra &~bit imm

**Instruction Formats:** 

CLR Rt, Ra, Rb



CLR Rt, Ra,Offs7,Wid7



**Clock Cycles: 1** 

**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

# **COM – Complement Bit Field**

#### **Description:**

A bit in the source operand is changed and placed in the target register. The specified bit to change is modulo the operand size.

**Supported Operand Sizes:** .b, .w, .t, .o

Flag Updates: none

**Operation:** 

$$Rt[Rb] = \sim Ra[Rb] \text{ or } Rt[Imm] = \sim Ra[Imm]$$

**Instruction Formats:** 

COM Rt, Ra, Rb



COM Rt, Ra,Offs<sub>7</sub>,Wid<sub>7</sub>



**Clock Cycles: 1** 

**Clock Cycles:** 1

**Execution Units:** All Integer ALU's

Exceptions: none

### SET – Set Bit Field

#### **Description:**

A bit in the source operand is set and placed in the target register.

Supported Operand Sizes: .b, .w, .t, .o

**Operation:** 

 $Rt = Ra \mid bit Rb \text{ or } Rt = Ra \text{ or } Bit[Imm]$ 

**Instruction Formats:** 

SET Rt, Ra, Rb



**Clock Cycles: 1** 

SET Rt, Ra,Offs7,Wid7



Clock Cycles: 1

Clock Cycles: 1

**Execution Units:** All Integer ALU's

Exceptions: none

# SBX – Sign Bit Extend

#### **Description:**

Sign extend a value beginning at a specified bit to the width specified and place the result in the target register. All registers are integer registers.

Supported Operand Sizes: .b, .w, .t, .o

**Operation:** 

**Instruction Formats:** 

SBX Rt, Ra, Rb



Clock Cycles. 1

SBX Rt, Ra,Offs7,Wid7

| 39 37          | 36 30            | 29 23             | 22 | 21 16           | 15 | 14 9            | 8 | 7 5             | 4 0 |
|----------------|------------------|-------------------|----|-----------------|----|-----------------|---|-----------------|-----|
| 3 <sub>3</sub> | Wid <sub>7</sub> | Offs <sub>7</sub> | Sa | Ra <sub>6</sub> | St | Rt <sub>6</sub> | V | Sz <sub>3</sub> | 135 |

**Clock Cycles: 1** 

**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

# **EXTS – Extract Signed Bit Field**

#### **Description:**

Extract a bit field from the source operand and place the bit field in the target register. The field is sign extended.

Supported Operand Sizes: .b, .w, .t, .o

**Operation:** 

Rt = Ra[Rb] or Rt = Ra[Imm]

**Instruction Formats:** 

EXTS Rt, Ra, Rb



**Clock Cycles: 1** 

EXTS Rt, Ra,Offs7,Wid7



**Clock Cycles: 1** 

Clock Cycles: 1

**Execution Units:** All Integer ALU's

**Exceptions:** none

### **EXTU – Extract Bit Field**

#### **Description:**

Extract a bit field from the source operand and place the bit field in the target register. The field is zero extended.

Supported Operand Sizes: .b, .w, .t, .o

**Operation:** 

Rt = Ra[Rb] or Rt = Ra[Imm]

**Instruction Formats:** 

EXTU Rt, Ra, Rb



Clock Cycles: 1

EXTU Rt, Ra,Offs7,Wid7

| 39 37 | 50               | 30 |    |                 |    |   |       | 16 15 |    |   |   |                 |    |    |
|-------|------------------|----|----|-----------------|----|---|-------|-------|----|---|---|-----------------|----|----|
| 43    | Wid <sub>7</sub> |    | Of | fs <sub>7</sub> | Sa | R | $a_6$ | St    | Rt | 6 | V | Sz <sub>3</sub> | 1. | 35 |

Clock Cycles: 1

Clock Cycles: 1

**Execution Units:** All Integer ALU's

Exceptions: none

# Shift and Rotate Operations

### **ASL – Arithmetic Shift Left**

#### **Description:**

Shift the first source operand to the left by the number of bits specified by the second source operand and place the result in the target register. All registers are integer registers. Arithmetic is signed twos-complement values. The least significant bit is filled with the value of 'N' specified in the instruction.

**Supported Operand Sizes:** .b, .w, .t, .o

**Operation:** 

 $Rt = Ra \ll Rb \text{ or } Rt = Ra \ll Imm$ 

**Instruction Formats:** 

ASL Rt, Ra, Rb



ASL Rt, Ra, Imm7



**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

# **ASR – Arithmetic Shift Right**

#### **Description:**

Shift the first source operand to the right, preserving the sign bit, by the number of bits specified by the second source operand and place the result in the target register. All registers are integer registers. Arithmetic is signed twos-complement values.

Supported Operand Sizes: .b, .w, .1

**Operation:** 

 $Rt = Ra \gg Rb$  or  $Rt = Ra \gg Imm$ 

**Instruction Formats:** 

ASR Rt, Ra, Rb



ASR Rt, Ra, Imm7



Clock Cycles: 1

**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

## LSL – Logical Shift Left

#### **Description:**

Shift the first source operand to the left by the number of bits specified by the second source operand and place the result in the target register. All registers are integer registers. Arithmetic is signed twos-complement values. Fill the least significant bit with the value specified by 'N' in the instruction.

This instruction may be used to generate a bitmask by setting N to one, and shifting a zero.

**Supported Operand Sizes:** .b, .w, .1

**Operation:** 

 $Rt = Ra \ll Rb \text{ or } Rt = Ra \ll Imm$ 

**Instruction Formats:** 

LSL Rt, Ra, Rb



LSL Rt, Ra, Imm7



**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

## LSLAND – Logical Shift Left and And

#### **Description:**

Shift the first source operand to the left by the number of bits specified by the second source operand and bitwise 'and' the result to the target register. All registers are integer registers. Arithmetic is signed twos-complement values. Fill the least significant bit with the value specified by 'N' in the instruction.

This instruction may be used to isolate a bitfield in a target register.

Supported Operand Sizes: .b, .w, .1

**Operation:** 

Rt = Rt & (Ra << Rb) or Rt = Rt & (Ra << Imm)

**Instruction Formats:** 

LSLAND Rt, Ra, Rb



LSLAND Rt, Ra, Imm7



**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

## LSLOR – Logical Shift Left and Or

#### **Description:**

Shift the first source operand to the left by the number of bits specified by the second source operand and bitwise 'or' the result to the target register. All registers are integer registers. Arithmetic is signed twos-complement values. Fill the least significant bit with the value specified by 'N' in the instruction.

This instruction may be used to insert a bitfield into a target register.

**Supported Operand Sizes:** .b, .w, .1

**Operation:** 

 $Rt = Rt \mid (Ra \ll Rb) \text{ or } Rt = Rt \mid (Ra \ll Imm)$ 

**Instruction Formats:** 

LSLOR Rt, Ra, Rb



LSLOR Rt, Ra, Imm<sub>7</sub>



**Clock Cycles: 1** 

**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

### LSLXOR - Logical Shift Left and Exclusive Or

#### **Description:**

Shift the first source operand to the left by the number of bits specified by the second source operand and bitwise exclusive 'or' the result to the target register. All registers are integer registers. Arithmetic is signed twos-complement values. Fill the least significant bit with the value specified by 'N' in the instruction.

This instruction may be used to insert or invert a bitfield in a target register.

**Supported Operand Sizes:** .b, .w, .1

**Operation:** 

 $Rt = Rt \land (Ra << Rb) \text{ or } Rt = Rt \land (Ra << Imm)$ 

**Instruction Formats:** 

LSLOR Rt, Ra, Rb



LSLOR Rt, Ra, Imm7



**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

# LSR – Logical Shift Right

#### **Description:**

Shift the first source operand to the right by the number of bits specified by the second source operand and place the result in the target register. All registers are integer registers. Arithmetic is signed twos-complement values. Fill the least significant bit with the value specified by 'N' in the instruction.

Supported Operand Sizes: .b, .w, .t, .o

**Operation:** 

 $Rt = Ra \gg Rb$  or  $Rt = Ra \gg Imm$ 

**Instruction Formats:** 

LSR Rt, Ra, Rb



Clock Cycles: 1

LSR Rt, Ra, Imm7



**Clock Cycles:** 

**Execution Units:** All Integer ALU's

**Exceptions:** none

### **ROL – Rotate Left**

#### **Description:**

Rotate the first source operand to the left by the number of bits specified by the second source operand and place the result in the target register. All registers are integer registers. Arithmetic is signed twos-complement values. The least significant bit is set to the value of the most significant bit exclusively or'd with the value 'N' from the instruction.

Supported Operand Sizes: .b, .w, .t, .o

**Operation:** 

 $Rt = Ra \ll Rb$  or  $Rt = Ra \ll Imm$ 

**Instruction Formats:** 

ROL Rt, Ra, Rb



ROL Rt, Ra, Imm7



**Clock Cycles:** 

**Execution Units:** All Integer ALU's

**Exceptions:** none

## **ROR – Rotate Right**

#### **Description:**

Rotate the first source operand through the carry to the right by the number of bits specified by the second source operand and place the result in the target register. All registers are integer registers. Arithmetic is signed twos-complement values. The most significant bit is set to the value of the least significant bit exclusively or'd with the value 'N' from the instruction.

Supported Operand Sizes: .b, .w, .1

**Operation:** 

 $Rt = Ra \gg Rb$  or  $Rt = Ra \gg Imm$ 

**Instruction Formats:** 

ROR Rt, Ra, Rb



ROR Rt, Ra, Imm7



Clock Cycles: 1

**Clock Cycles:** 

**Execution Units:** All Integer ALU's

**Exceptions:** none

### **VSHLV** – Shift Vector Left

#### **Description**

Elements of the vector are transferred upwards to the next element position. The first is loaded with the value zero. This is also called a slide operation.

#### **Instruction Formats:**

#### VSHLV Rt, Ra, Rb



#### VSHLV Rt, Ra, Imm7



#### **Clock Cycles: 1**

#### Operation

$$Amt = Rb$$

For 
$$x = VL-1$$
 to Amt

$$Vt[x] = Va[x-amt]$$

For 
$$x = Amt-1$$
 to 0

$$Vt[x] = 0$$

#### Exceptions: none

# **VSHRV** – Shift Vector Right

#### **Description**

Elements of the vector are transferred downwards to the next element position. The last is loaded with the value zero. This is also called a slide operation.

#### VSHLR Rt, Ra, Rb



#### VSHLR Rt, Ra, Imm7



#### Operation

$$Amt = Rb$$
 
$$For \ x = 0 \ to \ VL\text{-}Amt$$
 
$$Vt[x] = Va[x+amt]$$

For 
$$x = VL-Amt + 1$$
 to  $VL-1$ 

$$Vt[x] = 0$$

#### Exceptions: none

# Flow Control Instructions

### **Bcc – Conditional Branch**

Bcc Ra, Rb, label

#### **Description:**

Branch if the condition is met. The condition is a relationship between either two registers or a register and an immediate value. The displacement is relative to the address of the branch instruction. The branch range is  $\pm$ -32kB.

#### **Instruction Format: RR**



| Cond <sub>5</sub> | Mnem.   | Meaning                                   | Test                      |
|-------------------|---------|-------------------------------------------|---------------------------|
|                   |         | Integer Compare Results                   |                           |
| 0                 | EQ      | = equal                                   | a == b                    |
| 1                 | LT      | < less than                               | a < b                     |
| 2                 | LE      | <= less than or equal                     | a <= b                    |
| 3                 | LO/CS   | < unsigned less than                      | a < b                     |
| 4                 | LS      | <= unsigned less than or equal            | a <= b                    |
| 5                 | BC      | Bit clear                                 | !a[b]                     |
| 6                 | BS      | Bit set                                   | a[b]                      |
| 7                 |         |                                           |                           |
| 8                 | NE      | <> not equal                              | a <> b                    |
| 9                 | GE      | >= greater than or equal                  | a >= b                    |
| 10                | GT      | > greater than                            | a > b                     |
| 11                | HS / CC | unsigned greater than or equal            | a >= b                    |
| 12                | HI      | unsigned greater than                     | a > b                     |
| 13                |         | <u> </u>                                  |                           |
| 14                | RA      | Branch always                             | 1                         |
| 15                | SR      | Branch subroutine                         | 1                         |
| Cond <sub>5</sub> | Mnem.   | Meaning                                   | Test                      |
|                   |         | Float Compare Results                     |                           |
| 16                | EQ      | equal                                     | !nan & eq                 |
| 17                | NE      | not equal                                 | !eq                       |
| 18                | GT      | greater than                              | !nan & !eq & !lt & !inf   |
| 19                | UGT     | Unordered or greater than                 | Nan    (!eq & !lt & !inf) |
| 20                | GE      | greater than or equal                     | Eq    (!nan & !lt & !inf) |
| 21                | UGE     | Unordered or greater than or equal        | Nan    (!lt    eq)        |
| 22                | LT      | Less than                                 | Lt & (!nan & !inf & !eq)  |
| 23                | ULT     | Unordered or less than                    | Nan   (!eq & lt)          |
| 24                | LE      | Less than or equal                        | Eq   (lt & !nan)          |
| 25                | ULE     | unordered less than or equal              | Nan   (eq   lt)           |
| 26                | GL      | Greater than or less than                 | !nan & (!eq & !inf)       |
| 27                | UGL     | Unordered or greater than or less than    | Nan   !eq                 |
| 28                | ORD     | Greater than less than or equal / ordered | !nan                      |

| 29                | UN    | Unordered | Nan  |
|-------------------|-------|-----------|------|
| Cond <sub>5</sub> | Mnem. | Meaning   | Test |
| 30                |       |           |      |
| 31                |       |           |      |

### **BRA** – Unconditional Branch

#### **Description:**

Unconditionally branch to a new program address. The displacement is relative to the address of the branch instruction. The branch range is +/- 128MB.

#### **Instruction Format:**

| 39                  | 16 | 15   | 12   | 1110    | 9  | 5  | 4  | 0  |
|---------------------|----|------|------|---------|----|----|----|----|
| Disp <sub>230</sub> |    | Disp | 2724 | $0_{2}$ | 14 | 45 | 28 | 85 |

**Clock Cycles: 3** 

### **BRK** – Breakpoint

#### **Description:**

Execute the breakpoint exception. This is a form of the TRAP instruction.

#### **Instruction Format:**



### **BSR** – Branch to Subroutine

#### **Description:**

Branch to a subroutine placing the address of the next instruction in a register. The displacement is relative to the address of the branch instruction. The branch range is +/- 128MB.

#### **Instruction Format:**



## **DBcc – Decrement and Branch**

DBcc Rn, label

#### **Description:**

Decrement the loop counter and branch if the condition is false and the loop counter is not equal to minus one. The displacement is relative to the address of the branch instruction. The branch range is  $\pm$ 7- 32kB.

#### **Instruction Format:**

| 39                   | 24 | 2322 | 21                | 16 | 15 | 10    | 9  | 5               | 4  | 0  |
|----------------------|----|------|-------------------|----|----|-------|----|-----------------|----|----|
| Disp <sub>15</sub> . | 0  | ~2   | 2 Rb <sub>6</sub> |    | R  | $a_6$ | Co | nd <sub>5</sub> | 29 | 95 |

## JMP – Jump to Address

#### **Description:**

Compute the effective address and jump to it. If Ra=53 then the program counter is used. If the indirection bit 'I' of the instruction is set then load the address from memory specified by the effective address and jump to it.

#### **Operation:**

PC = Ra + Rb or PC = Ra + Imm

**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

**Notes:** 

**Instruction Formats:** 

JMP d(Ra, Rb)



**Clock Cycles: 1** 

 $JMP Imm_{16} (Ra)$ 



JMP [d(Ra, Rb)]



**Clock Cycles: 1** 

 $JMP [Imm_{16} (Ra)]$ 



| $Op_2$ |              |
|--------|--------------|
| 0      | Load PC LSBs |
| 1      | Add to PC    |
|        |              |
|        |              |

# JSR – Jump to Subroutine

#### **Description:**

Compute the effective address and jump to it. The address of the instruction is stored in a register. If Ra=53 then the program counter is used.

Flag Updates:

None.

**Operation:** 

Rt = PC

PC = Ra + Rb or PC = Ra + Imm

**Clock Cycles:** 

**Execution Units:** All Integer ALU's

Exceptions: none

**Notes:** 

**Instruction Formats:** 

JSR (Ra, Rb)



JSR Imm<sub>16</sub> (Ra)



# **NOP – No Operation**

NOP

#### **Description:**

This instruction does not perform any operation.  $Ty_3$  0 to 3 indicates a postfix instruction, and these codes should not be used for other NOPs.

#### **Instruction Format:**

| 39                    | 8 | 7 | 5          | 4  | 0  |
|-----------------------|---|---|------------|----|----|
| Payload <sub>32</sub> |   | T | <b>y</b> 3 | 31 | 15 |

### **RTE – Return From Exception**

#### **Instruction Formats:**

#### RTE #Rpt



#### **Field Description:**

Rpt<sub>7</sub> is the number of bytes to skip past the return address. This is to allow inline subroutine arguments. Up to 128 bytes may be skipped over. For externally triggered interrupts this field should be zero.

D<sub>2</sub> specifies the number of internal stack entries to unstack. It may be used to perform a multilevel return. Legal values for D are 1,2 or 3. In most cases a single entry is unstacked. If two entries are unstack a two-up level return will occur.

#### **Operation:**

Optionally pop the status register, condition code group register, and program counter from the internal stack. Add Rpt tetras to the program counter, and Arg tetras to the stack pointer. If returning from an application trap the status register is not popped from the stack.

# TRAP - Trap

#### **Description:**

Execute trap. The data field is loaded into the specified target register, Rt. The trap number to execute comes from the contents of register Ra or an immediate value encoded in the instruction. The trap number must be between 1 and 511. Trap numbers below 64 are reserved for the system. Trap numbers 64 and above may be used by applications.

Traps below 64 will use the vector base register to lookup the location of the service routine. Traps above 64 will use the application control register to lookup the location of the service routine.

#### **Instruction Format:**

#### TRAP Rt, Ra, #Data



#### TRAP Rt, #Vec, #Data



#### **Operation:**

The program counter and the status register are pushed on an internal stack. Next the vector is fetched from the exception vector table and jumped to.

# **Memory Operations**

### **AMADD - Addition**

#### **Description:**

Atomically add source operand register Rb to value from memory and store the result back to memory. The original value of the memory cell is stored in register Rt. The memory address is contained in register Ra.

Supported Operand Sizes: .t, .o, .n

**Instruction Formats: AMO** 

AMADD Rt, Rb, [Ra]

| <br> |  |    |       |    |    |                 |    |                 |    | 14 9            | - |        |     |
|------|--|----|-------|----|----|-----------------|----|-----------------|----|-----------------|---|--------|-----|
| 45   |  | rl | $0_2$ | Vb | Sb | Rb <sub>6</sub> | Sa | Ra <sub>6</sub> | St | Rt <sub>6</sub> | V | $Sz_3$ | 265 |

**Clock Cycles:** 

AMADD Rt, imm, [Ra]

| 20 <sub>5</sub>   aq   rl   0   Imm <sub>8</sub>   Sa   Ra <sub>6</sub>   St   Rt <sub>6</sub>   V   Sz <sub>3</sub>   26 <sub>5</sub> |  |  |    |    |   |    | 30    |    |   |       |    |   |                |   | -      |    |    |  |
|----------------------------------------------------------------------------------------------------------------------------------------|--|--|----|----|---|----|-------|----|---|-------|----|---|----------------|---|--------|----|----|--|
| 20) 44 11 0 1111118 84 1440 81 1440 7 825 205                                                                                          |  |  | aq | rl | 0 | Im | $n_8$ | Sa | R | $a_6$ | St | R | t <sub>6</sub> | V | $Sz_3$ | 20 | 65 |  |

### AMAND - Bitwise And

#### **Description:**

Bitwise 'And' source operand register Rb to value from memory and store the result back to memory. The original value of the memory cell is stored in register Rt. The memory address is contained in register Ra.

Supported Operand Sizes: .t, .o, .n

**Instruction Formats: AMO** 

AMAND Rt, Rb, [Ra]

|    |   |    |    |       |    |    |                 |    | 21 16           |    |                 |   |        |     |
|----|---|----|----|-------|----|----|-----------------|----|-----------------|----|-----------------|---|--------|-----|
| 8: | 5 | aq | rl | $0_2$ | Vb | Sb | Rb <sub>6</sub> | Sa | Ra <sub>6</sub> | St | Rt <sub>6</sub> | V | $Sz_3$ | 265 |

**Clock Cycles:** 

AMAND Rt, imm, [Ra]

| 39 | 35                    | 34 | 33 | 3231 | 30  | 23 | 22 | 21 | 16    | 15 | 14 | 9              | 8 | 7 5    | 4  | 0  |
|----|-----------------------|----|----|------|-----|----|----|----|-------|----|----|----------------|---|--------|----|----|
| 24 | <b>4</b> <sub>5</sub> | aq | rl | 0    | Imn | 18 | Sa | R  | $a_6$ | St | R  | t <sub>6</sub> | V | $Sz_3$ | 20 | 65 |

**Clock Cycles:** 

### **AMASL – Arithmetic Shift Left**

### **Description:**

Atomically shift left source operand from memory by Rb and store the result back to memory. The original value of the memory cell is stored in register Rt. The memory address is contained in register Ra.

Supported Operand Sizes: .t, .o, .n

**Instruction Formats: AMO** 

AMASL Rt, Rb, [Ra]



AMASL Rt, imm, [Ra]



### **AMEOR – Bitwise Exclusive Or**

#### **Description:**

Bitwise exclusive 'Or' source operand register Rb to value from memory and store the result back to memory. The original value of the memory cell is stored in register Rt. The memory address is contained in register Ra.

Supported Operand Sizes: .t, .o, .n

**Instruction Formats: AMO** 

AMEOR Rt, Rb, [Ra]

| 39 | 35 | 34 | 33 | 3231  | 30 | 29 | 28 23           | 22 | 21 16           | 15 | 14 9            | 8 | 7 5    | 4 0 |
|----|----|----|----|-------|----|----|-----------------|----|-----------------|----|-----------------|---|--------|-----|
| 10 | )5 | aq | rl | $0_2$ | Vb | Sb | Rb <sub>6</sub> | Sa | Ra <sub>6</sub> | St | Rt <sub>6</sub> | V | $Sz_3$ | 265 |

**Clock Cycles:** 

AMEOR Rt, imm, [Ra]

| <br>39 | 35         | 34 | 33 | 3231 | 30 2:            | 3 22 | 21 | 16    | 15 | 14 | 9              | 8 | 7 5             | 4  | 0  |
|--------|------------|----|----|------|------------------|------|----|-------|----|----|----------------|---|-----------------|----|----|
| 26     | <b>5</b> 5 | aq | rl | 0    | Imm <sub>8</sub> | Sa   | R  | $a_6$ | St | R  | t <sub>6</sub> | V | Sz <sub>3</sub> | 20 | 65 |

**Clock Cycles:** 

### **AMLSR – Logical Shift Right**

#### **Description:**

Atomically shift right source operand from memory by Rb and store the result back to memory. The original value of the memory cell is stored in register Rt. The memory address is contained in register Ra.

Supported Operand Sizes: .t, .o, .n

**Instruction Formats: AMO** 

AMLSR Rt, Rb, [Ra]



AMLSR Rt, imm, [Ra]



### **AMMIN - Minimum**

#### **Description:**

If Rb is less than the value from memory, store Rb to memory. The original value of the memory cell is stored in register Rt. The memory address is contained in register Ra. Values are treated as signed two's complement integers. This operation is performed in an atomic fashion.

Supported Operand Sizes: .t, .o, .n

**Instruction Formats: AMO** 

AMMIN Rt, Rb, [Ra]

**Clock Cycles:** 

### **AMMINU - Minimum**

#### **Description:**

If Rb is less than the value from memory, store Rb to memory. The original value of the memory cell is stored in register Rt. The memory address is contained in register Ra. Values are treated as unsigned integers. This operation is performed in an atomic fashion.

**Supported Operand Sizes:** .t, .o, .n

**Instruction Formats: AMO** 

AMMINU Rt, Rb, [Ra]

3231 30 28 23 22 21 16 15 29 14 75 Vb  $Rb_6$ aq rl  $0_2$ Sb Sa Ra<sub>6</sub> St  $Rt_6$ 

# AMOR – Bitwise Or

#### **Description:**

Bitwise 'Or' source operand register Rb to value from memory and store the result back to memory. The original value of the memory cell is stored in register Rt. The memory address is contained in register Ra.

Supported Operand Sizes: .t, .o, .n

**Instruction Formats: AMO** 

AMOR Rt, Rb, [Ra]

| 39 | 35 | 34 | 33 | 3231  | 30 | 29 | 28 23           | 22 | 21 16           | 15 | 14 9            | 8 | 7 5    | 4 0 |
|----|----|----|----|-------|----|----|-----------------|----|-----------------|----|-----------------|---|--------|-----|
| 9  | 5  | aq | rl | $0_2$ | Vb | Sb | Rb <sub>6</sub> | Sa | Ra <sub>6</sub> | St | Rt <sub>6</sub> | V | $Sz_3$ | 265 |

**Clock Cycles:** 

AMOR Rt, imm, [Ra]

| 39 | 35                    | 34 | 33 | 3231 | 30               | 23 22 | 21 | 16    | 15 | 14 | 9              | 8 | 7 5    | 4  | 0  |
|----|-----------------------|----|----|------|------------------|-------|----|-------|----|----|----------------|---|--------|----|----|
| 2  | <b>5</b> <sub>5</sub> | aq | rl | 0    | Imm <sub>8</sub> | Sa    | R  | $a_6$ | St | Ri | t <sub>6</sub> | V | $Sz_3$ | 20 | 65 |

## **CMPXCHG – Compare and Exchange**

### **Description:**

If the contents of the addressed memory cell is equal to the contents of Rb then a value is stored to memory from the source register Rc. The original contents of the memory cell are loaded into register Rt. The memory address is contained in register Ra. The memory address must be properly aligned. If the operation was successful then Rt and Rb will be the same value. The compare and swap operation are an atomic operation; no other access is allowed between the load and potential store operation.

### Supported Operand Sizes: .t, .o, .n

| Sz <sub>3</sub> | Ext. | Operand      |
|-----------------|------|--------------|
| 0               | .b   | 8-bit Byte   |
| 1               | .W   | 16-bit Wyde  |
| 2               | .t   | 32-bit Tetra |
| 3               | .0   | 64-bit Octa  |
| 4               | .c   | 24-bit       |
| 5               | .p   | 40-bit       |
| 6               | .n   | 96-bit       |
| 7               |      | reserved     |

**Instruction Formats: CMPXCHG** 

### CMPXCHG Rt, Rb, Rc, [Ra]

| 39  | 38    | 37    | 32    | 31 | 30 | 29 | 28 | 24    | 22 | 21 | 16    | 15 | 14 | 9              | 8 | 7 5    | 4 | 0                     |
|-----|-------|-------|-------|----|----|----|----|-------|----|----|-------|----|----|----------------|---|--------|---|-----------------------|
| 0   | Vc    | R     | $c_6$ | Sc | Vb | Sb | R  | $b_6$ | Sa | R  | $a_6$ | St | Rt | t <sub>6</sub> | V | $Sz_3$ | 2 | <b>5</b> <sub>5</sub> |
| Cle | ock C | ycles | s:    |    |    |    |    |       |    |    |       |    |    |                |   |        |   |                       |

# FLOAD Rn,<ea>

### **Description:**

Load register Rt from floating-point source. The source value is converted to the machine width; 96-bit triple precision.

Supported Operand Sizes: .b, .w, .t, .o, .p, .n

| Sz <sub>3</sub> | Ext. | Operand       |
|-----------------|------|---------------|
| 0               |      | reserved      |
| 1               | .h   | 16-bit half   |
| 2               | .s   | 32-bit single |
| 3               | .d   | 64-bit double |
| 4               |      | Reserved      |
| 5               |      | Reserved      |
| 6               | .t   | 96-bit triple |
| 7               |      | reserved      |

**Instruction Formats: NDXL** 

FLOAD Rt, d(Rb,Rc\*Sc) – indexed

|   | 39  | 38    | 37    | 32             | 31 | 30 | 29 | 28 | 24    | 22 | 21 | 16              | 15    | 9 | 8 | 7 5    | 4 | 0  |
|---|-----|-------|-------|----------------|----|----|----|----|-------|----|----|-----------------|-------|---|---|--------|---|----|
| ſ | 1   | Vc    | R     | c <sub>6</sub> | Sc | Vb | Sb | R  | $b_6$ | St | R  | .t <sub>6</sub> | $D_6$ | 0 | V | $Sz_3$ | 1 | 65 |
|   | Clo | ock C | ycles | s:             |    |    |    |    |       |    |    |                 |       |   |   |        |   |    |

## LOAD Rn,<ea>

### **Description:**

Load register Rt from source. The source value is sign extended to the machine width. Loading register r54, the stack canary placeholder, will cause a check trap if the value loaded is not equal to the current value of the stack canary register.

Supported Operand Sizes: .b, .w, .t, .o, .p, .n

| Sz <sub>3</sub> | Ext. | Operand      |
|-----------------|------|--------------|
| 0               | .b   | 8-bit Byte   |
| 1               | .W   | 16-bit Wyde  |
| 2               | .t   | 32-bit Tetra |
| 3               | .0   | 64-bit Octa  |
| 4               | .c   | 24-bit       |
| 5               | .p   | 40-bit       |
| 6               | .n   | 96-bit       |
| 7               |      | group        |

**Instruction Formats: NDXL** 

LOAD Rt, d(Rb,Rc\*Sc) – indexed



# FSTORE Ra,<ea>

### **Description:**

Store register Ra to destination. The register is converted from triple precision to the storage precision.

Supported Operand Sizes: .h, .s, .d, .t

| Sz <sub>3</sub> | Ext. | Operand       |
|-----------------|------|---------------|
| 0               |      | Reserved      |
| 1               | .h   | 16-bit half   |
| 2               | .s   | 32-bit single |
| 3               | .d   | 64-bit double |
| 4               |      | Reserved      |
| 5               |      | reserved      |
| 6               | .t   | 96-bit triple |
| 7               |      | reserved      |

**Instruction Formats: NDXS** 

STORE Ra, d(Rb, Rc\*Sc) – Indexed



# LOADG Gn,<ea>

### **Description:**

Load group of five registers from source.

| Gn | Registers  |
|----|------------|
| 0  | R0 to R4   |
| 1  | R5 to R9   |
| 2  | R10 to R14 |
| 3  | R15 to R19 |
| 4  | R20 to R24 |
| 5  | R25 to R29 |
| 6  | R30 to R34 |
| 7  | R35 to R39 |

| Gn | Registers          |
|----|--------------------|
| 8  | R40 to R44         |
| 9  | R45 to R49         |
| 10 | R50 to R54         |
| 11 | R55 to R59         |
| 12 | R60 to R64         |
| 13 | ASP, SSP, HSP, MSP |
|    |                    |
|    |                    |

Supported Operand Sizes: .b, .w, .1

**Instruction Formats: NDXL** 

 $LOADG\ Gt,\ d(Rb,Rc*Sc)-indexed$ 



**Clock Cycles:** 

# LOADZ Rn,<ea>

### **Description:**

Load register Rt from source. The source value is zero extended to the machine width. Loading register r54, the stack canary placeholder, will cause a check trap if the value loaded is not equal to the current value of the stack canary register.

Supported Operand Sizes: .b, .w, .t, .o, .p, .n

| $Sz_3$ | Ext. | Operand      |
|--------|------|--------------|
| 0      | .b   | 8-bit Byte   |
| 1      | .W   | 16-bit Wyde  |
| 2      | .t   | 32-bit Tetra |
| 3      | .0   | 64-bit Octa  |
| 4      | .c   | 24-bit       |
| 5      | .p   | 40-bit       |
| 6      | .n   | 96-bit       |
| 7      |      | reserved     |

**Instruction Formats: NDXL** 

LOAD Rt, d(Rb,Rc\*Sc) – indexed



# STORE Ra,<ea>

### **Description:**

Store register Ra to destination.

Supported Operand Sizes: .b, .w, .t, .o, .p, .n

| Sz <sub>3</sub> | Ext. | Operand      |
|-----------------|------|--------------|
| 0               | .b   | 8-bit Byte   |
| 1               | .W   | 16-bit Wyde  |
| 2               | .t   | 32-bit Tetra |
| 3               | .0   | 64-bit Octa  |
| 4               | .c   | 24-bit       |
| 5               | .p   | 40-bit       |
| 6               | .n   | 96-bit       |
| 7               |      | group        |

**Instruction Formats: NDXS** 

STORE Ra, d(Rb, Rc\*Sc) – Indexed

|   |    |    |    |    |    |    |    |   |    |   |       |                |    |   | 7 5    |    |    |
|---|----|----|----|----|----|----|----|---|----|---|-------|----------------|----|---|--------|----|----|
| 0 | Vc | Ro | 26 | Sc | Vb | Sb | Rb | 6 | Sa | R | $a_6$ | $D_{\epsilon}$ | 50 | V | $Sz_3$ | 18 | 35 |

**Clock Cycles:** 

# STOREPTR Ra,<ea>

### **Description:**

Store a pointer contained in register Ra to destination.

**Instruction Formats: NDXS** 

STORE Ra, d(Rb, Rc\*Sc) – Indexed



# STOREG Gt,<ea>

### **Description:**

Store register group to destination. The destination is a 512 bit / 64 byte aligned region of memory.

| Gn | Registers  |
|----|------------|
| 0  | R0 to R4   |
| 1  | R5 to R9   |
| 2  | R10 to R14 |
| 3  | R15 to R19 |
| 4  | R20 to R24 |
| 5  | R25 to R29 |
| 6  | R30 to R34 |
| 7  | R35 to R39 |

| Gn | Registers          |
|----|--------------------|
| 8  | R40 to R44         |
| 9  | R45 to R49         |
| 10 | R50 to R54         |
| 11 | R55 to R59         |
| 12 | R60 to R64         |
| 13 | ASP, SSP, HSP, MSP |
|    |                    |
|    |                    |

Supported Operand Sizes: .b, .w, .1

**Instruction Formats: NDXS** 

STOREG Ga, d(Rb, Rc\*Sc) - Indexed



Notes:

Compare and Exchange

ATOM a0, "AAAAAA"

LOAD a0,[a3]

CMP t0,a0,a1

PEQ t0, "TTF"

STORE a2,[a3]

LDI a0,1

LDI a0,0

Load add and store:

ATOM "AAA" LOAD a0,[a2] ADD t0,a0,a1 STORE t0,[a2]

Load or and store

ATOM "AAA"

LOAD a0,[a2] OR t0,a0,a1 STORE t0,[a2]

Load and complement and store

ATOM "AAA"
LOAD a0,[a2]
AND t0,a0,~a1
STORE t0,[a2]

# STORE\_PAIR Rb, Rc, d[Ra]

### **Description:**

Store register pair to destination.

Supported Operand Sizes: .b, .w, .t, .o, .p, .n

| Sz <sub>3</sub> | Ext. | Operand      |
|-----------------|------|--------------|
| 0               | .b   | 8-bit Byte   |
| 1               | .W   | 16-bit Wyde  |
| 2               | .t   | 32-bit Tetra |
| 3               | .0   | 64-bit Octa  |
| 4               | .c   | 24-bit       |
| 5               | .p   | 40-bit       |
| 6               | .n   | 96-bit       |
| 7               |      | group        |

**Instruction Formats: dRa** 

STORE Rb, Rc, d(Ra)



**Clock Cycles:** 

# **Vector Specific Instructions**

### **V2BITS**

### **Description**

Convert Boolean vector to bits. A bit specified by Rb or an immediate of each vector element is copied to the bit corresponding to the vector element in the target register. The target register is a scalar register. Usually, Rb would be zero so that the least significant bit of the vector is copied.

A typical use is in moving the result of a vector compare operation into a mask register.

**Instruction Format: R2** 

V2BITS Rt, Ra, Rb - Register direct



V2BITS Rt, Ra, #bit – Register direct



#### **Operation**

For x = 0 to VL-1

Rt.bit[x] = Ra[x].bit[Rb]

Exceptions: none

#### **Example:**

cmp v1,v2,v3 ; compare vectors v2 and v3 v2bits m1,v1,#8 ; move NE status to bits in m1

vmask "11100000"

add v4,v5,v6; perform some masked vector operations

muls v7,v8,v9 add v7,v7,v4

# Cryptographic Accelerator Instructions

## **AES64DS – Final Round Decryption**

### **Description**:

Perform the final round of decryption for the AES standard. Registers Rb, Ra represent the entire AES state.

### **Integer Instruction Format: R3**

| 47  | 41 | 49 38 | 37 | 36 35 | 34 | 29         | 28 27  | 26 | 21    | 20 | 15    | 14 | 9              | 8 | 7 | 0      |
|-----|----|-------|----|-------|----|------------|--------|----|-------|----|-------|----|----------------|---|---|--------|
| 50h | 7  | $m_3$ | Z  | ~2    | ~  | <b>'</b> 6 | $Tb_2$ | R  | $b_6$ | R  | $a_6$ | R  | t <sub>6</sub> | v | 0 | $2h_8$ |

1 clock cycle / N clock cycles (N = vector length)

### **Operation:**

Rt = Ra & Rb

Exceptions: none

## **AES64DSM – Middle Round Decryption**

### **Description**:

Perform a middle round of decryption for the AES standard. Registers Rb, Ra represent the entire AES state.

#### **Integer Instruction Format: R3**

| 47 | 41               | 49 38 | 37 | 36 35 | 34 | 29         | 28 27  | 26 | 21    | 20 | 15    | 14 | 9     | 8 | 7  | 0     |
|----|------------------|-------|----|-------|----|------------|--------|----|-------|----|-------|----|-------|---|----|-------|
|    | 51h <sub>7</sub> | $m_3$ | Z  | ~2    | ~  | <b>'</b> 6 | $Tb_2$ | R  | $b_6$ | R  | $a_6$ | R  | $t_6$ | V | 02 | $h_8$ |

1 clock cycle / N clock cycles (N = vector length)

### **Operation:**

Rt = Ra & Rb

# **AES64ES – Final Round Encryption**

### **Description**:

Perform the final round of encryption for the AES standard. Registers Rb, Ra represent the entire AES state.

### **Integer Instruction Format: R3**

| 47  | 41         | 49 38 | 37 | 36 35 | 34 | 29 | 28 27  | 26 | 21    | 20 | 15    | 14 | 9     | 8 | 7  | 0               |
|-----|------------|-------|----|-------|----|----|--------|----|-------|----|-------|----|-------|---|----|-----------------|
| 521 | <b>1</b> 7 | $m_3$ | Z  | ~2    | ~  | 6  | $Tb_2$ | R  | $b_6$ | R  | $a_6$ | R  | $t_6$ | v | 02 | 2h <sub>8</sub> |

1 clock cycle / N clock cycles (N = vector length)

### **Operation:**

Rt = Ra & Rb

Exceptions: none

# **AES64ESM – Middle Round Encryption**

### **Description**:

Perform a middle round of encryption for the AES standard. Registers Rb, Ra represent the entire AES state.

### **Integer Instruction Format: R3**

| 47  | 41 | 49 38 | 37 | 36 35 | 34 | 29         | 28 27  | 26 | 21    | 20 | 15    | 14 | 9     | 8 | 7  | 0      |
|-----|----|-------|----|-------|----|------------|--------|----|-------|----|-------|----|-------|---|----|--------|
| 53h | 7  | $m_3$ | Z  | ~2    | '  | <b>'</b> 6 | $Tb_2$ | R  | $b_6$ | R  | $a_6$ | R  | $t_6$ | V | 02 | $2h_8$ |

1 clock cycle / N clock cycles (N = vector length)

### **Operation:**

Rt = Ra & Rb

### SHA256SIG0

### **Description:**

Implements the Sigma0 transformation function used in the SHA2-256 and SHA2-224 hash function. Only the low order 32 bits of Ra are operated on. The 32-bit result is sign extended to the machine width.

**Instruction Format:** R2

SHA256SIG0 Rt, Ra – Register direct



**Operation:** 

 $Rt = sign \ extend(ror32(Ra,7) \land ror32(Ra,18) \land (Ra_{32} >> 3))$ 

**Execution Units:** ALU #0

Exceptions: none

### SHA256SIG1

#### **Description:**

Implements the Sigma1 transformation function used in the SHA2-256 and SHA2-224 hash function. Only the low order 32 bits of Ra are operated on. The 32-bit result is sign extended to the machine width.

**Instruction Format:** R2

SHA256SIG1 Rt, Ra – Register direct



**Clock Cycles: 1** 

**Operation:** 

 $Rt = sign extend(ror32(Ra,17) \land ror32(Ra,19) \land (Ra_{32} >> 10))$ 

**Execution Units:** ALU #0

### SHA256SUM0

### **Description:**

Implements the Sum0 transformation function used in the SHA2-256 and SHA2-224 hash function. Only the low order 32 bits of Ra are operated on. The 32-bit result is sign extended to the machine width.

**Instruction Format:** R2

SHA256SUM0 Rt, Ra – Register direct

| 39 | 34 | 33 | 32 | 31 | 30 | 29      | 24 | 23 | 22 | 21 | 16    | 15 | 14 | 13              | 8 | 7 5 | 4 0 |  |
|----|----|----|----|----|----|---------|----|----|----|----|-------|----|----|-----------------|---|-----|-----|--|
| 58 | 36 | }  | 0  | 0  | 0  | $0_{6}$ |    | Va | Sa | R  | $a_6$ | Vt | St | $Rt_{\epsilon}$ | 5 | 63  | 25  |  |

**Clock Cycles: 1** 

**Operation:** 

 $Rt = sign \ extend(ror32(Ra,2) \land ror32(Ra,13) \land ror32(Ra,22))$ 

**Execution Units:** ALU #0

Exceptions: none

### SHA256SUM1

#### **Description:**

Implements the Sum1 transformation function used in the SHA2-256 and SHA2-224 hash function. Only the low order 32 bits of Ra are operated on. The 32-bit result is sign extended to the machine width.

**Instruction Format:** R2

SHA256SUM1 Rt, Ra – Register direct



### **Operation:**

 $Rt = sign extend(ror32(Ra,6) \land ror32(Ra,11) \land ror32(Ra,25))$ 

**Execution Units:** ALU #0

## SHA512SIG0

### **Description:**

Implements the Sigma0 transformation function used in the SHA2-512 hash function.

**Instruction Format:** R1

| 31 25            | 24 22 | 21 | 20 | 15    | 14 | 9     | 8 | 7 | 0      |
|------------------|-------|----|----|-------|----|-------|---|---|--------|
| 34h <sub>7</sub> | $m_3$ | Z  | R  | $a_6$ | R  | $t_6$ | v | 0 | $1h_8$ |

Clock Cycles: 1

### **Operation:**

$$Rt = ror64(Ra, 1) \land ror64(Ra, 8) \land (Ra >> 7)$$

**Execution Units:** ALU #0

Exceptions: none

### SHA512SIG1

### **Description:**

Implements the Sigma1 transformation function used in the SHA2-512 hash function.

**Instruction Format:** R1

| 31 | 25             | 24 22 | 21 | 20 | 15    | 14 | 9 | 8 | 7  | 0               |
|----|----------------|-------|----|----|-------|----|---|---|----|-----------------|
| 35 | h <sub>7</sub> | $m_3$ | Z  | R  | $a_6$ | Ri | 6 | v | 0. | lh <sub>8</sub> |

**Clock Cycles:** 1

### **Operation:**

$$Rt = ror64(Ra, 19) \land ror64(Ra, 61) \land (Ra >> 6)$$

**Execution Units:** ALU #0

# SHA512SUM0

Description:

Instruction Format: R1

| 31 25            | 24 22 | 21 | 20 | 15    | 14 | 9     | 8 | 7 | 0      |
|------------------|-------|----|----|-------|----|-------|---|---|--------|
| 36h <sub>7</sub> | $m_3$ | Z  | R  | $a_6$ | R  | $t_6$ | V | 0 | $1h_8$ |

## SHA512SUM1

Description:

Instruction Format: R1

| 31 | 25             | 24 22 | 21 | 20 | 15    | 14 | 9              | 8 | 7 | 0               |
|----|----------------|-------|----|----|-------|----|----------------|---|---|-----------------|
| 37 | h <sub>7</sub> | $m_3$ | Z  | R  | $a_6$ | R  | t <sub>6</sub> | v | 0 | 1h <sub>8</sub> |

## **SM3P0**

Description:

Instruction Format: R1

| 31 25            | 24 22          | 21 | 20 | 15 | 14 | 9              | 8 | 7  | 0              |
|------------------|----------------|----|----|----|----|----------------|---|----|----------------|
| 38h <sub>7</sub> | m <sub>3</sub> | Z  | Ra | 16 | Ri | t <sub>6</sub> | v | 01 | h <sub>8</sub> |

# SM3P1

Description:

Instruction Format: R1

| 31 25            | 24 22 | 21 | 20 | 15    | 14 | 9              | 8 | 7 | 0               |
|------------------|-------|----|----|-------|----|----------------|---|---|-----------------|
| 39h <sub>7</sub> | $m_3$ | Z  | R  | $a_6$ | R  | t <sub>6</sub> | v | 0 | 1h <sub>8</sub> |

# **SM4ED**

**Description:** 

**Instruction Format:** R3

|     |   |       |   |        |   |       | 28 27  |   |       |   |       |    |   |   |   |                 |
|-----|---|-------|---|--------|---|-------|--------|---|-------|---|-------|----|---|---|---|-----------------|
| 56h | 7 | $m_3$ | Z | $Tc_2$ | R | $c_6$ | $Tb_2$ | R | $b_6$ | R | $a_6$ | Rt | 6 | V | 0 | 2h <sub>8</sub> |

# SM4KS

**Description:** 

**Instruction Format:** R3

| 47  | 41 | 49 38 | 37 | 36 35  | 34 | 29 | 28 27  | 26 | 21                    | 20 | 15    | 14 | 9 | 8 | 7  | 0      |
|-----|----|-------|----|--------|----|----|--------|----|-----------------------|----|-------|----|---|---|----|--------|
| 57h | 17 | $m_3$ | Z  | $Tc_2$ | Ro | 26 | $Tb_2$ | Rł | <b>)</b> <sub>6</sub> | R  | $a_6$ | Rı | 6 | V | 02 | $2h_8$ |

# Modifiers

## **ATOM**

### **Description:**

Treat the following sequence of instructions as an "atom". Rt specifies the register results are to be written to.

Disable interrupts for the following instructions.

|                 | Mask Bit |                   |
|-----------------|----------|-------------------|
| -               | 0,1      | Instruction zero  |
| MA              | 2,3      | Instruction one   |
| _ 🗴             | 4,5      | Instruction two   |
| K Moo<br>Scope  | 6,7      | Instruction three |
| Modifier<br>ope | 8,9      | Instruction four  |
| ifie            | 10,11    | Instruction five  |
| r               | 12,13    | Instruction six   |
|                 | 14,15    | Instruction seven |

| Mask Bit | Meaning                         |
|----------|---------------------------------|
| 00       | No action                       |
| 01       | Disable interrupts              |
| 10       | Disable interrupts and lock bus |
| 11       | Reserved                        |

### **Instruction Format:**

| 39 | 34 | 3332 | 31  | 24               | 23  | 16              | 15 | 14 | 9 | 8 | 7 5             | 4 | 0 |
|----|----|------|-----|------------------|-----|-----------------|----|----|---|---|-----------------|---|---|
| 35 | 6  | ~2   | Imn | n <sub>158</sub> | Imi | m <sub>70</sub> | St | Rt | 6 | V | Sz <sub>3</sub> | 2 | 5 |

### **Assembler Syntax:**

### **Example:**

ATOM "LLLLAA"
LOAD a0,[a3]
CMP t0,a0,a1
PEQ t0,"TTF"
STORE a2,[a3]
LDI a0,1
LDI a0,0

ATOM "LLLL" LOAD a1,[a3] ADD t0,a0,a1 MOV a0,a1 STORE t0,[a3]

## **CARRY**

#### **Description:**

Apply the carry modifier to following instructions according to a bit mask. This modifier may be used to perform extended precision addition. It may also be used to retrieve the high order multiplier bits or the divide remainder. Note that carry input is not available for the first instruction under the modifier's shadow. Generating carry output for the eight instruction is discarded. Note that postfixes do not count as instructions.

|                      | Mask Bit |                   |
|----------------------|----------|-------------------|
|                      | 0,1      | Instruction zero  |
| Ca                   | 2,3      | Instruction one   |
| S                    | 4,5      | Instruction two   |
| ry Modifier<br>Scope | 6,7      | Instruction three |
| odi:                 | 8,9      | Instruction four  |
| fier                 | 10,11    | Instruction five  |
|                      | 12,13    | Instruction six   |
|                      | 14,15    | Instruction seven |

| Mask Bit | Letter | Meaning                             |
|----------|--------|-------------------------------------|
| 00       | N      | No carry in or out                  |
| 01       | I      | Use carry in                        |
| 10       | 0      | Generate carry out                  |
| 11       | C      | Use carry in and generate carry out |

#### **Instruction Format:**

| 39 | 34 | 33 | 32 | 31  | 16  | 15 | 14 | 13              | 8 | 7 5 | 4 | 0 |
|----|----|----|----|-----|-----|----|----|-----------------|---|-----|---|---|
| 33 | 6  | ~  | 0  | Imm | 150 | ~  | 0  | Rn <sub>6</sub> |   | ~3  | 2 | 5 |

### **Assembler Syntax:**

Specifying carry input / output capability for following instructions consists of a map using one of four characters: 'I' for input only, 'O' for output only, 'C' for both input and output and 'N' for neither input or output. A character is present in a string for each following instruction in sequence.

#### **Example:**

CARRY "OCCCCINN"; first generate carry out, second to fifth use carry in and out, sixth use carry in, seven and eight ignore carry. ADD r6,r3,r7 ; 'O' gen carry ; 'C' carry in and carry out ADD r6,r6,#1234 ; 'C' carry in and carry out ADD r6,r2,r1 ; 'C' carry in and carry out ADD r6,r6,#456 ; 'C' carry in and carry out ADD r7,r6,#456 ADD r8,r7,#987 ; 'I' carry in MUL r8,r9,r10 ; 'N' no carry in or out

## **VMASK**

### **Description:**

Apply the vector masking to following instructions according to a bit mask. Note that postfixes do not count as instructions.

|             | Mask Bit |                   |
|-------------|----------|-------------------|
| -           | 0 to 2   | Instruction zero  |
| MAS         | 3 to 5   | Instruction one   |
| SK          | 6 to 8   | Instruction two   |
| SK Modifier | 9 to 11  | Instruction three |
| odi         | 12 to 14 | Instruction four  |
| ifïe        | 15 to 17 | Instruction five  |
| r           | 18 to 20 | Instruction six   |
|             | 21 to 23 | Instruction seven |

### **Instruction Format:**

| 39 | 34                    | 33 | 32 | 31 | 8                  | 8 | 7 5 | 4 | 0  |
|----|-----------------------|----|----|----|--------------------|---|-----|---|----|
| 34 | <b>1</b> <sub>6</sub> | ~  | 1  |    | Imm <sub>230</sub> |   | ~3  | 2 | 25 |

### **Assembler Syntax:**

Specifying the mask register for following instructions consists of a map using single digit numeric characters between '0' and '7'. A character is present in a string for each following instruction in sequence.

### **Example:**

| VMASK "12345000 | "                    |
|-----------------|----------------------|
| ADD v6,v3,v7    | ; vector mask reg #1 |
| ADD v6,v6,#1234 | ; vector mask reg #2 |
| ADD v6,v2,v1    | ; vector mask reg #3 |
| ADD v6,v6,#456  | ; vector mask reg #4 |
| ADD v7,v6,#456  | ; vector mask reg #5 |
| ADD v8,v7,#987  | ; vector mask reg #0 |
| MUL v8,v9,v10   | ; vector mask reg #0 |

### **PRED**

### **Description:**

Apply the predicate to following instructions according to a bit mask. The predicate may be applied to a maximum of eight instructions. Note that postfixes do not count as instructions.

|                     | Mask Bit |                   |
|---------------------|----------|-------------------|
|                     | 0,1      | Instruction zero  |
| Pred                | 2,3      | Instruction one   |
| s ed ]              | 4,5      | Instruction two   |
| Mo<br>cop           | 6,7      | Instruction three |
| d Modifier<br>Scope | 8,9      | Instruction four  |
| ier                 | 10,11    | Instruction five  |
|                     | 12,13    | Instruction six   |
|                     | 14,15    | Instruction seven |

| Mask Bit | Meaning                            |
|----------|------------------------------------|
| 00       | Always execute (ignore predicate)  |
| 01       | Execute only if predicate is true  |
| 10       | Execute only if predicate is false |
| 11       | Always execute (ignore predicate)  |

### **Instruction Format:**

| 39 | 34 | 33 | 32 | 31  | 16               | 15 | 10    | 9  | 5               | 4 | 0 |
|----|----|----|----|-----|------------------|----|-------|----|-----------------|---|---|
| 32 | 26 | ٧  | 1  | Imn | n <sub>150</sub> | R  | $n_6$ | Co | nd <sub>5</sub> | 2 | 5 |

### **Assembler Syntax:**

The predicate condition is part of the mnemonic. 'PEQ' predicates logic if the equals flag in the register containing flags is set. Other conditions work in a similar fashion. After the instruction mnemonic the register containing the predicate flags is specified. Next a character string containing 'T' for True, 'F' for false, or 'I' for ignore for the next eight instructions is present.

### **Example:**

| PEQ r2,"TTTFFFII" | ; next three execute if true, three after execute if false, two after always execute |
|-------------------|--------------------------------------------------------------------------------------|
| MUL r3,r4,r5      | ; executes if True                                                                   |
| ADD r6,r3,r7      | ; executes if True                                                                   |
| ADD r6,r6,#1234   | ; executes if True                                                                   |
| DIV r3,r4,r5      | ; executes if FALSE                                                                  |
| ADD r6,r2,r1      | ; executes if FALSE                                                                  |
| ADD r6,r6,#456    | ; executes if FALSE                                                                  |
| MUL r8,r9,r10     | ; always executes                                                                    |

# **ROUND**

### **Description:**

Set the rounding mode for following instructions according to a bit mask. Note that postfixes do not count as instructions.

|                      | Mask Bit |                   |
|----------------------|----------|-------------------|
| ROU                  | 0 to 2   | Instruction zero  |
|                      | 3 to 5   | Instruction one   |
| <b>—</b>             | 6 to 8   | Instruction two   |
| ND Modifier<br>Scope | 9 to 11  | Instruction three |
|                      | 12 to 14 | Instruction four  |
| lifi                 | 15 to 17 | Instruction five  |
| er                   | 18 to 20 | Instruction six   |
|                      | 21 to 23 | Instruction seven |

### **Instruction Format:**

| 39 | 34 | 33 | 32 | 31 |                    | 8 | 7 5 | 4 | 0                     |
|----|----|----|----|----|--------------------|---|-----|---|-----------------------|
| 36 | 56 | ~  | 1  |    | Imm <sub>230</sub> |   | ~3  | 2 | <b>2</b> <sub>5</sub> |

**Assembler Syntax:** 

**Example:** 

## MPU Hardware

# PIC – Programmable Interrupt Controller

### **Overview**

The programmable interrupt controller manages interrupt sources in the system and presents an interrupt signal to the cpu. The PIC may be used in a multi-CPU system as a shared interrupt controller. The PIC can guide the interrupt to the specified core. If two interrupts occur at the same time the controller resolves which interrupt the cpu sees. While the CPU's interrupt input is only level sensitive the PIC may process interrupts that are either level or edge sensitive. the PIC is a 32-bit I/O device.

## **System Usage**

There is just a single interrupt controller in the system. It supports 31 different interrupt sources plus a non-maskable interrupt source.

The PIC is located at an address determined by BAR0 in the configuration space.

## **Priority Resolution**

Interrupts have a fixed priority relationship with interrupt #1 having the highest priority and interrupt #31 the lowest. Note that interrupt priorities are only effective when two interrupts occur at the same time.

# **Config Space**

A 256-byte config space is supported. Most of the config space is unused. The only configuration is for the I/O address of the register set.

| Regno  | Width | R/W | Moniker  | Description           |
|--------|-------|-----|----------|-----------------------|
| 000    | 32    | RO  | REG_ID   | Vendor and device ID  |
| 004    | 32    | R/W |          |                       |
| 008    | 32    | RO  |          |                       |
| 00C    | 32    | R/W |          |                       |
| 010    | 32    | R/W | REG_BAR0 | Base Address Register |
| 014    | 32    | R/W | REG_BAR1 | Base Address Register |
| 018    | 32    | R/W | REG_BAR2 | Base Address Register |
| 01C    | 32    | R/W | REG_BAR3 | Base Address Register |
| 020    | 32    | R/W | REG_BAR4 | Base Address Register |
| 024    | 32    | R/W | REG_BAR5 | Base Address Register |
| 028    | 32    | R/W |          |                       |
| 02C    | 32    | RO  |          | Subsystem ID          |
| 030    | 32    | R/W |          | Expansion ROM address |
| 034    | 32    | RO  |          |                       |
| 038    | 32    | R/W |          | Reserved              |
| 03C    | 32    | R/W |          | Interrupt             |
| 040 to | 32    | R/W |          | Capabilities area     |
| 0FF    |       |     |          |                       |

REG\_BAR0 defaults to \$FEE20001 which is used to specify the address of the controller's registers in the I/O address space.

The controller will respond with a memory size request of 0MB (0xFFFFFFF) when BAR0 is written with all ones. The controller contains its own dedicated memory and does not require memory allocated from the system.

#### **Parameters**

CFG\_BUS defaults to zero CFG\_DEVICE defaults to six CFG\_FUNC defaults to zero

Config parameters must be set correctly. CFG device and vendors default to zero.

## **Registers**

The PIC contains 40 registers spread out through a 256 byte I/O region. All registers are 32-bit and only 32-bit accessible. There are two different means to control interrupt sources. One is a set of registers that works with bit masks enabling control of multiple interrupt sources at the same time using single I/O accesses. The other is a set of control registers, one for each interrupt source, allowing control of interrupts on a source-by-source basis.

| Regno | Access | Moniker | Purpose                                                              |                                                                        |  |  |  |
|-------|--------|---------|----------------------------------------------------------------------|------------------------------------------------------------------------|--|--|--|
| 00    | R      | CAUSE   | interrupt cause                                                      | interrupt cause code for currently interrupting source                 |  |  |  |
| 04    | RW     | RE      | request enable,                                                      | request enable, a 1 bit indicates interrupt requesting is enabled for  |  |  |  |
|       |        |         | that interrupt, a 0 bit indicates the interrupt request is disabled. |                                                                        |  |  |  |
| 08    | W      | ID      | Disables interru                                                     | pt identified by low order five data bits.                             |  |  |  |
| OC    | W      | IE      | enables interrup                                                     | t identified by low order five data bits                               |  |  |  |
| 10    |        |         | reserved                                                             |                                                                        |  |  |  |
| 14    | W      | RSTE    |                                                                      | sense circuit for edge sensitive interrupts, 1 bit for                 |  |  |  |
|       |        |         |                                                                      | ource. This register has no effect on level sensitive                  |  |  |  |
|       |        |         |                                                                      | gister automatically resets to zero.                                   |  |  |  |
| 18    | W      | TRIG    |                                                                      | software trigger of the interrupt specified by the low order five data |  |  |  |
|       |        |         | bits.                                                                |                                                                        |  |  |  |
| 20    | W      | ESL     | The low bit for edge sensitivity selection. ESL and ESH combine to   |                                                                        |  |  |  |
|       |        |         |                                                                      | elect of the edge sensitivity.                                         |  |  |  |
|       |        |         | ESH,EHL                                                              | Sensitivity                                                            |  |  |  |
|       |        |         | 00                                                                   | level sensitive interrupt                                              |  |  |  |
|       |        |         | 01                                                                   | positive edge sensitive                                                |  |  |  |
|       |        |         | 10                                                                   | negative edge sensitive                                                |  |  |  |
|       |        |         | 11                                                                   | either edge sensitive                                                  |  |  |  |
| 24    | W      | ESH     | The high bit for                                                     | edge sensitivity selection                                             |  |  |  |
| 80    | RW     | CTRL0   | control register                                                     | for interrupt #0                                                       |  |  |  |
| 84    | RW     | CTRL1   | control register                                                     | for interrupt #1                                                       |  |  |  |
| •••   |        | •••     |                                                                      |                                                                        |  |  |  |
| FC    | RW     | CTRL31  | control register                                                     | for interrupt #31                                                      |  |  |  |

# **Control Register**

All the control registers are identical for all interrupt sources, so only the first control register is described here.

| Bits     |       |                                                                                                   |
|----------|-------|---------------------------------------------------------------------------------------------------|
| 0 to 7   | CAUSE | The cause code associated with the interrupt; this register is copied to the cause                |
|          |       | register when the interrupt is selected.                                                          |
| 8 to 10  | IRQ   | This register determines which signal lines of the cpu are activated for the                      |
|          |       | interrupt. Signal lines are typically used to resolve priority.                                   |
| 16       | IE    | This is the interrupt enable bit, 1 enables the interrupt, 0 disables it. This is the             |
|          |       | same bit reflected in the RE register.                                                            |
| 17       | ES    | This bit controls edge sensitivity for the interrupt $0 = \text{level}$ , $1 = \text{pos}$ . edge |
|          |       | sensitive. This same bit is present in the ESL register.                                          |
| 18       |       | reserved                                                                                          |
| 19       | IRQAR | Respond to an IRQ Ack cycle                                                                       |
| 20 to 23 |       | reserved                                                                                          |
| 24 to 29 | CORE  | Core number to select for interrupt processing                                                    |
| 30 to 31 |       | reserved                                                                                          |

# PIT – Programmable Interval Timer

## **Overview**

Many systems have at least one timer. The timing device may be built into the cpu, but it is frequently a separate component on its own. The programmable interval timer has many potential uses in the system. It can perform several different timing operations including pulse and waveform generation, along with measurements. While it is possible to manage timing events strictly through software it is quite challenging to perform in that manner. A hardware timer comes into play for the difficult to manage timing events. A hardware timer can supply precise timing. In the test system there are two groups of four timers. Timers are often grouped together in a single component. The PIT is a 64-bit peripheral. The PIT while powerful turns out to be one of the simpler peripherals in the system.

## **System Usage**

One programmable timer component, which may include up 32 timers, is used to generate the system time slice interrupt and timing controls for system garbage collection. The second timer component is used to aid the paged memory management unit. There are free timing channels on the second timer component.

Each PIT is given a 64kB-byte memory range to respond to for I/O access. As is typical for I/O devices part of the address range is not decoded to conserve hardware.

PIT#1 is located at \$FFFFFFFFEE4xxxx

PIT#2 is located at \$FFFFFFFFEE5xxxx

# **Config Space**

A 256-byte config space is supported. Most of the config space is unused. The only configuration is for the I/O address of the register set and the interrupt line used.

| Regno  | Width | R/W | Moniker  | Description           |
|--------|-------|-----|----------|-----------------------|
| 000    | 32    | RO  | REG_ID   | Vendor and device ID  |
| 004    | 32    | R/W |          |                       |
| 008    | 32    | RO  |          |                       |
| 00C    | 32    | R/W |          |                       |
| 010    | 32    | R/W | REG_BAR0 | Base Address Register |
| 014    | 32    | R/W | REG_BAR1 | Base Address Register |
| 018    | 32    | R/W | REG_BAR2 | Base Address Register |
| 01C    | 32    | R/W | REG_BAR3 | Base Address Register |
| 020    | 32    | R/W | REG_BAR4 | Base Address Register |
| 024    | 32    | R/W | REG_BAR5 | Base Address Register |
| 028    | 32    | R/W |          |                       |
| 02C    | 32    | RO  |          | Subsystem ID          |
| 030    | 32    | R/W |          | Expansion ROM address |
| 034    | 32    | RO  |          |                       |
| 038    | 32    | R/W |          | Reserved              |
| 03C    | 32    | R/W |          | Interrupt             |
| 040 to | 32    | R/W |          | Capabilities area     |

OFF |

REG\_BAR0 defaults to \$FEE40001 which is used to specify the address of the controller's registers in the I/O address space. Note for additional groups of timers the REG\_BAR0 must be changed to point to a different I/O address range. Note the core uses only bits determined by the address mask in the address range comparison. It is assumed that the I/O address select input, cs\_io, will have bits 24 and above in its decode and that a 64kB page is required for the device, matching the MMU page size.

The controller will respond with a mask of 0x00FF0000 when BAR0 is written with all ones.

#### **Parameters**

CFG\_BUS defaults to zero
CFG\_DEVICE defaults to four
CFG\_FUNC defaults to zero
CFG\_ADDR\_MASK defaults to 0x00FF0000
CFG\_IRQ\_LINE defaults to 29

Config parameters must be set correctly. CFG device and vendors default to zero.

### **Parameters**

NTIMER: This parameter controls the number of timers present. The default is eight. The maximum is 32.

BITS: This parameter controls the number of bits in the counters. The default is 48 bits. The maximum is 64.

PIT\_ADDR: This parameter sets the I/O address that the PIT responds to. The default is \$FEE40001.

PIT\_ADDR\_ALLOC: This parameter determines which bits of the address are significant during decoding. The default is \$00FF0000 for an allocation of 64kB. To compute the address range allocation required, 'or' the value from the register with \$FF000000, complement it then add 1.

## **Registers**

The PIT has 134 registers addressed as 64-bit I/O cells. It occupies 2048 consecutive I/O locations. All registers are read-write except for the current counts which are read-only. All registers all 64-bit accessible; all 64 bits must be read or written. Values written to registers do not take effect until the synchronization register is written.

Note the core may be configured to implement fewer timers in which case timers that are not implemented will read as zero and ignore writes. The core may also be configured to support fewer bits per count register in which case the unimplemented bits will read as zero and ignore writes.

| Regno     | Access | Moniker | Purpose                                      |
|-----------|--------|---------|----------------------------------------------|
| 00        | R      | CC0     | Current Count                                |
| 08        | RW     | MC0     | Max count                                    |
| 10        | RW     | OT0     | On Time                                      |
| 18        | RW     | CTRL0   | Control                                      |
| 20 to 7F8 |        |         | Groups of four registers for timer #1 to #63 |
| 800       | RW     | USTAT   | Underflow status                             |
| 808       | RZW    | SYNC    | Synchronization register                     |
| 810       | RW     | IE      | Interrupt enable                             |
| 818       | RW     | TMP     | Temporary register                           |
| 820       | RO     | OSTAT   | Output status                                |
| 828       | RW     | GATE    | Gate register                                |
| 830       | RZW    | GATEON  | Gate on register                             |
| 838       | RZW    | GATEOFF | Gate off register                            |

### **Control Register**

This register contains bits controlling the overall operation of the timer.

| Bit     |    | Purpose                                                                                   |
|---------|----|-------------------------------------------------------------------------------------------|
| 0       | LD | setting this bit will load max count into current count, this bit automatically resets to |
|         |    | zero.                                                                                     |
| 1       | CE | count enable, if 1 counting will be enabled, if 0 counting is disabled and the current    |
|         |    | count register holds its value. On counter underflow this bit will be reset to zero       |
|         |    | causing the count to halt unless auto-reload is set.                                      |
| 2       | AR | auto-reload, if 1 the max count will automatically be reloaded into the current count     |
|         |    | register when it underflows.                                                              |
| 3       | XC | external clock, if 1 the counter is clocked by an external clock source. The external     |
|         |    | clock source must be of lower frequency than the clock supplied to the PIT. The PIT       |
|         |    | contains edge detectors on the external clock source and counting occurs on the           |
|         |    | detection of a positive edge on the clock source.                                         |
|         |    | This bit is forced to 0 for timers 4 to 31.                                               |
| 4       | GE | gating enable, if 1 an external gate signal will also be required to be active high for   |
|         |    | the counter to count, otherwise if 0 the external gate is ignored. Gating the counter     |
|         |    | using the external gate may allow pulse-width measurement. This bit is forced to 0        |
|         |    | for timers 4 to 31.                                                                       |
| 5 to 63 | ~  | not used, reserved                                                                        |
|         |    |                                                                                           |

#### **Current Count**

This register reflects the current count value for the timer. The value in this register will change by counting downwards whenever a count signal is active. The current count may be automatically reloaded at underflow if the auto reload bit (bit #2) of the control byte is set. The current count may also be force loaded to the max count by setting the load bit (bit #0) of the counter control byte.

### Max Count

This register holds onto the maximum count for the timer. It is loaded by software and otherwise does not change. When the counter underflows the current count may be automatically reloaded from the max count register.

#### On Time

The on-time register determines the output pulse width of the timer. The timer output is low until the on-time value is reached, at which point the timer output switches high. The timer output remains high until the counter reaches zero at which point the timer output is reset back to zero. So, the on time reflects the length of time the timer output is high. The timer output is low for max count minus the on-time clock cycles.

#### **Underflow Status**

The underflow status register contains a record of which timers underflowed.

Writing the underflow register clears the underflows and disable further interrupts where bits are set in the incoming data. Interrupt processing should read the underflow register to determine which timers underflowed, then write back the value to the underflow register.

### Synchronization Register

The synchronization register allows all the timers to be updated simultaneously. Values written to timer registers do not take effect until the synchronization register is written. The synchronization register must be written with a '1' bit in the bit position corresponding to the timer to update. For instance, writing all one's to the sync register will cause all timers to be updated. The synchronization register is write-only and reads as zero.

### Interrupt Enable Register

Each bit of the interrupt enable register enables the interrupt for the corresponding timer. Interrupts must also be globally enabled by the interrupt enable bit in the config space for interrupts to occur. A '1' bit enables the interrupt, a '0' bit value disables it.

### **Temporary Register**

This is merely a register that may be used to hold values temporarily.

### **Output Status**

The output status register reflects the current status of the timers output (high or low). This register is read-only.

### Gate Register

The internal gate register is used to temporarily halt or resume counting for the timer corresponding to the bit position of this register. Writing a value to this register will turn on all timers where there is a '1' bit in the value and turn off all timers where there is a '0' bit in the value.

### Gate On Register

The internal gate 'on' register is used to resume counting for the timer corresponding to the bit position of this register. Writing a value to this register will turn on all timers where there is a '1' bit in the value. Where there is a '0' in the value the timer will not be affected. This register reads as zero.

### Gate Off Register

The internal gate 'off' register is used to halt counting for the timer corresponding to the bit position of this register. Writing a value to this register will turn off all timers where there is a '1' bit in the value. Where there is a '0' in the value the timer will not be affected. This register reads as zero.

## **Programming**

The PIT is a memory mapped i/o device. The PIT is programmed using 64-bit load and store instructions (LDO and STO). Byte loads and stores (LDB, STB) may be used for control register access. It must reside in the non-cached address space of the system.

## **Interrupts**

The core is configured use interrupt signal #29 by default. This may be changed with the CFG\_IRQ\_LINE parameter. Interrupts may be globally disabled by writing the interrupt disable

bit in the config space with a '1'. Individual interrupts may be enabled or disabled by the setting of the interrupt enable register in the I/O space.