## **CESC16** Documentation

#### WHAT IS CESC16?

CESC16 is the 16 bit version of the CESC architecture (Competent but Extremely Simple Computer), a homebrew breadboard computer I designed previously.

The greatest limitation of the old design was the 256 instruction limit caused by the 8 bit architecture. I could have kept the 8 bit ALU and expanded just the MAR, but I wanted to keep things simple and having to mix 8 bit and 16 bit values made instructions very slow.

This iteration was meant to be built on a PCB from the start, so I could afford doing major changes that wouldn't have been possible on a breadboard:

- The entire architecture is 16 bit wide: this makes jumps very fast (since the address can be copied all at once) and avoids having to use slow dword operations for big numbers.
- The register file has been expanded greatly, now having 14 GPRs instead of 4.

You can find the GitHub repository for the original CESC architecture at <a href="mailto:github.com/p-rivero/CESCA">github.com/p-rivero/CESCA</a>.

### **FEATURES:**

- Simple but powerful RISC architecture without pipelining, inspired by MIPS and RISC-V
  - Most instructions take <u>3 or 4 clock cycles</u> (see "Cycles" on instruction table below)
  - Native stack capabilities to speed up push/pop and call/return instructions
  - Supports signed and unsigned comparisons for conditional jumping
  - Supports hardware interrupts from up to 4 GPIO ports (each with a dedicated IRQ line)
  - Like other homebrew CPUs, its microcode is easy to reprogram to make new instructions
- 17? MHz clock rate + 7?-?? Hz variable speed 555 timer + manual clock pulse mode
- 16 bit architecture: ALU, Register File and Memory Addresses are all 16 bit wide.
- 256 kB ROM + 128 kB RAM\*: 64k instructions (32 bit wide) + 64k\* data words (16 bit wide)
  - ROM is split in two 16 bit banks, so each address contains 32 bits of data
  - \* 512 bytes (256 words) of RAM are reserved for controlling the MMIO.
- <u>14 general purpose registers</u> (GPRs): 5 temporary, 5 safe, 3 arguments, 1 return value <u>2 special purpose registers</u>: Zero constant, Stack Pointer
  - All of them can be used as operands and/or destination for ALU/Memory instructions
  - The destination and operands can all be different registers (like in MIPS and RISC-V)
- CPU created entirely with discrete logic chips (74 series); I/O implemented with 3 Arduino Nano.
  - **Input:** Arduino translates the <u>PS/2 keyboard</u> input to ASCII and causes an interrupt.
  - Output: Arduino outputs video signal for a <u>VGA terminal</u>. More info <u>in this video</u>.
     <u>Specs of the terminal video output</u>:
    - Displays 25 lines with 40 text characters each, on an 8x8 pixel font
    - Resolution of 320x480 pixels @ 60Hz
  - Input+Output: An additional Arduino allows UART communication with another computer.

Read DOCS/Input.pdf and DOCS/Output.pdf to learn how to get input and send output.

[work in progress]

### **INSTRUCTION FORMAT:**

XXOOO FFF DDDD AAAA Argument (16 bits)

X: Don't care

Possible arguments:

XXXXXXXXXXXBBBB: rB (2nd operand register)
IIIIIIIIIIIIIIII: Immediate 16 bit value
@@@@@@@@@@@@@@@@@: Immediate 16 bit address

## **INSTRUCTION TABLE:**

|         | Mnemonic                    | Machine code                        | Cycles |
|---------|-----------------------------|-------------------------------------|--------|
| ALU:    | ALU rD, rA, rB              | X0000FFFDDDDAAAA XXXXXXXXXXXXBBBB   | 3      |
|         | ALU rD, rA, Imm16           | X0001FFFDDDDAAAA IIIIIIIIIIIII      | 3      |
|         | ALU rD, rA, [Addr16]        | X0010FFFDDDDAAAA @@@@@@@@@@@@@@@@   | 4      |
|         | ALU rD, rA, [rB]            | X0011FFFDDDDAAAA XXXXXXXXXXXBBBB    | 4      |
|         | ALU [Addr16], rA            | X0100FFF0000AAAA @@@@@@@@@@@@@@@@   | 4      |
|         | ALU [rA], rB                | X0101FFF0000AAAA XXXXXXXXXXXXBBBB   | 4      |
|         | sll rD, rA, Imm4            | X011IIIDDDDAAAA XXXXXXXXXXXXXXXX    | Imm+1  |
|         | srl rD, rA, Imm4            | X100IIIIDDDDAAAA XXXXXXXXXXXXXXXX   | Imm+1  |
|         | sra rD, rA, Imm4            | X101IIIIDDDDAAAA XXXXXXXXXXXXXXXXX  | Imm+1  |
|         | <pre>lw rD, Imm16(rA)</pre> | X1100000DDDDAAAA IIIIIIIIIIIII      | 3      |
|         | lb rD, Imm16(rA)            | X1100001DDDDAAAA IIIIIIIIIIIII      | 3      |
|         | sw rB, Imm16(rA)            | X1100010BBBBAAAA IIIIIIIIIIIII      | 4      |
|         | swap rD, Imm16(rA)          | X1100011DDDDAAAA IIIIIIIIIIII       | 5      |
| Memory: | peek rD, Imm16(rA), W       | X110010WDDDDAAAA IIIIIIIIIIIII      | 3      |
|         | push rB                     | X1100110XXXX0001 XXXXXXXXXXXBBBB    | 3      |
|         | pop rD                      | X1100111DDDD0001 XXXXXXXXXXXXXXXXX  | 3      |
|         | pushf                       | X1111010XXXX0001 XXXXXXXXXXXXXXXX   | 3      |
|         | popf                        | X1111011XXXX0001 XXXXXXXXXXXXXXXXX  | 3      |
| Jumps:  | JMP Addr16                  | X1101FFFXXXXXXXX @@@@@@@@@@@@@@@@@  | 2      |
|         | JMP rA                      | X1110FFFXXXXAAAA XXXXXXXXXXXXXXXXXX | 2      |
|         | call Addr16                 | X1111000XXXX0001 @@@@@@@@@@@@@@@@   | 4      |
|         | ret                         | X1111001XXXX0001 XXXXXXXXXXXXXXXX   | 3      |

## **Additional comments:**

- ALU means any mnemonic from the "ALU Operations" table below
- JMP means any mnemonic from the "Jump Conditions" table below
- For more in-depth information about all the instructions, their macros and how to use them, please read <a href="DOCS/Instructions.pdf">DOCS/Instructions.pdf</a>

# **ALU Operations:**

| FFF | Mnemonic     | Pseudocode  | Description / observations                                            |
|-----|--------------|-------------|-----------------------------------------------------------------------|
| 000 | mov D, A     | D = A       | Moves the contents of A into D (without updating the flags!).         |
| 001 | and D, A, B  | D = A&B     | Performs a bitwise logic AND between A and B.                         |
| 010 | or D, A, B   | D = A B     | Performs a bitwise logic OR between A and B.                          |
| 011 | xor D, A, B  | D = A^B     | Performs a bitwise logic XOR between A and B.                         |
| 100 | add D, A, B  | D = A+B     | Adds the contents of A and B.                                         |
| 101 | sub D, A, B  | D = A-B     | Subtracts the contents of A and B.                                    |
| 110 | addc D, A, B | D = A+B+C   | Add with Carry: Add 1 to result if last operation caused carry (C).   |
| 111 | subb D, A, B | D = A-B-1+C | Subtract with Borrow: Subtract 1 to result if last op. caused borrow. |

# **Jump Conditions:**

| FFF | Mnemonic | Pseudocode | Description / observations                                                    |
|-----|----------|------------|-------------------------------------------------------------------------------|
| 000 | jmp      | 1          | Jump unconditionally: always jumps.                                           |
| 001 | jz       | Z          | Jump if Zero: jumps if the Z flag is set.                                     |
| 010 | jnz      | ! Z        | Jump if Not Zero: jumps if the Z flag is not set.                             |
| 011 | jc       | С          | Jump if Carry: jumps if the C flag is set.                                    |
| 100 | jnc      | ! C        | Jump if Not Carry: jumps if the C flagis not set.                             |
| 101 | jleu     | !C   Z     | Jump if Less or Equal (Unsigned): after a sub D, A, B op., jumps if A <= B.   |
| 110 | jlt      | V ^ S      | Jump if Less Than (signed): after a sub D, A, B operation, jumps if A < B.    |
| 111 | jle      | (V^S)   Z  | Jump if Less or Equal (signed): after sub D, A, B operation, jumps if A <= B. |

## Flags:

| Flag name |          | Meaning of active flag (Flag=1)                                                       |
|-----------|----------|---------------------------------------------------------------------------------------|
| Z         | Zero     | The result of the last ALU operation was exactly 0x0000.                              |
| С         | Carry    | The last add operation caused carry OR the last sub operation did not cause borrow.   |
| ٧         | 0verflow | The last add or sub operation caused a signed overflow.                               |
| S         | Sign bit | The result of the last ALU operation was negative when interpreted in 2's complement. |

#### **REGISTERS / ABI:**

| Machine code | Assembler name | Description                                                                                                                                                                                      |
|--------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0000         | zero           | Hardwired to 0x0000 (read only). Useful for comparing registers without saving the result and for loading/storing to absolute addresses.                                                         |
| 0001         | sp             | Stack pointer. Starts at 0xFEFF and a subroutine must decrement it in order to store local variables to the stack. The push and pop instructions handle the sp automatically. *                  |
| 0010 - 0110  | t0 - t4        | Temporary registers that can be wiped by a called subroutine.                                                                                                                                    |
| 0111         | v0             | Subroutine <u>return value</u> . Otherwise it works as a regular temporary register that can be wiped by subroutines.                                                                            |
| 1000 - 1010  | a0 - a2        | Used for passing <u>arguments</u> to a called subroutine (the subroutine can wipe their content). Otherwise they work as regular temporary registers.                                            |
| 1011 - 1111  | s0 - s4        | <u>Safe registers</u> whose contents get preserved in a subroutine call. The called subroutine is responsible for pushing and then restoring the values of the safe registers it's going to use. |

- \* The implemented stack is a *full stack*: sp points at the actual data, not the next free space. If a subroutine plans to use, say, 5 words (note that words are 16 bits wide) for local variables, it can perform sub sp, sp, 5 at the beginning and access them with sw/lw t0, offset(sp). At the end of the subroutine it must free this space with add sp, sp, 5.
  - If you don't need access to all variables at the same time (you just need to push and pop individual registers), it's faster to use the push and pop instructions, since those automatically increase / decrease the stack pointer.
  - Flags <u>are not</u> preserved in a subroutine call. The caller can save its flags to the stack by using the pushf instruction before performing the call, and restore them afterwards using popf.

#### Rules for declaring labels:

- All strings of text without spaces and ended with a colon (:) are labels.
- Labels point at the instruction that comes after and can be used the same way as constants.
- Labels that start with a dot (.) are <u>local</u> (sublabels of the last global label). More than 1 dot can be used to add nested levels of sublabels.
- Labels can't be called "zero", "sp" or any other name that conflicts with a register.
- In general, all other rules of customasm apply: <u>Customasm User Guide</u>

#### **MEMORY MAP:**

Instruction fetching and the peek instruction access the 32-bit wide **Program Memory**.

The rest of the memory instructions (including stack operations and call/ret) access the 16-bit wide **Data Memory**.

Both memory spaces have a 16 bit address (64k words).



| Data             | memory (RAM)  |
|------------------|---------------|
| 64k              | x 16 bit      |
| 0×0000           | Data          |
| SP-1             |               |
| SP               | Top of Stack  |
| 0xFEFF           | Stack origin  |
| 0xFF00<br>0xFF3F | PS/2 Keyboard |
| 0xFF40<br>0xFF7F | VGA Terminal  |
| 0xFF80<br>0xFFBF | I/O device 2  |
| 0xFFC0<br>0xFFFF | I/O device 3  |

#### **OPERATING SYSTEM:**

Since CESC16 is an extremely simple architecture, <u>all code runs at kernel level</u>. Therefore an OS isn't really needed in terms of security, but it's still useful to have a collection of commonly used subroutines and a safe hardware entry point.

The current OS draft includes:

- Startup code for resetting the CPU correctly and calling the user code (and then terminating user code execution safely), as well as handling interrupts.
- <u>Math library</u> with commonly used functions: 16 and 32 bit multiplication and division/modulus (more to be added later: exponentiation, square root, trigonometric functions).
- <u>Input and Output libraries</u> for writing hardware-agnostic code with I/O. It also implements a sort of pseudo-interrupts that allow the user to call the OS utilities directly.
- <u>Utilities and debug tools</u>: Startup menu that allows the user to launch a specific program (if the ROM contains more than one) and a RAM/ROM content inspector. **[work in progress]**

System subroutines can be called with the regular call instruction. However, in order to increase code clarity, syscall can be used as an alias instead (the programmer is expected to use syscall exclusively for calling OS subroutines).

A full list of the available system calls can be found in <a href="DOCS/syscalls.md">DOCS/syscalls.md</a> [work in progress]

#### **ROOM FOR IMPROVEMENT:**

The goal of CESC16 is to be a powerful architecture while still being as simple as possible. This causes it to be in a weird spot, since the lack of advanced features means it's far from a real consumer CPU, but its chip count and PCB cost is much higher than most homebrew CPUs.

#### To be a real consumer CPU it would need:

- Better I/O support (more than 4 devices), and being able to mask interrupt sources.
- Actual pipelining (as close to 1 cycle per instruction as possible).
- Probably hardware multiplication/division, since it's an extremely common operation.

#### To greatly reduce its chip count it would need:

- 8 bit architecture would be fast enough, as long as it has 16 bit addresses.
- No physical registers (use addressing modes to save all variables in SRAM). Having many physical registers (like in MIPS and RISC-V) only makes sense if it allows you to implement pipelining.

Embracing either a minimalistic 8 bit architecture (allowing a faster clock) or a pipelined complex architecture would result in much faster code execution than my current design.

#### So, why did I choose this approach?

The reason is simple: having fun. My objective is to make writing assembly code for my computer as fun as it can possibly be. On a homebrew CPU speed is not the most important factor.

- Compared with hardware-efficient ISAs like the 6502 or PIC, instructions for MIPS and RISC-V are much more intuitive, powerful and fun to work with. That's why I took inspiration from them, even if this meant I had to use more chips.
- If you need an integer greater than 255, writing several instructions in order to do a dword operation just isn't as fun to program as with native 16 bit instructions.
- Having pipelining and better I/O would be great, but it doesn't affect programming that much.
- Registers in memory may be very efficient, but RAM is a black box and you can't wire LEDs to specific addresses. When I build a homebrew CPU I want to be able to see all my registers at the same time (however, I did end up implementing addressing modes for completeness).

#### Why do you use Arduinos in a homebrew CPU?

I understand why some people are against using microcontrollers in homebrew computers. After all, a single Arduino Nano is more powerful than the entire CPU.

However, everyone can decide what they want to allow in their design. I decided that peripherals are not part of the CPU and therefore are allowed to use microcontrollers. After all, even if I managed to implement VGA and PS/2 controllers with 74 series chips, the monitor and keyboard themselves have microcontrollers inside, so the design still wouldn't be completely "microcontroller free".

#### **Closing words**

I hope that reading this documentation has given you ideas for your own builds, just as reading from others was what inspired me to create my own. Remember that the aim of a homebrew computer isn't to create the perfect and most efficient design, just one that is new and unique, and most importantly one that you enjoy and feel proud of.

If you have questions or any suggestion, you can send me a message on Reddit or an email.

Want more homebrew CPUs? Here are some links (download the PDF to be able to click them):

- MUPS/16 CPU
- James Sharman Making an 8 Bit pipelined CPU
- Slu4 Minimalistic Breadboard 8-Bit CPU
- nand2tetris: Part 1, Part 2