## **CESC16** Documentation

#### WHAT IS CESC16?

CESC16 is the 16 bit version of the CESC architecture (Competent but Extremely Simple Computer), a homebrew breadboard computer I designed previously.

The greatest limitation of the old design was the 256 instruction limit caused by the 8 bit architecture. I could have kept the 8 bit ALU and expanded just the MAR (memory address register), but I wanted to keep things simple, and mixing 8 bit and 16 bit values would have made instructions very slow.

This iteration was meant to be built on a PCB from the start, so I could afford to make major changes that wouldn't have been possible on a breadboard:

- The entire architecture is 16 bit wide: this makes jumps very fast (since the address can be copied all at once) and avoids having to use slow dword operations for big numbers.
- The register file has been expanded greatly, now having 14 GPRs instead of 4.

You can find the GitHub repository for the original CESC architecture at github.com/p-rivero/CESCA.

#### **FEATURES:**

- Simple but powerful RISC architecture without pipelining, inspired by MIPS and RISC-V
  - Most instructions take 3 or 4 clock cycles (see "Cycles" in the instruction table below)
  - Native stack capabilities to speed up push/pop and call/return instructions
  - Supports signed and unsigned comparisons for conditional jumping
  - Even though it's a RISC architecture, <u>direct</u>, <u>indirect</u> and <u>indexed</u> addressing modes are supported for *all ALU operations*, using a sort of x86 Intel syntax.
  - Supports hardware interrupts from up to 4 GPIO ports (each with a dedicated IRQ line)
  - Like other homebrew CPUs, its microcode is easy to reprogram to make new instructions
- 1?? MHz clock rate + 0.72 Hz to 480 Hz variable speed 555 timer + manual clock pulse mode
- 16 bit architecture: ALU, Register File and Memory Addresses are all 16 bit wide.
- 256 kB ROM + 128 kB RAM\*: 64k instructions (32 bit wide) + 64k\* data words (16 bit wide)
  - ROM is split in two 16 bit banks, so each address contains 32 bits of data
  - \* 512 bytes (256 words) of RAM are reserved for controlling the MMIO.
- <u>14 general purpose registers</u> (GPRs): 5 temporary, 5 safe, 3 arguments, 1 return value + <u>2 special purpose registers</u>: Zero constant, Stack Pointer
  - All of them can be used as operands and/or destination for ALU/Memory instructions
  - The destination and operands can all be different registers (like in MIPS and RISC-V)
- CPU created entirely with discrete logic chips (74 series); I/O implemented with 3 Arduino Nano.
  - **Input:** Arduino translates the <u>PS/2 keyboard</u> input to ASCII and causes an interrupt.
  - **Output:** Arduino outputs video signal for a <u>VGA terminal</u>. More info <u>in this video</u>.
    - Displays 25 lines with 40 text characters each, on an 8x8 pixel font
    - Resolution of 320x480 pixels @ 60Hz, 4 bit color (16 colors)
  - **Input+Output:** An additional Arduino allows UART communication with another computer.

Read <u>DOCS/Input.pdf</u> and <u>DOCS/Output.pdf</u> to learn how to get input and send output. [in progress]

### **INSTRUCTION FORMAT:**

0000m FFF DDDD AAAA + Argument (16 bits)

**m**: Opcode modifier (addressing mode): 0 = Register, 1 = Immediate

**F**: Operation (the FFF column of the "ALU Operations" table below)

**D**: rD (destination register)

**A**: rA (1st operand register)

**X**: Don't care

Possible arguments:

- XXXXXXXXXXXXBBBB: rB (2nd operand register)
- IIIIIIIIIIIIIIII: Imm16 (16 bit immediate value)
- IIIIIIIIIIIIiiii: Either of the previous 2 options (depending on the m bit)
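
As an illustration of the format above, here is a minimal Python sketch that packs the basic `ALU rD, rA, rB/Imm16` form into its two 16 bit words. The helper names (`encode_alu`, `rom_word`) are hypothetical and not part of the CESC16 toolchain, and placing the opcode in the upper half of the 32 bit ROM word is an assumption. Register numbers and FFF codes are copied from the tables in this document.

```python
# Register numbers from the REGISTERS / ABI table and FFF codes from the
# ALU Operations table of this document.
REGS = {"zero": 0, "sp": 1, "t0": 2, "t1": 3, "t2": 4, "t3": 5, "t4": 6,
        "v0": 7, "a0": 8, "a1": 9, "a2": 10,
        "s0": 11, "s1": 12, "s2": 13, "s3": 14, "s4": 15}
ALU_OPS = {"mov": 0b000, "and": 0b001, "or": 0b010, "xor": 0b011,
           "add": 0b100, "sub": 0b101, "addc": 0b110, "subb": 0b111}


def encode_alu(op, rd, ra, b):
    """Encode `op rD, rA, rB/Imm16` as (opcode_word, argument_word).

    `b` is a register name (m=0) or an int immediate (m=1).
    """
    if isinstance(b, str):
        m, argument = 0, REGS[b]        # rB sits in the low 4 bits
    else:
        m, argument = 1, b & 0xFFFF     # Imm16, wrapped to 16 bits
    # Layout: 0000 m FFF DDDD AAAA (bit 15 down to bit 0)
    opcode = (m << 11) | (ALU_OPS[op] << 8) | (REGS[rd] << 4) | REGS[ra]
    return opcode, argument


def rom_word(opcode, argument):
    # Assumption: the opcode occupies the upper half of the 32 bit ROM word.
    return (opcode << 16) | argument
```

With these definitions, `add t0, t1, t2` encodes as opcode 0x0423 with argument 0x0004, and `add t0, t0, 5` as opcode 0x0C22 with argument 0x0005.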

### **INSTRUCTION TABLE:**

|         | Mnemonic | Machine code | Cycles |
|---------|----------|--------------|--------|
| ALU:    | ALU rD, rA, rB/Imm16<br>sll rD, rA, Imm4<br>srl rD, rA, Imm4<br>sra rD, rA, Imm4<br>ALU rD, rA, [mode]<br>ALU [mode], rA | 0000mFFFDDDDAAAA IIIIIIIIIIIIiiii<br>0001IIIIDDDDAAAA XXXXXXXXXXXXXXXX<br>0010IIIIDDDDAAAA XXXXXXXXXXXXXXXX<br>0011IIIIDDDDAAAA XXXXXXXXXXXXXXXX<br>010mmFFFDDDDAAAA IIIIIIIIIIIIiiii<br>011mmFFFBBBBAAAA IIIIIIIIIIIIiiii | 3<br>Imm+1<br>Imm+1<br>Imm+1<br>4/5<br>4/5 |
| Memory: | swap rD, [rA+Imm16]<br>peek rD, [rA+Imm16], W<br>push rB/Imm16<br>pushf<br>pop rD<br>popf | 1000001DDDDAAAA IIIIIIIIIIIIIIII<br>100001WDDDDAAAA IIIIIIIIIIIIIIII<br>1000010mXXXX0001 IIIIIIIIIIIIiiii<br>10000110XXXX0001 XXXXXXXXXXXXXXXX<br>10000111DDDD0001 XXXXXXXXXXXXXXXX<br>10001000XXXX0001 XXXXXXXXXXXXXXXX | 5<br>3<br>3<br>3<br>3 |
| Jumps:  | JMP rA/Imm16<br>call rB/Imm16<br>syscall rB/Imm16<br>enter rB/Imm16<br>ret<br>sysret<br>exit | 110mFFFFXXXXAAAA IIIIIIIIIIIIIIII | 2<br>4<br>4<br>4<br>3<br>3<br>3 |

## **Additional comments:**

- ALU means any mnemonic from the "ALU Operations" table below
- JMP means any mnemonic from the "Jump Conditions" table below
- Read <u>DOCS/Instructions.pdf</u> to learn how to use each instruction, the available addressing modes and macros, and more in-depth information in general

## Flags:

| Flag name |          | Meaning of active flag (Flag=1)                                                       |
|-----------|----------|---------------------------------------------------------------------------------------|
| Z         | Zero     | The result of the last ALU operation was exactly 0x0000.                              |
| C         | Carry    | The last add or sub operation caused an unsigned overflow (carry or borrow).          |
| V         | Overflow | The last add or sub operation caused a signed overflow.                               |
| S         | Sign bit | The result of the last ALU operation was negative when interpreted in 2's complement. |
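
To make the flag definitions concrete, the following Python sketch models 16 bit add and sub and derives the four flags from the result. It is only a software model under stated assumptions, not a description of the actual ALU wiring: the names `add16`, `sub16` and `Flags` are hypothetical, and C is assumed to be set on a borrow for sub, as the table suggests.

```python
from typing import NamedTuple

MASK = 0xFFFF
SIGN = 0x8000


class Flags(NamedTuple):
    z: bool  # result is exactly 0x0000
    c: bool  # unsigned overflow (carry or borrow)
    v: bool  # signed overflow
    s: bool  # bit 15 of the result


def add16(a, b, carry_in=0):
    total = a + b + carry_in
    result = total & MASK
    # Signed overflow: operands share a sign that the result does not have.
    v = bool(~(a ^ b) & (a ^ result) & SIGN)
    return result, Flags(result == 0, total > MASK, v, bool(result & SIGN))


def sub16(a, b, borrow_in=0):
    diff = a - b - borrow_in
    result = diff & MASK
    # Signed overflow: operands differ in sign and the result leaves a's sign.
    v = bool((a ^ b) & (a ^ result) & SIGN)
    return result, Flags(result == 0, diff < 0, v, bool(result & SIGN))
```

For example, `add16(0x7FFF, 1)` gives 0x8000 with V and S set, while `sub16(0, 1)` gives 0xFFFF with C (borrow) and S set.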

## **Jump Conditions:**

| FFFF | Mnemonic(s)       | Pseudocode   | Description                                                              |
|------|-------------------|--------------|--------------------------------------------------------------------------|
| 0000 | jmp               | 1            | Jump unconditionally (always jumps)                                      |
| 0001 | jz<br>je          | Z            | Jump if Zero<br>Jump if Equal                                            |
| 0010 | jnz<br>jne        | ! Z          | Jump if Not Zero<br>Jump if Not Equal                                    |
| 0011 | jc<br>jb<br>jnae  | C            | Jump if Carry<br>Jump if Below (unsigned <)<br>Jump if Not Above or Equal |
| 0100 | jnc<br>jae<br>jnb | !C           | Jump if Not Carry<br>Jump if Above or Equal (unsigned >=)<br>Jump if Not Below |
| 0101 | jo                | V            | Jump if Overflow                                                         |
| 0110 | jno               | !V           | Jump if Not Overflow                                                     |
| 0111 | js                | S            | Jump if Sign (MSB = 1)                                                   |
| 1000 | jns               | !S           | Jump if Not Sign (MSB = 0)                                               |
| 1001 | jbe<br>jna        | C \|\| Z     | Jump if Below or Equal (unsigned <=)<br>Jump if Not Above                |
| 1010 | ja<br>jnbe        | !C && !Z     | Jump if Above (unsigned >)<br>Jump if Not Below or Equal                 |
| 1011 | jl<br>jnge        | V != S       | Jump if Less (signed <)<br>Jump if Not Greater or Equal                  |
| 1100 | jle<br>jng        | (V!=S) \|\| Z | Jump if Less or Equal (signed <=)<br>Jump if Not Greater                |
| 1101 | jg<br>jnle        | (V==S) && !Z | Jump if Greater (signed >)<br>Jump if Not Less or Equal                  |
| 1110 | jge<br>jnl        | V == S       | Jump if Greater or Equal (signed >=)<br>Jump if Not Less                 |
| 1111 | -                 | -            | Unused                                                                   |
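
The pseudocode column can be written out as a small lookup table. The Python sketch below is only an illustration (the names `JUMP_CONDITIONS` and `jump_taken` are hypothetical); it takes the four flags as plain booleans and reports whether a jump with a given FFFF code would be taken. The unused code 1111 is deliberately left out, so looking it up raises a `KeyError`.

```python
# FFFF code -> (primary mnemonic, predicate over the flags Z, C, V, S)
JUMP_CONDITIONS = {
    0b0000: ("jmp", lambda z, c, v, s: True),
    0b0001: ("jz",  lambda z, c, v, s: z),
    0b0010: ("jnz", lambda z, c, v, s: not z),
    0b0011: ("jc",  lambda z, c, v, s: c),
    0b0100: ("jnc", lambda z, c, v, s: not c),
    0b0101: ("jo",  lambda z, c, v, s: v),
    0b0110: ("jno", lambda z, c, v, s: not v),
    0b0111: ("js",  lambda z, c, v, s: s),
    0b1000: ("jns", lambda z, c, v, s: not s),
    0b1001: ("jbe", lambda z, c, v, s: c or z),
    0b1010: ("ja",  lambda z, c, v, s: not c and not z),
    0b1011: ("jl",  lambda z, c, v, s: v != s),
    0b1100: ("jle", lambda z, c, v, s: v != s or z),
    0b1101: ("jg",  lambda z, c, v, s: v == s and not z),
    0b1110: ("jge", lambda z, c, v, s: v == s),
}


def jump_taken(ffff, z, c, v, s):
    """Return True if the jump with condition code `ffff` is taken."""
    _, predicate = JUMP_CONDITIONS[ffff]
    return bool(predicate(bool(z), bool(c), bool(v), bool(s)))
```

As a worked case, subtracting 1 from 0xFFFF leaves Z=0, C=0, V=0, S=1. Read as unsigned, 65535 is above 1, so `ja` is taken and `jb` is not; read as signed, -1 is less than 1, so `jl` is taken and `jge` is not.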

## **ALU Operations:**

| FFF | Mnemonic     | Pseudocode | Description / observations                                            |
|-----|--------------|------------|-----------------------------------------------------------------------|
| 000 | mov D, A     | D = A      | Moves the contents of A into D (without updating the flags!).         |
| 001 | and D, A, B  | D = A&B    | Performs a bitwise logic AND between A and B.                         |
| 010 | or D, A, B   | D = A \| B | Performs a bitwise logic OR between A and B.                          |
| 011 | xor D, A, B  | D = A^B    | Performs a bitwise logic XOR between A and B.                         |
| 100 | add D, A, B  | D = A+B    | Adds the contents of A and B.                                         |
| 101 | sub D, A, B  | D = A-B    | Subtracts the contents of B from A.                                   |
| 110 | addc D, A, B | D = A+B+C  | Add with Carry: Adds 1 to the result if the last operation caused a carry. |
| 111 | subb D, A, B | D = A-B-C  | Subtract with Borrow: Subtracts 1 from the result if the last op. caused a borrow. |
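
The usual reason for having addc and subb is arithmetic on values wider than one word. The Python sketch below models how a 32 bit addition or subtraction could be composed from two 16 bit steps: add/sub on the low words, then addc/subb on the high words. The function names are hypothetical and the code only models the arithmetic, not the actual instruction sequence.

```python
MASK = 0xFFFF


def add32(a_hi, a_lo, b_hi, b_lo):
    """32 bit add as two 16 bit steps. Returns (high word, low word)."""
    low = a_lo + b_lo              # add:  low words, sets C on carry out
    carry = low >> 16              # C flag left by the first step
    high = a_hi + b_hi + carry     # addc: high words plus that carry
    return high & MASK, low & MASK


def sub32(a_hi, a_lo, b_hi, b_lo):
    """32 bit subtract as two 16 bit steps. Returns (high word, low word)."""
    low = a_lo - b_lo              # sub:  low words, sets C on borrow
    borrow = 1 if low < 0 else 0   # C flag left by the first step
    high = a_hi - b_hi - borrow    # subb: high words minus that borrow
    return high & MASK, low & MASK
```

For instance, 0x0001FFFF + 0x00000001 carries out of the low word and yields 0x00020000.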

## **REGISTERS / ABI:**

| Machine code | Assembler name | Description                                                                                                                                                                                      |
|--------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0000         | zero           | Hardwired to 0x0000 (read only). Useful for comparing registers without saving the result and for loading/storing to absolute addresses.                                                         |
| 0001         | sp             | Stack pointer. Starts at 0xFEFF and a subroutine must decrement it in order to store local variables to the stack. The push and pop instructions handle the sp automatically. *                  |
| 0010 - 0110  | t0 - t4        | Temporary registers that can be wiped by a called subroutine.                                                                                                                                    |
| 0111         | v0             | Subroutine <u>return value</u> . Otherwise it works as a regular temporary register that can be wiped by subroutines.                                                                            |
| 1000 - 1010  | a0 - a2        | Used for passing <u>arguments</u> to a called subroutine (the subroutine can wipe their content). Otherwise they work as regular temporary registers.                                            |
| 1011 - 1111  | s0 - s4        | <u>Safe registers</u> whose contents get preserved in a subroutine call. The called subroutine is responsible for pushing and then restoring the values of the safe registers it's going to use. |

<sup>\*</sup> The implemented stack is a *full stack*: sp points at the actual data, not the next free space.

If a subroutine plans to use, say, 5 words (note that words are 16 bits wide) for local variables, it can perform sub sp, sp, 5 at the beginning and access them with mov t0, [sp+offset]. At the end of the subroutine it must free this space with add sp, sp, 5.

- If you don't need access to all variables at the same time (you just need to push and pop individual registers), it's faster to use the push and pop instructions, since those automatically increase / decrease the stack pointer.
- Flags <u>are not</u> preserved in a subroutine call. The caller can save its flags to the stack by using the pushf instruction before performing the call, and restore them afterwards using popf.
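
The stack behaviour described above can be modelled in a few lines of Python. This is a sketch under assumptions, not the hardware implementation: the class and method names are hypothetical, and it assumes push decrements sp before storing, which is what a full stack growing downward from 0xFEFF implies.

```python
class StackModel:
    """Software model of the full, downward-growing stack."""

    ORIGIN = 0xFEFF

    def __init__(self):
        self.sp = self.ORIGIN
        self.ram = {}  # address -> 16 bit word

    def push(self, value):
        self.sp = (self.sp - 1) & 0xFFFF   # make room first (full stack)
        self.ram[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.ram[self.sp]          # sp points at the actual data
        self.sp = (self.sp + 1) & 0xFFFF
        return value

    def alloc(self, words):
        self.sp = (self.sp - words) & 0xFFFF   # like: sub sp, sp, words

    def free(self, words):
        self.sp = (self.sp + words) & 0xFFFF   # like: add sp, sp, words

    def local_address(self, offset):
        return (self.sp + offset) & 0xFFFF     # like: [sp+offset]
```

After reserving 5 words with `alloc(5)`, the locals live at `local_address(0)` through `local_address(4)`, that is, 0xFEFA through 0xFEFE when the stack was empty beforehand.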

## Rules for declaring labels:

- Labels can't be called "zero", "sp" or any other name that conflicts with a register.
- In general, all other rules of customasm apply: <u>Customasm User Guide</u>

#### **MEMORY MAP:**

There are 2 separate memory spaces, both indexed with a 16-bit address (64k words). <u>Code can be fetched and executed from either of them</u>\*.

- **Program Memory (ROM, 32 bit wide):** Cannot be written, and its contents are always available. By default, instructions are fetched from ROM. The peek instruction always accesses ROM.
- **Data Memory (RAM, 16 bit wide):** Its contents are undefined after boot (must be constructed at runtime). The rest of the memory instructions (including stack operations) always access RAM.

| 64k    | Program memory (ROM)<br>x 32 bit  |
|--------|-----------------------------------|
| 0x0000 |                                   |
|        | OS                                |
|        |                                   |
|        | User Programs* &<br>Lookup Tables |
|        |                                   |
| 0xFFFF |                                   |

| 64k              | Data memory (RAM)<br>x 16 bit |
|------------------|-------------------------------|
| 0x0000           | User Programs*<br>& Data      |
| SP-1             |                               |
| SP               | Top of Stack                  |
| 0xFEFF           | Stack origin                  |
| 0xFF00<br>0xFF3F | PS/2 Keyboard                 |
| 0xFF40<br>0xFF7F | VGA Terminal                  |
| 0xFF80<br>0xFFBF | 16 bit Timer                  |
| 0xFFC0<br>0xFFFF | Serial port                   |

<sup>\*</sup> Instructions can be stored in <u>1 32-bit word</u> of ROM, or in <u>2 16-bit words</u> of RAM (using big-endian format: the opcode goes first, followed by the argument).
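
As a quick way to check the data memory layout, this Python sketch classifies a RAM address using the ranges from the table above and splits a 32 bit instruction into the two 16 bit words that would be stored in RAM. The function names are hypothetical, and treating the opcode as the upper half of the 32 bit value is an assumption.

```python
# (first address, last address, device) taken from the data memory table
MMIO_REGIONS = [
    (0xFF00, 0xFF3F, "PS/2 Keyboard"),
    (0xFF40, 0xFF7F, "VGA Terminal"),
    (0xFF80, 0xFFBF, "16 bit Timer"),
    (0xFFC0, 0xFFFF, "Serial port"),
]


def ram_region(address):
    """Name the region of data memory that a 16 bit address falls into."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    for first, last, device in MMIO_REGIONS:
        if first <= address <= last:
            return device
    return "RAM"  # user programs, data and stack


def split_instruction(word32):
    """Split a 32 bit instruction into [opcode, argument] for storage in RAM.

    Assumption: the opcode is the upper half; it goes first (big-endian).
    """
    return [(word32 >> 16) & 0xFFFF, word32 & 0xFFFF]
```

The four device windows add up to the 256 reserved words (0xFF00 to 0xFFFF) mentioned in the feature list.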

#### **Program flow:**

Execution starts at address 0x0000 of ROM. The OS handles startup and then calls user code (also stored in ROM). The user code can do one of 2 things:

- Stay in ROM, using RAM just as storage (Harvard Architecture).
- Construct a program in RAM, either taking user input (assembler) or not (self-modifying code), and then <u>call it</u> using the enter instruction.
  - The program in RAM can call subroutines in ROM using the syscall instruction. Those ROM subroutines <u>must</u> finish with a sysret instruction. If they don't (<u>like the OS routines!</u>), then they must be called using the CALL\_GATE routine (read the "Calling system routines" section below).
  - Once the program in RAM finishes, it can return control to the ROM program that called it by using the exit instruction.

Once the main user program (in ROM) finishes, it can return control to the OS using a regular ret instruction. The OS can then halt or restart the computer, open a ROM monitor, etc.

## **OPERATING SYSTEM:**

Since CESC16 is an extremely simple architecture, <u>all code runs at kernel level</u>. Therefore, an OS isn't really needed in terms of security, but it's still useful to have a collection of commonly used subroutines and a safe hardware entry point.

The current OS draft includes:

- <u>Startup code</u> for resetting the CPU correctly and calling the user code (and then terminating user code execution safely), as well as handling interrupts.
- <u>Math library</u> with commonly used functions: 16 and 32 bit multiplication and division/modulus (more to be added later: exponentiation, square root, trigonometric functions).
- <u>Input and Output libraries</u> for writing hardware-agnostic code with I/O. They also allow the user to call the OS utilities directly using interrupts.
- <u>Utilities and debug tools</u>: Startup menu that allows the user to launch a specific program (if the ROM contains more than one) and a RAM/ROM monitor. **[work in progress]**

## **Calling system routines:**

Programs that are stored in RAM must use syscall to call an OS routine (this way the program jumps to ROM instead of a random position in RAM). However, all OS routines assume they are called from ROM and will end with a regular ret instruction (therefore, program execution will stay in ROM)!

In order to ensure that the call returns safely to RAM, instead of calling the routine directly, the program must store the address it wishes to call in v0, and then perform: syscall CALL\_GATE.

The code of CALL\_GATE is shown below:

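    CALL_GATE:
        call v0    ; regular call to the routine whose address is in v0
        sysret     ; return to the RAM program that issued the syscall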

Programs that are stored in ROM can call OS routines directly using the regular call instruction. However, in order to increase code clarity, syscall should be used instead (when used from ROM, it behaves like a regular call). Do not use CALL\_GATE on programs that are being fetched from ROM.

A full list of the available system calls can be found in <a href="DOCS/syscalls.md">DOCS/syscalls.md</a> [work in progress]

#### **ROOM FOR IMPROVEMENT:**

The goal of CESC16 is to be a powerful architecture while still being as simple as possible. This puts it in a weird spot, since the lack of advanced features means it's far from a real consumer CPU, but its chip count and PCB cost are much higher than those of most homebrew CPUs.

#### To be a real consumer CPU it would need:

- Better I/O support (more than 4 devices), and being able to mask interrupt sources.
- Actual pipelining (as close to 1 cycle per instruction as possible).
- Hardware multiplication/division, since it's an extremely common operation.

#### To greatly reduce its chip count it would need:

- An 8 bit architecture, which would be fast enough as long as it has 16 bit addresses.
- No physical registers (use addressing modes to save all variables in SRAM). Having many physical registers (like in MIPS and RISC-V) only makes sense if it allows you to implement pipelining.

Embracing either a minimalistic 8 bit architecture (allowing a faster clock) or a pipelined complex architecture would result in much faster code execution than my current design.

### So, why did I choose this approach?

The reason is simple: having fun. My objective is to make writing assembly code for my computer as fun as it can possibly be. On a homebrew CPU speed is not the most important factor.

- Compared with hardware-efficient ISAs like the 6502 or PIC, instructions for MIPS and RISC-V are much more intuitive, powerful and fun to work with. That's why I took inspiration from them, even if this meant I had to use more chips.
- If you need an integer greater than 255, writing several instructions in order to do a dword operation just isn't as fun to program as with native 16 bit instructions.
- Having pipelining and better I/O would be great, but it doesn't affect programming that much.
- Registers in memory may be very efficient, but RAM is a black box and you can't wire LEDs to it. When I build a homebrew CPU I want to be able to see *all* my registers at the same time (however, I did end up implementing some Intel-style x86 addressing modes for completeness).

## Why do you use Arduinos in a homebrew CPU?

I understand why some people are against using microcontrollers in homebrew computers. After all, a single Arduino Nano is more powerful than the entire CPU.

However, everyone can decide what they want to allow in their design. I decided that peripherals are not part of the CPU and therefore are allowed to use microcontrollers. After all, even if I managed to implement VGA and PS/2 controllers with 74 series chips, the screen and keyboard themselves have microcontrollers inside, so the design still wouldn't be completely "microcontroller free".

#### **Closing words**

I hope that reading this documentation has given you ideas for your own builds, just as reading from others was what inspired me to create my own. Remember that the aim of a homebrew computer isn't to create the perfect and most efficient design, just one that is new and unique, and most importantly one that you enjoy and feel proud of.

If you have any questions or suggestions, you can send me a message on Reddit or an email.

Want more homebrew CPUs? Here are some links (download the PDF to be able to click them):

- MUPS/16 CPU
- James Sharman Making an 8 Bit pipelined CPU
- Slu4 Minimalistic Breadboard 8-Bit CPU
- nand2tetris: Part 1, Part 2