A 16-bit discrete logic CPU with 24-bit addressing, built from approximately 81 TTL chips, 6 ALU/control ROMs, and 13 chips of external glue/memory.
The K16 is a homebrew CPU designed around ROM-based lookup tables for both ALU operations and instruction decoding. Rather than using traditional hardwired logic, the K16 leverages high-density flash ROMs to implement complex functionality while keeping the chip count reasonable.
Key Specifications:
- 16-bit data bus
- 24-bit address bus (16MB flat memory space)
- Little-endian byte ordering
- Hybrid ROM/Adder ALU architecture
- 8-level priority interrupt system (74LS148 encoder)
- Target clock speed: 5–10 MHz
The ALU uses 4× SST39SF040 flash ROMs (512KB each) for the data path, plus 2× SST39SF040 ROMs for control signal generation — 6 ROMs total in the ALU/control module. Each data ROM is a nibble-wide slice receiving:
- 4-bit ALU-A input (D0–D3, T16, T8-5, $0000, $0002, $FFFE, $FFFF)
- 4-bit ALU-B input (XY[Mem], X0–X3, Y0–Y3, PCHi, PCLo, T8-5)
- 7-bit instruction opcode (Opcode + Mode)
- 4-bit microcode step counter
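The four inputs above add up to 19 address bits, which is exactly the address width of a 512KB SST39SF040. A minimal sketch of how one slice's ROM address could be formed follows; the field order (step, opcode, B, A from high to low) and the name `rom_address` are assumptions for illustration, since the bit assignment is not stated here.

```python
# Field widths taken from the list above; the bit ORDER is an assumption.
A_BITS, B_BITS, OPCODE_BITS, STEP_BITS = 4, 4, 7, 4
ADDRESS_BITS = A_BITS + B_BITS + OPCODE_BITS + STEP_BITS  # 19 address lines
ROM_SIZE = 1 << ADDRESS_BITS                              # entries in one data ROM


def rom_address(a_nibble, b_nibble, opcode, step):
    """Pack the four ROM inputs into one address (assumed order: step|opcode|B|A)."""
    assert 0 <= a_nibble < 16 and 0 <= b_nibble < 16
    assert 0 <= opcode < 128 and 0 <= step < 16
    return (step << 15) | (opcode << 8) | (b_nibble << 4) | a_nibble


# The largest possible input combination lands on the last byte of the ROM.
last = rom_address(0xF, 0xF, 0x7F, 0xF)
```

Every address in the ROM is reachable under this packing, so no table space is wasted regardless of which field order the real hardware uses.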
ROM outputs feed into 4× 74x283 TTL adders for carry propagation. This hybrid approach keeps the speed of hardware addition while allowing arbitrary ALU functions through ROM programming: each nibble slice can implement any function of its two 4-bit operands by reprogramming the lookup table.
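To illustrate the hybrid idea, here is a rough Python model of four nibble slices feeding a ripple-carry adder chain. It assumes each ROM entry supplies two 4-bit adder operands and that subtraction is done as A + NOT B with carry-in 1; the table contents, the operation names used as keys, and the function names are hypothetical, not the actual K16 microcode.

```python
def build_slice_rom():
    """Hypothetical slice table: (op, a, b) -> the two nibbles sent to the adder."""
    rom = {}
    for a in range(16):
        for b in range(16):
            rom["ADD", a, b] = (a, b)
            rom["SUB", a, b] = (a, b ^ 0xF)  # a + ~b; carry-in supplies the +1
            rom["AND", a, b] = (a & b, 0)    # logic ops add zero, so no carry
            rom["XOR", a, b] = (a ^ b, 0)
    return rom


ROM = build_slice_rom()


def alu16(op, a, b):
    """Run a 16-bit operation through four ROM slices and four 4-bit adders."""
    carry = 1 if op == "SUB" else 0
    result = 0
    for i in range(4):                       # least significant nibble first
        a_nib = (a >> (4 * i)) & 0xF
        b_nib = (b >> (4 * i)) & 0xF
        x, y = ROM[op, a_nib, b_nib]         # ROM lookup for this slice
        total = x + y + carry                # one 4-bit adder stage
        result |= (total & 0xF) << (4 * i)
        carry = total >> 4                   # carry ripples to the next slice
    return result, carry
```

Adding a new operation in this model means adding table entries only; the adder chain stays the same, which is the point of the ROM-based design.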
| Registers | Width | Description |
|---|---|---|
| D0–D3 | 16-bit | Data registers |
| X0–X3 | 16-bit | Index registers (low address) |
| Y0–Y3 | 8-bit | Page registers (high address) |
| XY0–XY3 | 24-bit | Combined index pairs |
| PC | 24-bit | Program counter |
| SP | 24-bit | Stack pointer (XY3) |

| Range | Size | Description |
|---|---|---|
| $00_0000 – $00_FFFF | 64KB | Page 00: Zero Page & Stack |
| $01_0000 – $1F_FFFF | ~2MB | RAM (currently installed) |
| $20_0000 – $BF_FFFF | 10MB | RAM (expansion space) |
| $C0_0000 – $DF_FFFF | 2MB | I/O Space |
| $E0_0000 – $EF_FFFF | 1MB | ROM: Lookup Tables (Bank 1) |
| $F0_0000 – $FB_FFFF | 768KB | ROM: Lookup Tables (Bank 2) |
| $FC_0000 – $FE_FFFF | 192KB | ROM: Program Code |
| $FF_0000 – $FF_FFFF | 64KB | ROM: Boot Code & Reset Vector |
Reset Vector: CPU starts execution at $FF_0000
I/O Addresses:
- $DE_0000: Keyboard input
- $DF_0000: Terminal output
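The memory map and the X/Y register pairing can be expressed as a small lookup, which is handy when writing an emulator or checking that the regions tile the full 16MB space. The sketch below copies the ranges from the table above; the helper names `region_of` and `xy_address` are made up for illustration.

```python
# (first address, last address, description), copied from the memory map table.
MEMORY_MAP = [
    (0x000000, 0x00FFFF, "Page 00: Zero Page & Stack"),
    (0x010000, 0x1FFFFF, "RAM (currently installed)"),
    (0x200000, 0xBFFFFF, "RAM (expansion space)"),
    (0xC00000, 0xDFFFFF, "I/O Space"),
    (0xE00000, 0xEFFFFF, "ROM: Lookup Tables (Bank 1)"),
    (0xF00000, 0xFBFFFF, "ROM: Lookup Tables (Bank 2)"),
    (0xFC0000, 0xFEFFFF, "ROM: Program Code"),
    (0xFF0000, 0xFFFFFF, "ROM: Boot Code & Reset Vector"),
]


def region_of(address):
    """Return the description of the region containing a 24-bit address."""
    for first, last, name in MEMORY_MAP:
        if first <= address <= last:
            return name
    raise ValueError(f"address out of 24-bit range: {address:#x}")


def xy_address(y_page, x_index):
    """Combine an 8-bit Y page register and a 16-bit X index into a 24-bit address."""
    return ((y_page & 0xFF) << 16) | (x_index & 0xFFFF)


total_bytes = sum(last - first + 1 for first, last, _ in MEMORY_MAP)
```

For example, `region_of(xy_address(0xDF, 0x0000))` resolves the terminal output address to the I/O region.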
The K16 extends the ROM-based philosophy to complex operations via dedicated lookup table memory. Operations like shifts, rotates, byte swaps, and multiplication use 64K-word lookup tables accessed in 3 cycles:
SHL D0 ; Shift left via lookup (3 cycles)
MULB D1 ; 8×8 multiply via lookup
RECIP D2 ; Reciprocal approximation

The ALU calculates the table address (D+D for word alignment), with the carry bit selecting odd/even pages. This achieves fast complex operations without dedicated shifter or multiplier hardware.
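The address calculation described above can be sketched in a few lines. A 64K-word table occupies 128KB, which is two 64KB pages, so doubling the 16-bit operand gives the low 16 address bits and the carry out picks the second page. The base page `0xE0` in the usage comment is an assumption based on the memory map (Bank 1 starts at $E0_0000); the source does not say which table lives where.

```python
def table_address(d, table_page):
    """Byte address of word entry `d` in a 64K-word table.

    `table_page` is the (assumed even) high address byte where the table starts.
    """
    assert 0 <= d <= 0xFFFF
    assert table_page % 2 == 0, "table is assumed to start on an even page"
    doubled = d + d            # D+D: each entry is one 16-bit word (2 bytes)
    carry = doubled >> 16      # carry out of the 16-bit addition
    low16 = doubled & 0xFFFF
    return ((table_page | carry) << 16) | low16


# Entries below 0x8000 fall in the even page, the rest in the odd page.
first_odd_page_entry = table_address(0x8000, 0xE0)
```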
Eight priority-encoded interrupt levels (IRQ0–IRQ7) using a 74LS148 priority encoder:
- Automatic PC and Status Register save to stack
- 15-cycle interrupt entry, 7-cycle return
- Interrupt level captured in saved SR bits 6:4
- Nested interrupt support via separate flag register banks
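A behavioural sketch of the priority selection and of storing the level in SR bits 6:4 is shown below. The 74LS148 gives priority to its highest-numbered input, so the model assumes IRQ7 outranks IRQ0; how the K16 actually wires the request lines, and the 16-bit SR width, are assumptions. The function names are illustrative only.

```python
LEVEL_SHIFT = 4
LEVEL_MASK = 0x7 << LEVEL_SHIFT   # SR bits 6:4


def highest_pending(irq_lines):
    """Return the highest asserted IRQ level (bit n set = IRQn asserted), or None."""
    for level in range(7, -1, -1):
        if irq_lines & (1 << level):
            return level
    return None


def sr_with_level(sr, level):
    """Return a copy of the status register with the interrupt level in bits 6:4."""
    assert 0 <= level <= 7
    return ((sr & ~LEVEL_MASK) & 0xFFFF) | (level << LEVEL_SHIFT)


def level_from_sr(sr):
    """Recover the interrupt level from a saved status register."""
    return (sr & LEVEL_MASK) >> LEVEL_SHIFT
```

With IRQ2 and IRQ4 both asserted, `highest_pending(0b00010100)` selects level 4, and a handler could later read that level back from the stacked SR with `level_from_sr`.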
EINT ; Enable interrupts
DINT ; Disable interrupts
RTI ; Return from interrupt (restores PC + flags)

Four independent 24-bit stack pointers (XY0–XY3) with flexible push/pop:
PUSH D0, XY3 ; Push single register
PUSH D, XY3 ; Push all data registers (D0–D3)
PUSH XY2, XY3 ; Push 24-bit address pair
POP D, XY3 ; Pop all data registers

| Category | Instructions |
|---|---|
| Load | LOADI, LOADD, LOADX, LOADY, LOADB, LOADXY, LOADP, LOADPB, LOADZ, LOADZB |
| Store | STORED, STOREX, STOREY, STOREB, STOREI, STOREXY, STOREP, STOREPB, STOREZ, STOREZB |
| Move | MOVE, SWAP |
| Arithmetic | ADD, ADC, SUB, SBC, NEG, INC, DEC |
| Logical | AND, OR, XOR, NOT |
| Shift/Rotate | SHL, SHR, ASR, ROL, ROR, SWAPB, HIGH, LOW, SHL4, SHR4, ASR4, ASR8, MULB, RECIP, LOOKUP |
| Address | LEA (24-bit effective address calculation) |
| Compare | CMP |
| Conditional Set | SEQ, SNE, SCS/SHS, SCC/SLO, SLT, SGT, SGE, SLE (branchless conditionals) |
| Branch | BEQ, BNE, BCS/BHS, BCC/BLO, BLT, BGT, BGE, BLE, BRA |
| Jump | JMP, JMP24, JMP16, JMPT, JMPXY |
| Subroutine | CALL, CALL24, CALL16, CALLR, CALLXY, RET |
| Syscall | TRAP (software interrupt via vector table) |
| Stack | PUSH, POP (supports D, X, Y, XY, D group, immediate) |
| Control | NOP, HALT, DINT, EINT, RTI |

| Instruction | Cycles | Notes |
|---|---|---|
| NOP/HALT | 2 | Control |
| LOADI | 2 | Immediate |
| LOADD/X/Y | 2–4 | Depends on addressing mode |
| LOADP/PB, LOADZ/ZB | 4–5 | 16-bit absolute with bank |
| STORED/X/Y | 3–4 | Depends on addressing mode |
| STOREP/PB, STOREZ/ZB | 4–5 | 16-bit absolute with bank |
| ADD/SUB/AND/OR/XOR | 3–4 | ALU operations |
| NOT | 4 | All modes |
| NEG | 3 | Two's complement negate |
| CMP | 3 | All modes |
| SHL/SHR/ROL/ROR | 3 | Lookup table operations |
| MULB/RECIP | 3 | Lookup-based multiply/reciprocal |
| LEA | 5–6 | 24-bit address calculation |
| Scc | 4 | Conditional set |
| Bcc | 3–4 | Short/long branch |
| JMP | 2–4 | Various modes |
| CALL | 11–12 | Subroutine call |
| CALLXY | 10 | Indirect call via XY register |
| TRAP | 12 | Software syscall |
| RET | 5 | Return |
| PUSH/POP | 4–14 | Single to group operations |
| INT | 15 | Interrupt entry |
| RTI | 7 | Interrupt return |
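Cycle counts translate directly into wall-clock time once a clock frequency is chosen. The short calculation below uses counts from the table and the 5–10 MHz target range; these are target figures, not measurements, and the helper name is illustrative.

```python
def microseconds(cycles, clock_mhz):
    """Time for `cycles` clock cycles at `clock_mhz` MHz, in microseconds."""
    return cycles / clock_mhz


INT_ENTRY, RTI = 15, 7                      # from the cycle table
round_trip = INT_ENTRY + RTI                # interrupt overhead excluding the handler body

slow = microseconds(round_trip, 5)          # at the low end of the target range
fast = microseconds(round_trip, 10)         # at the high end of the target range

# A worst-case CALL (12 cycles) plus RET (5 cycles) at 10 MHz.
call_return = microseconds(12 + 5, 10)
```

At the target clock range, the 22-cycle interrupt entry plus return overhead works out to between 2.2 and 4.4 microseconds.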
The K16 prioritises:
- Minimal chip count (~94 chips total including ROMs) without sacrificing capability
- Flexibility via ROM-based microcode — personality changes without rewiring
- Modern amenities like 24-bit addressing and priority interrupts
- Practical performance targeting the 68000 class
The ISA is essentially complete: 8 spare opcode/mode slots remain and are deliberately left unused. TRAP and CALLXY were the final two opcodes added; NEG sits at opcode $00 mode $11.
A Pascal compiler targeting the K16, ported from PASTA/80. Implements the V2 ABI calling convention (arguments in D0/D1/D2, result in D0, XY2 as frame pointer, callee-saved D2/D3/XY2). In active development; tested against a 291-test suite.
A complete Forth implementation running natively on the K16:
- Indirect Threaded Code (ITC) interpreter
- 17-cycle inner interpreter with sentinel-based execution
- 102+ built-in words
- Uses XY1 as IP, XY2 as data stack, XY3 as return stack
- MULB-based fast multiplication
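One common way to build a 16-bit multiply from an 8×8 primitive is to sum shifted partial products. The sketch below is an illustration of that general technique, not the Forth kernel's actual code. It assumes MULB multiplies the high byte of its operand by the low byte, which would fit a 64K-word table indexed by a 16-bit register, but the source does not spell out the operand format.

```python
def mulb(word):
    """Assumed MULB behaviour: high byte times low byte, giving a 16-bit product."""
    return ((word >> 8) & 0xFF) * (word & 0xFF)


def pack(hi, lo):
    """Place two bytes in one 16-bit word for the assumed MULB operand format."""
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


def mul16(a, b):
    """Low 16 bits of a*b from three 8x8 products (single-cell Forth `*`)."""
    a_lo, a_hi = a & 0xFF, (a >> 8) & 0xFF
    b_lo, b_hi = b & 0xFF, (b >> 8) & 0xFF
    low = mulb(pack(a_lo, b_lo))
    cross = mulb(pack(a_lo, b_hi)) + mulb(pack(a_hi, b_lo))
    # a_hi * b_hi would be shifted left by 16 bits, so it never reaches the result.
    return (low + (cross << 8)) & 0xFFFF
```

Because only the low 16 bits are kept, three lookups suffice instead of four.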
: SQUARE DUP * ;
: CUBE DUP SQUARE * ;
10 CUBE . \ prints 1000

K16 BASIC v2.2: a BASIC interpreter running natively on the K16.
A preemptive multitasking operating system in design. Planned features include FAT16 filesystem, syscall via TRAP, 8-level priority interrupts, and 2KB stack per task.
- ISA complete and verified — all opcodes implemented including NOT, NEG, TRAP, CALLXY, RET, LOADZ/LOADZB, STOREZ/STOREZB. 8 spare slots intentionally unused.
- Hardware design validated in Digital simulator (hneemann)
- Microcode generator and assembler implemented in Free Pascal/Lazarus
- K16Pascal compiler — in active development; 291-test suite, V2 ABI + all optimisation phases
- K16 Forth v2.24 complete with 102+ words
- K16 BASIC v2.2 complete
- K16 Emulator complete — Lazarus GUI with register display, disassembly panel, memory inspector, video output (TPaintBox), live MHz speed display; CPU runs on a dedicated thread
- FPGA implementation: Sipeed Tang Console 138K (Gowin GW5AST-138B) acquired and ready; Verilog implementation pending. Bring-up order planned: CPU core → PSRAM → USB-UART → SD card → HDMI. Implementation will be bit-exact to the simulator.
- k/OS in design
This project is open source. See LICENSE for details.