paulkberger/K16-CPU

K16 CPU

A 16-bit discrete logic CPU with 24-bit addressing, built from approximately 81 TTL chips, 6 ALU/control ROMs, and 13 chips of external glue/memory.

Overview

The K16 is a homebrew CPU designed around ROM-based lookup tables for both ALU operations and instruction decoding. Rather than using traditional hardwired logic, the K16 leverages high-density flash ROMs to implement complex functionality while keeping the chip count reasonable.

Key Specifications:

  • 16-bit data bus
  • 24-bit address bus (16MB flat memory space)
  • Little-endian byte ordering
  • Hybrid ROM/Adder ALU architecture
  • 8-level priority interrupt system (74LS148 encoder)
  • Target clock speed: 5–10 MHz

Architecture Highlights

Hybrid ROM-Based ALU

The ALU uses 4× SST39SF040 flash ROMs (512KB each) for the data path, plus 2× SST39SF040 ROMs for control signal generation — 6 ROMs total in the ALU/control module. Each data ROM is a nibble-wide slice receiving:

  • 4-bit ALU-A input (D0–D3, T16, T8-5, $0000, $0002, $FFFE, $FFFF)
  • 4-bit ALU-B input (XY[Mem], X0–X3, Y0–Y3, PCHi, PCLo, T8-5)
  • 7-bit instruction opcode (Opcode + Mode)
  • 4-bit microcode step counter

ROM outputs feed into 4× 74x283 TTL adders for carry propagation. This hybrid approach gives the speed benefits of hardware addition while allowing arbitrary ALU functions through ROM programming: each nibble slice can implement any function of its two 4-bit operands simply by reprogramming the lookup table.
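
The slice arrangement above can be sketched in Python. The four input fields add up to 4 + 4 + 7 + 4 = 19 address bits, which matches the 512K locations of one SST39SF040. The order of the fields within the address, and the idea that the ROM hands the adder a pre-processed B operand (for example inverted B for subtraction), are assumptions made for illustration; they are not taken from the K16 microcode.

```python
# Sketch of one nibble-wide ALU slice: a lookup ROM feeding a 4-bit adder.
# The address bit layout and the ROM's role in subtraction are assumptions.

A_BITS, B_BITS, OP_BITS, STEP_BITS = 4, 4, 7, 4
ADDRESS_SPACE = 1 << (A_BITS + B_BITS + OP_BITS + STEP_BITS)


def rom_address(a, b, opcode, step):
    """Pack the four input fields into one 19-bit ROM address (layout assumed)."""
    assert 0 <= a < 16 and 0 <= b < 16 and 0 <= opcode < 128 and 0 <= step < 16
    return (step << 15) | (opcode << 8) | (b << 4) | a


def adder_283(x, y, carry_in):
    """4-bit binary adder with carry in and carry out, as a 74x283 behaves."""
    total = x + y + carry_in
    return total & 0xF, total >> 4


def add16(a, b, carry_in=0, invert_b=False):
    """Chain four slices, rippling the carry from the low nibble upward."""
    result, carry = 0, carry_in
    for slice_index in range(4):
        x = (a >> (4 * slice_index)) & 0xF
        y = (b >> (4 * slice_index)) & 0xF
        if invert_b:
            y ^= 0xF  # stands in for what the lookup ROM would output
        nibble, carry = adder_283(x, y, carry)
        result |= nibble << (4 * slice_index)
    return result, carry
```

With these definitions, `add16(5, 3, carry_in=1, invert_b=True)` models a two's complement subtraction (5 − 3) using only the adder chain.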

Register Set

| Registers | Width | Description |
|-----------|-------|-------------|
| D0–D3 | 16-bit | Data registers |
| X0–X3 | 16-bit | Index registers (low address) |
| Y0–Y3 | 8-bit | Page registers (high address) |
| XY0–XY3 | 24-bit | Combined index pairs |
| PC | 24-bit | Program counter |
| SP | 24-bit | Stack pointer (XY3) |
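
As a small illustration of how an XY pair forms a 24-bit address, the sketch below places the 8-bit Y page register in bits 23:16 and the 16-bit X index register in bits 15:0, following the "high address" and "low address" descriptions in the table. The function names are hypothetical.

```python
def xy_address(y, x):
    """Combine an 8-bit page (Y) and a 16-bit index (X) into a 24-bit address."""
    return ((y & 0xFF) << 16) | (x & 0xFFFF)


def split_address(addr):
    """Split a 24-bit address back into its (Y, X) parts."""
    return (addr >> 16) & 0xFF, addr & 0xFFFF
```

For example, `xy_address(0xFF, 0x0000)` gives `0xFF0000`.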

Memory Map

| Range | Size | Description |
|-------|------|-------------|
| $00_0000 – $00_FFFF | 64KB | Page 00: Zero Page & Stack |
| $01_0000 – $1F_FFFF | ~2MB | RAM (currently installed) |
| $20_0000 – $BF_FFFF | 10MB | RAM (expansion space) |
| $C0_0000 – $DF_FFFF | 2MB | I/O Space |
| $E0_0000 – $EF_FFFF | 1MB | ROM: Lookup Tables (Bank 1) |
| $F0_0000 – $FB_FFFF | 768KB | ROM: Lookup Tables (Bank 2) |
| $FC_0000 – $FE_FFFF | 192KB | ROM: Program Code |
| $FF_0000 – $FF_FFFF | 64KB | ROM: Boot Code & Reset Vector |

Reset Vector: CPU starts execution at $FF_0000

I/O Addresses:

  • $DE_0000: Keyboard input
  • $DF_0000: Terminal output
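
The memory map can be expressed as a simple lookup, which is handy when writing an emulator or checking that the regions tile the full 16MB space. This is a sketch only: the ranges are copied from the table above, while the names `MEMORY_MAP` and `region_for` are hypothetical.

```python
# (start, end inclusive, description), copied from the memory map table.
MEMORY_MAP = [
    (0x000000, 0x00FFFF, "Page 00: Zero Page & Stack"),
    (0x010000, 0x1FFFFF, "RAM (currently installed)"),
    (0x200000, 0xBFFFFF, "RAM (expansion space)"),
    (0xC00000, 0xDFFFFF, "I/O Space"),
    (0xE00000, 0xEFFFFF, "ROM: Lookup Tables (Bank 1)"),
    (0xF00000, 0xFBFFFF, "ROM: Lookup Tables (Bank 2)"),
    (0xFC0000, 0xFEFFFF, "ROM: Program Code"),
    (0xFF0000, 0xFFFFFF, "ROM: Boot Code & Reset Vector"),
]


def region_for(addr):
    """Return the description of the region containing a 24-bit address."""
    for start, end, name in MEMORY_MAP:
        if start <= addr <= end:
            return name
    raise ValueError(f"address out of range: {addr:#x}")


def total_mapped_bytes():
    """Sum of all region sizes; equals 2**24 if the map has no gaps."""
    return sum(end - start + 1 for start, end, _ in MEMORY_MAP)
```

Both documented I/O addresses ($DE_0000 and $DF_0000) fall inside the I/O Space region, and the reset vector falls inside the boot ROM.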

Lookup Tables

The K16 extends the ROM-based philosophy to complex operations via dedicated lookup table memory. Operations like shifts, rotates, byte swaps, and multiplication use 64K-word lookup tables accessed in 3 cycles:

SHL D0          ; Shift left via lookup (3 cycles)
MULB D1         ; 8×8 multiply via lookup
RECIP D2        ; Reciprocal approximation

The ALU calculates the table address (D+D for word alignment), with the carry bit selecting odd/even pages. This achieves fast complex operations without dedicated shifter or multiplier hardware.
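
The addressing trick can be modelled as follows. A 64K-word table occupies 128KB, which is two 64KB pages; doubling the 16-bit operand gives a 16-bit offset plus a carry, and the carry picks the page. The sketch assumes little-endian words (as stated in the specifications) and treats the table base and the mapping of carry to page as hypothetical details.

```python
def build_table(fn):
    """Build a 64K-word lookup table (128KB), stored as little-endian words."""
    table = bytearray(2 * 65536)
    for d in range(65536):
        value = fn(d) & 0xFFFF
        table[2 * d] = value & 0xFF
        table[2 * d + 1] = value >> 8
    return table


def lookup(table, d):
    """Fetch the word for operand d using the D+D address calculation."""
    doubled = d + d
    offset = doubled & 0xFFFF   # 16-bit sum: word-aligned offset within a page
    carry = doubled >> 16       # carry out: selects the lower or upper 64KB page
    byte_addr = (carry << 16) | offset  # relative to an assumed table base
    return table[byte_addr] | (table[byte_addr + 1] << 8)


# Two example tables: byte swap and shift left by one.
SWAPB_TABLE = build_table(lambda d: ((d << 8) | (d >> 8)) & 0xFFFF)
SHL_TABLE = build_table(lambda d: (d << 1) & 0xFFFF)
```

Operands of $8000 and above produce a carry and therefore land in the second page, for example `lookup(SWAPB_TABLE, 0xABCD)`.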

Interrupts

Eight priority-encoded interrupt levels (IRQ0–IRQ7) using a 74LS148 priority encoder:

  • Automatic PC and Status Register save to stack
  • 15-cycle interrupt entry, 7-cycle return
  • Interrupt level captured in saved SR bits 6:4
  • Nested interrupt support via separate flag register banks

EINT            ; Enable interrupts
DINT            ; Disable interrupts
RTI             ; Return from interrupt (restores PC + flags)
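
The priority selection and the level field in the saved Status Register can be sketched like this. A 74LS148 gives its input 7 the highest priority; the sketch assumes IRQ7 is wired to that input, that the SR is 16 bits wide, and that the other SR bits are left unchanged. All function names are hypothetical.

```python
LEVEL_MASK = 0x0070  # SR bits 6:4 hold the interrupt level


def highest_pending(irq_mask):
    """Return the highest asserted IRQ level (0-7), or None if none is pending."""
    irq_mask &= 0xFF
    if irq_mask == 0:
        return None
    return irq_mask.bit_length() - 1


def save_level_in_sr(sr, level):
    """Return the SR value to push, with the level written into bits 6:4."""
    return (sr & 0xFFFF & ~LEVEL_MASK) | ((level & 0x7) << 4)


def level_from_sr(sr):
    """Recover the interrupt level from a saved SR value."""
    return (sr >> 4) & 0x7
```

With IRQ2 and IRQ5 both asserted, `highest_pending(0b00100100)` selects level 5.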

Stack Operations

Four independent 24-bit stack pointers (XY0–XY3) with flexible push/pop:

PUSH D0, XY3        ; Push single register
PUSH D, XY3         ; Push all data registers (D0–D3)
PUSH XY2, XY3       ; Push 24-bit address pair
POP D, XY3          ; Pop all data registers
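
The following is a rough model of single and group push/pop through a 24-bit stack pointer. The growth direction (downward, pre-decrement), the 2-byte step, and the order in which D0–D3 are pushed are assumptions made for the sketch, not documented K16 behaviour; only the little-endian word layout comes from the specifications above.

```python
class Stack:
    """Model of a stack addressed through one 24-bit XY pointer (assumed descending)."""

    def __init__(self, memory, pointer):
        self.memory = memory
        self.pointer = pointer & 0xFFFFFF

    def push_word(self, value):
        # Assumed pre-decrement; the word is stored little-endian.
        self.pointer = (self.pointer - 2) & 0xFFFFFF
        self.memory[self.pointer] = value & 0xFF
        self.memory[self.pointer + 1] = (value >> 8) & 0xFF

    def pop_word(self):
        value = self.memory[self.pointer] | (self.memory[self.pointer + 1] << 8)
        self.pointer = (self.pointer + 2) & 0xFFFFFF
        return value

    def push_group(self, registers):
        """Push several registers in order, as PUSH D, XY3 does for D0-D3."""
        for value in registers:
            self.push_word(value)

    def pop_group(self, count):
        """Pop registers in reverse order so they return to their original slots."""
        values = [self.pop_word() for _ in range(count)]
        return list(reversed(values))
```

Pushing a group of four registers moves the pointer by 8 bytes, and popping the same group restores both the values and the pointer.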

Instruction Set

| Category | Instructions |
|----------|--------------|
| Load | LOADI, LOADD, LOADX, LOADY, LOADB, LOADXY, LOADP, LOADPB, LOADZ, LOADZB |
| Store | STORED, STOREX, STOREY, STOREB, STOREI, STOREXY, STOREP, STOREPB, STOREZ, STOREZB |
| Move | MOVE, SWAP |
| Arithmetic | ADD, ADC, SUB, SBC, NEG, INC, DEC |
| Logical | AND, OR, XOR, NOT |
| Shift/Rotate | SHL, SHR, ASR, ROL, ROR, SWAPB, HIGH, LOW, SHL4, SHR4, ASR4, ASR8, MULB, RECIP, LOOKUP |
| Address | LEA (24-bit effective address calculation) |
| Compare | CMP |
| Conditional Set | SEQ, SNE, SCS/SHS, SCC/SLO, SLT, SGT, SGE, SLE (branchless conditionals) |
| Branch | BEQ, BNE, BCS/BHS, BCC/BLO, BLT, BGT, BGE, BLE, BRA |
| Jump | JMP, JMP24, JMP16, JMPT, JMPXY |
| Subroutine | CALL, CALL24, CALL16, CALLR, CALLXY, RET |
| Syscall | TRAP (software interrupt via vector table) |
| Stack | PUSH, POP (supports D, X, Y, XY, D group, immediate) |
| Control | NOP, HALT, DINT, EINT, RTI |

Cycle Counts

| Instruction | Cycles | Notes |
|-------------|--------|-------|
| NOP/HALT | 2 | Control |
| LOADI | 2 | Immediate |
| LOADD/X/Y | 2–4 | Depends on addressing mode |
| LOADP/PB, LOADZ/ZB | 4–5 | 16-bit absolute with bank |
| STORED/X/Y | 3–4 | Depends on addressing mode |
| STOREP/PB, STOREZ/ZB | 4–5 | 16-bit absolute with bank |
| ADD/SUB/AND/OR/XOR | 3–4 | ALU operations |
| NOT | 4 | All modes |
| NEG | 3 | Two's complement negate |
| CMP | 3 | All modes |
| SHL/SHR/ROL/ROR | 3 | Lookup table operations |
| MULB/RECIP | 3 | Lookup-based multiply/reciprocal |
| LEA | 5–6 | 24-bit address calculation |
| Scc | 4 | Conditional set |
| Bcc | 3–4 | Short/long branch |
| JMP | 2–4 | Various modes |
| CALL | 11–12 | Subroutine call |
| CALLXY | 10 | Indirect call via XY register |
| TRAP | 12 | Software syscall |
| RET | 5 | Return |
| PUSH/POP | 4–14 | Single to group operations |
| INT | 15 | Interrupt entry |
| RTI | 7 | Interrupt return |
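
To put the cycle counts in context, the sketch below converts them to wall-clock time across the 5–10 MHz target range. It assumes one clock period per cycle; the dictionary holds a few (min, max) entries copied from the table, and the helper names are hypothetical.

```python
# (min cycles, max cycles) for a few instructions, copied from the table.
CYCLES = {
    "LOADI": (2, 2),
    "CALL": (11, 12),
    "RET": (5, 5),
    "INT": (15, 15),
    "RTI": (7, 7),
}


def cycles_to_microseconds(cycles, clock_mhz):
    """Time taken by a number of cycles at a given clock, one period per cycle."""
    return cycles / clock_mhz


def worst_case_us(sequence, clock_mhz):
    """Worst-case time in microseconds for a sequence of instruction names."""
    total = sum(CYCLES[name][1] for name in sequence)
    return cycles_to_microseconds(total, clock_mhz)
```

For instance, interrupt entry plus return costs 22 cycles, which `worst_case_us(["INT", "RTI"], 5)` reports as 4.4 µs at 5 MHz and half that at 10 MHz.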

Design Philosophy

The K16 prioritises:

  1. Minimal chip count (~94 chips total including ROMs) without sacrificing capability
  2. Flexibility via ROM-based microcode — personality changes without rewiring
  3. Modern amenities like 24-bit addressing and priority interrupts
  4. Practical performance targeting the 68000 class

The ISA is essentially complete: 8 spare opcode/mode slots remain and are deliberately left unused. TRAP and CALLXY were the final two opcodes added; NEG sits at opcode $00 mode $11.

Software Stack

K16 Pascal Compiler

A Pascal compiler targeting the K16, ported from PASTA/80. Implements the V2 ABI calling convention (arguments in D0/D1/D2, result in D0, XY2 as frame pointer, callee-saved D2/D3/XY2). In active development; tested against a 291-test suite.

K16 Forth

A complete Forth implementation running natively on the K16:

  • Indirect Threaded Code (ITC) interpreter
  • 17-cycle inner interpreter with sentinel-based execution
  • 102+ built-in words
  • Uses XY1 as IP, XY2 as data stack, XY3 as return stack
  • MULB-based fast multiplication

: SQUARE DUP * ;
: CUBE DUP SQUARE * ;
10 CUBE .    \ prints 1000
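
For readers unfamiliar with Indirect Threaded Code, here is a minimal Python model of the technique. It is not the K16 Forth source: cells hold Python callables instead of machine-code addresses, the HALT word used to stop the loop is a device of this sketch (it may differ from the sentinel mechanism mentioned above), and all names are hypothetical. The `ip` field plays the role that XY1 plays on the K16.

```python
class Forth:
    """Tiny indirect-threaded interpreter: threads hold code-field addresses."""

    def __init__(self):
        self.mem = []     # cell memory: code fields and thread cells
        self.data = []    # data stack
        self.ret = []     # return stack
        self.ip = 0       # instruction pointer into the current thread
        self.w = 0        # code-field address of the word being executed
        self.running = False
        self.words = {}   # name -> code-field address

    def primitive(self, name, handler):
        self.words[name] = len(self.mem)
        self.mem.append(handler)          # code field points at native code

    def colon(self, name, body):
        self.words[name] = len(self.mem)
        self.mem.append(Forth.docol)      # code field of every colon word
        self.mem.extend(self.words[token] for token in body)
        self.mem.append(self.words["EXIT"])

    def docol(self):
        self.ret.append(self.ip)          # save caller's position
        self.ip = self.w + 1              # enter this word's thread

    def next(self):
        self.w = self.mem[self.ip]        # fetch a code-field address
        self.ip += 1
        self.mem[self.w](self)            # indirect jump through the code field

    def execute(self, name):
        start = len(self.mem)
        self.mem.extend([self.words[name], self.words["HALT"]])
        self.ip, self.running = start, True
        while self.running:
            self.next()
        del self.mem[start:]              # discard the temporary thread


def build_vm():
    vm = Forth()
    vm.primitive("EXIT", lambda m: setattr(m, "ip", m.ret.pop()))
    vm.primitive("HALT", lambda m: setattr(m, "running", False))
    vm.primitive("DUP", lambda m: m.data.append(m.data[-1]))
    vm.primitive("*", lambda m: m.data.append(m.data.pop() * m.data.pop()))
    vm.colon("SQUARE", ["DUP", "*"])
    vm.colon("CUBE", ["DUP", "SQUARE", "*"])
    return vm
```

Pushing 10 onto `vm.data` and calling `vm.execute("CUBE")` walks the same threads as the Forth definitions above and leaves the result on the data stack.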

K16 BASIC

K16 BASIC v2.2 — a BASIC interpreter running natively on the K16.

k/OS

A preemptive multitasking operating system in design. Planned features include FAT16 filesystem, syscall via TRAP, 8-level priority interrupts, and 2KB stack per task.

Current Status

  • ISA complete and verified — all opcodes implemented including NOT, NEG, TRAP, CALLXY, RET, LOADZ/LOADZB, STOREZ/STOREZB. 8 spare slots intentionally unused.
  • Hardware design validated in Digital simulator (hneemann)
  • Microcode generator and assembler implemented in Free Pascal/Lazarus
  • K16Pascal compiler — in active development; 291-test suite, V2 ABI + all optimisation phases
  • K16 Forth v2.24 complete with 102+ words
  • K16 BASIC v2.2 complete
  • K16 Emulator complete — Lazarus GUI with register display, disassembly panel, memory inspector, video output (TPaintBox), live MHz speed display; CPU runs on a dedicated thread
  • FPGA implementation: Sipeed Tang Console 138K (Gowin GW5AST-138B) acquired and ready; Verilog implementation pending. Bring-up order planned: CPU core → PSRAM → USB-UART → SD card → HDMI. Implementation will be bit-exact to the simulator.
  • k/OS in design

License

This project is open source. See LICENSE for details.

About

The Kiama K16 is a 16-bit discrete logic CPU with 24-bit addressing (16MB), featuring ROM-based ALU lookup tables, four data registers (D0-D3), four 24-bit index pairs (XY0-XY3), and 8-level priority interrupts. Built from 74-series logic chips with custom microcode ROMs.
