### Import and set-up

In [None]:
'''
This module defines a CPU dataclass that emulates a simple virtual machine (VM) for experimenting with and implementing a pseudo-assembly language.

PC is advanced before executing each instruction (pre-increment). Relative jumps are computed from this post-fetch PC.

CPU PROPERTIES:
    - mem: emulates RAM. default size is 32 int cells; override via CPU(mem=[0]*N) on construction.
    - acc: emulates accumulator. holds intermediate arithmetic results.
    - idx: emulates index register. points to a memory cell for indexed operations.
    - pc: emulates program counter. tracks the next instruction to execute.
    - halted: defaults to False. "HALT" sets halted to True (step() also sets it when PC runs past the end of the program); a subsequent step() is a no-op and run() stops.

PROGRAM INPUT:
    - A program: a list of instructions executed sequentially (with jumps for control flow). 
    - Instruction format: a tuple (op,) or (op, int), where op is a str; a bare op string is also accepted and treated as (op,).
    - Instructions that take an operand require an int (enforced at runtime).
    - load_program() resets registers (acc, idx, pc, halted) but preserves RAM contents.

CPU CLASS METHODS
    - cpu.load_program(program): resets registers and loads a new program (memory is preserved unless overwritten)
    - cpu.dump(): returns a dictionary snapshot of the CPU state (PC, ACC, IDX, HALTED, and first 8 memory cells)
    - cpu.step(): executes exactly one instruction (used internally by run, but callable for tracing)
    - cpu.run(max_steps=100): executes until HALT or a step limit; returns the number of steps executed.
    
STEP() ORGANIZATION
    - step() defines helper functions to validate operands and enforce addressing rules for specific operations:    
        - No helper: LOADI, ADDI
        - _get_address(): used by LOAD_ABS, STORE_ABS, ADD_ABS (validates absolute memory address)
        - _check_idx(): used by LOAD_IDX, STORE_IDX, ADD_IDX (ensures IDX register points to valid memory)
        - _get_target_idx(): used by INC_IDX, DEC_IDX, LOAD_IDX_OFF, STORE_IDX_OFF, ADD_IDX_OFF, LOADI_IDX (computes IDX+offset or new IDX)
        - _get_target_step(): used by JUMP_ABS and JUMP_REL (computes and validates new program counter)
    - Error handling
        - Invalid opcodes raise NotImplementedError
        - Bad/missing operands raise TypeError/ValueError
        - Bad memory/PC/IDX targets raise IndexError
    
TO-DO FOR NEW OPERATIONS
    [] If an op requires args[0], add it to the "needs_args" set
    [] Update or create helper functions to validate arguments, memory bounds, or control-flow targets as needed
    [] Keep this specification updated whenever new instructions are added    

PSEUDO-ASSEMBLY LANGUAGE    
- Operation naming conventions:
    - Suffix indicates addressing mode (I, _ABS, _IDX, _IDX_OFF)
        - I: immediate value (uses the literal argument)
        - *_ABS: operates on a fixed (absolute) memory cell mem[addr]
        - *_IDX: operates on the memory cell pointed to by the index register (exception: INC_IDX, DEC_IDX, and LOADI_IDX update the index register itself)
        - *_IDX_OFF: operates on the memory cell at index register + signed offset
    - Verb indicates the operation (e.g., LOAD, STORE, ADD, JUMP, HALT)
    - Supported ops: LOADI, LOADI_IDX, LOAD_ABS, LOAD_IDX, LOAD_IDX_OFF, STORE_ABS, STORE_IDX, STORE_IDX_OFF, ADDI, ADD_ABS, ADD_IDX, ADD_IDX_OFF, INC_IDX, DEC_IDX, JUMP_ABS, JUMP_REL, HALT.
    - JUMP_REL k sets PC := PC_after_fetch + k; JUMP_ABS sets PC := k

FUTURE EXTENSIONS
- Flags & Status Register
  - Add NZCV flags (Negative, Zero, Carry, Overflow); update on arithmetic/loads.
  - Expose read-only status via cpu.dump().

- Comparisons & Conditional Branches
  - CMP_ABS / CMP_IDX / CMPI: set flags based on ACC − operand.
  - JZ/JNZ/JN/JP/JC/JNC/JO/JNO: branch based on flags.

- Stack & Subroutines
  - Add SP (stack pointer) and RAM region for stack.
  - PUSH/POP (ACC and/or arbitrary values).
  - CALL addr (push return PC), RET (pop to PC).

- Memory & Addressing Modes
  - Configurable RAM size; optional segmented memory or regions (code/data/stack).
  - Indirect addressing: LOAD_IND addr (ACC ← mem[mem[addr]]).
  - Base+offset form: LOAD_BASE base_abs, off (ACC ← mem[base+off]).
  - Auto-inc/dec variants: LOAD_IDX_POSTINC, STORE_IDX_PREDEC, etc.

- Arithmetic & Bitwise
  - SUB*, MUL*, DIV*, MOD* variants (immediate/ABS/IDX/IDX_OFF).
  - SHL/SHR (logical), SAR (arith), ROL/ROR; AND/OR/XOR/NOT.

- I/O & “Syscalls”
  - IN port → ACC; OUT port ← ACC (simulate ports or a device map).
  - Simple syscall mechanism: SYS imm (dispatch to Python host callbacks).

- Tooling & Debugging
  - Single-step tracer with before/after snapshots per instruction.
  - Breakpoints: BRK and a user-set breakpoint set().
  - Disassembler: turn program tuples into readable lines.
  - Assembler: parse text mnemonics into tuples (op, arg).

- Performance & Config
  - Instruction step budget configurable in run(max_steps=…).
  - Optional “strict mode” that forbids out-of-bounds even on INC/DEC before use.
  - Deterministic seed for any randomized I/O/device behavior.

- Error Semantics
  - Distinguish exceptions: DecodeError (bad opcode), OperandError (type/arity),
    MemError (RAM bounds), PCError (bad jump), StateError (e.g., RET with empty stack).

NEXT WEEK:
Hour 1–2: Sanity & cleanup
    - Finalize your current instruction set docstring/spec.
    - Make sure all ops (LOADI, LOADI_IDX, *_ABS, *_IDX, *_IDX_OFF, ADD*, INC/DEC_IDX, JUMP*, HALT) run with tests.
    - Write 2–3 very small test programs (add two numbers, store & reload, simple jump loop).

Hour 3–5: Write and run a real program
    - Program 1: Generate the first 10 Fibonacci numbers and store them in RAM.
    - Program 2: Sum numbers from F[0]..F[9] into ACC (a “for loop” with jumps).
    - Dump memory and verify results.
        - Blog idea: “From Toy Instructions to Real Computation: Writing Loops in My Virtual CPU”

Hour 6–7: Extend with subtraction
    - Add SUBI, SUB_ABS, SUB_IDX, SUB_IDX_OFF.
    - Verify with simple test: 5 - 3 = 2.
    - Rewrite your sum loop using subtraction to count down (more realistic assembly pattern).

Hour 8: Reflection + write-up
    - Document what surprised you about writing loops in assembly.
    - Reflect on how much control flow depends on jumps and counters.
    - Update your spec with new ops.
    - (Optional) Write a trace function that prints ACC, IDX, PC after each instruction for debugging.

'''

from dataclasses import dataclass, field
from typing import List, Tuple, Union, Any

# an instruction is ('OP',), ('OP', int), or a bare 'OP' string
Instruction = Union[str, Tuple[str], Tuple[str, int]]

# define CPU as a dataclass
# the @dataclass decorator generates __init__, __repr__, and __eq__ from the annotated fields
@dataclass
class CPU:
    '''
    Emulates a simple virtual machine (VM).
    Defines the VM's state, its methods, and the pseudo-assembly language it accepts.
    '''
    mem: List[int] = field(default_factory=lambda: [0]*32)      # 32 integer cells of RAM
    acc: int = 0                                                # Accumulator
    idx: int = 0                                                # Index register
    pc: int = 0                                                 # Program counter
    halted: bool = False
    program: List[Instruction] = field(default_factory=list)    # program is a List of Instructions (list of tuples)

    # load program, with starting values for each field except mem
    def load_program(self, program: List[Instruction]):
        '''Load a list of instructions and reset CPU state.'''
        self.program = program
        self.pc = 0
        self.halted = False
        self.acc = 0
        self.idx = 0
        # memory persists until you overwrite cpu.mem explicitly

    def dump(self):
        '''Snapshot of key state for quick inspection'''
        return {
            'PC': self.pc,
            'ACC': self.acc,
            'IDX': self.idx,
            'HALTED': self.halted,
            'MEMO..7': self.mem[:8]
        }
    
    def step(self):
        '''Execute exactly the next instruction'''
        # if halted or pc is not between 0 and len(program), set halted = True
        if self.halted or not (0 <= self.pc < len(self.program)):
            self.halted = True 
            return
                
        # set-up
        instr = self.program[self.pc]
        # unpack instruction into op and *args; a bare string is treated as (op,)
        op, *args = instr if isinstance(instr, tuple) else (instr,)

        # default: advance PC before execute (pre-increment)
        self.pc += 1

        # reset variables
        target_index = None
        addr = None
        target_step = None
        
        # enforce that *args[0] is int
        if len(args) >= 1 and not isinstance(args[0], int):
            raise TypeError(f'({op}, {args}): second tuple item must be an int')
        
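        # enforce that certain ops have args[0]
        # note: every opcode must be followed by a comma; Python joins adjacent
        # string literals, so a missing comma silently merges two opcodes into one
        needs_args = {
            'LOADI', 'LOAD_ABS', 'LOAD_IDX_OFF',
            'STORE_ABS', 'STORE_IDX_OFF',
            'ADDI', 'ADD_ABS', 'ADD_IDX_OFF',
            'JUMP_ABS', 'JUMP_REL', 'LOADI_IDX',
            }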
        if op in needs_args:
            if not args:
                raise ValueError(f'Instruction {self.pc} must include an int.')

        
        # HELPER FUNCTIONS
        # use with any command that operates on an ABS memory address
        def _get_address(args: Any) -> int:
            'Validate and return a memory address'
            addr = args[0]
            if not (0 <= addr < len(self.mem)):
                raise IndexError(f'Instruction {self.pc} results in a bad memory address in step: {addr}.')
            return addr

        # use with any command that computes a new IDX or an IDX + offset address
        def _get_target_idx(op: str, args: Any) -> int:
            'Validate and return a target index'
            if 'OFF' in op:
                off = args[0]
                target_idx = self.idx + off
            elif op == 'DEC_IDX':
                target_idx = self.idx - 1
            elif op == 'INC_IDX':
                target_idx = self.idx + 1
            elif op == 'LOADI_IDX':
                target_idx = args[0]
            else:
                raise NotImplementedError(f'_get_target_idx does not handle op: {op}')

            if not 0 <= target_idx < len(self.mem):
                raise IndexError(f'Instruction {self.pc} results in IDX out of range: {target_idx}.')
            
            return target_idx 

        # use with any command that operates on the memory slot @IDX
        def _check_idx():
            'Validate the current index'
            if not (0 <= self.idx < len(self.mem)):
                raise IndexError(f'Instruction {self.pc} contains an IDX out of range.')
            
        # use with any command that operates on self.pc
        def _get_target_step(args: Any) -> int:
            'Validate and return a target program counter value'
            if op == 'JUMP_REL':
                target_step = self.pc + args[0]
            else:                       # includes JUMP_ABS
                target_step = args[0]
            if not (0 <= target_step < len(self.program)):
                raise IndexError(f'Instruction {self.pc} results in a bad pc: {target_step}')
            return target_step

        # OPERATION LOGIC
        # no helper function
        if op == 'LOADI':           # ACC <- immediate
            self.acc = args[0]
        elif op == 'ADDI':          # acc = acc + immediate
            self.acc = self.acc + args[0]

        # _get_address() helper function
        elif op == 'LOAD_ABS':      # ACC <- mem[addr]
            addr = _get_address(args)
            self.acc = self.mem[addr]
        elif op == 'STORE_ABS':     # mem[addr] <- ACC
            addr = _get_address(args)
            self.mem[addr] = self.acc
        elif op == 'ADD_ABS':       # acc = acc + mem[addr]
            addr = _get_address(args)
            self.acc = self.acc + self.mem[addr]

        # _check_idx() helper function
        elif op == 'LOAD_IDX':      # ACC <- mem[idx]
            _check_idx()
            self.acc = self.mem[self.idx]
        elif op == 'STORE_IDX':     # mem[idx] <- ACC
            _check_idx()
            self.mem[self.idx] = self.acc
        elif op == 'ADD_IDX':       # acc = acc + mem[idx]
            _check_idx()
            self.acc += self.mem[self.idx]

        # _get_target_idx() helper function
        elif op == 'LOAD_IDX_OFF':  # ACC <- mem[idx+off]
            target_index = _get_target_idx(op, args)
            self.acc = self.mem[target_index]
        elif op == 'STORE_IDX_OFF':  # mem[idx+off] <- ACC
            target_index = _get_target_idx(op, args)
            self.mem[target_index] = self.acc
        elif op == 'ADD_IDX_OFF':   # acc = acc + mem[idx + off]
            target_index = _get_target_idx(op, args)
            self.acc += self.mem[target_index]
        elif op == 'INC_IDX':
            target_index = _get_target_idx(op, args)     
            self.idx = target_index
        elif op == 'DEC_IDX':
            target_index = _get_target_idx(op, args)     
            self.idx = target_index
        elif op == 'LOADI_IDX':       # idx = args[0]
            target_index = _get_target_idx(op, args)
            self.idx = target_index
        
        # _get_target_step() helper function
        elif op == 'JUMP_ABS':      # Absolute jump
            target_step = _get_target_step(args)
            self.pc = target_step
        elif op == 'JUMP_REL':      # Relative jump
            target_step = _get_target_step(args)
            self.pc = target_step
        elif op == 'HALT':          # stop execution
            self.halted = True
        else:
            raise NotImplementedError(f'Unknown op: {op}')
        
    def run(self, max_steps=100):
        '''Run up to max_steps or until halted.'''
        steps = 0
        while not self.halted and steps < max_steps:
            self.step()
            steps += 1
        return steps
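Because `needs_args` is a set of string literals, a missing comma does not raise an error: Python concatenates adjacent string literals at compile time, so two opcodes silently become one unknown name and neither is checked. The snippet below reproduces the effect with illustrative opcode names and shows a cheap guard that compares the set against a hypothetical `KNOWN_OPS` set.

```python
# Adjacent string literals are joined at compile time, so a missing comma
# merges two opcodes into a single, unknown name without any error.
broken = {
    'ADDI', 'ADD_ABS', 'ADD_IDX_OFF'   # <- missing comma
    'JUMP_ABS', 'JUMP_REL',
}
fixed = {
    'ADDI', 'ADD_ABS', 'ADD_IDX_OFF',
    'JUMP_ABS', 'JUMP_REL',
}

# Hypothetical guard: every name in the set must be a real opcode.
KNOWN_OPS = {'ADDI', 'ADD_ABS', 'ADD_IDX_OFF', 'JUMP_ABS', 'JUMP_REL', 'HALT'}

print(sorted(broken))        # ['ADDI', 'ADD_ABS', 'ADD_IDX_OFFJUMP_ABS', 'JUMP_REL']
print(broken - KNOWN_OPS)    # {'ADD_IDX_OFFJUMP_ABS'}
print(fixed <= KNOWN_OPS)    # True
```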

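The TO-DO list asks for every operand-taking op to be registered in `needs_args`, and the Error Semantics idea proposes finer-grained exceptions. One hedged way to combine both is a single arity table that `step()` could consult before dispatch. Everything below (`ARITY`, `decode`, `VMError` and its subclasses) is hypothetical. Each new exception also subclasses the built-in that `step()` raises today, so existing `except IndexError:`-style handlers would keep working. Unlike the current `step()`, which ignores extra operands, this sketch rejects them, and it also rejects `True`/`False` (a `bool` is an `int` in Python).

```python
from typing import Optional, Tuple, Union

# Hypothetical exception hierarchy; each class also inherits from the built-in
# that the current step() raises, so older handlers still catch it.
class VMError(Exception):
    """Base class for all VM errors."""

class DecodeError(VMError, NotImplementedError):
    """Unknown opcode."""

class OperandError(VMError, TypeError, ValueError):
    """Missing, extra, or non-int operand."""

class MemError(VMError, IndexError):
    """RAM address out of bounds."""

class PCError(VMError, IndexError):
    """Jump target outside the program."""

class StateError(VMError, RuntimeError):
    """Invalid machine state, e.g. RET with an empty stack."""

# Single source of truth: opcode -> number of operands (0 or 1).
ARITY = {
    'LOADI': 1, 'LOADI_IDX': 1, 'LOAD_ABS': 1, 'LOAD_IDX': 0, 'LOAD_IDX_OFF': 1,
    'STORE_ABS': 1, 'STORE_IDX': 0, 'STORE_IDX_OFF': 1,
    'ADDI': 1, 'ADD_ABS': 1, 'ADD_IDX': 0, 'ADD_IDX_OFF': 1,
    'INC_IDX': 0, 'DEC_IDX': 0, 'JUMP_ABS': 1, 'JUMP_REL': 1, 'HALT': 0,
}

def decode(instr: Union[str, tuple]) -> Tuple[str, Optional[int]]:
    """Split an instruction into (op, operand) and validate it against ARITY."""
    op, *args = instr if isinstance(instr, tuple) else (instr,)
    if op not in ARITY:
        raise DecodeError(f'Unknown op: {op}')
    if len(args) != ARITY[op]:
        raise OperandError(f'{op} expects {ARITY[op]} operand(s), got {len(args)}')
    # bool is a subclass of int, so reject it explicitly
    if args and (not isinstance(args[0], int) or isinstance(args[0], bool)):
        raise OperandError(f'{op} operand must be an int, got {args[0]!r}')
    return op, (args[0] if args else None)
```

With this table, adding an op means adding one `ARITY` entry plus its `elif` branch; `decode(('LOADI', 5))` returns `('LOADI', 5)` and `decode('HALT')` returns `('HALT', None)`.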

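The naming conventions and the pre-increment rule boil down to two small calculations: which memory cell an operand comes from, and where the PC goes next. The pure functions below (hypothetical names `effective_address` and `next_pc`) restate the documented rules outside the class so they can be checked in isolation; `CPU.step()` does not call them.

```python
from typing import Optional

def effective_address(op: str, operand: Optional[int], idx: int) -> Optional[int]:
    """Memory cell an instruction reads or writes, or None if it touches no cell."""
    if op.endswith('_IDX_OFF'):
        return idx + operand                 # IDX + signed offset
    if op.endswith('_IDX') and op not in ('INC_IDX', 'DEC_IDX', 'LOADI_IDX'):
        return idx                           # cell pointed to by IDX
    if op.endswith('_ABS') and not op.startswith('JUMP'):
        return operand                       # fixed cell mem[addr]
    return None                              # immediates, IDX updates, jumps, HALT

def next_pc(pc_at_fetch: int, op: str, operand: Optional[int] = None) -> int:
    """PC after executing the instruction stored at index pc_at_fetch."""
    pc = pc_at_fetch + 1                     # pre-increment: PC advances before execution
    if op == 'JUMP_ABS':
        return operand                       # PC := k
    if op == 'JUMP_REL':
        return pc + operand                  # PC := PC_after_fetch + k
    return pc
```

For example, `next_pc(9, 'JUMP_REL', -5)` is 5, so `JUMP_REL -5` at index 9 loops back to index 5; `JUMP_REL 0` simply falls through, and `JUMP_REL -1` jumps to itself.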
In [None]:
prog = [
    ('LOADI', -7),
    ('STORE_ABS', 0),
    ('LOADI', 12),
    ('STORE_ABS', 1),
    ('LOAD_ABS', 1),
    ('LOADI_IDX', 5),
    ('HALT',),
]
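Writing programs as Python tuples, as in `prog` above, gets tedious once loops appear; the Tooling list proposes an assembler and a disassembler for this. A minimal sketch, assuming a one-instruction-per-line text format with optional `;` comments (the format and the names `assemble`/`disassemble` are illustrative, not part of the CPU class):

```python
from typing import List, Tuple, Union

Instruction = Union[Tuple[str], Tuple[str, int]]

def assemble(source: str) -> List[Instruction]:
    """Parse lines like 'LOADI -7 ; comment' into ('LOADI', -7) tuples."""
    program: List[Instruction] = []
    for line_no, raw in enumerate(source.splitlines(), start=1):
        line = raw.split(';', 1)[0].strip()      # drop the comment, then whitespace
        if not line:
            continue                             # blank or comment-only line
        op, *rest = line.split()
        op = op.upper()
        if not rest:
            program.append((op,))
        elif len(rest) == 1:
            try:
                program.append((op, int(rest[0])))   # int() accepts '-7' and '+3'
            except ValueError:
                raise ValueError(f'line {line_no}: operand is not an int: {rest[0]!r}') from None
        else:
            raise ValueError(f'line {line_no}: expected "OP" or "OP int", got {line!r}')
    return program

def disassemble(program: List[Instruction]) -> str:
    """Render tuples as text that assemble() reads back; the index goes in a comment."""
    return '\n'.join(f"{' '.join(str(part) for part in instr):<16}; {i}"
                     for i, instr in enumerate(program))

SOURCE = """
LOADI -7        ; ACC <- -7
STORE_ABS 0     ; mem[0] <- ACC
LOADI 12
STORE_ABS 1
LOAD_ABS 1
LOADI_IDX 5
HALT
"""

program = assemble(SOURCE)      # same tuples as prog above
print(program)
print(disassemble(program))
```

Putting the instruction index in a trailing comment keeps the disassembler output valid assembler input, so the two functions round-trip.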

In [141]:
cpu = CPU()
cpu.load_program(prog)
cpu.run()
print(cpu.dump())


{'PC': 7, 'ACC': 12, 'IDX': 5, 'HALTED': True, 'MEMO..7': [-7, 12, 0, 0, 0, 0, 0, 0]}
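Program 1 in the plan (first 10 Fibonacci numbers) fits the current instruction set, with one caveat: there is no conditional branch yet, so a loop built from `JUMP_REL` never exits on its own. A sketch under that constraint sizes the step budget so `run()` stops right after the last store. `FIB_PROGRAM`, `BUDGET`, and `fib_reference` are hypothetical names; the budget (5 setup steps + 8 iterations × 5 steps = 45) follows from the program layout, assuming the CPU semantics documented above.

```python
from typing import List

# Hypothetical program: F0 and F1 are seeded, then each loop iteration
# computes mem[IDX+2] = mem[IDX] + mem[IDX+1] and advances IDX.
FIB_PROGRAM = [
    ('LOADI', 0),          # 0: ACC <- 0
    ('STORE_ABS', 0),      # 1: mem[0] <- F0
    ('LOADI', 1),          # 2: ACC <- 1
    ('STORE_ABS', 1),      # 3: mem[1] <- F1
    ('LOADI_IDX', 0),      # 4: IDX <- 0
    ('LOAD_IDX',),         # 5: ACC <- mem[IDX]          (loop start)
    ('ADD_IDX_OFF', 1),    # 6: ACC <- ACC + mem[IDX+1]
    ('STORE_IDX_OFF', 2),  # 7: mem[IDX+2] <- ACC
    ('INC_IDX',),          # 8: IDX <- IDX + 1
    ('JUMP_REL', -5),      # 9: PC after fetch is 10, so 10 - 5 = 5
]
SETUP_STEPS, LOOP_STEPS, ITERATIONS = 5, 5, 8    # 8 iterations fill mem[2]..mem[9]
BUDGET = SETUP_STEPS + LOOP_STEPS * ITERATIONS   # 45

def fib_reference(n: int) -> List[int]:
    """Plain-Python reference for checking cpu.mem[:n]."""
    out = [0, 1]
    while len(out) < n:
        out.append(out[-1] + out[-2])
    return out[:n]

# With the CPU class above (not repeated here):
#   cpu = CPU(); cpu.load_program(FIB_PROGRAM)
#   cpu.run(max_steps=BUDGET)
#   assert cpu.mem[:10] == fib_reference(10)
```

Since `dump()` shows only the first 8 cells, compare `cpu.mem[:10]` directly. After 45 steps `cpu.halted` is still False; with the default `max_steps=100` the loop would keep going past F9 and fill more cells.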

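The first two FUTURE EXTENSIONS items (NZCV flags, then CMP and conditional branches) are what a loop needs to terminate without relying on `max_steps`. Because the VM stores Python ints, which never overflow, Carry and Overflow only mean something once a word size is fixed. The sketch assumes 8-bit two's-complement words and the ARM-style carry convention (C set means no borrow on subtraction). `cmp_flags`, `BRANCH_CONDITIONS`, and `branch_taken` are hypothetical names, and reading JP as "N clear and Z clear" (strictly positive) is also an assumption.

```python
from typing import Dict

def cmp_flags(acc: int, operand: int, bits: int = 8) -> Dict[str, bool]:
    """NZCV flags for ACC - operand on a `bits`-wide two's-complement word."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = acc & mask, operand & mask        # reinterpret as unsigned words
    result = (a - b) & mask                  # wrap-around subtraction
    return {
        'N': bool(result & sign),            # sign bit of the result
        'Z': result == 0,
        'C': a >= b,                         # ARM convention: no borrow -> C set
        'V': bool((a ^ b) & (a ^ result) & sign),  # signed overflow
    }

# Branch conditions from the FUTURE EXTENSIONS list, as predicates on the flags.
BRANCH_CONDITIONS = {
    'JZ': lambda f: f['Z'],
    'JNZ': lambda f: not f['Z'],
    'JN': lambda f: f['N'],
    'JP': lambda f: not f['N'] and not f['Z'],   # assumption: strictly positive
    'JC': lambda f: f['C'],
    'JNC': lambda f: not f['C'],
    'JO': lambda f: f['V'],
    'JNO': lambda f: not f['V'],
}

def branch_taken(op: str, flags: Dict[str, bool]) -> bool:
    """True if conditional jump `op` should be taken given `flags`."""
    return BRANCH_CONDITIONS[op](flags)
```

A countdown loop could then end with a CMPI 0 followed by a JNZ back to the loop head; `cmp_flags(-128, 1)` sets V because -129 does not fit in 8 signed bits.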