# 15 - Assembler

## Introduction

An **Assembler** converts human-readable assembly language into machine code. Instead of manually encoding instructions like `0x4012`, we can write `ADD R0, R1, R2`.

Our assembler uses a **two-pass** approach:
1. **First pass**: Build the symbol table (the address of each label)
2. **Second pass**: Generate machine code, replacing label references with the addresses recorded in the first pass
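
To see why two passes are needed, consider a jump to a label that is defined further down the file (a forward reference). When the `JMP` line is reached, `end` has no address yet. The toy sketch below is not the notebook's `Assembler`. Its line format (a `(label, mnemonic, label_operand)` tuple) and the helpers `build_symbols` and `resolve` are hypothetical. Register operands are omitted, and every instruction is assumed to take 2 bytes.

```python
from typing import Dict, List, Optional, Tuple

# Toy program: (label, mnemonic, label operand). Register operands are omitted
# for brevity, and every instruction is assumed to occupy 2 bytes.
Line = Tuple[Optional[str], str, Optional[str]]

program: List[Line] = [
    (None, "JMP", "end"),     # forward reference: 'end' is defined two lines later
    ("loop", "ADD", None),
    ("end", "HALT", None),
]

def build_symbols(lines: List[Line]) -> Dict[str, int]:
    """Pass 1: record the address of every label; no operands are resolved yet."""
    symbols: Dict[str, int] = {}
    address = 0
    for label, _mnemonic, _operand in lines:
        if label is not None:
            symbols[label] = address
        address += 2
    return symbols

def resolve(lines: List[Line], symbols: Dict[str, int]) -> List[Tuple[str, Optional[int]]]:
    """Pass 2: replace each label operand with the address found in pass 1."""
    return [(mnemonic, symbols[operand] if operand is not None else None)
            for _label, mnemonic, operand in lines]

symbols = build_symbols(program)
print(symbols)                    # {'loop': 2, 'end': 4}
print(resolve(program, symbols))  # [('JMP', 4), ('ADD', None), ('HALT', None)]
```

Pass 1 never inspects operands, so the order in which labels are defined does not matter. By the time pass 2 runs, every label already has an address.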

## Learning Objectives

1. Understand assembly language syntax
2. Learn about labels and symbol tables
3. Build a two-pass assembler

In [None]:
import sys
from pathlib import Path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / 'src'))
sys.path.insert(0, str(project_root))

from computer.isa import OPCODES, encode_instruction
from computer import bits_to_int
from typing import List, Dict, Optional
import re
print("Setup complete!")

## Assembly Syntax

```asm
; This is a comment
label:              ; Labels end with colon
    ADD R0, R1, R2  ; Instruction with operands
    LOAD R3, data   ; Reference label
    JMP label       ; Jump to label

.org 0x10           ; Set origin address
data:
    .byte 42        ; Data byte
```
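
Operands in this syntax can be decimal numbers (`42`), hexadecimal numbers (`0x10`), or label names (`data`). The following is a minimal sketch of resolving one such operand, assuming a plain dict symbol table. The name `parse_operand_value` is hypothetical. The sketch uses `int(text, 0)`, which infers the base from a `0x`, `0o`, or `0b` prefix, instead of the manual `0x` check in the `Assembler._parse_value` helper later in this notebook.

```python
from typing import Dict

def parse_operand_value(text: str, symbols: Dict[str, int]) -> int:
    """Resolve a label name or a numeric literal (decimal, 0x.., 0o.., 0b..)."""
    text = text.strip()
    if text in symbols:
        return symbols[text]
    try:
        # Base 0: Python picks the base from the prefix (no prefix means decimal).
        return int(text, 0)
    except ValueError:
        raise ValueError(f"undefined label or bad number: {text!r}") from None

symbols = {"data": 0x10}
print(parse_operand_value("0x10", symbols))  # 16
print(parse_operand_value("42", symbols))    # 42
print(parse_operand_value("data", symbols))  # 16
```

Base 0 has one caveat: it rejects a decimal literal with a leading zero, such as `010`, rather than reading it as ten.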

## Exercise 1: Line Parser

In [None]:
def parse_line(line: str) -> Optional[Dict]:
    """
    Parse a single line of assembly.
    
    Returns:
        Dictionary containing only the keys that apply:
        - 'label': Label name (if present)
        - 'opcode': Instruction mnemonic
        - 'operands': List of operand strings
        - 'directive': Assembler directive (.org, .byte)
        - 'value': Directive value (an integer)
        Or None for empty/comment lines
    """
    # Remove comments
    line = line.split(';')[0].strip()
    if not line:
        return None
    
    result = {}
    
    # YOUR CODE HERE
    # 1. Check for label (ends with :)
    # 2. Check for directive (starts with .)
    # 3. Parse instruction and operands
    pass

# Test
test_lines = [
    "ADD R0, R1, R2",
    "loop:  ; a label",
    "    LOAD R1, 0x10",
    ".byte 42",
    ".org 0x20",
    "end: HALT",
]

for line in test_lines:
    result = parse_line(line)
    print(f"{line:25} -> {result}")
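
The setup cell imports `re`, which points to an alternative to chained `split` calls: a single regular expression that captures the optional label, the mnemonic or directive, and the operand text. The sketch below is one possible approach, not the reference solution. The name `parse_line_re` and the pattern itself are assumptions. It also assumes `.byte` and `.org` values are numeric literals, so labels there are not resolved.

```python
import re
from typing import Dict, Optional

# Optional "label:", then an optional mnemonic or .directive, then the operand text.
LINE_RE = re.compile(
    r"^\s*(?:(?P<label>[A-Za-z_]\w*)\s*:)?"  # label followed by a colon
    r"\s*(?P<op>\.?[A-Za-z]\w*)?"            # mnemonic (ADD) or directive (.org)
    r"\s*(?P<rest>.*?)\s*$"                  # operands or directive value
)

def parse_line_re(line: str) -> Optional[Dict]:
    code = line.split(';', 1)[0]             # drop the comment
    if not code.strip():
        return None
    m = LINE_RE.match(code)                  # every group is optional, so this always matches
    op, rest = m.group('op'), m.group('rest')
    if op is None and rest:
        raise ValueError(f"cannot parse: {line!r}")
    result: Dict = {}
    if m.group('label'):
        result['label'] = m.group('label')
    if op and op.startswith('.'):
        result['directive'] = op.lower()
        if rest:
            result['value'] = int(rest, 0)   # numeric literals only
    elif op:
        result['opcode'] = op.upper()
        if rest:
            result['operands'] = [s.strip() for s in rest.split(',')]
    return result

for line in ["loop:  ; a label", "    LOAD R1, 0x10", ".org 0x20", "end: HALT"]:
    print(f"{line:25} -> {parse_line_re(line)}")
```

Unlike the `parse_line` method in the `Assembler` class below, this version does not look up labels in directive values.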

## Exercise 2: Two-Pass Assembler

In [None]:
class Assembler:
    """Two-pass assembler for our 8-bit computer."""
    
    def __init__(self):
        self.symbol_table: Dict[str, int] = {}
        self.errors: List[str] = []
    
    def assemble(self, source: str) -> List[List[int]]:
        """
        Assemble source code into machine code.
        
        Args:
            source: Assembly source code
        
        Returns:
            List of 16-bit instructions, each a list of bits
        """
        self.symbol_table = {}
        self.errors = []
        parsed_lines = self.first_pass(source)
        return self.second_pass(parsed_lines)
    
    def first_pass(self, source: str) -> List[Dict]:
        """
        First pass: Build symbol table.
        
        Returns:
            List of parsed lines with addresses
        """
        # YOUR CODE HERE
        # 1. Parse each line
        # 2. Track current address
        # 3. Record label addresses in symbol_table
        # 4. Instructions are 2 bytes, .byte is 1 byte
        # 5. .org sets the current address to its value
        pass
    
    def second_pass(self, parsed_lines: List[Dict]) -> List[List[int]]:
        """
        Second pass: Generate machine code.
        
        Returns:
            List of 16-bit instructions, each a list of bits
        """
        # YOUR CODE HERE
        # 1. For each instruction, encode it
        # 2. Resolve label references using symbol_table
        pass
    
    def parse_line(self, line: str) -> Optional[Dict]:
        """Parse a single line."""
        line = line.split(';')[0].strip()
        if not line:
            return None
        
        result = {}
        
        # Check for label
        if ':' in line:
            parts = line.split(':', 1)
            result['label'] = parts[0].strip()
            line = parts[1].strip()
            if not line:
                return result
        
        # Check for directive
        if line.startswith('.'):
            parts = line.split(None, 1)
            result['directive'] = parts[0].lower()
            if len(parts) > 1:
                result['value'] = self._parse_value(parts[1])
            return result
        
        # Parse instruction
        parts = line.split(None, 1)
        result['opcode'] = parts[0].upper()
        if len(parts) > 1:
            result['operands'] = [op.strip() for op in parts[1].split(',')]
        
        return result
    
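    def _parse_reg(self, operand: str) -> int:
        """Parse register operand (R0-R7); raise ValueError for anything else."""
        operand = operand.strip().upper()
        if operand.startswith('R') and operand[1:].isdigit() and int(operand[1:]) <= 7:
            return int(operand[1:])
        raise ValueError(f"Invalid register: {operand}")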
    
    def _parse_value(self, operand: str) -> int:
        """Parse numeric or label operand."""
        operand = operand.strip()
        if operand in self.symbol_table:
            return self.symbol_table[operand]
        if operand.startswith('0x') or operand.startswith('0X'):
            return int(operand, 16)
        return int(operand)

# Test
source = """
; Simple program
    LOAD R1, 0x10    ; Load first number
    LOAD R2, 0x11    ; Load second number
    ADD R0, R1, R2   ; Add them
    HALT
"""

asm = Assembler()
code = asm.assemble(source)

print("Assembled code:")
for i, instr in enumerate(code):
    val = bits_to_int(instr)
    print(f"  {i*2:02X}: {val:04X}")
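
For reference, here is one way the address bookkeeping in `first_pass` could look. It is a standalone sketch, not the required solution. It takes lines already parsed into the dictionary shape returned by `parse_line` (`label`, `opcode`, `operands`, `directive`, `value`), assumes the sizes stated in the hints (2 bytes per instruction, 1 byte per `.byte`), and treats `.org` as setting the location counter. The names `first_pass_sketch` and `placed` are hypothetical.

```python
from typing import Dict, List, Tuple

INSTRUCTION_SIZE = 2  # bytes per instruction (from the first_pass hints)
BYTE_SIZE = 1         # bytes emitted by one .byte directive

def first_pass_sketch(parsed: List[Dict]) -> Tuple[Dict[str, int], List[Dict]]:
    """Assign an address to each emitting line and record label addresses."""
    symbols: Dict[str, int] = {}
    placed: List[Dict] = []   # instructions and data, each tagged with 'address'
    address = 0               # location counter
    for entry in parsed:
        label = entry.get('label')
        if label is not None:
            if label in symbols:
                raise ValueError(f"duplicate label: {label}")
            symbols[label] = address          # a label names the current address
        directive = entry.get('directive')
        if directive == '.org':
            address = entry['value']          # move the location counter
        elif directive == '.byte':
            placed.append({**entry, 'address': address})
            address += BYTE_SIZE
        elif 'opcode' in entry:
            placed.append({**entry, 'address': address})
            address += INSTRUCTION_SIZE
    return symbols, placed

# The labeled program from the "Testing with Labels" cell, written out in parsed form.
parsed = [
    {'label': 'start'},
    {'opcode': 'LOAD', 'operands': ['R0', 'counter']},
    {'opcode': 'JZ', 'operands': ['done']},
    {'opcode': 'ADD', 'operands': ['R1', 'R1', 'R0']},
    {'opcode': 'JMP', 'operands': ['start']},
    {'label': 'done'},
    {'opcode': 'HALT'},
    {'label': 'counter'},
    {'directive': '.byte', 'value': 5},
]
symbols, placed = first_pass_sketch(parsed)
print({name: f"0x{addr:02X}" for name, addr in symbols.items()})
# {'start': '0x00', 'done': '0x08', 'counter': '0x0A'}
```

If your `first_pass` uses the same sizes, the symbol table printed for the labeled program below should match this output.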

## Testing with Labels

In [None]:
source_with_labels = """
; Program with labels
start:
    LOAD R0, counter    ; Load counter
    JZ done             ; If zero, done
    ADD R1, R1, R0      ; Add to accumulator
    JMP start           ; Repeat
done:
    HALT

counter:
    .byte 5
"""

asm = Assembler()
code = asm.assemble(source_with_labels)

print("Symbol Table:")
for name, addr in asm.symbol_table.items():
    print(f"  {name}: 0x{addr:02X}")

print("\nAssembled code:")
for i, instr in enumerate(code):
    val = bits_to_int(instr)
    print(f"  {i*2:02X}: {val:04X}")
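
The `Assembler` class keeps an `errors` list that the starter code never fills. One possible use is to check every operand during the second pass and record undefined labels instead of stopping at the first one. The sketch below assumes each line has already been placed by a first pass, so it carries an `address`. The names `check_operands` and `is_number` are hypothetical, and registers are assumed to be `R0` to `R7`, as in the `_parse_reg` docstring.

```python
from typing import Dict, List

REGISTERS = {f"R{i}" for i in range(8)}  # R0-R7

def is_number(text: str) -> bool:
    """True for decimal or prefixed (0x, 0o, 0b) integer literals."""
    try:
        int(text, 0)
        return True
    except ValueError:
        return False

def check_operands(placed: List[Dict], symbols: Dict[str, int]) -> List[str]:
    """Return one message per operand that is not a register, number, or known label."""
    errors: List[str] = []
    for entry in placed:
        for operand in entry.get('operands', []):
            op = operand.strip()
            if op.upper() in REGISTERS or is_number(op) or op in symbols:
                continue
            errors.append(f"0x{entry['address']:02X}: undefined label '{op}'")
    return errors

placed = [
    {'opcode': 'LOAD', 'operands': ['R0', 'counter'], 'address': 0x00},
    {'opcode': 'JMP', 'operands': ['strat'], 'address': 0x06},  # typo for 'start'
]
print(check_operands(placed, {'start': 0x00, 'counter': 0x0A}))
# ["0x06: undefined label 'strat'"]
```

Inside the class, the `Assembler` could append messages like these to `self.errors` so the caller can report every problem at once.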

## Copy Your Implementation

Once your code works, copy the `Assembler` class into `src/computer/assembler.py`.

## Validation

In [None]:
from utils.checker import check
check('assembler')

## Summary

The Assembler:

| Feature | Description |
|---------|-------------|
| Two-pass | First builds symbol table, then generates code |
| Labels | Named addresses used as jump targets and data references |
| Directives | `.org` sets the origin address; `.byte` emits one data byte |
| Comments | Everything after `;` on a line is ignored |

### What's Next?

Time to put it all together! The **Full System** combines the CPU and Assembler to run real programs!