A complete interpreter and source-level debugger for nuPython, a Python-like programming language. This project implements the full execution pipeline from parsing to runtime, along with an interactive debugging environment similar to GDB.
This project consists of four major components that work together to parse, execute, and debug nuPython programs:
- Program Execution Engine - Interprets and executes nuPython code
- Memory Management System - Simulates RAM for variable storage
- Interactive Debugger - Provides step-by-step execution and inspection
- Program Graph Representation - Internal AST-like structure for code
The interpreter can execute nuPython programs that include:
- Variable assignments (integers, floats, strings, booleans)
- Arithmetic operations (
+,-,*,/,%,**) - Comparison operations (
==,!=,<,<=,>,>=) - String concatenation
- Control flow (if/elif/else statements, while loops)
- Built-in functions (
print(),input(),int(),float()) - Pointer operations (address-of
&, dereference*)
The debugger provides GDB-like functionality:
- Step-by-step execution
- Variable inspection and modification
- Memory state visualization
- Execution history tracking
- Breakpoint-style program navigation
The heart of the interpreter - executes the parsed program graph statement by statement.
Key Features:
-
Expression Evaluation: Handles unary and binary expressions with proper type coercion
- Automatic int/float conversion for mixed-type arithmetic
- String concatenation support
- Boolean comparison operations for all types
-
Statement Execution:
- Assignment statements (including pointer dereference assignments)
- Function calls (print, input, int, float)
- Control flow (if/elif/else, while loops)
-
Type System: Manages multiple data types with proper error handling
- Integers, floats, strings, booleans, pointers, None
- Runtime type checking and conversion
-
Error Handling: Comprehensive semantic error detection
- Undefined variables
- Invalid operand types
- Division/modulus by zero
- Type conversion failures
Notable Implementation Details:
execute_expr(): Unified expression evaluator that handles both simple values and complex binary operationsexecute_binary_expression(): Performs arithmetic/comparison operations with automatic type promotion- Helper functions for type-specific operations (int, real, string)
- Proper memory management for dynamically allocated strings in concatenation
A simulated RAM implementation that stores Python variables and their values.
Architecture:
-
Dual Data Structure Design:
cells[]: Dynamic array storing actual variable valuesmap[]: Alphabetically sorted array mapping variable names to cell addresses
-
Dynamic Growth: Automatically doubles capacity when full (starts at 4, grows to 8, 16, 32...)
Core Operations:
ram_write_cell_by_name(): Inserts variables in sorted order (O(log n) search + O(n) insertion)ram_read_cell_by_name(): Binary search lookup (O(log n))ram_read_cell_by_addr(): Direct address access (O(1))ram_get_addr(): Retrieves memory address for a variable
Features:
- Binary search for efficient variable lookup
- Deep copying for safe value retrieval
- String duplication for proper memory ownership
- Comprehensive cleanup in destructor
Design Decisions:
- Kept map sorted alphabetically for fast binary search
- Separated name-to-address mapping from value storage
- Addresses never change once assigned (variables have stable memory locations)
A GDB-style debugging interface for nuPython programs.
Debugging Commands:
s- Step: Execute current statement and advancew- Where: Show next statement to executep varname- Print variable value and typec varname newvalue- Change variable valuesm- Show memory: Display all variablese- Execution history: List executed line numberspg- Print program graph: Show entire program structureq- Quit debugger
Implementation Highlights:
Program Graph Manipulation:
- Maintains "current" and "next" statement pointers
- Unlinks program graph to execute one statement at a time
- Properly handles control flow (if statements, while loops with multiple next pointers)
- Relinks graph after execution to maintain program structure
State Management:
- Tracks execution history (line numbers executed)
- Detects execution completion vs. execution failure
- Prevents stepping beyond program end or after errors
Variable Inspection/Modification:
- Type-aware variable printing (shows both type and value)
- Dynamic variable modification during execution
- Proper type conversion for new values (handles strings, ints, floats, booleans)
Advanced Features:
- Handles while loops by evaluating conditions and choosing between loop body or next statement
- Supports if/elif/else by evaluating conditions and following true/false paths
- Preserves program graph integrity for re-execution or printing
Orchestrates the complete execution pipeline.
Pipeline:
- Input handling (file or keyboard)
- Parsing (validates syntax using provided parser)
- Program graph construction
- RAM initialization
- Program execution
- Memory state display
- Cleanup
├── main.c # Main execution pipeline
├── execute.c/h # Program execution engine
├── ram.c/h # Memory management system
├── debugger.cpp/h # Interactive debugger
├── debugger_run.cpp # Debugger main program
├── programgraph.h # Program representation
├── parser.h # Syntax parser
├── scanner.h # Lexical scanner
├── token.h # Token definitions
├── tokenqueue.h # Token queue structure
├── nupython.*.o # Pre-compiled parser/scanner for program execution
├── nupython_e.*.o # Pre-compiled parser/scanner for debugger
└── test-*.py # Example nuPython programs
The interpreter uses a recursive approach for expression evaluation:
- Determine if expression is unary or binary
- For binary: evaluate left-hand side, right-hand side, then operator
- Perform automatic type coercion (int→float when needed)
- Return result as dynamically allocated
RAM_VALUE
Variables are stored with stable addresses:
- First assigned variable gets address 0
- Second gets address 1, etc.
- Addresses never change (important for pointer operations)
- Map keeps sorted by name for fast lookup, but addresses remain sequential
The debugger implements control flow by:
- Unlinking the program graph at the current statement
- Evaluating conditions when needed (for if/while)
- Choosing appropriate next statement based on condition
- Relinking after execution
- Advancing to the chosen next statement
Runtime errors are detected and reported immediately:
- Undefined variable access
- Type mismatches in operations
- Division/modulus by zero
- Invalid pointer operations
- Execution stops at first error with descriptive message
./a.out test-while.pyOutput:
**parsing successful, valid syntax
**building program graph...
**executing...
Enter an integer> 5
project:a
project:aa
project:aaa
project:aaaa
project:aaaaa
**done
**MEMORY PRINT**
Size: 4
Capacity: 4
Contents:
N: int, 5
i: int, 6
result: str, 'project:aaaaa'
s: str, '5'
**END PRINT**
./a.out
nuPython input (enter $ when you're done)>
x = 10
print(x ** 2)
$Output:
**parsing successful, valid syntax
**building program graph...
**executing...
100
**done
**MEMORY PRINT**
Size: 1
Capacity: 4
Contents:
x: int, 10
**END PRINT**
./debugger test-while.pyInteractive Session:
** Simple gdb for nuPython **
Enter a command, h for help >
s
Enter an integer> 5
Enter a command, h for help >
w
1: result = "project:"
Enter a command, h for help >
p N
N (int): 5
Enter a command, h for help >
sm
**MEMORY PRINT**
Size: 2
Capacity: 4
Contents:
N: int, 5
s: str, '5'
**END PRINT**
Enter a command, h for help >
s
Enter a command, h for help >
p result
result (str): project:
Enter a command, h for help >
q
** Done **
IMPORTANT: Before building, you must copy the correct pre-compiled object files for your system architecture.
For Intel-based systems (x86/x64/AMD):
cp nupython.x86.o nupython.o
cp nupython_e.x86.o nupython_e.oFor Apple Silicon Macs (M1/M2/M3):
cp nupython.arm.o nupython.o
cp nupython_e.arm.o nupython_e.oThe interpreter (main.c) runs nuPython programs directly and shows the final memory state.
# Build the interpreter
make build
# Run with keyboard input (type your program, end with $)
make run
# Run with a file
./a.out test-expr.py
# Check for memory leaks with valgrind
make valgrind file=test-while.pyThe debugger (debugger_run.cpp) provides step-by-step execution with GDB-like commands.
# Build the debugger
make build-debugger
# Run the debugger with a file
./debugger test-funct.py
# Or use the make command
make run-debugger
# Check for memory leaks with valgrind
make valgrind-debugger file=test-if.pyKey Differences:
- Interpreter runs the entire program and shows final state
- Debugger lets you step through line-by-line, inspect/modify variables, and view execution history
The project includes several test programs demonstrating different features:
- test-expr.py: Expressions with integers, floats, strings, and comparisons
- test-funct.py: Built-in functions (input, int, float)
- test-if.py: If/elif/else statements
- test-while.py: While loops and string concatenation
This project demonstrated:
- Low-level systems programming in C/C++
- Memory management with malloc/free and dynamic arrays
- Data structure design (sorted arrays, binary search)
- Interpreter architecture (scanning → parsing → execution)
- Debugger implementation (program graph manipulation, state tracking)
- Type systems and runtime type checking
- Error handling and semantic analysis
The most challenging aspects were:
- Managing the program graph unlinking/relinking for single-step execution
- Implementing proper type coercion for mixed-type expressions
- Handling string memory allocation during concatenation
- Coordinating control flow between if/while conditions and statement advancement
With variables stored alphabetically, binary search provides O(log n) lookup time instead of O(n) linear search. While insertion is still O(n) due to shifting elements, lookups are far more frequent in typical programs.
The debugger needs to execute exactly one statement at a time. By unlinking the current statement from the next, we can call the existing execute() function (which normally runs entire programs) to execute just that single statement. After execution, we relink the graph and manually advance to the next statement.
When concatenating strings like "hello" + " world", we can't know the final length at compile time. The interpreter allocates exactly enough memory for the result, copies both strings into it, and stores the pointer in RAM. This memory is properly freed when the variable is reassigned or when RAM is destroyed.
The program graph serves as an intermediate representation between raw source code and execution. This separation allows:
- The debugger to traverse the program structure
- Error messages with accurate line numbers
- Potential for optimization passes (not implemented here)
- Clean separation of concerns (parsing vs. execution)
Built by Jay Rao for CS 211 at Northwestern University.
- My Work:
execute.c,ram.c,debugger.cpp,main.c - Provided: Parser, scanner, token system, program graph structure
- Course: CS 211 - Fundamentals of Computer Programming II
- Instructor: Prof. Joe Hummel