Skip to content

jayrao213/nuPython

Repository files navigation

nuPython Interpreter & Debugger

A complete interpreter and source-level debugger for nuPython, a Python-like programming language. This project implements the full execution pipeline from parsing to runtime, along with an interactive debugging environment similar to GDB.

Overview

This project consists of four major components that work together to parse, execute, and debug nuPython programs:

  1. Program Execution Engine - Interprets and executes nuPython code
  2. Memory Management System - Simulates RAM for variable storage
  3. Interactive Debugger - Provides step-by-step execution and inspection
  4. Program Graph Representation - Internal AST-like structure for code

What It Does

The interpreter can execute nuPython programs that include:

  • Variable assignments (integers, floats, strings, booleans)
  • Arithmetic operations (+, -, *, /, %, **)
  • Comparison operations (==, !=, <, <=, >, >=)
  • String concatenation
  • Control flow (if/elif/else statements, while loops)
  • Built-in functions (print(), input(), int(), float())
  • Pointer operations (address-of &, dereference *)

The debugger provides GDB-like functionality:

  • Step-by-step execution
  • Variable inspection and modification
  • Memory state visualization
  • Execution history tracking
  • Breakpoint-style program navigation

Core Components I Built

1. Execution Engine (execute.c / execute.h)

The heart of the interpreter - executes the parsed program graph statement by statement.

Key Features:

  • Expression Evaluation: Handles unary and binary expressions with proper type coercion

    • Automatic int/float conversion for mixed-type arithmetic
    • String concatenation support
    • Boolean comparison operations for all types
  • Statement Execution:

    • Assignment statements (including pointer dereference assignments)
    • Function calls (print, input, int, float)
    • Control flow (if/elif/else, while loops)
  • Type System: Manages multiple data types with proper error handling

    • Integers, floats, strings, booleans, pointers, None
    • Runtime type checking and conversion
  • Error Handling: Comprehensive semantic error detection

    • Undefined variables
    • Invalid operand types
    • Division/modulus by zero
    • Type conversion failures

Notable Implementation Details:

  • execute_expr(): Unified expression evaluator that handles both simple values and complex binary operations
  • execute_binary_expression(): Performs arithmetic/comparison operations with automatic type promotion
  • Helper functions for type-specific operations (int, real, string)
  • Proper memory management for dynamically allocated strings in concatenation

2. Memory Management System (ram.c / ram.h)

A simulated RAM implementation that stores Python variables and their values.

Architecture:

  • Dual Data Structure Design:

    • cells[]: Dynamic array storing actual variable values
    • map[]: Alphabetically sorted array mapping variable names to cell addresses
  • Dynamic Growth: Automatically doubles capacity when full (starts at 4, grows to 8, 16, 32...)

Core Operations:

  • ram_write_cell_by_name(): Inserts variables in sorted order (O(log n) search + O(n) insertion)
  • ram_read_cell_by_name(): Binary search lookup (O(log n))
  • ram_read_cell_by_addr(): Direct address access (O(1))
  • ram_get_addr(): Retrieves memory address for a variable

Features:

  • Binary search for efficient variable lookup
  • Deep copying for safe value retrieval
  • String duplication for proper memory ownership
  • Comprehensive cleanup in destructor

Design Decisions:

  • Kept map sorted alphabetically for fast binary search
  • Separated name-to-address mapping from value storage
  • Addresses never change once assigned (variables have stable memory locations)

3. Interactive Debugger (debugger.cpp / debugger.h)

A GDB-style debugging interface for nuPython programs.

Debugging Commands:

  • s - Step: Execute current statement and advance
  • w - Where: Show next statement to execute
  • p varname - Print variable value and type
  • c varname newvalue - Change variable value
  • sm - Show memory: Display all variables
  • e - Execution history: List executed line numbers
  • pg - Print program graph: Show entire program structure
  • q - Quit debugger

Implementation Highlights:

Program Graph Manipulation:

  • Maintains "current" and "next" statement pointers
  • Unlinks program graph to execute one statement at a time
  • Properly handles control flow (if statements, while loops with multiple next pointers)
  • Relinks graph after execution to maintain program structure

State Management:

  • Tracks execution history (line numbers executed)
  • Detects execution completion vs. execution failure
  • Prevents stepping beyond program end or after errors

Variable Inspection/Modification:

  • Type-aware variable printing (shows both type and value)
  • Dynamic variable modification during execution
  • Proper type conversion for new values (handles strings, ints, floats, booleans)

Advanced Features:

  • Handles while loops by evaluating conditions and choosing between loop body or next statement
  • Supports if/elif/else by evaluating conditions and following true/false paths
  • Preserves program graph integrity for re-execution or printing

4. Main Program (main.c)

Orchestrates the complete execution pipeline.

Pipeline:

  1. Input handling (file or keyboard)
  2. Parsing (validates syntax using provided parser)
  3. Program graph construction
  4. RAM initialization
  5. Program execution
  6. Memory state display
  7. Cleanup

Project Structure

├── main.c                 # Main execution pipeline
├── execute.c/h            # Program execution engine 
├── ram.c/h                # Memory management system 
├── debugger.cpp/h         # Interactive debugger 
├── debugger_run.cpp       # Debugger main program
├── programgraph.h         # Program representation 
├── parser.h               # Syntax parser 
├── scanner.h              # Lexical scanner 
├── token.h                # Token definitions 
├── tokenqueue.h           # Token queue structure 
├── nupython.*.o           # Pre-compiled parser/scanner for program execution
├── nupython_e.*.o         # Pre-compiled parser/scanner for debugger
└── test-*.py              # Example nuPython programs

Technical Highlights

Expression Evaluation Strategy

The interpreter uses a recursive approach for expression evaluation:

  1. Determine if expression is unary or binary
  2. For binary: evaluate left-hand side, right-hand side, then operator
  3. Perform automatic type coercion (int→float when needed)
  4. Return result as dynamically allocated RAM_VALUE

Memory Organization

Variables are stored with stable addresses:

  • First assigned variable gets address 0
  • Second gets address 1, etc.
  • Addresses never change (important for pointer operations)
  • Map keeps sorted by name for fast lookup, but addresses remain sequential

Control Flow Handling

The debugger implements control flow by:

  1. Unlinking the program graph at the current statement
  2. Evaluating conditions when needed (for if/while)
  3. Choosing appropriate next statement based on condition
  4. Relinking after execution
  5. Advancing to the chosen next statement

Error Handling Philosophy

Runtime errors are detected and reported immediately:

  • Undefined variable access
  • Type mismatches in operations
  • Division/modulus by zero
  • Invalid pointer operations
  • Execution stops at first error with descriptive message

Example Usage

Running a Program (Interpreter Mode)

./a.out test-while.py

Output:

**parsing successful, valid syntax
**building program graph...
**executing...
Enter an integer> 5
project:a
project:aa
project:aaa
project:aaaa
project:aaaaa
**done
**MEMORY PRINT**
Size: 4
Capacity: 4
Contents:
 N: int, 5
 i: int, 6
 result: str, 'project:aaaaa'
 s: str, '5'
**END PRINT**

Interactive Input (Interpreter Mode)

./a.out
nuPython input (enter $ when you're done)>
x = 10
print(x ** 2)
$

Output:

**parsing successful, valid syntax
**building program graph...
**executing...
100
**done
**MEMORY PRINT**
Size: 1
Capacity: 4
Contents:
 x: int, 10
**END PRINT**

Debugging a Program (Debugger Mode)

./debugger test-while.py

Interactive Session:

** Simple gdb for nuPython **

Enter a command, h for help > 
s
Enter an integer> 5

Enter a command, h for help > 
w
1: result = "project:"

Enter a command, h for help > 
p N
N (int): 5

Enter a command, h for help > 
sm
**MEMORY PRINT**
Size: 2
Capacity: 4
Contents:
 N: int, 5
 s: str, '5'
**END PRINT**

Enter a command, h for help > 
s

Enter a command, h for help > 
p result
result (str): project:

Enter a command, h for help > 
q

** Done **

Building the Project

Setup

IMPORTANT: Before building, you must copy the correct pre-compiled object files for your system architecture.

For Intel-based systems (x86/x64/AMD):

cp nupython.x86.o nupython.o
cp nupython_e.x86.o nupython_e.o

For Apple Silicon Macs (M1/M2/M3):

cp nupython.arm.o nupython.o
cp nupython_e.arm.o nupython_e.o

Compilation & Running

Interpreter Mode

The interpreter (main.c) runs nuPython programs directly and shows the final memory state.

# Build the interpreter
make build

# Run with keyboard input (type your program, end with $)
make run

# Run with a file
./a.out test-expr.py

# Check for memory leaks with valgrind
make valgrind file=test-while.py

Debugger Mode

The debugger (debugger_run.cpp) provides step-by-step execution with GDB-like commands.

# Build the debugger
make build-debugger

# Run the debugger with a file
./debugger test-funct.py

# Or use the make command
make run-debugger

# Check for memory leaks with valgrind
make valgrind-debugger file=test-if.py

Key Differences:

  • Interpreter runs the entire program and shows final state
  • Debugger lets you step through line-by-line, inspect/modify variables, and view execution history

Test Programs

The project includes several test programs demonstrating different features:

  • test-expr.py: Expressions with integers, floats, strings, and comparisons
  • test-funct.py: Built-in functions (input, int, float)
  • test-if.py: If/elif/else statements
  • test-while.py: While loops and string concatenation

Learning Outcomes

This project demonstrated:

  • Low-level systems programming in C/C++
  • Memory management with malloc/free and dynamic arrays
  • Data structure design (sorted arrays, binary search)
  • Interpreter architecture (scanning → parsing → execution)
  • Debugger implementation (program graph manipulation, state tracking)
  • Type systems and runtime type checking
  • Error handling and semantic analysis

The most challenging aspects were:

  1. Managing the program graph unlinking/relinking for single-step execution
  2. Implementing proper type coercion for mixed-type expressions
  3. Handling string memory allocation during concatenation
  4. Coordinating control flow between if/while conditions and statement advancement

Key Implementation Insights

Why Binary Search for Variables?

With variables stored alphabetically, binary search provides O(log n) lookup time instead of O(n) linear search. While insertion is still O(n) due to shifting elements, lookups are far more frequent in typical programs.

Why Unlink the Program Graph?

The debugger needs to execute exactly one statement at a time. By unlinking the current statement from the next, we can call the existing execute() function (which normally runs entire programs) to execute just that single statement. After execution, we relink the graph and manually advance to the next statement.

Why Dynamic String Allocation?

When concatenating strings like "hello" + " world", we can't know the final length at compile time. The interpreter allocates exactly enough memory for the result, copies both strings into it, and stores the pointer in RAM. This memory is properly freed when the variable is reassigned or when RAM is destroyed.

Why Separate Execution from Parsing?

The program graph serves as an intermediate representation between raw source code and execution. This separation allows:

  • The debugger to traverse the program structure
  • Error messages with accurate line numbers
  • Potential for optimization passes (not implemented here)
  • Clean separation of concerns (parsing vs. execution)

Credits

Built by Jay Rao for CS 211 at Northwestern University.

  • My Work: execute.c, ram.c, debugger.cpp, main.c
  • Provided: Parser, scanner, token system, program graph structure
  • Course: CS 211 - Fundamentals of Computer Programming II
  • Instructor: Prof. Joe Hummel

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published