# Session 1 - Topic 1: Python's Stack Machine Architecture (Enhanced)

In this enhanced notebook, we explore Python’s stack machine and *instrument* two examples to show not only disassembly but also **stack depth evolution** and **performance timing** for deeper understanding.

## 1. Quick Refresher: Stack Machine Basics
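- A stack machine executes instructions against a LIFO (last in, first out) stack rather than named registers.
- Python (CPython) compiles source code into **bytecode** that operates on a value **stack**.
- Most bytecode instructions push operands onto this stack, pop them off, or both.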
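
To make the push/pop model concrete, here is a minimal sketch of a toy stack machine. The instruction names (`LOAD`, `CONST`, `MUL`) and the `run` helper are hypothetical and invented for illustration; this is not CPython's interpreter loop, only the same idea in miniature.

```python
# Toy LIFO stack machine (hypothetical; not CPython's real instruction set).
def run(program, variables):
    stack = []
    for op, arg in program:
        if op == "LOAD":        # push the value of a named variable
            stack.append(variables[arg])
        elif op == "CONST":     # push a literal constant
            stack.append(arg)
        elif op == "MUL":       # pop two operands, push their product
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown op: {op}")
    return stack.pop()          # the result is whatever is left on top

# Roughly the shape of evaluating `x * 2`.
program = [("LOAD", "x"), ("CONST", 2), ("MUL", None)]
result = run(program, {"x": 10})
print(result)  # 20
```

The stack grows to depth 2 while both operands are waiting and shrinks back to 1 after `MUL`, which is the kind of depth change the analysis in the next section tracks.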

## 2. Code Example 1: Local vs Global Variable Access

In [None]:
import dis
import time

GLOBAL_VAR = 10

def use_local():
    local_var = GLOBAL_VAR
    for _ in range(1000000):
        result = local_var * 2
    return result

def use_global():
    for _ in range(1000000):
        result = GLOBAL_VAR * 2
    return result

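# Timing: perf_counter() is a monotonic, high-resolution clock meant for
# measuring short intervals; time.time() can jump if the system clock changes.
start = time.perf_counter()
use_local()
local_time = time.perf_counter() - start

start = time.perf_counter()
use_global()
global_time = time.perf_counter() - start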

print(f"use_local: {local_time:.4f} sec")
print(f"use_global: {global_time:.4f} sec")

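# Disassembly and Stack Depth Analysis
def analyze_function(func):
    print(f"\nDisassembly and Stack Depth for {func.__name__}:")
    depth = 0
    max_depth = 0
    # Walk the instructions in listing order. This ignores jumps, so the
    # running depth is an estimate. Without a `jump` argument,
    # dis.stack_effect() returns the maximal effect of both branch outcomes.
    for instr in dis.Bytecode(func):
        effect = dis.stack_effect(instr.opcode, instr.arg)
        depth += effect
        max_depth = max(max_depth, depth)
        print(f"{instr.opname:<20} stack_effect={effect:+2} depth_now={depth:2}")
    print(f"Estimated maximum stack depth during {func.__name__}: {max_depth}")
    # The compiler's own figure, for comparison with the estimate above.
    print(f"co_stacksize reported by the compiler: {func.__code__.co_stacksize}")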

analyze_function(use_local)
analyze_function(use_global)

### Observations:
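- **Local variables** (`LOAD_FAST`) are read by index from the frame's array of local slots, so no name lookup happens at runtime.
- **Global variables** (`LOAD_GLOBAL`) require a **dictionary lookup** in the module globals, falling back to builtins, which is typically slower. Recent CPython versions cache these lookups, so the measured gap may be small.
- Both instructions push one value onto the stack here, so stack depth is essentially the same in the two functions. The difference lies in the cost of the lookup, not in stack usage.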
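
The observations above can be checked directly. The sketch below is only an illustration: `THRESHOLD`, `read_local`, `read_global`, and `load_ops` are hypothetical names that are not part of the notebook, and exact opcode names vary between CPython versions (newer releases use variants whose names begin with `LOAD_FAST`).

```python
import dis

THRESHOLD = 10  # hypothetical module-level name, used only in this sketch

def read_local():
    value = 10
    return value * 2

def read_global():
    return THRESHOLD * 2

def load_ops(func):
    """Return the names of all LOAD_* instructions in func, in order."""
    return [i.opname for i in dis.get_instructions(func)
            if i.opname.startswith("LOAD_")]

print("read_local :", load_ops(read_local))
print("read_global:", load_ops(read_global))

# co_stacksize is the compiler's own upper bound on value-stack depth.
print("stack sizes:", read_local.__code__.co_stacksize,
      read_global.__code__.co_stacksize)
```

If the two stack sizes match on your interpreter, that supports reading the timing difference as a lookup cost rather than a stack-depth effect.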

## 3. Code Example 2: List Comprehension vs For Loop

In [None]:
def list_comp():
    return [x * 2 for x in range(10)]

def for_loop():
    result = []
    for x in range(10):
        result.append(x * 2)
    return result

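# Timing: a single call over 10 elements is too short to measure reliably,
# so run each function many times and report the total.
import timeit

list_time = timeit.timeit(list_comp, number=100_000)
for_time = timeit.timeit(for_loop, number=100_000)

print(f"list_comp: {list_time:.6f} sec per 100,000 calls")
print(f"for_loop: {for_time:.6f} sec per 100,000 calls")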

# Disassembly and Stack Depth Analysis
analyze_function(list_comp)
analyze_function(for_loop)

### Observations:
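- **List comprehensions** append with the dedicated `LIST_APPEND` instruction, avoiding the attribute lookup and method call that `result.append(...)` needs on every iteration.
- **For loops** therefore execute more bytecode per iteration. Note that before Python 3.12 the comprehension body is compiled into a nested code object, so `analyze_function(list_comp)` shows only the outer call, not the loop itself.
- Hence, list comprehensions are generally faster for simple transformations.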
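
Because the comprehension body may sit in a nested code object depending on the Python version, a version-tolerant check has to look inside nested code objects as well. The following is a sketch under that assumption; `all_opnames`, `doubled_comp`, and `doubled_loop` are hypothetical helpers introduced here, not part of the notebook.

```python
import dis
import types

def all_opnames(code):
    """Collect opcode names from a code object and any code nested in it."""
    names = [instr.opname for instr in dis.get_instructions(code)]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):  # e.g. a pre-3.12 <listcomp>
            names.extend(all_opnames(const))
    return names

def doubled_comp():
    return [x * 2 for x in range(10)]

def doubled_loop():
    result = []
    for x in range(10):
        result.append(x * 2)
    return result

comp_ops = all_opnames(doubled_comp.__code__)
loop_ops = all_opnames(doubled_loop.__code__)
print("LIST_APPEND in comprehension:", "LIST_APPEND" in comp_ops)
print("LIST_APPEND in for loop:     ", "LIST_APPEND" in loop_ops)
```

The explicit loop goes through an attribute load and a call for `append`, while the comprehension uses `LIST_APPEND` directly, which is one plausible source of the speed difference.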

## 4. Conclusion
- Reading **stack depth** alongside **disassembly** helps explain why some constructs are faster than others.
- Always balance **clarity** against **performance**.
- A deep dive like this is most useful for tight loops and other performance-critical paths.
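
When a path really is performance-critical, a repeatable measurement matters more than a single stopwatch reading. One possible approach, sketched here with the standard `timeit` module, is to repeat the measurement and keep the fastest run; `best_time` and `sample` are hypothetical names, and the `number` and `repeat` defaults are arbitrary choices.

```python
import timeit

def best_time(func, number=10_000, repeat=5):
    """Return the fastest observed per-call time in seconds."""
    runs = timeit.repeat(func, number=number, repeat=repeat)
    # The minimum is the run least disturbed by other system activity.
    return min(runs) / number

def sample():  # hypothetical workload, stands in for the code under test
    return [x * 2 for x in range(10)]

per_call = best_time(sample)
print(f"sample: {per_call * 1e9:.0f} ns per call")
```

Taking the minimum rather than the mean follows the guidance in the `timeit` documentation, since slower runs mostly reflect interference from other processes.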