# Python AST Fundamentals - Part 1 of 6

**Note**: This is part 1 of 6 in the AST guide series for Python static analysis. This part covers fundamental AST concepts and function-related nodes.

## Overview
This guide covers the essential AST concepts needed for building static analysis tools, particularly for implementing scope-aware variable tracking and method call attribution. The AST module allows us to analyze Python code structure without executing it, which is perfect for static analysis tools that need to understand code relationships and patterns.

### 1. What is an AST?
An Abstract Syntax Tree is a tree representation of Python source code structure. Unlike the raw source text, an AST represents the logical structure of the code, with each node representing a syntactic construct (function, class, expression, statement, etc.). The "abstract" part means it omits syntactic details like parentheses, commas, and whitespace that don't affect the code's meaning.

The AST is crucial for static analysis tools because it lets us traverse and analyze Python code systematically. When we parse a Python file, we get a tree where we can visit each function definition, track every function call, and understand the relationships between classes and methods.

Here's what the AST structure looks like for simple code:

```python
# Source code:
x = 5 + 3
print(x)

# Becomes this tree:
Module(
  body=[
    Assign(
      targets=[Name(id='x', ctx=Store())],
      value=BinOp(
        left=Constant(value=5),
        op=Add(),
        right=Constant(value=3)
      )
    ),
    Expr(
      value=Call(
        func=Name(id='print', ctx=Load()),
        args=[Name(id='x', ctx=Load())],
        keywords=[]
      )
    )
  ]
)
```

In [1]:
# To parse and inspect:
import ast

code = "x = 5 + 3"
tree = ast.parse(code)
print(ast.dump(tree, indent=2))  # Pretty-prints the tree structure

Module(
  body=[
    Assign(
      targets=[
        Name(id='x', ctx=Store())],
      value=BinOp(
        left=Constant(value=5),
        op=Add(),
        right=Constant(value=3)))])


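Every node in the tree has a type (like Module, Assign, BinOp) and attributes specific to that type. Statement and expression nodes also carry location information (`lineno`, `col_offset`, `end_lineno`, `end_col_offset`) that we use to report where functions are defined in the original source. Structural nodes such as `Module`, operators like `Add()`, and contexts like `Load()` have no position of their own.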
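
As a rough illustration of the location attributes, the sketch below walks a small tree and collects the position of every node that has one. The variable names (`source`, `locations`) are only illustrative, and `ast.get_source_segment` assumes Python 3.8 or newer.

```python
import ast

source = "x = 5 + 3\nprint(x)\n"
tree = ast.parse(source)

locations = []
for node in ast.walk(tree):
    # Only statement and expression nodes carry positions;
    # Module, operator, and context nodes do not.
    if hasattr(node, "lineno"):
        segment = ast.get_source_segment(source, node)
        locations.append((type(node).__name__, node.lineno, node.col_offset, segment))

for name, line, col, segment in locations:
    print(f"{name:<9} line {line}, col {col}: {segment!r}")
```

Note that `lineno` is 1-based while `col_offset` is 0-based, which matters when turning these values into editor positions.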

### 2. The NodeVisitor Pattern
The NodeVisitor pattern is the heart of AST traversal in Python. It's a design pattern that lets you define what happens when you encounter each type of node without writing complex traversal logic. The `ast.NodeVisitor` base class handles the tree walking for you - you just specify what to do at each node type you care about.

The visitor works through a dispatch mechanism: when it encounters a node of type `FunctionDef`, it looks for a method called `visit_FunctionDef`. If that method exists, it calls it; otherwise, it falls back to `generic_visit`, which simply visits all child nodes. This pattern is perfect for static analysis applications where we want to track specific constructs (functions, classes, calls) while ignoring others (imports, decorators).

In [2]:
import ast


class MyVisitor(ast.NodeVisitor):
    def __init__(self):
        self.functions_found = []
        self.calls_found = []

    def visit_FunctionDef(self, node):
        # Called for each function definition
        print(f"Found function: {node.name} at line {node.lineno}")
        self.functions_found.append(node.name)

        # CRITICAL: Must call this to visit the function's body!
        self.generic_visit(node)  # Visit child nodes

    def visit_Call(self, node):
        # Called for each function/method call
        if isinstance(node.func, ast.Name):
            print(f"Found call to: {node.func.id}")
            self.calls_found.append(node.func.id)
        self.generic_visit(node)

In [3]:
# Usage:
code = """
def greet(name):
    print(f"Hello, {name}")

def main():
    greet("World")
    print("Done")
"""

tree = ast.parse(code)
print("AST structure:")
print(ast.dump(tree, indent=2))

AST structure:
Module(
  body=[
    FunctionDef(
      name='greet',
      args=arguments(
        args=[
          arg(arg='name')]),
      body=[
        Expr(
          value=Call(
            func=Name(id='print', ctx=Load()),
            args=[
              JoinedStr(
                values=[
                  Constant(value='Hello, '),
                  FormattedValue(
                    value=Name(id='name', ctx=Load()),
                    conversion=-1)])]))]),
    FunctionDef(
      name='main',
      args=arguments(),
      body=[
        Expr(
          value=Call(
            func=Name(id='greet', ctx=Load()),
            args=[
              Constant(value='World')])),
        Expr(
          value=Call(
            func=Name(id='print', ctx=Load()),
            args=[
              Constant(value='Done')]))])])


In [4]:
visitor = MyVisitor()
visitor.visit(tree)
print(f"Functions: {visitor.functions_found}")  # ['greet', 'main']
print(f"Calls: {visitor.calls_found}")  # ['print', 'greet', 'print']

Found function: greet at line 2
Found call to: print
Found function: main at line 5
Found call to: greet
Found call to: print
Functions: ['greet', 'main']
Calls: ['print', 'greet', 'print']


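**Key insight**: The visitor automatically dispatches to `visit_ClassName` methods based on node type. If no specific method exists, it calls `generic_visit`. The traversal is depth-first: `generic_visit` fully descends into each child before moving on to the next sibling. Whether a node is processed before or after its children depends on where your `visit_*` method calls `generic_visit`; in the example above the work happens first, so parents are reported before their children.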
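
To make the role of `generic_visit` concrete, here is a small sketch with two hypothetical visitors (`OrderDemo` and `ShallowDemo` are names invented for this example). The first records each function before and after descending into it; the second never calls `generic_visit`, so it stops at the outermost function.

```python
import ast


class OrderDemo(ast.NodeVisitor):
    """Records function names before and after descending into their bodies."""

    def __init__(self):
        self.pre_order = []
        self.post_order = []

    def visit_FunctionDef(self, node):
        self.pre_order.append(node.name)   # work done before the children
        self.generic_visit(node)           # descend into nested definitions
        self.post_order.append(node.name)  # work done after the children


class ShallowDemo(ast.NodeVisitor):
    """Omits generic_visit, so nested functions are never reached."""

    def __init__(self):
        self.seen = []

    def visit_FunctionDef(self, node):
        self.seen.append(node.name)  # no generic_visit: traversal stops here


nested_source = """
def outer():
    def inner():
        pass
"""
nested_tree = ast.parse(nested_source)

order_demo = OrderDemo()
order_demo.visit(nested_tree)
shallow_demo = ShallowDemo()
shallow_demo.visit(nested_tree)

print(order_demo.pre_order)   # ['outer', 'inner']
print(order_demo.post_order)  # ['inner', 'outer']
print(shallow_demo.seen)      # ['outer']
```

Skipping `generic_visit` is sometimes intentional, for example when nested definitions should be handled by a separate pass, but it is an easy way to lose nodes by accident.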

### 3. Context (ctx) Attribute
`Name` and `Attribute` nodes, along with the other nodes that can appear as assignment targets (`Subscript`, `Starred`, `List`, `Tuple`), have a context that tells you how they are being used in the code. The context is crucial for understanding whether we're reading from a variable, writing to it, or deleting it. This distinction is essential for variable tracking - we need to know when a variable is being assigned a value (so we can track its type) versus when it's being used (so we can resolve its type).

The context appears as a `ctx` attribute on the node, and it's an instance of one of three classes:
- `Load()`: Reading/using a value (the variable appears in an expression)
- `Store()`: Writing/assigning a value (the variable appears on the left side of an assignment, or as a `for` loop or `with ... as` target)
- `Del()`: Deleting the variable (appears in a del statement)

In [5]:
import ast

# Example showing different contexts:
code = """
x = 5           # x has Store context
y = x + 10      # y has Store context, x has Load context
print(x)        # x has Load context
del y           # y has Del context
obj.attr = 20   # obj has Load context, attr is being stored to
z = obj.attr    # z has Store context, obj has Load, attr is being loaded
"""

tree = ast.parse(code)
print(ast.dump(tree, indent=2))

Module(
  body=[
    Assign(
      targets=[
        Name(id='x', ctx=Store())],
      value=Constant(value=5)),
    Assign(
      targets=[
        Name(id='y', ctx=Store())],
      value=BinOp(
        left=Name(id='x', ctx=Load()),
        op=Add(),
        right=Constant(value=10))),
    Expr(
      value=Call(
        func=Name(id='print', ctx=Load()),
        args=[
          Name(id='x', ctx=Load())])),
    Delete(
      targets=[
        Name(id='y', ctx=Del())]),
    Assign(
      targets=[
        Attribute(
          value=Name(id='obj', ctx=Load()),
          attr='attr',
          ctx=Store())],
      value=Constant(value=20)),
    Assign(
      targets=[
        Name(id='z', ctx=Store())],
      value=Attribute(
        value=Name(id='obj', ctx=Load()),
        attr='attr',
        ctx=Load()))])


In [6]:
class ContextInspector(ast.NodeVisitor):
    def visit_Name(self, node):
        context_type = type(node.ctx).__name__
        print(f"Variable '{node.id}' has context: {context_type}")
        self.generic_visit(node)

    def visit_Attribute(self, node):
        context_type = type(node.ctx).__name__
        print(f"Attribute '.{node.attr}' has context: {context_type}")
        self.generic_visit(node)


inspector = ContextInspector()
inspector.visit(tree)

Variable 'x' has context: Store
Variable 'y' has context: Store
Variable 'x' has context: Load
Variable 'print' has context: Load
Variable 'x' has context: Load
Variable 'y' has context: Del
Attribute '.attr' has context: Store
Variable 'obj' has context: Load
Variable 'z' has context: Store
Attribute '.attr' has context: Load
Variable 'obj' has context: Load


This is crucial for distinguishing between variable usage and assignment. In scope-aware variable tracking, we only record type information when we see Store context (assignments), and we look up type information when we see Load context (usage).
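
The record-on-Store, look-up-on-Load idea can be sketched as follows. This is a deliberately minimal, single-scope illustration rather than a real tracker: `SimpleTypeTracker` and its attributes are hypothetical names, it only infers types from literals and plain names, and it ignores attribute, tuple, and subscript targets.

```python
import ast


class SimpleTypeTracker(ast.NodeVisitor):
    """Hypothetical single-scope tracker: Store records a type, Load looks it up."""

    def __init__(self):
        self.known_types = {}  # variable name -> inferred type name
        self.lookups = []      # (variable name, type name or None) per Load

    def visit_Assign(self, node):
        # Visit the right-hand side first so its Load lookups see the old state.
        self.visit(node.value)
        inferred = self._infer(node.value)
        for target in node.targets:
            if isinstance(target, ast.Name) and isinstance(target.ctx, ast.Store):
                self.known_types[target.id] = inferred

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load):
            self.lookups.append((node.id, self.known_types.get(node.id)))

    def _infer(self, value):
        # Only literals and plain names are handled in this sketch.
        if isinstance(value, ast.Constant):
            return type(value.value).__name__
        if isinstance(value, ast.Name):
            return self.known_types.get(value.id)
        return None


tracker_source = """
count = 5
label = "items"
alias = count
print(alias, label, missing)
"""
tracker = SimpleTypeTracker()
tracker.visit(ast.parse(tracker_source))

print(tracker.known_types)  # {'count': 'int', 'label': 'str', 'alias': 'int'}
print(tracker.lookups)
```

Names that were never assigned in the tracked scope, such as `print` and `missing` here, resolve to `None`; a fuller implementation would consult enclosing scopes and builtins at that point.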

### 4. Function-Related Nodes

Function definitions are central to many static analysis tools. We need to extract function signatures, identify which parameters have type annotations, and track whether there's a return type annotation. Python has two function definition node types: `FunctionDef` for regular functions and `AsyncFunctionDef` for async functions. They have identical structure, which is why in static analysis code we often handle them with the same logic.

**ast.FunctionDef / ast.AsyncFunctionDef**

The function definition nodes contain everything about a function's signature and body. The `args` attribute is particularly important as it contains an `ast.arguments` object with all parameter information. The `returns` attribute holds the return type annotation if present. The `body` is a list of statement nodes representing the function's implementation.

In [7]:
import ast

code = """
def regular_function(a: int, b=5, *args, **kwargs) -> str:
    '''A docstring'''
    return str(a + b)

async def async_function(x: float) -> None:
    await some_operation(x)
"""

tree = ast.parse(code)
print(ast.dump(tree, indent=2))

Module(
  body=[
    FunctionDef(
      name='regular_function',
      args=arguments(
        args=[
          arg(
            arg='a',
            annotation=Name(id='int', ctx=Load())),
          arg(arg='b')],
        vararg=arg(arg='args'),
        kwarg=arg(arg='kwargs'),
        defaults=[
          Constant(value=5)]),
      body=[
        Expr(
          value=Constant(value='A docstring')),
        Return(
          value=Call(
            func=Name(id='str', ctx=Load()),
            args=[
              BinOp(
                left=Name(id='a', ctx=Load()),
                op=Add(),
                right=Name(id='b', ctx=Load()))]))],
      returns=Name(id='str', ctx=Load())),
    AsyncFunctionDef(
      name='async_function',
      args=arguments(
        args=[
          arg(
            arg='x',
            annotation=Name(id='float', ctx=Load()))]),
      body=[
        Expr(
          value=Await(
            value=Call(
              func=Name(id='some_operation', ctx=Load()),
              args=[
                Name(id='x', ctx=Load())])))],
      returns=Constant(value=None))])

In [8]:
class FunctionAnalyzer(ast.NodeVisitor):
    def visit_FunctionDef(self, node):
        self._analyze_function(node, is_async=False)

    def visit_AsyncFunctionDef(self, node):
        self._analyze_function(node, is_async=True)

    def _analyze_function(self, node, is_async):
        print(f"{'Async ' if is_async else ''}Function: {node.name}")
        print(f"  Line: {node.lineno}")
        print(f"  Has return annotation: {node.returns is not None}")
        if node.returns and isinstance(node.returns, ast.Name):
            print(f"  Return type: {node.returns.id}")
        print(f"  Number of decorators: {len(node.decorator_list)}")
        print(f"  Body has {len(node.body)} statements")

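        # The first statement might be a docstring
        # (ast.get_docstring(node) performs the same check)
        if node.body and isinstance(node.body[0], ast.Expr):
            if isinstance(node.body[0].value, ast.Constant):
                if isinstance(node.body[0].value.value, str):
                    print("  Has docstring: Yes")

        # Keep walking so that nested functions are analyzed too
        self.generic_visit(node)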


analyzer = FunctionAnalyzer()
analyzer.visit(tree)

Function: regular_function
  Line: 2
  Has return annotation: True
  Return type: str
  Number of decorators: 0
  Body has 2 statements
  Has docstring: Yes
Async Function: async_function
  Line: 6
  Has return annotation: True
  Number of decorators: 0
  Body has 1 statements
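
The analyzer above prints a return type only when the annotation is a bare `Name`, which is why `-> None` (a `Constant` node) shows no type. One possible way to handle arbitrary annotations is to render them back to source text with `ast.unparse`, available in Python 3.9 and later. The helper name `describe_return` and the sample functions below are made up for illustration.

```python
import ast


def describe_return(func_node):
    """Return the return annotation as source text, or None if absent."""
    if func_node.returns is None:
        return None
    return ast.unparse(func_node.returns)  # available in Python 3.9+


annotated_source = """
def names() -> list[str]: ...
async def fetch(url: str) -> None: ...
def untyped(): ...
"""

summary = {}
for node in ast.parse(annotated_source).body:
    # One isinstance check covers both definition node types.
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        summary[node.name] = describe_return(node)

print(summary)  # {'names': 'list[str]', 'fetch': 'None', 'untyped': None}
```

Note the difference between the string `'None'` (an explicit `-> None` annotation) and the value `None` (no annotation at all).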


**ast.arguments** (function parameters)

The `arguments` object is complex because Python supports many parameter types. Each parameter is represented as an `ast.arg` object with `arg` (the name) and `annotation` (the type hint) attributes. Parameters are grouped by kind (`posonlyargs`, `args`, `vararg`, `kwonlyargs`, `kwarg`), so you need to check all of these fields to collect every parameter.

In [9]:
def analyze_parameters(func_node):
    """Detailed parameter analysis for a function node."""
    args = func_node.args

    print(f"Analyzing parameters for: {func_node.name}")

    # Regular positional arguments (most common)
    for arg in args.args:
        annotation = "annotated" if arg.annotation else "not annotated"
        print(f"  Regular arg: {arg.arg} ({annotation})")

    # Positional-only arguments (before / in signature) - Python 3.8+
    for arg in args.posonlyargs:
        annotation = "annotated" if arg.annotation else "not annotated"
        print(f"  Positional-only: {arg.arg} ({annotation})")

    # Keyword-only arguments (after * in signature)
    for arg in args.kwonlyargs:
        annotation = "annotated" if arg.annotation else "not annotated"
        print(f"  Keyword-only: {arg.arg} ({annotation})")

    # *args parameter (if present)
    if args.vararg:
        annotation = "annotated" if args.vararg.annotation else "not annotated"
        print(f"  Varargs: *{args.vararg.arg} ({annotation})")

    # **kwargs parameter (if present)
    if args.kwarg:
        annotation = "annotated" if args.kwarg.annotation else "not annotated"
        print(f"  Kwargs: **{args.kwarg.arg} ({annotation})")

    # Default values for positional-only and regular parameters
    if args.defaults:
        # args.defaults contains defaults for BOTH posonlyargs and args.args
        # They are right-aligned across the combined list of parameters
        all_positional_params = args.posonlyargs + args.args
        num_params = len(all_positional_params)
        num_defaults = len(args.defaults)

        # Map defaults to the correct parameters
        for i, default in enumerate(args.defaults):
            param_index = num_params - num_defaults + i
            param = all_positional_params[param_index]
            print(f"  Default for {param.arg}: {ast.dump(default)}")

    # Default values for keyword-only parameters
    if args.kw_defaults:
        for arg, default in zip(args.kwonlyargs, args.kw_defaults, strict=False):
            if default is not None:  # kw_defaults uses None for parameters without defaults
                print(f"  Default for {arg.arg}: {ast.dump(default)}")

In [10]:
# Example with complex signature:
complex_func = """
def complex(a, b=10, /, c=20, *args, d, e=30, **kwargs) -> int:
    pass
"""
tree = ast.parse(complex_func)
print(ast.dump(tree, indent=2))

Module(
  body=[
    FunctionDef(
      name='complex',
      args=arguments(
        posonlyargs=[
          arg(arg='a'),
          arg(arg='b')],
        args=[
          arg(arg='c')],
        vararg=arg(arg='args'),
        kwonlyargs=[
          arg(arg='d'),
          arg(arg='e')],
        kw_defaults=[
          None,
          Constant(value=30)],
        kwarg=arg(arg='kwargs'),
        defaults=[
          Constant(value=10),
          Constant(value=20)]),
      body=[
        Pass()],
      returns=Name(id='int', ctx=Load()))])


In [11]:
func_node = tree.body[0]
analyze_parameters(func_node)

Analyzing parameters for: complex
  Regular arg: c (not annotated)
  Positional-only: a (not annotated)
  Positional-only: b (not annotated)
  Keyword-only: d (not annotated)
  Keyword-only: e (not annotated)
  Varargs: *args (not annotated)
  Kwargs: **kwargs (not annotated)
  Default for b: Constant(value=10)
  Default for c: Constant(value=20)
  Default for e: Constant(value=30)


Checking every one of these fields is what lets a static analysis tool reconstruct a complete function signature, whether it is auditing type annotations or measuring function complexity.
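
Since `analyze_parameters` reports parameters grouped by kind, a tool that needs them in signature order has to recombine the fields. The sketch below shows one way to do that; `parameters_in_order` and `_render` are hypothetical helpers, and `ast.unparse` assumes Python 3.9 or newer.

```python
import ast


def _render(default):
    """Render a default-value node as source text, or None when there is none."""
    return None if default is None else ast.unparse(default)


def parameters_in_order(func_node):
    """Return (kind, name, default_source_or_None) tuples in signature order."""
    args = func_node.args
    positional = args.posonlyargs + args.args
    # Left-pad defaults with None so they line up with the positional parameters.
    padded = [None] * (len(positional) - len(args.defaults)) + list(args.defaults)

    result = []
    for index, (param, default) in enumerate(zip(positional, padded)):
        kind = "positional-only" if index < len(args.posonlyargs) else "regular"
        result.append((kind, param.arg, _render(default)))
    if args.vararg:
        result.append(("varargs", args.vararg.arg, None))
    # kw_defaults always has one entry per keyword-only parameter.
    for param, default in zip(args.kwonlyargs, args.kw_defaults):
        result.append(("keyword-only", param.arg, _render(default)))
    if args.kwarg:
        result.append(("kwargs", args.kwarg.arg, None))
    return result


signature_source = "def complex(a, b=10, /, c=20, *args, d, e=30, **kwargs) -> int:\n    pass\n"
signature_node = ast.parse(signature_source).body[0]
ordered_params = parameters_in_order(signature_node)
for entry in ordered_params:
    print(entry)
```

Padding `defaults` on the left is an alternative to the index arithmetic used earlier; both rely on the fact that defaults are right-aligned across `posonlyargs + args`.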