# PHP-Parser-Py API Demo

This notebook demonstrates the functionality and APIs of `php-parser-py`, a Python wrapper for PHP-Parser that integrates with cpg2py's graph framework.

## Features

- Parse PHP code into Abstract Syntax Trees (AST)
- Query and traverse AST nodes using graph-based queries
- Access PHP-Parser attributes (line numbers, comments, etc.)
- **Rewrite and transform PHP code** (NEW!)
- Generate PHP code from modified ASTs (round-trip; the pretty printer normalizes formatting)
- Type-safe operations with generic support

## Setup

First, let's import the library and set up our environment.

In [1]:
import sys
sys.path.insert(0, '../src')

from php_parser_py import (parse_code, parse_file, parse_project, PrettyPrinter, AST, Node)


## 1. Basic Parsing

The library provides three parsing methods (see [design](design.md)):

- **`parse_code(code: str)`**: Parse a code string into a list of top-level statement nodes (no project/file structure). Node IDs: `node_1`, `node_2`, ...
- **`parse_file(path: str)`**: Parse a single PHP file; returns an AST with project → file → statements. Root node ID: `"project"`; file node: 8-char hex hash; statements: `{hex}_1`, `{hex}_2`, ...
- **`parse_project(project_path: str, file_filter=...)`**: Recursively discover PHP files under a directory, then parse into one AST with project → multiple files → statements. Default filter: `.php` suffix.
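
The recursive discovery step of `parse_project()` can be pictured with a small standard-library sketch. The helper name `discover_php_files` and its `file_filter` signature are hypothetical and only illustrate the behaviour described above; the library's actual implementation may differ.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def discover_php_files(
    project_path: str,
    file_filter: Callable[[Path], bool] = lambda p: p.suffix == ".php",
) -> List[Path]:
    """Hypothetical helper: recursively collect files accepted by file_filter."""
    root = Path(project_path)
    # rglob("*") walks the whole tree; sorting keeps the order deterministic.
    return sorted(p for p in root.rglob("*") if p.is_file() and file_filter(p))


# Usage: build a throwaway project and list the files that pass the filter.
with tempfile.TemporaryDirectory() as tmpdir:
    src = Path(tmpdir) / "src"
    src.mkdir()
    (src / "a.php").write_text("<?php function a() {}")
    (src / "notes.txt").write_text("not PHP")
    (Path(tmpdir) / "index.php").write_text("<?php echo 1;")

    found = [p.relative_to(tmpdir).as_posix() for p in discover_php_files(tmpdir)]
    print(found)
```

Passing a different callable, for example one that also accepts `.inc` files, would widen the selection without changing the walk itself.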


In [2]:
# Simple PHP code
php_code = """
<?php
function greet($name) {
    echo "Hello, " . $name;
    return true;
}
"""

# Parse via parse_file() to get AST with project → file → statements
import tempfile
import os
with tempfile.NamedTemporaryFile(mode='w', suffix='.php', delete=False) as f:
    f.write(php_code)
    temp_path = f.name
try:
    ast = parse_file(temp_path)
    print(f"Parsed AST with {len(list(ast.nodes()))} nodes")
    print(f"AST type: {type(ast)}")

    # Check project structure
    project = ast.project_node()
    if project:
        print(f"Project path: {project.get_property('path')}")

    files = ast.file_nodes()
    print(f"Number of files: {len(files)}")
    if files:
        file_node = files[0]
        print(f"File path (relative): {file_node.get_property('path')}")
        print(f"File path (absolute): {file_node.get_property('filePath')}")
finally:
    os.unlink(temp_path)


Parsed AST with 14 nodes
AST type: <class 'php_parser_py._ast.AST'>
Project path: /private/var/folders/2s/wv1ypwr92msg9lmk6wtw0v000000gn/T
Number of files: 1
File path (relative): tmpdk_p8qw3.php
File path (absolute): /private/var/folders/2s/wv1ypwr92msg9lmk6wtw0v000000gn/T/tmpdk_p8qw3.php


## 2. Type-Safe Node Querying

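`AST.nodes()` takes an optional predicate and returns an iterable of the matching nodes. The `AST` class is generic over its node and edge types (`AST[Node, Edge]`), so query results can be annotated as `Node` for static type checkers.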

In [3]:
# Find all nodes
all_nodes = list(ast.nodes())
print(f"Total nodes: {len(all_nodes)}\n")

# Find function nodes (type-safe)
functions = list(ast.nodes(lambda n: n.node_type == "Stmt_Function"))
print(f"Found {len(functions)} function(s)")

# Get the first function
if functions:
    func: Node = functions[0]
    print(f"Function type: {func.node_type}")
    print(f"Function at line: {func.start_line}")

Total nodes: 14

Found 1 function(s)
Function type: Stmt_Function
Function at line: 3


## 3. Accessing Node Properties

Nodes support both Pythonic property access and dict-like access.

In [4]:
# Get the function node
func = ast.first_node(lambda n: n.node_type == "Stmt_Function")

if func:
    print("=== Pythonic Property Access ===")
    print(f"Node type: {func.node_type}")
    print(f"Start line: {func.start_line}")
    print(f"End line: {func.end_line}")
    
    print("\n=== Dict-like Access ===")
    print(f"Node type: {func['nodeType']}")
    print(f"By reference: {func['byRef']}")
    print(f"Has 'returnType': {'returnType' in func}")
    
    print("\n=== All Properties ===")
    for key, value in func.all_properties.items():
        if not isinstance(value, (dict, list)):
            print(f"  {key}: {value}")

=== Pythonic Property Access ===
Node type: Stmt_Function
Start line: 3
End line: 6

=== Dict-like Access ===
Node type: Stmt_Function
By reference: False
Has 'returnType': True

=== All Properties ===
  nodeType: Stmt_Function
  startLine: 3
  startTokenPos: 2
  startFilePos: 7
  endLine: 6
  endTokenPos: 25
  endFilePos: 76
  byRef: False
  returnType: None
  namespacedName: None
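
One way to offer both access styles on the same object is sketched below. `ToyNode` is a hypothetical stand-in, not the library's `Node` class; it assumes that properties are stored under PHP-Parser's camelCase keys and that snake_case attributes are translated on lookup.

```python
import re
from typing import Any, Dict


class ToyNode:
    """Hypothetical stand-in for a node that supports both access styles."""

    def __init__(self, props: Dict[str, Any]) -> None:
        self._props = dict(props)

    def __getitem__(self, key: str) -> Any:
        # Dict-like access uses the camelCase keys as stored.
        return self._props[key]

    def __contains__(self, key: str) -> bool:
        # A key counts as present even when its value is None.
        return key in self._props

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # Translate snake_case to camelCase: start_line -> startLine.
        camel = re.sub(r"_([a-z])", lambda m: m.group(1).upper(), name)
        try:
            return self._props[camel]
        except KeyError:
            raise AttributeError(name) from None


func = ToyNode(
    {"nodeType": "Stmt_Function", "startLine": 3, "endLine": 6, "returnType": None}
)
print(func.node_type, func.start_line, func["endLine"], "returnType" in func)
# prints: Stmt_Function 3 6 True
```

The membership test checks for the key, not the value, which is consistent with the output above where `'returnType' in func` is `True` although `returnType` is `None`.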


## 4. Graph Traversal

Navigate the AST using graph operations.
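
As a rough picture of what these graph operations compute, the sketch below models a handful of `PARENT_OF` edges as plain tuples and implements child, descendant, and ancestor lookups over them. It is a toy stand-in with made-up node IDs, not the library's implementation; only the operation names `succ`, `descendants`, and `ancestors` are taken from this notebook.

```python
from typing import Iterator, List, Tuple

# (parent, child) pairs standing in for PARENT_OF edges; IDs are illustrative.
EDGES: List[Tuple[str, str]] = [
    ("file", "func"),
    ("func", "name"),
    ("func", "param"),
    ("func", "echo"),
    ("func", "return"),
    ("echo", "concat"),
    ("concat", "string"),
    ("concat", "var"),
]


def succ(node: str) -> List[str]:
    """Direct children of a node, in edge order."""
    return [child for parent, child in EDGES if parent == node]


def descendants(node: str) -> Iterator[str]:
    """All nodes below a node, depth-first in pre-order."""
    for child in succ(node):
        yield child
        yield from descendants(child)


def ancestors(node: str) -> Iterator[str]:
    """All nodes above a node, nearest first."""
    for parent in (p for p, c in EDGES if c == node):
        yield parent
        yield from ancestors(parent)


print(succ("func"))               # ['name', 'param', 'echo', 'return']
print(list(descendants("echo")))  # ['concat', 'string', 'var']
print(list(ancestors("var")))     # ['concat', 'echo', 'func', 'file']
```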

## 11. Working with Project Structure

This section parses a code string, a single file, and a project directory, and shows how project and file paths are tracked.

In [5]:
import tempfile
import os
from pathlib import Path

# Example 1: Parse code without project structure
print("=== Example 1: parse_code() ===")
code = "<?php function helper() { return 1; }"
nodes = parse_code(code)
print(f"parse_code() returns: {type(nodes)}")
print(f"Number of nodes: {len(nodes)}")
print(f"First node type: {nodes[0].node_type}")

# Example 2: Parse a single file
print("\n=== Example 2: parse_file() ===")
with tempfile.NamedTemporaryFile(mode='w', suffix='.php', delete=False) as f:
    f.write('<?php function example() {}')
    temp_file = f.name

try:
    ast_file = parse_file(temp_file)
    project = ast_file.project_node()
    print(f"Project path: {project.get_property('path')}")
    
    files = ast_file.file_nodes()
    if files:
        file_node = files[0]
        print(f"File relative path: {file_node.get_property('path')}")
        print(f"File absolute path: {file_node.get_property('filePath')}")
        
        # Get file containing a specific node
        stmt = ast_file.first_node(lambda n: n.node_type == "Stmt_Function")
        if stmt:
            containing_file = ast_file.get_file(stmt.id)
            print(f"Function is in file: {containing_file.get_property('path')}")
finally:
    os.unlink(temp_file)

# Example 3: Parse project directory (recursive file discovery per design)
print("\n=== Example 3: parse_project() ===")
with tempfile.TemporaryDirectory() as tmpdir:
    subdir = Path(tmpdir) / 'src'
    subdir.mkdir()

    (subdir / 'file1.php').write_text('<?php function a() {}')
    (subdir / 'file2.php').write_text('<?php class B {}')

    ast_project = parse_project(tmpdir)
    project = ast_project.project_node()
    print(f"Project path: {project.get_property('path')}")
    
    print(f"\nFiles in project ({len(ast_project.file_nodes())}):")
    for file_node in ast_project.file_nodes():
        print(f"  - {file_node.get_property('path')} (relative)")
        print(f"    {file_node.get_property('filePath')} (absolute)")
    
    # Generate code for all files
    printer = PrettyPrinter()
    generated = printer.print(ast_project)
    print(f"\nGenerated code for {len(generated)} files:")
    for file_path, code in generated.items():
        print(f"\nFile: {file_path}")
        print(code[:100] + "..." if len(code) > 100 else code)

=== Example 1: parse_code() ===
parse_code() returns: <class 'list'>
Number of nodes: 1
First node type: Stmt_Function

=== Example 2: parse_file() ===
Project path: /private/var/folders/2s/wv1ypwr92msg9lmk6wtw0v000000gn/T
File relative path: tmpf7j3ko9l.php
File absolute path: /private/var/folders/2s/wv1ypwr92msg9lmk6wtw0v000000gn/T/tmpf7j3ko9l.php
Function is in file: tmpf7j3ko9l.php

=== Example 3: parse_project() ===
Project path: /private/var/folders/2s/wv1ypwr92msg9lmk6wtw0v000000gn/T/tmp4jkz2ph1

Files in project (2):
  - src/file1.php (relative)
    /private/var/folders/2s/wv1ypwr92msg9lmk6wtw0v000000gn/T/tmp4jkz2ph1/src/file1.php (absolute)
  - src/file2.php (relative)
    /private/var/folders/2s/wv1ypwr92msg9lmk6wtw0v000000gn/T/tmp4jkz2ph1/src/file2.php (absolute)

Generated code for 2 files:

File: /private/var/folders/2s/wv1ypwr92msg9lmk6

Back on the `greet()` AST from section 1, the next cell counts how many nodes of each type the tree contains.

In [6]:
# Find different types of nodes
node_types = {}
for node in ast.nodes():
    node_type = node.node_type
    if node_type:
        node_types[node_type] = node_types.get(node_type, 0) + 1

print("Node types in the AST:")
for node_type, count in sorted(node_types.items()):
    print(f"  {node_type}: {count}")

Node types in the AST:
  Expr_BinaryOp_Concat: 1
  Expr_ConstFetch: 1
  Expr_Variable: 2
  File: 1
  Identifier: 1
  Name: 1
  Param: 1
  Project: 1
  Scalar_String: 1
  Stmt_Echo: 1
  Stmt_Function: 1
  Stmt_InlineHTML: 1
  Stmt_Return: 1


## 5. Finding Specific Patterns

Use lambda functions to find specific code patterns.

In [7]:
# Find echo statements
echo_nodes = list(ast.nodes(lambda n: n.node_type == "Stmt_Echo"))
print(f"Found {len(echo_nodes)} echo statement(s)")

# Find variable expressions
var_nodes = list(ast.nodes(lambda n: n.node_type == "Expr_Variable"))
print(f"Found {len(var_nodes)} variable expression(s)")

# Print variable names
print("\nVariable names:")
for var in var_nodes:
    if 'name' in var:
        print(f"  ${var['name']}")

Found 1 echo statement(s)
Found 2 variable expression(s)

Variable names:
  $name
  $name
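
Predicates like the lambdas above can be built from small reusable helpers. The sketch below is self-contained and uses `SimpleNamespace` objects as stand-ins for nodes; `has_type` and `all_of` are hypothetical helper names, not part of the library. With a real AST, the resulting callable would presumably be passed to `ast.nodes(...)` in place of a lambda.

```python
from types import SimpleNamespace
from typing import Any, Callable

Predicate = Callable[[Any], bool]


def has_type(*node_types: str) -> Predicate:
    """Match nodes whose node_type is one of the given names."""
    wanted = frozenset(node_types)
    return lambda n: n.node_type in wanted


def all_of(*predicates: Predicate) -> Predicate:
    """Match nodes accepted by every predicate."""
    return lambda n: all(p(n) for p in predicates)


# Stand-in nodes mirroring the statements of greet().
nodes = [
    SimpleNamespace(node_type="Stmt_Function", start_line=3),
    SimpleNamespace(node_type="Stmt_Echo", start_line=4),
    SimpleNamespace(node_type="Stmt_Return", start_line=5),
]

is_body_stmt = has_type("Stmt_Echo", "Stmt_Return")
after_line_4 = all_of(is_body_stmt, lambda n: n.start_line > 4)

print([n.node_type for n in nodes if is_body_stmt(n)])  # ['Stmt_Echo', 'Stmt_Return']
print([n.node_type for n in nodes if after_line_4(n)])  # ['Stmt_Return']
```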


## 6. Code Generation (Round-trip)

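Generate PHP code from the AST. The pretty printer normalizes formatting (in the output below, the opening brace of `greet()` moves onto its own line), so the result is equivalent to the input but not byte-identical.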

In [8]:
# Generate PHP code from AST
# Note: printer.print() now returns a dict mapping file paths to code
printer = PrettyPrinter()
generated = printer.print(ast)

print("Generated PHP code:")
print("=" * 50)
# Get code from dict (for single file, use first value)
if isinstance(generated, dict):
    for file_path, code in generated.items():
        if file_path:
            print(f"File: {file_path}")
        print(code)
else:
    # Fallback for backward compatibility
    print(generated)
print("=" * 50)


Generated PHP code:
File: /private/var/folders/2s/wv1ypwr92msg9lmk6wtw0v000000gn/T/tmpdk_p8qw3.php

<?php 
function greet($name)
{
    echo "Hello, " . $name;
    return true;
}
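
Because the result maps file paths to code strings, writing it back to disk takes only a short loop. The sketch below assumes the keys are absolute paths, as in the output above, and mirrors them under a separate output directory so the originals are not overwritten. `write_generated` is a hypothetical helper, and the sample dict stands in for a real `printer.print(ast)` result.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def write_generated(
    generated: Dict[str, str], project_root: str, out_dir: str
) -> List[Path]:
    """Hypothetical helper: write each generated file under out_dir."""
    written = []
    for file_path, code in generated.items():
        # Keep the layout relative to the project root.
        relative = Path(file_path).relative_to(project_root)
        target = Path(out_dir) / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(code, encoding="utf-8")
        written.append(target)
    return written


# Stand-in for the dict returned by the printer.
generated = {"/project/src/greet.php": "<?php\nfunction greet($name)\n{\n}\n"}

with tempfile.TemporaryDirectory() as out_dir:
    paths = write_generated(generated, "/project", out_dir)
    rel = [p.relative_to(out_dir).as_posix() for p in paths]
    content = paths[0].read_text(encoding="utf-8")

print(rel)  # ['src/greet.php']
```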


## 7. AST Transformation - Wrapping Variables in Function Calls

**NEW!** Transform the AST by wrapping variables in function calls.

Example: `$data` → `sanitize($data)`

In [9]:
# Original code with user input
unsafe_code = """
<?php
echo $userInput;
$result = $userInput . " processed";
"""

# Parse via parse_file() on a temp file so the AST has the project → file structure the printer needs
import tempfile
import os as _os
with tempfile.NamedTemporaryFile(mode='w', suffix='.php', delete=False) as _f:
    _f.write(unsafe_code)
    _temp_path = _f.name
try:
    ast2 = parse_file(_temp_path)  # inside try so the temp file is removed even if parsing fails
    print("Original code:")
    print(unsafe_code)

    # Transform: wrap all $userInput in sanitize() (uses internal _storage per design)
    def wrap_variable_in_function(ast_obj: AST, var_name: str, func_name: str):
        """Wrap all occurrences of a variable in a function call."""
        storage = ast_obj._storage
        var_nodes = [
            node for node in ast_obj.nodes()
            if node.node_type == "Expr_Variable" and node.get_property("name") == var_name
        ]

        for var_node in var_nodes:
            parent_edges = [
                e for e in storage.get_edges()
                if e[1] == var_node.id and e[2] == "PARENT_OF"
            ]
        
            if not parent_edges:
                continue

            parent_id = parent_edges[0][0]
            edge_props = storage.get_edge_props(parent_edges[0])
            field_name = edge_props.get("field")

            name_id = f"new_name_{var_node.id}"
            storage.add_node(name_id)
            storage.set_node_props(name_id, {
                "nodeType": "Name",
                "parts": [func_name],
                "startLine": var_node.start_line,
                "endLine": var_node.end_line,
            })

            arg_id = f"new_arg_{var_node.id}"
            storage.add_node(arg_id)
            storage.set_node_props(arg_id, {
                "nodeType": "Arg",
                "name": None,
                "byRef": False,
                "unpack": False,
                "startLine": var_node.start_line,
                "endLine": var_node.end_line,
            })

            funccall_id = f"new_funccall_{var_node.id}"
            storage.add_node(funccall_id)
            storage.set_node_props(funccall_id, {
                "nodeType": "Expr_FuncCall",
                "startLine": var_node.start_line,
                "endLine": var_node.end_line,
            })

            storage.add_edge((funccall_id, name_id, "PARENT_OF"))
            storage.set_edge_props((funccall_id, name_id, "PARENT_OF"), {"field": "name"})
            storage.add_edge((funccall_id, arg_id, "PARENT_OF"))
            storage.set_edge_props((funccall_id, arg_id, "PARENT_OF"), {"field": "args", "index": 0})
            storage.add_edge((arg_id, var_node.id, "PARENT_OF"))
            storage.set_edge_props((arg_id, var_node.id, "PARENT_OF"), {"field": "value"})
            storage.remove_edge((parent_id, var_node.id, "PARENT_OF"))
            storage.add_edge((parent_id, funccall_id, "PARENT_OF"))
            storage.set_edge_props((parent_id, funccall_id, "PARENT_OF"), edge_props)

    wrap_variable_in_function(ast2, "userInput", "sanitize")

    # Generate transformed code (PrettyPrinter.print returns dict[str, str] per design)
    printer = PrettyPrinter()
    transformed = printer.print(ast2)
    print("\nTransformed code (with sanitization):")
    if isinstance(transformed, dict):
        print(list(transformed.values())[0])
    else:
        print(transformed)
finally:
    _os.unlink(_temp_path)


Original code:

<?php
echo $userInput;
$result = $userInput . " processed";


Transformed code (with sanitization):

<?php 
echo sanitize($userInput);
$result = sanitize($userInput) . " processed";
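
The edge rewiring at the heart of the transformation is easier to see in isolation. The sketch below replays it on a plain dict of edges instead of the library's storage; the node IDs and the `wrap_child` helper are illustrative only. The parent's edge to the variable is detached, the new call node takes over that slot, and the variable becomes the value of the call's first argument.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (parent_id, child_id)


def wrap_child(
    edges: Dict[Edge, dict], parent: str, child: str,
    call: str, name: str, arg: str,
) -> None:
    """Rewire parent -> child into parent -> call -> arg -> child."""
    # Detach the child but remember which field (and index) it occupied.
    slot = edges.pop((parent, child))
    # The call node takes over the child's former slot in the parent.
    edges[(parent, call)] = slot
    # The call gets a function name and one argument.
    edges[(call, name)] = {"field": "name"}
    edges[(call, arg)] = {"field": "args", "index": 0}
    # The original child becomes the argument's value.
    edges[(arg, child)] = {"field": "value"}


# echo $userInput;  modelled as one edge from the echo statement to the variable
edges = {("echo", "var"): {"field": "exprs", "index": 0}}
wrap_child(edges, "echo", "var", call="call", name="fn_name", arg="arg")

for (parent, child), props in edges.items():
    print(f"{parent} -> {child}: {props}")
```

Reusing the original edge properties for the new parent-to-call edge is what keeps the call in the same position the variable occupied.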


## 8. AST Transformation - Modifying String Values

Modify scalar values in the AST.

In [10]:
# Code with a string literal; parse via parse_file() so the AST has the project structure the printer needs
import os
code_with_string = '<?php echo "Hello";'
with tempfile.NamedTemporaryFile(mode='w', suffix='.php', delete=False) as _f:
    _f.write(code_with_string)
    _path3 = _f.name
try:
    ast3 = parse_file(_path3)  # inside try so the temp file is removed even if parsing fails
    print("Original code:")
    print(code_with_string)

    string_node = ast3.first_node(lambda n: n.node_type == "Scalar_String")
    if string_node:
        print(f"\nOriginal string value: {string_node['value']}")
        props = (ast3._storage.get_node_props(string_node.id) or {}).copy()
        props['value'] = 'World'
        props['rawValue'] = '"World"'
        ast3._storage.set_node_props(string_node.id, props)
        print(f"Modified string value: {string_node.get_property('value')}")

    modified = printer.print(ast3)
    print("\nModified code:")
    if isinstance(modified, dict):
        print(list(modified.values())[0])
    else:
        print(modified)
finally:
    os.unlink(_path3)


Original code:
<?php echo "Hello";

Original string value: Hello
Modified string value: World

Modified code:
<?php

echo "World";


## 9. Working with Complex PHP Code

Parse and query a class with methods.

In [11]:
complex_php = """
<?php
class User {
    private $name;
    private $email;
    
    public function __construct($name, $email) {
        $this->name = $name;
        $this->email = $email;
    }
    
    public function getName() {
        return $this->name;
    }
}
"""

with tempfile.NamedTemporaryFile(mode='w', suffix='.php', delete=False) as _f:
    _f.write(complex_php)
    _complex_path = _f.name
try:
    ast4 = parse_file(_complex_path)  # inside try so the temp file is removed even if parsing fails
    # Find class nodes
    classes = list(ast4.nodes(lambda n: n.node_type == "Stmt_Class"))
    print(f"Found {len(classes)} class(es)")

    if classes:
        cls = classes[0]
        # Note: class name is in a child Identifier node
        print(f"\nClass at lines {cls.start_line}-{cls.end_line}")

    # Find methods
    methods = list(ast4.nodes(lambda n: n.node_type == "Stmt_ClassMethod"))
    print(f"\nFound {len(methods)} method(s):")
    for method in methods:
        # Method name is in child Identifier node (see docs/php_parser_ast.md)
        print(f"  - Method at lines {method.start_line}-{method.end_line}")
finally:
    os.unlink(_complex_path)

Found 1 class(es)

Class at lines 3-15

Found 2 method(s):
  - Method at lines 7-10
  - Method at lines 12-14


## 10. Edge Traversal

Navigate parent-child relationships using edges.

In [12]:
# Get a function node
func = ast.first_node(lambda n: n.node_type == "Stmt_Function")

if func:
    # Find child nodes using edges
    print(f"Function node: {func.id}")
    print("\nChild nodes:")
    
    # succ() = children via PARENT_OF (design: inherited from AbcGraphQuerier)
    for child in ast.succ(func):
        edge = ast.edge(func.id, child.id, "PARENT_OF")
        field = edge.field if edge else "unknown"
        print(f"  - {field}: {child.node_type} (line {child.start_line})")


Function node: f0c8163c_2

Child nodes:
  - name: Identifier (line 3)
  - params: Param (line 3)
  - stmts: Stmt_Echo (line 4)
  - stmts: Stmt_Return (line 5)


## Summary

This notebook demonstrated:

1. ✅ **Parsing**: Convert PHP code to AST using `parse_code()`, `parse_file()`, `parse_project()` (see [design](design.md))
2. ✅ **Type-Safe Querying**: Find nodes with generic type support
3. ✅ **Properties**: Access node data via properties or dict-like syntax
4. ✅ **Traversal**: Navigate the AST structure using `succ()`; the API also lists `prev()`, `descendants()`, and `ancestors()`
5. ✅ **Attributes**: Access PHP-Parser metadata (lines, positions)
6. ✅ **Code Generation**: Round-trip from AST back to PHP with normalized formatting (returns a dict mapping file paths to code)
7. ✅ **AST Transformation**: Modify and rewrite PHP code
8. ✅ **Edge Traversal**: Navigate parent-child relationships
9. ✅ **Project Structure**: Work with project and file nodes, path management
10. ✅ **File Management**: Access project paths, file relative/absolute paths, find file containing a node

### Key Features

- **Type-Safe**: Generic support with `AST[Node, Edge]`
- **Dynamic**: No hardcoded node types - all types from PHP-Parser
- **Pythonic**: Properties with snake_case naming
- **Flexible**: Both property and dict-like access
- **Powerful**: Full AST transformation capabilities
- **Complete**: Full access to PHP-Parser's features
- **Graph-based**: Powered by cpg2py for advanced queries
- **Multi-file Support**: Parse and manage multiple files with project structure
- **Path Management**: Track project root and file relative/absolute paths

### API Overview

- **Parsing**: `parse_code(code)`, `parse_file(path)`, `parse_project(project_path, file_filter=...)`
- **AST**: `project_node()`, `file_nodes()`, `get_file(node_id)`, `node(id)`, `succ`/`prev`/`descendants`/`ancestors`, `to_json()`
- **Code generation**: `PrettyPrinter.print(ast)` → `dict[str, str]`; `PrettyPrinter.print_file(ast, relative_path)` → `str`

For more information, see the [README](../README.md) and [design documentation](design.md).
