Skip to content

fix: Address mechanical review findings (security, performance, code hygiene)#52

Merged
mingjerli merged 3 commits intomainfrom
fix/mechanical-review-fixes
Feb 5, 2026
Merged

fix: Address mechanical review findings (security, performance, code hygiene)#52
mingjerli merged 3 commits intomainfrom
fix/mechanical-review-fixes

Conversation

@mingjerli
Copy link
Owner

Summary

Addresses 6 mechanical fixes identified during a staff engineer review of the clgraph codebase:

  • Item 1 (Security): Fix Jinja2 SSTI vulnerability by replacing jinja2.Template with SandboxedEnvironment in multi_query.py
  • Item 2 (Dependencies): Remove unused cloudpickle from core dependencies
  • Item 3 (Code hygiene): Replace all print() calls with structured logging module usage; strip emoji from log messages
  • Item 4 (Performance): Replace list.pop(0) (O(n)) with collections.deque.popleft() (O(1)) across 11 BFS traversals in 5 files
  • Item 5 (Performance): Add edge adjacency indices (_outgoing_index/_incoming_index) to ColumnLineageGraph and PipelineLineageGraph for O(1) edge lookups instead of O(E) linear scans; cache _build_column_dependencies()
  • Item 6 (Error handling): Narrow all 17 except Exception catches to specific exception types (SqlglotError, KeyError, ValueError, etc.); add debug-level logging at boundary handlers

Files changed (15)

File Changes
multi_query.py SSTI fix (Item 1)
pyproject.toml / uv.lock Remove cloudpickle (Item 2)
pipeline.py Logging, deque, adjacency indices, exception narrowing (Items 3-6)
execution.py Logging, exception narrowing (Items 3, 6)
agent.py Logging, exception narrowing (Items 3, 6)
column.py Deque, adjacency indices, exception narrowing (Items 4-6)
models.py Deque, adjacency indices (Items 4-5)
lineage_builder.py Deque, exception narrowing (Items 4, 6)
visualizations.py Deque, exception narrowing (Items 4, 6)
table.py Exception narrowing (Item 6)
__init__.py Exception narrowing (Item 6)
mcp/server.py Logging, exception narrowing (Items 3, 6)
tools/sql.py Exception narrowing (Item 6)
tests/test_validation_framework.py Update 2 tests to check log records instead of stdout

Test plan

  • All 778 tests pass (40 skipped - optional Dagster dependency)
  • ruff check and ruff format pass
  • Verify SandboxedEnvironment blocks SSTI payloads like {{ ''.__class__.__mro__[1].__subclasses__() }}
  • Verify logging.basicConfig(level=logging.INFO) shows pipeline progress
  • Spot-check adjacency index correctness on enterprise demo pipeline

…hygiene)

- Fix Jinja2 SSTI vulnerability using SandboxedEnvironment (Item 1)
- Remove unused cloudpickle dependency (Item 2)
- Replace print() with structured logging, strip emoji (Item 3)
- Replace list.pop(0) with deque.popleft() for O(1) BFS (Item 4)
- Add edge adjacency indices for O(1) graph lookups (Item 5)
- Narrow except Exception to specific types at 17 locations (Item 6)
Add *args to BaseTool.run() so subclasses can define specific
parameters without violating the Liskov Substitution Principle.
- agent.py: Use lambda for sorted key to avoid list[Sized] inference
- lineage_builder.py: Add type: ignore for union-attr, ensure str types
- mcp/server.py: Add type: ignore for optional mcp imports, assert
  Server availability, annotate schema dict
- multi_query.py: Remove unused type: ignore comment
- orchestrators: Add type: ignore for optional dagster/prefect imports
- query_parser.py: Use str() to ensure alias types after guard blocks
- tools/governance.py: Suppress mixed-type dict append with type: ignore
- visualizations.py: Replace hasattr with isinstance for type narrowing
@mingjerli mingjerli merged commit c2331fb into main Feb 5, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant