feat: add dataflow analysis (flows_to, returns, mutates) #254
carlos-alm merged 2 commits into main
Track how data moves through functions with three new edge types:

- flows_to: parameter/variable flows into another function as argument
- returns: call return value is captured by the caller
- mutates: parameter-derived value is mutated in-place

Opt-in via `build --dataflow` (JS/TS only for MVP). Adds schema migration v10 (dataflow table), extractDataflow() AST walker with scope tracking and confidence scoring, query functions (dataflowData, dataflowPathData, dataflowImpactData), CLI command with --path and --impact modes, MCP tool, batch support, and programmatic API exports.

Impact: 29 functions changed, 33 affected
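To make the three edge kinds concrete, here is a minimal sketch of source code that would plausibly produce one edge of each kind. The functions `normalize` and `register` are hypothetical, and the comments describe the edges as the description above defines them; how the extractor actually classifies each line is an assumption.

```javascript
// Hypothetical functions, used only to illustrate the three edge kinds.
function normalize(name) {
  return name.trim().toLowerCase();
}

function register(name, seen) {
  // flows_to: the parameter `name` is passed to normalize() as argument 0.
  // returns: the return value of normalize() is captured by register() in `key`.
  const key = normalize(name);

  // mutates: `seen` is a parameter and is modified in place.
  seen.push(key);

  return key;
}

const seen = [];
const key = register('  Ada ', seen);
console.log(key, seen);
```

Running `build --dataflow` over a file like this would be expected to record edges between `register` and `normalize`, plus a mutation on the `seen` parameter.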
Claude finished @carlos-alm's task

PR Review: Dataflow Analysis Feature

Reviewing PR #254 for dataflow analysis functionality.

Todo List
Phase 1: Root Cause Analysis ✅
Phase 2: Configuration Change Detection ✅
Phase 3: Test Integrity ✅
Phase 4: Code Quality ✅

Security: Properly secured against SQL injection. All queries use prepared statements with parameterized inputs. File operations use safe path joining and proper error handling.

Performance:
Architecture:
Code Quality:
Potential Issues:
Root Cause Analysis

Problem Being Solved: Adding dataflow analysis to track data movement through functions with `flows_to`, `returns`, and `mutates` edges.

Why This Approach: Opt-in via `build --dataflow`.

Risk Assessment: Low risk. No configuration changes, strong test coverage, proper security practices.

Final Recommendation
All contributors have signed the CLA ✍️ ✅
Greptile Summary

Adds dataflow analysis to track how data moves through functions with three new edge types: `flows_to`, `returns`, and `mutates`.

Key changes:
Implementation quality:
Known limitations (acceptable for MVP):
Confidence Score: 4/5
Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[build --dataflow] --> B[Parse files with tree-sitter]
B --> C[Extract symbols & complexity]
C --> D{dataflow flag set?}
D -->|No| E[Complete build]
D -->|Yes| F[extractDataflow AST walk]
F --> G[Track parameters & scope]
F --> H[Track return statements]
F --> I[Track call arg flows]
F --> J[Track mutations]
G --> K[buildDataflowEdges]
H --> K
I --> K
J --> K
K --> L[Resolve function names to node IDs]
L --> M[Insert flows_to edges]
L --> N[Insert returns edges]
L --> O[Insert mutates edges]
M --> P[(dataflow table)]
N --> P
O --> P
P --> Q[dataflow command]
Q --> R{Mode?}
R -->|edges| S[dataflowData: show flows/returns/mutates]
R -->|path| T[dataflowPathData: BFS A→B]
R -->|impact| U[dataflowImpactData: return-dependent blast radius]
S --> V[CLI/MCP/Batch output]
T --> V
U --> V
Last reviewed commit: 45eb976
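The path mode in the flowchart is described as a BFS from A to B over dataflow edges. The following is a minimal in-memory sketch of that idea, not the PR's `dataflowPathData` implementation: the `findDataflowPath` helper, the edge object shape, and the function names in the sample data are all hypothetical, and every edge is treated as directed from `source` to `target`.

```javascript
// Hypothetical edge list; in the PR the edges live in the dataflow table.
const edges = [
  { source: 'main', target: 'loadConfig', kind: 'flows_to' },
  { source: 'loadConfig', target: 'readFile', kind: 'flows_to' },
  { source: 'loadConfig', target: 'validate', kind: 'flows_to' },
  { source: 'validate', target: 'report', kind: 'flows_to' },
  { source: 'report', target: 'main', kind: 'returns' }, // creates a cycle
];

// Breadth-first search that returns the shortest list of edges from
// `from` to `to`, or null when no path exists.
function findDataflowPath(allEdges, from, to) {
  // Group outgoing edges by source node.
  const outgoing = new Map();
  for (const edge of allEdges) {
    if (!outgoing.has(edge.source)) outgoing.set(edge.source, []);
    outgoing.get(edge.source).push(edge);
  }

  // cameFrom doubles as the visited set, so cycles terminate.
  const cameFrom = new Map([[from, null]]);
  const queue = [from];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === to) break;
    for (const edge of outgoing.get(current) ?? []) {
      if (!cameFrom.has(edge.target)) {
        cameFrom.set(edge.target, edge);
        queue.push(edge.target);
      }
    }
  }

  if (!cameFrom.has(to)) return null;

  // Walk back from the target to rebuild the path in order.
  const path = [];
  for (let edge = cameFrom.get(to); edge !== null; edge = cameFrom.get(edge.source)) {
    path.unshift(edge);
  }
  return path;
}

const path = findDataflowPath(edges, 'main', 'report');
console.log(path.map((e) => `${e.source} -[${e.kind}]-> ${e.target}`).join('\n'));
```

BFS is a natural fit here because it returns a shortest path in number of hops, which keeps the reported flow easy to read.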
I have read the CLA Document and I hereby sign the CLA
Address review feedback from Greptile:

- Track spread arguments (foo(...args)) by unwrapping spread_element
- Handle optional chaining (foo?.bar()) in callee name resolution
- Track non-declaration assignments (x = foo() without const/let/var) as returns edges
- Add 3 tests covering these cases

Impact: 3 functions changed, 3 affected
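For reference, the three source patterns named in this commit look like the following in ordinary JavaScript. This is only an illustration of the inputs; `load`, `api`, and `run` are hypothetical names, and the comments restate the commit message rather than verified extractor output.

```javascript
// Hypothetical callees used to exercise the three patterns.
function load(...values) {
  return values.length;
}
const api = { fetch: (id) => ({ id }) };

function run(items, id) {
  // Spread argument: `items` reaches load() through a spread_element.
  const count = load(...items);

  // Optional chaining: the callee is `api?.fetch`, which should resolve to `fetch`.
  const record = api?.fetch(id);

  // Non-declaration assignment: the call result is captured without const/let/var.
  let total;
  total = load(id);

  return { count, record, total };
}

console.log(run([1, 2, 3], 7));
```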
Addressed the known limitations identified in the Greptile review (8c82a14):
Added 3 new parser unit tests covering each case. Full suite: 1105 tests pass.
Claude finished @carlos-alm's task

PR Review: Dataflow Analysis Feature

Reviewing PR #254 for dataflow analysis functionality and the recent fixes addressing Greptile feedback.

Todo List
Phase 1: Root Cause Analysis ✅
Phase 2: Configuration Change Detection ✅
Phase 3: Test Integrity ✅
Phase 4: Code Quality ✅

Security: Properly secured against SQL injection. All queries use prepared statements with parameterized inputs. File operations use safe path joining and proper error handling.

Performance:
Architecture:
Code Quality:
Recent Fixes (8c82a14):
Potential Issues:
Root Cause Analysis

Problem Being Solved: Adding dataflow analysis to track data movement through functions with `flows_to`, `returns`, and `mutates` edges.

Why This Approach: Opt-in via `build --dataflow`.

Risk Assessment: Low risk. No configuration changes, strong test coverage, proper security practices.

Final Recommendation
recheck
Summary

- `flows_to` (parameter/variable passed as argument), `returns` (call return value captured), and `mutates` (parameter-derived value mutated in-place)
- `build --dataflow` flag: JS/TS only for MVP, extraction runs as a second AST pass after complexity analysis
- `dataflow <name>` CLI command with `--path <target>` (BFS data flow path) and `--impact` (return-value-dependent blast radius) modes
- `dataflow` table with confidence scoring and expression tracking

Changes

- `src/dataflow.js`: `extractDataflow()` AST walker with scope tracking, `buildDataflowEdges()`, query functions (`dataflowData`, `dataflowPathData`, `dataflowImpactData`), CLI formatters
- `src/db.js`: `dataflow` table with source_id, target_id, kind, param_index, expression, line, confidence + indexes
- `src/builder.js`: `--dataflow` opt-in phase after complexity, incremental cleanup for dataflow table, full build cascade
- `src/cli.js`: `--dataflow` flag on `build`, new `dataflow` command with all standard options
- `src/mcp.js`: `dataflow` tool in BASE_TOOLS with edges/path/impact modes
- `src/batch.js`: `dataflow` added to BATCH_COMMANDS
- `src/index.js`
- `tests/parsers/dataflow-javascript.test.js`: `extractDataflow()`
- `tests/integration/dataflow.test.js`
- `tests/unit/mcp.test.js`

Dogfood results

Test plan

- `build --dataflow` on codegraph itself produces 1439 edges
- `dataflow`, `--path`, `--impact`, `-j` all work