Skip to content

feat(graph,encoder): persist DependencyEdge symbols and add cross-boundary excerpts#156

Merged
amondnet merged 6 commits intomainfrom
feat/dependency-edge-symbols-cross-excerpts
Mar 4, 2026
Merged

feat(graph,encoder): persist DependencyEdge symbols and add cross-boundary excerpts#156
amondnet merged 6 commits intomainfrom
feat/dependency-edge-symbols-cross-excerpts

Conversation

@amondnet
Copy link
Contributor

@amondnet amondnet commented Mar 4, 2026

Summary

  • DependencyEdge symbol persistence (packages/graph/src/adapters.ts): edgeToAttrs() now persists symboldep_symbol and targetSymboldep_target_symbol; attrsToEdge() restores them on round-trip
  • Cross-boundary code excerpts (packages/encoder/src/encoder.ts): new getCrossBoundaryExcerpts() method reads actual import/call lines from source files for cross-area dependency edges; wired into analyzeCrossAreaDataFlows() replacing the empty '' TODO

Test plan

  • Round-trip test: symbol/targetSymbol survive serializedeserialize
  • Cross-boundary excerpts: positive case with temp files, empty case with same-area edges
  • bun run typecheck — clean
  • Graph tests: 18/18 pass

Closes #150, closes #153


Summary by cubic

Persist DependencyEdge symbols across serialize/deserialize and add safe cross-area code excerpts in the encoder with symlink-aware path checks and backtick escaping, fulfilling #150 and #153.

  • New Features

    • Graph adapters serialize/deserialize dep_symbol and dep_target_symbol with null checks to preserve empty strings.
    • Encoder adds getCrossBoundaryExcerpts and uses it in analyzeCrossAreaDataFlows to read import/call lines for up to 50 cross-area edges, with memoized area lookup, path traversal and symlink containment checks, 300-char truncation, and warning logs.
  • Bug Fixes

    • Adapters use null checks for dependencyType and now throw on unknown edge types to avoid silent fallthrough.
    • Removed DataFlowEdge handling on this branch where data_flow edges don’t exist.
    • Escaped backticks in cross-boundary excerpts to prevent prompt injection in markdown prompts.

Written for commit 5f98480. Summary will update on new commits.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the repository planning graph's capabilities by improving how dependency information is stored and presented. It introduces the persistence of symbol and targetSymbol attributes for dependency edges, ensuring richer graph data. Additionally, it implements a mechanism to generate concrete code excerpts for cross-boundary dependencies, providing more actionable insights into how different domain areas interact.

Highlights

  • Dependency Edge Symbol Persistence: The symbol and targetSymbol properties of DependencyEdge objects are now correctly serialized and deserialized, ensuring their persistence across graph operations.
  • Cross-Boundary Code Excerpt Generation: A new method, getCrossBoundaryExcerpts, has been introduced to extract actual code lines for dependency edges that span different domain areas, replacing a previous placeholder.
Changelog
  • packages/encoder/src/encoder.ts
    • Implemented getCrossBoundaryExcerpts to read import/call lines for cross-area dependency edges.
    • Integrated getCrossBoundaryExcerpts into analyzeCrossAreaDataFlows to populate cross-boundary code excerpts.
  • packages/encoder/tests/encoder.test.ts
    • Added a test suite for RPGEncoder.getCrossBoundaryExcerpts to verify its functionality.
    • Included tests for both successful excerpt generation and cases where no cross-area edges exist.
  • packages/graph/src/adapters.ts
    • Updated edgeToAttrs to serialize symbol and targetSymbol from DependencyEdge to dep_symbol and dep_target_symbol attributes.
    • Updated attrsToEdge to deserialize dep_symbol and dep_target_symbol attributes back into symbol and targetSymbol for DependencyEdge.
  • packages/graph/tests/graph.test.ts
    • Added a test case to confirm that symbol and targetSymbol properties of DependencyEdge objects are preserved during graph serialization and deserialization.
Activity
  • Verified symbol/targetSymbol persistence through a round-trip test.
  • Confirmed cross-boundary excerpts generation with positive and empty test cases.
  • Ensured type-checking passes (bun run typecheck).
  • Confirmed all graph tests pass (18/18).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/graph/src/adapters.ts">

<violation number="1" location="packages/graph/src/adapters.ts:136">
P2: Using truthy checks for `symbol`/`targetSymbol` can silently drop valid empty-string values and break exact serialize→deserialize round-trips. Use null/undefined checks instead of truthiness.</violation>
</file>

<file name="packages/encoder/tests/encoder.test.ts">

<violation number="1" location="packages/encoder/tests/encoder.test.ts:630">
P2: Avoid hardcoded `/tmp/...` paths in tests; derive a temp path via `os.tmpdir()`/`path.join(...)` so the test runs consistently across platforms.

(Based on your team's feedback about avoiding hardcoded absolute paths in tests.) [FEEDBACK_USED]</violation>
</file>

<file name="packages/encoder/src/encoder.ts">

<violation number="1" location="packages/encoder/src/encoder.ts:960">
P2: Do not silently swallow excerpt read failures; log a non-fatal warning so partial analysis degradation is observable.

(Based on your team's feedback about avoiding silent catch-block failures in recoverable paths.) [FEEDBACK_USED]</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two main features: persisting symbol and targetSymbol for DependencyEdge during serialization, and implementing the generation of cross-boundary code excerpts for dependency analysis. While the new getCrossBoundaryExcerpts method enhances data flow analysis, it introduces two critical security concerns: a potential path traversal vulnerability via symbolic links and a risk of indirect prompt injection by including unsanitized file content in LLM prompts. These issues must be addressed to ensure the tool can safely handle untrusted repositories. Furthermore, there is an opportunity for performance improvement in getCrossBoundaryExcerpts that could benefit large repositories.

amondnet added 2 commits March 4, 2026 16:32
…ndary excerpts

- Add symbol/targetSymbol persistence in edgeToAttrs/attrsToEdge adapters
  so these fields survive serialize→deserialize round-trips (#150)
- Add getCrossBoundaryExcerpts() to RPGEncoder that reads import/call
  lines from source files for cross-area dependency edges, providing
  real code context to the LLM data-flow analysis prompt (#153)
- Add round-trip test for symbol/targetSymbol in graph.test.ts
- Add tests for cross-boundary excerpts in encoder.test.ts

Closes #150
Closes #153
The data_flow edge type doesn't exist on this branch (it's added in the
DataFlowEdge migration PR). Remove the DataFlowEdge import and attrs
handling that was incorrectly included, keeping only the DependencyEdge
symbol persistence changes (dep_symbol, dep_target_symbol).
@amondnet amondnet force-pushed the feat/dependency-edge-symbols-cross-excerpts branch from 019c730 to 02c5dc1 Compare March 4, 2026 07:41
- Use null checks instead of truthy for DependencyEdge symbol/targetSymbol
  in adapters.ts to avoid silently dropping empty-string values
- Replace hardcoded /tmp/test path with os.tmpdir() in encoder tests
  for cross-platform consistency
- Add warning log to catch block in getCrossBoundaryExcerpts instead
  of silently swallowing read failures
- Add memoization cache to findDomainArea for performance improvement
  on large repositories
- Add path traversal and symlink containment check in getCrossBoundaryExcerpts
  using path.resolve pre-check and realpath verification, hoisted outside loop
- Truncate source code lines at 300 chars before embedding in LLM prompt
- Extract magic number 50 to named MAX_CROSS_BOUNDARY_EXCERPTS constant
- Fix dependencyType guard from truthy to null-check in edgeToAttrs
- Change attrsToEdge to throw on unknown edge types instead of silent fallthrough
- Remove unused createLogger import from adapters.ts
- Add direct unit tests for edgeToAttrs/attrsToEdge covering symbol fields
- Add test for file-read failure path in getCrossBoundaryExcerpts
- Strengthen output format assertion with regex in excerpt tests
- Fix cross-platform temp path using Date.now() suffix in tests
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/encoder/src/encoder.ts">

<violation number="1" location="packages/encoder/src/encoder.ts:956">
P1: Path traversal pre-check is broken when `repoPath` is a symlink: `preResolved` uses `path.resolve` (no symlink resolution) but is compared against `resolvedRepo` from `realpath` (symlinks resolved). When `repoPath` is itself a symlink, the two will never share a prefix, causing all legitimate files to be silently skipped. The pre-check should compare against `path.resolve(repoPath)` instead of `resolvedRepo`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

The pre-check compared preResolved (path.resolve, no symlink resolution)
against resolvedRepo (realpath, symlinks resolved). When repoPath itself
is a symlink, the two never share a prefix, causing all legitimate files
to be silently skipped.

Fix: compare preResolved against path.resolve(repoPath) instead of
resolvedRepo so both sides use consistent non-symlink resolution. The
actual symlink boundary check (after realpath) remains unchanged.

Addresses P1 review from cubic-dev-ai on PR #156.
@amondnet amondnet self-assigned this Mar 4, 2026
… prompt injection

Source code lines passed directly to LLM prompts could contain backtick
sequences that break out of markdown code spans or embed instructions.
Escape backticks in the excerpt after truncation so the string is safe
to embed in a markdown-formatted LLM prompt.

Addresses security feedback from gemini-code-assist on PR #156.
@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 4, 2026

@amondnet amondnet merged commit 0fadc4f into main Mar 4, 2026
7 checks passed
@amondnet amondnet deleted the feat/dependency-edge-symbols-cross-excerpts branch March 4, 2026 11:23
@pleaeai-bot pleaeai-bot bot mentioned this pull request Mar 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Populate cross-area data flow code excerpts for LLM analysis Fix DependencyEdge symbol/targetSymbol persistence in graph store

1 participant