Conversation
…ng and severity calibration
…ce by selecting IDs directly
📝 WalkthroughWalkthroughAdds git-graph commit tracking and DAG sync services, VCS commit-history APIs, CommitNode persistence and migrations, updates branch/PR processors to mark analyzed commits, and extends the Python orchestrator with cross-batch deduplication, an LLM verification agent, confidence-based severity calibration, Stage‑3 dismissal handling, and file-outline prompt support. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Orchestrator
participant Stage1 as Stage 1 (Batch)
participant Stage2 as Stage 2 (Cross-File)
participant Stage3 as Stage 3 (Aggregation)
participant LLM as LLM/MCP
participant Verifier as VerificationAgent
Client->>Orchestrator: start review request
Orchestrator->>Stage1: run Stage 1 (include file_outlines)
Stage1->>Stage1: severity calibration (LOW confidence: HIGH→MEDIUM)
Stage1-->>Orchestrator: file_issues
Orchestrator->>Stage2: run Stage 2
Stage2-->>Orchestrator: cross_file_issues
rect rgba(100,150,200,0.5)
Orchestrator->>Orchestrator: _convert_cross_file_issues()
Orchestrator->>Orchestrator: deduplicate_cross_batch_issues()
end
Orchestrator->>Stage3: execute_stage_3_aggregation
Stage3->>LLM: request aggregation/report
LLM-->>Stage3: {report, dismissed_issue_ids}
Stage3-->>Orchestrator: {report, dismissed_issue_ids}
rect rgba(150,100,200,0.5)
Orchestrator->>Orchestrator: filter dismissed_issue_ids from file_issues
Orchestrator->>Verifier: run_verification_agent(llm, suspect_issues)
Verifier-->>Orchestrator: filtered_issues
end
Orchestrator-->>Client: final_report
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py (3)
996-1027: Extract the duplicated severity-calibration block into a shared helper.The calibration logic at lines 996–1006 and 1018–1027 is identical. Any future change (e.g., adding
INFOconfidence handling or introducingMEDIUM→LOWdowngrading) must be applied in two places.♻️ Proposed refactor
+def _apply_severity_calibration( + reviews: list, +) -> list: + """Downgrade HIGH-severity issues from LOW-confidence reviews to MEDIUM.""" + all_issues = [] + for review in reviews: + review_confidence = (review.confidence or "MEDIUM").upper() + for issue in review.issues: + if review_confidence == "LOW" and issue.severity.upper() == "HIGH": + logger.info( + f"Downgrading issue in {review.file} from HIGH to MEDIUM " + f"(batch confidence: LOW): {issue.reason[:80]}" + ) + issue.severity = "MEDIUM" + all_issues.extend(review.issues) + return all_issuesThen replace both blocks:
- all_batch_issues = [] - for review in result.reviews: - # Confidence-based severity calibration: - # LOW confidence reviews should not produce HIGH severity issues - review_confidence = (review.confidence or "MEDIUM").upper() - for issue in review.issues: - if review_confidence == "LOW" and issue.severity.upper() == "HIGH": - logger.info(...) - issue.severity = "MEDIUM" - all_batch_issues.extend(review.issues) - return all_batch_issues + return _apply_severity_calibration(result.reviews)Apply the same replacement to the fallback path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py` around lines 996 - 1027, Extract the duplicated "Confidence-based severity calibration" logic into a shared helper (e.g., calibrate_review_severity(review, logger)) that inspects review.confidence (default "MEDIUM"), iterates review.issues, and downgrades issue.severity from "HIGH" to "MEDIUM" when confidence is "LOW" while logging the same message; then replace both inline blocks (the structured-output path that appends to all_batch_issues and the fallback path after parse_llm_response/for review in data.reviews) with calls to this helper so future changes to confidence rules only need to be applied in one place.
1524-1551: Two minor style issues in_extract_dismissed_issues.
- Inconsistent import alias:
import re as _rediffers from the bareimport reused inside_build_duplication_queries_from_diff(line 645) and_fetch_cross_module_context(line 1354) in the same file.- Untyped return annotation:
-> tupleis imprecise.Tuple[str, List[str]]is more descriptive, butTupleis not currently imported at the top of the file.♻️ Suggested fix
At the top of the file, add
Tupleto the typing import:-from typing import Any, Dict, List, Optional, Callable +from typing import Any, Dict, List, Optional, Callable, TupleIn the function:
-def _extract_dismissed_issues(content: str) -> tuple: +def _extract_dismissed_issues(content: str) -> Tuple[str, List[str]]: ... - import re as _re - pattern = r'<!--\s*DISMISSED_ISSUES:\s*(\[.*?\])\s*-->' - match = _re.search(pattern, content, _re.DOTALL) + import re + pattern = r'<!--\s*DISMISSED_ISSUES:\s*(\[.*?\])\s*-->' + match = re.search(pattern, content, re.DOTALL)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py` around lines 1524 - 1551, The function _extract_dismissed_issues uses an inconsistent import alias (import re as _re) and an imprecise return annotation (-> tuple); change the re import to plain import re to match other functions (_build_duplication_queries_from_diff, _fetch_cross_module_context) and update the signature to -> Tuple[str, List[str]]; also add Tuple (and List if not already present) to the top-of-file typing imports so the annotation resolves.
1196-1211: Redundant guard and shadowed loop variable in ID manifest block.Two minor readability nits:
if issues:at line 1207 is dead code — the function already returns"No issues found in Stage 1."whenissuesis empty (top of function).[i for i in all_ids if i]at line 1209 reusesias the filter variable, shadowing theifrom theenumerate(top_n, 1)loop above.♻️ Suggested cleanup
- # --- Full issue ID manifest (so Stage 3 can dismiss false positives) --- - if issues: - all_ids = [getattr(i, 'id', '') or '' for i in issues] - all_ids = [i for i in all_ids if i] - if all_ids: - lines.append(f"\nAll issue IDs: {', '.join(all_ids)}") + # --- Full issue ID manifest (so Stage 3 can dismiss false positives) --- + all_ids = [getattr(issue, 'id', '') or '' for issue in issues] + all_ids = [uid for uid in all_ids if uid] + if all_ids: + lines.append(f"\nAll issue IDs: {', '.join(all_ids)}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py` around lines 1196 - 1211, The block generating the full issue ID manifest contains a redundant guard and a shadowed loop variable: drop the unnecessary "if issues:" check (the function already returns when issues is empty) and simplify the all_ids filtering to avoid reusing the loop variable from the earlier enumerate; e.g., compute all_ids = [getattr(i, 'id', '') or '' for i in issues] and then filter using a different name or a builtin filter (e.g., all_ids = [id_ for id_ in all_ids if id_]) before appending the "All issue IDs" line — update the code around the "All issue IDs" manifest where variables ranked, top_n, lines, and all_ids are used.python-ecosystem/inference-orchestrator/service/review/orchestrator/orchestrator.py (1)
286-293: Add a type annotation to thecross_file_issuesparameter.-def _convert_cross_file_issues(cross_file_issues) -> List[CodeReviewIssue]: +def _convert_cross_file_issues(cross_file_issues: List[Any]) -> List[CodeReviewIssue]:Or, if the cross-file issue type is importable from
model.multi_stage, use the concrete type for better IDE support.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python-ecosystem/inference-orchestrator/service/review/orchestrator/orchestrator.py` around lines 286 - 293, The function _convert_cross_file_issues lacks a parameter type annotation; update its signature to accept the concrete type (e.g., cross_file_issues: List[CrossFileIssue] or Iterable[CrossFileIssue]) and add an import for CrossFileIssue from model.multi_stage (or the correct module where the Stage 2 type is defined) so IDEs and linters can resolve the type; ensure any existing usages still pass a compatible iterable/list type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/codeanalysis/CodeAnalysisRepository.java`:
- Around line 69-72: The JPQL queries use a non-standard "LIMIT 1" inside scalar
subqueries which will likely throw QuerySyntaxException; remove the LIMIT usage
in the `@Query` annotations (e.g. in findLatestByProjectId and the other
"findLatest..." methods flagged) and replace the inner subquery with a
JPQL-compliant MAX aggregate (mirror the pattern used in
findByProjectIdAndPrNumberWithMaxPrVersion: compare createdAt or the appropriate
version/timestamp field and select the entity whose field equals (SELECT
MAX(...))). If you cannot express the tie-breaker with MAX, alternatively switch
the `@Query` to nativeQuery = true and write proper SQL with LIMIT, adjusting
table/column names to match the entity mapping. Ensure you update all three
affected repository methods (the findLatest* methods referenced in the review)
accordingly.
In `@python-ecosystem/inference-orchestrator/utils/prompts/prompt_constants.py`:
- Around line 352-364: The prompt contains contradictory severity guidance: the
"SEVERITY RULES" block mandates architecture/design observations be INFO/LOW,
while the "CROSS-MODULE DUPLICATION DETECTION" and "PRE-OUTPUT SELF-CHECK"
sections label duplication as HIGH; reconcile by either (A) adding a carve-out
to the "SEVERITY RULES" block that permits duplication to be HIGH only when it
produces concrete runtime/conflict evidence (e.g., two interceptors overwriting
behavior, data corruption, or authentication bypass), or (B) changing the
"CROSS-MODULE DUPLICATION DETECTION" and the PRE-OUTPUT SELF-CHECK duplication
wording to conditional language (e.g., "report as HIGH only if duplication
causes concrete runtime conflict; otherwise report as MEDIUM/LOW"); update the
text references for "SEVERITY RULES", "CROSS-MODULE DUPLICATION DETECTION", and
"PRE-OUTPUT SELF-CHECK" accordingly so the rules are non-contradictory and
deterministic for the LLM.
---
Nitpick comments:
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/orchestrator.py`:
- Around line 286-293: The function _convert_cross_file_issues lacks a parameter
type annotation; update its signature to accept the concrete type (e.g.,
cross_file_issues: List[CrossFileIssue] or Iterable[CrossFileIssue]) and add an
import for CrossFileIssue from model.multi_stage (or the correct module where
the Stage 2 type is defined) so IDEs and linters can resolve the type; ensure
any existing usages still pass a compatible iterable/list type.
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py`:
- Around line 996-1027: Extract the duplicated "Confidence-based severity
calibration" logic into a shared helper (e.g., calibrate_review_severity(review,
logger)) that inspects review.confidence (default "MEDIUM"), iterates
review.issues, and downgrades issue.severity from "HIGH" to "MEDIUM" when
confidence is "LOW" while logging the same message; then replace both inline
blocks (the structured-output path that appends to all_batch_issues and the
fallback path after parse_llm_response/for review in data.reviews) with calls to
this helper so future changes to confidence rules only need to be applied in one
place.
- Around line 1524-1551: The function _extract_dismissed_issues uses an
inconsistent import alias (import re as _re) and an imprecise return annotation
(-> tuple); change the re import to plain import re to match other functions
(_build_duplication_queries_from_diff, _fetch_cross_module_context) and update
the signature to -> Tuple[str, List[str]]; also add Tuple (and List if not
already present) to the top-of-file typing imports so the annotation resolves.
- Around line 1196-1211: The block generating the full issue ID manifest
contains a redundant guard and a shadowed loop variable: drop the unnecessary
"if issues:" check (the function already returns when issues is empty) and
simplify the all_ids filtering to avoid reusing the loop variable from the
earlier enumerate; e.g., compute all_ids = [getattr(i, 'id', '') or '' for i in
issues] and then filter using a different name or a builtin filter (e.g.,
all_ids = [id_ for id_ in all_ids if id_]) before appending the "All issue IDs"
line — update the code around the "All issue IDs" manifest where variables
ranked, top_n, lines, and all_ids are used.
| @Query("SELECT ca FROM CodeAnalysis ca WHERE ca.id = " + | ||
| "(SELECT ca2.id FROM CodeAnalysis ca2 WHERE ca2.project.id = :projectId " + | ||
| "ORDER BY ca2.createdAt DESC LIMIT 1)") | ||
| Optional<CodeAnalysis> findLatestByProjectId(@Param("projectId") Long projectId); |
There was a problem hiding this comment.
LIMIT 1 inside a JPQL subquery is non-standard and likely to throw a QuerySyntaxException at startup
Limiting query results in JPA is different from SQL; the LIMIT keyword is not a direct part of JPQL. JPQL does not allow the addition of LIMIT (maxResults) or OFFSET (startIndex), and neither does a JPA subquery. Even in Hibernate 6.x/7.x where LIMIT is available as an HQL extension, it is only defined at the top-level query's queryOrder clause — not inside a WHERE-clause scalar subquery. Placing LIMIT 1 inside the inner SELECT ca2.id ... at all three changed locations risks a QuerySyntaxException on startup (or a runtime dialect error for databases that reject LIMIT in a scalar subquery context).
The existing findByProjectIdAndPrNumberWithMaxPrVersion query at line 88 demonstrates the correct JPQL approach: use a MAX aggregate in the subquery rather than ORDER BY ... LIMIT 1.
Recommended fixes (applies equally to all three methods):
Option A — use a MAX aggregate (JPQL-compliant, mirrors line 88):
♻️ Proposed fix using MAX aggregate subquery
- `@Query`("SELECT ca FROM CodeAnalysis ca WHERE ca.id = " +
- "(SELECT ca2.id FROM CodeAnalysis ca2 WHERE ca2.project.id = :projectId " +
- "ORDER BY ca2.createdAt DESC LIMIT 1)")
+ `@Query`("SELECT ca FROM CodeAnalysis ca WHERE ca.project.id = :projectId " +
+ "AND ca.createdAt = (SELECT MAX(ca2.createdAt) FROM CodeAnalysis ca2 WHERE ca2.project.id = :projectId)")
Optional<CodeAnalysis> findLatestByProjectId(`@Param`("projectId") Long projectId);- `@Query`("SELECT ca FROM CodeAnalysis ca WHERE ca.id = " +
- "(SELECT ca2.id FROM CodeAnalysis ca2 WHERE ca2.project.id = :projectId " +
- "AND ca2.diffFingerprint = :diffFingerprint " +
- "AND ca2.status = org.rostilos.codecrow.core.model.codeanalysis.AnalysisStatus.ACCEPTED " +
- "ORDER BY ca2.createdAt DESC LIMIT 1)")
+ `@Query`("SELECT ca FROM CodeAnalysis ca WHERE ca.project.id = :projectId " +
+ "AND ca.diffFingerprint = :diffFingerprint " +
+ "AND ca.status = org.rostilos.codecrow.core.model.codeanalysis.AnalysisStatus.ACCEPTED " +
+ "AND ca.createdAt = (SELECT MAX(ca2.createdAt) FROM CodeAnalysis ca2 WHERE ca2.project.id = :projectId " +
+ "AND ca2.diffFingerprint = :diffFingerprint " +
+ "AND ca2.status = org.rostilos.codecrow.core.model.codeanalysis.AnalysisStatus.ACCEPTED)")
Optional<CodeAnalysis> findTopByProjectIdAndDiffFingerprint(...)Option B — switch to nativeQuery = true to use raw SQL LIMIT (also valid if the aggregate approach introduces tie-breaking concerns):
♻️ Proposed fix using nativeQuery = true (example for findLatestByProjectId)
- `@Query`("SELECT ca FROM CodeAnalysis ca WHERE ca.id = " +
- "(SELECT ca2.id FROM CodeAnalysis ca2 WHERE ca2.project.id = :projectId " +
- "ORDER BY ca2.createdAt DESC LIMIT 1)")
+ `@Query`(value = "SELECT * FROM code_analysis ca WHERE ca.id = " +
+ "(SELECT ca2.id FROM code_analysis ca2 WHERE ca2.project_id = :projectId " +
+ "ORDER BY ca2.created_at DESC LIMIT 1)",
+ nativeQuery = true)
Optional<CodeAnalysis> findLatestByProjectId(`@Param`("projectId") Long projectId);Note: confirm actual table/column names match the entity mapping before applying.
Verify whether the Hibernate version in use accepts LIMIT in a JPQL scalar subquery, and confirm whether the existing queries fail at startup:
#!/bin/bash
# Find the Spring Boot or Hibernate version declared in the project's build files
fd -e xml -e gradle "pom|build.gradle" --exec grep -n -i "hibernate\|spring-boot" {} \; 2>/dev/null | grep -i "version\|artifactId" | head -40
# Check whether nativeQuery is used anywhere else in the repository for comparison
rg -n "nativeQuery\s*=\s*true" --type javaAlso applies to: 128-132, 149-153
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/codeanalysis/CodeAnalysisRepository.java`
around lines 69 - 72, The JPQL queries use a non-standard "LIMIT 1" inside
scalar subqueries which will likely throw QuerySyntaxException; remove the LIMIT
usage in the `@Query` annotations (e.g. in findLatestByProjectId and the other
"findLatest..." methods flagged) and replace the inner subquery with a
JPQL-compliant MAX aggregate (mirror the pattern used in
findByProjectIdAndPrNumberWithMaxPrVersion: compare createdAt or the appropriate
version/timestamp field and select the entity whose field equals (SELECT
MAX(...))). If you cannot express the tie-breaker with MAX, alternatively switch
the `@Query` to nativeQuery = true and write proper SQL with LIMIT, adjusting
table/column names to match the entity mapping. Ensure you update all three
affected repository methods (the findLatest* methods referenced in the review)
accordingly.
| for cfi in cross_file_issues: | ||
| # Use first affected file as the primary location | ||
| primary_file = cfi.affected_files[0] if cfi.affected_files else "cross-file" | ||
| other_files = cfi.affected_files[1:] if len(cfi.affected_files) > 1 else [] | ||
|
|
||
| # Build a comprehensive reason from the cross-file issue fields | ||
| reason_parts = [cfi.title] | ||
| if cfi.description: | ||
| reason_parts.append(cfi.description) | ||
| if cfi.evidence: | ||
| reason_parts.append(f"Evidence: {cfi.evidence}") | ||
| if cfi.business_impact: | ||
| reason_parts.append(f"Business impact: {cfi.business_impact}") | ||
| if other_files: | ||
| reason_parts.append(f"Also affects: {', '.join(other_files)}") | ||
|
|
||
| converted.append(CodeReviewIssue( | ||
| id=cfi.id, | ||
| severity=cfi.severity, | ||
| category=cfi.category, | ||
| file=primary_file, | ||
| line="1", # Cross-file issues have no specific line; use 1 (VCS APIs reject 0) | ||
| reason="\n".join(reason_parts), | ||
| suggestedFixDescription=cfi.suggestion or "", | ||
| suggestedFixDiff=None, | ||
| isResolved=False, | ||
| )) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find the relevant Python files to understand the schema
find . -name "*.py" | grep -E "(schema|model|constant)" | head -20Repository: rostilos/CodeCrow
Length of output: 907
🏁 Script executed:
# Check the CrossFileAnalysisResult and CodeReviewIssue models
rg -n "class CrossFileAnalysisResult" -A 20 --type=pyRepository: rostilos/CodeCrow
Length of output: 801
🏁 Script executed:
# Check the output_schemas.py for CodeReviewIssue category valid values
rg -n "class CodeReviewIssue" -A 30 --type=pyRepository: rostilos/CodeCrow
Length of output: 4506
🏁 Script executed:
# Check prompt_constants.py for STAGE_2_CROSS_FILE_PROMPT_TEMPLATE
rg -n "STAGE_2_CROSS_FILE_PROMPT_TEMPLATE\|DATA_INTEGRITY\|BUSINESS_LOGIC" -B 2 -A 10 --type=pyRepository: rostilos/CodeCrow
Length of output: 43
🏁 Script executed:
# Check Java models if they exist in the repo
find . -name "*.java" | head -20Repository: rostilos/CodeCrow
Length of output: 2565
🏁 Script executed:
# Check the affected_files type in CrossFileAnalysisResult
rg -n "affected_files" -B 3 -A 3 --type=pyRepository: rostilos/CodeCrow
Length of output: 2690
🏁 Script executed:
# Check the full CrossFileIssue model definition
rg -n "class CrossFileIssue" -A 15 --type=pyRepository: rostilos/CodeCrow
Length of output: 1438
🏁 Script executed:
# Check if affected_files is ever Optional in the model
rg -n "affected_files.*Optional\|affected_files.*None" --type=pyRepository: rostilos/CodeCrow
Length of output: 43
🏁 Script executed:
# Check the orchestrator.py file to see the full context of lines 295-321
sed -n '290,325p' ./python-ecosystem/inference-orchestrator/service/review/orchestrator/orchestrator.pyRepository: rostilos/CodeCrow
Length of output: 1428
🏁 Script executed:
# Search for how cfi.category is handled elsewhere in the codebase
rg -n "cfi.category\|CrossFileIssue.*category" --type=py -B 2 -A 2Repository: rostilos/CodeCrow
Length of output: 43
🏁 Script executed:
# Check if the affected_files field can be None despite not being Optional
rg -n "affected_files" -B 5 -A 5 ./python-ecosystem/inference-orchestrator/model/multi_stage.py | head -40Repository: rostilos/CodeCrow
Length of output: 322
🏁 Script executed:
# Check if there are any tests or examples of CrossFileIssue instantiation
rg -n "CrossFileIssue(" --type=py -B 2 -A 5Repository: rostilos/CodeCrow
Length of output: 141
🏁 Script executed:
# Verify the exact prompt template to see what categories the LLM is instructed to use
rg -n "STAGE_2_CROSS_FILE_PROMPT_TEMPLATE" -A 50 --type=py | head -80Repository: rostilos/CodeCrow
Length of output: 9881
DATA_INTEGRITY and BUSINESS_LOGIC categories from Stage 2 prompt are not in CodeReviewIssue valid category set.
STAGE_2_CROSS_FILE_PROMPT_TEMPLATE (prompt_constants.py:527) instructs the LLM to use "SECURITY|ARCHITECTURE|DATA_INTEGRITY|BUSINESS_LOGIC" for cross-file issues. However, CodeReviewIssue.category only accepts SECURITY|PERFORMANCE|CODE_QUALITY|BUG_RISK|STYLE|DOCUMENTATION|BEST_PRACTICES|ERROR_HANDLING|TESTING|ARCHITECTURE. Two LLM-generated categories — DATA_INTEGRITY and BUSINESS_LOGIC — are not in this valid set. Since category is a plain str, Pydantic accepts any value at runtime, but downstream Java persistence validation will reject or silently drop unknown categories.
Add category normalization:
🛡️ Proposed fix — map Stage 2 categories to valid CodeReviewIssue categories
+_CROSS_FILE_CATEGORY_MAP = {
+ "DATA_INTEGRITY": "BUG_RISK",
+ "BUSINESS_LOGIC": "CODE_QUALITY",
+}
+
+_VALID_ISSUE_CATEGORIES = {
+ "SECURITY", "PERFORMANCE", "CODE_QUALITY", "BUG_RISK",
+ "STYLE", "DOCUMENTATION", "BEST_PRACTICES", "ERROR_HANDLING",
+ "TESTING", "ARCHITECTURE",
+}
+
def _convert_cross_file_issues(cross_file_issues) -> List[CodeReviewIssue]:
converted = []
for cfi in cross_file_issues:
primary_file = cfi.affected_files[0] if cfi.affected_files else "cross-file"
other_files = cfi.affected_files[1:] if len(cfi.affected_files) > 1 else []
+ raw_cat = (cfi.category or "ARCHITECTURE").upper()
+ category = _CROSS_FILE_CATEGORY_MAP.get(raw_cat, raw_cat)
+ if category not in _VALID_ISSUE_CATEGORIES:
+ logger.warning(f"Unknown cross-file category '{raw_cat}', defaulting to ARCHITECTURE")
+ category = "ARCHITECTURE"
...
converted.append(CodeReviewIssue(
...
- category=cfi.category,
+ category=category,
...
))| ⚠️ SEVERITY CALIBRATION (follow these rules STRICTLY): | ||
| - **HIGH**: Runtime crash, data corruption, security vulnerability, or authentication bypass that WILL occur in production. You must point to the exact line in the diff that causes it. Speculative or conditional risks are NOT HIGH. | ||
| - **MEDIUM**: Confirmed logic error, missing validation with real impact, resource leak, or performance issue where the faulty code path is visible in the diff. | ||
| - **LOW**: Code smell, minor inconsistency, suboptimal pattern, or improvement opportunity with no runtime risk. | ||
| - **INFO**: Architecture observation, design suggestion, style preference, or pattern recommendation with no functional impact. | ||
|
|
||
| SEVERITY RULES: | ||
| 1. Architecture/design observations that do NOT cause a runtime failure MUST be INFO or LOW, NEVER HIGH. | ||
| 2. Do NOT mark an issue HIGH solely because it involves security-adjacent code — the actual exploitable vulnerability must be demonstrable from the diff. | ||
| 3. Missing best practices (e.g., no interface, no factory pattern) are LOW or INFO, not HIGH. | ||
| 4. Framework API usage that appears valid for the project's framework version is NOT an issue — do not flag framework-provided interfaces/methods as errors without evidence they don't exist. | ||
| 5. If you are less than 80% confident an issue is real, downgrade it by one severity level. | ||
|
|
There was a problem hiding this comment.
Severity calibration rules directly contradict the existing duplication instructions in the same prompt.
The new SEVERITY RULES at line 359 state "Architecture/design observations that do NOT cause a runtime failure MUST be INFO or LOW, NEVER HIGH." However, the unchanged CROSS-MODULE DUPLICATION DETECTION section at line 374 says "report it as category 'ARCHITECTURE' with severity 'HIGH'" and line 409 says "this is a HIGH severity ARCHITECTURE issue." The PRE-OUTPUT SELF-CHECK item 4 (line 422) adds a third conflicting signal.
The LLM receives all three directives simultaneously in one prompt. Depending on which instruction it weights more heavily (ordering, emphasis, recency), the severity assigned to duplication findings will be non-deterministic across review passes.
Suggested resolution: Either update lines 374 and 409 to use conditional severity language (e.g., "report as HIGH if it causes a concrete runtime conflict such as two interceptors overwriting each other's output; otherwise MEDIUM"), or add an explicit carve-out in the SEVERITY RULES block:
🔧 Proposed fix for SEVERITY RULES carve-out (lines 358-363)
SEVERITY RULES:
1. Architecture/design observations that do NOT cause a runtime failure MUST be INFO or LOW, NEVER HIGH.
+ Exception: Cross-module duplication where two implementations ACTIVELY CONFLICT at runtime
+ (e.g., two plugins overwriting the same method return value) MAY be HIGH.
+ Redundant-but-non-conflicting duplication is MEDIUM at most.
2. Do NOT mark an issue HIGH solely because it involves security-adjacent code — the actual exploitable vulnerability must be demonstrable from the diff.And update line 374 (existing duplication instruction) accordingly:
-When you find duplication, report it as category "ARCHITECTURE" with severity "HIGH" and include:
+When you find duplication, report it as category "ARCHITECTURE". Use HIGH severity only if the conflict
+will cause a runtime failure (e.g., two plugins overwriting the same output); use MEDIUM for redundant
+but non-conflicting duplication. Include:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python-ecosystem/inference-orchestrator/utils/prompts/prompt_constants.py`
around lines 352 - 364, The prompt contains contradictory severity guidance: the
"SEVERITY RULES" block mandates architecture/design observations be INFO/LOW,
while the "CROSS-MODULE DUPLICATION DETECTION" and "PRE-OUTPUT SELF-CHECK"
sections label duplication as HIGH; reconcile by either (A) adding a carve-out
to the "SEVERITY RULES" block that permits duplication to be HIGH only when it
produces concrete runtime/conflict evidence (e.g., two interceptors overwriting
behavior, data corruption, or authentication bypass), or (B) changing the
"CROSS-MODULE DUPLICATION DETECTION" and the PRE-OUTPUT SELF-CHECK duplication
wording to conditional language (e.g., "report as HIGH only if duplication
causes concrete runtime conflict; otherwise report as MEDIUM/LOW"); update the
text references for "SEVERITY RULES", "CROSS-MODULE DUPLICATION DETECTION", and
"PRE-OUTPUT SELF-CHECK" accordingly so the rules are non-contradictory and
deterministic for the LLM.
|
/codecrow analyze |
|
/codecrow analyze |
…cation in multi-stage review process
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (5)
python-ecosystem/inference-orchestrator/service/review/orchestrator/verification_agent.py (3)
128-137: Double LLM invocation: agent run + separate structured-output parse.The agent executes the full tool-calling loop (line 129), then its text output is re-sent to the LLM for structured extraction (lines 133–137). This doubles latency and token cost for this stage.
Consider using
with_structured_output(VerificationResult)directly on the agent's final response, or instructing the agent prompt to emit JSON that can be parsed without a second call.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python-ecosystem/inference-orchestrator/service/review/orchestrator/verification_agent.py` around lines 128 - 137, The current flow calls agent_executor.ainvoke(...) to run the agent and then calls llm.with_structured_output(VerificationResult).ainvoke(...) on response['output'], causing a double LLM invocation; instead, ensure the agent emits structured output directly by either (a) attaching the structured output parser to the same LLM used by the agent before running agent_executor (use llm.with_structured_output(VerificationResult) so the agent_executor returns a parsed VerificationResult), or (b) modify the agent prompt/instructions given to agent_executor so it emits JSON matching VerificationResult so you can parse response directly without a second ai call; update code paths that reference response['output'] and structured_llm.ainvoke(...) accordingly.
149-152: Uselogger.exceptionto preserve the traceback.Per the static analysis hint (TRY400),
logger.errorwithoutexc_info=Truedrops the stack trace.logger.exceptionautomatically includes it, which aids debugging when verification fails.🔧 Fix
except Exception as e: - logger.error(f"Stage 1.5 Verification failed: {e}") + logger.exception(f"Stage 1.5 Verification failed: {e}") # Fallback: keep all issues if verification fails final_issues = issues🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python-ecosystem/inference-orchestrator/service/review/orchestrator/verification_agent.py` around lines 149 - 152, Replace the logger.error call in the except Exception as e block with logger.exception so the traceback is preserved; locate the except block that currently does logger.error(f"Stage 1.5 Verification failed: {e}") in verification_agent.py (the handler that falls back to final_issues = issues) and change it to use logger.exception("Stage 1.5 Verification failed") (or logger.error(..., exc_info=True)) while keeping the fallback assignment to final_issues = issues unchanged.
88-94: Issues withid=Noneare sent for verification with"ID: None"in the prompt.If
issue.idisNone, the prompt includes"ID: None". The LLM may return"None"as a string ID to drop. While Pydantic coercion and Python'sNone != "None"comparison prevent accidental filtering, the verification results become less reliable — the LLM can't meaningfully reference these issues by ID.Consider filtering out or assigning temporary IDs to issues missing an
idbefore sending them for verification.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python-ecosystem/inference-orchestrator/service/review/orchestrator/verification_agent.py` around lines 88 - 94, The prompt currently includes "ID: None" for issues where issue.id is None (built in the issues_json generator using suspect_issues and issue.id), which can confuse the LLM; update the preparation step to either filter out suspect_issues with issue.id is None before building issues_json or assign stable temporary IDs (e.g., temp-<index>) to issues missing an id and propagate those IDs back into the suspect_issues entries so the verification_agent can reference them consistently; ensure the code path that builds issues_json uses the new/filtered id values (reference suspect_issues, issue.id, and issues_json) and that any downstream verification result mapping expects these temporary IDs.python-ecosystem/inference-orchestrator/service/review/orchestrator/orchestrator.py (1)
200-203: Verification agent runs unconditionally even when no enrichment data is available.
`run_verification_agent` handles this internally by early-returning when `enrichmentData` is missing, so there's no functional issue. However, the status event at line 201 ("Verifying issues against file contents...") is always emitted even when verification is skipped, which could confuse users watching progress updates.
🔧 Conditionally emit status
```diff
-        # === STAGE 1.5: LLM-Driven Verification ===
-        _emit_status(self.event_callback, "verification_started", "Verifying issues against file contents...")
-        file_issues = await run_verification_agent(self.llm, file_issues, request)
-        _emit_progress(self.event_callback, 75, f"Verification Complete: {len(file_issues)} total issues after verification")
+        # === STAGE 1.5: LLM-Driven Verification ===
+        if request.enrichmentData and request.enrichmentData.fileContents:
+            _emit_status(self.event_callback, "verification_started", "Verifying issues against file contents...")
+            file_issues = await run_verification_agent(self.llm, file_issues, request)
+            _emit_progress(self.event_callback, 75, f"Verification Complete: {len(file_issues)} total issues after verification")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python-ecosystem/inference-orchestrator/service/review/orchestrator/orchestrator.py` around lines 200 - 203, The status "Verifying issues against file contents..." is emitted even when verification is skipped; guard the verification status and call to run_verification_agent behind a check for enrichment data (and optionally non-empty file_issues). Replace the unconditional _emit_status and await run_verification_agent(...) with a conditional like if getattr(request, "enrichmentData", None): _emit_status(...); file_issues = await run_verification_agent(self.llm, file_issues, request) (or use request.get("enrichmentData") if request is a dict) so the status only appears when verification will actually run.
python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py (1)
1012-1022: Duplicate severity calibration logic across structured and fallback paths.
The identical calibration block is copy-pasted between lines 1012–1022 and 1034–1042. Extract to a small helper to keep them in sync.
♻️ Suggested extraction
```diff
+def _calibrate_severity(reviews: list, logger) -> list:
+    """Downgrade HIGH-severity issues from LOW-confidence reviews to MEDIUM."""
+    all_issues = []
+    for review in reviews:
+        review_confidence = (review.confidence or "MEDIUM").upper()
+        for issue in review.issues:
+            if review_confidence in ("LOW", "INFO") and issue.severity.upper() == "HIGH":
+                logger.info(
+                    f"Downgrading issue in {review.file} from HIGH to MEDIUM "
+                    f"(batch confidence: {review_confidence}): {issue.reason[:80]}"
+                )
+                issue.severity = "MEDIUM"
+        all_issues.extend(review.issues)
+    return all_issues
```

Note: this also addresses the `INFO` confidence gap — see below.
Also applies to: 1034-1042
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py` around lines 1012 - 1022, Extract the duplicated "confidence-based severity calibration" block into a small helper (e.g., calibrate_review_severity(review, logger)) that computes review_confidence = (review.confidence or "MEDIUM").upper(), iterates review.issues and downgrades any issue with severity "HIGH" to "MEDIUM" when review_confidence == "LOW", logs the same info message using review.file and issue.reason[:80], and returns/updates the modified issues; then replace both in-place duplicated blocks with calls to calibrate_review_severity(review, logger) before extending all_batch_issues to keep behavior identical and avoid drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/reconciliation.py`:
- Around line 368-369: The conditional expressions for resolutionExplanation and
resolvedInCommit in reconciliation.py are mis-parsed due to operator precedence
(or binds tighter than if/else); update the assignments in the reconciliation
logic (the lines referencing resolutionExplanation, resolvedInCommit, prev_data
and was_resolved) to group the or with the second operand using parentheses so
the intended pattern "(prev_data.get('A') or prev_data.get('B')) if was_resolved
else None" is used for both resolutionExplanation and resolvedInCommit.
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py`:
- Line 1016: The severity-downgrade check currently only tests review_confidence
== "LOW", but the schema also defines "INFO" as an even lower confidence; update
the condition around review_confidence and issue.severity in the function where
the line appears (refer to the review_confidence variable and issue.severity
checks) to treat "INFO" the same as "LOW" (e.g., check review_confidence in
{"LOW", "INFO"} or explicitly include review_confidence == "INFO") so
HIGH-severity issues are downgraded when confidence is INFO as well.
- Around line 976-989: The file outline filtering only checks
meta.path.endswith(bp) against batch_file_paths, so outlines can be missed when
batch_file_paths contains longer/absolute forms; update the matching logic in
the block that builds file_outlines_text (which iterates
request.enrichmentData.fileMetadata and uses meta.path and batch_file_paths) to
test both directions (e.g., meta.path.endswith(bp) OR bp.endswith(meta.path))
when deciding inclusion, mirroring the bidirectional check used elsewhere in
this module.
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/verification_agent.py`:
- Around line 12-13: The module-level _FILE_CONTENTS_CACHE is a concurrency
hazard; remove the global and pass a context-scoped dict of file contents into
the verification flow (e.g., create a local file_contents dict in
run_verification_agent and pass it down or capture it in closures) so each run
has its own cache; update callers and helpers that reference
_FILE_CONTENTS_CACHE (notably search_file_content and any functions used during
verification) to accept a file_contents argument or close over the local dict,
and ensure the finally block only clears that local dict rather than a
module-level variable so concurrent coroutines do not overwrite each other’s
data.
---
Duplicate comments:
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/orchestrator.py`:
- Around line 296-331: _convert_cross_file_issues is passing Stage 2 raw
categories (cfi.category) into CodeReviewIssue.category which only accepts a
fixed set; update this function to map/normalize cfi.category into one of the
allowed values (SECURITY, PERFORMANCE, CODE_QUALITY, BUG_RISK, STYLE,
DOCUMENTATION, BEST_PRACTICES, ERROR_HANDLING, TESTING, ARCHITECTURE) before
constructing the CodeReviewIssue. Implement a small mapping/dictionary in
_convert_cross_file_issues that converts known Stage 2 categories (e.g.,
DATA_INTEGRITY -> BUG_RISK or CODE_QUALITY, BUSINESS_LOGIC -> ARCHITECTURE or
BUG_RISK) and falls back to a safe default like CODE_QUALITY or BUG_RISK for
unknown values, then use the normalized value in the CodeReviewIssue(...) call
(replace cfi.category with the mapped value).
In `@python-ecosystem/inference-orchestrator/utils/prompts/prompt_constants.py`:
- Around line 353-365: The prompt's SEVERITY CALIBRATION section conflicts with
later directives that label architecture observations as HIGH (the lines that
instruct reporting category "ARCHITECTURE" with severity "HIGH" and the later
"this is a HIGH severity ARCHITECTURE issue") and also contradicts PRE-OUTPUT
SELF-CHECK item 4; resolve by either adding an explicit carve-out in the
SEVERITY CALIBRATION block that allows exceptions for those specific
ARCHITECTURE rules, or change the two ARCHITECTURE directives to conditional
language (e.g., "report as ARCHITECTURE with severity HIGH only when X/Y
conditions apply") so the severity is deterministic; locate and update the
SEVERITY CALIBRATION header, the two directives that hardcode ARCHITECTURE as
HIGH, and the PRE-OUTPUT SELF-CHECK item to be consistent.
---
Nitpick comments:
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/orchestrator.py`:
- Around line 200-203: The status "Verifying issues against file contents..." is
emitted even when verification is skipped; guard the verification status and
call to run_verification_agent behind a check for enrichment data (and
optionally non-empty file_issues). Replace the unconditional _emit_status and
await run_verification_agent(...) with a conditional like if getattr(request,
"enrichmentData", None): _emit_status(...); file_issues = await
run_verification_agent(self.llm, file_issues, request) (or use
request.get("enrichmentData") if request is a dict) so the status only appears
when verification will actually run.
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py`:
- Around line 1012-1022: Extract the duplicated "confidence-based severity
calibration" block into a small helper (e.g., calibrate_review_severity(review,
logger)) that computes review_confidence = (review.confidence or
"MEDIUM").upper(), iterates review.issues and downgrades any issue with severity
"HIGH" to "MEDIUM" when review_confidence == "LOW", logs the same info message
using review.file and issue.reason[:80], and returns/updates the modified
issues; then replace both in-place duplicated blocks with calls to
calibrate_review_severity(review, logger) before extending all_batch_issues to
keep behavior identical and avoid drift.
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/verification_agent.py`:
- Around line 128-137: The current flow calls agent_executor.ainvoke(...) to run
the agent and then calls
llm.with_structured_output(VerificationResult).ainvoke(...) on
response['output'], causing a double LLM invocation; instead, ensure the agent
emits structured output directly by either (a) attaching the structured output
parser to the same LLM used by the agent before running agent_executor (use
llm.with_structured_output(VerificationResult) so the agent_executor returns a
parsed VerificationResult), or (b) modify the agent prompt/instructions given to
agent_executor so it emits JSON matching VerificationResult so you can parse
response directly without a second ai call; update code paths that reference
response['output'] and structured_llm.ainvoke(...) accordingly.
- Around line 149-152: Replace the logger.error call in the except Exception as
e block with logger.exception so the traceback is preserved; locate the except
block that currently does logger.error(f"Stage 1.5 Verification failed: {e}") in
verification_agent.py (the handler that falls back to final_issues = issues) and
change it to use logger.exception("Stage 1.5 Verification failed") (or
logger.error(..., exc_info=True)) while keeping the fallback assignment to
final_issues = issues unchanged.
- Around line 88-94: The prompt currently includes "ID: None" for issues where
issue.id is None (built in the issues_json generator using suspect_issues and
issue.id), which can confuse the LLM; update the preparation step to either
filter out suspect_issues with issue.id is None before building issues_json or
assign stable temporary IDs (e.g., temp-<index>) to issues missing an id and
propagate those IDs back into the suspect_issues entries so the
verification_agent can reference them consistently; ensure the code path that
builds issues_json uses the new/filtered id values (reference suspect_issues,
issue.id, and issues_json) and that any downstream verification result mapping
expects these temporary IDs.
```python
resolutionExplanation=prev_data.get('resolutionExplanation') or prev_data.get('resolvedDescription') if was_resolved else None,
resolvedInCommit=prev_data.get('resolvedInCommit') or prev_data.get('resolvedByCommit') if was_resolved else None,
```
Operator precedence check: `or` binds tighter than the conditional expression, so these lines already group as intended.
Python parses these lines as:
resolutionExplanation = (prev_data.get('resolutionExplanation') or prev_data.get('resolvedDescription')) if was_resolved else None
resolvedInCommit = (prev_data.get('resolvedInCommit') or prev_data.get('resolvedByCommit')) if was_resolved else None
When `was_resolved` is False, both fields evaluate to `None`, so a truthy `prev_data['resolutionExplanation']` (e.g., stale metadata) cannot leak to non-resolved issues. Explicit parentheses are still worth adding so the `(A or B) if C else N` grouping is obvious at a glance.
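A standalone interpreter check of the grouping (toy values, not project code):

```python
was_resolved = False
prev_data = {'resolutionExplanation': 'stale text'}  # truthy stale metadata

# Conditional expressions bind more loosely than `or`, so the expression below
# groups as `(A or B) if was_resolved else None`.
value = prev_data.get('resolutionExplanation') or prev_data.get('resolvedDescription') if was_resolved else None
print(value)  # None: the stale explanation is not leaked
```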
🔧 Suggested change — add parentheses for readability

```diff
-            resolutionExplanation=prev_data.get('resolutionExplanation') or prev_data.get('resolvedDescription') if was_resolved else None,
-            resolvedInCommit=prev_data.get('resolvedInCommit') or prev_data.get('resolvedByCommit') if was_resolved else None,
+            resolutionExplanation=(prev_data.get('resolutionExplanation') or prev_data.get('resolvedDescription')) if was_resolved else None,
+            resolvedInCommit=(prev_data.get('resolvedInCommit') or prev_data.get('resolvedByCommit')) if was_resolved else None,
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/reconciliation.py`
around lines 368 - 369, The conditional expressions for resolutionExplanation
and resolvedInCommit in reconciliation.py are mis-parsed due to operator
precedence (or binds tighter than if/else); update the assignments in the
reconciliation logic (the lines referencing resolutionExplanation,
resolvedInCommit, prev_data and was_resolved) to group the or with the second
operand using parentheses so the intended pattern "(prev_data.get('A') or
prev_data.get('B')) if was_resolved else None" is used for both
resolutionExplanation and resolvedInCommit.
```python
# Extract file outlines from EnrichmentData
file_outlines_text = ""
if request.enrichmentData and request.enrichmentData.fileMetadata:
    outlines = []
    for meta in request.enrichmentData.fileMetadata:
        if meta.path in batch_file_paths or any(meta.path.endswith(bp) for bp in batch_file_paths):
            outline = f"File: {meta.path}\n"
            if meta.imports:
                outline += f"Imports: {', '.join(meta.imports[:20])}\n"
            if meta.semanticNames:
                outline += f"Symbols/Methods: {', '.join(meta.semanticNames[:30])}\n"
            outlines.append(outline)
    if outlines:
        file_outlines_text = "AST Outlines for this batch:\n" + "\n".join(outlines)
```
File outline matching only checks one direction of path suffix.
Line 981 uses meta.path.endswith(bp) but not the reverse (bp.endswith(meta.path)). If meta.path is a relative path and batch_file_paths contains absolute/longer paths, some outlines may be silently excluded. Other matching code in this file (e.g., line 905) checks both directions.
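The asymmetry is easy to see with plain strings (the paths below are made up for illustration):

```python
def suffix_match(a: str, b: str) -> bool:
    """Bidirectional suffix check: matches regardless of which path is longer."""
    return a.endswith(b) or b.endswith(a)

meta_path = "src/app.py"           # relative path from file metadata
batch_path = "/repo/src/app.py"    # longer/absolute form in batch_file_paths

one_way = meta_path.endswith(batch_path)         # False: misses the outline
both_ways = suffix_match(meta_path, batch_path)  # True: outline is included
print(one_way, both_ways)
```

Note that a bare suffix check can over-match short names (`"my_app.py".endswith("app.py")` is also True), so a separator-aware comparison may be safer in practice.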
🔧 Suggested fix

```diff
-        if meta.path in batch_file_paths or any(meta.path.endswith(bp) for bp in batch_file_paths):
+        if meta.path in batch_file_paths or any(
+            meta.path.endswith(bp) or bp.endswith(meta.path)
+            for bp in batch_file_paths
+        ):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py`
around lines 976 - 989, The file outline filtering only checks
meta.path.endswith(bp) against batch_file_paths, so outlines can be missed when
batch_file_paths contains longer/absolute forms; update the matching logic in
the block that builds file_outlines_text (which iterates
request.enrichmentData.fileMetadata and uses meta.path and batch_file_paths) to
test both directions (e.g., meta.path.endswith(bp) OR bp.endswith(meta.path))
when deciding inclusion, mirroring the bidirectional check used elsewhere in
this module.
```python
# LOW confidence reviews should not produce HIGH severity issues
review_confidence = (review.confidence or "MEDIUM").upper()
for issue in review.issues:
    if review_confidence == "LOW" and issue.severity.upper() == "HIGH":
```
INFO confidence doesn't trigger severity downgrade.
The calibration only checks review_confidence == "LOW", but the output schema allows "INFO" as a confidence value (prompt_constants.py line 452). An INFO-confidence review is presumably even less certain than LOW, yet its HIGH-severity issues pass through unchanged. Consider also including "INFO" in the downgrade condition.
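A minimal sketch of the widened check (the function name here is an assumption, not code from `stages.py`):

```python
from typing import Optional

def should_downgrade(review_confidence: Optional[str], severity: str) -> bool:
    """True when a HIGH finding comes from a low-certainty (LOW or INFO) review."""
    confidence = (review_confidence or "MEDIUM").upper()
    return confidence in {"LOW", "INFO"} and severity.upper() == "HIGH"

print(should_downgrade("INFO", "HIGH"))    # INFO now triggers the downgrade too
print(should_downgrade("MEDIUM", "HIGH"))  # higher-confidence reviews untouched
```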
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py`
at line 1016, The severity-downgrade check currently only tests
review_confidence == "LOW", but the schema also defines "INFO" as an even lower
confidence; update the condition around review_confidence and issue.severity in
the function where the line appears (refer to the review_confidence variable and
issue.severity checks) to treat "INFO" the same as "LOW" (e.g., check
review_confidence in {"LOW", "INFO"} or explicitly include review_confidence ==
"INFO") so HIGH-severity issues are downgraded when confidence is INFO as well.
```python
# Global state for the tool to access file contents
_FILE_CONTENTS_CACHE: Dict[str, str] = {}
```
Global _FILE_CONTENTS_CACHE is a concurrency hazard in async contexts.
If multiple PR reviews run concurrently (common in a server handling webhooks), one coroutine's _FILE_CONTENTS_CACHE = {...} at line 58 overwrites another's data mid-verification. The finally block at line 155 then clears the cache for both. This can cause search_file_content to return incorrect results or "File not available" errors.
Pass file contents as a closure or use a context-scoped dict instead of module-level state.
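The overwrite is easy to reproduce with a toy coroutine pair — a standalone sketch (not the project's code) contrasting a module-level cache with a per-run dict:

```python
import asyncio
from typing import Dict, Optional

_CACHE: Dict[str, str] = {}  # module-level state, shared by every run

async def lookup_global(name: str, contents: Dict[str, str]) -> Optional[str]:
    """Mimics a tool reading a module-level cache that each run reassigns."""
    global _CACHE
    _CACHE = contents          # clobbers whatever a concurrent run stored
    await asyncio.sleep(0)     # yield: the other coroutine runs and reassigns _CACHE
    return _CACHE.get(name)

async def lookup_scoped(name: str, contents: Dict[str, str]) -> Optional[str]:
    """Per-run dict: each invocation keeps its own data."""
    cache = dict(contents)
    await asyncio.sleep(0)
    return cache.get(name)

async def main():
    g = await asyncio.gather(lookup_global("a.py", {"a.py": "A"}),
                             lookup_global("b.py", {"b.py": "B"}))
    s = await asyncio.gather(lookup_scoped("a.py", {"a.py": "A"}),
                             lookup_scoped("b.py", {"b.py": "B"}))
    return g, s

g, s = asyncio.run(main())
print(g, s)  # the global-cache variant loses the first coroutine's data
```

The first `gather` returns `None` for the first lookup because the second coroutine reassigned the global before the first resumed; the closure-scoped variant returns both values correctly.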
🔧 Sketch: closure-based approach

```diff
-# Global state for the tool to access file contents
-_FILE_CONTENTS_CACHE: Dict[str, str] = {}
-
-@tool
-def search_file_content(file_path: str, search_string: str) -> str:
-    ...
-    content = _FILE_CONTENTS_CACHE.get(file_path)
-    ...
+def _make_search_tool(file_contents: Dict[str, str]):
+    @tool
+    def search_file_content(file_path: str, search_string: str) -> str:
+        """Searches for a specific string within the full content of a file."""
+        content = file_contents.get(file_path)
+        if not content:
+            return f"Error: File content for '{file_path}' not available in memory."
+        if search_string in content:
+            return f"Found: The string '{search_string}' exists in '{file_path}'."
+        return f"Not Found: The string '{search_string}' does NOT exist in '{file_path}'."
+    return search_file_content
```

Then in `run_verification_agent`:

```diff
+    file_contents = {f.path: f.content for f in request.enrichmentData.fileContents if f.content}
+    tools = [_make_search_tool(file_contents)]
```
+ tools = [_make_search_tool(file_contents)]Also applies to: 56-60, 153-155
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@python-ecosystem/inference-orchestrator/service/review/orchestrator/verification_agent.py`
around lines 12 - 13, The module-level _FILE_CONTENTS_CACHE is a concurrency
hazard; remove the global and pass a context-scoped dict of file contents into
the verification flow (e.g., create a local file_contents dict in
run_verification_agent and pass it down or capture it in closures) so each run
has its own cache; update callers and helpers that reference
_FILE_CONTENTS_CACHE (notably search_file_content and any functions used during
verification) to accept a file_contents argument or close over the local dict,
and ensure the finally block only clears that local dict rather than a
module-level variable so concurrent coroutines do not overwrite each other’s
data.
/codecrow analyze |
feat: Enhance code analysis and VCS integration

- Added `cloned_from_analysis_id` to `code_analysis` table for traceability of cloned analyses.
- Introduced `analysis_status`, `analyzed_at`, and `analysis_id` columns in `git_commit_node` to track analysis state.
- Implemented `getCommitHistory` method in `VcsClient` interface and its Bitbucket and GitHub implementations to retrieve commit history with parent hashes.
- Created `VcsCommit` record to encapsulate commit details including hash, message, author, timestamp, and parent hashes.
- Developed `BranchHealthScheduler` to periodically retry analysis for STALE branches with exponential backoff.
- Enhanced `WebhookDeduplicationService` to cache PR numbers associated with merge commits for cross-event enrichment.
- Consolidated PR merge handling into `GitHubPullRequestWebhookHandler` to ensure deterministic event processing.
- Added `GitGraphController` to expose the real git commit graph for projects, including commit and branch details.
Actionable comments posted: 10
♻️ Duplicate comments (1)
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/codeanalysis/CodeAnalysisRepository.java (1)
71-73: `LIMIT 1` inside a JPQL subquery is non-standard and risks a `QuerySyntaxException`.
This was flagged in a previous review and applies equally to all three changed query methods (Lines 71-73, 130-134, 151-155). Standard JPQL does not support `LIMIT` inside subqueries. Hibernate 6.x HQL extensions allow `LIMIT` only at the top-level query, not in a `WHERE`-clause scalar subquery. Use a `MAX` aggregate pattern instead, as already demonstrated by `findByProjectIdAndPrNumberWithMaxPrVersion` at Line 90.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/codeanalysis/CodeAnalysisRepository.java` around lines 71 - 73, The JPQL subqueries use a non-standard "LIMIT 1" which can cause QuerySyntaxException; update each annotated query in CodeAnalysisRepository that currently embeds "LIMIT 1" (the one selecting ca by subquery at lines ~71-73 and the two similar queries around lines ~130-134 and ~151-155) to use an aggregate MAX pattern instead (e.g., compare ca.createdAt to (SELECT MAX(ca2.createdAt) FROM CodeAnalysis ca2 WHERE ca2.project.id = :projectId) or otherwise select the row with the MAX(createdAt)), following the same approach as the existing findByProjectIdAndPrNumberWithMaxPrVersion method; keep the Query annotation and parameter names but replace the LIMIT-based subquery with the MAX(...) subquery so the JPQL remains standard and HQL-compliant.
🧹 Nitpick comments (13)
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java (1)
99-107: Cleanup of `mergePrNumbers` is sound but relies on `recentCommitAnalyses` entry existence, not a time window.
The comment on line 104-105 says "use a wider window," but the predicate on line 106 is a simple key-existence check against `recentCommitAnalyses`, not a time-based window. This means a `mergePrNumbers` entry is evicted as soon as its corresponding `recentCommitAnalyses` entry is cleaned (after ~60s). In practice this is fine since the cross-event enrichment window is well within 60s, but the comment is slightly misleading.
📝 Suggested comment clarification
```diff
-        // Also clean up stale PR number entries (use a wider window since
-        // repo:push may arrive slightly later than the dedup window)
+        // Also clean up stale PR number entries: remove any that no longer have
+        // a corresponding entry in recentCommitAnalyses (effectively ~60s TTL)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java` around lines 99 - 107, The cleanupOldEntries method currently removes mergePrNumbers entries based on the presence of keys in recentCommitAnalyses (not a time window), so update the comment in WebhookDeduplicationService.cleanupOldEntries to accurately state that mergePrNumbers are evicted when their corresponding recentCommitAnalyses key is removed (key-existence based), or alternatively implement time-based eviction by storing timestamps on mergePrNumbers and comparing against now and DEDUP_WINDOW_SECONDS if you intended a wider window; reference cleanupOldEntries, recentCommitAnalyses, mergePrNumbers and DEDUP_WINDOW_SECONDS when making the change.
java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/model/VcsCommit.java (1)
6-13: Consider makingparentHashesdefensively immutable.Java records don't copy collections — a caller passing a mutable
ArrayListcan still mutate it after construction. SinceVcsCommitis a cross-boundary DTO, a compact constructor withList.copyOf(parentHashes)would guarantee immutability.Optional fix
public record VcsCommit( String hash, String message, String authorName, String authorEmail, OffsetDateTime timestamp, List<String> parentHashes -) {} +) { + public VcsCommit { + parentHashes = parentHashes != null ? List.copyOf(parentHashes) : List.of(); + } +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/model/VcsCommit.java` around lines 6 - 13, VcsCommit's parentHashes field can be mutated by callers because Java records don't defensive-copy collections; add a compact constructor for the VcsCommit record that replaces the incoming parentHashes with an immutable copy (e.g., this.parentHashes = List.copyOf(parentHashes)) after validating for nulls as needed so the record always stores an unmodifiable list; update any constructors/usage to rely on the record's compact constructor behavior.java-ecosystem/libs/core/src/main/resources/db/migration/1.5.0/V1.5.2__add_cloned_from_analysis_id.sql (1)
4-9: Consider adding a self-referential FK constraint for referential integrity.
`cloned_from_analysis_id` references `code_analysis(id)` but no FK is defined. If the source analysis row is deleted, the column becomes a dangling reference. A self-referential FK with `ON DELETE SET NULL` would keep data consistent:
Suggested addition
```sql
ALTER TABLE code_analysis
    ADD CONSTRAINT fk_code_analysis_cloned_from
    FOREIGN KEY (cloned_from_analysis_id)
    REFERENCES code_analysis(id)
    ON DELETE SET NULL;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/core/src/main/resources/db/migration/1.5.0/V1.5.2__add_cloned_from_analysis_id.sql` around lines 4 - 9, The migration adds cloned_from_analysis_id to code_analysis but lacks a self-referential FK, allowing dangling references; add a foreign key constraint named fk_code_analysis_cloned_from on code_analysis(cloned_from_analysis_id) referencing code_analysis(id) with ON DELETE SET NULL to ensure referential integrity and keep the existing index intact.java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/bitbucket/cloud/BitbucketCloudClient.java (1)
625-719: Well-structured implementation of paginated commit history retrieval.
The method handles pagination, limit capping, and graceful degradation (e.g., null hash skipping, date parse failures) correctly.
One minor observation: Line 688 catches the broad
`Exception` type. Consider narrowing to `DateTimeParseException` to avoid silently swallowing unexpected exceptions during date parsing.
Optional: narrow catch
```diff
-            } catch (Exception e) {
+            } catch (java.time.format.DateTimeParseException e) {
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/bitbucket/cloud/BitbucketCloudClient.java` around lines 625 - 719, In getCommitHistory, narrow the broad catch around OffsetDateTime.parse(...) to catch java.time.format.DateTimeParseException instead of Exception so only parse errors are handled (leave the log message unchanged); locate the try/catch in the date parsing block that sets timestamp (the OffsetDateTime.parse call) and replace the generic Exception handler with DateTimeParseException to avoid swallowing unrelated exceptions.java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/BranchHealthScheduler.java (1)
67-69: Long-lived transaction wrapping external API calls.
`@Transactional` here keeps a DB connection checked out for the entire scheduler run, including all `branchAnalysisProcessor.process()` invocations that hit VCS/AI APIs. With up to 5 retries per cycle, the transaction (and its JDBC connection) could be held for many minutes.
@Transactionalfrom this method and instead reading the stale branches in a short read-only transaction (e.g., a separate@Transactional(readOnly = true)helper), then iterating outside any transaction scope. The processor itself likely manages its own transactions internally.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/BranchHealthScheduler.java` around lines 67 - 69, The retryStaleBranches method in BranchHealthScheduler currently holds a long-lived `@Transactional` across calls to branchAnalysisProcessor.process(), which can tie up a JDBC connection; remove the `@Transactional` from retryStaleBranches, add a small helper like fetchStaleBranches annotated `@Transactional`(readOnly = true) that loads the list of stale branches, then call that helper from retryStaleBranches and iterate over the returned branches outside any transaction while invoking branchAnalysisProcessor.process(...) (the processor can manage its own transactions).java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/github/GitHubClient.java (1)
1044-1045: GraphQL injection risk in `getFileContents`.
The
`workspaceId` and `repoIdOrSlug` values are interpolated directly into the GraphQL query string without escaping. If either contains a `"` character, the query breaks or could be manipulated. Similarly, `branchOrCommit` and file paths in `expression` are only escaped for `"` but `workspaceId`/`repoIdOrSlug` are not.
While these values typically come from trusted DB records, a defense-in-depth approach would escape or validate them. This is in existing (unchanged) code, so flagging as a low-priority observation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/github/GitHubClient.java` around lines 1044 - 1045, getFileContents in class GitHubClient builds a GraphQL query by directly interpolating workspaceId and repoIdOrSlug (and similarly branchOrCommit/expression) which risks breaking or manipulating the query if values contain quotes; change the code to stop inlining raw identifiers and instead either (a) use GraphQL variables for owner/name/expression/commit to pass these values safely, or (b) if variables are not possible, apply consistent escaping of quotes and backslashes for workspaceId, repoIdOrSlug, branchOrCommit and the expression string before appending to the query; update the queryBuilder usage in getFileContents accordingly and ensure any existing escape logic used for expression is applied consistently to owner/name as well.java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/GitGraphController.java (1)
44-108: Unbounded result set — no pagination for commits or edges.
For repositories with thousands of commits,
`findByProjectIdWithAnalysis` and `findEdgeHashesByProjectId` will load everything into memory in a single request. Consider adding pagination or a commit-count limit (e.g., last N commits) to prevent OOM and slow responses on large projects.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/GitGraphController.java` around lines 44 - 108, The controller currently loads all commits and edges into memory (getGitGraph uses commitNodeRepository.findByProjectIdWithAnalysis and commitNodeRepository.findEdgeHashesByProjectId) which can OOM on large repos; modify getGitGraph to accept optional pagination/query params (e.g., page & size or a limit N) and pass a Pageable or a capped limit into repository calls, update/overload the repository methods to return a Page or a limited result (e.g., findByProjectIdWithAnalysis(Pageable) or findTopNByProjectIdWithAnalysis) and similarly limit/find edges only for the returned commits, enforce a sensible default and hard max (e.g., default 1000, max 5000) to prevent unbounded results, and ensure the branch/edge lookup uses the same commit subset so the response stays consistent.

java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/gitgraph/CommitNodeRepository.java (1)
**39-63: Add `clearAutomatically = true` to `@Modifying` queries to avoid a stale persistence context.**

The three `@Modifying` queries perform bulk updates via JPQL, which bypass the persistence context. Any `CommitNode` entities already loaded in the current session will retain stale `analysisStatus` values. While current callers invoke these at the end of a transaction, future callers within the same transaction would silently read outdated state.

♻️ Proposed fix
```diff
-    @Modifying
+    @Modifying(clearAutomatically = true)
     @Query("UPDATE CommitNode c SET c.analysisStatus = 'ANALYZED', c.analyzedAt = CURRENT_TIMESTAMP, c.analysis.id = :analysisId " +
            "WHERE c.project.id = :projectId AND c.commitHash IN :hashes")
     int markCommitsAnalyzed(...)

-    @Modifying
+    @Modifying(clearAutomatically = true)
     @Query("UPDATE CommitNode c SET c.analysisStatus = 'ANALYZED', c.analyzedAt = CURRENT_TIMESTAMP " +
            "WHERE c.project.id = :projectId AND c.commitHash IN :hashes")
     int markCommitsAnalyzedWithoutAnalysis(...)

-    @Modifying
+    @Modifying(clearAutomatically = true)
     @Query("UPDATE CommitNode c SET c.analysisStatus = 'FAILED' " +
            "WHERE c.project.id = :projectId AND c.commitHash IN :hashes AND c.analysisStatus = 'NOT_ANALYZED'")
     int markCommitsFailed(...)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/gitgraph/CommitNodeRepository.java` around lines 39 - 63, The three repository methods markCommitsAnalyzed, markCommitsAnalyzedWithoutAnalysis, and markCommitsFailed perform JPQL bulk updates and should set `@Modifying`(clearAutomatically = true) to avoid stale entities in the persistence context; update each `@Modifying` annotation for those methods to include clearAutomatically = true so the EntityManager automatically clears affected entities after the bulk update.

java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java (1)
**718-744: Redundant `updateIssueCounts()` when both auto-resolved and AI-reconciled issues exist.**

When both `autoResolved` and `needsAiReconciliation` are non-empty, `updateIssueCounts()` runs twice — once at lines 718-720 (after AI reconciliation, which already reflects the auto-resolved changes saved at lines 634-646) and again at lines 741-743. The second call is redundant in this case, since the first already re-fetches the branch with all issues. Consider guarding the second update:
♻️ Avoid double count update
```diff
-        // Update branch issue counts after any resolution (auto or AI)
-        if (!autoResolved.isEmpty()) {
+        // Update branch issue counts after auto-resolution (only if AI path didn't already update)
+        if (!autoResolved.isEmpty() && needsAiReconciliation.isEmpty()) {
             Branch refreshedAfterAutoResolve = branchRepository.findByIdWithIssues(branch.getId()).orElse(branch);
             refreshedAfterAutoResolve.updateIssueCounts();
             branchRepository.save(refreshedAfterAutoResolve);
         }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java` around lines 718 - 744, The code calls updateIssueCounts() twice when both autoResolved and needsAiReconciliation are non-empty; modify the final branch-count update so it only runs when there was no AI reconciliation. Specifically, in BranchAnalysisProcessor replace the unconditional if (!autoResolved.isEmpty()) block that fetches refreshedAfterAutoResolve/updateIssueCounts/save with a guarded condition that also checks needsAiReconciliation is empty (e.g., only run the re-fetch/update when autoResolved is non-empty AND needsAiReconciliation is empty) to avoid duplicating the update done using refreshedBranch from branchRepository.findByIdWithIssues(...).

java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/gitgraph/GitGraphSyncService.java (3)
**41-122: Transaction spans external I/O — consider splitting.**

`syncBranchGraph` is `@Transactional` but calls `vcsClient.getCommitHistory(...)` (lines 45-50), which is a remote HTTP call. This holds a database connection for the entire duration of the VCS API call. Under load, slow VCS responses could exhaust the connection pool. Consider fetching commits outside the transaction and then performing the DB operations in a separate transactional method.
♻️ Sketch: split VCS fetch from DB transaction
```diff
-    @Transactional
     public Map<String, CommitNode> syncBranchGraph(Project project, VcsClient vcsClient, String branchName, int limit) {
         try {
             log.info("Syncing git graph for project {} branch {}", project.getId(), branchName);
             List<VcsCommit> commits = vcsClient.getCommitHistory(
                     project.getEffectiveVcsRepoInfo().getRepoWorkspace(),
                     project.getEffectiveVcsRepoInfo().getRepoSlug(),
                     branchName,
                     limit
             );
             if (commits == null || commits.isEmpty()) {
                 log.warn("No commits found for branch {}", branchName);
                 return Collections.emptyMap();
             }
-            // ... all DB operations inline ...
+            return persistBranchGraph(project, commits);
         } catch (IOException e) {
             log.error("Failed to sync git graph for project {} branch {}", project.getId(), branchName, e);
             return Collections.emptyMap();
         }
     }
+
+    @Transactional
+    protected Map<String, CommitNode> persistBranchGraph(Project project, List<VcsCommit> commits) {
+        // ... move all DB operations here (steps 1-3) ...
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/gitgraph/GitGraphSyncService.java` around lines 41 - 122, syncBranchGraph is annotated `@Transactional` but performs remote I/O via vcsClient.getCommitHistory, holding a DB transaction/connection during the HTTP call; move the external fetch out of the transaction by splitting into two methods: 1) a non-transactional fetch method (call it fetchCommitsForBranch or keep syncBranchGraph but remove `@Transactional`) that invokes vcsClient.getCommitHistory and returns List<VcsCommit>, and 2) a new `@Transactional` method (e.g., persistBranchGraphCommits(Project, List<VcsCommit>, String)) that contains the DB logic currently in syncBranchGraph (look for commitNodeRepository.findByProjectIdAndCommitHashes, creation of new CommitNode objects, commitNodeRepository.saveAll, parent linking and final saveAll). Call the fetcher first, then call the transactional persister with the fetched commits so the DB transaction only spans repository operations.
**58-61: `Collectors.toMap` without a merge function may throw on duplicates.**

If `findByProjectIdAndCommitHashes` ever returns duplicate commit hashes (e.g., due to data inconsistency), `Collectors.toMap` will throw `IllegalStateException`. A defensive merge function is cheap insurance.

🛡️ Proposed fix
```diff
-        Map<String, CommitNode> nodeMap = existingNodes.stream()
-                .collect(Collectors.toMap(CommitNode::getCommitHash, n -> n));
+        Map<String, CommitNode> nodeMap = existingNodes.stream()
+                .collect(Collectors.toMap(CommitNode::getCommitHash, n -> n, (a, b) -> a));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/gitgraph/GitGraphSyncService.java` around lines 58 - 61, The existing Collectors.toMap call in GitGraphSyncService that builds nodeMap from existingNodes can throw IllegalStateException on duplicate commit hashes; change the stream collection to a defensive toMap variant that provides a merge function (e.g., keep the first or last value) so duplicates are resolved instead of throwing (refer to commitNodeRepository.findByProjectIdAndCommitHashes, existingNodes, and nodeMap). Ensure the chosen merge behavior is explicit and consistent with domain expectations (for example (existing, replacement) -> existing) so nodeMap creation is resilient to inconsistent data.
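The merge-function behavior can be demonstrated with a minimal, self-contained example. Plain strings stand in for `CommitNode` here; without the third argument, the duplicate key would raise `IllegalStateException`:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapMergeDemo {
    // Index values by key, keeping the first occurrence on duplicates,
    // mirroring the (a, b) -> a merge function proposed above.
    static Map<String, String> indexByHash(List<String> hashes) {
        return hashes.stream()
                .collect(Collectors.toMap(h -> h, h -> h, (existing, replacement) -> existing));
    }

    public static void main(String[] args) {
        // "abc" appears twice; the merge function resolves it instead of throwing.
        Map<String, String> nodeMap = indexByHash(List.of("abc", "def", "abc"));
        System.out.println(nodeMap.size()); // prints: 2
    }
}
```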
**143-183: Unused variable `oldestUnanalyzedHash`.**

`oldestUnanalyzedHash` (line 151) is assigned on every iteration (line 163) but never read. Remove it to avoid confusion.

🧹 Remove dead variable
```diff
     List<String> unanalyzed = new ArrayList<>();
     Set<String> visited = new HashSet<>();
     Deque<CommitNode> queue = new ArrayDeque<>();
     queue.add(head);
     visited.add(head.getCommitHash());
-    String oldestUnanalyzedHash = null;
-
     while (!queue.isEmpty()) {
         CommitNode current = queue.poll();
         if (current.isAnalyzed()) {
             continue;
         }
         unanalyzed.add(current.getCommitHash());
-        oldestUnanalyzedHash = current.getCommitHash();
         for (CommitNode parent : current.getParents()) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/gitgraph/GitGraphSyncService.java` around lines 143 - 183, Remove the dead local variable oldestUnanalyzedHash from GitGraphSyncService: delete its declaration (String oldestUnanalyzedHash = null;) and the assignment oldestUnanalyzedHash = current.getCommitHash(); in the BFS loop since its value is never read; rely on the existing unanalyzed list (and its reversal) and existing log/headHash usage instead.

java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/PullRequestAnalysisProcessor.java (1)
**345-372: Consider extracting the hardcoded sync limit to a constant and review cache-hit bypass implications.**

The method is well-structured: a broad `catch (Exception)` with a warning-level log is appropriate for non-critical side effects. Two observations warrant attention:

1. **Hardcoded `limit = 100`** — The call to `syncBranchGraph` hardcodes a limit of 100 commits. If a PR's source branch diverged by more than 100 commits, the method won't fetch the full history, potentially causing `findUnanalyzedCommitRange` to miss older commits or fail to locate the analyzed ancestor. Consider extracting this to a named constant (e.g., `private static final int DAG_SYNC_LIMIT = 100;`) or exposing it as a configurable property for consistency with other limits in the codebase.

2. **Cache-hit paths bypass DAG marking** — Both the commit-hash cache hit (line 186) and the fingerprint cache hit (line 232) return early without invoking `markPrCommitsAnalyzed`. If a PR is re-opened from a different source branch with the same diff, the new branch's commits won't be marked in the DAG. This may be acceptable depending on whether the DAG is branch-scoped, but the implications should be explicitly documented.
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/PullRequestAnalysisProcessor.java` around lines 345 - 372, Extract the hardcoded sync limit (100) used in gitGraphSyncService.syncBranchGraph inside markPrCommitsAnalyzed into a named constant (e.g., private static final int DAG_SYNC_LIMIT = 100) or wire it to a configurable property, and replace the literal with DAG_SYNC_LIMIT; additionally add a short TODO/JavaDoc on markPrCommitsAnalyzed (or where cache-hit returns occur) calling out that cache-hit paths (the commit-hash and fingerprint cache returns) currently bypass marking branch commits in the DAG and either invoke markPrCommitsAnalyzed from those cache-hit paths or document the intended branch-scoped behavior so the DAG marking semantics are explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java`:
- Around line 616-631: The loop over candidateBranchIssues adds issues with a
null CodeAnalysisIssue into needsAiReconciliation and later into candidateIssues
-> tempAnalysis.setIssues, causing NPEs; update the logic in
BranchAnalysisProcessor to filter out or exclude BranchIssue instances where
issue.getCodeAnalysisIssue() == null before adding to needsAiReconciliation (and
likewise when building candidateIssues) so only issues with non-null
getCodeAnalysisIssue() are sent to AI or set on tempAnalysis; reference
BranchIssue, getCodeAnalysisIssue(), candidateBranchIssues, autoResolved,
needsAiReconciliation, candidateIssues, and tempAnalysis.setIssues when making
the change.
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/gitgraph/GitGraphSyncService.java`:
- Around line 86-108: The contains() check on node.getParents() uses identity
equality because CommitNode lacks equals()/hashCode(), so duplicate parent edges
can be added; implement equals() and hashCode() in the CommitNode entity to use
its persistent identity (e.g., id and/or commitHash) so logical equality is
used, then the existing check in the loop (inside GitGraphSyncService iterating
commits and using node.getParents().contains(parentNode)) will correctly detect
duplicates before adding and commitNodeRepository.saveAll(nodeMap.values()) will
persist the proper graph without duplicate edges.
In `@java-ecosystem/libs/core/src/main/java/module-info.java`:
- Line 105: Add the missing opens for the package used reflectively by the
analysis engine: update module-info.java so the opens directive for
org.rostilos.codecrow.core.model.gitgraph includes the
org.rostilos.codecrow.analysisengine module; this ensures reflective access to
CommitNode (and its getters like getCommitHash(), isAnalyzed(), getParents())
used by GitGraphSyncService, BranchAnalysisProcessor, and
PullRequestAnalysisProcessor succeeds at runtime.
In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/gitgraph/CommitNode.java`:
- Around line 25-79: CommitNode instances are put into a HashSet (parents) but
rely on Object identity, causing duplicates and Hibernate issues for detached
entities; implement equals and hashCode on the CommitNode class using the
business key (project and commitHash) so two instances representing the same DB
row compare equal—use commitHash (and project.getId() / project) to compute
equality and a stable hashCode, guard for nulls and class equality, and ensure
equals/hashCode are declared on CommitNode (referencing the parents field,
project, commitHash, and getProject/getId where needed).
In
`@java-ecosystem/libs/core/src/main/resources/db/migration/1.5.0/V1.5.1__add_analysis_status_to_commit_node.sql`:
- Around line 19-26: The UPDATE is non-deterministic because UPDATE ... FROM can
pick an arbitrary matching code_analysis row when multiple rows share
(project_id, commit_hash); change the query to join to a deterministic subquery
that picks one analysis per (project_id, commit_hash) (e.g., the latest by
created_at or the max id). Replace the FROM code_analysis ca with a derived
table such as (SELECT DISTINCT ON (project_id, commit_hash) * FROM code_analysis
ORDER BY project_id, commit_hash, created_at DESC) latest_ca (or a GROUP BY
selecting MAX(created_at) and corresponding id) and join git_commit_node cn to
latest_ca on project_id and commit_hash, then SET analysis_status='ANALYZED',
analyzed_at=latest_ca.created_at, analysis_id=latest_ca.id where
cn.analysis_status='NOT_ANALYZED'.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/BranchHealthScheduler.java`:
- Around line 26-28: The Javadoc/comment in BranchHealthScheduler incorrectly
calls the strategy "exponential backoff" while the code uses a linear formula;
update all occurrences (the class comment and the method/comment near where
BASE_BACKOFF_MINUTES and consecutiveFailures are referenced) to describe a
linear backoff (e.g., "linear backoff: consecutiveFailures *
BASE_BACKOFF_MINUTES") and make the comment explicitly match the implementation
that uses consecutiveFailures * BASE_BACKOFF_MINUTES.
- Line 166: The call branchAnalysisProcessor.process(request, null) passes a
null consumer which causes a NullPointerException when the process(...)
implementation calls consumer.accept(...); update the process(...)
implementation (the method named process in the branch analysis processor class
used by BranchHealthScheduler) to guard every consumer.accept(...) call
(locations noted where consumer.accept is used) by checking consumer != null
before calling accept (or use Optional.ofNullable(consumer).ifPresent(c ->
c.accept(...))); ensure each invocation at the listed sites uses this null-check
so passing null from BranchHealthScheduler no longer throws.
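The null-consumer guard can be sketched as follows; `process(...)` here is a simplified stand-in, not the real processor signature:

```java
import java.util.Optional;
import java.util.function.Consumer;

public class NullSafeConsumerDemo {
    // Each progress callback is wrapped so a null consumer is simply skipped,
    // mirroring the Optional.ofNullable(...).ifPresent(...) suggestion above.
    static void process(String event, Consumer<String> consumer) {
        Optional.ofNullable(consumer).ifPresent(c -> c.accept("started: " + event));
        // ... actual work would happen here ...
        Optional.ofNullable(consumer).ifPresent(c -> c.accept("finished: " + event));
    }

    public static void main(String[] args) {
        process("branch-health", null);              // no NPE when consumer is null
        process("branch-health", System.out::println);
        // prints: started: branch-health
        // prints: finished: branch-health
    }
}
```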
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPullRequestWebhookHandler.java`:
- Around line 316-318: The unguarded Long.parseLong(payload.pullRequestId()) in
GitHubPullRequestWebhookHandler can throw NumberFormatException for
null/non-numeric pullRequestId; wrap the parse in a
try/catch(NumberFormatException) (mirroring the Bitbucket handler pattern) and
defensively handle nulls from payload.pullRequestId() before parsing, logging a
clear error via the existing logger and aborting/returning from further
processing (or using a safe fallback) so prNumber and subsequent logic do not
throw an unhandled exception; refer to the mergeCommitSha and prNumber variables
and payload.pullRequestId() call to locate and update the code.
- Around line 290-351: The GitHub PR merge handler (handlePrMergeEvent) lacks
webhook-level deduplication before invoking branchAnalysisProcessor.process;
integrate the WebhookDeduplicationService similarly to
BitbucketCloudBranchWebhookHandler by checking if the commit is already
being/was processed and short-circuiting with an ignored WebhookResult when
duplicate, and mark the commit as in-flight before calling
branchAnalysisProcessor.process(BranchProcessRequest, ...), ensuring you use the
same deduplication key (project id + commit hash + AnalysisType.BRANCH_ANALYSIS
or PR number) and release/cleanup the deduplication entry after processing or on
exception so retries don’t queue duplicate analyses.
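The `NumberFormatException` guard suggested for `pullRequestId` above can be sketched as a small helper (the method name and null fallback are illustrative):

```java
public class SafeParseDemo {
    // Parse defensively: null or non-numeric input yields null instead of
    // throwing. In the handler this would log and abort further processing.
    static Long parsePrNumber(String raw) {
        if (raw == null) return null;
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePrNumber("42"));        // prints: 42
        System.out.println(parsePrNumber("not-a-pr"));  // prints: null
        System.out.println(parsePrNumber(null));        // prints: null
    }
}
```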
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/GitGraphController.java`:
- Around line 31-46: The getGitGraph endpoint in GitGraphController currently
has no authorization; add an authorization annotation to enforce access control
(e.g., annotate the getGitGraph method or the GitGraphController class with
`@IsWorkspaceMember` if the resource is workspace-scoped, or use
`@PreAuthorize`("@projectSecurity.canAccess(`#projectId`, authentication)" /
`@PreAuthorize`("isAuthenticated()") as appropriate) so that the
getGitGraph(`@PathVariable` Long projectId) method is guarded before executing
repository calls to commitNodeRepository and branchRepository.
---
Duplicate comments:
In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/codeanalysis/CodeAnalysisRepository.java`:
- Around line 71-73: The JPQL subqueries use a non-standard "LIMIT 1" which can
cause QuerySyntaxException; update each annotated query in
CodeAnalysisRepository that currently embeds "LIMIT 1" (the one selecting ca by
subquery at lines ~71-73 and the two similar queries around lines ~130-134 and
~151-155) to use an aggregate MAX pattern instead (e.g., compare ca.createdAt to
(SELECT MAX(ca2.createdAt) FROM CodeAnalysis ca2 WHERE ca2.project.id =
:projectId) or otherwise select the row with the MAX(createdAt)), following the
same approach as the existing findByProjectIdAndPrNumberWithMaxPrVersion method;
keep the Query annotation and parameter names but replace the LIMIT-based
subquery with the MAX(...) subquery so the JPQL remains standard and
HQL-compliant.
---
Nitpick comments:
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java`:
- Around line 718-744: The code calls updateIssueCounts() twice when both
autoResolved and needsAiReconciliation are non-empty; modify the final
branch-count update so it only runs when there was no AI reconciliation.
Specifically, in BranchAnalysisProcessor replace the unconditional if
(!autoResolved.isEmpty()) block that fetches
refreshedAfterAutoResolve/updateIssueCounts/save with a guarded condition that
also checks needsAiReconciliation is empty (e.g., only run the re-fetch/update
when autoResolved is non-empty AND needsAiReconciliation is empty) to avoid
duplicating the update done using refreshedBranch from
branchRepository.findByIdWithIssues(...).
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/PullRequestAnalysisProcessor.java`:
- Around line 345-372: Extract the hardcoded sync limit (100) used in
gitGraphSyncService.syncBranchGraph inside markPrCommitsAnalyzed into a named
constant (e.g., private static final int DAG_SYNC_LIMIT = 100) or wire it to a
configurable property, and replace the literal with DAG_SYNC_LIMIT; additionally
add a short TODO/JavaDoc on markPrCommitsAnalyzed (or where cache-hit returns
occur) calling out that cache-hit paths (the commit-hash and fingerprint cache
returns) currently bypass marking branch commits in the DAG and either invoke
markPrCommitsAnalyzed from those cache-hit paths or document the intended
branch-scoped behavior so the DAG marking semantics are explicit.
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/gitgraph/GitGraphSyncService.java`:
- Around line 41-122: syncBranchGraph is annotated `@Transactional` but performs
remote I/O via vcsClient.getCommitHistory, holding a DB transaction/connection
during the HTTP call; move the external fetch out of the transaction by
splitting into two methods: 1) a non-transactional fetch method (call it
fetchCommitsForBranch or keep syncBranchGraph but remove `@Transactional`) that
invokes vcsClient.getCommitHistory and returns List<VcsCommit>, and 2) a new
`@Transactional` method (e.g., persistBranchGraphCommits(Project, List<VcsCommit>,
String)) that contains the DB logic currently in syncBranchGraph (look for
commitNodeRepository.findByProjectIdAndCommitHashes, creation of new CommitNode
objects, commitNodeRepository.saveAll, parent linking and final saveAll). Call
the fetcher first, then call the transactional persister with the fetched
commits so the DB transaction only spans repository operations.
- Around line 58-61: The existing Collectors.toMap call in GitGraphSyncService
that builds nodeMap from existingNodes can throw IllegalStateException on
duplicate commit hashes; change the stream collection to a defensive toMap
variant that provides a merge function (e.g., keep the first or last value) so
duplicates are resolved instead of throwing (refer to
commitNodeRepository.findByProjectIdAndCommitHashes, existingNodes, and
nodeMap). Ensure the chosen merge behavior is explicit and consistent with
domain expectations (for example (existing, replacement) -> existing) so nodeMap
creation is resilient to inconsistent data.
- Around line 143-183: Remove the dead local variable oldestUnanalyzedHash from
GitGraphSyncService: delete its declaration (String oldestUnanalyzedHash =
null;) and the assignment oldestUnanalyzedHash = current.getCommitHash(); in the
BFS loop since its value is never read; rely on the existing unanalyzed list
(and its reversal) and existing log/headHash usage instead.
In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/gitgraph/CommitNodeRepository.java`:
- Around line 39-63: The three repository methods markCommitsAnalyzed,
markCommitsAnalyzedWithoutAnalysis, and markCommitsFailed perform JPQL bulk
updates and should set `@Modifying`(clearAutomatically = true) to avoid stale
entities in the persistence context; update each `@Modifying` annotation for those
methods to include clearAutomatically = true so the EntityManager automatically
clears affected entities after the bulk update.
In
`@java-ecosystem/libs/core/src/main/resources/db/migration/1.5.0/V1.5.2__add_cloned_from_analysis_id.sql`:
- Around line 4-9: The migration adds cloned_from_analysis_id to code_analysis
but lacks a self-referential FK, allowing dangling references; add a foreign key
constraint named fk_code_analysis_cloned_from on
code_analysis(cloned_from_analysis_id) referencing code_analysis(id) with ON
DELETE SET NULL to ensure referential integrity and keep the existing index
intact.
In
`@java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/bitbucket/cloud/BitbucketCloudClient.java`:
- Around line 625-719: In getCommitHistory, narrow the broad catch around
OffsetDateTime.parse(...) to catch java.time.format.DateTimeParseException
instead of Exception so only parse errors are handled (leave the log message
unchanged); locate the try/catch in the date parsing block that sets timestamp
(the OffsetDateTime.parse call) and replace the generic Exception handler with
DateTimeParseException to avoid swallowing unrelated exceptions.
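Narrowing the catch can be shown with a self-contained example; the fallback behavior below is illustrative, not the actual client logic:

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

public class DateParseDemo {
    // Only parse failures are handled; any other exception propagates,
    // which is the point of narrowing from Exception to DateTimeParseException.
    static OffsetDateTime parseOrNow(String raw) {
        try {
            return OffsetDateTime.parse(raw);
        } catch (DateTimeParseException e) {
            return OffsetDateTime.now(); // illustrative fallback
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNow("2024-05-01T10:15:30+02:00").getYear()); // prints: 2024
        System.out.println(parseOrNow("not-a-date") != null);                  // prints: true
    }
}
```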
In
`@java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/github/GitHubClient.java`:
- Around line 1044-1045: getFileContents in class GitHubClient builds a GraphQL
query by directly interpolating workspaceId and repoIdOrSlug (and similarly
branchOrCommit/expression) which risks breaking or manipulating the query if
values contain quotes; change the code to stop inlining raw identifiers and
instead either (a) use GraphQL variables for owner/name/expression/commit to
pass these values safely, or (b) if variables are not possible, apply consistent
escaping of quotes and backslashes for workspaceId, repoIdOrSlug, branchOrCommit
and the expression string before appending to the query; update the queryBuilder
usage in getFileContents accordingly and ensure any existing escape logic used
for expression is applied consistently to owner/name as well.
In
`@java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/model/VcsCommit.java`:
- Around line 6-13: VcsCommit's parentHashes field can be mutated by callers
because Java records don't defensive-copy collections; add a compact constructor
for the VcsCommit record that replaces the incoming parentHashes with an
immutable copy (e.g., this.parentHashes = List.copyOf(parentHashes)) after
validating for nulls as needed so the record always stores an unmodifiable list;
update any constructors/usage to rely on the record's compact constructor
behavior.
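The compact-constructor defensive copy looks like this in a trimmed-down record; other `VcsCommit` fields are omitted here:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for VcsCommit: the compact constructor replaces the
// incoming list with an immutable copy (List.copyOf also rejects nulls).
record VcsCommitSketch(String hash, List<String> parentHashes) {
    VcsCommitSketch {
        parentHashes = List.copyOf(parentHashes);
    }
}

public class VcsCommitDemo {
    public static void main(String[] args) {
        List<String> parents = new ArrayList<>(List.of("p1"));
        VcsCommitSketch c = new VcsCommitSketch("abc", parents);

        parents.add("p2");                    // caller mutation does not leak in
        System.out.println(c.parentHashes()); // prints: [p1]

        try {
            c.parentHashes().add("p3");       // stored list is unmodifiable
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable");  // prints: immutable
        }
    }
}
```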
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/BranchHealthScheduler.java`:
- Around line 67-69: The retryStaleBranches method in BranchHealthScheduler
currently holds a long-lived `@Transactional` across calls to
branchAnalysisProcessor.process(), which can tie up a JDBC connection; remove
the `@Transactional` from retryStaleBranches, add a small helper like
fetchStaleBranches annotated `@Transactional`(readOnly = true) that loads the list
of stale branches, then call that helper from retryStaleBranches and iterate
over the returned branches outside any transaction while invoking
branchAnalysisProcessor.process(...) (the processor can manage its own
transactions).
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java`:
- Around line 99-107: The cleanupOldEntries method currently removes
mergePrNumbers entries based on the presence of keys in recentCommitAnalyses
(not a time window), so update the comment in
WebhookDeduplicationService.cleanupOldEntries to accurately state that
mergePrNumbers are evicted when their corresponding recentCommitAnalyses key is
removed (key-existence based), or alternatively implement time-based eviction by
storing timestamps on mergePrNumbers and comparing against now and
DEDUP_WINDOW_SECONDS if you intended a wider window; reference
cleanupOldEntries, recentCommitAnalyses, mergePrNumbers and DEDUP_WINDOW_SECONDS
when making the change.
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/GitGraphController.java`:
- Around line 44-108: The controller currently loads all commits and edges into
memory (getGitGraph uses commitNodeRepository.findByProjectIdWithAnalysis and
commitNodeRepository.findEdgeHashesByProjectId) which can OOM on large repos;
modify getGitGraph to accept optional pagination/query params (e.g., page & size
or a limit N) and pass a Pageable or a capped limit into repository calls,
update/overload the repository methods to return a Page or a limited result
(e.g., findByProjectIdWithAnalysis(Pageable) or findTopNByProjectIdWithAnalysis)
and similarly limit/find edges only for the returned commits, enforce a sensible
default and hard max (e.g., default 1000, max 5000) to prevent unbounded
results, and ensure the branch/edge lookup uses the same commit subset so the
response stays consistent.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
- java-ecosystem/libs/analysis-engine/src/main/java/module-info.java
- java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java
- java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/PullRequestAnalysisProcessor.java
- java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/gitgraph/GitGraphSyncService.java
- java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessorTest.java
- java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/processor/analysis/PullRequestAnalysisProcessorTest.java
- java-ecosystem/libs/core/src/main/java/module-info.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/codeanalysis/CodeAnalysis.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/gitgraph/CommitAnalysisStatus.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/gitgraph/CommitNode.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/branch/BranchRepository.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/codeanalysis/CodeAnalysisRepository.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/gitgraph/CommitNodeRepository.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/service/CodeAnalysisService.java
- java-ecosystem/libs/core/src/main/resources/db/migration/1.5.0/V1.5.1__add_analysis_status_to_commit_node.sql
- java-ecosystem/libs/core/src/main/resources/db/migration/1.5.0/V1.5.2__add_cloned_from_analysis_id.sql
- java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/VcsClient.java
- java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/bitbucket/cloud/BitbucketCloudClient.java
- java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/github/GitHubClient.java
- java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/model/VcsCommit.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/webhookhandler/BitbucketCloudBranchWebhookHandler.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/BranchHealthScheduler.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPrMergeWebhookHandler.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPullRequestWebhookHandler.java
- java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/GitGraphController.java
💤 Files with no reviewable changes (1)
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPrMergeWebhookHandler.java
```java
List<BranchIssue> autoResolved = new ArrayList<>();
List<BranchIssue> needsAiReconciliation = new ArrayList<>();

for (BranchIssue issue : candidateBranchIssues) {
    String filePath = issue.getCodeAnalysisIssue() != null
            ? issue.getCodeAnalysisIssue().getFilePath()
            : null;

    if (filePath != null && !filesExistingInBranch.contains(filePath)) {
        // File was deleted from the branch — auto-resolve
        autoResolved.add(issue);
    } else {
        // File still exists — needs AI to determine if issue persists
        needsAiReconciliation.add(issue);
    }
}
```
Guard against null CodeAnalysisIssue before sending to AI.
When issue.getCodeAnalysisIssue() is null, filePath will be null (lines 620-622), the issue falls through to needsAiReconciliation (line 629), and then BranchIssue::getCodeAnalysisIssue (line 662) produces a null element in candidateIssues. This null propagates to tempAnalysis.setIssues(...) and could cause NPEs downstream.
🛡️ Proposed fix — filter nulls
```diff
 List<CodeAnalysisIssue> candidateIssues = needsAiReconciliation.stream()
     .map(BranchIssue::getCodeAnalysisIssue)
+    .filter(Objects::nonNull)
     .toList();
```

Also applies to: 660-663
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java`
around lines 616 - 631, The loop over candidateBranchIssues adds issues with a
null CodeAnalysisIssue into needsAiReconciliation and later into candidateIssues
-> tempAnalysis.setIssues, causing NPEs; update the logic in
BranchAnalysisProcessor to filter out or exclude BranchIssue instances where
issue.getCodeAnalysisIssue() == null before adding to needsAiReconciliation (and
likewise when building candidateIssues) so only issues with non-null
getCodeAnalysisIssue() are sent to AI or set on tempAnalysis; reference
BranchIssue, getCodeAnalysisIssue(), candidateBranchIssues, autoResolved,
needsAiReconciliation, candidateIssues, and tempAnalysis.setIssues when making
the change.
```java
for (VcsCommit commit : commits) {
    CommitNode node = nodeMap.get(commit.hash());
    if (node != null && commit.parentHashes() != null) {
        for (String parentHash : commit.parentHashes()) {
            CommitNode parentNode = nodeMap.get(parentHash);
            if (parentNode == null) {
                // Parent might not be in the fetched limit, try to find it in DB
                parentNode = commitNodeRepository.findByProjectIdAndCommitHash(project.getId(), parentHash).orElse(null);
                if (parentNode != null) {
                    nodeMap.put(parentHash, parentNode);
                }
            }
            if (parentNode != null && !node.getParents().contains(parentNode)) {
                node.getParents().add(parentNode);
                edgesUpdated = true;
            }
        }
    }
}

if (edgesUpdated) {
    commitNodeRepository.saveAll(nodeMap.values());
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
fd 'CommitNode.java' --type f
```

Repository: rostilos/CodeCrow
Length of output: 157
🏁 Script executed:
```shell
cat -n java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/gitgraph/CommitNode.java
```

Repository: rostilos/CodeCrow
Length of output: 6972
node.getParents().contains(parentNode) — entity equality using identity-based comparison.
CommitNode does not override equals() or hashCode(), so it falls back to Java's default identity-based comparison. The Set.contains() check on line 98 relies on object reference equality, not logical equality. If the same database record is loaded as different object instances (e.g., through separate repository queries), the set will not detect it as a duplicate, potentially allowing duplicate parent edges to be added to the collection. Consider overriding equals() and hashCode() in CommitNode to compare by primary key or business identity (e.g., id and commitHash).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/gitgraph/GitGraphSyncService.java`
around lines 86 - 108, The contains() check on node.getParents() uses identity
equality because CommitNode lacks equals()/hashCode(), so duplicate parent edges
can be added; implement equals() and hashCode() in the CommitNode entity to use
its persistent identity (e.g., id and/or commitHash) so logical equality is
used, then the existing check in the loop (inside GitGraphSyncService iterating
commits and using node.getParents().contains(parentNode)) will correctly detect
duplicates before adding and commitNodeRepository.saveAll(nodeMap.values()) will
persist the proper graph without duplicate edges.
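The failure mode described above is easy to reproduce with a plain HashSet. The sketch below uses a simplified stand-in class (`Node` is hypothetical, not the actual JPA entity) to show how identity-based equality lets two copies of the same row coexist in the set:

```java
import java.util.HashSet;
import java.util.Set;

// Stand-in for the entity: no equals/hashCode override, so HashSet falls back
// to reference identity and treats two copies of the same row as distinct.
class Node {
    final long id;
    final String commitHash;
    Node(long id, String commitHash) { this.id = id; this.commitHash = commitHash; }
}

public class IdentityEqualityDemo {
    public static void main(String[] args) {
        Set<Node> parents = new HashSet<>();
        parents.add(new Node(1L, "abc123"));           // loaded via nodeMap
        Node sameRow = new Node(1L, "abc123");         // same row, re-loaded from the repository
        System.out.println(parents.contains(sameRow)); // false -> duplicate edge gets added
        parents.add(sameRow);
        System.out.println(parents.size());            // 2, though both represent one commit
    }
}
```

With equals/hashCode defined over the business key, the second `add` would be a no-op and the `contains` guard in GitGraphSyncService would work as intended.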
```java
// Git Graph
exports org.rostilos.codecrow.core.model.gitgraph;
exports org.rostilos.codecrow.core.persistence.repository.gitgraph;
opens org.rostilos.codecrow.core.model.gitgraph to org.hibernate.orm.core, spring.beans, spring.context, spring.core;
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Search for CommitNode or CommitAnalysisStatus references outside the core module
# This should find usages in the analysisengine module if they exist
rg -rn --type java 'CommitNode|CommitAnalysisStatus' --glob '!**/libs/core/**'
```

Repository: rostilos/CodeCrow
Length of output: 7750
🏁 Script executed:
```shell
#!/bin/bash
# Check the actual entity class name in the gitgraph model package
find java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/gitgraph -name "*.java" -type f | head -20
```

Repository: rostilos/CodeCrow
Length of output: 264
🏁 Script executed:
```shell
#!/bin/bash
# Read the CommitNode class definition to verify the actual class name
cat java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/gitgraph/CommitNode.java | head -30
```

Repository: rostilos/CodeCrow
Length of output: 999
Missing org.rostilos.codecrow.analysisengine in the opens directive for model.gitgraph.
Every other model package that the analysis engine touches is explicitly opened to org.rostilos.codecrow.analysisengine:
- model.project → includes org.rostilos.codecrow.analysisengine (Line 41)
- model.pullrequest → includes org.rostilos.codecrow.analysisengine (Line 74)
- model.analysis → includes org.rostilos.codecrow.analysisengine (Line 75)
The analysis engine module directly references CommitNode in GitGraphSyncService, BranchAnalysisProcessor, and PullRequestAnalysisProcessor. It creates instances, accesses fields via getters (e.g., getCommitHash(), isAnalyzed(), getParents()), and manipulates them in collections. Without opening model.gitgraph to org.rostilos.codecrow.analysisengine, reflective field access will throw InaccessibleObjectException at runtime.
🔧 Proposed fix
```diff
-opens org.rostilos.codecrow.core.model.gitgraph to org.hibernate.orm.core, spring.beans, spring.context, spring.core;
+opens org.rostilos.codecrow.core.model.gitgraph to org.hibernate.orm.core, spring.beans, spring.context, spring.core, org.rostilos.codecrow.analysisengine;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
opens org.rostilos.codecrow.core.model.gitgraph to org.hibernate.orm.core, spring.beans, spring.context, spring.core, org.rostilos.codecrow.analysisengine;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@java-ecosystem/libs/core/src/main/java/module-info.java` at line 105, Add the
missing opens for the package used reflectively by the analysis engine: update
module-info.java so the opens directive for
org.rostilos.codecrow.core.model.gitgraph includes the
org.rostilos.codecrow.analysisengine module; this ensures reflective access to
CommitNode (and its getters like getCommitHash(), isAnalyzed(), getParents())
used by GitGraphSyncService, BranchAnalysisProcessor, and
PullRequestAnalysisProcessor succeeds at runtime.
```java
public class CommitNode {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(nullable = false, updatable = false)
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "project_id", nullable = false)
    private Project project;

    @Column(name = "commit_hash", nullable = false, length = 40)
    private String commitHash;

    @Column(name = "author_name")
    private String authorName;

    @Column(name = "author_email")
    private String authorEmail;

    @Column(name = "commit_message", columnDefinition = "TEXT")
    private String commitMessage;

    @Column(name = "commit_timestamp")
    private OffsetDateTime commitTimestamp;

    @Column(name = "created_at", nullable = false, updatable = false)
    private OffsetDateTime createdAt = OffsetDateTime.now();

    // ── Analysis tracking ──────────────────────────────────────────────

    @Enumerated(EnumType.STRING)
    @Column(name = "analysis_status", nullable = false, length = 20)
    private CommitAnalysisStatus analysisStatus = CommitAnalysisStatus.NOT_ANALYZED;

    @Column(name = "analyzed_at")
    private OffsetDateTime analyzedAt;

    /**
     * The CodeAnalysis that covered this commit. Null until analysisStatus = ANALYZED.
     * Multiple commits may point to the same analysis (e.g. a delta diff covering 5 commits).
     */
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "analysis_id")
    private CodeAnalysis analysis;

    // ── DAG edges ──────────────────────────────────────────────────────

    @ManyToMany(fetch = FetchType.LAZY)
    @JoinTable(
        name = "git_commit_edge",
        joinColumns = @JoinColumn(name = "child_commit_id"),
        inverseJoinColumns = @JoinColumn(name = "parent_commit_id")
    )
    private Set<CommitNode> parents = new HashSet<>();
```
🛠️ Refactor suggestion | 🟠 Major
Missing equals/hashCode — important for Set<CommitNode> parents.
CommitNode instances are stored in a HashSet (the parents field), but the entity relies on default Object identity for equality. Two detached instances representing the same DB row won't be considered equal, which can cause duplicates in the set and Hibernate dirty-checking anomalies.
Implement equals/hashCode based on the natural key (project, commitHash) or the business identifier commitHash (within a project scope):
Suggested implementation
```java
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    CommitNode that = (CommitNode) o;
    return commitHash != null && commitHash.equals(that.commitHash)
            && project != null && project.getId() != null
            && project.getId().equals(that.getProject() != null ? that.getProject().getId() : null);
}

@Override
public int hashCode() {
    return commitHash != null ? commitHash.hashCode() : 0;
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/gitgraph/CommitNode.java`
around lines 25 - 79, CommitNode instances are put into a HashSet (parents) but
rely on Object identity, causing duplicates and Hibernate issues for detached
entities; implement equals and hashCode on the CommitNode class using the
business key (project and commitHash) so two instances representing the same DB
row compare equal—use commitHash (and project.getId() / project) to compute
equality and a stable hashCode, guard for nulls and class equality, and ensure
equals/hashCode are declared on CommitNode (referencing the parents field,
project, commitHash, and getProject/getId where needed).
```sql
UPDATE git_commit_node cn
SET analysis_status = 'ANALYZED',
    analyzed_at = ca.created_at,
    analysis_id = ca.id
FROM code_analysis ca
WHERE cn.commit_hash = ca.commit_hash
  AND cn.project_id = ca.project_id
  AND cn.analysis_status = 'NOT_ANALYZED';
```
Back-fill UPDATE is non-deterministic when multiple analyses match a commit.
The code_analysis table's unique constraint is on (project_id, commit_hash, pr_number), so there can be multiple analyses for the same (project_id, commit_hash) with different pr_number values. PostgreSQL's UPDATE ... FROM picks an arbitrary matching row when the join is one-to-many. This means the analysis_id could be set to any one of the matching analyses unpredictably.
Consider adding a subquery to deterministically pick, e.g., the latest analysis:
Proposed fix
```diff
-UPDATE git_commit_node cn
-SET analysis_status = 'ANALYZED',
-    analyzed_at = ca.created_at,
-    analysis_id = ca.id
-FROM code_analysis ca
-WHERE cn.commit_hash = ca.commit_hash
-  AND cn.project_id = ca.project_id
-  AND cn.analysis_status = 'NOT_ANALYZED';
+UPDATE git_commit_node cn
+SET analysis_status = 'ANALYZED',
+    analyzed_at = ca.created_at,
+    analysis_id = ca.id
+FROM (
+    SELECT DISTINCT ON (project_id, commit_hash)
+        id, project_id, commit_hash, created_at
+    FROM code_analysis
+    ORDER BY project_id, commit_hash, created_at DESC
+) ca
+WHERE cn.commit_hash = ca.commit_hash
+  AND cn.project_id = ca.project_id
+  AND cn.analysis_status = 'NOT_ANALYZED';
```

📝 Committable suggestion
+ AND cn.analysis_status = 'NOT_ANALYZED';📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sql
UPDATE git_commit_node cn
SET analysis_status = 'ANALYZED',
    analyzed_at = ca.created_at,
    analysis_id = ca.id
FROM (
    SELECT DISTINCT ON (project_id, commit_hash)
        id, project_id, commit_hash, created_at
    FROM code_analysis
    ORDER BY project_id, commit_hash, created_at DESC
) ca
WHERE cn.commit_hash = ca.commit_hash
  AND cn.project_id = ca.project_id
  AND cn.analysis_status = 'NOT_ANALYZED';
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/libs/core/src/main/resources/db/migration/1.5.0/V1.5.1__add_analysis_status_to_commit_node.sql`
around lines 19 - 26, The UPDATE is non-deterministic because UPDATE ... FROM
can pick an arbitrary matching code_analysis row when multiple rows share
(project_id, commit_hash); change the query to join to a deterministic subquery
that picks one analysis per (project_id, commit_hash) (e.g., the latest by
created_at or the max id). Replace the FROM code_analysis ca with a derived
table such as (SELECT DISTINCT ON (project_id, commit_hash) * FROM code_analysis
ORDER BY project_id, commit_hash, created_at DESC) latest_ca (or a GROUP BY
selecting MAX(created_at) and corresponding id) and join git_commit_node cn to
latest_ca on project_id and commit_hash, then SET analysis_status='ANALYZED',
analyzed_at=latest_ca.created_at, analysis_id=latest_ca.id where
cn.analysis_status='NOT_ANALYZED'.
```java
 * backoff based on the number of consecutive failures.
 *
 * Backoff strategy: consecutiveFailures * BASE_BACKOFF_MINUTES
```
Documentation says "exponential backoff" but the implementation is linear.
Lines 26 and 97 say "exponential backoff" but the actual formula at Line 137 is consecutiveFailures * BASE_BACKOFF_MINUTES, which is linear (10m, 20m, 30m…). True exponential backoff would be something like BASE_BACKOFF_MINUTES * 2^(failures-1) (10m, 20m, 40m, 80m…). The current linear scheme is fine, but the comments should match the implementation.
Also applies to: 97-98
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/BranchHealthScheduler.java`
around lines 26 - 28, The Javadoc/comment in BranchHealthScheduler incorrectly
calls the strategy "exponential backoff" while the code uses a linear formula;
update all occurrences (the class comment and the method/comment near where
BASE_BACKOFF_MINUTES and consecutiveFailures are referenced) to describe a
linear backoff (e.g., "linear backoff: consecutiveFailures *
BASE_BACKOFF_MINUTES") and make the comment explicitly match the implementation
that uses consecutiveFailures * BASE_BACKOFF_MINUTES.
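The divergence between the two schedules is easy to see side by side. The sketch below is illustrative only (class and method names are made up; only the linear formula mirrors the reviewed code, and the 10-minute base is an assumption from the comment's examples):

```java
public class BackoffDemo {
    // Assumed base delay, matching the 10m examples in the review comment.
    static final long BASE_BACKOFF_MINUTES = 10;

    // What the scheduler actually does: 10m, 20m, 30m, ...
    static long linearBackoff(int consecutiveFailures) {
        return consecutiveFailures * BASE_BACKOFF_MINUTES;
    }

    // What the Javadoc claims: 10m, 20m, 40m, 80m, ...
    static long exponentialBackoff(int consecutiveFailures) {
        return BASE_BACKOFF_MINUTES * (1L << (consecutiveFailures - 1));
    }

    public static void main(String[] args) {
        for (int f = 1; f <= 5; f++) {
            System.out.printf("failures=%d linear=%dm exponential=%dm%n",
                    f, linearBackoff(f), exponentialBackoff(f));
        }
    }
}
```

By failure 5 the linear schedule waits 50 minutes while the exponential one waits 160, which is why the doc wording matters for anyone tuning retry budgets.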
```java
request.analysisType = AnalysisType.BRANCH_ANALYSIS;
// No sourcePrNumber — this is a scheduled retry, not triggered by a PR merge

branchAnalysisProcessor.process(request, null);
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Find BranchAnalysisProcessor.java
fd "BranchAnalysisProcessor.java" --type f
```

Repository: rostilos/CodeCrow
Length of output: 195
🏁 Script executed:
```shell
#!/bin/bash
# Read the BranchAnalysisProcessor.java file
cat -n "java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java"
```

Repository: rostilos/CodeCrow
Length of output: 50373
🏁 Script executed:
```shell
#!/bin/bash
# Check the acquireLockWithWait method to see if it null-guards the consumer
fd "AnalysisLockService.java" --type f
```

Repository: rostilos/CodeCrow
Length of output: 180
🏁 Script executed:
```shell
#!/bin/bash
# Read the AnalysisLockService.java to check acquireLockWithWait method
wc -l "java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/AnalysisLockService.java"
```

Repository: rostilos/CodeCrow
Length of output: 184
🏁 Script executed:
```shell
#!/bin/bash
# Read the AnalysisLockService.java file
cat -n "java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/AnalysisLockService.java"
```

Repository: rostilos/CodeCrow
Length of output: 17352
Add null guard for consumer parameter to prevent NPE.
The process() method uses consumer.accept() at lines 158, 172, 204, 235, 323, 654, 858, and 684 without checking if the consumer is null. Since the caller (BranchHealthScheduler.java:166) passes null, the first invocation will throw NullPointerException. Add a null check before each consumer.accept() call, or wrap them like:
```java
if (consumer != null) {
    consumer.accept(Map.of(...));
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/BranchHealthScheduler.java`
at line 166, The call branchAnalysisProcessor.process(request, null) passes a
null consumer which causes a NullPointerException when the process(...)
implementation calls consumer.accept(...); update the process(...)
implementation (the method named process in the branch analysis processor class
used by BranchHealthScheduler) to guard every consumer.accept(...) call
(locations noted where consumer.accept is used) by checking consumer != null
before calling accept (or use Optional.ofNullable(consumer).ifPresent(c ->
c.accept(...))); ensure each invocation at the listed sites uses this null-check
so passing null from BranchHealthScheduler no longer throws.
```java
private WebhookResult handlePrMergeEvent(
        WebhookPayload payload,
        Project project,
        Consumer<Map<String, Object>> eventConsumer
) {
    log.info("Handling GitHub PR merge event for project {}", project.getId());

    try {
        String validationError = validateProjectConnections(project);
        if (validationError != null) {
            log.warn("Project {} validation failed: {}", project.getId(), validationError);
            return WebhookResult.error(validationError);
        }

        if (!project.isBranchAnalysisEnabled()) {
            log.info("Branch analysis is disabled for project {}", project.getId());
            return WebhookResult.ignored("Branch analysis is disabled for this project");
        }

        String targetBranch = payload.targetBranch();
        if (!shouldAnalyze(project, targetBranch, AnalysisType.BRANCH_ANALYSIS)) {
            log.info("Skipping branch reconciliation: target branch '{}' does not match configured patterns for project {}",
                    targetBranch, project.getId());
            return WebhookResult.ignored("Target branch '" + targetBranch + "' does not match configured analysis patterns");
        }

        String mergeCommitSha = payload.rawPayload().path("pull_request").path("merge_commit_sha").asText(null);
        Long prNumber = Long.parseLong(payload.pullRequestId());

        // Fallback to commit hash if merge_commit_sha not available
        String commitHash = mergeCommitSha != null ? mergeCommitSha : payload.commitHash();

        if (commitHash == null) {
            log.warn("No commit hash available for PR merge event");
            return WebhookResult.error("No commit hash available");
        }

        BranchProcessRequest request = new BranchProcessRequest();
        request.projectId = project.getId();
        request.targetBranchName = targetBranch;
        request.commitHash = commitHash;
        request.analysisType = AnalysisType.BRANCH_ANALYSIS;
        request.sourcePrNumber = prNumber;

        log.info("Processing branch reconciliation after PR merge: project={}, branch={}, commit={}, PR={}",
                project.getId(), targetBranch, commitHash, prNumber);

        Consumer<Map<String, Object>> processorConsumer = event -> {
            if (eventConsumer != null) {
                eventConsumer.accept(event);
            }
        };

        Map<String, Object> result = branchAnalysisProcessor.process(request, processorConsumer);

        return WebhookResult.success("Branch reconciliation completed after PR #" + prNumber + " merge", result);

    } catch (Exception e) {
        log.error("Branch reconciliation failed for project {}", project.getId(), e);
        return WebhookResult.error("Branch reconciliation failed: " + e.getMessage());
    }
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Find BranchAnalysisProcessor class
find . -type f -name "*BranchAnalysisProcessor*" | head -20
```

Repository: rostilos/CodeCrow
Length of output: 338
🏁 Script executed:
```shell
#!/bin/bash
# Search for the BranchAnalysisProcessor class definition and its process method
rg -n "class BranchAnalysisProcessor" --type=java -A 50
```

Repository: rostilos/CodeCrow
Length of output: 17739
🏁 Script executed:
```shell
#!/bin/bash
# Search for locking or idempotency logic in any BranchAnalysisProcessor
rg -n "Lock|Dedup|idempotent|synchronized|ReentrantLock|Semaphore" --type=java | grep -i "branch"
```

Repository: rostilos/CodeCrow
Length of output: 23714
🏁 Script executed:
```shell
#!/bin/bash
# Check the current file for analysisLockService usage
rg -n "analysisLockService" java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPullRequestWebhookHandler.java
```

Repository: rostilos/CodeCrow
Length of output: 454
🏁 Script executed:
```shell
#!/bin/bash
# Check for WebhookDeduplicationService in the entire codebase
rg -l "WebhookDeduplicationService" --type=java
```

Repository: rostilos/CodeCrow
Length of output: 352
🏁 Script executed:
```shell
#!/bin/bash
# Get file size first
wc -l java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java
```

Repository: rostilos/CodeCrow
Length of output: 199
🏁 Script executed:
```shell
#!/bin/bash
# Find the process method in BranchAnalysisProcessor
rg -n "public.*process\(" java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java -A 30
```

Repository: rostilos/CodeCrow
Length of output: 1618
🏁 Script executed:
```shell
#!/bin/bash
# Check the specific lines of the GitHub handler where handlePrMergeEvent is called
rg -n "handlePrMergeEvent\|branchAnalysisProcessor.process" java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPullRequestWebhookHandler.java -B 3 -A 3
```

Repository: rostilos/CodeCrow
Length of output: 43
🏁 Script executed:
```shell
#!/bin/bash
# Check the Bitbucket handler for comparison
cat -n java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/webhookhandler/BitbucketCloudBranchWebhookHandler.java | head -150
```

Repository: rostilos/CodeCrow
Length of output: 8814
Add webhook-level deduplication like the Bitbucket handler has.
The PR analysis path uses analysisLockService.acquireLock(), and the Bitbucket handler uses WebhookDeduplicationService to deduplicate at the webhook level before calling branchAnalysisProcessor.process(). While BranchAnalysisProcessor.process() does acquire a lock internally (line 123), the merge-event path here lacks webhook-level deduplication. If GitHub retries the webhook (e.g., on a 5xx or timeout), duplicate process() calls could queue up separate analyses even though only one will hold the lock at a time. This differs from Bitbucket's approach, which filters duplicate commits at the webhook level (see BitbucketCloudBranchWebhookHandler lines 106-110).
Consider adding WebhookDeduplicationService checks here to skip duplicate commit analyses before calling branchAnalysisProcessor.process(), similar to the Bitbucket handler.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPullRequestWebhookHandler.java`
around lines 290 - 351, The GitHub PR merge handler (handlePrMergeEvent) lacks
webhook-level deduplication before invoking branchAnalysisProcessor.process;
integrate the WebhookDeduplicationService similarly to
BitbucketCloudBranchWebhookHandler by checking if the commit is already
being/was processed and short-circuiting with an ignored WebhookResult when
duplicate, and mark the commit as in-flight before calling
branchAnalysisProcessor.process(BranchProcessRequest, ...), ensuring you use the
same deduplication key (project id + commit hash + AnalysisType.BRANCH_ANALYSIS
or PR number) and release/cleanup the deduplication entry after processing or on
exception so retries don’t queue duplicate analyses.
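A framework-free sketch of the kind of guard this comment suggests follows. The real WebhookDeduplicationService API almost certainly differs; the key format, TTL, and class name here are all assumptions for illustration:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal in-memory dedup guard: the first caller for a key wins, repeats
// within the TTL are rejected, and entries can be released on completion
// or failure so a retry of a genuinely failed delivery can run again.
public class DedupGuard {
    private final Map<String, Instant> inFlight = new ConcurrentHashMap<>();
    private final Duration ttl;

    public DedupGuard(Duration ttl) {
        this.ttl = ttl;
    }

    /** Returns true if the caller acquired the key (i.e. not a duplicate). */
    public boolean tryAcquire(String key) {
        Instant now = Instant.now();
        // Atomically claim the key unless a non-expired entry already exists.
        Instant mapped = inFlight.merge(key, now,
                (existing, candidate) -> existing.plus(ttl).isBefore(candidate) ? candidate : existing);
        return mapped == now;
    }

    /** Release after processing so later webhooks for the same commit are not blocked forever. */
    public void release(String key) {
        inFlight.remove(key);
    }

    public static void main(String[] args) {
        DedupGuard guard = new DedupGuard(Duration.ofMinutes(10));
        // Hypothetical key mirroring the review's suggestion: projectId + commit + analysis type.
        String key = "42:abc123:BRANCH_ANALYSIS";
        System.out.println(guard.tryAcquire(key)); // first delivery -> true
        System.out.println(guard.tryAcquire(key)); // GitHub retry   -> false
        guard.release(key);
        System.out.println(guard.tryAcquire(key)); // after release  -> true
    }
}
```

In the handler, a `false` from `tryAcquire` would map to `WebhookResult.ignored(...)` before `branchAnalysisProcessor.process(...)` is ever invoked, matching the Bitbucket handler's behavior.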
```java
String mergeCommitSha = payload.rawPayload().path("pull_request").path("merge_commit_sha").asText(null);
Long prNumber = Long.parseLong(payload.pullRequestId());
```
Long.parseLong(payload.pullRequestId()) can throw NumberFormatException if pullRequestId is null or non-numeric.
The Bitbucket handler (line 100) wraps the same pattern in a try/catch (NumberFormatException), but here it's unguarded. While pullRequestId should always be present for a pull_request event, webhook payloads are external input and defensive parsing is warranted — especially since a failure here would bubble up as an unhandled exception, masking the real issue in logs.
🛡️ Proposed fix
```diff
 String mergeCommitSha = payload.rawPayload().path("pull_request").path("merge_commit_sha").asText(null);
-Long prNumber = Long.parseLong(payload.pullRequestId());
+Long prNumber;
+try {
+    prNumber = Long.parseLong(payload.pullRequestId());
+} catch (NumberFormatException | NullPointerException e) {
+    log.warn("Could not parse PR number from merge event payload: {}", payload.pullRequestId());
+    return WebhookResult.error("Invalid or missing PR number in merge event");
+}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
String mergeCommitSha = payload.rawPayload().path("pull_request").path("merge_commit_sha").asText(null);
Long prNumber;
try {
    prNumber = Long.parseLong(payload.pullRequestId());
} catch (NumberFormatException | NullPointerException e) {
    log.warn("Could not parse PR number from merge event payload: {}", payload.pullRequestId());
    return WebhookResult.error("Invalid or missing PR number in merge event");
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPullRequestWebhookHandler.java`
around lines 316 - 318, The unguarded Long.parseLong(payload.pullRequestId()) in
GitHubPullRequestWebhookHandler can throw NumberFormatException for
null/non-numeric pullRequestId; wrap the parse in a
try/catch(NumberFormatException) (mirroring the Bitbucket handler pattern) and
defensively handle nulls from payload.pullRequestId() before parsing, logging a
clear error via the existing logger and aborting/returning from further
processing (or using a safe fallback) so prNumber and subsequent logic do not
throw an unhandled exception; refer to the mergeCommitSha and prNumber variables
and payload.pullRequestId() call to locate and update the code.
```java
@RestController
@RequestMapping("/api/v1/projects/{projectId}/git-graph")
public class GitGraphController {

    private final CommitNodeRepository commitNodeRepository;
    private final BranchRepository branchRepository;

    public GitGraphController(CommitNodeRepository commitNodeRepository,
                              BranchRepository branchRepository) {
        this.commitNodeRepository = commitNodeRepository;
        this.branchRepository = branchRepository;
    }

    @GetMapping
    @Transactional(readOnly = true)
    public ResponseEntity<Map<String, Object>> getGitGraph(@PathVariable Long projectId) {
```
🧩 Analysis chain
🏁 Script executed:
rg -n "@PreAuthorize|@Secured|@RolesAllowed" --type java -g '*Controller.java' -C 2Repository: rostilos/CodeCrow
Length of output: 10568
🏁 Script executed:
find . -name "GitGraphController.java" -type f -exec cat -n {} +Repository: rostilos/CodeCrow
Length of output: 5966
🏁 Script executed:
find . -path "*analysis/controller/*.java" -type f | head -20Repository: rostilos/CodeCrow
Length of output: 449
🏁 Script executed:
rg -n "@PreAuthorize|@Secured|@RolesAllowed" --type java -g '*AnalysisIssueController.java|*PullRequestController.java' -C 2Repository: rostilos/CodeCrow
Length of output: 209
🏁 Script executed:
rg -n "@PreAuthorize|@Secured|@RolesAllowed" --type java java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/ -C 2Repository: rostilos/CodeCrow
Length of output: 43
🏁 Script executed:
cat -n ./java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/AnalysisIssueController.java | head -60
Repository: rostilos/CodeCrow
Length of output: 3309
🏁 Script executed:
cat -n ./java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/PullRequestController.java | head -60
Repository: rostilos/CodeCrow
Length of output: 3671
Add authorization check to the endpoint.
The getGitGraph endpoint lacks @PreAuthorize, @Secured, or equivalent authorization annotation. Other controllers in the codebase consistently use @PreAuthorize("isAuthenticated()") or custom annotations like @IsWorkspaceMember to guard endpoints. Without access control, this endpoint exposes git-graph data (commits, branches, analysis results) for any projectId to any caller who knows it.
Add an appropriate authorization check, such as @IsWorkspaceMember (if the project is workspace-scoped) or @PreAuthorize("@projectSecurity.canAccess(#projectId, authentication)") to enforce access control.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/GitGraphController.java`
around lines 31 - 46, The getGitGraph endpoint in GitGraphController currently
has no authorization; add an authorization annotation to enforce access control
(e.g., annotate the getGitGraph method or the GitGraphController class with
`@IsWorkspaceMember` if the resource is workspace-scoped, or use
`@PreAuthorize`("@projectSecurity.canAccess(`#projectId`, authentication)" /
`@PreAuthorize`("isAuthenticated()") as appropriate) so that the
getGitGraph(`@PathVariable` Long projectId) method is guarded before executing
repository calls to commitNodeRepository and branchRepository.
…e events from Bitbucket webhooks
Actionable comments posted: 3
♻️ Duplicate comments (1)
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java (1)
851-895: Null `CodeAnalysisIssue` still propagates into `candidateIssues` despite the new null-check.
The new guard at lines 852–854 correctly extracts `filePath` without an NPE, but when `getCodeAnalysisIssue()` is `null`, `filePath` is `null` and the condition `filePath != null && !filesExistingInBranch.contains(filePath)` is `false` — so the issue lands in `needsAiReconciliation` (line 861). The stream at lines 893–895 then maps these to `null`, producing a `null` element in `candidateIssues` passed to `tempAnalysis.setIssues(...)`, which can cause an NPE downstream. Adding `.filter(Objects::nonNull)` to the stream (as suggested in the previous review) is still needed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java` around lines 851 - 895, The stream mapping needsAiReconciliation.stream().map(BranchIssue::getCodeAnalysisIssue).toList() can produce null entries because BranchIssue.getCodeAnalysisIssue() may be null; update the pipeline in BranchAnalysisProcessor to filter out null CodeAnalysisIssue instances (e.g., add .map(BranchIssue::getCodeAnalysisIssue).filter(Objects::nonNull) before collecting) so candidateIssues contains no nulls before passing to tempAnalysis.setIssues(...) and downstream code.
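The guard the review asks for reads naturally as a filter step; a Python sketch of the fix's intent (attribute names are illustrative, the project code is the Java stream above):

```python
def non_null_code_issues(branch_issues):
    """Collect each branch issue's code-analysis issue, dropping None entries
    so downstream consumers never receive a null element."""
    return [
        b.code_analysis_issue
        for b in branch_issues
        if b.code_analysis_issue is not None
    ]
```

This mirrors adding `.filter(Objects::nonNull)` between the `map` and the terminal `toList()` in the Java stream.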
🧹 Nitpick comments (2)
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java (1)
87-94: Consider declaring `mergePrNumbers` alongside the other fields at the top of the class.
The field is declared in the middle of the class between two method blocks (after `isDuplicateBranchEvent`, before `isDuplicateCommitAnalysis`). Moving it next to `recentCommitAnalyses` (line 42) would improve readability and keep all field declarations grouped.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java` around lines 87 - 94, The mergePrNumbers field is declared mid-class between methods (after isDuplicateBranchEvent) which breaks grouping of fields; move the declaration of private final Map<String, Long> mergePrNumbers = new ConcurrentHashMap<>() so it sits with the other fields (e.g., alongside recentCommitAnalyses) at the top of the class, keeping all field declarations together and leaving isDuplicateBranchEvent and isDuplicateCommitAnalysis methods unchanged.
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java (1)
659-668: Consider batching `save` calls in the per-file issue loop.
Each updated `CodeAnalysisIssue` is persisted individually; for files with many issues this means one DB round-trip per updated issue.
♻️ Proposed refactor — collect modified issues, then `saveAll`
+        List<CodeAnalysisIssue> toSave = new ArrayList<>();
         for (CodeAnalysisIssue issue : issues) {
             if (issue.getLineNumber() == null || issue.getLineNumber() <= 0) continue;
             int newLine = mapLineNumber(issue.getLineNumber(), hunks, fileDiff);
             if (newLine != issue.getLineNumber()) {
                 issue.setLineNumber(newLine);
-                codeAnalysisIssueRepository.save(issue);
+                toSave.add(issue);
                 updated++;
             }
         }
+        if (!toSave.isEmpty()) {
+            codeAnalysisIssueRepository.saveAll(toSave);
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java` around lines 659 - 668, In BranchAnalysisProcessor's loop that calls mapLineNumber for each CodeAnalysisIssue, avoid calling codeAnalysisIssueRepository.save(issue) per-issue; instead collect modified issues into a list (e.g., modifiedIssues) as you update their lineNumber and after the per-file issue loop call codeAnalysisIssueRepository.saveAll(modifiedIssues) (or saveAllAndFlush if immediate flush is required) to batch DB writes and reduce round-trips.
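The collect-then-bulk-save pattern this nitpick describes can be sketched in Python (attribute names are hypothetical; the real code is the Java loop in BranchAnalysisProcessor):

```python
def remap_and_collect(issues, map_line):
    """Remap issue line numbers in memory; return only the modified issues
    so the caller can persist them with a single bulk save."""
    modified = []
    for issue in issues:
        if issue.line_number is None or issue.line_number <= 0:
            continue  # skip issues without a usable line number
        new_line = map_line(issue.line_number)
        if new_line != issue.line_number:
            issue.line_number = new_line
            modified.append(issue)
    return modified
```

The caller then issues one bulk write for `modified` (the `saveAll` of the proposed refactor) instead of one write per changed issue.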
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java`:
- Around line 804-822: In mapLineInsideHunk (BranchAnalysisProcessor) the
backslash meta-line ("\ No newline at end of file") is being treated as a
context line and increments oldCursor/newCursor; update the loop so when
inTargetHunk and a line startsWith("\\") you skip it (continue) without changing
oldCursor or newCursor; this preserves correct mapping for subsequent lines and
keeps handling for "-" (deletions), "+" (additions), and context lines unchanged
for variables oldCursor, newCursor, and targetOldLine.
In
`@java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/bitbucket/cloud/actions/GetCommitRangeDiffAction.java`:
- Around line 48-51: The log statement in GetCommitRangeDiffAction uses
baseCommitHash and headCommitHash without null checks, which can throw NPEs when
either is null; update the logging to use a null-safe shortening helper or
inline ternary checks (e.g., if hash==null then "null" else hash.length()>=7 ?
hash.substring(0,7) : hash) for both baseCommitHash and headCommitHash before
passing to log.info so the log never calls .length() or .substring() on a null
value; ensure the same null-safe logic is applied wherever these shortened
hashes are constructed for logging.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java`:
- Around line 147-150: The current cleanup removes mergePrNumbers entries if
their key is not in recentCommitAnalyses, which can evict entries created by
recordMergePrNumber before isDuplicateCommitAnalysis ever populates
recentCommitAnalyses; change mergePrNumbers to maintain its own timestamps
(e.g., a parallel Map<String,Long> or a small wrapper object for entries) and
update recordMergePrNumber to set the timestamp when inserting; then modify the
cleanup logic (the removeIf block) to age out mergePrNumbers by comparing the
stored timestamp against a configured TTL instead of checking
recentCommitAnalyses, ensuring entries live long enough even if
recentCommitAnalyses is populated later.
---
Duplicate comments:
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java`:
- Around line 851-895: The stream mapping
needsAiReconciliation.stream().map(BranchIssue::getCodeAnalysisIssue).toList()
can produce null entries because BranchIssue.getCodeAnalysisIssue() may be null;
update the pipeline in BranchAnalysisProcessor to filter out null
CodeAnalysisIssue instances (e.g., add
.map(BranchIssue::getCodeAnalysisIssue).filter(Objects::nonNull) before
collecting) so candidateIssues contains no nulls before passing to
tempAnalysis.setIssues(...) and downstream code.
---
Nitpick comments:
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java`:
- Around line 659-668: In BranchAnalysisProcessor's loop that calls
mapLineNumber for each CodeAnalysisIssue, avoid calling
codeAnalysisIssueRepository.save(issue) per-issue; instead collect modified
issues into a list (e.g., modifiedIssues) as you update their lineNumber and
after the per-file issue loop call
codeAnalysisIssueRepository.saveAll(modifiedIssues) (or saveAllAndFlush if
immediate flush is required) to batch DB writes and reduce round-trips.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java`:
- Around line 87-94: The mergePrNumbers field is declared mid-class between
methods (after isDuplicateBranchEvent) which breaks grouping of fields; move the
declaration of private final Map<String, Long> mergePrNumbers = new
ConcurrentHashMap<>() so it sits with the other fields (e.g., alongside
recentCommitAnalyses) at the top of the class, keeping all field declarations
together and leaving isDuplicateBranchEvent and isDuplicateCommitAnalysis
methods unchanged.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java
java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/bitbucket/cloud/actions/GetCommitRangeDiffAction.java
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/controller/ProviderWebhookController.java
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java
| if (!inTargetHunk) continue; | ||
|
|
||
| if (line.startsWith("-")) { | ||
| if (oldCursor == targetOldLine) { | ||
| // This line was deleted — return original (AI reconciliation will handle) | ||
| return targetOldLine; | ||
| } | ||
| oldCursor++; | ||
| } else if (line.startsWith("+")) { | ||
| newCursor++; | ||
| } else { | ||
| // Context line (starts with ' ' or is the line itself) | ||
| if (oldCursor == targetOldLine) { | ||
| return newCursor; | ||
| } | ||
| oldCursor++; | ||
| newCursor++; | ||
| } | ||
| } |
mapLineInsideHunk incorrectly advances cursors on \ No newline at end of file meta-lines.
The \ No newline at end of file indicator in unified diffs starts with \, so it misses both the "-" and "+" branches and falls into the else (context-line) path. This spuriously increments both oldCursor and newCursor, causing all subsequent mappings within the hunk to be off by one for each meta-line encountered. Files without trailing newlines — not uncommon in practice — will produce incorrect line-number reconciliation.
🐛 Proposed fix — skip backslash meta-lines
if (line.startsWith("-")) {
if (oldCursor == targetOldLine) {
return targetOldLine;
}
oldCursor++;
} else if (line.startsWith("+")) {
newCursor++;
+ } else if (line.startsWith("\\")) {
+ // "\ No newline at end of file" meta-line — not a content line, skip
} else {
// Context line (starts with ' ' or is the line itself)
if (oldCursor == targetOldLine) {
return newCursor;
}
oldCursor++;
newCursor++;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java`
around lines 804 - 822, In mapLineInsideHunk (BranchAnalysisProcessor) the
backslash meta-line ("\ No newline at end of file") is being treated as a
context line and increments oldCursor/newCursor; update the loop so when
inTargetHunk and a line startsWith("\\") you skip it (continue) without changing
oldCursor or newCursor; this preserves correct mapping for subsequent lines and
keeps handling for "-" (deletions), "+" (additions), and context lines unchanged
for variables oldCursor, newCursor, and targetOldLine.
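The cursor bookkeeping above is language-agnostic; a minimal Python sketch (a hypothetical `map_line_inside_hunk` helper, not the project's Java method) shows the fixed control flow, with the backslash meta-line skipped before any cursor moves:

```python
def map_line_inside_hunk(hunk_lines, old_start, new_start, target_old_line):
    """Map an old-file line number to its new-file position within one hunk.

    hunk_lines are the body lines of a unified-diff hunk: deletions ("-"),
    additions ("+"), context lines, or the "\\ No newline at end of file"
    meta-line.
    """
    old_cursor, new_cursor = old_start, new_start
    for line in hunk_lines:
        if line.startswith("\\"):
            # Meta-line — not content; skip without advancing either cursor.
            continue
        if line.startswith("-"):
            if old_cursor == target_old_line:
                return target_old_line  # deleted line: leave for reconciliation
            old_cursor += 1
        elif line.startswith("+"):
            new_cursor += 1
        else:  # context line
            if old_cursor == target_old_line:
                return new_cursor
            old_cursor += 1
            new_cursor += 1
    return target_old_line  # target not inside this hunk
```

Without the `startswith("\\")` branch, the meta-line would fall into the context branch and shift every subsequent mapping in the hunk by one.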
| log.info("Fetching commit range diff: {}/{} from {} to {}", | ||
| displayWorkspace, repoSlug, | ||
| baseCommitHash.length() >= 7 ? baseCommitHash.substring(0, 7) : baseCommitHash, | ||
| headCommitHash.length() >= 7 ? headCommitHash.substring(0, 7) : headCommitHash); |
Potential NullPointerException on null commit hashes.
baseCommitHash.length() and headCommitHash.length() (lines 50–51) dereference the parameters directly. Line 44's string concatenation silently tolerates null (producing "null..null"), so execution can reach the new log statement with a null hash and immediately throw an NPE before the HTTP request is even made.
🐛 Proposed fix — add null-safe shortening
- log.info("Fetching commit range diff: {}/{} from {} to {}",
- displayWorkspace, repoSlug,
- baseCommitHash.length() >= 7 ? baseCommitHash.substring(0, 7) : baseCommitHash,
- headCommitHash.length() >= 7 ? headCommitHash.substring(0, 7) : headCommitHash);
+ log.info("Fetching commit range diff: {}/{} from {} to {}",
+ displayWorkspace, repoSlug,
+ baseCommitHash != null && baseCommitHash.length() >= 7 ? baseCommitHash.substring(0, 7) : baseCommitHash,
+ headCommitHash != null && headCommitHash.length() >= 7 ? headCommitHash.substring(0, 7) : headCommitHash);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| log.info("Fetching commit range diff: {}/{} from {} to {}", | |
| displayWorkspace, repoSlug, | |
| baseCommitHash.length() >= 7 ? baseCommitHash.substring(0, 7) : baseCommitHash, | |
| headCommitHash.length() >= 7 ? headCommitHash.substring(0, 7) : headCommitHash); | |
| log.info("Fetching commit range diff: {}/{} from {} to {}", | |
| displayWorkspace, repoSlug, | |
| baseCommitHash != null && baseCommitHash.length() >= 7 ? baseCommitHash.substring(0, 7) : baseCommitHash, | |
| headCommitHash != null && headCommitHash.length() >= 7 ? headCommitHash.substring(0, 7) : headCommitHash); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/bitbucket/cloud/actions/GetCommitRangeDiffAction.java`
around lines 48 - 51, The log statement in GetCommitRangeDiffAction uses
baseCommitHash and headCommitHash without null checks, which can throw NPEs when
either is null; update the logging to use a null-safe shortening helper or
inline ternary checks (e.g., if hash==null then "null" else hash.length()>=7 ?
hash.substring(0,7) : hash) for both baseCommitHash and headCommitHash before
passing to log.info so the log never calls .length() or .substring() on a null
value; ensure the same null-safe logic is applied wherever these shortened
hashes are constructed for logging.
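The null-safe shortening collapses into one small helper; here is a Python sketch, with `None` standing in for Java's `null` and `short_sha` as an illustrative name:

```python
def short_sha(commit_hash, length=7):
    """Abbreviate a commit hash for logging, tolerating None and short inputs."""
    if commit_hash is None:
        return "null"  # mirror what plain string concatenation would log
    return commit_hash[:length] if len(commit_hash) >= length else commit_hash
```

Routing both `baseCommitHash` and `headCommitHash` through one such helper keeps every log site null-safe instead of repeating the ternary.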
| // Also clean up stale PR number entries (use a wider window since | ||
| // repo:push may arrive slightly later than the dedup window) | ||
| mergePrNumbers.entrySet().removeIf(entry -> !recentCommitAnalyses.containsKey(entry.getKey())); | ||
| } |
mergePrNumbers cleanup is fragile — entries can be evicted before the commit key exists in the cache.
The cleanup at line 149 removes mergePrNumbers entries whose key doesn't exist in recentCommitAnalyses. But recordMergePrNumber stores its key as "projectId:commitHash", and this key only appears in recentCommitAnalyses after isDuplicateCommitAnalysis is called (by the handler, downstream). If a cleanup cycle fires between recordMergePrNumber and isDuplicateCommitAnalysis, the PR number entry is evicted prematurely.
In practice this is mitigated by the low cleanup frequency (every 10th call) and the tight timing between the two calls, but the coupling is implicit and fragile.
A safer approach: give mergePrNumbers its own timestamp and age it out independently.
Proposed fix
- private final Map<String, Long> mergePrNumbers = new ConcurrentHashMap<>();
+ private final Map<String, MergePrEntry> mergePrNumbers = new ConcurrentHashMap<>();
+
+ private record MergePrEntry(Long prNumber, Instant recordedAt) {}

 private void cleanupOldEntries(Instant now) {
recentCommitAnalyses.entrySet().removeIf(entry -> {
long age = now.getEpochSecond() - entry.getValue().getEpochSecond();
return age > DEDUP_WINDOW_SECONDS * 2;
});
- mergePrNumbers.entrySet().removeIf(entry -> !recentCommitAnalyses.containsKey(entry.getKey()));
+ mergePrNumbers.entrySet().removeIf(entry -> {
+ long age = now.getEpochSecond() - entry.getValue().recordedAt().getEpochSecond();
+ return age > DEDUP_WINDOW_SECONDS * 2;
+ });
 }

 public void recordMergePrNumber(Long projectId, String commitHash, Long prNumber) {
if (commitHash == null || commitHash.isBlank() || prNumber == null) {
return;
}
String key = projectId + ":" + commitHash;
- mergePrNumbers.put(key, prNumber);
+ mergePrNumbers.put(key, new MergePrEntry(prNumber, Instant.now()));
 }

 public Long getRecordedMergePrNumber(Long projectId, String commitHash) {
if (commitHash == null || commitHash.isBlank()) {
return null;
}
- return mergePrNumbers.get(projectId + ":" + commitHash);
+ MergePrEntry entry = mergePrNumbers.get(projectId + ":" + commitHash);
+ return entry != null ? entry.prNumber() : null;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/WebhookDeduplicationService.java`
around lines 147 - 150, The current cleanup removes mergePrNumbers entries if
their key is not in recentCommitAnalyses, which can evict entries created by
recordMergePrNumber before isDuplicateCommitAnalysis ever populates
recentCommitAnalyses; change mergePrNumbers to maintain its own timestamps
(e.g., a parallel Map<String,Long> or a small wrapper object for entries) and
update recordMergePrNumber to set the timestamp when inserting; then modify the
cleanup logic (the removeIf block) to age out mergePrNumbers by comparing the
stored timestamp against a configured TTL instead of checking
recentCommitAnalyses, ensuring entries live long enough even if
recentCommitAnalyses is populated later.
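The independent-TTL idea generalizes to any small dedup cache; a minimal Python sketch (a hypothetical `TtlMap`, not the service's implementation) where every entry carries its own timestamp:

```python
import time

class TtlMap:
    """Map whose entries expire after ttl_seconds, independent of other caches."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable for testing
        self._entries = {}  # key -> (value, recorded_at)

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, recorded_at = entry
        if self.clock() - recorded_at > self.ttl:
            del self._entries[key]  # lazily evict an expired entry
            return None
        return value

    def cleanup(self):
        """Age out entries by their own timestamps, not another cache's state."""
        now = self.clock()
        stale = [k for k, (_, t) in self._entries.items() if now - t > self.ttl]
        for k in stale:
            del self._entries[k]
```

With this shape, eviction depends only on each entry's own age, so a `recordMergePrNumber`-style write can never be invalidated by whether a sibling cache happens to contain the same key yet.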
…PullRequest integration in GitGraphController
|
/codecrow analyze |
|
| Status | PASS WITH WARNINGS |
| Risk Level | HIGH |
| Review Coverage | 34 files analyzed in depth |
| Confidence | HIGH |
Executive Summary
This PR introduces version 1.4.3 rc, focusing on major updates to the GitGraph synchronization logic, Python-based inference orchestration, and webhook deduplication services. While the PR successfully consolidates PR merge handling and refactors core repository queries, it introduces significant risks regarding multi-tenant data isolation and inter-ecosystem schema consistency. The architectural shift toward a more complex orchestration layer requires immediate attention to state management and service boundaries.
Recommendation
Decision: PASS WITH WARNINGS
The PR is functionally comprehensive but contains high-risk thread-safety concerns in the verification_agent.py and logic leakage in the GitGraphController. Approval is conditional on refactoring the global cache to prevent cross-request data contamination and moving transformation logic into the service layer to maintain architectural integrity.
Issues Overview
| Severity | Count | Description |
|---|---|---|
| 🔴 High | 1 | Critical issues requiring immediate attention |
| 🟡 Medium | 2 | Issues that should be addressed |
| 🔵 Low | 3 | Minor issues and improvements |
| ✅ Resolved | 2 | Resolved issues |
Analysis completed on 2026-02-23 22:23:01 | View Full Report | Pull Request
📋 Detailed Issues (6)
🔴 High Severity Issues
Id on Platform: 3357
Category: 🔒 Security
File: .../orchestrator/verification_agent.py:1
Thread-Unsafe Global State Leakage in Verification Agent
The use of a global '_FILE_CONTENTS_CACHE' in the verification agent, when called by the multi-threaded or async orchestrator, creates a race condition and potential data leakage between different users' code review requests.
Evidence: verification_agent.py line 13 defines the global cache; orchestrator.py invokes this agent as part of its processing pipeline.
Business impact: One customer's proprietary code could be leaked into another customer's code review session or cause the LLM to verify code against the wrong file versions, leading to incorrect security approvals.
Also affects: python-ecosystem/inference-orchestrator/service/review/orchestrator/orchestrator.py
💡 Suggested Fix
Move the cache into a class instance or pass a request-scoped context object to the verification agent to ensure isolation between concurrent reviews.
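The proposed isolation fits in a few lines; a Python sketch with illustrative names (`ReviewContext`, `verify_issue`), not the actual `verification_agent.py` API:

```python
class ReviewContext:
    """Per-request state for one review; never shared across requests."""

    def __init__(self):
        # path -> file text, scoped to this review only
        self.file_contents = {}

def verify_issue(ctx: ReviewContext, path: str, fetch):
    """Look up file contents through the request-scoped cache, fetching once."""
    if path not in ctx.file_contents:
        ctx.file_contents[path] = fetch(path)
    return ctx.file_contents[path]
```

Because each request constructs its own `ReviewContext`, concurrent reviews cannot observe each other's file contents, unlike a module-level cache.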
🟡 Medium Severity Issues
Id on Platform: 3359
Category: 🏗️ Architecture
File: .../controller/GitGraphController.java:1
Webserver-to-Service Logic Leakage
The GitGraphController contains heavy mapping and transformation logic that duplicates responsibilities intended for the GitGraphSyncService or a dedicated DTO mapper. This bypasses the established service-oriented architecture.
Evidence: GitGraphController.java lines 75-104 containing entity-to-map transformation logic instead of calling a service method.
Business impact: Increased maintenance cost and inability to reuse graph transformation logic in other parts of the system (e.g., background export jobs).
Also affects: java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/webserver/analysis/service/GitGraphSyncService.java
💡 Suggested Fix
Move the transformation and PR resolution logic from the Controller to the GitGraphSyncService.
Id on Platform: 3363
Category: 🏗️ Architecture
File: .../codeanalysis/CodeAnalysisRepository.java:1
Inter-Ecosystem Schema Fragility
The refactoring of the Java repository layer's 'findLatestByProjectId' query logic introduces a silent dependency risk for the Python stages that consume this data.
Evidence: Java changes at line 66-73 modify the retrieval pattern; Python stage logic at line 1534 relies on the specific structural integrity of the returned JSON list/object for dismissal extraction.
Business impact: Changes in Java query optimization that alter the sort order or the nested structure of the result set will cause the Python orchestrator to fail silently or misinterpret dismissed issues, leading to regressive 'new' issues being reported.
Also affects: python-ecosystem/inference-orchestrator/service/review/orchestrator/stages.py
💡 Suggested Fix
Implement a shared contract/schema (e.g., Protobuf or a strict JSON Schema) and unit tests in the Python layer that validate against the actual Java repository output format.
🔵 Low Severity Issues
Id on Platform: 3358
Category: ⚡ Performance
File: .../controller/GitGraphController.java
Potential performance issue with large git graphs
The controller fetches all commit nodes, edges, pull requests, and branches for a project in a single request without pagination. For projects with thousands of commits, this will result in large memory consumption and slow response times due to the transformation logic in the controller.
💡 Suggested Fix
Consider implementing pagination or a limit on the number of commits returned, or move the DTO transformation to a service layer using a more efficient mapping strategy.
Id on Platform: 3360
Category: ⚡ Performance
File: .../service/WebhookDeduplicationService.java
Suboptimal cleanup logic for mergePrNumbers map
The cleanup of 'mergePrNumbers' depends on the state of 'recentCommitAnalyses'. If a key is evicted or never existed in the latter, the former may accumulate entries until the next full cleanup cycle, although they share the same key structure.
💡 Suggested Fix
Explicitly remove the entry from mergePrNumbers when it is deemed stale in the cleanup loop.
Id on Platform: 3362
Category: ⚡ Performance
File: .../codeanalysis/CodeAnalysisRepository.java
Suboptimal subquery for latest record retrieval
The refactoring to use a subquery for 'LIMIT 1' (findLatestByProjectId) can lead to suboptimal execution plans in some SQL dialects compared to the previous direct LIMIT approach. Specifically, correlated subqueries can sometimes bypass index efficiency if the optimizer treats them as independent units.
💡 Suggested Fix
If using Spring Data JPA 3.0+ or Hibernate 6, consider using the 'findFirstBy...' or 'findTopBy...' method naming convention instead of custom JPQL for single-result queries, which allows the framework to handle limit/offset natively and safely with EntityGraphs.
Files Affected
- .../controller/GitGraphController.java: 2 issues
- .../codeanalysis/CodeAnalysisRepository.java: 2 issues
- .../orchestrator/stages.py: 1 issue
- .../service/WebhookDeduplicationService.java: 1 issue
- .../webhookhandler/GitHubPullRequestWebhookHandler.java: 1 issue
- .../orchestrator/verification_agent.py: 1 issue
This pull request introduces significant improvements to the code review orchestration pipeline, focusing on more accurate severity calibration, better handling and integration of cross-file issues, and enabling Stage 3 aggregation to dismiss false positives via MCP verification. The changes also update prompt instructions to enforce stricter severity rules and add mechanisms for issue dismissal tracking.
Severity calibration and issue handling improvements:
- `review_file_batch`, ensuring that issues from LOW confidence reviews cannot be marked as HIGH severity. This prevents over-reporting of critical issues when the model is uncertain. [1] [2]
- `prompt_constants.py` to explicitly define severity levels, rules for architectural observations, and pre-output self-checks for every issue. This guides the LLM to report issues more conservatively and accurately. [1] [2] [3]

Cross-file issue integration:
- `_convert_cross_file_issues` function and logic in `orchestrator.py` to merge Stage 2 cross-file issues into the main issue list, ensuring they are included in the final PR report with proper formatting and context. [1] [2]

Stage 3 aggregation and MCP verification enhancements:
- `execute_stage_3_aggregation` and `_stage_3_with_mcp` to return both the executive summary and a list of dismissed issue IDs, enabling the filtering out of false positives verified by MCP tool calls. Added parsing logic for the dismissal comment in the LLM's output. [1] [2] [3] [4] [5]

Repository query improvements (Java):
- `CodeAnalysisRepository.java` to use subqueries for retrieving the latest analysis and top analysis by diff fingerprint or commit hash, improving query accuracy and consistency. [1] [2] [3]

Summary by CodeRabbit
New Features
Improvements