feat(codedb): integrate CodeDB as local code search engine #158
Conversation

⚠️ Caution: Review failed. The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration: defaults · Review profile: CHILL · Plan: Pro · Run ID:
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (28)
📝 Walkthrough

Adds a new local CodeDB subsystem (SQLite + Bleve + git indexing, query AST/planner/executor, symbol models and parsers), codedb CLI commands, and integrates CodeDB plus team-context vector search into the agent query flow via a new --source flag.

Changes

Sequence Diagram(s)

sequenceDiagram
autonumber
actor User
participant Agent as Agent Query
participant TC as Team Context API
participant CodeDB as CodeDB Facade
participant Store as Store (SQLite + Bleve)
User->>Agent: run agent query (--source)
Agent->>Agent: parse flags, build queryArgs
alt source == "teamctx" or "all"
Agent->>TC: vector search request
TC-->>Agent: team context results
end
alt source == "code" or "all"
Agent->>CodeDB: Search(ctx, input)
CodeDB->>CodeDB: ParseQuery -> Plan
CodeDB->>Store: execute plan (Bleve / SQL / intersect)
par Bleve path
Store->>Store: Bleve query -> hits
Store-->>CodeDB: hits + fragments
CodeDB->>Store: SQL enrich (file_revs/commits)
Store-->>CodeDB: enriched results
and SQL-only path
Store->>Store: execute SQL
Store-->>CodeDB: SQL results
end
CodeDB-->>Agent: code search results
end
Agent->>Agent: combine results into single JSON
Agent-->>User: unified response
sequenceDiagram
autonumber
actor User
participant CLI as codedb CLI
participant CodeDB as CodeDB Facade
participant Git as Git
participant Index as Bleve
participant DB as SQLite
User->>CLI: codedb index <url>
CLI->>CodeDB: Open(root) -> IndexRepo(ctx, url)
CodeDB->>Git: CloneOrFetch(url)
Git-->>CodeDB: repo path
CodeDB->>DB: upsert repo/commits/diffs/blobs
CodeDB->>Index: index code/diff docs
Index-->>CodeDB: indexed
CodeDB->>CodeDB: ParseSymbols(progress)
CodeDB->>DB: write symbols/symbol_refs
DB-->>CodeDB: done
CodeDB-->>CLI: summary (blobs/symbols parsed)
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 15
🧹 Nitpick comments (9)
internal/codedb/store/store.go (1)
88-95: Prefer `errors.Is` for the Bleve missing-index check. A direct equality check is brittle if `bleve.Open` ever wraps `ErrorIndexPathDoesNotExist`.

♻️ Suggested fix
import ( "context" "database/sql" + "errors" "fmt" "os" "path/filepath" @@ func openOrCreateBleveIndex(path string) (bleve.Index, error) { idx, err := bleve.Open(path) - if err == bleve.ErrorIndexPathDoesNotExist { + if errors.Is(err, bleve.ErrorIndexPathDoesNotExist) { mapping := bleve.NewIndexMapping() return bleve.New(path, mapping) } return idx, err }As per coding guidelines,
**/*.go: Use errors.Is() and errors.As() for error handling and wrap with context.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/store/store.go` around lines 88 - 95, The equality check against bleve.ErrorIndexPathDoesNotExist in openOrCreateBleveIndex is brittle; update the error handling to use errors.Is(err, bleve.ErrorIndexPathDoesNotExist) when deciding to create a new index, and wrap/return errors with context where appropriate (preserve the existing bleve.Open and bleve.New calls but replace the direct == check with errors.Is and add a contextual error wrap on failures).

internal/codedb/index/gitops.go (2)
14-46: Consider extracting common URL normalization logic.
`RepoDirFromURL` and `RepoNameFromURL` share identical prefix-stripping and normalization logic. Consider extracting this to a helper to reduce duplication.

♻️ Suggested refactor
```go
func normalizeRepoURL(url string) (string, error) {
	stripped := url
	for _, prefix := range []string{"https://", "http://", "git://"} {
		if strings.HasPrefix(stripped, prefix) {
			stripped = strings.TrimPrefix(stripped, prefix)
			break
		}
	}
	stripped = strings.TrimRight(stripped, "/")
	stripped = strings.TrimSuffix(stripped, ".git")
	if stripped == "" {
		return "", fmt.Errorf("invalid repo URL: %s", url)
	}
	return stripped, nil
}

func RepoDirFromURL(url string) (string, error) {
	name, err := normalizeRepoURL(url)
	if err != nil {
		return "", err
	}
	return name + ".git", nil
}

func RepoNameFromURL(url string) (string, error) {
	return normalizeRepoURL(url)
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/index/gitops.go` around lines 14 - 46, Extract the shared URL normalization logic from RepoDirFromURL and RepoNameFromURL into a helper (e.g., normalizeRepoURL) that performs prefix stripping for "https://", "http://", "git://", trims trailing slashes, removes ".git", and returns an error for empty results; then update RepoDirFromURL to call normalizeRepoURL and append ".git" to its result, and update RepoNameFromURL to return normalizeRepoURL's result directly, preserving existing error handling.
75-85: Use `errors.Is()` for sentinel error comparison. Per coding guidelines, use `errors.Is()` instead of direct equality comparison for error handling.

♻️ Proposed fix
```diff
 func fetch(repo *git.Repository) error {
 	err := repo.Fetch(&git.FetchOptions{
 		RemoteName: "origin",
 		RefSpecs:   []config.RefSpec{"+refs/*:refs/*"},
 		Force:      true,
 	})
-	if err == git.NoErrAlreadyUpToDate {
+	if errors.Is(err, git.NoErrAlreadyUpToDate) {
 		return nil
 	}
 	return err
 }
```

Also add `"errors"` to the import block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/index/gitops.go` around lines 75 - 85, The fetch function compares the sentinel error git.NoErrAlreadyUpToDate using ==; change this to use errors.Is(err, git.NoErrAlreadyUpToDate) and update the import block to include "errors" so the comparison is done properly (modify the fetch function and its error check referencing git.NoErrAlreadyUpToDate).

cmd/ox/codedb.go (2)
39-43: Consider adding a progress callback to `IndexOptions`. `IndexOptions` is created empty, but the `IndexOptions` struct has a `Progress` field. Without it, users don't see progress during the (potentially long) indexing phase, only the initial "Indexing..." message.

♻️ Proposed enhancement
```diff
 fmt.Fprintf(os.Stderr, "Indexing %s...\n", url)
-opts := index.IndexOptions{}
+opts := index.IndexOptions{
+	Progress: func(msg string) {
+		fmt.Fprintf(os.Stderr, "  %s\n", msg)
+	},
+}
 if err := db.IndexRepo(context.Background(), url, opts); err != nil {
 	return fmt.Errorf("index: %w", err)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/codedb.go` around lines 39 - 43, Indexing is currently invoked with an empty IndexOptions so the Progress callback isn't used; update the call site where IndexOptions is created (the IndexOptions{} passed to db.IndexRepo) to provide a Progress function that reports progress (e.g., writes to stderr or updates a spinner) during indexing; implement the callback by assigning IndexOptions.Progress to a function that accepts progress events and logs or prints them, then pass that populated IndexOptions into db.IndexRepo so users see live progress.
84-109: Consider adding a warning about SQL injection for the `sql` subcommand. The `sql` subcommand executes raw SQL directly. While this is an intentional power-user feature, consider adding a note in the help text or a debug log indicating this is for debugging/development use only.
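A sketch of that note and log, assuming cobra wiring; `newCodedbSQLCmd` and the `rawSQL` parameter are hypothetical stand-ins for the `codedbSQLCmd` and `db.RawSQL` symbols named in the prompt below:

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// newCodedbSQLCmd is a hypothetical constructor; rawSQL stands in for db.RawSQL.
func newCodedbSQLCmd(rawSQL func(string) error) *cobra.Command {
	return &cobra.Command{
		Use:   "sql <query>",
		Short: "Run raw SQL against CodeDB (debugging/development only)",
		Long: "Executes arbitrary SQL against the local CodeDB SQLite database.\n" +
			"Intended for debugging and development only; never pass untrusted input.",
		Args: cobra.ExactArgs(1),
		RunE: func(cmd *cobra.Command, args []string) error {
			// Runtime reminder printed before any SQL executes.
			fmt.Fprintln(os.Stderr, "warning: executing raw SQL; for debugging/development use only")
			return rawSQL(args[0])
		},
	}
}
```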
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/ox/codedb.go` around lines 84 - 109, Add a clear SQL-injection warning to the codedbSQLCmd definition and log a runtime warning before executing raw SQL: update the command help (e.g., Short or add a Long field on codedbSQLCmd/Use) to state this runs arbitrary SQL and is intended for debugging/development only, and insert a runtime warning (e.g., via log.Printf or a processLogger) at the start of RunE right before calling db.RawSQL(args[0]) to remind users this is unsafe for untrusted input; reference codedbSQLCmd, its RunE function, and the db.RawSQL call to locate where to add the help text and log.

internal/codedb/store/store_test.go (1)
9-52: Consider adding error path tests. The current tests cover happy paths well. Per coding guidelines, consider adding tests for error scenarios such as:
- Opening a path that cannot be created (e.g., under a non-existent parent with restricted permissions)
- Schema creation failure scenarios
This would improve resilience against regression.
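A sketch of the first scenario, assuming a `store.Open(dir)` entry point in the same package and POSIX permission semantics:

```go
package store

import (
	"os"
	"path/filepath"
	"testing"
)

// TestOpenFailsOnUnwritableParent asserts Open returns an error when the
// target directory cannot be created. Open's exact signature is assumed.
func TestOpenFailsOnUnwritableParent(t *testing.T) {
	if os.Geteuid() == 0 {
		t.Skip("permission bits are not enforced for root")
	}
	parent := t.TempDir()
	if err := os.Chmod(parent, 0o500); err != nil {
		t.Fatal(err)
	}
	// Restore permissions so test cleanup can remove the directory.
	t.Cleanup(func() { _ = os.Chmod(parent, 0o700) })

	if _, err := Open(filepath.Join(parent, "child")); err == nil {
		t.Fatal("expected Open to fail under an unwritable parent")
	}
}
```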
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/store/store_test.go` around lines 9 - 52, Add negative tests to cover Open error paths: create a test that attempts Open on a directory that cannot be created (e.g., create a parent dir, chmod it to remove write permission, then call Open on a child path) and assert Open returns a non-nil error; also add a test that simulates schema creation failure for Open (for example by creating a read-only metadata.db file or using a mock that causes Exec/QueryRow used in Open/initialization to return an error) and assert Open returns the expected error. Reference the Open function and the initialization steps that create metadata.db and the repos table when adding these tests. Ensure tests clean up/reset permissions so CI won't be left in a bad state.

cmd/ox/agent_query.go (1)
167-181: Consider logging when the CodeDB index doesn't exist. When `dataDir` doesn't exist (lines 170-172), the function silently returns empty results. Consider adding a debug log to help users understand why code results may be empty.

♻️ Proposed enhancement
```diff
 func queryCodeDB(qa *queryArgs) ([]search.Result, error) {
 	dataDir := paths.CodeDBDataDir()
 	if _, err := os.Stat(dataDir); os.IsNotExist(err) {
+		slog.Debug("codedb index not found, skipping code search", "path", dataDir)
 		return nil, nil // no index yet, return empty
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/agent_query.go` around lines 167 - 181, The function queryCodeDB currently returns nil when the CodeDB dataDir (from paths.CodeDBDataDir()) doesn't exist; add a debug/info log before returning so callers can see why searches are empty. Detect the os.IsNotExist(err) branch in queryCodeDB, log a message that includes the dataDir path (and optionally the error) to explain "no index yet", then return nil as before; reference the symbols queryCodeDB, dataDir, paths.CodeDBDataDir, and the os.Stat check to locate where to insert the log.

internal/codedb/search/executor_test.go (1)
47-305: Consider a table-driven executor matrix for the planner branches. The repeated open/seed/parse/execute flow makes it hard to add regressions for the tricky paths. I'd add rows for regex code search, `select:file`/`select:repo`, diff `rev:`, and a repo whose default ref is not `main`.

Based on learnings: Applies to **/*_test.go : Use table-driven tests in Go test files; Every bug fix MUST include a regression test unless existing tests already cover the failure mode.
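A sketch of the matrix shape, using the helper names this comment mentions (openTestStore, seedTestData, ParseQuery, Execute); the Execute signature, result shape, and package layout are assumptions to adapt:

```go
package search

import (
	"context"
	"testing"
)

// TestExecuteMatrix drives the tricky planner branches from one table.
func TestExecuteMatrix(t *testing.T) {
	cases := []struct {
		name  string
		query string
	}{
		{"regex code search", `/err\d+/`},
		{"select file", "select:file spawn"},
		{"select repo", "select:repo spawn"},
		{"diff with rev", "type:diff rev:feature fix"},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			st := openTestStore(t)
			seedTestData(t, st)
			q, err := ParseQuery(tc.query)
			if err != nil {
				t.Fatalf("parse %q: %v", tc.query, err)
			}
			// Per-row assertions on the results would go here.
			if _, err := Execute(context.Background(), st, q); err != nil {
				t.Fatalf("execute %q: %v", tc.query, err)
			}
		})
	}
}
```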
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/search/executor_test.go` around lines 47 - 305, Refactor the repeated openTestStore/seedTestData/ParseQuery/Execute pattern in the tests (e.g., TestExecuteCommitSearch, TestExecuteSymbolSearch, TestExecuteWithCount, etc.) into a single table-driven test that iterates rows describing input query strings and expected assertions; add rows to cover regex code search, select:file and select:repo queries, diff queries using rev:, and a case for a repo with a non-main default ref, and ensure the table-driven test still uses openTestStore, seedTestData, ParseQuery and Execute for each row so new regressions are automatically exercised.

internal/codedb/search/query_test.go (1)
8-349: Please table-drive these parser cases and add the quoted-filter row. The coverage breadth is good, but the one-test-per-shape layout makes it easy to miss combinations like `message:"fix bug"` and other filter/value edge cases.

Based on learnings: Applies to **/*_test.go : Use table-driven tests in Go test files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/search/query_test.go` around lines 8 - 349, Convert the many one-off tests in TestTokenize and the TestParse* functions that call ParseQuery into table-driven tests: create a single slice of test cases (structs with name, input, expected fields like SearchPattern, Filters, Type, IsRegex, SearchTerms, HasEmptyPattern, error assertions) and iterate to run subtests with t.Run; replace the individual TestParse* functions by adding their cases to that table (keeping references to tokenize and ParseQuery for the tokenizer-specific cases), and add a new case for a quoted filter like `message:"fix bug"` verifying Filters.Message (or the equivalent filter field) and that the quoted value is preserved as a single filter value; ensure error cases include expected error substrings and negation/count/regex edge-cases are represented as table entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/ox/agent_query.go`:
- Around line 122-126: The defined type codeSearchResult is unused; either
remove it or apply it to the combinedResponse's CodeResults field so results
carry the Source tag. Update the struct that defines combinedResponse (the field
CodeResults currently typed as []search.Result) to use []codeSearchResult if you
intend to include Source metadata, or delete the codeSearchResult type if not
needed, and adjust any serialization/JSON usages accordingly.
In `@internal/codedb/index/gitops_test.go`:
- Around line 33-41: Refactor TestRepoNameFromURL into a table-driven test that
iterates cases including both the existing happy path and at least one failing
input (e.g., an invalid URL string) and assert both returned value and error
behavior; update the TestRepoNameFromURL function to loop over a slice of
structs (with fields like name, input, wantRepo, wantErr) and for each case call
RepoNameFromURL, check whether err presence matches wantErr and compare got to
wantRepo when no error as needed.
In `@internal/codedb/index/indexer.go`:
- Around line 465-472: The call to insertDiff is passing nullBlob.Int64 (0) for
newBlobDBID which is confusing and error-prone; update insertDiff to accept
sql.NullInt64 for the new blob as well (change signature newBlobDBID int64 ->
newBlobDBID sql.NullInt64 and adjust callers), or at minimum pass an explicit
sql.NullInt64{Valid:false} when there is no new blob instead of nullBlob.Int64;
update the body of insertDiff (where it builds newBlobPtr/newBlobID) to read
from the sql.NullInt64 newBlobDBID and keep hasNew/hasOld semantics intact
(references: insertDiff, newBlobDBID, sql.NullInt64, hasNew).
- Around line 216-233: The loadKnownCommits function iterates rows but never
checks rows.Err() for iteration errors; update loadKnownCommits to defer
rows.Close() immediately after the Query succeeds, then after the rows.Next()
loop call rows.Err() and return any non-nil error (alongside the current known
map handling) so iteration errors are propagated; reference the rows variable
and the loadKnownCommits function when making the change.
- Around line 606-608: The code currently ignores the error from
st.tx.QueryRow(...).Scan(&parsed), which can mask DB issues; capture the
returned error, handle sql.ErrNoRows by setting parsed = 0 (or appropriate
default), and for any other error either return it or log it and bail out.
Update the call site around st.tx.QueryRow/Scan to check err :=
st.tx.QueryRow("SELECT parsed FROM blobs WHERE id = ?", blobDBID).Scan(&parsed);
if err != nil { if err == sql.ErrNoRows { parsed = 0 } else { return err (or
processLogger/error return per surrounding function) } } so failures are
surfaced instead of silently ignored.
In `@internal/codedb/README.md`:
- Around line 16-33: The fenced code blocks in internal/codedb/README.md are
missing language identifiers; update the first fenced block (the query examples
containing lines like "spawn", "lang:rust file:*.rs fn", "/err\d+/", "foo OR
bar") to use ```text and update the CLI fenced block (the snippet with "ox
codedb index <url>", "ox codedb search <query>", "ox codedb sql <sql>") to use
```bash so markdownlint warnings are resolved.
In `@internal/codedb/search/executor.go`:
- Around line 119-123: The current search uses a fixed Bleve window (searchReq
:= bleve.NewSearchRequestOptions(bleveQuery, plan.Limit*5, 0, false) and
idx.Search) and then post-filters during enrichment, which can drop matches
outside that window; change the logic to page through Bleve results (repeatedly
call idx.Search with adjustable From/Size or batch offset) until you have
collected enough post-filtered results (respecting plan.Count/plan.Limit and
select projections) or Bleve is exhausted, applying repo/file/lang filters
during enrichment for each batch and only stopping when the requested number of
final results is gathered; update the code paths around searchReq, idx.Search,
and the enrichment/select projection application to use this batched/looping
strategy.
- Around line 118-123: executePlanBleve currently always uses
bleve.NewQueryStringQuery which ignores plan.IsRegex; update executePlanBleve to
branch on plan.IsRegex and when true construct a bleve regexp query (use
bleve.NewRegexpQuery(plan.BleveQuery) and set the query Field to "content" or
otherwise mirror how non-regex queries target the content field), otherwise keep
using bleve.NewQueryStringQuery; ensure the resulting query is used to build
searchReq (and keep existing options like Highlight/Fields) so regex searches
and diff searches behave correctly.
In `@internal/codedb/search/planner.go`:
- Around line 139-177: planDiffSearch currently ignores the ParsedQuery rev
filter so diff searches like "type:diff rev:feature" are not constrained; update
planDiffSearch to treat query.Filters.Rev (and NegRev if present) as a metadata
filter (include them in hasMetadataFilters) and ensure the rev constraint is
applied to the SQL/SQLParams returned (for example by adding a "rev = ?" or "rev
!= ?" clause to translated.SQL and appending the corresponding value(s) to
translated.Params / sqlParams when translated != nil), and if using the Bleve
side, ensure the rev constraint is encoded into BleveQuery as well so both
ExecutionPlan.SQL/SQLParams and ExecutionPlan.BleveQuery respect the rev filter.
In `@internal/codedb/search/query.go`:
- Around line 19-33: The tokenizer currently flushes the current buffer before
reading a quoted value which splits "key:" from its quoted value; in the '"'
case inside tokenize (use symbols tokens, current, quoted, runes, i) change the
logic so that if current.Len() > 0 and the last character of current is ':' you
do NOT flush it first but instead read the quoted text and append a single token
combining current + `"` + quoted + `"` (preserving the colon), otherwise
preserve the existing behavior (flush current then push the quoted token).
Ensure you update the append logic to handle both paths and advance i past the
closing quote as before.
In `@internal/codedb/store/store.go`:
- Around line 31-38: The directories created for reposDir and bleveDir are too
permissive (0o755); change the os.MkdirAll calls in store.go that create
reposDir and bleveDir to use owner-only mode (0o700) and, to be safe, call
os.Chmod(dir, 0o700) after creation (checking and returning any error) so
existing dirs are tightened as well; update the os.MkdirAll and add the os.Chmod
error handling around those symbols to enforce owner-only access.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a79f1a1d-58ac-4b44-b139-00dd7a9f4ba1
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (26)
- cmd/ox/agent_query.go
- cmd/ox/codedb.go
- go.mod
- internal/codedb/README.md
- internal/codedb/codedb.go
- internal/codedb/index/gitops.go
- internal/codedb/index/gitops_test.go
- internal/codedb/index/indexer.go
- internal/codedb/language/detect.go
- internal/codedb/language/detect_test.go
- internal/codedb/search/ast.go
- internal/codedb/search/executor.go
- internal/codedb/search/executor_test.go
- internal/codedb/search/planner.go
- internal/codedb/search/planner_test.go
- internal/codedb/search/query.go
- internal/codedb/search/query_test.go
- internal/codedb/search/translate.go
- internal/codedb/search/translate_test.go
- internal/codedb/store/schema.go
- internal/codedb/store/store.go
- internal/codedb/store/store_test.go
- internal/codedb/symbols/symbols.go
- internal/codedb/symbols/symbols_stub.go
- internal/paths/paths.go
- scripts/bench-codedb.sh
internal/codedb/search/translate.go (lines 61-68):

```go
func resolveRevRef(rev string) string {
	if rev == "" {
		rev = "main"
	}
	if !strings.HasPrefix(rev, "refs/") {
		rev = "refs/heads/" + rev
	}
	return rev
}
```
Don't assume main is the implicit ref.
Normalizing an empty rev: to refs/heads/main makes code and symbol search return no hits for repos whose indexed default branch is master, trunk, etc. Empty rev: should resolve to the repo's actual default/HEAD ref, or skip ref filtering entirely.
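For reference, the "skip ref filtering" alternative is a small change to the helper above; treating the empty string as "no ref filter" then falls to the callers (the author ultimately kept the `main` default, per the reply below):

```go
// Sketch: an empty rev: means "no ref filter" instead of assuming main.
func resolveRevRef(rev string) string {
	if rev == "" {
		return "" // callers skip ref filtering and use the repo's actual HEAD
	}
	if !strings.HasPrefix(rev, "refs/") {
		return "refs/heads/" + rev
	}
	return rev
}
```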
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/search/translate.go` around lines 61 - 68, The helper
resolveRevRef currently maps an empty rev to "refs/heads/main", which wrongly
assumes repos use "main"; change resolveRevRef so an empty rev is not normalized
to main—either return the empty string (so callers can skip ref filtering and
let the repo's actual default/HEAD be used) or implement a lookup to resolve the
repo's real default/HEAD ref via repository metadata before prepending
"refs/heads/". Update resolveRevRef and call sites that rely on its output to
treat "" as "no ref filter" (or to use the fetched default ref) and only prepend
"refs/heads/" when a non-empty branch name is provided.
Intentional default. The alternative — no ref filtering when rev: is omitted — would return results from every branch including stale feature branches and long-deleted experiment branches, which is strictly worse UX for the common case. main matches the overwhelming majority of repositories today. For repos using master or trunk, users can pass rev:master explicitly. A proper fix would be to resolve the repo's HEAD ref at index time and store it in the repos table, then use that as the default — but that's a schema change best done when we have multi-repo search working end-to-end.
@port8080, that's a fair design rationale — defaulting to refs/heads/main is a pragmatic tradeoff that avoids noise from stale branches, and the path forward (storing the HEAD ref in the repos table at index time) is a sensible deferred improvement. I'll note this for future reviews.
( ˵ •̀ ᴗ - ˵ ) ✧
✏️ Learnings added
Learnt from: port8080
Repo: sageox/ox PR: 158
File: internal/codedb/search/translate.go:61-68
Timestamp: 2026-03-09T03:20:33.069Z
Learning: In `internal/codedb/search/translate.go`, the `resolveRevRef` helper intentionally defaults an empty `rev:` filter to `"refs/heads/main"`. This is a deliberate UX choice to avoid returning results from stale/deleted branches. For repos using `master` or `trunk`, users pass `rev:master` explicitly. The proper fix — resolving the repo's actual HEAD ref at index time and storing it in the `repos` table — is deferred until multi-repo search is working end-to-end.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
scripts/bench-codedb.sh (run_searches):

```bash
run_searches() {
  local tool="$1" root="$2" repo="$3"
  local bin
  bin=$(get_bin "$tool")

  for qi in "${!SEARCH_QUERIES[@]}"; do
    local query="${SEARCH_QUERIES[$qi]}"
    local times=()
    for ((iter=0; iter<SEARCH_ITERS; iter++)); do
      local t_start t_end elapsed_ms
      t_start=$(date +%s%N)
      set +e
      if [[ "$tool" == "ox" ]]; then
        XDG_DATA_HOME="$root/xdg" "$bin" codedb search "$query" >/dev/null 2>&1
      else
        "$bin" --root "$root" search "$query" >/dev/null 2>&1
      fi
      set -e
      t_end=$(date +%s%N)
      elapsed_ms=$(( (t_end - t_start) / 1000000 ))
      times+=("$elapsed_ms")
    done
    IFS=$'\n' sorted=($(sort -n <<<"${times[*]}")); unset IFS
    local median=${sorted[$(( SEARCH_ITERS / 2 ))]}
    R_SEARCH["${tool}-${repo}-${qi}"]="$median"
```
Do not count failed searches as latency samples.
Line 173 drops the search exit code, so parse errors or unsupported queries get recorded as “fast” searches. That makes the median misleading.
🛑 Suggested fix
```diff
 run_searches() {
   local tool="$1" root="$2" repo="$3"
-  local bin
-  bin=$(get_bin "$tool")
   for qi in "${!SEARCH_QUERIES[@]}"; do
     local query="${SEARCH_QUERIES[$qi]}"
     local times=()
     for ((iter=0; iter<SEARCH_ITERS; iter++)); do
-      local t_start t_end elapsed_ms
+      local t_start t_end elapsed_ms rc
       t_start=$(date +%s%N)
       set +e
-      if [[ "$tool" == "ox" ]]; then
-        XDG_DATA_HOME="$root/xdg" "$bin" codedb search "$query" >/dev/null 2>&1
-      else
-        "$bin" --root "$root" search "$query" >/dev/null 2>&1
-      fi
+      run_search_cmd "$tool" "$root" "$query" >/dev/null 2>&1
+      rc=$?
       set -e
+      if (( rc != 0 )); then
+        echo "Search failed for ${TOOL_LABELS[$tool]}/${REPO_NAMES[$repo]}: $query" >&2
+        return $rc
+      fi
       t_end=$(date +%s%N)
       elapsed_ms=$(( (t_end - t_start) / 1000000 ))
       times+=("$elapsed_ms")
     done
```

🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 184-184: Prefer mapfile or read -a to split command output (or quote to avoid splitting). (SC2207)
[warning] 186-186: R_SEARCH appears unused. Verify use (or export if used externally). (SC2034)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/bench-codedb.sh` around lines 162 - 186, The run_searches function
currently ignores the command exit status and always appends elapsed_ms to
times, causing failed searches to skew latency; update run_searches to capture
the command's exit code (e.g., check $? immediately after the search command),
and only append elapsed_ms to the times array when the exit code is zero; for
non-zero exits, increment a failure counter or skip adding a sample and ensure
the median computation (the sorted/times logic and R_SEARCH assignment) handles
the case of zero successful samples (e.g., record a sentinel like "FAILED" or
leave R_SEARCH unset) to avoid indexing errors.
Fair point, but the benchmark is a developer tool for comparative analysis, not production code. A failed search command would produce an abnormally fast time (sub-millisecond) that stands out immediately in the results table — it's self-evident, not silent. Adding error handling here adds control flow complexity to a script whose purpose is quick iteration on perf comparisons. If a query fails for one tool but succeeds for another, that itself is useful signal.
@port8080, that's a fair and pragmatic call. For a local dev benchmarking script where the audience is the developer running it, a suspiciously fast outlier in the results table is indeed self-documenting — the visual signal is enough. The differential-failure-as-signal point is also well-taken; knowing that tool A can't handle a query that tool B handles fine is meaningful benchmark data in itself. Happy to defer to your judgment here.
scripts/bench-codedb.sh (index phase):

```bash
# Full index
echo "  Indexing (full)..."
local logfile="$LOG_DIR/${tool}-${repo}-index.log"
local bin
bin=$(get_bin "$tool")
local index_cmd
if [[ "$tool" == "ox" ]]; then
  index_cmd="XDG_DATA_HOME='$root/xdg' '$bin' codedb index '$url'"
else
  index_cmd="'$bin' --root '$root' index '$url'"
fi
set +e
run_timed "$index_cmd" > "$logfile" 2>&1
set -e
R_INDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
R_PEAK_RSS["${tool}-${repo}"]="$_peak_rss_mb"
echo "  Index: ${_wall_seconds}s, RSS: ${_peak_rss_mb}MB"

# For ox, the actual data is in xdg/sageox/codedb
local data_root="$root"
if [[ "$tool" == "ox" ]]; then
  data_root="$root/xdg/sageox/codedb"
fi

# Sizes
measure_sizes "$tool" "$data_root"
R_SQLITE_MB["${tool}-${repo}"]="$_sqlite_mb"
R_FTS_MB["${tool}-${repo}"]="$_fts_mb"
R_GIT_MB["${tool}-${repo}"]="$_git_mb"
R_TOTAL_MB["${tool}-${repo}"]="$_total_mb"

# DB stats
echo "  Collecting DB stats..."
measure_db_stats "$tool" "$data_root" "$root"
R_COMMITS["${tool}-${repo}"]="$_commits"
R_BLOBS["${tool}-${repo}"]="$_blobs"
R_SYMBOLS["${tool}-${repo}"]="$_symbols"
R_SYMBOL_REFS["${tool}-${repo}"]="$_symbol_refs"
R_FILE_REVS["${tool}-${repo}"]="$_file_revs"

# Re-index
echo "  Re-indexing (incremental)..."
local reindex_log="$LOG_DIR/${tool}-${repo}-reindex.log"
set +e
run_timed "$index_cmd" > "$reindex_log" 2>&1
set -e
R_REINDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
```
Fail the run when indexing or re-indexing fails.
Line 214 and Line 246 suppress set -e, but the status from run_timed is ignored. That lets a failed full index still record timings, and a failed incremental index still fall through to the search phase against stale data.
🛑 Suggested fix
```diff
 local logfile="$LOG_DIR/${tool}-${repo}-index.log"
 local bin
 bin=$(get_bin "$tool")
 local index_cmd
+local rc
 if [[ "$tool" == "ox" ]]; then
   index_cmd="XDG_DATA_HOME='$root/xdg' '$bin' codedb index '$url'"
 else
   index_cmd="'$bin' --root '$root' index '$url'"
 fi
 set +e
 run_timed "$index_cmd" > "$logfile" 2>&1
+rc=$?
 set -e
+if (( rc != 0 )); then
+  echo "  Full index failed for $label; see $logfile" >&2
+  return $rc
+fi
 R_INDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
 R_PEAK_RSS["${tool}-${repo}"]="$_peak_rss_mb"
 echo "  Index: ${_wall_seconds}s, RSS: ${_peak_rss_mb}MB"
@@
 local reindex_log="$LOG_DIR/${tool}-${repo}-reindex.log"
 set +e
 run_timed "$index_cmd" > "$reindex_log" 2>&1
+rc=$?
 set -e
+if (( rc != 0 )); then
+  echo "  Re-index failed for $label; see $reindex_log" >&2
+  return $rc
+fi
 R_REINDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
 echo "  Re-index: ${_wall_seconds}s"
```
echo " Re-index: ${_wall_seconds}s"📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
# Full index
echo "  Indexing (full)..."
local logfile="$LOG_DIR/${tool}-${repo}-index.log"
local bin
bin=$(get_bin "$tool")
local index_cmd
local rc
if [[ "$tool" == "ox" ]]; then
  index_cmd="XDG_DATA_HOME='$root/xdg' '$bin' codedb index '$url'"
else
  index_cmd="'$bin' --root '$root' index '$url'"
fi
set +e
run_timed "$index_cmd" > "$logfile" 2>&1
rc=$?
set -e
if (( rc != 0 )); then
  echo "  Full index failed for $label; see $logfile" >&2
  return $rc
fi
R_INDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
R_PEAK_RSS["${tool}-${repo}"]="$_peak_rss_mb"
echo "  Index: ${_wall_seconds}s, RSS: ${_peak_rss_mb}MB"

# For ox, the actual data is in xdg/sageox/codedb
local data_root="$root"
if [[ "$tool" == "ox" ]]; then
  data_root="$root/xdg/sageox/codedb"
fi

# Sizes
measure_sizes "$tool" "$data_root"
R_SQLITE_MB["${tool}-${repo}"]="$_sqlite_mb"
R_FTS_MB["${tool}-${repo}"]="$_fts_mb"
R_GIT_MB["${tool}-${repo}"]="$_git_mb"
R_TOTAL_MB["${tool}-${repo}"]="$_total_mb"

# DB stats
echo "  Collecting DB stats..."
measure_db_stats "$tool" "$data_root" "$root"
R_COMMITS["${tool}-${repo}"]="$_commits"
R_BLOBS["${tool}-${repo}"]="$_blobs"
R_SYMBOLS["${tool}-${repo}"]="$_symbols"
R_SYMBOL_REFS["${tool}-${repo}"]="$_symbol_refs"
R_FILE_REVS["${tool}-${repo}"]="$_file_revs"

# Re-index
echo "  Re-indexing (incremental)..."
local reindex_log="$LOG_DIR/${tool}-${repo}-reindex.log"
set +e
run_timed "$index_cmd" > "$reindex_log" 2>&1
rc=$?
set -e
if (( rc != 0 )); then
  echo "  Re-index failed for $label; see $reindex_log" >&2
  return $rc
fi
R_REINDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 216-216: R_INDEX_TIME appears unused. Verify use (or export if used externally). (SC2034)
[warning] 217-217: R_PEAK_RSS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 228-228: R_SQLITE_MB appears unused. Verify use (or export if used externally). (SC2034)
[warning] 229-229: R_FTS_MB appears unused. Verify use (or export if used externally). (SC2034)
[warning] 230-230: R_GIT_MB appears unused. Verify use (or export if used externally). (SC2034)
[warning] 231-231: R_TOTAL_MB appears unused. Verify use (or export if used externally). (SC2034)
[warning] 236-236: R_COMMITS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 237-237: R_BLOBS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 238-238: R_SYMBOLS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 239-239: R_SYMBOL_REFS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 240-240: R_FILE_REVS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 248-248: R_REINDEX_TIME appears unused. Verify use (or export if used externally). (SC2034)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/bench-codedb.sh` around lines 202 - 248, The script temporarily
disables errexit around the full and incremental indexing calls (set +e) which
lets run_timed (called with index_cmd) fail silently; restore failure handling
by checking run_timed's exit status immediately after each invocation: after
run_timed "$index_cmd" > "$logfile" 2>&1 and after run_timed "$index_cmd" >
"$reindex_log" 2>&1 capture its return ($?) and if non-zero write an error
message (including logfile/reindex_log and tool/repo context) and exit non-zero,
or simply re-enable errexit and call run_timed under set -e so failures abort
before recording R_INDEX_TIME/R_REINDEX_TIME and continuing to search; reference
the run_timed invocations, index_cmd variable, reindex_log variable, and the
R_INDEX_TIME/R_REINDEX_TIME assignments when adding the checks.
Same reasoning as the search error handling above — the benchmark script is a developer tool, and a failed index produces obviously wrong data (0s timing, missing DB stats) that's immediately visible in the summary table. The script already prints the log file path, so debugging is straightforward. Adding early-return error handling would prevent collecting partial results that are still useful for comparison (e.g., if only one tool fails, you still want the other two columns).
@port8080, that's a fair point. For a multi-tool comparison benchmark, continuing past a single tool's failure to preserve the other tools' results is a reasonable trade-off — especially since the zeroed-out timings and missing DB stats make the failure self-evident in the summary output. I'll defer to your judgment here.
scripts/bench-codedb.sh:

```bash
if [[ $# -gt 0 ]]; then
  REPOS=("$@")
fi
```
Validate repo keys before overwriting REPOS.
A typo here reaches Line 194 and Line 195 and then exits via set -u with an unhelpful associative-array error. Check the keys up front and print the supported values instead.
🧭 Suggested fix
```diff
 if [[ $# -gt 0 ]]; then
+  for repo in "$@"; do
+    if [[ -z "${REPO_URLS[$repo]+x}" ]]; then
+      echo "Unknown repo key: $repo" >&2
+      echo "Available repos: ${!REPO_URLS[*]}" >&2
+      exit 2
+    fi
+  done
   REPOS=("$@")
 fi
```

🤖 Prompt for AI Agents
fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/bench-codedb.sh` around lines 342 - 344, The current assignment to
REPOS when args are passed can overwrite with invalid keys and later trigger an
unhelpful associative-array error under set -u; before setting REPOS in the if
block, validate each "$@" against the known list (e.g. SUPPORTED_REPOS or
SUPPORTED_KEYS) and if any key is invalid print a clear error showing the
supported values and exit non‑zero; update the if [[ $# -gt 0 ]] block to
loop/validate args, only assign REPOS=("$@") when all keys are valid, and ensure
the error path explains the allowed repository keys.
The REPOS array is hardcoded with known-valid keys that match REPO_URLS and REPO_NAMES. The CLI override (bench.sh tokio) is a convenience for developers who know what they're doing — adding validation would protect against a mistake that's immediately caught by the first ${REPO_URLS[$repo]} expansion producing an empty URL. Not worth the extra code for a dev-only script.
@port8080 Fair point — for a developer-facing benchmark script, the natural failure from an empty URL expansion is sufficient feedback, and the extra validation boilerplate isn't worth the added noise. I'll defer to your judgement here.
Actionable comments posted: 5
♻️ Duplicate comments (1)
internal/codedb/search/executor.go (1)
127-130: ⚠️ Potential issue | 🟠 Major

Page through Bleve hits until you actually collect `plan.Limit` results. This still fetches a fixed `plan.Limit*5` window and only applies repo/file/lang filtering during enrichment. Restrictive filters can therefore return fewer results than requested even when more qualifying hits exist below that window. This was flagged earlier and remains unresolved.

Also applies to: 139-160
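For reference, a minimal sketch of the batched loop in a hypothetical helper; `Result`, `enrichAndFilter`, and the package name are placeholders, while the `bleve` request/search calls are the library's actual API:

```go
package codesearch

import (
	bleve "github.com/blevesearch/bleve/v2"
	blevesearch "github.com/blevesearch/bleve/v2/search"
	"github.com/blevesearch/bleve/v2/search/query"
)

// Result is a placeholder for the executor's enriched result type.
type Result struct{ Path string }

// pageBleve keeps requesting windows from Bleve until `limit` post-filtered
// results are collected or the index is exhausted.
func pageBleve(idx bleve.Index, q query.Query, limit int,
	enrichAndFilter func(*blevesearch.DocumentMatch) (Result, bool)) ([]Result, error) {
	const pageSize = 200 // arbitrary batch size
	var out []Result
	for from := 0; len(out) < limit; {
		req := bleve.NewSearchRequestOptions(q, pageSize, from, false)
		res, err := idx.Search(req)
		if err != nil {
			return nil, err
		}
		if len(res.Hits) == 0 {
			break // Bleve exhausted before the limit was reached
		}
		for _, hit := range res.Hits {
			if r, ok := enrichAndFilter(hit); ok { // repo/file/lang filters live here
				out = append(out, r)
				if len(out) == limit {
					return out, nil
				}
			}
		}
		from += len(res.Hits)
	}
	return out, nil
}
```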
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/search/executor.go` around lines 127 - 130, The current Bleve search only requests a fixed window (bleve.NewSearchRequestOptions called with plan.Limit*5) which can miss qualifying hits after filters are applied; change the logic in the search loop that builds and executes searchReq and calls idx.Search so it pages through results (incrementing the From/offset and/or increasing size) and repeatedly processes searchResult.Hits until you have collected plan.Limit fully-filtered/enriched results or there are no more hits; ensure the filtering/enrichment applied after retrieving hits (the same code paths used in the block around searchReq, idx.Search, and where searchResult.Hits are iterated) are used to decide whether to stop paging, and stop once plan.Limit valid results are collected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/codedb/index/indexer.go`:
- Around line 585-597: ensureBlob currently stores path-derived language on the
deduped blobs row, which is wrong because blobs are keyed by content_hash; stop
persisting language in the blobs table (remove language from the INSERT in
ensureBlob and from any reads of blobs.language). Instead persist language on a
path-scoped row such as file_revs (or the existing file_revs table) when
creating/updating a file revision: call language.Detect(path) in the
file-revision creation flow and store that value on file_revs.language. Update
any consumers (e.g., ParseSymbols, search executor code that filters on
blobs.language) to read the language from file_revs (or join blobs -> file_revs)
rather than blobs.language, and remove/clean up blobs.language usage from
indexState.ensureBlob and related blob access paths.
In `@internal/codedb/search/executor.go`:
- Around line 317-333: The case-insensitive GLOB path currently lowercases the
column in likeOrGlob (producing "lower(column) GLOB ?") but likeOrGlobParam
returns the original pattern, so wildcard patterns won't match; update
likeOrGlobParam (or add a second param) so when the pattern contains wildcards
and the caller requested case-insensitive matching it returns
strings.ToLower(pattern) (while preserving the existing behavior of returning
"%"+pattern+"%" for non-wildcards), and update call sites that invoke
likeOrGlobParam accordingly so the bound parameter matches the lower(column)
GLOB ? usage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e351f4b5-faeb-49fa-90ba-408a9a3444ac
📒 Files selected for processing (6)
- cmd/ox/agent_query.go
- internal/codedb/README.md
- internal/codedb/index/gitops_test.go
- internal/codedb/index/indexer.go
- internal/codedb/search/executor.go
- internal/codedb/store/store.go
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/codedb/index/gitops_test.go
- internal/codedb/README.md
- internal/codedb/store/store.go
cmd/ox/agent_query.go (queryCodeDB):

```diff
 func queryCodeDB(qa *queryArgs) ([]search.Result, error) {
 	dataDir := paths.CodeDBDataDir()
 	if _, err := os.Stat(dataDir); os.IsNotExist(err) {
 		return nil, nil // no index yet, return empty
 	}

-	// Execute query
-	resp, err := client.Query(req)
+	db, err := codedb.Open(dataDir)
 	if err != nil {
-		return fmt.Errorf("query failed: %w", err)
+		return nil, fmt.Errorf("open codedb: %w", err)
 	}
+	defer db.Close()

 	return db.Search(context.Background(), qa.query)
 }
```
--limit is ignored on the CodeDB path.
db.Search(context.Background(), qa.query) only receives the raw query string, so qa.limit never constrains local code results unless the user manually adds count: to the query. Either slice the returned results or inject the count into the CodeDB query so --source code matches the documented CLI behavior.
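A minimal sketch of the slicing option, clamping after the existing `db.Search` call; `qa.limit` and the surrounding `queryCodeDB` shape come from the snippet above:

```go
// Inside queryCodeDB, after opening the store (sketch, not the actual code):
results, err := db.Search(context.Background(), qa.query)
if err != nil {
	return nil, err
}
if qa.limit > 0 && len(results) > qa.limit {
	results = results[:qa.limit] // enforce --limit locally
}
return results, nil
```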
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/ox/agent_query.go` around lines 162 - 175, The queryCodeDB function
currently ignores qa.limit because it calls db.Search(context.Background(),
qa.query) with only the raw string; update queryCodeDB to respect qa.limit by
either (a) injecting a count (e.g., "count:<n>") into the query string before
calling db.Search when qa.limit>0, or (b) keeping the existing db.Search call
and then slicing the returned []search.Result to max qa.limit before returning;
modify queryCodeDB (and use qa.limit and db.Search) so the returned slice of
search.Result is constrained to qa.limit.
cmd/ox/agent_query.go (runAgentQuery, lines 185-188):

```go
projectRoot, err := findProjectRoot()
if err != nil {
	return fmt.Errorf("could not find project root: %w", err)
}
```
Don't require a project root for --source code.
queryCodeDB() only reads the global CodeDB data dir, but runAgentQuery() exits on findProjectRoot() before it even looks at qa.source. That makes ox agent ... query --source code fail outside an initialized repo and also prevents the "all" path from falling back to code results when team-context setup is missing.
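One possible reordering as a sketch; `qa.source` values mirror the `--source` flag in this PR, and `queryTeamContext` is a hypothetical stand-in for the existing team-context call:

```go
// Sketch: only demand a project root when the team-context path needs it.
projectRoot, rootErr := findProjectRoot()
if rootErr != nil && qa.source == "teamctx" {
	return fmt.Errorf("could not find project root: %w", rootErr)
}
if rootErr == nil && (qa.source == "teamctx" || qa.source == "all") {
	// Team-context vector search requires the project root.
	teamResults, err = queryTeamContext(projectRoot, qa) // hypothetical helper
	if err != nil && qa.source == "teamctx" {
		return err // for "all", fall through to code results instead
	}
}
if qa.source == "code" || qa.source == "all" {
	codeResults, err = queryCodeDB(qa) // needs no project root
	if err != nil {
		return err
	}
}
```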
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/ox/agent_query.go` around lines 185 - 188, runAgentQuery currently calls
findProjectRoot() unconditionally which aborts before handling qa.source ==
"code" or the "all" fallback; change the control flow so findProjectRoot() is
only required for sources that need a project repo. Specifically, in
runAgentQuery (where projectRoot, err := findProjectRoot() is called), check
qa.source (or the flag that indicates source) first and: if qa.source == "code"
(or the request can fall back to code), skip requiring findProjectRoot and
proceed to call queryCodeDB(); otherwise call findProjectRoot() and return an
error if it fails. Also, if qa.source == "all", allow findProjectRoot failure to
be ignored so you can still run queryCodeDB() as a fallback. Ensure you
reference runAgentQuery, findProjectRoot, queryCodeDB, and the qa.source
variable when applying the change.
	diffText := generateDiffText(st.repo, path, oldOID, newOID, hasOld, hasNew)
	if diffText != "" {
		st.diffBatch.Index(fmt.Sprintf("diff_%d", diffDBID), BleveDiffDoc{Content: diffText})
		st.diffBatchN++
		if err := st.flushDiffBatch(false); err != nil {
			return err
		}
Keep Bleve writes atomic with the SQL transaction.
flushDiffBatch(false) and flushCodeBatch(false) can publish documents before the SQL transaction commits at Lines 191-193. If any later DB write fails or the context is canceled, tx.Rollback() removes the SQL rows but the Bleve docs stay behind, leaving hits that enrichment can no longer resolve. Buffer Bleve writes until after a successful tx.Commit(), or add rollback cleanup for the indexed document IDs.
Also applies to: 621-625
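A sketch of the buffer-until-commit option; docs is an assumed in-memory slice of (id, payload) pairs accumulated instead of calling flushDiffBatch/flushCodeBatch mid-transaction:

if err := tx.Commit(); err != nil {
	return err // nothing reached Bleve yet, so no orphaned documents
}
batch := idx.NewBatch()
for _, d := range docs {
	if err := batch.Index(d.id, d.payload); err != nil {
		return err
	}
}
return idx.Batch(batch) // publish only after the SQL state is durable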
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/index/indexer.go` around lines 499 - 505, The Bleve indexing
is being flushed mid-SQL transaction (see generateDiffText -> st.diffBatch.Index
and calls to flushDiffBatch(false) and flushCodeBatch(false)), which can leave
orphaned Bleve docs if tx.Rollback() runs; modify the flow so that these flush*
calls do not publish to Bleve until after tx.Commit(): buffer the doc
IDs/content (e.g., keep st.diffBatch and st.codeBatch in-memory and remove any
immediate st.*.Index/flush calls), then after a successful tx.Commit() call the
existing flushDiffBatch(true)/flushCodeBatch(true) or a new commit-flush path to
write to Bleve; alternatively, if you prefer rollback cleanup, record the
indexed document keys when calling st.diffBatch.Index/st.codeBatch.Index and
ensure tx.Rollback() triggers a cleanup routine that deletes those Bleve doc IDs
(use the same identifiers like "diff_%d" and the code doc keys) to keep SQL and
Bleve atomic.
func (st *indexState) ensureBlob(blobOID plumbing.Hash, path string) (int64, bool, error) {
	contentHash := blobOID.String()
	lang := language.Detect(path)

	var langPtr *string
	if lang != "" {
		langPtr = &lang
	}

	_, err := st.tx.Exec(
		"INSERT OR IGNORE INTO blobs (content_hash, language) VALUES (?, ?)",
		contentHash, langPtr,
	)
Don't persist language on a content-deduped blob row.
language.Detect(path) is path-dependent, but blobs are deduped by content_hash. The first filename that inserts a blob wins forever, so the same content reused under a different extension will be classified with the wrong language. That breaks lang: filtering in internal/codedb/search/executor.go and can also exclude blobs from ParseSymbols, which filters on blobs.language. Move language to file_revs (or another path-scoped table) instead of storing it on the deduped blob record.
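A sketch of the path-scoped alternative; the file_revs language column, the fileRevID variable, and the join are assumptions about the revised schema, not the current one:

// When recording a file revision, store the path-derived language there.
_, err := st.tx.Exec(
	"UPDATE file_revs SET language = ? WHERE id = ?",
	language.Detect(path), fileRevID,
)
// Consumers then filter via the path-scoped row, e.g.:
//   SELECT ... FROM file_revs fr JOIN blobs b ON b.id = fr.blob_id
//   WHERE fr.language = ?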
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/index/indexer.go` around lines 585 - 597, ensureBlob
currently stores path-derived language on the deduped blobs row, which is wrong
because blobs are keyed by content_hash; stop persisting language in the blobs
table (remove language from the INSERT in ensureBlob and from any reads of
blobs.language). Instead persist language on a path-scoped row such as file_revs
(or the existing file_revs table) when creating/updating a file revision: call
language.Detect(path) in the file-revision creation flow and store that value on
file_revs.language. Update any consumers (e.g., ParseSymbols, search executor
code that filters on blobs.language) to read the language from file_revs (or
join blobs -> file_revs) rather than blobs.language, and remove/clean up
blobs.language usage from indexState.ensureBlob and related blob access paths.
func likeOrGlob(column, pattern string, caseSensitive bool) string {
	if strings.ContainsAny(pattern, "*?") {
		if caseSensitive {
			return column + " GLOB ?"
		}
		return "lower(" + column + ") GLOB ?"
	}
	return column + " LIKE ?"
}

// likeOrGlobParam returns the appropriate parameter value for likeOrGlob.
func likeOrGlobParam(pattern string) string {
	if strings.ContainsAny(pattern, "*?") {
		return pattern
	}
	return "%" + pattern + "%"
}
Lowercase wildcard params when emulating case-insensitive GLOB.
For wildcard patterns you generate lower(column) GLOB ?, but likeOrGlobParam() returns the original pattern unchanged. A query like repo:Foo* becomes lower(rp.name) GLOB 'Foo*', which will miss foobar. Lower the wildcard pattern before binding it, or stop wrapping the column in lower().
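A sketch of the parameter-side fix, extending likeOrGlobParam with the caller's case flag so it mirrors likeOrGlob:

// likeOrGlobParam lowercases wildcard patterns when the column side
// is wrapped in lower(), so the bound value matches the SQL it pairs with.
func likeOrGlobParam(pattern string, caseSensitive bool) string {
	if strings.ContainsAny(pattern, "*?") {
		if caseSensitive {
			return pattern
		}
		return strings.ToLower(pattern)
	}
	return "%" + pattern + "%"
}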
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/search/executor.go` around lines 317 - 333, The
case-insensitive GLOB path currently lowercases the column in likeOrGlob
(producing "lower(column) GLOB ?") but likeOrGlobParam returns the original
pattern, so wildcard patterns won't match; update likeOrGlobParam (or add a
second param) so when the pattern contains wildcards and the caller requested
case-insensitive matching it returns strings.ToLower(pattern) (while preserving
the existing behavior of returning "%"+pattern+"%" for non-wildcards), and
update call sites that invoke likeOrGlobParam accordingly so the bound parameter
matches the lower(column) GLOB ? usage.
Actionable comments posted: 4
🧹 Nitpick comments (3)
internal/codedb/bench.sh (1)
114-116: Use mapfile to safely capture sorted output.
The current array assignment can break with word splitting edge cases. Using mapfile is the safer idiom here and addresses the shellcheck SC2207 warning.
♻️ Proposed fix
 # Sort and take median
-IFS=$'\n' sorted=($(sort -n <<<"${times[*]}")); unset IFS
+mapfile -t sorted < <(printf '%s\n' "${times[@]}" | sort -n)
 local median=${sorted[$(( SEARCH_ITERS / 2 ))]}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/bench.sh` around lines 114 - 116, The current sorting assignment uses word-splitting-prone array expansion for sorted and IFS manipulation; replace it with mapfile to safely read the sorted lines into an array. Specifically, pipe the elements of the times array (e.g., via printf "%s\n" "${times[@]}") into sort -n and use mapfile -t to populate sorted, then compute local median using sorted[$(( SEARCH_ITERS / 2 ))]; remove the IFS/unset IFS idiom and the direct array assignment that caused SC2207.
internal/codedb/index/indexer.go (1)
438-461: Consider using a valid-length synthetic hash.
The synthetic worktree hash is 44 characters, while Git SHA-1 hashes are 40 hex characters. This inconsistency could cause issues if:
- The database column has a length constraint
- Code elsewhere validates hash format/length
- Future tooling expects consistent hash lengths
♻️ Proposed fix to use 40-character synthetic hash
 // ensureWorktreeRef creates a synthetic commit record to represent the working tree state.
 func ensureWorktreeRef(st *indexState) (int64, error) {
-	const worktreeHash = "0000000000000000000000000000000000worktree"
+	const worktreeHash = "00000000000000000000000000000000worktree" // 40 chars, clearly synthetic
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/index/indexer.go` around lines 438 - 461, The synthetic worktree hash in ensureWorktreeRef is 44 chars and should be a valid 40-character SHA-1-length value; replace the worktreeHash constant with a 40-character hex string (e.g. "0000000000000000000000000000000000000000") and keep using that constant throughout ensureWorktreeRef (and any other places that reference worktreeHash) so DB length/format checks and other hash validations remain consistent.
cmd/ox/codedb.go (1)
49-65: Use cmd.Context() for index/search work instead of context.Background().
These handlers should respect upstream Cobra context (if cancellation is wired at the root level or in future). Replace context.Background() with cmd.Context() at lines 49, 59, 65, and 92 to allow cancellation to propagate through the command tree.
Suggested fix
-	if err := db.IndexRepo(context.Background(), url, opts); err != nil {
+	if err := db.IndexRepo(cmd.Context(), url, opts); err != nil {
 		return fmt.Errorf("index: %w", err)
 	}
@@
-	if err := db.IndexLocalRepo(context.Background(), root, opts); err != nil {
+	if err := db.IndexLocalRepo(cmd.Context(), root, opts); err != nil {
 		return fmt.Errorf("index local: %w", err)
 	}
@@
-	stats, err := db.ParseSymbols(context.Background(), func(msg string) {
+	stats, err := db.ParseSymbols(cmd.Context(), func(msg string) {
 		fmt.Fprintf(os.Stderr, " %s\n", msg)
 	})
@@
-	results, err := db.Search(context.Background(), query)
+	results, err := db.Search(cmd.Context(), query)
If you apply this, the unused context import at line 4 can be removed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/codedb.go` around lines 49 - 65, Replace uses of context.Background() passed to db methods with the Cobra command context by calling cmd.Context() so the operations respect cancellation: change the context arguments for db.IndexRepo, db.IndexLocalRepo, and db.ParseSymbols to use cmd.Context(); then remove the now-unused context import. This ensures IndexRepo, IndexLocalRepo, and ParseSymbols receive the command's cancellable context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/ox/codedb.go`:
- Around line 121-125: Replace the manual tab-joins for printing SQL results
with a proper TSV writer: create a csv.NewWriter(os.Stdout), set writer.Comma =
'\t', use writer.Write(cols) for the header and writer.Write(row) for each row,
call writer.Flush() and check writer.Error(); also add the "encoding/csv"
import. This ensures fields containing tabs/newlines are quoted/escaped
correctly instead of being corrupted by strings.Join(columns, "\t").
- Around line 28-31: The CodeDB root is created with mode 0o755 which is too
permissive; change the creation to use 0o700 and ensure existing directories are
tightened by explicitly setting permissions after creation. In the block that
calls paths.CodeDBDataDir() and os.MkdirAll (around dataDir), use 0o700 when
creating and then call os.Chmod(dataDir, 0o700) (or equivalent) to enforce
private permissions even if the directory already existed, matching the
expectation in internal/codedb/store/store.go.
In `@internal/codedb/index/indexer.go`:
- Around line 1032-1046: After the for rows.Next() loops, add checks for
rows.Err() (and similarly repoRows.Err() after iterating repoRows) to surface
any iteration errors: after closing rows (and repoRows) call if err :=
rows.Err(); err != nil { return stats, fmt.Errorf("rows iteration error: %w",
err) } (and for repoRows use repoRows.Err() with an appropriate error message).
Update the blob iteration block around the blobRow scan loop and the repoRows
loop (the variables rows, repoRows and types blobRow/repoRow) to perform these
Err() checks before proceeding to use blobs or repos.
- Around line 398-428: The code currently swallows DB errors in the blob insert,
blob SELECT, and file_revs INSERT (calls to st.tx.Exec and st.tx.QueryRow) and
also ignores the error returned by ensureWorktreeRef; update the error handling
in the block around st.tx.Exec("INSERT OR IGNORE INTO blobs..."),
st.tx.QueryRow("SELECT id FROM blobs...").Scan(&blobDBID),
ensureWorktreeRef(st), and the final st.tx.Exec("INSERT OR REPLACE INTO
file_revs...") so that instead of returning nil you either log a warning with
context (include contentHash, relPath and returned error) and continue, or
propagate the error up the call chain; ensure you also handle and surface errors
from ensureWorktreeRef and from the file_revs Exec (do not discard with "_, _
="); keep the existing resilience behavior (skip file on error) only if you log
the failure at warn/debug level or increment/return a skip count so callers can
detect skipped files.
---
Nitpick comments:
In `@cmd/ox/codedb.go`:
- Around line 49-65: Replace uses of context.Background() passed to db methods
with the Cobra command context by calling cmd.Context() so the operations
respect cancellation: change the context arguments for db.IndexRepo,
db.IndexLocalRepo, and db.ParseSymbols to use cmd.Context(); then remove the
now-unused context import. This ensures IndexRepo, IndexLocalRepo, and
ParseSymbols receive the command's cancellable context.
In `@internal/codedb/bench.sh`:
- Around line 114-116: The current sorting assignment uses word-splitting-prone
array expansion for sorted and IFS manipulation; replace it with mapfile to
safely read the sorted lines into an array. Specifically, pipe the elements of
the times array (e.g., via printf "%s\n" "${times[@]}") into sort -n and use
mapfile -t to populate sorted, then compute local median using sorted[$((
SEARCH_ITERS / 2 ))]; remove the IFS/unset IFS idiom and the direct array
assignment that caused SC2207.
In `@internal/codedb/index/indexer.go`:
- Around line 438-461: The synthetic worktree hash in ensureWorktreeRef is 44
chars and should be a valid 40-character SHA-1-length value; replace the
worktreeHash constant with a 40-character hex string (e.g.
"0000000000000000000000000000000000000000") and keep using that constant
throughout ensureWorktreeRef (and any other places that reference worktreeHash)
so DB length/format checks and other hash validations remain consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2b38dc80-0858-4c76-bfa8-4444e7b2fdde
📒 Files selected for processing (5)
cmd/ox/codedb.go
internal/codedb/README.md
internal/codedb/bench.sh
internal/codedb/codedb.go
internal/codedb/index/indexer.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/codedb/README.md
- internal/codedb/codedb.go
	dataDir := paths.CodeDBDataDir()
	if err := os.MkdirAll(dataDir, 0o755); err != nil {
		return fmt.Errorf("create codedb dir: %w", err)
	}
Keep the CodeDB root private on first creation.
Line 29 creates dataDir with 0o755, but internal/codedb/store/store.go:23-35 intentionally creates the store root with 0o700. Because os.MkdirAll does not tighten an existing directory, the first run leaves the CodeDB root broader than the storage layer expects, which exposes repo/index metadata to other local users.
Suggested fix
dataDir := paths.CodeDBDataDir()
- if err := os.MkdirAll(dataDir, 0o755); err != nil {
- return fmt.Errorf("create codedb dir: %w", err)
- }
-
db, err := codedb.Open(dataDir)
if err != nil {
return fmt.Errorf("open codedb: %w", err)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
	dataDir := paths.CodeDBDataDir()
	db, err := codedb.Open(dataDir)
	if err != nil {
		return fmt.Errorf("open codedb: %w", err)
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/ox/codedb.go` around lines 28 - 31, The CodeDB root is created with mode
0o755 which is too permissive; change the creation to use 0o700 and ensure
existing directories are tightened by explicitly setting permissions after
creation. In the block that calls paths.CodeDBDataDir() and os.MkdirAll (around
dataDir), use 0o700 when creating and then call os.Chmod(dataDir, 0o700) (or
equivalent) to enforce private permissions even if the directory already
existed, matching the expectation in internal/codedb/store/store.go.
	// Print as TSV
	fmt.Println(strings.Join(cols, "\t"))
	for _, row := range rows {
		fmt.Println(strings.Join(row, "\t"))
	}
Write SQL output with a real TSV writer.
Joining columns with "\t" corrupts output as soon as a cell contains tabs or newlines, which is plausible for code, diffs, and blob content returned by raw SQL. Use encoding/csv.Writer with Comma = '\t' so fields are quoted correctly.
Suggested fix
- // Print as TSV
- fmt.Println(strings.Join(cols, "\t"))
- for _, row := range rows {
- fmt.Println(strings.Join(row, "\t"))
- }
+ w := csv.NewWriter(os.Stdout)
+ w.Comma = '\t'
+ if err := w.Write(cols); err != nil {
+ return fmt.Errorf("write header: %w", err)
+ }
+ for _, row := range rows {
+ if err := w.Write(row); err != nil {
+ return fmt.Errorf("write row: %w", err)
+ }
+ }
+ w.Flush()
+ if err := w.Error(); err != nil {
+ return fmt.Errorf("flush rows: %w", err)
+ }
Also add this import:
"encoding/csv"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/ox/codedb.go` around lines 121 - 125, Replace the manual tab-joins for
printing SQL results with a proper TSV writer: create a
csv.NewWriter(os.Stdout), set writer.Comma = '\t', use writer.Write(cols) for
the header and writer.Write(row) for each row, call writer.Flush() and check
writer.Error(); also add the "encoding/csv" import. This ensures fields
containing tabs/newlines are quoted/escaped correctly instead of being corrupted
by strings.Join(columns, "\t").
Actionable comments posted: 5
♻️ Duplicate comments (3)
internal/codedb/store/store.go (1)
32-36: ⚠️ Potential issue | 🟠 Major
Tighten existing directories, not just newly created ones.
os.MkdirAll(dir, 0o700) does not change the mode of an already-existing path. cmd/ox/codedb.go pre-creates the CodeDB root with 0o755, so the first CLI run still leaves the index root world-traversable unless Open explicitly Chmods it back to 0o700.
Proposed fix
 for _, dir := range []string{root, reposDir, bleveDir} {
 	if err := os.MkdirAll(dir, 0o700); err != nil {
 		return nil, fmt.Errorf("create dir %s: %w", dir, err)
 	}
+	if err := os.Chmod(dir, 0o700); err != nil {
+		return nil, fmt.Errorf("chmod dir %s: %w", dir, err)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/store/store.go` around lines 32 - 36, The loop that calls os.MkdirAll for each path (root, reposDir, bleveDir) doesn't change permissions on existing directories, so add an explicit os.Chmod(dir, 0o700) after MkdirAll inside the same loop to ensure existing directories are tightened; handle and return any Chmod error (e.g., fmt.Errorf("chmod dir %s: %w", dir, err)) so failures propagate—apply this change where the MkdirAll loop is implemented (the code that sets up the store directories in store.go / Open or NewStore).
internal/codedb/index/indexer.go (2)
100-128: ⚠️ Potential issue | 🟠 Major
Keep Bleve writes behind the SQL commit.
flushCodeBatch/flushDiffBatch publish documents to Bleve before tx.Commit(), and those helpers are called while the SQL transaction is still open. Any later SQL failure or rollback leaves orphaned Bleve docs that point at rows that never committed.
Also applies to: 215-225, 323-333
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/index/indexer.go` around lines 100 - 128, The current flushCodeBatch and flushDiffBatch functions are being called while the SQL transaction is still open, causing Bleve writes to be published before tx.Commit(); change the call sites so that flushCodeBatch(...) and flushDiffBatch(...) are only invoked after tx.Commit() successfully returns (i.e., move the calls out of the transactional code path to run after tx.Commit() or on successful commit path), or alternatively wrap the batch flushes behind a post-commit hook so no Bleve Batch() is executed until after tx.Commit() has succeeded; update all callers that currently call these methods during a transaction (including the places referenced around the duplicated ranges) to follow this post-commit behavior.
421-423: ⚠️ Potential issue | 🟠 Major
Don't store path-derived language on deduped blobs.
language.Detect(path) depends on the filename, but blobs are keyed only by content_hash. The first extension that inserts a blob wins forever, so the same content reused under another path gets the wrong language, which then breaks lang: filtering and the ParseSymbols query that reads blobs.language.
Also applies to: 924-926, 1041-1043
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/index/indexer.go` around lines 421 - 423, The code currently uses st.tx.Exec to run "INSERT OR IGNORE INTO blobs (content_hash, language) ..." with language set from language.Detect(path), which stores a path-derived language on a blob even when the row already exists (blobs are keyed by content_hash); change this to first check whether a blob with content_hash (contentHash) already exists (SELECT language FROM blobs WHERE content_hash = ?), and only insert the detected language when no row exists—if a row exists do nothing (or leave language unchanged). Replace the blind INSERT OR IGNORE with a query-then-insert flow (or an INSERT ... WHERE NOT EXISTS pattern) so language.Detect(path) is not recorded for deduped content; apply the same change at the other two insertion sites referenced around the uses at the other occurrences (the blocks using contentHash, langPtr and language.Detect(path) at the two other locations).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/codedb/codedb.go`:
- Around line 71-104: RawSQL currently allocates sql.NullString for every column
so rows with non-text types are dropped; fix RawSQL (the rows.Next / rows.Scan
block) to allocate a []interface{} slice of empty interface values (e.g., vals
:= make([]interface{}, len(cols)) with ptrs := make([]interface{}, len(cols))
and set ptrs[i] = &vals[i]), call rows.Scan(ptrs...), then convert each vals[i]
to a string (handle nil -> "NULL", []byte -> string(b), otherwise use
fmt.Sprint(vals[i])) and append that row to results instead of skipping on scan
errors for type mismatches. Ensure you still return cols, results, nil.
In `@internal/codedb/index/gitops.go`:
- Around line 16-29: normalizeRepoURL currently only strips http(s)/git://
prefixes and .git suffix, so SSH-style URLs like git@github.com:org/repo.git and
ssh://git@github.com/org/repo.git are not canonicalized and can produce
different keys; update normalizeRepoURL to detect and normalize SCP-style and
ssh:// forms the same way as internal/repotools/url.go does: parse and convert
git@host:org/repo.git to host/org/repo and treat ssh://git@host/... like
host/... (removing user@ and any leading scheme, converting ':' separators to
'/', trimming ".git" and trailing slashes), ensuring
RepoDirFromURL/RepoNameFromURL will get a consistent canonical string for all
URL forms.
In `@internal/codedb/index/indexer.go`:
- Line 397: The symbol parser fails for synthetic worktree blobs because
contentHash values like "worktree:<sha256>" are created (contentHash variable)
but ParseSymbols only tries to reopen real git blobs via BlobObject, causing
content == nil and parsed=1; modify ParseSymbols (and any callers that use
BlobObject) to detect "worktree:" prefixed contentHash values and load the blob
content from the in-memory/synthetic worktree store instead of calling
BlobObject, or alternatively adjust the content retrieval code to first try the
worktree lookup for contentHash and fall back to BlobObject; update logic in
ParseSymbols (and related functions that rely on BlobObject) so synthetic
worktree blobs are passed actual content to the symbol parser.
- Around line 252-253: The code currently derives repoName from
filepath.Base(localPath) and calls upsertRepo(s, repoName, localPath), which
causes unrelated local repos with the same basename to collide; change repoName
to a stable unique key for local repos (e.g. use the absolute, cleaned path
obtained from filepath.Abs(localPath) or similarly canonicalized path) and pass
that to upsertRepo instead of filepath.Base(localPath). Make the same
replacement for the other occurrence where repoName is derived from
filepath.Base(localPath) (the block referenced in the review) so upsertRepo
receives a unique repo identifier for all local repositories.
In `@internal/codedb/search/executor_test.go`:
- Around line 47-305: Many executor tests repeat the same setup and execution
logic; consolidate TestExecuteCommitSearch, TestExecuteCommitSearchByMessage,
TestExecuteSymbolSearch, TestExecuteSymbolSearchWithKindFilter,
TestExecuteCallsSearch, TestExecuteReturnsSearch,
TestExecuteCommitSearchNoResults, TestExecuteWithCount,
TestExecuteContextCancelled, TestExecuteCalledBySearch,
TestExecuteSymbolSearchLangFilter, and TestExecuteCommitDateFilter into a single
table-driven test that iterates test cases; each case should call openTestStore
and seedTestData once per subtest, build/parse the query via ParseQuery, call
Execute (or call Execute with a canceled context for the canceled case), and
assert expected results or expected errors based on fields in the table (name,
query, wantErr, wantCount, wantContains, wantNotContains, etc.); keep existing
helper functions (openTestStore, seedTestData, ParseQuery, Execute) and create
subtests with t.Run for clearer failure reporting and to include
malformed/query-error cases alongside happy paths.
---
Duplicate comments:
In `@internal/codedb/index/indexer.go`:
- Around line 100-128: The current flushCodeBatch and flushDiffBatch functions
are being called while the SQL transaction is still open, causing Bleve writes
to be published before tx.Commit(); change the call sites so that
flushCodeBatch(...) and flushDiffBatch(...) are only invoked after tx.Commit()
successfully returns (i.e., move the calls out of the transactional code path to
run after tx.Commit() or on successful commit path), or alternatively wrap the
batch flushes behind a post-commit hook so no Bleve Batch() is executed until
after tx.Commit() has succeeded; update all callers that currently call these
methods during a transaction (including the places referenced around the
duplicated ranges) to follow this post-commit behavior.
- Around line 421-423: The code currently uses st.tx.Exec to run "INSERT OR
IGNORE INTO blobs (content_hash, language) ..." with language set from
language.Detect(path), which stores a path-derived language on a blob even when
the row already exists (blobs are keyed by content_hash); change this to first
check whether a blob with content_hash (contentHash) already exists (SELECT
language FROM blobs WHERE content_hash = ?), and only insert the detected
language when no row exists—if a row exists do nothing (or leave language
unchanged). Replace the blind INSERT OR IGNORE with a query-then-insert flow (or
an INSERT ... WHERE NOT EXISTS pattern) so language.Detect(path) is not recorded
for deduped content; apply the same change at the other two insertion sites
referenced around the uses at the other occurrences (the blocks using
contentHash, langPtr and language.Detect(path) at the two other locations).
In `@internal/codedb/store/store.go`:
- Around line 32-36: The loop that calls os.MkdirAll for each path (root,
reposDir, bleveDir) doesn't change permissions on existing directories, so add
an explicit os.Chmod(dir, 0o700) after MkdirAll inside the same loop to ensure
existing directories are tightened; handle and return any Chmod error (e.g.,
fmt.Errorf("chmod dir %s: %w", dir, err)) so failures propagate—apply this
change where the MkdirAll loop is implemented (the code that sets up the store
directories in store.go / Open or NewStore).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 12398aba-4b9e-4596-8ac2-d8eeee2e4f95
📒 Files selected for processing (6)
internal/codedb/codedb.go
internal/codedb/index/gitops.go
internal/codedb/index/gitops_test.go
internal/codedb/index/indexer.go
internal/codedb/search/executor_test.go
internal/codedb/store/store.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/codedb/index/gitops_test.go
func (db *DB) RawSQL(query string) ([]string, [][]string, error) {
	rows, err := db.store.Query(query)
	if err != nil {
		return nil, nil, fmt.Errorf("execute sql: %w", err)
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		return nil, nil, fmt.Errorf("get columns: %w", err)
	}

	var results [][]string
	for rows.Next() {
		values := make([]sql.NullString, len(cols))
		ptrs := make([]interface{}, len(cols))
		for i := range values {
			ptrs[i] = &values[i]
		}
		if err := rows.Scan(ptrs...); err != nil {
			continue
		}
		row := make([]string, len(cols))
		for i, v := range values {
			if v.Valid {
				row[i] = v.String
			} else {
				row[i] = "NULL"
			}
		}
		results = append(results, row)
	}

	return cols, results, nil
RawSQL drops rows for non-text columns.
Scanning every value into sql.NullString fails for INTEGER columns, and the continue silently discards those rows. Most CodeDB tables include ints (id, timestamp, line, col, ...), so valid SQL returns partial or empty output here.
Proposed fix
func (db *DB) RawSQL(query string) ([]string, [][]string, error) {
rows, err := db.store.Query(query)
if err != nil {
return nil, nil, fmt.Errorf("execute sql: %w", err)
}
defer rows.Close()
cols, err := rows.Columns()
if err != nil {
return nil, nil, fmt.Errorf("get columns: %w", err)
}
var results [][]string
for rows.Next() {
- values := make([]sql.NullString, len(cols))
- ptrs := make([]interface{}, len(cols))
+ values := make([]interface{}, len(cols))
+ ptrs := make([]interface{}, len(cols))
for i := range values {
ptrs[i] = &values[i]
}
if err := rows.Scan(ptrs...); err != nil {
- continue
+ return nil, nil, fmt.Errorf("scan row: %w", err)
}
row := make([]string, len(cols))
for i, v := range values {
- if v.Valid {
- row[i] = v.String
- } else {
- row[i] = "NULL"
+ switch x := v.(type) {
+ case nil:
+ row[i] = "NULL"
+ case []byte:
+ row[i] = string(x)
+ default:
+ row[i] = fmt.Sprint(x)
}
}
results = append(results, row)
}
+ if err := rows.Err(); err != nil {
+ return nil, nil, fmt.Errorf("iterate rows: %w", err)
+ }
return cols, results, nil
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
func (db *DB) RawSQL(query string) ([]string, [][]string, error) {
	rows, err := db.store.Query(query)
	if err != nil {
		return nil, nil, fmt.Errorf("execute sql: %w", err)
	}
	defer rows.Close()
	cols, err := rows.Columns()
	if err != nil {
		return nil, nil, fmt.Errorf("get columns: %w", err)
	}
	var results [][]string
	for rows.Next() {
		values := make([]interface{}, len(cols))
		ptrs := make([]interface{}, len(cols))
		for i := range values {
			ptrs[i] = &values[i]
		}
		if err := rows.Scan(ptrs...); err != nil {
			return nil, nil, fmt.Errorf("scan row: %w", err)
		}
		row := make([]string, len(cols))
		for i, v := range values {
			switch x := v.(type) {
			case nil:
				row[i] = "NULL"
			case []byte:
				row[i] = string(x)
			default:
				row[i] = fmt.Sprint(x)
			}
		}
		results = append(results, row)
	}
	if err := rows.Err(); err != nil {
		return nil, nil, fmt.Errorf("iterate rows: %w", err)
	}
	return cols, results, nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/codedb.go` around lines 71 - 104, RawSQL currently allocates
sql.NullString for every column so rows with non-text types are dropped; fix
RawSQL (the rows.Next / rows.Scan block) to allocate a []interface{} slice of
empty interface values (e.g., vals := make([]interface{}, len(cols)) with ptrs
:= make([]interface{}, len(cols)) and set ptrs[i] = &vals[i]), call
rows.Scan(ptrs...), then convert each vals[i] to a string (handle nil -> "NULL",
[]byte -> string(b), otherwise use fmt.Sprint(vals[i])) and append that row to
results instead of skipping on scan errors for type mismatches. Ensure you still
return cols, results, nil.
func normalizeRepoURL(url string) (string, error) {
	stripped := url
	for _, prefix := range []string{"https://", "http://", "git://"} {
		if after, found := strings.CutPrefix(stripped, prefix); found {
			stripped = after
			break
		}
	}
	stripped = strings.TrimRight(stripped, "/")
	stripped = strings.TrimSuffix(stripped, ".git")
	if stripped == "" {
		return "", fmt.Errorf("invalid repo URL: %s", url)
	}
	return stripped, nil
Handle SSH-style Git URLs in normalizeRepoURL.
This only canonicalizes https://, http://, and git://. Inputs like git@github.com:org/repo.git or ssh://git@github.com/org/repo.git end up with different repo keys than the HTTPS form, and the SCP-style variant leaves a literal : in the derived directory name. That splits the same remote across multiple cache entries and breaks path generation on Windows. internal/repotools/url.go already handles these SSH forms; RepoDirFromURL / RepoNameFromURL should normalize them the same way.
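A sketch of the SSH canonicalization step, run before the existing prefix stripping; the helper name is illustrative, and the canonical logic lives in internal/repotools/url.go:

// canonicalizeSSH maps git@host:org/repo.git and ssh://git@host/org/repo.git
// onto the same host/org/repo.git form the https path already produces.
func canonicalizeSSH(url string) string {
	if !strings.HasPrefix(url, "ssh://") && !strings.Contains(url, "@") {
		return url // not an SSH form; leave it to the existing prefix logic
	}
	url = strings.TrimPrefix(url, "ssh://")
	if i := strings.Index(url, "@"); i >= 0 {
		url = url[i+1:] // drop the user@ part
	}
	// SCP-style host:path becomes host/path
	if i := strings.Index(url, ":"); i >= 0 && !strings.Contains(url[:i], "/") {
		url = url[:i] + "/" + url[i+1:]
	}
	return url
}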
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/index/gitops.go` around lines 16 - 29, normalizeRepoURL
currently only strips http(s)/git:// prefixes and .git suffix, so SSH-style URLs
like git@github.com:org/repo.git and ssh://git@github.com/org/repo.git are not
canonicalized and can produce different keys; update normalizeRepoURL to detect
and normalize SCP-style and ssh:// forms the same way as
internal/repotools/url.go does: parse and convert git@host:org/repo.git to
host/org/repo and treat ssh://git@host/... like host/... (removing user@ and any
leading scheme, converting ':' separators to '/', trimming ".git" and trailing
slashes), ensuring RepoDirFromURL/RepoNameFromURL will get a consistent
canonical string for all URL forms.
	repoName := filepath.Base(localPath)
	repoID, err := upsertRepo(s, repoName, localPath)
Use a unique repo key for local repos, not filepath.Base(localPath).
upsertRepo conflicts only on repos.name, so /src/api and /tmp/api collapse into the same repo row. That mixes refs, commits, and file revisions for unrelated local repos that happen to share a basename. Use a stable unique identifier here, e.g. the absolute path.
Also applies to: 490-493
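A minimal sketch of the absolute-path key, assuming upsertRepo's (name, path) signature shown above:

absPath, err := filepath.Abs(localPath)
if err != nil {
	return fmt.Errorf("resolve local repo path: %w", err)
}
// Keyed by full path, /src/api and /tmp/api no longer collide.
repoID, err := upsertRepo(s, absPath, absPath)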
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/index/indexer.go` around lines 252 - 253, The code currently
derives repoName from filepath.Base(localPath) and calls upsertRepo(s, repoName,
localPath), which causes unrelated local repos with the same basename to
collide; change repoName to a stable unique key for local repos (e.g. use the
absolute, cleaned path obtained from filepath.Abs(localPath) or similarly
canonicalized path) and pass that to upsertRepo instead of
filepath.Base(localPath). Make the same replacement for the other occurrence
where repoName is derived from filepath.Base(localPath) (the block referenced in
the review) so upsertRepo receives a unique repo identifier for all local
repositories.
	// Compute content hash
	h := sha256.Sum256(content)
	contentHash := "worktree:" + hex.EncodeToString(h[:])
Worktree blobs can never reach the symbol parser.
Dirty files are stored under synthetic hashes like worktree:<sha256>, but ParseSymbols only knows how to reopen git object IDs via BlobObject. Those blobs are never in the object database, so every supported dirty file falls into the content == nil branch and gets marked parsed = 1 with zero symbols extracted.
Also applies to: 1117-1139
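A sketch of the lookup branch, assuming a worktreeContent map captured during indexing (go-git's BlobObject/Reader calls are real; the map is an assumption):

var content []byte
if rest, ok := strings.CutPrefix(contentHash, "worktree:"); ok {
	content = worktreeContent[rest] // synthetic blob: read from the worktree-side store
} else if blob, err := repo.BlobObject(plumbing.NewHash(contentHash)); err == nil {
	if r, err := blob.Reader(); err == nil {
		content, _ = io.ReadAll(r)
		r.Close()
	}
}
// content now feeds the symbol parser for both real and synthetic blobs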
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/index/indexer.go` at line 397, The symbol parser fails for
synthetic worktree blobs because contentHash values like "worktree:<sha256>" are
created (contentHash variable) but ParseSymbols only tries to reopen real git
blobs via BlobObject, causing content == nil and parsed=1; modify ParseSymbols
(and any callers that use BlobObject) to detect "worktree:" prefixed contentHash
values and load the blob content from the in-memory/synthetic worktree store
instead of calling BlobObject, or alternatively adjust the content retrieval
code to first try the worktree lookup for contentHash and fall back to
BlobObject; update logic in ParseSymbols (and related functions that rely on
BlobObject) so synthetic worktree blobs are passed actual content to the symbol
parser.
func TestExecuteCommitSearch(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:commit author:alice")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results for commit search by alice")
	}
	for _, r := range results {
		if r.Author != "alice" {
			t.Errorf("expected author alice, got %q", r.Author)
		}
	}
}

func TestExecuteCommitSearchByMessage(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:commit refactor")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) != 1 {
		t.Fatalf("expected 1 result, got %d", len(results))
	}
	if results[0].Message != "refactor search module" {
		t.Errorf("message = %q", results[0].Message)
	}
}

func TestExecuteSymbolSearch(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:symbol lang:go main")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results for symbol search")
	}
	found := false
	for _, r := range results {
		if r.SymbolName == "main" && r.SymbolKind == "function" {
			found = true
		}
	}
	if !found {
		t.Error("expected to find 'main' function symbol")
	}
}

func TestExecuteSymbolSearchWithKindFilter(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:symbol select:symbol.function lang:go helper")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results")
	}
	for _, r := range results {
		if r.SymbolKind != "function" {
			t.Errorf("expected kind=function, got %q", r.SymbolKind)
		}
	}
}

func TestExecuteCallsSearch(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("calls:helper")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results for calls:helper")
	}
	// The caller of helper should be main (symbol_id=1 -> name=main)
	found := false
	for _, r := range results {
		if r.SymbolName == "main" {
			found = true
		}
	}
	if !found {
		t.Error("expected to find 'main' as caller of 'helper'")
	}
}

func TestExecuteReturnsSearch(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("returns:Result")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results for returns:Result")
	}
	found := false
	for _, r := range results {
		if r.SymbolName == "process" {
			found = true
		}
	}
	if !found {
		t.Error("expected to find 'process' function returning Result")
	}
}

func TestExecuteCommitSearchNoResults(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:commit author:nonexistent")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) != 0 {
		t.Errorf("expected 0 results, got %d", len(results))
	}
}

func TestExecuteWithCount(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:commit count:1 author:alice")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) > 1 {
		t.Errorf("expected at most 1 result, got %d", len(results))
	}
}

func TestExecuteContextCancelled(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel immediately

	q, err := ParseQuery("type:commit author:alice")
	if err != nil {
		t.Fatal(err)
	}
	_, err = Execute(ctx, s, q)
	if err == nil {
		t.Fatal("expected error from canceled context")
	}
}

func TestExecuteCalledBySearch(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("calledby:process")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results for calledby:process")
	}
	found := false
	for _, r := range results {
		if r.SymbolName == "parse" {
			found = true
		}
	}
	if !found {
		t.Error("expected to find 'parse' as callee of 'process'")
	}
}

func TestExecuteSymbolSearchLangFilter(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	// Search for symbols in rust — should find 'process' but not 'main'
	q, err := ParseQuery("type:symbol lang:rust process")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	for _, r := range results {
		if r.SymbolName == "main" {
			t.Error("should not find Go symbol 'main' when filtering lang:rust")
		}
	}
}

func TestExecuteCommitDateFilter(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	// Only commits after timestamp 1700050000 (should exclude 'initial commit')
	q, err := ParseQuery("type:commit after:2023-11-15 author:alice")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	for _, r := range results {
		if r.Message == "initial commit" {
			t.Error("should not include 'initial commit' with after filter")
		}
	}
}
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'func TestExecute|openTestStore\(t\)|seedTestData\(t, s\)' internal/codedb/search/executor_test.go
Repository: sageox/ox
Length of output: 1292
🏁 Script executed:
#!/bin/bash
# Check the full executor_test.go file structure to identify all test functions and error scenarios
wc -l internal/codedb/search/executor_test.go
Repository: sageox/ox
Length of output: 96
🏁 Script executed:
#!/bin/bash
# Look for test functions that explicitly test error conditions
rg -n 'if err != nil|expected error|Fatal.*error' internal/codedb/search/executor_test.go | head -30
Repository: sageox/ox
Length of output: 603
🏁 Script executed:
#!/bin/bash
# Check if there are tests for ParseQuery error scenarios
rg -n 'ParseQuery.*invalid|ParseQuery.*error|ParseQuery.*bad' internal/codedb/search/executor_test.go
Repository: sageox/ox
Length of output: 35
Consolidate these executor tests into a table-driven harness.
The repeated openTestStore → seedTestData → ParseQuery → Execute pattern across 11 test functions creates unnecessary boilerplate that makes it difficult to extend the query and error-case matrices consistently. A single table-driven test would reduce duplication and enable systematic coverage of error paths (malformed queries, Execute failures) alongside the happy paths.
Per coding guidelines: "Use table-driven tests in Go test files" and "Test error paths in addition to happy paths."
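A sketch of the table-driven shape, reusing the existing helpers; the malformed-query case assumes ParseQuery rejects a bare "type:":

func TestExecute(t *testing.T) {
	cases := []struct {
		name      string
		query     string
		cancelCtx bool
		wantErr   bool
		wantName  string // SymbolName expected among results, if set
	}{
		{name: "commit by author", query: "type:commit author:alice"},
		{name: "caller lookup", query: "calls:helper", wantName: "main"},
		{name: "canceled context", query: "type:commit author:alice", cancelCtx: true, wantErr: true},
		{name: "malformed query", query: "type:", wantErr: true}, // assumption: parser rejects this
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			s := openTestStore(t)
			seedTestData(t, s)
			q, err := ParseQuery(tc.query)
			if err != nil {
				if tc.wantErr {
					return
				}
				t.Fatal(err)
			}
			ctx, cancel := context.WithCancel(context.Background())
			defer cancel()
			if tc.cancelCtx {
				cancel()
			}
			results, err := Execute(ctx, s, q)
			if tc.wantErr {
				if err == nil {
					t.Fatal("expected error")
				}
				return
			}
			if err != nil {
				t.Fatal(err)
			}
			if tc.wantName == "" {
				return
			}
			for _, r := range results {
				if r.SymbolName == tc.wantName {
					return
				}
			}
			t.Errorf("expected %q among results", tc.wantName)
		})
	}
}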
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/search/executor_test.go` around lines 47 - 305, Many executor
tests repeat the same setup and execution logic; consolidate
TestExecuteCommitSearch, TestExecuteCommitSearchByMessage,
TestExecuteSymbolSearch, TestExecuteSymbolSearchWithKindFilter,
TestExecuteCallsSearch, TestExecuteReturnsSearch,
TestExecuteCommitSearchNoResults, TestExecuteWithCount,
TestExecuteContextCancelled, TestExecuteCalledBySearch,
TestExecuteSymbolSearchLangFilter, and TestExecuteCommitDateFilter into a single
table-driven test that iterates test cases; each case should call openTestStore
and seedTestData once per subtest, build/parse the query via ParseQuery, call
Execute (or call Execute with a canceled context for the canceled case), and
assert expected results or expected errors based on fields in the table (name,
query, wantErr, wantCount, wantContains, wantNotContains, etc.); keep existing
helper functions (openTestStore, seedTestData, ParseQuery, Execute) and create
subtests with t.Run for clearer failure reporting and to include
malformed/query-error cases alongside happy paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
…re-index optimization Co-Authored-By: SageOx <ox@sageox.ai> SageOx-Session: https://sageox.ai/repo/repo_019c5812-01e9-7b7d-b5b1-321c471c9777/sessions/2026-02-26T20-09-user-OxjMyG/view
Co-Authored-By: SageOx <ox@sageox.ai> SageOx-Session: https://sageox.ai/repo/repo_019c5812-01e9-7b7d-b5b1-321c471c9777/sessions/2026-02-26T20-09-user-OxjMyG/view
80585b6 to 4462689
Evolution
Thursday, March 5: ylow/CodeDB
Saturday, March 7: rupakm
Benchmark (SFrameRust repo)
Session Recording
View session recording
Summary by CodeRabbit
New Features
Documentation
Tools