feat(codedb): integrate CodeDB as local code search engine #158
Conversation

⚠️ Caution: Review failed. The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration: defaults · Review profile: CHILL · Plan: Pro · Run ID:
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (28)
📝 Walkthrough

Adds a new local CodeDB subsystem (SQLite + Bleve + git indexing, query AST/planner/executor, symbol models and parsers), codedb CLI commands, and integrates CodeDB plus team-context vector search into the agent query flow via a new --source flag.

Changes

Sequence Diagram(s)

sequenceDiagram
autonumber
actor User
participant Agent as Agent Query
participant TC as Team Context API
participant CodeDB as CodeDB Facade
participant Store as Store (SQLite + Bleve)
User->>Agent: run agent query (--source)
Agent->>Agent: parse flags, build queryArgs
alt source == "teamctx" or "all"
Agent->>TC: vector search request
TC-->>Agent: team context results
end
alt source == "code" or "all"
Agent->>CodeDB: Search(ctx, input)
CodeDB->>CodeDB: ParseQuery -> Plan
CodeDB->>Store: execute plan (Bleve / SQL / intersect)
par Bleve path
Store->>Store: Bleve query -> hits
Store-->>CodeDB: hits + fragments
CodeDB->>Store: SQL enrich (file_revs/commits)
Store-->>CodeDB: enriched results
and SQL-only path
Store->>Store: execute SQL
Store-->>CodeDB: SQL results
end
CodeDB-->>Agent: code search results
end
Agent->>Agent: combine results into single JSON
Agent-->>User: unified response
sequenceDiagram
autonumber
actor User
participant CLI as codedb CLI
participant CodeDB as CodeDB Facade
participant Git as Git
participant Index as Bleve
participant DB as SQLite
User->>CLI: codedb index <url>
CLI->>CodeDB: Open(root) -> IndexRepo(ctx, url)
CodeDB->>Git: CloneOrFetch(url)
Git-->>CodeDB: repo path
CodeDB->>DB: upsert repo/commits/diffs/blobs
CodeDB->>Index: index code/diff docs
Index-->>CodeDB: indexed
CodeDB->>CodeDB: ParseSymbols(progress)
CodeDB->>DB: write symbols/symbol_refs
DB-->>CodeDB: done
CodeDB-->>CLI: summary (blobs/symbols parsed)
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 15
🧹 Nitpick comments (9)
internal/codedb/store/store.go (1)
88-95: Prefer `errors.Is` for the Bleve missing-index check. A direct equality check is brittle if `bleve.Open` ever wraps `ErrorIndexPathDoesNotExist`.

♻️ Suggested fix
import ( "context" "database/sql" + "errors" "fmt" "os" "path/filepath" @@ func openOrCreateBleveIndex(path string) (bleve.Index, error) { idx, err := bleve.Open(path) - if err == bleve.ErrorIndexPathDoesNotExist { + if errors.Is(err, bleve.ErrorIndexPathDoesNotExist) { mapping := bleve.NewIndexMapping() return bleve.New(path, mapping) } return idx, err }As per coding guidelines,
**/*.go: Use errors.Is() and errors.As() for error handling and wrap with context.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/store/store.go` around lines 88 - 95, The equality check against bleve.ErrorIndexPathDoesNotExist in openOrCreateBleveIndex is brittle; update the error handling to use errors.Is(err, bleve.ErrorIndexPathDoesNotExist) when deciding to create a new index, and wrap/return errors with context where appropriate (preserve the existing bleve.Open and bleve.New calls but replace the direct == check with errors.Is and add a contextual error wrap on failures).

internal/codedb/index/gitops.go (2)
14-46: Consider extracting common URL normalization logic.
`RepoDirFromURL` and `RepoNameFromURL` share identical prefix-stripping and normalization logic. Consider extracting this to a helper to reduce duplication.

♻️ Suggested refactor
```go
func normalizeRepoURL(url string) (string, error) {
	stripped := url
	for _, prefix := range []string{"https://", "http://", "git://"} {
		if strings.HasPrefix(stripped, prefix) {
			stripped = strings.TrimPrefix(stripped, prefix)
			break
		}
	}
	stripped = strings.TrimRight(stripped, "/")
	stripped = strings.TrimSuffix(stripped, ".git")
	if stripped == "" {
		return "", fmt.Errorf("invalid repo URL: %s", url)
	}
	return stripped, nil
}

func RepoDirFromURL(url string) (string, error) {
	name, err := normalizeRepoURL(url)
	if err != nil {
		return "", err
	}
	return name + ".git", nil
}

func RepoNameFromURL(url string) (string, error) {
	return normalizeRepoURL(url)
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/index/gitops.go` around lines 14 - 46, Extract the shared URL normalization logic from RepoDirFromURL and RepoNameFromURL into a helper (e.g., normalizeRepoURL) that performs prefix stripping for "https://", "http://", "git://", trims trailing slashes, removes ".git", and returns an error for empty results; then update RepoDirFromURL to call normalizeRepoURL and append ".git" to its result, and update RepoNameFromURL to return normalizeRepoURL's result directly, preserving existing error handling.
75-85: Use `errors.Is()` for sentinel error comparison. Per coding guidelines, use `errors.Is()` instead of direct equality comparison for error handling.

♻️ Proposed fix
```diff
 func fetch(repo *git.Repository) error {
 	err := repo.Fetch(&git.FetchOptions{
 		RemoteName: "origin",
 		RefSpecs:   []config.RefSpec{"+refs/*:refs/*"},
 		Force:      true,
 	})
-	if err == git.NoErrAlreadyUpToDate {
+	if errors.Is(err, git.NoErrAlreadyUpToDate) {
 		return nil
 	}
 	return err
 }
```

Also add `"errors"` to the import block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/index/gitops.go` around lines 75 - 85, The fetch function compares the sentinel error git.NoErrAlreadyUpToDate using ==; change this to use errors.Is(err, git.NoErrAlreadyUpToDate) and update the import block to include "errors" so the comparison is done properly (modify the fetch function and its error check referencing git.NoErrAlreadyUpToDate).

cmd/ox/codedb.go (2)
39-43: Consider adding a progress callback to `IndexOptions`. `IndexOptions` is created empty, but the `IndexOptions` struct has a `Progress` field. Without it, users don't see progress during the (potentially long) indexing phase, only the initial "Indexing..." message.

♻️ Proposed enhancement
```diff
 fmt.Fprintf(os.Stderr, "Indexing %s...\n", url)
-opts := index.IndexOptions{}
+opts := index.IndexOptions{
+	Progress: func(msg string) {
+		fmt.Fprintf(os.Stderr, "  %s\n", msg)
+	},
+}
 if err := db.IndexRepo(context.Background(), url, opts); err != nil {
 	return fmt.Errorf("index: %w", err)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/codedb.go` around lines 39 - 43, Indexing is currently invoked with an empty IndexOptions so the Progress callback isn't used; update the call site where IndexOptions is created (the IndexOptions{} passed to db.IndexRepo) to provide a Progress function that reports progress (e.g., writes to stderr or updates a spinner) during indexing; implement the callback by assigning IndexOptions.Progress to a function that accepts progress events and logs or prints them, then pass that populated IndexOptions into db.IndexRepo so users see live progress.
84-109: Consider adding a warning about SQL injection for the `sql` subcommand. The `sql` subcommand executes raw SQL directly. While this is an intentional power-user feature, consider adding a note in the help text or a debug log indicating this is for debugging/development use only.
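A sketch of that note and log, assuming cobra wiring; `newCodedbSQLCmd` and the `rawSQL` parameter are hypothetical stand-ins for the `codedbSQLCmd` and `db.RawSQL` symbols named in the prompt below:

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// newCodedbSQLCmd is a hypothetical constructor; rawSQL stands in for db.RawSQL.
func newCodedbSQLCmd(rawSQL func(string) error) *cobra.Command {
	return &cobra.Command{
		Use:   "sql <query>",
		Short: "Run raw SQL against CodeDB (debugging/development only)",
		Long: "Executes arbitrary SQL against the local CodeDB SQLite database.\n" +
			"Intended for debugging and development only; never pass untrusted input.",
		Args: cobra.ExactArgs(1),
		RunE: func(cmd *cobra.Command, args []string) error {
			// Runtime reminder printed before any SQL executes.
			fmt.Fprintln(os.Stderr, "warning: executing raw SQL; for debugging/development use only")
			return rawSQL(args[0])
		},
	}
}
```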
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cmd/ox/codedb.go` around lines 84 - 109, Add a clear SQL-injection warning to the codedbSQLCmd definition and log a runtime warning before executing raw SQL: update the command help (e.g., Short or add a Long field on codedbSQLCmd/Use) to state this runs arbitrary SQL and is intended for debugging/development only, and insert a runtime warning (e.g., via log.Printf or a processLogger) at the start of RunE right before calling db.RawSQL(args[0]) to remind users this is unsafe for untrusted input; reference codedbSQLCmd, its RunE function, and the db.RawSQL call to locate where to add the help text and log.

internal/codedb/store/store_test.go (1)
9-52: Consider adding error path tests. The current tests cover happy paths well. Per coding guidelines, consider adding tests for error scenarios such as:
- Opening a path that cannot be created (e.g., under a non-existent parent with restricted permissions)
- Schema creation failure scenarios
This would improve resilience against regression.
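A sketch of the first scenario, assuming a `store.Open(dir)` entry point in the same package and POSIX permission semantics:

```go
package store

import (
	"os"
	"path/filepath"
	"testing"
)

// TestOpenFailsOnUnwritableParent asserts Open returns an error when the
// target directory cannot be created. Open's exact signature is assumed.
func TestOpenFailsOnUnwritableParent(t *testing.T) {
	if os.Geteuid() == 0 {
		t.Skip("permission bits are not enforced for root")
	}
	parent := t.TempDir()
	if err := os.Chmod(parent, 0o500); err != nil {
		t.Fatal(err)
	}
	// Restore permissions so test cleanup can remove the directory.
	t.Cleanup(func() { _ = os.Chmod(parent, 0o700) })

	if _, err := Open(filepath.Join(parent, "child")); err == nil {
		t.Fatal("expected Open to fail under an unwritable parent")
	}
}
```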
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/store/store_test.go` around lines 9 - 52, Add negative tests to cover Open error paths: create a test that attempts Open on a directory that cannot be created (e.g., create a parent dir, chmod it to remove write permission, then call Open on a child path) and assert Open returns a non-nil error; also add a test that simulates schema creation failure for Open (for example by creating a read-only metadata.db file or using a mock that causes Exec/QueryRow used in Open/initialization to return an error) and assert Open returns the expected error. Reference the Open function and the initialization steps that create metadata.db and the repos table when adding these tests. Ensure tests clean up/reset permissions so CI won't be left in a bad state.

cmd/ox/agent_query.go (1)
167-181: Consider logging when the CodeDB index doesn't exist. When `dataDir` doesn't exist (lines 170-172), the function silently returns empty results. Consider adding a debug log to help users understand why code results may be empty.

♻️ Proposed enhancement
```diff
 func queryCodeDB(qa *queryArgs) ([]search.Result, error) {
 	dataDir := paths.CodeDBDataDir()
 	if _, err := os.Stat(dataDir); os.IsNotExist(err) {
+		slog.Debug("codedb index not found, skipping code search", "path", dataDir)
 		return nil, nil // no index yet, return empty
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/agent_query.go` around lines 167 - 181, The function queryCodeDB currently returns nil when the CodeDB dataDir (from paths.CodeDBDataDir()) doesn't exist; add a debug/info log before returning so callers can see why searches are empty. Detect the os.IsNotExist(err) branch in queryCodeDB, log a message that includes the dataDir path (and optionally the error) to explain "no index yet", then return nil as before; reference the symbols queryCodeDB, dataDir, paths.CodeDBDataDir, and the os.Stat check to locate where to insert the log.

internal/codedb/search/executor_test.go (1)
47-305: Consider a table-driven executor matrix for the planner branches. The repeated open/seed/parse/execute flow makes it hard to add regressions for the tricky paths. I'd add rows for regex code search, `select:file`/`select:repo`, diff `rev:`, and a repo whose default ref is not `main`.

Based on learnings: Applies to **/*_test.go : Use table-driven tests in Go test files; Every bug fix MUST include a regression test unless existing tests already cover the failure mode.
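A sketch of the matrix shape, using the helper names this comment mentions (openTestStore, seedTestData, ParseQuery, Execute); the Execute signature, result shape, and package layout are assumptions to adapt:

```go
package search

import (
	"context"
	"testing"
)

// TestExecuteMatrix drives the tricky planner branches from one table.
func TestExecuteMatrix(t *testing.T) {
	cases := []struct {
		name  string
		query string
	}{
		{"regex code search", `/err\d+/`},
		{"select file", "select:file spawn"},
		{"select repo", "select:repo spawn"},
		{"diff with rev", "type:diff rev:feature fix"},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			st := openTestStore(t)
			seedTestData(t, st)
			q, err := ParseQuery(tc.query)
			if err != nil {
				t.Fatalf("parse %q: %v", tc.query, err)
			}
			// Per-row assertions on the results would go here.
			if _, err := Execute(context.Background(), st, q); err != nil {
				t.Fatalf("execute %q: %v", tc.query, err)
			}
		})
	}
}
```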
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/search/executor_test.go` around lines 47 - 305, Refactor the repeated openTestStore/seedTestData/ParseQuery/Execute pattern in the tests (e.g., TestExecuteCommitSearch, TestExecuteSymbolSearch, TestExecuteWithCount, etc.) into a single table-driven test that iterates rows describing input query strings and expected assertions; add rows to cover regex code search, select:file and select:repo queries, diff queries using rev:, and a case for a repo with a non-main default ref, and ensure the table-driven test still uses openTestStore, seedTestData, ParseQuery and Execute for each row so new regressions are automatically exercised.

internal/codedb/search/query_test.go (1)
8-349: Please table-drive these parser cases and add the quoted-filter row. The coverage breadth is good, but the one-test-per-shape layout makes it easy to miss combinations like `message:"fix bug"` and other filter/value edge cases.

Based on learnings: Applies to **/*_test.go : Use table-driven tests in Go test files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/search/query_test.go` around lines 8 - 349, Convert the many one-off tests in TestTokenize and the TestParse* functions that call ParseQuery into table-driven tests: create a single slice of test cases (structs with name, input, expected fields like SearchPattern, Filters, Type, IsRegex, SearchTerms, HasEmptyPattern, error assertions) and iterate to run subtests with t.Run; replace the individual TestParse* functions by adding their cases to that table (keeping references to tokenize and ParseQuery for the tokenizer-specific cases), and add a new case for a quoted filter like `message:"fix bug"` verifying Filters.Message (or the equivalent filter field) and that the quoted value is preserved as a single filter value; ensure error cases include expected error substrings and negation/count/regex edge-cases are represented as table entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/ox/agent_query.go`:
- Around line 122-126: The defined type codeSearchResult is unused; either
remove it or apply it to the combinedResponse's CodeResults field so results
carry the Source tag. Update the struct that defines combinedResponse (the field
CodeResults currently typed as []search.Result) to use []codeSearchResult if you
intend to include Source metadata, or delete the codeSearchResult type if not
needed, and adjust any serialization/JSON usages accordingly.
In `@internal/codedb/index/gitops_test.go`:
- Around line 33-41: Refactor TestRepoNameFromURL into a table-driven test that
iterates cases including both the existing happy path and at least one failing
input (e.g., an invalid URL string) and assert both returned value and error
behavior; update the TestRepoNameFromURL function to loop over a slice of
structs (with fields like name, input, wantRepo, wantErr) and for each case call
RepoNameFromURL, check whether err presence matches wantErr and compare got to
wantRepo when no error as needed.
In `@internal/codedb/index/indexer.go`:
- Around line 465-472: The call to insertDiff is passing nullBlob.Int64 (0) for
newBlobDBID which is confusing and error-prone; update insertDiff to accept
sql.NullInt64 for the new blob as well (change signature newBlobDBID int64 ->
newBlobDBID sql.NullInt64 and adjust callers), or at minimum pass an explicit
sql.NullInt64{Valid:false} when there is no new blob instead of nullBlob.Int64;
update the body of insertDiff (where it builds newBlobPtr/newBlobID) to read
from the sql.NullInt64 newBlobDBID and keep hasNew/hasOld semantics intact
(references: insertDiff, newBlobDBID, sql.NullInt64, hasNew).
- Around line 216-233: The loadKnownCommits function iterates rows but never
checks rows.Err() for iteration errors; update loadKnownCommits to defer
rows.Close() immediately after the Query succeeds, then after the rows.Next()
loop call rows.Err() and return any non-nil error (alongside the current known
map handling) so iteration errors are propagated; reference the rows variable
and the loadKnownCommits function when making the change.
- Around line 606-608: The code currently ignores the error from
st.tx.QueryRow(...).Scan(&parsed), which can mask DB issues; capture the
returned error, handle sql.ErrNoRows by setting parsed = 0 (or appropriate
default), and for any other error either return it or log it and bail out.
Update the call site around st.tx.QueryRow/Scan to check err :=
st.tx.QueryRow("SELECT parsed FROM blobs WHERE id = ?", blobDBID).Scan(&parsed);
if err != nil { if err == sql.ErrNoRows { parsed = 0 } else { return err (or
processLogger/error return per surrounding function) } } so failures are
surfaced instead of silently ignored.
In `@internal/codedb/README.md`:
- Around line 16-33: The fenced code blocks in internal/codedb/README.md are
missing language identifiers; update the first fenced block (the query examples
containing lines like "spawn", "lang:rust file:*.rs fn", "/err\d+/", "foo OR
bar") to use ```text and update the CLI fenced block (the snippet with "ox
codedb index <url>", "ox codedb search <query>", "ox codedb sql <sql>") to use
```bash so markdownlint warnings are resolved.
In `@internal/codedb/search/executor.go`:
- Around line 119-123: The current search uses a fixed Bleve window (searchReq
:= bleve.NewSearchRequestOptions(bleveQuery, plan.Limit*5, 0, false) and
idx.Search) and then post-filters during enrichment, which can drop matches
outside that window; change the logic to page through Bleve results (repeatedly
call idx.Search with adjustable From/Size or batch offset) until you have
collected enough post-filtered results (respecting plan.Count/plan.Limit and
select projections) or Bleve is exhausted, applying repo/file/lang filters
during enrichment for each batch and only stopping when the requested number of
final results is gathered; update the code paths around searchReq, idx.Search,
and the enrichment/select projection application to use this batched/looping
strategy.
- Around line 118-123: executePlanBleve currently always uses
bleve.NewQueryStringQuery which ignores plan.IsRegex; update executePlanBleve to
branch on plan.IsRegex and when true construct a bleve regexp query (use
bleve.NewRegexpQuery(plan.BleveQuery) and set the query Field to "content" or
otherwise mirror how non-regex queries target the content field), otherwise keep
using bleve.NewQueryStringQuery; ensure the resulting query is used to build
searchReq (and keep existing options like Highlight/Fields) so regex searches
and diff searches behave correctly.
In `@internal/codedb/search/planner.go`:
- Around line 139-177: planDiffSearch currently ignores the ParsedQuery rev
filter so diff searches like "type:diff rev:feature" are not constrained; update
planDiffSearch to treat query.Filters.Rev (and NegRev if present) as a metadata
filter (include them in hasMetadataFilters) and ensure the rev constraint is
applied to the SQL/SQLParams returned (for example by adding a "rev = ?" or "rev
!= ?" clause to translated.SQL and appending the corresponding value(s) to
translated.Params / sqlParams when translated != nil), and if using the Bleve
side, ensure the rev constraint is encoded into BleveQuery as well so both
ExecutionPlan.SQL/SQLParams and ExecutionPlan.BleveQuery respect the rev filter.
In `@internal/codedb/search/query.go`:
- Around line 19-33: The tokenizer currently flushes the current buffer before
reading a quoted value which splits "key:" from its quoted value; in the '"'
case inside tokenize (use symbols tokens, current, quoted, runes, i) change the
logic so that if current.Len() > 0 and the last character of current is ':' you
do NOT flush it first but instead read the quoted text and append a single token
combining current + `"` + quoted + `"` (preserving the colon), otherwise
preserve the existing behavior (flush current then push the quoted token).
Ensure you update the append logic to handle both paths and advance i past the
closing quote as before.
In `@internal/codedb/store/store.go`:
- Around line 31-38: The directories created for reposDir and bleveDir are too
permissive (0o755); change the os.MkdirAll calls in store.go that create
reposDir and bleveDir to use owner-only mode (0o700) and, to be safe, call
os.Chmod(dir, 0o700) after creation (checking and returning any error) so
existing dirs are tightened as well; update the os.MkdirAll and add the os.Chmod
error handling around those symbols to enforce owner-only access.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a79f1a1d-58ac-4b44-b139-00dd7a9f4ba1
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (26)
- cmd/ox/agent_query.go
- cmd/ox/codedb.go
- go.mod
- internal/codedb/README.md
- internal/codedb/codedb.go
- internal/codedb/index/gitops.go
- internal/codedb/index/gitops_test.go
- internal/codedb/index/indexer.go
- internal/codedb/language/detect.go
- internal/codedb/language/detect_test.go
- internal/codedb/search/ast.go
- internal/codedb/search/executor.go
- internal/codedb/search/executor_test.go
- internal/codedb/search/planner.go
- internal/codedb/search/planner_test.go
- internal/codedb/search/query.go
- internal/codedb/search/query_test.go
- internal/codedb/search/translate.go
- internal/codedb/search/translate_test.go
- internal/codedb/store/schema.go
- internal/codedb/store/store.go
- internal/codedb/store/store_test.go
- internal/codedb/symbols/symbols.go
- internal/codedb/symbols/symbols_stub.go
- internal/paths/paths.go
- scripts/bench-codedb.sh
internal/codedb/search/translate.go (lines 61-68):

```go
func resolveRevRef(rev string) string {
	if rev == "" {
		rev = "main"
	}
	if !strings.HasPrefix(rev, "refs/") {
		rev = "refs/heads/" + rev
	}
	return rev
}
```
Don't assume main is the implicit ref.
Normalizing an empty rev: to refs/heads/main makes code and symbol search return no hits for repos whose indexed default branch is master, trunk, etc. Empty rev: should resolve to the repo's actual default/HEAD ref, or skip ref filtering entirely.
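For reference, the "skip ref filtering" alternative is a small change to the helper above; treating the empty string as "no ref filter" then falls to the callers (the author ultimately kept the `main` default, per the reply below):

```go
// Sketch: an empty rev: means "no ref filter" instead of assuming main.
func resolveRevRef(rev string) string {
	if rev == "" {
		return "" // callers skip ref filtering and use the repo's actual HEAD
	}
	if !strings.HasPrefix(rev, "refs/") {
		return "refs/heads/" + rev
	}
	return rev
}
```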
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/search/translate.go` around lines 61 - 68, The helper
resolveRevRef currently maps an empty rev to "refs/heads/main", which wrongly
assumes repos use "main"; change resolveRevRef so an empty rev is not normalized
to main—either return the empty string (so callers can skip ref filtering and
let the repo's actual default/HEAD be used) or implement a lookup to resolve the
repo's real default/HEAD ref via repository metadata before prepending
"refs/heads/". Update resolveRevRef and call sites that rely on its output to
treat "" as "no ref filter" (or to use the fetched default ref) and only prepend
"refs/heads/" when a non-empty branch name is provided.
Intentional default. The alternative — no ref filtering when rev: is omitted — would return results from every branch including stale feature branches and long-deleted experiment branches, which is strictly worse UX for the common case. main matches the overwhelming majority of repositories today. For repos using master or trunk, users can pass rev:master explicitly. A proper fix would be to resolve the repo's HEAD ref at index time and store it in the repos table, then use that as the default — but that's a schema change best done when we have multi-repo search working end-to-end.
@port8080, that's a fair design rationale — defaulting to refs/heads/main is a pragmatic tradeoff that avoids noise from stale branches, and the path forward (storing the HEAD ref in the repos table at index time) is a sensible deferred improvement. I'll note this for future reviews.
( ˵ •̀ ᴗ - ˵ ) ✧
✏️ Learnings added
Learnt from: port8080
Repo: sageox/ox PR: 158
File: internal/codedb/search/translate.go:61-68
Timestamp: 2026-03-09T03:20:33.069Z
Learning: In `internal/codedb/search/translate.go`, the `resolveRevRef` helper intentionally defaults an empty `rev:` filter to `"refs/heads/main"`. This is a deliberate UX choice to avoid returning results from stale/deleted branches. For repos using `master` or `trunk`, users pass `rev:master` explicitly. The proper fix — resolving the repo's actual HEAD ref at index time and storing it in the `repos` table — is deferred until multi-repo search is working end-to-end.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
scripts/bench-codedb.sh (run_searches):

```bash
run_searches() {
  local tool="$1" root="$2" repo="$3"
  local bin
  bin=$(get_bin "$tool")

  for qi in "${!SEARCH_QUERIES[@]}"; do
    local query="${SEARCH_QUERIES[$qi]}"
    local times=()
    for ((iter=0; iter<SEARCH_ITERS; iter++)); do
      local t_start t_end elapsed_ms
      t_start=$(date +%s%N)
      set +e
      if [[ "$tool" == "ox" ]]; then
        XDG_DATA_HOME="$root/xdg" "$bin" codedb search "$query" >/dev/null 2>&1
      else
        "$bin" --root "$root" search "$query" >/dev/null 2>&1
      fi
      set -e
      t_end=$(date +%s%N)
      elapsed_ms=$(( (t_end - t_start) / 1000000 ))
      times+=("$elapsed_ms")
    done
    IFS=$'\n' sorted=($(sort -n <<<"${times[*]}")); unset IFS
    local median=${sorted[$(( SEARCH_ITERS / 2 ))]}
    R_SEARCH["${tool}-${repo}-${qi}"]="$median"
```
Do not count failed searches as latency samples.
Line 173 drops the search exit code, so parse errors or unsupported queries get recorded as “fast” searches. That makes the median misleading.
🛑 Suggested fix
```diff
 run_searches() {
   local tool="$1" root="$2" repo="$3"
-  local bin
-  bin=$(get_bin "$tool")
   for qi in "${!SEARCH_QUERIES[@]}"; do
     local query="${SEARCH_QUERIES[$qi]}"
     local times=()
     for ((iter=0; iter<SEARCH_ITERS; iter++)); do
-      local t_start t_end elapsed_ms
+      local t_start t_end elapsed_ms rc
       t_start=$(date +%s%N)
       set +e
-      if [[ "$tool" == "ox" ]]; then
-        XDG_DATA_HOME="$root/xdg" "$bin" codedb search "$query" >/dev/null 2>&1
-      else
-        "$bin" --root "$root" search "$query" >/dev/null 2>&1
-      fi
+      run_search_cmd "$tool" "$root" "$query" >/dev/null 2>&1
+      rc=$?
       set -e
+      if (( rc != 0 )); then
+        echo "Search failed for ${TOOL_LABELS[$tool]}/${REPO_NAMES[$repo]}: $query" >&2
+        return $rc
+      fi
       t_end=$(date +%s%N)
       elapsed_ms=$(( (t_end - t_start) / 1000000 ))
       times+=("$elapsed_ms")
     done
```

🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 184-184: Prefer mapfile or read -a to split command output (or quote to avoid splitting). (SC2207)
[warning] 186-186: R_SEARCH appears unused. Verify use (or export if used externally). (SC2034)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/bench-codedb.sh` around lines 162 - 186, The run_searches function
currently ignores the command exit status and always appends elapsed_ms to
times, causing failed searches to skew latency; update run_searches to capture
the command's exit code (e.g., check $? immediately after the search command),
and only append elapsed_ms to the times array when the exit code is zero; for
non-zero exits, increment a failure counter or skip adding a sample and ensure
the median computation (the sorted/times logic and R_SEARCH assignment) handles
the case of zero successful samples (e.g., record a sentinel like "FAILED" or
leave R_SEARCH unset) to avoid indexing errors.
Fair point, but the benchmark is a developer tool for comparative analysis, not production code. A failed search command would produce an abnormally fast time (sub-millisecond) that stands out immediately in the results table — it's self-evident, not silent. Adding error handling here adds control flow complexity to a script whose purpose is quick iteration on perf comparisons. If a query fails for one tool but succeeds for another, that itself is useful signal.
@port8080, that's a fair and pragmatic call. For a local dev benchmarking script where the audience is the developer running it, a suspiciously fast outlier in the results table is indeed self-documenting — the visual signal is enough. The differential-failure-as-signal point is also well-taken; knowing that tool A can't handle a query that tool B handles fine is meaningful benchmark data in itself. Happy to defer to your judgment here.
scripts/bench-codedb.sh (index phase):

```bash
# Full index
echo "  Indexing (full)..."
local logfile="$LOG_DIR/${tool}-${repo}-index.log"
local bin
bin=$(get_bin "$tool")
local index_cmd
if [[ "$tool" == "ox" ]]; then
  index_cmd="XDG_DATA_HOME='$root/xdg' '$bin' codedb index '$url'"
else
  index_cmd="'$bin' --root '$root' index '$url'"
fi
set +e
run_timed "$index_cmd" > "$logfile" 2>&1
set -e
R_INDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
R_PEAK_RSS["${tool}-${repo}"]="$_peak_rss_mb"
echo "  Index: ${_wall_seconds}s, RSS: ${_peak_rss_mb}MB"

# For ox, the actual data is in xdg/sageox/codedb
local data_root="$root"
if [[ "$tool" == "ox" ]]; then
  data_root="$root/xdg/sageox/codedb"
fi

# Sizes
measure_sizes "$tool" "$data_root"
R_SQLITE_MB["${tool}-${repo}"]="$_sqlite_mb"
R_FTS_MB["${tool}-${repo}"]="$_fts_mb"
R_GIT_MB["${tool}-${repo}"]="$_git_mb"
R_TOTAL_MB["${tool}-${repo}"]="$_total_mb"

# DB stats
echo "  Collecting DB stats..."
measure_db_stats "$tool" "$data_root" "$root"
R_COMMITS["${tool}-${repo}"]="$_commits"
R_BLOBS["${tool}-${repo}"]="$_blobs"
R_SYMBOLS["${tool}-${repo}"]="$_symbols"
R_SYMBOL_REFS["${tool}-${repo}"]="$_symbol_refs"
R_FILE_REVS["${tool}-${repo}"]="$_file_revs"

# Re-index
echo "  Re-indexing (incremental)..."
local reindex_log="$LOG_DIR/${tool}-${repo}-reindex.log"
set +e
run_timed "$index_cmd" > "$reindex_log" 2>&1
set -e
R_REINDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
```
Fail the run when indexing or re-indexing fails.
Line 214 and Line 246 suppress set -e, but the status from run_timed is ignored. That lets a failed full index still record timings, and a failed incremental index still fall through to the search phase against stale data.
🛑 Suggested fix
```diff
 local logfile="$LOG_DIR/${tool}-${repo}-index.log"
 local bin
 bin=$(get_bin "$tool")
 local index_cmd
+local rc
 if [[ "$tool" == "ox" ]]; then
   index_cmd="XDG_DATA_HOME='$root/xdg' '$bin' codedb index '$url'"
 else
   index_cmd="'$bin' --root '$root' index '$url'"
 fi
 set +e
 run_timed "$index_cmd" > "$logfile" 2>&1
+rc=$?
 set -e
+if (( rc != 0 )); then
+  echo "  Full index failed for $label; see $logfile" >&2
+  return $rc
+fi
 R_INDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
 R_PEAK_RSS["${tool}-${repo}"]="$_peak_rss_mb"
 echo "  Index: ${_wall_seconds}s, RSS: ${_peak_rss_mb}MB"
@@
 local reindex_log="$LOG_DIR/${tool}-${repo}-reindex.log"
 set +e
 run_timed "$index_cmd" > "$reindex_log" 2>&1
+rc=$?
 set -e
+if (( rc != 0 )); then
+  echo "  Re-index failed for $label; see $reindex_log" >&2
+  return $rc
+fi
 R_REINDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
 echo "  Re-index: ${_wall_seconds}s"
```
echo " Re-index: ${_wall_seconds}s"📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
# Full index
echo "  Indexing (full)..."
local logfile="$LOG_DIR/${tool}-${repo}-index.log"
local bin
bin=$(get_bin "$tool")
local index_cmd
local rc
if [[ "$tool" == "ox" ]]; then
  index_cmd="XDG_DATA_HOME='$root/xdg' '$bin' codedb index '$url'"
else
  index_cmd="'$bin' --root '$root' index '$url'"
fi
set +e
run_timed "$index_cmd" > "$logfile" 2>&1
rc=$?
set -e
if (( rc != 0 )); then
  echo "  Full index failed for $label; see $logfile" >&2
  return $rc
fi
R_INDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
R_PEAK_RSS["${tool}-${repo}"]="$_peak_rss_mb"
echo "  Index: ${_wall_seconds}s, RSS: ${_peak_rss_mb}MB"

# For ox, the actual data is in xdg/sageox/codedb
local data_root="$root"
if [[ "$tool" == "ox" ]]; then
  data_root="$root/xdg/sageox/codedb"
fi

# Sizes
measure_sizes "$tool" "$data_root"
R_SQLITE_MB["${tool}-${repo}"]="$_sqlite_mb"
R_FTS_MB["${tool}-${repo}"]="$_fts_mb"
R_GIT_MB["${tool}-${repo}"]="$_git_mb"
R_TOTAL_MB["${tool}-${repo}"]="$_total_mb"

# DB stats
echo "  Collecting DB stats..."
measure_db_stats "$tool" "$data_root" "$root"
R_COMMITS["${tool}-${repo}"]="$_commits"
R_BLOBS["${tool}-${repo}"]="$_blobs"
R_SYMBOLS["${tool}-${repo}"]="$_symbols"
R_SYMBOL_REFS["${tool}-${repo}"]="$_symbol_refs"
R_FILE_REVS["${tool}-${repo}"]="$_file_revs"

# Re-index
echo "  Re-indexing (incremental)..."
local reindex_log="$LOG_DIR/${tool}-${repo}-reindex.log"
set +e
run_timed "$index_cmd" > "$reindex_log" 2>&1
rc=$?
set -e
if (( rc != 0 )); then
  echo "  Re-index failed for $label; see $reindex_log" >&2
  return $rc
fi
R_REINDEX_TIME["${tool}-${repo}"]="$_wall_seconds"
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 216-216: R_INDEX_TIME appears unused. Verify use (or export if used externally). (SC2034)
[warning] 217-217: R_PEAK_RSS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 228-228: R_SQLITE_MB appears unused. Verify use (or export if used externally). (SC2034)
[warning] 229-229: R_FTS_MB appears unused. Verify use (or export if used externally). (SC2034)
[warning] 230-230: R_GIT_MB appears unused. Verify use (or export if used externally). (SC2034)
[warning] 231-231: R_TOTAL_MB appears unused. Verify use (or export if used externally). (SC2034)
[warning] 236-236: R_COMMITS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 237-237: R_BLOBS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 238-238: R_SYMBOLS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 239-239: R_SYMBOL_REFS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 240-240: R_FILE_REVS appears unused. Verify use (or export if used externally). (SC2034)
[warning] 248-248: R_REINDEX_TIME appears unused. Verify use (or export if used externally). (SC2034)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/bench-codedb.sh` around lines 202 - 248, The script temporarily
disables errexit around the full and incremental indexing calls (set +e) which
lets run_timed (called with index_cmd) fail silently; restore failure handling
by checking run_timed's exit status immediately after each invocation: after
run_timed "$index_cmd" > "$logfile" 2>&1 and after run_timed "$index_cmd" >
"$reindex_log" 2>&1 capture its return ($?) and if non-zero write an error
message (including logfile/reindex_log and tool/repo context) and exit non-zero,
or simply re-enable errexit and call run_timed under set -e so failures abort
before recording R_INDEX_TIME/R_REINDEX_TIME and continuing to search; reference
the run_timed invocations, index_cmd variable, reindex_log variable, and the
R_INDEX_TIME/R_REINDEX_TIME assignments when adding the checks.
Same reasoning as the search error handling above — the benchmark script is a developer tool, and a failed index produces obviously wrong data (0s timing, missing DB stats) that's immediately visible in the summary table. The script already prints the log file path, so debugging is straightforward. Adding early-return error handling would prevent collecting partial results that are still useful for comparison (e.g., if only one tool fails, you still want the other two columns).
@port8080, that's a fair point. For a multi-tool comparison benchmark, continuing past a single tool's failure to preserve the other tools' results is a reasonable trade-off — especially since the zeroed-out timings and missing DB stats make the failure self-evident in the summary output. I'll defer to your judgment here.
scripts/bench-codedb.sh:

```bash
if [[ $# -gt 0 ]]; then
  REPOS=("$@")
fi
```
Validate repo keys before overwriting REPOS.
A typo here reaches Line 194 and Line 195 and then exits via set -u with an unhelpful associative-array error. Check the keys up front and print the supported values instead.
🧭 Suggested fix
```diff
 if [[ $# -gt 0 ]]; then
+  for repo in "$@"; do
+    if [[ -z "${REPO_URLS[$repo]+x}" ]]; then
+      echo "Unknown repo key: $repo" >&2
+      echo "Available repos: ${!REPO_URLS[*]}" >&2
+      exit 2
+    fi
+  done
   REPOS=("$@")
 fi
```

🤖 Prompt for AI Agents
fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/bench-codedb.sh` around lines 342 - 344, The current assignment to
REPOS when args are passed can overwrite with invalid keys and later trigger an
unhelpful associative-array error under set -u; before setting REPOS in the if
block, validate each "$@" against the known list (e.g. SUPPORTED_REPOS or
SUPPORTED_KEYS) and if any key is invalid print a clear error showing the
supported values and exit non‑zero; update the if [[ $# -gt 0 ]] block to
loop/validate args, only assign REPOS=("$@") when all keys are valid, and ensure
the error path explains the allowed repository keys.
The REPOS array is hardcoded with known-valid keys that match REPO_URLS and REPO_NAMES. The CLI override (bench.sh tokio) is a convenience for developers who know what they're doing — adding validation would protect against a mistake that's immediately caught by the first ${REPO_URLS[$repo]} expansion producing an empty URL. Not worth the extra code for a dev-only script.
@port8080 Fair point — for a developer-facing benchmark script, the natural failure from an empty URL expansion is sufficient feedback, and the extra validation boilerplate isn't worth the added noise. I'll defer to your judgement here.
Actionable comments posted: 5
♻️ Duplicate comments (1)
internal/codedb/search/executor.go (1)
127-130: ⚠️ Potential issue | 🟠 Major

Page through Bleve hits until you actually collect `plan.Limit` results. This still fetches a fixed `plan.Limit*5` window and only applies repo/file/lang filtering during enrichment. Restrictive filters can therefore return fewer results than requested even when more qualifying hits exist below that window. This was flagged earlier and remains unresolved.

Also applies to: 139-160
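For reference, a minimal sketch of the batched loop in a hypothetical helper; `Result`, `enrichAndFilter`, and the package name are placeholders, while the `bleve` request/search calls are the library's actual API:

```go
package codesearch

import (
	bleve "github.com/blevesearch/bleve/v2"
	blevesearch "github.com/blevesearch/bleve/v2/search"
	"github.com/blevesearch/bleve/v2/search/query"
)

// Result is a placeholder for the executor's enriched result type.
type Result struct{ Path string }

// pageBleve keeps requesting windows from Bleve until `limit` post-filtered
// results are collected or the index is exhausted.
func pageBleve(idx bleve.Index, q query.Query, limit int,
	enrichAndFilter func(*blevesearch.DocumentMatch) (Result, bool)) ([]Result, error) {
	const pageSize = 200 // arbitrary batch size
	var out []Result
	for from := 0; len(out) < limit; {
		req := bleve.NewSearchRequestOptions(q, pageSize, from, false)
		res, err := idx.Search(req)
		if err != nil {
			return nil, err
		}
		if len(res.Hits) == 0 {
			break // Bleve exhausted before the limit was reached
		}
		for _, hit := range res.Hits {
			if r, ok := enrichAndFilter(hit); ok { // repo/file/lang filters live here
				out = append(out, r)
				if len(out) == limit {
					return out, nil
				}
			}
		}
		from += len(res.Hits)
	}
	return out, nil
}
```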
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/search/executor.go` around lines 127 - 130, The current Bleve search only requests a fixed window (bleve.NewSearchRequestOptions called with plan.Limit*5) which can miss qualifying hits after filters are applied; change the logic in the search loop that builds and executes searchReq and calls idx.Search so it pages through results (incrementing the From/offset and/or increasing size) and repeatedly processes searchResult.Hits until you have collected plan.Limit fully-filtered/enriched results or there are no more hits; ensure the filtering/enrichment applied after retrieving hits (the same code paths used in the block around searchReq, idx.Search, and where searchResult.Hits are iterated) are used to decide whether to stop paging, and stop once plan.Limit valid results are collected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/codedb/index/indexer.go`:
- Around line 585-597: ensureBlob currently stores path-derived language on the
deduped blobs row, which is wrong because blobs are keyed by content_hash; stop
persisting language in the blobs table (remove language from the INSERT in
ensureBlob and from any reads of blobs.language). Instead persist language on a
path-scoped row such as file_revs (or the existing file_revs table) when
creating/updating a file revision: call language.Detect(path) in the
file-revision creation flow and store that value on file_revs.language. Update
any consumers (e.g., ParseSymbols, search executor code that filters on
blobs.language) to read the language from file_revs (or join blobs -> file_revs)
rather than blobs.language, and remove/clean up blobs.language usage from
indexState.ensureBlob and related blob access paths.
In `@internal/codedb/search/executor.go`:
- Around line 317-333: The case-insensitive GLOB path currently lowercases the
column in likeOrGlob (producing "lower(column) GLOB ?") but likeOrGlobParam
returns the original pattern, so wildcard patterns won't match; update
likeOrGlobParam (or add a second param) so when the pattern contains wildcards
and the caller requested case-insensitive matching it returns
strings.ToLower(pattern) (while preserving the existing behavior of returning
"%"+pattern+"%" for non-wildcards), and update call sites that invoke
likeOrGlobParam accordingly so the bound parameter matches the lower(column)
GLOB ? usage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e351f4b5-faeb-49fa-90ba-408a9a3444ac
📒 Files selected for processing (6)
- cmd/ox/agent_query.go
- internal/codedb/README.md
- internal/codedb/index/gitops_test.go
- internal/codedb/index/indexer.go
- internal/codedb/search/executor.go
- internal/codedb/store/store.go
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/codedb/index/gitops_test.go
- internal/codedb/README.md
- internal/codedb/store/store.go
cmd/ox/agent_query.go (queryCodeDB):

```diff
 func queryCodeDB(qa *queryArgs) ([]search.Result, error) {
 	dataDir := paths.CodeDBDataDir()
 	if _, err := os.Stat(dataDir); os.IsNotExist(err) {
 		return nil, nil // no index yet, return empty
 	}

-	// Execute query
-	resp, err := client.Query(req)
+	db, err := codedb.Open(dataDir)
 	if err != nil {
-		return fmt.Errorf("query failed: %w", err)
+		return nil, fmt.Errorf("open codedb: %w", err)
 	}
+	defer db.Close()

 	return db.Search(context.Background(), qa.query)
 }
```
--limit is ignored on the CodeDB path.
db.Search(context.Background(), qa.query) only receives the raw query string, so qa.limit never constrains local code results unless the user manually adds count: to the query. Either slice the returned results or inject the count into the CodeDB query so --source code matches the documented CLI behavior.
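A minimal sketch of the slicing option, clamping after the existing `db.Search` call; `qa.limit` and the surrounding `queryCodeDB` shape come from the snippet above:

```go
// Inside queryCodeDB, after opening the store (sketch, not the actual code):
results, err := db.Search(context.Background(), qa.query)
if err != nil {
	return nil, err
}
if qa.limit > 0 && len(results) > qa.limit {
	results = results[:qa.limit] // enforce --limit locally
}
return results, nil
```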
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/ox/agent_query.go` around lines 162 - 175, The queryCodeDB function
currently ignores qa.limit because it calls db.Search(context.Background(),
qa.query) with only the raw string; update queryCodeDB to respect qa.limit by
either (a) injecting a count (e.g., "count:<n>") into the query string before
calling db.Search when qa.limit>0, or (b) keeping the existing db.Search call
and then slicing the returned []search.Result to max qa.limit before returning;
modify queryCodeDB (and use qa.limit and db.Search) so the returned slice of
search.Result is constrained to qa.limit.
cmd/ox/agent_query.go (runAgentQuery, lines 185-188):

```go
projectRoot, err := findProjectRoot()
if err != nil {
	return fmt.Errorf("could not find project root: %w", err)
}
```
Don't require a project root for --source code.
queryCodeDB() only reads the global CodeDB data dir, but runAgentQuery() exits on findProjectRoot() before it even looks at qa.source. That makes ox agent ... query --source code fail outside an initialized repo and also prevents the "all" path from falling back to code results when team-context setup is missing.
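One possible reordering as a sketch; `qa.source` values mirror the `--source` flag in this PR, and `queryTeamContext` is a hypothetical stand-in for the existing team-context call:

```go
// Sketch: only demand a project root when the team-context path needs it.
projectRoot, rootErr := findProjectRoot()
if rootErr != nil && qa.source == "teamctx" {
	return fmt.Errorf("could not find project root: %w", rootErr)
}
if rootErr == nil && (qa.source == "teamctx" || qa.source == "all") {
	// Team-context vector search requires the project root.
	teamResults, err = queryTeamContext(projectRoot, qa) // hypothetical helper
	if err != nil && qa.source == "teamctx" {
		return err // for "all", fall through to code results instead
	}
}
if qa.source == "code" || qa.source == "all" {
	codeResults, err = queryCodeDB(qa) // needs no project root
	if err != nil {
		return err
	}
}
```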
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/ox/agent_query.go` around lines 185 - 188, runAgentQuery currently calls
findProjectRoot() unconditionally which aborts before handling qa.source ==
"code" or the "all" fallback; change the control flow so findProjectRoot() is
only required for sources that need a project repo. Specifically, in
runAgentQuery (where projectRoot, err := findProjectRoot() is called), check
qa.source (or the flag that indicates source) first and: if qa.source == "code"
(or the request can fall back to code), skip requiring findProjectRoot and
proceed to call queryCodeDB(); otherwise call findProjectRoot() and return an
error if it fails. Also, if qa.source == "all", allow findProjectRoot failure to
be ignored so you can still run queryCodeDB() as a fallback. Ensure you
reference runAgentQuery, findProjectRoot, queryCodeDB, and the qa.source
variable when applying the change.
	diffText := generateDiffText(st.repo, path, oldOID, newOID, hasOld, hasNew)
	if diffText != "" {
		st.diffBatch.Index(fmt.Sprintf("diff_%d", diffDBID), BleveDiffDoc{Content: diffText})
		st.diffBatchN++
		if err := st.flushDiffBatch(false); err != nil {
			return err
		}
Keep Bleve writes atomic with the SQL transaction.
flushDiffBatch(false) and flushCodeBatch(false) can publish documents before the SQL transaction commits at Lines 191-193. If any later DB write fails or the context is canceled, tx.Rollback() removes the SQL rows but the Bleve docs stay behind, leaving hits that enrichment can no longer resolve. Buffer Bleve writes until after a successful tx.Commit(), or add rollback cleanup for the indexed document IDs.
Also applies to: 621-625
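A sketch of the buffer-until-commit option; docs is an assumed in-memory slice of (id, payload) pairs accumulated instead of calling flushDiffBatch/flushCodeBatch mid-transaction:

if err := tx.Commit(); err != nil {
	return err // nothing reached Bleve yet, so no orphaned documents
}
batch := idx.NewBatch()
for _, d := range docs {
	if err := batch.Index(d.id, d.payload); err != nil {
		return err
	}
}
return idx.Batch(batch) // publish only after the SQL state is durable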
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/index/indexer.go` around lines 499 - 505, The Bleve indexing
is being flushed mid-SQL transaction (see generateDiffText -> st.diffBatch.Index
and calls to flushDiffBatch(false) and flushCodeBatch(false)), which can leave
orphaned Bleve docs if tx.Rollback() runs; modify the flow so that these flush*
calls do not publish to Bleve until after tx.Commit(): buffer the doc
IDs/content (e.g., keep st.diffBatch and st.codeBatch in-memory and remove any
immediate st.*.Index/flush calls), then after a successful tx.Commit() call the
existing flushDiffBatch(true)/flushCodeBatch(true) or a new commit-flush path to
write to Bleve; alternatively, if you prefer rollback cleanup, record the
indexed document keys when calling st.diffBatch.Index/st.codeBatch.Index and
ensure tx.Rollback() triggers a cleanup routine that deletes those Bleve doc IDs
(use the same identifiers like "diff_%d" and the code doc keys) to keep SQL and
Bleve atomic.
func (st *indexState) ensureBlob(blobOID plumbing.Hash, path string) (int64, bool, error) {
	contentHash := blobOID.String()
	lang := language.Detect(path)

	var langPtr *string
	if lang != "" {
		langPtr = &lang
	}

	_, err := st.tx.Exec(
		"INSERT OR IGNORE INTO blobs (content_hash, language) VALUES (?, ?)",
		contentHash, langPtr,
	)
Don't persist language on a content-deduped blob row.
language.Detect(path) is path-dependent, but blobs are deduped by content_hash. The first filename that inserts a blob wins forever, so the same content reused under a different extension will be classified with the wrong language. That breaks lang: filtering in internal/codedb/search/executor.go and can also exclude blobs from ParseSymbols, which filters on blobs.language. Move language to file_revs (or another path-scoped table) instead of storing it on the deduped blob record.
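A sketch of the path-scoped alternative; the file_revs language column, the fileRevID variable, and the join are assumptions about the revised schema, not the current one:

// When recording a file revision, store the path-derived language there.
_, err := st.tx.Exec(
	"UPDATE file_revs SET language = ? WHERE id = ?",
	language.Detect(path), fileRevID,
)
// Consumers then filter via the path-scoped row, e.g.:
//   SELECT ... FROM file_revs fr JOIN blobs b ON b.id = fr.blob_id
//   WHERE fr.language = ?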
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/index/indexer.go` around lines 585 - 597, ensureBlob
currently stores path-derived language on the deduped blobs row, which is wrong
because blobs are keyed by content_hash; stop persisting language in the blobs
table (remove language from the INSERT in ensureBlob and from any reads of
blobs.language). Instead persist language on a path-scoped row such as file_revs
(or the existing file_revs table) when creating/updating a file revision: call
language.Detect(path) in the file-revision creation flow and store that value on
file_revs.language. Update any consumers (e.g., ParseSymbols, search executor
code that filters on blobs.language) to read the language from file_revs (or
join blobs -> file_revs) rather than blobs.language, and remove/clean up
blobs.language usage from indexState.ensureBlob and related blob access paths.
func likeOrGlob(column, pattern string, caseSensitive bool) string {
	if strings.ContainsAny(pattern, "*?") {
		if caseSensitive {
			return column + " GLOB ?"
		}
		return "lower(" + column + ") GLOB ?"
	}
	return column + " LIKE ?"
}

// likeOrGlobParam returns the appropriate parameter value for likeOrGlob.
func likeOrGlobParam(pattern string) string {
	if strings.ContainsAny(pattern, "*?") {
		return pattern
	}
	return "%" + pattern + "%"
}
Lowercase wildcard params when emulating case-insensitive GLOB.
For wildcard patterns you generate lower(column) GLOB ?, but likeOrGlobParam() returns the original pattern unchanged. A query like repo:Foo* becomes lower(rp.name) GLOB 'Foo*', which will miss foobar. Lower the wildcard pattern before binding it, or stop wrapping the column in lower().
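A sketch of the parameter-side fix, extending likeOrGlobParam with the caller's case flag so it mirrors likeOrGlob:

// likeOrGlobParam lowercases wildcard patterns when the column side
// is wrapped in lower(), so the bound value matches the SQL it pairs with.
func likeOrGlobParam(pattern string, caseSensitive bool) string {
	if strings.ContainsAny(pattern, "*?") {
		if caseSensitive {
			return pattern
		}
		return strings.ToLower(pattern)
	}
	return "%" + pattern + "%"
}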
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/search/executor.go` around lines 317 - 333, The
case-insensitive GLOB path currently lowercases the column in likeOrGlob
(producing "lower(column) GLOB ?") but likeOrGlobParam returns the original
pattern, so wildcard patterns won't match; update likeOrGlobParam (or add a
second param) so when the pattern contains wildcards and the caller requested
case-insensitive matching it returns strings.ToLower(pattern) (while preserving
the existing behavior of returning "%"+pattern+"%" for non-wildcards), and
update call sites that invoke likeOrGlobParam accordingly so the bound parameter
matches the lower(column) GLOB ? usage.
Actionable comments posted: 4
🧹 Nitpick comments (3)
internal/codedb/bench.sh (1)
114-116: Use mapfile to safely capture sorted output.
The current array assignment can break with word splitting edge cases. Using mapfile is the safer idiom here and addresses the shellcheck SC2207 warning.
♻️ Proposed fix
 # Sort and take median
-IFS=$'\n' sorted=($(sort -n <<<"${times[*]}")); unset IFS
+mapfile -t sorted < <(printf '%s\n' "${times[@]}" | sort -n)
 local median=${sorted[$(( SEARCH_ITERS / 2 ))]}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/bench.sh` around lines 114 - 116, The current sorting assignment uses word-splitting-prone array expansion for sorted and IFS manipulation; replace it with mapfile to safely read the sorted lines into an array. Specifically, pipe the elements of the times array (e.g., via printf "%s\n" "${times[@]}") into sort -n and use mapfile -t to populate sorted, then compute local median using sorted[$(( SEARCH_ITERS / 2 ))]; remove the IFS/unset IFS idiom and the direct array assignment that caused SC2207.
internal/codedb/index/indexer.go (1)
438-461: Consider using a valid-length synthetic hash.
The synthetic worktree hash is 44 characters, while Git SHA-1 hashes are 40 hex characters. This inconsistency could cause issues if:
- The database column has a length constraint
- Code elsewhere validates hash format/length
- Future tooling expects consistent hash lengths
♻️ Proposed fix to use 40-character synthetic hash
 // ensureWorktreeRef creates a synthetic commit record to represent the working tree state.
 func ensureWorktreeRef(st *indexState) (int64, error) {
-	const worktreeHash = "0000000000000000000000000000000000worktree"
+	const worktreeHash = "00000000000000000000000000000000worktree" // 40 chars, clearly synthetic
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/index/indexer.go` around lines 438 - 461, The synthetic worktree hash in ensureWorktreeRef is 44 chars and should be a valid 40-character SHA-1-length value; replace the worktreeHash constant with a 40-character hex string (e.g. "0000000000000000000000000000000000000000") and keep using that constant throughout ensureWorktreeRef (and any other places that reference worktreeHash) so DB length/format checks and other hash validations remain consistent.
cmd/ox/codedb.go (1)
49-65: Use cmd.Context() for index/search work instead of context.Background().
These handlers should respect upstream Cobra context (if cancellation is wired at the root level or in future). Replace context.Background() with cmd.Context() at lines 49, 59, 65, and 92 to allow cancellation to propagate through the command tree.
Suggested fix
-	if err := db.IndexRepo(context.Background(), url, opts); err != nil {
+	if err := db.IndexRepo(cmd.Context(), url, opts); err != nil {
 		return fmt.Errorf("index: %w", err)
 	}
@@
-	if err := db.IndexLocalRepo(context.Background(), root, opts); err != nil {
+	if err := db.IndexLocalRepo(cmd.Context(), root, opts); err != nil {
 		return fmt.Errorf("index local: %w", err)
 	}
@@
-	stats, err := db.ParseSymbols(context.Background(), func(msg string) {
+	stats, err := db.ParseSymbols(cmd.Context(), func(msg string) {
 		fmt.Fprintf(os.Stderr, " %s\n", msg)
 	})
@@
-	results, err := db.Search(context.Background(), query)
+	results, err := db.Search(cmd.Context(), query)
If you apply this, the unused context import at line 4 can be removed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ox/codedb.go` around lines 49 - 65, Replace uses of context.Background() passed to db methods with the Cobra command context by calling cmd.Context() so the operations respect cancellation: change the context arguments for db.IndexRepo, db.IndexLocalRepo, and db.ParseSymbols to use cmd.Context(); then remove the now-unused context import. This ensures IndexRepo, IndexLocalRepo, and ParseSymbols receive the command's cancellable context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/ox/codedb.go`:
- Around line 121-125: Replace the manual tab-joins for printing SQL results
with a proper TSV writer: create a csv.NewWriter(os.Stdout), set writer.Comma =
'\t', use writer.Write(cols) for the header and writer.Write(row) for each row,
call writer.Flush() and check writer.Error(); also add the "encoding/csv"
import. This ensures fields containing tabs/newlines are quoted/escaped
correctly instead of being corrupted by strings.Join(columns, "\t").
- Around line 28-31: The CodeDB root is created with mode 0o755 which is too
permissive; change the creation to use 0o700 and ensure existing directories are
tightened by explicitly setting permissions after creation. In the block that
calls paths.CodeDBDataDir() and os.MkdirAll (around dataDir), use 0o700 when
creating and then call os.Chmod(dataDir, 0o700) (or equivalent) to enforce
private permissions even if the directory already existed, matching the
expectation in internal/codedb/store/store.go.
In `@internal/codedb/index/indexer.go`:
- Around line 1032-1046: After the for rows.Next() loops, add checks for
rows.Err() (and similarly repoRows.Err() after iterating repoRows) to surface
any iteration errors: after closing rows (and repoRows) call if err :=
rows.Err(); err != nil { return stats, fmt.Errorf("rows iteration error: %w",
err) } (and for repoRows use repoRows.Err() with an appropriate error message).
Update the blob iteration block around the blobRow scan loop and the repoRows
loop (the variables rows, repoRows and types blobRow/repoRow) to perform these
Err() checks before proceeding to use blobs or repos.
- Around line 398-428: The code currently swallows DB errors in the blob insert,
blob SELECT, and file_revs INSERT (calls to st.tx.Exec and st.tx.QueryRow) and
also ignores the error returned by ensureWorktreeRef; update the error handling
in the block around st.tx.Exec("INSERT OR IGNORE INTO blobs..."),
st.tx.QueryRow("SELECT id FROM blobs...").Scan(&blobDBID),
ensureWorktreeRef(st), and the final st.tx.Exec("INSERT OR REPLACE INTO
file_revs...") so that instead of returning nil you either log a warning with
context (include contentHash, relPath and returned error) and continue, or
propagate the error up the call chain; ensure you also handle and surface errors
from ensureWorktreeRef and from the file_revs Exec (do not discard with "_, _
="); keep the existing resilience behavior (skip file on error) only if you log
the failure at warn/debug level or increment/return a skip count so callers can
detect skipped files.
---
Nitpick comments:
In `@cmd/ox/codedb.go`:
- Around line 49-65: Replace uses of context.Background() passed to db methods
with the Cobra command context by calling cmd.Context() so the operations
respect cancellation: change the context arguments for db.IndexRepo,
db.IndexLocalRepo, and db.ParseSymbols to use cmd.Context(); then remove the
now-unused context import. This ensures IndexRepo, IndexLocalRepo, and
ParseSymbols receive the command's cancellable context.
In `@internal/codedb/bench.sh`:
- Around line 114-116: The current sorting assignment uses word-splitting-prone
array expansion for sorted and IFS manipulation; replace it with mapfile to
safely read the sorted lines into an array. Specifically, pipe the elements of
the times array (e.g., via printf "%s\n" "${times[@]}") into sort -n and use
mapfile -t to populate sorted, then compute local median using sorted[$((
SEARCH_ITERS / 2 ))]; remove the IFS/unset IFS idiom and the direct array
assignment that caused SC2207.
In `@internal/codedb/index/indexer.go`:
- Around line 438-461: The synthetic worktree hash in ensureWorktreeRef is 44
chars and should be a valid 40-character SHA-1-length value; replace the
worktreeHash constant with a 40-character hex string (e.g.
"0000000000000000000000000000000000000000") and keep using that constant
throughout ensureWorktreeRef (and any other places that reference worktreeHash)
so DB length/format checks and other hash validations remain consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2b38dc80-0858-4c76-bfa8-4444e7b2fdde
📒 Files selected for processing (5)
cmd/ox/codedb.go
internal/codedb/README.md
internal/codedb/bench.sh
internal/codedb/codedb.go
internal/codedb/index/indexer.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/codedb/README.md
- internal/codedb/codedb.go
	dataDir := paths.CodeDBDataDir()
	if err := os.MkdirAll(dataDir, 0o755); err != nil {
		return fmt.Errorf("create codedb dir: %w", err)
	}
Keep the CodeDB root private on first creation.
Line 29 creates dataDir with 0o755, but internal/codedb/store/store.go:23-35 intentionally creates the store root with 0o700. Because os.MkdirAll does not tighten an existing directory, the first run leaves the CodeDB root broader than the storage layer expects, which exposes repo/index metadata to other local users.
Suggested fix
dataDir := paths.CodeDBDataDir()
- if err := os.MkdirAll(dataDir, 0o755); err != nil {
- return fmt.Errorf("create codedb dir: %w", err)
- }
-
db, err := codedb.Open(dataDir)
if err != nil {
return fmt.Errorf("open codedb: %w", err)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
	dataDir := paths.CodeDBDataDir()
	db, err := codedb.Open(dataDir)
	if err != nil {
		return fmt.Errorf("open codedb: %w", err)
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/ox/codedb.go` around lines 28 - 31, The CodeDB root is created with mode
0o755 which is too permissive; change the creation to use 0o700 and ensure
existing directories are tightened by explicitly setting permissions after
creation. In the block that calls paths.CodeDBDataDir() and os.MkdirAll (around
dataDir), use 0o700 when creating and then call os.Chmod(dataDir, 0o700) (or
equivalent) to enforce private permissions even if the directory already
existed, matching the expectation in internal/codedb/store/store.go.
	// Print as TSV
	fmt.Println(strings.Join(cols, "\t"))
	for _, row := range rows {
		fmt.Println(strings.Join(row, "\t"))
	}
Write SQL output with a real TSV writer.
Joining columns with "\t" corrupts output as soon as a cell contains tabs or newlines, which is plausible for code, diffs, and blob content returned by raw SQL. Use encoding/csv.Writer with Comma = '\t' so fields are quoted correctly.
Suggested fix
- // Print as TSV
- fmt.Println(strings.Join(cols, "\t"))
- for _, row := range rows {
- fmt.Println(strings.Join(row, "\t"))
- }
+ w := csv.NewWriter(os.Stdout)
+ w.Comma = '\t'
+ if err := w.Write(cols); err != nil {
+ return fmt.Errorf("write header: %w", err)
+ }
+ for _, row := range rows {
+ if err := w.Write(row); err != nil {
+ return fmt.Errorf("write row: %w", err)
+ }
+ }
+ w.Flush()
+ if err := w.Error(); err != nil {
+ return fmt.Errorf("flush rows: %w", err)
+ }
Also add this import:
"encoding/csv"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/ox/codedb.go` around lines 121 - 125, Replace the manual tab-joins for
printing SQL results with a proper TSV writer: create a
csv.NewWriter(os.Stdout), set writer.Comma = '\t', use writer.Write(cols) for
the header and writer.Write(row) for each row, call writer.Flush() and check
writer.Error(); also add the "encoding/csv" import. This ensures fields
containing tabs/newlines are quoted/escaped correctly instead of being corrupted
by strings.Join(columns, "\t").
Actionable comments posted: 5
♻️ Duplicate comments (3)
internal/codedb/store/store.go (1)
32-36: ⚠️ Potential issue | 🟠 Major
Tighten existing directories, not just newly created ones.
os.MkdirAll(dir, 0o700) does not change the mode of an already-existing path. cmd/ox/codedb.go pre-creates the CodeDB root with 0o755, so the first CLI run still leaves the index root world-traversable unless Open explicitly Chmods it back to 0o700.
Proposed fix
 for _, dir := range []string{root, reposDir, bleveDir} {
 	if err := os.MkdirAll(dir, 0o700); err != nil {
 		return nil, fmt.Errorf("create dir %s: %w", dir, err)
 	}
+	if err := os.Chmod(dir, 0o700); err != nil {
+		return nil, fmt.Errorf("chmod dir %s: %w", dir, err)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/store/store.go` around lines 32 - 36, The loop that calls os.MkdirAll for each path (root, reposDir, bleveDir) doesn't change permissions on existing directories, so add an explicit os.Chmod(dir, 0o700) after MkdirAll inside the same loop to ensure existing directories are tightened; handle and return any Chmod error (e.g., fmt.Errorf("chmod dir %s: %w", dir, err)) so failures propagate—apply this change where the MkdirAll loop is implemented (the code that sets up the store directories in store.go / Open or NewStore).
internal/codedb/index/indexer.go (2)
100-128: ⚠️ Potential issue | 🟠 Major
Keep Bleve writes behind the SQL commit.
flushCodeBatch/flushDiffBatch publish documents to Bleve before tx.Commit(), and those helpers are called while the SQL transaction is still open. Any later SQL failure or rollback leaves orphaned Bleve docs that point at rows that never committed.
Also applies to: 215-225, 323-333
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/index/indexer.go` around lines 100 - 128, The current flushCodeBatch and flushDiffBatch functions are being called while the SQL transaction is still open, causing Bleve writes to be published before tx.Commit(); change the call sites so that flushCodeBatch(...) and flushDiffBatch(...) are only invoked after tx.Commit() successfully returns (i.e., move the calls out of the transactional code path to run after tx.Commit() or on successful commit path), or alternatively wrap the batch flushes behind a post-commit hook so no Bleve Batch() is executed until after tx.Commit() has succeeded; update all callers that currently call these methods during a transaction (including the places referenced around the duplicated ranges) to follow this post-commit behavior.
421-423: ⚠️ Potential issue | 🟠 Major
Don't store path-derived language on deduped blobs.
language.Detect(path) depends on the filename, but blobs are keyed only by content_hash. The first extension that inserts a blob wins forever, so the same content reused under another path gets the wrong language, which then breaks lang: filtering and the ParseSymbols query that reads blobs.language.
Also applies to: 924-926, 1041-1043
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/codedb/index/indexer.go` around lines 421 - 423, The code currently uses st.tx.Exec to run "INSERT OR IGNORE INTO blobs (content_hash, language) ..." with language set from language.Detect(path), which stores a path-derived language on a blob even when the row already exists (blobs are keyed by content_hash); change this to first check whether a blob with content_hash (contentHash) already exists (SELECT language FROM blobs WHERE content_hash = ?), and only insert the detected language when no row exists—if a row exists do nothing (or leave language unchanged). Replace the blind INSERT OR IGNORE with a query-then-insert flow (or an INSERT ... WHERE NOT EXISTS pattern) so language.Detect(path) is not recorded for deduped content; apply the same change at the other two insertion sites referenced around the uses at the other occurrences (the blocks using contentHash, langPtr and language.Detect(path) at the two other locations).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/codedb/codedb.go`:
- Around line 71-104: RawSQL currently allocates sql.NullString for every column
so rows with non-text types are dropped; fix RawSQL (the rows.Next / rows.Scan
block) to allocate a []interface{} slice of empty interface values (e.g., vals
:= make([]interface{}, len(cols)) with ptrs := make([]interface{}, len(cols))
and set ptrs[i] = &vals[i]), call rows.Scan(ptrs...), then convert each vals[i]
to a string (handle nil -> "NULL", []byte -> string(b), otherwise use
fmt.Sprint(vals[i])) and append that row to results instead of skipping on scan
errors for type mismatches. Ensure you still return cols, results, nil.
In `@internal/codedb/index/gitops.go`:
- Around line 16-29: normalizeRepoURL currently only strips http(s)/git://
prefixes and .git suffix, so SSH-style URLs like git@github.com:org/repo.git and
ssh://git@github.com/org/repo.git are not canonicalized and can produce
different keys; update normalizeRepoURL to detect and normalize SCP-style and
ssh:// forms the same way as internal/repotools/url.go does: parse and convert
git@host:org/repo.git to host/org/repo and treat ssh://git@host/... like
host/... (removing user@ and any leading scheme, converting ':' separators to
'/', trimming ".git" and trailing slashes), ensuring
RepoDirFromURL/RepoNameFromURL will get a consistent canonical string for all
URL forms.
In `@internal/codedb/index/indexer.go`:
- Line 397: The symbol parser fails for synthetic worktree blobs because
contentHash values like "worktree:<sha256>" are created (contentHash variable)
but ParseSymbols only tries to reopen real git blobs via BlobObject, causing
content == nil and parsed=1; modify ParseSymbols (and any callers that use
BlobObject) to detect "worktree:" prefixed contentHash values and load the blob
content from the in-memory/synthetic worktree store instead of calling
BlobObject, or alternatively adjust the content retrieval code to first try the
worktree lookup for contentHash and fall back to BlobObject; update logic in
ParseSymbols (and related functions that rely on BlobObject) so synthetic
worktree blobs are passed actual content to the symbol parser.
- Around line 252-253: The code currently derives repoName from
filepath.Base(localPath) and calls upsertRepo(s, repoName, localPath), which
causes unrelated local repos with the same basename to collide; change repoName
to a stable unique key for local repos (e.g. use the absolute, cleaned path
obtained from filepath.Abs(localPath) or similarly canonicalized path) and pass
that to upsertRepo instead of filepath.Base(localPath). Make the same
replacement for the other occurrence where repoName is derived from
filepath.Base(localPath) (the block referenced in the review) so upsertRepo
receives a unique repo identifier for all local repositories.
In `@internal/codedb/search/executor_test.go`:
- Around line 47-305: Many executor tests repeat the same setup and execution
logic; consolidate TestExecuteCommitSearch, TestExecuteCommitSearchByMessage,
TestExecuteSymbolSearch, TestExecuteSymbolSearchWithKindFilter,
TestExecuteCallsSearch, TestExecuteReturnsSearch,
TestExecuteCommitSearchNoResults, TestExecuteWithCount,
TestExecuteContextCancelled, TestExecuteCalledBySearch,
TestExecuteSymbolSearchLangFilter, and TestExecuteCommitDateFilter into a single
table-driven test that iterates test cases; each case should call openTestStore
and seedTestData once per subtest, build/parse the query via ParseQuery, call
Execute (or call Execute with a canceled context for the canceled case), and
assert expected results or expected errors based on fields in the table (name,
query, wantErr, wantCount, wantContains, wantNotContains, etc.); keep existing
helper functions (openTestStore, seedTestData, ParseQuery, Execute) and create
subtests with t.Run for clearer failure reporting and to include
malformed/query-error cases alongside happy paths.
---
Duplicate comments:
In `@internal/codedb/index/indexer.go`:
- Around line 100-128: The current flushCodeBatch and flushDiffBatch functions
are being called while the SQL transaction is still open, causing Bleve writes
to be published before tx.Commit(); change the call sites so that
flushCodeBatch(...) and flushDiffBatch(...) are only invoked after tx.Commit()
successfully returns (i.e., move the calls out of the transactional code path to
run after tx.Commit() or on successful commit path), or alternatively wrap the
batch flushes behind a post-commit hook so no Bleve Batch() is executed until
after tx.Commit() has succeeded; update all callers that currently call these
methods during a transaction (including the places referenced around the
duplicated ranges) to follow this post-commit behavior.
- Around line 421-423: The code currently uses st.tx.Exec to run "INSERT OR
IGNORE INTO blobs (content_hash, language) ..." with language set from
language.Detect(path), which stores a path-derived language on a blob even when
the row already exists (blobs are keyed by content_hash); change this to first
check whether a blob with content_hash (contentHash) already exists (SELECT
language FROM blobs WHERE content_hash = ?), and only insert the detected
language when no row exists—if a row exists do nothing (or leave language
unchanged). Replace the blind INSERT OR IGNORE with a query-then-insert flow (or
an INSERT ... WHERE NOT EXISTS pattern) so language.Detect(path) is not recorded
for deduped content; apply the same change at the other two insertion sites
referenced around the uses at the other occurrences (the blocks using
contentHash, langPtr and language.Detect(path) at the two other locations).
In `@internal/codedb/store/store.go`:
- Around line 32-36: The loop that calls os.MkdirAll for each path (root,
reposDir, bleveDir) doesn't change permissions on existing directories, so add
an explicit os.Chmod(dir, 0o700) after MkdirAll inside the same loop to ensure
existing directories are tightened; handle and return any Chmod error (e.g.,
fmt.Errorf("chmod dir %s: %w", dir, err)) so failures propagate—apply this
change where the MkdirAll loop is implemented (the code that sets up the store
directories in store.go / Open or NewStore).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 12398aba-4b9e-4596-8ac2-d8eeee2e4f95
📒 Files selected for processing (6)
internal/codedb/codedb.go
internal/codedb/index/gitops.go
internal/codedb/index/gitops_test.go
internal/codedb/index/indexer.go
internal/codedb/search/executor_test.go
internal/codedb/store/store.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/codedb/index/gitops_test.go
func (db *DB) RawSQL(query string) ([]string, [][]string, error) {
	rows, err := db.store.Query(query)
	if err != nil {
		return nil, nil, fmt.Errorf("execute sql: %w", err)
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		return nil, nil, fmt.Errorf("get columns: %w", err)
	}

	var results [][]string
	for rows.Next() {
		values := make([]sql.NullString, len(cols))
		ptrs := make([]interface{}, len(cols))
		for i := range values {
			ptrs[i] = &values[i]
		}
		if err := rows.Scan(ptrs...); err != nil {
			continue
		}
		row := make([]string, len(cols))
		for i, v := range values {
			if v.Valid {
				row[i] = v.String
			} else {
				row[i] = "NULL"
			}
		}
		results = append(results, row)
	}

	return cols, results, nil
RawSQL drops rows for non-text columns.
Scanning every value into sql.NullString fails for INTEGER columns, and the continue silently discards those rows. Most CodeDB tables include ints (id, timestamp, line, col, ...), so valid SQL returns partial or empty output here.
Proposed fix
func (db *DB) RawSQL(query string) ([]string, [][]string, error) {
rows, err := db.store.Query(query)
if err != nil {
return nil, nil, fmt.Errorf("execute sql: %w", err)
}
defer rows.Close()
cols, err := rows.Columns()
if err != nil {
return nil, nil, fmt.Errorf("get columns: %w", err)
}
var results [][]string
for rows.Next() {
- values := make([]sql.NullString, len(cols))
- ptrs := make([]interface{}, len(cols))
+ values := make([]interface{}, len(cols))
+ ptrs := make([]interface{}, len(cols))
for i := range values {
ptrs[i] = &values[i]
}
if err := rows.Scan(ptrs...); err != nil {
- continue
+ return nil, nil, fmt.Errorf("scan row: %w", err)
}
row := make([]string, len(cols))
for i, v := range values {
- if v.Valid {
- row[i] = v.String
- } else {
- row[i] = "NULL"
+ switch x := v.(type) {
+ case nil:
+ row[i] = "NULL"
+ case []byte:
+ row[i] = string(x)
+ default:
+ row[i] = fmt.Sprint(x)
}
}
results = append(results, row)
}
+ if err := rows.Err(); err != nil {
+ return nil, nil, fmt.Errorf("iterate rows: %w", err)
+ }
return cols, results, nil
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
func (db *DB) RawSQL(query string) ([]string, [][]string, error) {
	rows, err := db.store.Query(query)
	if err != nil {
		return nil, nil, fmt.Errorf("execute sql: %w", err)
	}
	defer rows.Close()
	cols, err := rows.Columns()
	if err != nil {
		return nil, nil, fmt.Errorf("get columns: %w", err)
	}
	var results [][]string
	for rows.Next() {
		values := make([]interface{}, len(cols))
		ptrs := make([]interface{}, len(cols))
		for i := range values {
			ptrs[i] = &values[i]
		}
		if err := rows.Scan(ptrs...); err != nil {
			return nil, nil, fmt.Errorf("scan row: %w", err)
		}
		row := make([]string, len(cols))
		for i, v := range values {
			switch x := v.(type) {
			case nil:
				row[i] = "NULL"
			case []byte:
				row[i] = string(x)
			default:
				row[i] = fmt.Sprint(x)
			}
		}
		results = append(results, row)
	}
	if err := rows.Err(); err != nil {
		return nil, nil, fmt.Errorf("iterate rows: %w", err)
	}
	return cols, results, nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/codedb.go` around lines 71 - 104, RawSQL currently allocates
sql.NullString for every column so rows with non-text types are dropped; fix
RawSQL (the rows.Next / rows.Scan block) to allocate a []interface{} slice of
empty interface values (e.g., vals := make([]interface{}, len(cols)) with ptrs
:= make([]interface{}, len(cols)) and set ptrs[i] = &vals[i]), call
rows.Scan(ptrs...), then convert each vals[i] to a string (handle nil -> "NULL",
[]byte -> string(b), otherwise use fmt.Sprint(vals[i])) and append that row to
results instead of skipping on scan errors for type mismatches. Ensure you still
return cols, results, nil.
func normalizeRepoURL(url string) (string, error) {
	stripped := url
	for _, prefix := range []string{"https://", "http://", "git://"} {
		if after, found := strings.CutPrefix(stripped, prefix); found {
			stripped = after
			break
		}
	}
	stripped = strings.TrimRight(stripped, "/")
	stripped = strings.TrimSuffix(stripped, ".git")
	if stripped == "" {
		return "", fmt.Errorf("invalid repo URL: %s", url)
	}
	return stripped, nil
Handle SSH-style Git URLs in normalizeRepoURL.
This only canonicalizes https://, http://, and git://. Inputs like git@github.com:org/repo.git or ssh://git@github.com/org/repo.git end up with different repo keys than the HTTPS form, and the SCP-style variant leaves a literal : in the derived directory name. That splits the same remote across multiple cache entries and breaks path generation on Windows. internal/repotools/url.go already handles these SSH forms; RepoDirFromURL / RepoNameFromURL should normalize them the same way.
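A sketch of the SSH canonicalization step, run before the existing prefix stripping; the helper name is illustrative, and the canonical logic lives in internal/repotools/url.go:

// canonicalizeSSH maps git@host:org/repo.git and ssh://git@host/org/repo.git
// onto the same host/org/repo.git form the https path already produces.
func canonicalizeSSH(url string) string {
	if !strings.HasPrefix(url, "ssh://") && !strings.Contains(url, "@") {
		return url // not an SSH form; leave it to the existing prefix logic
	}
	url = strings.TrimPrefix(url, "ssh://")
	if i := strings.Index(url, "@"); i >= 0 {
		url = url[i+1:] // drop the user@ part
	}
	// SCP-style host:path becomes host/path
	if i := strings.Index(url, ":"); i >= 0 && !strings.Contains(url[:i], "/") {
		url = url[:i] + "/" + url[i+1:]
	}
	return url
}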
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/index/gitops.go` around lines 16 - 29, normalizeRepoURL
currently only strips http(s)/git:// prefixes and .git suffix, so SSH-style URLs
like git@github.com:org/repo.git and ssh://git@github.com/org/repo.git are not
canonicalized and can produce different keys; update normalizeRepoURL to detect
and normalize SCP-style and ssh:// forms the same way as
internal/repotools/url.go does: parse and convert git@host:org/repo.git to
host/org/repo and treat ssh://git@host/... like host/... (removing user@ and any
leading scheme, converting ':' separators to '/', trimming ".git" and trailing
slashes), ensuring RepoDirFromURL/RepoNameFromURL will get a consistent
canonical string for all URL forms.
	repoName := filepath.Base(localPath)
	repoID, err := upsertRepo(s, repoName, localPath)
Use a unique repo key for local repos, not filepath.Base(localPath).
upsertRepo conflicts only on repos.name, so /src/api and /tmp/api collapse into the same repo row. That mixes refs, commits, and file revisions for unrelated local repos that happen to share a basename. Use a stable unique identifier here, e.g. the absolute path.
Also applies to: 490-493
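A minimal sketch of the absolute-path key, assuming upsertRepo's (name, path) signature shown above:

absPath, err := filepath.Abs(localPath)
if err != nil {
	return fmt.Errorf("resolve local repo path: %w", err)
}
// Keyed by full path, /src/api and /tmp/api no longer collide.
repoID, err := upsertRepo(s, absPath, absPath)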
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/index/indexer.go` around lines 252 - 253, The code currently
derives repoName from filepath.Base(localPath) and calls upsertRepo(s, repoName,
localPath), which causes unrelated local repos with the same basename to
collide; change repoName to a stable unique key for local repos (e.g. use the
absolute, cleaned path obtained from filepath.Abs(localPath) or similarly
canonicalized path) and pass that to upsertRepo instead of
filepath.Base(localPath). Make the same replacement for the other occurrence
where repoName is derived from filepath.Base(localPath) (the block referenced in
the review) so upsertRepo receives a unique repo identifier for all local
repositories.
	// Compute content hash
	h := sha256.Sum256(content)
	contentHash := "worktree:" + hex.EncodeToString(h[:])
Worktree blobs can never reach the symbol parser.
Dirty files are stored under synthetic hashes like worktree:<sha256>, but ParseSymbols only knows how to reopen git object IDs via BlobObject. Those blobs are never in the object database, so every supported dirty file falls into the content == nil branch and gets marked parsed = 1 with zero symbols extracted.
Also applies to: 1117-1139
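A sketch of the lookup branch, assuming a worktreeContent map captured during indexing (go-git's BlobObject/Reader calls are real; the map is an assumption):

var content []byte
if rest, ok := strings.CutPrefix(contentHash, "worktree:"); ok {
	content = worktreeContent[rest] // synthetic blob: read from the worktree-side store
} else if blob, err := repo.BlobObject(plumbing.NewHash(contentHash)); err == nil {
	if r, err := blob.Reader(); err == nil {
		content, _ = io.ReadAll(r)
		r.Close()
	}
}
// content now feeds the symbol parser for both real and synthetic blobs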
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/index/indexer.go` at line 397, The symbol parser fails for
synthetic worktree blobs because contentHash values like "worktree:<sha256>" are
created (contentHash variable) but ParseSymbols only tries to reopen real git
blobs via BlobObject, causing content == nil and parsed=1; modify ParseSymbols
(and any callers that use BlobObject) to detect "worktree:" prefixed contentHash
values and load the blob content from the in-memory/synthetic worktree store
instead of calling BlobObject, or alternatively adjust the content retrieval
code to first try the worktree lookup for contentHash and fall back to
BlobObject; update logic in ParseSymbols (and related functions that rely on
BlobObject) so synthetic worktree blobs are passed actual content to the symbol
parser.
func TestExecuteCommitSearch(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:commit author:alice")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results for commit search by alice")
	}
	for _, r := range results {
		if r.Author != "alice" {
			t.Errorf("expected author alice, got %q", r.Author)
		}
	}
}

func TestExecuteCommitSearchByMessage(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:commit refactor")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) != 1 {
		t.Fatalf("expected 1 result, got %d", len(results))
	}
	if results[0].Message != "refactor search module" {
		t.Errorf("message = %q", results[0].Message)
	}
}

func TestExecuteSymbolSearch(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:symbol lang:go main")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results for symbol search")
	}
	found := false
	for _, r := range results {
		if r.SymbolName == "main" && r.SymbolKind == "function" {
			found = true
		}
	}
	if !found {
		t.Error("expected to find 'main' function symbol")
	}
}

func TestExecuteSymbolSearchWithKindFilter(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:symbol select:symbol.function lang:go helper")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results")
	}
	for _, r := range results {
		if r.SymbolKind != "function" {
			t.Errorf("expected kind=function, got %q", r.SymbolKind)
		}
	}
}

func TestExecuteCallsSearch(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("calls:helper")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results for calls:helper")
	}
	// The caller of helper should be main (symbol_id=1 -> name=main)
	found := false
	for _, r := range results {
		if r.SymbolName == "main" {
			found = true
		}
	}
	if !found {
		t.Error("expected to find 'main' as caller of 'helper'")
	}
}

func TestExecuteReturnsSearch(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("returns:Result")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results for returns:Result")
	}
	found := false
	for _, r := range results {
		if r.SymbolName == "process" {
			found = true
		}
	}
	if !found {
		t.Error("expected to find 'process' function returning Result")
	}
}

func TestExecuteCommitSearchNoResults(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:commit author:nonexistent")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) != 0 {
		t.Errorf("expected 0 results, got %d", len(results))
	}
}

func TestExecuteWithCount(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("type:commit count:1 author:alice")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) > 1 {
		t.Errorf("expected at most 1 result, got %d", len(results))
	}
}

func TestExecuteContextCancelled(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel immediately

	q, err := ParseQuery("type:commit author:alice")
	if err != nil {
		t.Fatal(err)
	}
	_, err = Execute(ctx, s, q)
	if err == nil {
		t.Fatal("expected error from canceled context")
	}
}

func TestExecuteCalledBySearch(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	q, err := ParseQuery("calledby:process")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	if len(results) == 0 {
		t.Fatal("expected results for calledby:process")
	}
	found := false
	for _, r := range results {
		if r.SymbolName == "parse" {
			found = true
		}
	}
	if !found {
		t.Error("expected to find 'parse' as callee of 'process'")
	}
}

func TestExecuteSymbolSearchLangFilter(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	// Search for symbols in rust — should find 'process' but not 'main'
	q, err := ParseQuery("type:symbol lang:rust process")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	for _, r := range results {
		if r.SymbolName == "main" {
			t.Error("should not find Go symbol 'main' when filtering lang:rust")
		}
	}
}

func TestExecuteCommitDateFilter(t *testing.T) {
	s := openTestStore(t)
	seedTestData(t, s)

	// Only commits after timestamp 1700050000 (should exclude 'initial commit')
	q, err := ParseQuery("type:commit after:2023-11-15 author:alice")
	if err != nil {
		t.Fatal(err)
	}
	results, err := Execute(context.Background(), s, q)
	if err != nil {
		t.Fatal(err)
	}
	for _, r := range results {
		if r.Message == "initial commit" {
			t.Error("should not include 'initial commit' with after filter")
		}
	}
}
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'func TestExecute|openTestStore\(t\)|seedTestData\(t, s\)' internal/codedb/search/executor_test.go
Repository: sageox/ox
Length of output: 1292
🏁 Script executed:
#!/bin/bash
# Check the full executor_test.go file structure to identify all test functions and error scenarios
wc -l internal/codedb/search/executor_test.go
Repository: sageox/ox
Length of output: 96
🏁 Script executed:
#!/bin/bash
# Look for test functions that explicitly test error conditions
rg -n 'if err != nil|expected error|Fatal.*error' internal/codedb/search/executor_test.go | head -30
Repository: sageox/ox
Length of output: 603
🏁 Script executed:
#!/bin/bash
# Check if there are tests for ParseQuery error scenarios
rg -n 'ParseQuery.*invalid|ParseQuery.*error|ParseQuery.*bad' internal/codedb/search/executor_test.go
Repository: sageox/ox
Length of output: 35
Consolidate these executor tests into a table-driven harness.
The repeated openTestStore → seedTestData → ParseQuery → Execute pattern across 11 test functions creates unnecessary boilerplate that makes it difficult to extend the query and error-case matrices consistently. A single table-driven test would reduce duplication and enable systematic coverage of error paths (malformed queries, Execute failures) alongside the happy paths.
Per coding guidelines: "Use table-driven tests in Go test files" and "Test error paths in addition to happy paths."
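A sketch of the table-driven shape, reusing the existing helpers; the malformed-query case assumes ParseQuery rejects a bare "type:":

func TestExecute(t *testing.T) {
	cases := []struct {
		name      string
		query     string
		cancelCtx bool
		wantErr   bool
		wantName  string // SymbolName expected among results, if set
	}{
		{name: "commit by author", query: "type:commit author:alice"},
		{name: "caller lookup", query: "calls:helper", wantName: "main"},
		{name: "canceled context", query: "type:commit author:alice", cancelCtx: true, wantErr: true},
		{name: "malformed query", query: "type:", wantErr: true}, // assumption: parser rejects this
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			s := openTestStore(t)
			seedTestData(t, s)
			q, err := ParseQuery(tc.query)
			if err != nil {
				if tc.wantErr {
					return
				}
				t.Fatal(err)
			}
			ctx, cancel := context.WithCancel(context.Background())
			defer cancel()
			if tc.cancelCtx {
				cancel()
			}
			results, err := Execute(ctx, s, q)
			if tc.wantErr {
				if err == nil {
					t.Fatal("expected error")
				}
				return
			}
			if err != nil {
				t.Fatal(err)
			}
			if tc.wantName == "" {
				return
			}
			for _, r := range results {
				if r.SymbolName == tc.wantName {
					return
				}
			}
			t.Errorf("expected %q among results", tc.wantName)
		})
	}
}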
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/codedb/search/executor_test.go` around lines 47 - 305, Many executor
tests repeat the same setup and execution logic; consolidate
TestExecuteCommitSearch, TestExecuteCommitSearchByMessage,
TestExecuteSymbolSearch, TestExecuteSymbolSearchWithKindFilter,
TestExecuteCallsSearch, TestExecuteReturnsSearch,
TestExecuteCommitSearchNoResults, TestExecuteWithCount,
TestExecuteContextCancelled, TestExecuteCalledBySearch,
TestExecuteSymbolSearchLangFilter, and TestExecuteCommitDateFilter into a single
table-driven test that iterates test cases; each case should call openTestStore
and seedTestData once per subtest, build/parse the query via ParseQuery, call
Execute (or call Execute with a canceled context for the canceled case), and
assert expected results or expected errors based on fields in the table (name,
query, wantErr, wantCount, wantContains, wantNotContains, etc.); keep existing
helper functions (openTestStore, seedTestData, ParseQuery, Execute) and create
subtests with t.Run for clearer failure reporting and to include
malformed/query-error cases alongside happy paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: SageOx <ox@sageox.ai>
…re-index optimization Co-Authored-By: SageOx <ox@sageox.ai> SageOx-Session: https://sageox.ai/repo/repo_019c5812-01e9-7b7d-b5b1-321c471c9777/sessions/2026-02-26T20-09-user-OxjMyG/view
Co-Authored-By: SageOx <ox@sageox.ai> SageOx-Session: https://sageox.ai/repo/repo_019c5812-01e9-7b7d-b5b1-321c471c9777/sessions/2026-02-26T20-09-user-OxjMyG/view
80585b6 to 4462689
Evolution
Thursday, March 5: ylow/CodeDB
Saturday, March 7: rupakm
Benchmark (SFrameRust repo)
Session Recording
View session recording
Summary by CodeRabbit
New Features
Documentation
Tools