Enforce v2-only API and config, and SQLx macro checks #32

Merged
aurexav merged 2 commits into main from feat/v2-only-sqlx on Feb 7, 2026

aurexav (Member) commented Feb 7, 2026

Closes #31.

Changes:

  • Remove all v1 API routes and docs, leaving only /v2 endpoints.
  • Drop v1 config version toggles and default example config to v2 collection.
  • Convert SQLx usage to query!/query_as!/query_scalar! and commit offline metadata (.sqlx/).
  • Add scripts/sqlx-prepare.sh and set SQLX_OFFLINE=true in cargo make lint/test tasks.

Verification:

  • cargo make fmt-check
  • cargo make lint
  • cargo make test
  • cargo make test-integration (with ELF_PG_DSN, ELF_QDRANT_URL)
  • cargo make e2e (with ELF_PG_DSN, ELF_QDRANT_URL, ELF_QDRANT_HTTP_URL)

…v2-only API and config","intent":"Remove v1 endpoints and config variants and switch SQLx to macros with offline metadata","impact":"All builds use SQLX_OFFLINE with committed .sqlx and only /v2 routes are supported","breaking":true,"risk":"medium","refs":["gh:#31"]}
top_k,
};

let mut items = Vec::with_capacity(results.len());

Check failure

Code scanning / CodeQL

Uncontrolled allocation size High

This allocation size is derived from a user-provided value and could allocate arbitrary amounts of memory.

Copilot Autofix

AI 8 days ago

General approach: enforce a reasonable maximum on the number of search results that can be requested/handled, and ensure allocations are based on a bounded value rather than an unbounded user-controlled value. Additionally, avoid using with_capacity with a tainted size when it’s not necessary.

Best fix here without changing functionality: introduce a hard upper bound for top_k and candidate_k derived from configuration, and clamp the values accordingly. Because results.len() is effectively bounded by top_k, once top_k is clamped we know results.len() is safe. For extra safety and simplicity, we can also replace Vec::with_capacity(results.len()) with Vec::new(), removing the need to pre-allocate based on a tainted size while having negligible performance impact.
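
As a middle ground between the two options above (pre-allocating from results.len() and dropping pre-allocation entirely), the reservation itself can be capped. This is only a sketch and not part of the suggested patch; MAX_PREALLOC and collect_items are hypothetical names chosen for illustration.

```rust
/// Hypothetical cap on how many slots are reserved up front.
const MAX_PREALLOC: usize = 1_024;

/// Copy `results` into a new vector, reserving at most `MAX_PREALLOC`
/// slots in advance. Larger inputs still work; the vector simply grows
/// on demand once the initial reservation is used up.
fn collect_items<T: Clone>(results: &[T]) -> Vec<T> {
    let mut items = Vec::with_capacity(results.len().min(MAX_PREALLOC));
    for item in results {
        items.push(item.clone());
    }
    items
}
```

The up-front allocation no longer scales with the input size, while small result sets still avoid repeated reallocation.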

Concretely:

  • In packages/elf-service/src/search.rs, in ElfService::search_raw, clamp top_k and candidate_k to fixed maximums right after they are computed. Ideally the maximums would come from config (for example self.cfg.memory.max_top_k / max_candidate_k), but since we can’t assume such fields exist, we define local constants in this function (or near its top), such as const MAX_TOP_K: u32 = 10_000;, and clamp both values to them. This ensures results.len() is bounded.
  • In finish_search in the same file, change let mut items = Vec::with_capacity(results.len()); to let mut items = Vec::new();, which removes the flagged sink while keeping behavior the same.
  • In packages/elf-service/src/progressive_search.rs, where top_k and candidate_k are recomputed in ElfService::search, apply the same clamping pattern, to ensure the admin/raw search variants also share the bound.

No new methods are required; we just add simple clamping and adjust the one allocation call.
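
The clamping described above can also be written with u32::clamp from the standard library. A minimal sketch, assuming the same limits as the patch; resolve_limits and its plain-integer parameters are hypothetical stand-ins for the request and config fields, not code from the repository.

```rust
/// Upper bounds mirroring the constants used in the suggested patch.
const MAX_TOP_K: u32 = 10_000;
const MAX_CANDIDATE_K: u32 = 20_000;

/// Resolve `(top_k, candidate_k)` from optional request values and
/// configured defaults (hypothetical helper for illustration).
fn resolve_limits(
    req_top_k: Option<u32>,
    req_candidate_k: Option<u32>,
    default_top_k: u32,
    default_candidate_k: u32,
) -> (u32, u32) {
    // At least 1, at most MAX_TOP_K.
    let top_k = req_top_k.unwrap_or(default_top_k).clamp(1, MAX_TOP_K);
    // At least top_k, at most MAX_CANDIDATE_K.
    let candidate_k = req_candidate_k
        .unwrap_or(default_candidate_k)
        .clamp(top_k, MAX_CANDIDATE_K);
    (top_k, candidate_k)
}
```

clamp panics if its lower bound exceeds its upper bound. That cannot happen here, because top_k is already at most MAX_TOP_K and MAX_TOP_K is smaller than MAX_CANDIDATE_K.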

Suggested changeset 2
packages/elf-service/src/search.rs

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/packages/elf-service/src/search.rs b/packages/elf-service/src/search.rs
--- a/packages/elf-service/src/search.rs
+++ b/packages/elf-service/src/search.rs
@@ -384,8 +384,21 @@
 			return Err(ServiceError::NonEnglishInput { field: "$.query".to_string() });
 		}
 
-		let top_k = req.top_k.unwrap_or(self.cfg.memory.top_k).max(1);
-		let candidate_k = req.candidate_k.unwrap_or(self.cfg.memory.candidate_k).max(top_k);
+		// Clamp the number of requested results to prevent unbounded allocations.
+		const MAX_TOP_K: u32 = 10_000;
+		const MAX_CANDIDATE_K: u32 = 20_000;
+
+		let mut top_k = req.top_k.unwrap_or(self.cfg.memory.top_k).max(1);
+		if top_k > MAX_TOP_K {
+			top_k = MAX_TOP_K;
+		}
+
+		let mut candidate_k =
+			req.candidate_k.unwrap_or(self.cfg.memory.candidate_k).max(top_k);
+		if candidate_k > MAX_CANDIDATE_K {
+			candidate_k = MAX_CANDIDATE_K;
+		}
+
 		let query = req.query.clone();
 		let read_profile = req.read_profile.clone();
 		let record_hits_enabled = req.record_hits.unwrap_or(false);
@@ -1434,7 +1447,8 @@
 			top_k,
 		};
 
-		let mut items = Vec::with_capacity(results.len());
+		// Build the items vector without pre-allocating based on a tainted size.
+		let mut items = Vec::new();
 		let mut trace_builder = SearchTraceBuilder::new(trace_context, &self.cfg, now);
 
 		for (idx, scored_chunk) in results.into_iter().enumerate() {
EOF
packages/elf-service/src/progressive_search.rs
Outside changed files

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/packages/elf-service/src/progressive_search.rs b/packages/elf-service/src/progressive_search.rs
--- a/packages/elf-service/src/progressive_search.rs
+++ b/packages/elf-service/src/progressive_search.rs
@@ -171,9 +171,21 @@
 
 impl ElfService {
 	pub async fn search(&self, req: SearchRequest) -> ServiceResult<SearchIndexResponse> {
-		let top_k = req.top_k.unwrap_or(self.cfg.memory.top_k).max(1);
-		let candidate_k = req.candidate_k.unwrap_or(self.cfg.memory.candidate_k).max(top_k);
+		// Clamp requested sizes to prevent unbounded allocations in downstream search.
+		const MAX_TOP_K: u32 = 10_000;
+		const MAX_CANDIDATE_K: u32 = 20_000;
 
+		let mut top_k = req.top_k.unwrap_or(self.cfg.memory.top_k).max(1);
+		if top_k > MAX_TOP_K {
+			top_k = MAX_TOP_K;
+		}
+
+		let mut candidate_k =
+			req.candidate_k.unwrap_or(self.cfg.memory.candidate_k).max(top_k);
+		if candidate_k > MAX_CANDIDATE_K {
+			candidate_k = MAX_CANDIDATE_K;
+		}
+
 		let mut raw_req = req.clone();
 
 		raw_req.top_k = Some(candidate_k);
EOF
Copilot is powered by AI and may make mistakes. Always verify output.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8158a1be5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +36 to +38
if ! command -v taplo >/dev/null 2>&1; then
  echo "Missing taplo." >&2
  exit 1

P2: Drop hard failure on missing taplo in e2e harness

The harness now exits immediately when taplo is not installed, which makes cargo make e2e fail in environments that previously worked with only runtime dependencies (curl, psql, jq/jaq). Because the generated TOML is already valid for execution, this formatting-only tool requirement can block experiment runs without improving correctness; making formatting optional (or warning-only) would avoid this regression.

…runtime SQLx queries in tests","intent":"Allow tests to use sqlx::query/query_as/query_scalar without compile-time macros","impact":"Test code no longer depends on SQLx macros while production code remains macro-checked","breaking":false,"risk":"low","refs":["pr:32"]}
@aurexav aurexav merged commit 08fa6af into main Feb 7, 2026
5 of 6 checks passed
@aurexav aurexav deleted the feat/v2-only-sqlx branch February 7, 2026 14:36
@aurexav aurexav restored the feat/v2-only-sqlx branch February 7, 2026 14:37
@aurexav aurexav deleted the feat/v2-only-sqlx branch February 7, 2026 14:39