feat: implement discovery domain with record search, feature catalog, and feature search #79
… and feature search

Add the discovery bounded context providing read-only search and filtering over published records and hook-emitted feature data.

Record search: metadata field filters with typed casts (Float for numbers, Date for dates), free-text search across text/URL fields, cursor-based keyset pagination, and configurable sort.

Feature search: per-hook-table queries with schema-aware column access using local MetaData (no global cache pollution), operator validation against JSON Schema types, and row_id-based cursor encoding.

Feature catalog: lists all feature tables with column schemas and record counts. Targeted get_feature_table_schema() for single-table lookup in the search path (avoids N+1 catalog scan).

Includes defense-in-depth quoted_name for dynamic SQL identifiers, FeatureReader port for record-level feature enrichment, and discovery API surface documentation.

Closes #77
@greptile
Greptile Summary
This PR introduces the … Two critical issues remain: …
All other review findings from prior passes (LIKE escaping, keyset sort casting, N+1 optimization, …
Confidence Score: 3/5
Last reviewed commit: d6f3665
…ation total, q validation
- Escape LIKE metacharacters (%, _, \) in free-text and CONTAINS filters to prevent user input from being interpreted as wildcard patterns
- Cast sort expressions to Float/Date for NUMBER/DATE fields so keyset cursor comparison uses correct ordering instead of lexicographic text
- Consolidate N+1 COUNT queries in get_feature_catalog into a single UNION ALL, and N+1 per-table SELECTs in get_features_for_record into a single UNION ALL with to_jsonb
- Remove broken total count from paginated responses: COUNT(*) OVER() was evaluated after the cursor WHERE clause, producing a shrinking total across pages; has_more now uses len(results) == limit
- Raise ValidationError when q is provided but no text/URL fields exist, instead of silently discarding the search term
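For reviewers unfamiliar with the LIKE escaping point, here is a minimal sketch of the idea. The helper names `escape_like` and `contains_pattern` are hypothetical and not taken from this PR; the sketch assumes backslash is the escape character, which is PostgreSQL's default for LIKE/ILIKE.

```python
def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE/ILIKE metacharacters so user input matches literally."""
    # Escape the escape character first so the backslashes added for
    # '%' and '_' below are not themselves doubled.
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def contains_pattern(term: str) -> str:
    """Wrap an escaped term in wildcards for a 'contains' match."""
    return f"%{escape_like(term)}%"
```

The resulting pattern is meant to be passed as a bound parameter, never interpolated into the SQL string.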
@greptile
@greptile
quoted_name only instructs SQLAlchemy's compiler to quote — when interpolated via f-string into text(), it emits the bare string unquoted. Replace with _quote_ident() that properly double-quotes identifiers and escapes embedded double-quotes.
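The body of `_quote_ident()` is not shown in this conversation. As a rough illustration of the quoting rule described above, a helper along these lines would double-quote an identifier and double any embedded double-quotes; the name `quote_ident` and the empty/NUL check are assumptions made for this sketch.

```python
def quote_ident(name: str) -> str:
    """Return `name` as a double-quoted SQL identifier."""
    # Assumption: empty names and NUL bytes are rejected outright,
    # since neither can form a valid PostgreSQL identifier.
    if not name or "\x00" in name:
        raise ValueError("invalid SQL identifier")
    # Inside a quoted identifier, a literal double-quote is written as two.
    return '"' + name.replace('"', '""') + '"'
```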
Replace raw dict handling of feature_schema JSON with a typed
FeatureSchema Pydantic model. Parse at the DB boundary, then pass
typed objects everywhere downstream.
- Add FeatureSchema, build_feature_table, and data_columns to
feature_table.py as the single source of truth for feature table
construction
- Eliminate raw SQL (text() f-strings, manual _quote_ident) from
get_feature_catalog and get_features_for_record
- Remove duplicated auto-column definitions from feature_store.py
- Extract _to_column_info helper, remove dead pop("created_at")
- Consolidate redundant DDL tests into test_feature_table.py
… typing
- Add KeysetPage abstraction that derives ORDER BY and WHERE from a single sort spec with correct NULL semantics (NULL cursor no longer produces `sort > NULL` which is always false in PostgreSQL)
- Wrap RecordSRN.parse in SearchFeaturesHandler with try/except to return 422 instead of unhandled 500
- Use type_coerce for jsonb_build_object string keys so asyncpg can determine parameter types
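To make the NULL point concrete, here is a sketch of a keyset predicate for `ORDER BY sort ASC NULLS LAST, id ASC`. It is not the PR's `KeysetPage`; the function `keyset_where` is hypothetical, emits a plain SQL fragment with named parameters, and assumes the column names passed in are trusted identifiers.

```python
from typing import Any


def keyset_where(
    sort_col: str, id_col: str, cursor_sort: Any, cursor_id: int
) -> tuple[str, dict[str, Any]]:
    """Build a WHERE fragment for ORDER BY sort_col ASC NULLS LAST, id_col ASC."""
    if cursor_sort is None:
        # The cursor row is in the NULL group, which sorts last:
        # only rows in that group with a larger id remain.
        return f"{sort_col} IS NULL AND {id_col} > :id", {"id": cursor_id}
    # Non-NULL cursor: rows with a larger sort value, ties broken by id,
    # plus every NULL row, since NULLs sort after all non-NULL values.
    clause = (
        f"({sort_col} > :s OR ({sort_col} = :s AND {id_col} > :id) "
        f"OR {sort_col} IS NULL)"
    )
    return clause, {"s": cursor_sort, "id": cursor_id}
```

Branching on the cursor value avoids ever emitting a comparison against NULL, which would filter out every row.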
@greptile
Service now fetches limit+1 rows from the adapter and uses the extra row as the "more" signal, then slices back to limit before returning. This prevents clients from making a wasted round-trip on exact-limit pages.
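A minimal sketch of the limit+1 pattern described above, assuming the adapter can be modelled as a callable that takes a row count. The name `fetch_page` and that callable signature are illustrative, not the service's actual interface.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fetch_page(
    fetch: Callable[[int], Sequence[T]], limit: int
) -> tuple[list[T], bool]:
    """Fetch one page and report whether more rows exist beyond it."""
    # Ask for one extra row; its presence is the "more" signal.
    rows = list(fetch(limit + 1))
    has_more = len(rows) > limit
    # Slice back to the requested size before returning.
    return rows[:limit], has_more
```

With this approach, a result set of exactly `limit` rows reports `has_more` as false, whereas the earlier `len(results) == limit` check would have reported true.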
@greptile
    def decode_cursor(cursor: str) -> dict[str, Any]:
        """Decode a base64 JSON cursor. Raises ValueError on malformed input."""
        try:
            raw = base64.urlsafe_b64decode(cursor.encode())
            data = json.loads(raw)
        except Exception as exc:
            raise ValueError(f"Malformed cursor: {exc}") from exc
        if not isinstance(data, dict) or "s" not in data or "id" not in data:
            raise ValueError("Cursor must contain 's' and 'id' keys")
        return data
Cursor type validation missing — non-integer id or s values cause unhandled 500
The decode_cursor() function only validates that keys "s" and "id" are present (line 58) but does not validate their types. A client can Base64-encode arbitrary JSON to produce a cursor with non-integer values, e.g.:
    {"s": 1, "id": "not_a_number"}

When this cursor is decoded and used in the keyset predicate, SQLAlchemy attempts to bind the string value to a BigInteger column (ft.c.id), causing PostgreSQL to reject it with "invalid input syntax for type bigint". This surfaces as an unhandled 500 instead of a clean 422 validation error.
The same issue applies to the "s" (sort value) key when sorting by a numeric or date column.
Fix: Add type validation after decoding:
    import base64
    import json
    from typing import Any

    def decode_cursor(cursor: str) -> dict[str, Any]:
        """Decode a base64 JSON cursor. Raises ValueError on malformed input."""
        try:
            raw = base64.urlsafe_b64decode(cursor.encode())
            data = json.loads(raw)
        except Exception as exc:
            raise ValueError(f"Malformed cursor: {exc}") from exc
        if not isinstance(data, dict) or "s" not in data or "id" not in data:
            raise ValueError("Cursor must contain 's' and 'id' keys")
        # Validate types
        if not isinstance(data.get("id"), int):
            raise ValueError("Cursor 'id' must be an integer")
        return data

In the feature search handler, also validate that "s" is an integer when sorting by id, or matches the column type when sorting by other columns.
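One way the per-column check on "s" could look, as a sketch only. The function name `decode_typed_cursor` and the `sort_type` labels ("integer", "number", "date", anything else treated as text) are assumptions for illustration, not names from this PR. Note that `bool` is a subclass of `int` in Python, so the sketch rejects it explicitly.

```python
import base64
import json
from datetime import date
from typing import Any


def _is_iso_date(value: str) -> bool:
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def decode_typed_cursor(cursor: str, sort_type: str) -> dict[str, Any]:
    """Decode a cursor and check 'id' and 's' against the expected types."""
    try:
        data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    except Exception as exc:
        raise ValueError(f"Malformed cursor: {exc}") from exc
    if not isinstance(data, dict) or "s" not in data or "id" not in data:
        raise ValueError("Cursor must contain 's' and 'id' keys")
    # isinstance(True, int) is True, so exclude bool before the int check.
    if isinstance(data["id"], bool) or not isinstance(data["id"], int):
        raise ValueError("Cursor 'id' must be an integer")
    s = data["s"]
    if s is None:
        return data  # a NULL sort value is a legitimate cursor position
    if sort_type == "integer":
        ok = isinstance(s, int) and not isinstance(s, bool)
    elif sort_type == "number":
        ok = isinstance(s, (int, float)) and not isinstance(s, bool)
    elif sort_type == "date":
        ok = isinstance(s, str) and _is_iso_date(s)
    else:
        ok = isinstance(s, str)
    if not ok:
        raise ValueError(f"Cursor 's' does not match sort type {sort_type!r}")
    return data
```

The handler would catch `ValueError` and map it to a 422, in the same way as the existing malformed-cursor path.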
Summary
- Record search: metadata field filters with typed casts (`Float` for numbers, `Date` for dates instead of lexicographic string comparison), free-text `ILIKE` search across text/URL fields, cursor-based keyset pagination
- Feature search: per-hook-table queries with schema-aware column access using local `MetaData()` per query (no global cache pollution), operator validation against JSON Schema types, `row_id`-based cursor encoding
- Feature catalog: lists all feature tables with column schemas and record counts; `get_feature_table_schema()` for single-table lookup in the search path (avoids full catalog scan)
- … (`features` dict)
- Defense-in-depth `quoted_name` for all dynamic SQL identifiers
- `RecordService.feature_reader` is now required (no backward-compat `None` guard; pre-launch)

What changed
- New domain: `osa/domain/discovery/` (model, ports, queries, service, DI provider)
- New adapters: `PostgresDiscoveryReadStore`, `PostgresFieldDefinitionReader`, `PostgresFeatureReader`
- Modified: `RecordService` (required `feature_reader`), `GetRecordHandler` (returns features), records route (features in response), DI wiring, migration

API surface (all public, no auth):
- `POST /api/v1/discovery/records`: search/filter records
- `GET /api/v1/discovery/features`: feature table catalog
- `POST /api/v1/discovery/features/{hook_name}`: search/filter feature rows

Test plan
Closes #77