33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,39 @@ All notable changes to HyperCache are recorded here. The format follows

### Added

- **Cluster-wide key browser (`GET /v1/cache/keys`).** New v1 client-API endpoint that fans out across
every alive peer, dedupes replicas, sorts, and returns a paged slice — designed for the operator-debug
workflow of "browse / refine a search" rather than as a primary data-access path. The `q` parameter
switches between two modes via a small classifier:
patterns containing any of `*`, `?`, `[` go through Go's `path.Match` (platform-agnostic glob —
`filepath.Match`'s OS-specific separator semantics are wrong for arbitrary string keys);
everything else is treated as a literal prefix via `strings.HasPrefix`. Two hard caps bound the
worst case: `max` (default 10000, ceiling 50000) for the full deduplicated result set held in memory,
and `limit` (default 100, ceiling 500) for the page size — `cursor` paging is offset-based against the
sorted set so successive pages are stable across requests. Per-peer fan-out failures are handled best-effort:
the failed peer ID lands in `partial_nodes` rather than failing the whole call, mirroring the
read-repair and hint-replay contracts elsewhere in the cluster. Returns 501 when the underlying
backend isn't `DistMemory` (this endpoint requires a cluster). The new method
[`(*DistMemory).ListKeys`](pkg/backend/dist_keys.go) drives the fan-out via `errgroup` with a
`listKeysAccumulator` merge struct keyed by a single mutex; the self-peer slice walks local shards
directly (no HTTP self-hop). The `DistTransport` interface grows a new method
`ListKeys(ctx, nodeID, pattern)` with implementations in `InProcessTransport` (direct shard scan),
`DistHTTPTransport` (extends the existing `/internal/keys` path with an optional `q` query param —
backward compatible; cursor semantics unchanged), and `chaosTransport` (pass-through with the same
drop/latency injection hooks as the other verbs). Unit tests in
[`pkg/backend/dist_keys_test.go`](pkg/backend/dist_keys_test.go) pin the
prefix-vs-glob classifier across twelve table cases and the malformed-glob → `path.ErrBadPattern`
surface; HTTP smoke tests in [`cmd/hypercache-server/handlers_test.go`](cmd/hypercache-server/handlers_test.go)
drive seed → paged walk → assert union and no cross-page duplicates, plus 400 surfaces for invalid
cursor and malformed glob; five integration tests in
[`tests/hypercache_distmemory_listkeys_test.go`](tests/hypercache_distmemory_listkeys_test.go)
cover cluster-wide dedup at RF=3 across 5 nodes (50 unique seeds → 50 keys, not 150 = 50 × RF=3),
prefix vs glob filters, and `max`-triggered truncation. Route registration order matters in Fiber's
trie router: `/v1/cache/keys` must come before `/v1/cache/:key`, otherwise the parameterized handler
shadows it with `key="keys"`. OpenAPI spec entry (`ListKeysResponse` schema + operation) added to
[`cmd/hypercache-server/openapi.yaml`](cmd/hypercache-server/openapi.yaml); the drift-detector test
in [`cmd/hypercache-server/openapi_test.go`](cmd/hypercache-server/openapi_test.go) catches future
spec / route mismatches.
- **Async read-repair batching (Phase 4) + unconditional `ForwardSet`-only repair.** Two composing changes
in the same PR that together cut the wire-call cost of read-repair under quorum reads. (1) The defensive
`ForwardGet` probe in `repairRemoteReplica` is gone — every repair is now exactly one `ForwardSet`,
137 changes: 137 additions & 0 deletions cmd/hypercache-server/handlers_test.go
Expand Up @@ -6,6 +6,8 @@ import (
"io"
"net/http"
"net/http/httptest"
"net/url"
"strconv"
"strings"
"testing"

@@ -47,6 +49,10 @@ func newTestServer(t *testing.T) *fiber.App {
app := fiber.New()
nodeCtx := &nodeContext{hc: hc, nodeID: "test-node"}

// Match production ordering: literal /v1/cache/keys before the
// parameterized /v1/cache/:key so the router picks handleListKeys
// for the literal path.
app.Get("/v1/cache/keys", func(c fiber.Ctx) error { return handleListKeys(c, nodeCtx) })
app.Get("/v1/cache/:key", func(c fiber.Ctx) error { return handleGet(c, nodeCtx) })
app.Head("/v1/cache/:key", func(c fiber.Ctx) error { return handleHead(c, nodeCtx) })
app.Put("/v1/cache/:key", func(c fiber.Ctx) error { return handlePut(c, nodeCtx) })
@@ -352,3 +358,134 @@ func TestHandleBatchDelete_BasicFlow(t *testing.T) {
t.Fatalf("batch-get post-delete should report found:false; got %s", got.body)
}
}

// seedListKeysFixture seeds `count` keys prefixed `first-NN` plus a
// `second-1` decoy via PUT. Returns the test app so the caller can
// drive the list-keys endpoint against the populated cache. Kept
// here rather than in newTestServer so the test bodies stay
// declarative.
func seedListKeysFixture(t *testing.T, count int) *fiber.App {
t.Helper()

app := newTestServer(t)

for i := range count {
n := strconv.Itoa(i + 1)
key := "first-" + strings.Repeat("0", 2-len(n)) + n // zero-pad to two digits; assumes count <= 99

put := doRequest(t, app, http.MethodPut, "/v1/cache/"+key, "v", nil)
if put.status != http.StatusOK {
t.Fatalf("seed put %s: %d", key, put.status)
}
}

put := doRequest(t, app, http.MethodPut, "/v1/cache/second-1", "v", nil)
if put.status != http.StatusOK {
t.Fatalf("seed second put: %d", put.status)
}

return app
}

// fetchListKeysPage drives one /v1/cache/keys request and decodes
// the response, failing the test on transport or parse errors.
// Extracted so the test body can focus on the cursor walk.
func fetchListKeysPage(t *testing.T, app *fiber.App, cursor string) listKeysResponse {
t.Helper()

target := "/v1/cache/keys?q=first-&limit=10"
if cursor != "" {
target += "&cursor=" + cursor
}

got := doRequest(t, app, http.MethodGet, target, "", nil)
if got.status != http.StatusOK {
t.Fatalf("cursor=%q: status %d body=%s", cursor, got.status, got.body)
}

var resp listKeysResponse

err := json.Unmarshal([]byte(got.body), &resp)
if err != nil {
t.Fatalf("cursor=%q decode: %v", cursor, err)
}

return resp
}

// TestHandleListKeys_PrefixAndPaging drives the v1 list-keys
// endpoint end-to-end: seed via PUT, filter by prefix, walk the
// cursor across multiple pages, assert the union matches the seed
// set and no key appears twice.
func TestHandleListKeys_PrefixAndPaging(t *testing.T) {
t.Parallel()

const seedCount = 25

app := seedListKeysFixture(t, seedCount)

collected := make(map[string]struct{}, seedCount)
cursor := ""

for range 10 {
resp := fetchListKeysPage(t, app, cursor)

if resp.TotalMatched != seedCount {
t.Fatalf("total_matched=%d, want %d", resp.TotalMatched, seedCount)
}

for _, k := range resp.Keys {
if !strings.HasPrefix(k, "first-") {
t.Fatalf("non-prefix key in result: %s", k)
}

if _, dup := collected[k]; dup {
t.Fatalf("duplicate key across pages: %s", k)
}

collected[k] = struct{}{}
}

if resp.NextCursor == "" {
break
}

cursor = resp.NextCursor
}

if len(collected) != seedCount {
t.Fatalf("collected %d keys across pages, want %d", len(collected), seedCount)
}
}

// TestHandleListKeys_InvalidCursor pins that a malformed cursor
// surfaces 400, not 500 — the cursor field is operator-controlled
// and must be validated at the boundary.
func TestHandleListKeys_InvalidCursor(t *testing.T) {
t.Parallel()

app := newTestServer(t)

got := doRequest(t, app, http.MethodGet, "/v1/cache/keys?cursor=not-a-number", "", nil)
if got.status != http.StatusBadRequest {
t.Fatalf("expected 400 for malformed cursor, got %d body=%s", got.status, got.body)
}
}

// TestHandleListKeys_InvalidGlob surfaces malformed glob patterns
// as 400, matching the same validate-at-boundary contract as
// cursor.
func TestHandleListKeys_InvalidGlob(t *testing.T) {
t.Parallel()

app := newTestServer(t)

// "[unclosed" is a malformed character class — URL-encoded so
// the literal `[` is preserved through fiber's query parser.
target := "/v1/cache/keys?q=" + url.QueryEscape("[unclosed")

got := doRequest(t, app, http.MethodGet, target, "", nil)
if got.status != http.StatusBadRequest {
t.Fatalf("expected 400 for malformed glob, got %d body=%s", got.status, got.body)
}
}
153 changes: 153 additions & 0 deletions cmd/hypercache-server/main.go
@@ -459,6 +459,12 @@ func registerClientRoutes(app *fiber.App, policy httpauth.Policy, nodeCtx *nodeC
return c.Send(openapiSpec)
})

// /v1/cache/keys must be registered BEFORE the parameterized
// /v1/cache/:key — Fiber matches in registration order and the
// literal-path route would otherwise be shadowed by the
// param-bound handler (handleGet would be invoked with
// `key="keys"` and return 404).
app.Get("/v1/cache/keys", read, func(c fiber.Ctx) error { return handleListKeys(c, nodeCtx) })
app.Put("/v1/cache/:key", write, func(c fiber.Ctx) error { return handlePut(c, nodeCtx) })
app.Get("/v1/cache/:key", read, func(c fiber.Ctx) error { return handleGet(c, nodeCtx) })
app.Head("/v1/cache/:key", read, func(c fiber.Ctx) error { return handleHead(c, nodeCtx) })
@@ -646,6 +652,33 @@ type ownersResponse struct {
Node string `json:"node"`
}

// listKeysResponse is the body of GET /v1/cache/keys — operator-
// facing key browser. `NextCursor` is empty on the last page;
// `TotalMatched` is the full deduplicated matched set (capped by
// `max`). `Truncated` reports that the cluster-wide cap was hit
// and the operator should refine the pattern. `PartialNodes`
// lists peers whose fan-out failed; their keys may be missing.
type listKeysResponse struct {
Keys []string `json:"keys"`
NextCursor string `json:"next_cursor"`
TotalMatched int `json:"total_matched"`
Truncated bool `json:"truncated"`
Node string `json:"node"`
PartialNodes []string `json:"partial_nodes,omitempty"`
}

// list-keys query-parameter bounds. Defaults match the operator
// "browse / refine" workflow; the hard caps bound the worst-case
// memory and response size — operators needing a larger sweep
// script against the per-node /internal/keys path with their own
// paging instead of lifting these.
const (
listKeysDefaultLimit = 100
listKeysMaxLimit = 500
listKeysDefaultMax = 10000
listKeysHardMax = 50000
)

// handlePut implements PUT /v1/cache/:key.
// Body is the raw value (any content type). Optional ?ttl=<dur>
// applies a relative expiration; empty/absent means no expiration.
@@ -1289,6 +1322,126 @@ func handleOwners(c fiber.Ctx, nodeCtx *nodeContext) error {
})
}

// listKeysParams is the parsed-and-validated form of the
// /v1/cache/keys query string. Returned as a struct so
// parseListKeysQuery stays under the function-result-limit and
// the call site reads fields by name rather than position.
type listKeysParams struct {
Pattern string
Cursor int
Limit int
MaxResults int
}

// parseBoundedPositiveInt reads a query parameter as a positive int
// with a default fallback and a hard ceiling. Empty value → default.
// Out-of-range or non-numeric → caller-visible error (must surface
// as 400 BAD_REQUEST).
func parseBoundedPositiveInt(c fiber.Ctx, name string, def, hardMax int) (int, error) {
v := c.Query(name)
if v == "" {
return def, nil
}

n, err := strconv.Atoi(v)
if err != nil || n <= 0 {
return 0, ewrap.New("invalid " + name + ": must be a positive integer")
}

if n > hardMax {
n = hardMax
}

return n, nil
}

// parseListKeysQuery extracts and validates the query parameters
// for GET /v1/cache/keys. Defaults and hard caps are applied here
// so handleListKeys keeps a single response-shape concern.
func parseListKeysQuery(c fiber.Ctx) (listKeysParams, error) {
out := listKeysParams{Pattern: c.Query("q")}

if cursorStr := c.Query("cursor"); cursorStr != "" {
n, err := strconv.Atoi(cursorStr)
if err != nil || n < 0 {
return listKeysParams{}, ewrap.New("invalid cursor: must be a non-negative integer")
}

out.Cursor = n
}

limit, err := parseBoundedPositiveInt(c, "limit", listKeysDefaultLimit, listKeysMaxLimit)
if err != nil {
return listKeysParams{}, err
}

out.Limit = limit

maxResults, err := parseBoundedPositiveInt(c, "max", listKeysDefaultMax, listKeysHardMax)
if err != nil {
return listKeysParams{}, err
}

out.MaxResults = maxResults

return out, nil
}

// handleListKeys implements GET /v1/cache/keys — operator-facing
// cluster-wide key browser. Fans out across every alive peer,
// merges + dedupes + sorts the result, then slices the page via
// cursor/limit. The full deduplicated set is held in memory for
// one request (bounded by `max`); paging re-fans out — fine for
// the operator-debug workflow this endpoint serves.
//
// Returns 501 when the underlying backend isn't a DistMemory
// (i.e. plain in-memory or Redis backends): the surface only
// makes sense in cluster mode, and surfacing that explicitly is
// friendlier than a silently empty page.
func handleListKeys(c fiber.Ctx, nodeCtx *nodeContext) error {
params, err := parseListKeysQuery(c)
if err != nil {
return jsonErr(c, fiber.StatusBadRequest, codeBadRequest, err.Error())
}

res, err := nodeCtx.hc.ClusterKeys(c.Context(), params.Pattern, params.MaxResults)
if err != nil {
return jsonErr(c, fiber.StatusBadRequest, codeBadRequest, err.Error())
}

if res == nil {
return jsonErr(
c,
fiber.StatusNotImplemented,
codeInternal,
"list-keys requires a distributed backend",
)
}

total := len(res.Keys)

// Cursor past the end is a valid terminal state (last page +
// 1): respond with an empty page rather than 400. Mirrors how
// SQL OFFSET past the row count returns an empty result set.
start := min(params.Cursor, total)
end := min(start+params.Limit, total)
page := res.Keys[start:end]

nextCursor := ""
if end < total {
nextCursor = strconv.Itoa(end)
}

return c.JSON(listKeysResponse{
Keys: page,
NextCursor: nextCursor,
TotalMatched: total,
Truncated: res.Truncated,
Node: nodeCtx.nodeID,
PartialNodes: res.PartialNodes,
})
}
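The cursor slicing in `handleListKeys` can be exercised in isolation. `pageSlice` is a hypothetical extraction of the offset logic for illustration, not a function in the codebase; it uses Go 1.21's builtin `min`.

```go
package main

import (
	"fmt"
	"strconv"
)

// pageSlice mirrors handleListKeys' paging: an offset cursor into
// a sorted key set, an empty next cursor on the last page, and a
// cursor past the end yielding an empty page rather than an error
// — the SQL-OFFSET-past-the-rows analogy from the comment above.
func pageSlice(keys []string, cursor, limit int) (page []string, next string) {
	total := len(keys)
	start := min(cursor, total)
	end := min(start+limit, total)
	if end < total {
		next = strconv.Itoa(end)
	}
	return keys[start:end], next
}

func main() {
	keys := []string{"a", "b", "c", "d", "e"}
	page, next := pageSlice(keys, 0, 2)
	fmt.Println(page, next) // [a b] 2
	page, next = pageSlice(keys, 4, 2)
	fmt.Println(page, next) // last page: [e], empty next cursor
	page, next = pageSlice(keys, 99, 2)
	fmt.Println(len(page), next) // past-the-end: empty page, no error
}
```

Because the cursor is an offset into a re-fetched sorted set, pages stay duplicate-free only while the matched set is unchanged between requests — acceptable for the operator-debug workflow this endpoint targets.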

// meResponse is the body of GET /v1/me — the resolved caller identity
// after auth middleware ran. Mirrors httpauth.Identity but written as
// a wire type so the JSON tags are owned by the API surface, not the