feat: add catalog server, bundle support, and OpenShift deployment #11
Conversation
Signed-off-by: Pavel Anni <panni@redhat.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Pavel Anni <panni@redhat.com>
Kustomize base with Deployment, Service, ServiceAccount, PVC. OpenShift overlay adds Route, image-puller RoleBinding, and internal registry ConfigMap. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Pavel Anni <panni@redhat.com>
📝 Walkthrough

This pull request introduces a complete skill catalog server with HTTP API endpoints, SQLite storage, registry synchronization, and Kubernetes/OpenShift deployment infrastructure. It adds CLI support for launching the server and packing skill bundles, along with OCI registry discovery and interaction capabilities.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User/CLI
    participant Serve as Serve Command
    participant Server as HTTP Server
    participant Store as SQLite Store
    participant Registry as OCI Registry
    participant Handler as Handler (Skills API)
    User->>Serve: skillctl serve --registry-url=...
    Serve->>Server: Run(ctx, Config)
    Server->>Store: New(dbPath)
    Server->>Registry: Discover & Sync
    Registry-->>Store: Fetch manifests, Upsert skills
    Store-->>Server: Sync complete
    Server->>Server: Start background ticker for periodic sync
    Server->>Server: Create router with handlers
    Server->>Server: Listen on :port
    User->>Server: GET /api/v1/skills?namespace=foo
    Server->>Handler: List(w, r)
    Handler->>Store: ListSkills(filter)
    Store-->>Handler: []Skill
    Handler-->>User: JSON response with pagination
    User->>Server: GET /api/v1/skills/foo/bar/v1.0/content
    Server->>Handler: Content(w, r, ns, name, ver)
    Handler->>Store: GetSkill(ns, name)
    Store-->>Handler: Skill with registry reference
    Handler->>Registry: Pull layer with OCI client
    Registry-->>Handler: SKILL.md bytes
    Handler-->>User: text/markdown response
```
```mermaid
sequenceDiagram
    participant User as User/CLI
    participant Pack as Pack Command
    participant Client as OCI Client
    participant Store as OCI Store
    participant Registry as Remote Registry
    User->>Pack: skillctl pack --bundle bundleDir --tag v1.0.0
    Pack->>Client: PackBundle(ctx, bundleDir, opts)
    Client->>Client: Enumerate subdirectories
    Client->>Client: Parse skill.yaml for each skill
    Client->>Client: Validate skills & namespace consistency
    Client->>Client: Create bundle layer
    Client->>Store: Push layer bytes
    Store-->>Client: Layer digest
    Client->>Client: Build image config
    Client->>Store: Push config bytes
    Store-->>Client: Config digest
    Client->>Client: Marshal image manifest with layer+config
    Client->>Store: Push manifest
    Store-->>Client: Manifest digest
    Client->>Store: Apply tag (namespace/bundleName:v1.0.0)
    Store-->>Client: Descriptor
    Client-->>Pack: Success
    Pack-->>User: Packed bundle message
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Blockers:
- Use parameterized LIMIT/OFFSET in ListSkills (SQL injection vector)
- Guard concurrent sync with atomic.Bool (prevents goroutine/DB exhaustion)
- Log credential store errors in ListRemoteRepositories (silent 401/403)

High priority:
- Check json.Marshal error in PackBundle
- Log debug warning on invalid word count annotation
- Validate all bundle skills share the same namespace

Medium priority:
- Extract buildFilterClause to eliminate duplicate filter code
- Fix parseName to use repo basename (not title, which duplicates DisplayName)
- Scope JSON Content-Type middleware to /api/v1 route group only
- Use time.Hour in DeleteStale test to eliminate race condition
- Use cancellable context for background sync (respects shutdown)

Content handler optimization deferred to issue #12.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Pavel Anni <panni@redhat.com>
Public registries (quay.io, ghcr.io, Docker Hub) do not support the /v2/_catalog endpoint, so the sync engine silently finds zero repositories and indexes nothing. Add --repositories flag to specify exact repo names, bypassing catalog discovery. Add diagnostic logging throughout the sync process so empty results are visible. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Pavel Anni <panni@redhat.com>
Add local testing chapter to deploy/README showing how to run the catalog server against quay.io or a local registry without a cluster. Add `make deploy` target for the build-push-restart inner loop. Set imagePullPolicy: Always so :latest is always re-pulled on pod restart. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Pavel Anni <panni@redhat.com>
configMapGenerator appends a hash suffix to the ConfigMap name, making it impossible to edit with oc edit/patch after deploy. Use a plain ConfigMap resource so users can update registry-url and registry-repositories directly. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Pavel Anni <panni@redhat.com>
Add a Discoverer interface with two implementations:
- QuayDiscoverer: uses the Quay REST API (/api/v1/repository) to auto-discover repos in an organization. Works with quay.io and self-hosted Quay instances, with optional auth from the Docker credential store for private repos.
- OCICatalogDiscoverer: wraps the existing /v2/_catalog logic for private registries (Harbor, Zot, OpenShift internal).

Auto-detects registry type from the URL (contains "quay" -> Quay adapter), with a --registry-type flag to override for self-hosted instances. With this change, discovery against quay.io works without --repositories:

    skillctl serve --registry quay.io --namespace skillimage

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Pavel Anni <panni@redhat.com>
Document the Quay auto-discovery workflow (--namespace instead of --repositories). Add note about Quay's default-private repo visibility as a common gotcha. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Pavel Anni <panni@redhat.com>
The autoincrement ID is an internal SQLite detail that jumps unpredictably due to upsert behavior. Not useful to API consumers; the natural key is (repository, tag). Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Pavel Anni <panni@redhat.com>
Replace hardcoded --repositories with --namespace so the Quay discovery adapter is used instead of an explicit repo list. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Pavel Anni <panni@redhat.com>
Actionable comments posted: 18
🧹 Nitpick comments (8)
Makefile (1)
22-27: Backgrounded `oc logs -f` is racy and leaves an orphaned process.

`oc logs -f ... &` followed by `sleep 3 && echo Route: ...` lets Make exit while the streaming `oc logs` keeps running in the background, detached from the terminal session. The 3-second sleep is also a race against the route URL print being interleaved with log output. Consider either streaming logs in the foreground (so `make deploy` blocks until Ctrl-C) or printing the route first and dropping the backgrounded follow:

♻️ Suggested change

```diff
 deploy: image
 	podman -c rhel push $(IMAGE)
 	oc rollout restart deploy/skillctl-catalog
 	oc rollout status deploy/skillctl-catalog --timeout=60s
-	oc logs -f deploy/skillctl-catalog &
-	@sleep 3 && echo "---" && echo "Route: https://$$(oc get route skillctl-catalog -o jsonpath='{.spec.host}')/api/v1/skills"
+	@echo "---" && echo "Route: https://$$(oc get route skillctl-catalog -o jsonpath='{.spec.host}')/api/v1/skills"
+	oc logs -f deploy/skillctl-catalog
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `Makefile` around lines 22-27: the deploy target currently backgrounds the streaming command `oc logs -f deploy/skillctl-catalog &` and uses `sleep 3`, which can leave an orphaned process and race with printing the route; change the sequence so the route is printed before starting logs and run `oc logs -f deploy/skillctl-catalog` in the foreground (remove the trailing `&` and the `sleep 3`) so `make deploy` blocks while streaming logs, or alternatively remove the follow entirely and only print the route; update the `deploy` recipe accordingly.

deploy/base/deployment.yaml (1)
26-27: Pin image tag instead of `:latest`.

Using `image: ghcr.io/redhat-et/skillctl:latest` with `imagePullPolicy: Always` makes the deployed version non-reproducible and complicates rollback and auditing. Prefer pinning to an immutable tag or digest (e.g., `:vX.Y.Z` or `@sha256:...`) and let the overlay/CI bump it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `deploy/base/deployment.yaml` around lines 26-27: the manifest uses an unpinned image "ghcr.io/redhat-et/skillctl:latest" with imagePullPolicy: Always; change the image field to a fixed immutable tag or digest (e.g., replace ":latest" with a release tag like ":vX.Y.Z" or "@sha256:...") and update imagePullPolicy to the appropriate value (e.g., IfNotPresent) so deployments are reproducible and CI/overlays can manage version bumps; target the image and imagePullPolicy entries in the Deployment manifest.

internal/cli/serve.go (2)
53-56: Trim whitespace when splitting `--repositories`.

`strings.Split("a, b, c", ",")` yields `["a", " b", " c"]`. Those leading spaces will silently fail to match any registry repository names. A common ergonomic pitfall worth fixing.

♻️ Proposed fix

```diff
-	var repos []string
-	if repositories != "" {
-		repos = strings.Split(repositories, ",")
-	}
+	var repos []string
+	if repositories != "" {
+		for _, r := range strings.Split(repositories, ",") {
+			if r = strings.TrimSpace(r); r != "" {
+				repos = append(repos, r)
+			}
+		}
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/cli/serve.go` around lines 53 - 56, When parsing the repositories flag (variable repositories) into repos, trim whitespace on each split element and drop empty entries so values like "a, b, c" become ["a","b","c"]; update the block where repos is assigned (the strings.Split call) to split and then iterate over the parts, using strings.TrimSpace on each token and only append non-empty trimmed values to repos.
66: Add validation for the `--registry-type` flag to reject invalid values.

The `oci.RegistryType(registryType)` cast accepts any string without validation. Invalid values like `--registry-type=quey` are silently accepted and fall back to OCI-type behavior instead of failing fast with a clear error. Add validation in `RunE` to reject anything outside `{"auto", "oci", "quay"}`, or use a constructor that returns an error.

Applies to lines 66 and 76.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `internal/cli/serve.go` at line 66: the RunE handler currently casts the raw registryType string with oci.RegistryType(registryType), which accepts anything; add validation in RunE that checks the registryType variable against the allowed set {"auto","oci","quay"} and return a clear error if it is not one of those values (or call a parsing/constructor function that returns an error instead of using a direct cast); update the places that construct RegistryType (the oci.RegistryType(registryType) usage) to only use the validated/parsed value so invalid flags like --registry-type=quey fail fast with a descriptive error.

internal/handler/skills_test.go (1)
25-30: Optionally fail fast on seed errors.

Several `_ = db.UpsertSkill(...)` calls swallow errors. If a schema change ever breaks `Skill` field mapping, every assertion below will fail with confusing "got 0 skills, want 1" messages instead of a clean seed-error report. A small helper that calls `t.Fatalf` on error would make future regressions easier to debug.

Also applies to: 86-93, 114-118, 146-155
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `internal/handler/skills_test.go` around lines 25-30: replace the silent `_ = db.UpsertSkill(...)` calls with a small test helper that fails fast on errors: add a helper like `mustUpsertSkill(t testing.TB, db *Store, s store.Skill)` (or similar) in the test file that calls `db.UpsertSkill(s)` and calls `t.Fatalf` with the error if non-nil, then use `mustUpsertSkill(t, db, store.Skill{...})` wherever `_ = db.UpsertSkill(...)` is used (including the other occurrences mentioned) so seed failures surface immediately instead of causing confusing downstream assertions.

internal/server/router.go (1)
17-19: Consider a request timeout for content-fetching routes.

The `/skills/{ns}/{name}/versions/{ver}/content` handler synchronously pulls a layer from the upstream registry. Without `middleware.Timeout` or a per-route deadline, a slow or unreachable registry can tie up server goroutines indefinitely. A coarse `middleware.Timeout(30*time.Second)` (or one scoped to just the content route) would bound the blast radius.

Also applies to: 37-39
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `internal/server/router.go` around lines 17-19: the content-fetching route for /skills/{ns}/{name}/versions/{ver}/content (and the similar routes at the later block) can hang indefinitely; add a request deadline using middleware.Timeout to bound work (e.g., middleware.Timeout(30*time.Second)). Update the router setup to apply either a coarse global timeout via r.Use(middleware.Timeout(30*time.Second)) or wrap just the content route with r.With(middleware.Timeout(30*time.Second)). Ensure you import time and apply the timeout to the handler(s) that synchronously pull from the upstream registry (the content handler for that route and the analogous handlers referenced on lines 37-39).

internal/server/server.go (1)
91-93: Use `errors.Is` for the `http.ErrServerClosed` comparison.

A direct `!=` comparison fails to match wrapped errors. While `srv.ListenAndServe()` currently returns the sentinel directly, idiomatic Go and any future wrapping middleware/decorator would otherwise silently surface an "expected" shutdown as a fatal `Run` error.

Proposed change

```diff
-	if err := srv.ListenAndServe(); err != http.ErrServerClosed {
+	if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
 		return err
 	}
```

Add `"errors"` to the import block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `internal/server/server.go` around lines 91-93: replace the direct comparison against the sentinel http.ErrServerClosed with an errors.Is check: call srv.ListenAndServe(), capture its error, and return it only if !errors.Is(err, http.ErrServerClosed); also add "errors" to the import block. Update references around the ListenAndServe call (the srv.ListenAndServe invocation and its surrounding error handling) to use errors.Is(err, http.ErrServerClosed) so wrapped sentinel errors are recognized.

internal/store/store.go (1)
144-157: Expose a sentinel `ErrNotFound` so the HTTP layer can map to 404 reliably.

Returning `fmt.Errorf("skill %s/%s not found", ...)` forces callers to do string matching (or treat all errors as 500) to differentiate "missing" from "DB failure". Define a sentinel and check with `errors.Is` in the handler.

♻️ Proposed change

```diff
 import (
 	"database/sql"
+	"errors"
 	"fmt"
 	"time"
 	...
 )

+// ErrNotFound is returned when a requested skill does not exist.
+var ErrNotFound = errors.New("skill not found")
 ...
 	if len(skills) == 0 {
-		return nil, fmt.Errorf("skill %s/%s not found", namespace, name)
+		return nil, fmt.Errorf("%w: %s/%s", ErrNotFound, namespace, name)
 	}
 	return &skills[0], nil
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/store/store.go` around lines 144 - 157, Add a sentinel error (e.g., var ErrNotFound = errors.New("not found")) to the package and update Store.GetSkill to return that sentinel when no rows are found instead of fmt.Errorf; specifically, in GetSkill (method on Store) change the "skill %s/%s not found" branch to return nil, ErrNotFound. Also ensure callers (HTTP handlers) use errors.Is(err, ErrNotFound) to map to 404. Keep the rest of GetSkill and querySkills behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/base/deployment.yaml`:
- Around line 8-9: The Deployment currently sets spec.replicas: 1 which combined
with a RWO PVC (skillctl-catalog-data) and the default RollingUpdate strategy
can hang rollouts; update the Deployment spec to use strategy.type: Recreate so
the old pod is terminated before a new pod mounts the ReadWriteOnce volume.
Locate the Deployment manifest where spec.replicas is defined and add/replace
the strategy block (strategy.type -> Recreate) within the same spec for the
controller (the Deployment resource) to ensure single-replica rollouts do not
attempt concurrent mounts of the RWO PVC.
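The fix described above amounts to the following Deployment fragment (only the relevant fields are shown; the rest of the manifest is elided):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: skillctl-catalog
spec:
  replicas: 1
  strategy:
    type: Recreate  # old pod terminates and releases the RWO PVC before the new pod mounts it
```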
In `@deploy/README.md`:
- Line 58: The README example for the serve command uses a literal placeholder
("--repositories ...") that will be passed verbatim; update the example
invocation for bin/skillctl serve to show concrete repository values (e.g.
comma-separated image names) instead of "..." so users can copy/paste a working
command; change both occurrences of the example (the line with bin/skillctl
serve --registry ... --repositories ... --db :memory: and the similar line at
the later occurrence) to use real-looking examples like "--repositories
team1/doc-reviewer,team1/translator" while keeping the --registry and --db flags
as shown.
- Around line 122-126: The command using "oc create configmap
skillctl-catalog-config --from-literal=... --dry-run=client -o yaml | oc apply
-f -" will strip existing metadata (labels/annotations) from the
deploy/overlays/openshift/configmap.yaml; instead update the existing ConfigMap
manifest in-place (edit deploy/overlays/openshift/configmap.yaml to add or
change registry-url/registry-namespace keys while keeping app.kubernetes.io/*
labels) or use the imperative command that preserves metadata: run "oc set data
configmap/skillctl-catalog-config" with the same --from-literal pairs to update
the data on the existing ConfigMap without removing labels.
- Around line 176-181: The README example sets registry-repositories in the
ConfigMap but the deployment-patch never wires it up so the serve command never
receives the --repositories flag; update the deployment-patch to read the
registry-repositories key into an env var (e.g., REGISTRY_REPOSITORIES) and add
the container arg --repositories=$(REGISTRY_REPOSITORIES) alongside the existing
mappings for registry-url and registry-namespace so the serve command actually
receives the repositories list; alternatively, if you prefer not to support
this, remove the registry-repositories example and replace it with guidance to
use --registry-type quay or manual Quay API setup.
In `@go.mod`:
- Line 18: Run `go mod tidy` to refresh module dependencies so
github.com/go-chi/chi/v5 is recorded as a direct dependency (it's imported in
internal/server/router.go in the import block), which will remove the incorrect
`// indirect` marker in go.mod and update go.sum accordingly; commit the updated
go.mod and go.sum after running the command.
In `@internal/handler/skills.go`:
- Around line 119-177: The Content handler currently creates fresh temp dirs,
instantiates a new oci.Client (oci.NewClient) and pulls the entire image
(client.Pull) on every request which is expensive and unsafe; change it to reuse
cached SKILL.md content or a shared OCI cache: (a) on sync/update (where
versions are discovered) extract SKILL.md via findSkillMD and store the markdown
in the store.Skill record (or a new sqlite table) and have Content read and
return that cached blob, or (b) create a long‑lived shared OCI store/client
(singletons) keyed by image digest, pull only required layer/blob (or
referrer/annotation) into a pooled store with a size cap and an in‑process LRU,
and add concurrency control around pulls to dedupe simultaneous requests; update
Content to use the cached blob or shared client/store and remove per-request
os.MkdirTemp / oci.NewClient / full client.Pull behavior.
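The caching idea in option (a)/(b) can be sketched minimally in Go. Everything here, including names and structure, is illustrative rather than the actual handler; the fetch callback stands in for the per-request `client.Pull` + `findSkillMD` work being eliminated:

```go
package main

import (
	"fmt"
	"sync"
)

// contentCache memoizes SKILL.md bytes by image digest so the Content
// handler does not pull the whole image on every request.
type contentCache struct {
	mu    sync.Mutex
	blobs map[string][]byte
}

func newContentCache() *contentCache {
	return &contentCache{blobs: make(map[string][]byte)}
}

// Get returns the cached blob for digest, calling fetch at most once per
// digest even under concurrent requests (the lock is held across fetch
// for simplicity; a production version would use per-key locking and an
// LRU with a size cap, as the comment above suggests).
func (c *contentCache) Get(digest string, fetch func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if b, ok := c.blobs[digest]; ok {
		return b, nil
	}
	b, err := fetch()
	if err != nil {
		return nil, err
	}
	c.blobs[digest] = b
	return b, nil
}

func main() {
	cache := newContentCache()
	pulls := 0
	fetch := func() ([]byte, error) {
		pulls++ // would be the expensive OCI pull in the real handler
		return []byte("# My Skill"), nil
	}
	cache.Get("sha256:abc", fetch)
	b, _ := cache.Get("sha256:abc", fetch)
	fmt.Printf("pulls=%d content=%s\n", pulls, b)
}
```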
In `@internal/server/server.go`:
- Line 60: The code calls time.NewTicker(cfg.SyncInterval) inside Run without
validating cfg.SyncInterval; since time.NewTicker panics for zero or negative
durations, add a guard in Run that checks Config.SyncInterval (cfg.SyncInterval)
is > 0 before creating the ticker—if invalid, return an error (or set a safe
default) and log the issue; ensure the ticker creation (ticker :=
time.NewTicker(...)) only happens after this validation so the server goroutine
cannot crash on misconfigured CLI flags.
In `@internal/store/store.go`:
- Around line 129-130: The current SQL build in store.go appends " ORDER BY
namespace, name, version" which lexically sorts the version TEXT and misorders
semver values; update the ORDER BY to match GetSkill/GetVersions by using "ORDER
BY namespace, name, created DESC" (i.e., change the query variable assembly that
currently includes ORDER BY namespace, name, version) so the list's "top" row
aligns with GetSkill's latest-by-created logic; if you truly need version-based
ordering instead, add a normalized sortable_version column and order by that
instead of the raw version field.
- Around line 51-64: The New function leaks the opened *sql.DB when subsequent
steps fail; ensure that after calling sql.Open you close the db on any error
path (e.g., if db.Ping() returns an error or if s.createSchema() fails). Modify
New so that if db.Ping() fails you call db.Close() before returning the wrapped
error, and likewise if createSchema() returns an error you call db.Close() (or
defer a conditional close) before returning; reference the New constructor, the
db variable returned by sql.Open, the db.Ping() call, and the s.createSchema()
invocation to locate the spots to add the cleanup.
- Around line 196-199: The current tag-filter loop in store.go builds an
unanchored substring match using clause and args which causes false positives
(e.g., tag "go" matching "google"); change the predicate to match JSON-quoted
tokens or use SQLite json_each for exact matches: for the simple fix update the
LIKE pattern to include JSON quotes (e.g., LIKE '%\"' || ? || '\"%') when
appending to clause/args, or for an exact approach replace the per-tag LIKE
clauses with a JOIN/EXISTS over json_each(tags_json) comparing the tag value
directly (use json_each(tags_json).value = ?); adjust the code that builds
clause and args (the loop over f.Tags and the variables clause and args)
accordingly so tags are matched as whole JSON tokens.
In `@internal/store/sync.go`:
- Around line 42-77: The sync currently may call DeleteStale(syncStart) even if
many ListRemoteTags or FetchManifestAnnotations failed, which can delete healthy
entries; modify the loop that iterates repos/tags (where ListRemoteTags,
FetchManifestAnnotations, manifestToSkill, and UpsertSkill are used) to track
counts: total candidates examined, fetch failures, and successful upserts
(indexed). After the loop compute a simple health check (e.g., if
fetchFailures/totalCandidates > X% or successfulUpserts == 0) and if the check
fails skip calling DeleteStale(syncStart) and log a clear warning; otherwise
proceed with the existing DeleteStale call and logs. Ensure you reference and
update the variables used now (indexed, syncStart) and keep existing
slog.Info/Warn semantics for non-guard paths.
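The health check can be reduced to a pure function over the three counters. The 20% failure threshold below is an arbitrary illustrative choice for the "X%" in the prompt:

```go
package main

import "fmt"

// shouldDeleteStale decides whether the sync was healthy enough to prune:
// skip pruning when nothing was indexed or too many fetches failed, so a
// flaky registry cannot wipe healthy catalog entries.
func shouldDeleteStale(total, fetchFailures, upserts int) bool {
	if total == 0 || upserts == 0 {
		return false
	}
	return float64(fetchFailures)/float64(total) <= 0.2
}

func main() {
	fmt.Println(shouldDeleteStale(10, 1, 9)) // healthy sync: prune stale rows
	fmt.Println(shouldDeleteStale(10, 6, 4)) // 60% failures: keep everything
	fmt.Println(shouldDeleteStale(5, 0, 0))  // nothing indexed: keep everything
}
```

The sync loop would increment the counters next to the existing `ListRemoteTags` / `FetchManifestAnnotations` error branches and the `UpsertSkill` success path, then consult this before calling `DeleteStale(syncStart)`.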
In `@pkg/oci/annotations.go`:
- Line 24: There are duplicate annotation constants: AnnotationStatus duplicates
lifecycle.StatusAnnotation ("io.skillimage.status"); update code to have a
single source of truth by removing AnnotationStatus and replacing its usages in
bundle.go and catalog.go to reference lifecycle.StatusAnnotation (and adjust
imports as needed), or alternatively move lifecycle.StatusAnnotation to a
central pkg used by buildAnnotations, bundle.go and catalog.go; ensure you
delete the redundant constant Declaration (AnnotationStatus) in
pkg/oci/annotations.go, update all references (search for AnnotationStatus and
buildAnnotations) to use lifecycle.StatusAnnotation, and run a build to fix any
missing imports.
In `@pkg/oci/bundle.go`:
- Around line 138-169: bundleName built from filepath.Base(bundleDir) can be "."
or invalid for an OCI reference, causing c.store.Tag(ref) to fail or create bad
names; update the code around bundleName/manifest/ref to normalize and validate
the repository name: reject "." , ".." and empty values, sanitize or map invalid
characters to the allowed OCI repository charset (letters, digits, separators
like -, _, . and path components), or require/accept an explicit bundle name
flag when bundleDir yields an invalid name; ensure you check the final ref (the
string used in c.store.Tag and Tag call sites) and return a clear error if the
computed bundleName cannot be normalized into a valid OCI repository component.
In `@pkg/oci/catalog.go`:
- Around line 96-99: The current unbounded call io.ReadAll(rc) can OOM if a
remote registry returns a huge manifest; replace it by reading through an
io.LimitReader wrapping rc with a safe cap (e.g., 4 MiB) and use io.ReadAll on
that Limited reader (and treat an io.ErrUnexpectedEOF/short read or additional
content past the limit as an error) so manifestBytes is bounded; update the
manifest reading in pkg/oci/catalog.go around manifestBytes and rc accordingly
and return a clear error if the size limit is exceeded.
In `@pkg/oci/discovery.go`:
- Around line 61-66: DetectRegistryType currently uses a loose substring check
on registryURL which misclassifies hosts; update DetectRegistryType to parse the
registryURL with net/url (or split host if not a full URL), extract and
normalize the host portion, then check the host exactly equals "quay.io" or is a
subdomain suffix like ".quay.io" (or match against a maintained known-suffix
list) before returning RegistryTypeQuay; otherwise return RegistryTypeOCI. Refer
to the DetectRegistryType function and ensure you handle cases where registryURL
may be just a host or include a scheme and port.
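The host-anchored check could look like this sketch (function name illustrative):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isQuayHost parses the registry URL (tolerating bare hosts), strips any
// port, and requires quay.io or a *.quay.io subdomain, instead of the
// loose substring match that misclassifies hosts like myquay.example.com.
func isQuayHost(registryURL string) bool {
	raw := registryURL
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // net/url needs a scheme to locate the host
	}
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname()) // Hostname() strips the port
	return host == "quay.io" || strings.HasSuffix(host, ".quay.io")
}

func main() {
	fmt.Println(isQuayHost("quay.io"))                 // bare host
	fmt.Println(isQuayHost("https://cdn.quay.io:443")) // scheme + port + subdomain
	fmt.Println(isQuayHost("myquay.example.com"))      // substring "quay" alone no longer matches
}
```

Self-hosted Quay instances with unrelated hostnames would still need the `--registry-type quay` override, as the PR already provides.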
- Around line 104-107: The request URL is forcing public-only results by
unconditionally setting q.Set("public","true"); change this so private repos
visible to the auth context are not excluded—either remove the "public" query
param entirely (so Quay returns repos based on auth) or make it conditional via
a new config/field (e.g., a boolean on the discovery struct like
includePublicOnly or publicOnly) and only set q.Set("public","true") when that
flag is true; update the repository-building code that references d.namespace
and apiURL (in pkg/oci/discovery.go) to use the new behavior and adjust any
tests or callers that expect the old public-only semantics.
- Around line 97-100: The base URL construction using d.apiBaseURL /
d.registryURL can produce a double scheme when RegistryURL already contains
"http(s)://"; update the logic in discovery.go where base is set (the block
referencing base, d.apiBaseURL and d.registryURL) to normalize the registry URL
by parsing or trimming the scheme: if d.apiBaseURL is empty, parse d.registryURL
and, if it already contains a scheme, use it as-is (or normalize to https/http
accordingly), otherwise prepend "https://" before assigning to base; ensure the
normalized URL is a valid absolute URL before using it for API requests.
- Around line 138-152: The current decode only reads the first page; modify the
logic in pkg/oci/discovery.go around the result decoding to support Quay
pagination by adding NextPage string `json:"next_page"` to the anonymous result
struct and wrapping the request+decode in a loop: decode each page into result,
append each result.Repositories entries into repos (using the existing repos
slice logic that builds Namespace+"/"+Name), then if result.NextPage is empty
break, otherwise issue a new request for the next page (pass the next_page token
as the query parameter) and continue until no next_page is returned; finally
return the accumulated repos. Ensure you reuse resp handling/closing and error
handling (the existing json decode error path) inside the loop.
---
Nitpick comments:
In `@deploy/base/deployment.yaml`:
- Around line 26-27: The manifest uses an unpinned image
"ghcr.io/redhat-et/skillctl:latest" with imagePullPolicy: Always; change the
image field to a fixed immutable tag or digest (e.g., replace ":latest" with a
release tag like ":vX.Y.Z" or "@sha256:...") and update imagePullPolicy to the
appropriate value (e.g., IfNotPresent) so deployments are reproducible and
CI/overlays can manage version bumps; target the image and imagePullPolicy
entries in the Deployment manifest.
In `@internal/cli/serve.go`:
- Around line 53-56: When parsing the repositories flag (variable repositories)
into repos, trim whitespace on each split element and drop empty entries so
values like "a, b, c" become ["a","b","c"]; update the block where repos is
assigned (the strings.Split call) to split and then iterate over the parts,
using strings.TrimSpace on each token and only append non-empty trimmed values
to repos.
- Line 66: The RunE handler currently casts the raw registryType string with
oci.RegistryType(registryType) which accepts anything; add validation in RunE
that checks the registryType variable against the allowed set
{"auto","oci","quay"} and return a clear error if it is not one of those values
(or call a parsing/constructor function that returns an error instead of using a
direct cast); update the places that construct RegistryType (the
oci.RegistryType(registryType) usage) to only use the validated/parsed value so
invalid flags like --registry-type=quey fail fast with a descriptive error.
In `@internal/handler/skills_test.go`:
- Around line 25-30: Replace the silent `_ = db.UpsertSkill(...)` calls with a
small test helper that fails fast on errors: add a helper like
`mustUpsertSkill(t testing.TB, db *Store, s store.Skill)` (or similar) in the
test file that calls `db.UpsertSkill(s)` and calls `t.Fatalf` with the error if
non-nil, then use `mustUpsertSkill(t, db, store.Skill{...})` wherever `_ =
db.UpsertSkill(...)` is used (including the other occurrences mentioned) so seed
failures surface immediately instead of causing confusing downstream assertions.
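A sketch of the helper pattern; `Store`/`Skill` here are stand-ins for the real store types, and `fakeTB` stands in for `*testing.T` so the sketch runs outside a test binary:

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the real store types (assumptions for this sketch).
type Skill struct{ Namespace, Name string }

type Store struct{ skills []Skill }

func (s *Store) UpsertSkill(sk Skill) error {
	if sk.Name == "" {
		return errors.New("skill name is empty")
	}
	s.skills = append(s.skills, sk)
	return nil
}

// failer is the subset of testing.TB the helper needs.
type failer interface {
	Helper()
	Fatalf(format string, args ...any)
}

// mustUpsertSkill seeds a skill and fails fast on error, replacing the
// silent `_ = db.UpsertSkill(...)` pattern.
func mustUpsertSkill(t failer, db *Store, s Skill) {
	t.Helper()
	if err := db.UpsertSkill(s); err != nil {
		t.Fatalf("seeding skill %s/%s: %v", s.Namespace, s.Name, err)
	}
}

// fakeTB records the failure message instead of aborting (a real
// t.Fatalf stops the test goroutine).
type fakeTB struct{ failed string }

func (f *fakeTB) Helper() {}
func (f *fakeTB) Fatalf(format string, args ...any) { f.failed = fmt.Sprintf(format, args...) }

func main() {
	db, tb := &Store{}, &fakeTB{}
	mustUpsertSkill(tb, db, Skill{Namespace: "demo", Name: "hello"})
	fmt.Println(len(db.skills), tb.failed == "")
}
```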
In `@internal/server/router.go`:
- Around line 17-19: The content-fetching route for
/skills/{ns}/{name}/versions/{ver}/content (and the similar routes at the later
block) can hang indefinitely; add a request deadline using middleware.Timeout to
bound work (e.g., middleware.Timeout(30*time.Second)). Update the router setup
to apply either a coarse global timeout via
r.Use(middleware.Timeout(30*time.Second)) or wrap just the content route with
r.With(middleware.Timeout(30*time.Second)). Ensure you import time and apply
the timeout to the handler(s) that synchronously pull from the upstream registry
(the content handler for that route and the analogous handlers referenced on
lines 37-39).
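Since the actual fix would use chi's middleware.Timeout, here is a stdlib sketch of what that middleware does (attach a deadline to the request context), runnable without the chi dependency; the route path and handler body are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// timeout mimics chi's middleware.Timeout: downstream handlers see a
// context with a deadline, so a synchronous registry pull that honors
// r.Context() is cancelled after d.
func timeout(d time.Duration) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			ctx, cancel := context.WithTimeout(r.Context(), d)
			defer cancel()
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
}

// demo pushes one request through the middleware and reports whether the
// handler observed a deadline on its context.
func demo() bool {
	var sawDeadline bool
	h := timeout(30 * time.Second)(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, sawDeadline = r.Context().Deadline() // placeholder for the content handler
	}))
	req := httptest.NewRequest("GET", "/skills/demo/hello/versions/1.0.0/content", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return sawDeadline
}

func main() {
	fmt.Println("deadline set:", demo())
}
```

With chi itself this corresponds to either `r.Use(middleware.Timeout(30*time.Second))` globally or `r.With(middleware.Timeout(30*time.Second)).Get(...)` on the content route alone.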
In `@internal/server/server.go`:
- Around line 91-93: Replace the direct comparison against the sentinel
http.ErrServerClosed with an errors.Is check: call srv.ListenAndServe(), capture
its error, and return it only if !errors.Is(err, http.ErrServerClosed); also add
"errors" to the import block. Update references around the ListenAndServe call
(the srv.ListenAndServe invocation and its surrounding error handling) to use
errors.Is(err, http.ErrServerClosed) so wrapped sentinel errors are recognized.
In `@internal/store/store.go`:
- Around line 144-157: Add a sentinel error (e.g., var ErrNotFound =
errors.New("not found")) to the package and update Store.GetSkill to return that
sentinel when no rows are found instead of fmt.Errorf; specifically, in GetSkill
(method on Store) change the "skill %s/%s not found" branch to return nil,
ErrNotFound. Also ensure callers (HTTP handlers) use errors.Is(err, ErrNotFound)
to map to 404. Keep the rest of GetSkill and querySkills behavior unchanged.
In `@Makefile`:
- Around line 22-27: The deploy target currently backgrounds the streaming
command `oc logs -f deploy/skillctl-catalog &` and uses `sleep 3` which can
leave an orphaned process and race with printing the route; change the sequence
so the route is printed before starting logs and run `oc logs -f
deploy/skillctl-catalog` in the foreground (remove the trailing `&` and the
`sleep 3`) so `make deploy` blocks while streaming logs, or alternatively remove
the follow entirely and only print the route; update the `deploy` recipe
accordingly around the `oc logs -f deploy/skillctl-catalog` and the `sleep`/echo
route lines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 6b622334-9e18-4f58-90da-45d938ba02ed
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (34)
.gitignore, Makefile, deploy/README.md, deploy/base/deployment.yaml, deploy/base/kustomization.yaml, deploy/base/pvc.yaml, deploy/base/service.yaml, deploy/base/serviceaccount.yaml, deploy/overlays/openshift/configmap.yaml, deploy/overlays/openshift/deployment-patch.yaml, deploy/overlays/openshift/kustomization.yaml, deploy/overlays/openshift/rolebinding.yaml, deploy/overlays/openshift/route.yaml, go.mod, internal/cli/pack.go, internal/cli/root.go, internal/cli/serve.go, internal/handler/health.go, internal/handler/skills.go, internal/handler/skills_test.go, internal/handler/sync.go, internal/server/router.go, internal/server/server.go, internal/server/server_test.go, internal/store/store.go, internal/store/store_test.go, internal/store/sync.go, pkg/oci/annotations.go, pkg/oci/bundle.go, pkg/oci/bundle_test.go, pkg/oci/catalog.go, pkg/oci/discovery.go, pkg/oci/discovery_test.go, pkg/oci/push.go
```go
func DetectRegistryType(registryURL string) RegistryType {
	if strings.Contains(strings.ToLower(registryURL), "quay") {
		return RegistryTypeQuay
	}
	return RegistryTypeOCI
}
```
DetectRegistryType substring match is too loose.
strings.Contains(strings.ToLower(registryURL), "quay") would misclassify quayside.corp.example.com or aquaray.io as Quay. Match against the host portion and check for quay.io, *.quay.io, or a known suffix list.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/oci/discovery.go` around lines 61 - 66, DetectRegistryType currently uses
a loose substring check on registryURL which misclassifies hosts; update
DetectRegistryType to parse the registryURL with net/url (or split host if not a
full URL), extract and normalize the host portion, then check the host exactly
equals "quay.io" or is a subdomain suffix like ".quay.io" (or match against a
maintained known-suffix list) before returning RegistryTypeQuay; otherwise
return RegistryTypeOCI. Refer to the DetectRegistryType function and ensure you
handle cases where registryURL may be just a host or include a scheme and port.
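A runnable sketch of the host-based check described above. It handles bare hosts, schemes, and ports, and matches only quay.io or *.quay.io (an enterprise deployment would extend this with a known-suffix list); the function is a stand-in, not the patched pkg/oci code:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

type RegistryType string

const (
	RegistryTypeOCI  RegistryType = "oci"
	RegistryTypeQuay RegistryType = "quay"
)

// detectRegistryType parses out the host portion and matches it exactly
// against quay.io or a *.quay.io suffix, so hosts like
// quayside.corp.example.com stay classified as generic OCI.
func detectRegistryType(registryURL string) RegistryType {
	raw := registryURL
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // allow bare host[:port] inputs
	}
	u, err := url.Parse(raw)
	if err != nil {
		return RegistryTypeOCI
	}
	host := strings.ToLower(u.Hostname()) // strips any port
	if host == "quay.io" || strings.HasSuffix(host, ".quay.io") {
		return RegistryTypeQuay
	}
	return RegistryTypeOCI
}

func main() {
	fmt.Println(detectRegistryType("quayside.corp.example.com")) // oci
}
```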
```go
q := url.Values{}
q.Set("namespace", d.namespace)
q.Set("public", "true")
apiURL := base + "/api/v1/repository?" + q.Encode()
```
public=true silently excludes private repositories.
The query forces only public repos to be returned, even when the user has supplied credentials capable of seeing private ones. For an enterprise Quay deployment this is likely the wrong default. Consider omitting public (Quay then returns repos visible to the auth context) or adding a flag.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/oci/discovery.go` around lines 104 - 107, The request URL is forcing
public-only results by unconditionally setting q.Set("public","true"); change
this so private repos visible to the auth context are not excluded—either remove
the "public" query param entirely (so Quay returns repos based on auth) or make
it conditional via a new config/field (e.g., a boolean on the discovery struct
like includePublicOnly or publicOnly) and only set q.Set("public","true") when
that flag is true; update the repository-building code that references
d.namespace and apiURL (in pkg/oci/discovery.go) to use the new behavior and
adjust any tests or callers that expect the old public-only semantics.
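A sketch of the conditional flag approach; `discovery` and its `publicOnly` field are hypothetical stand-ins for the real adapter struct:

```go
package main

import (
	"fmt"
	"net/url"
)

// discovery stands in for the Quay discovery adapter; publicOnly is the
// suggested new field.
type discovery struct {
	namespace  string
	publicOnly bool
}

// listURL builds the repository query, forcing public=true only when
// explicitly requested, so authenticated callers also see private repos.
func (d *discovery) listURL(base string) string {
	q := url.Values{}
	q.Set("namespace", d.namespace)
	if d.publicOnly {
		q.Set("public", "true")
	}
	return base + "/api/v1/repository?" + q.Encode()
}

func main() {
	d := &discovery{namespace: "acme"}
	fmt.Println(d.listURL("https://quay.example.com"))
	d.publicOnly = true
	fmt.Println(d.listURL("https://quay.example.com"))
}
```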
```go
var result struct {
	Repositories []struct {
		Namespace string `json:"namespace"`
		Name      string `json:"name"`
	} `json:"repositories"`
}
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
	return nil, fmt.Errorf("decoding quay API response: %w", err)
}

repos := make([]string, 0, len(result.Repositories))
for _, r := range result.Repositories {
	repos = append(repos, r.Namespace+"/"+r.Name)
}
return repos, nil
```
🧩 Analysis chain
🌐 Web query:
Quay API /api/v1/repository pagination next_page parameter
💡 Result:
Quay's repository listing endpoint (GET /api/v1/repository) paginates using an opaque cursor token.
- Request query parameter: next_page (optional)
- Meaning: "The page token for the next page".
- How to paginate:
  1) Call GET https://<quay-host>/api/v1/repository with any required filters (for example, namespace, public/starred, etc.).
  2) If there is more data, the JSON response includes a next_page field containing an opaque token.
  3) Send another request with ?next_page=<token> appended.
  4) Repeat using the returned next_page value until the response no longer includes next_page.
- Example (from Red Hat Quay docs for the same pagination pattern):
  - First page: GET …/api/v1/superuser/logs
  - Second page: GET …/api/v1/superuser/logs?next_page=<token>
- Notes:
  - The token is opaque (not a numeric page); treat next_page as a cursor value exactly as returned by the API.
  - For the repository list specifically, the API reference documents next_page as the "page token for the next page" under GET /api/v1/repository.
Citations:
- 1: https://docs.redhat.com/en/documentation/red_hat_quay/3.15/html/red_hat_quay_api_reference/repository
- 2: https://docs.redhat.com/en/documentation/red_hat_quay/3.7/html/use_red_hat_quay/using_the_red_hat_quay_api
- 3: https://docs.redhat.com/en/documentation/red_hat_quay/3.10/html/use_red_hat_quay/using_the_red_hat_quay_api
Add pagination support to handle orgs with more than ~50 repositories.
Quay's /api/v1/repository endpoint paginates using a next_page cursor token. The current code only decodes the first page and returns, silently truncating results for larger organizations. Add a loop that continues fetching pages while next_page is present in the response. The decoded struct must include a NextPage string field to capture the pagination token.
Pagination loop sketch

```go
pageToken := ""
for {
	q := url.Values{}
	q.Set("namespace", d.namespace)
	q.Set("public", "true")
	if pageToken != "" {
		q.Set("next_page", pageToken)
	}
	apiURL := base + "/api/v1/repository?" + q.Encode()
	// ... build req, do, decode into result (including result.NextPage) ...
	for _, r := range result.Repositories {
		repos = append(repos, r.Namespace+"/"+r.Name)
	}
	if result.NextPage == "" {
		break
	}
	pageToken = result.NextPage
}
```

The decoded struct should add a `NextPage string` field tagged `json:"next_page"`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/oci/discovery.go` around lines 138 - 152, The current decode only reads
the first page; modify the logic in pkg/oci/discovery.go around the result
decoding to support Quay pagination by adding NextPage string `json:"next_page"`
to the anonymous result struct and wrapping the request+decode in a loop: decode
each page into result, append each result.Repositories entries into repos (using
the existing repos slice logic that builds Namespace+"/"+Name), then if
result.NextPage is empty break, otherwise issue a new request for the next page
(pass the next_page token as the query parameter) and continue until no
next_page is returned; finally return the accumulated repos. Ensure you reuse
resp handling/closing and error handling (the existing json decode error path)
inside the loop.
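The loop the prompt describes can also be sketched as a self-contained program against a stub Quay API; `listAllRepos` is a simplified stand-in for the discovery adapter (no auth headers, default HTTP client):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

type repoPage struct {
	Repositories []struct {
		Namespace string `json:"namespace"`
		Name      string `json:"name"`
	} `json:"repositories"`
	NextPage string `json:"next_page"`
}

// listAllRepos follows Quay's opaque next_page cursor until the response
// no longer carries one, accumulating namespace/name pairs.
func listAllRepos(base, namespace string) ([]string, error) {
	var repos []string
	pageToken := ""
	for {
		q := url.Values{}
		q.Set("namespace", namespace)
		if pageToken != "" {
			q.Set("next_page", pageToken)
		}
		resp, err := http.Get(base + "/api/v1/repository?" + q.Encode())
		if err != nil {
			return nil, err
		}
		var result repoPage
		decErr := json.NewDecoder(resp.Body).Decode(&result)
		resp.Body.Close()
		if decErr != nil {
			return nil, fmt.Errorf("decoding quay API response: %w", decErr)
		}
		for _, r := range result.Repositories {
			repos = append(repos, r.Namespace+"/"+r.Name)
		}
		if result.NextPage == "" {
			return repos, nil
		}
		pageToken = result.NextPage
	}
}

// demoList exercises the loop against a two-page stub server.
func demoList() ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("next_page") == "" {
			fmt.Fprint(w, `{"repositories":[{"namespace":"acme","name":"a"}],"next_page":"tok1"}`)
			return
		}
		fmt.Fprint(w, `{"repositories":[{"namespace":"acme","name":"b"}]}`)
	}))
	defer srv.Close()
	return listAllRepos(srv.URL, "acme")
}

func main() {
	fmt.Println(demoList())
}
```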
Deployment:
- Use Recreate strategy to avoid RWO PVC hang on RollingUpdate
- Fix README examples: use --namespace, oc patch instead of dry-run

Reliability:
- Close db on error in store.New() (resource leak)
- Validate SyncInterval > 0 before creating ticker (panic guard)
- Skip DeleteStale when no skills indexed and fetch errors occurred
- Add 30s request timeout on content retrieval route
- Limit manifest reads to 4 MiB (OOM protection)
- Validate bundleName from filepath.Base (reject "." and "..")
- Normalize registry URL scheme in Quay discovery adapter

Correctness:
- ORDER BY created DESC instead of version (semver lexical misorder)
- Tag filter: match JSON-quoted tokens to avoid false positives
- Remove duplicate AnnotationStatus, use lifecycle.StatusAnnotation
- Use errors.Is for http.ErrServerClosed comparison
- Return store.ErrNotFound sentinel, map to 404 in handler
- Validate --registry-type flag against allowed values
- Trim whitespace in --repositories values

Housekeeping:
- go mod tidy (chi as direct dependency)
- Fix Makefile deploy target orphaned background process

Deferred (tracked):
- Content handler caching: issue #12
- Quay pagination: most orgs <100 repos

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Pavel Anni <panni@redhat.com>
Summary
- Catalog server (`skillctl serve`): read-only HTTP server that indexes skills from an OCI registry into SQLite and serves them via REST API with filtering, pagination, and search
- Bundle support (`skillctl pack --bundle`): pack a directory of multiple skill subdirectories into a single OCI image for deploying curated skill sets
- Skill content retrieval via `/api/v1/skills/{ns}/{name}/versions/{ver}/content`

API endpoints
- `/api/v1/skills`
- `/api/v1/skills/{ns}/{name}`
- `/api/v1/skills/{ns}/{name}/versions`
- `/api/v1/skills/{ns}/{name}/versions/{ver}/content`
- `/api/v1/sync`
- `/healthz`

New dependencies
- `modernc.org/sqlite` — pure-Go SQLite (no CGO, cross-compiles, works in scratch containers)
- `github.com/go-chi/chi/v5` — HTTP router with middleware

Test plan
- `skillctl serve --help` shows flags
- `skillctl pack --bundle` end-to-end
- `golangci-lint run` — 0 issues

Summary by CodeRabbit
Release Notes
New Features
Documentation
Infrastructure
`serve` command to run the catalog server