Introduce basic reconciliation loop for lightspeed-stack based deployment#94
Conversation
Remove controller logic (funcs.go, ols_install.go, openstacklightspeed_controller.go) and RBAC permissions (OLSConfig access, OLM operator resources) that are no longer needed. This prepares the operator for migration to lightspeed-core by stripping out all OLS-specific functionality.
|
Skipping CI for Draft Pull Request. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughReplaces in-cluster OLS operator lifecycle with in-repo reconciliation for Postgres and LCore; centralizes controller constants/helpers/errors; changes working namespace from Changes
Sequence DiagramsequenceDiagram
participant Recon as Reconciler
participant PR as PostgresReconciler
participant PD as PostgresDeployment
participant LR as LCoreReconciler
participant LD as LCoreDeployment
participant K8s as KubernetesAPI
Recon->>PR: ReconcilePostgresResources(h, ctx, instance)
PR->>K8s: Create/Patch ConfigMap, Secret(s), NetworkPolicy
Recon->>PD: ReconcilePostgresDeployment(h, ctx, instance)
PD->>K8s: Create/Patch Deployment (podTemplate, TLS mounts)
PD->>K8s: Create/Patch Service (ClusterIP)
PD->>K8s: Poll for TLS Secret
Recon->>LR: ReconcileLCoreResources(h, ctx, instance)
LR->>K8s: Create/Patch ServiceAccount, ClusterRole/Binding, ConfigMaps, NetworkPolicy
Recon->>LD: ReconcileLCoreDeployment(h, ctx, instance)
LD->>K8s: Create/Patch Deployment (llama-stack + lightspeed-service)
LD->>K8s: Create/Patch Service (HTTPS)
LD->>K8s: Poll for TLS Secret
Recon->>K8s: Update OpenStackLightspeed status (Ready/conditions)
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/test all |
lightspeed-stack based deploymentlightspeed-stack based deployment
06822e8 to
7936551
Compare
|
/test all |
7936551 to
0431153
Compare
|
/test all |
7833d19 to
efee6ac
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
/test all |
There was a problem hiding this comment.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
config/manifests/bases/openstack-lightspeed-operator.clusterserviceversion.yaml (1)
155-159:⚠️ Potential issue | 🟠 Major
relatedImagesis still incomplete for the new runtime.The CSV only declares
operatorandrag-content, but the operator now deploys PostgreSQL, lightspeed-stack, and Llama Stack workloads. In disconnected installs OLM mirrors onlyrelatedImages, so those pods can end up pulling images the bundle never advertised.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/manifests/bases/openstack-lightspeed-operator.clusterserviceversion.yaml` around lines 155 - 159, The CSV's relatedImages section only lists operator and rag-content, causing missing mirrored images for PostgreSQL, lightspeed-stack, and Llama Stack; update the relatedImages array to include all container images the operator deploys (e.g., PostgreSQL image(s), lightspeed-stack image(s), Llama Stack image(s)) with descriptive name keys so OLM will mirror them; locate the relatedImages block (symbol: relatedImages) and add additional entries similar to the existing ones (image: <registry>/<repo>:<tag>, name: <unique-name>) for each runtime component the operator creates.internal/controller/ocp_version.go (1)
228-243:⚠️ Potential issue | 🟠 MajorReject unsupported
ocpRAGVersionOverridevalues.Any non-empty override is accepted here, so values like
4.17orfooget written intoActiveOCPRAGVersionand then used to build a non-existent index path/ID.resolveOCPVersion()is already prepared to surface an invalid-override condition, but this branch never returns that error.🔧 Suggested validation
func ResolveOCPVersion(detectedVersion, overrideVersion string, enableOCPRAG bool) (string, bool, error) { if !enableOCPRAG { return "", false, nil } // Use override if provided if overrideVersion != "" { + if !IsSupportedOCPVersion(overrideVersion) { + return "", false, fmt.Errorf( + "unsupported OCP RAG version override %q (supported: %v)", + overrideVersion, + SupportedOCPVersions, + ) + } return overrideVersion, false, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/ocp_version.go` around lines 228 - 243, The current branch in resolveOCPVersion accepts any non-empty overrideVersion; instead validate overrides before accepting them: when overrideVersion != "" call IsSupportedOCPVersion(overrideVersion) (or re-use resolveOCPVersion’s existing validation logic) and only return overrideVersion, false, nil if it is supported; otherwise return an error indicating the invalid override (e.g., fmt.Errorf("invalid ocpRAGVersionOverride: %s", overrideVersion)). Ensure you reference overrideVersion and IsSupportedOCPVersion/resolveOCPVersion so invalid values like "4.17" or "foo" are rejected rather than written into ActiveOCPRAGVersion.
🧹 Nitpick comments (10)
hack/env.sh (1)
2-4: Pin immutable image refs in shipping manifests.The three image defaults in lines 2–4 propagate directly into
config/manager/manager.yaml:75–80and the operator bundle (bundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yaml:313–318). Using floating:latesttags makes operator releases non-reproducible since the same build tag can pull different images at deploy time. The codebase already uses fixed tags elsewhere (e.g.,OPENSTACK_LIGHTSPEED_IMAGE_URL_DEFAULTusesos-docs-2025.2); apply the same pattern to the LCore, Exporter, and Postgres image defaults.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hack/env.sh` around lines 2 - 4, Replace the floating :latest tags with immutable, fixed tags for the three image default environment variables to make releases reproducible: update RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT, RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT, and RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT to point at specific, versioned image tags (matching the project's tagging pattern such as the fixed tag used for OPENSTACK_LIGHTSPEED_IMAGE_URL_DEFAULT) so that the values propagated into manager.yaml and the operator bundle are immutable.scripts/confirm-rhosls-running.sh (1)
3-11: Add a bounded timeout for the polling loop.
while truecan hang forever if the Subscription never reportsinstalledCSV, which can block CI/jobs indefinitely.Suggested fix
-while true; do +max_attempts=60 +attempt=0 +while [ "$attempt" -lt "$max_attempts" ]; do csv=$(oc get subscription openstack-lightspeed-operator -n openstack-lightspeed -o jsonpath='{.status.installedCSV}' 2>/dev/null) if [ -n "$csv" ]; then echo "Found installedCSV: $csv" break fi echo "Waiting for openstack-lightspeed-operator Subscription installedCSV to be populated ..." sleep 5 + attempt=$((attempt + 1)) done + +if [ -z "$csv" ]; then + echo "Timed out waiting for installedCSV on Subscription/openstack-lightspeed-operator in namespace openstack-lightspeed" >&2 + exit 1 +fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/confirm-rhosls-running.sh` around lines 3 - 11, The polling loop in scripts/confirm-rhosls-running.sh uses while true and can hang; modify the loop that reads the csv variable (csv=$(oc get subscription ... installedCSV)) to implement a bounded timeout by introducing a retry counter or deadline (e.g., max_retries or timeout_seconds), incrementing it each sleep and breaking with a non-zero exit and an error message if the limit is reached, otherwise continue polling as before; ensure the error message includes context about the openstack-lightspeed-operator Subscription and the final observed state (or lack thereof) before exiting.config/manager/manager.yaml (1)
76-80: Prefer immutable image refs over:latestdefaults.Three default container images use mutable
:latesttags:RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT,RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT, andRELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT. This weakens reproducibility and can cause unexpected runtime drift across environments.Since
RELATED_IMAGE_OPENSTACK_LIGHTSPEED_IMAGE_URL_DEFAULTalready uses a versioned tag (os-docs-2025.2), align the other defaults to pinned versions or digests for consistency:
quay.io/lightspeed-core/lightspeed-stack:latest→ versioned tagquay.io/lightspeed-core/lightspeed-to-dataverse-exporter:latest→ versioned tagregistry.redhat.io/rhel9/postgresql-16:latest→ versioned tagNote: Users can override these via
RELATED_IMAGE_*_URL_DEFAULTenvironment variables, but hardcoded defaults should follow best practices.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/manager/manager.yaml` around lines 76 - 80, Replace the mutable :latest defaults with immutable, versioned tags or digests for the three environment variables to ensure reproducible images: update RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT (currently quay.io/lightspeed-core/lightspeed-stack:latest), RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT (quay.io/lightspeed-core/lightspeed-to-dataverse-exporter:latest), and RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT (registry.redhat.io/rhel9/postgresql-16:latest) to pinned tags or SHA digests (matching the project's release versions, e.g., use the same style as RELATED_IMAGE_OPENSTACK_LIGHTSPEED_IMAGE_URL_DEFAULT's os-docs-2025.2), keeping the same env var names so users can still override them.internal/controller/assets/postgres_bootstrap.sh (1)
1-23: Quote all-darguments and add strict mode for robustness.The script uses unquoted variable expansions in six
-dflag invocations (lines 12, 15, 18, 21–23), which can fail with unexpected input or when variables contain whitespace. Addset -euo pipefailand validate thePOSTGRESQL_DATABASEenvironment variable to prevent silent failures.Suggested refactor
#!/bin/bash +set -euo pipefail +: "${POSTGRESQL_DATABASE:?POSTGRESQL_DATABASE must be set}" @@ -echo "SELECT 'CREATE DATABASE $DB_NAME' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '$DB_NAME')\gexec" | _psql -d $POSTGRESQL_DATABASE +echo "SELECT 'CREATE DATABASE $DB_NAME' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '$DB_NAME')\gexec" | _psql -d "$POSTGRESQL_DATABASE" @@ -echo "CREATE EXTENSION IF NOT EXISTS pg_trgm;" | _psql -d $POSTGRESQL_DATABASE +echo "CREATE EXTENSION IF NOT EXISTS pg_trgm;" | _psql -d "$POSTGRESQL_DATABASE" @@ -echo "CREATE EXTENSION IF NOT EXISTS pg_trgm;" | _psql -d $DB_NAME +echo "CREATE EXTENSION IF NOT EXISTS pg_trgm;" | _psql -d "$DB_NAME" @@ -echo "CREATE SCHEMA IF NOT EXISTS lcore;" | _psql -d $POSTGRESQL_DATABASE -echo "CREATE SCHEMA IF NOT EXISTS quota;" | _psql -d $POSTGRESQL_DATABASE -echo "CREATE SCHEMA IF NOT EXISTS conversation_cache;" | _psql -d $POSTGRESQL_DATABASE +echo "CREATE SCHEMA IF NOT EXISTS lcore;" | _psql -d "$POSTGRESQL_DATABASE" +echo "CREATE SCHEMA IF NOT EXISTS quota;" | _psql -d "$POSTGRESQL_DATABASE" +echo "CREATE SCHEMA IF NOT EXISTS conversation_cache;" | _psql -d "$POSTGRESQL_DATABASE"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/assets/postgres_bootstrap.sh` around lines 1 - 23, Enable strict mode and validate POSTGRESQL_DATABASE, then quote all -d arguments and any unquoted variable expansions to avoid word-splitting; specifically add set -euo pipefail at the top of the script, check that POSTGRESQL_DATABASE is set and non-empty before proceeding, and update all uses of the helper _psql (calls like _psql -d $POSTGRESQL_DATABASE and _psql -d $DB_NAME) to quote the database arguments (e.g., _psql -d "$POSTGRESQL_DATABASE" and _psql -d "$DB_NAME") as well as any other unquoted variable expansions in the file.test/kuttl/common/openstack-lightspeed-instance/assert-openstack-lightspeed-instance.yaml (1)
5-123: This resource-creation assertion still misses several new secret resources.
lightspeed-postgres-secret,lightspeed-postgres-certs, andlightspeed-tlsare part of the new managed surface, but this file only checks the bootstrap secret. A regression in password or cert reconciliation would currently leave the resource-creation scenario green.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/kuttl/common/openstack-lightspeed-instance/assert-openstack-lightspeed-instance.yaml` around lines 5 - 123, The assertion file currently only verifies the bootstrap secret (lightspeed-postgres-bootstrap) and misses the new Secrets required by the managed surface; add assertions for the missing secrets lightspeed-postgres-secret, lightspeed-postgres-certs, and lightspeed-tls (each as apiVersion: v1 kind: Secret with metadata.name set and namespace: openstack-lightspeed) so the test will fail if password or cert reconciliation does not create/update them; ensure the new Secret entries follow the same structure/namespace as the existing lightspeed-postgres-bootstrap entry and are included alongside the other resources (ConfigMap/Service/Deployment) in the same YAML manifest.bundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yaml (1)
313-318: Consider pinning image tags for production stability.The container images use
:latesttags (lightspeed-stack:latest,lightspeed-to-dataverse-exporter:latest,postgresql-16:latest). While acceptable during alpha development, pinning to specific versions improves reproducibility and prevents unexpected changes in production.Also applies to: 416-421
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yaml` around lines 313 - 318, Replace the use of floating :latest tags with pinned, immutable versions for the RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT, RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT, and RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT environment values (and the equivalent entries later in the file) by specifying exact semantic versions or digests (e.g., vX.Y.Z or `@sha256`:...). Update those variables (RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT, RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT, RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT) to point to the chosen fixed tags/digests and ensure the same pinned values are applied to the duplicate entries referenced in the later block.internal/controller/lcore_deployment.go (2)
36-36: Consider preallocating slices for minor efficiency gain.Similar to the Postgres deployment builder, slices could be preallocated.
♻️ Proposed fixes
- volumes := []corev1.Volume{} + volumes := make([]corev1.Volume, 0, 8)- llamaStackMounts := []corev1.VolumeMount{llamaMount} - llamaStackMounts = append(llamaStackMounts, sharedMounts...) - llamaStackMounts = append(llamaStackMounts, llamaCacheMounts...) + llamaStackMounts := make([]corev1.VolumeMount, 0, 1+len(sharedMounts)+len(llamaCacheMounts)) + llamaStackMounts = append(llamaStackMounts, llamaMount) + llamaStackMounts = append(llamaStackMounts, sharedMounts...) + llamaStackMounts = append(llamaStackMounts, llamaCacheMounts...)- lightspeedStackMounts := []corev1.VolumeMount{lcoreMount} - lightspeedStackMounts = append(lightspeedStackMounts, sharedMounts...) + lightspeedStackMounts := make([]corev1.VolumeMount, 0, 1+len(sharedMounts)) + lightspeedStackMounts = append(lightspeedStackMounts, lcoreMount) + lightspeedStackMounts = append(lightspeedStackMounts, sharedMounts...)Also applies to: 66-68, 92-93
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/lcore_deployment.go` at line 36, Several slices in internal/controller/lcore_deployment.go are created with zero-length literals (e.g., volumes := []corev1.Volume{} and the slices at the other noted locations); preallocate them using make with an appropriate capacity to avoid repeated reallocations. Locate the slice declarations (volumes, volumeMounts, envVars or similarly named variables) in the deployment builder functions and replace the zero-literal with make(..., 0, expectedCount) where expectedCount is the number of elements you will append (compute or sum known contributors such as PVC mounts, configMaps, secrets, and default mounts) so each slice is created with enough capacity up front.
427-455: Liveness and readiness probes have identical configuration.Both probes use the same TCP socket check with identical timing parameters. Consider extracting to a single builder if this is intentional, or differentiate the timing if readiness should respond faster.
♻️ Example: Single probe builder with type parameter
// buildLightspeedStackProbe returns a probe for the lightspeed-stack container. func buildLightspeedStackProbe() *corev1.Probe { return &corev1.Probe{ ProbeHandler: corev1.ProbeHandler{ TCPSocket: &corev1.TCPSocketAction{ Port: intstr.FromInt32(OpenStackLightspeedAppServerContainerPort), }, }, InitialDelaySeconds: 30, PeriodSeconds: 10, TimeoutSeconds: 5, FailureThreshold: 3, } } // Usage: // LivenessProbe: buildLightspeedStackProbe(), // ReadinessProbe: buildLightspeedStackProbe(),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/lcore_deployment.go` around lines 427 - 455, The liveness and readiness probe functions buildLightspeedStackLivenessProbe and buildLightspeedStackReadinessProbe are identical; refactor by extracting a single builder (e.g., buildLightspeedStackProbe or buildLightspeedStackProbe(isReadiness bool)) that constructs the corev1.Probe using OpenStackLightspeedAppServerContainerPort and then have both LivenessProbe and ReadinessProbe call that shared builder (or call with a flag to tweak timing like shorter InitialDelaySeconds/PeriodSeconds for readiness if desired).internal/controller/postgres_deployment.go (1)
35-36: Consider preallocating slices for minor efficiency gain.The
volumesandvolumeMountsslices could be preallocated with known capacity to avoid reallocations.♻️ Proposed fix
- volumes := []corev1.Volume{} - volumeMounts := []corev1.VolumeMount{} + volumes := make([]corev1.Volume, 0, 7) + volumeMounts := make([]corev1.VolumeMount, 0, 7)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/postgres_deployment.go` around lines 35 - 36, Replace the zero-length slice literals for volumes and volumeMounts with capacity-preallocated slices to avoid reallocations: change the declarations of volumes and volumeMounts (the variables named "volumes" and "volumeMounts") to use make with an appropriate capacity (e.g., make([]corev1.Volume, 0, expectedCount) and make([]corev1.VolumeMount, 0, expectedCount)) in the same function where they are declared so subsequent appends do not reallocate.internal/controller/lcore_reconciler.go (1)
401-415: Avoid waiting up to 60s inside the reconcile worker.This poll blocks a worker while service-ca populates the Secret. When that controller is slow or absent, the operator spends most of the reconcile sleeping instead of draining the queue; returning a transient not-ready error / requeue is safer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/lcore_reconciler.go` around lines 401 - 415, Remove the blocking wait.PollUntilContextTimeout call and instead perform a single immediate check using h.GetClient().Get(ctx, secretKey, secret); if the Secret is NotFound or missing "tls.key"/"tls.crt" return a transient not-ready response that requeues the reconcile (e.g., return a ctrl.Result with RequeueAfter a short delay) rather than sleeping the worker; keep using ErrGetTLSSecret only for permanent failures (wrap/return it on non-transient errors from h.GetClient().Get) and do not block inside the reconcile loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/build-and-push.yaml:
- Around line 35-37: Replace the hardcoded LATEST_TAG value so the tag is
branch-aware: instead of LATEST_TAG=lcore-migration-latest set LATEST_TAG to use
the GitHub Actions branch name (github.ref_name) so the tag becomes unique per
branch (e.g., lcore-migration-${{ github.ref_name }}); update the environment
variable assignment that defines LATEST_TAG in the workflow to reference the
github context (github.ref_name) so promotion pushes won’t overwrite a shared
"latest" tag across branches.
In
`@bundle/manifests/openstack-lightspeed-operator-manager-rolebinding_rbac.authorization.k8s.io_v1_rolebinding.yaml`:
- Around line 2-12: The manifest is using RoleBinding with roleRef.kind: Role to
reference manager-role, but manager-role is a ClusterRole; update the resource
to ClusterRoleBinding and set roleRef.kind to ClusterRole (keep apiGroup:
rbac.authorization.k8s.io and roleRef.name: manager-role and the existing
metadata/name), ensuring the binding type matches the ClusterRole defined in
config/rbac/role.yaml so permissions are granted cluster-wide.
In `@config/rbac/role.yaml`:
- Around line 115-124: The ClusterRole/ClusterRoleBinding RBAC is missing write
verbs needed for reconciliation: update/patch; update the role YAML where
ClusterRole and ClusterRoleBinding resources are listed so they include "update"
and "patch" in verbs (in addition to create/delete/get/list/watch) to allow
controllerutil.CreateOrPatch in internal/controller/lcore_reconciler.go to
modify existing objects during future reconciles.
In `@internal/controller/common.go`:
- Around line 65-72: getConfigMapResourceVersion currently reads the ConfigMap
via the cached manager client (h.GetClient()), which can return stale data;
change it to use the direct/raw client provided by the existing getRawClient
helper (or h.GetRawClient() if available) so the lookup reads from the API
server directly, i.e., locate getConfigMapResourceVersion and replace the client
acquisition to call getRawClient (or h.GetRawClient) before performing the Get
to return the current cm.ResourceVersion.
In `@internal/controller/llama_stack_config.go`:
- Around line 126-140: Update the inline comment in the switch case handling
provider.Type == "openai" / "rhoai_vllm" / "rhelai_vllm" to remove the incorrect
statement that remote::openai "validates against OpenAI model whitelist" and
instead state that remote::openai accepts arbitrary OpenAI-compatible model IDs
when a custom base_url is configured (i.e., both remote::openai and remote::vllm
can use custom model names), keeping the rest of the logic using providerConfig
and config (and envVarName) unchanged.
In `@internal/controller/openstacklightspeed_controller.go`:
- Around line 167-188: The code marks OpenStackLightspeedReadyCondition true
immediately after ReconcileTasks returns, but ReconcilePostgresDeployment and
ReconcileLCoreDeployment only ensure resources exist, not that their Deployments
are actually available; change the logic so MarkTrue is only called after
verifying workload readiness: after ReconcileTasks succeeds, query the relevant
Deployments created/managed by ReconcilePostgresDeployment and
ReconcileLCoreDeployment (or enhance those functions to return readiness), check
Deployment.status.availableReplicas (or use a helper like
WaitForDeploymentReady) and only call
instance.Status.Conditions.MarkTrue(OpenStackLightspeedReadyCondition,
OpenStackLightspeedReadyMessage) when the Deployments report available replicas
> 0 (else set a False condition with appropriate reason/message and requeue).
- Around line 58-65: RBAC markers are missing verbs required by
controllerutil.CreateOrPatch and client.DeleteAllOf; update the kubebuilder RBAC
comments for the reconciled resources (configmaps, secrets, services,
serviceaccounts, deployments, networkpolicies) to include "patch" and add
"deletecollection" to the clusterroles and clusterrolebindings markers so
CreateOrPatch and DeleteAllOf calls succeed; search for usages of
controllerutil.CreateOrPatch and client.DeleteAllOf (e.g., in
OpenStackLightspeedReconciler methods) and ensure the related resource comment
lines (the +kubebuilder:rbac markers) include the "patch" and "deletecollection"
verbs accordingly.
In `@internal/controller/postgres_reconciler.go`:
- Around line 240-266: The code currently compares and stores the observed
ConfigMap version on Deployment.Annotations which causes the podTemplate to flip
the ForceReloadAnnotationKey twice (once when adding the timestamp then again
when removing it) and triggers two rollouts; instead, read the stored version
from deployment.Spec.Template.Annotations (use
deployment.Spec.Template.Annotations[PostgresConfigMapResourceVersionAnnotation]
to compute configMapChanged) and write the currentConfigMapVersion to
podTemplateSpec.Annotations (i.e. set
podTemplateSpec.Annotations[PostgresConfigMapResourceVersionAnnotation] =
currentConfigMapVersion) so the version lives inside the pod template; keep
using buildPostgresPodTemplateSpec(configMapChanged) and remove the final
assignment to deployment.Annotations (or stop writing
PostgresConfigMapResourceVersionAnnotation to deployment.Annotations) so the
observed version is only persisted on the pod template and avoids a second
rollout.
In `@scripts/confirm-rhosls-running.sh`:
- Line 14: The oc wait invocation uses an unquoted variable ($csv) which can
break on whitespace or glob characters; edit the command that calls oc wait csv
$csv --for=jsonpath='{.status.phase}'=Succeeded --timeout=300s -n
openstack-lightspeed and wrap the $csv expansion in double quotes to prevent
word-splitting and globbing so the CSV name is passed as a single safe argument.
In
`@test/kuttl/common/openstack-lightspeed-instance/assert-lightspeed-stack-config.yaml`:
- Around line 57-59: Remove the instruction string that mandates the assistant
to claim it "always have the most recent information" (the line beginning "Never
mention your last update date or knowledge cutoff. You always have the most
recent information...") and replace it with a neutral assertion that the
assistant should rely on provided context and admit uncertainty when
documentation or cluster data may be stale; update the assertion text to
instruct the assistant to reference only supplied context and to state when it
cannot verify freshness instead of asserting guaranteed up-to-date knowledge.
In
`@test/kuttl/tests/update-openstacklightspeed/06-update-openstack-lightspeed-instance.yaml`:
- Around line 3-10: The Secret manifest for the resource named
"openstack-lightspeed-apitoken-update" contains a duplicated "type" key; remove
the extra "type: Opaque" so there is only a single "type" field in the Secret
definition (keep one "type: Opaque" and ensure "stringData: apitoken: xxx",
metadata name "openstack-lightspeed-apitoken-update" and namespace
"openstack-lightspeed" remain unchanged).
---
Outside diff comments:
In
`@config/manifests/bases/openstack-lightspeed-operator.clusterserviceversion.yaml`:
- Around line 155-159: The CSV's relatedImages section only lists operator and
rag-content, causing missing mirrored images for PostgreSQL, lightspeed-stack,
and Llama Stack; update the relatedImages array to include all container images
the operator deploys (e.g., PostgreSQL image(s), lightspeed-stack image(s),
Llama Stack image(s)) with descriptive name keys so OLM will mirror them; locate
the relatedImages block (symbol: relatedImages) and add additional entries
similar to the existing ones (image: <registry>/<repo>:<tag>, name:
<unique-name>) for each runtime component the operator creates.
In `@internal/controller/ocp_version.go`:
- Around line 228-243: The current branch in resolveOCPVersion accepts any
non-empty overrideVersion; instead validate overrides before accepting them:
when overrideVersion != "" call IsSupportedOCPVersion(overrideVersion) (or
re-use resolveOCPVersion’s existing validation logic) and only return
overrideVersion, false, nil if it is supported; otherwise return an error
indicating the invalid override (e.g., fmt.Errorf("invalid
ocpRAGVersionOverride: %s", overrideVersion)). Ensure you reference
overrideVersion and IsSupportedOCPVersion/resolveOCPVersion so invalid values
like "4.17" or "foo" are rejected rather than written into ActiveOCPRAGVersion.
---
Nitpick comments:
In `@bundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yaml`:
- Around line 313-318: Replace the use of floating :latest tags with pinned,
immutable versions for the RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT,
RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT, and
RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT environment values (and the equivalent
entries later in the file) by specifying exact semantic versions or digests
(e.g., vX.Y.Z or `@sha256`:...). Update those variables
(RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT,
RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT,
RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT) to point to the chosen fixed
tags/digests and ensure the same pinned values are applied to the duplicate
entries referenced in the later block.
In `@config/manager/manager.yaml`:
- Around line 76-80: Replace the mutable :latest defaults with immutable,
versioned tags or digests for the three environment variables to ensure
reproducible images: update RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT (currently
quay.io/lightspeed-core/lightspeed-stack:latest),
RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT
(quay.io/lightspeed-core/lightspeed-to-dataverse-exporter:latest), and
RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT
(registry.redhat.io/rhel9/postgresql-16:latest) to pinned tags or SHA digests
(matching the project's release versions, e.g., use the same style as
RELATED_IMAGE_OPENSTACK_LIGHTSPEED_IMAGE_URL_DEFAULT's os-docs-2025.2), keeping
the same env var names so users can still override them.
In `@hack/env.sh`:
- Around line 2-4: Replace the floating :latest tags with immutable, fixed tags
for the three image default environment variables to make releases reproducible:
update RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT,
RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT, and
RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT to point at specific, versioned image
tags (matching the project's tagging pattern such as the fixed tag used for
OPENSTACK_LIGHTSPEED_IMAGE_URL_DEFAULT) so that the values propagated into
manager.yaml and the operator bundle are immutable.
In `@internal/controller/assets/postgres_bootstrap.sh`:
- Around line 1-23: Enable strict mode and validate POSTGRESQL_DATABASE, then
quote all -d arguments and any unquoted variable expansions to avoid
word-splitting; specifically add set -euo pipefail at the top of the script,
check that POSTGRESQL_DATABASE is set and non-empty before proceeding, and
update all uses of the helper _psql (calls like _psql -d $POSTGRESQL_DATABASE
and _psql -d $DB_NAME) to quote the database arguments (e.g., _psql -d
"$POSTGRESQL_DATABASE" and _psql -d "$DB_NAME") as well as any other unquoted
variable expansions in the file.
In `@internal/controller/lcore_deployment.go`:
- Line 36: Several slices in internal/controller/lcore_deployment.go are created
with zero-length literals (e.g., volumes := []corev1.Volume{} and the slices at
the other noted locations); preallocate them using make with an appropriate
capacity to avoid repeated reallocations. Locate the slice declarations
(volumes, volumeMounts, envVars or similarly named variables) in the deployment
builder functions and replace the zero-literal with make(..., 0, expectedCount)
where expectedCount is the number of elements you will append (compute or sum
known contributors such as PVC mounts, configMaps, secrets, and default mounts)
so each slice is created with enough capacity up front.
- Around line 427-455: The liveness and readiness probe functions
buildLightspeedStackLivenessProbe and buildLightspeedStackReadinessProbe are
identical; refactor by extracting a single builder (e.g.,
buildLightspeedStackProbe or buildLightspeedStackProbe(isReadiness bool)) that
constructs the corev1.Probe using OpenStackLightspeedAppServerContainerPort and
then have both LivenessProbe and ReadinessProbe call that shared builder (or
call with a flag to tweak timing like shorter InitialDelaySeconds/PeriodSeconds
for readiness if desired).
In `@internal/controller/lcore_reconciler.go`:
- Around line 401-415: Remove the blocking wait.PollUntilContextTimeout call and
instead perform a single immediate check using h.GetClient().Get(ctx, secretKey,
secret); if the Secret is NotFound or missing "tls.key"/"tls.crt" return a
transient not-ready response that requeues the reconcile (e.g., return a
ctrl.Result with RequeueAfter a short delay) rather than sleeping the worker;
keep using ErrGetTLSSecret only for permanent failures (wrap/return it on
non-transient errors from h.GetClient().Get) and do not block inside the
reconcile loop.
In `@internal/controller/postgres_deployment.go`:
- Around line 35-36: Replace the zero-length slice literals for volumes and
volumeMounts with capacity-preallocated slices to avoid reallocations: change
the declarations of volumes and volumeMounts (the variables named "volumes" and
"volumeMounts") to use make with an appropriate capacity (e.g.,
make([]corev1.Volume, 0, expectedCount) and make([]corev1.VolumeMount, 0,
expectedCount)) in the same function where they are declared so subsequent
appends do not reallocate.
In `@scripts/confirm-rhosls-running.sh`:
- Around line 3-11: The polling loop in scripts/confirm-rhosls-running.sh uses
while true and can hang; modify the loop that reads the csv variable (csv=$(oc
get subscription ... installedCSV)) to implement a bounded timeout by
introducing a retry counter or deadline (e.g., max_retries or timeout_seconds),
incrementing it each sleep and breaking with a non-zero exit and an error
message if the limit is reached, otherwise continue polling as before; ensure
the error message includes context about the openstack-lightspeed-operator
Subscription and the final observed state (or lack thereof) before exiting.
In
`@test/kuttl/common/openstack-lightspeed-instance/assert-openstack-lightspeed-instance.yaml`:
- Around line 5-123: The assertion file currently only verifies the bootstrap
secret (lightspeed-postgres-bootstrap) and misses the new Secrets required by
the managed surface; add assertions for the missing secrets
lightspeed-postgres-secret, lightspeed-postgres-certs, and lightspeed-tls (each
as apiVersion: v1 kind: Secret with metadata.name set and namespace:
openstack-lightspeed) so the test will fail if password or cert reconciliation
does not create/update them; ensure the new Secret entries follow the same
structure/namespace as the existing lightspeed-postgres-bootstrap entry and are
included alongside the other resources (ConfigMap/Service/Deployment) in the
same YAML manifest.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c6b161b9-ae36-465c-bcca-b14ef9cfe89a
📒 Files selected for processing (66)
.github/workflows/build-and-push.yamlMakefileREADME.mdapi/v1beta1/openstacklightspeed_types.gobundle/manifests/lightspeed.openstack.org_openstacklightspeeds.yamlbundle/manifests/openstack-lightspeed-operator-manager-rolebinding_rbac.authorization.k8s.io_v1_rolebinding.yamlbundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yamlconfig/crd/bases/lightspeed.openstack.org_openstacklightspeeds.yamlconfig/manager/manager.yamlconfig/manifests/bases/openstack-lightspeed-operator.clusterserviceversion.yamlconfig/rbac/role.yamlhack/env.shinternal/controller/assets/postgres.confinternal/controller/assets/postgres_bootstrap.shinternal/controller/assets/system_prompt.txtinternal/controller/common.gointernal/controller/constants.gointernal/controller/errors.gointernal/controller/funcs.gointernal/controller/lcore_config.gointernal/controller/lcore_deployment.gointernal/controller/lcore_reconciler.gointernal/controller/llama_stack_config.gointernal/controller/ocp_version.gointernal/controller/ols_install.gointernal/controller/openstacklightspeed_controller.gointernal/controller/postgres_deployment.gointernal/controller/postgres_reconciler.gointernal/controller/reconcile_tasks.gokuttl-test.yamlscripts/confirm-rhosls-running.shscripts/env.shscripts/gen-ols.shscripts/gen-rhosls.shtest/kuttl/common/mock-objects/assert-mock-objects-created.yamltest/kuttl/common/mock-objects/cleanup-mock-objects.yamltest/kuttl/common/mock-objects/errors-mock-objects.yamltest/kuttl/common/mock-objects/mock-resources.yamltest/kuttl/common/openstack-lightspeed-instance/assert-lightspeed-stack-config.yamltest/kuttl/common/openstack-lightspeed-instance/assert-llama-stack-config.yamltest/kuttl/common/openstack-lightspeed-instance/assert-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/cleanup-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/create-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/errors-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/03-assert-lightspeed-stack-config.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/04-assert-llama-stack-config.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/05-assert-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/06-cleanup-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/07-errors-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/08-cleanup-mock-objects.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/09-errors-mock-objects.yamltest/kuttl/tests/update-openstacklightspeed/03-assert-lightspeed-stack-config.yamltest/kuttl/tests/update-openstacklightspeed/04-assert-llama-stack-config.yamltest/kuttl/tests/update-openstacklightspeed/04-update-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/05-assert-olsconfig-update.yamltest/kuttl/tests/update-openstacklightspeed/05-assert-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/06-assert-openstacklightspeed-update.yamltest/kuttl/tests/update-openstacklightspeed/06-update-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/07-assert-configmaps-update.yamltest/kuttl/tests/update-openstacklightspeed/08-assert-lightspeed-stack-config-update.yamltest/kuttl/tests/update-openstacklightspeed/09-assert-llama-stack-config-update.yamltest/kuttl/tests/update-openstacklightspeed/10-assert-openstacklightspeed-update.yamltest/kuttl/tests/update-openstacklightspeed/11-cleanup-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/12-errors-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/13-cleanup-mock-objects.yamltest/kuttl/tests/update-openstacklightspeed/14-errors-mock-objects.yaml
💤 Files with no reviewable changes (10)
- README.md
- config/crd/bases/lightspeed.openstack.org_openstacklightspeeds.yaml
- scripts/gen-ols.sh
- test/kuttl/tests/update-openstacklightspeed/04-update-openstack-lightspeed-instance.yaml
- test/kuttl/tests/update-openstacklightspeed/06-assert-openstacklightspeed-update.yaml
- Makefile
- bundle/manifests/lightspeed.openstack.org_openstacklightspeeds.yaml
- test/kuttl/tests/update-openstacklightspeed/05-assert-olsconfig-update.yaml
- internal/controller/funcs.go
- internal/controller/ols_install.go
efee6ac to
def8bbc
Compare
|
/test all |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yaml (1)
56-64:⚠️ Potential issue | 🟡 MinorRemove unmanaged OLM resource metadata from
owned.resources.The CSV lists
ClusterServiceVersion,InstallPlan, andSubscriptionas secondary resources, but the controller never creates, updates, or deletes them. These entries originate from marker comments inapi/v1beta1/openstacklightspeed_types.gothat should be removed to keep the bundle metadata accurate.Suggested cleanup
- - kind: ClusterServiceVersion - name: "" - version: v1alpha1 - - kind: InstallPlan - name: "" - version: v1alpha1 - - kind: Subscription - name: "" - version: v1alpha1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yaml` around lines 56 - 64, The CSV's owned.resources incorrectly includes unmanaged resources (ClusterServiceVersion, InstallPlan, Subscription) because marker comments in api/v1beta1/openstacklightspeed_types.go added those entries; remove the corresponding kubebuilder/OLM marker annotations from the OpenStackLightSpeed CRD type declarations (in openstacklightspeed_types.go) so they no longer generate those secondary resources, then regenerate the bundle/CSV to ensure owned.resources no longer lists ClusterServiceVersion, InstallPlan, or Subscription (verify the CSV's metadata entries were removed).
🧹 Nitpick comments (4)
config/manager/manager.yaml (1)
75-80: Prefer pinned component images before this path becomes the default install flow.These env vars now decide what the operator deploys by default, and the same floating tags are mirrored in
api/v1beta1/openstacklightspeed_types.go, Lines 29-36. Using:latestmakes the reconciled payload non-reproducible across restarts and clusters. Pin tested tags or digests once the migration images are settled.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/manager/manager.yaml` around lines 75 - 80, Replace floating :latest tags with pinned image tags or digests for the env vars RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT, RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT, and RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT in manager.yaml and ensure the same pinned values are updated in the corresponding defaults declared in api/v1beta1/openstacklightspeed_types.go (the constants that mirror those env vars). Use tested, immutable tags or content-addressable digests (sha256:...) for each image, keeping the descriptive env var names unchanged so the operator picks the pinned images by default.internal/controller/common.go (1)
68-81: Consider caching the raw client to avoid repeated instantiation.
getRawClientis called on everygetConfigMapResourceVersioninvocation (and potentially other places), creating a new client and underlying HTTP connection pool each time. For frequent reconciles, this adds unnecessary overhead.A cached client (e.g., stored on the reconciler struct) could improve performance while still providing direct API server reads.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/common.go` around lines 68 - 81, getConfigMapResourceVersion currently calls getRawClient on every invocation which recreates the client; modify the reconciler/Helper to hold a cached raw client and reuse it instead of re-instantiating. Concretely, add a cached field (e.g., rawClient) on the reconciler or common_helper.Helper, initialize it once (lazy-init) when getRawClient is first called, and have getRawClient return the cached client thereafter; then update getConfigMapResourceVersion to call the updated getRawClient that returns the cached client. Ensure thread-safety for lazy initialization (e.g., mutex) and preserve existing error handling when client creation fails.internal/controller/postgres_reconciler.go (1)
113-175: Label selector indeleteOldPostgresSecretswon't match secrets created here.
deleteOldPostgresSecrets(line 163) uses the label selectorapp.kubernetes.io/name: lightspeed-service-postgres, butreconcilePostgresSecretdoesn't set any labels on the Secret object—onlyName,Namespace,Data, and owner reference.If this cleanup is for migration from an older version that set labels, consider adding a comment to clarify. Otherwise, for consistency and future cleanup needs, add labels to the created secret:
♻️ Proposed fix to add labels
secret := &corev1.Secret{ ObjectMeta: metav1.ObjectMeta{ Name: PostgresSecretName, Namespace: h.GetBeforeObject().GetNamespace(), + Labels: generatePostgresSelectorLabels(), }, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/postgres_reconciler.go` around lines 113 - 175, deleteOldPostgresSecrets uses a label selector ("app.kubernetes.io/name": "lightspeed-service-postgres") but reconcilePostgresSecret never sets labels on the Secret, so the cleanup will not match the secret created; update reconcilePostgresSecret to set consistent labels (at least the same "app.kubernetes.io/name": "lightspeed-service-postgres" or another documented label) on the Secret object before CreateOrPatch and ensure PostgresSecretName secrets include them (or alternatively change deleteOldPostgresSecrets to match by name/ownerRef if intended); touch reconcilePostgresSecret (Secret.ObjectMeta) and deleteOldPostgresSecrets to make the label strategy consistent and add a brief comment explaining migration intent if needed.internal/controller/openstacklightspeed_controller.go (1)
269-276: Remove or clarify misleadingOwns()calls for cluster-scoped resources without owner references.The
Owns(&rbacv1.ClusterRole{})andOwns(&rbacv1.ClusterRoleBinding{})calls at lines 271–272 won't trigger reconciles because these cluster-scoped resources have no owner references set (per comments ininternal/controller/lcore_reconciler.golines 119, 156). Since LCORE manages these resources, either remove theseOwns()calls to avoid misleading future maintainers, or useWatches()with a custom handler if external modification detection is needed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/openstacklightspeed_controller.go` around lines 269 - 276, The Owns(&rbacv1.ClusterRole{}) and Owns(&rbacv1.ClusterRoleBinding{}) entries are misleading because cluster-scoped resources don't get ownerReferences and so won't trigger reconciles; in the controller builder (the SetupWithManager / controller construction block where Owns(...) is chained) remove these two Owns(...) calls or replace them with Watches(...) + a custom handler if you need to detect external changes to ClusterRole/ClusterRoleBinding, ensuring you reference the specific types rbacv1.ClusterRole and rbacv1.ClusterRoleBinding and leave the other Owns(...) calls for namespaced resources intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/build-and-push.yaml:
- Around line 4-6: Update the workflow trigger under "push: branches:" to use
the actual development branch name "lcore-migration" instead of
"lcore-migration-latest", and then change the tag-generation logic that appends
"-latest" (the step that currently concatenates "-latest" to the branch name) so
it only appends "-latest" when the branch name is not already suffixed with
"-latest" (or explicitly handle the "lcore-migration" branch to produce
"lcore-migration-latest"); reference the "push: branches" entry and the job/step
that constructs the image tag (the code that appends "-latest") and make the
conditional string logic idempotent to avoid producing
"lcore-migration-latest-latest".
In `@bundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yaml`:
- Around line 243-248: The CSV currently sets
RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT, RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT,
and RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT to mutable :latest tags (and the
operator image likewise) which breaks reproducibility; update those environment
values to use immutable tags or image digests (sha256) instead — replace the
:latest strings for RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT,
RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT,
RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT and the operator image entry with
specific version tags or digests so the ClusterServiceVersion references fixed
images.
In `@internal/controller/assets/postgres_bootstrap.sh`:
- Around line 1-23: Enable fail-fast bash behavior at the top of the script so
any failing command aborts the bootstrap; add e.g. "set -euo pipefail" (or at
minimum "set -e") immediately after the shebang, ensure existing helper function
_psql and usages (echo ... | _psql -d $POSTGRESQL_DATABASE and echo ... | _psql
-d $DB_NAME) remain unchanged, and optionally add a trap to log and exit on
error to make failures explicit during DB_NAME/database/extension/schema
creation.
In `@internal/controller/lcore_deployment.go`:
- Around line 71-76: The LLAMA_STACK_LOGGING env var is set to "all=debug" by
default in the llamaStackEnvVars block which is too noisy; change the default to
an info-level setting (e.g. "all=info") and make debug opt-in by checking a
config or env flag (for example an ENABLE_LLAMA_STACK_DEBUG boolean) before
overriding the default to "all=debug"; update the code that builds
llamaStackEnvVars (referencing llamaStackEnvVars, llamaEnvVars,
buildPostgresPasswordEnvVar and the corev1.EnvVar with Name
"LLAMA_STACK_LOGGING") so the default is info and only flips to debug when the
opt-in flag is set.
- Around line 457-481: buildConfigMapAnnotations only adds ConfigMap
resourceVersions so secret rotations don't cause rollouts; update this function
to also fetch resourceVersions for the Postgres secret and the provider
credentials secret and add them to the returned annotations. Use the existing
pattern with getConfigMapResourceVersion (or create a getSecretResourceVersion
helper) to get resourceVersion for PostgresSecretName and
provider.CredentialsSecret (using the same context/namespace:
h.GetBeforeObject().GetNamespace()), and add entries such as
PostgresSecretResourceVersionAnnotation and
ProviderCredentialsSecretResourceVersionAnnotation to the annotations map;
preserve the existing NotFound handling and return errors on other failures.
- Around line 48-52: The TLS secret volumes are being added to sharedMounts via
addTLSVolumesAndMounts(...) which causes the lightspeed-tls secret to be mounted
into both containers; remove the TLS mount from the llama-stack container by
ensuring the TLS volumes are only mounted into the lightspeed-service-api
container (i.e., stop adding TLS volumes to whatever variable is used for
llamaStackMounts or split mounts so addTLSVolumesAndMounts(...) is applied only
to the API mounts), update any other calls where addTLSVolumesAndMounts(...) or
related addOpenShiftCAVolumesAndMounts/addOpenShiftRootCAVolumesAndMounts are
being passed shared mounts for both containers (also adjust the occurrences
around addUserCAVolumesAndMounts) so that llama-stack does not receive the TLS
keypair mount.
- Around line 80-89: The llama-stack container lacks a readiness probe so the
pod can be marked ready before the sidecar is accepting traffic; add a
ReadinessProbe to the llamaStackContainer (the container with Name "llama-stack"
and port 8321) by setting its ReadinessProbe to a corev1.Probe that checks a
TCPSocketAction (or HTTPGet if the service exposes an HTTP endpoint) against
port 8321 (use the same port reference as in Ports/ContainerPort), and configure
sensible timing values (e.g., initialDelaySeconds, periodSeconds,
timeoutSeconds, failureThreshold) so the pod only becomes Ready when the
llama-stack sidecar is ready; update the llamaStackContainer struct
initialization to include this ReadinessProbe.
In `@internal/controller/llama_stack_config.go`:
- Around line 176-184: The switch in the Llama Stack provider handling currently
treats "watsonx" and "bam" (and omits "fake_provider") as hard errors via
fmt.Errorf using provider.Type and provider.Name, which conflicts with the CRD
that still admits those values; update the switch in the function handling
provider resolution (the block that checks provider.Type and returns errors
using provider.Type/provider.Name) to either: 1) add cases for "watsonx", "bam",
and "fake_provider" that return nil (or a no-op/explicit "handled by
lightspeed-stack" success) so those runtime providers remain supported at
reconcile time, or 2) if you prefer stricter validation, align the CRD to remove
those admitted values—choose the former to preserve compatibility and add
explicit comments that these are handled by lightspeed-stack.
In `@internal/controller/postgres_deployment.go`:
- Around line 88-98: The deployment currently mounts PostgresDataVolume using an
EmptyDir (via volumes and volumeMounts) which will delete DB files on any pod
restart; change the VolumeSource to use a PersistentVolumeClaim
(corev1.PersistentVolumeClaimVolumeSource with a configurable ClaimName) instead
of EmptyDir and ensure a corresponding PVC resource is created/managed (or make
the controller accept an external DB/claim name). Update the code that
constructs volumes (where PostgresDataVolume is added) to reference the PVC, and
ensure PostgresDataVolumeMountPath and any config/flags expose the claim name so
operators can supply a durable PVC or external-database strategy before release.
In
`@test/kuttl/tests/update-openstacklightspeed/10-assert-openstacklightspeed-update.yaml`:
- Around line 63-73: Remove the brittle exact generation assertions from the
Deployment test: delete the status.generation and status.observedGeneration
entries in the Deployment example (referencing metadata.name:
lightspeed-stack-deployment and namespace: openstack-lightspeed) so the test
only asserts meaningful state (replicas, readyReplicas, availableReplicas) and
no longer depends on controller bookkeeping fields.
---
Outside diff comments:
In `@bundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yaml`:
- Around line 56-64: The CSV's owned.resources incorrectly includes unmanaged
resources (ClusterServiceVersion, InstallPlan, Subscription) because marker
comments in api/v1beta1/openstacklightspeed_types.go added those entries; remove
the corresponding kubebuilder/OLM marker annotations from the
OpenStackLightSpeed CRD type declarations (in openstacklightspeed_types.go) so
they no longer generate those secondary resources, then regenerate the
bundle/CSV to ensure owned.resources no longer lists ClusterServiceVersion,
InstallPlan, or Subscription (verify the CSV's metadata entries were removed).
---
Nitpick comments:
In `@config/manager/manager.yaml`:
- Around line 75-80: Replace floating :latest tags with pinned image tags or
digests for the env vars RELATED_IMAGE_LCORE_IMAGE_URL_DEFAULT,
RELATED_IMAGE_EXPORTER_IMAGE_URL_DEFAULT, and
RELATED_IMAGE_POSTGRES_IMAGE_URL_DEFAULT in manager.yaml and ensure the same
pinned values are updated in the corresponding defaults declared in
api/v1beta1/openstacklightspeed_types.go (the constants that mirror those env
vars). Use tested, immutable tags or content-addressable digests (sha256:...)
for each image, keeping the descriptive env var names unchanged so the operator
picks the pinned images by default.
In `@internal/controller/common.go`:
- Around line 68-81: getConfigMapResourceVersion currently calls getRawClient on
every invocation which recreates the client; modify the reconciler/Helper to
hold a cached raw client and reuse it instead of re-instantiating. Concretely,
add a cached field (e.g., rawClient) on the reconciler or common_helper.Helper,
initialize it once (lazy-init) when getRawClient is first called, and have
getRawClient return the cached client thereafter; then update
getConfigMapResourceVersion to call the updated getRawClient that returns the
cached client. Ensure thread-safety for lazy initialization (e.g., mutex) and
preserve existing error handling when client creation fails.
In `@internal/controller/openstacklightspeed_controller.go`:
- Around line 269-276: The Owns(&rbacv1.ClusterRole{}) and
Owns(&rbacv1.ClusterRoleBinding{}) entries are misleading because cluster-scoped
resources don't get ownerReferences and so won't trigger reconciles; in the
controller builder (the SetupWithManager / controller construction block where
Owns(...) is chained) remove these two Owns(...) calls or replace them with
Watches(...) + a custom handler if you need to detect external changes to
ClusterRole/ClusterRoleBinding, ensuring you reference the specific types
rbacv1.ClusterRole and rbacv1.ClusterRoleBinding and leave the other Owns(...)
calls for namespaced resources intact.
In `@internal/controller/postgres_reconciler.go`:
- Around line 113-175: deleteOldPostgresSecrets uses a label selector
("app.kubernetes.io/name": "lightspeed-service-postgres") but
reconcilePostgresSecret never sets labels on the Secret, so the cleanup will not
match the secret created; update reconcilePostgresSecret to set consistent
labels (at least the same "app.kubernetes.io/name":
"lightspeed-service-postgres" or another documented label) on the Secret object
before CreateOrPatch and ensure PostgresSecretName secrets include them (or
alternatively change deleteOldPostgresSecrets to match by name/ownerRef if
intended); touch reconcilePostgresSecret (Secret.ObjectMeta) and
deleteOldPostgresSecrets to make the label strategy consistent and add a brief
comment explaining migration intent if needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3677e479-1f6b-4580-b81f-0d7c08976940
📒 Files selected for processing (65)
.github/workflows/build-and-push.yamlMakefileREADME.mdapi/v1beta1/conditions.goapi/v1beta1/openstacklightspeed_types.gobundle/manifests/lightspeed.openstack.org_openstacklightspeeds.yamlbundle/manifests/openstack-lightspeed-operator-manager-rolebinding_rbac.authorization.k8s.io_v1_rolebinding.yamlbundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yamlconfig/crd/bases/lightspeed.openstack.org_openstacklightspeeds.yamlconfig/manager/manager.yamlconfig/manifests/bases/openstack-lightspeed-operator.clusterserviceversion.yamlconfig/rbac/role.yamlhack/env.shinternal/controller/assets/postgres.confinternal/controller/assets/postgres_bootstrap.shinternal/controller/assets/system_prompt.txtinternal/controller/common.gointernal/controller/constants.gointernal/controller/errors.gointernal/controller/funcs.gointernal/controller/lcore_config.gointernal/controller/lcore_deployment.gointernal/controller/lcore_reconciler.gointernal/controller/llama_stack_config.gointernal/controller/ocp_version.gointernal/controller/openstacklightspeed_controller.gointernal/controller/postgres_deployment.gointernal/controller/postgres_reconciler.gointernal/controller/reconcile_tasks.gokuttl-test.yamlscripts/confirm-rhosls-running.shscripts/env.shscripts/gen-rhosls.shtest/kuttl/common/mock-objects/assert-mock-objects-created.yamltest/kuttl/common/mock-objects/cleanup-mock-objects.yamltest/kuttl/common/mock-objects/errors-mock-objects.yamltest/kuttl/common/mock-objects/mock-resources.yamltest/kuttl/common/openstack-lightspeed-instance/assert-lightspeed-stack-config.yamltest/kuttl/common/openstack-lightspeed-instance/assert-llama-stack-config.yamltest/kuttl/common/openstack-lightspeed-instance/assert-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/cleanup-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/create-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/errors-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/03-assert-lightspeed-stack-config.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/04-assert-llama-stack-config.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/05-assert-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/06-cleanup-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/07-errors-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/08-cleanup-mock-objects.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/09-errors-mock-objects.yamltest/kuttl/tests/update-openstacklightspeed/03-assert-lightspeed-stack-config.yamltest/kuttl/tests/update-openstacklightspeed/04-assert-llama-stack-config.yamltest/kuttl/tests/update-openstacklightspeed/04-update-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/05-assert-olsconfig-update.yamltest/kuttl/tests/update-openstacklightspeed/05-assert-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/06-assert-openstacklightspeed-update.yamltest/kuttl/tests/update-openstacklightspeed/06-update-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/07-assert-configmaps-update.yamltest/kuttl/tests/update-openstacklightspeed/08-assert-lightspeed-stack-config-update.yamltest/kuttl/tests/update-openstacklightspeed/09-assert-llama-stack-config-update.yamltest/kuttl/tests/update-openstacklightspeed/10-assert-openstacklightspeed-update.yamltest/kuttl/tests/update-openstacklightspeed/11-cleanup-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/12-errors-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/13-cleanup-mock-objects.yamltest/kuttl/tests/update-openstacklightspeed/14-errors-mock-objects.yaml
💤 Files with no reviewable changes (8)
- README.md
- bundle/manifests/lightspeed.openstack.org_openstacklightspeeds.yaml
- config/crd/bases/lightspeed.openstack.org_openstacklightspeeds.yaml
- Makefile
- test/kuttl/tests/update-openstacklightspeed/06-assert-openstacklightspeed-update.yaml
- test/kuttl/tests/update-openstacklightspeed/04-update-openstack-lightspeed-instance.yaml
- test/kuttl/tests/update-openstacklightspeed/05-assert-olsconfig-update.yaml
- internal/controller/funcs.go
✅ Files skipped from review due to trivial changes (16)
- kuttl-test.yaml
- test/kuttl/tests/update-openstacklightspeed/03-assert-lightspeed-stack-config.yaml
- test/kuttl/common/openstack-lightspeed-instance/cleanup-openstack-lightspeed-instance.yaml
- test/kuttl/tests/update-openstacklightspeed/04-assert-llama-stack-config.yaml
- internal/controller/assets/postgres.conf
- test/kuttl/tests/basic-openstack-lightspeed-configuration/03-assert-lightspeed-stack-config.yaml
- test/kuttl/tests/basic-openstack-lightspeed-configuration/04-assert-llama-stack-config.yaml
- test/kuttl/common/openstack-lightspeed-instance/create-openstack-lightspeed-instance.yaml
- api/v1beta1/conditions.go
- test/kuttl/common/mock-objects/errors-mock-objects.yaml
- test/kuttl/common/mock-objects/cleanup-mock-objects.yaml
- bundle/manifests/openstack-lightspeed-operator-manager-rolebinding_rbac.authorization.k8s.io_v1_rolebinding.yaml
- hack/env.sh
- test/kuttl/tests/update-openstacklightspeed/06-update-openstack-lightspeed-instance.yaml
- internal/controller/errors.go
- internal/controller/constants.go
🚧 Files skipped from review as they are similar to previous changes (9)
- scripts/env.sh
- scripts/confirm-rhosls-running.sh
- test/kuttl/common/mock-objects/mock-resources.yaml
- scripts/gen-rhosls.sh
- config/manifests/bases/openstack-lightspeed-operator.clusterserviceversion.yaml
- internal/controller/reconcile_tasks.go
- internal/controller/lcore_config.go
- internal/controller/ocp_version.go
- internal/controller/lcore_reconciler.go
def8bbc to
146ae42
Compare
|
/test all |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (7)
internal/controller/assets/postgres_bootstrap.sh (1)
14-25: Consider quoting shell variables for robustness.Shellcheck correctly identifies that
$POSTGRESQL_DATABASEand$DB_NAMEshould be double-quoted to prevent potential issues with word splitting or globbing if these variables ever contain special characters.♻️ Suggested improvement
-echo "SELECT 'CREATE DATABASE $DB_NAME' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '$DB_NAME')\gexec" | _psql -d $POSTGRESQL_DATABASE +echo "SELECT 'CREATE DATABASE $DB_NAME' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '$DB_NAME')\gexec" | _psql -d "$POSTGRESQL_DATABASE" # Create pg_trgm extension in default database (for OpenStack Lightspeed conversation cache) -echo "CREATE EXTENSION IF NOT EXISTS pg_trgm;" | _psql -d $POSTGRESQL_DATABASE +echo "CREATE EXTENSION IF NOT EXISTS pg_trgm;" | _psql -d "$POSTGRESQL_DATABASE" # Create pg_trgm extension in llama-stack database (for text search if needed) -echo "CREATE EXTENSION IF NOT EXISTS pg_trgm;" | _psql -d $DB_NAME +echo "CREATE EXTENSION IF NOT EXISTS pg_trgm;" | _psql -d "$DB_NAME" # Create schemas for isolating different components' data -echo "CREATE SCHEMA IF NOT EXISTS lcore;" | _psql -d $POSTGRESQL_DATABASE -echo "CREATE SCHEMA IF NOT EXISTS quota;" | _psql -d $POSTGRESQL_DATABASE -echo "CREATE SCHEMA IF NOT EXISTS conversation_cache;" | _psql -d $POSTGRESQL_DATABASE +echo "CREATE SCHEMA IF NOT EXISTS lcore;" | _psql -d "$POSTGRESQL_DATABASE" +echo "CREATE SCHEMA IF NOT EXISTS quota;" | _psql -d "$POSTGRESQL_DATABASE" +echo "CREATE SCHEMA IF NOT EXISTS conversation_cache;" | _psql -d "$POSTGRESQL_DATABASE"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/assets/postgres_bootstrap.sh` around lines 14 - 25, The shell invocations use unquoted variables ($POSTGRESQL_DATABASE, $DB_NAME) which can break with spaces or globbing; update every call to _psql and any other expansions to use quoted parameter expansions (e.g., _psql -d "$POSTGRESQL_DATABASE" and _psql -d "$DB_NAME") and prefer ${DB_NAME}/${POSTGRESQL_DATABASE} where used inside strings so all occurrences in the echo lines and the _psql calls (the lines creating the database, extensions, and schemas) are robust to special characters.internal/controller/postgres_deployment.go (1)
33-34: Consider preallocating slices for minor performance improvement.The slices grow to exactly 7 elements. Preallocating avoids reallocations during appends.
♻️ Optional preallocation
func buildPostgresPodTemplateSpec() corev1.PodTemplateSpec { // Build volumes and volume mounts - volumes := []corev1.Volume{} - volumeMounts := []corev1.VolumeMount{} + volumes := make([]corev1.Volume, 0, 7) + volumeMounts := make([]corev1.VolumeMount, 0, 7)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/postgres_deployment.go` around lines 33 - 34, The volumes and volumeMounts slices are created without capacity and will grow to exactly 7 elements; change their initialization to preallocate capacity 7 (use make with length 0 and capacity 7) for volumes and volumeMounts to avoid reallocations during appends — update the declarations of the variables named volumes and volumeMounts in postgres_deployment.go accordingly.internal/controller/lcore_deployment.go (1)
36-36: Consider preallocating slices for minor performance improvement.The
volumesslice grows as mounts are added. Preallocating with estimated capacity avoids reallocations.♻️ Optional preallocation
// Build shared volumes - volumes := []corev1.Volume{} + volumes := make([]corev1.Volume, 0, 10) // estimated capacity for all volume types🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/lcore_deployment.go` at line 36, The volumes slice is created with zero capacity and may reallocate as mounts are appended; change the declaration of volumes (volumes := []corev1.Volume{}) to preallocate capacity using make with a sensible estimate (e.g., volumes := make([]corev1.Volume, 0, len(mounts)) or len(container.VolumeMounts) depending on the source of mounts) so appends avoid repeated reallocations in the function handling deployment construction.internal/controller/lcore_reconciler.go (1)
390-419: Consider timeout implications for TLS secret polling.The
reconcileTLSSecretfunction useswait.PollUntilContextTimeoutwithResourceCreationTimeoutto wait for the service-ca operator to populate the TLS secret. This blocks the reconciliation goroutine during the wait.While acceptable for initial deployment, consider:
- If
ResourceCreationTimeoutis large, this could tie up the reconciliation worker- If the service-ca operator is unavailable, reconciliation will be blocked until timeout
The current approach works, but you may want to consider returning a requeue result instead of blocking, allowing the controller to process other events during the wait.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/lcore_reconciler.go` around lines 390 - 419, reconcileTLSSecret currently blocks the reconciliation goroutine by calling wait.PollUntilContextTimeout with ResourceCreationTimeout; change the logic in reconcileTLSSecret so that instead of long-polling you return control quickly and request a requeue when the secret is not yet ready: detect missing tls.key/tls.crt via h.GetClient().Get on the secret (same secretKey and corev1.Secret used now) and if not present or incomplete return an error/indicate RequeueNeeded (or a reconcile.Result with RequeueAfter) so the controller requeues the reconciliation rather than sleeping for ResourceCreationTimeout; keep the successful path the same (log TLS secret ready and return nil).internal/controller/common.go (2)
164-176: Consider returningnilinstead of empty Deployment on NotFound.When the deployment is not found, the function returns an empty
&appsv1.Deployment{}along with an error. This could lead to nil-pointer issues if the caller doesn't check the error before using the deployment. Returningnilwould be more idiomatic.♻️ Proposed fix
func getDeployment(ctx context.Context, h *common_helper.Helper, name string, namespace string) (*appsv1.Deployment, error) { deployment := &appsv1.Deployment{} err := h.GetClient().Get(ctx, types.NamespacedName{Name: name, Namespace: namespace}, deployment) if err != nil { if k8s_errors.IsNotFound(err) { - return &appsv1.Deployment{}, errors.New("deployment not found") + return nil, errors.New("deployment not found") } - return &appsv1.Deployment{}, fmt.Errorf("failed to get deployment %s: %w", name, err) + return nil, fmt.Errorf("failed to get deployment %s: %w", name, err) } return deployment, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/common.go` around lines 164 - 176, The getDeployment function currently returns an empty &appsv1.Deployment{} when an error occurs; change it to return nil for error cases to avoid accidental use of a non-nil but empty Deployment. Specifically, in getDeployment replace returns of &appsv1.Deployment{} with nil in both the k8s_errors.IsNotFound branch and the generic error branch (the calls that currently return &appsv1.Deployment{}, errors.New(...) and fmt.Errorf(...)), ensuring the success path still returns the populated deployment and nil error; keep function signature unchanged.
19-33: Unusedembedimport.The
embedpackage is imported with a blank identifier (_ "embed") but there's no//go:embeddirective in this file. The system prompt embedding is inlcore_config.go, not here.♻️ Proposed fix
import ( "context" - _ "embed" "errors" "fmt" "strings"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/common.go` around lines 19 - 33, Remove the unused blank import `_ "embed"` from the import block in common.go: locate the import list (the block containing context, errors, fmt, strings, common_helper, appsv1, corev1, k8s_errors, types, client, config) and delete the `_ "embed"` entry, then run `go build`/`gofmt` or your project's formatter to ensure imports are cleaned up; confirm any actual //go:embed directives remain only where needed (e.g., lcore_config.go).internal/controller/lcore_config.go (1)
102-115: Unused parameters in config builder functions.Several builder functions (
buildLCoreServiceConfig,buildLCoreAuthenticationConfig, etc.) accept*common_helper.Helperand/or*apiv1beta1.OpenStackLightspeedparameters but don't use them (indicated by_naming). This is acceptable for interface consistency with other builders, but consider documenting this pattern or using a common signature type if the pattern persists.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/lcore_config.go` around lines 102 - 115, buildLCoreServiceConfig and the other builder funcs accept *common_helper.Helper and *apiv1beta1.OpenStackLightspeed but deliberately ignore them (using `_`), which is fine for a shared builder signature; add a short comment above each affected function (e.g., buildLCoreServiceConfig, buildLCoreAuthenticationConfig, etc.) stating that the parameters are intentionally unused to satisfy the common builder signature, or alternatively replace the separate signatures with a single shared type alias for the builder signature and use that consistently so intent is explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/lcore_deployment.go`:
- Around line 79-97: The struct literal for llamaStackContainer has misaligned
fields causing gofmt failures; run gofmt -w on the file or reformat the
container literal so fields (Name, Image, Command, Ports, VolumeMounts, Env,
ReadinessProbe, Resources, ImagePullPolicy) are properly aligned per gofmt
rules; locate the llamaStackContainer definition (using
LlamaStackConfigMountPath and getResourcesOrDefault) and apply gofmt formatting
to internal/controller/lcore_deployment.go to fix spacing.
---
Nitpick comments:
In `@internal/controller/assets/postgres_bootstrap.sh`:
- Around line 14-25: The shell invocations use unquoted variables
($POSTGRESQL_DATABASE, $DB_NAME) which can break with spaces or globbing; update
every call to _psql and any other expansions to use quoted parameter expansions
(e.g., _psql -d "$POSTGRESQL_DATABASE" and _psql -d "$DB_NAME") and prefer
${DB_NAME}/${POSTGRESQL_DATABASE} where used inside strings so all occurrences
in the echo lines and the _psql calls (the lines creating the database,
extensions, and schemas) are robust to special characters.
In `@internal/controller/common.go`:
- Around line 164-176: The getDeployment function currently returns an empty
&appsv1.Deployment{} when an error occurs; change it to return nil for error
cases to avoid accidental use of a non-nil but empty Deployment. Specifically,
in getDeployment replace returns of &appsv1.Deployment{} with nil in both the
k8s_errors.IsNotFound branch and the generic error branch (the calls that
currently return &appsv1.Deployment{}, errors.New(...) and fmt.Errorf(...)),
ensuring the success path still returns the populated deployment and nil error;
keep function signature unchanged.
- Around line 19-33: Remove the unused blank import `_ "embed"` from the import
block in common.go: locate the import list (the block containing context,
errors, fmt, strings, common_helper, appsv1, corev1, k8s_errors, types, client,
config) and delete the `_ "embed"` entry, then run `go build`/`gofmt` or your
project's formatter to ensure imports are cleaned up; confirm any actual
//go:embed directives remain only where needed (e.g., lcore_config.go).
In `@internal/controller/lcore_config.go`:
- Around line 102-115: buildLCoreServiceConfig and the other builder funcs
accept *common_helper.Helper and *apiv1beta1.OpenStackLightspeed but
deliberately ignore them (using `_`), which is fine for a shared builder
signature; add a short comment above each affected function (e.g.,
buildLCoreServiceConfig, buildLCoreAuthenticationConfig, etc.) stating that the
parameters are intentionally unused to satisfy the common builder signature, or
alternatively replace the separate signatures with a single shared type alias
for the builder signature and use that consistently so intent is explicit.
In `@internal/controller/lcore_deployment.go`:
- Line 36: The volumes slice is created with zero capacity and may reallocate as
mounts are appended; change the declaration of volumes (volumes :=
[]corev1.Volume{}) to preallocate capacity using make with a sensible estimate
(e.g., volumes := make([]corev1.Volume, 0, len(mounts)) or
len(container.VolumeMounts) depending on the source of mounts) so appends avoid
repeated reallocations in the function handling deployment construction.
In `@internal/controller/lcore_reconciler.go`:
- Around line 390-419: reconcileTLSSecret currently blocks the reconciliation
goroutine by calling wait.PollUntilContextTimeout with ResourceCreationTimeout;
change the logic in reconcileTLSSecret so that instead of long-polling you
return control quickly and request a requeue when the secret is not yet ready:
detect missing tls.key/tls.crt via h.GetClient().Get on the secret (same
secretKey and corev1.Secret used now) and if not present or incomplete return an
error/indicate RequeueNeeded (or a reconcile.Result with RequeueAfter) so the
controller requeues the reconciliation rather than sleeping for
ResourceCreationTimeout; keep the successful path the same (log TLS secret ready
and return nil).
In `@internal/controller/postgres_deployment.go`:
- Around line 33-34: The volumes and volumeMounts slices are created without
capacity and will grow to exactly 7 elements; change their initialization to
preallocate capacity 7 (use make with length 0 and capacity 7) for volumes and
volumeMounts to avoid reallocations during appends — update the declarations of
the variables named volumes and volumeMounts in postgres_deployment.go
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0ebd08c9-8b88-4fb2-b4e6-4238a4f0b7f5
📒 Files selected for processing (65)
.github/workflows/build-and-push.yamlMakefileREADME.mdapi/v1beta1/conditions.goapi/v1beta1/openstacklightspeed_types.gobundle/manifests/lightspeed.openstack.org_openstacklightspeeds.yamlbundle/manifests/openstack-lightspeed-operator-manager-rolebinding_rbac.authorization.k8s.io_v1_rolebinding.yamlbundle/manifests/openstack-lightspeed-operator.clusterserviceversion.yamlconfig/crd/bases/lightspeed.openstack.org_openstacklightspeeds.yamlconfig/manager/manager.yamlconfig/manifests/bases/openstack-lightspeed-operator.clusterserviceversion.yamlconfig/rbac/role.yamlhack/env.shinternal/controller/assets/postgres.confinternal/controller/assets/postgres_bootstrap.shinternal/controller/assets/system_prompt.txtinternal/controller/common.gointernal/controller/constants.gointernal/controller/errors.gointernal/controller/funcs.gointernal/controller/lcore_config.gointernal/controller/lcore_deployment.gointernal/controller/lcore_reconciler.gointernal/controller/llama_stack_config.gointernal/controller/ocp_version.gointernal/controller/openstacklightspeed_controller.gointernal/controller/postgres_deployment.gointernal/controller/postgres_reconciler.gointernal/controller/reconcile_tasks.gokuttl-test.yamlscripts/confirm-rhosls-running.shscripts/env.shscripts/gen-rhosls.shtest/kuttl/common/mock-objects/assert-mock-objects-created.yamltest/kuttl/common/mock-objects/cleanup-mock-objects.yamltest/kuttl/common/mock-objects/errors-mock-objects.yamltest/kuttl/common/mock-objects/mock-resources.yamltest/kuttl/common/openstack-lightspeed-instance/assert-lightspeed-stack-config.yamltest/kuttl/common/openstack-lightspeed-instance/assert-llama-stack-config.yamltest/kuttl/common/openstack-lightspeed-instance/assert-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/cleanup-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/create-openstack-lightspeed-instance.yamltest/kuttl/common/openstack-lightspeed-instance/errors-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/03-assert-lightspeed-stack-config.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/04-assert-llama-stack-config.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/05-assert-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/06-cleanup-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/07-errors-openstack-lightspeed-instance.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/08-cleanup-mock-objects.yamltest/kuttl/tests/basic-openstack-lightspeed-configuration/09-errors-mock-objects.yamltest/kuttl/tests/update-openstacklightspeed/03-assert-lightspeed-stack-config.yamltest/kuttl/tests/update-openstacklightspeed/04-assert-llama-stack-config.yamltest/kuttl/tests/update-openstacklightspeed/04-update-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/05-assert-olsconfig-update.yamltest/kuttl/tests/update-openstacklightspeed/05-assert-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/06-assert-openstacklightspeed-update.yamltest/kuttl/tests/update-openstacklightspeed/06-update-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/07-assert-configmaps-update.yamltest/kuttl/tests/update-openstacklightspeed/08-assert-lightspeed-stack-config-update.yamltest/kuttl/tests/update-openstacklightspeed/09-assert-llama-stack-config-update.yamltest/kuttl/tests/update-openstacklightspeed/10-assert-openstacklightspeed-update.yamltest/kuttl/tests/update-openstacklightspeed/11-cleanup-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/12-errors-openstack-lightspeed-instance.yamltest/kuttl/tests/update-openstacklightspeed/13-cleanup-mock-objects.yamltest/kuttl/tests/update-openstacklightspeed/14-errors-mock-objects.yaml
💤 Files with no reviewable changes (8)
- README.md
- bundle/manifests/lightspeed.openstack.org_openstacklightspeeds.yaml
- config/crd/bases/lightspeed.openstack.org_openstacklightspeeds.yaml
- test/kuttl/tests/update-openstacklightspeed/06-assert-openstacklightspeed-update.yaml
- Makefile
- test/kuttl/tests/update-openstacklightspeed/04-update-openstack-lightspeed-instance.yaml
- test/kuttl/tests/update-openstacklightspeed/05-assert-olsconfig-update.yaml
- internal/controller/funcs.go
✅ Files skipped from review due to trivial changes (18)
- internal/controller/assets/postgres.conf
- test/kuttl/common/openstack-lightspeed-instance/cleanup-openstack-lightspeed-instance.yaml
- kuttl-test.yaml
- test/kuttl/tests/update-openstacklightspeed/04-assert-llama-stack-config.yaml
- scripts/env.sh
- test/kuttl/tests/update-openstacklightspeed/03-assert-lightspeed-stack-config.yaml
- test/kuttl/tests/basic-openstack-lightspeed-configuration/03-assert-lightspeed-stack-config.yaml
- test/kuttl/common/mock-objects/errors-mock-objects.yaml
- test/kuttl/tests/basic-openstack-lightspeed-configuration/04-assert-llama-stack-config.yaml
- api/v1beta1/conditions.go
- test/kuttl/common/mock-objects/cleanup-mock-objects.yaml
- config/manager/manager.yaml
- scripts/gen-rhosls.sh
- hack/env.sh
- internal/controller/reconcile_tasks.go
- test/kuttl/tests/update-openstacklightspeed/06-update-openstack-lightspeed-instance.yaml
- internal/controller/errors.go
- internal/controller/constants.go
🚧 Files skipped from review as they are similar to previous changes (4)
- test/kuttl/common/openstack-lightspeed-instance/create-openstack-lightspeed-instance.yaml
- bundle/manifests/openstack-lightspeed-operator-manager-rolebinding_rbac.authorization.k8s.io_v1_rolebinding.yaml
- test/kuttl/common/mock-objects/mock-resources.yaml
- scripts/confirm-rhosls-running.sh
|
/test all |
Akrog
left a comment
There was a problem hiding this comment.
Code looks good to me, although there are a couple of things that would prefer to see changed.
I haven't actually tested the code.
umago
left a comment
There was a problem hiding this comment.
I skimmed thru it (it's a lot) and it looks good! This is just the first iteration, we will be going over these files as we implement more things so I don't expect we will polish everything in this PR. Thanks Lukas!
|
/lgtm |
Akrog
left a comment
There was a problem hiding this comment.
Please consolidate commits so we can have a nicer history.
Reuse the code from the openshift/lightspeed-operator that is responsible for the deployment of the lightspeed-stack. The operator is now capable of deploying the following containers: - PostgreSQL database - lightspeed-stack - llama-stack (running in server mode) Change the default namespace for the operator from openshift-lightspeed to openstack-lightspeed.
The migration to lightspeed-core changed how configuration is managed. Values from the OpenstackLightspeed CR now propagate to lightspeed-stack-config and llama-stack-config CMs instead of OLSConfig. Tests must verify this new propagation path.
Ensure that the build of the operator on the lcore-migration branch is pushed to quay.io with distinctive tag so taht we can differentiate between builds from lcore-migration and those from main.
3cacee6 to
0fb51dd
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Akrog, lpiwowar The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
2b255f2
into
openstack-lightspeed:lcore-migration
This PR introduces three changes:
As part of the
lightspeed-stackdeployment, neither the OpenShift console nor the Dataverse exporter is deployed. Only the core services are started:How to test?
To validate that
lightspeed-stackwas deployed successfully, you can run the following command:within a toolbox pod deployed with the following manifests:
Summary by CodeRabbit
New Features
Changes
Tests
Documentation