kube-mgmt doesn't reload configmaps if opa container restarts #189

Closed
alex0z1 opened this issue Mar 3, 2023 · 6 comments

alex0z1 commented Mar 3, 2023

I have the following configuration:


---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: opa
  namespace: opa
  name: opa
spec:
  replicas: 1
  selector:
    matchLabels:
      app: opa
  template:
    metadata:
      labels:
        app: opa
      name: opa
    spec:
      containers:
        # WARNING: OPA is NOT running with an authorization policy configured. This
        # means that clients can read and write policies in OPA. If you are
        # deploying OPA in an insecure environment, be sure to configure
        # authentication and authorization on the daemon. See the Security page for
        # details: https://www.openpolicyagent.org/docs/security.html.
        - name: opa
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 100m
              memory: 128Mi
          image: openpolicyagent/opa:0.49.2-static
          args:
            - "run"
            - "--server"
            - --disable-telemetry
            - "--tls-cert-file=/certs/tls.crt"
            - "--tls-private-key-file=/certs/tls.key"
            - "--addr=0.0.0.0:8443"
            - "--addr=http://127.0.0.1:8181"
            - --authentication=token
            - --authorization=basic
            - /policies/authz.rego
            - --ignore=.*
          volumeMounts:
            - readOnly: true
              mountPath: /certs
              name: opa-server
            - mountPath: /policies
              name: policies
              readOnly: true
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 3
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 3
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
        - name: kube-mgmt
          volumeMounts:
          - mountPath: /policies
            name: policies
            readOnly: true        
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 100m
              memory: 128Mi
          image: openpolicyagent/kube-mgmt:8.0.1
          args:
            - --replicate-cluster=v1/namespaces
            - --replicate=networking.k8s.io/v1/ingresses
            - --replicate=v1/services
            - --replicate=policy/v1/poddisruptionbudgets
            - --opa-auth-token-file=/policies/token
            - --require-policy-label=true
            - --log-level=debug
      volumes:
        - name: opa-server
          secret:
            secretName: opa-server
        - name: policies
          secret:
            secretName: policies

kube-mgmt loads the configmaps from the opa namespace during the first pod initialization, but if I kill the opa container (for instance by logging into the minikube node and running `pkill -f "opa run"`, or if the liveness probe fails for any reason), then kube-mgmt does not load the configmaps into the opa container anymore. I have to restart the pod (or kill the kube-mgmt container) or make some dummy changes to the configmaps.

As a result, the OPA container returns 404:

{"client_addr":"10.244.0.1:48693","level":"info","msg":"Sent response.","req_id":172,"req_method":"POST","req_path":"/","resp_bytes":86,"resp_duration":0.365959,"resp_status":404,"time":"2023-03-03T22:00:57Z"}

and the client gets:

k apply -f ingress-bad.yaml -n qa
Error from server (InternalError): error when creating "ingress-bad.yaml": Internal error occurred: failed calling webhook "validating-webhook.openpolicyagent.org": failed to call webhook: the server could not find the requested resource

Is there any known workaround for this? Maybe some health check for kube-mgmt that verifies OPA has rules loaded? Or is there a way to make kube-mgmt periodically push the configmaps into the OPA container's API?
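
For illustration, the "periodically push" idea could be hacked together as a small sidecar that re-PUTs a policy file to OPA's REST API on a timer. This is only a sketch, not kube-mgmt code; the policy file path and policy ID are made up, and it assumes the plaintext 127.0.0.1:8181 listener and token file from the deployment above:

```go
// Hypothetical workaround sidecar: re-PUT a Rego policy to OPA every minute so
// a restarted OPA container does not stay empty. The file path, policy ID and
// interval are illustrative; this is not part of kube-mgmt.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
	"time"
)

// pushPolicy uploads one policy file via OPA's REST API (PUT /v1/policies/<id>).
func pushPolicy(token []byte) error {
	rego, err := os.ReadFile("/policies/example.rego") // made-up policy file
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPut,
		"http://127.0.0.1:8181/v1/policies/example", bytes.NewReader(rego))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "text/plain")
	req.Header.Set("Authorization", "Bearer "+string(bytes.TrimSpace(token)))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	token, err := os.ReadFile("/policies/token") // same token file kube-mgmt uses
	if err != nil {
		fmt.Fprintln(os.Stderr, "no token:", err)
	}
	for {
		if err := pushPolicy(token); err != nil {
			fmt.Fprintln(os.Stderr, "push failed:", err)
		}
		time.Sleep(60 * time.Second)
	}
}
```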

alex0z1 (Author) commented Mar 4, 2023

Maybe changing this to a non-zero value: https://github.com/open-policy-agent/kube-mgmt/blob/8.0.1/pkg/configmap/configmap.go#L151

and then, in a new else branch here https://github.com/open-policy-agent/kube-mgmt/blob/8.0.1/pkg/configmap/configmap.go#L175-L182 (because the ConfigMap's resourceVersion doesn't change when OnUpdate is triggered by the informer's periodic resync), implementing a check that retrieves the policies from OPA and, if the result is empty, calls https://github.com/open-policy-agent/kube-mgmt/blob/8.0.1/pkg/configmap/configmap.go#L202
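
For reference, a rough client-go sketch of that idea (this is not kube-mgmt's actual code; `loadPolicy` and `opaHasPolicy` are hypothetical stand-ins for its calls against OPA's API):

```go
// Sketch only: a ConfigMap informer with a non-zero resync period. On a real
// update the resourceVersion changes; on a periodic resync it does not, which
// is the opening to check whether OPA still has the policy and re-load it.
package sketch

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// Hypothetical stand-ins for kube-mgmt's calls against OPA's REST API.
func loadPolicy(cm *v1.ConfigMap)        {}
func opaHasPolicy(cm *v1.ConfigMap) bool { return false }

func watchPolicies(clientset *kubernetes.Clientset) cache.Controller {
	lw := cache.NewListWatchFromClient(
		clientset.CoreV1().RESTClient(), "configmaps", "opa", fields.Everything())

	// 60*time.Second instead of 0: UpdateFunc now also fires periodically,
	// even when nothing changed in the cluster.
	_, controller := cache.NewInformer(lw, &v1.ConfigMap{}, 60*time.Second,
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) { loadPolicy(obj.(*v1.ConfigMap)) },
			UpdateFunc: func(oldObj, newObj interface{}) {
				oldCM, newCM := oldObj.(*v1.ConfigMap), newObj.(*v1.ConfigMap)
				if oldCM.ResourceVersion != newCM.ResourceVersion {
					loadPolicy(newCM) // a real change in the cluster
					return
				}
				// Periodic resync: the ConfigMap is unchanged, but OPA may
				// have restarted and lost it, so verify and re-load.
				if !opaHasPolicy(newCM) {
					loadPolicy(newCM)
				}
			},
		})
	return controller // caller runs controller.Run(stopCh)
}
```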

tehlers320 (Contributor) commented:

We see a similar issue too, and I did notice that the previous version we ran had a 60 there: v0.12.1...8.0.0#diff-6aa7780e80409d3ad0fb397be31e6f2d64ab520750d4317267f7138ebcee6606L146

mvaalexp commented Mar 8, 2023

I think these 2 might be the same problem:
#194

I think it's broken even with 1 replica, because when a rollout happens it brings up a new pod and the listener triggers on the existing pod.

Scenario:
1. The current deployment's pod 1 is healthy.
2. A new release rolls out; pod 2 comes up, fails, and the annotation is updated.
3. The existing pod 1's listener triggers; it is already fine, so it marks the status as ok.
4. Pod 2's listener triggers again, thinks everything is ok, and doesn't load the rule.

eshepelyuk (Contributor) commented:

Folks, if anyone is willing to work on this, I have some ideas on how to approach the issue.

alex0z1 (Author) commented May 31, 2023

I realized that caches need to be reloaded in addition to policies, so it is more complicated than I thought.

Maybe adding a liveness probe container to the pod could work: point kube-mgmt's liveness probe at the liveness container's health endpoint, and if the OPA container has no policies, the liveness container reports failure and kube-mgmt restarts.

Similar to this https://github.com/kubernetes-csi/livenessprobe
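
A rough sketch of what such a helper could look like, assuming it runs in the same pod, reuses the token file, and talks to the plaintext 127.0.0.1:8181 listener from the deployment above; the probe port and /healthz path are arbitrary:

```go
// Hypothetical liveness sidecar for the pod above: /healthz fails whenever OPA
// reports zero loaded policies, so a livenessProbe pointed at it would restart
// kube-mgmt (or the whole pod). Port, path and token handling are illustrative.
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"os"
	"strings"
)

// opaPolicyCount asks OPA's REST API (GET /v1/policies) how many policies are loaded.
func opaPolicyCount() (int, error) {
	token, err := os.ReadFile("/policies/token") // same token file kube-mgmt uses
	if err != nil {
		return 0, err
	}
	req, err := http.NewRequest(http.MethodGet, "http://127.0.0.1:8181/v1/policies", nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(string(token)))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var body struct {
		Result []json.RawMessage `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return 0, err
	}
	return len(body.Result), nil
}

func main() {
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if n, err := opaPolicyCount(); err != nil || n == 0 {
			http.Error(w, "opa has no policies loaded", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":9808", nil)) // probe port chosen arbitrarily
}
```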

eshepelyuk (Contributor) commented:

#210 and #211 can be implemented to address the bug
