
question: Could you explain how you treat cluster data? #139

Closed
1 task done
juan-vg opened this issue Mar 29, 2023 · 5 comments
Labels
question Further information is requested

Comments

juan-vg commented Mar 29, 2023

Checklist:

  • I've searched for similar issues and couldn't find anything matching

Subject of the issue

I would like to know how my private cluster data is treated. I would not like to disclose any personal info to GPT/OpenAI, and I couldn't find any information about this topic in the README.

Is the collected data anonymized before it is sent to GPT? Which data is collected? Will my private and sensitive clusters be kept safe, and how? Is this tool GDPR compliant? Etc.

AlexsJones commented Mar 29, 2023

Thank you, this is an important question. Let me answer it.

k8sgpt analyze

This connects to the Kubernetes API server and only looks at Conditions and Status messages on objects.
There is no AI used in this step at all.

k8sgpt analyze --explain

This will still forward an aggregated parcel of information which may contain personally identifiable information, e.g. "error back off container/alexsjones:latest is failing!".

If you have any doubts or worries, or if you do not trust OpenAI, I would not recommend using --explain.
Although everything is sent over TLS, I cannot speak to their data retention policy beyond the document I found here.

As for k8sgpt itself, we store nothing aside from ~/.k8sgpt.yaml, which only caches the results of --explain on your local machine if you choose to use it.

I hope this helps.

AlexsJones added the question label Mar 29, 2023
juan-vg commented Mar 29, 2023

I see. Thank you for clarifying it!

Regarding the OpenAI data retention policy, it occurs to me that some kind of data anonymization could be applied before sending the data to them. It would be very useful to implement this, or to allow a plugin to do so. From my point of view it's not about trusting OpenAI or not, but about preventing them from using that sensitive data in their future training.

AlexsJones commented:
I think this could be a game changer. I know that Google has a "Data Loss Prevention API" in GCP.
I wonder if we could start simply with regex and build it out into a module?

juan-vg commented Mar 31, 2023

I've been playing with ChatGPT, providing some outputs from k8sgpt and asking for a Go script that automatically detects sensitive data and transforms it into random characters while preserving the same word/sentence structure. I based the detection on word entropy, but I realize that's not the way to go (it works better for detecting random character strings, as typically used in passwords and tokens). Could you provide examples of each message k8sgpt could output, so we can feed ChatGPT better training data for the detection?

AlexsJones commented:
Just to round out this issue, we will add a task to clarify this in the documentation.

As for examples, here are some, though none of my workloads directly expose PII in their error strings:

0 argocd/argocd-application-controller-0(StatefulSet/argocd-application-controller)                                   
- Error: back-off 5m0s restarting failed container=argocd-application-controller-instrumentation pod=argocd-application-controller-0_argocd(1748b1c2-28b2-40d9-b58e-9906220fbb0d)


1 default/deathstar-5b559d699b-smvrn(Deployment/deathstar)
- Error: Back-off pulling image "docker.io/cilium/starwaraaes"


2 foo/alpha-6c85f869-d7tbq(Deployment/alpha)
- Error: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?): unknown


3 observability/loki-read-0(StatefulSet/loki-read)
- Error: 0/5 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had volume node affinity conflict, 3 node(s) were unschedulable. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..


4 observability/loki-read-1(StatefulSet/loki-read)
- Error: 0/5 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had volume node affinity conflict, 3 node(s) were unschedulable. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..


5 observability/loki-write-0(StatefulSet/loki-write)
- Error: 0/5 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had volume node affinity conflict, 3 node(s) were unschedulable. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..


6 observability/loki-write-1(StatefulSet/loki-write)
- Error: 0/5 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had volume node affinity conflict, 3 node(s) were unschedulable. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..


7 observability/prometheus-observability-kube-prometh-prometheus-0(prometheus-observability-kube-prometh-prometheus)
- Error: 0/5 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had volume node affinity conflict, 3 node(s) were unschedulable. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..


8 argocd/argocd-applicationset-controller-5c5496c549-7ppk2(Deployment/argocd-applicationset-controller)
- Error: back-off 5m0s restarting failed container=argocd-applicationset-controller-instrumentation pod=argocd-applicationset-controller-5c5496c549-7ppk2_argocd(993f24eb-f345-4199-a0c1-8cced885716d)


9 argocd/argocd-dex-server-5fcdf867b7-mv496(Deployment/argocd-dex-server)
- Error: back-off 5m0s restarting failed container=dex-instrumentation pod=argocd-dex-server-5fcdf867b7-mv496_argocd(32e8f49e-88f0-4e54-8408-ea3162292f7d)


10 argocd/argocd-repo-server-5785865bd8-pgtc7(Deployment/argocd-repo-server)
- Error: back-off 5m0s restarting failed container=argocd-repo-server-instrumentation pod=argocd-repo-server-5785865bd8-pgtc7_argocd(66078850-2252-4230-97aa-c22fe98dff8d)


11 argocd/argocd-server-59677d6f74-n2p5m(Deployment/argocd-server)
- Error: back-off 5m0s restarting failed container=argocd-server-instrumentation pod=argocd-server-59677d6f74-n2p5m_argocd(060722da-5f30-41f9-9895-8c375abe2034)


12 default/deathstar(deathstar)
- Error: Service has not ready endpoints, pods: [Pod/deathstar-5b559d699b-smvrn], expected 1


13 argocd/argocd-applicationset-controller(argocd-applicationset-controller)
- Error: Service has not ready endpoints, pods: [Pod/argocd-applicationset-controller-5c5496c549-7ppk2], expected 1


14 argocd/argocd-server(argocd-server)
- Error: Service has not ready endpoints, pods: [Pod/argocd-server-59677d6f74-n2p5m], expected 1


15 cats/cats(cats)
- Error: Service has no endpoints, expected label app.kubernetes.io/instance=cats
- Error: Service has no endpoints, expected label app.kubernetes.io/name=cats


16 argocd/argocd-server-metrics(argocd-server-metrics)
- Error: Service has not ready endpoints, pods: [Pod/argocd-server-59677d6f74-n2p5m], expected 1


17 khole-system/khole-controller-manager-metrics-service(khole-controller-manager-metrics-service)
- Error: Service has no endpoints, expected label control-plane=controller-manager


18 metallb-system/metallb-webhook-service(metallb-webhook-service)
- Error: Service has no endpoints, expected label app.kubernetes.io/component=controller
- Error: Service has no endpoints, expected label app.kubernetes.io/instance=metallb
- Error: Service has no endpoints, expected label app.kubernetes.io/name=metallb


19 observability/observability-kube-prometh-prometheus(observability-kube-prometh-prometheus)
- Error: Service has no endpoints, expected label app.kubernetes.io/name=prometheus
- Error: Service has no endpoints, expected label prometheus=observability-kube-prometh-prometheus


20 observability/prometheus-operated(prometheus-operated)
- Error: Service has no endpoints, expected label app.kubernetes.io/name=prometheus


21 argocd/argocd-dex-server(argocd-dex-server)
- Error: Service has not ready endpoints, pods: [Pod/argocd-dex-server-5fcdf867b7-mv496], expected 1


22 argocd/argocd-metrics(argocd-metrics)
- Error: Service has not ready endpoints, pods: [Pod/argocd-application-controller-0], expected 1


23 argocd/argocd-repo-server(argocd-repo-server)
- Error: Service has not ready endpoints, pods: [Pod/argocd-repo-server-5785865bd8-pgtc7], expected 1

fyuan1316 pushed a commit to fyuan1316/k8sgpt that referenced this issue Jun 26, 2023
Signed-off-by: Aris Boutselis <arisboutselis08@gmail.com>
Co-authored-by: Aris Boutselis <arisboutselis08@gmail.com>