Checklist
I've searched for similar issues and couldn't find anything matching
I've included steps to reproduce the behavior
Affected Components
K8sGPT (CLI)
K8sGPT Operator
K8sGPT Version
v0.3.27
Kubernetes Version
v1.26.5
Host OS and its Version
Linux
Steps to reproduce
Run the following Go code as a container (one way to build and run it is sketched after these steps):

package main

import (
	"fmt"
	"os"
)

func main() {
	// arr is a nil slice, so indexing arr[0] panics at runtime
	var arr []int
	fmt.Fprintln(os.Stdout, arr[0])
}

Run the Pod analyzer:

k8sgpt analyze --filter=Pod

You will see the error:

default/go-panic-pod(go-panic-pod) - Error: back-off 2m40s restarting failed container=go-panic-container pod=go-panic-pod_default(60286b08-47b6-4e7a-be19-576d3e9e6f5d) - Error: the last termination reason is Error container=go-panic-container pod=go-panic-pod

Check the logs of the pod; they show the real error:

kubectl logs go-panic-pod
panic: runtime error: index out of range [0] with length 0
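For reference, here is one hypothetical way to package the program above and run it as a container in the cluster; the Dockerfile, image name, and registry are made up for this example and are not part of the original report:

# Assumes a Dockerfile in the current directory that builds the program above
# into a binary and sets it as the entrypoint; the image reference is fictitious.
docker build -t registry.example.com/go-panic:latest .
docker push registry.example.com/go-panic:latest
kubectl run go-panic-pod --image=registry.example.com/go-panic:latest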
Expected behaviour
In the Pod analyzer, when the pod is in CrashLoopBackOff, the message
back-off 5m0s restarting failed container=prometheus pod=prometheus-prometheus-kube-prometheus-0_
is fetched from the pod's CR. If the error were fetched from the pod's logs instead of from the CR message, we would know exactly what the problem is.
If it were fetched from the logs, in my case I would get:
level=error err="opening storage failed: open /prometheus/wal/00000828: no space left on device"
With this I can understand the cause of the CrashLoopBackOff much better.
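To make the request concrete, here is a minimal client-go sketch of how the last log lines of the previous (crashed) container instance could be fetched for a pod in CrashLoopBackOff. This is only an illustration, not k8sgpt's implementation; the helper name tailFailureLogs, the hard-coded namespace/pod, and the tail length are all made up for this example:

package main

import (
	"context"
	"fmt"
	"io"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// tailFailureLogs (hypothetical helper) returns the last few log lines of the
// previous, crashed instance of a container, which usually carries the real
// failure reason (e.g. a Go panic or "no space left on device").
func tailFailureLogs(ctx context.Context, cs kubernetes.Interface, ns, pod, container string) (string, error) {
	tail := int64(20)
	req := cs.CoreV1().Pods(ns).GetLogs(pod, &corev1.PodLogOptions{
		Container: container,
		Previous:  true, // logs of the last terminated instance, not the current back-off
		TailLines: &tail,
	})
	stream, err := req.Stream(ctx)
	if err != nil {
		return "", err
	}
	defer stream.Close()
	data, err := io.ReadAll(stream)
	return string(data), err
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Find containers waiting in CrashLoopBackOff and print their previous logs.
	pod, err := cs.CoreV1().Pods("default").Get(context.TODO(), "go-panic-pod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, st := range pod.Status.ContainerStatuses {
		if st.State.Waiting != nil && st.State.Waiting.Reason == "CrashLoopBackOff" {
			logs, err := tailFailureLogs(context.TODO(), cs, pod.Namespace, pod.Name, st.Name)
			if err != nil {
				fmt.Println("could not fetch logs:", err)
				continue
			}
			fmt.Printf("%s/%s container %s previous logs:\n%s\n", pod.Namespace, pod.Name, st.Name, logs)
		}
	}
}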
Actual behaviour
When the pod is in CrashLoopBackOff, the error message is fetched from the pod's CR, so I cannot see the exact reason why the pod is in CrashLoopBackOff.
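For reference, the waiting-state message that the Pod analyzer reports comes from the Pod's status; for the repro pod above it can be inspected directly with something like:

kubectl get pod go-panic-pod -o jsonpath='{.status.containerStatuses[0].state.waiting}'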
We have the optional log analyzer, which is still experimental and risky, since it sends logs, which may contain sensitive data, to your AI backend.
We don't want to expand the Pod analyzer to fetch logs from the pod, because logs are arbitrary for each workload and that would add unnecessary complexity to this analyzer.
The goal, at some point, is to have a way to compound the errors from analyzers and contextualize them, so that there is cohesion between them, rather than stretching individual analyzers.
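(For completeness, a sketch of how the optional log analyzer mentioned above could be tried; the "Log" filter name is an assumption here and should be verified with k8sgpt filters list:)

# List available analyzers, then enable the optional log analyzer (name assumed)
k8sgpt filters list
k8sgpt filters add Log
k8sgpt analyze --filter=Log --namespace=default --explain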
@arbreezy Thanks for that. We are unable to use a free OpenAI account for k8sgpt; can you provide some details about k8sgpt and the AI backend? We created a new OpenAI account, added the API key, and ran the command k8sgpt analyze --filter=Pod --namespace=default --explain. There is one unhealthy pod, but the command reports an exhausted quota. Do we need a paid account to access the AI for k8sgpt, or can we use a free account? If a free account works, what is its limit? Could you share more detail? We don't have a clear idea about this, and we contacted many people but got no responses.
Additional Information
Below is the real case mentioned above:
kubectl get pod -n tcl-monitoring
prometheus-prometheus-kube-prometheus-0 1/2 CrashLoopBackOff 704 (3m30s ago) 22d
When the pod is in CrashLoopBackOff, the following message is fetched as the error:
state:
  waiting:
    message: back-off 5m0s restarting failed container=prometheus pod=prometheus-prometheus-kube-prometheus-0_tcl-monitoring(29368fc9-fa1d-4b3d-9333-241acf0fbece)
    reason: CrashLoopBackOff
k8sgpt analyze --filter=Pod --namespace=tcl-monitoring --explain
0 tcl-monitoring/prometheus-prometheus-kube-prometheus-0(prometheus-prometheus-kube-prometheus)
kubectl logs -n tcl-monitoring prometheus-prometheus-kube-prometheus-0
ts=2024-04-10T06:59:00.178Z caller=main.go:1180 level=error err="opening storage failed: open /prometheus/wal/00000828: no space left on device"
"So we can get the error from the logs instead of getting from the CR message when the pod is in crashloopbackoff"