-
I disagree with this assumption. I don't think we should start shipping Kubernetes debugging tools; that goes beyond running Cassandra in Kubernetes and feels like serious scope creep. There are already projects that do this, including metrics collection (such as Prometheus), log collection (such as Loki or Logstash), JVM debugging tools, and so on. A tool that inspects the running Cassandra cluster is a different case, but we should not go as far as creating Kubernetes anti-patterns just because something was done this way before Kubernetes. For example, diagnostic-collection mostly collects data that is not Cassandra related but rather DSE specific or node/VM specific. That is the job of the Kubernetes administrator, and not necessarily something the user even has access to, not to mention all the managed offerings of each cloud.
-
Thanks Michael. This would be for a running Cassandra cluster, and we would definitely have to better hone the collection around Cassandra and Kubernetes. Also, this would only be used internally, not by the user. It came about because Support was looking for an easy way to gather logs and configs from all of the running nodes at once, and we weren't aware of how to do it. Maybe there is already a way to do what we want in K8ssandra and we just need the education. We already have some customers running DSE in a Kubernetes environment with the cass-operator, so I'm not sure whether a Kubernetes-friendly script is what we need, or whether we just need to ramp up the Kubernetes training so we know how to quickly dump the info we'd want to see when diagnosing a problem.
-
The request definitely makes sense :) We basically want a k8s version of https://github.com/DataStax-Toolkit/diagnostic-collection. At the end of the day it may be entirely different, but it is similar in that it is a tool to collect logs, metrics, and other diagnostic info to help troubleshoot and debug problems with a Cassandra cluster. @paayers would you mind if I convert this to a discussion?
-
Not at all John, whatever works best. I'm all for it.
-
Replicated has a spec for building application diagnostic tools that export "support bundles" (https://troubleshoot.sh/docs/). I don't know whether it would be useful to follow some of those paradigms for this issue.
-
On a k8ssandra cluster today I had to git clone the diag collection toolkit, kubectl cp collect_node_diag.sh to the pod, collect the diag info, and then kubectl cp the result off again. That's painful to do for each node in a cluster, and collect_diag.sh doesn't work because it uses scp.
The collection is helpful for outside troubleshooting because we get the Cassandra config files, logs, schema, and OS and kernel-level info.
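
The per-node steps above (copy the script in, run it, copy the result out) could be looped over every pod. The following is only a rough sketch of that idea, not an existing K8ssandra feature. It assumes the container is named `cassandra`, that it has `bash` and `tar` available (kubectl cp needs `tar` in the container), and that the script writes its archive to the hypothetical path passed as `remote_output`; check the script's real output location before relying on it.

```python
import os
import subprocess


def node_diag_commands(pod, namespace, script="collect_node_diag.sh",
                       remote_output="/tmp/diag.tar.gz", container="cassandra"):
    """Build the three kubectl commands for one pod: copy in, run, copy out.

    remote_output and container are assumptions, not values taken from the
    diagnostic-collection toolkit or from K8ssandra.
    """
    base = ["kubectl", "-n", namespace]
    remote_script = "/tmp/" + os.path.basename(script)
    return [
        # 1. copy the collection script into the pod
        base + ["cp", script, f"{pod}:{remote_script}", "-c", container],
        # 2. run it inside the Cassandra container
        base + ["exec", pod, "-c", container, "--", "bash", remote_script],
        # 3. copy the resulting archive back, named after the pod
        base + ["cp", f"{pod}:{remote_output}", f"./{pod}-diag.tar.gz",
                "-c", container],
    ]


def list_pods(namespace, selector):
    """Return the names of all pods matching a label selector."""
    result = subprocess.run(
        ["kubectl", "-n", namespace, "get", "pods", "-l", selector,
         "-o", "jsonpath={.items[*].metadata.name}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def collect_all(namespace, selector):
    """Run the copy/run/copy sequence on every matching pod, one at a time."""
    for pod in list_pods(namespace, selector):
        for cmd in node_diag_commands(pod, namespace):
            subprocess.run(cmd, check=True)
```

Something like `collect_all("k8ssandra", "cassandra.datastax.com/cluster=demo")` would then replace the manual per-node work; the namespace and label selector here are placeholders and have to match the actual cluster.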
-
If we could implement a diagnostic collection for k8ssandra that runs similarly to what we already have, it would help Support better handle k8ssandra problems:
https://github.com/DataStax-Toolkit/diagnostic-collection
The collection would also need to include anything that helps diagnose a problem in a Kubernetes environment.
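
As one possible starting point for the Kubernetes side, the sketch below gathers pod status, pod descriptions, events, and container logs with plain kubectl calls and writes each to a file. Which items are actually worth collecting is an open question; the file names, the `cassandra` container name, and the output directory are all assumptions made for illustration.

```python
import pathlib
import subprocess


def k8s_diag_commands(namespace, pods, container="cassandra"):
    """Map an output file name to the kubectl command that produces it.

    The container name and file names are illustrative assumptions.
    """
    base = ["kubectl", "-n", namespace]
    cmds = {
        "pods.txt": base + ["get", "pods", "-o", "wide"],
        "describe-pods.txt": base + ["describe", "pods"],
        "events.txt": base + ["get", "events", "--sort-by=.lastTimestamp"],
    }
    for pod in pods:
        # stdout/stderr of the Cassandra container in each pod
        cmds[f"{pod}.log"] = base + ["logs", pod, "-c", container]
    return cmds


def write_bundle(cmds, out_dir="k8s-diag"):
    """Run each command and save its output under out_dir."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, cmd in cmds.items():
        # check=False so one failing command does not abort the whole bundle
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        (out / name).write_text(result.stdout + result.stderr)
```

The commands are built separately from being run so the list can be reviewed, or trimmed to what a support engineer is permitted to access, before anything touches the cluster.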