From a67d2fc0e03c0964e79898005489fe19026044f9 Mon Sep 17 00:00:00 2001 From: Alejandro Acevedo Date: Fri, 14 Nov 2025 17:24:37 +0100 Subject: [PATCH 01/14] STAC-23751: Document procedure to lower retention on SG and recover data immediately --- .../setup/data-management/data_retention.adoc | 48 +++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc index 2f8d1d95..bb232e7a 100644 --- a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc +++ b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc @@ -24,6 +24,54 @@ Note that by adding more time to the data retention period, the amount of data s When lowering the retention period, it can take some time until disk space is freed up (at least 15 minutes). +=== Troubleshooting topology disk space issues. +In case of running into disk space issues, we usually find in the namenode a log line like `Not enough replicas was chosen. Reason: {NOT_ENOUGH_STORAGE_SPACE=1`, to deal with this scenario: + +* Lower the retention and prepare the instance to recover disk space immediately. Trigger a helm upgrade with: +[,yaml] +---- +stackstate: + topology: + # Retention set to 1 week in case you are running with the default 1 month + retentionHours: 144 +hbase: + console: + enabled: true + replicaCount: 1 + hdfs: + datanode: + extraEnv: + open: + HDFS_CONF_dfs_datanode_du_reserved_pct: "0" +---- + +[NOTE] +==== +Wait until all the hbase and hdfs pods are stable before moving on to the next step. +==== + +* Trigger the compaction of historic data +[,bash] +---- +kubectl exec -t --namespace suse-observability $(kubectl get pods --namespace suse-observability --no-headers | grep "console" | awk '{print $1}' | head -n 1) -- /bin/bash -c "stackgraph-console run println\(retention.removeExpiredDataImmediately\(\)\)" +---- + +* Follow the progress using +---- +kubectl exec -t --namespace suse-observability $(kubectl get pods --namespace suse-observability --no-headers | grep "console" | awk '{print $1}' | head -n 1) -- /bin/bash -c "stackgraph-console run println\(retention.removeExpiredDataImmediatelyStatus\(\)\)" +---- + +* Contact support to analyze why the budgeted disk space was insufficient. + +* Restore the settings. Once the status is no longer inProgress `Status(inProgress = false, lastFailure = null)` trigger a helm upgrade just preserving the new retention as part of your values. 
+[,yaml] ---- stackstate: topology: # Retention set to 1 week in case you are running with the default 1 month retentionHours: 144 ---- == Retention of events and logs === SUSE Observability data store From 40a132f06933045301dcf1e3d04d9113545df92c Mon Sep 17 00:00:00 2001 From: Alejandro Acevedo Date: Mon, 17 Nov 2025 11:58:48 +0100 Subject: [PATCH 02/14] Update docs/latest/modules/en/pages/setup/data-management/data_retention.adoc Co-authored-by: akashraj4261 --- .../modules/en/pages/setup/data-management/data_retention.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc index bb232e7a..2239c739 100644 --- a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc +++ b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc @@ -24,7 +24,7 @@ Note that by adding more time to the data retention period, the amount of data s When lowering the retention period, it can take some time until disk space is freed up (at least 15 minutes). -=== Troubleshooting topology disk space issues. +=== Troubleshooting topology disk space issues In case of running into disk space issues, we usually find in the namenode a log line like `Not enough replicas was chosen. Reason: {NOT_ENOUGH_STORAGE_SPACE=1`, to deal with this scenario: * Lower the retention and prepare the instance to recover disk space immediately. Trigger a helm upgrade with: From 4c5861c9abf32403b5e4368f2d6a4eca1a61b55f Mon Sep 17 00:00:00 2001 From: Alejandro Acevedo Date: Mon, 17 Nov 2025 11:59:05 +0100 Subject: [PATCH 03/14] Update docs/latest/modules/en/pages/setup/data-management/data_retention.adoc Co-authored-by: akashraj4261 --- .../modules/en/pages/setup/data-management/data_retention.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc index 2239c739..cbd7683d 100644 --- a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc +++ b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc @@ -25,7 +25,7 @@ Note that by adding more time to the data retention period, the amount of data s When lowering the retention period, it can take some time until disk space is freed up (at least 15 minutes). === Troubleshooting topology disk space issues -In case of running into disk space issues, we usually find in the namenode a log line like `Not enough replicas was chosen. Reason: {NOT_ENOUGH_STORAGE_SPACE=1`, to deal with this scenario: +In case of running into disk space issues, a log line - `Not enough replicas was chosen. Reason: {NOT_ENOUGH_STORAGE_SPACE=1` appears in the namenode. Follow the steps below to deal with this scenario: * Lower the retention and prepare the instance to recover disk space immediately. 
Trigger a helm upgrade with: [,yaml] From cb83d45aaa58e44be662e641a5d41eeec5b5f5a1 Mon Sep 17 00:00:00 2001 From: Alejandro Acevedo Date: Mon, 17 Nov 2025 11:59:12 +0100 Subject: [PATCH 04/14] Update docs/latest/modules/en/pages/setup/data-management/data_retention.adoc Co-authored-by: akashraj4261 --- .../modules/en/pages/setup/data-management/data_retention.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc index cbd7683d..e49608a7 100644 --- a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc +++ b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc @@ -27,7 +27,7 @@ When lowering the retention period, it can take some time until disk space is fr === Troubleshooting topology disk space issues In case of running into disk space issues, a log line - `Not enough replicas was chosen. Reason: {NOT_ENOUGH_STORAGE_SPACE=1` appears in the namenode. Follow the below steps to deal with this scenario: -* Lower the retention and prepare the instance to recover disk space immediately. Trigger a helm upgrade with: +* Lower the retention, prepare the instance to recover disk space immediately, and trigger a helm upgrade: [,yaml] ---- stackstate: From cc7d03585d3441087a6f68e28633aeea408db1f2 Mon Sep 17 00:00:00 2001 From: Alejandro Acevedo Date: Mon, 17 Nov 2025 11:59:18 +0100 Subject: [PATCH 05/14] Update docs/latest/modules/en/pages/setup/data-management/data_retention.adoc Co-authored-by: akashraj4261 --- .../modules/en/pages/setup/data-management/data_retention.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc index e49608a7..15005fbf 100644 --- a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc +++ b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc @@ -56,7 +56,7 @@ Wait until all the hbase and hdfs pods are stable before moving on to the next s kubectl exec -t --namespace suse-observability $(kubectl get pods --namespace suse-observability --no-headers | grep "console" | awk '{print $1}' | head -n 1) -- /bin/bash -c "stackgraph-console run println\(retention.removeExpiredDataImmediately\(\)\)" ---- -* Follow the progress using +* Follow the progress using: ---- kubectl exec -t --namespace suse-observability $(kubectl get pods --namespace suse-observability --no-headers | grep "console" | awk '{print $1}' | head -n 1) -- /bin/bash -c "stackgraph-console run println\(retention.removeExpiredDataImmediatelyStatus\(\)\)" ---- From c87ad41e38e65890c4450d9f9d4e4a9bfd8d9eaf Mon Sep 17 00:00:00 2001 From: Alejandro Acevedo Date: Mon, 17 Nov 2025 11:59:26 +0100 Subject: [PATCH 06/14] Update docs/latest/modules/en/pages/setup/data-management/data_retention.adoc Co-authored-by: akashraj4261 --- .../modules/en/pages/setup/data-management/data_retention.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc index 15005fbf..a4b8e9c8 100644 --- a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc +++ b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc @@ -61,7 +61,7 @@ kubectl exec -t --namespace 
suse-observability $(kubectl get pods --namespace suse-observability --no-headers | grep "console" | awk '{print $1}' | head -n 1) -- /bin/bash -c "stackgraph-console run println\(retention.removeExpiredDataImmediatelyStatus\(\)\)" ---- -* Contact support to analyze why the budgeted disk space was insufficient. +* In case the budgeted disk space is insufficient, contact support. * Restore the settings. Once the status is no longer inProgress `Status(inProgress = false, lastFailure = null)` trigger a helm upgrade just preserving the new retention as part of your values. [,yaml] From 5907594b724694a0d1ae1af78ac00787912fa4f5 Mon Sep 17 00:00:00 2001 From: Alejandro Acevedo Date: Mon, 17 Nov 2025 11:59:32 +0100 Subject: [PATCH 07/14] Update docs/latest/modules/en/pages/setup/data-management/data_retention.adoc Co-authored-by: akashraj4261 --- .../modules/en/pages/setup/data-management/data_retention.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc index a4b8e9c8..36f0ca68 100644 --- a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc +++ b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc @@ -50,7 +50,7 @@ hbase: Wait until all the hbase and hdfs pods are stable before moving on to the next step. ==== -* Trigger the compaction of historic data +* Trigger the compaction of historic data: [,bash] ---- kubectl exec -t --namespace suse-observability $(kubectl get pods --namespace suse-observability --no-headers | grep "console" | awk '{print $1}' | head -n 1) -- /bin/bash -c "stackgraph-console run println\(retention.removeExpiredDataImmediately\(\)\)" From 7e6484a76d6d9824f74661777923ab7ade596e66 Mon Sep 17 00:00:00 2001 From: Alejandro Acevedo Date: Mon, 17 Nov 2025 12:01:10 +0100 Subject: [PATCH 08/14] Update docs/latest/modules/en/pages/setup/data-management/data_retention.adoc Co-authored-by: akashraj4261 --- .../modules/en/pages/setup/data-management/data_retention.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc index 36f0ca68..dc02eb18 100644 --- a/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc +++ b/docs/latest/modules/en/pages/setup/data-management/data_retention.adoc @@ -63,7 +63,7 @@ kubectl exec -t --namespace suse-observability $(kubectl get pods --namespace s * In case the budgeted disk space is insufficient, contact support. -* Restore the settings. Once the status is no longer inProgress `Status(inProgress = false, lastFailure = null)` trigger a helm upgrade just preserving the new retention as part of your values. +* Restore the settings. Once the status is no longer in progress - `Status(inProgress = false, lastFailure = null)`, trigger a helm upgrade, preserving the new retention as part of your values. 
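Before restoring the settings, you can optionally verify that disk space was actually reclaimed. A minimal sketch, assuming the HDFS namenode pod carries the `app.kubernetes.io/component=hdfs-nn` label (the same label the log collector script uses later in this series):

[,bash]
----
# Print the HDFS capacity report (Configured Capacity / DFS Used / DFS Remaining)
NN_POD=$(kubectl get pods --namespace suse-observability -l app.kubernetes.io/component=hdfs-nn -o jsonpath='{.items[0].metadata.name}')
kubectl exec -t --namespace suse-observability "$NN_POD" -c namenode -- bash -c "unset HADOOP_OPTS; hdfs dfsadmin -report"
----

Keep only the lowered retention in your values: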
[,yaml] ---- stackstate: From bd10747abc6e105acea8654b459bde1c84f69ae4 Mon Sep 17 00:00:00 2001 From: Bram Schuur Date: Wed, 19 Nov 2025 16:49:17 +0100 Subject: [PATCH 09/14] STAC-23748: Add workload observer to logs collector --- .../suse-observability_logs_collector.sh | 27 +++++++++++++++---- 1 file changed, 22 insertions(+), 5 deletions(-) mode change 100644 => 100755 docs/latest/modules/en/attachments/suse-observability_logs_collector.sh diff --git a/docs/latest/modules/en/attachments/suse-observability_logs_collector.sh b/docs/latest/modules/en/attachments/suse-observability_logs_collector.sh old mode 100644 new mode 100755 index 8bfe30c2..bf3ea5f3 --- a/docs/latest/modules/en/attachments/suse-observability_logs_collector.sh +++ b/docs/latest/modules/en/attachments/suse-observability_logs_collector.sh @@ -59,10 +59,14 @@ do done # Check if KUBECONFIG is set -if [[ -z "$KUBECONFIG" || ! -f "$KUBECONFIG" ]]; then - echo "Error: KUBECONFIG is not set. Please ensure KUBECONFIG is set to the path of a valid kubeconfig file before running this script." - echo "If kubeconfig is not set, use the command: export KUBECONFIG=PATH-TO-YOUR/kubeconfig. Exiting..." - exit 1 +if ! kubectl config current-context > /dev/null; then + echo "Error: Could not find kubernetes cluster to connect to." + echo "Please ensure KUBECONFIG is set to the path of a valid kubeconfig file before running this script." + echo "If kubeconfig is not set, use the command: export KUBECONFIG=PATH-TO-YOUR/kubeconfig. Exiting..." + exit 1 +else + CONTEXT=$(kubectl config current-context) + echo "Retrieving logs from kubernetes context: $CONTEXT" fi # Check if namespace exist or not @@ -71,7 +75,7 @@ if ! kubectl get namespace "$NAMESPACE" &>/dev/null; then exit 1 fi # Directory to store logs -OUTPUT_DIR="${NAMESPACE}_logs_$(date +%Y%m%d%H%M%S)" +OUTPUT_DIR="${NAMESPACE}_logs_$(date -u +%Y-%m-%d_%H-%M-%SZ)" ARCHIVE_FILE="${OUTPUT_DIR}.tar.gz" techo() { @@ -247,6 +251,18 @@ EOF kill $CHILD } +collect_workload_observer_data() { + techo "Collecting workload observer data..." + POD=$(kubectl -n "$NAMESPACE" get pod -l app.kubernetes.io/component=workload-observer -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) + if [ "$POD" == "" ]; then + techo "INFO: No workload observer pod found, skipping" + return + fi + + mkdir -p "$OUTPUT_DIR/workload-observer-data" + kubectl -n "$NAMESPACE" cp "$POD:/report-data" "$OUTPUT_DIR/workload-observer-data/" > /dev/null 2>&1 & +} + archive_and_cleanup() { echo "Creating archive $ARCHIVE_FILE..." 
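  # Bundle the collected output directory into a single gzip-compressed tarball before cleanup (hence the function name archive_and_cleanup)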
tar -czf "$ARCHIVE_FILE" "$OUTPUT_DIR" @@ -304,6 +320,7 @@ kubectl -n "$NAMESPACE" get events --sort-by='.metadata.creationTimestamp' > "$O collect_pod_logs collect_pod_disk_usage collect_yaml_configs +collect_workload_observer_data if [ $HELM_RELEASES ]; then collect_helm_releases fi From d85912397fcec1b8b409d96772898205ef476c4e Mon Sep 17 00:00:00 2001 From: Bram Schuur Date: Thu, 20 Nov 2025 15:45:03 +0100 Subject: [PATCH 10/14] STAC-23748: Add workload observer to ack persistent volumes --- .../setup/install-stackstate/kubernetes_openshift/ack.adoc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc b/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc index 3fe6f0fd..8a8f204d 100644 --- a/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc +++ b/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc @@ -34,6 +34,9 @@ stackstate: vmagent: persistence: size: 20Gi + workloadObserver: + persistence: + size: 20Gi features: storeTransactionLogsToPVC: volumeSize: 20Gi From 756d2acf08d0ff21d35812e94282467441d1f12d Mon Sep 17 00:00:00 2001 From: Bram Schuur Date: Thu, 20 Nov 2025 15:47:29 +0100 Subject: [PATCH 11/14] STAC-23748: Also fix tephra --- .../setup/install-stackstate/kubernetes_openshift/ack.adoc | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc b/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc index 8a8f204d..728e319b 100644 --- a/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc +++ b/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc @@ -15,6 +15,10 @@ We provide a dedicated set of Helm values that adjusts all volume sizes to meet zookeeper: persistence: size: 20Gi +hbase: + tephra: + persistence: + size: 2oGi stackstate: components: checks: From b751b4eb7469ac42cffdd63a337186363614394f Mon Sep 17 00:00:00 2001 From: Bram Schuur Date: Thu, 20 Nov 2025 16:18:29 +0100 Subject: [PATCH 12/14] STAC-23748: Typo --- .../setup/install-stackstate/kubernetes_openshift/ack.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc b/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc index 728e319b..51cd5d99 100644 --- a/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc +++ b/docs/latest/modules/en/pages/setup/install-stackstate/kubernetes_openshift/ack.adoc @@ -18,7 +18,7 @@ zookeeper: hbase: tephra: persistence: - size: 2oGi + size: 20Gi stackstate: components: checks: From 93340ceb4c5f93295353dfc087593bdb0f5c29b1 Mon Sep 17 00:00:00 2001 From: Daniel Barra <188492274+dmbarrasuse@users.noreply.github.com> Date: Tue, 25 Nov 2025 08:03:42 -0300 Subject: [PATCH 13/14] Merging staging and main to create release notes (#131) * STAC-23822 Add hdfs status report to log collection * STAC-23822 Fix boolean logic for ES and helm values collection * STAC-23822 Add collection of hbase and hdfs reports * Only keep the relevant parts of the helm config * STAC-0: Some clarification around sizing * Bump product-docs-common from `4dc90cb` to `c31fda2` Bumps [product-docs-common](https://github.com/rancher/product-docs-common) from `4dc90cb` to `c31fda2`. 
- [Commits](https://github.com/rancher/product-docs-common/compare/4dc90cb6d651991b3e5f5b1de1f22e955fa29c54...c31fda22fd075ddf1eb97b92045d05db8f8a38a7) --- updated-dependencies: - dependency-name: product-docs-common dependency-version: c31fda22fd075ddf1eb97b92045d05db8f8a38a7 dependency-type: direct:production ... Signed-off-by: dependabot[bot] * Update docs/latest/modules/en/pages/k8s-suse-rancher-prime.adoc Co-authored-by: akashraj4261 * Update docs/latest/modules/en/pages/k8s-suse-rancher-prime.adoc Co-authored-by: akashraj4261 * Update docs/latest/modules/en/pages/k8s-suse-rancher-prime.adoc Co-authored-by: akashraj4261 * STAC-23583: rewrite rbac roles description in easier digestible format (#123) * STAC-23583: rewrite rbac roles description in easier digestible format * STAC-23583: try to remove potentially confusing phrases * Apply suggestions from code review Co-authored-by: akashraj4261 --------- Co-authored-by: akashraj4261 * Aligned the local attributes with the global attributes. Signed-off-by: akashraj4261 * Fix api endpoints documentation to avoid confusion * Fix mistake * Fix condition * Fix condition Co-authored-by: rb3ckers --------- Signed-off-by: dependabot[bot] Signed-off-by: akashraj4261 Co-authored-by: Remco Beckers Co-authored-by: Bram Schuur Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Bram Schuur Co-authored-by: akashraj4261 Co-authored-by: Frank van Lankvelt --- .../suse-observability_logs_collector.sh | 59 ++++++++++-- .../en/pages/k8s-suse-rancher-prime.adoc | 17 ++-- .../getting-started-k8s-operator.adoc | 2 +- .../getting-started/getting-started-k8s.adoc | 2 +- .../getting-started-lambda.adoc | 2 +- .../getting-started-linux.adoc | 2 +- .../en/pages/setup/otel/otlp-apis.adoc | 12 ++- .../setup/security/authentication/oidc.adoc | 4 +- .../setup/security/rbac/rbac_rancher.adoc | 95 +++++++++++++------ product-docs-common | 2 +- ss-local-playbook.yml | 6 +- ss-remote-playbook.yml | 4 +- 12 files changed, 149 insertions(+), 58 deletions(-) diff --git a/docs/latest/modules/en/attachments/suse-observability_logs_collector.sh b/docs/latest/modules/en/attachments/suse-observability_logs_collector.sh index bf3ea5f3..61b7e6e6 100755 --- a/docs/latest/modules/en/attachments/suse-observability_logs_collector.sh +++ b/docs/latest/modules/en/attachments/suse-observability_logs_collector.sh @@ -1,6 +1,6 @@ #!/bin/bash -ELASTICSEARCH_LOGS=0 +ELASTICSEARCH_LOGS=false ELASTICSEARCH_RANGE="7d" while getopts "her:" option; do case $option in @@ -23,7 +23,7 @@ options: EOF exit 0;; e) # Collect elasticsearch logs - ELASTICSEARCH_LOGS=1;; + ELASTICSEARCH_LOGS=true;; r) # Time range for elasticsearch logs ELASTICSEARCH_RANGE=$OPTARG;; \?) # Invalid option @@ -49,12 +49,12 @@ for cmd in ${COMMANDS[@]}; do done # skip helm release analysis when not all its dependencies are present -HELM_RELEASES=1 +HELM_RELEASES=true for cmd in base64 gzip jq do if ! command -v $cmd &>/dev/null; then echo "$cmd is not installed. Skipping analysis of helm releases." - HELM_RELEASES=0 + HELM_RELEASES=false fi done @@ -134,11 +134,25 @@ collect_pod_disk_usage() { collect_helm_releases() { techo "Collecting helm releases..." mkdir -p "$OUTPUT_DIR/releases" + + # Restrict keys extracted from Helm values to only this include-list to avoid including any sensitive values + included_keys='["resources", "affinity", "nodeSelector", "tolerations"]' + + # 1. --argjson keys "$included_keys": Passes the shell variable as a JSON array $keys. + # 2. .
as $input: Saves the entire original JSON into a variable $input. + # 3. [ paths | ... ]: Gathers all paths from the JSON. + # 4. select(.[-1] as $last | $keys | index($last)): Selects only paths where + # the last element (.[-1]) is found inside the $keys array. + # 5. reduce .[] as $p (null; ...): Starts with an empty (null) document + # and iterates over every path ($p) that was selected. + # 6. setpath($p; $input | getpath($p)): For each path, it sets that path + # in the *new* document, pulling the *value* from the original $input. + RELEASES=$(kubectl -n "$NAMESPACE" get secrets -l owner=helm -o jsonpath="{.items[*].metadata.name}") for release in $RELEASES; do kubectl -n "$NAMESPACE" get secret "$release" -o jsonpath='{.data.release}' | \ base64 --decode | base64 --decode | gzip -d | \ - jq '{ info: .info, metadata: .chart.metadata, config: .config }' > "$OUTPUT_DIR/releases/$release" + jq --argjson keys "$included_keys" '{ info: .info, metadata: .chart.metadata, config: ( .config as $input | [ .config | paths | select(.[-1] as $last | $keys | index($last)) ] | reduce .[] as $p (null; setpath($p; $input | getpath($p)))) }' > "$OUTPUT_DIR/releases/$release" done } @@ -251,6 +265,35 @@ EOF kill $CHILD } +collect_hdfs_report() { + POD=$(kubectl -n "$NAMESPACE" get pod -l app.kubernetes.io/component=hdfs-nn -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) || true + if [ -n "$POD" ]; then + techo "Collecting HDFS report..." + mkdir -p "$OUTPUT_DIR/reports" + kubectl exec -n "$NAMESPACE" "$POD" -c namenode -- bash -c "unset HADOOP_OPTS; hdfs dfsadmin -report" > "$OUTPUT_DIR/reports/hdfs.log" + fi +} + +collect_hbase_report() { + POD=$(kubectl -n "$NAMESPACE" get pod -l app.kubernetes.io/component=hbase-master -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) || true + if [ -n "$POD" ]; then + # Running in HA Mode + techo "Collecting HBase report..." + mkdir -p "$OUTPUT_DIR/reports" + kubectl exec -n "$NAMESPACE" "$POD" -c master -- bash -c 'hbase hbck -details 2>&1' > "$OUTPUT_DIR/reports/hbase.log" + else + POD=$(kubectl -n "$NAMESPACE" get pod -l app.kubernetes.io/component=stackgraph -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) || true + if [ -n "$POD" ]; then + # Running in non-HA mode + techo "Collecting HBase report..." + mkdir -p "$OUTPUT_DIR/reports" + kubectl exec -n "$NAMESPACE" "$POD" -c stackgraph -- bash -c 'hbase hbck -details 2>&1' > "$OUTPUT_DIR/reports/hbase.log" + else + techo "Could not find HBase or StackGraph pod to generate HBase report." + fi + fi +} + collect_workload_observer_data() { techo "Collecting workload observer data..." 
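  # Look up the first pod carrying the workload-observer component label; errors are silenced so an absent pod simply yields an empty $POD, which is handled just below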
POD=$(kubectl -n "$NAMESPACE" get pod -l app.kubernetes.io/component=workload-observer -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) @@ -319,6 +362,14 @@ kubectl -n "$NAMESPACE" get events --sort-by='.metadata.creationTimestamp' > "$O # Run the pod logs collection function collect_pod_logs collect_pod_disk_usage +collect_hdfs_report +collect_hbase_report collect_yaml_configs collect_workload_observer_data -if [ $HELM_RELEASES ]; then +if $HELM_RELEASES; then collect_helm_releases fi -if [ $ELASTICSEARCH_LOGS ]; then +if $ELASTICSEARCH_LOGS; then collect_pod_logs_from_elasticsearch fi diff --git a/docs/latest/modules/en/pages/k8s-suse-rancher-prime.adoc b/docs/latest/modules/en/pages/k8s-suse-rancher-prime.adoc index a902b796..778a1f72 100644 --- a/docs/latest/modules/en/pages/k8s-suse-rancher-prime.adoc +++ b/docs/latest/modules/en/pages/k8s-suse-rancher-prime.adoc @@ -21,14 +21,17 @@ A license key for SUSE Observability server can be obtained via the SUSE Custome === Requirements -To install SUSE Observability, ensure that the nodes have enough CPU and memory capacity. Below are the specific requirements. +To install SUSE Observability, ensure that the cluster has enough CPU and memory capacity. Below are the specific requirements. -There are different installation options available for SUSE Observability. It is possible to install SUSE Observability either in a High-Availability (HA) or single instance (non-HA) setup. The non-HA setup is recommended for testing purposes or small environments. For production environments, it is recommended to install SUSE Observability in a HA setup. +There are different installation options available for {stackstate-product-name}. It is possible to install {stackstate-product-name} either in a High-Availability (HA) or single instance (non-HA) setup. The non-HA setup is recommended for testing purposes or small environments. For production environments, it is recommended to install {stackstate-product-name} in an HA setup. The HA production setup can support from 150 up to 4000 observed nodes. An observed node in this sizing table is taken to be 4 vCPUs and 16GB of memory, our `default node size`. If nodes in your observed cluster are bigger, they can count for multiple `default nodes`, so a node of 12vCPU and 48GB counts as 3 `default nodes` under observation when picking a profile. -The Non-HA setup can support up to 100 Nodes under observation. +The Non-HA setup can support up to 100 `default nodes` under observation. + + +The following table describes the resources required to deploy the {stackstate-product-name} server in a cluster, given the amount of `default nodes` that will be observed and whether the installation should be HA or not. |=== | | trial | 10 non-HA | 20 non-HA | 50 non-HA | 100 non-HA | 150 HA | 250 HA | 500 HA | 4000 HA @@ -93,12 +96,12 @@ NOTE: An additional 20% of resources is required for pod (unequal) distribution, [NOTE] ==== -The requirement shown for profile represent the total amount of resources needed to run the Suse Observability server. +The requirements shown for each profile represent the total amount of resources needed to run the {stackstate-product-name} server. 
To ensure that all different services of Suse Observability server can be allocated: -* For non-HA installations the recommended node size is 4VCPU, 8GB -* For HA installations up to 500 nodes the min recommended node size is 8VCPU, 16GB -* For 4000 nodes HA installations the min recommended node size is 16VCPU, 32GB +* For non-HA installations the minimum per-node size is 4VCPU, 8GB +* For HA installations up to 500 nodes the minimum per-node size is 8VCPU, 16GB +* For 4000 nodes HA installations the minimum per-node size is 16VCPU, 32GB ==== diff --git a/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-k8s-operator.adoc b/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-k8s-operator.adoc index d4d47ef3..71d9073c 100644 --- a/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-k8s-operator.adoc +++ b/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-k8s-operator.adoc @@ -158,7 +158,7 @@ spec: otlp/suse-observability: auth: authenticator: bearertokenauth - # Put in your own otlp endpoint, for example suse-observability.my.company.com:443 + # Put in your own otlp endpoint, for example otlp-suse-observability.my.company.com:443 endpoint: compression: snappy processors: diff --git a/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-k8s.adoc b/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-k8s.adoc index f4529326..8ca90093 100644 --- a/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-k8s.adoc +++ b/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-k8s.adoc @@ -97,7 +97,7 @@ config: otlp/suse-observability: auth: authenticator: bearertokenauth - # Put in your own otlp endpoint, for example suse-observability.my.company.com:443 + # Put in your own otlp endpoint, for example otlp-suse-observability.my.company.com:443 endpoint: compression: snappy processors: diff --git a/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-lambda.adoc b/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-lambda.adoc index f25ade45..50f9a99d 100644 --- a/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-lambda.adoc +++ b/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-lambda.adoc @@ -86,7 +86,7 @@ config: otlp: auth: authenticator: bearertokenauth - # Put in your own otlp endpoint, for example suse-observability.my.company.com:443 + # Put in your own otlp endpoint, for example otlp-suse-observability.my.company.com:443 endpoint: service: diff --git a/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-linux.adoc b/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-linux.adoc index 0cabae37..d9db02fc 100644 --- a/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-linux.adoc +++ b/docs/latest/modules/en/pages/setup/otel/getting-started/getting-started-linux.adoc @@ -129,7 +129,7 @@ exporters: compression: snappy auth: authenticator: bearertokenauth - # Put in your own otlp endpoint, for example suse-observability.my.company.com:443 + # Put in your own otlp endpoint, for example otlp-suse-observability.my.company.com:443 endpoint: processors: memory_limiter: diff --git a/docs/latest/modules/en/pages/setup/otel/otlp-apis.adoc b/docs/latest/modules/en/pages/setup/otel/otlp-apis.adoc index c6a338fe..ecb9a247 100644 --- a/docs/latest/modules/en/pages/setup/otel/otlp-apis.adoc +++ 
b/docs/latest/modules/en/pages/setup/otel/otlp-apis.adoc @@ -9,8 +9,8 @@ SUSE Observability supports 2 versions of the OTLP protocol, the `grpc` version The endpoints for SUSE Cloud Observability are: -* OTLP: `+https://otlp-.app.stackstate.io:443+` -* OTLP over HTTP: `+https://otlp-http-.app.stackstate.io+` +* OTLP: `+otlp-.app.stackstate.io:443+` (without the URL scheme) +* OTLP over HTTP: `+https://otlp-http-.app.stackstate.io+` (with the URL scheme: `https`) == Self-hosted SUSE Observability @@ -26,6 +26,8 @@ The GRPC protocol does not support sending credentials over an insecure connecti == Collector configuration +=== OTLP protocol + The examples in the collector configuration use the OTLP protocol like this: ---- extensions: bearertokenauth: scheme: SUSE Observability token: "${env:API_KEY}" exporters: otlp/suse-observability: auth: authenticator: bearertokenauth + # Put in your own otlp endpoint, for example otlp-suse-observability.my.company.com:443 endpoint: # Optional TLS configurations: #tls: @@ -44,7 +47,9 @@ exporters: # insecure_skip_verify: true ---- -To use the OTLP over HTTP protocol instead use the `otlphttp` exporter instead. Don't forget to update the exporter references, `otlp/suse-observability`, in your pipelines to `otlphttp/suse-observability`! +=== OTLP HTTP protocol + +To use the OTLP over HTTP protocol, use the `otlphttp` exporter instead. Also update all exporter references in your pipelines from `otlp/suse-observability` to `otlphttp/suse-observability`! Use a find/replace to make sure you change all occurrences. ---- extensions: @@ -55,6 +60,7 @@ exporters: otlphttp/suse-observability: auth: authenticator: bearertokenauth + # Put in your own otlp-http endpoint, for example https://otlp-http-suse-observability.my.company.com:443 endpoint: # Optional TLS configurations: #tls: diff --git a/docs/latest/modules/en/pages/setup/security/authentication/oidc.adoc b/docs/latest/modules/en/pages/setup/security/authentication/oidc.adoc index 8c89d59f..9b84b4d6 100644 --- a/docs/latest/modules/en/pages/setup/security/authentication/oidc.adoc +++ 
To configure Rancher as the OIDC provider for SUSE Observability, you need to add the OIDC details to the authentication values: [,yaml] diff --git a/docs/latest/modules/en/pages/setup/security/rbac/rbac_rancher.adoc b/docs/latest/modules/en/pages/setup/security/rbac/rbac_rancher.adoc index 38dcb8ed..5c8619f3 100644 --- a/docs/latest/modules/en/pages/setup/security/rbac/rbac_rancher.adoc +++ b/docs/latest/modules/en/pages/setup/security/rbac/rbac_rancher.adoc @@ -8,41 +8,63 @@ The SUSE Rancher Prime Observability Extension uses Kubernetes RBAC to grant access to Rancher users in SUSE Observability. If you do not use Rancher, look at xref:/setup/security/rbac/rbac_roles.adoc[How to set up roles] in a standalone installation. -NOTE: for Rancher RBAC to function, authentication for SUSE Observability must be configured with the xref:setup/security/authentication/oidc.adoc#_rancher[Rancher OIDC Provider]. +[NOTE] +==== +For Rancher RBAC to function, + +* authentication for {stackstate-product-name} must be configured with the xref:setup/security/authentication/oidc.adoc#_rancher[Rancher OIDC Provider]. +* the {stackstate-product-name} Agent must have the RBAC Agent enabled and must authenticate using a service token. +==== + +Every authenticated user has the *Instance Basic Access* role that allows them to use the system. These permissions provide access to the views, settings, metric bindings, and lets a user see system notifications. They do NOT grant access to any {stackstate-product-name} data. In order to see any data, a user needs to be given an additional role. Two directions for extending the *Instance Basic Access* role are provided with Rancher *Role Templates*: -You can use two kinds of roles for accessing SUSE Observability: -* A _scope role_ (Observer) grants access to data - either all data in a SUSE Observability instance, data coming from a cluster, or just the data for a namespace. This role is provisioned in a cluster to be observed. -* An _instance role_ grants permissions to access or modify functionality of SUSE Observability itself. These roles grant access to all data in SUSE Observability. +Instance Roles:: Enables you to configure or personalize {stacktate-product-name}. +Scoped Roles:: Grants access to {stackstate-product-name} data from observed clusters. -Several `RoleTemplate`s are available to achieve this, with common groupings of permissions. Binding these templates to users or groups on a cluster or namespace triggers roles and role-bindings for provisioning on the target cluster. A description of the default templates is below. It is possible to define your own combinations of permissions in a custom RoleTemplate. +== Instance roles -=== Observer role +You can assign the *Role Templates* for *Instance Roles* to users or groups in the *Project* that is running {stackstate-product-name}. If no instance roles are explicitly assigned to a member of a project, then they will have the permissions of the *Instance Basic Access* role. -The observer role grants a user the permission to read topology, metrics, logs and trace data for a namespace or a cluster. There are three `RoleTemplate`s that grant access to observability data: +=== Instance roles with access to {stackstate-product-name} data -* *Observer* - grants access to data coming from namespaces in a Project. You can use this in the "Project Membership" section of the cluster configuration. -* *Cluster Observer* - grants access to all data coming from a Cluster. 
You can use this template in the "Cluster Membership" section of the cluster configuration. -* *Instance Observer* - grants access to all data in a SUSE Observability instance. You can use this template on the Project that includes SUSE Observability itself. +A couple of "global" roles allow access to all {stackstate-product-name} data - in any of the observed clusters. These roles are intended to be used for setting up the system and for troubleshooting system-level problems. For users with any of these roles, it is not necessary to configure xref:scoped[Scoped Roles]. -To use these observer roles, it is recommended that the following role is granted on the Project running SUSE Observability itself: -* *Recommended Access* - has recommended permissions for using SUSE Observability. +Instance Admin:: Grants full access to all views and all permissions. +Instance Troubleshooter:: Grants all permissions required to use SUSE Observability for troubleshooting, including the ability to enable/disable monitors, create custom views, and use the CLI. +Instance Observer:: Grants access to all data in a SUSE Observability instance. -=== Instance roles +=== Instance roles without access to {stackstate-product-name} data -There are two roles predefined in SUSE Observability, for configuring the system. This includes setting up views, monitors, notifications and so on. -As these concern "global" settings of SUSE Observability, these roles include access to all data in an observability instance. +These roles need to be combined with the *Instance Observer* role or one of the xref:scoped[Scoped Roles] (see below). Otherwise, no {stackstate-product-name} data is accessible and the UI will show a "No components found" message. This applies to all Rancher users, including users such as Project owners. -* *Instance Troubleshooter* - has all permissions required to use SUSE Observability for troubleshooting, including the ability to enable/disable monitors, create custom views and use the Cli. -* *Instance Administrator* - has full access to all views and has all permissions. +Instance Recommended Access:: Grants recommended permissions to use SUSE Observability. This role includes permissions that are not strictly necessary, but provide (limited) means of personalizing {stackstate-product-name}. +Instance Basic Access:: Grants minimal permissions to use {stackstate-product-name}. This role does not need to be explicitly assigned and there is no *Role Template* for it; every logged-in user has it. You can find the permissions assigned to each predefined SUSE Observability role below. For details of the different permissions and how to manage them using the `sts` CLI, see xref:/setup/security/rbac/rbac_permissions.adoc[Role based access control (RBAC) permissions] [tabs] ==== +Basic Access:: ++ +-- +Basic access grants minimal permissions for using SUSE Observability. To be combined with an Observer (Instance, Cluster or Project). +These permissions are granted to all users. + +|=== +|Resource |Verbs + +|metric-bindings |get +|settings |get +|system-notifications |get +|views |get +|=== + +-- Recommended Access:: + -- -Recommended access grants permissions that are not strictly necessary, but that make SUSE Observability a lot more useful. +Recommended access grants permissions that are not strictly necessary, but that make SUSE Observability a lot more useful. It provides a limited degree of personalization. +To be combined with an Observer (Instance, Cluster or Project). 
|=== |Resource |Verbs @@ -54,6 +76,20 @@ Recommended access grants permissions that are not strictly necessary, but that |visualization-settings |update |=== +-- +Observer:: ++ +-- +Observer grants access to all observability data in a SUSE Observability instance. Combine with *Recommended Access* for a better experience. + +|=== +|Resource |Verbs + +|topology |get +|metrics |get +|traces |get +|=== + -- Troubleshooter:: + @@ -84,7 +120,7 @@ The Troubleshooter role has access to all data available in SUSE Observability a |=== -- -Administrator:: +Admin:: + -- The Administrator role has all permissions assigned. @@ -121,21 +157,27 @@ The Administrator role has all permissions assigned. -- ==== -=== Resource details +[#scoped] +== Scoped roles + +You can assign the following *Role Templates* to users or groups in an observed cluster. They grant access to {stackstate-product-name} data coming from (a *Project* in) the *Cluster*, giving a user permission to read topology, metrics, logs and trace data. -These resources correspond to those of xref:/setup/security/rbac/rbac_permissions.adoc[RBAC Permissions]. In particular *scoped permissions* apply to data collected by the SUSE Observability agent and access should typically be limited on a cluster or a namespace level. The following resources are available in the `scope.observability.cattle.io` API Group: +Observer:: Grants access to data coming from namespaces in a *Project*. You can use this in the *Project Membership* section of the cluster configuration. +Cluster Observer:: Grants access to all data coming from a *Cluster*. You can use this template in the *Cluster Membership* section of the cluster configuration. + +The resources in these roles correspond to xref:/setup/security/rbac/rbac_permissions.adoc#_scoped_permissions[Scoped Permissions]. They are available in the `scope.observability.cattle.io` API Group (with just verb `get` as these resources are read only): * `topology` - components (deployments, pods, etcetera) from the cluster or namespace * `traces` - spans from the cluster or namespace * `metrics` - metric data originating from the cluster or namespace -These resources are read only, so the only applicable verb is `get`. +Note that access to logs is controlled by the `topology` resource. -Other permissions, those that are not *scoped*, define user capabilities and access to parts of SUSE Observability. These "system permissions" allow, for example, executing queries or scripts and configuring SUSE Observability. Those are collected from the `instance.observability.cattle.io` API Group. +Enable personalization for users with these observer roles by granting the *Instance Recommended Access* role on the *Project* running {stackstate-product-name}. -=== Custom roles +== Custom roles -To grant additional permissions beyond Recommended Access, create a custom Project `RoleTemplate` in Rancher, inheriting from "SUSE Observability Instance Recommended Access". Then, for example, to grant the rights to view monitors and metric charts, add rules with: +To grant additional permissions beyond Recommended Access, create a custom Project *RoleTemplate* in Rancher, inheriting from *SUSE Observability Instance Recommended Access*. 
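In manifest form, such a custom template might look roughly like the sketch below. This is illustrative only: the template name is hypothetical, inheritance of the Recommended Access template is easiest to configure in the Rancher UI, and the rules mirror the example that follows:

[,bash]
----
# Illustrative only: apply a custom RoleTemplate to the Rancher local (management) cluster.
# The metadata name is hypothetical; the rules match the monitors/metric-bindings example below.
kubectl apply -f - <<'EOF'
apiVersion: management.cattle.io/v3
kind: RoleTemplate
metadata:
  name: observability-extended-access
displayName: Observability Extended Access
context: project
rules:
  - apiGroups:
      - instance.observability.cattle.io
    resources:
      - metricbindings
      - monitors
    verbs:
      - get
EOF
----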
Then, for example, to grant the rights to view monitors and metric charts, add rules with: * Verb: `get` * Resource: `metricbindings` and `monitors` @@ -145,12 +187,11 @@ image::rancher-custom-role.png[Custom RoleTemplate for richer access] You can specify any resource and verb combination defined in the xref:/setup/security/rbac/rbac_permissions.adoc[RBAC Permissions]. Note that the dashes (`-`) are dropped from resource names, so the permission `get-metric-bindings` becomes the Kubernetes RBAC resource `metricbindings` with the verb `get`. + == Troubleshooting * Verify that the Rbac Agent for the cluster is able to communicate with the platform. -NOTE: the Rbac Agent must authenticate using service tokens. - * xref:/setup/security/rbac/rbac_permissions.adoc#_list_subjects_for_a_user[Inspect the user subjects] (user and roles). ** Verify any roles configuration on the OIDC provider. * xref:/setup/security/rbac/rbac_permissions.adoc#_show_granted_permissions[Inspect the subject permission] diff --git a/product-docs-common b/product-docs-common index 4dc90cb6..c31fda22 160000 --- a/product-docs-common +++ b/product-docs-common @@ -1 +1 @@ -Subproject commit 4dc90cb6d651991b3e5f5b1de1f22e955fa29c54 +Subproject commit c31fda22fd075ddf1eb97b92045d05db8f8a38a7 diff --git a/ss-local-playbook.yml b/ss-local-playbook.yml index bc22eeeb..7a569b54 100644 --- a/ss-local-playbook.yml +++ b/ss-local-playbook.yml @@ -17,9 +17,7 @@ ui: asciidoc: attributes: ss-build-type: 'product' # 'community' or 'product' - ss-rancher-product-name: 'SUSE® Rancher Prime: Observability' - ss-rancher-product-short-name: 'SUSE® Observability' - ss-community-product-name: 'StackState' + stackstate-product-name: "SUSE® Observability" page-pagination: '' tabs-sync-option: '' extensions: @@ -38,4 +36,4 @@ antora: enabled: true output: - dir: build/site + dir: build/site \ No newline at end of file diff --git a/ss-remote-playbook.yml b/ss-remote-playbook.yml index 4f74f72d..6d0407dd 100644 --- a/ss-remote-playbook.yml +++ b/ss-remote-playbook.yml @@ -18,9 +18,7 @@ asciidoc: attributes: page-draft-preview-only: 'true' ss-build-type: 'product' - ss-rancher-product-name: 'SUSE® Rancher Prime: Observability' - ss-rancher-product-short-name: 'SUSE® Observability' - ss-community-product-name: 'StackState' + stackstate-product-name: "SUSE® Observability" page-pagination: '' tabs-sync-option: '' extensions: From 5990aac8ead5e3c5e78e96f2b83a79c20c3703ba Mon Sep 17 00:00:00 2001 From: Daniel Barra Date: Tue, 25 Nov 2025 08:11:10 -0300 Subject: [PATCH 14/14] STAC-23862: Add release notes 2.6.3 --- docs/latest/modules/en/nav.adoc | 1 + .../en/pages/setup/release-notes/v2.6.3.adoc | 32 +++++++++++++++++++ 2 files changed, 33 insertions(+) create mode 100644 docs/latest/modules/en/pages/setup/release-notes/v2.6.3.adoc diff --git a/docs/latest/modules/en/nav.adoc b/docs/latest/modules/en/nav.adoc index c31732e1..2e8aa596 100644 --- a/docs/latest/modules/en/nav.adoc +++ b/docs/latest/modules/en/nav.adoc @@ -139,6 +139,7 @@ *** xref:setup/release-notes/v2.6.0.adoc[v2.6.0 - 29/Sep/2025] *** xref:setup/release-notes/v2.6.1.adoc[v2.6.1 - 13/Oct/2025] *** xref:setup/release-notes/v2.6.2.adoc[v2.6.2 - 03/Nov/2025] +*** xref:setup/release-notes/v2.6.3.adoc[v2.6.3 - 25/Nov/2025] ** xref:setup/upgrade-stackstate/README.adoc[Upgrade SUSE Observability] *** xref:setup/upgrade-stackstate/migrate-from-6.adoc[Migration from StackState] *** xref:setup/upgrade-stackstate/steps-to-upgrade.adoc[Steps to upgrade] diff --git 
a/docs/latest/modules/en/pages/setup/release-notes/v2.6.3.adoc b/docs/latest/modules/en/pages/setup/release-notes/v2.6.3.adoc new file mode 100644 index 00000000..d39320fe --- /dev/null +++ b/docs/latest/modules/en/pages/setup/release-notes/v2.6.3.adoc @@ -0,0 +1,32 @@ += v2.6.3 - 25/Nov/2025 +:revdate: 2025-11-25 +:page-revdate: {revdate} +:description: SUSE Observability Self-hosted + +== Release Notes: {stackstate-product-name} Helm Chart v2.6.3 + +== New Features & Enhancements + +* *HDFS Upgrade:* HDFS (Hadoop Distributed File System) and its associated dependencies have been upgraded. +* *StackPack: Partial Topology Sync Monitor:* A new monitor has been added to the StackState StackPack to alert on **partial Topology Synchronization snapshots**. +* *vmagent Resource Increase:* The memory and CPU resource requirements for the `vmagent` component have been increased in the `4000-ha` profile. +* *Image Upgrades:* +** The **Kafka** container image has been upgraded. +** The **ClickHouse** container image has been upgraded. + +== Bug Fixes + +* *OpenTelemetry Metric Scoping:* Fixed a critical issue where metrics ingested via the OpenTelemetry collector were missing the `_scope_` label. This prevented **scoped users** from being able to observe these metrics. +* *Metric Explorer Sorting:* The **Metric Explorer** now uses numerical sorting for values in the value column. +* *Platform: StackGraph Corruption (Timed-Out Transactions):* Fixed a **StackGraph corruption issue** where data from timed-out transactions that should have been rolled back could inadvertently reappear. +* *Platform: State Pod Validation:* Added **additional data validation and logging** to the state pod for improved stability and debugging. +* *StackGraph: Edge Deletion Invariant:* Added an invariant to prevent inconsistent edge references when performing a delete edge operation in **StackGraph**. +* *StackGraph Integrity Verifier:* An **experimental perpetual integrity verifier** has been added for StackGraph. It can be enabled by setting `hbase.console.integrity.enabled=true`. +* *StackPack Remediation Guides:* Fixed several remediation guides within the SUSE Observability stackpack that incorrectly referenced `tags` instead of the correct term, **`labels`**. +* *Duplicate OpenTelemetry StackPack:* Removed a duplicate **OpenTelemetry stackpack** installation. +* *Platform: Agent Restart Snapshot Loop:* Fixed an issue where a restart of an agent could cause the **'active snapshot'** to occur continuously. +* *Platform: Kafka JMX OOM Fix:* Resolved an Out-Of-Memory (OOM) issue for the Kafka JMX container on RKE2 Kubernetes versions 1.31 and 1.30. + +=== Agent Bug Fixes + +* *Agent: /proc//stat Panic:* The agent now includes a fix to prevent a panic when a `/proc//stat` file is found to be empty. \ No newline at end of file