78 changes: 78 additions & 0 deletions acm.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
## Setup ACM with NetObserv metrics
For reviewers: this file is just a recipe for internal use, not the blog post. For the blog, see blogs/acm/leverage-metrics-in-acm.md


See also the [blog post](./blogs/acm/leverage-metrics-in-acm.md).

This is a quick setup guide intended for the development teams:

1. Create two clusters (or more).
2. Choose one to be the main cluster (the hub): install the ACM operator on it, then create a default MultiClusterHub.
3. In the console top bar, select "All Clusters", then start the procedure to import an existing cluster. You may set the label "netobserv=true" during import.

You have two options: either use ACM policies to automate the install, or install netobserv manually on each cluster.

### Option 1: with ACM policies

Note that this doesn't cover Loki installation, so in this mode Loki and the Console plugin are disabled. It is also possible to automate Loki installation by creating new policy objects. Feel free to contribute!

```bash
oc apply -f ./examples/ACM/policies/acm-policy-netobserv-1.4.yaml
oc apply -f ./examples/ACM/policies/acm-policy-flowcollector-v1beta1-noloki.yaml
oc apply -f ./examples/ACM/policies/acm-bindings.yaml
```
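
To check that the policies exist and whether they are compliant, something like the following may help. This is only a sketch: the helper name `policy_compliance` is hypothetical, and it assumes the policies were applied in the current namespace on the hub and that the root policy reports its state under `.status.compliant`. The value can stay empty until at least one cluster matches the placement.

```bash
# Hypothetical helper: print the compliance state of a root policy on the hub.
# Assumption: the policy lives in the current namespace and exposes
# .status.compliant once it is bound to at least one cluster.
policy_compliance() {
  local policy="$1"
  oc get policies.policy.open-cluster-management.io "$policy" \
    -o jsonpath='{.status.compliant}'
}
```

Usage would be `policy_compliance netobserv` and `policy_compliance netobserv-flowcollector`.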

Then, on each cluster you want to include, add the label "netobserv=true" if you haven't already done so. This enables the policies for that cluster and triggers the automated install. You can do it from the console under Infrastructure > Clusters > Edit labels (kebab menu on each row).
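
The label can also be set from the command line on the hub. A minimal sketch, assuming the cluster was imported as a `ManagedCluster` resource; the helper name `label_netobserv_cluster` is hypothetical:

```bash
# Hypothetical helper: add the netobserv=true label to a managed cluster.
# Run it against the hub; $1 is the managed cluster name as shown in ACM.
label_netobserv_cluster() {
  local cluster="$1"
  oc label managedcluster "$cluster" netobserv=true --overwrite
}
```

For example, `label_netobserv_cluster my-spoke-cluster`, where `my-spoke-cluster` is a placeholder name.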

### Option 2: manual install

On each cluster:
1. Install the downstream version of netobserv (user workload Prometheus won't work the same way).
2. Create a FlowCollector with these metrics enabled (`spec.processor.metrics.includeList`):

```yaml
includeList:
- namespace_flows_total
- node_ingress_bytes_total
- workload_ingress_bytes_total
- workload_egress_bytes_total
- workload_egress_packets_total
- workload_ingress_packets_total
```
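
If a FlowCollector already exists, the same list could be applied with a merge patch instead of editing the resource by hand. This is a sketch under assumptions: the FlowCollector is named `cluster` (as in the policy example of this change), the installed API version supports `includeList`, and the helper name `set_netobserv_metrics` is hypothetical.

```bash
# Hypothetical helper: enable the metrics listed above on the FlowCollector.
# Assumptions: the FlowCollector is named "cluster" and its API version
# supports spec.processor.metrics.includeList.
set_netobserv_metrics() {
  local patch='{"spec":{"processor":{"metrics":{"includeList":[
    "namespace_flows_total",
    "node_ingress_bytes_total",
    "workload_ingress_bytes_total",
    "workload_egress_bytes_total",
    "workload_egress_packets_total",
    "workload_ingress_packets_total"
  ]}}}}'
  # A merge patch replaces the whole list rather than appending to it.
  oc patch flowcollector cluster --type=merge -p "$patch"
}
```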

Then, on the hub cluster, create the observability namespace and pull secret, following the steps at https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.8/html/observability/observing-environments-intro#enabling-observability :

```bash
oc create namespace open-cluster-management-observability
DOCKER_CONFIG_JSON=$(oc extract secret/pull-secret -n openshift-config --to=-)
oc create secret generic multiclusterhub-operator-pull-secret \
    -n open-cluster-management-observability \
    --from-literal=.dockerconfigjson="$DOCKER_CONFIG_JSON" \
    --type=kubernetes.io/dockerconfigjson
```

Set up S3, the Thanos secret and ACM observability:

```bash
./examples/ACM/thanos-s3.sh yourname-thanos us-east-2
oc apply -f examples/ACM/acm-observability.yaml
oc get pods -n open-cluster-management-observability -w
oc apply -f examples/ACM/netobserv-metrics.yaml
```
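
Instead of watching the pods interactively, a script could block until they are ready. A minimal sketch, with a hypothetical helper name and an arbitrary default timeout; it assumes every pod in the namespace is expected to reach the Ready condition, which would not hold if completed job pods are present.

```bash
# Hypothetical helper: wait until all observability pods report Ready.
# $1 is an optional timeout (the default of 600s is an arbitrary choice).
wait_for_observability() {
  local timeout="${1:-600s}"
  oc wait pods --all --for=condition=Ready \
    -n open-cluster-management-observability --timeout="$timeout"
}
```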

To debug the above config, check logs here:

```bash
oc logs -n open-cluster-management-addon-observability -l component=metrics-collector
```
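
When debugging, it can also be useful to confirm that the custom allowlist was created on the hub with the expected rules. A sketch, assuming the ConfigMap name and namespace from `examples/ACM/netobserv-metrics.yaml`; the helper name `show_custom_allowlist` is hypothetical.

```bash
# Hypothetical helper: print the recording rules stored in the custom allowlist.
# The dot in the "metrics_list.yaml" key is escaped for the JSONPath expression.
show_custom_allowlist() {
  oc get configmap observability-metrics-custom-allowlist \
    -n open-cluster-management-observability \
    -o jsonpath='{.data.metrics_list\.yaml}'
}
```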

Deploying dashboards:

```bash
oc apply -f examples/ACM/dashboards
```

The metrics resolution is 5 minutes.

Designing dashboards: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.8/html/observability/using-grafana-dashboards#setting-up-the-grafana-developer-instance

Binary file added blogs/acm/images/console-acm-all-clusters.png
Binary file added blogs/acm/images/console-acm-grafana.png
Binary file added blogs/acm/images/overview-1.png
Binary file added blogs/acm/images/overview-2.png
Binary file added blogs/acm/images/per-cluster-1.png
Binary file added blogs/acm/images/per-cluster-2.png
Binary file added blogs/acm/images/search-dashboard.png
254 changes: 254 additions & 0 deletions blogs/acm/leverage-metrics-in-acm.md

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions examples/ACM/acm-observability.yaml
@@ -0,0 +1,10 @@
apiVersion: observability.open-cluster-management.io/v1beta2
kind: MultiClusterObservability
metadata:
  name: observability
spec:
  observabilityAddonSpec: {}
  storageConfig:
    metricObjectStorage:
      name: thanos-object-storage
      key: thanos.yaml
12 changes: 12 additions & 0 deletions examples/ACM/dashboards/clusters-overview.yaml

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions examples/ACM/dashboards/per-cluster.yaml

Large diffs are not rendered by default.

33 changes: 33 additions & 0 deletions examples/ACM/netobserv-metrics.yaml
@@ -0,0 +1,33 @@
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-observability
data:
  metrics_list.yaml: |
    rules:
    # Namespaces
    - record: namespace:netobserv_workload_egress_bytes_total:src:rate5m
      expr: sum(label_replace(rate(netobserv_workload_egress_bytes_total[5m]),\"namespace\",\"$1\",\"SrcK8S_Namespace\",\"(.*)\")) by (namespace)
    - record: namespace:netobserv_workload_ingress_bytes_total:dst:rate5m
      expr: sum(label_replace(rate(netobserv_workload_ingress_bytes_total[5m]),\"namespace\",\"$1\",\"DstK8S_Namespace\",\"(.*)\")) by (namespace)
    - record: namespace:netobserv_workload_egress_packets_total:src:rate5m
      expr: sum(label_replace(rate(netobserv_workload_egress_packets_total[5m]),\"namespace\",\"$1\",\"SrcK8S_Namespace\",\"(.*)\")) by (namespace)
    - record: namespace:netobserv_workload_ingress_packets_total:dst:rate5m
      expr: sum(label_replace(rate(netobserv_workload_ingress_packets_total[5m]),\"namespace\",\"$1\",\"DstK8S_Namespace\",\"(.*)\")) by (namespace)

    # Namespaces / cluster ingress|egress
    - record: namespace:netobserv_workload_egress_bytes_total:src:unknown_dst:rate5m
      expr: sum(label_replace(rate(netobserv_workload_egress_bytes_total{DstK8S_OwnerType=\"\"}[5m]),\"namespace\",\"$1\",\"SrcK8S_Namespace\",\"(.*)\")) by (namespace)
    - record: namespace:netobserv_workload_ingress_bytes_total:dst:unknown_src:rate5m
      expr: sum(label_replace(rate(netobserv_workload_ingress_bytes_total{SrcK8S_OwnerType=\"\"}[5m]),\"namespace\",\"$1\",\"DstK8S_Namespace\",\"(.*)\")) by (namespace)
    - record: namespace:netobserv_workload_egress_packets_total:src:unknown_dst:rate5m
      expr: sum(label_replace(rate(netobserv_workload_egress_packets_total{DstK8S_OwnerType=\"\"}[5m]),\"namespace\",\"$1\",\"SrcK8S_Namespace\",\"(.*)\")) by (namespace)
    - record: namespace:netobserv_workload_ingress_packets_total:dst:unknown_src:rate5m
      expr: sum(label_replace(rate(netobserv_workload_ingress_packets_total{SrcK8S_OwnerType=\"\"}[5m]),\"namespace\",\"$1\",\"DstK8S_Namespace\",\"(.*)\")) by (namespace)

    # Workloads
    - record: workload:netobserv_workload_egress_bytes_total:src:rate5m
      expr: sum(label_replace(label_replace(label_replace(rate(netobserv_workload_egress_bytes_total[5m]),\"namespace\",\"$1\",\"SrcK8S_Namespace\",\"(.*)\"),\"workload\",\"$1\",\"SrcK8S_OwnerName\",\"(.*)\"),\"kind\",\"$1\",\"SrcK8S_OwnerType\",\"(.*)\")) by (namespace,workload,kind)
    - record: workload:netobserv_workload_ingress_bytes_total:dst:rate5m
      expr: sum(label_replace(label_replace(label_replace(rate(netobserv_workload_ingress_bytes_total[5m]),\"namespace\",\"$1\",\"DstK8S_Namespace\",\"(.*)\"),\"workload\",\"$1\",\"DstK8S_OwnerName\",\"(.*)\"),\"kind\",\"$1\",\"DstK8S_OwnerType\",\"(.*)\")) by (namespace,workload,kind)
28 changes: 28 additions & 0 deletions examples/ACM/policies/acm-bindings.yaml
@@ -0,0 +1,28 @@
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: placement-policy-netobserv
spec:
  clusterConditions:
    - status: "True"
      type: ManagedClusterConditionAvailable
  clusterSelector:
    matchExpressions:
      - {key: netobserv, operator: In, values: ["true"]}
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: binding-policy-netobserv
placementRef:
  name: placement-policy-netobserv
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:
  - name: netobserv
    kind: Policy
    apiGroup: policy.open-cluster-management.io
  - name: netobserv-flowcollector
    kind: Policy
    apiGroup: policy.open-cluster-management.io
36 changes: 36 additions & 0 deletions examples/ACM/policies/acm-policy-flowcollector-v1beta1-noloki.yaml
@@ -0,0 +1,36 @@
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: netobserv-flowcollector
spec:
  disabled: false
  dependencies:
    - apiVersion: policy.open-cluster-management.io/v1
      kind: Policy
      name: netobserv
      compliance: Compliant
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: netobserv-flowcollector
        spec:
          remediationAction: enforce
          severity: medium
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: flows.netobserv.io/v1beta1
                kind: FlowCollector
                metadata:
                  name: cluster
                spec:
                  processor:
                    metrics:
                      ignoreTags:
                        - nodes-flows
                        - workloads-flows
                        - namespaces
                  loki:
                    enable: false
101 changes: 101 additions & 0 deletions examples/ACM/policies/acm-policy-netobserv-1.4.yaml
@@ -0,0 +1,101 @@
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: netobserv
spec:
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: netobserv-operator-namespace
        spec:
          remediationAction: enforce
          severity: medium
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: v1
                kind: Namespace
                metadata:
                  name: openshift-netobserv-operator
    - extraDependencies:
        - apiVersion: policy.open-cluster-management.io/v1
          kind: ConfigurationPolicy
          name: netobserv-operator-namespace
          namespace: ""
          compliance: Compliant
      objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: netobserv-operatorgroup
        spec:
          remediationAction: enforce
          severity: medium
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1
                kind: OperatorGroup
                metadata:
                  name: netobserv
                  namespace: openshift-netobserv-operator
                spec:
                  upgradeStrategy: Default
    - extraDependencies:
        - apiVersion: policy.open-cluster-management.io/v1
          kind: ConfigurationPolicy
          name: netobserv-operatorgroup
          namespace: ""
          compliance: Compliant
      objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: netobserv-subscription
        spec:
          remediationAction: enforce
          severity: medium
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1alpha1
                kind: Subscription
                metadata:
                  name: netobserv-operator
                  namespace: openshift-netobserv-operator
                spec:
                  channel: stable
                  installPlanApproval: Automatic
                  name: netobserv-operator
                  source: redhat-operators
                  sourceNamespace: openshift-marketplace
                  startingCSV: network-observability-operator.v1.4.2
    - extraDependencies:
        - apiVersion: policy.open-cluster-management.io/v1
          kind: ConfigurationPolicy
          name: netobserv-subscription
          namespace: ""
          compliance: Compliant
      objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: netobserv-csv-check
        spec:
          remediationAction: inform
          severity: medium
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1alpha1
                kind: ClusterServiceVersion
                metadata:
                  namespace: openshift-netobserv-operator
                spec:
                  displayName: Network Observability
                status:
                  phase: Succeeded
                  reason: InstallSucceeded
22 changes: 22 additions & 0 deletions examples/ACM/thanos-s3.sh
@@ -0,0 +1,22 @@
#!/bin/bash

if [[ "$#" -lt 2 || "$1" = "--help" ]]; then
  echo "Syntax: $0 S3_NAME AWS_REGION"
  echo ""
  echo "Create S3 bucket and the related secret to use with Thanos"
  echo "You need to have the AWS CLI installed and configured."
  echo ""
  echo "  e.g: $0 yourname-thanos eu-west-1"
  echo ""
  exit
fi

# Exported so that envsubst can substitute them in thanos-secret.yaml
export YOUR_S3_BUCKET="$1"
export YOUR_S3_REGION="$2"
export YOUR_ACCESS_KEY=$(aws configure get aws_access_key_id)
export YOUR_SECRET_KEY=$(aws configure get aws_secret_access_key)
export YOUR_S3_ENDPOINT="s3.${YOUR_S3_REGION}.amazonaws.com"

aws s3api create-bucket --bucket "$YOUR_S3_BUCKET" --region "$YOUR_S3_REGION" --create-bucket-configuration LocationConstraint="$YOUR_S3_REGION"

curl -s -L "https://raw.githubusercontent.com/netobserv/documents/main/examples/ACM/thanos-secret.yaml" | envsubst | kubectl apply -f -
15 changes: 15 additions & 0 deletions examples/ACM/thanos-secret.yaml
@@ -0,0 +1,15 @@
apiVersion: v1
kind: Secret
metadata:
  name: thanos-object-storage
  namespace: open-cluster-management-observability
type: Opaque
stringData:
  thanos.yaml: |
    type: s3
    config:
      bucket: $YOUR_S3_BUCKET
      endpoint: $YOUR_S3_ENDPOINT
      insecure: true
      access_key: $YOUR_ACCESS_KEY
      secret_key: $YOUR_SECRET_KEY