OSDOCS-6754: Network Observability easier configuration
skrthomas committed Sep 19, 2023
1 parent f38fd03 commit d493a38
Showing 8 changed files with 99 additions and 12 deletions.
6 changes: 4 additions & 2 deletions modules/network-observability-flowcollector-view.adoc
Original file line number Diff line number Diff line change
@@ -57,14 +57,15 @@ spec:
        type: configmap
        name: loki-gateway-ca-bundle
        certFile: service-ca.crt
        namespace: loki-namespace # <5>
  consolePlugin:
    register: true
    logLevel: info
    portNaming:
      enable: true
      portNames:
        "3100": loki
    quickFilters: <5>
    quickFilters: <6>
    - name: Applications
      filter:
        src_namespace!: 'openshift-,netobserv'
@@ -87,4 +88,5 @@ spec:
<2> You can set the Sampling specification, `spec.agent.ebpf.sampling`, to manage resources. Lower sampling values might consume a large amount of computational, memory, and storage resources. You can mitigate this by specifying a sampling ratio value. A value of 100 means 1 flow every 100 is sampled. A value of 0 or 1 means all flows are captured. The lower the value, the greater the number of returned flows and the accuracy of derived metrics. By default, eBPF sampling is set to a value of 50, so 1 flow every 50 is sampled. Note that more sampled flows also means more storage needed. It is recommended to start with default values and refine empirically to determine which setting your cluster can manage.
<3> The optional specifications `spec.processor.logTypes`, `spec.processor.conversationHeartbeatInterval`, and `spec.processor.conversationEndTimeout` can be set to enable conversation tracking. When enabled, conversation events are queryable in the web console. The values for `spec.processor.logTypes` are as follows: `FLOWS`, `CONVERSATIONS`, `ENDED_CONVERSATIONS`, or `ALL`. Storage requirements are highest for `ALL` and lowest for `ENDED_CONVERSATIONS`.
<4> The Loki specification, `spec.loki`, specifies the Loki client. The default values match the Loki install paths mentioned in the Installing the Loki Operator section. If you used another installation method for Loki, specify the appropriate client information for your install.
<5> The `spec.quickFilters` specification defines filters that show up in the web console. The `Application` filter keys,`src_namespace` and `dst_namespace`, are negated (`!`), so the `Application` filter shows all traffic that _does not_ originate from, or have a destination to, any `openshift-` or `netobserv` namespaces. For more information, see Configuring quick filters below.
<5> The original certificates are copied to the Network Observability instance namespace and watched for updates. When not provided, the namespace defaults to the same value as `spec.namespace`. If you chose to install Loki in a different namespace, you must specify it in the `spec.loki.tls.caCert.namespace` field. Similarly, the `spec.exporters.kafka.tls.caCert.namespace` field is available for Kafka installed in a different namespace.
<6> The `spec.quickFilters` specification defines filters that show up in the web console. The `Applications` filter keys, `src_namespace` and `dst_namespace`, are negated (`!`), so the `Applications` filter shows all traffic that _does not_ originate from, or have a destination in, any `openshift-` or `netobserv` namespaces. For more information, see Configuring quick filters below.
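
As a rough illustration of callouts 2, 3, and 5, the following fragment is a minimal sketch and not a complete `FlowCollector` resource. It assumes that Loki was installed in a namespace called `loki-namespace` (a placeholder) while the Network Observability instance uses `netobserv`. Fields that are not discussed in the callouts are left out.

[source,yaml]
----
# Illustrative fragment only; merge these fields into your existing FlowCollector resource.
spec:
  namespace: netobserv              # Network Observability instance namespace
  agent:
    ebpf:
      sampling: 50                  # default: 1 flow out of every 50 is sampled
  processor:
    logTypes: FLOWS                 # or CONVERSATIONS, ENDED_CONVERSATIONS, ALL
  loki:
    tls:
      caCert:
        type: configmap
        name: loki-gateway-ca-bundle
        certFile: service-ca.crt
        namespace: loki-namespace   # placeholder; only needed when it differs from spec.namespace
----

If Loki runs in the same namespace as `spec.namespace`, the `caCert.namespace` line can presumably be omitted, because the field defaults to that namespace.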
9 changes: 5 additions & 4 deletions modules/network-observability-loki-install.adoc
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@
:_content-type: PROCEDURE
[id="network-observability-loki-installation_{context}"]
= Installing the Loki Operator
It is recommended to install link:https://catalog.redhat.com/software/containers/openshift-logging/loki-rhel8-operator/622b46bcae289285d6fcda39[Loki Operator version 5.7], This version provides the ability to create a LokiStack instance using the `openshift-network` tenant configuration mode. It also provides fully automatic, in-cluster authentication and authorization support for Network Observability.
It is recommended to install link:https://catalog.redhat.com/software/containers/openshift-logging/loki-rhel8-operator/622b46bcae289285d6fcda39[Loki Operator version 5.7+]. This version provides the ability to create a LokiStack instance using the `openshift-network` tenant configuration mode. It also provides fully automatic in-cluster authentication and authorization support for Network Observability.

.Prerequisites

@@ -34,25 +34,26 @@ There are several ways you can install Loki. One way you can install the Loki Op
+
. Create a `Secret` YAML file. You can create this secret in the web console or CLI.
.. Using the web console, navigate to the *Project* -> *All Projects* dropdown and select *Create Project*. Name the project `netobserv` and click *Create*.
.. Navigate to the Import icon, *+*, in the top right corner. Drop your YAML file into the editor. It is important to create this YAML file in the `netobserv` namespace that uses the `access_key_id` and `access_key_secret` to specify your credentials.
.. Navigate to the Import icon, *+*, in the top right corner. Drop your YAML file into the editor.

.. Once you create the secret, you should see it listed under *Workloads* -> *Secrets* in the web console.
+
The following shows an example secret YAML file:
+
[source,yaml]
----
apiVersion: v1
kind: Secret
metadata:
  name: loki-s3
  namespace: netobserv
  namespace: netobserv <1>
stringData:
  access_key_id: QUtJQUlPU0ZPRE5ON0VYQU1QTEUK
  access_key_secret: d0phbHJYVXRuRkVNSS9LN01ERU5HL2JQeFJmaUNZRVhBTVBMRUtFWQo=
  bucketnames: s3-bucket-name
  endpoint: https://s3.eu-central-1.amazonaws.com
  region: eu-central-1
----
<1> The installation examples in this documentation use the same namespace, `netobserv`, across all components. You can optionally use a different namespace for the different components.

[IMPORTANT]
====
9 changes: 5 additions & 4 deletions modules/network-observability-lokistack-create.adoc
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@
:_content-type: PROCEDURE
[id="network-observability-lokistack-create_{context}"]
= Create a LokiStack custom resource
It is recommended to deploy the LokiStack in the same namespace referenced by the FlowCollector specification, `spec.namespace`. You can use the web console or CLI to create a namespace, or new project.
You can deploy a LokiStack using the web console or CLI.

.Procedure

@@ -20,7 +20,7 @@ It is recommended to deploy the LokiStack in the same namespace referenced by th
kind: LokiStack
metadata:
  name: loki
  namespace: netobserv
  namespace: netobserv <1>
spec:
  size: 1x.small
  storage:
@@ -30,11 +30,12 @@ It is recommended to deploy the LokiStack in the same namespace referenced by th
    secret:
      name: loki-s3
      type: s3
  storageClassName: gp3 <1>
  storageClassName: gp3 <2>
  tenants:
    mode: openshift-network
----
<1> Use a storage class name that is available on the cluster for `ReadWriteOnce` access mode. You can use `oc get storageclasses` to see what is available on your cluster.
<1> The installation examples in this documentation use the same namespace, `netobserv`, across all components. You can optionally use a different namespace.
<2> Use a storage class name that is available on the cluster for `ReadWriteOnce` access mode. You can use `oc get storageclasses` to see what is available on your cluster.
+
[IMPORTANT]
====
30 changes: 30 additions & 0 deletions modules/network-observability-rate-limit-alert.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
//
// network_observability/configuring-operator.adoc

:_content-type: CONCEPT
[id="network-observability-netobserv-dashboard-rate-limit-alerts_{context}"]
= Creating Loki rate limit alerts for the NetObserv dashboard
You can create custom rules for the *Netobserv* dashboard metrics to trigger alerts when Loki rate limits have been reached.

An example of an alerting rule configuration YAML file is as follows:
[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: loki-alerts
  namespace: openshift-operators-redhat
spec:
  groups:
  - name: LokiRateLimitAlerts
    rules:
    - alert: LokiTenantRateLimit
      annotations:
        message: |-
          {{ $labels.job }} {{ $labels.route }} is experiencing 429 errors.
        summary: "Some requests are being answered with the rate limit error code."
      expr: sum(irate(loki_request_duration_seconds_count{status_code="429"}[1m])) by (job, namespace, route) / sum(irate(loki_request_duration_seconds_count[1m])) by (job, namespace, route) * 100 > 0
      for: 10s
      labels:
        severity: warning
----
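
The example rule fires as soon as any share of requests returns a 429 for 10 seconds. If that is too noisy for your cluster, a less sensitive rule could be added to the same `rules` list. The following is a minimal sketch under stated assumptions: the alert name, the 5% threshold, and the 5-minute windows are illustrative values and not recommendations from this documentation.

[source,yaml]
----
# Hypothetical additional entry for the rules list of the LokiRateLimitAlerts group.
- alert: LokiTenantRateLimitSustained   # hypothetical alert name
  annotations:
    message: |-
      {{ $labels.job }} {{ $labels.route }} is experiencing 429 errors.
    summary: "More than 5% of requests are answered with the rate limit error code."
  # Same ratio as the example rule, averaged with rate() over 5 minutes; the "> 5" threshold is an assumption.
  expr: sum(rate(loki_request_duration_seconds_count{status_code="429"}[5m])) by (job, namespace, route) / sum(rate(loki_request_duration_seconds_count[5m])) by (job, namespace, route) * 100 > 5
  for: 5m
  labels:
    severity: warning
----

The sketch uses `rate()` over a longer window instead of `irate()`, which smooths short spikes at the cost of slower detection.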
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
// Module included in the following assemblies:

// * networking/network_observability/troubleshooting-network-observability.adoc

:_content-type: PROCEDURE
[id="network-observability-troubleshooting-loki-tenant-rate-limit_{context}"]
= LokiStack rate limit errors
A rate limit placed on the Loki tenant can result in temporary loss of data and a 429 error: `Per stream rate limit exceeded (limit:xMB/sec) while attempting to ingest for stream`. Consider setting an alert to notify you of this error. For more information, see "Creating Loki rate limit alerts for the NetObserv dashboard" in the Additional resources of this section.

You can update the LokiStack custom resource with the `perStreamRateLimit` and `perStreamRateLimitBurst` specifications, as shown in the following procedure.

.Procedure
. Navigate to *Operators* -> *Installed Operators*, viewing *All projects* from the *Project* dropdown.
. Look for *Loki Operator*, and select the *LokiStack* tab.
. Create a *LokiStack* instance or edit an existing one using the *YAML view* to add the `perStreamRateLimit` and `perStreamRateLimitBurst` specifications:
+
[source, yaml]
----
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki
  namespace: netobserv
spec:
  limits:
    global:
      ingestion:
        perStreamRateLimit: 6 <1>
        perStreamRateLimitBurst: 30 <2>
  tenants:
    mode: openshift-network
  managementState: Managed
----
<1> The default value for `perStreamRateLimit` is `3`.
<2> The default value for `perStreamRateLimitBurst` is `15`.

. Click *Save*.

.Verification
Once you update the `perStreamRateLimit` and `perStreamRateLimitBurst` specifications, the pods in your cluster restart and the 429 rate-limit error no longer occurs.
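
If you prefer the CLI to the web console, the same change can likely be expressed as a merge patch. The following is a minimal sketch, assuming a LokiStack named `loki` in the `netobserv` namespace as in the example above. The file name `limits-patch.yaml` is hypothetical.

[source,yaml]
----
# limits-patch.yaml (hypothetical file name): contains only the fields to change.
spec:
  limits:
    global:
      ingestion:
        perStreamRateLimit: 6        # example value from the procedure above
        perStreamRateLimitBurst: 30  # example value from the procedure above
----

Assuming your `oc` client supports the `--patch-file` option, a command such as `oc patch lokistack loki -n netobserv --type merge --patch-file limits-patch.yaml` would apply it. Verify the result in the *YAML view* as described above.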
4 changes: 4 additions & 0 deletions networking/network_observability/installing-operators.adoc
Original file line number Diff line number Diff line change
@@ -15,6 +15,10 @@ The Loki Operator can also be used for xref:../../logging/cluster-logging-loki.a
====

include::modules/network-observability-loki-install.adoc[leveloffset=+1]
[role="_additional-resources"]
.Additional resources
* For more information about the option to use different namespaces for the separate components, see the `spec.loki.tls.caCert.namespace` specification in the xref:../network_observability/flowcollector-api.adoc#network-observability-flowcollector-api-specifications_network_observability[Flow Collector API Reference] and callout number 5 in the xref:../network_observability/configuring-operator.adoc#network-observability-flowcollector-view_network_observability[Flow Collector sample resource].
include::modules/network-observability-lokistack-create.adoc[leveloffset=+2]
include::modules/network-observability-lokistack-ingestion-query.adoc[leveloffset=+2]
include::modules/network-observability-auth-multi-tenancy.adoc[leveloffset=+1]
Original file line number Diff line number Diff line change
@@ -10,4 +10,9 @@ You can use the web console to monitor alerts related to the health of the Netwo


include::modules/network-observability-viewing-alerts.adoc[leveloffset=+1]
include::modules/network-observability-disabling-health-alerts.adoc[leveloffset=+2]
include::modules/network-observability-disabling-health-alerts.adoc[leveloffset=+2]
include::modules/network-observability-rate-limit-alert.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources
* For more information about creating alerts that you can see on the dashboard, see xref:../../monitoring/managing-alerts.adoc#creating-alerting-rules-for-user-defined-projects_managing-alerts[Creating alerting rules for user-defined projects].
Original file line number Diff line number Diff line change
@@ -16,4 +16,8 @@ include::modules/troubleshooting-network-observability-flowlogs-pipeline-kafka.a

include::modules/troubleshooting-network-observability-network-flow.adoc[leveloffset=+1]

include::modules/troubleshooting-network-observability-controller-manager-pod-out-of-memory.adoc[leveloffset=+1]
include::modules/troubleshooting-network-observability-controller-manager-pod-out-of-memory.adoc[leveloffset=+1]

== Resource troubleshooting

include::modules/troubleshooting-network-observability-loki-tenant-rate-limit.adoc[leveloffset=+1]
