69 changes: 63 additions & 6 deletions content/integrate/redis-data-integration/observability.md
@@ -40,9 +40,67 @@ RDI metrics with the RDI monitoring screen in Redis Insight or with the
[`redis-di status`]({{< relref "/integrate/redis-data-integration/reference/cli/redis-di-status" >}})
command from the CLI.{{< /note >}}

## Collector metrics
## Accessing the metrics

How you access the metrics endpoints depends on whether you installed RDI on a VM or with Helm. The sections below describe the approach for each installation type.

### VM installation

For VM installations, the metrics are available by default on the following endpoints:
- Collector metrics: `https://<RDI_HOST>/collector-source/metrics`
- Stream processor metrics: `https://<RDI_HOST>/metrics`
- Operator metrics: `https://<RDI_HOST>/operator/metrics`

Note that the collector metrics are not accessible in RDI versions earlier than 1.16.0.
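
As a rough illustration, a standalone Prometheus server could scrape the three VM endpoints listed above with a `scrape_configs` fragment like the one below. This is a minimal sketch, not an official RDI configuration: the job names are hypothetical, `<RDI_HOST>` is the same placeholder used in the endpoint list, and it assumes the host serves HTTPS on the default port with a certificate that Prometheus trusts.

```yaml
# Sketch of a prometheus.yml fragment (job names are illustrative assumptions)
scrape_configs:
  - job_name: rdi-collector            # hypothetical job name
    scheme: https
    metrics_path: /collector-source/metrics
    static_configs:
      - targets: ["<RDI_HOST>"]

  - job_name: rdi-stream-processor     # hypothetical job name
    scheme: https
    metrics_path: /metrics
    static_configs:
      - targets: ["<RDI_HOST>"]

  - job_name: rdi-operator             # hypothetical job name
    scheme: https
    metrics_path: /operator/metrics
    static_configs:
      - targets: ["<RDI_HOST>"]
```

If the RDI host uses a certificate that Prometheus does not trust, each job would likely also need a `tls_config` section (for example, one that points to the CA file).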

### Helm installation

For Helm installations, the metrics are available via autodiscovery in the K8s cluster. Follow the steps below to use them:
1. Make sure the Prometheus Operator is installed in your K8s cluster (see the
[Prometheus Operator installation guide](https://prometheus-operator.dev/docs/getting-started/installation/) for details).

2. Update your `values.yaml` file to enable metrics for the operator, collector, and stream processor components.

- For the collector, update the `collector` section, under the `dataPlane` section:
```yaml
dataPlane:
  collector:
    # Enable service monitor
    serviceMonitor:
      enabled: true

      # Make sure to label the ServiceMonitor so that Prometheus can discover it
      labels:
        release: prometheus
```

- For the stream processor, update the `rdiMetricsExporter` section:
```yaml
rdiMetricsExporter:
  # Enable service monitor
  serviceMonitor:
    enabled: true

    # Make sure to label the ServiceMonitor so that Prometheus can discover it
    labels:
      release: prometheus
```

- For the operator, update the `operator` section:
```yaml
operator:
  prometheus:
    enabled: true
    labels:
      release: prometheus
  metrics:
    enabled: true
```

{{< note >}}The Prometheus service discovery loop runs at regular intervals. This means that after deploying or updating RDI with the above configuration, it may take a few minutes for Prometheus to discover the new ServiceMonitors and start scraping metrics from the RDI components.
{{< /note >}}
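
The `release: prometheus` label in the snippets above matters because the Prometheus Operator only picks up ServiceMonitors whose labels match the selector on the `Prometheus` custom resource. The fragment below is a sketch of what such a resource might look like, assuming a Prometheus instance named `prometheus` that selects on `release: prometheus`. Your cluster may use a different instance name or label value, in which case the labels in `values.yaml` should be adjusted to match.

```yaml
# Sketch of a Prometheus custom resource (name and label value are assumptions)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  # Only ServiceMonitors carrying this label are scraped
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
  # An empty selector matches ServiceMonitors in all namespaces
  serviceMonitorNamespaceSelector: {}
```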

The endpoint for the collector metrics is `https://<RDI_HOST>/metrics/collector-source`
## Collector metrics

These metrics are divided into three groups:

@@ -92,10 +150,8 @@ The following table lists all collector metrics and their descriptions:
{{< note >}}
Many metrics include context labels that specify the phase (`snapshot` or `streaming`), database name, and other contextual information. Metrics with a value of `-1` typically indicate that the measurement is not applicable in the current state.
{{< /note >}}

## Stream processor metrics

The endpoint for the stream processor metrics is `https://<RDI_HOST>/metrics/rdi`
## Stream processor metrics

RDI reports metrics during the two main phases of the ingest pipeline, the *snapshot*
phase and the *change data capture (CDC)* phase. (See the
@@ -156,7 +212,8 @@ The endpoint for operator metrics is `https://<RDI_HOST>/operator/metrics` (or t

### Operator metric types

The operator reports two key metrics:
Most of the metrics exposed by the RDI operator are standard controller-runtime [metrics](https://book.kubebuilder.io/reference/metrics-reference).
The metrics that are relevant for RDI operations are listed in the table below:

| Metric Name | Metric Type | Metric Description | Alerting Recommendations |
|-------------|-------------|-------------------|-------------------------|