Export currentOP uptime query metrics #704

Closed
tregubov-av opened this issue Sep 6, 2023 · 2 comments · Fixed by #706

@tregubov-av

tregubov-av commented Sep 6, 2023

We need to track how long in-progress operations have been running.
This information is available in the output of currentOp and can be used to raise an alert on operations that take too long.
Example alert:

apiVersion: v1
data:
  kube-state-metrics-mongodb.rules: |-
    groups:
    - name: kube-state-metrics-mongodb.rules
      rules:
      - alert: MongodbCurrentQueryTime
        expr: (mongodb_currentop_query_uptime > 3e+8) / 1000
        labels:
          severity: critical
        annotations:
          description: "Opid: {{ $labels.opid }}\nDesc: {{ $labels.desc }}\nNs: {{ $labels.ns }}\nOp : {{ $labels.op }}\nUptime : {{ $value }} ms\n"
          summary: "MongoDB\nCurrent slow query on: {{ $labels.endpoint }}"
kind: ConfigMap
metadata:
  labels:
    app: prometheus
    prometheus: kube-prometheus
    release: kube-prometheus-stack
    role: alert-rules
  name: kube-prometheus-exporter-mongodb
  namespace: monitoring

This requires a currentop collector that exports the running time of each operation as a gauge metric.
For example:

package exporter

import (
	"context"
	"fmt"
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/sirupsen/logrus"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
	"go.mongodb.org/mongo-driver/mongo"
)

type currentopCollector struct {
	ctx            context.Context
	base           *baseCollector
	compatibleMode bool
	topologyInfo   labelsGetter
}

var ErrInvalidOrMissingInprogEntry = fmt.Errorf("invalid or missing inprog entry in currentop results")

func newCurrentopCollector(ctx context.Context, client *mongo.Client, logger *logrus.Logger,
	compatible bool, topology labelsGetter,
) *currentopCollector {
	return &currentopCollector{
		ctx:            ctx,
		base:           newBaseCollector(client, logger),
		compatibleMode: compatible,
		topologyInfo:   topology,
	}
}

func (d *currentopCollector) Describe(ch chan<- *prometheus.Desc) {
	d.base.Describe(d.ctx, ch, d.collect)
}

func (d *currentopCollector) Collect(ch chan<- prometheus.Metric) {
	d.base.Collect(ch)
}

func (d *currentopCollector) collect(ch chan<- prometheus.Metric) {
	defer measureCollectTime(ch, "mongodb", "currentop")()

	logger := d.base.logger
	client := d.base.client

	cmd := bson.D{
		{Key: "currentOp", Value: true},
		{Key: "active", Value: true},
		{Key: "microsecs_running", Value: bson.D{
			{Key: "$exists", Value: true}}},
		{Key: "op", Value: bson.D{{Key: "$ne", Value: ""}}},
		{Key: "ns", Value: bson.D{
			{Key: "$ne", Value: ""},
			{Key: "$not", Value: bson.D{{Key: "$regex", Value: "^admin.*|^local.*"}}},
		}},
	}
	res := client.Database("admin").RunCommand(d.ctx, cmd)

	var r primitive.M
	if err := res.Decode(&r); err != nil {
		ch <- prometheus.NewInvalidMetric(prometheus.NewInvalidDesc(err), err)
		return
	}

	logger.Debug("currentop response from MongoDB:")
	debugResult(logger, r)

	inprog, ok := r["inprog"].(primitive.A)
	if !ok {
		ch <- prometheus.NewInvalidMetric(prometheus.NewInvalidDesc(ErrInvalidOrMissingInprogEntry),
			ErrInvalidOrMissingInprogEntry)
		return
	}

	for _, bsonMap := range inprog {
		// Skip malformed entries instead of aborting the loop, so that one
		// unexpected entry does not hide the remaining operations.
		bsonMapElement, ok := bsonMap.(primitive.M)
		if !ok {
			logger.Errorf("Invalid type primitive.M assertion for bsonMap: %T", bsonMap)
			continue
		}
		opid, ok := bsonMapElement["opid"].(int32)
		if !ok {
			logger.Errorf("Invalid type int32 assertion for 'opid': %T", bsonMapElement["opid"])
			continue
		}
		namespace, ok := bsonMapElement["ns"].(string)
		if !ok {
			logger.Errorf("Invalid type string assertion for 'ns': %T", bsonMapElement["ns"])
			continue
		}
		db, collection := splitNamespace(namespace)
		op, ok := bsonMapElement["op"].(string)
		if !ok {
			logger.Errorf("Invalid type string assertion for 'op': %T", bsonMapElement["op"])
			continue
		}
		desc, ok := bsonMapElement["desc"].(string)
		if !ok {
			logger.Errorf("Invalid type string assertion for 'desc': %T", bsonMapElement["desc"])
			continue
		}
		microsecsRunning, ok := bsonMapElement["microsecs_running"].(int64)
		if !ok {
			logger.Errorf("Invalid type int64 assertion for 'microsecs_running': %T", bsonMapElement["microsecs_running"])
			continue
		}

		// One time series per operation, identified by its labels.
		labels := d.topologyInfo.baseLabels()
		labels["opid"] = strconv.Itoa(int(opid))
		labels["op"] = op
		labels["desc"] = desc
		labels["database"] = db
		labels["collection"] = collection
		labels["ns"] = namespace

		m := primitive.M{"uptime": microsecsRunning}

		for _, metric := range makeMetrics("currentop_query", m, labels, d.compatibleMode) {
			ch <- metric
		}
	}
}
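
One thing that may be worth considering in the collector above is the type assertion on opid, which only accepts int32. As an assumption that I have not verified against every MongoDB version or topology, the decoded value could also arrive as an int64 for large ids, or as a string when the command is run through mongos. A minimal sketch of a more tolerant conversion, using a hypothetical helper opIDLabel that is not part of the exporter:

```go
package main

import (
	"fmt"
	"strconv"
)

// opIDLabel renders an opid value as a label string.
// Assumption: opid may be decoded as int32, int64 or a non-empty string;
// any other type is reported as unsupported.
func opIDLabel(v any) (string, bool) {
	switch id := v.(type) {
	case int32:
		return strconv.FormatInt(int64(id), 10), true
	case int64:
		return strconv.FormatInt(id, 10), true
	case string:
		return id, id != ""
	default:
		return "", false
	}
}

func main() {
	// Illustrative inputs only; the string form is a made-up example.
	for _, v := range []any{int32(450576885), int64(5000000000), "shard01:1234", 3.14} {
		label, ok := opIDLabel(v)
		fmt.Printf("%T -> %q (ok=%t)\n", v, label, ok)
	}
}
```

In the collector this would replace the int32 assertion and the strconv.Itoa call; entries with an unsupported opid type would still be logged and skipped.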

Example output:

# HELP mongodb_currentop_query_uptime currentop_query.
# TYPE mongodb_currentop_query_uptime untyped
mongodb_currentop_query_uptime{cl_id="",cl_role="",collection="collection_name",database="database_name",desc="conn49456",ns="database_name.collection_name",op="command",opid="450576885",rs_nm="",rs_state=""} 110642
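
The database and collection labels in the output above are derived from ns. The collector relies on the exporter's splitNamespace helper for that, which is not shown in this issue. For readers who want to see the idea, here is a minimal sketch under the assumption that the split happens at the first dot (database names cannot contain dots, collection names can); splitNS is a hypothetical stand-in, not the exporter's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNS splits a MongoDB namespace into database and collection at the
// first dot. A namespace without a dot yields an empty collection.
func splitNS(ns string) (db, coll string) {
	db, coll, _ = strings.Cut(ns, ".")
	return db, coll
}

func main() {
	for _, ns := range []string{
		"database_name.collection_name",
		"database_name.system.views", // collection names may contain dots
		"admin",                      // database-level operation, no collection
	} {
		db, coll := splitNS(ns)
		fmt.Printf("ns=%q database=%q collection=%q\n", ns, db, coll)
	}
}
```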

PR: #706

@Delvish

Delvish commented Sep 7, 2023

I agree with it, a very necessary thing

tregubov-av added a commit to tregubov-av/mongodb_exporter that referenced this issue Sep 7, 2023
@Gerasimov94

Gerasimov94 commented Sep 7, 2023

I agree with it, a very necessary thing

Same here, it seems like a useful feature.

tregubov-av added a commit to tregubov-av/mongodb_exporter that referenced this issue Sep 7, 2023
artemgavrilov added a commit that referenced this issue Sep 15, 2023
* Export currentOP uptime query metrics #704

* Update exporter/currentop_collector.go

Co-authored-by: Artem Gavrilov <charlieblackwood7@gmail.com>

* Export currentOP uptime query metrics

---------

Co-authored-by: Artem Gavrilov <charlieblackwood7@gmail.com>