Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PMM-12989 Reduce error logs from diagnostic data on arbiter nodes #820

Merged
merged 31 commits into from
May 29, 2024
Merged
Show file tree
Hide file tree
Changes from 25 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
14860ab
PMM-12989 return early when checking diagnostic data for arbiter nodes
idoqo Mar 18, 2024
6a9d825
fix lint
idoqo Mar 18, 2024
e7c75b5
bump test timeout
idoqo Mar 18, 2024
4632c93
skip diagnostic data if we can't run command
idoqo Mar 18, 2024
042a1ab
remove unneeded test
idoqo Mar 18, 2024
d65f236
Merge branch 'main' into PMM-12989-arbiter-error-logs
idoqo Mar 25, 2024
f221e30
refactor tests
idoqo Mar 25, 2024
de7e193
drop timeout
idoqo Mar 26, 2024
b89545f
revert timeout'
idoqo Mar 26, 2024
05490bd
changeset for tests
idoqo Apr 2, 2024
67e0886
Merge branch 'main' into PMM-12989-arbiter-error-logs
idoqo Apr 7, 2024
499d087
change keyfile permission
idoqo Apr 7, 2024
eaa6a20
change file permission before running
idoqo Apr 7, 2024
0aa1c35
bump permissions
idoqo Apr 7, 2024
cf6df50
add entrypoint for keyfile
idoqo Apr 7, 2024
3ed4e27
use custom dockerfile for authenticated instances
idoqo Apr 8, 2024
f0afc4a
use only default cluster as shard server
idoqo Apr 8, 2024
9bad12c
drop logs
idoqo Apr 9, 2024
ff005f1
Merge branch 'main' into PMM-12989-arbiter-error-logs
idoqo Apr 9, 2024
c937ecc
Merge branch 'main' into PMM-12989-arbiter-error-logs
idoqo Apr 9, 2024
76248ea
fix linter errors
idoqo Apr 9, 2024
9e4c7d5
Merge branch 'main' into PMM-12989-arbiter-error-logs
idoqo May 14, 2024
9358e94
Merge branch 'main' into PMM-12989-arbiter-error-logs
BupycHuk May 23, 2024
5dd6a44
split metrics collection
idoqo May 27, 2024
8826c95
Merge branch 'main' into PMM-12989-arbiter-error-logs
idoqo May 27, 2024
31eacbf
show warning earlier when running on arbiter
idoqo May 28, 2024
ccd6aa1
show warning earlier when running on arbiter
idoqo May 29, 2024
6a2fb3b
Merge branch 'main' into PMM-12989-arbiter-error-logs
idoqo May 29, 2024
649a3cc
Merge branch 'main' into PMM-12989-arbiter-error-logs
BupycHuk May 29, 2024
6f6647a
Update diagnostic_data_collector.go
BupycHuk May 29, 2024
6ab0b21
Update v1_compatibility.go
BupycHuk May 29, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 9 additions & 3 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -91,18 +91,24 @@ It will install `goimports`, `goreleaser`, `golangci-lint` and `reviewdog`.

The testing sandbox starts `n` MongoDB instances as follows:

- 3 Instances for shard 1 at ports 17001, 17002, 17003
- 3 instances for shard 2 at ports 17004, 17005, 17006
- 3 Instances for shard 1 at ports 17001, 17002, 17003 (with no authentication)
- 3 instances for shard 2 at ports 17004, 17005, 17006 (with authentication enabled)
Comment on lines +94 to +95
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

- 3 config servers at ports 17007, 17008, 17009
- 1 mongos server at port 17000
- 1 stand alone instance at port 27017

All instances are currently running without user and password so for example, to connect to the **mongos** you can just use:
To connect to the **mongos** on shard 1, you can use:

```
mongo mongodb://127.0.0.1:17001/admin
```

To connect to the **mongos** on shard 2 (with authentication enabled), you can use:

```
mongo mongodb://admin:admin@127.0.0.1:17001/admin
```

The sandbox can be started using the provided Makefile using: `make test-cluster` and it can be stopped using `make test-cluster-clean`.

### Running tests
Expand Down
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -104,7 +104,7 @@ test-race: env ## Run all tests with race flag.
go test -race -v -timeout 30s ./...

test-cluster: env ## Starts MongoDB test cluster. Use env var TEST_MONGODB_IMAGE to set flavor and version. Example: TEST_MONGODB_IMAGE=mongo:3.6 make test-cluster
docker compose up -d
docker compose up --build -d

test-cluster-clean: env ## Stops MongoDB test cluster.
docker compose down --remove-orphans
33 changes: 25 additions & 8 deletions docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -63,37 +63,52 @@ services:

mongo-2-2:
container_name: "mongo-2-2"
image: ${TEST_MONGODB_IMAGE:-mongo:4.2}
build:
dockerfile: ./docker/mongodb-auth.dockerfile
args:
TEST_MONGODB_IMAGE: ${TEST_MONGODB_IMAGE}
environment:
- MONGO_INITDB_ROOT_USERNAME=${TEST_MONGODB_USERNAME:-admin}
- MONGO_INITDB_ROOT_PASSWORD=${TEST_MONGODB_PASSWORD:-admin}
ports:
- "${TEST_MONGODB_S2_PRIMARY_PORT:-17004}:27017"
command: mongod --replSet rs2 --shardsvr --port 27017 --oplogSize 16
command: mongod --replSet rs2 --port 27017 --oplogSize 16 --auth --keyFile=/opt/keyfile
networks:
- rs2

mongo-2-3:
container_name: "mongo-2-3"
image: ${TEST_MONGODB_IMAGE:-mongo:4.2}
build:
dockerfile: ./docker/mongodb-auth.dockerfile
args:
TEST_MONGODB_IMAGE: ${TEST_MONGODB_IMAGE}
ports:
- "${TEST_MONGODB_S2_SECONDARY1_PORT:-17005}:27017"
command: mongod --replSet rs2 --shardsvr --port 27017 --oplogSize 16
command: mongod --replSet rs2 --port 27017 --oplogSize 16 --auth --keyFile=/opt/keyfile
networks:
- rs2

mongo-2-1:
container_name: "mongo-2-1"
image: ${TEST_MONGODB_IMAGE:-mongo:4.2}
build:
dockerfile: ./docker/mongodb-auth.dockerfile
args:
TEST_MONGODB_IMAGE: ${TEST_MONGODB_IMAGE}
ports:
- "${TEST_MONGODB_S2_SECONDARY2_PORT:-17006}:27017"
command: mongod --replSet rs2 --shardsvr --port 27017 --oplogSize 16
command: mongod --replSet rs2 --port 27017 --oplogSize 16 --auth --keyFile=/opt/keyfile
networks:
- rs2

mongo-2-arbiter:
container_name: "mongo-2-arbiter"
image: ${TEST_MONGODB_IMAGE:-mongo:4.2}
build:
dockerfile: ./docker/mongodb-auth.dockerfile
args:
TEST_MONGODB_IMAGE: ${TEST_MONGODB_IMAGE}
ports:
- "${TEST_MONGODB_S2_ARBITER:-17012}:27017"
command: mongod --replSet rs1 --shardsvr --port 27017 --oplogSize 16
command: mongod --replSet rs2 --port 27017 --oplogSize 16 --auth --keyFile=/opt/keyfile
networks:
- rs2

Expand All @@ -114,6 +129,8 @@ services:
- ARBITER=mongo-2-arbiter
- RS=rs2
- VERSION=${TEST_MONGODB_IMAGE}
- MONGO_INITDB_ROOT_USERNAME=${TEST_MONGODB_USERNAME:-admin}
- MONGO_INITDB_ROOT_PASSWORD=${TEST_MONGODB_PASSWORD:-admin}
entrypoint: [ "/scripts/setup.sh" ]
networks:
- rs2
Expand Down
6 changes: 6 additions & 0 deletions docker/mongodb-auth.dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
ARG TEST_MONGODB_IMAGE=mongo:4.2
FROM ${TEST_MONGODB_IMAGE}
USER root
COPY docker/secret/keyfile /opt/keyfile
RUN chown mongodb /opt/keyfile && chmod 400 /opt/keyfile && mkdir -p /home/mongodb/ && chown mongodb /home/mongodb
USER mongodb
10 changes: 9 additions & 1 deletion docker/scripts/setup.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ MONGODB_CLIENT="mongosh --quiet"
PARSED=(${VERSION//:/ })
MONGODB_VERSION=${PARSED[1]}
MONGODB_VENDOR=${PARSED[0]}

if [ "`echo ${MONGODB_VERSION} | cut -c 1`" = "4" ]; then
MONGODB_CLIENT="mongo"
fi
Expand All @@ -17,6 +18,9 @@ mongodb2=`getent hosts ${MONGO2} | awk '{ print $1 }'`
mongodb3=`getent hosts ${MONGO3} | awk '{ print $1 }'`
arbiter=`getent hosts ${ARBITER} | awk '{ print $1 }'`

username=${MONGO_INITDB_ROOT_USERNAME}
password=${MONGO_INITDB_ROOT_PASSWORD}

port=${PORT:-27017}

echo "Waiting for startup.."
Expand Down Expand Up @@ -60,7 +64,11 @@ EOF

function general_servers() {
echo "setup servers on ${MONGO1}(${mongodb1}:${port})"
${MONGODB_CLIENT} --host ${mongodb1}:${port} <<EOF
command="${MONGODB_CLIENT} --host ${mongodb1}:${port}"
if [[ -n "$username" && -n "$password" ]]; then
command="${MONGODB_CLIENT} --host ${mongodb1}:${port} -u ${username} -p ${password}"
fi
${command} <<EOF
var cfg = {
"_id": "${RS}",
"protocolVersion": 1,
Expand Down
16 changes: 16 additions & 0 deletions docker/secret/keyfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
P4CWhUl1qj2EiSM+r3mzAHSATiAMBv8l2rAClMrAqOqyyCzVd2tIFW0nOEbSRvoL
Ap7oZkC7IvjX5ELJnqQbxT00+dv1s8U1UFk3J96xro1x3mLgAmASJ1TrFSKXUkss
bbwZ6QBNONCrTNCqlL2q1umLiYUTPeUH0RlLg2GsH1H66a9dEKZaS5AtJIO1hJ6s
znivgdx2vnSA3bs1coR3nEBeNRsMoHp2Dtn9P6FDNlhssZh+EO9GdoRh16cqRiEM
V2VnS2vjEZPpcMrX3ZGU/JBy/bw5GhGXrIfzQrvuKanUsuAVA3bmh6Fh8Rnu66mA
GETO09R/3cjLxHBngZdL9Zn3+3HgDTQHKtpW0ZfjdiK5Xh4JqGXKtDr68S88qgAb
eYQaeJMBoe0o1JXeKYRba+fXmubO7nGN3IDYWulNtkJLe9bx1mseVqlh9FSg7CuG
++sszE7nzmtAlZlaLKlONMm9+dLjhT9n7cSkoWbSIUHvWIzGKC2yadabt7T21VPc
Cyxdaat28xo93J5qjUetjTLKWQmG4eL/AG2/plT7WrTOjQqYCjfFU3XmpVBgtaAr
wkJB8sOUbjcuVJok235Eu/8SxHJKQb0dl8ALwbwLG7CPZMrZ5YF6eJgjE+Sy+WKn
OqfyzNwS7j7nvOvdpRgnHUwwc0G6eIfj90S+XdieJMMuzt7NUS8NsCuQx3nAK0EB
FNWk02pUYE5yHWvmG+7Mo+Jd9OBWcQnGErux6NTYA42uYiMamZgsByBJ2Y8S2a2L
VtbJVhO2indAvGqrGNbdOKGKagFjIQP2A7zsvLFkgfwbBG785CI3Il9jaeQhcefY
Xs0cSSFhFkz/Ak17gU5l0WF6h5HHacWaCzmWdOiZVbOeZboaaxSBry+uZ8oRXPbp
BavuRtu1afTSrikgJdCcoA8Opn7kkETt5xa5PP1dNLEWIMFui2BOlRAw8BDWJ+nD
8fYBaVaqdDs/Y8K0xWHuJjja+s91
82 changes: 50 additions & 32 deletions exporter/diagnostic_data_collector.go
Original file line number Diff line number Diff line change
Expand Up @@ -66,54 +66,72 @@ func (d *diagnosticDataCollector) collect(ch chan<- prometheus.Metric) {
logger := d.base.logger
client := d.base.client

nodeType, err := getNodeType(d.ctx, client)
if err != nil {
logger.WithFields(logrus.Fields{
"component": "diagnosticDataCollector",
}).Errorf("Cannot get node type: %s", err)
}

var metrics []prometheus.Metric
cmd := bson.D{{Key: "getDiagnosticData", Value: "1"}}
res := client.Database("admin").RunCommand(d.ctx, cmd)
if res.Err() != nil {
logger.Errorf("failed to run command: getDiagnosticData: %s", res.Err())
logger.Warn("cannot run getDiagnosticData, some metrics might be unavailable.")
}
logger.Warnf("failed to run command: getDiagnosticData, some metrics might be unavailable %s", res.Err())
} else {
if err := res.Decode(&m); err != nil {
logger.Errorf("cannot run getDiagnosticData: %s", err)
return
}

if err := res.Decode(&m); err != nil {
logger.Errorf("cannot run getDiagnosticData: %s", err)
}
if m == nil || m["data"] == nil {
logger.Error("cannot run getDiagnosticData: response is empty")
}

if m == nil || m["data"] == nil {
logger.Error("cannot run getDiagnosticData: response is empty")
}
var ok bool
m, ok = m["data"].(bson.M)
if !ok {
err = errors.Wrapf(errUnexpectedDataType, "%T for data field", m["data"])
logger.Errorf("cannot decode getDiagnosticData: %s", err)
}

m, ok := m["data"].(bson.M)
if !ok {
err := errors.Wrapf(errUnexpectedDataType, "%T for data field", m["data"])
logger.Debug("getDiagnosticData result")
debugResult(logger, m)

logger.Errorf("cannot decode getDiagnosticData: %s", err)
}
metrics = makeMetrics("", m, d.topologyInfo.baseLabels(), d.compatibleMode)
metrics = append(metrics, locksMetrics(logger, m)...)

logger.Debug("getDiagnosticData result")
debugResult(logger, m)
securityMetric, err := d.getSecurityMetricFromLineOptions(client)
if err != nil {
logger.Errorf("failed to run command: getCmdLineOptions: %s", err)
} else if securityMetric != nil {
metrics = append(metrics, securityMetric)
}

metrics := makeMetrics("", m, d.topologyInfo.baseLabels(), d.compatibleMode)
metrics = append(metrics, locksMetrics(logger, m)...)
if d.compatibleMode {
metrics = append(metrics, specialMetrics(d.ctx, client, m, nodeType, logger)...)

securityMetric, err := d.getSecurityMetricFromLineOptions(client)
if err != nil {
logger.Errorf("cannot decode getCmdLineOtpions: %s", err)
} else if securityMetric != nil {
metrics = append(metrics, securityMetric)
if cem, err := cacheEvictedTotalMetric(m); err == nil {
metrics = append(metrics, cem)
}
}
}

if d.compatibleMode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this point m might be empty, so it makes not much sense to call lines https://github.com/percona/mongodb_exporter/pull/820/files#diff-f08bb8e2230b960e7f7941dbdf020e3aa607391de1dc0f8683c315265d77b2f6R112-R116
I would include them to else part of the statement above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW can we get diagnostic data from mongos?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW can we get diagnostic data from mongos?

No actually, the command won't fail, but m["data"] will be empty.

metrics = append(metrics, specialMetrics(d.ctx, client, m, logger)...)
buildInfo, err := retrieveMongoDBBuildInfo(d.ctx, client, logger)
if err != nil {
logger.Errorf("cannot retrieve MongoDB buildInfo: %s", err)
}

if cem, err := cacheEvictedTotalMetric(m); err == nil {
metrics = append(metrics, cem)
metrics = append(metrics, serverVersion(buildInfo))

if nodeType == typeArbiter {
if hm := arbiterMetrics(d.ctx, client, logger); hm != nil {
metrics = append(metrics, hm...)
}
}

nodeType, err := getNodeType(d.ctx, client)
if err != nil {
logger.WithFields(logrus.Fields{
"component": "diagnosticDataCollector",
}).Errorf("Cannot get node type to check if this is a mongos: %s", err)
} else if nodeType == typeMongos {
if nodeType == typeMongos {
metrics = append(metrics, mongosMetrics(d.ctx, client, logger)...)
}
}
Expand Down
78 changes: 78 additions & 0 deletions exporter/diagnostic_data_collector_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/testutil"
"github.com/sirupsen/logrus"
logrustest "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"go.mongodb.org/mongo-driver/bson"
Expand Down Expand Up @@ -236,6 +237,83 @@ func TestAllDiagnosticDataCollectorMetrics(t *testing.T) {
}
}

//nolint:funlen
func TestDiagnosticDataErrors(t *testing.T) {
idoqo marked this conversation as resolved.
Show resolved Hide resolved
idoqo marked this conversation as resolved.
Show resolved Hide resolved
t.Parallel()
type log struct {
message string
level uint32
}

type testCase struct {
name string
containerName string
expectedMessage string
}

cases := []testCase{
{
name: "authenticated arbiter fails to get metric",
containerName: "mongo-2-arbiter",
expectedMessage: "failed to run command: getDiagnosticData",
},
{
idoqo marked this conversation as resolved.
Show resolved Hide resolved
name: "authenticated data node has no error in logs",
containerName: "mongo-1-1",
expectedMessage: "",
},
{
idoqo marked this conversation as resolved.
Show resolved Hide resolved
name: "unauthenticated arbiter has no error in logs",
containerName: "mongo-1-arbiter",
expectedMessage: "",
},
}

for _, tc := range cases {
idoqo marked this conversation as resolved.
Show resolved Hide resolved
tc := tc
t.Run(tc.name, func(t *testing.T) {
t.Parallel()
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()

port, err := tu.PortForContainer(tc.containerName)
require.NoError(t, err)
client := tu.TestClient(ctx, port, t)

logger, hook := logrustest.NewNullLogger()
ti := newTopologyInfo(ctx, client, logger)
c := newDiagnosticDataCollector(ctx, client, logger, true, ti)

reg := prometheus.NewRegistry()
err = reg.Register(c)
require.NoError(t, err)
_ = helpers.CollectMetrics(c)

var errorLogs []log
for _, entry := range hook.Entries {
if entry.Level == logrus.ErrorLevel || entry.Level == logrus.WarnLevel {
errorLogs = append(errorLogs, log{
message: entry.Message,
level: uint32(entry.Level),
})
}
}

if tc.expectedMessage == "" {
assert.Empty(t, errorLogs)
} else {
require.NotEmpty(t, errorLogs)
assert.True(
t,
strings.HasPrefix(hook.LastEntry().Message, tc.expectedMessage),
"'%s' has no prefix: '%s'",
hook.LastEntry().Message,
tc.expectedMessage)
}
})
}
}

func TestContextTimeout(t *testing.T) {
ctx := context.Background()

Expand Down
Loading