======================
Monitoring for MongoDB
======================
.. default-domain:: mongodb
.. contents:: On this page
:local:
:backlinks: none
:depth: 1
:class: singlecol
Monitoring is a critical component of all database administration. A
firm grasp of MongoDB's reporting will allow you to assess the state
of your database and maintain your deployment without crisis.
Additionally, a sense of MongoDB's normal operational parameters will
allow you to diagnose problems before they escalate to failures.
This document presents an overview of the available monitoring utilities
and the reporting statistics
available in MongoDB. It also introduces diagnostic strategies
and suggestions for monitoring replica sets and
sharded clusters.
.. include:: /includes/fact-mms-summary.rst
Monitoring Strategies
---------------------
There are three methods for collecting data about the state of a
running MongoDB instance:
- First, there is a set of utilities distributed with MongoDB that
provides real-time reporting of database activities.
- Second, :doc:`database commands </reference/command>` return
statistics regarding the current database state with greater
fidelity.
- Third, `MongoDB Atlas <https://www.mongodb.com/cloud/atlas?jmp=docs>`_
is a cloud-hosted database-as-a-service for running, monitoring, and
maintaining MongoDB deployments. |mms-home|, a hosted service, and
:products:`Ops Manager, an on-premise solution available in MongoDB
Enterprise Advanced </mongodb-enterprise-advanced?jmp=docs>`, collect
data from running MongoDB deployments and provide visualization and
alerts based on that data.
Each strategy can help answer different questions and is useful in
different contexts. These methods are complementary.
MongoDB Reporting Tools
-----------------------
This section provides an overview of the reporting methods distributed
with MongoDB. It also offers examples of the kinds of questions that
each method is best suited to help you address.
Utilities
~~~~~~~~~
The MongoDB distribution includes a number of utilities that quickly
return statistics about instances' performance and activity. Typically,
these are most useful for diagnosing issues and assessing normal
operation.
``mongostat``
`````````````
:binary:`~bin.mongostat` captures and returns the counts of database
operations by type (e.g. insert, query, update, delete). These
counts report on the load distribution on the server.
Use :binary:`~bin.mongostat` to understand the distribution of operation types
and to inform capacity planning. See the :doc:`mongostat manual
</reference/program/mongostat>` for details.
``mongotop``
````````````
:binary:`~bin.mongotop` tracks the current read and write
activity of a MongoDB instance and reports these statistics on a
per-collection basis.
Use :binary:`~bin.mongotop` to check if your database activity and use
match your expectations. See the :doc:`mongotop manual
</reference/program/mongotop>` for details.
.. _http-console:
HTTP Console
````````````
.. include:: /includes/fact-deprecated-http-interface.rst
MongoDB provides a web interface that exposes diagnostic
and monitoring information in a simple web page. The web interface is
accessible at ``localhost:<port>``, where the
``<port>`` number is **1000** more than the :binary:`~bin.mongod` port.
For example, if a locally running :binary:`~bin.mongod` is using the
default port ``27017``, access the HTTP console at
``http://localhost:28017``.
Commands
~~~~~~~~
MongoDB includes a number of commands that report on the state of the
database.
These data may provide a finer level of granularity than the utilities
discussed above. Consider using their output in scripts and programs to
develop custom alerts, or to modify the behavior of your application in
response to the activity of your instance. The :method:`db.currentOp`
method is another useful tool for identifying the database instance's
in-progress operations.
``serverStatus``
````````````````
The :dbcommand:`serverStatus` command, or :method:`db.serverStatus()`
from the shell, returns a general overview of the status of the
database, detailing disk usage, memory use, connection, journaling,
and index access. The command returns quickly and does not impact
MongoDB performance.
:dbcommand:`serverStatus` outputs an account of the state of a MongoDB
instance. This command is rarely run directly. In most cases, the data
is more meaningful when aggregated, as one would see with monitoring
tools including |mms-home| and :products:`Ops Manager
</mongodb-enterprise-advanced?jmp=docs>`. Nevertheless, all
administrators should be familiar with the data provided by
:dbcommand:`serverStatus`.
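As a rough illustration of using this output in a custom alert, the
following sketch computes how much of the connection limit is in use
from the ``connections.current`` and ``connections.available`` fields
of a :dbcommand:`serverStatus` document. The function names, the
threshold, and the sample numbers are hypothetical and exist only for
this example.

.. code-block:: javascript

   // Hypothetical helper: fraction of the connection limit in use.
   // `status` is a document shaped like the output of db.serverStatus().
   function connectionUsage(status) {
     var current = status.connections.current;
     var available = status.connections.available;
     return current / (current + available);
   }

   // Hypothetical alert rule: return a warning string once usage reaches
   // the chosen threshold, otherwise null.
   function connectionAlert(status, threshold) {
     var usage = connectionUsage(status);
     if (usage < threshold) {
       return null;
     }
     return "WARN: " + Math.round(usage * 100) + "% of connections in use";
   }

   // Made-up sample values; in the shell, pass db.serverStatus() instead.
   var sampleServerStatus = { connections: { current: 800, available: 200 } };
   var alertMessage = connectionAlert(sampleServerStatus, 0.75);

In the shell, ``connectionAlert(db.serverStatus(), 0.75)`` would apply
the same rule to a live instance. A suitable threshold depends on the
deployment.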
``dbStats``
```````````
The :dbcommand:`dbStats` command, or :method:`db.stats()` from the shell,
returns a document that addresses storage use and data volumes. The
:dbcommand:`dbStats` output reflects the amount of
storage used, the quantity of data contained in the database, and
object, collection, and index counters.
Use this data to monitor the state and storage capacity
of a specific database. This output also allows you to compare
use between databases and to determine the average
:term:`document` size in a database.
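For instance, the average document size can be derived by dividing the
``dataSize`` field by the ``objects`` field of the :dbcommand:`dbStats`
output. The sketch below assumes a document with those two fields. The
helper name and the sample numbers are hypothetical.

.. code-block:: javascript

   // Hypothetical helper: average document size in bytes.
   // `stats` is a document shaped like the output of db.stats().
   function averageDocumentSize(stats) {
     if (stats.objects === 0) {
       return 0; // an empty database has no meaningful average
     }
     return stats.dataSize / stats.objects;
   }

   // Made-up sample values; in the shell, pass db.stats() instead.
   var sampleDbStats = { db: "test", objects: 2000, dataSize: 512000 };
   var averageSize = averageDocumentSize(sampleDbStats); // 512000 / 2000

The :dbcommand:`dbStats` output also reports this value directly in its
``avgObjSize`` field. The calculation is shown here only to make the
relationship between the fields explicit.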
``collStats``
`````````````
The :dbcommand:`collStats` command, or :method:`db.collection.stats()`
from the shell, provides statistics that resemble :dbcommand:`dbStats` on
the collection level, including a count of the objects in the
collection, the size of the collection, the amount of disk space used
by the collection, and information about its indexes.
``replSetGetStatus``
````````````````````
The :dbcommand:`replSetGetStatus` command (:method:`rs.status()` from
the shell) returns an overview of your replica set's status. The :doc:`replSetGetStatus
</reference/command/replSetGetStatus>` document details the
state and configuration of the replica set and statistics about its members.
Use this data to ensure that replication is properly configured,
and to check the connections between the current host and the other members
of the replica set.
Third Party Tools
~~~~~~~~~~~~~~~~~
A number of third party monitoring tools have support for MongoDB,
either directly, or through their own plugins.
Self Hosted Monitoring Tools
````````````````````````````
These are monitoring tools that you must install, configure and maintain
on your own servers. Most are open source.
.. list-table::
:header-rows: 1
* - **Tool**
- **Plugin**
- **Description**
* - `Ganglia <http://sourceforge.net/apps/trac/ganglia/wiki>`_
- `mongodb-ganglia <https://github.com/quiiver/mongodb-ganglia>`_
- Python script to report operations per second, memory usage,
btree statistics, master/slave status and current connections.
* - Ganglia
- `gmond_python_modules <https://github.com/ganglia/gmond_python_modules>`_
- Parses output from the :dbcommand:`serverStatus` and
:dbcommand:`replSetGetStatus` commands.
* - `Motop <https://github.com/tart/motop>`_
- *None*
- Realtime monitoring tool for MongoDB servers. Shows
current operations ordered by durations every second.
* - `mtop <https://github.com/beaufour/mtop>`_
- *None*
- A ``top``-like tool.
* - `Munin <http://munin-monitoring.org/>`_
- `mongo-munin <https://github.com/erh/mongo-munin>`_
- Retrieves server statistics.
* - Munin
- `mongomon <https://github.com/pcdummy/mongomon>`_
- Retrieves collection statistics (sizes, index sizes, and each
(configured) collection count for one DB).
* - Munin
- `munin-plugins Ubuntu PPA
<https://launchpad.net/~chris-lea/+archive/munin-plugins>`_
- Some additional munin plugins not in the main distribution.
* - `Nagios <http://www.nagios.org/>`_
- `nagios-plugin-mongodb
<https://github.com/mzupan/nagios-plugin-mongodb>`_
- A simple Nagios check script, written in Python.
* - `SPM Performance Monitoring <https://sematext.com/spm/>`__
- `MongoDB Docker Agent <https://hub.docker.com/r/sematext/spm-agent-mongodb/>`_
- `Monitoring, Anomaly Detection and Alerting <https://sematext.com/spm/integrations/mongodb-monitoring/>`_. SPM monitors all key MongoDB metrics together with infrastructure metrics, including Docker, and other application metrics, e.g. Node.js, Java, NGINX, Apache, HAProxy or Elasticsearch. SPM is available On Premises and in the Cloud (SaaS) and provides correlation of metrics and logs.
Also consider `dex <https://github.com/mongolab/dex>`_, an index and
query analyzing tool for MongoDB that compares MongoDB log files and
indexes to make indexing recommendations.
.. seealso::
:products:`Ops Manager, an on-premise solution available in MongoDB
Enterprise Advanced </mongodb-enterprise-advanced?jmp=docs>`.
Hosted (SaaS) Monitoring Tools
``````````````````````````````
These are monitoring tools provided as a hosted service, usually through
a paid subscription.
.. list-table::
:header-rows: 1
* - **Name**
- **Notes**
* - |mms-home|
- |MMS| is a cloud-based suite of services for managing MongoDB
deployments. |MMS| provides monitoring, backup, and automation
functionality. For an on-premise solution, see also
:products:`Ops Manager, available in MongoDB Enterprise Advanced
</mongodb-enterprise-advanced?jmp=docs>`.
* - `VividCortex <https://www.vividcortex.com/>`_
- VividCortex provides deep insights into MongoDB `production
database workload and query performance
<https://www.vividcortex.com/product/how-it-works>`_ -- in
one-second resolution. Track latency, throughput, errors, and
more to ensure scalability and exceptional performance of your
application on MongoDB.
* - `Scout <http://scoutapp.com>`_
- Several plugins, including `MongoDB Monitoring
<https://scoutapp.com/plugin_urls/391-mongodb-monitoring>`_,
`MongoDB Slow Queries
<http://scoutapp.com/plugin_urls/291-mongodb-slow-queries>`_,
and `MongoDB Replica Set Monitoring
<http://scoutapp.com/plugin_urls/2251-mongodb-replica-set-monitoring>`_.
* - `Server Density <http://www.serverdensity.com>`_
- `Dashboard for MongoDB
<http://www.serverdensity.com/mongodb-monitoring/>`_, MongoDB
specific alerts, replication failover timeline and iPhone, iPad
and Android mobile apps.
* - `Application Performance Management <http://ibmserviceengage.com>`_
- IBM has an Application Performance Management SaaS offering that
includes a monitor for MongoDB and other applications and middleware.
* - `New Relic <http://newrelic.com/>`_
- New Relic offers full support for application performance
management. In addition, New Relic Plugins and Insights enable you to view
monitoring metrics from Cloud Manager in New Relic.
* - `Datadog <https://www.datadoghq.com/>`_
- `Infrastructure monitoring
<http://docs.datadoghq.com/integrations/mongodb/>`_ to visualize
the performance of your MongoDB deployments.
* - `SPM Performance Monitoring <https://sematext.com/spm>`__
- `Monitoring, Anomaly Detection and Alerting
<https://sematext.com/spm/integrations/mongodb-monitoring/>`_. SPM monitors all key MongoDB metrics together with infrastructure metrics, including Docker, and other application metrics, e.g. Node.js, Java, NGINX, Apache, HAProxy or Elasticsearch. SPM provides correlation of metrics and logs.
.. _stdout:
.. _standard-output:
.. _monitoring-standard-loggging:
Process Logging
---------------
During normal operation, :binary:`~bin.mongod` and :binary:`~bin.mongos`
instances report a live account of all server activity and operations
to either
standard output or a log file. The following runtime settings
control these options.
- :setting:`~systemLog.quiet`. Limits the amount of information written to the
log or output.
- :setting:`~systemLog.verbosity`. Increases the amount of information written to
the log or output. You can also modify the logging verbosity during
runtime with the :parameter:`logLevel` parameter or the
:method:`db.setLogLevel()` method in the shell.
- :setting:`~systemLog.path`. Enables logging to a file, rather than the standard
output. You must specify the full path to the log file when adjusting
this setting.
- :setting:`~systemLog.logAppend`. Adds information to a log
file instead of overwriting the file.
.. note::
You can specify these configuration options as command-line
arguments to :doc:`mongod </reference/program/mongod>` or :doc:`mongos
</reference/program/mongos>`.
For example:
.. code-block:: sh
mongod -v --logpath /var/log/mongodb/server1.log --logappend
This command starts a :binary:`~bin.mongod` instance in :setting:`verbose
<systemLog.verbosity>` mode, appending data to the log file at
``/var/log/mongodb/server1.log``.
The following :term:`database commands <database command>` also
affect logging:
- :dbcommand:`getLog`. Displays recent messages from the
:binary:`~bin.mongod` process log.
- :dbcommand:`logRotate`. Rotates the log files for :binary:`~bin.mongod`
processes only. See :doc:`/tutorial/rotate-log-files`.
Diagnosing Performance Issues
-----------------------------
.. include:: /includes/intro-performance.rst
.. _replica-set-monitoring:
Replication and Monitoring
--------------------------
Beyond the basic monitoring requirements for any MongoDB instance, for
replica sets, administrators must monitor *replication
lag*. "Replication lag" refers to the amount of time that it takes to
copy (i.e. replicate) a write operation on the :term:`primary` to a
:term:`secondary`. Some small delay period may be acceptable, but two
significant problems emerge as replication lag grows:
- First, operations that occurred during the period of lag are not
replicated to one or more secondaries. If you're using replication
to ensure data persistence, exceptionally long delays may impact the
integrity of your data set.
- Second, if the replication lag exceeds the length of the operation
log (:term:`oplog`) then MongoDB will have to perform an initial
sync on the secondary, copying all data from the :term:`primary` and
rebuilding all indexes. This is uncommon under normal circumstances,
but if you configure the oplog to be smaller than the default,
the issue can arise.
.. note::
The size of the oplog is only configurable during the first
run using the :option:`--oplogSize <mongod --oplogSize>` argument to
the :binary:`~bin.mongod` command, or preferably, the
:setting:`~replication.oplogSizeMB` setting
in the MongoDB configuration file. If you do not specify this on the
command line before running with the :option:`--replSet <mongod --replSet>`
option, :binary:`~bin.mongod` will create a default sized oplog.
By default, the oplog is 5 percent of total available disk space
on 64-bit systems. For more information about changing the oplog
size, see :doc:`/tutorial/change-oplog-size`.
For causes of replication lag, see :ref:`Replication Lag
<replica-set-replication-lag>`.
Replication issues are most often the result of network connectivity
issues between members, or the result of a :term:`primary` that does not
have the resources to support application and replication traffic. To
check the status of a replica set, use the :dbcommand:`replSetGetStatus`
command or the following helper in the shell:
.. code-block:: javascript
rs.status()
The :dbcommand:`replSetGetStatus` reference provides a more in-depth
overview of this output. In general, watch the value of
:data:`~replSetGetStatus.members.optimeDate`, and pay particular attention
to the time difference between the :term:`primary` and the
:term:`secondary` members.
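One way to turn this comparison into a number is sketched below. It
assumes a status document whose ``members`` entries carry the ``name``,
``stateStr``, and ``optimeDate`` fields. The helper name, host names,
and timestamps are hypothetical.

.. code-block:: javascript

   // Hypothetical helper: lag of each secondary behind the primary, in
   // seconds. `status` is a document shaped like the output of rs.status().
   function replicationLagSeconds(status) {
     var primary = status.members.filter(function (m) {
       return m.stateStr === "PRIMARY";
     })[0];
     if (!primary) {
       return null; // no primary visible, for example during an election
     }
     var lag = {};
     status.members.forEach(function (m) {
       if (m.stateStr === "SECONDARY") {
         // Subtracting two Date values yields milliseconds.
         lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
       }
     });
     return lag;
   }

   // Made-up sample values; in the shell, pass rs.status() instead.
   var sampleReplStatus = {
     set: "rs0",
     members: [
       { name: "db1.example.net:27017", stateStr: "PRIMARY",
         optimeDate: new Date("2017-06-01T12:00:10Z") },
       { name: "db2.example.net:27017", stateStr: "SECONDARY",
         optimeDate: new Date("2017-06-01T12:00:08Z") },
       { name: "db3.example.net:27017", stateStr: "SECONDARY",
         optimeDate: new Date("2017-06-01T11:59:40Z") }
     ]
   };
   var sampleLag = replicationLagSeconds(sampleReplStatus);

With the sample values, ``db2`` trails the primary by 2 seconds and
``db3`` by 30 seconds. The shell helper
``rs.printSlaveReplicationInfo()`` prints a similar per-member summary.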
Sharding and Monitoring
-----------------------
In most cases, the components of :term:`sharded clusters <sharded cluster>`
benefit from the same monitoring and analysis as all other MongoDB
instances. In addition, clusters require further monitoring to ensure
that data is effectively distributed among nodes and that sharding
operations are functioning appropriately.
.. seealso:: See the :doc:`/sharding` documentation for more
information.
Config Servers
~~~~~~~~~~~~~~
The :term:`config database` maintains a map identifying which
documents are on which shards. The cluster updates this map as
:term:`chunks <chunk>` move between shards. When a configuration
server becomes inaccessible, certain sharding operations become
unavailable, such as moving chunks and starting :binary:`~bin.mongos`
instances. However, clusters remain accessible from already-running
:binary:`~bin.mongos` instances.
Because inaccessible configuration servers can seriously impact
the availability of a sharded cluster, you should monitor your
configuration servers to ensure that the cluster remains well
balanced and that :binary:`~bin.mongos` instances can restart.
|mms-home| and :products:`Ops Manager
</mongodb-enterprise-advanced?jmp=docs>` monitor config servers and can
create notifications if a config server becomes inaccessible. See the
|mms-docs| and :opsmgr:`Ops Manager documentation
</application>` for more information.
Balancing and Chunk Distribution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The most effective :term:`sharded cluster` deployments evenly balance
:term:`chunks <chunk>` among the shards. To facilitate this, MongoDB
has a background :term:`balancer` process that migrates chunks to keep
them evenly distributed among the :term:`shards <shard>`.
To review the current distribution, run :method:`db.printShardingStatus()`
or :method:`sh.status()` from a :binary:`~bin.mongo` shell connected to a
:binary:`~bin.mongos`. The output is an overview of the entire cluster,
including the database name, and a list of the chunks.
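To summarize the chunk list as a count per shard, a script can tally
chunk documents by their ``shard`` field. The sketch below assumes
documents shaped like those in the ``chunks`` collection of the
``config`` database. The helper names, namespace, and shard names are
hypothetical.

.. code-block:: javascript

   // Hypothetical helper: number of chunks held by each shard.
   // `chunks` is an array of documents that each have a `shard` field.
   function chunksPerShard(chunks) {
     var counts = {};
     chunks.forEach(function (c) {
       counts[c.shard] = (counts[c.shard] || 0) + 1;
     });
     return counts;
   }

   // Hypothetical helper: difference between the largest and smallest
   // per-shard chunk counts.
   function chunkImbalance(counts) {
     var values = Object.keys(counts).map(function (k) { return counts[k]; });
     if (values.length === 0) {
       return 0;
     }
     return Math.max.apply(null, values) - Math.min.apply(null, values);
   }

   // Made-up sample values. In the shell, the input could come from
   // db.getSiblingDB("config").chunks.find().toArray().
   var sampleChunks = [
     { ns: "records.people", shard: "shard0000" },
     { ns: "records.people", shard: "shard0000" },
     { ns: "records.people", shard: "shard0000" },
     { ns: "records.people", shard: "shard0001" }
   ];
   var sampleCounts = chunksPerShard(sampleChunks);
   var sampleImbalance = chunkImbalance(sampleCounts);

With the sample values, ``shard0000`` holds three chunks and
``shard0001`` holds one, an imbalance of two. Shards that hold no chunks
do not appear in the tally, so compare the result against the full list
of shards.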
Stale Locks
~~~~~~~~~~~
To check the lock status of the database, connect to a
:binary:`~bin.mongos` instance using the :binary:`~bin.mongo` shell. Issue the
following command sequence to switch to the ``config`` database and
display all outstanding locks on the shard database:
.. code-block:: javascript
use config
db.locks.find()
The balancing process takes a special "balancer" lock that prevents
other balancing activity from transpiring. In the ``config`` database,
use the following command to view the "balancer" lock.
.. code-block:: javascript
db.locks.find( { _id : "balancer" } )
If this lock exists, make sure that the balancer process is actively
using this lock.
.. _storage-node-watchdog:
Storage Node Watchdog
---------------------
.. versionadded:: 3.2.16
.. note:: Available only in MongoDB Enterprise. Not available on macOS.
The :ref:`Storage Node Watchdog <storage-node-watchdog>` monitors the
filesystems used by MongoDB to detect unresponsive conditions.
The :ref:`Storage Node Watchdog <storage-node-watchdog>` can be enabled with
the :parameter:`watchdogPeriodSeconds` parameter on a :binary:`~bin.mongod`.
When enabled, the :ref:`Storage Node Watchdog <storage-node-watchdog>`
monitors the following directories:
* The :option:`--dbpath <mongod --dbpath>` directory
* The ``journal`` directory inside the :option:`--dbpath <mongod --dbpath>` directory if
:option:`journaling <mongod --journal>` is enabled
* The directory of the :option:`--logpath <mongod --logpath>` file
* The directory of the :option:`--auditPath <mongod --auditPath>` file
If any of the filesystems containing these directories become unresponsive,
the :ref:`Storage Node Watchdog <storage-node-watchdog>` terminates the
:binary:`~bin.mongod` and exits with a status code of 61. If the
:binary:`~bin.mongod` is serving as the :term:`primary`, terminating initiates
:term:`failover` allowing another member to become primary.
Once a :binary:`~bin.mongod` has terminated, it may not be possible to cleanly
restart it on the *same* machine.
The maximum time the :ref:`Storage Node Watchdog <storage-node-watchdog>` can
take to detect an unresponsive filesystem and terminate is nearly *twice* the
value of :parameter:`watchdogPeriodSeconds`.
.. class:: hidden
.. toctree::
:titlesonly:
/tutorial/monitor-with-snmp
/tutorial/monitor-with-snmp-on-windows
/tutorial/troubleshoot-snmp