.. _troubleshooting:
FAQ: MongoDB Diagnostics
.. default-domain:: mongodb
.. contents:: On this page
:backlinks: none
:depth: 1
:class: singlecol
This document provides answers to common diagnostic questions and issues.
If you don't find the answer you're looking for, check
the :doc:`complete list of FAQs </faq>` or post your question to the
`MongoDB User Mailing List <!forum/mongodb-user>`_.
Where can I find information about a ``mongod`` process that stopped running unexpectedly?
If :binary:`~bin.mongod` shuts down unexpectedly on a UNIX or UNIX-based
platform, and if :binary:`~bin.mongod` fails to log a shutdown or error
message, then check your system logs for messages pertaining to MongoDB.
For example, for logs located in ``/var/log/messages``, use the
following commands:
.. code-block:: sh
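# Look for messages that mention the mongod process.
sudo grep mongod /var/log/messages
# Look for "score" entries, which the Linux OOM killer typically logs.
sudo grep score /var/log/messages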
.. _faq-keepalive:
Does TCP ``keepalive`` time affect MongoDB Deployments?
If you experience socket errors between clients and servers or between
members of a sharded cluster or replica set that do not have other
reasonable causes, check the TCP keepalive value (for example, the
``tcp_keepalive_time`` value on Linux systems). A common keepalive period is
``7200`` seconds (2 hours); however, different distributions and macOS may have
different settings.
For MongoDB, you will have better results with shorter keepalive
periods, on the order of ``120`` seconds (two minutes).
If your MongoDB deployment experiences keepalive-related issues, you
must alter the keepalive value on *all* machines hosting MongoDB
processes. This includes all machines hosting :binary:`~bin.mongos` or
:binary:`~bin.mongod` processes and all machines hosting client processes
that connect to MongoDB.
.. note::
For non-Linux systems, values greater than or equal to 600 seconds
(10 minutes) will be ignored by :binary:`~bin.mongod` and
:binary:`~bin.mongos`. For Linux, values greater than 300 seconds (5
minutes) will be overridden on the :binary:`~bin.mongod` and
:binary:`~bin.mongos` sockets with a maximum of 300 seconds.
.. include:: /includes/fact-tcp-keepalive-linux.rst
.. include:: /includes/fact-tcp-keepalive-osx.rst
.. include:: /includes/fact-tcp-keepalive-windows.rst
You will need to restart :binary:`~bin.mongod` and :binary:`~bin.mongos`
processes for new system-wide keepalive settings to take effect.
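As a minimal sketch for Linux hosts, the following script reads the
current keepalive time and compares it against the 120-second guideline
above. The function name ``keepalive_status`` is hypothetical, and the
script assumes the standard ``/proc/sys/net/ipv4/tcp_keepalive_time``
interface is available.

```shell
#!/bin/sh
# Hypothetical helper: classify a keepalive period (in seconds) against
# the 120-second guideline discussed above.
keepalive_status() {
    if [ "$1" -le 120 ]; then
        echo "ok"
    else
        echo "consider lowering"
    fi
}

# On Linux, the current system-wide value is exposed under /proc.
proc_file=/proc/sys/net/ipv4/tcp_keepalive_time
if [ -r "$proc_file" ]; then
    current=$(cat "$proc_file")
    echo "tcp_keepalive_time=${current}s: $(keepalive_status "$current")"
fi
```

On Linux, ``sudo sysctl -w net.ipv4.tcp_keepalive_time=120`` changes the
value until the next reboot.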
Why does MongoDB log so many "Connection Accepted" events?
If you see a very large number of connection and re-connection messages
in your MongoDB log, then clients are frequently connecting to and
disconnecting from the MongoDB server. This is normal behavior for
applications that do not use connection pooling, such as CGI. Consider
using FastCGI, an Apache Module, or some other kind of persistent
application server to decrease the connection overhead.
If these connections do not impact your performance, you can use the
run-time :setting:`~systemLog.quiet` option or the command-line option
:option:`--quiet <mongod --quiet>` to suppress these messages from the
log.
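To gauge how often clients reconnect before deciding whether to suppress
the messages, you can count the relevant log lines. This sketch assumes a
log at the hypothetical path ``/var/log/mongodb/mongod.log`` and assumes
that accepted connections are logged with the text ``connection
accepted``, which may vary between versions. The function name
``count_accepted`` is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: count "connection accepted" lines in a log file.
# -c prints the number of matching lines; -i ignores case.
count_accepted() {
    grep -ci 'connection accepted' "$1"
}

# Assumed log location; adjust to match your systemLog.path setting.
log=/var/log/mongodb/mongod.log
if [ -r "$log" ]; then
    echo "accepted connections logged: $(count_accepted "$log")"
fi
```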
What tools are available for monitoring MongoDB?
The |mms-home| and
:products:`Ops Manager, an on-premise solution available in MongoDB
Enterprise Advanced </mongodb-enterprise-advanced?jmp=docs>` include
monitoring functionality, which collects data from running MongoDB
deployments and provides visualization and alerts based on that data.
For more information, see also the |mms-docs| and
:opsmgr:`Ops Manager documentation </application>`.
A full list of third-party tools is available as part of the
:doc:`/administration/monitoring/` documentation.
.. include:: /includes/replacement-mms.rst
.. _faq-memory:
Memory Diagnostics for the MMAPv1 Storage Engine
Do I need to configure swap space?
Always configure systems to have swap space. Without swap, your system
may not be reliable in some situations with extreme memory constraints,
memory leaks, or multiple programs using the same memory. Think of
the swap space as something like a steam release valve that allows the
system to release extra pressure without affecting the overall
functioning of the system.
Nevertheless, systems running MongoDB *do not* need swap for routine
operation. Database files are :ref:`memory-mapped
<faq-storage-memory-mapped-files>` and should constitute most of your
MongoDB memory use. Therefore, it is unlikely that :binary:`~bin.mongod`
will ever use any swap space in normal operation. The operating system
will release memory from the memory-mapped files without needing
swap, and MongoDB can write data to the data files without needing swap.
.. _faq-fundamentals-working-set:
What is a "working set"?
The *working set* is the portion of your data that clients access most
often.
Must my working set size fit in RAM?
Your working set should stay in memory to achieve good performance.
Otherwise, many random disk I/O operations will occur, and unless you are
using SSDs, this can be quite slow.
One area to watch specifically in managing the size of your working set
is index access patterns. If you are inserting into indexes at random
locations (as would happen with IDs that are randomly
generated by hashes), you will continually be updating the whole index.
If instead you are able to create your IDs in approximately ascending
order (for example, day concatenated with a random ID), all the updates
will occur at the right side of the B-tree and the working set size for
index pages will be much smaller.
It is fine if your databases, and thus the virtual memory size, are much
larger than RAM.
How do I calculate how much RAM I need for my application?
.. todo Improve this FAQ
The amount of RAM you need depends on several factors, including but not
limited to:
- The relationship between :doc:`database storage </faq/storage>` and working set.
- The operating system's cache eviction strategy, typically LRU (Least
Recently Used)
- The impact of :doc:`journaling </core/journaling>`
- The number or rate of page faults and other |MMS| gauges to detect when
you need more RAM
- The number of database connections, because each connection thread
needs up to 1 MB of RAM
MongoDB defers to the operating system when loading data into memory
from disk. It simply :ref:`memory maps <faq-storage-memory-mapped-files>` all
its data files and relies on the operating system to cache data. The OS
typically evicts the least-recently-used data from RAM when it runs low
on memory. For example, if clients access indexes more frequently than
documents, then indexes will more likely stay in RAM, but it depends on
your particular usage.
To calculate how much RAM you need, you must calculate your working set
size, or the portion of your data that clients use most often. This
depends on your access patterns, what indexes you have, and the size of
your documents. Because MongoDB uses a thread-per-connection model, each
database connection also needs up to 1 MB of RAM, whether active or idle.
If page faults are infrequent, your
working set fits in RAM. If the page fault rate rises, you risk
performance degradation. This is less critical with SSDs than
with spinning disks.
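As a rough worked example of the per-connection overhead, the following
sketch computes an upper bound on the memory used by connection threads,
taking the figure of up to 1 MB per connection from the text above. The
function name ``conn_ram_mb`` and the connection counts are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: upper bound, in MB, on RAM used by connection
# threads, assuming up to 1 MB per connection as stated above.
# $1 = number of open connections (active or idle)
conn_ram_mb() {
    per_conn_mb=1
    echo $(( $1 * per_conn_mb ))
}

echo "500 connections:  up to $(conn_ram_mb 500) MB"
echo "5000 connections: up to $(conn_ram_mb 5000) MB"
```

This overhead comes on top of the memory needed to hold the working set
itself.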
How do I read memory statistics in the UNIX ``top`` command?
Because :binary:`~bin.mongod` uses :ref:`memory-mapped files
<faq-storage-memory-mapped-files>`, the memory statistics in ``top``
require special interpretation. On a large database, ``VSIZE``
(virtual bytes) tends to be the size of the entire database. If the
machine running :binary:`~bin.mongod` has no other processes running,
``RSIZE`` (resident bytes) is the total memory of the machine, as this
counts file system cache contents.
For Linux systems, use the ``vmstat`` command to help determine how
the system uses memory. On macOS systems, use ``vm_stat``.
.. _faq-memory-wt:
Memory Diagnostics for the WiredTiger Storage Engine
Must my working set size fit in RAM?
.. include:: /includes/extracts/wt-cache-size.rst
How do I calculate how much RAM I need for my application?
.. include:: /includes/extracts/wt-configure-cache.rst
Sharded Cluster Diagnostics
The two most important factors in maintaining a successful sharded cluster are:
- :ref:`choosing an appropriate shard key <sharding-internals-shard-keys>` and
- sufficient capacity to support current and future operations.
You can prevent most issues encountered with sharding by choosing the
best possible :term:`shard key` for your deployment and by adding
capacity to your cluster well before the current resources become
saturated. Continue reading for specific issues you may encounter in a
production environment.
.. _sharding-troubleshooting-not-splitting:
In a new sharded cluster, why does all data remain on one shard?
Your cluster must have sufficient data for sharding to make
sense. Sharding works by migrating chunks between the shards until
each shard has roughly the same number of chunks.
The default chunk size is 64 megabytes. MongoDB will not begin
migrations until the imbalance of chunks in the cluster exceeds the
:ref:`migration threshold <sharding-migration-thresholds>`. This
behavior helps prevent unnecessary chunk migrations, which can degrade
the performance of your cluster as a whole.
If you have just deployed a sharded cluster, make sure that you have
enough data to make sharding effective. If you do not have sufficient
data to create more than eight 64 megabyte chunks, then all data will
remain on one shard. Either lower the :ref:`chunk size
<sharding-chunk-size>` setting, or add more data to the cluster.
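To make the arithmetic concrete, this sketch computes the approximate
amount of data needed before a cluster holds more than eight chunks,
using the eight-chunk figure and the 64 megabyte default from the text
above. The function name ``min_data_mb`` is hypothetical, and the
calculation ignores how evenly data fills each chunk.

```shell
#!/bin/sh
# Hypothetical helper: approximate data volume (MB) that fills a given
# number of chunks of a given size.
# $1 = chunk size in MB, $2 = number of chunks
min_data_mb() {
    echo $(( $1 * $2 ))
}

echo "64 MB chunks: more than $(min_data_mb 64 8) MB needed"
echo "32 MB chunks: more than $(min_data_mb 32 8) MB needed"
```

Under these assumptions, halving the chunk size halves the amount of
data needed before migrations can begin.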
As a related problem, the system will split chunks only on
inserts or updates. If you configure sharding and do not
continue to issue insert and update operations, the database will not
create any new chunks. You can either wait until your application inserts
data *or* :doc:`split chunks manually </tutorial/split-chunks-in-sharded-cluster>`.
Finally, if your shard key has a low :ref:`cardinality
<sharding-shard-key-cardinality>`, MongoDB may not be able to create
sufficient splits among the data.
Why would one shard receive a disproportionate amount of traffic in a sharded cluster?
In some situations, a single shard or a subset of the cluster will
receive a disproportionate portion of the traffic and workload. In
almost all cases this is the result of a shard key that does not
effectively allow :ref:`write scaling <sharding-shard-key-write-scaling>`.
It's also possible that you have "hot chunks." In this case, you may
be able to solve the problem by splitting and then migrating parts of
these chunks.
In the worst case, you may have to consider re-sharding your data
and :ref:`choosing a different shard key <sharding-internals-choose-shard-key>`
to correct this pattern.
What can prevent a sharded cluster from balancing?
If you have just deployed your sharded cluster, you may want to
consider the :ref:`troubleshooting suggestions for a new cluster where
data remains on a single shard <sharding-troubleshooting-not-splitting>`.
If the cluster was initially balanced, but later developed an uneven
distribution of data, consider the following possible causes:
- You have deleted a significant amount of data from the
cluster. If you have since added data, it may have a
different distribution with regard to its shard key.
- Your :term:`shard key` has low :ref:`cardinality <sharding-shard-key-cardinality>`
and MongoDB cannot split the chunks any further.
- Your data set is growing faster than the balancer can distribute
data around the cluster. This is uncommon and
typically is the result of:
- a :ref:`balancing window <sharding-schedule-balancing-window>` that
is too short, given the rate of data growth.
- an uneven distribution of :ref:`write operations
<sharding-shard-key-write-scaling>` that requires more data
migration. You may have to choose a different shard key to resolve
this issue.
- poor network connectivity between shards, which may lead to chunk
migrations that take too long to complete. Investigate your
network configuration and interconnections between shards.
Why do chunk migrations affect sharded cluster performance?
If migrations impact the performance of your cluster or application,
consider the following options, depending on the nature of the impact:
#. If migrations only interrupt your cluster sporadically, you can
limit the :ref:`balancing window
<sharding-schedule-balancing-window>` to prevent balancing activity
during peak hours. Ensure that there is enough time remaining to
keep the data from becoming out of balance again.
#. If the balancer is always migrating chunks to the detriment of
overall cluster performance:
- You may want to attempt :doc:`decreasing the chunk size </tutorial/modify-chunk-size-in-sharded-cluster>`
to limit the size of the migration.
- Your cluster may be over capacity, and you may want to attempt to
:ref:`add one or two shards <sharding-procedure-add-shard>` to
the cluster to distribute load.
It's also possible that your shard key causes your
application to direct all writes to a single shard. This kind of
activity pattern can require the balancer to migrate most data soon after
it is written. Consider redeploying your cluster with a shard key that provides
better :ref:`write scaling <sharding-shard-key-write-scaling>`.