.. _troubleshooting:

========================
FAQ: MongoDB Diagnostics
========================

.. default-domain:: mongodb

This document provides answers to common diagnostic questions and
issues.

If you don't find the answer you're looking for, check
the :doc:`complete list of FAQs </faq>` or post your question to the
`MongoDB User Mailing List <!forum/mongodb-user>`_.

Where can I find information about a ``mongod`` process that stopped running unexpectedly?
------------------------------------------------------------------------------------------

If :program:`mongod` shuts down unexpectedly on a UNIX or UNIX-based
platform, and if :program:`mongod` fails to log a shutdown or error
message, then check your system logs for messages pertaining to MongoDB.
For example, for logs located in ``/var/log/messages``, use the
following commands:

.. code-block:: sh

   sudo grep mongod /var/log/messages
   sudo grep score /var/log/messages
.. _faq-keepalive:
Does TCP ``keepalive`` time affect sharded clusters and replica sets?
---------------------------------------------------------------------

If you experience socket errors between members of a sharded cluster
or replica set that do not have other reasonable causes, check the
TCP keepalive value, which Linux systems store as the
``tcp_keepalive_time`` value. A common keepalive period is ``7200``
seconds (2 hours); however, different distributions and OS X may have
different settings. For MongoDB, you will have better experiences with
shorter keepalive periods, on the order of ``300`` seconds (five minutes).
On Linux systems you can use the following operation to check the
value of ``tcp_keepalive_time``:

.. code-block:: sh

   cat /proc/sys/net/ipv4/tcp_keepalive_time

You can change the ``tcp_keepalive_time`` value with the following
operation:

.. code-block:: sh

   echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time

The new ``tcp_keepalive_time`` value takes effect without requiring
you to restart the :program:`mongod` or :program:`mongos`
servers. When you reboot or restart your system you will need to set
the new ``tcp_keepalive_time`` value again, or see your operating
system's documentation for setting the TCP keepalive value persistently.
For OS X systems, issue the following command to view the keepalive
setting:

.. code-block:: sh

   sysctl net.inet.tcp.keepinit

To set a shorter keepalive period use the following invocation:

.. code-block:: sh

   sysctl -w net.inet.tcp.keepinit=300
If your replica set or sharded cluster experiences keepalive-related
issues, you must alter the ``tcp_keepalive_time`` value on all machines
hosting MongoDB processes. This includes all machines hosting
:program:`mongos` or :program:`mongod` servers.

Windows users should consider the `Windows Server Technet Article on
KeepAliveTime configuration
<>`_
for more information on setting keepalive for MongoDB deployments on
Windows systems.
What tools are available for monitoring MongoDB?
------------------------------------------------

The `MongoDB Management Service <>`_ includes
monitoring. MMS Monitoring is a free, hosted service for monitoring
MongoDB deployments. A full list of third-party tools is available as
part of the :doc:`/administration/monitoring` documentation. Also
consider the `MMS Documentation <>`_.
.. _faq-memory:
Memory Diagnostics
------------------
Do I need to configure swap space?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Always configure systems to have swap space. Without swap, your system
may not be reliable in some situations with extreme memory constraints,
memory leaks, or multiple programs using the same memory. Think of
the swap space as something like a steam release valve that allows the
system to release extra pressure without affecting the overall
functioning of the system.
Nevertheless, systems running MongoDB *do not* need swap for routine
operation. Database files are :ref:`memory-mapped
<faq-storage-memory-mapped-files>` and should constitute most of your
MongoDB memory use. Therefore, it is unlikely that :program:`mongod`
will ever use any swap space in normal operation. The operating system
will release memory from the memory-mapped files without needing
swap, and MongoDB can write data to the data files without needing the
swap system.

.. _faq-fundamentals-working-set:
What is "working set" and how can I estimate its size?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The *working set* for a MongoDB database is the portion of your data
that clients access most often. You can estimate the size of the working
set using the :data:`~serverStatus.workingSet` document in the output
of :dbcommand:`serverStatus`. To return :dbcommand:`serverStatus` with
the :data:`~serverStatus.workingSet` document, issue a command in the
following form:

.. code-block:: javascript

   db.runCommand( { serverStatus: 1, workingSet: 1 } )
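The ``workingSet`` document reports activity in units of memory pages.
As a rough sketch (this helper is illustrative, not part of MongoDB),
you can convert its ``pagesInMemory`` count to megabytes, assuming the
common 4 kilobyte page size:

.. code-block:: javascript

   // Convert the workingSet pagesInMemory count to an approximate size
   // in megabytes. Assumes 4 KB pages; check your system's actual page
   // size (e.g. with `getconf PAGESIZE`) before relying on the figure.
   function estimateWorkingSetMB(workingSet, pageSizeBytes) {
      pageSizeBytes = pageSizeBytes || 4096;
      return (workingSet.pagesInMemory * pageSizeBytes) / (1024 * 1024);
   }

   // For example, 25600 resident pages at 4 KB each is roughly 100 MB:
   estimateWorkingSetMB( { pagesInMemory: 25600 } )

Keep in mind that ``pagesInMemory`` reflects activity over the
measurement window reported in ``overSeconds``, so treat the result as
an estimate rather than a precise figure.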
Must my working set size fit RAM?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Your working set should stay in memory to achieve good performance.
Otherwise many random disk IOs will occur, and unless you are using
SSDs, this can be quite slow.

One area to watch specifically in managing the size of your working set
is index access patterns. If you are inserting into indexes at random
locations (as would happen with ids that are randomly
generated by hashes), you will continually be updating the whole index.
If instead you are able to create your ids in approximately ascending
order (for example, day concatenated with a random id), all the updates
will occur at the right side of the b-tree and the working set size for
index pages will be much smaller.
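To illustrate the ascending-id pattern, here is a sketch of a
hypothetical id generator (the helper name and id format are
assumptions for illustration, not a MongoDB API):

.. code-block:: javascript

   // Sketch: ids that begin with a fixed-width UTC day stamp (YYYYMMDD)
   // sort in approximately ascending order, so inserts land at the
   // right edge of the index b-tree. The random suffix only varies ids
   // within a single day.
   function makeAscendingId(date) {
      var d = date || new Date();
      var day = d.getUTCFullYear() * 10000 +
                (d.getUTCMonth() + 1) * 100 +
                d.getUTCDate();
      var suffix = Math.floor(Math.random() * 1e9);
      return day + "-" + suffix;
   }

Ids generated on later days always compare greater than ids generated
on earlier days, which keeps index growth concentrated in recently
touched pages.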
It is fine if databases, and thus virtual size, are much larger than RAM.
.. todo Commenting out for now:

   .. _faq-fundamentals-working-set-size:

   How can I measure working set size?
   -----------------------------------

   Measuring working set size can be difficult, even if it is much
   smaller than total RAM. If the database is much larger than RAM in
   total, all memory will be indicated as in use for the cache. Thus you
   need a different way to estimate the working set size.

   One technique is to use the `eatmem.cpp <>`_
   utility, which reserves a certain amount of system memory for itself.
   You can run the utility with a certain amount specified and see if
   the server continues to perform well. If not, the working set is
   larger than the total RAM minus the consumed RAM. The test will eject
   some data from the file system cache, which might take time to page
   back in after the utility is terminated.

   Running eatmem.cpp continuously with a small percentage of total RAM,
   such as 20%, is a good technique to get an early warning if memory is
   too low. If disk I/O activity increases significantly, terminate
   eatmem.cpp to mitigate the problem for the moment until further steps
   can be taken.

   In :term:`replica sets <replica set>`, if one server is underpowered
   the eatmem.cpp utility could help as an early warning mechanism for
   server capacity. Of course, the server must be receiving
   representative traffic to get an indication.
How do I calculate how much RAM I need for my application?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. todo Improve this FAQ

The amount of RAM you need depends on several factors, including but not
limited to:

- The relationship between :doc:`database storage </faq/storage>` and
  working set.

- The operating system's cache strategy for LRU (Least Recently Used).

- The impact of :doc:`journaling </core/journaling>`.

- The number or rate of page faults and other MMS gauges to detect when
  you need more RAM.
MongoDB defers to the operating system when loading data into memory
from disk. It simply :ref:`memory maps <faq-storage-memory-mapped-files>` all
its data files and relies on the operating system to cache data. The OS
typically evicts the least-recently-used data from RAM when it runs low
on memory. For example, if clients access indexes more frequently than
documents, then indexes will more likely stay in RAM, but it depends on
your particular usage.

To calculate how much RAM you need, you must calculate your working set
size, or the portion of your data that clients use most often. This
depends on your access patterns, what indexes you have, and the size of
your documents.

If page faults are infrequent, your working set fits in RAM. If the
page fault rate rises, you risk performance degradation. This is less
critical with SSD drives than with spinning disks.
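One way to watch the fault rate is to sample the
``extra_info.page_faults`` counter from :dbcommand:`serverStatus` at
intervals. A sketch follows; the ``timeMs`` wrapper field is a
hypothetical capture timestamp added for illustration, and only
``page_faults`` comes from :dbcommand:`serverStatus`:

.. code-block:: javascript

   // Compute faults per second between two samples of the serverStatus
   // extra_info.page_faults counter. timeMs is a caller-supplied capture
   // timestamp; it is not part of the serverStatus output.
   function pageFaultRate(earlier, later) {
      var seconds = (later.timeMs - earlier.timeMs) / 1000;
      return (later.page_faults - earlier.page_faults) / seconds;
   }

   // 600 additional faults over a 60 second window is 10 faults/second:
   pageFaultRate( { timeMs: 0,     page_faults: 1000 },
                  { timeMs: 60000, page_faults: 1600 } )

A sustained rise in this rate relative to your baseline suggests the
working set no longer fits in RAM.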
How do I read memory statistics in the UNIX ``top`` command?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Because :program:`mongod` uses :ref:`memory-mapped files
<faq-storage-memory-mapped-files>`, the memory statistics in ``top``
require interpretation in a special way. On a large database, ``VSIZE``
(virtual bytes) tends to be the size of the entire database. If the
:program:`mongod` doesn't have other processes running, ``RSIZE``
(resident bytes) is the total memory of the machine, as this counts
file system cache contents.

For Linux systems, use the ``vmstat`` command to help determine how
the system uses memory. On OS X systems use ``vm_stat``.
Sharded Cluster Diagnostics
---------------------------

The two most important factors in maintaining a successful sharded cluster are:

- :ref:`choosing an appropriate shard key <sharding-internals-shard-keys>` and

- :ref:`sufficient capacity to support current and future operations
  <sharding-capacity-planning>`.

You can prevent most issues encountered with sharding by ensuring that
you choose the best possible :term:`shard key` for your deployment and
that you add additional capacity to your cluster well before the
current resources become saturated. Continue reading for specific
issues you may encounter in a production environment.
.. _sharding-troubleshooting-not-splitting:

In a new sharded cluster, why does all data remain on one shard?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Your cluster must have sufficient data for sharding to make
sense. Sharding works by migrating chunks between the shards until
each shard has roughly the same number of chunks.

The default chunk size is 64 megabytes. MongoDB will not begin
migrations until the imbalance of chunks in the cluster exceeds the
:ref:`migration threshold <sharding-migration-thresholds>`. While the
default chunk size is configurable with the :setting:`chunkSize`
setting, these behaviors help prevent unnecessary chunk migrations,
which can degrade the performance of your cluster as a whole.
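To make the threshold concrete, here is a sketch of the imbalance
check. The threshold values (2, 4, or 8 depending on the total number
of chunks) follow the :ref:`migration thresholds
<sharding-migration-thresholds>` documentation; verify them against
your MongoDB version, and note that this helper is illustrative, not a
MongoDB API:

.. code-block:: javascript

   // Sketch: decide whether the balancer would consider a cluster out
   // of balance. The imbalance is the difference between the chunk
   // counts of the most- and least-loaded shards.
   function exceedsMigrationThreshold(chunkCountsByShard) {
      var counts = Object.keys(chunkCountsByShard).map(function (s) {
         return chunkCountsByShard[s];
      });
      var total = counts.reduce(function (a, b) { return a + b; }, 0);
      // Thresholds from the migration thresholds documentation.
      var threshold = total < 20 ? 2 : (total < 80 ? 4 : 8);
      var imbalance = Math.max.apply(null, counts) -
                      Math.min.apply(null, counts);
      return imbalance >= threshold;
   }

   // e.g. 10 chunks split 7 / 3 across two shards exceeds the
   // threshold of 2 that applies to clusters with fewer than 20 chunks:
   exceedsMigrationThreshold( { shard0: 7, shard1: 3 } )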
If you have just deployed a sharded cluster, make sure that you have
enough data to make sharding effective. If you do not have sufficient
data to create more than eight 64 megabyte chunks, then all data will
remain on one shard. Either lower the :ref:`chunk size
<sharding-chunk-size>` setting, or add more data to the cluster.
As a related problem, the system will split chunks only on
inserts or updates, which means that if you configure sharding and do not
continue to issue insert and update operations, the database will not
create any chunks. You can either wait until your application inserts
data *or* :doc:`split chunks manually </tutorial/split-chunks-in-sharded-cluster>`.

Finally, if your shard key has a low :ref:`cardinality
<sharding-shard-key-cardinality>`, MongoDB may not be able to create
sufficient splits among the data.
Why would one shard receive a disproportionate amount of traffic in a sharded cluster?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some situations, a single shard or a subset of the cluster will
receive a disproportionate portion of the traffic and workload. In
almost all cases this is the result of a shard key that does not
effectively allow :ref:`write scaling <sharding-shard-key-write-scaling>`.

It's also possible that you have "hot chunks." In this case, you may
be able to solve the problem by splitting and then migrating parts of
these chunks.

In the worst case, you may have to consider re-sharding your data
and :ref:`choosing a different shard key <sharding-internals-choose-shard-key>`
to correct this pattern.
What can prevent a sharded cluster from balancing?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you have just deployed your sharded cluster, you may want to
consider the :ref:`troubleshooting suggestions for a new cluster where
data remains on a single shard <sharding-troubleshooting-not-splitting>`.

If the cluster was initially balanced, but later developed an uneven
distribution of data, consider the following possible causes:
- You have deleted or removed a significant amount of data from the
  cluster. If you have added additional data, it may have a
  different distribution with regard to its shard key.

- Your :term:`shard key` has low :ref:`cardinality <sharding-shard-key-cardinality>`
  and MongoDB cannot split the chunks any further.

- Your data set is growing faster than the balancer can distribute
  data around the cluster. This is uncommon and
  typically is the result of:

  - a :ref:`balancing window <sharding-schedule-balancing-window>` that
    is too short, given the rate of data growth.

  - an uneven distribution of :ref:`write operations
    <sharding-shard-key-write-scaling>` that requires more data
    migration. You may have to choose a different shard key to resolve
    this issue.

  - poor network connectivity between shards, which may lead to chunk
    migrations that take too long to complete. Investigate your
    network configuration and interconnections between shards.
Why do chunk migrations affect sharded cluster performance?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If migrations impact your cluster or application's performance,
consider the following options, depending on the nature of the impact:

#. If migrations only interrupt your cluster sporadically, you can
   limit the :ref:`balancing window
   <sharding-schedule-balancing-window>` to prevent balancing activity
   during peak hours. Ensure that there is enough time remaining to
   keep the data from becoming out of balance again.

#. If the balancer is always migrating chunks to the detriment of
   overall cluster performance:
   - You may want to attempt :doc:`decreasing the chunk size
     </tutorial/modify-chunk-size-in-sharded-cluster>` to limit the
     size of the migration.

   - Your cluster may be over capacity, and you may want to attempt to
     :ref:`add one or two shards <sharding-procedure-add-shard>` to
     the cluster to distribute load.

It's also possible that your shard key causes your
application to direct all writes to a single shard. This kind of
activity pattern can require the balancer to migrate most data soon
after writing it. Consider redeploying your cluster with a shard key
that provides better :ref:`write scaling
<sharding-shard-key-write-scaling>`.