.. _troubleshooting:
========================
FAQ: MongoDB Diagnostics
========================
.. default-domain:: mongodb
.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol
This document provides answers to common diagnostic questions and
issues.
If you don't find the answer you're looking for, check
the :ref:`complete list of FAQs <faq-manual>` or post your question to
the `MongoDB Community
<https://community.mongodb.com/?tck=docs_server>`_.
Where can I find information about a ``mongod`` process that stopped running unexpectedly?
------------------------------------------------------------------------------------------
If :binary:`~bin.mongod` shuts down unexpectedly on a UNIX or UNIX-based
platform, and if :binary:`~bin.mongod` fails to log a shutdown or error
message, then check your system logs for messages pertaining to MongoDB.
For example, for logs located in ``/var/log/messages``, use the
following commands:
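.. code-block:: bash

   # Find system log entries that mention the mongod process.
   sudo grep mongod /var/log/messages
   # Find OOM-killer entries, which report a per-process "score".
   sudo grep score /var/log/messages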
.. _faq-keepalive:
Does TCP ``keepalive`` time affect MongoDB Deployments?
-------------------------------------------------------
If you experience network timeouts or socket errors in communication
between clients and servers, or between members of a sharded cluster or
replica set, check the *TCP keepalive value* for the affected systems.
Many operating systems set this value to ``7200`` seconds (two hours) by
default. For MongoDB, you will generally experience better results with
a shorter keepalive value, on the order of ``120`` seconds (two
minutes).
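
To see whether a Linux host still uses the two-hour default, you can
read the kernel's keepalive setting and compare it against a target.
The following is a minimal sketch rather than an official procedure:
it assumes a Linux host that exposes the value at
``/proc/sys/net/ipv4/tcp_keepalive_time``, and ``check_keepalive`` is a
hypothetical helper name.

```shell
# Hypothetical helper: report whether a keepalive time (in seconds)
# exceeds a limit. The first argument is the file to read (defaults to
# the Linux procfs entry); the second is the limit (defaults to 120).
check_keepalive() {
  file=${1:-/proc/sys/net/ipv4/tcp_keepalive_time}
  limit=${2:-120}
  value=$(cat "$file") || return 2
  if [ "$value" -gt "$limit" ]; then
    echo "keepalive ${value}s exceeds ${limit}s"
    return 1
  fi
  echo "keepalive ${value}s is within ${limit}s"
}
```

Calling ``check_keepalive`` with no arguments inspects the running
system. Passing a different path or limit makes the helper usable in
scripts that audit several hosts.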
If your MongoDB deployment experiences keepalive-related issues, you
must alter the keepalive value on *all* affected systems. This includes
all machines running :binary:`~bin.mongod` or :binary:`~bin.mongos`
processes and all machines hosting client processes that connect to
MongoDB.
Adjusting the TCP keepalive value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. tabs::

   .. tab:: Linux
      :tabid: linux-keepalive

      .. include:: /includes/fact-tcp-keepalive-linux.rst

   .. tab:: Windows
      :tabid: windows-keepalive

      .. include:: /includes/fact-tcp-keepalive-windows.rst

   .. tab:: macOS
      :tabid: macos-keepalive

      .. include:: /includes/fact-tcp-keepalive-osx.rst

You will need to restart :binary:`~bin.mongod` and :binary:`~bin.mongos`
processes for new system-wide keepalive settings to take effect.
.. _faq-tcp_retries2:
Do TCP Retransmission Timeouts affect MongoDB Deployments?
----------------------------------------------------------
If you experience long stalls (stalls greater than two minutes) followed
by network timeouts or socket errors between clients and servers,
or between members of a sharded cluster or replica set,
check the ``tcp_retries2`` value for the affected systems.
Most Linux operating systems set this value to ``15`` by default, while
Windows sets it to ``5``. For MongoDB, you will generally experience
better results with a lower ``tcp_retries2`` value, on the order of
``5`` (12 seconds) or lower.
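
The relationship between ``tcp_retries2`` and the resulting timeout can
be approximated with simple arithmetic. The sketch below assumes
Linux's exponential backoff with a 200 millisecond minimum and a
120 second maximum retransmission timeout, on a connection whose
timeout starts at the minimum. Real timeouts depend on the measured
round-trip time, and ``rto_timeout_ms`` is a hypothetical helper name.

```shell
# Hypothetical helper: estimate the total time, in milliseconds, before
# a connection is dropped for a given tcp_retries2 value.
# Assumes the retransmission timeout starts at 200 ms, doubles after
# each attempt, and is capped at 120000 ms.
rto_timeout_ms() {
  retries=$1
  rto=200
  total=0
  i=0
  # One wait for the original send plus one per retransmission.
  while [ "$i" -le "$retries" ]; do
    total=$((total + rto))
    rto=$((rto * 2))
    if [ "$rto" -gt 120000 ]; then
      rto=120000
    fi
    i=$((i + 1))
  done
  echo "$total"
}
```

Under these assumptions, ``rto_timeout_ms 5`` prints ``12600`` (about
12 seconds) and ``rto_timeout_ms 15`` prints ``924600`` (about 15
minutes), which is why a stalled connection can take so long to fail
with the Linux default.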
If your MongoDB deployment experiences TCP retransmission timeout-related
issues, change the ``tcp_retries2`` value (``TcpMaxDataRetransmissions``
on Windows) for *all* affected systems. This includes all machines running
:binary:`~bin.mongod` or :binary:`~bin.mongos` processes and
all machines hosting client processes that connect to MongoDB.
Adjust the TCP Retransmission Timeout
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. tabs::

   .. tab:: Linux
      :tabid: linux

      .. include:: /includes/fact-tcp-retries-linux.rst

   .. tab:: Windows
      :tabid: windows

      .. include:: /includes/fact-tcp-retries-windows.rst

Why does MongoDB log so many "Connection Accepted" events?
----------------------------------------------------------
If you see a very large number of connection and re-connection messages
in your MongoDB log, then clients are frequently connecting to and
disconnecting from the MongoDB server. This is normal behavior for
applications that do not use connection pooling, such as CGI. Consider
using FastCGI, an Apache Module, or some other kind of persistent
application server to decrease the connection overhead.
If these connections do not impact your performance, you can use the
run-time :setting:`~systemLog.quiet` option or the command-line option
:option:`--quiet <mongod --quiet>` to suppress these messages from the
log.
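
Before suppressing the messages, it can help to measure how many
connections the server actually accepts. The following sketch counts
matching lines in a log file. The helper name ``count_accepted`` is
hypothetical, and the match is case-insensitive on the assumption that
the message capitalization can vary between server versions.

```shell
# Hypothetical helper: count log lines that record an accepted
# connection. The argument is the path to a mongod log file.
count_accepted() {
  # grep -c prints the number of matching lines; -i ignores case.
  grep -ci 'connection accepted' "$1"
}
```

For example, ``count_accepted /var/log/mongodb/mongod.log`` prints the
count for a log at that path. The path shown is an assumption; use the
location configured for your deployment.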
What tools are available for monitoring MongoDB?
------------------------------------------------
.. include:: /includes/replacement-mms.rst
The |mms-home| and
:products:`Ops Manager, an on-premise solution available in MongoDB
Enterprise Advanced </mongodb-enterprise-advanced?tck=docs_server>` include
monitoring functionality, which collects data from running MongoDB
deployments and provides visualization and alerts based on that data.
For more information, see also the |mms-docs| and
:opsmgr:`Ops Manager documentation </application>`.
A full list of third-party tools is available as part of the
:doc:`/administration/monitoring/` documentation.
.. _faq-memory-wt:
Memory Diagnostics for the WiredTiger Storage Engine
----------------------------------------------------
Must my working set size fit RAM?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
No.
.. include:: /includes/extracts/wt-cache-size.rst
How do I calculate how much RAM I need for my application?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. include:: /includes/extracts/wt-configure-cache.rst
Sharded Cluster Diagnostics
---------------------------
The two most important factors in maintaining a successful sharded cluster are:
- :ref:`choosing an appropriate shard key <sharding-internals-shard-keys>` and
- :ref:`sufficient capacity to support current and future operations
  <sharding-capacity-planning>`.
While you can :ref:`change your shard key <change-a-shard-key>` later,
it is important to carefully consider your shard key choice to avoid
scalability and performance issues. Continue reading for specific issues
you may encounter in a production environment.
.. _sharding-troubleshooting-not-splitting:
In a new sharded cluster, why does all data remain on one shard?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Your cluster must have sufficient data for sharding to make
sense. Sharding works by migrating chunks between the shards until
each shard has roughly the same number of chunks.
The default chunk size is 128 megabytes. MongoDB will not begin
migrations until the imbalance of chunks in the cluster exceeds the
:ref:`migration threshold <sharding-migration-thresholds>`. This
behavior helps prevent unnecessary chunk migrations, which can degrade
the performance of your cluster as a whole.
If you have just deployed a sharded cluster, make sure that you have
enough data to make sharding effective. If you do not have sufficient
data to create more than eight 128-megabyte chunks, then all data will
remain on one shard. Either lower the :ref:`chunk size
<sharding-chunk-size>` setting, or add more data to the cluster.
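
The threshold described above is simple arithmetic. As a sketch that
takes the eight-chunk figure and the 128 megabyte chunk size from the
text as given, the amount of data a collection must exceed before data
leaves the first shard can be estimated as follows. ``min_data_mb`` is
a hypothetical helper, not a MongoDB command.

```shell
# Hypothetical helper: data volume, in megabytes, that a collection
# must exceed before it spans more than the given number of chunks.
# Arguments: chunk size in MB (default 128), chunk count (default 8).
min_data_mb() {
  chunk_mb=${1:-128}
  chunks=${2:-8}
  echo $((chunk_mb * chunks))
}
```

With the defaults, ``min_data_mb`` prints ``1024``. Lowering the chunk
size to 64 megabytes (``min_data_mb 64``) halves the figure to ``512``,
which is why a smaller chunk size lets a small data set spread sooner.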
As a related problem, the system will split chunks only on
inserts or updates, which means that if you configure sharding and do not
continue to issue insert and update operations, the database will not
create any chunks. You can either wait until your application inserts
data *or* :ref:`split chunks manually <split-chunks-sharded-cluster>`.
Finally, if your shard key has a low :ref:`cardinality
<sharding-shard-key-cardinality>`, MongoDB may not be able to create
sufficient splits among the data.
Why would one shard receive a disproportionate amount of traffic in a sharded cluster?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In some situations, a single shard or a subset of the cluster will
receive a disproportionate portion of the traffic and workload. In
almost all cases this is the result of a shard key that does not
effectively allow :ref:`write scaling <sharding-shard-key-write-scaling>`.
It's also possible that you have "hot chunks." In this case, you may
be able to solve the problem by splitting and then migrating parts of
these chunks.
You may have to consider :ref:`resharding your collection
<sharding-resharding>` with a :ref:`different shard key
<sharding-internals-choose-shard-key>` to correct this pattern.
What can prevent a sharded cluster from balancing?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you have just deployed your sharded cluster, you may want to
consider the :ref:`troubleshooting suggestions for a new cluster where
data remains on a single shard <sharding-troubleshooting-not-splitting>`.
If the cluster was initially balanced, but later developed an uneven
distribution of data, consider the following possible causes:

- You have deleted or removed a significant amount of data from the
  cluster. If you have added additional data, it may have a
  different distribution with regard to its shard key.

- Your :term:`shard key` has low :ref:`cardinality <sharding-shard-key-cardinality>`
  and MongoDB cannot split the chunks any further.

- Your data set is growing faster than the balancer can distribute
  data around the cluster. This is uncommon and is typically the
  result of:

  - a :ref:`balancing window <sharding-schedule-balancing-window>` that
    is too short, given the rate of data growth.

  - an uneven distribution of :ref:`write operations
    <sharding-shard-key-write-scaling>` that requires more data
    migration. You may have to choose a different shard key to resolve
    this issue.

  - poor network connectivity between shards, which may lead to chunk
    migrations that take too long to complete. Investigate your
    network configuration and interconnections between shards.

Why do chunk migrations affect sharded cluster performance?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If migrations impact your cluster's or application's performance,
consider the following options, depending on the nature of the impact:

#. If migrations only interrupt your cluster sporadically, you can
   limit the :ref:`balancing window
   <sharding-schedule-balancing-window>` to prevent balancing activity
   during peak hours. Ensure that there is enough time remaining to
   keep the data from becoming out of balance again.

#. If the balancer is always migrating chunks to the detriment of
   overall cluster performance:

   - You may want to attempt :ref:`decreasing the chunk size <tutorial-modifying-chunk-size>`
     to limit the size of the migration.

   - Your cluster may be over capacity, and you may want to attempt to
     :ref:`add one or two shards <sharding-procedure-add-shard>` to
     the cluster to distribute load.

It's also possible that your shard key causes your
application to direct all writes to a single shard. This kind of
activity pattern can require the balancer to migrate most data soon after writing
it. You may have to consider :ref:`resharding your collection
<sharding-resharding>` with a :ref:`different shard key
<sharding-internals-choose-shard-key>` that provides better :ref:`write
scaling <sharding-shard-key-write-scaling>`.