====================
FAQ: MongoDB Storage
====================
.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

This document addresses common questions regarding MongoDB's storage
system.
Storage Engine Fundamentals
---------------------------
What is a storage engine?
~~~~~~~~~~~~~~~~~~~~~~~~~
A storage engine is the part of a database that is responsible for
managing how data is stored, both in memory and on disk. Many databases
support multiple storage engines, where different engines perform better
for specific workloads. For example, one storage engine might offer
better performance for read-heavy workloads, and another might support a
higher throughput for write operations.
.. seealso:: :doc:`/core/storage-engines`
Can you mix storage engines in a replica set?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Yes. You can have replica set members that use different storage
engines.

When designing these multi-storage-engine deployments, consider the
following:

- The oplog on each member may need to be sized differently to account
  for differences in throughput between different storage engines.

- Recovery from backups may become more complex if your backup
  captures data files from MongoDB: you may need to maintain backups
  for each storage engine.
WiredTiger Storage Engine
-------------------------
Can I upgrade an existing deployment to a WiredTiger?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Yes. See:
- :doc:`/tutorial/change-standalone-wiredtiger`
- :doc:`/tutorial/change-replica-set-wiredtiger`
- :doc:`/tutorial/change-sharded-cluster-wiredtiger`
How much compression does WiredTiger provide?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ratio of compressed data to uncompressed data depends on your data
and the compression library used. By default, collection data in
WiredTiger use :term:`Snappy block compression <snappy>`; :term:`zlib`
compression is also available. Index data use :term:`prefix
compression` by default.
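
As a rough illustration of selecting a different block compressor for a
single collection, the sketch below wraps
:method:`db.createCollection()` with a WiredTiger ``configString``. The
helper name ``createZlibCollection`` and the collection name are
hypothetical; verify the option against the reference for your MongoDB
version before relying on it.

.. code-block:: javascript

   // Hypothetical helper: create a collection whose data blocks use zlib
   // instead of the default snappy compressor.
   // `database` is a mongo shell database object such as `db`.
   function createZlibCollection(database, name) {
      return database.createCollection(name, {
         storageEngine: {
            wiredTiger: { configString: "block_compressor=zlib" }
         }
      });
   }

In the :binary:`~bin.mongo` shell, this could be called as
``createZlibCollection(db, "archive")``.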
.. _wt-cache-and-eviction:
To what size should I set the WiredTiger internal cache?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. include:: /includes/extracts/wt-configure-cache.rst
How frequently does WiredTiger write to disk?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. include:: /includes/extracts/wt-snapshot-frequency.rst
For journal data, MongoDB writes to disk according to the following
intervals or conditions:
.. include:: /includes/extracts/wt-journal-frequency.rst
MMAPv1 Storage Engine
---------------------
.. _faq-storage-memory-mapped-files:
What are memory-mapped files?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A memory-mapped file is a file with data that the operating system
places in memory by way of the ``mmap()`` system call. ``mmap()`` thus
*maps* the file to a region of virtual memory. Memory-mapped files are
the critical piece of the MMAPv1 storage engine in MongoDB. By using
memory-mapped files, MongoDB can treat the contents of its data files as if
they were in memory. This provides MongoDB with an extremely fast and
simple method for accessing and manipulating data.
How do memory-mapped files work?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MongoDB uses memory-mapped files for managing and interacting with all
data.
Memory mapping assigns files to a block of virtual memory with a direct
byte-for-byte correlation. MongoDB memory maps data files to memory as
it accesses documents. Unaccessed data is *not* mapped to memory.
Once mapped, the relationship between file and memory allows MongoDB to
interact with the data in the file as if it were memory.
How frequently does MMAPv1 write to disk?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. include:: /includes/fact-mmapv1-write-to-disk.rst
.. _faq-disk-size:
Why are the files in my data directory larger than the data in my database?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The data files in your data directory, which is the :file:`/data/db`
directory in default configurations, might be larger than the data set
inserted into the database. Consider the following possible causes:
Preallocated data files
```````````````````````
MongoDB preallocates its data files to avoid filesystem fragmentation, and
because of this, the size of these files does not necessarily reflect the
size of your data.
The :setting:`storage.mmapv1.smallFiles` option will reduce the
size of these files, which may be useful if you have many small databases on
disk.
The ``oplog``
`````````````
If this :binary:`~bin.mongod` is a member of a replica set, the data
directory includes the :term:`oplog.rs <oplog>` file, which is a
preallocated :term:`capped collection` in the ``local``
database.
The default allocation is approximately 5% of disk space
on 64-bit installations. In most cases, you should not need to resize the oplog.
See :ref:`Oplog Sizing <replica-set-oplog-sizing>` for more information.
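
To make the 5% figure concrete, the following sketch computes the
approximate default oplog allocation for a given volume size. It applies
only the percentage stated above and deliberately ignores any
platform-specific minimum or maximum; the function name
``approxDefaultOplogBytes`` is hypothetical.

.. code-block:: javascript

   // Approximate default oplog allocation: about 5% (1/20) of disk space.
   // Ignores any platform-specific lower or upper bounds.
   function approxDefaultOplogBytes(diskBytes) {
      if (typeof diskBytes !== "number" || !(diskBytes >= 0)) {
         throw new RangeError("diskBytes must be a non-negative number");
      }
      return Math.floor(diskBytes / 20);
   }

   var GiB = 1024 * 1024 * 1024;
   // Estimate for a 200 GiB volume, expressed in GiB.
   var oplogGiB = approxDefaultOplogBytes(200 * GiB) / GiB;
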
The ``journal``
```````````````
The data directory contains the journal files, which store write
operations on disk before MongoDB applies them to databases. See
:doc:`/core/journaling`.
.. _faq-empty-records:
Empty records
`````````````
MongoDB maintains lists of empty records in data files as it deletes
documents and collections. MongoDB can reuse this space,
but will not, by default, return this space to the operating system.
To allow MongoDB to more effectively reuse the space, you can
de-fragment your data. To de-fragment, use the :dbcommand:`compact`
command. The :dbcommand:`compact` command requires up to 2 gigabytes of
extra disk space to run. Do not use :dbcommand:`compact` if you are
critically low on disk space. For more information on its behavior and
other considerations, see :dbcommand:`compact`.
:dbcommand:`compact` only removes fragmentation from MongoDB data files
within a collection and does not return any disk space to the operating
system. To return disk space to the operating system, see
:ref:`faq-reclaim-disk-space`.
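
A minimal sketch of invoking :dbcommand:`compact` from the
:binary:`~bin.mongo` shell follows. The wrapper ``compactCollection`` is
a hypothetical convenience, not part of the shell; it issues the command
with :method:`db.runCommand()` and checks the ``ok`` field of the result.

.. code-block:: javascript

   // Hypothetical wrapper around the compact command.
   // `database` is a mongo shell database object such as `db`.
   function compactCollection(database, collectionName) {
      var result = database.runCommand({ compact: collectionName });
      if (!result || result.ok !== 1) {
         throw new Error("compact failed: " + JSON.stringify(result));
      }
      return result;
   }

For example, ``compactCollection(db, "orders")`` would defragment the
``orders`` collection. Review the :dbcommand:`compact` reference before
running it against a production member.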
.. _faq-reclaim-disk-space:
How do I reclaim disk space?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following provides some options to consider when reclaiming disk
space for a member of a replica set.

.. note::

   You do not need to reclaim disk space for MongoDB to reuse freed
   space. See :ref:`faq-empty-records` for information on reuse of
   freed space.

For a secondary member of a replica set, you can perform a :doc:`resync
of the member </tutorial/resync-replica-set-member>`: stop the member,
delete all data and subdirectories from its data directory, and restart
it.
For details, see :doc:`/tutorial/resync-replica-set-member`.
.. _faq-working-set:
What is the working set?
~~~~~~~~~~~~~~~~~~~~~~~~
The working set represents the total body of data that the application
uses in the course of normal operation. Often this is a subset of the
total data size, but the specific size of the working set depends on
actual moment-to-moment use of the database.
If you run a query that requires MongoDB to scan every document in a
collection, the working set will expand to include every
document. Depending on physical memory size, this may cause documents
in the working set to "page out," or to be removed from physical memory by
the operating system. The next time MongoDB needs to access these
documents, MongoDB may incur a hard page fault.
For best performance, the majority of your *active* set should fit in
RAM.
.. _faq-storage-page-faults:
What are page faults?
~~~~~~~~~~~~~~~~~~~~~
.. include:: /includes/fact-page-fault.rst
If there is free memory, then the operating system can find the page
on disk and load it to memory directly. However, if there is no free
memory, the operating system must:
- find a page in memory that is stale or no longer needed, and write
  the page to disk.

- read the requested page from disk and load it into memory.

This process, on an active system, can take a long time,
particularly in comparison to reading a page that is already in
memory.
See :ref:`administration-monitoring-page-faults` for more information.
What is the difference between soft and hard page faults?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:term:`Page faults <page fault>` occur when MongoDB, with the MMAPv1
storage engine, needs access to data that isn't currently in active
memory. A "hard" page fault refers to situations when MongoDB must
read the data from disk. A "soft" page fault, by contrast,
merely moves memory pages from one list to another, such as from an
operating system file cache.
See :ref:`administration-monitoring-page-faults` for more information.
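
One way to watch for page faults is to sample :dbcommand:`serverStatus`
twice and compare the counters. The sketch below assumes that the
``extra_info.page_faults`` and ``uptime`` fields are present as plain
numbers in the output (availability of ``extra_info.page_faults``
depends on the platform); ``pageFaultsPerSecond`` is a hypothetical
helper name.

.. code-block:: javascript

   // Average page faults per second between two serverStatus samples.
   // Returns null when the counters are unavailable or no time elapsed.
   function pageFaultsPerSecond(earlier, later) {
      var a = earlier.extra_info && earlier.extra_info.page_faults;
      var b = later.extra_info && later.extra_info.page_faults;
      var seconds = later.uptime - earlier.uptime;
      if (typeof a !== "number" || typeof b !== "number" || !(seconds > 0)) {
         return null;
      }
      return (b - a) / seconds;
   }

In the shell, pass two results of ``db.serverStatus()`` taken some
seconds apart.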
.. TODO consider removing manual padding question in v3.4
.. _faq-developers-manual-padding:
Can I manually pad documents to prevent moves during updates?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. versionchanged:: 3.0.0
With the :doc:`MMAPv1 storage engine </core/mmapv1>`, an update can
cause a document to move on disk if the document grows in size. To
*minimize* document movements, MongoDB uses :term:`padding`.
You should not have to pad manually because by default, MongoDB uses
:ref:`power-of-2-allocation` to add :ref:`padding automatically
<record-allocation-strategies>`. The :ref:`power-of-2-allocation`
ensures that MongoDB allocates document space in sizes that are powers
of 2, which helps ensure that MongoDB can efficiently reuse free space
created by document deletion or relocation as well as reduce the
occurrences of reallocations in many cases.
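
The rounding idea behind power-of-2 allocation can be sketched as
follows. This is an illustration only: it rounds a size up to the next
power of 2 and ignores any minimum record size or special handling of
very large documents that the storage engine may apply. The function
name ``nextPowerOf2`` is hypothetical.

.. code-block:: javascript

   // Round a record size in bytes up to the nearest power of 2.
   function nextPowerOf2(bytes) {
      if (!Number.isInteger(bytes) || bytes < 1) {
         throw new RangeError("bytes must be a positive integer");
      }
      var size = 1;
      while (size < bytes) {
         size *= 2;
      }
      return size;
   }

Under this scheme a 200-byte record would occupy a 256-byte slot,
leaving 56 bytes of padding for the document to grow into.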
However, *if you must* pad a document manually, you can add a
temporary field to the document and then :update:`$unset` the field,
as in the following example.
.. warning::

   Do not manually pad documents in a capped collection. Applying
   manual padding to a document in a capped collection can break
   replication. Also, the padding is not preserved if you re-sync the
   MongoDB instance.

.. code-block:: javascript

   // Temporary filler used only to reserve space in the record.
   var myTempPadding = [ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                         "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                         "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                         "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"];

   // Insert the document with the padding field in place.
   db.myCollection.insert( { _id: 5, paddingField: myTempPadding } );

   // Remove the padding field; the record keeps its allocated space.
   db.myCollection.update( { _id: 5 },
                           { $unset: { paddingField: "" } }
                         )

   // Later growth can use the space the padding reserved.
   db.myCollection.update( { _id: 5 },
                           { $set: { realField: "Some text that I might have needed padding for" } }
                         )

.. seealso::

   :ref:`record-allocation-strategies`

Data Storage Diagnostics
------------------------
How can I check the size of a collection?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To view the statistics for a collection, including the data size, use
the :method:`db.collection.stats()` method from the :binary:`~bin.mongo`
shell. The following example issues :method:`db.collection.stats()` for
the ``orders`` collection:
.. code-block:: javascript

   db.orders.stats();

MongoDB also provides the following methods to return specific sizes
for the collection:
- :method:`db.collection.dataSize()` to return data size in bytes for
  the collection.

- :method:`db.collection.storageSize()` to return allocation size in
  bytes, including unused space.

- :method:`db.collection.totalSize()` to return the data size plus the
  index size in bytes.

- :method:`db.collection.totalIndexSize()` to return the index size in
  bytes.
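
The methods above can be combined into a single summary. The sketch
below is one possible arrangement; ``summarizeCollectionSizes`` and the
field names of the returned object are hypothetical, while the four
methods it calls are the ones listed above.

.. code-block:: javascript

   // Collect the size figures for one collection into a plain object.
   // `collection` is a mongo shell collection object such as db.orders.
   function summarizeCollectionSizes(collection) {
      return {
         dataBytes: collection.dataSize(),
         storageBytes: collection.storageSize(),
         indexBytes: collection.totalIndexSize(),
         totalBytes: collection.totalSize()
      };
   }

In the :binary:`~bin.mongo` shell, you could print the summary with
``printjson(summarizeCollectionSizes(db.orders))``.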
The following script prints the statistics for each database:
.. code-block:: javascript

   // Print the statistics document for every database on the server.
   db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
      var mdb = db.getSiblingDB(d.name);
      printjson(mdb.stats());
   })

The following script prints the statistics for each collection in each
database:
.. code-block:: javascript

   // Print the statistics document for every collection in every database.
   db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
      var mdb = db.getSiblingDB(d.name);
      mdb.getCollectionNames().forEach(function (c) {
         var s = mdb[c].stats();
         printjson(s);
      })
   })

How can I check the size of indexes for a collection?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To view the size of the data allocated for an index, use the
:method:`db.collection.stats()` method and check the
:data:`~collStats.indexSizes` field in the returned document.
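
As an illustration, the following sketch lists a collection's indexes
from largest to smallest using the ``indexSizes`` field. The helper name
``indexesBySize`` is hypothetical; it expects the document returned by
:method:`db.collection.stats()`.

.. code-block:: javascript

   // Return [{ name, bytes }, ...] sorted by index size, largest first.
   function indexesBySize(stats) {
      var sizes = stats.indexSizes || {};
      return Object.keys(sizes)
         .map(function (name) {
            return { name: name, bytes: sizes[name] };
         })
         .sort(function (a, b) {
            return b.bytes - a.bytes;
         });
   }

For example, ``printjson(indexesBySize(db.orders.stats()))`` prints the
sorted list for the ``orders`` collection.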
.. _faq-tools-for-measuring-storage-use:
How can I get information on the storage use of a database?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The :method:`db.stats()` method in the :binary:`~bin.mongo` shell returns
the current state of the "active" database. For the description of the
returned fields, see :ref:`dbStats Output <dbstats-output>`.
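
To relate allocated storage to the data actually stored, you can compare
the ``storageSize`` and ``dataSize`` fields of the returned document.
The sketch below is a hypothetical helper (``storageToDataRatio`` is not
a shell method) that returns the ratio, or ``null`` for an empty
database.

.. code-block:: javascript

   // Ratio of allocated storage to stored data for a db.stats() result.
   function storageToDataRatio(dbStats) {
      if (!(dbStats.dataSize > 0)) {
         return null;
      }
      return dbStats.storageSize / dbStats.dataSize;
   }

In the shell, ``storageToDataRatio(db.stats())`` gives a quick
indication of how much preallocated or unused space the current
database carries relative to its data.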