.. _journaling-internals:

==========
Journaling
==========

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

To provide durability in the event of a failure, MongoDB uses *write
ahead logging* to on-disk :term:`journal` files.

.. _journaling-wiredTiger:

Journaling and the WiredTiger Storage Engine
--------------------------------------------

.. important::

   The *log* mentioned in this section refers to the WiredTiger
   write-ahead log (i.e. the journal) and not the MongoDB log file.

:doc:`WiredTiger </core/wiredtiger>` uses :ref:`checkpoints
<storage-wiredtiger-checkpoints>` to provide a consistent view of data
on disk and allow MongoDB to recover from the last checkpoint. However,
if MongoDB exits unexpectedly in between checkpoints, journaling is
required to recover writes that occurred after the last checkpoint.

With journaling, the recovery process:

#. Looks in the data files to find the identifier of the last
   checkpoint.

#. Searches in the journal files for the record that matches the
   identifier of the last checkpoint.

#. Applies the operations in the journal files since the last
   checkpoint.

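
The recovery steps above can be sketched as follows. This is a
hypothetical simplification for illustration only, not WiredTiger's
actual implementation; the record/checkpoint identifiers and the
``recover`` helper are invented for the example.

```python
def recover(checkpoint_id, journal, data):
    """Apply journal records newer than the last checkpoint to `data`.

    `journal` is an ordered list of (record_id, key, value) tuples;
    `checkpoint_id` is the identifier read from the data files.
    """
    # Step 1-2: find the journal record matching the checkpoint identifier.
    start = next(i for i, (rid, _, _) in enumerate(journal)
                 if rid == checkpoint_id)
    # Step 3: apply every operation recorded after the checkpoint.
    for _, key, value in journal[start + 1:]:
        data[key] = value
    return data

# The data files reflect the last checkpoint (record 2); records 3 and 4
# happened afterwards and survive only in the journal.
journal = [(1, "a", 1), (2, "b", 2), (3, "a", 10), (4, "c", 3)]
print(recover(2, journal, {"a": 1, "b": 2}))  # {'a': 10, 'b': 2, 'c': 3}
```
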
.. _journal-process:

Journaling Process
~~~~~~~~~~~~~~~~~~

.. versionchanged:: 3.2

With journaling, WiredTiger creates one journal record for each
client-initiated write operation. The journal record includes any
internal write operations caused by the initial write. For example, an
update to a document in a collection may result in modifications to the
indexes; WiredTiger creates a single journal record that includes both
the update operation and its associated index modifications.

MongoDB configures WiredTiger to use in-memory buffering for storing
the journal records. Threads coordinate to allocate and copy into
their portion of the buffer. All journal records up to 128 kB are
buffered.

WiredTiger syncs the buffered journal records to disk according to the
following intervals or conditions:

.. include:: /includes/extracts/wt-journal-frequency.rst

.. important::

   In between write operations, while the journal records remain in
   the WiredTiger buffers, updates can be lost following a hard
   shutdown of :binary:`~bin.mongod`.

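
The buffering behavior can be sketched as follows. This is a minimal
illustrative model, not WiredTiger's actual buffer implementation; the
``JournalBuffer`` class and its 128 kB default are taken from the
description above, and the "disk" is simulated with a list.

```python
class JournalBuffer:
    def __init__(self, capacity=128 * 1024):  # 128 kB, as described above
        self.capacity = capacity
        self.buffer = []   # records not yet durable: lost on hard shutdown
        self.synced = []   # records flushed to the simulated "disk"

    def write(self, record: bytes):
        self.buffer.append(record)
        # Sync once buffered records reach the capacity threshold.
        if sum(len(r) for r in self.buffer) >= self.capacity:
            self.sync()

    def sync(self):
        """Flush all buffered records to 'disk' (here, the synced list)."""
        self.synced.extend(self.buffer)
        self.buffer.clear()

log = JournalBuffer(capacity=10)
log.write(b"abc")        # buffered only: would be lost on a hard shutdown
log.write(b"defghij")    # total reaches 10 bytes, triggering a sync
print(len(log.synced), len(log.buffer))  # 2 0
```
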
.. seealso::

   The :dbcommand:`serverStatus` command returns information on the
   WiredTiger journal statistics in the :data:`wiredTiger.log
   <serverStatus.wiredTiger.log>` field.

Journal Files
~~~~~~~~~~~~~

For the journal files, MongoDB creates a subdirectory named ``journal``
under the :setting:`~storage.dbPath` directory. WiredTiger journal
files have names with the following format: ``WiredTigerLog.<sequence>``,
where ``<sequence>`` is a zero-padded number starting from
``0000000001``.

Journal files contain a record for each write operation. Each record
has a unique identifier.

MongoDB configures WiredTiger to use snappy compression for the
journaling data.

.. include:: /includes/fact-wiredtiger-log-compression-limit.rst

WiredTiger journal files for MongoDB have a maximum size limit of
approximately 100 MB. Once a file exceeds that limit, WiredTiger
creates a new journal file.

WiredTiger automatically removes old journal files to maintain only the
files needed to recover from the last checkpoint.

WiredTiger preallocates journal files.

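
The naming and rotation rules above can be sketched as follows. This is
an illustrative model only; the helper functions are invented, the
100 MB rollover threshold comes from the text, and the ten-digit
padding width matches the ``0000000001`` example above.

```python
MAX_JOURNAL_BYTES = 100 * 1024 * 1024  # ~100 MB per journal file

def journal_filename(sequence: int) -> str:
    # Zero-padded sequence number, starting from 0000000001.
    return f"WiredTigerLog.{sequence:010d}"

def next_file(sequence: int, current_size: int, record_size: int):
    """Roll over to a new journal file once the size limit is exceeded."""
    if current_size + record_size > MAX_JOURNAL_BYTES:
        sequence += 1
        current_size = 0
    return sequence, current_size + record_size

print(journal_filename(1))    # WiredTigerLog.0000000001
seq, size = next_file(1, MAX_JOURNAL_BYTES, 4096)
print(journal_filename(seq))  # WiredTigerLog.0000000002
```
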
.. _journaling-mmapv1:

Journaling and the MMAPv1 Storage Engine
----------------------------------------

With :doc:`MMAPv1 </core/mmapv1>`, when a write operation occurs,
MongoDB updates the in-memory view. With journaling enabled, MongoDB
writes the in-memory changes first to on-disk journal files. If MongoDB
terminates or encounters an error before committing the changes to the
data files, MongoDB can use the journal files to apply the write
operations to the data files and maintain a consistent state.

.. _journaling-storage-views:
.. _journaling-record-write-operation:

Journaling Process
~~~~~~~~~~~~~~~~~~

With journaling, MongoDB's storage layer has two internal views of the
data set: the *private view*, used to write to the journal files, and
the *shared view*, used to write to the data files:

#. MongoDB first applies write operations to the private view.

#. MongoDB then applies the changes in the private view to the on-disk
   :ref:`journal files <journaling-journal-files>` in the ``journal``
   directory roughly every 100 milliseconds. MongoDB records the write
   operations to the on-disk journal files in batches called *group
   commits*. Grouping the commits helps minimize the performance impact
   of journaling since these commits must block all writers during the
   commit. Writes to the journal are atomic, ensuring the consistency
   of the on-disk journal files. For information on the frequency of
   the commit interval, see :setting:`storage.journal.commitIntervalMs`.

#. Upon a journal commit, MongoDB applies the changes from the journal
   to the shared view.

#. Finally, MongoDB applies the changes in the shared view to the data
   files. More precisely, at default intervals of 60 seconds, MongoDB
   asks the operating system to flush the shared view to the data
   files. The operating system may choose to flush the shared view to
   disk more frequently than every 60 seconds, particularly if the
   system is low on free memory. To change the interval for writing to
   the data files, use the :setting:`storage.syncPeriodSecs` setting.

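
The four steps above can be sketched as follows. This is a hypothetical
illustration, not MMAPv1's memory-mapped implementation; the
``MMAPv1Views`` class and its dictionaries stand in for the private
view, journal, shared view, and data files described in the text.

```python
class MMAPv1Views:
    def __init__(self):
        self.private_view = {}  # step 1: receives writes first
        self.journal = []       # step 2: on-disk write-ahead log
        self.shared_view = {}   # step 3: applied on journal commit
        self.data_files = {}    # step 4: flushed periodically

    def write(self, key, value):
        # Step 1: apply the write operation to the private view.
        self.private_view[key] = value

    def group_commit(self):
        """Steps 2-3: journal pending changes, then update the shared view."""
        for key, value in self.private_view.items():
            self.journal.append((key, value))
            self.shared_view[key] = value
        self.private_view.clear()

    def flush(self):
        """Step 4: flush the shared view to the data files."""
        self.data_files.update(self.shared_view)

db = MMAPv1Views()
db.write("x", 1)
db.group_commit()     # x is now durable via the journal
db.flush()            # x reaches the data files
print(db.data_files)  # {'x': 1}
```
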
If the :binary:`~bin.mongod` instance were to crash without having
applied the writes to the data files, the journal could replay the
writes to the shared view for eventual write to the data files.

When MongoDB flushes write operations to the data files, MongoDB notes
which journal writes have been flushed. Once a journal file contains
only flushed writes, it is no longer needed for recovery and MongoDB
can recycle it for a new journal file.

Once the journal operations have been applied to the shared view and
flushed to disk (i.e. pages in the shared view and private view are in
sync), MongoDB asks the operating system to remap the shared view to
the private view in order to save physical RAM. Upon a new remapping,
the operating system knows that physical memory pages can be shared
between the shared view and the private view mappings.

.. note::

   The interaction between the shared view and the on-disk data files
   is similar to how MongoDB works *without* journaling. Without
   journaling, MongoDB asks the operating system to flush in-memory
   changes to the data files every 60 seconds.

.. _journaling-journal-files:

Journal Files
~~~~~~~~~~~~~

With journaling enabled, MongoDB creates a subdirectory named
``journal`` under the :setting:`~storage.dbPath` directory. The
``journal`` directory contains journal files named ``j._<sequence>``,
where ``<sequence>`` is an integer starting from ``0``, and a "last
sequence number" file, ``lsn``.

Journal files contain the write-ahead logs; each journal entry
describes the bytes the write operation changed in the data files.

Journal files are append-only files. When a journal file holds 1
gigabyte of data, MongoDB creates a new journal file. If you use the
:setting:`storage.smallFiles` option when starting
:binary:`~bin.mongod`, you limit the size of each journal file to 128
megabytes.

The ``lsn`` file contains the last time MongoDB flushed the changes to
the data files.

Once MongoDB applies all the write operations in a particular journal
file to the data files, MongoDB can recycle it for a new journal file.

Unless you write *many* bytes of data per second, the ``journal``
directory should contain only two or three journal files.

A clean shutdown removes all the files in the journal directory. A
dirty shutdown (crash) leaves files in the journal directory; these
files are used to automatically recover the database to a consistent
state when the :binary:`~bin.mongod` process is restarted.

Journal Directory
`````````````````

To speed the frequent sequential writes that occur to the current
journal file, you can ensure that the journal directory is on a
different filesystem from the database data files.

.. important::

   If you place the journal on a different filesystem from your data
   files, you *cannot* use a filesystem snapshot alone to capture valid
   backups of a :setting:`~storage.dbPath` directory. In this case, use
   :method:`~db.fsyncLock()` to ensure that database files are
   consistent before the snapshot and :method:`~db.fsyncUnlock()` once
   the snapshot is complete.

Preallocation Lag
`````````````````

MongoDB may preallocate journal files if the :binary:`~bin.mongod`
process determines that it is more efficient to preallocate journal
files than to create new journal files as needed.

Depending on your filesystem, you might experience a preallocation lag
the first time you start a :binary:`~bin.mongod` instance with
journaling enabled. The preallocation can take several minutes; during
this time, you cannot connect to the database. This is a one-time
preallocation and does not occur with future invocations.

To avoid preallocation lag, see :ref:`journaling-avoid-preallocation-lag`.

.. _journaling-inMemory:

Journaling and the In-Memory Storage Engine
-------------------------------------------

Starting in MongoDB Enterprise version 3.2.6, the :doc:`In-Memory
Storage Engine </core/inmemory>` is part of general availability (GA).
Because its data is kept in memory, there is no separate journal. Write
operations with a write concern of :writeconcern:`j: true <j>` are
immediately acknowledged.

.. seealso:: :ref:`In-Memory Storage Engine: Durability <inmemory-durability>`

.. class:: hidden

   .. toctree::
      :titlesonly:

      /tutorial/manage-journaling