Fetching contributors…
Cannot retrieve contributors at this time
271 lines (196 sloc) 9.96 KB
.. _journaling-internals:
.. default-domain:: mongodb
.. contents:: On this page
:backlinks: none
:depth: 1
:class: singlecol
To provide durability in the event of a failure, MongoDB uses *write
ahead logging* to on-disk :term:`journal` files.
.. _journaling-wiredTiger:
Journaling and the WiredTiger Storage Engine
.. important::
The *log* mentioned in this section refers to the WiredTiger
write-ahead log (i.e. the journal) and not the MongoDB log file.
:doc:`WiredTiger </core/wiredtiger>` uses :ref:`checkpoints
<storage-wiredtiger-checkpoints>` to provide a consistent view of data
on disk and allow MongoDB to recover from the last checkpoint. However,
if MongoDB exits unexpectedly in between checkpoints, journaling is
required to recover information that occurred after the last checkpoint.
.. note::
.. include:: /includes/wiredtiger-node-nojournal.rst
With journaling, the recovery process:
#. Looks in the data files to find the identifier of the last
#. Searches in the journal files for the record that
matches the identifier of the last checkpoint.
#. Apply the operations in the journal files since the last checkpoint.
.. _journal-process:
Journaling Process
.. versionchanged:: 3.2
With journaling, WiredTiger creates one journal record for each client
initiated write operation. The journal record includes any internal
write operations caused by the initial write. For example, an update to
a document in a collection may result in modifications to the indexes;
WiredTiger creates a single journal record that includes both the
update operation and its associated index modifications.
MongoDB configures WiredTiger to use in-memory buffering for storing
the journal records. Threads coordinate to allocate and copy
into their portion of the buffer. All journal records up to 128
kB are buffered.
WiredTiger syncs the buffered journal records to disk according to the
following intervals or conditions:
.. include:: /includes/extracts/wt-journal-frequency.rst
.. important::
In between write operations, while the journal records
remain in the WiredTiger buffers, updates can be lost following a
hard shutdown of :binary:`~bin.mongod`.
.. seealso::
The :dbcommand:`serverStatus` command returns information on the
WiredTiger journal statistics in the :data:`wiredTiger.log
<data:: serverStatus.wiredTiger.log>` field.
Journal Files
For the journal files, MongoDB creates a subdirectory named ``journal``
under the :setting:`~storage.dbPath` directory. WiredTiger journal
files have names with the following format ``WiredTigerLog.<sequence>``
where ``<sequence>`` is a zero-padded number starting from
Journal files contain a record per each write operation. Each record
has a unique identifier.
MongoDB configures WiredTiger to use snappy compression for the
journaling data.
.. include:: /includes/fact-wiredtiger-log-compression-limit.rst
WiredTiger journal files for MongoDB have a maximum size limit of
approximately 100 MB. Once the file exceeds that limit, WiredTiger
creates a new journal file.
WiredTiger automatically removes old journal files to maintain only the
files needed to recover from last checkpoint.
WiredTiger will pre-allocate journal files.
.. _journaling-mmapv1:
Journaling and the MMAPv1 Storage Engine
.. note::
MongoDB 4.0 :ref:`deprecates the MMAPv1 storage engine
With :doc:`MMAPv1 </core/mmapv1>`, when a write operation occurs,
MongoDB updates the in-memory view. With journaling enabled, MongoDB
writes the in-memory changes first to on-disk journal files. If MongoDB
should terminate or encounter an error before committing the changes to
the data files, MongoDB can use the journal files to apply the write
operation to the data files and maintain a consistent state.
.. _journaling-storage-views:
.. _journaling-record-write-operation:
Journaling Process
With journaling, MongoDB's storage layer has two internal views of the
data set: the *private view*, used to write to the journal files, and
the *shared view*, used to write to the data files:
#. MongoDB first applies write operations to the private view.
#. MongoDB then applies the changes in the private view to the on-disk
:ref:`journal files <journaling-journal-files>` in the ``journal``
directory roughly every 100 milliseconds. MongoDB records the
write operations to the on-disk journal files in batches called
*group commits*. Grouping the commits help minimize the performance
impact of journaling since these commits must block all writers
during the commit. Writes to the journal are atomic, ensuring the
consistency of the on-disk journal files. For information on the
frequency of the commit interval, see
#. Upon a journal commit, MongoDB applies the changes from the journal
to the shared view.
#. Finally, MongoDB applies the changes in the shared view to the data
files. More precisely, at default intervals of 60 seconds, MongoDB
asks the operating system to flush the shared view to the data
files. The operating system may choose to flush the shared view to
disk at a higher frequency than 60 seconds, particularly if the
system is low on free memory. To change the interval for writing to
the data files, use the :setting:`storage.syncPeriodSecs` setting.
If the :binary:`~bin.mongod` instance were to crash without having applied
the writes to the data files, the journal could replay the writes to
the shared view for eventual write to the data files.
When MongoDB flushes write operations to the data files, MongoDB notes
which journal writes have been flushed. Once a journal file contains
only flushed writes, it is no longer needed for recovery and MongoDB
can recycle it for a new journal file.
Once the journal operations have been applied to the shared view and
flushed to disk (i.e. pages in the shared view and private view are in
sync), MongoDB asks the operating system to remap the shared view to
the private view in order to save physical RAM. MongoDB routinely asks
the operating system to remap the shared view to the private
view in order to save physical RAM. Upon a new remapping, the
operating system knows that physical memory pages can be shared between
the shared view and the private view mappings.
.. note::
The interaction between the shared view and the on-disk data
files is similar to how MongoDB works *without* journaling. Without
journaling, MongoDB asks the operating system to flush in-memory
changes to the data files every 60 seconds.
.. _journaling-journal-files:
Journal Files
With journaling enabled, MongoDB creates a subdirectory named
``journal`` under the :setting:`~storage.dbPath` directory. The
``journal`` directory contains journal files named ``j._<sequence>``
where ``<sequence>`` is an integer starting from ``0`` and a "last
sequence number" file ``lsn``.
Journal files contain the write ahead logs; each journal entry
describes the bytes the write operation changed in the data files.
Journal files are append-only files. When a journal file holds 1
gigabyte of data, MongoDB creates a new journal file. If you use the
:setting:`storage.smallFiles` option when starting :binary:`~bin.mongod`,
you limit the size of each journal file to 128 megabytes.
The ``lsn`` file contains the last time MongoDB flushed the changes to
the data files.
Once MongoDB applies all the write operations in a particular journal
file to the data files, MongoDB can recycle it for a new journal file.
Unless you write *many* bytes of data per second, the ``journal``
directory should contain only two or three journal files.
A clean shutdown removes all the files in the journal directory. A
dirty shutdown (crash) leaves files in the journal directory; these are
used to automatically recover the database to a consistent state when
the mongod process is restarted.
Journal Directory
To speed the frequent sequential writes that occur to the current
journal file, you can ensure that the journal directory is on a
different filesystem from the database data files.
.. important::
If you place the journal on a different filesystem from your data
files, you *cannot* use a filesystem snapshot alone to capture valid
backups of a :setting:`~storage.dbPath` directory. In this case, use
:method:`~db.fsyncLock()` to ensure that database files are consistent
before the snapshot and :method:`~db.fsyncUnlock()` once the snapshot
is complete.
Preallocation Lag
MongoDB may preallocate journal files if the :binary:`~bin.mongod` process
determines that it is more efficient to preallocate journal files than
create new journal files as needed.
Depending on your filesystem, you might experience a preallocation lag
the first time you start a :binary:`~bin.mongod` instance with journaling
enabled. The amount of time required to pre-allocate files might last
several minutes; during this time, you will not be able to connect to
the database. This is a one-time preallocation and does not occur with
future invocations.
To avoid preallocation lag, see :ref:`journaling-avoid-preallocation-lag`.
.. _journaling-inMemory:
Journaling and the In-Memory Storage Engine
Starting in MongoDB Enterprise version 3.2.6, the :doc:`In-Memory
Storage Engine </core/inmemory>` is part of general availability (GA).
Because its data is kept in memory, there is no separate journal. Write
operations with a write concern of :writeconcern:`j: true <j>` are
immediately acknowledged.
.. include:: /includes/extracts/no-journaling-writeConcernMajorityJournalDefault-false.rst
.. include:: /includes/extracts/no-journaling-rollback.rst
.. seealso:: :ref:`In-Memory Storage Engine: Durability <inmemory-durability>`
.. toctree::