.. _journaling-internals:

==========
Journaling
==========

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

To provide durability in the event of a failure, MongoDB uses *write
ahead logging* to on-disk :term:`journal` files.

.. _journaling-wiredTiger:

Journaling and the WiredTiger Storage Engine
--------------------------------------------

.. important::

   The *log* mentioned in this section refers to the WiredTiger
   write-ahead log (i.e. the journal) and not the MongoDB log file.

:doc:`WiredTiger </core/wiredtiger>` uses :ref:`checkpoints
<storage-wiredtiger-checkpoints>` to provide a consistent view of data
on disk and allow MongoDB to recover from the last checkpoint. However,
if MongoDB exits unexpectedly in between checkpoints, journaling is
required to recover writes that occurred after the last checkpoint.

With journaling, the recovery process:

#. Looks in the data files to find the identifier of the last
   checkpoint.

#. Searches in the journal files for the record that matches the
   identifier of the last checkpoint.

#. Applies the operations in the journal files since the last
   checkpoint.

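
The recovery steps above can be sketched as follows. This is a
hypothetical simplification for illustration only, not WiredTiger's
actual implementation; the record/checkpoint identifiers and the
``recover`` helper are invented for the example.

```python
def recover(checkpoint_id, journal, data):
    """Apply journal records newer than the last checkpoint to `data`.

    `journal` is an ordered list of (record_id, key, value) tuples;
    `checkpoint_id` is the identifier read from the data files.
    """
    # Step 1-2: find the journal record matching the checkpoint identifier.
    start = next(i for i, (rid, _, _) in enumerate(journal)
                 if rid == checkpoint_id)
    # Step 3: apply every operation recorded after the checkpoint.
    for _, key, value in journal[start + 1:]:
        data[key] = value
    return data

# The data files reflect the last checkpoint (record 2); records 3 and 4
# happened afterwards and survive only in the journal.
journal = [(1, "a", 1), (2, "b", 2), (3, "a", 10), (4, "c", 3)]
print(recover(2, journal, {"a": 1, "b": 2}))  # {'a': 10, 'b': 2, 'c': 3}
```
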
.. _journal-process:

Journaling Process
~~~~~~~~~~~~~~~~~~

.. versionchanged:: 3.2

With journaling, WiredTiger creates one journal record for each
client-initiated write operation. The journal record includes any
internal write operations caused by the initial write. For example, an
update to a document in a collection may result in modifications to the
indexes; WiredTiger creates a single journal record that includes both
the update operation and its associated index modifications.

MongoDB configures WiredTiger to use in-memory buffering for storing
the journal records. Threads coordinate to allocate and copy into
their portion of the buffer. All journal records up to 128 kB are
buffered.

WiredTiger syncs the buffered journal records to disk according to the
following intervals or conditions:

.. include:: /includes/extracts/wt-journal-frequency.rst

.. important::

   In between write operations, while the journal records remain in
   the WiredTiger buffers, updates can be lost following a hard
   shutdown of :binary:`~bin.mongod`.

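
The buffering behavior can be sketched as follows. This is a minimal
illustrative model, not WiredTiger's actual buffer implementation; the
``JournalBuffer`` class and its 128 kB default are taken from the
description above, and the "disk" is simulated with a list.

```python
class JournalBuffer:
    def __init__(self, capacity=128 * 1024):  # 128 kB, as described above
        self.capacity = capacity
        self.buffer = []   # records not yet durable: lost on hard shutdown
        self.synced = []   # records flushed to the simulated "disk"

    def write(self, record: bytes):
        self.buffer.append(record)
        # Sync once buffered records reach the capacity threshold.
        if sum(len(r) for r in self.buffer) >= self.capacity:
            self.sync()

    def sync(self):
        """Flush all buffered records to 'disk' (here, the synced list)."""
        self.synced.extend(self.buffer)
        self.buffer.clear()

log = JournalBuffer(capacity=10)
log.write(b"abc")        # buffered only: would be lost on a hard shutdown
log.write(b"defghij")    # total reaches 10 bytes, triggering a sync
print(len(log.synced), len(log.buffer))  # 2 0
```
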
.. seealso::

   The :dbcommand:`serverStatus` command returns information on the
   WiredTiger journal statistics in the :data:`wiredTiger.log
   <serverStatus.wiredTiger.log>` field.

Journal Files
~~~~~~~~~~~~~

For the journal files, MongoDB creates a subdirectory named ``journal``
under the :setting:`~storage.dbPath` directory. WiredTiger journal
files have names with the following format: ``WiredTigerLog.<sequence>``,
where ``<sequence>`` is a zero-padded number starting from
``0000000001``.

Journal files contain a record for each write operation. Each record
has a unique identifier.

MongoDB configures WiredTiger to use snappy compression for the
journaling data.

.. include:: /includes/fact-wiredtiger-log-compression-limit.rst

WiredTiger journal files for MongoDB have a maximum size limit of
approximately 100 MB. Once a file exceeds that limit, WiredTiger
creates a new journal file.

WiredTiger automatically removes old journal files to maintain only the
files needed to recover from the last checkpoint.

WiredTiger preallocates journal files.

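
The naming and rotation rules above can be sketched as follows. This is
an illustrative model only; the helper functions are invented, the
100 MB rollover threshold comes from the text, and the ten-digit
padding width matches the ``0000000001`` example above.

```python
MAX_JOURNAL_BYTES = 100 * 1024 * 1024  # ~100 MB per journal file

def journal_filename(sequence: int) -> str:
    # Zero-padded sequence number, starting from 0000000001.
    return f"WiredTigerLog.{sequence:010d}"

def next_file(sequence: int, current_size: int, record_size: int):
    """Roll over to a new journal file once the size limit is exceeded."""
    if current_size + record_size > MAX_JOURNAL_BYTES:
        sequence += 1
        current_size = 0
    return sequence, current_size + record_size

print(journal_filename(1))    # WiredTigerLog.0000000001
seq, size = next_file(1, MAX_JOURNAL_BYTES, 4096)
print(journal_filename(seq))  # WiredTigerLog.0000000002
```
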
.. _journaling-mmapv1:

Journaling and the MMAPv1 Storage Engine
----------------------------------------

With :doc:`MMAPv1 </core/mmapv1>`, when a write operation occurs,
MongoDB updates the in-memory view. With journaling enabled, MongoDB
writes the in-memory changes first to on-disk journal files. If MongoDB
terminates or encounters an error before committing the changes to the
data files, MongoDB can use the journal files to apply the write
operations to the data files and maintain a consistent state.

.. _journaling-storage-views:
.. _journaling-record-write-operation:

Journaling Process
~~~~~~~~~~~~~~~~~~

With journaling, MongoDB's storage layer has two internal views of the
data set: the *private view*, used to write to the journal files, and
the *shared view*, used to write to the data files:

#. MongoDB first applies write operations to the private view.

#. MongoDB then applies the changes in the private view to the on-disk
   :ref:`journal files <journaling-journal-files>` in the ``journal``
   directory roughly every 100 milliseconds. MongoDB records the write
   operations to the on-disk journal files in batches called *group
   commits*. Grouping the commits helps minimize the performance impact
   of journaling since these commits must block all writers during the
   commit. Writes to the journal are atomic, ensuring the consistency
   of the on-disk journal files. For information on the frequency of
   the commit interval, see :setting:`storage.journal.commitIntervalMs`.

#. Upon a journal commit, MongoDB applies the changes from the journal
   to the shared view.

#. Finally, MongoDB applies the changes in the shared view to the data
   files. More precisely, at default intervals of 60 seconds, MongoDB
   asks the operating system to flush the shared view to the data
   files. The operating system may choose to flush the shared view to
   disk more frequently than every 60 seconds, particularly if the
   system is low on free memory. To change the interval for writing to
   the data files, use the :setting:`storage.syncPeriodSecs` setting.

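
The four steps above can be sketched as follows. This is a hypothetical
illustration, not MMAPv1's memory-mapped implementation; the
``MMAPv1Views`` class and its dictionaries stand in for the private
view, journal, shared view, and data files described in the text.

```python
class MMAPv1Views:
    def __init__(self):
        self.private_view = {}  # step 1: receives writes first
        self.journal = []       # step 2: on-disk write-ahead log
        self.shared_view = {}   # step 3: applied on journal commit
        self.data_files = {}    # step 4: flushed periodically

    def write(self, key, value):
        # Step 1: apply the write operation to the private view.
        self.private_view[key] = value

    def group_commit(self):
        """Steps 2-3: journal pending changes, then update the shared view."""
        for key, value in self.private_view.items():
            self.journal.append((key, value))
            self.shared_view[key] = value
        self.private_view.clear()

    def flush(self):
        """Step 4: flush the shared view to the data files."""
        self.data_files.update(self.shared_view)

db = MMAPv1Views()
db.write("x", 1)
db.group_commit()     # x is now durable via the journal
db.flush()            # x reaches the data files
print(db.data_files)  # {'x': 1}
```
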
If the :binary:`~bin.mongod` instance were to crash without having
applied the writes to the data files, the journal could replay the
writes to the shared view for eventual write to the data files.

When MongoDB flushes write operations to the data files, MongoDB notes
which journal writes have been flushed. Once a journal file contains
only flushed writes, it is no longer needed for recovery and MongoDB
can recycle it for a new journal file.

Once the journal operations have been applied to the shared view and
flushed to disk (i.e. pages in the shared view and private view are in
sync), MongoDB asks the operating system to remap the shared view to
the private view in order to save physical RAM. Upon a new remapping,
the operating system knows that physical memory pages can be shared
between the shared view and the private view mappings.

.. note::

   The interaction between the shared view and the on-disk data files
   is similar to how MongoDB works *without* journaling. Without
   journaling, MongoDB asks the operating system to flush in-memory
   changes to the data files every 60 seconds.

.. _journaling-journal-files:

Journal Files
~~~~~~~~~~~~~

With journaling enabled, MongoDB creates a subdirectory named
``journal`` under the :setting:`~storage.dbPath` directory. The
``journal`` directory contains journal files named ``j._<sequence>``,
where ``<sequence>`` is an integer starting from ``0``, and a "last
sequence number" file, ``lsn``.

Journal files contain the write-ahead logs; each journal entry
describes the bytes the write operation changed in the data files.

Journal files are append-only files. When a journal file holds 1
gigabyte of data, MongoDB creates a new journal file. If you use the
:setting:`storage.smallFiles` option when starting
:binary:`~bin.mongod`, you limit the size of each journal file to 128
megabytes.

The ``lsn`` file contains the last time MongoDB flushed the changes to
the data files.

Once MongoDB applies all the write operations in a particular journal
file to the data files, MongoDB can recycle it for a new journal file.

Unless you write *many* bytes of data per second, the ``journal``
directory should contain only two or three journal files.

A clean shutdown removes all the files in the journal directory. A
dirty shutdown (crash) leaves files in the journal directory; these
files are used to automatically recover the database to a consistent
state when the :binary:`~bin.mongod` process is restarted.

Journal Directory
`````````````````

To speed the frequent sequential writes that occur to the current
journal file, you can ensure that the journal directory is on a
different filesystem from the database data files.

.. important::

   If you place the journal on a different filesystem from your data
   files, you *cannot* use a filesystem snapshot alone to capture valid
   backups of a :setting:`~storage.dbPath` directory. In this case, use
   :method:`~db.fsyncLock()` to ensure that database files are
   consistent before the snapshot and :method:`~db.fsyncUnlock()` once
   the snapshot is complete.

Preallocation Lag
`````````````````

MongoDB may preallocate journal files if the :binary:`~bin.mongod`
process determines that it is more efficient to preallocate journal
files than to create new journal files as needed.

Depending on your filesystem, you might experience a preallocation lag
the first time you start a :binary:`~bin.mongod` instance with
journaling enabled. The preallocation can take several minutes; during
this time, you cannot connect to the database. This is a one-time
preallocation and does not occur with future invocations.

To avoid preallocation lag, see :ref:`journaling-avoid-preallocation-lag`.

.. _journaling-inMemory:

Journaling and the In-Memory Storage Engine
-------------------------------------------

Starting in MongoDB Enterprise version 3.2.6, the :doc:`In-Memory
Storage Engine </core/inmemory>` is part of general availability (GA).
Because its data is kept in memory, there is no separate journal. Write
operations with a write concern of :writeconcern:`j: true <j>` are
immediately acknowledged.

.. seealso:: :ref:`In-Memory Storage Engine: Durability <inmemory-durability>`

.. class:: hidden

   .. toctree::
      :titlesonly:

      /tutorial/manage-journaling