.. _write-operations-bulk-insert:

=======================
Bulk Inserts in MongoDB
=======================

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

In some situations you may need to insert or ingest a large amount of
data into a MongoDB database. These *bulk inserts* involve
considerations that differ from those of other write operations.

.. TODO: section on general write operation considerations?

Use the ``insert()`` Method
---------------------------

The :method:`insert() <db.collection.insert()>` method, when passed an
array of documents, performs a bulk insert, and inserts each document
atomically. Bulk inserts can significantly increase performance by
amortizing :ref:`write concern <write-operations-write-concern>` costs.

.. versionadded:: 2.2
   :method:`insert() <db.collection.insert()>` in the
   :binary:`~bin.mongo` shell gained support for bulk inserts.
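
For example, the following operation in the :binary:`~bin.mongo` shell
passes an array of three documents to
:method:`insert() <db.collection.insert()>`, which inserts them in a
single bulk operation. The ``users`` collection and its fields are
illustrative placeholders:

.. code-block:: javascript

   db.users.insert(
      [
        { name: "Alice", status: "active" },
        { name: "Bob", status: "pending" },
        { name: "Carol", status: "active" }
      ]
   )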

In the :doc:`drivers </applications/drivers>`, you can configure write
concern for batches rather than on a per-document level.

Drivers have a ``ContinueOnError`` option in their insert operation,
which causes the bulk operation to continue inserting the remaining
documents in a batch even if an individual insert fails.
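
For example, in the :binary:`~bin.mongo` shell the comparable behavior
is an *unordered* insert: setting the ``ordered`` option to ``false``
(available in version 2.6 and later) directs MongoDB to continue
inserting documents after an error. The documents below are
illustrative:

.. code-block:: javascript

   db.users.insert(
      [
        { _id: 1, name: "Alice" },
        { _id: 1, name: "Duplicate" },  // fails: duplicate _id
        { _id: 2, name: "Bob" }         // still inserted
      ],
      { ordered: false }
   )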

.. note::

   If multiple errors occur during a bulk insert, clients only receive
   the last error generated.

.. seealso::

   :doc:`Driver documentation </applications/drivers>` for details
   on performing bulk inserts in your application. Also see
   :doc:`/core/import-export`.

Bulk Inserts on Sharded Clusters
--------------------------------

While ``ContinueOnError`` is optional on unsharded clusters, all bulk
operations to a :term:`sharded collection <sharded cluster>` run with
``ContinueOnError``, which cannot be disabled.

Large bulk insert operations, including initial data inserts or
routine data imports, can affect :term:`sharded cluster` performance.
For bulk inserts, consider the following strategies:

Pre-Split the Collection
~~~~~~~~~~~~~~~~~~~~~~~~

If the sharded collection is empty, then the collection has only
one initial :term:`chunk`, which resides on a single shard.
MongoDB must then take time to receive data, create splits, and
distribute the split chunks to the available shards. To avoid this
performance cost, you can pre-split the collection, as described in
:doc:`/tutorial/split-chunks-in-sharded-cluster`.
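
For example, the following :binary:`~bin.mongo` shell loop pre-splits
an empty sharded collection at even intervals. The ``records.users``
namespace, the ``userid`` shard key, and the split points are
illustrative; choose split points that reflect the expected
distribution of your data:

.. code-block:: javascript

   // Create split points at userid values of 1000, 2000, ... 9000,
   // producing ten chunks that the balancer can distribute across
   // the available shards.
   for ( var i = 1; i < 10; i++ ) {
      sh.splitAt( "records.users", { userid: i * 1000 } )
   }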

Insert to Multiple ``mongos``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To parallelize import processes, send insert operations to more than
one :binary:`~bin.mongos` instance. Pre-split empty collections first as
described in :doc:`/tutorial/split-chunks-in-sharded-cluster`.
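
For example, with a pre-split collection and the hypothetical
hostnames ``mongos0.example.net`` and ``mongos1.example.net``, you
might run one import process against each :binary:`~bin.mongos`
instance in parallel:

.. code-block:: sh

   # Each command imports a different portion of the data through a
   # different mongos; run them concurrently from separate terminals.
   mongoimport --host mongos0.example.net --db records --collection users --file users.part1.json
   mongoimport --host mongos1.example.net --db records --collection users --file users.part2.json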

Avoid Monotonic Throttling
~~~~~~~~~~~~~~~~~~~~~~~~~~

If your shard key increases monotonically during an insert, then all
inserted data goes to the last chunk in the collection, which will
always end up on a single shard. Therefore, the insert capacity of the
cluster will never exceed the insert capacity of that single shard.

If your insert volume is larger than what a single shard can process,
and if you cannot avoid a monotonically increasing shard key, then
consider the following modifications to your application:

- Reverse the binary bits of the shard key. This preserves the
  information and avoids correlating insertion order with an
  increasing sequence of values.

- Swap the first and last 16-bit words to "shuffle" the inserts.

.. example:: The following example, in C++, swaps the leading and
   trailing 16-bit words of :term:`BSON` :term:`ObjectIds <ObjectId>`
   generated so that they are no longer monotonically increasing.

   .. code-block:: cpp

      #include <algorithm>

      #include "mongo/client/dbclient.h"

      using namespace mongo;

      OID make_an_id() {
          OID x = OID::gen();
          // Modify the ObjectId's 12-byte buffer in place.
          unsigned char *p = const_cast<unsigned char *>( x.getData() );
          // Swap bytes 0-1 with bytes 10-11: the leading and trailing
          // 16-bit words.
          std::swap( p[0], p[10] );
          std::swap( p[1], p[11] );
          return x;
      }

      void foo() {
          // Create a document with the shuffled ObjectId as its _id.
          BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" );
          // o may now be inserted into a sharded collection.
      }
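
   Because the swap only reorders the ObjectId's twelve bytes, the
   resulting values remain unique. The trailing bytes of an ObjectId
   come from an incrementing counter, so after the swap the leading
   bytes change with every generated value and consecutive inserts no
   longer target the same chunk.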

.. seealso::

   :ref:`sharding-shard-key` for information on choosing a shard key.
   Also see :ref:`Shard Key Internals <sharding-internals-shard-keys>`
   (in particular,
   :ref:`sharding-internals-operations-and-reliability`).