.. _write-operations-bulk-insert:

=======================
Bulk Inserts in MongoDB
=======================

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

In some situations you may need to insert or ingest a large amount of
data into a MongoDB database. These *bulk inserts* involve
considerations that differ from those of other write operations.

.. TODO: section on general write operation considerations?

Use the ``insert()`` Method
---------------------------

The :method:`insert() <db.collection.insert()>` method, when passed an
array of documents, performs a bulk insert, and inserts each document
atomically. Bulk inserts can significantly increase performance by
amortizing :ref:`write concern <write-operations-write-concern>` costs.

.. versionadded:: 2.2
   :method:`insert() <db.collection.insert()>` in the
   :binary:`~bin.mongo` shell gained support for bulk inserts.
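
For example, the following operation in the :binary:`~bin.mongo` shell
passes an array of three documents to
:method:`insert() <db.collection.insert()>`, which inserts them in a
single bulk operation. The ``users`` collection and its fields are
illustrative placeholders:

.. code-block:: javascript

   db.users.insert(
      [
        { name: "Alice", status: "active" },
        { name: "Bob", status: "pending" },
        { name: "Carol", status: "active" }
      ]
   )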

In the :doc:`drivers </applications/drivers>`, you can configure write
concern for batches rather than on a per-document level.

Drivers have a ``ContinueOnError`` option in their insert operation,
which causes the bulk operation to continue inserting the remaining
documents in a batch even if an individual insert fails.
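
For example, in the :binary:`~bin.mongo` shell the comparable behavior
is an *unordered* insert: setting the ``ordered`` option to ``false``
(available in version 2.6 and later) directs MongoDB to continue
inserting documents after an error. The documents below are
illustrative:

.. code-block:: javascript

   db.users.insert(
      [
        { _id: 1, name: "Alice" },
        { _id: 1, name: "Duplicate" },  // fails: duplicate _id
        { _id: 2, name: "Bob" }         // still inserted
      ],
      { ordered: false }
   )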

.. note::

   If multiple errors occur during a bulk insert, clients only receive
   the last error generated.

.. seealso::

   :doc:`Driver documentation </applications/drivers>` for details
   on performing bulk inserts in your application. Also see
   :doc:`/core/import-export`.

Bulk Inserts on Sharded Clusters
--------------------------------

While ``ContinueOnError`` is optional on unsharded clusters, all bulk
operations to a :term:`sharded collection <sharded cluster>` run with
``ContinueOnError``, which cannot be disabled.

Large bulk insert operations, including initial data inserts or
routine data imports, can affect :term:`sharded cluster` performance.
For bulk inserts, consider the following strategies:

Pre-Split the Collection
~~~~~~~~~~~~~~~~~~~~~~~~

If the sharded collection is empty, then the collection has only
one initial :term:`chunk`, which resides on a single shard.
MongoDB must then take time to receive data, create splits, and
distribute the split chunks to the available shards. To avoid this
performance cost, you can pre-split the collection, as described in
:doc:`/tutorial/split-chunks-in-sharded-cluster`.
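
For example, the following :binary:`~bin.mongo` shell loop pre-splits
an empty sharded collection at even intervals. The ``records.users``
namespace, the ``userid`` shard key, and the split points are
illustrative; choose split points that reflect the expected
distribution of your data:

.. code-block:: javascript

   // Create split points at userid values of 1000, 2000, ... 9000,
   // producing ten chunks that the balancer can distribute across
   // the available shards.
   for ( var i = 1; i < 10; i++ ) {
      sh.splitAt( "records.users", { userid: i * 1000 } )
   }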

Insert to Multiple ``mongos``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To parallelize import processes, send insert operations to more than
one :binary:`~bin.mongos` instance. Pre-split empty collections first as
described in :doc:`/tutorial/split-chunks-in-sharded-cluster`.
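
For example, with a pre-split collection and the hypothetical
hostnames ``mongos0.example.net`` and ``mongos1.example.net``, you
might run one import process against each :binary:`~bin.mongos`
instance in parallel:

.. code-block:: sh

   # Each command imports a different portion of the data through a
   # different mongos; run them concurrently from separate terminals.
   mongoimport --host mongos0.example.net --db records --collection users --file users.part1.json
   mongoimport --host mongos1.example.net --db records --collection users --file users.part2.json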

Avoid Monotonic Throttling
~~~~~~~~~~~~~~~~~~~~~~~~~~

If your shard key increases monotonically during an insert, then all
inserted data goes to the last chunk in the collection, which will
always end up on a single shard. Therefore, the insert capacity of the
cluster will never exceed the insert capacity of that single shard.

If your insert volume is larger than what a single shard can process,
and if you cannot avoid a monotonically increasing shard key, then
consider the following modifications to your application:

- Reverse the binary bits of the shard key. This preserves the
  information and avoids correlating insertion order with an
  increasing sequence of values.

- Swap the first and last 16-bit words to "shuffle" the inserts.

.. example:: The following example, in C++, swaps the leading and
   trailing 16-bit words of :term:`BSON` :term:`ObjectIds <ObjectId>`
   generated so that they are no longer monotonically increasing.

   .. code-block:: cpp

      #include <algorithm>

      #include "mongo/client/dbclient.h"

      using namespace mongo;

      OID make_an_id() {
          OID x = OID::gen();
          // Modify the ObjectId's 12-byte buffer in place.
          unsigned char *p = const_cast<unsigned char *>( x.getData() );
          // Swap bytes 0-1 with bytes 10-11: the leading and trailing
          // 16-bit words.
          std::swap( p[0], p[10] );
          std::swap( p[1], p[11] );
          return x;
      }

      void foo() {
          // Create a document with the shuffled ObjectId as its _id.
          BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" );
          // o may now be inserted into a sharded collection.
      }
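
   Because the swap only reorders the ObjectId's twelve bytes, the
   resulting values remain unique. The trailing bytes of an ObjectId
   come from an incrementing counter, so after the swap the leading
   bytes change with every generated value and consecutive inserts no
   longer target the same chunk.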

.. seealso::

   :ref:`sharding-shard-key` for information on choosing a shard key.
   Also see :ref:`Shard Key Internals <sharding-internals-shard-keys>`
   (in particular,
   :ref:`sharding-internals-operations-and-reliability`).