.. _sharding-data-partitioning:
=============================
Data Partitioning with Chunks
=============================
.. default-domain:: mongodb
.. contents:: On this page
:local:
:backlinks: none
:depth: 1
:class: singlecol
MongoDB uses the :term:`shard key` associated with the collection to partition
the data into :term:`chunks<chunk>`. A :term:`chunk` consists of a subset of
sharded data. Each chunk has an inclusive lower bound and an exclusive upper
bound based on the :term:`shard key`.
.. include:: /images/sharding-range-based.rst
The :binary:`~bin.mongos` routes writes to the appropriate chunk based on the
:term:`shard key` value. MongoDB splits chunks when they grow beyond the
configured :ref:`chunk size<sharding-chunk-size>`. Both inserts and updates
can trigger a chunk split.
The smallest range a chunk can represent is a single unique shard key
value. A chunk that only contains documents with a single shard key value
cannot be :ref:`split<sharding-chunk-split>`.
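The range lookup described above can be sketched in a few lines of Python.
This is only an illustration of the inclusive-lower, exclusive-upper rule, not
how :binary:`~bin.mongos` is implemented. The chunk table, the shard names, and
the ``find_chunk`` helper are hypothetical, and infinities stand in for the
minimum and maximum shard key bounds.

```python
from bisect import bisect_right

# Hypothetical chunk table: each chunk covers [lower, upper) on the shard key.
# float("-inf") and float("inf") stand in for the lowest and highest bounds.
CHUNKS = [
    (float("-inf"), 25, "shardA"),
    (25, 50, "shardB"),
    (50, 75, "shardA"),
    (75, float("inf"), "shardC"),
]

# Lower bounds in ascending order, used for binary search.
LOWER_BOUNDS = [lower for lower, _, _ in CHUNKS]


def find_chunk(shard_key_value):
    """Return the (lower, upper, shard) chunk whose range contains the value."""
    # bisect_right returns the index of the first lower bound that is strictly
    # greater than the value, so the chunk just before it satisfies
    # lower <= value < upper.
    index = bisect_right(LOWER_BOUNDS, shard_key_value) - 1
    return CHUNKS[index]
```

In this sketch, a shard key value of ``25`` belongs to the chunk that starts at
``25``, not to the chunk that ends there, because the upper bound is exclusive.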
.. index:: sharding; chunk size
.. _sharding-chunk-size:
Chunk Size
----------
The default :term:`chunk` size in MongoDB is 64 megabytes. You can
:doc:`increase or reduce the chunk size
</tutorial/modify-chunk-size-in-sharded-cluster>`. Consider the
implications of changing the default chunk size:
#. Small chunks lead to a more even distribution of data at the
expense of more frequent migrations. This creates expense at the
query routing (:binary:`~bin.mongos`) layer.
#. Large chunks lead to fewer migrations. This is more efficient both
from the networking perspective *and* in terms of internal overhead at
the query routing layer. However, these efficiencies come at
the expense of a potentially uneven distribution of data.
#. Chunk size affects the
:limit:`Maximum Number of Documents Per Chunk to Migrate`.
#. Chunk size affects the maximum collection size when sharding an
:limit:`existing collection<Sharding Existing Collection Data Size>`.
Post-sharding, chunk size does not constrain collection size.
For many deployments, it makes sense to avoid frequent and potentially
spurious migrations at the expense of a slightly less evenly
distributed data set.
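To get a feel for the trade-off, the following sketch estimates how many chunks
a collection would occupy at different chunk sizes. It assumes every chunk is
completely full, which real chunks usually are not, so treat the result as a
lower bound. The ``estimated_chunk_count`` helper is hypothetical.

```python
import math


def estimated_chunk_count(data_size_mb, chunk_size_mb=64):
    """Lower-bound estimate of the chunk count, assuming every chunk is full."""
    if chunk_size_mb <= 0:
        raise ValueError("chunk_size_mb must be positive")
    return math.ceil(data_size_mb / chunk_size_mb)
```

Under this assumption, 10 GB (10240 MB) of data maps to at least 160 chunks at
the default 64 MB size, 320 chunks at 32 MB, and 80 chunks at 128 MB. More
chunks give the balancer finer-grained units to move, at the cost of more
migrations.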
Limitations
~~~~~~~~~~~
Changing the chunk size affects when chunks split, but its effects have
some limitations.
- Automatic splitting only occurs during inserts or updates. If you
lower the chunk size, it may take time for all chunks to split to the
new size.
- Splits cannot be "undone". If you increase the chunk size, existing
chunks must grow through inserts or updates until they reach the new
size.
.. _sharding-chunk-splits:
.. _sharding-chunk-split:
Chunk Splits
------------
Splitting is a process that keeps chunks from growing too large. When a chunk
grows beyond a :ref:`specified chunk size <sharding-chunk-size>`, or if the
number of documents in the chunk exceeds :limit:`Maximum Number of Documents
Per Chunk to Migrate`, MongoDB splits the chunk based on the shard key values
the chunk represents. A chunk may be split into multiple chunks where necessary.
Inserts and updates may trigger splits. Splits are an efficient metadata
change. To create splits, MongoDB does *not* migrate any data or affect the
shards.
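The following sketch shows why a split is only a change to range boundaries.
It is a simplified model, not MongoDB's split algorithm: choosing the median
distinct shard key value as the split point is an assumption made for
illustration, and ``split_chunk`` is a hypothetical helper.

```python
def split_chunk(lower, upper, shard_keys):
    """Split the range [lower, upper) at the median distinct shard key value.

    Returns a list of one or two (lower, upper) ranges. Only the range
    boundaries change; no documents are moved.
    """
    distinct = sorted(set(shard_keys))
    if len(distinct) < 2:
        # A chunk holding a single shard key value cannot be divided further.
        return [(lower, upper)]
    # The index is at least 1, so the split point is greater than the smallest
    # key and both resulting ranges contain at least one key.
    split_point = distinct[len(distinct) // 2]
    return [(lower, split_point), (split_point, upper)]
```

A chunk whose documents all share one shard key value is returned unchanged,
which mirrors the rule that such a chunk cannot be split.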
.. include:: /images/sharding-splitting.rst
Splits may lead to an uneven distribution of the chunks for a
collection across the shards. In such cases, the balancer redistributes
chunks across shards. See :ref:`sharding-internals-balancing` for more
details on balancing chunks across shards.
.. _sharding-chunk-migration:
Chunk Migration
---------------
MongoDB migrates chunks in a :term:`sharded cluster` to distribute the
chunks of a sharded collection evenly among shards. Migrations may be
either:
- Manual. Only use manual migration in limited cases, such as
to distribute data during bulk inserts. See :doc:`Migrating Chunks
Manually </tutorial/migrate-chunks-in-sharded-cluster>` for more details.
- Automatic. The :ref:`balancer <sharding-balancing>` process
automatically migrates chunks when there is an uneven distribution of
a sharded collection's chunks across the shards. See :ref:`Migration
Thresholds <sharding-migration-thresholds>` for more details.
For more information on the sharded cluster :term:`balancer`, see
:ref:`sharding-balancing`.
.. TODO:: Move this to balancing section instead
Balancing
~~~~~~~~~
The :ref:`balancer <sharding-balancing-internals>` is a background
process that manages chunk migrations. If the difference in the
number of chunks between the shard with the most chunks and the shard
with the fewest exceeds the
:ref:`migration thresholds<sharding-migration-thresholds>`, the balancer
begins migrating chunks across the cluster to ensure an even
distribution of data.
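The threshold check can be sketched as follows. This is a simplified model
under stated assumptions, not the balancer's implementation: the threshold is
passed in as a parameter rather than taken from the cluster, the strict
"greater than" comparison follows the wording above, and ``plan_migration``
and the shard names are hypothetical.

```python
def plan_migration(chunks_per_shard, migration_threshold):
    """Return (source, destination) for one chunk move, or None if balanced.

    chunks_per_shard maps a shard name to its chunk count for one collection.
    """
    if len(chunks_per_shard) < 2:
        return None
    # The source is the shard with the most chunks; the destination is the
    # shard with the fewest.
    source = max(chunks_per_shard, key=chunks_per_shard.get)
    destination = min(chunks_per_shard, key=chunks_per_shard.get)
    difference = chunks_per_shard[source] - chunks_per_shard[destination]
    if difference > migration_threshold:
        return (source, destination)
    return None
```

For example, with chunk counts of 12, 4, and 5 and a threshold of 2, the sketch
proposes moving a chunk from the shard with 12 chunks to the shard with 4.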
.. include:: /images/sharding-migrating.rst
You can :doc:`manage</tutorial/manage-sharded-cluster-balancer>` certain
aspects of the balancer. The balancer also respects any :term:`zones <zone>`
created as a part of configuring :ref:`zones <zone-sharding>` in a sharded
cluster.
See :ref:`sharding-balancing` for more information on the
:term:`balancer`.
.. _jumbo-chunks:
.. _jumbo-chunk:
Indivisible Chunks
------------------
In some cases, chunks can grow beyond the :ref:`specified chunk size
<sharding-chunk-size>` but cannot undergo a :ref:`split<sharding-chunk-split>`.
The most common scenario is when a chunk represents a single shard key value.
Since the chunk cannot split, it continues to grow beyond the chunk size,
becoming a :ref:`jumbo chunk<jumbo-chunk>`. These jumbo chunks can become a
performance bottleneck as they continue to grow, especially if the shard key
value occurs with high :ref:`frequency<shard-key-frequency>`.
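One way to spot shard key values that risk producing indivisible chunks is to
count how often each value occurs in a sample of documents. The sketch below
does this for an in-memory list of values. The ``most_frequent_shard_keys``
helper is hypothetical, and how you obtain the sample is left open.

```python
from collections import Counter


def most_frequent_shard_keys(shard_keys, top=3):
    """Return the `top` most common shard key values with their counts."""
    return Counter(shard_keys).most_common(top)
```

A value that dominates the sample is a candidate for a chunk that keeps growing
without being able to split.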
The addition of new data or new shards can result in data distribution
imbalances within the cluster. A particular shard may acquire
more chunks than another shard, or the size of a chunk may grow beyond the
configured maximum chunk size.
MongoDB ensures a balanced cluster using two processes:
chunk splitting and the balancer.
.. _moveChunk-directory:
``moveChunk`` directory
-----------------------
In MongoDB 2.6 and MongoDB 3.0, :setting:`sharding.archiveMovedChunks` is
enabled by default. In all other MongoDB versions, it is disabled by default.
With :setting:`sharding.archiveMovedChunks` enabled, the source shard archives
the documents in the migrated chunks in a directory named after the collection
namespace under the ``moveChunk`` directory in the :setting:`storage.dbPath`.
If an error occurs during a :doc:`migration
</core/sharding-balancer-administration>`, these files may be helpful
in recovering documents affected during the migration.
Once the migration has completed successfully and there is no need to
recover documents from these files, you may safely delete these files.
Or, if you have an existing backup of the database that you can use
for recovery, you may also delete these files after migration.
To determine if all migrations are complete, run
:method:`sh.isBalancerRunning()` while connected to a :binary:`~bin.mongos`
instance.
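Before deleting archived files, it can help to list what a source shard has
stored for a given namespace. The sketch below builds the path from the layout
described above (``<dbPath>/moveChunk/<namespace>``). The
``archived_chunk_files`` helper is hypothetical, and the example path and
namespace in the usage note are placeholders.

```python
from pathlib import Path


def archived_chunk_files(db_path, namespace):
    """List files archived for `namespace` under <db_path>/moveChunk."""
    archive_dir = Path(db_path) / "moveChunk" / namespace
    if not archive_dir.is_dir():
        # Nothing was archived for this namespace on this shard.
        return []
    # Return regular files only, sorted for stable output.
    return sorted(p for p in archive_dir.iterdir() if p.is_file())
```

For example, ``archived_chunk_files("/data/db", "test.users")`` returns the
files under ``/data/db/moveChunk/test.users``, or an empty list if that
directory does not exist.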
.. toctree::
:titlesonly:
:hidden:
/tutorial/create-chunks-in-sharded-cluster
/tutorial/split-chunks-in-sharded-cluster
/tutorial/merge-chunks-in-sharded-cluster
/tutorial/modify-chunk-size-in-sharded-cluster