====================
Aggregation Pipeline
====================

.. versionadded:: 2.2

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

The aggregation pipeline is a framework for data aggregation modeled
on the concept of data processing pipelines. Documents enter a
multi-stage pipeline that transforms the documents into aggregated
results.

.. include:: /images/aggregation-pipeline.rst

The aggregation pipeline provides an alternative to :term:`map-reduce`
and may be the preferred solution for aggregation tasks where the
complexity of map-reduce may be unwarranted.

The aggregation pipeline has some limitations on value types and result
size. See :doc:`/core/aggregation-pipeline-limits` for details on
limits and restrictions on the aggregation pipeline.

.. _aggregation-pipeline:

Pipeline
--------

The MongoDB aggregation pipeline consists of :ref:`stages
<aggregation-pipeline-operator-reference>`. Each stage transforms the
documents as they pass through the pipeline. Pipeline stages do not
need to produce one output document for every input document; e.g.,
some stages may generate new documents or filter out documents.
Pipeline stages can appear multiple times in the pipeline.

MongoDB provides the :method:`db.collection.aggregate()` method in the
:binary:`~bin.mongo` shell and the :dbcommand:`aggregate` command for
the aggregation pipeline. See
:ref:`aggregation-pipeline-operator-reference` for the available stages.

For example usage of the aggregation pipeline, see
:doc:`/tutorial/aggregation-with-user-preference-data` and
:doc:`/tutorial/aggregation-zip-code-data-set`.
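
The stage-by-stage flow described above can be sketched in plain
Python (this is an analogy, not MongoDB shell syntax): each stage is a
function over the stream of documents, and stages compose in order.
The ``match`` and ``project`` helpers below are hypothetical names
standing in for the ``$match`` and ``$project`` stages.

```python
# Conceptual sketch of a two-stage aggregation pipeline in plain
# Python. Not MongoDB syntax: each "stage" is a function that
# transforms the list of documents flowing through it.

def match(predicate):
    """Analogue of $match: keep only documents satisfying predicate."""
    def stage(docs):
        return [d for d in docs if predicate(d)]
    return stage

def project(fields):
    """Analogue of $project: keep only the named fields."""
    def stage(docs):
        return [{k: d[k] for k in fields if k in d} for d in docs]
    return stage

def run_pipeline(docs, stages):
    """Feed the documents through each stage in order."""
    for stage in stages:
        docs = stage(docs)
    return docs

orders = [
    {"_id": 1, "status": "A", "amount": 50},
    {"_id": 2, "status": "B", "amount": 25},
    {"_id": 3, "status": "A", "amount": 100},
]

result = run_pipeline(orders, [
    match(lambda d: d["status"] == "A"),  # stage 1: filter documents
    project(["_id", "amount"]),           # stage 2: reshape documents
])
# result == [{"_id": 1, "amount": 50}, {"_id": 3, "amount": 100}]
```

Note that the first stage emits fewer documents than it receives,
illustrating that stages need not produce one output document per
input document.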

.. _aggregation-pipeline-expressions:

Pipeline Expressions
--------------------

Some pipeline stages take a pipeline expression as their operand.
Pipeline expressions specify the transformation to apply to the input
documents. Expressions have a :doc:`document </core/document>`
structure and can contain other :ref:`expressions
<aggregation-expressions>`.

Pipeline expressions can only operate on the current document in the
pipeline and cannot refer to data from other documents: expression
operations provide in-memory transformation of documents.

Generally, expressions are stateless and are only evaluated when seen
by the aggregation process, with one exception: :ref:`accumulator
<aggregation-accumulator-operators>` expressions.

The accumulators, used with the :pipeline:`$group` pipeline operator,
maintain their state (e.g. totals, maximums, minimums, and related
data) as documents progress through the pipeline.

For more information on expressions, see
:ref:`aggregation-expressions`.
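
The stateful behavior of an accumulator can be sketched in plain
Python (again an analogy, not MongoDB syntax): unlike a stateless
expression, which looks only at the current document, the ``$sum``
accumulator under ``$group`` carries a running total across documents.

```python
# Conceptual sketch of a $group stage with a $sum accumulator.
# The running totals are the accumulator state, updated as each
# document streams through; a stateless expression has no such state.
from collections import defaultdict

def group_sum(docs, key_field, sum_field):
    totals = defaultdict(int)              # accumulator state
    for d in docs:                         # updated per document
        totals[d[key_field]] += d[sum_field]
    return [{"_id": k, "total": v} for k, v in totals.items()]

orders = [
    {"status": "A", "amount": 50},
    {"status": "B", "amount": 25},
    {"status": "A", "amount": 100},
]
result = group_sum(orders, "status", "amount")
# result groups documents by status: total 150 for "A", 25 for "B"
```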

.. _aggregation-optimize-performance:

Aggregation Pipeline Behavior
-----------------------------

In MongoDB, the :dbcommand:`aggregate` command operates on a single
collection, logically passing the *entire* collection into the
aggregation pipeline. To optimize the operation, wherever possible, use
the following strategies to avoid scanning the entire collection.

.. _aggregation-pipeline-operators-and-performance:

Pipeline Operators and Indexes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :pipeline:`$match` and :pipeline:`$sort` pipeline operators can
take advantage of an index when they occur at the **beginning** of the
pipeline.

.. versionadded:: 2.4

   The :pipeline:`$geoNear` pipeline operator takes advantage of a
   geospatial index. When using :pipeline:`$geoNear`, the
   :pipeline:`$geoNear` pipeline operation must appear as the first
   stage in an aggregation pipeline.

Even when the pipeline uses an index, aggregation still requires access
to the actual documents; i.e. indexes cannot fully cover an aggregation
pipeline.

.. versionchanged:: 2.6

   In previous versions, for very select use cases, an index could
   cover a pipeline.

Early Filtering
~~~~~~~~~~~~~~~

If your aggregation operation requires only a subset of the data in a
collection, use the :pipeline:`$match`, :pipeline:`$limit`, and
:pipeline:`$skip` stages to restrict the documents that enter at the
beginning of the pipeline. When placed at the beginning of a pipeline,
:pipeline:`$match` operations use suitable indexes to scan only
the matching documents in a collection.

Placing a :pipeline:`$match` pipeline stage followed by a
:pipeline:`$sort` stage at the start of the pipeline is logically
equivalent to a single query with a sort and can use an index. When
possible, place :pipeline:`$match` operators at the beginning of the
pipeline.
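
The benefit of early filtering can be sketched in plain Python (a
conceptual analogy, not MongoDB syntax): when the filter runs first,
later stages touch only the matching documents instead of the whole
collection. The ``transform`` function below is a hypothetical
stand-in for any downstream stage.

```python
# Conceptual sketch: placing the filter first (analogous to $match at
# the start of the pipeline) reduces how many documents later stages
# must process. The counter records how many documents the downstream
# stage actually touched.
docs = [{"x": i} for i in range(1000)]

def transform(d, counter):
    """Stand-in for any downstream pipeline stage."""
    counter[0] += 1
    return {**d, "y": d["x"] * 2}

# Filter last: the downstream stage processes all 1000 documents.
c_late = [0]
late = [d for d in (transform(d, c_late) for d in docs) if d["x"] < 10]

# Filter first (early filtering): the downstream stage processes
# only the 10 matching documents, with identical final output.
c_early = [0]
early = [transform(d, c_early) for d in docs if d["x"] < 10]
# c_late[0] is 1000; c_early[0] is only 10
```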

Additional Features
~~~~~~~~~~~~~~~~~~~

The aggregation pipeline has an internal optimization phase that
provides improved performance for certain sequences of operators. For
details, see :doc:`/core/aggregation-pipeline-optimization`.

The aggregation pipeline supports operations on sharded collections.
See :ref:`aggregation-pipeline-sharded-collection`.