PERF-5374 Improve comment headers for multiplanner/ workloads #1213

Merged
merged 28 commits into from
Jun 21, 2024
Changes from 8 commits
28 commits
7d3550a
PERF-5374 Improve comment headers for multiplanner/ workloads
dpercy May 7, 2024
9ac18f3
typo of -> or
dpercy May 10, 2024
727c8f5
no need to split on "classic and SBE"
dpercy May 10, 2024
3966b55
typo "prepence" -> presence
dpercy May 10, 2024
608dbb0
don't split on "choice of multiplanner"
dpercy May 10, 2024
46a82aa
collectionSize one word
dpercy May 10, 2024
d813797
Merge branch 'master' into PERF-5374-comments
dpercy May 10, 2024
bb2a774
state explicitly ClusteredCollection doesn't do a clustered scan
dpercy May 10, 2024
dc9b3af
other -> remaining
dpercy May 10, 2024
70fd686
however, however however; however.
dpercy May 10, 2024
9953a7b
equally bad
dpercy May 10, 2024
9948cf8
Merge remote-tracking branch 'origin/master' into PERF-5374-comments
dpercy Jun 17, 2024
d691dd8
fix bad merge
dpercy Jun 17, 2024
e12b6eb
remove "We expect ..." comment about SBE multiplanner
dpercy Jun 4, 2024
2b0b7bb
avoid "empty data" wording
dpercy Jun 4, 2024
c9d3933
typo
dpercy Jun 17, 2024
d0b801d
update docs
dpercy Jun 17, 2024
69cbdc6
rephrase SBE multiplanner as "historical"
dpercy Jun 17, 2024
8524dec
note about residual selectivity
dpercy Jun 17, 2024
d61a02b
update docs
dpercy Jun 17, 2024
e52494b
appease linter by adding keyword to unchanged file
dpercy Jun 17, 2024
d5c7d76
dont compare
dpercy Jun 18, 2024
1830416
the multiplanner handles group
dpercy Jun 18, 2024
42a536b
empty bounds
dpercy Jun 18, 2024
b5ba2ba
not large strings
dpercy Jun 18, 2024
539fad1
update docs
dpercy Jun 18, 2024
324a0c3
Merge branch 'master' into PERF-5374-comments
dpercy Jun 18, 2024
e9a2b5e
trailing spaces
dpercy Jun 20, 2024
11 changes: 6 additions & 5 deletions src/workloads/query/multiplanner/BlockingSort.yml
@@ -1,11 +1,12 @@
SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run
a query that makes all of them eligible, so we get as many competing plans as possible. We also
add a sort stage on an unindexed field, ensuring that every plan is a blocking plan. Because all
plans are blocking and return as many documents as possible, multiplanning will hit "max works"
instead of EOF of numToReturn. This maximizes the overhead of multiplanning on both classic and SBE.
The goal of this test is to show how a blocking sort can increase the overhead of multiplanning.
We create as many indexes as possible, and run a query that makes all of them eligible, so we
get as many competing plans as possible. We also add a sort stage on an unindexed field,
ensuring that every plan is a blocking plan. Because all plans are blocking and return as many
documents as possible, multiplanning will hit "max works" instead of EOF or numToReturn.
This maximizes the overhead of multiplanning.

We expect classic to have better latency and throughput than SBE on this workload,
and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
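The trial-period logic that BlockingSort.yml relies on can be illustrated with a toy Python model. This is a hypothetical sketch, not the server's multiplanner. Candidate plans are generators that are worked once per round, loosely mirroring the classic multiplanner's round-robin. The trial ends on EOF, when a plan returns `NUM_TO_RETURN` results, or when the `MAX_WORKS` budget runs out. Both limits are illustrative values chosen for the example.

```python
from typing import Iterator, List, Optional

# Illustrative values only (assumptions, not the server's real configuration).
NUM_TO_RETURN = 101  # documents a plan must produce to end the trial early
MAX_WORKS = 1_000    # hypothetical trial budget, counted in round-robin rounds

_EOF = object()  # sentinel returned by next() when a plan is exhausted


def streaming_plan(n_docs: int) -> Iterator[Optional[int]]:
    """Non-blocking plan: each work() returns one document."""
    yield from range(n_docs)


def blocking_sort_plan(n_docs: int) -> Iterator[Optional[int]]:
    """Blocking plan: buffers all of its input before returning anything."""
    for _ in range(n_docs):
        yield None  # a work() that made progress but produced no document
    yield from sorted(range(n_docs), reverse=True)


def run_trial(plans: List[Iterator[Optional[int]]]) -> str:
    """Work every plan once per round and report what ended the trial."""
    produced = [0] * len(plans)
    for _ in range(MAX_WORKS):
        for i, plan in enumerate(plans):
            doc = next(plan, _EOF)
            if doc is _EOF:
                return "EOF"
            if doc is not None:
                produced[i] += 1
                if produced[i] >= NUM_TO_RETURN:
                    return "numToReturn"
    return "max works"


if __name__ == "__main__":
    # Every plan is blocking and the input is large, so nothing is returned in time.
    print(run_trial([blocking_sort_plan(5_000) for _ in range(3)]))  # max works
    # A single streaming plan fills a batch after 101 rounds.
    print(run_trial([streaming_plan(5_000), blocking_sort_plan(5_000)]))  # numToReturn
```

In the all-blocking case, every plan spends its budget buffering input for the sort. None of them can end the trial early, and that is the overhead the workload is designed to maximize.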
13 changes: 7 additions & 6 deletions src/workloads/query/multiplanner/ClusteredCollection.yml
@@ -1,13 +1,14 @@
SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
query that makes all of them eligible, so we get as many competing plans as possible. Here, we do this on a
clustered collection that has very large strings as _id.
The goal of this test is to exercise multiplanning in the presence of clustered indexes. We
create as many indexes as possible, and run a query that makes all of them eligible, so we get
as many competing plans as possible. The collection is clustered and has very large strings as
_id.

We expect classic to have better latency and throughput than SBE on this workload,
and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
as well as classic.
This workload is similar to 'Simple.yml' except that the collection is clustered. None of the
competing plans actually take advantage of the clustering (there is no bounded collection scan
plan). Maybe we expect the larger record IDs to make fetch take more wall clock time.

GlobalDefaults:
dbname: &db test
22 changes: 22 additions & 0 deletions src/workloads/query/multiplanner/CompoundIndexes.yml
@@ -402,6 +402,27 @@ Actors:
query: *query

AutoRun:
<<<<<<< HEAD
- When:
mongodb_setup:
$eq:
- standalone-sbe
- standalone-80-feature-flags # At time of writing this will enable PM-3591.
- standalone-all-feature-flags # At time of writing this will enable PM-3591.
- standalone-classic-query-engine
branch_name:
$gte: v7.3
||||||| 9aedad60
- When:
mongodb_setup:
$eq:
- standalone-sbe
- standalone-80-feature-flags # At time of writing this will enable PM-3591.
- standalone-all-feature-flags # At time of writing this will enable PM-3591.
- standalone-classic-query-engine
branch_name:
$gte: v7.3
=======
- When:
mongodb_setup:
$eq:
@@ -411,3 +432,4 @@ AutoRun:
- standalone-classic-query-engine
branch_name:
$gte: v7.3
>>>>>>> master
22 changes: 14 additions & 8 deletions src/workloads/query/multiplanner/MultiplannerWithGroup.yml
@@ -1,15 +1,21 @@
SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
query that makes all of them eligible, so we get as many competing plans as possible. We add a
group stage, which is blocking. The SBE multiplanner will multiplan group as it is a part of the
canonical query, but the classic multiplanner will not plan. This means the SBE multiplanner will
have the overhead of trial running blocking plans when compared to the classic multiplanner.
This test was created to show how three different multiplanners handle $group.
Contributor

Suggested change
This test was created to show how three different multiplanners handle $group.
This test was created to show how the multiplanner handles $group.

The query is essentially the one from 'Simple.yml': we have as many indexed predicates as
possible, to create as many indexed plans as possible, but only one of those predicates is
selective, which means only one of those plans is efficient. Where this test departs from
'Simple.yml' is by adding a $group stage after the access-path part of the query.

We expect classic to have better latency and throughput than SBE on this workload,
and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
as well as classic.
We expect the Classic multiplanner with Classic execution to reach EOF during planning, and
then feed those documents into the $group stage. By contrast the SBE multiplanner will not
reuse results gathered during multiplanning.

The Classic multiplanner with SBE execution would normally be able to avoid starting over,
when the query finishes during multiplanning, but in this test it can't because of the $group.
When there are any pipeline stages beyond the access-path part of the query, then when
multiplanning finishes we construct a new SBE plan with both the access path and the other
pipeline stages.

GlobalDefaults:
dbname: &db test
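MultiplannerWithGroup.yml and VariedSelectivity.yml both describe when a query can keep the documents gathered during multiplanning and when it has to start over. The function below encodes only the two Classic-multiplanner configurations that the text spells out, paired with Classic execution or with SBE execution. The function and parameter names are hypothetical, and the logic is a reading of the descriptions, not the server's code.

```python
def restarts_after_planning(
    execution: str, finished_during_planning: bool, has_stages_after_access_path: bool
) -> bool:
    """Whether the winning plan must start over after the Classic multiplanner picks it.

    execution: "classic" or "sbe", the engine paired with the Classic multiplanner.
    """
    if execution == "classic":
        # Classic execution resumes the winning plan and reuses its partial results.
        return False
    if execution == "sbe":
        if has_stages_after_access_path:
            # A new SBE plan is built for the access path plus the remaining pipeline
            # stages, so results gathered during multiplanning cannot be reused.
            return True
        # Without extra stages, only a query that already finished avoids the rerun.
        return not finished_during_planning
    raise ValueError(f"unknown execution engine: {execution!r}")


if __name__ == "__main__":
    # This workload: the query reaches EOF during planning, but a $group follows it.
    print(restarts_after_planning("classic", True, True))  # False
    print(restarts_after_planning("sbe", True, True))      # True
    print(restarts_after_planning("sbe", True, False))     # False
```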
17 changes: 9 additions & 8 deletions src/workloads/query/multiplanner/NoGoodPlan.yml
@@ -1,15 +1,16 @@
SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
The goal of this test is to exercise multiplanning. We create 64 indexes and run a query that
makes all of them eligible, so we get as many competing plans as possible. The only selective
field is unindexed, however, meaning no index will be effective in planning. By ensuring all plans
are relatively equally bad, we are likely to hit the works limit sooner than the 101 results
limit.
The goal of this test is to exercise the case in multiplanning where all competing plans are bad.

As in 'Simple.yml' we create 64 indexes and run a query that makes all of them eligible, so we
get as many competing plans as possible. However, unlike 'Simple.yml', in this workload the only
selective field is unindexed, meaning no index will be effective in planning. By
ensuring all plans are relatively equally bad, we are likely to hit the works limit sooner than
the 101 results limit.

We expect classic to have better latency and throughput than SBE on this workload,
and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
as well as classic.
This case is intended not so much to show a difference between Classic and SBE, but to show a
case where any multiplanner will struggle.

GlobalDefaults:
dbname: &db test
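NoGoodPlan.yml claims that equally bad plans hit the works limit before the 101-result limit. A back-of-the-envelope check helps here. Assume, hypothetically, that each plan does one work per document it fetches, and that only one in every `k` fetched documents passes the unindexed selective predicate. A plan then needs about `101 * k` works to fill a batch. The `max_works` values used below are illustrative assumptions, not the server's limit.

```python
BATCH_SIZE = 101  # results limit mentioned in the description


def works_to_fill_batch(one_match_per_k_docs: int) -> int:
    """Works one plan needs to return BATCH_SIZE results under the toy model."""
    return BATCH_SIZE * one_match_per_k_docs


def first_limit_hit(one_match_per_k_docs: int, max_works: int) -> str:
    """Which trial limit a single plan reaches first."""
    if works_to_fill_batch(one_match_per_k_docs) > max_works:
        return "works limit"
    return "results limit"


if __name__ == "__main__":
    # Hypothetical numbers: 1 in 1,000 fetched documents matches; budget of 10,000 works.
    print(works_to_fill_batch(1_000))      # 101000
    print(first_limit_hit(1_000, 10_000))  # works limit
    print(first_limit_hit(10, 10_000))     # results limit
```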
10 changes: 2 additions & 8 deletions src/workloads/query/multiplanner/NoResults.yml
@@ -1,14 +1,8 @@
SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
query that makes all of them eligible, so we get as many competing plans as possible. All predicates
are very selective (match 0% of the documents). With zero results, we do no hit the EOF optimization
and all competing plans hit the works limit instead of document limit.

We expect classic to have better latency and throughput than SBE on this workload,
and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
as well as classic.
This test shows an example where we have many competing plans, but they all are very efficient:
they all have empty index bounds, which means any competing plan will finish immediately.

GlobalDefaults:
dbname: &db test
7 changes: 4 additions & 3 deletions src/workloads/query/multiplanner/NoSuchField.yml
Contributor

I've fixed this description as part of #1224 which I merged this morning. So when you merge with the latest from master, I think this PR should end up in a state where it does not make any changes to this file.

Contributor Author

Right, after updating it looks like NoSuchField.yml is unchanged in this PR.

Contributor Author

Except the linter complained so I had to add a "Keywords" section.

Contributor

Oh, why did the linter complain for this workload but not the others? In your branch, only two of the multiplanner workloads have the keyword specified. I guess you should either add Keywords to all the multi-planner workloads in this patch or file a ticket about doing so later. (If we file a ticket, it's probably something we would stick into the neweng bucket?)

@@ -1,9 +1,10 @@
SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
query that makes all of them eligible, so we get as many competing plans as possible. Here, we add
an additional predicate: {no_such_field: "none"} to guarantee that we hit getTrialPeriodMaxWorks().
The goal of this test is to exercise the "max works" case of multiplanning. The test is similar
to 'Simple.yml' except we add an additional predicate: {no_such_field: "none"}, which is always
false on this dataset. This guarantees that the query will not be able to finish multiplanning
by producing enough documents, so instead we will hit getTrialPeriodMaxWorks().

We expect classic to have better latency and throughput than SBE on this workload,
and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
21 changes: 14 additions & 7 deletions src/workloads/query/multiplanner/NonBlockingVsBlocking.yml
@@ -1,13 +1,20 @@
SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
The goal of this test is to exercise multiplanning. If the selectivity value is small enough (less
than 0.5), the optimal plan is to employ a blocking plan by scanning a segment of empty data and
conducting a blocking-sort operation, whereas the other plans' index provides the right sort
order, but requires a full scan, and every document is rejected after the FETCH stage. Because the
SBE multiplanner can't round-robin, it has a heuristic "try nonblocking plans first". This
scenario is a worst case for that heuristic, because we'll try the best plan last. Otherwise, an
IXSCAN and FETCH non-blocking plan will be used.
The goal of this test is to exercise multiplanning when both blocking and non-blocking plans are
available.

If the selectivity value is small enough (less than 0.5), the optimal plan is to employ a
blocking plan by scanning a segment of empty data and conducting a blocking-sort operation,
whereas the other plans' index provides the right sort order, but requires a full scan, and
every document is rejected after the FETCH stage.

Because the SBE multiplanner can't round-robin, it has a heuristic "try nonblocking plans first".
This scenario is a worst case for that heuristic, because we'll try the best plan last.
Otherwise, an IXSCAN and FETCH non-blocking plan will be used.

Another point of view: this scenario shows that "non-blocking" plans can still do an unbounded
amount of work per getNext().

We expect classic to have better latency and throughput than SBE on this workload, and we expect
the combination of classic planner + SBE execution (PM-3591) to perform about as well as classic.
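The "try nonblocking plans first" heuristic in NonBlockingVsBlocking.yml amounts to a stable sort that puts non-blocking plans ahead of blocking ones. The sketch below shows why this workload is a worst case for a planner that trials plans one after another. The cheap blocking plan is tried last, after every expensive non-blocking plan has been run. The `Plan` type, plan names, and work counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    name: str
    blocking: bool
    works_to_finish_trial: int  # hypothetical cost of trialing this plan


def nonblocking_first(plans: List[Plan]) -> List[Plan]:
    """Order plans so non-blocking ones are tried first (sorted() is stable)."""
    return sorted(plans, key=lambda p: p.blocking)


def works_before(best: str, ordered: List[Plan]) -> int:
    """Total works spent on plans trialed before the named plan."""
    total = 0
    for plan in ordered:
        if plan.name == best:
            return total
        total += plan.works_to_finish_trial
    raise ValueError(f"plan {best!r} not found")


if __name__ == "__main__":
    plans = [
        Plan("ixscan_range_then_sort", blocking=True, works_to_finish_trial=50),
        Plan("ixscan_sort_order_a", blocking=False, works_to_finish_trial=10_000),
        Plan("ixscan_sort_order_b", blocking=False, works_to_finish_trial=10_000),
    ]
    ordered = nonblocking_first(plans)
    print([p.name for p in ordered])
    print(works_before("ixscan_range_then_sort", ordered))  # 20000
```

The non-blocking plans here are expensive because each getNext() can scan and reject many documents. This is the "unbounded amount of work per getNext()" point made in the description.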
2 changes: 1 addition & 1 deletion src/workloads/query/multiplanner/Simple.yml
@@ -4,7 +4,7 @@ Description: |
The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
query that makes all of them eligible, so we get as many competing plans as possible.

The original goal of this test was to demonstrate weaknesses of the SBE multiplanner when compared to
This test was originally created to demonstrate weaknesses of the SBE multiplanner when compared to
the classic multiplanner. Mainly, the SBE multiplanner can't round-robin between plans, which means it
has to run the list of plans sequentially, which means we can't short-circuit when the shortest-running
plan finishes.
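Simple.yml says that being unable to round-robin hurts the SBE multiplanner. To see why, compare the total work spent before the planner learns which plan finishes first. In this toy model, which is not the server's code, each plan needs a fixed number of works to finish. Round-robin stops as soon as any plan finishes. The sequential strategy runs each plan in turn up to a per-plan cap. The cap and the costs are hypothetical.

```python
from typing import List


def round_robin_works(costs: List[int]) -> int:
    """Work each plan once per round; stop as soon as any plan finishes.

    Each cost is the number of works a plan needs to finish (must be >= 1).
    """
    if not costs or min(costs) < 1:
        raise ValueError("need at least one plan, each costing >= 1 work")
    done = [0] * len(costs)
    total = 0
    while True:
        for i, cost in enumerate(costs):
            done[i] += 1
            total += 1
            if done[i] == cost:
                return total


def sequential_works(costs: List[int], cap: int) -> int:
    """Run each plan to completion (or to `cap` works) before starting the next."""
    return sum(min(cost, cap) for cost in costs)


if __name__ == "__main__":
    costs = [1_000, 10, 1_000, 1_000]  # only the second plan is efficient
    print(round_robin_works(costs))        # 38
    print(sequential_works(costs, 1_000))  # 3010
```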
13 changes: 7 additions & 6 deletions src/workloads/query/multiplanner/UseClusteredIndex.yml
@@ -1,13 +1,14 @@
SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
query that makes all of them eligible, so we get as many competing plans as possible. Here, we do this on a
clustered collection and add a selective predicate on _id, so that the clustered index is a viable candidate plan.
The goal of this test is to exercise multiplanning in the presence of clustered indexes. We
create as many indexes as possible, and run a query that makes all of them eligible, so we get
as many competing plans as possible. The collection is clustered and has very large strings as
_id; also, one of the predicates is on _id which means a clustered collection scan is included
in the competing plans.

Contributor

It doesn't look like this note about very large strings is accurate. It looks like we don't explicitly mention the _id field during data generation, so we probably end up with ObjectIds, not large strings.

Contributor

Ping!

We expect classic to have better latency and throughput than SBE on this workload,
and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
as well as classic.
This workload is similar to 'Simple.yml' except for the collection being clustered, and the
extra predicate.

GlobalDefaults:
dbname: &db test
29 changes: 18 additions & 11 deletions src/workloads/query/multiplanner/VariedSelectivity.yml
@@ -1,17 +1,24 @@
SchemaVersion: 2018-07-01
Owner: "@mongodb/query"
Description: |
The goal of this test is to exercise multiplanning. We run the same query 7 times, each one with a
different selectivity value that we are comparing against x1, calcuated based on the number of
documents we want the query to match. This will help us measure the overhead of throwing out the
result set gathered during multi-planning when the result set exceeds 101 documents. Unlike many
of the other multiplanner/ workloads, we only test with 2 indexes here, because 2 indexes is a
worst case for throwing away results. Having more indexes increases planning time, but not query
execution time, so having more indexes makes the *relative* cost of throwing away results smaller.

We expect classic to have better latency and throughput than SBE on this workload,
and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
as well as classic.
The goal of this test is to measure the overhead of "throwing out" the initial results returned by
multiplanning.

When a query runs with Classic multiplanner + Classic execution, then when multiplanning finishes
the query can resume running and reuse the partial results it gathered during multiplanning. By
contrast when running with SBE execution, the query has
to start over--unless it already finished during multiplanning. This means SBE has a
discontinuity in performance as the size of the result set grows: when it crosses from 100 to
102 documents, it has to recompute those first ~100 documents.

To measure this, we run the same query 7 times, each one with a different selectivity value.
For example, in phase 'MultiplannerWith50ExpectedResults' we choose a selectivity of
'50 / collectionSize' to make the query return (approximately) 50 documents.

Unlike many of the other multiplanner/ workloads, we only test with 2 indexes here, because
2 indexes is a worst case for throwing away results. Having more indexes increases planning
time, but not query execution time, so having more indexes makes the *relative* cost of
throwing away results smaller.

GlobalDefaults:
dbname: &db test
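VariedSelectivity.yml relies on two things: the selectivity arithmetic ('50 / collectionSize') and the discontinuity between 100 and 102 results. Both can be sketched as follows. The collection size of 10,000 is a hypothetical value, since the workload's actual collectionSize isn't shown in this diff. The restart model is also a simplification. With SBE execution, a query that does not finish within the first batch of 101 documents is assumed to recompute that batch after planning.

```python
BATCH_SIZE = 101  # documents gathered during multiplanning before it stops


def selectivity(expected_results: int, collection_size: int) -> float:
    """Selectivity that makes the query match about `expected_results` documents."""
    return expected_results / collection_size


def docs_computed(n_results: int, reuses_trial_results: bool) -> int:
    """Documents the winning plan computes in total under a simplified model."""
    if n_results < BATCH_SIZE:
        return n_results  # finished during multiplanning; nothing to redo
    if reuses_trial_results:
        return n_results  # Classic execution resumes where the trial stopped
    return BATCH_SIZE + n_results  # SBE execution starts over after planning


if __name__ == "__main__":
    print(selectivity(50, 10_000))  # 0.005
    for n in (100, 102):
        print(n, docs_computed(n, reuses_trial_results=True),
              docs_computed(n, reuses_trial_results=False))
```

Under this model, going from 100 to 102 results roughly doubles the work done with SBE execution (100 versus 203 documents computed). With Classic execution, the work grows by only two documents.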