mongodb · dpercy · Jun 21, 2024 · May 7, 2024 · May 10, 2024 · May 10, 2024
@@ -1,11 +1,12 @@
 SchemaVersion: 2018-07-01
 Owner: "@mongodb/query"
 Description: |
-  The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run
-  a query that makes all of them eligible, so we get as many competing plans as possible. We also
-  add a sort stage on an unindexed field, ensuring that every plan is a blocking plan. Because all
-  plans are blocking and return as many documents as possible, multiplanning will hit "max works"
-  instead of EOF of numToReturn. This maximizes the overhead of multiplanning on both classic and SBE.
+  The goal of this test is to show how a blocking sort can increase the overhead of multiplanning.
+  We create as many indexes as possible, and run a query that makes all of them eligible, so we
+  get as many competing plans as possible. We also add a sort stage on an unindexed field,
+  ensuring that every plan is a blocking plan. Because all plans are blocking and return as many
+  documents as possible, multiplanning will hit "max works" instead of EOF or numToReturn.
+  This maximizes the overhead of multiplanning.
 
   We expect classic to have better latency and throughput than SBE on this workload,
   and we expect the combination of classic planner + SBE execution (PM-3591) to perform about

@@ -1,13 +1,14 @@
 SchemaVersion: 2018-07-01
 Owner: "@mongodb/query"
 Description: |
-  The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
-   query that makes all of them eligible, so we get as many competing plans as possible. Here, we do this on a
-   clustered collection that has very large strings as _id.
+  The goal of this test is to exercise multiplanning in the presence of clustered indexes. We
+  create as many indexes as possible, and run a query that makes all of them eligible, so we get
+  as many competing plans as possible. The collection is clustered and has very large strings as
+  _id.
 
-   We expect classic to have better latency and throughput than SBE on this workload,
-   and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
-   as well as classic.
+  This workload is similar to 'Simple.yml' except that the collection is clustered. None of the
+  competing plans actually takes advantage of the clustering (there is no bounded collection scan
+  plan). We might expect the larger record IDs to make fetching take more wall-clock time.
 
 GlobalDefaults:
   dbname: &db test

@@ -1,15 +1,21 @@
 SchemaVersion: 2018-07-01
 Owner: "@mongodb/query"
 Description: |
-  The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
-  query that makes all of them eligible, so we get as many competing plans as possible. We add a
-  group stage, which is blocking. The SBE multiplanner will multiplan group as it is a part of the
-  canonical query, but the classic multiplanner will not plan. This means the SBE multiplanner will
-  have the overhead of trial running blocking plans when compared to the classic multiplanner.
+  This test was created to show how the multiplanner handles $group.
+  The query is essentially the one from 'Simple.yml': we have as many indexed predicates as
+  possible, to create as many indexed plans as possible, but only one of those predicates is
+  selective, which means only one of those plans is efficient. This test departs from
+  'Simple.yml' by adding a $group stage after the access-path part of the query.
 
-  We expect classic to have better latency and throughput than SBE on this workload,
-  and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
-  as well as classic.
+  We expect the Classic multiplanner with Classic execution to reach EOF during planning, and
+  then feed those documents into the $group stage. By contrast, the SBE multiplanner will not
+  reuse results gathered during multiplanning.
+
+  The Classic multiplanner with SBE execution would normally be able to avoid starting over when
+  the query finishes during multiplanning, but in this test it can't because of the $group. When
+  the pipeline has any stages beyond the access-path part of the query, then once multiplanning
+  finishes we construct a new SBE plan that contains both the access path and those remaining
+  pipeline stages, and that plan runs from the beginning.
 
 GlobalDefaults:
   dbname: &db test
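The rule described above for the Classic multiplanner with SBE execution can be written as a small predicate. This is a simplified sketch that assumes only the two conditions named in the description matter; the function name is hypothetical and does not correspond to server code. Classic execution, by contrast, resumes the winning plan and reuses its buffered results in this model.

```python
def classic_planner_sbe_exec_starts_over(finished_during_trial: bool,
                                         has_stages_after_access_path: bool) -> bool:
    """Hypothetical model of Classic multiplanner + SBE execution (PM-3591).

    Results gathered during multiplanning are kept only if the query already
    finished during the trial AND there is nothing beyond the access path.
    Otherwise a new SBE plan (access path + remaining pipeline stages) is
    built and run from scratch.
    """
    return has_stages_after_access_path or not finished_during_trial


if __name__ == "__main__":
    # Enumerate the four combinations of the two conditions.
    for finished in (True, False):
        for extra_stages in (False, True):
            starts_over = classic_planner_sbe_exec_starts_over(finished, extra_stages)
            print(f"finished={finished}, extra_stages={extra_stages}: starts_over={starts_over}")
```

This workload corresponds to the `finished_during_trial=True, has_stages_after_access_path=True` case: the query reaches EOF during planning, but the $group stage still forces a restart.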

diff --git a/src/workloads/query/multiplanner/NoGoodPlan.yml b/src/workloads/query/multiplanner/NoGoodPlan.yml
@@ -1,15 +1,16 @@
 SchemaVersion: 2018-07-01
 Owner: "@mongodb/query"
 Description: |
-  The goal of this test is to exercise multiplanning. We create 64 indexes and run a query that
-  makes all of them eligible, so we get as many competing plans as possible. The only selective
-  field is unindexed, however, meaning no index will be effective in planning. By ensuring all plans
-  are relatively equally bad, we are likely to hit the works limit sooner than the 101 results
-  limit.
+  The goal of this test is to exercise the case in multiplanning where all competing plans are bad.
+
+  As in 'Simple.yml', we create 64 indexes and run a query that makes all of them eligible, so we
+  get as many competing plans as possible. Unlike 'Simple.yml', however, the only selective field
+  in this workload is unindexed, meaning no index helps narrow down the matching documents. By
+  ensuring all plans are relatively equally bad, we are likely to hit the works limit sooner than
+  the 101-result limit.
 
-  We expect classic to have better latency and throughput than SBE on this workload,
-  and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
-  as well as classic.
+  This case is intended not so much to show a difference between Classic and SBE as to show a
+  case where any multiplanner will struggle.
 
 GlobalDefaults:
   dbname: &db test
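The three ways a trial period can end (EOF, enough results, or "max works") can be illustrated with a toy round-robin loop. This is a simplified sketch, not the server's implementation: `CandidatePlan` and `run_trial` are hypothetical, each plan is modeled as returning one result every `advance_every` works, and the defaults assume the 101-result limit mentioned above plus a budget of 10,000 works per plan. In the server the budget comes from getTrialPeriodMaxWorks(); I believe it defaults to the larger of 10,000 and 30% of the collection's record count, but treat that as an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CandidatePlan:
    """Hypothetical stand-in for one competing plan (not a MongoDB type).

    advance_every: the plan returns one result every `advance_every` works.
    total_results: results produced before EOF; None means no EOF in the trial.
    """
    advance_every: int
    total_results: Optional[int] = None


def run_trial(plans: List[CandidatePlan], num_to_return: int = 101,
              max_works: int = 10_000) -> Tuple[str, int, List[int]]:
    """Simplified round-robin trial period.

    Each round calls work() once on every plan; at the end of a round the
    trial stops if any plan reached EOF or returned `num_to_return` results.
    After `max_works` rounds it stops regardless ("max works").
    """
    results = [0] * len(plans)
    for round_no in range(1, max_works + 1):
        # One work() per plan: a plan advances (returns a result) on some works.
        for i, plan in enumerate(plans):
            done = plan.total_results is not None and results[i] >= plan.total_results
            if not done and round_no % plan.advance_every == 0:
                results[i] += 1
        for i, plan in enumerate(plans):
            if plan.total_results is not None and results[i] >= plan.total_results:
                return "eof", round_no, results
            if results[i] >= num_to_return:
                return "num_to_return", round_no, results
    return "max_works", max_works, results


if __name__ == "__main__":
    # NoGoodPlan-like: 64 plans that each rarely produce a matching document.
    print(run_trial([CandidatePlan(advance_every=1_000) for _ in range(64)])[:2])  # ('max_works', 10000)
    # Simple-like: one selective plan among unselective ones.
    print(run_trial([CandidatePlan(1_000), CandidatePlan(1)])[:2])  # ('num_to_return', 101)
    # Empty-bounds-like: a plan that hits EOF on its first work.
    print(run_trial([CandidatePlan(1, total_results=0)])[:2])  # ('eof', 1)
```

When every plan is equally bad, no plan gets close to 101 results before the budget runs out, so the whole budget is spent on every plan.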

@@ -1,14 +1,8 @@
 SchemaVersion: 2018-07-01
 Owner: "@mongodb/query"
 Description: |
-  The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
-  query that makes all of them eligible, so we get as many competing plans as possible. All predicates
-  are very selective (match 0% of the documents). With zero results, we do no hit the EOF optimization
-  and all competing plans hit the works limit instead of document limit.
-
-  We expect classic to have better latency and throughput than SBE on this workload,
-  and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
-  as well as classic.
+  This test shows an example where we have many competing plans, but they are all very efficient:
+  they all have empty index bounds, which means any competing plan will finish immediately.
 
 GlobalDefaults:
   dbname: &db test

@@ -1,9 +1,10 @@
 SchemaVersion: 2018-07-01
 Owner: "@mongodb/query"
 Description: |
-  The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
-  query that makes all of them eligible, so we get as many competing plans as possible. Here, we add
-  an additional predicate: {no_such_field: "none"} to guarantee that we hit getTrialPeriodMaxWorks().
+  The goal of this test is to exercise the "max works" case of multiplanning. The test is similar
+  to 'Simple.yml' except we add an additional predicate: {no_such_field: "none"}, which is always
+  false on this dataset. This guarantees that the query will not be able to finish multiplanning
+  by producing enough documents, so instead we will hit getTrialPeriodMaxWorks().
 
   We expect classic to have better latency and throughput than SBE on this workload,
   and we expect the combination of classic planner + SBE execution (PM-3591) to perform about

@@ -1,13 +1,20 @@
 SchemaVersion: 2018-07-01
 Owner: "@mongodb/query"
 Description: |
-  The goal of this test is to exercise multiplanning. If the selectivity value is small enough (less
-  than 0.5), the optimal plan is to employ a blocking plan by scanning a segment of empty data and
-  conducting a blocking-sort operation, whereas the other plans' index provides the right sort
-  order, but requires a full scan, and every document is rejected after the FETCH stage. Because the
-  SBE multiplanner can't round-robin, it has a heuristic "try nonblocking plans first".  This
-  scenario is a worst case for that heuristic, because we'll try the best plan last. Otherwise, an
-  IXSCAN and FETCH non-blocking plan will be used.
+  The goal of this test is to exercise multiplanning when both blocking and non-blocking plans are
+  available.
+
+  If the selectivity value is small enough (less than 0.5), the optimal plan is a blocking plan
+  that scans a segment of empty data and then performs a blocking sort, whereas the other plans
+  use an index that provides the right sort order but requires a full scan, with every document
+  rejected after the FETCH stage. Otherwise, an IXSCAN and FETCH non-blocking plan will be used.
+
+  Because the SBE multiplanner can't round-robin, it has a heuristic: "try nonblocking plans
+  first". This scenario is a worst case for that heuristic, because we'll try the best plan last.
+
+  Another point of view: this scenario shows that "non-blocking" plans can still do an unbounded
+  amount of work per getNext().
 
   We expect classic to have better latency and throughput than SBE on this workload, and we expect
   the combination of classic planner + SBE execution (PM-3591) to perform about as well as classic.
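The worst case for the "try nonblocking plans first" heuristic can be sketched with a simplified cost model. It assumes (hypothetically) that plans are trialed one after another and that once some plan finishes its trial, later plans are cut off after the same number of works, because they can no longer win. `Plan`, `nonblocking_first`, and `sequential_trial_cost` are illustrative names, not MongoDB code, and the work counts are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    name: str
    blocking: bool
    works_to_finish: int  # works needed to finish the trial (hypothetical)


def nonblocking_first(plans: List[Plan]) -> List[Plan]:
    # Stable sort: False (non-blocking) sorts before True (blocking).
    return sorted(plans, key=lambda p: p.blocking)


def sequential_trial_cost(plans: List[Plan], max_works: int) -> int:
    """Total works spent trialing `plans` in order (simplified model).

    Assumption: a plan that finishes its trial caps the budget of every
    later plan at its own cost, since a slower plan can no longer win.
    """
    budget = max_works
    total = 0
    for plan in plans:
        total += min(plan.works_to_finish, budget)
        budget = min(budget, plan.works_to_finish)
    return total


if __name__ == "__main__":
    plans = [Plan("blocking_sort_over_empty_range", True, 5)] + [
        Plan(f"sorted_ixscan_fetch_{i}", False, 50_000) for i in range(3)
    ]
    print(sequential_trial_cost(nonblocking_first(plans), max_works=10_000))  # 30005
    print(sequential_trial_cost(plans, max_works=10_000))  # 20
```

Under this model the heuristic order spends the full budget on every non-blocking plan before reaching the cheap blocking plan, while trying the blocking plan first would cap all the others almost immediately.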

@@ -4,7 +4,7 @@ Description: |
   The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
   query that makes all of them eligible, so we get as many competing plans as possible.
 
-  The original goal of this test was to demonstrate weaknesses of the SBE multiplanner when compared to
+  This test was originally created to demonstrate weaknesses of the SBE multiplanner when compared to
   the classic multiplanner. Mainly, the SBE multiplanner can't round-robin between plans, which means it
   has to run the list of plans sequentially, which means we can't short-circuit when the shortest-running
   plan finishes.
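A back-of-the-envelope model shows why the lack of round-robin matters. It assumes each plan needs a fixed number of works to finish its trial and uses an illustrative budget of 10,000 works; the function names are hypothetical and the numbers are invented for illustration.

```python
from typing import List


def round_robin_cost(works_needed: List[int], max_works: int) -> int:
    """Total works when every plan gets one work per round and the trial
    stops as soon as the shortest-running plan finishes (or at max_works)."""
    rounds = min(min(works_needed), max_works)
    return rounds * len(works_needed)


def sequential_cost_worst_case(works_needed: List[int], max_works: int) -> int:
    """Total works when plans run one after another and the shortest-running
    plan happens to run last, so no earlier plan can be cut short."""
    return sum(min(w, max_works) for w in works_needed)


if __name__ == "__main__":
    works = [10_000, 10_000, 10_000, 101]  # three slow plans, one fast plan
    print(round_robin_cost(works, max_works=10_000))            # 404
    print(sequential_cost_worst_case(works, max_works=10_000))  # 30101
```

Round-robin pays roughly (number of plans) × (cost of the cheapest plan), while sequential execution in the worst case pays the sum of every plan's cost.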

@@ -1,13 +1,14 @@
 SchemaVersion: 2018-07-01
 Owner: "@mongodb/query"
 Description: |
-  The goal of this test is to exercise multiplanning. We create as many indexes as possible, and run a
-   query that makes all of them eligible, so we get as many competing plans as possible. Here, we do this on a
-   clustered collection and add a selective predicate on _id, so that the clustered index is a viable candidate plan.
+  The goal of this test is to exercise multiplanning in the presence of clustered indexes. We
+  create as many indexes as possible, and run a query that makes all of them eligible, so we get
+  as many competing plans as possible. The collection is clustered and has very large strings as
+  _id. Also, one of the predicates is on _id, which means a clustered collection scan is among
+  the competing plans.
 
-   We expect classic to have better latency and throughput than SBE on this workload,
-   and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
-   as well as classic.
+  This workload is similar to 'Simple.yml' except that the collection is clustered and the query
+  has an extra predicate on _id.
 
 GlobalDefaults:
   dbname: &db test

@@ -1,17 +1,24 @@
 SchemaVersion: 2018-07-01
 Owner: "@mongodb/query"
 Description: |
-  The goal of this test is to exercise multiplanning. We run the same query 7 times, each one with a
-  different selectivity value that we are comparing against x1, calcuated based on the number of
-  documents we want the query to match. This will help us measure the overhead of throwing out the
-  result set gathered during multi-planning when the result set exceeds 101 documents.  Unlike many
-  of the other multiplanner/ workloads, we only test with 2 indexes here, because 2 indexes is a
-  worst case for throwing away results. Having more indexes increases planning time, but not query
-  execution time, so having more indexes makes the *relative* cost of throwing away results smaller.
-
-  We expect classic to have better latency and throughput than SBE on this workload,
-  and we expect the combination of classic planner + SBE execution (PM-3591) to perform about
-  as well as classic.
+  The goal of this test is to measure the overhead of "throwing out" the initial results returned by
+  multiplanning.
+
+  When a query runs with the Classic multiplanner and Classic execution, it can resume running once
+  multiplanning finishes and reuse the partial results it gathered during multiplanning. By
+  contrast, when running with SBE execution, the query has to start over, unless it already
+  finished during multiplanning. This means SBE has a discontinuity in performance as the size of
+  the result set grows: when it crosses from 100 to 102 documents, it has to recompute those
+  first ~100 documents.
+
+  To measure this, we run the same query 7 times, each one with a different selectivity value.
+  For example, in phase 'MultiplannerWith50ExpectedResults' we choose a selectivity of
+  '50 / collectionSize' to make the query return (approximately) 50 documents.
+
+  Unlike many of the other multiplanner/ workloads, we only test with 2 indexes here, because
+  2 indexes is a worst case for throwing away results. Having more indexes increases planning
+  time, but not query execution time, so having more indexes makes the *relative* cost of
+  throwing away results smaller.
 
 GlobalDefaults:
   dbname: &db test
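The discontinuity can be illustrated with a simplified count of the documents the winning plan produces. This sketch assumes the trial stops after 101 results (the limit mentioned in the older description) and treats "finished during multiplanning" as "fewer than 101 results". It also uses a hypothetical collection size of 100,000 to show how a phase's selectivity is derived. The names are illustrative only.

```python
TRIAL_NUM_TO_RETURN = 101  # assumed trial-period result limit


def selectivity_for(expected_results: int, collection_size: int) -> float:
    """Selectivity used to make a phase return about `expected_results` docs."""
    return expected_results / collection_size


def docs_produced(n: int, reuses_trial_results: bool) -> int:
    """Documents the winning plan produces in total for an n-document result.

    Simplified model: during the trial the plan produces min(n, 101) results.
    If it reached EOF first (n < 101), the query is already complete.
    Otherwise execution either resumes (Classic execution) or starts over
    and recomputes everything (SBE execution).
    """
    if n < TRIAL_NUM_TO_RETURN or reuses_trial_results:
        return n
    return TRIAL_NUM_TO_RETURN + n


if __name__ == "__main__":
    collection_size = 100_000  # hypothetical; not taken from the workload
    print(selectivity_for(50, collection_size))  # 0.0005
    for n in (50, 100, 102, 200):
        print(n, docs_produced(n, True), docs_produced(n, False))
```

The jump between 100 and 102 results is the "thrown out" work this workload measures. With only 2 indexes, planning is cheap, so that jump makes up a larger share of the total.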