
CBO: Do not consider plans that exceed memory limits #11495

Conversation

cameronreynoldson

Check for the maximum amount of memory a query is allowed to have at any point in time in the CostComparator. If it is exceeded, choose the plan using less memory.
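
In rough outline, the check described above could look like this inside CostComparator (a sketch only, not the PR's exact code; the body of the if-branch is an assumption, since the diff shown later in this review is truncated at that point):

    public int compare(Session session, PlanNodeCostEstimate left, PlanNodeCostEstimate right)
    {
        long queryMaxMemory = getQueryMaxMemory(session).toBytes();
        if (left.getIndividualMemoryCost() > queryMaxMemory || right.getIndividualMemoryCost() > queryMaxMemory) {
            // at least one candidate is estimated to exceed the limit:
            // prefer the plan with the smaller memory estimate
            return Double.compare(left.getIndividualMemoryCost(), right.getIndividualMemoryCost());
        }
        // ... otherwise fall through to the usual weighted cpu/memory/network comparison
        return 0;
    }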

@mbasmanova
Contributor

CC: @findepi @rschlussel

cameronreynoldson changed the title from "Add getQueryMaxMemory check to CostComparator" to "CBO: Do not consider plans that exceed memory limits" on Sep 17, 2018
@rschlussel
Contributor

@cameronreynoldson thanks for the PR. This looks like a good start on how to compare costs. I don't see the constructor that has individual costs and cumulative cost being used, so you still need to have the costCalculator include the individual cost when it's creating cost estimates.

private final double cpuCost;
private final double memoryCost;
private final double networkCost;

-public PlanNodeCostEstimate(double cpuCost, double memoryCost, double networkCost)
+public PlanNodeCostEstimate(double individualCpuCost, double individualMemoryCost, double individualNetworkCost)
Contributor

this constructor may be useful for testing, but in general we should use the one below.

Author

I left this constructor here because it is used in several other places and seems to be used to pass in the initial costs. If this constructor were to disappear, we would be explicitly passing in the same costs to both the individual and cumulative parameters. Is that what you're getting at?

Contributor

Oh, I see: it becomes cumulative after the "add" method. I would add a comment making that clearer and also make the other constructor private. Additionally, instead of having two constructors that both initialize the fields, call the other constructor from here.
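
A sketch of the delegation being suggested (not the PR's code): the public three-argument constructor seeds the cumulative components from the individual ones, and the six-argument constructor becomes private:

    // public constructor: a node's cumulative cost starts out equal to its individual cost
    public PlanNodeCostEstimate(double cpuCost, double memoryCost, double networkCost)
    {
        this(cpuCost, memoryCost, networkCost, cpuCost, memoryCost, networkCost);
    }

    // full constructor kept private; the cumulative components diverge from the
    // individual ones only through add()
    private PlanNodeCostEstimate(
            double individualCpuCost, double individualMemoryCost, double individualNetworkCost,
            double cpuCost, double memoryCost, double networkCost)
    {
        this.individualCpuCost = individualCpuCost;
        this.individualMemoryCost = individualMemoryCost;
        this.individualNetworkCost = individualNetworkCost;
        this.cpuCost = cpuCost;
        this.memoryCost = memoryCost;
        this.networkCost = networkCost;
    }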

{
    return new PlanNodeCostEstimate(0, 0, individualNetworkCost);
}

public static PlanNodeCostEstimate cpuCost(double cpuCost)
Contributor

Where do these get used? Right now they are exactly the same as the individualCost methods. Figure out if that's the correct behavior, and if so, get rid of one set of methods.

@rschlussel
Contributor

Spoke to @mbasmanova. Looks good to both of us. Can you squash the commits together and rename the commit title to "Choose lower memory plan if a memory estimate exceeds limit"?

- * Returns CPU component of the cost. Unknown value is represented by {@link Double#NaN}
+ * Returns individual CPU component of the cost. Unknown value is represented by {@link Double#NaN}
*/
public double getIndividualCpuCost()
Contributor

What does "individual" mean in this context? Individual with respect to what?

Contributor

"individual" here refers to the cost of the node itself; before this change, the estimate was tracking the cumulative cost of the whole sub-tree rooted at that node.

Contributor

martint commented Sep 18, 2018

In that case, I'd suggest we rename the existing methods to getCumulativeCpuCost, etc. and make these look like getCpuCost. The term "individual" is confusing, whereas "cumulative" has a much more familiar connotation, especially in the context of a tree.

Contributor

I think "individual cost" (memory/cpu/network) of a single node is not the right abstraction.
If, for example, we have an identity projection over one of the Joins and no projection over the other Join, we would be comparing pears to apples -- the individual cost (memory/cpu/network) of the projection will be negligible.

What we need instead is something that estimates the cost of the whole plan in 4 dimensions (3 existing and 1 new):

  • (cumulative) CPU cost
  • (cumulative) network cost
  • (cumulative) memory cost (maybe this will be actually obsolete)
  • peak memory usage
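
A sketch of the shape such a plan-wide estimate could take; all names here are hypothetical, not from the PR:

    public class PlanCostEstimate
    {
        private final double cpuCost;      // cumulative CPU cost of the whole plan
        private final double networkCost;  // cumulative network cost
        private final double memoryCost;   // cumulative memory cost (possibly obsolete)
        private final double peakMemory;   // max memory held at any point of the execution timeline

        public PlanCostEstimate(double cpuCost, double networkCost, double memoryCost, double peakMemory)
        {
            this.cpuCost = cpuCost;
            this.networkCost = networkCost;
            this.memoryCost = memoryCost;
            this.peakMemory = peakMemory;
        }
    }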

@@ -85,13 +138,17 @@ public double getNetworkCost()
  */
 public boolean hasUnknownComponents()
 {
-    return isNaN(cpuCost) || isNaN(memoryCost) || isNaN(networkCost);
+    return isNaN(individualCpuCost) || isNaN(individualMemoryCost) || isNaN(individualNetworkCost) ||
+            isNaN(cpuCost) || isNaN(memoryCost) || isNaN(networkCost);
 }

@Override
public String toString()
Contributor

@rschlussel @cameronreynoldson Will this change affect the query plan? E.g. is this method used in PlanPrinter?

Author

This could probably be changed back to how it was, given that one of the cumulative costs would have to be unknown if any of the individual costs were unknown.

Contributor

It won't affect the query plan unless you have CBO on. PlanPrinter just gets the stats and costs of the final plan.

Contributor

There aren't any cases where the individual cost would be unknown and the cumulative cost wouldn't be.

         Double.compare(that.memoryCost, memoryCost) == 0 &&
         Double.compare(that.networkCost, networkCost) == 0;
 }

 @Override
 public int hashCode()
 {
-    return Objects.hash(cpuCost, memoryCost, networkCost);
+    return Objects.hash(individualCpuCost, individualMemoryCost, individualNetworkCost,
+            cpuCost, memoryCost, networkCost);
 }

public PlanNodeCostEstimate add(PlanNodeCostEstimate other)
Contributor

@rschlussel @cameronreynoldson Before this change, the add method was simply adding two estimates; i.e. there was no implicit assumption that the estimates being added are for the source nodes. In fact, this doesn't always seem to be the case.

com.facebook.presto.cost.CostCalculatorWithEstimatedExchanges.ExchangeCostEstimator#visitAggregation

        @Override
        public PlanNodeCostEstimate visitAggregation(AggregationNode node, Void context)
        {
            PlanNodeStatsEstimate sourceStats = getStats(node.getSource());
            List<Symbol> sourceSymbols = node.getSource().getOutputSymbols();

            PlanNodeCostEstimate remoteRepartitionCost = CostCalculatorUsingExchanges.calculateExchangeCost(
                    numberOfNodes,
                    sourceStats,
                    sourceSymbols,
                    REPARTITION,
                    REMOTE,
                    types);
            PlanNodeCostEstimate localRepartitionCost = CostCalculatorUsingExchanges.calculateExchangeCost(
                    numberOfNodes,
                    sourceStats,
                    sourceSymbols,
                    REPARTITION,
                    LOCAL,
                    types);

            // TODO consider cost of aggregation itself, not only exchanges, based on aggregation's properties
            return remoteRepartitionCost.add(localRepartitionCost);
        }

Author

Would it be worth our effort to change the uses of PlanNodeCostEstimate to follow our new pattern of initializing with the individual costs and using add() for cumulative costs?
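
For reference, a hypothetical call site following that pattern (names are illustrative, not from the PR):

    // construct an estimate from the node's own (individual) costs only,
    // then accumulate the subtree total via add()
    static PlanNodeCostEstimate filterCost(double filterCpu, PlanNodeCostEstimate sourceCumulativeCost)
    {
        PlanNodeCostEstimate individual = new PlanNodeCostEstimate(filterCpu, 0, 0);
        return individual.add(sourceCumulativeCost);
    }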

}

public PlanNodeCostEstimate add(PlanNodeCostEstimate other)
{
    return new PlanNodeCostEstimate(
            individualCpuCost,
            individualMemoryCost,
            individualNetworkCost,
Contributor

Why not other.individualNetworkCost? Until this point, add was commutative. Maybe Double.NaN would be better here.

@@ -85,13 +138,17 @@ public double getNetworkCost()
  */
 public boolean hasUnknownComponents()
 {
-    return isNaN(cpuCost) || isNaN(memoryCost) || isNaN(networkCost);
+    return isNaN(individualCpuCost) || isNaN(individualMemoryCost) || isNaN(individualNetworkCost) ||
+            isNaN(cpuCost) || isNaN(memoryCost) || isNaN(networkCost);
Contributor

Put each one on a separate line.

@@ -59,6 +60,11 @@ public int compare(Session session, PlanNodeCostEstimate left, PlanNodeCostEstimate right)
     requireNonNull(right, "right is null");
     checkArgument(!left.hasUnknownComponents() && !right.hasUnknownComponents(), "cannot compare unknown costs");

+    long queryMaxMemory = getQueryMaxMemory(session).toBytes();
Contributor

It would be nice to use a dedicated session property here, which could be set to query max memory by default. This is because the memory cost is not exactly the memory consumption.
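
For example (the property name is made up, not part of this PR; the getter follows the existing SystemSessionProperties pattern and would default to query max memory where the property is registered):

    // hypothetical session property for the CBO memory cutoff
    public static final String CBO_MEMORY_LIMIT = "cbo_memory_limit_for_plan_comparison";

    public static DataSize getCboMemoryLimit(Session session)
    {
        return session.getSystemProperty(CBO_MEMORY_LIMIT, DataSize.class);
    }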

new CostComparisonAssertion(1.0, 0.0, 0.0)
        .smaller(1000, 100, 100)
        .larger(100, 6e13, 100)
        .assertCompare();
Contributor

Can you also add a test covering the case where the individual memory costs are both higher than query max memory?
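
For instance, mirroring the assertion style above (values are illustrative and assume query max memory is below 6e13 bytes in the test session):

    // both individual memory estimates exceed the limit, so the comparator should
    // prefer the lower-memory plan even though its CPU cost is higher
    new CostComparisonAssertion(1.0, 0.0, 0.0)
            .smaller(1000, 6e13, 100)
            .larger(100, 7e13, 100)
            .assertCompare();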

@@ -59,6 +60,11 @@ public int compare(Session session, PlanNodeCostEstimate left, PlanNodeCostEstimate right)
     requireNonNull(right, "right is null");
     checkArgument(!left.hasUnknownComponents() && !right.hasUnknownComponents(), "cannot compare unknown costs");

+    long queryMaxMemory = getQueryMaxMemory(session).toBytes();
+    if (left.getIndividualMemoryCost() > queryMaxMemory || right.getIndividualMemoryCost() > queryMaxMemory) {
Contributor

To me it is a hack that you are changing the generic compare method. I would prefer to have a dedicated method to handle the case of plan nodes with high memory consumption.
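
i.e. something along these lines (the method name is hypothetical):

    // a dedicated check, so compare() keeps its generic weighted-cost semantics
    public boolean exceedsMemoryLimit(Session session, PlanNodeCostEstimate estimate)
    {
        long queryMaxMemory = getQueryMaxMemory(session).toBytes();
        return estimate.getIndividualMemoryCost() > queryMaxMemory;
    }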

@findepi
Contributor

findepi commented Sep 18, 2018

@cameronreynoldson @mbasmanova @rschlussel I don't think this problem is a matter of a simple patch.
To me, what we currently have is a quickfix which helps in some situations but not in others.

If we want a quickfix only, I would consider the following options (see the sketch after this list):

  • compare plans by memory usage first (i.e. assign memory cost a much higher weight)
  • compare plans by memory usage first if the memory cost exceeds a certain threshold (relative to query max memory) -- we had this initially, but we dropped it, since memory cost doesn't estimate peak memory usage
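
A sketch of the first option (names hypothetical): the comparator already combines the three components with configurable weights, cf. CostComparisonAssertion(1.0, 0.0, 0.0) in the tests, so the quickfix amounts to making the memory weight dominate:

    // weighted score as the comparator computes it; setting memoryWeight much higher
    // than cpuWeight and networkWeight makes memory differences decide first
    static double weightedCost(PlanNodeCostEstimate estimate, double cpuWeight, double memoryWeight, double networkWeight)
    {
        return cpuWeight * estimate.getCpuCost()
                + memoryWeight * estimate.getMemoryCost()
                + networkWeight * estimate.getNetworkCost();
    }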

If we want a proper solution, I think some design doc/discussion would be welcome.

@sopel39
Contributor

sopel39 commented Sep 18, 2018

Currently the CBO memory estimate has rather undefined semantics. I think it is supposed to measure peak memory, but it doesn't currently do that. What it does instead is sum the peak memory of each operator without taking the execution timeline into account. This makes the estimate pessimistic in plans with joins (the build side finishes, thus freeing the memory of that subplan).
Additionally, we should probably go operator by operator to verify that the memory estimate is reasonable.
I would suggest improving peak memory estimation before proceeding with this PR. Otherwise I think there should be some feature toggle to disable or parametrize this behavior.
As @findepi suggested, for now one can tweak cost weights to put more emphasis on memory.

@mbasmanova
Contributor

@findepi @sopel39 @kokosing These are all fair points. While evaluating CBO on production queries, some queries failed due to the memory limit after being rewritten to use a broadcast join. Hence, we were wondering if there would be a way to make CBO aware of the memory limit and avoid plans that exceed that limit.

It is indeed not clear how this can be achieved right now. In particular, it seems like there is no infrastructure to compare costs of the whole plans and not just the sub-plans rooted at the nodes being modified. Also, the costs are compared before all the optimizations have been applied.

> I would suggest improving peak memory estimation before proceeding with this PR.

Seems reasonable. Are there any specific places that are known to have issues?

> one can tweak cost weights to put more emphasis on memory

What we found is that broadcast joins tend to use about the same amount of CPU as distributed joins, but a lot more peak memory. At the same time, broadcast joins have significantly lower wall times (due to better handling of data skew). In practice, lower wall times are good, because users experience lower latency and memory gets released faster. Hence, we don't want to put higher weight on memory, unless the amount of memory needed for the query exceeds the limit.

@findepi
Contributor

findepi commented Sep 18, 2018

> It is indeed not clear how this can be achieved right now. In particular, it seems like there is no infrastructure to compare costs of the whole plans and not just the sub-plans rooted at the nodes being modified.

Correct. It was also apparent in one of the TPC-DS queries.
I would say a memo-based (exploratory) optimizer is the answer. Other than that, I don't have a better (quicker) answer.

Maybe if we solve all the problems that are definitely solvable within the current framework, production queries will already be happy?

> I would suggest improving peak memory estimation before proceeding with this PR.

> Seems reasonable. Are there any specific places that are known to have issues?

@mbasmanova You mean where peak memory deviates from what we calculate today?

Things to consider:

  • are we calculating memory usage correctly (e.g. does PagesIndex introduce some overhead that we need to care about? we used to disregard is-null arrays, which can add a noticeable percentage; there might be other cases)
  • agg
    • operators below the final agg and above it cannot run in parallel (but the final agg can run in parallel with those below and with those above)
  • join, sort -- something like the above
  • thus, for a plan, you need to calculate two things, "peak memory usage of a subtree" and "peak memory usage while the top node is executing/outputting" -- this allows you to calculate the metric bottom-up, just like any other cost (see the sketch below)

If you wish, I can try to formalize this a bit.
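
A rough sketch of that bottom-up recursion (all names are hypothetical, and the rule for which subtrees overlap in time is deliberately simplified; per-operator rules for agg/join/sort would refine it):

    static class PeakMemory
    {
        final double subtreePeak;       // max memory held at any point while executing the subtree
        final double whileOutputting;   // memory still held while the subtree's root produces output

        PeakMemory(double subtreePeak, double whileOutputting)
        {
            this.subtreePeak = subtreePeak;
            this.whileOutputting = whileOutputting;
        }
    }

    PeakMemory estimatePeakMemory(PlanNode node)
    {
        List<PeakMemory> children = node.getSources().stream()
                .map(this::estimatePeakMemory)
                .collect(toList());
        // memory held while this node runs: its own allocation plus whatever its
        // children still hold while streaming into it (a simplification)
        double whileOutputting = ownMemory(node)    // ownMemory() is a hypothetical per-node estimate
                + children.stream().mapToDouble(c -> c.whileOutputting).sum();
        // the subtree peak is reached either earlier, inside some child subtree,
        // or now, while this node is executing
        double subtreePeak = Math.max(
                whileOutputting,
                children.stream().mapToDouble(c -> c.subtreePeak).max().orElse(0));
        return new PeakMemory(subtreePeak, whileOutputting);
    }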

> What we found is that broadcast joins tend to use about the same amount of CPU as distributed joins, but a lot more peak memory.

@mbasmanova "use" -- do you mean estimates or actual resource usage on the cluster?
The cost model assumes there is a certain cost of building the LookupSource, which is multiplied in the case of a broadcast join (every node builds its own LS).
Or maybe the cost of building the LS is negligible compared to the cost of processing the probe table (a join of tables with very different sizes)?

It might be that the relative CPU weights between nodes/operators are assigned "suboptimally". Today building an LS costs the same as a Filter, a Projection, etc.

@findepi
Contributor

findepi commented Sep 19, 2018

I am eager to work on this.
Just not this week :/

@mbasmanova
Contributor

Superseded by #11667

mbasmanova closed this Oct 9, 2018