CBO: Do not consider plans that exceed memory limits #11495
Conversation
CC: @findepi @rschlussel
@cameronreynoldson thanks for the PR. This looks like a good start on how to compare costs. I don't see the constructor that has individual costs and cumulative cost being used, so you still need to have the costCalculator include the individual cost when it's creating cost estimates.
    private final double cpuCost;
    private final double memoryCost;
    private final double networkCost;

-   public PlanNodeCostEstimate(double cpuCost, double memoryCost, double networkCost)
+   public PlanNodeCostEstimate(double individualCpuCost, double individualMemoryCost, double individualNetworkCost)
this constructor may be useful for testing, but in general we should use the one below.
I left this constructor here because it is used in several other places and seems to be used to pass in the initial costs. If this constructor were to disappear, we would be explicitly passing in the same costs to both the individual and cumulative parameters. Is that what you're getting at?
oh i see, it becomes cumulative after the "add" method. I would put a comment making that clearer and also make the other constructor private. Additionally, instead of having two constructors that initialize the fields, call the other constructor from here.
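The constructor-delegation suggestion could look roughly like this. A stripped-down, hypothetical sketch (one cost dimension only, not the real Presto class):

```java
// Hypothetical sketch of the suggested constructor layout, reduced to the
// CPU dimension. The public constructor delegates to the private one; the
// cumulative cost starts out equal to the individual cost and only diverges
// once add() accumulates child estimates.
class PlanNodeCostEstimateSketch
{
    private final double individualCpuCost; // cost of this node alone
    private final double cpuCost;           // cumulative: sub-tree total via add()

    PlanNodeCostEstimateSketch(double cpuCost)
    {
        // A fresh estimate's cumulative cost equals its individual cost.
        this(cpuCost, cpuCost);
    }

    private PlanNodeCostEstimateSketch(double individualCpuCost, double cpuCost)
    {
        this.individualCpuCost = individualCpuCost;
        this.cpuCost = cpuCost;
    }

    PlanNodeCostEstimateSketch add(PlanNodeCostEstimateSketch other)
    {
        // Individual cost stays that of this node; cumulative cost accumulates.
        return new PlanNodeCostEstimateSketch(individualCpuCost, cpuCost + other.cpuCost);
    }

    double getIndividualCpuCost()
    {
        return individualCpuCost;
    }

    double getCpuCost()
    {
        return cpuCost;
    }
}
```

With this shape there is a single place where fields are assigned, and the "becomes cumulative after add()" behavior is explicit in the delegating constructor.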
    {
        return new PlanNodeCostEstimate(0, 0, individualNetworkCost);
    }

    public static PlanNodeCostEstimate cpuCost(double cpuCost)
Where do these get used? Right now they are exactly the same as the individualCost methods. Figure out whether that's the correct behavior, and if so, get rid of one set of methods.
Spoke to @mbasmanova. Looks good to both of us. Can you squash the commits together and rename the commit title to "Choose lower memory plan if a memory estimate exceeds limit"?
Force-pushed from 503cafc to a4cb6a5
-    * Returns CPU component of the cost. Unknown value is represented by {@link Double#NaN}
+    * Returns individual CPU component of the cost. Unknown value is represented by {@link Double#NaN}
     */
    public double getIndividualCpuCost()
What does "individual" mean in this context? Individual with respect to what?
"individual" here refers to the cost of the node itself; before this change, the estimate was tracking the cumulative cost of the whole sub-tree rooted at that node.
In that case, I'd suggest we rename the existing methods to getCumulativeCpuCost, etc., and make these look like getCpuCost. The term "individual" is confusing, whereas "cumulative" has a much more familiar connotation, especially in the context of a tree.
I think "individual cost" (memory/cpu/network) of a single node is not the right abstraction.
If, for example, we have an identity projection over one of the Joins and no projection over the other Join, we would be comparing apples to pears -- the individual cost (memory/cpu/network) of the projection will be negligible.
What we need instead is something that estimates the cost of the whole plan in 4 dimensions (3 existing and 1 new):
- (cumulative) CPU cost
- (cumulative) network cost
- (cumulative) memory cost (maybe this will be actually obsolete)
- peak memory usage
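A minimal sketch of such a 4-dimensional estimate (hypothetical class and field names, not Presto's API). Note that how peak memory composes across sub-plans depends on the execution timeline; the sum below assumes all sub-plans hold their peaks concurrently, which is exactly the pessimistic bound discussed later in this thread:

```java
// Hypothetical 4-dimensional plan cost per the list above. Summing peak
// memory assumes all sub-plans peak at the same time (pessimistic); a
// timeline-aware model would be needed for a tighter peak estimate.
final class PlanCostSketch
{
    final double cpuCost;     // cumulative CPU cost
    final double networkCost; // cumulative network cost
    final double memoryCost;  // cumulative memory cost (possibly obsolete)
    final double peakMemory;  // estimated peak memory usage

    PlanCostSketch(double cpuCost, double networkCost, double memoryCost, double peakMemory)
    {
        this.cpuCost = cpuCost;
        this.networkCost = networkCost;
        this.memoryCost = memoryCost;
        this.peakMemory = peakMemory;
    }

    PlanCostSketch add(PlanCostSketch other)
    {
        return new PlanCostSketch(
                cpuCost + other.cpuCost,
                networkCost + other.networkCost,
                memoryCost + other.memoryCost,
                // Pessimistic placeholder: assumes both peaks overlap in time.
                peakMemory + other.peakMemory);
    }
}
```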
@@ -85,13 +138,17 @@ public double getNetworkCost()
     */
    public boolean hasUnknownComponents()
    {
-       return isNaN(cpuCost) || isNaN(memoryCost) || isNaN(networkCost);
+       return isNaN(individualCpuCost) || isNaN(individualMemoryCost) || isNaN(individualNetworkCost) ||
+               isNaN(cpuCost) || isNaN(memoryCost) || isNaN(networkCost);
    }

    @Override
    public String toString()
@rschlussel @cameronreynoldson Will this change affect the query plan? E.g. is this method used in PlanPrinter?
This could probably be changed back to how it was, given that one of the cumulative costs would have to be unknown if any of the individual costs were unknown.
it won't affect the query plan unless you have cbo on. PlanPrinter just gets the stats and costs of the final plan
there aren't any cases where the individual cost would be unknown and the cumulative cost wouldn't be.
                Double.compare(that.memoryCost, memoryCost) == 0 &&
                Double.compare(that.networkCost, networkCost) == 0;
    }

    @Override
    public int hashCode()
    {
-       return Objects.hash(cpuCost, memoryCost, networkCost);
+       return Objects.hash(individualCpuCost, individualMemoryCost, individualNetworkCost,
+               cpuCost, memoryCost, networkCost);
    }

    public PlanNodeCostEstimate add(PlanNodeCostEstimate other)
@rschlussel @cameronreynoldson Before this change, the add method was simply adding two estimates. E.g. there was no implicit assumption that the estimates being added are for the source nodes. In fact, this doesn't always seem to be the case; see com.facebook.presto.cost.CostCalculatorWithEstimatedExchanges.ExchangeCostEstimator#visitAggregation:
@Override
public PlanNodeCostEstimate visitAggregation(AggregationNode node, Void context)
{
PlanNodeStatsEstimate sourceStats = getStats(node.getSource());
List<Symbol> sourceSymbols = node.getSource().getOutputSymbols();
PlanNodeCostEstimate remoteRepartitionCost = CostCalculatorUsingExchanges.calculateExchangeCost(
numberOfNodes,
sourceStats,
sourceSymbols,
REPARTITION,
REMOTE,
types);
PlanNodeCostEstimate localRepartitionCost = CostCalculatorUsingExchanges.calculateExchangeCost(
numberOfNodes,
sourceStats,
sourceSymbols,
REPARTITION,
LOCAL,
types);
// TODO consider cost of aggregation itself, not only exchanges, based on aggregation's properties
return remoteRepartitionCost.add(localRepartitionCost);
}
Would it be worth our effort to change the uses of PlanNodeCostEstimate to follow our new pattern of initializing with the individual costs and using add() for cumulative costs?
    }

    public PlanNodeCostEstimate add(PlanNodeCostEstimate other)
    {
        return new PlanNodeCostEstimate(
                individualCpuCost,
                individualMemoryCost,
                individualNetworkCost,
Why not the other individualNetworkCost? add until this point was commutative. Maybe Double.NaN would be better here.
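The non-commutativity being pointed out is easy to demonstrate with a minimal, hypothetical class (memory dimension only): if add() keeps only the left operand's individual cost, a.add(b) and b.add(a) agree on the cumulative sum but disagree on the individual component.

```java
// Minimal illustration (hypothetical class) of the asymmetry: add() keeps
// this node's individual cost but sums the cumulative costs, so swapping
// the operands changes the individual component of the result.
final class CostSketch
{
    final double individualMemory; // this node only
    final double memory;           // cumulative

    CostSketch(double individualMemory, double memory)
    {
        this.individualMemory = individualMemory;
        this.memory = memory;
    }

    CostSketch add(CostSketch other)
    {
        // Keeps the left operand's individual cost -- the source of the asymmetry.
        return new CostSketch(individualMemory, memory + other.memory);
    }
}
```

Using Double.NaN for the individual component of a sum would at least make the result symmetric, at the cost of marking it "unknown".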
@@ -85,13 +138,17 @@ public double getNetworkCost()
     */
    public boolean hasUnknownComponents()
    {
-       return isNaN(cpuCost) || isNaN(memoryCost) || isNaN(networkCost);
+       return isNaN(individualCpuCost) || isNaN(individualMemoryCost) || isNaN(individualNetworkCost) ||
+               isNaN(cpuCost) || isNaN(memoryCost) || isNaN(networkCost);
Put each one on a separate line.
@@ -59,6 +60,11 @@ public int compare(Session session, PlanNodeCostEstimate left, PlanNodeCostEstim
        requireNonNull(right, "right is null");
        checkArgument(!left.hasUnknownComponents() && !right.hasUnknownComponents(), "cannot compare unknown costs");

+       long queryMaxMemory = getQueryMaxMemory(session).toBytes();
It would be nice to use a dedicated session property here, which could be set to query max memory by default. This is because the memory cost is not exactly the memory consumption.
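The fallback behavior being suggested could be sketched like this. Both the property name and the resolver are hypothetical, and this is not Presto's actual session-property API:

```java
// Hedged sketch: resolve the memory limit used for cost comparison from a
// dedicated (hypothetical) session property, e.g. "cost_comparison_memory_limit",
// defaulting to query max memory when the property is unset.
final class MemoryLimitResolver
{
    // dedicatedLimitBytes is null when the hypothetical property was not set.
    static long costComparisonMemoryLimit(Long dedicatedLimitBytes, long queryMaxMemoryBytes)
    {
        return dedicatedLimitBytes != null ? dedicatedLimitBytes : queryMaxMemoryBytes;
    }
}
```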
new CostComparisonAssertion(1.0, 0.0, 0.0)
        .smaller(1000, 100, 100)
        .larger(100, 6e13, 100)
        .assertCompare();
Can you also add a test that compares the case where individual memory costs are higher than query max memory?
@@ -59,6 +60,11 @@ public int compare(Session session, PlanNodeCostEstimate left, PlanNodeCostEstim
        requireNonNull(right, "right is null");
        checkArgument(!left.hasUnknownComponents() && !right.hasUnknownComponents(), "cannot compare unknown costs");

+       long queryMaxMemory = getQueryMaxMemory(session).toBytes();
+       if (left.getIndividualMemoryCost() > queryMaxMemory || right.getIndividualMemoryCost() > queryMaxMemory) {
To me it is like a hack that you are changing the generic compare method. I would prefer to have a dedicated method to handle the case of high-memory-consumption plan nodes.
@cameronreynoldson @mbasmanova @rschlussel I don't think this problem is a matter of a simple patch. If we want a quick fix only, I would consider a few options. If we want a proper solution, I think some design doc/discussion would be welcome.
Currently the CBO memory estimate has rather undefined semantics. I think it is supposed to measure peak memory, but it doesn't do that currently. What it does instead is sum the peak memory of each operator without taking the execution timeline into account. This makes the estimate pessimistic in plans with joins (the build side finishes, thus freeing the memory of that subplan).
@findepi @sopel39 @kokosing These are all fair points. While evaluating CBO on production queries, some queries failed due to the memory limit after being rewritten to use broadcast join. Hence, we were wondering whether there would be a way to make the CBO aware of the memory limit and avoid plans that exceed that limit. It is indeed not clear how this can be achieved right now. In particular, it seems like there is no infrastructure to compare costs of whole plans, as opposed to just the sub-plans rooted at the nodes being modified. Also, the costs are compared before all the optimizations have been applied.
Seems reasonable. Are there any specific places that are known to have issues?
What we found is that broadcast joins tend to use about the same amount of CPU as distributed joins, but a lot more peak memory. At the same time, broadcast joins have significantly lower wall times (due to better handling of data skew). In practice, lower wall times are good, because users experience lower latency and memory gets released faster. Hence, we don't want to put higher weight on memory, unless the amount of memory needed for the query exceeds the limit.
Correct. It was also apparent in one of the TPC-DS queries. Maybe if we solve all the problems that are definitely solvable within the current framework, production queries will already be happy?
@mbasmanova You mean where peak memory deviates from what we calculate today? Things to consider
If you wish, I can try to formalize this a bit.
@mbasmanova "use" -- you mean estimates or actual resource usage on the cluster? It might be that relative CPU weights between nodes/operators are assigned "suboptimally". Today building LS costs the same as Filter, as Projection, etc.
I am eager to work on this.
Superseded by #11667
In the CostComparator, check the maximum amount of memory a query is allowed to have at any point in time. If a plan's memory estimate exceeds it, choose the plan using less memory.
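That behavior could be sketched as follows. This is a hypothetical, simplified comparator, not the real CostComparator (which weighs cpu/memory/network costs; that part is stubbed out here):

```java
// Simplified sketch of the described behavior: if either plan's memory
// estimate exceeds the limit, prefer the plan using less memory; otherwise
// fall back to the regular weighted cost comparison (stubbed out below).
final class MemoryAwareComparatorSketch
{
    // Returns <0 if left is preferred, >0 if right is preferred, 0 if tied.
    static int compare(double leftMemory, double rightMemory, long memoryLimitBytes)
    {
        if (leftMemory > memoryLimitBytes || rightMemory > memoryLimitBytes) {
            // At least one plan is expected to exceed the limit:
            // choose the lower-memory plan regardless of the other cost weights.
            return Double.compare(leftMemory, rightMemory);
        }
        return 0; // placeholder for the regular weighted cost comparison
    }
}
```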