Peak memory cost calculation #11591
Didn't see WIP. Anyhow, since I already spent time on commenting, here are some comments =)
public static final PlanNodeCostEstimate INFINITE_COST = new PlanNodeCostEstimate(POSITIVE_INFINITY, POSITIVE_INFINITY, POSITIVE_INFINITY);
public static final PlanNodeCostEstimate UNKNOWN_COST = new PlanNodeCostEstimate(NaN, NaN, NaN);
public static final PlanNodeCostEstimate ZERO_COST = new PlanNodeCostEstimate(0, 0, 0);
public static final PlanNodeCostEstimate INFINITE_COST = new PlanNodeCostEstimate(POSITIVE_INFINITY, POSITIVE_INFINITY, POSITIVE_INFINITY, POSITIVE_INFINITY);
It would be nicer if they were `PlanNodeCostEstimate.infinity()`, `PlanNodeCostEstimate.unknown()`, `PlanNodeCostEstimate.zero()`. You can change them in a separate commit to gain extra brownie points =)
I will take this opportunity when I am terribly indebted 😛
private final double cpuCost;
private final double memoryCost;
private final double peakMemoryCost;
private final double peakMemoryWhenOutputting;
Could you please explain, at least in the commit message, what's the difference between `peakMemoryCost` and `peakMemoryWhenOutputting`?
Sorry for not giving this explanation upfront.
So, the basic idea is as outlined in #11495 (comment): to calculate peak memory usage of a query.
To calculate this recursively, we need to know for every node in a plan tree:
- peak memory usage of the subtree
- peak memory usage when the node is outputting (producing output) -- this is required to compute peak memory usage of the plan tree enclosing the given node
- this is based on the observation that until a given node is outputting, plan nodes higher up need no memory (they have nothing to process)
- also, some nodes will consume all input before outputting, so when they are outputting, their subtrees no longer need memory
- the two properties of source nodes -- peak memory usage and peak memory usage when the node is outputting -- combined with node-specific formulas should allow computing the same properties higher up, realizing peak memory usage calculation for the whole plan
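The recursion described above can be sketched as follows. This is a hypothetical illustration: the class and method names are made up for this sketch and are not the PR's actual code.

```java
// Hypothetical sketch (illustrative names, not the PR's code) of propagating
// the two memory properties up a plan tree.
public class MemoryEstimateSketch
{
    final double peakMemory;               // peak memory usage of the whole subtree
    final double peakMemoryWhenOutputting; // memory still held while the subtree's root produces output

    MemoryEstimateSketch(double peakMemory, double peakMemoryWhenOutputting)
    {
        this.peakMemory = peakMemory;
        this.peakMemoryWhenOutputting = peakMemoryWhenOutputting;
    }

    // Node-specific formula for a node that buffers all input before producing
    // output (e.g. a final aggregation) and holds `localMemory` of its own.
    static MemoryEstimateSketch blockingNode(double localMemory, MemoryEstimateSketch source)
    {
        // While this node consumes input, its source is outputting, so the subtree
        // needs at most source.peakMemoryWhenOutputting + localMemory; before that,
        // the source alone determines the peak.
        double peak = Math.max(source.peakMemory, source.peakMemoryWhenOutputting + localMemory);
        // Once this node outputs, the source is fully consumed and frees its memory,
        // so only localMemory remains held.
        return new MemoryEstimateSketch(peak, localMemory);
    }

    public static void main(String[] args)
    {
        MemoryEstimateSketch source = new MemoryEstimateSketch(100, 40);
        MemoryEstimateSketch agg = blockingNode(30, source);
        System.out.println(agg.peakMemory + " " + agg.peakMemoryWhenOutputting); // prints 100.0 30.0
    }
}
```

The key point of the sketch is that the pair of numbers is enough to compute the same pair one level up, without re-walking the subtree.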
Now, the ugly part. In the current cost calculation code, PlanNodeCostEstimate conveniently represents two things:
- local cost of the node
- total cost of the plan subtree

It's no wonder we reuse a single class for this, as currently the total cost of the plan subtree is just a sum of local costs of all nodes.
After this change, this design no longer fully fits:
- when calculating local cost of the node, we should calculate `[cpu, memory, network]`
- when calculating total cost of the plan subtree, we should calculate `[cpu, peak memory, network]` (this can still be the same type as local cost of the node), but we temporarily need to hold `[cpu, peak memory, peak memory when outputting, network]`
- it might be that the "extended" type is just an inner thing in the `CostProvider` implementation, so maybe the overall design still holds
  - ... edit: oh it can't. `Memo` needs to keep this "extended" type as well
- summing cost vectors does not always make sense

For now, I didn't address the design part at all. I am just conveniently reusing `PlanNodeCostEstimate` everywhere.
@Override
public String toString()
{
    return toStringHelper(this)
            .add("cpu", cpuCost)
            .add("memory", memoryCost)
            .add("memory", peakMemoryCost)
Change the toString method to include both.
I intentionally left out the other, as it's irrelevant from a cost-comparison perspective.
PlanNodeCostEstimate sourcesCost = sourceProvider.getSources(node).stream()
        .map(this::getCumulativeCost)
        .reduce(ZERO_COST, PlanNodeCostEstimate::add);

PlanNodeCostEstimate cumulativeCost = localCosts.add(sourcesCost);
PlanNodeCostEstimate cumulativeCost = node.accept(new CostCombiner(), new CostCombiner.Context(localCost, sourcesCost));
verify(cumulativeCost != null);
Add a message in case it ever fails.
In this unlikely case, a stacktrace should suffice.
{
    PlanNodeCostEstimate cost = context.localCost.add(context.sourcesCost);

    verify(Doubles.compare(context.localCost.getPeakMemoryCost(), context.localCost.getPeakMemoryWhenOutputting()) == 0);
`Doubles.compare` delegates to `Double.compare`. But you don't need it here. It is needed if deterministic comparison with `NaN` is needed, or if `-0` vs `0` differentiation is needed. I'm not sure if that's something you expect here. I think a simple `context.localCost.getPeakMemoryCost() == context.localCost.getPeakMemoryWhenOutputting()` will work here, since `NaN == NaN => FALSE` and `-0 == 0 => TRUE`:

    public static void main(String[] args)
    {
        System.out.println("NaN == NaN: " + (NaN == NaN));
        System.out.println("-0 == 0: " + (-0.0 == 0.0));
    }

    NaN == NaN: false
    -0 == 0: true
Also please add a meaningful message in case it ever fails.
I needed something that will tell "OK" also for "NaN vs NaN".
Just want to be sure I understand the approach correctly:

if (node.getStep() == AggregationNode.Step.SINGLE || node.getStep() == AggregationNode.Step.FINAL) {
    return cost.withMemory(
            Math.max(context.sourcesCost.getPeakMemoryCost(), context.sourcesCost.getPeakMemoryWhenOutputting() + aggregationMemory),
Can you explain this computation? I don't understand it.
Final/Single agg lifecycle is:
- idle, waiting for data
  - at this stage, it uses no memory, so peak memory of the whole subtree is determined by peak memory of the subplans
- consuming input
  - at this stage, it allocates memory up to `aggregationMemory`; sources are outputting, so we know that subplans' memory usage is no more than `getPeakMemoryWhenOutputting()`; thus memory usage is no greater than `sourcesCost.getPeakMemoryWhenOutputting() + aggregationMemory`
- producing output
  - at this stage, it slowly deallocates memory (so memory usage is bounded by `aggregationMemory`); input is fully consumed, so it uses no memory at all; thus memory usage is no greater than `aggregationMemory` (which in turn is no more than `sourcesCost.getPeakMemoryWhenOutputting() + aggregationMemory`)

Does this explanation help?
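To make the arithmetic concrete, here is a small hedged example applying the `Math.max` formula from the snippet above. The numbers and the helper method name are made up for illustration; they are not Presto's API.

```java
// Illustrative numbers only: a final aggregation whose subtree peaks at 500
// units, holds 200 units while outputting, and itself needs up to 300 units.
public class AggPeakMemoryExample
{
    static double finalAggPeakMemory(double sourcesPeak, double sourcesPeakWhenOutputting, double aggregationMemory)
    {
        // consuming-input phase: sources are outputting (<= sourcesPeakWhenOutputting)
        // while the aggregation fills up to aggregationMemory;
        // idle phase: bounded by sourcesPeak alone;
        // producing-output phase: bounded by aggregationMemory, already covered above
        return Math.max(sourcesPeak, sourcesPeakWhenOutputting + aggregationMemory);
    }

    public static void main(String[] args)
    {
        System.out.println(finalAggPeakMemory(500, 200, 300)); // prints 500.0
        System.out.println(finalAggPeakMemory(400, 200, 300)); // prints 500.0
    }
}
```

In the first call the subtree's own peak dominates; in the second, the consuming-input phase (200 + 300) dominates.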
@arhimondr @rschlussel your review is appreciated! Getting comments was the very purpose of this PR anyway. I will read through the comments tomorrow.
I am copying the explanation I put in an inline comment (#11591 (comment)), so that it doesn't get hidden by GH.
I've reorganized the code a bit. @arhimondr @rschlussel here's how it should be read:
Note: first commit (
@findepi Piotr, there is a lot of discussion of the proposed design scattered through the comments. Would you consolidate these into the PR description and keep it up-to-date as the discussion evolves? This will allow more robust and careful review of the final design (and assist in reviewing the implementation). I have a few questions:

Thoughts?
This is exactly what we used to have.
Yes, this is the case wherever we have some kind of long-lived buffering: build side of a join, final/single aggregation, sort, TopN (... ?). Does it help?
Do you have a few concrete examples? I understand this in principle, but I'm having a hard time constructing such a query. I'm also trying to figure out how to scan a lot of query plans and automatically detect potential beneficiaries of this change. Also, do you happen to know what's the difference between memory cost and peak memory of a node?
I don't have context of this work. But here is an example. In the following example, peak memory is
private static class MemoryEstimate
{
    private final double maxMemoryBeforeOutputting;
Add comments/javadoc about what all these things are.
I would say something like "max memory allocation before the node starts outputting". Would that be more helpful?
@Override
protected MemoryEstimate visitPlan(PlanNode node, EstimationContext context)
{
    if (context.sourcesCosts.isEmpty()) {
Is this just tablescans and values nodes? I'd rather have explicit methods for those.
Maybe I was just lazy... OTOH even `com.facebook.presto.cost.CostCalculatorUsingExchanges.CostEstimator#visitPlan` has "generic" handling (or lack thereof) of many node types. I will try to get rid of it.
> I will try to get rid of it.

Just a note: getting rid of this shortcut may lead to having many copy-paste-like methods.
I'm ok leaving it if leaf types would always have this kind of cost, just leave a comment explaining why.
}

// A PlanNode that is estimated to use memory should have an explicit rule here.
verify(isNaN(context.localMaxMemory) || context.localMaxMemory == 0, "Unexpected local memory usage estimation for %s: %s", node, context.localMaxMemory);
What sort of plan shape would get us here? Also, when would you get 0 vs. NaN?
> what sort of plan shape would get us here?

I don't understand the question.

> Also, when would you get 0 vs. NaN?

NaN is when `CostCalculatorUsingExchanges` or `CostCalculatorWithEstimatedExchanges` cannot calculate the local cost of the plan node. 0 is when they can calculate it and consider it 0.
- I guess I'm trying to understand why all things with 0/NaN cost should have their memory output treated the same, but there shouldn't be a default for things with cost estimates.
- If the local cost estimate is NaN, MemoryEstimate should be NaN.
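As background for the NaN-propagation point, Java's floating-point arithmetic already propagates NaN through the operations typically used to combine estimates, which supports yielding an unknown (NaN) memory estimate whenever a local cost is unknown rather than silently defaulting it:

```java
// Demonstrates that NaN propagates through addition and Math.max, and that
// NaN never compares equal to itself.
public class NanPropagation
{
    public static void main(String[] args)
    {
        double unknown = Double.NaN;
        System.out.println(unknown + 100.0);          // prints NaN
        System.out.println(Math.max(unknown, 100.0)); // prints NaN
        System.out.println(unknown == unknown);       // prints false
    }
}
```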
PlanCostEstimate sourcesCost = context.sourcesCosts.stream()
        .reduce(PlanCostEstimate.zero(), CachingCostProvider::addSiblingsCost);

return new MemoryEstimate(
Why are these the costs?
Please elaborate?
I was just asking if you could explain where the numbers come from.
.reduce(PlanCostEstimate.zero(), CachingCostProvider::addSiblingsCost);

return new MemoryEstimate(
        sourcesCost.getMaxMemory(),
Can you explain this?
@findepi Is this ready to be reviewed?

@arhimondr yes, but I am yet to apply some comments.
I've got a feeling that stats/cost calculations become too tangled in this PR. I think we need to modify `CostCalculator` so that it computes cumulative cost. Then `CostCalculator` could also derive the peak memory estimation. It seems a natural place for such computation instead of `CachingCostProvider`.

I also think it would be better if we simply added the peak memory estimate (possibly package-private) to `PlanNodeCostEstimate`. Ideally mem and cost estimates should be evaluated by different calculators, but it's not needed now.
PlanCostEstimate cumulativeCost = cache.get(node);
if (cumulativeCost == null) {
    cumulativeCost = calculateCumulativeCost(node);
    verify(cache.put(node, cumulativeCost) == null, "Cost already set");
How can it be set when you just checked that `cache.get` is null?
import static com.google.common.base.Preconditions.checkArgument;
import static java.lang.Double.isNaN;

public final class PlanCostEstimate
This is an artificial class. I think it would be better to:
1. rename it to something more explicit: `PlanNodeMemoryEstimate` that would store just memory estimates
2. rename it to `PlanNodeMemoryEstimate` and make it include `PlanNodeCostEstimate otherCosts`
3. simply store memory estimates as part of `PlanNodeCostEstimate`

I prefer 3)
@@ -36,7 +47,7 @@
    private final Session session;
    private final TypeProvider types;

    private final Map<PlanNode, PlanNodeCostEstimate> cache = new IdentityHashMap<>();
Logically it seems we are missing one explicit calculator for the stuff here to be clean. We should have 3 calcs:
1. `StatsCalculator`
2. `PeakMemoryCalculator` - depends on 1)
3. `CostCalculator` - depends on 1) and 2)

For now I think we can make `PlanNodeCostEstimate` contain memory estimates also as package-private fields.
return new PlanCostEstimate(
        a.getCpuCost() + b.getCpuCost(),
        a.getMaxMemory() + b.getMaxMemory(),
        a.getMaxMemoryWhenOutputting() + b.getMaxMemoryWhenOutputting(),
Put `NaN` here, as this value has no meaning.
extends PlanVisitor<MemoryEstimate, EstimationContext>
{
    @Override
    protected MemoryEstimate visitPlan(PlanNode node, EstimationContext context)
The context suggests there is some recursion here, but there isn't. Just add fields to `PeakMemoryEstimator`:

    double operatorMemory;
    List<PlanCostEstimate> sourcesCosts;

or:

    PlanCostEstimate operatorCost;
    List<PlanCostEstimate> sourcesCosts;
How would that be any better?
It doesn't suggest there is a recursion, so it's easier to read and understand.
{
    PlanNodeCostEstimate localCosts = costCalculator.calculateCost(node, statsProvider, session, types);
    PlanNodeCostEstimate localCost = costCalculator.calculateCost(node, statsProvider, session, types);
To compute cumulative `PlanNodeCostEstimate` we need to compute `PlanCostEstimate`. To compute `PlanCostEstimate` we need to compute `PlanNodeCostEstimate` (this time locally) and `PlanNodeStatsEstimate`. So to compute `PlanNodeCostEstimate` (cumulative) we need to compute `PlanNodeCostEstimate` (individual). Can we untangle it?

Let's not derive `EstimationContext#localMaxMemory` from `PlanNodeCostEstimate`, but rather compute it directly in `PeakMemoryEstimator`. It seems a natural place for such computation.
Comments not addressed, see: #11591 (comment)
Non-changes in this commit:
- `CostCalculator`s still calculate local `[cpu, memory, network]` cost, as `PlanNodeCostEstimate`
- `CachingCostProvider` still returns cumulative `[cpu, memory, network]` cost, as `PlanNodeCostEstimate`

Changes in this commit:
- a new type is introduced, `PlanCostEstimate`, which holds more information than `PlanNodeCostEstimate`
- this new type is used by `CachingCostProvider` (the only `CostProvider` implementation) to calculate peak memory usage
- this new type is kept in `Memo` as well, since `Memo` serves as storage space for `CachingCostProvider` wherever it encounters a `GroupReference`
- `PlanNodeCostEstimate#memory` returned from `CostProvider` is now an estimation of peak memory usage, rather than just a sum of `PlanNodeCostEstimate#memory` cost components of all sub-nodes
I discussed with @findepi and decided that we could move forward with this PR (as it brings a useful feature) with a possibility to extract abstractions later on. Next step to abstract

In that regard caching won't be specific to any particular trait.
return new PlanCostEstimate(
        localCost.getCpuCost() + sourcesCost.getCpuCost(),
        Math.max(sourcesCost.getMaxMemory(), Math.max(memoryEstimate.maxMemoryBeforeOutputting, memoryEstimate.maxMemoryWhenOutputting)),
Static-import `max`.
@@ -35,7 +77,7 @@
    private final Session session;
    private final TypeProvider types;

    private final Map<PlanNode, PlanNodeCostEstimate> cache = new IdentityHashMap<>();
    private final Map<PlanNode, PlanCostEstimate> cache = new IdentityHashMap<>();
It seems that what is called `PlanCostEstimate` must be called `PlanNodeCostEstimate`, and what is currently called `PlanNodeCostEstimate` must be called `PlanCostEstimate`.
verify(memoryEstimate != null);

PlanCostEstimate sourcesCost = sourcesCosts.stream()
        .reduce(PlanCostEstimate.zero(), CachingCostProvider::addSiblingsCost);
Move `addSiblingsCost` to `PlanCostEstimate`, and call it `sum`.
private static class EstimationContext
{
    private final double localMaxMemory;
    private final List<PlanCostEstimate> sourcesCosts;
Seems that `MemoryEstimate` can (should?) be used here.
public MemoryEstimate visitTopN(TopNNode node, EstimationContext context)
{
    // TODO
    return new MemoryEstimate(NaN, NaN);
Does that mean that if there's any `TopNNode` in the plan, the memory estimate will be unknown?
If so, I'm not sure how useful the current implementation is. Many nodes cannot be processed based on the memory estimates. So perhaps we need something similar to what is explained here: #11591 (comment). In other words, it looks like this code should be inside the `CostCalculator`. I would only change the trait model a bit:

Stats[parent] = FStats(Stats[children]) - nothing changes, that's how it is implemented right now
Cost[parent] = FCost(Stats[parent], Stats[children], Cost[children])

And it should be the `CostCalculator`'s responsibility to compute everything: peak-memory/cpu/network cost. So it looks like the `CostCalculator#calculateCost` interface should accept a `CostProvider` (similarly to `StatsProvider`). And then, based on child costs, the peak memory can be estimated.
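The proposed trait model could be sketched roughly as follows. All names, formulas, and numbers here are illustrative toys, not Presto's actual interfaces: a "filter" node whose stats depend only on its child's stats, and whose cumulative cost depends on its stats, its child's stats, and its child's cost.

```java
import java.util.List;

// Toy sketch of the trait model:
//   Stats[parent] = FStats(Stats[children])
//   Cost[parent]  = FCost(Stats[parent], Stats[children], Cost[children])
public class TraitModelSketch
{
    // Toy "stats" (a row count) for a filter assumed to keep half of its child's rows.
    static double filterStats(List<Double> childrenRows)
    {
        return childrenRows.get(0) * 0.5;
    }

    // Toy cumulative "cost": local cpu proportional to input rows, plus the
    // child's cumulative cost. (nodeRows is unused in this toy formula but is
    // part of the signature to mirror FCost above.)
    static double filterCost(double nodeRows, List<Double> childrenRows, List<Double> childrenCosts)
    {
        return childrenRows.get(0) + childrenCosts.get(0);
    }

    public static void main(String[] args)
    {
        double scanRows = 1000;
        double scanCost = 1000;
        double rows = filterStats(List.of(scanRows));                         // 500.0
        double cost = filterCost(rows, List.of(scanRows), List.of(scanCost)); // 2000.0
        System.out.println(rows + " " + cost);
    }
}
```

The point of the separation is that each calculator only consumes traits already computed for the children, which is what would let a `CostCalculator` receive a `CostProvider` the same way it receives a `StatsProvider` today.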
Draft of peak memory calculations. Does not compile yet, as this is only to share the idea of the approach.
Related to the discussion under #11495
cc @sopel39 @kokosing @mbasmanova