New scheduling directive: compute_with#1546
Conversation
test/correctness/compute_with.cpp
Outdated
| @@ -0,0 +1,745 @@ | |||
| #include "Halide.h" | |||
| #include "../common/check_call_graphs.h" | |||
There was a problem hiding this comment.
Please avoid using relative include paths with #include; it makes working with Bazel harder than it could be. Use the pattern we're now using elsewhere:
#include "test/common/check_call_graphs.h"
7b29e10 to
1c391e1
Compare
|
If a Maybe I am trying to ask with this PR when to use |
|
The idea with compute_with is to allow multiple updates of a Func within the same loop nest (e.g. if you want to schedule the updates within the same GPU loop nest) as long as there are no loop-carried dependencies. You can't do this with compute_at, since it is per Func granularity (each update will be within separate loop nests). |
| vec = 64; | ||
| } | ||
| for (Func f : intermediates) { | ||
| auto apply_schedule_to_intermediate = [&](Func& f){ |
There was a problem hiding this comment.
Since the method this is in is called "schedule" I think we can name lambdas like this "schedule_intermediate" or perhaps "fold_vectorize_and_place" if a more descriptive name seems useful.
src/BoundsInference.cpp
Outdated
| const auto iter = std::find_if(fused_groups[i].begin(), fused_groups[i].end(), | ||
| [&producing_func](const Function& f) { return (f.name() == producing_func.name()); }); | ||
| if (iter != fused_groups[i].end()) { | ||
| index = i; |
There was a problem hiding this comment.
This will not pass strict signed/unsigned conversion checks, which I'd like to turn on at some point. I think the clearest way to do this with std algorithms is an outer std::find_if and a std::any_of nested inside.
zvookin
left a comment
There was a problem hiding this comment.
Have not reviewed ScheduleFunctions.cpp yet.
src/BoundsInference.cpp
Outdated
| const vector<Dim> &dims = (producing_stage == 0) ? producing_func.definition().schedule().dims() : | ||
| producing_func.update(producing_stage-1).schedule().dims(); | ||
|
|
||
| int var_index; |
src/BoundsInference.cpp
Outdated
| funcs[i] = env.find(order[i])->second; | ||
| } | ||
|
|
||
| vector<vector<Function>> fused_func_groups(fused_groups.size()); |
There was a problem hiding this comment.
This would likely be clearer using range based for loops and putting the values directly into the result vector. (You can use resize or reserve/emplace to control the sizes.)
(I get that this feedback is a bit scratching the surface, but the general gist is that there's a mixture of container programming style going on here, and it's making it a bit tricky to follow the logic. Hence I'm adding some comments as I go.)
src/BoundsInference.cpp
Outdated
| if (starts_with(op->name, stages[i].stage_prefix)) { | ||
| producing = i; | ||
| f = stages[i].func; | ||
| stage_index = stages[i].stage; |
src/Func.h
Outdated
| /** A single definition of a Func. May be a pure or update definition. */ | ||
| class Stage { | ||
| /** Reference to the Function this stage (or definition) belongs to. */ | ||
| Internal::Function func; |
There was a problem hiding this comment.
Andrew and I would prefer if we didn't abbreviate to "func" for a thing that is a Function (not a Func). Please consider either spelling out "function" or maybe using "fun."
src/Func.h
Outdated
| Internal::Function func; | ||
| Internal::Definition definition; | ||
| std::string stage_name; | ||
| size_t stage; |
There was a problem hiding this comment.
Is this field the index in the stage list? Is there a better name.
At some level, it is starting to feel like maybe we want a more expressive data structure, such as an actual graph or perhaps indexable maps. I don't think this is necessary for this PR, but we should think about it a little.
| Auto | ||
| }; | ||
|
|
||
| /** Different ways to handle the case when the start/end of the loops of stages |
There was a problem hiding this comment.
Definitely need more detail, but I think putting examples in the comment on the compute_with directive is the best way to do that. (Can also link to a tutorial I suppose.)
src/Schedule.h
Outdated
| * recursively injecting realizations for them at particular sites | ||
| * in this loop nest. A LoopLevel identifies such a site. */ | ||
| * in this loop nest. A LoopLevel identifies such a site. The site | ||
| * can either be a specific loopness within all stages of a function |
There was a problem hiding this comment.
I'm not sure what "specific loopness" is. Is it meant to be "loop nest" or does this mean "a loop over over the same dimension across all stages of a function"? ("Loop nest" would imply a set of loops all of which are on the same var or dim I think, but it isn't quite clear to me even if that is the wording.)
src/Schedule.h
Outdated
| bool fold_forward; | ||
| }; | ||
|
|
||
| /** This indicates two function stages which loopness are fused from outermost |
There was a problem hiding this comment.
Perhaps: "This represents two stages with fused loop nests from outermost to a specific loop level. The loops to compute func_1(stage_1) are fused with the loops to compute func_2(stage_2) from outermost to loop level var_name and the computation from stage_1 of func_1 occurs first."
src/Schedule.h
Outdated
| /** Until which loop level (starting from outermost) we should fuse | ||
| * computation of this function stage with another function stage? The | ||
| * function we are fusing this function with and this function should | ||
| * be independent of each other. See \ref Func::compute_with and |
There was a problem hiding this comment.
What does "independent of each other" mean? They're not required to be distinct are they?
src/Schedule.h
Outdated
| std::vector<PrefetchDirective> &prefetches(); | ||
| // @} | ||
|
|
||
| /** Until which loop level (starting from outermost) we should fuse |
There was a problem hiding this comment.
"Innermost loop level of fused loop nest for this function stage. Fusion runs from outermost to this loop level." (Definitely remove question mark unless I'm missing something.)
src/ScheduleFunctions.cpp
Outdated
| }; | ||
|
|
||
| bool var_name_match(string v1, string v2) { | ||
| if (v1 == v2) return true; |
There was a problem hiding this comment.
I really do not like if statements without braces. (Long conversation, but at the time there was a rash of SSL bugs related to bad else balancing, I decided the language construct is simply indefensible. If one is programming control flow, use the braces. If not, use boolean logic or some other mechanism.) In this case a single return statement and a concatenation of || operators is likely what you want.
src/ScheduleFunctions.cpp
Outdated
| // otherwise, this may mess up the bounds_touched computation. | ||
| int n_predicates_inner = 0; | ||
| for (int i = start_fuse; (i >= 0) && (i < (int)stage_s.dims().size()-1); ++i) { | ||
| const Dim &dim = stage_s.dims()[i]; |
There was a problem hiding this comment.
It appears every use of "dim" here is in "prefix + dim.var" so it might be clearer to just make a temporary on that expression.
src/ScheduleFunctions.cpp
Outdated
|
|
||
| void visit(const Realize *op) { | ||
| IRVisitor::visit(op); | ||
| if (op->name == func) result = true; |
There was a problem hiding this comment.
Prefer "result = result || (op->name == func)"
|
@abadams PTAL |
src/RealizationOrder.cpp
Outdated
|
|
||
| // Check to see if there is a loop-carried dependence in a 'func' update | ||
| // definition with the previous definition it's computed with. | ||
| bool has_loop_carried_dependence(const string &func, const vector<Expr> &prev_args, |
There was a problem hiding this comment.
I'm concerned about this case:
f(x) = x;
RDom r(0, 3);
f(r) += 3;
f(r+1) *= 2;
Fusing on r changes the output
4d3bc4f to
dfe49b5
Compare
dsharletg
left a comment
There was a problem hiding this comment.
Really nice work on the tests!
src/Func.h
Outdated
| * stage 's' from outermost loop to a given LoopLevel. 'this' stage will | ||
| * be computed AFTER 's' in the innermost fused dimension. There should not | ||
| * be any dependencies between those two fused stages. They should not have | ||
| * extern definitions either. An update stage can be computed with its |
There was a problem hiding this comment.
Can you clarify the comment regarding extern definitions?
src/Func.h
Outdated
| * be computed AFTER 's' in the innermost fused dimension. There should not | ||
| * be any dependencies between those two fused stages. They should not have | ||
| * extern definitions either. An update stage can be computed with its | ||
| * immediate preceeding stage given that there is no loop-carried |
There was a problem hiding this comment.
Is the comment regarding update stages still correct given abadams@ recent question and subsequent tweaks?
There was a problem hiding this comment.
compute_with of update stages of the same Func is now disabled.
src/Schedule.h
Outdated
| /** Shift the end of the fused loops to align. */ | ||
| AlignEnd, | ||
|
|
||
| /** compute_with will make no attemp to align the start/end of the |
src/BoundsInference.cpp
Outdated
| return true; | ||
| } | ||
|
|
||
| // Determine if the current producing stage are fused with other |
| if (y < size/2-1): | ||
| g(x, y) = input(2*x, 2*y) | ||
| \endcode | ||
| * |
| return contents->func_name; | ||
| } | ||
|
|
||
| int LoopLevel::stage_index() const { |
There was a problem hiding this comment.
There will be a nontrivial merge collision between this and #2504, but I don't think that it will be hard to resolve.
There was a problem hiding this comment.
What is causing the collision?
src/Schedule.h
Outdated
| EXPORT LoopLevel(Internal::Function f, VarOrRVar v); | ||
| EXPORT LoopLevel(Func f, VarOrRVar v); | ||
| EXPORT LoopLevel(Internal::Function f, VarOrRVar v, int stage_level = -1); | ||
| EXPORT LoopLevel(Func f, VarOrRVar v, int stage_level = -1); |
There was a problem hiding this comment.
This is a user-facing ctor: should a user ever pass anything other than -1? What happens if they do?
| DeviceAPI device_api = op->device_api; | ||
| if (is_one(extent_val)) { | ||
| for_type = ForType::Serial; | ||
| device_api = DeviceAPI::None; |
There was a problem hiding this comment.
Not at all obvious that this is safe. A GPUBlocks loop of extent one is different to a Hexagon loop of extent one is different to a serial loop of extent 1. I assume it's safe because the loops have already been checked for compatibility?
2d3ae5e to
03f2cb5
Compare
2f84851 to
2d3ae5e
Compare
4e3115f to
959dbcc
Compare
3509efc to
6003aa9
Compare
|
PTAL. compute_with of stages of the same Func is now disabled. Only fusion of stages with no producer/consumer relationship is allowed. |
dsharletg
left a comment
There was a problem hiding this comment.
Just took a quick skim since I've reviewed this once before. I have only minor comments, except for one recurring one: I still see code to support fusing updates from the same func, weren't we going to drop that capability?
| for (const auto &fn : order) { | ||
| if (visited.find(fn) == visited.end()) { | ||
| vector<string> group; | ||
| find_fused_groups_dfs(fn, fuse_adjacency_list, visited, group); |
There was a problem hiding this comment.
I think doing this might make the implementation of find_fused_groups_dfs a bit more complicated/less efficient (line 33 would need to append the result of that recursive call to it's own internal copy).
src/RealizationOrder.cpp
Outdated
| for (const auto &iter : fused_pairs_graph) { | ||
| for (const auto &pair : iter.second) { | ||
| if (pair.func_1 == pair.func_2) { | ||
| // compute_with among stages of a function is okay, |
There was a problem hiding this comment.
Is this old? Didn't we decide this is not OK?
There was a problem hiding this comment.
Which internal copy are you referring to?
src/RealizationOrder.h
Outdated
| * the realization order and the fused groups in that order. | ||
| */ | ||
| std::pair<std::vector<std::string>, std::vector<std::vector<std::string>>> realization_order( | ||
| const std::vector<Function> &output, std::map<std::string, Function> &env); |
There was a problem hiding this comment.
Identation looks off here.
src/Schedule.h
Outdated
|
|
||
| /** Different ways to handle the case when the start/end of the loops of stages | ||
| * computed with (fused) are not aligned. */ | ||
| enum class AlignStrategy { |
There was a problem hiding this comment.
Maybe we should name this LoopAlignStrategy, because AlignStrategy sounds like something we might want to describe a memory alignment option. It also seems like a more accurate description of what it's doing.
src/ScheduleFunctions.cpp
Outdated
| // computed with are the results of the same application of splits/renames/etc. | ||
| // Also, if it is a split dimension, verify that it doesn't use ShiftInwards | ||
| // as tail strategy since this may affect correctness. | ||
| if (p.func_1 == p.func_2) { // Update and its preceding stage are fused |
There was a problem hiding this comment.
Weren't we going to remove the ability to fuse stages from the same func?
|
Looks like a few of the build bot issues are legit (e.g. AlignStrategy -> LoopAlignStrategy in camera_pipe). |
|
Failures look unrelated and enough build bots passed, merging this! |
This PR added new scheduling directive,
compute_with, which fuses iteration over some stage of a function with another stage from outermost loop to a given loop level, e.g.g.compute_with(f, y)is equivalent to:The granularity of
compute_withis function definition (stage). The stage on whichcompute_withis called will be executed AFTER the stage it's computed with in the innermost fused loop. All fused stages have to have the same schedule from outermost to the innermost fused loop dimension.NOTE: compute_with of update stages of the same Func is now disabled since it is non-trivial to prove that there are no loop-carry dependencies.