New scheduling directive: compute_with#1546

Merged
dsharletg merged 7 commits into master from compute_with_directive on Dec 18, 2017

@psuriana
Contributor

@psuriana psuriana commented Oct 14, 2016

This PR adds a new scheduling directive, compute_with, which fuses the iteration over a stage of one function with that of another stage, from the outermost loop down to a given loop level. For example, g.compute_with(f, y) is equivalent to:

for y = union of f and g bounds
   for x = ...
      if (y within f bound):
        compute f
   for x = ...
      if (y within g bound):
        compute g

compute_with operates at the granularity of a single function definition (stage). The stage on which compute_with is called is executed AFTER the stage it is computed with, inside the innermost fused loop. All fused stages must have the same schedule from the outermost loop down to the innermost fused loop dimension.
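The loop structure described above can be modelled in plain C++, without Halide. This is only a sketch under stated assumptions: `Bound` and `fused_trace` are hypothetical names, the bounds are inclusive, and the inner x loops are reduced to a single trace entry per y.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical inclusive y-range of one stage; not a Halide type.
struct Bound {
    int min;
    int max;
};

// Models g.compute_with(f, y): a single y loop over the union of both
// bounds, where f is guarded and runs first and g is guarded and runs
// second. Returns the execution order as entries such as "f@0".
std::vector<std::string> fused_trace(Bound f, Bound g) {
    std::vector<std::string> trace;
    const int y_min = std::min(f.min, g.min);
    const int y_max = std::max(f.max, g.max);
    for (int y = y_min; y <= y_max; ++y) {
        if (y >= f.min && y <= f.max) {
            // The inner x loop of f would run here.
            trace.push_back("f@" + std::to_string(y));
        }
        if (y >= g.min && y <= g.max) {
            // The inner x loop of g would run here, after f.
            trace.push_back("g@" + std::to_string(y));
        }
    }
    return trace;
}
```

Calling `fused_trace(Bound{0, 1}, Bound{1, 2})` interleaves the two stages per y iteration, with f always ahead of g at the same y.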

NOTE: compute_with between update stages of the same Func is now disabled, since it is non-trivial to prove that there are no loop-carried dependencies.

@@ -0,0 +1,745 @@
#include "Halide.h"
#include "../common/check_call_graphs.h"
Contributor

Please avoid using relative include paths with #include; it makes working with Bazel harder than it could be. Use the pattern we're now using elsewhere:

 #include "test/common/check_call_graphs.h"

@jingpu
Contributor

jingpu commented Jun 16, 2017

If a compute_with schedule can fuse an initial definition and an update definition, it looks like it is changing the interleaving granularity of a producer-consumer pair. Would it be better to express this intent using a compute_at?

I suppose what I am really asking about this PR is when to use compute_with versus compute_at.

@psuriana
Contributor Author

The idea with compute_with is to allow multiple updates of a Func within the same loop nest (e.g. if you want to schedule the updates within the same GPU loop nest) as long as there are no loop-carried dependencies. You can't do this with compute_at, since it is per Func granularity (each update will be within separate loop nests).

vec = 64;
}
for (Func f : intermediates) {
auto apply_schedule_to_intermediate = [&](Func& f){
Member

Since the method this is in is called "schedule" I think we can name lambdas like this "schedule_intermediate" or perhaps "fold_vectorize_and_place" if a more descriptive name seems useful.

const auto iter = std::find_if(fused_groups[i].begin(), fused_groups[i].end(),
[&producing_func](const Function& f) { return (f.name() == producing_func.name()); });
if (iter != fused_groups[i].end()) {
index = i;
Member

@zvookin zvookin Jun 20, 2017

This will not pass strict signed/unsigned conversion checks, which I'd like to turn on at some point. I think the clearest way to do this with std algorithms is an outer std::find_if and a std::any_of nested inside.
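A minimal sketch of the suggested shape, an outer std::find_if with a nested std::any_of, which avoids the signed/unsigned index entirely. The `Function` struct and `find_group` are hypothetical stand-ins, not the types or names used in the PR.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for Internal::Function; only the name matters here.
struct Function {
    std::string name;
};

// Returns an iterator to the first group that contains a function with the
// given name, or groups.end() if no group does. No integer index is needed.
std::vector<std::vector<Function>>::const_iterator
find_group(const std::vector<std::vector<Function>> &groups,
           const std::string &name) {
    return std::find_if(groups.begin(), groups.end(),
        [&name](const std::vector<Function> &group) {
            return std::any_of(group.begin(), group.end(),
                [&name](const Function &f) { return f.name == name; });
        });
}
```

If a position is still needed afterwards, `std::distance(groups.begin(), it)` yields a signed `difference_type`, which sidesteps the conversion warning.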

Member

@zvookin zvookin left a comment

Have not reviewed ScheduleFunctions.cpp yet.

const vector<Dim> &dims = (producing_stage == 0) ? producing_func.definition().schedule().dims() :
producing_func.update(producing_stage-1).schedule().dims();

int var_index;
Member

Should be size_t.

Member

Or possibly ptrdiff_t.

funcs[i] = env.find(order[i])->second;
}

vector<vector<Function>> fused_func_groups(fused_groups.size());
Member

This would likely be clearer using range based for loops and putting the values directly into the result vector. (You can use resize or reserve/emplace to control the sizes.)

(I realize this feedback only scratches the surface, but the general gist is that there's a mixture of container programming styles going on here, and it's making the logic a bit tricky to follow. Hence I'm adding some comments as I go.)
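One way the range-based-for suggestion might look, as a sketch only: `Function` is a hypothetical stand-in, `collect_groups` is an illustrative name, and the assumption is that each fused group is a list of names to be looked up in `env`.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Internal::Function.
struct Function {
    std::string name;
};

// Converts groups of function names into groups of Function objects,
// appending directly to the result instead of indexing into a pre-sized
// vector.
std::vector<std::vector<Function>> collect_groups(
        const std::vector<std::vector<std::string>> &fused_groups,
        const std::map<std::string, Function> &env) {
    std::vector<std::vector<Function>> result;
    result.reserve(fused_groups.size());
    for (const auto &names : fused_groups) {
        std::vector<Function> group;
        group.reserve(names.size());
        for (const auto &name : names) {
            // at() throws std::out_of_range if the name is not in env.
            group.push_back(env.at(name));
        }
        result.push_back(std::move(group));
    }
    return result;
}
```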

if (starts_with(op->name, stages[i].stage_prefix)) {
producing = i;
f = stages[i].func;
stage_index = stages[i].stage;
Member

Cast size_t to int?

src/Func.h Outdated
/** A single definition of a Func. May be a pure or update definition. */
class Stage {
/** Reference to the Function this stage (or definition) belongs to. */
Internal::Function func;
Member

Andrew and I would prefer if we didn't abbreviate to "func" for a thing that is a Function (not a Func). Please consider either spelling out "function" or maybe using "fun."

src/Func.h Outdated
Internal::Function func;
Internal::Definition definition;
std::string stage_name;
size_t stage;
Member

Is this field the index in the stage list? Is there a better name?

At some level, it is starting to feel like maybe we want a more expressive data structure, such as an actual graph or perhaps indexable maps. I don't think this is necessary for this PR, but we should think about it a little.

Auto
};

/** Different ways to handle the case when the start/end of the loops of stages
Member

Definitely need more detail, but I think putting examples in the comment on the compute_with directive is the best way to do that. (Can also link to a tutorial I suppose.)

src/Schedule.h Outdated
* recursively injecting realizations for them at particular sites
* in this loop nest. A LoopLevel identifies such a site. */
* in this loop nest. A LoopLevel identifies such a site. The site
* can either be a specific loopness within all stages of a function
Member

I'm not sure what "specific loopness" is. Is it meant to be "loop nest" or does this mean "a loop over the same dimension across all stages of a function"? ("Loop nest" would imply a set of loops all of which are on the same var or dim I think, but it isn't quite clear to me even if that is the wording.)

src/Schedule.h Outdated
bool fold_forward;
};

/** This indicates two function stages which loopness are fused from outermost
Member

Perhaps: "This represents two stages with fused loop nests from outermost to a specific loop level. The loops to compute func_1(stage_1) are fused with the loops to compute func_2(stage_2) from outermost to loop level var_name and the computation from stage_1 of func_1 occurs first."

src/Schedule.h Outdated
/** Until which loop level (starting from outermost) we should fuse
* computation of this function stage with another function stage? The
* function we are fusing this function with and this function should
* be independent of each other. See \ref Func::compute_with and
Member

What does "independent of each other" mean? They're not required to be distinct are they?

src/Schedule.h Outdated
std::vector<PrefetchDirective> &prefetches();
// @}

/** Until which loop level (starting from outermost) we should fuse
Member

"Innermost loop level of fused loop nest for this function stage. Fusion runs from outermost to this loop level." (Definitely remove question mark unless I'm missing something.)

Member

@zvookin zvookin left a comment

A few more.

};

bool var_name_match(string v1, string v2) {
if (v1 == v2) return true;
Member

I really do not like if statements without braces. (Long conversation, but at the time there was a rash of SSL bugs related to bad else balancing, I decided the language construct is simply indefensible. If one is programming control flow, use the braces. If not, use boolean logic or some other mechanism.) In this case a single return statement and a concatenation of || operators is likely what you want.
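A sketch of the single-return form. Only the `v1 == v2` test is visible in the diff above; the two suffix checks and the `ends_with` helper are assumptions added purely to illustrate the || chain, and may not match the rule the PR actually implements.

```cpp
#include <string>

// Hypothetical helper: true if `s` ends with `suffix`.
bool ends_with(const std::string &s, const std::string &suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// One return statement and a chain of || operators, with no brace-less ifs.
// Assumed rule: names match if they are equal, or if one is a
// dot-qualified form of the other (e.g. "f.s0.x" and "x").
bool var_name_match(const std::string &v1, const std::string &v2) {
    return v1 == v2 ||
           ends_with(v1, "." + v2) ||
           ends_with(v2, "." + v1);
}
```

Taking the strings by const reference rather than by value also avoids two copies per call, though that is a separate point from the one raised in the review.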

// otherwise, this may mess up the bounds_touched computation.
int n_predicates_inner = 0;
for (int i = start_fuse; (i >= 0) && (i < (int)stage_s.dims().size()-1); ++i) {
const Dim &dim = stage_s.dims()[i];
Member

It appears every use of "dim" here is in "prefix + dim.var" so it might be clearer to just make a temporary on that expression.


void visit(const Realize *op) {
IRVisitor::visit(op);
if (op->name == func) result = true;
Member

Prefer "result = result || (op->name == func)"

@psuriana
Contributor Author

psuriana commented Oct 6, 2017

@abadams PTAL


// Check to see if there is a loop-carried dependence in a 'func' update
// definition with the previous definition it's computed with.
bool has_loop_carried_dependence(const string &func, const vector<Expr> &prev_args,
Member

I'm concerned about this case:

f(x) = x;
RDom r(0, 3);
f(r) += 3;
f(r+1) *= 2;

Fusing on r changes the output
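The concern can be checked by hand with a plain C++ model of the two update stages. This is a sketch rather than Halide code: it assumes f is realized over x in [0, 3] and that each update runs serially over r, and the function names are illustrative.

```cpp
#include <array>

using Buffer = std::array<int, 4>;

// Pure definition: f(x) = x for x in [0, 3].
Buffer initial() {
    Buffer f = {{0, 1, 2, 3}};
    return f;
}

// Unfused: the whole "+= 3" update runs before the whole "*= 2" update.
Buffer run_unfused() {
    Buffer f = initial();
    for (int r = 0; r < 3; ++r) {
        f[r] += 3;
    }
    for (int r = 0; r < 3; ++r) {
        f[r + 1] *= 2;
    }
    return f;
}

// Fused on r: both updates share one loop, so f[r + 1] is doubled
// before the next iteration adds 3 to it.
Buffer run_fused() {
    Buffer f = initial();
    for (int r = 0; r < 3; ++r) {
        f[r] += 3;
        f[r + 1] *= 2;
    }
    return f;
}
```

Here `run_unfused()` gives {3, 8, 10, 6} while `run_fused()` gives {3, 5, 7, 6}: the second update at iteration r writes the element that the first update reads at iteration r + 1, which is exactly a loop-carried dependence.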

@psuriana psuriana force-pushed the compute_with_directive branch from 4d3bc4f to dfe49b5 Compare October 31, 2017 20:38
Contributor

@dsharletg dsharletg left a comment

Really nice work on the tests!

src/Func.h Outdated
* stage 's' from outermost loop to a given LoopLevel. 'this' stage will
* be computed AFTER 's' in the innermost fused dimension. There should not
* be any dependencies between those two fused stages. They should not have
* extern definitions either. An update stage can be computed with its
Contributor

Can you clarify the comment regarding extern definitions?

src/Func.h Outdated
* be computed AFTER 's' in the innermost fused dimension. There should not
* be any dependencies between those two fused stages. They should not have
* extern definitions either. An update stage can be computed with its
* immediate preceeding stage given that there is no loop-carried
Contributor

Is the comment regarding update stages still correct given @abadams's recent question and subsequent tweaks?

Contributor Author

compute_with of update stages of the same Func is now disabled.

src/Schedule.h Outdated
/** Shift the end of the fused loops to align. */
AlignEnd,

/** compute_with will make no attemp to align the start/end of the
Contributor

attempt

return true;
}

// Determine if the current producing stage are fused with other
Contributor

s/are/is/

if (y < size/2-1):
g(x, y) = input(2*x, 2*y)
\endcode
*
Contributor

+1

return contents->func_name;
}

int LoopLevel::stage_index() const {
Contributor

There will be a nontrivial merge collision between this and #2504, but I don't think that it will be hard to resolve.

Contributor Author

What is causing the collision?

src/Schedule.h Outdated
EXPORT LoopLevel(Internal::Function f, VarOrRVar v);
EXPORT LoopLevel(Func f, VarOrRVar v);
EXPORT LoopLevel(Internal::Function f, VarOrRVar v, int stage_level = -1);
EXPORT LoopLevel(Func f, VarOrRVar v, int stage_level = -1);
Contributor

This is a user-facing ctor: should a user ever pass anything other than -1? What happens if they do?

DeviceAPI device_api = op->device_api;
if (is_one(extent_val)) {
for_type = ForType::Serial;
device_api = DeviceAPI::None;
Member

Not at all obvious that this is safe. A GPUBlocks loop of extent one is different to a Hexagon loop of extent one is different to a serial loop of extent 1. I assume it's safe because the loops have already been checked for compatibility?

@abadams abadams force-pushed the compute_with_directive branch from 2d3ae5e to 03f2cb5 Compare December 3, 2017 16:53
@abadams abadams force-pushed the compute_with_directive branch 2 times, most recently from 2f84851 to 2d3ae5e Compare December 4, 2017 20:29
@psuriana psuriana force-pushed the compute_with_directive branch 2 times, most recently from 4e3115f to 959dbcc Compare December 7, 2017 22:40
@psuriana psuriana force-pushed the compute_with_directive branch from 3509efc to 6003aa9 Compare December 8, 2017 00:58
@psuriana
Contributor Author

psuriana commented Dec 8, 2017

PTAL. compute_with of stages of the same Func is now disabled. Only fusion of stages with no producer/consumer relationship is allowed.

Contributor

@dsharletg dsharletg left a comment

Just took a quick skim since I've reviewed this once before. I have only minor comments, except for one recurring one: I still see code to support fusing updates from the same func, weren't we going to drop that capability?

for (const auto &fn : order) {
if (visited.find(fn) == visited.end()) {
vector<string> group;
find_fused_groups_dfs(fn, fuse_adjacency_list, visited, group);
Contributor

I think doing this might make the implementation of find_fused_groups_dfs a bit more complicated/less efficient (line 33 would need to append the result of that recursive call to its own internal copy).
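For context, a sketch of the out-parameter form that the hunk above appears to use, where every recursive call appends to one shared `group` and nothing has to be merged on return. The container types, the `AdjacencyList` alias, and the `find_fused_groups` wrapper are assumptions; only the call shape `find_fused_groups_dfs(fn, fuse_adjacency_list, visited, group)` comes from the diff.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Assumed representation: undirected "fused with" edges between names.
using AdjacencyList = std::map<std::string, std::set<std::string>>;

// Depth-first walk that appends every reachable function to `group`.
void find_fused_groups_dfs(const std::string &current,
                           const AdjacencyList &adjacency,
                           std::set<std::string> &visited,
                           std::vector<std::string> &group) {
    visited.insert(current);
    group.push_back(current);
    const auto it = adjacency.find(current);
    if (it == adjacency.end()) {
        return;
    }
    for (const auto &next : it->second) {
        if (visited.find(next) == visited.end()) {
            find_fused_groups_dfs(next, adjacency, visited, group);
        }
    }
}

// Hypothetical wrapper: one group per connected component, in `order`.
std::vector<std::vector<std::string>> find_fused_groups(
        const std::vector<std::string> &order,
        const AdjacencyList &adjacency) {
    std::set<std::string> visited;
    std::vector<std::vector<std::string>> groups;
    for (const auto &fn : order) {
        if (visited.find(fn) == visited.end()) {
            std::vector<std::string> group;
            find_fused_groups_dfs(fn, adjacency, visited, group);
            groups.push_back(std::move(group));
        }
    }
    return groups;
}
```

A variant that returns the group by value would need each caller to splice the callee's result into its own vector, which is the extra copying the comment above alludes to.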

for (const auto &iter : fused_pairs_graph) {
for (const auto &pair : iter.second) {
if (pair.func_1 == pair.func_2) {
// compute_with among stages of a function is okay,
Contributor

Is this old? Didn't we decide this is not OK?

Contributor Author

Which internal copy are you referring to?

* the realization order and the fused groups in that order.
*/
std::pair<std::vector<std::string>, std::vector<std::vector<std::string>>> realization_order(
const std::vector<Function> &output, std::map<std::string, Function> &env);
Contributor

Indentation looks off here.

src/Schedule.h Outdated

/** Different ways to handle the case when the start/end of the loops of stages
* computed with (fused) are not aligned. */
enum class AlignStrategy {
Contributor

Maybe we should name this LoopAlignStrategy, because AlignStrategy sounds like something we might want to describe a memory alignment option. It also seems like a more accurate description of what it's doing.

// computed with are the results of the same application of splits/renames/etc.
// Also, if it is a split dimension, verify that it doesn't use ShiftInwards
// as tail strategy since this may affect correctness.
if (p.func_1 == p.func_2) { // Update and its preceding stage are fused
Contributor

Weren't we going to remove the ability to fuse stages from the same func?

Contributor

@dsharletg dsharletg left a comment

Very nice work!

@dsharletg
Contributor

Looks like a few of the build bot issues are legit (e.g. AlignStrategy -> LoopAlignStrategy in camera_pipe).

@dsharletg
Contributor

Failures look unrelated and enough build bots passed, merging this!

@dsharletg dsharletg merged commit dabe67e into master Dec 18, 2017
@dsharletg dsharletg deleted the compute_with_directive branch December 20, 2017 00:27