Replies: 3 comments
---
Yes, this is unfortunately a problem that we don't have a great solution for. The most common solution for this inside Google is basically your second strategy, but using a `LoopLevel`.
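For illustration, a minimal sketch of that pattern (`box_sum` and the other names are hypothetical, not an actual library API):

```cpp
#include "Halide.h"

using namespace Halide;

// The caller supplies a LoopLevel, so the library needs no
// knowledge of the caller's Funcs or Vars.
Func box_sum(Func in, LoopLevel where) {
    Var x("x");
    RDom r(0, 3);
    Func partial("partial");
    partial(x) = sum(in(x + r));  // intermediate reduction that must not inline
    partial.compute_at(where);    // placement chosen by the caller
    Func out("out");
    out(x) = partial(x) * 2;
    return out;
}

// Caller side:
//   Func result = box_sum(input, LoopLevel(final, x_inner));
```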
This makes things a bit more flexible, but it is still unsatisfying, as it requires more invasive knowledge (by both caller and callee) than would be ideal. At this point, I suspect a better solution to this problem would be to continue developing the autoscheduler(s) to produce schedules that are genuinely high quality across the board and easier to integrate into build systems. (Anecdotally, the Adams2019 autoscheduler is believed to produce good schedules for desktop-class x86-64 systems, but not so much for other systems; at a minimum, we should really generate training data for arm64 as well.)
---
I usually use a `LoopLevel`, but not with the lambda. You can schedule things to be `compute_at` a `LoopLevel` before the loop level is bound, so you can schedule eagerly instead of needing to defer it.
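For example (a minimal sketch; the names are illustrative):

```cpp
#include "Halide.h"

using namespace Halide;

int main() {
    Var x("x");
    Func in("in");
    in(x) = cast<float>(x);

    // An unbound LoopLevel: legal to schedule against right away.
    LoopLevel level;

    RDom r(0, 16);
    Func partial("partial");
    partial(x) = sum(in(x + r));
    partial.compute_at(level);      // scheduled eagerly, before binding

    Func out("out");
    out(x) = partial(x) + partial(x + 1);

    // Later, once the consumer's loop structure is known:
    Var xo("xo"), xi("xi");
    out.split(x, xo, xi, 8);
    level.set(LoopLevel(out, xo));  // bind before compiling
    out.realize({64});
}
```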
---
Thanks for the answers! I had somehow overlooked `LoopLevel`.
---
As our Halide-based code base grows, we are collecting more of our common code into libraries of reusable functions. However, we are struggling with some problems related to scheduling and to making the functions generic.
Scheduling
The library functions typically return a `Func` as a result. The result often depends on a few intermediate `Func`s that use an `RDom`. For performance, it is important that these intermediate `Func`s are not computed inline, since the result is often used in multiple places further down the pipeline.

Since we often use `gpu_tile` on a `Func` further down the pipeline, we have found that the easiest way to avoid inline computation is to add a `compute_at(final, x_inner)` to the intermediate `Func`s. However, in the library code we do not know which `Func` to pass as the first argument to `compute_at`.

To solve this, we have a few strategies, each with pros and cons:
- Returning the intermediate `Func`s to the caller for later scheduling. This unfortunately makes all callers of the library dependent on its implementation details.
- Passing in a `schedule(func, var)` callback that invokes `compute_at(func, var)` on every intermediate `Func`. This is our current strategy (a sketch follows this list), but it unfortunately fixes the scheduling to use `compute_at` for everything.
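For concreteness, here is a minimal sketch of the callback strategy (all names, including `library_op` and `ScheduleFn`, are illustrative):

```cpp
#include "Halide.h"

#include <functional>

using namespace Halide;

// The library hands each intermediate Func to a caller-supplied
// callback for scheduling.
using ScheduleFn = std::function<void(Func)>;

Func library_op(Func in, const ScheduleFn &schedule) {
    Var x("x");
    RDom r(0, 16);
    Func partial("partial");
    partial(x) = sum(in(x + r));  // intermediate that must not be inlined
    schedule(partial);            // hand placement back to the caller
    Func out("out");
    out(x) = partial(x) + partial(x + 1);
    return out;
}

// Caller side, after scheduling `final` with gpu_tile:
//   Func result = library_op(input,
//       [&](Func f) { f.compute_at(final, x_inner); });
```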
Functions generic over `Var`s

Another problem we have is that every `Func` made in the library code is immediately fixed to some particular `Var`s. Are there any good strategies to make the library code generic over the `Var`s? The only option I have come up with so far is to pass a list of `Var`s into the function itself. I know there is an internal `FindFreeVars` function, but it is not available in the Python API, and using internal functions also seems a bit risky in terms of future compatibility.
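For reference, a minimal sketch of the pass-the-`Var`s option (the `transpose_like` example is illustrative, not our real library code):

```cpp
#include "Halide.h"

#include <vector>

using namespace Halide;

// Define the result over the caller's Vars rather than Vars
// created inside the library.
Func transpose_like(Func in, const std::vector<Var> &vars) {
    Func out("out");
    out(vars[0], vars[1]) = in(vars[1], vars[0]);
    return out;
}

// Caller side:
//   Var x("x"), y("y");
//   Func t = transpose_like(input, {x, y});
```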
Linear algebra types

Finally, we have a few linear algebra types that hold `Expr` values, such as `Vector3`.
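A minimal sketch of such a type (the real version has a few more operations):

```cpp
#include "Halide.h"

using namespace Halide;

struct Vector3 {
    Expr x, y, z;
};

inline Expr dot(const Vector3 &a, const Vector3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vector3 cross(const Vector3 &a, const Vector3 &b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

inline Expr magnitude(const Vector3 &a) {
    // Every use of these Exprs re-inlines the whole expression tree.
    return sqrt(dot(a, a));
}
```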
This makes it easy to build a nice library of operations such as summation, dot products, cross products, and calculating magnitudes. However, a problem we run into is that because we are using `Expr`, everything gets inlined, and larger operations tend to include a lot of duplicate calculations. We have yet to find a nice way to break these calculations up and schedule the intermediate calculations separately.

Does anyone have some good advice on structuring a code base in Halide with reusable functions?