Replies: 3 comments
---
Yes, this is unfortunately a problem that we don't have a great solution for. The most common solution for this inside Google is basically your second strategy, but using a `LoopLevel`.
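For illustration, a minimal sketch of that pattern (`box_sum` and the other names are hypothetical, not an actual library API):

```cpp
#include "Halide.h"

using namespace Halide;

// The caller supplies a LoopLevel, so the library needs no
// knowledge of the caller's Funcs or Vars.
Func box_sum(Func in, LoopLevel where) {
    Var x("x");
    RDom r(0, 3);
    Func partial("partial");
    partial(x) = sum(in(x + r));  // intermediate reduction that must not inline
    partial.compute_at(where);    // placement chosen by the caller
    Func out("out");
    out(x) = partial(x) * 2;
    return out;
}

// Caller side:
//   Func result = box_sum(input, LoopLevel(final, x_inner));
```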
This makes things a bit more flexible, but it is still unsatisfying, as it requires more invasive knowledge (by both caller and callee) than would be ideal. At this point, I suspect a better solution to this problem would be to continue developing the autoscheduler(s) to produce schedules that are genuinely high quality across the board and easier to integrate into build systems. (Anecdotally, the Adams2019 autoscheduler is believed to produce good schedules for desktop-class x86-64 systems, but not so much for other systems; at a minimum, we should really generate training data for arm64 as well.)
---
I usually use a `LoopLevel`, but not with the lambda. You can schedule things to be `compute_at` a `LoopLevel` before the loop level is bound, so you can schedule eagerly instead of needing to defer it.
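For example (a minimal sketch; the names are illustrative):

```cpp
#include "Halide.h"

using namespace Halide;

int main() {
    Var x("x");
    Func in("in");
    in(x) = cast<float>(x);

    // An unbound LoopLevel: legal to schedule against right away.
    LoopLevel level;

    RDom r(0, 16);
    Func partial("partial");
    partial(x) = sum(in(x + r));
    partial.compute_at(level);      // scheduled eagerly, before binding

    Func out("out");
    out(x) = partial(x) + partial(x + 1);

    // Later, once the consumer's loop structure is known:
    Var xo("xo"), xi("xi");
    out.split(x, xo, xi, 8);
    level.set(LoopLevel(out, xo));  // bind before compiling
    out.realize({64});
}
```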
---
Thanks for the answers! I had somehow overlooked `LoopLevel`.
---
As our Halide-based code base grows, we are collecting more of our common code into libraries of reusable functions. However, we are struggling with some problems related to scheduling and to making the functions generic.
Scheduling
The library functions typically return a `Func` as a result. The result often depends on a few intermediate `Func`s that use an `RDom`. For performance, it is important that these intermediate `Func`s are not computed inline, since the result is often used in multiple places further down the pipeline.

Since we often use `gpu_tile` on a `Func` further down the pipeline, we have found that the easiest way to avoid inline computation is to add a `compute_at(final, x_inner)` to the intermediate `Func`s. However, in the library code we do not know which `Func` to pass as the first argument to `compute_at`.

To solve this, we have a few strategies, each with pros and cons:
- Returning the intermediate `Func`s to the caller for later scheduling. This unfortunately makes all callers of the library dependent on its implementation details.
- Passing in a `schedule(func, var)` callback that invokes `compute_at(func, var)` on every intermediate `Func`. This is our current strategy (a sketch follows this list), but it unfortunately fixes the scheduling to use `compute_at` for everything.
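For concreteness, here is a minimal sketch of the callback strategy (all names, including `library_op` and `ScheduleFn`, are illustrative):

```cpp
#include "Halide.h"

#include <functional>

using namespace Halide;

// The library hands each intermediate Func to a caller-supplied
// callback for scheduling.
using ScheduleFn = std::function<void(Func)>;

Func library_op(Func in, const ScheduleFn &schedule) {
    Var x("x");
    RDom r(0, 16);
    Func partial("partial");
    partial(x) = sum(in(x + r));  // intermediate that must not be inlined
    schedule(partial);            // hand placement back to the caller
    Func out("out");
    out(x) = partial(x) + partial(x + 1);
    return out;
}

// Caller side, after scheduling `final` with gpu_tile:
//   Func result = library_op(input,
//       [&](Func f) { f.compute_at(final, x_inner); });
```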
Functions generic over `Var`s

Another problem we have is that every `Func` made in the library code is immediately fixed to some particular `Var`s. Are there any good strategies to make the library code generic over the `Var`s? The only option I have come up with so far is to pass a list of `Var`s into the function itself. I know there is an internal `FindFreeVars` function, but it is not available in the Python API, and using internal functions also seems a bit risky in terms of future compatibility.
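For reference, a minimal sketch of the pass-the-`Var`s option (the `transpose_like` example is illustrative, not our real library code):

```cpp
#include "Halide.h"

#include <vector>

using namespace Halide;

// Define the result over the caller's Vars rather than Vars
// created inside the library.
Func transpose_like(Func in, const std::vector<Var> &vars) {
    Func out("out");
    out(vars[0], vars[1]) = in(vars[1], vars[0]);
    return out;
}

// Caller side:
//   Var x("x"), y("y");
//   Func t = transpose_like(input, {x, y});
```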
Linear algebra types

Finally, we have a few linear algebra types that hold `Expr` values, such as `Vector3`.
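A minimal sketch of such a type (the real version has a few more operations):

```cpp
#include "Halide.h"

using namespace Halide;

struct Vector3 {
    Expr x, y, z;
};

inline Expr dot(const Vector3 &a, const Vector3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vector3 cross(const Vector3 &a, const Vector3 &b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

inline Expr magnitude(const Vector3 &a) {
    // Every use of these Exprs re-inlines the whole expression tree.
    return sqrt(dot(a, a));
}
```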
This makes it easy to build a nice library of operations such as summation, dot products, cross products, and calculating magnitudes. However, a problem we run into is that because we are using `Expr`, everything gets inlined, and larger operations tend to include a lot of duplicate calculations. We have yet to find a nice way to break these calculations up and schedule the intermediate calculations separately.

Does anyone have some good advice on structuring a code base in Halide with reusable functions?