Add more opencl math builtins #438
Thanks for working on this!
I think the reason was that calling
test/test_target.py
for i, func in enumerate(ternary_funcs)),
)

_ = knl(cl.CommandQueue(cl_ctx), f=np.zeros(len(ternary_funcs)))
Would you be willing to hack together some output checking for these, at least where numpy equivalents exist?
It ain't pretty, but: 561c5e0
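For illustration, a minimal sketch of output checking against host-side references, where builtins without a reference remain plain smoke tests. It swaps in Python's `math` module for NumPy so that it runs standalone. `REFERENCE_IMPLS` and `check_outputs` are hypothetical names (this is not what 561c5e0 does), and it assumes one device result per function has already been collected, as the `f` array in the test above does.

```python
import math

# Hypothetical table: OpenCL builtin name -> host-side reference.
# Builtins without an entry are only smoke-tested, not value-checked.
REFERENCE_IMPLS = {
    "atan2": math.atan2,
    "copysign": math.copysign,
    "fmod": math.fmod,
    "hypot": math.hypot,
    "pow": math.pow,
}


def check_outputs(func_names, args, device_results, rel_tol=1e-5):
    """Compare per-function device results against host references.

    Returns the names that were actually checked; raises AssertionError
    on the first mismatch.
    """
    checked = []
    for name, result in zip(func_names, device_results):
        ref = REFERENCE_IMPLS.get(name)
        if ref is None:
            continue  # no host equivalent, nothing to compare against
        expected = ref(*args)
        # float32 device results only carry about 7 significant digits
        if not math.isclose(result, expected, rel_tol=rel_tol):
            raise AssertionError(
                f"{name}{args}: got {result}, expected {expected}")
        checked.append(name)
    return checked
```

A loose relative tolerance is used because the kernel may compute in single precision while the reference runs in double.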
Is the intention to raise when mixed dtypes are passed, or to cast the arguments to the proper dtype in the CL code? (I'm guessing the latter. If so, should the CL math functions cast their arguments in all cases?)
The pytato failure was just a spurious CondaHTTPError; we can just wait for the next pushed commit.
@inducer, I implemented type casting in
loopy/target/opencl.py
def wrap_in_typecast_lazy(self, actual_dtype, needed_dtype, s):
if needed_dtype.dtype.kind == "b" and actual_dtype().dtype.kind == "f":
_actual = actual_dtype()
I think the idea behind wrap_in_typecast_lazy was to only do type inference (which is what's behind the actual_dtype callable) when the circumstances make clear that it is needed (and to avoid it in the majority of cases). Since this calls type inference unconditionally, I think wrap_in_typecast_lazy has outlived its usefulness (it's only used in three spots anymore), and we should probably just get rid of it in favor of always using wrap_in_typecast.
@kaushikcfd, do you agree?
Yup, I don't see any issues in axing the lazy version.
Cool, I can take care of it here unless you object.
Thanks for offering, that'd be great!
Done. I don't think it introduced any additional failures, though sorry, that won't be easy to tell from the CI runs. (I figured it'd be easier to just remove it now, before debugging the rest.)
The failures are
Also, (2) (and (3), I guess) pose a question: should this PR apply type casting so broadly? Before, type casting was almost entirely left implicit (as far as I can tell). But the ways I can think of to tell the
I'm not in principle against making casts explicit in the code, with the caveat that I'd like the SNR of the generated code to remain reasonably high. If the code gets overwhelmed by casts, it's no longer useful to look at. The other caveat is cost: I would hope that implicit casts and explicit casts have the same cost. Does your code insert casts in places where you feel they're "unreasonable" to insert?
loopy/target/c/codegen/expression.py
clbl = self.codegen_state.ast_builder.known_callables["pow"]
clbl = clbl.with_types({0: tgt_dtype, 1: exponent_dtype},
        self.codegen_state.callables_table)[0]

self.codegen_state.seen_functions.add(
        SeenFunction(
            clbl.name, clbl.name_in_target,
            (base_dtype, exponent_dtype),
            (tgt_dtype,)))
return var(clbl.name_in_target)(self.rec(expr.base, type_context),
        self.rec(expr.exponent, type_context))

common_dtype = np.find_common_type(
        [], [dtype.numpy_dtype for id, dtype in clbl.arg_id_to_dtype.items()
             if (id >= 0 and dtype is not None)])
from loopy.types import NumpyType
dtype = NumpyType(common_dtype)
inner_type_context = dtype_to_type_context(
        self.kernel.target, dtype)

return var(clbl.name_in_target)(
        self.rec(expr.base, inner_type_context, dtype),
        self.rec(expr.exponent, inner_type_context, dtype)
        )
I feel like this code is duplicating logic and could be replaced with

    return self.rec(
        var("pow")(expr.base, expr.exponent),
        type_context
    )

but I get key errors for "pow" in self.codegen_state.callables_table. (The same holds if I keep the seen_functions business.) Not sure what the deal is here; I'd appreciate any guidance!
Edit: sorry, the code highlighting makes this unclear: this is ExpressionToCExpressionMapper.map_power.
Have you tried your version with clbl.name_in_target? I don't think pow has a chance of getting turned into a ResolvedFunction at this stage:
Lines 817 to 836 in ef7c411
class ResolvedFunction(LoopyExpressionBase):
    """
    A function identifier whose definition is known in a :mod:`loopy` program.
    A function is said to be *known* in a :class:`~loopy.TranslationUnit` if its
    name maps to an :class:`~loopy.kernel.function_interface.InKernelCallable`
    in :attr:`loopy.TranslationUnit.callables_table`. Refer to :ref:`func-interface`.

    .. attribute:: function

        An instance of :class:`pymbolic.primitives.Variable` or
        :class:`loopy.library.reduction.ReductionOpFunction`.
    """
    init_arg_names = ("function", )

    def __init__(self, function):
        if isinstance(function, str):
            function = p.Variable(function)
        from loopy.library.reduction import ReductionOpFunction
        assert isinstance(function, (p.Variable, ReductionOpFunction))
        self.function = function
The way I see it, the known_callables["pow"] hacks around that.
IMO, recursing on the resolved callable might work, but I'm also not hating the current version.
Maybe @kaushikcfd can add some perspective, too.
> I don't think pow has a chance of getting turned into a ResolvedFunction at this stage:

Yep, that's my assessment too, which is why we had known_callables["pow"] lying around.

> self.rec(expr.base, inner_type_context, dtype),

Sorry, maybe I'm missing the context here, but why do we need the type casts? Isn't C's type promotion, along with the name_in_target, enough?
First, let me link to my comment here (for some reason, I couldn't reply to this thread at the time).

> why do we need the type casts?

The multi-arg math functions in CL refuse to promote mixed-type inputs, raising compilation errors about ambiguous input types.
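A minimal sketch of the casting approach, assuming a simplified promotion ladder (no signedness, no half, no vector types) in place of loopy's actual dtype machinery: find the common type of the arguments, then cast only the arguments that differ from it, so a mixed call like `fmax(x, n)` becomes unambiguous without wrapping every argument. `PROMOTION_ORDER`, `common_type`, and `emit_call` are hypothetical names.

```python
# Simplified C promotion ladder: later entries win. Real C rules also
# account for signedness, and OpenCL adds half and vector types.
PROMOTION_ORDER = ["char", "short", "int", "long", "float", "double"]


def common_type(types):
    """Pick the highest-ranked type among the arguments."""
    return max(types, key=PROMOTION_ORDER.index)


def emit_call(name, args):
    """Emit a C call, casting only arguments whose type differs.

    args is a list of (c_expression, c_type) pairs.
    """
    target = common_type([t for _, t in args])
    parts = [expr if t == target else f"({target}) ({expr})"
             for expr, t in args]
    return f"{name}({', '.join(parts)})"
```

Casting only the mismatched arguments is one way to keep the generated code's SNR reasonable; same-typed calls come out unchanged.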
Aah, I see, I missed that (GH wasn't letting me reply to this either; waiting a couple of minutes resolved it, strange). Thanks for explaining! I'm fine with this approach too, but is there any reason why the way we handled powf(32|64) wasn't chosen for the other multi-arg callables? Is it because we would end up with too many if-elses, leading to unmaintainable code?
In my view (just from working on this PR), those preambles for pow were a bit redundant, since with_types already (nominally) implements the logic for type promotion, and it's more generic, e.g., to half precision types, etc. I imagine powf32/powf64 arose for historical reasons, since (unlike the other math builtins) power is its own symbolic primitive.
> and it's more generic, e.g., to half precision types, etc

Agreed that this is more general. I would have loved it if there weren't (float) (0.0f), but I guess we could improve on it later.

> powf32/powf64 arose for historical reasons

Those arose to solve the same issue: 7e37f5e. But they didn't handle it quite as neatly.
> I would have loved it if there weren't (float) (0.0f)

Me as well, but at the moment the decision to cast occurs before recursing on the constant itself (at which point a 0 could be replaced with 0.0f).
Just pushed one more fix for
I'll report back!
I looked at all instances of casting within
(Also, there's one instance of If we want to tell
Thanks for taking the time to describe the instances of casting. While some aren't ideal, I can live with all of them.
cbbad75 to 6c896ef
Cool. 6c896ef should fix the remaining failures. Sorry for yet again making the tutorial more verbose... if such prominent and slightly embarrassing casting is a bother, I am happy to refine things now. Otherwise, feel free to ping me if you ever want to revisit this.
Thanks for explaining!
Thanks! Looks great. Just a few minor things left.
doc/tutorial.rst
@@ -560,8 +560,8 @@ Consider this example:
#define lid(N) ((int) get_local_id(N))
...
for (int i_outer = 0; i_outer <= -1 + (15 + n) / 16; ++i_outer)
for (int i_inner = 0; i_inner <= (-16 + n + -16 * i_outer >= 0 ? 15 : -1 + n + -16 * i_outer); ++i_inner)
a[16 * i_outer + i_inner] = 0.0f;
for (int i_inner = 0; i_inner <= ((char)(-16 + n + -16 * i_outer >= 0) ? 15 : -1 + n + -16 * i_outer); ++i_inner)
Looking at them now, these char casts are a bit hard to bear. Any chance we can have type inference return the right bool-like type (maybe get it from the target) to avoid these casts being emitted?
Sure thing. It would be correct to cast these to bool in CL so long as they aren't written to global arrays, right?
(And you wouldn't rather just omit the cast entirely? I wouldn't at all mind revising the type casting logic to allow for more specialization; it's just a matter of choosing how to do so.)
> It would be correct to cast these to bool in CL so long as they aren't written to global arrays, right?

Reading/writing GlobalArgs and reading ValueArgs would be the main ways in which bools could find their way in. Are you saying you're thinking of adding hooks to all those paths?

> And you wouldn't rather just omit the cast entirely?

Of course! :)
Ah, sorry, I misread your first comment here as "cast to bool instead of char." I'll give it (it being something that omits the casting) a stab soon.
I addressed this by first adding the class attribute use_int8_for_bool to TargetBase (let me know if you'd prefer it just on CFamilyTarget) and adding it to CFamilyTarget's hash and comparison fields. Type inference queries this to decide whether to return int8 or bool8 (replacing the prior np.int32, which seemed wrong?). I made the same modification to ExpressionToCExpressionMapper.map_if.
(sorry for my repeated failure to type correctly, should be good now.)
For some reason I can't reply directly to the comment about I observe that
Prompted by flake8 complaining about an unused variable, in b4e76d2 I (thought I) corrected the argument types specified to the generated integer power function. This led to the pytato failure. But I don't imagine that either the current implementation or my change is consistent with numpy's type promotion, right? Should the input types be chosen in the same way as, e.g., other
I'd say so... but isn't that what your code is currently doing? Also, any idea what's causing the (seeming) integer overflows in pytato?
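For reference, a sketch of integer exponentiation by squaring that wraps every intermediate to a fixed bit width, the way two's-complement C integers typically behave (signed overflow is formally undefined in C). It is not loopy's generated function; `wrap_signed`, `ipow`, and its `bits` parameter are hypothetical. It shows how computing in a narrower integer type than NumPy would choose makes results wrap, which is one plausible source of seeming overflows.

```python
def wrap_signed(value, bits):
    """Reduce an integer to the range of a two's-complement bits-bit int."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def ipow(base, exponent, bits=32):
    """base**exponent for exponent >= 0 by repeated squaring,
    wrapping every intermediate result to `bits` bits."""
    if exponent < 0:
        raise ValueError("negative exponents are not handled in integer pow")
    result = 1
    base = wrap_signed(base, bits)
    while exponent:
        if exponent & 1:
            result = wrap_signed(result * base, bits)
        base = wrap_signed(base * base, bits)
        exponent >>= 1
    return result
```

With 32 bits, `ipow(2, 31)` already comes out negative, whereas the same computation in 64 bits does not.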
In
b6a46f6 to 37ac61e
Just a conda connection reset error, strangely not the first time today on the "without arg check" job. Mind rerunning it?
Invited you to the repo, so that you can do that yourself. :)
I seem to have very bad luck with it, so thanks =D
Hi @inducer, just wanted to ping this (not to rush or anything) to clarify that I believe everything we've discussed has been addressed and this is ready for review. (Also, just FYI, the CI job that runs twice to test caching seems to fairly consistently hit conda HTTP errors on the first attempt triggered by pushes.)
Thanks for working on this. I see you've taken the approach of pushing the
Point taken re: type inference being blind to true Booleans. I guess I'm unsure how else we would use the target to inform type inference such that the casts to To me this feels like another argument that type casting should be "context-aware" (i.e., informed by the parent expression), on top of wanting to cast literals only inside function calls. The alternatives I can imagine require similarly invasive modifications to the expression-to-C/CL mappers.
I added all the basic functions I could find via regex matching that shouldn't require new logic (so this isn't by any means the complete set of functions listed here). I added a rudimentary test to just ensure that all listed functions resolve and execute.
I could not discern any way in which the special-cased logic for fmin, fmax, atan2, and copysign differed from that applied to _CL_SIMPLE_MULTI_ARG_FUNCTIONS (when I was trying to determine where to place the added multi-arg functions). 7eba8de removes this branch; let me know if I should revert, etc.
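A sketch of the uniform, table-driven alternative, assuming every simple multi-arg builtin needs only an arity check plus promotion to the widest floating-point argument type, and keeps its own name in the target. The table contents, `resolve_multi_arg`, and the string dtypes are hypothetical and are not loopy's actual _CL_SIMPLE_MULTI_ARG_FUNCTIONS.

```python
# Hypothetical table: builtin name -> number of arguments.
SIMPLE_MULTI_ARG_FUNCTIONS = {
    "atan2": 2, "copysign": 2, "fmax": 2, "fmin": 2,
    "fmod": 2, "hypot": 2, "fma": 3, "mad": 3,
}

# Floating-point types, in increasing precision.
FLOAT_TYPES = ["half", "float", "double"]


def resolve_multi_arg(name, arg_types):
    """Return (name_in_target, result_type) for a simple multi-arg builtin."""
    arity = SIMPLE_MULTI_ARG_FUNCTIONS.get(name)
    if arity is None:
        raise KeyError(f"not a simple multi-arg builtin: {name}")
    if len(arg_types) != arity:
        raise TypeError(
            f"{name} expects {arity} arguments, got {len(arg_types)}")
    float_args = [t for t in arg_types if t in FLOAT_TYPES]
    if not float_args:
        raise TypeError(f"{name} needs at least one floating-point argument")
    # One rule for every entry, no per-function branches: the result is the
    # widest floating-point argument type, and the builtin keeps its name.
    return name, max(float_args, key=FLOAT_TYPES.index)
```

Adding another builtin then means adding one table entry. Integer arguments in a mixed call would still need casting to the result type before emission.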
I am confused by the branch for pow: it seems to generate powf32 or powf64 based on the type, each of which is defined in the preamble to alias the built-in pow. Could this possibly be removed? I ask because I have powr resolving directly to powr itself and couldn't think of a reason to do otherwise.

Closes #437.