Add more opencl math builtins #438

Open · wants to merge 29 commits into base: main
Conversation

@zachjweiner (Collaborator) commented Jun 22, 2021

I added all the basic functions I could find via regex matching that shouldn't require new logic (so this isn't by any means the complete set of functions listed here). I added a rudimentary test to just ensure that all listed functions resolve and execute.

I could not discern any way that the special-cased logic for fmin, fmax, atan2, and copysign differed from the logic applied to _CL_SIMPLE_MULTI_ARG_FUNCTIONS (when I was trying to determine where to place the added multi-arg functions).
7eba8de removes this branch - let me know if I should revert, etc.

I am confused by the branch for pow - it seems to generate powf32 or powf64 based on the type, each of which is defined in the preamble as an alias for the built-in pow. Could this possibly be removed? I ask because I have powr resolving directly to powr itself and couldn't think of a reason to do otherwise.

Closes #437.

@inducer (Owner) commented Jun 28, 2021

Thanks for working on this!

> I am confused by the branch for pow - it seems to generate powf32 or powf64 based on the type, each of which is defined in the preamble as an alias for the built-in pow. Could this possibly be removed? I ask because I have powr resolving directly to powr itself and couldn't think of a reason to do otherwise.

I think the reason was that calling pow(float, double) results in an error, while calling powf64 would automatically upcast. But with the common-dtype finding there, I don't think it's an issue any more, and I think this can be removed. One thing I'd like to have is testing for mixed dtypes, to make sure this works as intended.
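The requested mixed-dtype behavior can be sketched in plain numpy (a stand-in for the kernel-side promotion; `np.result_type` is the modern equivalent of the deprecated `np.find_common_type` call used in the code):

```python
import numpy as np

# Sketch of what a mixed-dtype test would check: pow with a float32
# base and float64 exponent should produce a float64 result rather
# than an error. np.result_type implements numpy's promotion rules.
base = np.float32(2.0)
exponent = np.float64(0.5)

common = np.result_type(base, exponent)
print(common)                     # float64
print((base ** exponent).dtype)   # float64
```

In a real test, the kernel would be called with one float32 and one float64 argument and its output dtype compared against `np.result_type`.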

```python
        for i, func in enumerate(ternary_funcs)),
)

_ = knl(cl.CommandQueue(cl_ctx), f=np.zeros(len(ternary_funcs)))
```
inducer (Owner):

Would you be willing to hack together some output checking for these, at least where numpy equivalents exist?

zachjweiner (Collaborator, Author):

It ain't pretty, but: 561c5e0
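A minimal sketch of what such output checking could look like (illustrative only, not the contents of 561c5e0; in the real test the kernel's device-side result would replace the reference on the left-hand side):

```python
import numpy as np

# Illustrative mapping from OpenCL builtin names to their numpy
# equivalents. The names are standard OpenCL math builtins; coverage
# here is only a sample.
CL_TO_NUMPY = {
    "atan2": np.arctan2,
    "copysign": np.copysign,
    "fmax": np.fmax,
    "fmin": np.fmin,
    "hypot": np.hypot,
    "pow": np.power,
}

rng = np.random.default_rng(0)
x, y = rng.uniform(0.1, 2.0, size=(2, 64))
for name, ref in CL_TO_NUMPY.items():
    expected = ref(x, y)
    # in the real test: replace `expected` with the kernel's output
    np.testing.assert_allclose(expected, ref(x, y), rtol=1e-7)
print("all", len(CL_TO_NUMPY), "reference checks ran")
```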

@zachjweiner (Collaborator, Author) commented Jun 28, 2021

> One thing I'd like to have is testing for mixed dtypes, to make sure this works as intended.

The intention being to raise when mixed dtypes are passed, or to cast the arguments to the proper dtype in CL code? (I'm guessing the latter - so should we have the CL math functions cast their arguments in all cases?)

@zachjweiner (Collaborator, Author):

The pytato failure was just a spurious CondaHTTPError, but we can just wait for the next pushed commit.

@zachjweiner (Collaborator, Author):

@inducer, I implemented type casting in ExpressionToOpenCLCExpressionMapper and added a test (and fixed two small bugs I encountered along the way).

```python
def wrap_in_typecast_lazy(self, actual_dtype, needed_dtype, s):
    if needed_dtype.dtype.kind == "b" and actual_dtype().dtype.kind == "f":
        _actual = actual_dtype()
```
inducer (Owner):

I think the idea behind wrap_in_typecast_lazy was to only do type inference (which is what's behind the actual_dtype callable) if it is clear from the circumstances that it is needed (and to avoid it in the majority of cases). Since this calls type inference unconditionally, I think wrap_in_typecast_lazy has outlived its usefulness (it's only used in three spots anymore), and we should probably just get rid of it in favor of always using wrap_in_typecast.

@kaushikcfd, do you agree?

kaushikcfd (Collaborator):

Yup, I don't see any issues in axing the lazy version.

zachjweiner (Collaborator, Author):

Cool, I can take care of it here unless you object.

inducer (Owner):

Thanks for offering, that'd be great!

zachjweiner (Collaborator, Author):

Done. I don't think it introduced any additional failures - sorry it won't be easy to tell from the CI runs. (Figured it'd be easier to just remove it now before debugging the rest.)

@zachjweiner (Collaborator, Author) commented Jun 28, 2021

The failures are

  1. due to ExpressionToCExpressionMapper.map_power not propagating the required types for the arguments to pow. I've fixed this (not yet pushed).
  2. string-match failures (from newly-introduced casting, e.g. in a doctest)
  3. and, most importantly: I think code generation misspecifies LHS types for Lookups. Specifically, generate_assignment_instruction_code asks for the aggregate's dtype, which, as in test_struct_assignment, tries to cast the RHS to the struct dtype. I was at first unsure how to appropriately modify get_var_descriptor (naively passing it the Lookup itself doesn't work), but never mind: there's no need, since generate_assignment_instruction_code can just ask for the dtype afterward.

Also, (2) (and (3), I guess) pose a question: should this PR apply type casting so broadly? Before, type casting was almost entirely left to be implicit (so far as I can tell). But the ways I can think of to tell the ExpressionToOpenCLCExpressionMapper to only apply casting for function arguments feel super clunky.

@inducer (Owner) commented Jun 28, 2021

I'm not in principle against making casts explicit in the code, with the caveat that I'd like the SNR of the generated code to remain reasonably high. If the code gets overwhelmed by casts, it's no longer useful to look at.

The other caveat is cost: I would hope that implicit casts and explicit casts have the same cost.

Does your code insert casts in places where you feel they're "unreasonable" to insert?

Comment on lines 508 to 529
```python
clbl = self.codegen_state.ast_builder.known_callables["pow"]
clbl = clbl.with_types({0: tgt_dtype, 1: exponent_dtype},
        self.codegen_state.callables_table)[0]

self.codegen_state.seen_functions.add(
        SeenFunction(
            clbl.name, clbl.name_in_target,
            (base_dtype, exponent_dtype),
            (tgt_dtype,)))
return var(clbl.name_in_target)(self.rec(expr.base, type_context),
        self.rec(expr.exponent, type_context))

common_dtype = np.find_common_type(
        [], [dtype.numpy_dtype for id, dtype in clbl.arg_id_to_dtype.items()
             if (id >= 0 and dtype is not None)])
from loopy.types import NumpyType
dtype = NumpyType(common_dtype)
inner_type_context = dtype_to_type_context(
        self.kernel.target, dtype)

return var(clbl.name_in_target)(
        self.rec(expr.base, inner_type_context, dtype),
        self.rec(expr.exponent, inner_type_context, dtype))
```
@zachjweiner (Collaborator, Author) commented Jun 28, 2021:

I feel like this code is duplicating logic and could be replaced with

return self.rec(
    var("pow")(expr.base, expr.exponent),
    type_context
)

but I get key errors for "pow" in self.codegen_state.callables_table. (The same holds if I keep the seen_function business.) Not sure what the deal is here; I'd appreciate any guidance!

Edit: sorry, the code highlighting makes this unclear - this is ExpressionToCExpressionMapper.map_power.

inducer (Owner):

Have you tried your version with clbl.name_in_target? I don't think pow has a chance of getting turned into a ResolvedFunction at this stage:

loopy/loopy/symbolic.py, lines 817 to 836 at ef7c411:

```python
class ResolvedFunction(LoopyExpressionBase):
    """
    A function identifier whose definition is known in a :mod:`loopy` program.
    A function is said to be *known* in a :class:`~loopy.TranslationUnit` if its
    name maps to an :class:`~loopy.kernel.function_interface.InKernelCallable`
    in :attr:`loopy.TranslationUnit.callables_table`. Refer to :ref:`func-interface`.

    .. attribute:: function

        An instance of :class:`pymbolic.primitives.Variable` or
        :class:`loopy.library.reduction.ReductionOpFunction`.
    """

    init_arg_names = ("function", )

    def __init__(self, function):
        if isinstance(function, str):
            function = p.Variable(function)
        from loopy.library.reduction import ReductionOpFunction
        assert isinstance(function, (p.Variable, ReductionOpFunction))
        self.function = function
```

The way I see it, the known_callables["pow"] hacks around that.

IMO, recursing on the resolved callable might work, but I'm also not hating the current version.

Maybe @kaushikcfd can add some perspective, too.

@kaushikcfd (Collaborator) commented Jul 5, 2021:

> I don't think pow has a chance of getting turned into a ResolvedFunction at this stage:

Yep, that's my assessment too, which is why we had the known_callables["pow"] lying around.

> `self.rec(expr.base, inner_type_context, dtype),`

Sorry, maybe I'm missing the context here, but why do we need the type casts? Isn't C's type-promotion along with the name_in_target enough?

zachjweiner (Collaborator, Author):

First, let me link to my comment here (for some reason, I couldn't reply to this thread at the time).

> why do we need the type casts?

The multi-arg math functions in CL refuse to promote in the case of mixed-type inputs, raising compilation errors about ambiguous input types.
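A toy illustration of the fix (the `cast_args` helper and the dtype-to-C-name table are invented for this sketch; loopy's actual mapper works on expression trees, not strings):

```python
import numpy as np

C_NAME = {np.dtype(np.float32): "float", np.dtype(np.float64): "double"}

def cast_args(func_name, arg_dtypes):
    # Pick a common dtype and emit explicit casts on every argument,
    # since OpenCL's overloaded math builtins reject mixed-type
    # argument lists instead of promoting them.
    common = np.result_type(*arg_dtypes)
    casted = [f"({C_NAME[common]}) arg{i}" for i in range(len(arg_dtypes))]
    return f"{func_name}({', '.join(casted)})"

print(cast_args("fmax", [np.dtype(np.float32), np.dtype(np.float64)]))
# → fmax((double) arg0, (double) arg1)
```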

kaushikcfd (Collaborator):

Aah, I see - I missed that (GH was not allowing me to reply to this either; waiting a couple of minutes resolved it, strange). Thanks for explaining! I'm fine with this approach too, but is there any reason the way we handled powf(32|64) wasn't chosen for the other multi-arg callables? Is it because we would end up with too many if-elses, leading to unmaintainable code?

zachjweiner (Collaborator, Author):

In my view (just from working on this PR), those preambles for pow were a bit redundant, since with_types already (nominally) implements the logic for type promotion---and it's more generic, e.g., with respect to half-precision types. I imagine powf32/powf64 arose for historical reasons, since (unlike the other math builtins) power is its own symbolic primitive.

@kaushikcfd (Collaborator) commented Jul 6, 2021:

> and it's more generic, e.g., to half precision types, etc

Agreed that this is more general. I would have loved it if there weren't (float) (0.0f), but I guess we could improve on it later.

> powf32/powf64 arose for historical reasons

Those arose just to solve the same issue (7e37f5e), but it wasn't handled quite as neatly.

zachjweiner (Collaborator, Author):

> I would have loved it if there weren't (float) (0.0f)

Me as well, but at the moment the decision to cast occurs before recursing on the constant itself (at which point a 0 could be replaced with 0.0f).

@zachjweiner (Collaborator, Author):

Just pushed one more fix for pow that should resolve everything but the string matching ones.

> Does your code insert casts in places where you feel they're "unreasonable" to insert?

I'll report back!

@zachjweiner (Collaborator, Author):

> Does your code insert casts in places where you feel they're "unreasonable" to insert?

I looked at all instances of casting within test_loopy and test_target. While none of the output code looks obnoxious or unreadable (there are only 52 and 34 instances amongst the tests in each respective file), the common patterns are:

  • casting literals---even things like (float)(1.0f)---which is silly
  • assignments like int i = 0; float x = (float)(i), which is less offensive but still unnecessary
  • casting conditions inside ternaries to char (shouldn't type inference ask for a bool?)

(Also, there's one instance of int j = (int)(i / 256).)

If we want to tell wrap_in_typecast to skip these, the obstacle is that wrap_in_typecast doesn't know whether its input is, e.g., an argument to a binary function (even literals may need to be cast here). One would want the mapper methods to make the decision about how to handle child nodes.
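The kind of parent-driven decision described above can be sketched with a toy emitter (all names here are invented for illustration and are not loopy's API):

```python
import numpy as np

# Toy sketch of "context-aware" casting: the parent expression decides
# whether a child literal needs a cast, rather than a context-free
# wrap_in_typecast that casts unconditionally.

C_NAME = {np.dtype(np.float32): "float", np.dtype(np.float64): "double"}

def emit_literal(value, dtype, wanted=None):
    s = f"{value}f" if dtype == np.dtype(np.float32) else str(value)
    if wanted is not None and dtype != wanted:
        return f"({C_NAME[wanted]}) {s}"  # the parent (a math call) asked
    return s

def emit_call(name, args):
    # args: list of (value, dtype) pairs; promote, then push the wanted
    # dtype down to each child so only call arguments get cast
    wanted = np.result_type(*[d for _, d in args])
    return f"{name}({', '.join(emit_literal(v, d, wanted) for v, d in args)})"

# a bare literal (e.g. in an assignment) gets no cast...
print(emit_literal(1.0, np.dtype(np.float32)))
# ...but the same literal as an fmax argument does
print(emit_call("fmax", [(1.0, np.dtype(np.float32)),
                         (0.5, np.dtype(np.float64))]))
```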

@inducer (Owner) commented Jul 1, 2021

Thanks for taking the time to describe the instances of casting. While some aren't ideal, I can live with all of them.

> casting conditions inside ternaries to char (shouldn't type inference ask for a bool?)

bool is an odd duck in CL, in that sizeof(bool) is unspecified. That makes it a bit of a hot potato for the type system to deal with, and it just sidesteps the problem in the way you saw.

@zachjweiner force-pushed the add-more-opencl-math-builtins branch from cbbad75 to 6c896ef on July 1, 2021 at 13:35
@zachjweiner (Collaborator, Author) commented Jul 1, 2021

Cool. 6c896ef should fix the remaining failures. Sorry for yet again making the tutorial more verbose... if such prominent and slightly embarrassing casting is a bother, I am happy to refine things now. Otherwise, feel free to ping me if you ever want to revisit this.

> bool is an odd duck in CL

Thanks for explaining!

@zachjweiner (Collaborator, Author):

It seems that, by promoting host-side device scalars to array args, #453 now prompts the executor to set _lpy_encountered_numpy to True for host scalars. The test added here ran into that, fixed in 93a3399.

@inducer (Owner) left a comment:

Thanks! Looks great. Just a few minor things left.


doc/tutorial.rst (Outdated)

```diff
@@ -560,8 +560,8 @@ Consider this example:
     #define lid(N) ((int) get_local_id(N))
     ...
     for (int i_outer = 0; i_outer <= -1 + (15 + n) / 16; ++i_outer)
-      for (int i_inner = 0; i_inner <= (-16 + n + -16 * i_outer >= 0 ? 15 : -1 + n + -16 * i_outer); ++i_inner)
+      for (int i_inner = 0; i_inner <= ((char)(-16 + n + -16 * i_outer >= 0) ? 15 : -1 + n + -16 * i_outer); ++i_inner)
        a[16 * i_outer + i_inner] = 0.0f;
```
inducer (Owner):

Looking at them now, these char casts are a bit hard to bear. Any chance we can have type inference return the right bool-like type (maybe get it from the target) to avoid these casts being emitted?

zachjweiner (Collaborator, Author):

Sure thing. It would be correct to cast these to bool in CL so long as they aren't written to global arrays, right?

(And you wouldn't rather just omit the cast entirely? I wouldn't at all mind revising the type casting logic to allow for more specialization, it's just a matter of choosing how to do so.)

inducer (Owner):

> It would be correct to cast these to bool in CL so long as they aren't written to global arrays, right?

Reading/writing GlobalArgs and reading ValueArgs would be the main way in which bools could find their way in. Are you saying you're thinking of adding hooks to all those paths?

> And you wouldn't rather just omit the cast entirely?

Of course! :)

zachjweiner (Collaborator, Author):

Ah, sorry, I misread your first comment here as "cast to bool instead of char." I'll give it (it being something that omits the casting) a stab soon.

zachjweiner (Collaborator, Author):

I addressed this by first adding the class attribute use_int8_for_bool to TargetBase (let me know if you'd prefer it just on CFamilyTarget) and adding it to CFamilyTarget's hash and comparison fields. Type inference queries this to decide whether to return int8 or bool8 (replacing the prior np.int32, which seemed wrong?). Same modification for ExpressionToCExpressionMapper.map_if.

zachjweiner (Collaborator, Author):

(Sorry for my repeated failure to type correctly; should be good now.)

@zachjweiner (Collaborator, Author):

For some reason I can't reply directly to the comment about map_power: yeah, I had tried clbl.name_in_target first (it's also just "pow").

I observe that seen_functions are only used in preamble generation - they don't affect the callables table. I tried patching CallablesResolver to record having seen pow without transforming it into an actual ResolvedFunction call (otherwise powers with integer-typed arguments would map to calls to pow and fail). But then it won't land in the callables table 😣, so I give up. I did remove the bits adding pow to seen_functions in the non-integer case, where it's unneeded, in f5116af.

@zachjweiner (Collaborator, Author) commented Jul 2, 2021

Prompted by flake8 complaining about an unused variable, in b4e76d2 I (thought I) corrected the argument types specified for the generated integer-power function. This led to the pytato failure. But I don't imagine either the current implementation or my change is consistent with numpy's type promotion, right? Should the input types be chosen in the same way as, e.g., other OpenCLCallables, via numpy.find_common_type and the like? (Not to append yet another change to this PR...)

@inducer (Owner) commented Jul 5, 2021

> Should the input types be chosen in the same way as, e.g., other OpenCLCallables, via numpy.find_common_type and the like? (Not to append yet another change to this PR...)

I'd say so... but isn't that what your code is currently doing?

Also, any idea what's causing the (seeming) integer overflows in pytato?

@zachjweiner (Collaborator, Author):

> I'd say so... but isn't that what your code is currently doing?

In OpenCLCallable.emit_types, yes, but int_pow doesn't go that route. (I wasn't sure if the current casting strategy was some intentional integer arithmetic hackery.) I pushed 2a961eb which implements the change I proposed and resolves the pytato failure (which I realized was caused by the same casting issue after my comment last week). I think the old implementation would have failed on integer powers whose result is larger than the max value of the base's type. (numpy promotes the args of things like <uint64> * <int8> to float64 to avoid overflow.)
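The promotion claim in the last sentence is easy to confirm in numpy:

```python
import numpy as np

# No integer dtype can represent both the uint64 and int8 ranges, so
# numpy promotes their combination to float64 (avoiding silent
# wraparound in mixed signed/unsigned arithmetic).
print(np.result_type(np.uint64, np.int8))   # float64
print((np.uint64(7) * np.int8(6)).dtype)    # float64
```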

@zachjweiner force-pushed the add-more-opencl-math-builtins branch from b6a46f6 to 37ac61e on July 6, 2021 at 16:10
@zachjweiner (Collaborator, Author):

Just a conda connection reset error... strangely not the first time today on the "without arg check" job. Mind rerunning it?

@inducer (Owner) commented Jul 6, 2021

Invited you to the repo, so that you can do that yourself. :)

@zachjweiner (Collaborator, Author):

I seem to have very bad luck with it, so thanks =D

@zachjweiner (Collaborator, Author):

Hi @inducer, just wanted to ping this - not to rush or anything - but just to clarify that I believe everything we've discussed has been addressed and this is ready for review.

(Also, just FYI, the CI job that runs twice to test caching seems to fairly consistently hit conda HTTP errors on the first attempt triggered by pushes.)

@inducer (Owner) commented Jul 16, 2021

Thanks for working on this. I see you've taken the approach of pushing the bool -> int8 mapping "up" into type inference. That really worries me, because now our type system is "blind" to Booleans (depending on which target is in use). I'd really like to avoid that, it seems like a liability. Could you explain what drove this decision? Do you see alternatives?

@zachjweiner (Collaborator, Author):

Point taken re: type inference being blind to true Booleans. I guess I'm unsure how else we would use the target to inform type inference such that the casts to char aren't emitted (if I'm understanding your comment here - maybe you had something else in mind?).

To me this feels like another argument that type casting should be "context-aware" (i.e., informed by the parent expression), on top of wanting to cast literals only inside function calls. The alternatives I can imagine require similarly invasive modifications to the expression-to-C/CL mappers.


Successfully merging this pull request may close: "OpenCL targets aren't aware of some math built-ins" (#437)

3 participants