-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support upper
and lower
in strings_udf
#12099
Support upper
and lower
in strings_udf
#12099
Conversation
std::int64_t flags_table, | ||
std::int64_t cases_table, | ||
std::int64_t special_table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious why these are not void*
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me make sure all is right here. Each of the functions that return a pointer to an input mapping table returns its own type of pointer. For the flags table its uint8_t*
, for the cases table its uint16_t
. For the special case mapping table it's a special_case_mapping*
. Would the correct thing to do in this case be to receive each of these as a uintptr_t
in the cython and then carry them through the python into the lowering as a np.uintp
? Then these shim functions could accept a uintptr_t
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more correct for these to be pointer types than int64_t
types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can interact with ctypes I think you can carry around a ctypes.cvoidp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of nits around cross-calling ABI and type-punning pointers and integers.
|
||
extern "C" __device__ int lower(int* nb_retval, | ||
void* udf_str, | ||
void* const* st, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a void **
or can it just be a void *
(I note that inside you cast to string_view *
and then dereference, so I think you can strip a *
everywhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow here. This follows the pattern from the rest of the shim functions when a string_view
is an arg. IIUC st
is pointing directly to the struct, so only one level of pointing right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most functions here (e.g.
cudf/python/strings_udf/cpp/src/strings/udf/shim.cu
Lines 186 to 193 in 418cbc6
extern "C" __device__ int pyisalnum(bool* nb_retval, void const* str, std::uintptr_t chars_table) | |
{ | |
auto str_view = reinterpret_cast<cudf::string_view const*>(str); | |
*nb_retval = is_alpha_numeric( | |
reinterpret_cast<cudf::strings::detail::character_flags_table_type*>(chars_table), *str_view); | |
return 0; | |
} |
string_view
take the argument as void const *
, not void* const *
. Why is this different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah you're right! updated this. Nice catch
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @brandon-b-miller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving C++ code.
some test failures here that seem to be related to a previous PR that I am looking into now. |
I am thinking these CI failures might be transient. Going to |
rerun tests |
@gpucibot merge |
This PR adds support for the following two functions in
strings_udf
:str.upper()
str.lower()
Part of #9639