Support upper and lower in strings_udf #12099

Merged

brandon-b-miller
Contributor

This PR adds support for the following two functions in strings_udf:

  • str.upper()
  • str.lower()

Part of #9639

Comment on lines 278 to 280
std::int64_t flags_table,
std::int64_t cases_table,
std::int64_t special_table)
Contributor

Just curious why these are not void* as well?

Contributor Author

@brandon-b-miller brandon-b-miller Nov 15, 2022

Let me make sure I have this right. Each of the functions that returns a pointer to an input mapping table returns its own pointer type: for the flags table it's a uint8_t*, for the cases table it's a uint16_t*, and for the special case mapping table it's a special_case_mapping*. Would the correct thing to do in this case be to receive each of these as a uintptr_t in the Cython and then carry them through the Python into the lowering as a np.uintp? Then these shim functions could accept a uintptr_t here.

Contributor

@davidwendt davidwendt Nov 15, 2022

I think it would be more correct for these to be pointer types than int64_t types.

Contributor

If you can interact with ctypes, I think you can carry around a ctypes.c_void_p.
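
For readers following this thread, here is a minimal host-side sketch of the round trip under discussion: a typed table pointer is erased into a std::uintptr_t, carried around as a plain integer, and restored to its typed pointer where it is used. The names demo_flags_table, get_flags_table_handle, and lookup_flag are hypothetical and are not part of strings_udf; the real shims are `__device__` functions and the real tables live in device memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a character flags table (assumption: the real
// table is a uint8_t array in device memory; this one is an ordinary host array).
static std::uint8_t const demo_flags_table[4] = {1, 0, 1, 1};

// Erase the typed pointer into an integer wide enough to hold any object pointer.
// This is the value that would travel through Cython and Python.
std::uintptr_t get_flags_table_handle()
{
  return reinterpret_cast<std::uintptr_t>(demo_flags_table);
}

// On the receiving side, restore the typed pointer before using it.
std::uint8_t lookup_flag(std::uintptr_t flags_table, int index)
{
  auto const* table = reinterpret_cast<std::uint8_t const*>(flags_table);
  assert(table != nullptr);
  return table[index];
}
```

Unlike std::int64_t, std::uintptr_t is defined to hold any object pointer value and to convert back to an equal pointer, which is why it is the more natural integer type when the value has to pass through a layer that cannot represent C pointers.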

Contributor

@wence- wence- left a comment

A bunch of nits around cross-calling ABI and type-punning pointers and integers.

python/strings_udf/strings_udf/_lib/tables.pyx (outdated, resolved)
python/strings_udf/cpp/src/strings/udf/shim.cu (outdated, resolved)

extern "C" __device__ int lower(int* nb_retval,
void* udf_str,
void* const* st,

Contributor

Does this need to be a void** or can it just be a void*? (I note that inside you cast to string_view* and then dereference, so I think you can strip a * everywhere.)

Contributor Author

I am not sure I follow here. This follows the pattern from the rest of the shim functions when a string_view is an arg. IIUC st is pointing directly to the struct, so there is only one level of indirection, right?

Contributor

@wence- wence- Nov 15, 2022

Most functions here (e.g.

extern "C" __device__ int pyisalnum(bool* nb_retval, void const* str, std::uintptr_t chars_table)
{
  auto str_view = reinterpret_cast<cudf::string_view const*>(str);
  *nb_retval    = is_alpha_numeric(
    reinterpret_cast<cudf::strings::detail::character_flags_table_type*>(chars_table), *str_view);
  return 0;
}

) that take a pointer to string_view take the argument as void const*, not void* const*. Why is this different?

Contributor Author

Ah, you're right! Updated this. Nice catch.
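
To make the distinction in this thread concrete, here is a small host-side sketch contrasting the two parameter types. It assumes a hypothetical demo_view struct standing in for cudf::string_view; none of these names come from the PR. A void const* points directly at a read-only object, while a void* const* points at a (const) pointer that in turn points at the object, so it needs one more dereference.

```cpp
#include <cstddef>

// Hypothetical stand-in for cudf::string_view, just enough for the demo.
struct demo_view {
  char const* data;
  std::size_t size;
};

// One level of indirection: the argument points directly at the struct.
std::size_t size_from_void_const_ptr(void const* str)
{
  auto const* view = static_cast<demo_view const*>(str);
  return view->size;
}

// Two levels of indirection: the argument points at a pointer to the struct,
// so it has to be dereferenced once before it can be cast back.
std::size_t size_from_void_ptr_const_ptr(void* const* str)
{
  auto const* view = static_cast<demo_view const*>(*str);
  return view->size;
}

// Small usage helper: both call styles observe the same object.
bool both_styles_agree()
{
  demo_view text{"upper", 5};
  void* erased = &text;
  return size_from_void_const_ptr(&text) == 5 &&
         size_from_void_ptr_const_ptr(&erased) == 5;
}
```

Declaring a parameter as void* const* while treating it as a direct pointer to the struct can still work through reinterpret_cast, but the signature then misdescribes what the caller passes, which is the mismatch the review comment points out.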

python/strings_udf/cpp/src/strings/udf/udf_apis.cu (outdated, resolved)
python/strings_udf/strings_udf/lowering.py (outdated, resolved)
python/strings_udf/cpp/src/strings/udf/shim.cu (outdated, resolved)
@brandon-b-miller
Contributor Author

rerun tests

Contributor

@wence- wence- left a comment

Contributor

@davidwendt davidwendt left a comment

Approving C++ code.

@galipremsagar added the "5 - Ready to Merge" label (testing and reviews complete, ready to merge) and removed the "3 - Ready for Review" label (ready for review by team) on Nov 16, 2022
@brandon-b-miller
Contributor Author

There are some test failures here that seem to be related to a previous PR; I am looking into them now.

@brandon-b-miller
Contributor Author

I am thinking these CI failures might be transient. Going to rerun tests just to make sure, as things seem to pass locally for me.

@brandon-b-miller
Contributor Author

rerun tests

@brandon-b-miller
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit aa13b95 into rapidsai:branch-22.12 Nov 17, 2022
Labels: 5 - Ready to Merge, feature request, non-breaking, numba, Python