[smart_holder] Add a new return value policy return_as_bytes
#3838
This isn't necessary though as we already have bindings for 'py::bytes |
// test return_value_policy::return_as_bytes | ||
m.def( | ||
"invalid_utf8_string_as_bytes", | ||
[]() { return std::string("\xba\xd0\xba\xd0"); }, |
[]() { return std::string("\xba\xd0\xba\xd0"); }, | |
[]() { return py::bytes(std::string("\xba\xd0\xba\xd0")); }, |
is all that was ever needed.
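For context on why this test string matters: the bytes \xba\xd0\xba\xd0 are not valid UTF-8, so they cannot be decoded into a Python str. The following is a minimal, standard-library-only sketch of a structural UTF-8 check. The function name is_valid_utf8 is hypothetical and not part of pybind11, and the check deliberately ignores overlong encodings and surrogate code points. It rejects the test string because the first byte, 0xBA, is a continuation byte with no lead byte before it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of pybind11): structural UTF-8 check only.
// It verifies lead/continuation byte patterns; it does not reject overlong
// encodings or surrogate code points.
inline bool is_valid_utf8(const std::string &s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0; // number of continuation bytes expected
        if (lead < 0x80) {
            extra = 0; // 0xxxxxxx: ASCII
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1; // 110xxxxx
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; // 1110xxxx
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; // 11110xxx
        } else {
            return false; // stray continuation byte (10xxxxxx) or invalid lead
        }
        if (extra > s.size() - i - 1) {
            return false; // sequence is cut off at the end of the string
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false; // expected a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

Swapping the byte order to \xd0\xba\xd0\xba gives two well-formed two-byte sequences, which is why the test uses the reversed order to force a decoding failure.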
// test return_value_policy::return_as_bytes | ||
m.def( | ||
"invalid_utf8_string_array_as_bytes", | ||
[]() { return std::array<std::string, 1>{{"\xba\xd0\xba\xd0"}}; }, |
[]() { return std::array<std::string, 1>{{"\xba\xd0\xba\xd0"}}; }, | |
[]() { return std::array<py::bytes, 1>{{"\xba\xd0\xba\xd0"}}; }, |
would solve this use case, no?
Better yet, you could return the py::array directly or use py::memoryview::from_buffer
If you are ever unhappy with the behavior of casters, you can always just return the Python objects themselves to the same effect.
This is for the PyCLIF pybind11 code generator. You're right, for a simple
That is a transformation we'd have to make automatically and recursively. In contrast, the core of this PR is super simple:
It got us from ~97.5% success to ~98.5%, i.e. it was the top-most-important fix by far that we needed. (We're in the "long tail" phase of the project.) |
We really need this; it is essentially a 6-line change.
My biggest concern is that this is orthogonal to the other return_value_policy values. I.e., we may want to combine this flag with reference_internal etc. at some point in the future. Also, this is only valid for a single underlying type (the std::string caster), which seems wrong. I agree we probably need a better way to handle this, but hacking the caster like this doesn't seem right. Maybe we need a proxy wrapper that automatically triggers this modified caster behavior through a templated function?
What we really need is a way to override the value_conv and key_conv behavior of these templated types. Abusing return_value_policy for this has a terrible code smell.
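As a rough illustration of what overriding the element conversion could look like, here is a toy model that uses only the standard library. It is a sketch under stated assumptions, not pybind11 code: toy_caster, as_str, and as_bytes are hypothetical names, and a "Python object" is modelled simply as its repr string. The string-conversion strategy is a template parameter that the container casters forward to their key_conv and value_conv.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical conversion strategies: produce a Python-like repr string.
struct as_str {
    static std::string convert(const std::string &s) { return "'" + s + "'"; }
};
struct as_bytes {
    static std::string convert(const std::string &s) { return "b'" + s + "'"; }
};

// Toy caster (not pybind11's type_caster). Only the specializations below exist.
template <typename T, typename StringConv = as_str>
struct toy_caster;

// Leaf: std::string is converted by the chosen strategy.
template <typename StringConv>
struct toy_caster<std::string, StringConv> {
    static std::string cast(const std::string &s) { return StringConv::convert(s); }
};

// Sequence: forwards the strategy to the element caster.
template <typename V, typename StringConv>
struct toy_caster<std::vector<V>, StringConv> {
    using value_conv = toy_caster<V, StringConv>;
    static std::string cast(const std::vector<V> &src) {
        std::string out = "[";
        for (std::size_t i = 0; i < src.size(); ++i) {
            if (i != 0) {
                out += ", ";
            }
            out += value_conv::cast(src[i]);
        }
        return out + "]";
    }
};

// Mapping: forwards the strategy to both the key and the value caster.
template <typename K, typename V, typename StringConv>
struct toy_caster<std::map<K, V>, StringConv> {
    using key_conv = toy_caster<K, StringConv>;
    using value_conv = toy_caster<V, StringConv>;
    static std::string cast(const std::map<K, V> &src) {
        std::string out = "{";
        bool first = true;
        for (const auto &kv : src) {
            if (!first) {
                out += ", ";
            }
            first = false;
            out += key_conv::cast(kv.first) + ": " + value_conv::cast(kv.second);
        }
        return out + "}";
    }
};
```

In this model the choice between str and bytes travels through the nested types at compile time, so it never has to share a slot with the ownership policy.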
Hm ... could you explain more? This PR is taking a very simple path to achieve the desired behavior. Could it be even simpler? |
Taking a simple path is not always the correct path. If you have a clear meaning for return value policy (which is not to control the conversion of types), then abusing it to do what you want here can spell disaster down the road. For example, it might make refactoring impossible, and it is hard to document and confusing to newcomers. I think it's worth investigating to see if there's a way around it without making it a return_value_policy.
> This PR is taking a very simple path to achieve the desired behavior. Could it be even simpler?
[]() { return std::array<py::bytes, 1>{{"\xba\xd0\xba\xd0"}}; }, and specify to have it return by reference, by value, or by copy. Since the return value policy is an enum and not a flag, we cannot easily combine this return value policy with the other ones, which are mutually exclusive to one another. The real issue here is that we have no way to disambiguate casters except by doing the cast ourselves in the lambda. This is normally trivial, but becomes non-trivial for container or other variant types. We could of course fix that with another caster which changes the behavior of the current caster.
This exposes three issues with our current :
We could just add a special wrapper that triggers an alternative version of the stl_casters, but I don't think that would solve your problem, since presumably your issue is actually ABSL or other container types? The easiest solution is probably to call the caster directly in the lambda with some modified optional args. We could also abstract the list_caster, array_caster, map_caster, and set_caster further to allow for templates that modify the key_conv and value_conv. Another solution would probably be to make this another extra arg, specified in the def() block, that specializes the casters or allows the user to specify their own. @henryiii @wjakob I would love to hear your thoughts on how to best solve this use case / ambiguity.
include/pybind11/cast.h
Outdated
handle s = decode_utfN(buffer, nbytes); | ||
handle s; | ||
if (policy == return_value_policy::return_as_bytes) { | ||
s = PYBIND11_BYTES_FROM_STRING_AND_SIZE(buffer, nbytes); |
PyBytes_FromStringAndSize
The old macro is unfinished Python 2 cleanup. For new code like this it's best to use the Python 3 C API directly.
include/pybind11/detail/common.h
Outdated
reference_internal | ||
reference_internal, | ||
|
||
/** Use this policy to make C++ functions return bytes to Python instead of |
A bit of word-smithing:
With this policy, C++ string types are converted to Python bytes, instead of str. This is most useful when a C++ function returns a container-like type with nested C++ string types, and py::bytes cannot be applied easily. Note that this return_value_policy is not concerned with lifetime/ownership semantics, like the other policies, but the purpose of return_as_bytes is certain to be orthogonal, because C++ strings are always copied to Python bytes or str.
C++ strings are always copied to Python bytes or str.
What about a dictionary that has byte strings as keys and references to a C++ Object as values? This still would not work even with the wordsmithing.
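The dictionary case can be sketched with a simplified model. Everything below is hypothetical and does not reflect pybind11's actual definitions: single_policy stands in for one enum with mutually exclusive values, and split_options stands in for one possible alternative that keeps ownership and string conversion in separate fields.

```cpp
// Hypothetical model 1: one enum, so a call site can pick exactly one value.
enum class single_policy { copy, reference_internal, return_as_bytes };

constexpr bool keeps_reference(single_policy p) {
    return p == single_policy::reference_internal;
}
constexpr bool wants_bytes(single_policy p) {
    return p == single_policy::return_as_bytes;
}

// Hypothetical model 2: ownership and string conversion are separate choices.
enum class ownership_policy { copy, reference_internal };
enum class string_conv { str, bytes };

struct split_options {
    ownership_policy ownership;
    string_conv strings;
};

constexpr bool keeps_reference(split_options o) {
    return o.ownership == ownership_policy::reference_internal;
}
constexpr bool wants_bytes(split_options o) {
    return o.strings == string_conv::bytes;
}

// With the single enum, asking for bytes gives up the reference semantics.
static_assert(wants_bytes(single_policy::return_as_bytes)
                  && !keeps_reference(single_policy::return_as_bytes),
              "one enum value cannot express both requests");
```

A dict with bytes keys and referenced C++ values needs both properties at once, which only the second model can state.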
Yes, agreed.
But see my longer comment from a couple minutes ago.
This is a very good analysis, thanks Aaron! For strategic reasons, we cannot afford to make this a huge general project at the moment; we have to approach this with a long-term view: we still have to prove that pybind11 actually works for Google, by successfully integrating it into PyCLIF, i.e. we have to get from 98.5% to 100%. Once we've cleared that hurdle, we can devote more time to bigger projects for pybind11 itself, like generalizing the return_value_policy concept.
Will anyone ever need something more general? Will that just be over-engineering? I don't know. With one enum and |
Maybe it's best for this to live in another branch and merge in once it's cleaned up and up to 100%? |
I would also be okay if it's hidden by some IFDEF flag. I just really don't want this becoming a part of the public API that we have to support later, like the PYTHON2 compatibility macros.
I could easily maintain this in the smart_holder branch if you prefer. pybind11 will not work for PyCLIF without this, just like pybind11 won't work for PyCLIF without
My totally personal bet: nobody will ever have enough motivation (time/money) to generalize |
Actually @wangxf123456, for internal usages, does this return_value_policy work with dicts? Or does it only work with sequences and sets?
include/pybind11/detail/common.h
Outdated
be applied easily. Note that this return_value_policy is not concerned | ||
with lifetime/ownership semantics, like the other policies, but the | ||
purpose of return_as_bytes is certain to be orthogonal, because C++ | ||
strings are always copied to Python bytes or str. */ |
Can we mark this as experimental and likely to change in the future?
@wangxf123456 Please also add here that this is for sequences only: #3838 (comment)
This should only work with sequences and sets. For example, return types like |
@wangxf123456 @rwgk I also found some C++ template magic to rebind the type of (STL) container types: https://stackoverflow.com/a/32214640/2444240 We might be able to use a trick like this to recursively change the Return type parameter to disambiguate how we should return std::string. The TL;DR is that we need a variant of the std::string caster that prefers to output bytes, and we need a way to signal that it should be called. One idea is to change the behavior of out_cast with a special tag, like we do for is_operator. This is actually pretty easy, but would involve setting a static variable, which I would like to avoid. We need some way to pass that info into the type caster or query that extra arg from inside the std::string caster. There may also be a way to abuse the polymorphic_type_hook to do this. I am by no means an expert on C++ templating idioms, but I feel like there has to be a better way to do this than abusing the return type.
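A minimal sketch of the rebinding idea, using only the standard library: a trait that walks a nested container type and replaces every std::string with a marker type. The names rebind_strings and bytes_string are hypothetical, and this is not taken from pybind11 or from the linked answer. It assumes containers are class templates whose parameters are all types, with std::array handled separately because of its non-type size parameter.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical marker type meaning "convert this string to Python bytes".
struct bytes_string {
    std::string value;
};

// Default: leave the type unchanged.
template <typename T>
struct rebind_strings {
    using type = T;
};

// Base case: std::string becomes the marker type.
template <>
struct rebind_strings<std::string> {
    using type = bytes_string;
};

// Keep const while rebinding (needed for std::pair<const Key, Value> inside
// std::map's allocator parameter).
template <typename T>
struct rebind_strings<const T> {
    using type = const typename rebind_strings<T>::type;
};

// Recursive case: any class template whose parameters are all types. This
// also rewrites comparator and allocator parameters, so the result matches
// the container's default template arguments.
template <template <typename...> class C, typename... Args>
struct rebind_strings<C<Args...>> {
    using type = C<typename rebind_strings<Args>::type...>;
};

// std::array has a non-type parameter, so it needs its own specialization.
template <typename T, std::size_t N>
struct rebind_strings<std::array<T, N>> {
    using type = std::array<typename rebind_strings<T>::type, N>;
};

static_assert(std::is_same<rebind_strings<std::array<std::string, 1>>::type,
                           std::array<bytes_string, 1>>::value,
              "std::array<std::string, 1> is rebound element-wise");
```

A code generator could then pick a bytes-producing caster for bytes_string, which moves the choice into the return type instead of the return value policy. Whether this fits pybind11's existing caster lookup is an open question that this sketch does not answer.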
I'd much rather @Skylion007's suggestion be attempted. This is adding a misuse of the return policies that doesn't even cover all cases like dicts, or complex types. It does not combine with other return types (since they are disjoint conceptual features). If we add it, it will be impossible to pull out (just see our "private" compatibility macros!). Just because it's easy doesn't make it right. If it is really, really needed and attempts to do it properly have failed, then the name should start with an underscore, and probably have a warning in the code that it is not guaranteed to be kept in the future.
FYI: I'm systematically combing through include/pybind11 to see how the one new enum + if could disturb things, or set up traps. I'll report here when I'm done. Could take a few more days. (Our focus is on larger scale issues that we need to sort out to get to 100% success rate for PyCLIF + pybind11.) |
I’m not worried about the current code. I’m worried about breaking the mental and programmatic model of return value policies and casters. |
That time might be better spent trying @Skylion007’s ideas above. |
What c++ standard are you targeting in PyCLIF? |
Data first. Also priorities. I don't want the big project to die the death of a thousand cuts, getting sidetracked with too many side issues.
C++17 required (already). But I want to keep the smart_holder branch compatible with master. (I spent many hours keeping it compatible with all the old compilers.) |
If it helps: I've already said I'd be okay with it if it has a private name ( |
for more information, see https://pre-commit.ci
…cy is not available on master.
… and eigen.h Based on systematic review under pybind#3838 (comment)
return_as_bytes
…ely to pre-empt repeat trips through the CI).
Thanks, I added the underscore. I'll merge this now on the smart_holder branch, to keep the PyCLIF-pybind11 integration work on track. If @wjakob supports having this on master, I'll back-port. |
pybind#3838)" This reverts commit 7064d43. Conflicts resolved in: include/pybind11/eigen.h tests/test_builtin_casters.cpp
Description
Add a new return value policy return_as_bytes to make C++ functions return bytes to Python instead of str. We can convert return values to bytes by applying py::bytes, but this might be hard when dealing with nested types.