-
-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow "p" in Py_BuildValue #89488
Comments
We should allow Py_BuildValue to take a "p" argument that takes a C int and produces a Python Bool so it would be symmetric with the p format code for PyArg_Parse that takes a Python object and returns a C int containing PyObject_IsTrue(obj). |
There was no much need of this feature. In rare cases when we needed to build a bool in Py_BuildValue (I have found only 2 cases in the stdlib, and one of them is added by me) we use the following idiom: Py_BuildValue("...O...", ..., expr ? Py_True : Py_False, ...) You can have a temptation to write it as Py_BuildValue("...p...", ..., expr, ...) but there is a catch -- the arguments should be a C int. If it is not, you can break a stack. Explicit cast to int ("(int)expr") is not always correct, so you will need to write "expr ? 1 : 0" which is not much better than "expr ? Py_True : Py_False". |
I would be interested to see how the "p" format could be used in existing code. I found a few places would benefit of it. Either create a new draft PR, or put it in the same PR 28634.
ok = RAND_bytes((unsigned char*)PyBytes_AS_STRING(bytes), len);
if (ok == 0 || ok == 1)
return Py_BuildValue("NO", bytes, ok == 1 ? Py_True : Py_False);
return Py_BuildValue(
"{sksssssssisi"
"sOssssssss"
"}",
"id", cipher_id,
"name", cipher_name,
"protocol", cipher_protocol,
"description", buf,
"strength_bits", strength_bits,
"alg_bits", alg_bits
,"aead", aead ? Py_True : Py_False,
"symmetric", skcipher,
"digest", digest,
"kea", kx,
"auth", auth
); runargs = PyTuple_Pack(2, module, set_argv0 ? Py_True : Py_False); stream = _PyObject_CallMethodId(io, &PyId_open, "isisssO", fd, mode,
buffering, encoding, errors,
newline, closefd ? Py_True : Py_False); |
I prefer to reach an agreement first here before I add more things to the PR |
Is true that this is not very common, but I have find this in several occasions writing extension code and given the small impact and maintenance cost and the symmetry with PyArg_Parse I figured that could be useful |
I never understood how "..." arguments work under the hood. What happens if you pass a double, it is stored as a double on the C stack, and then Py_BuildValue() will read junk data? |
The va_start macro just sets up a pointer to the first function parameter, e.g.:- void func (int a, ...)
{
// va_start
char *p = (char *) &a + sizeof a;
} which makes p point to the second parameter. The va_arg macro does this:- void func (int a, ...)
{
// va_start
char *p = (char *) &a + sizeof a;
// va_arg
int i1 = *((int *)p);
p += sizeof (int);
// va_arg
int i2 = *((int *)p);
p += sizeof (int); // va_arg |
AFAIK, it is complicated. On old computer primitive compilers just pushed arguments one by one on the stack (in platform-depending order). When you push 64-bytes double and read 32-bit int, you get a junk not only for this read, but for reads of all following or preceding arguments. Modern compilers on modern platforms can use registers to pass variable arguments. I do not know details, but doubles and ints are passed using different registers, so the difference can be even larger. My concern is that such feature can provoke bugs. The risk can exceed the minor benefit. |
You can see how is implemented on am64 for example here: |
But isn't that risk the same for other formatting parameters? |
Or a type that promotes to int. If you pass a C short or char, or a C++ bool, it is implicitly promoted to int.
Or alternatively "!!expr"
I had to write that recently after reading through the Py_BuildValue docs twice to make sure I wasn't missing a format code I could use. It's a strange omission, especially because 'p' exists for PyArg_Parse. I'd very much like to see this change. |
I spotted three other uses in the stdlib: raw = PyObject_CallFunction(RawIO_class, "OsOO",
path_or_fd, rawmode,
closefd ? Py_True : Py_False,
opener);
wrapper = PyObject_CallFunction((PyObject *)&PyTextIOWrapper_Type,
"OsssO",
buffer,
encoding, errors, newline,
line_buffering ? Py_True : Py_False); if (PySys_Audit("sqlite3.enable_load_extension",
"OO", self, onoff ? Py_True : Py_False) < 0) {
return NULL;
} |
The leftmost argument of the ternary is an int for every example that Victor and I found in the stdlib, so no casting would be required in any of these cases. |
I think that the risk for other formatting parameters is smaller, because you know, that there are different formatting parameters for different integer and floating point types, and for pointers, and you know that you should care about truncation and overflow if the type of the argument is different from the type of the parameter. But I am not in strict opposition against this feature. It is just that I have fixed so many bugs, that I try to search potential flaws and drawbacks in any new feature. Now you know about this, and you can decide whether the benefit is larger than the risk of potential damages. To me they look equally small. Technically PR 28634 LGTM. |
IMO such problem can be solved with documentation. Pablo: can you try to explain the problem in the documentation, and maybe provide an example in the doc showing how to avoid it? I guess that a generic fix is to replace "value" with "value != 0". In C, "expr != 0" is an int, no? What I understood is that Py_BuildValue("p", value) is problematic if value type is not int. "!value" or "!!value" also has the issue if I understood correctly. Like:
I agree with Serhiy that Py_BuildValue() is ok-ish with other formats, but IMO it's the responsibility of the developer to pass the expect type (int for "p" format). This problem is not specific to Py_BuildValue(). printf() also has exactly the same issue. Such code has an undefined behavior: long long x = 1; print("%d\n", x); You must cast explicitly: long long x = 1; print("%d\n", (int)x); Or use a different format, like: long long x = 1; print("%lld\n", x); |
Thanks, Serhiy, for your insights and careful consideration. Among many things, your attention to detail, experience and careful consideration are what I admire the most! You rock :) I will look at the stdlib examples and will incorporated them into the PR and then I will carefully consider the change again. Thanks for the comments! |
No, just as "value != 0" is an int, so is "!value". |
I stumbled upon the PR and tend to agree with Serhiy that this could be a source of nasty bugs. Passing the wrong type to a va_arg() argument is undefined behavior (e.g. https://wiki.sei.cmu.edu/confluence/display/c/EXP47-C.+Do+not+call+va_arg+with+an+argument+of+the+incorrect+type), and it does seem easy to do that by accident for an int-like type. For printf() compilers implement custom checks that make sure the types are right, but we don't get that luxury with Py_BuildValue(). It's also easy to make a bool yourself with the O code and Py_True/Py_False, so I don't think the additional code is worth the risk. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: