Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow "p" in Py_BuildValue #89488

Open
pablogsal opened this issue Sep 29, 2021 · 18 comments
Open

Allow "p" in Py_BuildValue #89488

pablogsal opened this issue Sep 29, 2021 · 18 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@pablogsal
Copy link
Member

BPO 45325
Nosy @vstinner, @serhiy-storchaka, @JelleZijlstra, @pablogsal, @godlygeek
PRs
  • bpo-45325: Add a new 'p' parameter to Py_BuildValue to convert an integer into a Python bool #28634
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-09-29.18:08:04.356>
    labels = []
    title = 'Allow "p" in Py_BuildValue'
    updated_at = <Date 2022-02-01.03:59:42.971>
    user = 'https://github.com/pablogsal'

    bugs.python.org fields:

    activity = <Date 2022-02-01.03:59:42.971>
    actor = 'JelleZijlstra'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2021-09-29.18:08:04.356>
    creator = 'pablogsal'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45325
    keywords = ['patch']
    message_count = 18.0
    messages = ['402897', '402899', '402900', '402901', '402902', '402904', '402910', '402915', '402917', '402918', '402921', '402923', '402924', '402938', '402941', '402951', '402965', '412234']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'serhiy.storchaka', 'JelleZijlstra', 'pablogsal', 'godlygeek']
    pr_nums = ['28634']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue45325'
    versions = []

    @pablogsal
    Copy link
    Member Author

    We should allow Py_BuildValue to take a "p" argument that takes a C int and produces a Python Bool so it would be symmetric with the p format code for PyArg_Parse that takes a Python object and returns a C int containing PyObject_IsTrue(obj).

    @serhiy-storchaka
    Copy link
    Member

    There was no much need of this feature. In rare cases when we needed to build a bool in Py_BuildValue (I have found only 2 cases in the stdlib, and one of them is added by me) we use the following idiom:

       Py_BuildValue("...O...", ..., expr ? Py_True : Py_False, ...)

    You can have a temptation to write it as

       Py_BuildValue("...p...", ..., expr, ...)

    but there is a catch -- the arguments should be a C int. If it is not, you can break a stack. Explicit cast to int ("(int)expr") is not always correct, so you will need to write "expr ? 1 : 0" which is not much better than "expr ? Py_True : Py_False".

    @vstinner
    Copy link
    Member

    I would be interested to see how the "p" format could be used in existing code. I found a few places would benefit of it. Either create a new draft PR, or put it in the same PR 28634.

    Modules/_itertoolsmodule.c:

        return Py_BuildValue("O(N)(OO)", Py_TYPE(lz), it, lz->saved, Py_True);
    
    return Py_BuildValue("O(O)(OO)", Py_TYPE(lz), lz->it, lz->saved,
                         lz->firstpass ? Py_True : Py_False);
    

    Modules/_ssl.c:

            ok = RAND_bytes((unsigned char*)PyBytes_AS_STRING(bytes), len);
            if (ok == 0 || ok == 1)
                return Py_BuildValue("NO", bytes, ok == 1 ? Py_True : Py_False);
    
        return Py_BuildValue(
            "{sksssssssisi"
            "sOssssssss"
            "}",
            "id", cipher_id,
            "name", cipher_name,
            "protocol", cipher_protocol,
            "description", buf,
            "strength_bits", strength_bits,
            "alg_bits", alg_bits
            ,"aead", aead ? Py_True : Py_False,
            "symmetric", skcipher,
            "digest", digest,
            "kea", kx,
            "auth", auth
           );

    Modules/main.c:

        runargs = PyTuple_Pack(2, module, set_argv0 ? Py_True : Py_False);

    Objects/fileobject.c:

        stream = _PyObject_CallMethodId(io, &PyId_open, "isisssO", fd, mode,
                                     buffering, encoding, errors,
                                     newline, closefd ? Py_True : Py_False);

    @pablogsal
    Copy link
    Member Author

    Either create a new draft PR, or put it in the same PR 28634.

    I prefer to reach an agreement first here before I add more things to the PR

    @pablogsal
    Copy link
    Member Author

    There was no much need of this feature. In rare cases when we needed to build a bool in Py_BuildValue (I have found only 2 cases in the stdlib, and one of them is added by me) we use the following idiom:

    Is true that this is not very common, but I have find this in several occasions writing extension code and given the small impact and maintenance cost and the symmetry with PyArg_Parse I figured that could be useful

    @vstinner
    Copy link
    Member

    but there is a catch -- the arguments should be a C int. If it is not, you can break a stack.

    I never understood how "..." arguments work under the hood. What happens if you pass a double, it is stored as a double on the C stack, and then Py_BuildValue() will read junk data?

    @pablogsal
    Copy link
    Member Author

    The va_start macro just sets up a pointer to the first function parameter, e.g.:-

     void func (int a, ...)
     { 
       // va_start
       char *p = (char *) &a + sizeof a;
     }

    which makes p point to the second parameter. The va_arg macro does this:-

     void func (int a, ...)
     { 
       // va_start
       char *p = (char *) &a + sizeof a;
    
       // va_arg
       int i1 = *((int *)p);
       p += sizeof (int);
    
       // va_arg
       int i2 = *((int *)p);
       p += sizeof (int);

    // va_arg
    long i2 = *((long *)p);
    p += sizeof (long);
    }

    @serhiy-storchaka
    Copy link
    Member

    What happens if you pass a double, it is stored as a double on the C stack, and then Py_BuildValue() will read junk data?

    AFAIK, it is complicated. On old computer primitive compilers just pushed arguments one by one on the stack (in platform-depending order). When you push 64-bytes double and read 32-bit int, you get a junk not only for this read, but for reads of all following or preceding arguments.

    Modern compilers on modern platforms can use registers to pass variable arguments. I do not know details, but doubles and ints are passed using different registers, so the difference can be even larger.

    My concern is that such feature can provoke bugs. The risk can exceed the minor benefit.

    @pablogsal
    Copy link
    Member Author

    You can see how is implemented on am64 for example here:

    https://blog.nelhage.com/2010/10/amd64-and-va_arg/

    @pablogsal
    Copy link
    Member Author

    My concern is that such feature can provoke bugs. The risk can exceed the minor benefit.

    But isn't that risk the same for other formatting parameters?

    @godlygeek
    Copy link
    Mannequin

    godlygeek mannequin commented Sep 29, 2021

    but there is a catch -- the arguments should be a C int

    Or a type that promotes to int. If you pass a C short or char, or a C++ bool, it is implicitly promoted to int.

    so you will need to write "expr ? 1 : 0"

    Or alternatively "!!expr"

    which is not much better than "expr ? Py_True : Py_False"

    I had to write that recently after reading through the Py_BuildValue docs twice to make sure I wasn't missing a format code I could use. It's a strange omission, especially because 'p' exists for PyArg_Parse. I'd very much like to see this change.

    @godlygeek
    Copy link
    Mannequin

    godlygeek mannequin commented Sep 29, 2021

    I spotted three other uses in the stdlib:

    Modules/_io/_iomodule.c

            raw = PyObject_CallFunction(RawIO_class, "OsOO",
                                        path_or_fd, rawmode,
                                        closefd ? Py_True : Py_False,
                                        opener);
    
        wrapper = PyObject_CallFunction((PyObject *)&PyTextIOWrapper_Type,
                                        "OsssO",
                                        buffer,
                                        encoding, errors, newline,
                                        line_buffering ? Py_True : Py_False);

    Modules/_sqlite/connection.c

        if (PySys_Audit("sqlite3.enable_load_extension",
                        "OO", self, onoff ? Py_True : Py_False) < 0) {
            return NULL;
        }

    @godlygeek
    Copy link
    Mannequin

    godlygeek mannequin commented Sep 29, 2021

    The leftmost argument of the ternary is an int for every example that Victor and I found in the stdlib, so no casting would be required in any of these cases.

    @serhiy-storchaka
    Copy link
    Member

    But isn't that risk the same for other formatting parameters?

    I think that the risk for other formatting parameters is smaller, because you know, that there are different formatting parameters for different integer and floating point types, and for pointers, and you know that you should care about truncation and overflow if the type of the argument is different from the type of the parameter.

    But I am not in strict opposition against this feature. It is just that I have fixed so many bugs, that I try to search potential flaws and drawbacks in any new feature. Now you know about this, and you can decide whether the benefit is larger than the risk of potential damages. To me they look equally small.

    Technically PR 28634 LGTM.

    @vstinner
    Copy link
    Member

    I think that the risk for other formatting parameters is smaller, because you know, that there are different formatting parameters for different integer and floating point types, and for pointers, and you know that you should care about truncation and overflow if the type of the argument is different from the type of the parameter.

    IMO such problem can be solved with documentation.

    Pablo: can you try to explain the problem in the documentation, and maybe provide an example in the doc showing how to avoid it?

    I guess that a generic fix is to replace "value" with "value != 0". In C, "expr != 0" is an int, no?

    What I understood is that Py_BuildValue("p", value) is problematic if value type is not int.

    "!value" or "!!value" also has the issue if I understood correctly. Like:

    long value = 3; Py_BuildValue("p", !value);
    

    I agree with Serhiy that Py_BuildValue() is ok-ish with other formats, but IMO it's the responsibility of the developer to pass the expect type (int for "p" format).

    This problem is not specific to Py_BuildValue(). printf() also has exactly the same issue. Such code has an undefined behavior:

    long long x = 1; print("%d\n", x);

    You must cast explicitly:

    long long x = 1; print("%d\n", (int)x);

    Or use a different format, like:

    long long x = 1; print("%lld\n", x);

    @pablogsal
    Copy link
    Member Author

    But I am not in strict opposition against this feature. It is just that I have fixed so many bugs, that I try to search potential flaws and drawbacks in any new feature. Now you know about this, and you can decide whether the benefit is larger than the risk of potential damages. To me they look equally small.

    Thanks, Serhiy, for your insights and careful consideration. Among many things, your attention to detail, experience and careful consideration are what I admire the most! You rock :)

    I will look at the stdlib examples and will incorporated them into the PR and then I will carefully consider the change again.

    Thanks for the comments!

    @godlygeek
    Copy link
    Mannequin

    godlygeek mannequin commented Sep 30, 2021

    "!value" or "!!value" also has the issue if I understood correctly.

    No, just as "value != 0" is an int, so is "!value".

    @JelleZijlstra
    Copy link
    Member

    I stumbled upon the PR and tend to agree with Serhiy that this could be a source of nasty bugs. Passing the wrong type to a va_arg() argument is undefined behavior (e.g. https://wiki.sei.cmu.edu/confluence/display/c/EXP47-C.+Do+not+call+va_arg+with+an+argument+of+the+incorrect+type), and it does seem easy to do that by accident for an int-like type. For printf() compilers implement custom checks that make sure the types are right, but we don't get that luxury with Py_BuildValue(). It's also easy to make a bool yourself with the O code and Py_True/Py_False, so I don't think the additional code is worth the risk.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 23, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants