Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling Python functions from HPy #122

Open
hodgestar opened this issue Nov 20, 2020 · 10 comments
Open

Calling Python functions from HPy #122

hodgestar opened this issue Nov 20, 2020 · 10 comments

Comments

@hodgestar
Copy link
Contributor

We need to pick some APIs to provide for calling Python functions from HPy.

The existing C API has a large zoo of methods for making functions calls. The zoo is documented at https://docs.python.org/3/c-api/call.html#object-calling-api.

Under the hood there are two main calling conventions -- tp_call and vectorcall. I propose that we largely ignore these and instead focus on what fits well with the Python language syntax and what is convenient to call from C.

Most Pythonic

Of the zoo, PyObject_Call most clearly matches a generic f(*args, **kw) in Python, and I propose that we adapt is as follows for HPy:

/* Call a callable Python object, with arguments given by the tuple args, and named arguments
    given by the dictionary kwargs.

    ctx: The HPy execution context.
    args: May be HPy_NULL if no positional arguments need to be passed. Otherwise it should be
             a handle to a tuple.
    kwargs: May be HPy_NULL if no keywords arguments need to be passed. Otherwise it should be
                 a handle to a dictionary of keyword arguments.

    Return the result of the call on success, or raise an exception and return HPy_NULL on failure.

    This is the equivalent of the Python expression: callable(*args, **kwargs).
 */
HPy HPy_Call(HPyContext ctx, HPy callable, HPy args, HPy kwargs);    

Note that this differs from PyObject_Call in that args may be NULL. I'm not sure why this was a requirement in the existing API.

A some point in the future we might want to implement Py_BuildValue to allow tuples of C values to be constructed easily (and maybe even something similar for easily constructing dictionaries from C values).

Most C-like (Most Scenic? :)

PyObject_VectorcallDict closely matches the signature we chose for HPyFunc_KEYWORDS methods, but while this is convenient for receiving arguments from Python, I'm not convinced it's a great fit for calling Python functions from C because one has to construct a dictionary of keyword arguments.

PyObject_Vectorcall takes only the names of the keyword arguments (as a tuple), which seems slightly more convenient.

All of the vectorcall methods have the strange behaviour that nargs also functions as a flag selecting whether args[-1] (or args[0] for PyObject_VectorcallMethod) may temporarily be overwritten by the called function. I propose that we ensure we do NOT copy this behaviour.

The best suggestion I have at the moment for a convenient C-like calling convention is:

/* Call a callable Python object with positional and keyword arguments given by a C arrays.

    ctx: The HPy execution context.
    args: An array of positional arguments. May be NULL if there are no positional arguments.
    nargs; The number of positional arguments.
    kwnames: An array of keyword argument names given as UTF8 strings. May be NULL if there
                     are no keyword arguments.
    kwargs: An array of keyword argument values. May be NULL if there are no keyword arguments.
    nkwargs: The number of keyword arguments.

    Return the result of the call on success, or raise an exception and return HPy_NULL on failure.

    This is the equivalent of the Python expression: callable(*args, **kwargs).
 */
HPy HPy_CallArray(HPyContext ctx, HPy *args, HPy_ssize_t nargs, const char **kwnames, HPy *kwargs, HPy_ssize_t nkwargs);

I'm uncertain whether this is to big a departure from existing APIs or whether there are better options.

Other considerations

  • We should also implement HPyCallable_Check at the same time.
  • We'll probably want to implement HPy_CallMethod and HPy_CallMethodArray, but I propose that we decide on these first and then ensure those match later.
@hodgestar
Copy link
Contributor Author

@mattip kindly grep'ed pandas, scipy and numpy for us to see which calling conventions they use. The full results are at https://paste.ubuntu.com/p/3xs6dC5Gy9/. The summary is that the most frequent calls are PyObject_Call, PyObject_CallObject, PyObject_CallMethod, PyObject_CallFunctionObjArgs.

@arigo
Copy link
Member

arigo commented Nov 20, 2020

HPy_CallArray() could take 4 arguments instead of 5, if we NULL-terminate the kwnames array in C. This is probably cleaner and matches the general usage of some other const char ** arguments in the CPython C API.

@antocuni
Copy link
Collaborator

Some random considerations:

  1. as a general rule, we need to keep in mind that we want to make migration to HPy easy: so, I'm +1 to clean up the API but we need to take care not to clean it up "too much"

  2. we need for sure a call function which completely matches what we receive as arguments: people need it when they want to forward (*args, **kwargs) to another function. I suspect that a number of PyObject_Call uses are because of that (but I didn't check)

  3. the best candidate for point (2) seems to be PyObject_VectorCallDict, so I think this should be included.

  4. how do we call it? I don't like HPyCall_Array because it seems too close to numpy names like PyArray_xxx. We might just keep calling it HPy_VectorCallDict: if anything, it makes it very easy for people to know what it corresponds to.

  5. an alternative is to say that vector call is the new default, so: PyObject_Call ==> HPy_LegacyCall, PyObject_VectorCall ==> HPy_Call, PyObject_VectorCallDict ==> HPy_CallDict

  6. re "VectorCall strange behavior of changing args[0]": we need to understand why they do this, whether it's useful and if we don't get into problems if we disallow it.

  7. from matti's list, I see that CallFunction and CallMethod are used relatively often so I fear we have provide a variant which takes a format string, which is also the easiest to call from C

  8. (for the future): since day 1, we have been thinking of introducing the equivalent of argument-clinic: i.e., people write functions with a C signature and HPy takes care of the argument parsing. In this scenario, we might want to make it possible for people to call functions with a specific signature in mind (similar to HPy_GetAttr_s and HPy_GetItem_i). Something like HPy_Call_iif to call a function which takes (long, long, double) ?

@hodgestar
Copy link
Contributor Author

hodgestar commented Dec 21, 2020

I've opened PR #122 to implement HPy_Call, HPy_CallObject and HPyCallable_Check. Once that lands, the plan is to write two new issues for:

  • HPy_VectorCallXXX (or whatever we end up calling it)
  • HPy_CallFunction and HPy_CallMethod

and summarize what is left to do in each of those.

We don't have an urgent need for either of those, so hopefully we can defer implementing them until after other things (e.g. the numpy port might suggest which other calling options are needed).

@antocuni
Copy link
Collaborator

@mattip kindly grep'ed pandas, scipy and numpy for us to see which calling conventions they use. The full results are at https://paste.ubuntu.com/p/3xs6dC5Gy9/. The summary is that the most frequent calls are PyObject_Call, PyObject_CallObject, PyObject_CallMethod, PyObject_CallFunctionObjArgs.

FWIW, the link in the pastebin is expired. Maybe for the future we should avoid using it in the issues.
@mattip, @hodgestar do you happen to still have it somewhere?

@antocuni
Copy link
Collaborator

Update as of 2021-11-29. In the meantime, we merged #147 and we added HPy_CallTupleDict as the most generic calling API, and basically the equivalent of the old PyObject_Call.

In the meantime, #251 aims to introduce an API to call functions by passing arguments as a C array of HPy args[]. This closely resembles CPython's Vectorcall. Let's discuss the details here.

I think that this should be the "blessed" and fastest calling API. I propose to just call it HPy_Call.

I also think that it should closely match the signature of our HPyFunc_KEYWORDS methods:

  1. for consistency
  2. to make it possible to easily forward arguments from one call to the other (i.e. the C equivalent of def foo(*args, **kw): bar(*args, **kw))
  3. to make it possible to call the underlying C function without having to pack/unpack/convert arguments. This should be extremely useful for alternative implementations.

Moreover, I think we should think more about how to play nicely with PyObject_VectorCall. This is the fastest calling convention offered by CPython, so ideally we should try to exploit it as much as possible. The first problem is keyword arguments:

  • HPyFunc_KEYWORDS receives a HPy kw which is a dict
  • Vectorcall receives a PyObject *kwnames, which is a tuple with just the names of the keywords

We should investigate whether we want to change HPyFunc_KEYWORDS to match Vectorcall; if we do this, on CPython we should implement them by using METH_FASTCALL instead of METH_VARARGS. This should make them faster on newer CPythons.

Another question is what to do with HPy_CallMethod. The first natural option would be this:

HPy args[] = {....};
HPy result = HPy_CallMethod(ctx, myobj, "method_name", args, nargs};

However, PyObject_VectorcallMethod uses a different convention and puts myobj as the first item of args. So, if we follow this convention it would look like this:

HPy args[] = {myobj, ...};
HPy result = HPy_CallMethod(ctx, "method_name", args, nargs);

I think this is a bit uglier from the API point of view, but it allows a more efficient implementation on CPython because we can implement HPy_CallMethod with a direct call to PyObject_VectorcallMethod.

Final point to consider: PEP 590/Vectorcall support a special flag called PY_VECTORCALL_ARGUMENTS_OFFSET: the TL;DR version is that under certain conditions it allows to temporarily mutate args[0]: I think it is designed to speed up the C equivalent of this code:

    def foo(self, *args, **kwargs):
        return some_other_obj.bar(*args, **kwargs)

So we need to decide whether it is something which we want to support as well. I couldn't find any real world usage so it's a bit hard to judge at the moment.

To summarize, I propose the following plan (but feel free to disagree and/or amend the proposal):

  1. Change the signature of HPyFunc_KEYWORDS to match Vectorcall
  2. Change HPyArg_Parse to take kwames instead of kwdict.
  3. Introduce the following functions:
HPy HPy_Call(HPy callable, HPy args[], HPy_ssize_t nargs, HPy kwnames); // ==> PyObject_VectorCall
HPy HPy_CallDict(HPy callable, HPy args[], HPy_ssize_t nargs, HPy kwdict); // ==> PyObject_VectorCallDict
HPy HPy_CallMethod(HPy name, HPy args[], HPy_ssize_t nargs, HPy kwdict); // ==> PyObject_VectorCallMethod
HPy HPy_CallMethod_s(const char *s, HPy args[], HPy_ssize_t nargs, HPy kwnames);
  1. CPython doesn't offer PyObject_VectorCallMethodDict; I don't know why nor whether we should or not, let's postpone the decision
  2. Postpone the decision about what to do w.r.t PY_VECTORCALL_ARGUMENTS_OFFSET until we have a clearer understanding of what is the use case

Points 1 and 2 should be done together. Point 3 can be done separately, probably by updating #251.

@mattip
Copy link
Contributor

mattip commented Nov 29, 2021

This seems reasonable. I think one of the design goals should be to do whatever is fastest on CPython, if it doesn't cause too many problems for the other interpreters, .

@hodgestar
Copy link
Contributor Author

1, 2 & 4 sound good to me.

I'm not sure what postponing the decision about PY_VECTORCALL_ARGUMENTS_OFFSET means concretely. Does it mean that initially HPy_CallMethod and HPy_CallMethod_s won't support the offset flag? Or that they initially will support the offset flag?

I'm happy with the call methods listed in 3.

@antocuni
Copy link
Collaborator

I'm not sure what postponing the decision about PY_VECTORCALL_ARGUMENTS_OFFSET means concretely. Does it mean that initially HPy_CallMethod and HPy_CallMethod_s won't support the offset flag? Or that they initially will support the offset flag?

It means that for now we just ignore it. From the CPython point of view, PY_VECTORCALL_ARGUMENTS_OFFSET is optional and if you never use it, CPython will assume that the args[] array is immutable and won't be able to take some fast-path. My point is precisely that I don't know/didn't understand in which cases these fast paths are actually useful and how often they would be hit.

Maybe we should just ask @markshannon who is the author of the PEP.
Mark, it's not necessary that you read the long discussion here but I would like to ask:

  1. from the point of view of a C extension developer, what is the intended use case for PY_VECTORCALL_ARGUMENTS_OFFSET?
  2. Can you think of any kind of code which would become unnecessarily slow if HPy doesn't support something like that?
  3. Are you aware of any 3rd party extension which already uses it?

@TeamSpen210
Copy link

It seems to me that the use case is that when you're allocating the args array, you prepend a NULL (or anything else that you happen to have there) then pass in this flag. If the callable is a method, it then doesn't have to allocate & copy the args array to add on the function. Here's a use in ceval.c when a with statement calls __exit__():

            PyObject *stack[4] = {NULL, exc, val, tb};
            res = PyObject_Vectorcall(exit_func, stack + 1,
                    3 | PY_VECTORCALL_ARGUMENTS_OFFSET, NULL);

So if you wanted to take advantage of this, one way would be to say have an API where the first element of the array is the callable, then the callee can temporarily swap that out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants