The title mentions two different problems but I am grouping them in the same issue since I think they can be solved together.
I am already working on a fix in PR #142 but I thought it was useful to report my findings here, for the future.
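First problem: consider ctx_CallRealFunctionFromTrampoline (hpy/hpy/universal/src/ctx_meth.c, lines 6 to 15 in fdc6047).
In CPython-ABI mode this is not a problem because _py2h and _h2py are just no-op casts, but in universal mode _py2h allocates new handles which are never freed (hpy/hpy/universal/src/handles.c, lines 137 to 152 in fdc6047).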
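To make the leak concrete, here is a minimal sketch with stand-in types. Everything in it (Obj, Handle, handle_new, handle_close, call_leaky, call_closing, the fixed-size table) is hypothetical and only mimics the shape of a list-based handle implementation; it is not the actual hpy.universal code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real hpy.universal types. */
typedef struct { int value; } Obj;          /* plays the role of PyObject */
typedef struct { ptrdiff_t _i; } Handle;    /* plays the role of HPy */

#define TABLE_SIZE 64
static Obj *table[TABLE_SIZE];   /* slot 0 is reserved for the null handle */
static size_t live_handles = 0;  /* number of slots currently in use */

/* Allocate a fresh slot for obj, as an index-based _py2h would. */
static Handle handle_new(Obj *obj) {
    for (ptrdiff_t i = 1; i < TABLE_SIZE; i++) {
        if (table[i] == NULL) {
            table[i] = obj;
            live_handles++;
            return (Handle){i};
        }
    }
    return (Handle){0};  /* table full: return the null handle */
}

/* Release the slot owned by h; closing the null handle is a no-op. */
static void handle_close(Handle h) {
    if (h._i > 0 && h._i < TABLE_SIZE && table[h._i] != NULL) {
        table[h._i] = NULL;
        live_handles--;
    }
}

typedef int (*real_func)(Handle self, Handle arg);

/* Leaky trampoline: the handles created for the call are never closed. */
static int call_leaky(real_func f, Obj *self, Obj *arg) {
    return f(handle_new(self), handle_new(arg));
}

/* Same call, but every handle opened for the call is closed afterwards. */
static int call_closing(real_func f, Obj *self, Obj *arg) {
    Handle h_self = handle_new(self);
    Handle h_arg = handle_new(arg);
    int result = f(h_self, h_arg);
    handle_close(h_arg);
    handle_close(h_self);
    return result;
}

/* Example "real function" that reads both objects through their handles. */
static int add_values(Handle self, Handle arg) {
    return table[self._i]->value + table[arg._i]->value;
}
```

In this sketch each call through call_leaky leaves two slots occupied, while call_closing leaves live_handles unchanged.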
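The proper solution is something like closing those handles once the real function has returned. However, there is a better solution, which leads to the second problem: the current implementation of handles is unnecessarily slow. I tried the ujson and piconumpy benchmarks: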
ujson: the universal ABI is ~15% slower than the CPython ABI
piconumpy: the universal ABI is ~35% slower than the CPython ABI
In PR #142 I am experimenting with a different approach for hpy.universal. In particular, _py2h and _h2py are implemented like this:
```c
// The main reason for +1/-1 is to make sure that if people cast HPy to
// PyObject* directly, things explode.
// Note: the +1 is applied to the integer value, not to the pointer, so that
// the -1 in _h2py is its exact inverse.
static inline HPy _py2h(PyObject *obj) {
    if (obj == NULL)
        return HPy_NULL;
    return (HPy){(HPy_ssize_t)obj + 1};
}

static inline PyObject *_h2py(HPy h) {
    if (HPy_IsNull(h))
        return NULL;
    return (PyObject *)(h._i - 1);
}
```
So, they are basically no-op casts again, and the benchmarks are much faster:
ujson: the universal ABI is ~0.9% slower than the CPython ABI
piconumpy: the universal ABI is ~6% slower than the CPython ABI
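The offset-by-one encoding can be tried in isolation. The following is a sketch with stand-in types (Obj, Handle, obj_to_handle, handle_to_obj and handle_is_odd are hypothetical names, not part of HPy); it assumes the offset is applied to the integer value of the pointer, so the two conversions are exact inverses and a handle never equals the raw object address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PyObject and HPy. */
typedef struct { long refcnt; } Obj;
typedef struct { intptr_t _i; } Handle;

/* Encode: the null pointer maps to the null handle, anything else to addr + 1. */
static Handle obj_to_handle(Obj *obj) {
    if (obj == NULL)
        return (Handle){0};
    return (Handle){(intptr_t)obj + 1};
}

/* Decode: the exact inverse of obj_to_handle. */
static Obj *handle_to_obj(Handle h) {
    if (h._i == 0)
        return NULL;
    return (Obj *)(h._i - 1);
}

/* Obj is aligned to at least 2 bytes, so a valid non-null handle is odd. */
static int handle_is_odd(Handle h) {
    return (h._i & 1) != 0;
}
```

Because a non-null handle is always odd, code that casts it straight to a pointer gets a misaligned address rather than a silently working one, which is the "things explode" property mentioned in the comment above.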
Historical note: why do we represent hpy.universal handles as indexes in a list? The original idea was to support the debug mode, so that we could easily store extra debugging info for each handle.
The idea that I am trying in PR #142 is different, i.e. to wrap a generic universal ctx into a debug ctx: debug handles are wrappers around generic opaque universal handles, and the extra info can be attached directly to the wrappers. Moreover, by doing that we pay the overhead of "heavy" handles only for the modules for which the debug mode is enabled.
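As a rough illustration of the wrapping idea, here is a sketch in which a debug handle carries an opaque universal handle plus some bookkeeping. All names (UHandle, DebugHandle, DebugCtx, debug_open, debug_unwrap, debug_close) and the specific extra fields are assumptions made for this example; the actual design in the PR may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opaque universal handle: the debug layer never looks inside. */
typedef struct { intptr_t _i; } UHandle;

/* Hypothetical debug handle: a wrapper plus extra debugging info. */
typedef struct {
    UHandle wrapped;        /* the underlying universal handle */
    const char *opened_by;  /* where the handle was created */
    int closed;             /* set once the handle has been closed */
} DebugHandle;

/* Hypothetical debug context: tracks how many debug handles are open. */
typedef struct {
    size_t open_count;
} DebugCtx;

/* Wrap a universal handle and record where it came from. */
static DebugHandle debug_open(DebugCtx *dctx, UHandle uh, const char *where) {
    dctx->open_count++;
    return (DebugHandle){uh, where, 0};
}

/* Recover the universal handle to pass on to the wrapped ctx. */
static UHandle debug_unwrap(const DebugHandle *dh) {
    return dh->wrapped;
}

/* Close a debug handle; returns 0 on success, -1 on a double close. */
static int debug_close(DebugCtx *dctx, DebugHandle *dh) {
    if (dh->closed)
        return -1;
    dh->closed = 1;
    dctx->open_count--;
    return 0;
}
```

Only code running under the debug ctx pays for the wrapper; modules using the plain universal ctx keep working with the lightweight handles directly.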
EDIT: fixed the PR number