diff --git a/Makefile b/Makefile index b4342a537..f18fdcc7a 100644 --- a/Makefile +++ b/Makefile @@ -73,5 +73,6 @@ docs-examples-tests: cd docs/examples/simple-example && python3 setup.py --hpy-abi=universal install cd docs/examples/mixed-example && python3 setup.py install cd docs/examples/snippets && python3 setup.py --hpy-abi=universal install + cd docs/examples/quickstart && python3 setup.py --hpy-abi=universal install cd docs/examples/hpytype-example && python3 setup.py --hpy-abi=universal install python3 -m pytest docs/examples/tests.py ${TEST_ARGS} diff --git a/docs/api-reference/inline-helpers.rst b/docs/api-reference/inline-helpers.rst index 96be1b315..2e719caf7 100644 --- a/docs/api-reference/inline-helpers.rst +++ b/docs/api-reference/inline-helpers.rst @@ -5,5 +5,10 @@ Inline Helpers Those functions are usually small convenience functions that everyone could write but in order to avoid duplicated effort, they are defined by HPy. +One category of inline helpers consists of functions that convert commonly used +C types without a fixed width, such as ``int`` or ``long long``, to the HPy API. +The HPy API always uses well-defined fixed-width types like ``int32`` or +``unsigned int8``. + .. autocmodule:: hpy/inline_helpers.h :members: diff --git a/docs/api.rst b/docs/api.rst index 3f5a1bd6f..36c39ac62 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -157,6 +157,7 @@ Moreover, ``HPyContext`` is used by the :term:`HPy Universal ABI` to contain a sort of virtual function table which is used by the C extensions to call back into the Python interpreter. +.. _simple example: A simple example ----------------- @@ -242,9 +243,11 @@ are still written using the ``Python.h`` API. Note that the HPy module does not specify its name. HPy does not support the legacy single phase module initialization and the only module initialization approach is -the multi-phase initialization (PEP 451). 
With multi-phase module initialization, -the name of the module is always taken from the ``ModuleSpec``, i.e., most likely -from the name used in the ``import {{name}}`` statement that imported your module. +the multi-phase initialization (`PEP 489 `_). +With multi-phase module initialization, +the name of the module is always taken from the ``ModuleSpec`` (`PEP 451 `_), +i.e., most likely from the name used in the ``import {{name}}`` statement that +imported your module. This is the only difference stemming from multi-phase module initialization in this simple example. @@ -320,30 +323,6 @@ table, which now becomes: :start-after: // BEGIN: methodsdef :end-before: // END: methodsdef -More Examples -------------- - -HPy usually has tests for each API function. This means that there is lots of -examples available by looking at the tests. However, the test source uses -many macros and is hard to read. To overcome this we supply a utility to -export clean C sources for the tests. Since the HPy tests are not shipped by -default, you need to clone the HPy repository from GitHub: - -.. code-block:: console - - > git clone https://github.com/hpyproject/hpy.git - -After that, install all test requirements and dump the sources: - -.. code-block:: console - - > cd hpy - > python3 -m pip install pytest filelock - > python3 -m pytest --dump-dir=test_sources test/ - -This will dump the generated test sources into folder ``test_sources``. Note, -that the tests won't be executed but skipped with an appropriate message. - Creating types in HPy --------------------- @@ -477,7 +456,7 @@ A type with ``.legacy_slots != NULL`` is required to have ``HPyType_BuiltinShape_Legacy`` and to include ``PyObject_HEAD`` at the start of its struct. It would be easy to relax this requirement on CPython (where the ``PyObject_HEAD`` fields are always present) but a large burden on other -implementations (e.g. PyPy, GraalPython) where a struct starting with +implementations (e.g. 
PyPy, GraalPy) where a struct starting with ``PyObject_HEAD`` might not exist. Types created via the old Python C API are automatically legacy types. @@ -567,3 +546,40 @@ be considered in three places: For more information about the built-in shape and for a technical explanation for why it is required, see :c:member:`HPyType_Spec.builtin_shape` and :c:enum:`HPyType_BuiltinShape`. + +More Examples +------------- + +The :doc:`porting-example/index` shows another complete example +of an HPy extension ported from the Python/C API. + +The `HPy project space `_ on GitHub +contains forks of some popular Python extensions ported to HPy as +proof-of-concept/feasibility studies, such as the +`Kiwi solver `_. +Note that those forks may not be up to date with their upstream projects +or with the upstream HPy changes. + +HPy unit tests +~~~~~~~~~~~~~~ + +HPy usually has tests for each API function. This means that there are lots of +examples available by looking at the tests. However, the test source uses +many macros and is hard to read. To overcome this we supply a utility to +export clean C sources for the tests. Since the HPy tests are not shipped by +default, you need to clone the HPy repository from GitHub: + +.. code-block:: console + + > git clone https://github.com/hpyproject/hpy.git + +After that, install all test requirements and dump the sources: + +.. code-block:: console + + > cd hpy + > python3 -m pip install pytest filelock + > python3 -m pytest --dump-dir=test_sources test/ + +This will dump the generated test sources into the folder ``test_sources``. Note +that the tests won't be executed but skipped with an appropriate message. diff --git a/docs/debug-mode.rst b/docs/debug-mode.rst index 8e465dd09..0f59bc948 100644 --- a/docs/debug-mode.rst +++ b/docs/debug-mode.rst @@ -43,7 +43,7 @@ Debug mode works *only* for extensions built with HPy universal ABI. To enable debug mode, use environment variable ``HPY``. 
If ``HPY=debug``, then all HPy modules are loaded with the trace context. Alternatively, it is also possible to specify the mode per module like this: -``HPY=modA:debug,modB=debug``. +``HPY=modA:debug,modB:debug``. In order to verify that your extension is being loaded in debug mode, use environment variable ``HPY_LOG``. If this variable is set, then all HPy diff --git a/docs/examples/hpytype-example/builtin_type.c b/docs/examples/hpytype-example/builtin_type.c index 087a89ab7..5ec749b69 100644 --- a/docs/examples/hpytype-example/builtin_type.c +++ b/docs/examples/hpytype-example/builtin_type.c @@ -82,7 +82,7 @@ static void make_Language(HPyContext *ctx, HPy module) } HPyDef_SLOT(simple_exec, HPy_mod_exec) -int simple_exec_impl(HPyContext *ctx, HPy m) { +static int simple_exec_impl(HPyContext *ctx, HPy m) { make_Dummy(ctx, m); if (HPyErr_Occurred(ctx)) return -1; diff --git a/docs/examples/hpytype-example/simple_type.c b/docs/examples/hpytype-example/simple_type.c index 9d336931b..0e857bc49 100644 --- a/docs/examples/hpytype-example/simple_type.c +++ b/docs/examples/hpytype-example/simple_type.c @@ -80,7 +80,7 @@ static HPyType_Spec Point_spec = { // BEGIN: add_type HPyDef_SLOT(simple_exec, HPy_mod_exec) -int simple_exec_impl(HPyContext *ctx, HPy m) { +static int simple_exec_impl(HPyContext *ctx, HPy m) { if (!HPyHelpers_AddType(ctx, m, "Point", &Point_spec, NULL)) { return -1; } diff --git a/docs/examples/quickstart/quickstart.c b/docs/examples/quickstart/quickstart.c new file mode 100644 index 000000000..d437bc4f3 --- /dev/null +++ b/docs/examples/quickstart/quickstart.c @@ -0,0 +1,39 @@ +// quickstart.c + +// This header file is the entrypoint to the HPy API: +#include "hpy.h" + +// HPy method: the HPyDef_METH macro generates some boilerplate code, +// the same code can be also written manually if desired +HPyDef_METH(say_hello, "say_hello", HPyFunc_NOARGS) +static HPy say_hello_impl(HPyContext *ctx, HPy self) +{ + // Methods take HPyContext, which must be 
passed as the first argument to + // all HPy API functions. Other than that, HPyUnicode_FromString does the + // same thing as PyUnicode_FromString. + // + // The HPy type represents a "handle" to a Python object, but may not be + // a pointer to the object itself. It should be fully "opaque" to the + // users. Try uncommenting the following two lines to see the difference + // from PyObject*: + // + // if (self == self) + // HPyUnicode_FromString(ctx, "Surprise? Try HPy_Is(ctx, self, self)"); + + return HPyUnicode_FromString(ctx, "Hello world"); +} + +static HPyDef *QuickstartMethods[] = { + &say_hello, // 'say_hello' generated for us by the HPyDef_METH macro + NULL, +}; + +static HPyModuleDef quickstart_def = { + .doc = "HPy Quickstart Example", + .defines = QuickstartMethods, +}; + +// The Python interpreter will create the module for us from the +// HPyModuleDef specification. Additional initialization can be +// done in the HPy_mod_exec slot +HPy_MODINIT(quickstart, quickstart_def) diff --git a/docs/examples/quickstart/setup.py b/docs/examples/quickstart/setup.py new file mode 100644 index 000000000..0d56aefec --- /dev/null +++ b/docs/examples/quickstart/setup.py @@ -0,0 +1,13 @@ +# setup.py + +from setuptools import setup, Extension +from os import path + +DIR = path.dirname(__file__) +setup( + name="hpy-quickstart", + hpy_ext_modules=[ + Extension('quickstart', sources=[path.join(DIR, 'quickstart.c')]), + ], + setup_requires=['hpy'], +) diff --git a/docs/examples/tests.py b/docs/examples/tests.py index 53a82cda5..a796a4874 100644 --- a/docs/examples/tests.py +++ b/docs/examples/tests.py @@ -43,6 +43,12 @@ def test_simple_type(): assert p.z == 2000 +def test_quickstart(): + import quickstart + assert quickstart.say_hello() == "Hello world" +# END: test_quickstart + + def test_builtin_type(): obj = builtin_type.Dummy("hello") assert obj == "hello" diff --git a/docs/index.rst b/docs/index.rst index 778d6cabc..addcb5aae 100644 --- a/docs/index.rst +++ 
b/docs/index.rst @@ -8,36 +8,58 @@ HPy: a better API for Python HPy provides a new API for extending Python in C. +There are several advantages to writing C extensions in HPy: + + - **Speed**: it runs much faster on PyPy and GraalPy, and at native speed on CPython + + - **Deployment**: it is possible to compile a single binary which runs unmodified on all + supported Python implementations and versions -- think "stable ABI" on steroids + + - **Simplicity**: it is simpler and more manageable than the ``Python.h`` API, both for + the users and the Pythons implementing it + + - **Debugging**: it provides an improved debugging experience. Debug mode can be turned + on at runtime without the need to recompile the extension or the Python running it. + The HPy design is more suitable for automated checks. + The official `Python/C API `_, also informally known as ``#include ``, is specific to the current implementation of CPython: it exposes a lot of -internal details which makes it hard: +internal details which makes it hard to: + + - implement it for other Python implementations (e.g. PyPy, GraalPy, + Jython, ...) + + - experiment with new approaches inside CPython itself, for example: - - to implement it for other Python implementations (e.g. PyPy, GraalPython, - Jython, IronPython, etc.) + - use a tracing garbage collector instead of reference counting + - remove the global interpreter lock (GIL) to take full advantage of multicore architectures + - use tagged pointers to reduce the memory footprint - - to experiment with new things inside CPython itself: e.g. using a GC - instead of refcounting, or to remove the GIL. 
+Where to go next: +----------------- -There are several advantages to write your C extension in HPy: + - Show me the code: - - it runs much faster on PyPy, GraalPython, and at native speed on CPython + - :doc:`Quickstart` + - :ref:`Simple documented HPy extension example` + - :doc:`Tutorial: porting Python/C API extension to HPy` - - it is possible to compile a single binary which runs unmodified on all - supported Python implementations and versions + - Details: - - it is simpler and more manageable than the ``Python.h`` API + - :doc:`HPy overview: motivation, goals, current status` + - :doc:`HPy API concepts introduction` + - :doc:`Python/C API to HPy Porting guide` + - :doc:`HPy API reference` - - it provides an improved debugging experience: in "debug mode", HPy - actively checks for many common mistakes such as reference leaks and - invalid usage of objects after they have been deleted. It is possible to - turn the "debug mode" on at startup time, without needing to recompile - Python or the extension itself +Full table of contents: +----------------------- .. toctree:: :maxdepth: 2 + quickstart overview api porting-guide diff --git a/docs/misc/index.rst b/docs/misc/index.rst index 6570684e1..5a57e5157 100644 --- a/docs/misc/index.rst +++ b/docs/misc/index.rst @@ -2,8 +2,6 @@ Misc notes ========== .. toctree:: - :maxdepth: 2 + :maxdepth: 1 - str-builder-api embedding - protocols diff --git a/docs/misc/protocols-code.c b/docs/misc/protocols-code.c deleted file mode 100644 index 0d94166ae..000000000 --- a/docs/misc/protocols-code.c +++ /dev/null @@ -1,55 +0,0 @@ -// BEGIN: foo -void iterate_objects() -{ - /* If the object is not a sequence, we might want to fall back to generic iteration. 
*/ - HPySequence seq = HPy_AsSequence(ctx, obj); - if (HPy_Sequence_IsError(seq)) - goto not_a_sequence; - HPy_Close(ctx, obj); /* we'll be using only 'seq' in the sequel */ - HPy_ssize_t len = HPy_Sequence_Len(ctx, seq); - for (int i=0; i`_ - for more discussion about the naming convention. - -.. note:: - The goal of the document is only to describe the current CPython API and - its real-world usage. For a discussion about how to design the equivalent - HPy API, see `issue #214 `_ - - -Current CPython API --------------------- - -Bytes -~~~~~ - -There are essentially two ways to build ``bytes``: - -1. Copy the content from an existing C buffer: - -.. code-block:: c - - PyObject* PyBytes_FromString(const char *v); - PyObject* PyBytes_FromStringAndSize(const char *v, Py_ssize_t len); - PyObject* PyBytes_FromFormat(const char *format, ...); - - -2. Create an uninitialized buffer and fill it manually: - -.. code-block:: c - - PyObject s = PyBytes_FromStringAndSize(NULL, size); - char *buf = PyBytes_AS_STRING(s); - strcpy(buf, "hello"); - -(1) is easy for alternative implementations and we can probably provide an HPy -equivalent without changing much, so we will concentrate on (2): let's call it -"raw-buffer API". - -Unicode -~~~~~~~ - -Similarly to ``bytes``, there are several ways to build a ``str``: - -.. code-block:: c - - PyObject* PyUnicode_FromString(const char *u); - PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size); - PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, Py_ssize_t size); - PyObject* PyUnicode_FromFormat(const char *format, ...); - PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar); - - -.. note:: - ``PyUnicode_FromString{,AndSize}`` take an UTF-8 string in input - -The following functions are used to initialize an uninitialized object, but I -could not find any usage of them outside CPython itself, so I think they can -be safely ignored for now: - -.. 
code-block:: c - - Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, Py_ssize_t length, Py_UCS4 fill_char); - Py_ssize_t PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, PyObject *from, Py_ssize_t from_start, Py_ssize_t how_many); - - -There are also a bunch of API functions which have been deprecated (see `PEP -623 `_ and `PEP 624 -`_) so we will not take them into -account. The deprecated functions include but are not limited to: - -.. code-block:: c - - PyUnicode_FromUnicode - PyUnicode_FromStringAndSize(NULL,...) // use PyUnicode_New instead - PyUnicode_AS_UNICODE - PyUnicode_AS_DATA - PyUnicode_READY - - -Moreover, CPython 3.3+ adopted a flexible string represenation (`PEP 393 -`_) which means that the underlying -buffer of ``str`` objects can be an array of 1-byte, 2-bytes or 4-bytes -characters (the so called "kind"). - -``str`` objects offer a raw-buffer API, but you need to call the appropriate -function depending on the kind, returning buffers of different types: - -.. code-block:: c - - typedef uint32_t Py_UCS4; - typedef uint16_t Py_UCS2; - typedef uint8_t Py_UCS1; - Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o); - Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o); - Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o); - - -Uninitialized unicode objects are created by calling ``PyUnicode_New(size, -maxchar)``, where ``maxchar`` is the maximum allowed value of a character -inside the string, and determines the kind. So, in cases in which ``maxchar`` -is known in advance, we can predict at compile time what will be the kind of -the string and write code accordingly. E.g.: - -.. code-block:: c - - // ASCII only --> kind == PyUnicode_1BYTE_KIND - PyObject *s = PyUnicode_New(size, 127); - Py_UCS1 *buf = PyUnicode_1BYTE_DATA(s); - strcpy(buf, "hello"); - - -.. 
note:: - CPython distinguishes between ``PyUnicode_New(size, 127)`` and - ``PyUnicode_New(size, 255)``: in both cases the kind is - ``PyUnicode_1BYTE_KIND``, but the former also sets a flag to indicate that - the string is ASCII-only. - -There are cases in which you don't know the kind in advance because you are -working on generic data. To solve the problem in addition to the raw-buffer -API, CPython also offers an "Opaque API" to write a char inside an unicode: - -.. code-block:: c - - int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, Py_UCS4 character) - void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, Py_UCS4 value) - -Note that the character to write is always ``Py_UCS4``, so -``_WriteChar``/``_WRITE`` have logic to do something different depending on -the kind. - -.. note:: - ``_WRITE`` is a macro, and its implementation contains a ``switch(kind)``: - I think it is designed with the explicit goal of allowing the compiler to - hoist the ``switch`` outside a loop in which we repeatedly call - ``_WRITE``. However, it is worth noting that I could not find any code - using it outside CPython itself, so it's probably something which we don't - need to care of for HPy. - - -Raw-buffer vs Opaque API ---------------------------- - -There are two ways to initialize a non-initialized string object: - -- **Raw-buffer API**: get a C pointer to the memory and fill it directly: - ``PyBytes_AsString``, ``PyUnicode_1BYTE_DATA``, etc. - -- **Opaque API**: call special functions API to fill the content, without - accessing the buffer directly: e.g., ``PyUnicode_WriteChar``. - -From the point of view of the implementation, a completely opaque API gives -the most flexibility in terms of how to implement a builder and/or a string. -A good example is PyPy's ``str`` type, which uses UTF-8 as the internal -representation. A completely opaque ``HPyStrBuilder`` could allow PyPy to fill -directly its internal UTF-8 buffer (at least in simple cases). 
On the other -hand, a raw-buffer API would force PyPy to store the UCS{1,2,4} bytes in a -temporary buffer and convert them to UTF-8 during the ``build()`` phase. - -On the other hand, from the point of view of the C programmer it is easier to -have direct access the memory. This allows to: - -- use ``memcpy()`` to copy data into the buffer - -- pass the buffer directly to other C functions which write into it (e.g., - ``read()``) - -- use standard C patterns such as ``*p++ = ...`` or similar. - - -Problems and constraints ------------------------- - -``bytes`` and ``str`` are objects are immutable: the biggest problem of the -current API boils down to the fact that the API allows to construct objects -which are not fully initialized and to mutate them during a -not-well-specificed "initialization phase". - -Problems for alternative implementations: - -1. it assumes that the underlying buffer **can** be mutated. This might not be - always the case, e.g. if you want to use a Java string or an RPython string - as the data buffer. This might also lead to unnecessary copies. - -2. It makes harder to optimize the code: e.g. a JIT cannot safely assume that - a string is actually immutable. - -3. It interacts badly with a moving GC, because we need to ensure that ``buf`` - doesn't move. - -Introducing a builder solves most of the problems, because it introduces a -clear separation between the mutable and immutable phases. - - -Real world usage ------------------ - -In this section we analyze the usage of some string building API in -real world code, as found in the `Top 4000 PyPI packages -`_. - -PyUnicode_New -~~~~~~~~~~~~~ - -This is the recommended "modern" way to create ``str`` objects but it's not -widely used outside CPython. A simple ``grep`` found only 17 matches in the -4000 packages, although some are in very important packages such as -`cffi `_, -``markupsafe`` -(`1 `__, -`2 `__, -`3 `__) -and ``simplejson`` -(`1 `__, -`2 `__). 
- -In all the examples linked above, ``maxchar`` is hard-coded and known at -compile time. - -There are only four usages of ``PyUnicode_New`` in which ``maxchar`` is -actually unknown until runtime, and it is curious to note that the first three -are in runtime libraries used by code generators: - - 1. `mypyc `__ - - 2. `Cython `__ - - 3. `siplib `__ - - 4. `PyICU `__: - this is the only non-runtime library usage of it, and it's used to - implement a routine to create a ``str`` object from an UTF-16 buffer. - -For HPy, we should at lest consider the opportunity to design special APIs for -the cases in which ``maxchar`` is known in advance, -e.g. ``HPyStrBuilder_ASCII``, ``HPyStrBuilder_UCS1``, etc., and evaluate -whether this would be beneficial for alternative implementations. - -Create empty strings -~~~~~~~~~~~~~~~~~~~~~ - -A special case is ``PyUnicode_New(0, 0)``, which contructs an empty ``str`` -object. CPython special-cases it to always return a prebuilt object. - -This pattern is used a lot inside CPython but only once in 3rd-party extensions, in the ``regex`` library ( -`1 `__, -`2 `__). - -Other ways to build empty strings are ``PyUnicode_FromString("")`` which is used 27 times and ``PyUnicode_FromStringAndSize("", 0)`` which is used only `once -`_. - -For HPy, maybe we should just have a ``ctx->h_EmptyStr`` and -``ctx->h_EmptyBytes``? - -PyUnicode_From*, PyUnicode_Decode* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Functions of the ``PyUnicode_From*`` and ``PyUnicode_Decode*`` families should -be easy to adapt to HPy, so we won't discuss them in detail. 
However, here is -the of matches found by grep for each function, to get an idea of how much -each is used: - -``PyUnicode_From*`` family:: - - Documented: - 964 PyUnicode_FromString - 259 PyUnicode_FromFormat - 125 PyUnicode_FromStringAndSize - 58 PyUnicode_FromWideChar - 48 PyUnicode_FromEncodedObject - 17 PyUnicode_FromKindAndData - 9 PyUnicode_FromFormatV - - Undocumented: - 7 PyUnicode_FromOrdinal - - Deprecated: - 66 PyUnicode_FromObject - 45 PyUnicode_FromUnicode - -``PyUnicode_Decode*`` family:: - - 143 PyUnicode_DecodeFSDefault - 114 PyUnicode_DecodeUTF8 - 99 PyUnicode_Decode - 64 PyUnicode_DecodeLatin1 - 51 PyUnicode_DecodeASCII - 12 PyUnicode_DecodeFSDefaultAndSize - 10 PyUnicode_DecodeUTF16 - 8 PyUnicode_DecodeLocale - 6 PyUnicode_DecodeRawUnicodeEscape - 3 PyUnicode_DecodeUTF8Stateful - 2 PyUnicode_DecodeUTF32 - 2 PyUnicode_DecodeUnicodeEscape - - -Raw-buffer access -~~~~~~~~~~~~~~~~~ - -Most of the real world packages use the raw-buffer API to initialize ``str`` -objects, and very often in a way which can't be easily replaced by a fully -opaque API. - -Example 1, ``markupsafe``: the -`DO_ESCAPE `_ -macro takes a parameter called ``outp`` which is obtained by calling -``PyUnicode*BYTE_DATA`` -(`1BYTE `_, -(`2BYTE `_, -(`4BYTE `_). -``DO_ESCAPE`` contains code like this, which would be hard to port to a fully-opaque API: - -.. code-block:: c - - memcpy(outp, inp-ncopy, sizeof(*outp)*ncopy); \ - outp += ncopy; ncopy = 0; \ - *outp++ = '&'; \ - *outp++ = '#'; \ - *outp++ = '3'; \ - *outp++ = '4'; \ - *outp++ = ';'; \ - break; \ - -Another interesting example is -`pybase64 `_. -After removing the unnecessary stuff, the logic boils down to this: - -.. code-block:: c - - out_len = (size_t)(((buffer.len + 2) / 3) * 4); - out_object = PyUnicode_New((Py_ssize_t)out_len, 127); - dst = (char*)PyUnicode_1BYTE_DATA(out_object); - ... 
- base64_encode(buffer.buf, buffer.len, dst, &out_len, libbase64_simd_flag); - -Note that ``base64_encode`` is an external C function which writes stuff into -a ``char *`` buffer, so in this case it is **required** to use the raw-buffer -API, unless you want to allocate a temporary buffer and copy chars one-by-one -later. - -There are other examples similar to these, but I think there is already enough -evidence that HPy **must** offer a raw-buffer API in addition to a -fully-opaque one. - - -Typed vs untyped raw-buffer writing -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To initialize a ``str`` object using the raw-buffer interface, you need to get -a pointer to the buffer. The vast majority of code uses -``PyUnicode_{1,2,4}BYTE_DATA`` to get a buffer of type ``Py_UCS{1,2,4}*`` and -write directly to it: - -.. code-block:: c - - PyObject *s = PyUnicode_New(size, 127); - Py_UCS1 *buf = PyUnicode_1BYTE_DATA(s); - buf[0] = 'H'; - buf[1] = 'e'; - buf[2] = 'l'; - ... - -The other way to get a pointer to the raw-buffer is to call -``PyUnicode_DATA()``, which returns a ``void *``: the only reasonable way to -write something in this buffer is to ``memcpy()`` the data from another -``str`` buffer of the same kind. This technique is used for example by -`CPython's textio.c `_. - -Outside CPython, the only usage of this technique is inside cython's helper -function `__Pyx_PyUnicode_Join `_. - -This probably means that we don't need to offer untyped raw-buffer writing for -HPy. If we really need to support the ``memcpy`` use case, we can probably -just offer a special function in the builder API. - -PyUnicode_WRITE, PyUnicode_WriteChar -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Outside CPython, ``PyUnicode_WRITE()`` is used only inside Cython's helper -functions -(`1 `__, -`2 `__). -Considering that Cython will need special support for HPy anyway, this means -that we don't need an equivalent of ``PyUnicode_WRITE`` for HPy. 
- -Similarly, ``PyUnicode_WriteChar()`` is used only once, inside -`JPype `_. - - -PyUnicode_Join -~~~~~~~~~~~~~~ - -All the API functions listed above require the user to know in advance the -size of the string: ``PyUnicode_Join()`` is the only native API call which -allows to build a string whose size is not known in advance. - -Examples of usage are found in ``simplejson`` -(`1 `__, -`2 `__), -`pycairo `__, -``regex`` -(`1 `__, -`2 `__, -`3 `__, -`4 `__, -`5 `__, -`6 `__) -and others, for a total of 25 grep matches. - - -.. note:: - - Contrarily to its unicode equivalent, ``PyBytes_Join()`` does not - exist. There is ``_PyBytes_Join()`` which is private and undocumented, but - some extensions rely on it anyway: - `Cython `__, - `regex `__, - `dulwich `__. - -In theory, alternative implementaions should be able to provide a more -efficient way to achieve the goal. E.g. for pure Python code PyPy offers -``__pypy__.builders.StringBuilder`` which is faster than both ``StringIO`` and -``''.join``, so maybe it might make sense to offer a way to use it from C. diff --git a/docs/overview.rst b/docs/overview.rst index bab5ce530..60fcdde66 100644 --- a/docs/overview.rst +++ b/docs/overview.rst @@ -31,7 +31,7 @@ efficiently and without compromise**. In particular, **reference counting is not part of the API**: we want a more generic way of managing resources that is possible to impelement with different strategies, including the existing reference counting and/or with a moving *Garbage Collector* (like the ones used -by PyPy, GraalPython or Java, for example). Moreover, each implementation can +by PyPy, GraalPy or Java, for example). Moreover, each implementation can experiment with new memory layout of objects, add optimizations, etc. The following is a list of sub-goals. @@ -169,7 +169,7 @@ different ABIs: As the name suggests, the HPy Universal ABI is designed to be loaded and executed by a variety of different Python implementations. 
Compiled extensions can be loaded unmodified on all the interpreters which support - it. PyPy and GraalPython support it natively. CPython supports it by using the + it. PyPy and GraalPy support it natively. CPython supports it by using the ``hpy.universal`` package, and there is a small speed penalty [#f1]_ compared to the CPython ABI. @@ -223,14 +223,14 @@ The HPy project offers some benefits to the python ecosystem, both to Python users and to library developers. - C extensions can achieve much better speed on alternative implementions, - including PyPy and GraalPython: according to early :ref:`benchmarks`, an + including PyPy and GraalPy: according to early :ref:`benchmarks`, an extension written in HPy can be ~3x faster than the equivalent extension written using ``Python.h``. - Improved debugging: when you load extensions in :ref:`debug-mode:debug mode`, many common mistakes are checked and reported automatically. - Universal binaries: libraries can choose to distribute only Universal ABI binaries. By doing so, they can support all Python implementations and - version of CPython (like PyPy, GraalPython, CPython 3.10, CPython 3.11, etc) + version of CPython (like PyPy, GraalPy, CPython 3.10, CPython 3.11, etc) for which an HPy loader exists, including those that do not yet exist! This currently comes with a small speed penalty on CPython, but for non-performance critical libraries it might still be a good tradeoff. @@ -325,7 +325,7 @@ already in place. As on April 2022, the following milestones have been reached: - it is possible to load HPy Universal extensions on PyPy (using the PyPy `hpy branch `_). - - it is possible to load HPy Universal extensions on `GraalPython + - it is possible to load HPy Universal extensions on `GraalPy `_. @@ -405,7 +405,7 @@ code/design/discussions of HPy: - Cython - - GraalPython + - GraalPy - RustPython @@ -424,7 +424,7 @@ compatibility layer include: - `IronPython `_ - - `GraalPython `_ + - `GraalPy `_ .. 
rubric:: Footnotes diff --git a/docs/porting-guide.rst b/docs/porting-guide.rst index e7e621490..66d4bc7ca 100644 --- a/docs/porting-guide.rst +++ b/docs/porting-guide.rst @@ -43,7 +43,7 @@ Back to ``HPy`` vs ``HPyField`` vs ``HPyGlobal``: as soon as they are no longer needed. The debug mode will report a long-lived ``HPy`` as a potential memory leak. - * In PyPy and GraalPython, ``HPy`` handles are implemented using an + * In PyPy and GraalPy, ``HPy`` handles are implemented using an indirection: they are indexes inside a big list of GC-managed objects: this big list is tracked by the GC, so when an object moves its pointer is correctly updated. @@ -146,7 +146,7 @@ Direct C API to HPy mappings In many cases, migrating to HPy is as easy as just replacing a certain C API function by the appropriate HPy API function. Table :ref:`table-mapping` gives a mapping between C API and HPy API functions. This mapping is generated together -with the code for the :term:`CPython ABI` mode, so it is correct. +with the code for the :term:`CPython ABI` mode, so it is guaranteed to be correct. .. _table-mapping: diff --git a/docs/quickstart.rst b/docs/quickstart.rst new file mode 100644 index 000000000..b526552e7 --- /dev/null +++ b/docs/quickstart.rst @@ -0,0 +1,52 @@ +HPy Quickstart +============== + +This section shows how to quickly get started with HPy by creating +a simple HPy extension from scratch. + +Install HPy: + +.. + This should be updated to pip install hpy once this version is released + +.. code-block:: console + + python3 -m pip install git+https://github.com/hpyproject/hpy.git#egg=hpy.universal + +Create a new directory for the new HPy extension. Location and name of the directory +do not matter. Add the following two files: + +.. literalinclude:: examples/quickstart/quickstart.c + +.. literalinclude:: examples/quickstart/setup.py + :language: python + +Build the extension: + +.. 
code-block:: console + + python3 setup.py --hpy-abi=universal develop + +Try it out -- start a Python console in the same directory and type: + +.. literalinclude:: examples/tests.py + :start-after: test_quickstart + :end-before: # END: test_quickstart + +Notice the shared library that was created by running ``setup.py``: + +.. code-block:: console + + > ls *.so + quickstart.hpy0.so + +It does not have the Python version encoded in it. If you happen to have +a GraalPy or PyPy installation that supports the given HPy version, you can +try running the same extension on it. For example, start +``$GRAALVM_HOME/bin/graalpy`` in the same directory and type the same +Python code: the extension should load and work just fine. + +Where to go next? + + - :ref:`Simple documented HPy extension example` + - :doc:`Tutorial: porting Python/C API extension to HPy` diff --git a/docs/trace-mode.rst b/docs/trace-mode.rst index 704b033a8..35b5e409f 100644 --- a/docs/trace-mode.rst +++ b/docs/trace-mode.rst @@ -19,7 +19,7 @@ Similar to how the :ref:`debug mode is activated `, use environment variable ``HPY``. If ``HPY=trace``, then all HPy modules are loaded with the trace context. Alternatively, it is also possible to specify the mode -per module like this: ``HPY=modA:trace,modB=trace``. +per module like this: ``HPY=modA:trace,modB:trace``. Environment variable ``HPY_LOG`` also works.