From 3ed0a22b9a926da85bc081e2d99062b0f10c4c79 Mon Sep 17 00:00:00 2001 From: stepan Date: Tue, 31 Jan 2023 15:57:04 +0100 Subject: [PATCH 1/7] Fix PEP references --- docs/api.rst | 8 +++++--- docs/examples/hpytype-example/builtin_type.c | 2 +- docs/examples/hpytype-example/simple_type.c | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/api.rst b/docs/api.rst index 3f5a1bd6f..abf11f337 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -242,9 +242,11 @@ are still written using the ``Python.h`` API. Note that the HPy module does not specify its name. HPy does not support the legacy single phase module initialization and the only module initialization approach is -the multi-phase initialization (PEP 451). With multi-phase module initialization, -the name of the module is always taken from the ``ModuleSpec``, i.e., most likely -from the name used in the ``import {{name}}`` statement that imported your module. +the multi-phase initialization (`PEP 489 `_). +With multi-phase module initialization, +the name of the module is always taken from the ``ModuleSpec`` (`PEP 451 `_) +, i.e., most likely from the name used in the ``import {{name}}`` statement that +imported your module. This is the only difference stemming from multi-phase module initialization in this simple example. 
diff --git a/docs/examples/hpytype-example/builtin_type.c b/docs/examples/hpytype-example/builtin_type.c index 087a89ab7..5ec749b69 100644 --- a/docs/examples/hpytype-example/builtin_type.c +++ b/docs/examples/hpytype-example/builtin_type.c @@ -82,7 +82,7 @@ static void make_Language(HPyContext *ctx, HPy module) } HPyDef_SLOT(simple_exec, HPy_mod_exec) -int simple_exec_impl(HPyContext *ctx, HPy m) { +static int simple_exec_impl(HPyContext *ctx, HPy m) { make_Dummy(ctx, m); if (HPyErr_Occurred(ctx)) return -1; diff --git a/docs/examples/hpytype-example/simple_type.c b/docs/examples/hpytype-example/simple_type.c index 9d336931b..0e857bc49 100644 --- a/docs/examples/hpytype-example/simple_type.c +++ b/docs/examples/hpytype-example/simple_type.c @@ -80,7 +80,7 @@ static HPyType_Spec Point_spec = { // BEGIN: add_type HPyDef_SLOT(simple_exec, HPy_mod_exec) -int simple_exec_impl(HPyContext *ctx, HPy m) { +static int simple_exec_impl(HPyContext *ctx, HPy m) { if (!HPyHelpers_AddType(ctx, m, "Point", &Point_spec, NULL)) { return -1; } From e0e60c5097f263ae38c62edefb7f29abc6a84e04 Mon Sep 17 00:00:00 2001 From: stepan Date: Tue, 31 Jan 2023 15:29:02 +0100 Subject: [PATCH 2/7] Documentation improvements --- docs/api-reference/inline-helpers.rst | 4 ++ docs/api.rst | 62 ++++++++++++++++----------- docs/index.rst | 50 ++++++++++++++------- docs/misc/index.rst | 4 +- docs/misc/protocols.rst | 8 ++-- docs/misc/str-builder-api.rst | 6 ++- docs/porting-guide.rst | 2 +- 7 files changed, 89 insertions(+), 47 deletions(-) diff --git a/docs/api-reference/inline-helpers.rst b/docs/api-reference/inline-helpers.rst index 96be1b315..1e7a1defb 100644 --- a/docs/api-reference/inline-helpers.rst +++ b/docs/api-reference/inline-helpers.rst @@ -5,5 +5,9 @@ Inline Helpers Those functions are usually small convenience functions that everyone could write but in order to avoid duplicated effort, they are defined by HPy. 
+One category of inline helpers are functions that convert the commonly used +but not fixed width C types, such as ``int``, or ``long long``, to HPy API +that uses only fixed width types in order to ensure maximal ABI compatibility. + .. autocmodule:: hpy/inline_helpers.h :members: diff --git a/docs/api.rst b/docs/api.rst index abf11f337..c31e82384 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -157,6 +157,7 @@ Moreover, ``HPyContext`` is used by the :term:`HPy Universal ABI` to contain a sort of virtual function table which is used by the C extensions to call back into the Python interpreter. +.. _simple example: A simple example ----------------- @@ -322,30 +323,6 @@ table, which now becomes: :start-after: // BEGIN: methodsdef :end-before: // END: methodsdef -More Examples -------------- - -HPy usually has tests for each API function. This means that there is lots of -examples available by looking at the tests. However, the test source uses -many macros and is hard to read. To overcome this we supply a utility to -export clean C sources for the tests. Since the HPy tests are not shipped by -default, you need to clone the HPy repository from GitHub: - -.. code-block:: console - - > git clone https://github.com/hpyproject/hpy.git - -After that, install all test requirements and dump the sources: - -.. code-block:: console - - > cd hpy - > python3 -m pip install pytest filelock - > python3 -m pytest --dump-dir=test_sources test/ - -This will dump the generated test sources into folder ``test_sources``. Note, -that the tests won't be executed but skipped with an appropriate message. - Creating types in HPy --------------------- @@ -569,3 +546,40 @@ be considered in three places: For more information about the built-in shape and for a technical explanation for why it is required, see :c:member:`HPyType_Spec.builtin_shape` and :c:enum:`HPyType_BuiltinShape`. 
+ +More Examples +------------- + +The :doc:`porting-example/index` shows another complete example +of an HPy extension ported from the Python/C API. + +The `HPy project space `_ on GitHub +contains forks of some popular Python extensions ported to HPy as +a proof of concept/feasibility studies, such as the +`Kiwi solver `_. +Note that those forks may not be up to date with their upstream projects +or with the upstream HPy changes. + +HPy unit tests +~~~~~~~~~~~~~~ + +HPy usually has tests for each API function. This means that there is lots of +examples available by looking at the tests. However, the test source uses +many macros and is hard to read. To overcome this we supply a utility to +export clean C sources for the tests. Since the HPy tests are not shipped by +default, you need to clone the HPy repository from GitHub: + +.. code-block:: console + + > git clone https://github.com/hpyproject/hpy.git + +After that, install all test requirements and dump the sources: + +.. code-block:: console + + > cd hpy + > python3 -m pip install pytest filelock + > python3 -m pytest --dump-dir=test_sources test/ + +This will dump the generated test sources into folder ``test_sources``. Note, +that the tests won't be executed but skipped with an appropriate message. diff --git a/docs/index.rst b/docs/index.rst index 778d6cabc..cb0fc7acb 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -8,32 +8,52 @@ HPy: a better API for Python HPy provides a new API for extending Python in C. 
+There are several advantages to write your C extension in HPy: + + - **Speed**: it runs much faster on PyPy, GraalPython, and at native speed on CPython + + - **Deployment**: it is possible to compile a single binary which runs unmodified on all + supported Python implementations and versions -- think "stable ABI" on steroids + + - **Simplicity**: it is simpler and more manageable than the ``Python.h`` API, both for + the users and the Pythons implementing it + + - **Debugging**: it provides an improved debugging experience. Debug mode can be turned + on at runtime without the need to recompile the extension or the Python running it. + HPy design is more suitable for automated checks. + The official `Python/C API `_, also informally known as ``#include ``, is specific to the current implementation of CPython: it exposes a lot of -internal details which makes it hard: +internal details which makes it hard to: - - to implement it for other Python implementations (e.g. PyPy, GraalPython, - Jython, IronPython, etc.) + - implement it for other Python implementations (e.g. PyPy, GraalPython, + Jython, ...) - - to experiment with new things inside CPython itself: e.g. using a GC - instead of refcounting, or to remove the GIL. 
+ - experiment with new approaches inside CPython itself, for example: -There are several advantages to write your C extension in HPy: + - use a tracing garbage collection instead of reference counting + - remove the global interpreter lock (GIL) to take full advantage of multicore architectures + - use tagged pointers to reduce memory footprint + +Where to go next: +----------------- + + - Show me the code: - - it runs much faster on PyPy, GraalPython, and at native speed on CPython + - :ref:`Simple HPy extension example` + - :doc:`Example of porting a sample Python/C API extension to HPy` - - it is possible to compile a single binary which runs unmodified on all - supported Python implementations and versions + - Details: - - it is simpler and more manageable than the ``Python.h`` API + - :doc:`HPy overview: motivation, goals, current status` + - :doc:`HPy API concepts introduction` + - :doc:`Python/C API to HPy Porting guide` + - :doc:`HPy API reference` - - it provides an improved debugging experience: in "debug mode", HPy - actively checks for many common mistakes such as reference leaks and - invalid usage of objects after they have been deleted. It is possible to - turn the "debug mode" on at startup time, without needing to recompile - Python or the extension itself +Full table of contents: +----------------------- .. toctree:: :maxdepth: 2 diff --git a/docs/misc/index.rst b/docs/misc/index.rst index 6570684e1..43f3898af 100644 --- a/docs/misc/index.rst +++ b/docs/misc/index.rst @@ -2,8 +2,8 @@ Misc notes ========== .. 
toctree:: - :maxdepth: 2 + :maxdepth: 1 - str-builder-api embedding + str-builder-api protocols diff --git a/docs/misc/protocols.rst b/docs/misc/protocols.rst index 2b7b8de9f..1f88f64d5 100644 --- a/docs/misc/protocols.rst +++ b/docs/misc/protocols.rst @@ -1,6 +1,8 @@ -##################### -Specialized Protocols -##################### +################################### +Design notes: Specialized Protocols +################################### + +`Note: these are only design notes. The API is not finalized, nor implemented yet.` Protocols can help remove abstraction overhead when possible. For example, consider the case of iterating over a sequence (list, tuple, array.array, etc.) diff --git a/docs/misc/str-builder-api.rst b/docs/misc/str-builder-api.rst index 46612a331..5312009a7 100644 --- a/docs/misc/str-builder-api.rst +++ b/docs/misc/str-builder-api.rst @@ -1,5 +1,7 @@ -bytes/str building API -======================= +Design notes: bytes/str building API +==================================== + +`Note: these are only design notes. The API is not finalized, nor implemented yet.` We need to design an HPy API to build ``bytes`` and ``str`` objects. Before making any proposal, it is useful to understand: diff --git a/docs/porting-guide.rst b/docs/porting-guide.rst index e7e621490..66f9261ae 100644 --- a/docs/porting-guide.rst +++ b/docs/porting-guide.rst @@ -146,7 +146,7 @@ Direct C API to HPy mappings In many cases, migrating to HPy is as easy as just replacing a certain C API function by the appropriate HPy API function. Table :ref:`table-mapping` gives a mapping between C API and HPy API functions. This mapping is generated together -with the code for the :term:`CPython ABI` mode, so it is correct. +with the code for the :term:`CPython ABI` mode, so it is guaranteed to be correct. .. 
_table-mapping: From 889d347d5a1f194a0bb7d8dfe0a7a20ae0800fe0 Mon Sep 17 00:00:00 2001 From: stepan Date: Tue, 31 Jan 2023 16:33:19 +0100 Subject: [PATCH 3/7] Rename: GraalPython->GraalPy --- docs/api.rst | 2 +- docs/index.rst | 4 ++-- docs/overview.rst | 14 +++++++------- docs/porting-guide.rst | 2 +- 4 files changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/api.rst b/docs/api.rst index c31e82384..36c39ac62 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -456,7 +456,7 @@ A type with ``.legacy_slots != NULL`` is required to have ``HPyType_BuiltinShape_Legacy`` and to include ``PyObject_HEAD`` at the start of its struct. It would be easy to relax this requirement on CPython (where the ``PyObject_HEAD`` fields are always present) but a large burden on other -implementations (e.g. PyPy, GraalPython) where a struct starting with +implementations (e.g. PyPy, GraalPy) where a struct starting with ``PyObject_HEAD`` might not exist. Types created via the old Python C API are automatically legacy types. diff --git a/docs/index.rst b/docs/index.rst index cb0fc7acb..6a48d3aa2 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -10,7 +10,7 @@ HPy provides a new API for extending Python in C. There are several advantages to write your C extension in HPy: - - **Speed**: it runs much faster on PyPy, GraalPython, and at native speed on CPython + - **Speed**: it runs much faster on PyPy, GraalPy, and at native speed on CPython - **Deployment**: it is possible to compile a single binary which runs unmodified on all supported Python implementations and versions -- think "stable ABI" on steroids @@ -27,7 +27,7 @@ also informally known as ``#include ``, is specific to the current implementation of CPython: it exposes a lot of internal details which makes it hard to: - - implement it for other Python implementations (e.g. PyPy, GraalPython, + - implement it for other Python implementations (e.g. PyPy, GraalPy, Jython, ...) 
- experiment with new approaches inside CPython itself, for example: diff --git a/docs/overview.rst b/docs/overview.rst index bab5ce530..60fcdde66 100644 --- a/docs/overview.rst +++ b/docs/overview.rst @@ -31,7 +31,7 @@ efficiently and without compromise**. In particular, **reference counting is not part of the API**: we want a more generic way of managing resources that is possible to impelement with different strategies, including the existing reference counting and/or with a moving *Garbage Collector* (like the ones used -by PyPy, GraalPython or Java, for example). Moreover, each implementation can +by PyPy, GraalPy or Java, for example). Moreover, each implementation can experiment with new memory layout of objects, add optimizations, etc. The following is a list of sub-goals. @@ -169,7 +169,7 @@ different ABIs: As the name suggests, the HPy Universal ABI is designed to be loaded and executed by a variety of different Python implementations. Compiled extensions can be loaded unmodified on all the interpreters which support - it. PyPy and GraalPython support it natively. CPython supports it by using the + it. PyPy and GraalPy support it natively. CPython supports it by using the ``hpy.universal`` package, and there is a small speed penalty [#f1]_ compared to the CPython ABI. @@ -223,14 +223,14 @@ The HPy project offers some benefits to the python ecosystem, both to Python users and to library developers. - C extensions can achieve much better speed on alternative implementions, - including PyPy and GraalPython: according to early :ref:`benchmarks`, an + including PyPy and GraalPy: according to early :ref:`benchmarks`, an extension written in HPy can be ~3x faster than the equivalent extension written using ``Python.h``. - Improved debugging: when you load extensions in :ref:`debug-mode:debug mode`, many common mistakes are checked and reported automatically. - Universal binaries: libraries can choose to distribute only Universal ABI binaries. 
By doing so, they can support all Python implementations and - version of CPython (like PyPy, GraalPython, CPython 3.10, CPython 3.11, etc) + version of CPython (like PyPy, GraalPy, CPython 3.10, CPython 3.11, etc) for which an HPy loader exists, including those that do not yet exist! This currently comes with a small speed penalty on CPython, but for non-performance critical libraries it might still be a good tradeoff. @@ -325,7 +325,7 @@ already in place. As on April 2022, the following milestones have been reached: - it is possible to load HPy Universal extensions on PyPy (using the PyPy `hpy branch `_). - - it is possible to load HPy Universal extensions on `GraalPython + - it is possible to load HPy Universal extensions on `GraalPy `_. @@ -405,7 +405,7 @@ code/design/discussions of HPy: - Cython - - GraalPython + - GraalPy - RustPython @@ -424,7 +424,7 @@ compatibility layer include: - `IronPython `_ - - `GraalPython `_ + - `GraalPy `_ .. rubric:: Footnotes diff --git a/docs/porting-guide.rst b/docs/porting-guide.rst index 66f9261ae..66d4bc7ca 100644 --- a/docs/porting-guide.rst +++ b/docs/porting-guide.rst @@ -43,7 +43,7 @@ Back to ``HPy`` vs ``HPyField`` vs ``HPyGlobal``: as soon as they are no longer needed. The debug mode will report a long-lived ``HPy`` as a potential memory leak. - * In PyPy and GraalPython, ``HPy`` handles are implemented using an + * In PyPy and GraalPy, ``HPy`` handles are implemented using an indirection: they are indexes inside a big list of GC-managed objects: this big list is tracked by the GC, so when an object moves its pointer is correctly updated. 
From 2a1c05741bf096a8fe5ed23ed9cef493a54a87c5 Mon Sep 17 00:00:00 2001 From: stepan Date: Tue, 31 Jan 2023 16:41:24 +0100 Subject: [PATCH 4/7] Remove design notes from user documentation --- docs/misc/index.rst | 2 - docs/misc/protocols-code.c | 55 ----- docs/misc/protocols.rst | 33 --- docs/misc/str-builder-api.rst | 445 ---------------------------------- 4 files changed, 535 deletions(-) delete mode 100644 docs/misc/protocols-code.c delete mode 100644 docs/misc/protocols.rst delete mode 100644 docs/misc/str-builder-api.rst diff --git a/docs/misc/index.rst b/docs/misc/index.rst index 43f3898af..5a57e5157 100644 --- a/docs/misc/index.rst +++ b/docs/misc/index.rst @@ -5,5 +5,3 @@ Misc notes :maxdepth: 1 embedding - str-builder-api - protocols diff --git a/docs/misc/protocols-code.c b/docs/misc/protocols-code.c deleted file mode 100644 index 0d94166ae..000000000 --- a/docs/misc/protocols-code.c +++ /dev/null @@ -1,55 +0,0 @@ -// BEGIN: foo -void iterate_objects() -{ - /* If the object is not a sequence, we might want to fall back to generic iteration. */ - HPySequence seq = HPy_AsSequence(ctx, obj); - if (HPy_Sequence_IsError(seq)) - goto not_a_sequence; - HPy_Close(ctx, obj); /* we'll be using only 'seq' in the sequel */ - HPy_ssize_t len = HPy_Sequence_Len(ctx, seq); - for (int i=0; i`_ - for more discussion about the naming convention. - -.. note:: - The goal of the document is only to describe the current CPython API and - its real-world usage. For a discussion about how to design the equivalent - HPy API, see `issue #214 `_ - - -Current CPython API --------------------- - -Bytes -~~~~~ - -There are essentially two ways to build ``bytes``: - -1. Copy the content from an existing C buffer: - -.. code-block:: c - - PyObject* PyBytes_FromString(const char *v); - PyObject* PyBytes_FromStringAndSize(const char *v, Py_ssize_t len); - PyObject* PyBytes_FromFormat(const char *format, ...); - - -2. Create an uninitialized buffer and fill it manually: - -.. 
code-block:: c - - PyObject s = PyBytes_FromStringAndSize(NULL, size); - char *buf = PyBytes_AS_STRING(s); - strcpy(buf, "hello"); - -(1) is easy for alternative implementations and we can probably provide an HPy -equivalent without changing much, so we will concentrate on (2): let's call it -"raw-buffer API". - -Unicode -~~~~~~~ - -Similarly to ``bytes``, there are several ways to build a ``str``: - -.. code-block:: c - - PyObject* PyUnicode_FromString(const char *u); - PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size); - PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, Py_ssize_t size); - PyObject* PyUnicode_FromFormat(const char *format, ...); - PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar); - - -.. note:: - ``PyUnicode_FromString{,AndSize}`` take an UTF-8 string in input - -The following functions are used to initialize an uninitialized object, but I -could not find any usage of them outside CPython itself, so I think they can -be safely ignored for now: - -.. code-block:: c - - Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, Py_ssize_t length, Py_UCS4 fill_char); - Py_ssize_t PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, PyObject *from, Py_ssize_t from_start, Py_ssize_t how_many); - - -There are also a bunch of API functions which have been deprecated (see `PEP -623 `_ and `PEP 624 -`_) so we will not take them into -account. The deprecated functions include but are not limited to: - -.. code-block:: c - - PyUnicode_FromUnicode - PyUnicode_FromStringAndSize(NULL,...) // use PyUnicode_New instead - PyUnicode_AS_UNICODE - PyUnicode_AS_DATA - PyUnicode_READY - - -Moreover, CPython 3.3+ adopted a flexible string represenation (`PEP 393 -`_) which means that the underlying -buffer of ``str`` objects can be an array of 1-byte, 2-bytes or 4-bytes -characters (the so called "kind"). 
- -``str`` objects offer a raw-buffer API, but you need to call the appropriate -function depending on the kind, returning buffers of different types: - -.. code-block:: c - - typedef uint32_t Py_UCS4; - typedef uint16_t Py_UCS2; - typedef uint8_t Py_UCS1; - Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o); - Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o); - Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o); - - -Uninitialized unicode objects are created by calling ``PyUnicode_New(size, -maxchar)``, where ``maxchar`` is the maximum allowed value of a character -inside the string, and determines the kind. So, in cases in which ``maxchar`` -is known in advance, we can predict at compile time what will be the kind of -the string and write code accordingly. E.g.: - -.. code-block:: c - - // ASCII only --> kind == PyUnicode_1BYTE_KIND - PyObject *s = PyUnicode_New(size, 127); - Py_UCS1 *buf = PyUnicode_1BYTE_DATA(s); - strcpy(buf, "hello"); - - -.. note:: - CPython distinguishes between ``PyUnicode_New(size, 127)`` and - ``PyUnicode_New(size, 255)``: in both cases the kind is - ``PyUnicode_1BYTE_KIND``, but the former also sets a flag to indicate that - the string is ASCII-only. - -There are cases in which you don't know the kind in advance because you are -working on generic data. To solve the problem in addition to the raw-buffer -API, CPython also offers an "Opaque API" to write a char inside an unicode: - -.. code-block:: c - - int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, Py_UCS4 character) - void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, Py_UCS4 value) - -Note that the character to write is always ``Py_UCS4``, so -``_WriteChar``/``_WRITE`` have logic to do something different depending on -the kind. - -.. note:: - ``_WRITE`` is a macro, and its implementation contains a ``switch(kind)``: - I think it is designed with the explicit goal of allowing the compiler to - hoist the ``switch`` outside a loop in which we repeatedly call - ``_WRITE``. 
However, it is worth noting that I could not find any code - using it outside CPython itself, so it's probably something which we don't - need to care of for HPy. - - -Raw-buffer vs Opaque API ---------------------------- - -There are two ways to initialize a non-initialized string object: - -- **Raw-buffer API**: get a C pointer to the memory and fill it directly: - ``PyBytes_AsString``, ``PyUnicode_1BYTE_DATA``, etc. - -- **Opaque API**: call special functions API to fill the content, without - accessing the buffer directly: e.g., ``PyUnicode_WriteChar``. - -From the point of view of the implementation, a completely opaque API gives -the most flexibility in terms of how to implement a builder and/or a string. -A good example is PyPy's ``str`` type, which uses UTF-8 as the internal -representation. A completely opaque ``HPyStrBuilder`` could allow PyPy to fill -directly its internal UTF-8 buffer (at least in simple cases). On the other -hand, a raw-buffer API would force PyPy to store the UCS{1,2,4} bytes in a -temporary buffer and convert them to UTF-8 during the ``build()`` phase. - -On the other hand, from the point of view of the C programmer it is easier to -have direct access the memory. This allows to: - -- use ``memcpy()`` to copy data into the buffer - -- pass the buffer directly to other C functions which write into it (e.g., - ``read()``) - -- use standard C patterns such as ``*p++ = ...`` or similar. - - -Problems and constraints ------------------------- - -``bytes`` and ``str`` are objects are immutable: the biggest problem of the -current API boils down to the fact that the API allows to construct objects -which are not fully initialized and to mutate them during a -not-well-specificed "initialization phase". - -Problems for alternative implementations: - -1. it assumes that the underlying buffer **can** be mutated. This might not be - always the case, e.g. if you want to use a Java string or an RPython string - as the data buffer. 
This might also lead to unnecessary copies. - -2. It makes harder to optimize the code: e.g. a JIT cannot safely assume that - a string is actually immutable. - -3. It interacts badly with a moving GC, because we need to ensure that ``buf`` - doesn't move. - -Introducing a builder solves most of the problems, because it introduces a -clear separation between the mutable and immutable phases. - - -Real world usage ------------------ - -In this section we analyze the usage of some string building API in -real world code, as found in the `Top 4000 PyPI packages -`_. - -PyUnicode_New -~~~~~~~~~~~~~ - -This is the recommended "modern" way to create ``str`` objects but it's not -widely used outside CPython. A simple ``grep`` found only 17 matches in the -4000 packages, although some are in very important packages such as -`cffi `_, -``markupsafe`` -(`1 `__, -`2 `__, -`3 `__) -and ``simplejson`` -(`1 `__, -`2 `__). - -In all the examples linked above, ``maxchar`` is hard-coded and known at -compile time. - -There are only four usages of ``PyUnicode_New`` in which ``maxchar`` is -actually unknown until runtime, and it is curious to note that the first three -are in runtime libraries used by code generators: - - 1. `mypyc `__ - - 2. `Cython `__ - - 3. `siplib `__ - - 4. `PyICU `__: - this is the only non-runtime library usage of it, and it's used to - implement a routine to create a ``str`` object from an UTF-16 buffer. - -For HPy, we should at lest consider the opportunity to design special APIs for -the cases in which ``maxchar`` is known in advance, -e.g. ``HPyStrBuilder_ASCII``, ``HPyStrBuilder_UCS1``, etc., and evaluate -whether this would be beneficial for alternative implementations. - -Create empty strings -~~~~~~~~~~~~~~~~~~~~~ - -A special case is ``PyUnicode_New(0, 0)``, which contructs an empty ``str`` -object. CPython special-cases it to always return a prebuilt object. 
- -This pattern is used a lot inside CPython but only once in 3rd-party extensions, in the ``regex`` library ( -`1 `__, -`2 `__). - -Other ways to build empty strings are ``PyUnicode_FromString("")`` which is used 27 times and ``PyUnicode_FromStringAndSize("", 0)`` which is used only `once -`_. - -For HPy, maybe we should just have a ``ctx->h_EmptyStr`` and -``ctx->h_EmptyBytes``? - -PyUnicode_From*, PyUnicode_Decode* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Functions of the ``PyUnicode_From*`` and ``PyUnicode_Decode*`` families should -be easy to adapt to HPy, so we won't discuss them in detail. However, here is -the of matches found by grep for each function, to get an idea of how much -each is used: - -``PyUnicode_From*`` family:: - - Documented: - 964 PyUnicode_FromString - 259 PyUnicode_FromFormat - 125 PyUnicode_FromStringAndSize - 58 PyUnicode_FromWideChar - 48 PyUnicode_FromEncodedObject - 17 PyUnicode_FromKindAndData - 9 PyUnicode_FromFormatV - - Undocumented: - 7 PyUnicode_FromOrdinal - - Deprecated: - 66 PyUnicode_FromObject - 45 PyUnicode_FromUnicode - -``PyUnicode_Decode*`` family:: - - 143 PyUnicode_DecodeFSDefault - 114 PyUnicode_DecodeUTF8 - 99 PyUnicode_Decode - 64 PyUnicode_DecodeLatin1 - 51 PyUnicode_DecodeASCII - 12 PyUnicode_DecodeFSDefaultAndSize - 10 PyUnicode_DecodeUTF16 - 8 PyUnicode_DecodeLocale - 6 PyUnicode_DecodeRawUnicodeEscape - 3 PyUnicode_DecodeUTF8Stateful - 2 PyUnicode_DecodeUTF32 - 2 PyUnicode_DecodeUnicodeEscape - - -Raw-buffer access -~~~~~~~~~~~~~~~~~ - -Most of the real world packages use the raw-buffer API to initialize ``str`` -objects, and very often in a way which can't be easily replaced by a fully -opaque API. - -Example 1, ``markupsafe``: the -`DO_ESCAPE `_ -macro takes a parameter called ``outp`` which is obtained by calling -``PyUnicode*BYTE_DATA`` -(`1BYTE `_, -(`2BYTE `_, -(`4BYTE `_). -``DO_ESCAPE`` contains code like this, which would be hard to port to a fully-opaque API: - -.. 
code-block:: c - - memcpy(outp, inp-ncopy, sizeof(*outp)*ncopy); \ - outp += ncopy; ncopy = 0; \ - *outp++ = '&'; \ - *outp++ = '#'; \ - *outp++ = '3'; \ - *outp++ = '4'; \ - *outp++ = ';'; \ - break; \ - -Another interesting example is -`pybase64 `_. -After removing the unnecessary stuff, the logic boils down to this: - -.. code-block:: c - - out_len = (size_t)(((buffer.len + 2) / 3) * 4); - out_object = PyUnicode_New((Py_ssize_t)out_len, 127); - dst = (char*)PyUnicode_1BYTE_DATA(out_object); - ... - base64_encode(buffer.buf, buffer.len, dst, &out_len, libbase64_simd_flag); - -Note that ``base64_encode`` is an external C function which writes stuff into -a ``char *`` buffer, so in this case it is **required** to use the raw-buffer -API, unless you want to allocate a temporary buffer and copy chars one-by-one -later. - -There are other examples similar to these, but I think there is already enough -evidence that HPy **must** offer a raw-buffer API in addition to a -fully-opaque one. - - -Typed vs untyped raw-buffer writing -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To initialize a ``str`` object using the raw-buffer interface, you need to get -a pointer to the buffer. The vast majority of code uses -``PyUnicode_{1,2,4}BYTE_DATA`` to get a buffer of type ``Py_UCS{1,2,4}*`` and -write directly to it: - -.. code-block:: c - - PyObject *s = PyUnicode_New(size, 127); - Py_UCS1 *buf = PyUnicode_1BYTE_DATA(s); - buf[0] = 'H'; - buf[1] = 'e'; - buf[2] = 'l'; - ... - -The other way to get a pointer to the raw-buffer is to call -``PyUnicode_DATA()``, which returns a ``void *``: the only reasonable way to -write something in this buffer is to ``memcpy()`` the data from another -``str`` buffer of the same kind. This technique is used for example by -`CPython's textio.c `_. - -Outside CPython, the only usage of this technique is inside cython's helper -function `__Pyx_PyUnicode_Join `_. - -This probably means that we don't need to offer untyped raw-buffer writing for -HPy. 
If we really need to support the ``memcpy`` use case, we can probably -just offer a special function in the builder API. - -PyUnicode_WRITE, PyUnicode_WriteChar -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Outside CPython, ``PyUnicode_WRITE()`` is used only inside Cython's helper -functions -(`1 `__, -`2 `__). -Considering that Cython will need special support for HPy anyway, this means -that we don't need an equivalent of ``PyUnicode_WRITE`` for HPy. - -Similarly, ``PyUnicode_WriteChar()`` is used only once, inside -`JPype `_. - - -PyUnicode_Join -~~~~~~~~~~~~~~ - -All the API functions listed above require the user to know in advance the -size of the string: ``PyUnicode_Join()`` is the only native API call which -allows to build a string whose size is not known in advance. - -Examples of usage are found in ``simplejson`` -(`1 `__, -`2 `__), -`pycairo `__, -``regex`` -(`1 `__, -`2 `__, -`3 `__, -`4 `__, -`5 `__, -`6 `__) -and others, for a total of 25 grep matches. - - -.. note:: - - Contrarily to its unicode equivalent, ``PyBytes_Join()`` does not - exist. There is ``_PyBytes_Join()`` which is private and undocumented, but - some extensions rely on it anyway: - `Cython `__, - `regex `__, - `dulwich `__. - -In theory, alternative implementaions should be able to provide a more -efficient way to achieve the goal. E.g. for pure Python code PyPy offers -``__pypy__.builders.StringBuilder`` which is faster than both ``StringIO`` and -``''.join``, so maybe it might make sense to offer a way to use it from C. 
From b63aa8f3b05feb21e4553a2f3cade0a8cd62e9d0 Mon Sep 17 00:00:00 2001 From: stepan Date: Tue, 31 Jan 2023 17:46:57 +0100 Subject: [PATCH 5/7] Add a quickstart section --- Makefile | 1 + docs/examples/quickstart/quickstart.c | 39 ++++++++++++++++++++ docs/examples/quickstart/setup.py | 13 +++++++ docs/examples/tests.py | 6 ++++ docs/index.rst | 6 ++-- docs/quickstart.rst | 52 +++++++++++++++++++++++++++ 6 files changed, 115 insertions(+), 2 deletions(-) create mode 100644 docs/examples/quickstart/quickstart.c create mode 100644 docs/examples/quickstart/setup.py create mode 100644 docs/quickstart.rst diff --git a/Makefile b/Makefile index b4342a537..f18fdcc7a 100644 --- a/Makefile +++ b/Makefile @@ -73,5 +73,6 @@ docs-examples-tests: cd docs/examples/simple-example && python3 setup.py --hpy-abi=universal install cd docs/examples/mixed-example && python3 setup.py install cd docs/examples/snippets && python3 setup.py --hpy-abi=universal install + cd docs/examples/quickstart && python3 setup.py --hpy-abi=universal install cd docs/examples/hpytype-example && python3 setup.py --hpy-abi=universal install python3 -m pytest docs/examples/tests.py ${TEST_ARGS} diff --git a/docs/examples/quickstart/quickstart.c b/docs/examples/quickstart/quickstart.c new file mode 100644 index 000000000..d437bc4f3 --- /dev/null +++ b/docs/examples/quickstart/quickstart.c @@ -0,0 +1,39 @@ +// quickstart.c + +// This header file is the entrypoint to the HPy API: +#include "hpy.h" + +// HPy method: the HPyDef_METH macro generates some boilerplate code, +// the same code can be also written manually if desired +HPyDef_METH(say_hello, "say_hello", HPyFunc_NOARGS) +static HPy say_hello_impl(HPyContext *ctx, HPy self) +{ + // Methods take HPyContext, which must be passed as the first argument to + // all HPy API functions. Other than that HPyUnicode_FromString does the + // same thing as PyUnicode_FromString. 
+ // + // HPy type represents a "handle" to a Python object, but may not be + // a pointer to the object itself. It should be fully "opaque" to the + // users. Try uncommenting the following two lines to see the difference + // from PyObject*: + // + // if (self == self) + // HPyUnicode_FromString(ctx, "Surprise? Try HPy_Is(ctx, self, self)"); + + return HPyUnicode_FromString(ctx, "Hello world"); +} + +static HPyDef *QuickstartMethods[] = { + &say_hello, // 'say_hello' generated for us by the HPyDef_METH macro + NULL, +}; + +static HPyModuleDef quickstart_def = { + .doc = "HPy Quickstart Example", + .defines = QuickstartMethods, +}; + +// The Python interpreter will create the module for us from the +// HPyModuleDef specification. Additional initialization can be +// done in the HPy_mod_exec slot +HPy_MODINIT(quickstart, quickstart_def) diff --git a/docs/examples/quickstart/setup.py b/docs/examples/quickstart/setup.py new file mode 100644 index 000000000..0d56aefec --- /dev/null +++ b/docs/examples/quickstart/setup.py @@ -0,0 +1,13 @@ +# setup.py + +from setuptools import setup, Extension +from os import path + +DIR = path.dirname(__file__) +setup( + name="hpy-quickstart", + hpy_ext_modules=[ + Extension('quickstart', sources=[path.join(DIR, 'quickstart.c')]), + ], + setup_requires=['hpy'], +) diff --git a/docs/examples/tests.py b/docs/examples/tests.py index 53a82cda5..a796a4874 100644 --- a/docs/examples/tests.py +++ b/docs/examples/tests.py @@ -43,6 +43,12 @@ def test_simple_type(): assert p.z == 2000 +def test_quickstart(): + import quickstart + assert quickstart.say_hello() == "Hello world" +# END: test_quickstart + + def test_builtin_type(): obj = builtin_type.Dummy("hello") assert obj == "hello" diff --git a/docs/index.rst b/docs/index.rst index 6a48d3aa2..32e5cdde9 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -41,8 +41,9 @@ Where to go next: - Show me the code: - - :ref:`Simple HPy extension example` - - :doc:`Example of porting a sample Python/C 
API extension to HPy` + - :doc:`Quickstart` + - :ref:`Simple documented HPy extension example` + - :doc:`Tutorial: porting Python/C API extension to HPy` - Details: @@ -58,6 +59,7 @@ Full table of contents: .. toctree:: :maxdepth: 2 + quickstart overview api porting-guide diff --git a/docs/quickstart.rst b/docs/quickstart.rst new file mode 100644 index 000000000..7bf9ab13c --- /dev/null +++ b/docs/quickstart.rst @@ -0,0 +1,52 @@ +HPy Quickstart +============= + +This section shows how to quickly get started with HPy by creating +a simple HPy extension from scratch. + +Install HPy: + +.. + This should be updated to pip install hpy once this version is released + +.. code-block:: console + + python3 -m pip install git+https://github.com/hpyproject/hpy.git#egg=hpy.universal + +Create a new directory for the new HPy extension. The location and name of the directory +do not matter. Add the following two files: + +.. literalinclude:: examples/quickstart/quickstart.c + +.. literalinclude:: examples/quickstart/setup.py + :language: python + +Build the extension: + +.. code-block:: console + + python3 setup.py --hpy-abi=universal develop + +Try it out -- start a Python console in the same directory and type: + +.. literalinclude:: examples/tests.py + :start-after: test_quickstart + :end-before: # END: test_quickstart + +Notice the shared library that was created by running ``setup.py``: + +.. code-block:: console + + > ls *.so + quickstart.hpy0.so + +It does not have the Python version encoded in it. If you happen to have +a GraalPy or PyPy installation that supports the given HPy version, you can +try running the same extension on it. For example, start +``$GRAALVM_HOME/bin/graalpy`` in the same directory and type the same +Python code: the extension should load and work just fine. + +Where to go next? 
+ + - :ref:`Simple documented HPy extension example` + - :doc:`Tutorial: porting Python/C API extension to HPy` From 78e558603fd17ab931b918b3a268bea69113f572 Mon Sep 17 00:00:00 2001 From: Stepan Sindelar Date: Wed, 1 Feb 2023 11:59:21 +0100 Subject: [PATCH 6/7] Apply code review suggestions Co-authored-by: Matti Picus --- docs/api-reference/inline-helpers.rst | 5 +++-- docs/index.rst | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/api-reference/inline-helpers.rst b/docs/api-reference/inline-helpers.rst index 1e7a1defb..2e719caf7 100644 --- a/docs/api-reference/inline-helpers.rst +++ b/docs/api-reference/inline-helpers.rst @@ -6,8 +6,9 @@ Those functions are usually small convenience functions that everyone could write but in order to avoid duplicated effort, they are defined by HPy. One category of inline helpers are functions that convert the commonly used -but not fixed width C types, such as ``int``, or ``long long``, to HPy API -that uses only fixed with types in order to ensure maximal ABI compatibility. +but not fixed width C types, such as ``int``, or ``long long``, to HPy API. +The HPy API always uses well-defined fixed width types like ``int32`` or +``unsigned int8``. .. autocmodule:: hpy/inline_helpers.h :members: diff --git a/docs/index.rst b/docs/index.rst index 32e5cdde9..addcb5aae 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -8,7 +8,7 @@ HPy: a better API for Python HPy provides a new API for extending Python in C. 
-There are several advantages to write your C extension in HPy: +There are several advantages to writing C extensions in HPy: - **Speed**: it runs much faster on PyPy, GraalPy, and at native speed on CPython From 70a8a0a8c40e8722a9cac6a0b2b168c959f7c4da Mon Sep 17 00:00:00 2001 From: stepan Date: Wed, 1 Feb 2023 12:01:25 +0100 Subject: [PATCH 7/7] Fix in trace and debug mode docs --- docs/debug-mode.rst | 2 +- docs/quickstart.rst | 2 +- docs/trace-mode.rst | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/debug-mode.rst b/docs/debug-mode.rst index 8e465dd09..0f59bc948 100644 --- a/docs/debug-mode.rst +++ b/docs/debug-mode.rst @@ -43,7 +43,7 @@ Debug mode works *only* for extensions built with HPy universal ABI. To enable debug mode, use environment variable ``HPY``. If ``HPY=debug``, then all HPy modules are loaded with the trace context. Alternatively, it is also possible to specify the mode per module like this: -``HPY=modA:debug,modB=debug``. +``HPY=modA:debug,modB:debug``. In order to verify that your extension is being loaded in debug mode, use environment variable ``HPY_LOG``. If this variable is set, then all HPy diff --git a/docs/quickstart.rst b/docs/quickstart.rst index 7bf9ab13c..b526552e7 100644 --- a/docs/quickstart.rst +++ b/docs/quickstart.rst @@ -1,5 +1,5 @@ HPy Quickstart -============= +============== This section shows how to quickly get started with HPy by creating a simple HPy extension from scratch. diff --git a/docs/trace-mode.rst b/docs/trace-mode.rst index 704b033a8..35b5e409f 100644 --- a/docs/trace-mode.rst +++ b/docs/trace-mode.rst @@ -19,7 +19,7 @@ Similar to how the :ref:`debug mode is activated `, use environment variable ``HPY``. If ``HPY=trace``, then all HPy modules are loaded with the trace context. Alternatively, it is also possible to specify the mode -per module like this: ``HPY=modA:trace,modB=trace``. +per module like this: ``HPY=modA:trace,modB:trace``. Environment variable ``HPY_LOG`` also works.