.. highlight:: c

.. index::
    single: Porting

======================================
Porting – Adding Support for Python 3
======================================

After you :doc:`modernize <guide-modernization>` your C extension to use the latest features available in Python 2, it is time to address the differences between Python 2 and 3.

The recommended way to port is to keep single-source compatibility between Python 2 and 3 until support for Python 2 can be safely dropped. For Python code, you can use libraries like six and future, and, failing that, ``if sys.version_info >= (3, 0):`` blocks for conditional code. For C, the py3c library provides common tools, and for special cases you can use conditional compilation with ``#if IS_PY3``.
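
On the Python side, the ``sys.version_info`` check mentioned above might look like the following minimal sketch. The names ``text_type``, ``binary_type`` and ``is_text`` are hypothetical helpers chosen for illustration; they are not part of six, future or py3c.

```python
import sys

# Pick the text and binary types for the running interpreter.
if sys.version_info >= (3, 0):
    text_type = str
    binary_type = bytes
else:
    # This branch is only reached on Python 2, where `unicode` exists.
    text_type = unicode  # noqa: F821
    binary_type = str


def is_text(value):
    """Return True if value is a human-readable (Unicode) string."""
    return isinstance(value, text_type)
```

Keeping such checks in one small compatibility module, rather than scattering them through the code base, makes them easier to delete once Python 2 support is dropped.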

To start using py3c, ``#include <py3c.h>``, and instruct your compiler to find the header.

.. index::
    double: Porting; Strings
    double: Porting; Bytes
    double: Porting; Unicode

The Bytes/Unicode split
~~~~~~~~~~~~~~~~~~~~~~~

The most painful change for extension authors is the bytes/unicode split: unlike Python 2's ``str`` or C's ``char*``, Python 3 introduces a sharp divide between human-readable strings and binary data. You will need to decide, for each string value you use, which of these two types you want.

Make the division as sharp as possible: mixing the types tends to lead to utter chaos. Functions that take both Unicode strings and bytes (in a single Python version) should be rare, and should generally be convenience functions in your interface, not code deep in the internals.
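
As an illustration of keeping that division sharp, here is a minimal Python 3 sketch in which only the public wrapper accepts both types and the internal function sees text exclusively. The function names (``to_text``, ``shout``, ``shout_any``) and the UTF-8 default are assumptions made for this example.

```python
def to_text(value, encoding="utf-8"):
    """Interface-level helper: accept bytes or text, always return text."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def shout(text):
    """Internal function: works on text only, never on bytes."""
    return text.upper() + u"!"


def shout_any(value):
    """Public convenience wrapper that normalizes its input exactly once."""
    return shout(to_text(value))
```

The conversion happens at the boundary, so nothing below ``shout_any`` ever has to ask which kind of string it was handed.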

However, you can use a concept of *native strings*, a type that corresponds to the ``str`` type in Python: PyBytes on Python 2, and PyUnicode in Python 3. This is the type that you will need to return from functions like ``__str__`` and ``__repr__``.

Using the **native string** extensively is suitable for conservative projects: it affects the semantics under Python 2 as little as possible, while not requiring the resulting Python 3 API to feel contorted.

With py3c, functions for the native string type are ``PyStr_*`` (``PyStr_FromString``, ``PyStr_Type``, ``PyStr_Check``, etc.). They correspond to `PyString <https://docs.python.org/2/c-api/string.html>`_ on Python 2, and `PyUnicode <https://docs.python.org/3/c-api/unicode.html>`_ on Python 3. The supported API is the intersection of `PyString_* <https://docs.python.org/2/c-api/string.html>`_ and `PyUnicode_* <https://docs.python.org/3/c-api/unicode.html>`_, except ``PyStr_Size`` (see below) and the deprecated ``PyUnicode_Encode``; additionally `PyStr_AsUTF8String <https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsUTF8String>`_ is defined.

Keep in mind py3c expects that native strings are always encoded with ``utf-8`` under Python 2. If you use a different encoding, you will need to convert between bytes and text manually.

For binary data, use ``PyBytes_*`` (``PyBytes_FromString``, ``PyBytes_Type``, ``PyBytes_Check``, etc.). Python 3.x provides them under these names only; in Python 2.6+ they are aliases of ``PyString_*``. (For even older Pythons, py3c also provides these aliases.) The supported API is the intersection of `PyString_* <https://docs.python.org/2/c-api/string.html>`_ and `PyBytes_* <https://docs.python.org/3/c-api/bytes.html>`_.

Porting mostly consists of replacing "PyString" with either "PyStr" or "PyBytes"; just see the caveat about size below.

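
The manual conversion mentioned for non-UTF-8 encodings can be sketched from the Python side as follows. This is only an illustration that assumes Latin-1 as the legacy encoding; ``latin1_to_utf8`` is a hypothetical helper, not a py3c function.

```python
def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8 bytes by going through text."""
    text = data.decode("latin-1")  # bytes -> text
    return text.encode("utf-8")    # text -> bytes


legacy = b"caf\xe9"                 # "café" in Latin-1 (4 bytes)
converted = latin1_to_utf8(legacy)  # the same word in UTF-8 (5 bytes)
```

The text has four characters in both cases, but the byte length changes with the encoding, which is one reason to convert explicitly rather than pass legacy bytes through as native strings.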
To summarize the four different string type names:

============ ============= ============== ===================
String kind  py2           py3            Use
============ ============= ============== ===================
PyStr_*      PyString_*    PyUnicode_*    Human-readable text
PyBytes_*    PyString_*    ✔              Binary data
PyUnicode_*  ✔             ✔              Unicode strings
PyString_*   ✔             error          In unported code
============ ============= ============== ===================

.. index::
    double: Porting; String Size

String size
~~~~~~~~~~~

When dealing with Unicode strings, the concept of “size” is tricky, since the number of characters doesn't necessarily correspond to the number of bytes in the string's UTF-8 representation.

To prevent subtle errors, this library does *not* provide a ``PyStr_Size`` function.

Instead, use ``PyStr_AsUTF8AndSize``. It behaves like Python 3's `PyUnicode_AsUTF8AndSize <https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsUTF8AndSize>`_, except that under Python 2, the string is not encoded (as it should already be in UTF-8), the size pointer must not be NULL, and the size may be stored even if an error occurs.

.. index::
    double: Porting; Ints
    double: Porting; Long

Ints
~~~~

While the string type is split in Python 3, the int is just the opposite: ``int`` and ``long`` were unified. ``PyInt_*`` is gone and only ``PyLong_*`` remains (and, to confuse things further, PyLong is named "int" in Python code). The py3c headers alias PyInt to PyLong, so if you're using them, there's no need to change at this point.

.. index::
    double: Porting; Argument parsing
    double: Porting; PyArg_Parse
    double: Porting; Py_BuildValue

Floats
~~~~~~

In Python 3, the function :c:func:`PyFloat_FromString <PyFloat_FromString>` lost its second, ignored argument.

The py3c headers redefine the function to take one argument even in Python 2. You will need to remove the excess argument from all calls.

Argument Parsing
~~~~~~~~~~~~~~~~

The format codes for argument-parsing functions of the PyArg_Parse family have changed somewhat.

In Python 3, the ``s``, ``z``, ``es``, ``es#`` and ``U`` (plus the new ``C``) codes accept only Unicode strings, while ``c`` and ``S`` only accept bytes.

Formats accepting Unicode strings usually encode to ``char*`` using UTF-8. Specifically, these are ``s``, ``s*``, ``s#``, ``z``, ``z*``, ``z#``, and also ``es``, ``et``, ``es#``, and ``et#`` when the encoding argument is set to NULL. In Python 2, the default encoding was used instead.

There is no variant of ``z`` for bytes, which means there's no built-in way to accept "bytes or NULL" as a ``char*``. If you need this, write an ``O&`` converter.

Python 2 lacks a ``y`` code, which, in Python 3, works on bytes objects. The use cases needing ``bytes`` in Python 3 and ``str`` in Python 2 should be rare; if needed, use ``#if IS_PY3`` to select a compatible PyArg_Parse call.

..
    XXX: Write an O& converter for "z" and "y"
    XXX: Write/document handling pathnames safely and portably;
    see PyUnicode_FSConverter/PyUnicode_FSDecoder

Compare the `Python 2 <https://docs.python.org/2/c-api/arg.html>`_ and `Python 3 <https://docs.python.org/3/c-api/arg.html>`_ docs for full details.

.. index::
    double: Porting; Module Initialization

Defining Extension Types
~~~~~~~~~~~~~~~~~~~~~~~~

If your module defines extension types, i.e. variables of type ``PyTypeObject`` (and related structures like ``PyNumberMethods`` and ``PyBufferProcs``), you might need to make changes to these definitions. Please read the :doc:`Extension types <ext-types>` guide for details.

A common incompatibility comes from type flags, like :data:`Py_TPFLAGS_HAVE_WEAKREFS` and :data:`Py_TPFLAGS_HAVE_ITER`, which are removed in Python 3 (where the functionality is always present). If you are only using these flags in type definitions (and *not*, for example, in :c:func:`PyType_HasFeature`), you can include ``<py3c/tpflags.h>`` to define them to zero under Python 3. For more information, read the :ref:`Type flags <tpflags>` section.

Module initialization
~~~~~~~~~~~~~~~~~~~~~

The module creation process was overhauled in Python 3. py3c provides a compatibility wrapper so most of the Python 3 syntax can be used.

PyModuleDef and PyModule_Create
-------------------------------

Module object creation with py3c is the same as in Python 3.

First, create a PyModuleDef structure::

    static struct PyModuleDef moduledef = {
        PyModuleDef_HEAD_INIT,  /* m_base */
        "spam",                 /* m_name */
        NULL,                   /* m_doc */
        -1,                     /* m_size */
        spam_methods            /* m_methods */
    };

Then, where a Python 2 module would have ::

    m = Py_InitModule3("spam", spam_methods, "Python wrapper ...");

use instead ::

    m = PyModule_Create(&moduledef);

For ``m_size``, use -1. (If you are sure the module supports multiple subinterpreters, you can use 0, but this is tricky to achieve portably.) Additional members of the PyModuleDef structure are not accepted under Python 2.

See `Python documentation <https://docs.python.org/3/c-api/module.html#initializing-c-modules_>`_ for details on PyModuleDef and PyModule_Create.

Module creation entrypoint
--------------------------

Instead of the ``void init<name>`` function in Python 2, or a Python 3-style ``PyObject *PyInit_<name>`` function, use the MODULE_INIT_FUNC macro to define an initialization function, and return the created module from it::

    MODULE_INIT_FUNC(name)
    {
        ...
        m = PyModule_Create(&moduledef);
        ...
        if (error) {
            return NULL;
        }
        ...
        return m;
    }

The File API
~~~~~~~~~~~~

The :c:type:`PyFile <py2:PyFileObject>` API was severely reduced :c:func:`in Python 3 <py3:PyFile_FromFd>`. The new version is specifically intended for internal error reporting in Python.

Native Python file objects are officially no longer backed by ``FILE*``.

Use the Python API from the :py:mod:`py3:io` module instead of handling files in C. The Python API supports all kinds of file-like objects, not just built-in files (though, admittedly, it is cumbersome to use from plain C).

If you really need to access an API that deals with ``FILE*`` only (e.g. for debugging), see py3c's limited :doc:`file API shim <fileshim>`.

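
To illustrate what "all kinds of file-like objects" means in practice, the following Python sketch writes through the generic ``write()`` method and is exercised with an in-memory ``io.StringIO`` stream. The function ``write_report`` is a hypothetical example, and the same call pattern is what C code would drive through the object API instead of ``FILE*``.

```python
import io


def write_report(stream, lines):
    """Write each line to any text-mode file-like object."""
    for line in lines:
        stream.write(line + u"\n")


# An in-memory stream stands in for a real file here.
buffer = io.StringIO()
write_report(buffer, [u"spam", u"eggs"])
result = buffer.getvalue()
```

Because ``write_report`` relies only on the ``write()`` method, it would work unchanged with a file opened by ``io.open`` or with any other object that provides that method.
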
Other changes
~~~~~~~~~~~~~

If you find a case where py3c doesn't help, use ``#if IS_PY3`` to include code for only one or the other Python version. And if you think others might have the same problem, consider contributing a macro and docs to py3c!

.. index:: Building, ABI tags

Building
~~~~~~~~

When building your extension, note that Python 3.2 introduced ABI version tags (`PEP 3149 <https://www.python.org/dev/peps/pep-3149/>`_), which can be added to shared library filenames to ensure that the library is loaded with the correct Python version. For example, instead of ``foo.so``, the shared library for the extension module ``foo`` might be named ``foo.cpython-33m.so``.

Your build system might generate these for you already, but if you need to modify it, you can get the tags from ``sysconfig``:

>>> import sysconfig
>>> sysconfig.get_config_var('EXT_SUFFIX')
'.cpython-34m.so'
>>> sysconfig.get_config_var('SOABI')
'cpython-34m'

This is completely optional; the old filenames without ABI tags are still valid.
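
If a build script needs the tagged filename, it could be assembled along these lines. This is a minimal sketch that assumes Python 3, where ``EXT_SUFFIX`` is set; ``extension_filename`` is a hypothetical helper, and the exact suffix depends on the interpreter and platform.

```python
import sysconfig


def extension_filename(module_name):
    """Return the tagged file name for an extension module on this interpreter."""
    # EXT_SUFFIX already includes the leading dot and the ABI tag.
    suffix = sysconfig.get_config_var("EXT_SUFFIX")
    return module_name + suffix


filename = extension_filename("foo")
```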

Done!
~~~~~

Do your tests now pass under both Python 2 and 3? (And do you have enough tests?) Then you're done porting!

Once you decide to drop compatibility with Python 2, you can move to the :doc:`Cleanup <guide-cleanup>` section.