# `ctypes` Tips #

[`ctypes`](https://docs.python.org/3/library/ctypes.html) is a very handy tool for building Python wrappers for shared libraries written for C or C++. In most cases, it is probably preferable to use this, rather than write an *extension module* in C or C++ to provide the Python API: it can take a lot of code to implement the necessary C/C++ wrappers to represent Python objects and methods, while this can usually be done directly in Python with a fraction of the effort.

While the documentation for `ctypes` is quite comprehensive, there are a few subtle points that might not be clear.

A Python wrapper will typically need a lot of things from the `ctypes` module. Its own documentation page uses wildcard imports in the examples, which I prefer to avoid. Instead, I reference its exports by importing the module under a shorter name:

In [None]:
import ctypes as ct

## Load The Runtime Library, Not The Development Library ##

Consider the following directory entries currently on my Debian system for the [Cairo](https://cairographics.org/) graphics library:

    /usr/lib/x86_64-linux-gnu/libcairo.so -> libcairo.so.2.11600.0
    /usr/lib/x86_64-linux-gnu/libcairo.so.2 -> libcairo.so.2.11600.0
    /usr/lib/x86_64-linux-gnu/libcairo.so.2.11600.0

As you can see, there are 3 separate names for the same file. Which one should you use?

The answer is, use the name `libcairo.so.2`. The unversioned name comes from the *development* package:

    > dpkg-query -S /usr/lib/x86_64-linux-gnu/libcairo.so
    libcairo2-dev:amd64: /usr/lib/x86_64-linux-gnu/libcairo.so

while the versioned names come from the *runtime* package:

    > dpkg-query -S /usr/lib/x86_64-linux-gnu/libcairo.so.2
    libcairo2:amd64: /usr/lib/x86_64-linux-gnu/libcairo.so.2

So, in a wrapper for Cairo, you would load the library using something like

    cairo = ct.cdll.LoadLibrary("libcairo.so.2")

You only need to care about the first numeric component of the version (the `2` in `libcairo.so.2`), since that is the one incremented for any incompatible ABI change (which might necessitate changes to your wrapper).
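Here is a minimal sketch of loading by the versioned name. The helper `load_runtime_library()` is hypothetical, not part of `ctypes`, and assumes the usual Linux `lib<name>.so.<major>` convention. `ctypes.util.find_library()`, on the other hand, is a real function. On Linux it reports the versioned name the dynamic linker would pick, which makes it a convenient way to check what is installed. The C library stands in for Cairo here, so the example runs on any glibc system.

```python
import ctypes as ct
import ctypes.util

def load_runtime_library(name, major):
    """Load the versioned runtime library lib<name>.so.<major>.

    Hypothetical helper; assumes the usual Linux soname convention."""
    return ct.cdll.LoadLibrary("lib%s.so.%d" % (name, major))

# find_library() reports the versioned name the dynamic linker would use,
# which helps confirm which major version is actually installed.
print(ctypes.util.find_library("c"))  # typically "libc.so.6" on glibc systems
libc = load_runtime_library("c", 6)
```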

Having the development package installed is useful while you are developing your wrapper (being able to refer to the include files for information, etc.), but you should only require your users to have the runtime package in order to run scripts that use your wrapper. Of course, they, too, might find the development package useful when writing such scripts. But let that be their choice.

This only applies to distros like Debian, which publish their packages in precompiled binary form. In distros like Gentoo, where users build everything from source, there is no distinction between “development” and “runtime” packages.

## Pointer Tips ##

Most pointer types (all the ones constructed with `POINTER()`) are subclasses of `_Pointer` and share some common properties. The exceptions are `c_void_p` and `c_char_p`, discussed separately below.

These common pointer types have a `contents` attribute you can use to dereference the pointer. This returns a `ctypes` object sharing the memory being pointed to. You can also assign a different `ctypes` object to `contents`, which makes the pointer point to that object instead. Non-pointer `ctypes` types in turn have a `value` attribute that you can use to access or change the value.

Alternatively, you can index the pointer as though it were an array to directly access the value pointed to. For pointers to simple types such as `c_int`, `p[0]` is equivalent to `p.contents.value`, and this works both for getting the current value and for assigning a new value.

In [None]:
i1 = ct.c_int(3)
p1 = ct.pointer(i1)
p2 = ct.POINTER(ct.c_int)(ct.c_int(3))
print(p1.contents, p1.contents.value, p2.contents.value, p2[0])
p2[0] = 5
print(p1.contents, p2.contents)
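This small sketch illustrates the difference between writing *through* a pointer and *retargeting* it. Indexing changes the object pointed to, while assigning to `contents` makes the pointer point somewhere else.

```python
import ctypes as ct

a = ct.c_int(1)
b = ct.c_int(2)
p = ct.pointer(a)

p[0] = 10          # writes through the pointer: a.value is now 10
p.contents = b     # retargets the pointer: it now points at b
p[0] = 20          # so this changes b, leaving a alone
print(a.value, b.value)                             # 10 20
print(ct.addressof(p.contents) == ct.addressof(b))  # True
```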


Note that dereferencing a `NULL` pointer (whether via indexing or via the `contents` attribute) raises a `ValueError`, though you can assign to the `contents` attribute of a `NULL` pointer to make it non-`NULL`. Interestingly, there seems to be no way to assign a `NULL` value to an existing pointer object, at least not directly.

You can check for a `NULL` pointer by treating the pointer as a `bool`:

In [None]:
import sys

p3 = ct.POINTER(ct.c_int)()

for p, name in ((p1, "p1"), (p2, "p2"), (p3, "p3")) :
    sys.stdout.write(name + " ")
    if bool(p) :
        sys.stdout.write("has contents %d" % p[0])
    else :
        sys.stdout.write("is NULL")
    #end if
    sys.stdout.write("\n")
#end for
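The following sketch covers the `NULL` cases just described. The reset step is an indirect workaround built from documented calls (`from_address()` and `addressof()`), not a dedicated `ctypes` feature. It overlays a `c_void_p` on the pointer object’s own storage and stores `None` there. `ct.cast(None, ...)` is another way to get a fresh `NULL` pointer of a given type.

```python
import ctypes as ct

IntPtr = ct.POINTER(ct.c_int)

p = IntPtr()            # a default-constructed pointer is NULL
print(bool(p))          # False
try:
    p[0]                # dereferencing NULL raises instead of crashing
except ValueError as err:
    print("error:", err)

target = ct.c_int(42)
p.contents = target     # assigning contents makes it non-NULL
print(bool(p), p[0])    # True 42

# Indirect reset to NULL: view the pointer object's own storage
# as a c_void_p and store None (NULL) there.
ct.c_void_p.from_address(ct.addressof(p)).value = None
print(bool(p))          # False

# A fresh NULL pointer of a given type can also come from cast():
q = ct.cast(None, IntPtr)
print(bool(q))          # False
```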

The `contents` attribute can be used to selectively access parts of the value being pointed to, without copying the whole thing:

In [None]:
class MyStruct(ct.Structure) :
    _fields_ = \
        [
            ("field1", ct.c_int),
            ("field2", ct.c_int),
        ]
#end MyStruct

p1 = ct.POINTER(MyStruct)(MyStruct(4, 5))
p2 = ct.POINTER(MyStruct)(p1.contents) # sharing same struct

print("before:", p2.contents.field1, p2.contents.field2)
p1.contents.field1 = 6
print("after:", p2.contents.field1, p2.contents.field2)
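One subtlety worth knowing: each access to `contents` builds a new Python object, but every such object shares the same underlying memory. Changes made through one are therefore visible through the others. The `Pair` structure in this small sketch is just for illustration:

```python
import ctypes as ct

class Pair(ct.Structure):
    _fields_ = [("x", ct.c_int), ("y", ct.c_int)]

p = ct.pointer(Pair(1, 2))  # the pointer keeps the temporary Pair alive
c1 = p.contents
c2 = p.contents
print(c1 is c2)                              # False: two Python objects...
print(ct.addressof(c1) == ct.addressof(c2))  # True: ...over the same memory
c1.y = 99
print(c2.y, p.contents.y)                    # 99 99
```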


## `c_void_p` ##

The `ctypes` documentation gives the Python type of `c_void_p` (the untyped pointer) as `int` or `None`. When creating a `c_void_p`, you can pass an integer for the address (including 0 for `NULL`), or you can pass `None` as an alternative for `NULL`. But when you read the `value` back, a `NULL` (zero) address always comes back as `None`:

In [None]:
p1 = ct.c_void_p(3)
p2 = ct.c_void_p(0)
print(p1.value, p2.value)

Note that `c_void_p` has no `contents` attribute for pointer dereferencing.
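Since a `c_void_p` cannot be dereferenced directly, this sketch shows two ways to get at the data it points to. You can cast it to a typed pointer, or construct a `ctypes` object at its address with `from_address()`. `from_address()` in particular does not keep the original object alive, so `x` must stay referenced while the result is in use.

```python
import ctypes as ct

x = ct.c_int(99)
vp = ct.cast(ct.pointer(x), ct.c_void_p)  # untyped pointer to x
print(vp.value == ct.addressof(x))        # True

# Option 1: cast back to a typed pointer, which does support contents/indexing.
ip = ct.cast(vp, ct.POINTER(ct.c_int))
print(ip[0], ip.contents.value)           # 99 99

# Option 2: build a ctypes object directly over the address.
print(ct.c_int.from_address(vp.value).value)  # 99
```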

## Getting Field Offsets ##

For some reason, the documentation page doesn’t currently mention any equivalent of the C `offsetof` construct, even though `ctypes` does support this. You get it via the `offset` attribute of a field, itself accessed as an attribute of the structure definition. You can also get the field size in a similar way, e.g.

In [None]:
class Fields1(ct.Structure) :
    _fields_ = \
        [
            ("field1", ct.c_int),
            ("field2", ct.c_int),
        ]
#end Fields1

class Fields2(ct.Structure) :
    _fields_ = \
        [
            ("field1", ct.c_int),
            ("field2", ct.c_double),
        ]
#end Fields2

Fields1.field2.offset, Fields1.field2.size, Fields2.field2.offset, Fields2.field2.size
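Here is a tiny `offsetof()` helper, a hypothetical name mirroring the C macro. It is built on the `offset` attribute and used together with the real `ct.sizeof()` and `ct.alignment()` functions. The exact numbers depend on the platform ABI. On x86-64 Linux, for instance, `b` lands at offset 8 because of the padding needed to align the `double`.

```python
import ctypes as ct

def offsetof(struct_type, field_name):
    "rough equivalent of C's offsetof(struct_type, field_name)."
    return getattr(struct_type, field_name).offset

class Mixed(ct.Structure):
    _fields_ = [
        ("a", ct.c_char),
        ("b", ct.c_double),
        ("c", ct.c_short),
    ]

for name, ctype in Mixed._fields_:
    print(name, "offset", offsetof(Mixed, name), "size", ct.sizeof(ctype))
print("struct size", ct.sizeof(Mixed), "alignment", ct.alignment(Mixed))
```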

## Getting Addresses Of Python Objects ##

Sometimes you want to pass the address of the data inside a Python object directly to a library routine, to save copying data back and forth. This is particularly useful for Python objects of type `bytes` and `bytearray`, as well as arrays created with the [`array`](https://docs.python.org/3/library/array.html) module. This has to be done in slightly different ways for these different objects.

To demonstrate this, I will make calls to the low-level `libc` [`memcpy`(3)](https://linux.die.net/man/3/memcpy) routine to copy data between Python objects:

In [None]:
libc = ct.cdll.LoadLibrary("libc.so.6")
libc.memcpy.restype = ct.c_void_p
libc.memcpy.argtypes = (ct.c_void_p, ct.c_void_p, ct.c_size_t) # dst, src, count

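For a `bytes` object, a simple `cast` is sufficient to obtain the address of the data. Bear in mind that `bytes` objects are meant to be immutable. Writing into one through this address, as this demonstration does, is only reasonable for an object you created yourself and that nothing else shares: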

In [None]:
b1 = b"some:text"
b2 = b"other text"
print(b1, b2)
b1adr = ct.cast(b1, ct.c_void_p).value
b2adr = ct.cast(b2, ct.c_void_p).value
libc.memcpy(b2adr, b1adr, 5)
print(b1, b2)

For a `bytearray`, things are slightly more involved.

In [None]:
b1 = bytearray(b"different text")
b1adr = ct.addressof((ct.c_ubyte * len(b1)).from_buffer(b1))
libc.memcpy(b2adr, b1adr, 6)
print(b1, b2)

By the way, you can’t use this technique on `bytes`: `from_buffer()` requires a writable buffer, so it raises a `TypeError` for immutable objects.

[`array`](https://docs.python.org/3/library/array.html) arrays have a `buffer_info()` method which returns the address and length of the underlying memory buffer. This still works, but the documentation says it is kept only for backward compatibility and should be avoided in new code. Fortunately, the same trick works as for `bytearray`s:

In [None]:
import array
b1 = array.array("B", b"yet other text")
b1adr = ct.addressof((ct.c_ubyte * len(b1)).from_buffer(b1))
libc.memcpy(b2adr, b1adr, 7)
print(b1.tobytes(), b2)
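The `from_buffer()` approach can be wrapped in a small helper that works for any writable, contiguous buffer object; `buffer_address()` is a hypothetical name. The sketch uses `memoryview(...).nbytes` to get the size in bytes, which matters for arrays with multi-byte items. It uses `ct.memmove()` in place of the `libc` `memcpy` so it runs without loading `libc`. The address is only valid while the object is alive and is not resized.

```python
import array
import ctypes as ct

def buffer_address(obj):
    """Return (address, nbytes) for the data of a writable buffer object
    such as a bytearray or array.array."""
    nbytes = memoryview(obj).nbytes
    return ct.addressof((ct.c_ubyte * nbytes).from_buffer(obj)), nbytes

src = array.array("i", [10, 20, 30])
dst = bytearray(len(src) * src.itemsize)
src_addr, n = buffer_address(src)
dst_addr, _ = buffer_address(dst)
ct.memmove(dst_addr, src_addr, n)       # copy raw bytes between the two objects
print(array.array("i", bytes(dst)))     # array('i', [10, 20, 30])
```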

Casting can also turn a `ctypes` array into a pointer to its element type. Here, an array is first laid over a `bytearray`:

In [None]:
b = bytearray(b"some text")
b1 = (ct.c_ubyte * 0).from_buffer(b)

In this case, I have set the array length to 0, which prevents me from using `b1` directly to access any of the bytes in `b`, but a pointer constructed from `b1` is not so constrained:

In [None]:
p = ct.cast(b1, ct.POINTER(ct.c_ubyte))
[chr(c) for c in p[0:3]]

Because the original Python object is mutable, `ctypes` allows me to use the pointer to assign to its components from within Python (this would not be allowed for a pointer into a `bytes` object, for example):

In [None]:
p[5] = ord("z")
b

Of course, external libraries are not going to respect Python’s access-control mechanisms.

## `c_char` And `c_char_p` ##

A `c_char_p` is not quite equivalent to `ct.POINTER(ct.c_char)`; it is assumed to point to a *null-terminated* array of `c_char`. Accessing the `value` attribute returns the data up to, but not including, the terminating null:

In [None]:
b = b"hello\0 there"
ct.cast(b, ct.c_char_p).value

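Note that you cannot modify the original data by assigning to the `value` of a `c_char_p`: that just makes the pointer point to a new buffer holding the new value. Nor does `c_char_p` have a `contents` attribute to write through, so the following assignment leaves `b` untouched: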

In [None]:
ct.cast(b, ct.c_char_p).contents = b"text"
b

But you can write into an _array_ of `c_char` laid over the same memory, for example by assigning to a slice of it:

In [None]:
ct.cast(b, ct.POINTER(len(b) * ct.c_char))[0][0:4] = (4 * ct.c_char)(*list(b"text"))
b

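Here’s a similar thing done to a `bytearray` instead of a `bytes` object, this time assigning to the array’s `value` attribute (note the extra null inserted after the new value):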

In [None]:
b = bytearray(b"hello\0 there")
(len(b) * ct.c_char).from_buffer(b).value = b"tex"
b
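The `value` attribute of a `c_char` array stops at the first null, while its `raw` attribute returns every byte. This short sketch uses `ct.create_string_buffer()`, which allocates one extra byte for a trailing null. It shows the difference between the two attributes, along with the null that assigning to `value` appends:

```python
import ctypes as ct

buf = ct.create_string_buffer(b"hello\0 there")  # 13 bytes: data plus trailing null
print(buf.value)   # b'hello': stops at the first null
print(buf.raw)     # b'hello\x00 there\x00': every byte in the array
buf.value = b"tex"
print(buf.raw)     # b'tex\x00o\x00 there\x00': null appended after "tex"
```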

## Array Conversions ##

Conversion of a ctypes array (at least of simple element types) to a Python sequence is quite straightforward:

In [None]:
c_arr = (3 * ct.c_int)(5, 4, 3)
list(c_arr)

Conversion the other way is slightly more involved:

In [None]:
arr = [8, 7, 6]
c_arr = (len(arr) * ct.c_int)(*arr)
c_arr, list(c_arr)
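This sketch wraps the conversion in a helper; the name `to_c_array()` is hypothetical. For large numeric data, copying element by element may be wasteful. The last part overlays a `ctypes` array on an `array.array` with `from_buffer()`, so both views share the same memory. This assumes the `array` typecode `"i"` matches `c_int`, which holds since both are a C `int`.

```python
import array
import ctypes as ct

def to_c_array(ctype, values):
    "copies a Python sequence into a new ctypes array of ctype."
    values = list(values)
    return (ctype * len(values))(*values)

c_arr = to_c_array(ct.c_double, [1.5, 2.5, 3.5])
print(c_arr[:])    # [1.5, 2.5, 3.5]: slicing also gives a list

# Sharing instead of copying: lay a ctypes array over existing memory.
data = array.array("i", range(5))
shared = (ct.c_int * len(data)).from_buffer(data)
shared[0] = 100
print(data[0])     # 100
```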

## Calling Varargs Routines ##

Some C routines take variable numbers of arguments. `ctypes` does not directly support calling such routines with differing numbers of arguments, but it is possible to do so by dynamically constructing an appropriately-typed entry-point object, with suitable conversions from the types of the Python objects passed as actual arguments. Again, `ctypes` does not provide direct access to the class constructor for doing this, but it can be done by cloning an existing entry point.

For example, consider the [`snprintf`(3)](https://manpages.debian.org/1/snprintf.3.html) function, which has a prototype that looks like this:

    int snprintf(char str[restrict .size], size_t size,
               const char *restrict format, ...);

Sure, this is not something you would normally want to call from Python, given that the [`%`-operator](https://docs.python.org/3/library/stdtypes.html#old-string-formatting) already gives you access to the full range of `printf`-style formatting features, and more. But dealing with this routine helps to illustrate the general technique.

We can define the fixed part of the prototype like this:

In [None]:
libc.snprintf.restype = ct.c_int
libc.snprintf.argtypes = (ct.c_char_p, ct.c_size_t, ct.c_char_p)

but clearly that is not sufficient to actually make calls to this routine. However, using that prototype as a starting point, here is a Python wrapper that facilitates making such calls, by dynamically constructing the argument types for a copy of the entry point:

In [None]:
typemappings = \
    { # mapping from Python types to ctypes types
        int : ct.c_int,
        float : ct.c_double,
        # add more if you like, but more complex types
        # will likely require special handling
    }

def snprintf(strsize, format, vals) :
    "invokes snprintf with the specified format and parameters, returning" \
    " a string of up to strsize bytes."
    if not all(type(v) in typemappings for v in vals) :
        raise TypeError("unsupported type present in vals")
        # todo: could give more detail about which element(s) of vals are at fault
    #end if
    result = ct.create_string_buffer(strsize + 1) # mutable output buffer
    basefunc = libc.snprintf
      # fixed part of type info already set up
    func = type(basefunc).from_address(ct.addressof(basefunc))
      # same entry point address, but can have entirely different arg/result types
    func.restype = basefunc.restype # keep same result type
    all_arg_types = list(basefunc.argtypes) # start with fixed part of arglist
    c_format = format.encode()
    all_args = [result, strsize + 1, c_format] # actual args for fixed part of arglist
    for val in vals : # construct rest of arglist
        cvaltype = typemappings[type(val)]
        all_args.append(cvaltype(val)) # converted argument value
        all_arg_types.append(cvaltype) # corresponding converted argument type
    #end for
    func.argtypes = all_arg_types
    result_len = func(*all_args) # length of the full output, excluding trailing null, even if truncated
    return result[:min(result_len, strsize)].decode() # exclude trailing null
#end snprintf

A sample call to this routine might look like:

In [None]:
a = 1
b = 2
snprintf(25, "The sum of %d and %d is %d.", [a, b, a + b])

`printf` functions can also accept strings. Some other varargs functions may want a trailing `NULL` argument to indicate the end of the argument list. Dealing with both of these cases is left as an exercise for the reader.
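Here is one possible approach to the string part of the exercise, offered as a sketch rather than a definitive implementation. It replaces the type mapping with conversion functions, so that `str` values can be encoded before being passed as `c_char_p`; `conversions` and `snprintf_str()` are hypothetical names. It also uses `ct.create_string_buffer()` for the output, since that gives a genuinely mutable buffer, and it assumes glibc’s `libc.so.6`, as above.

```python
import ctypes as ct

libc = ct.cdll.LoadLibrary("libc.so.6")  # assumes glibc
libc.snprintf.restype = ct.c_int
libc.snprintf.argtypes = (ct.c_char_p, ct.c_size_t, ct.c_char_p)

# Hypothetical mapping: Python type -> function returning
# (ctypes type, converted ctypes value) for one variadic argument.
conversions = {
    int: lambda v: (ct.c_int, ct.c_int(v)),
    float: lambda v: (ct.c_double, ct.c_double(v)),
    bytes: lambda v: (ct.c_char_p, ct.c_char_p(v)),
    str: lambda v: (ct.c_char_p, ct.c_char_p(v.encode())),
}

def snprintf_str(strsize, format, vals):
    "like snprintf above, but also accepts str and bytes values for %s."
    for v in vals:
        if type(v) not in conversions:
            raise TypeError("unsupported type %s in vals" % type(v).__name__)
    basefunc = libc.snprintf
    # clone the entry point so its argtypes can be changed per call
    func = type(basefunc).from_address(ct.addressof(basefunc))
    func.restype = basefunc.restype
    buf = ct.create_string_buffer(strsize + 1)
    arg_types = list(basefunc.argtypes)
    args = [buf, strsize + 1, format.encode()]
    for v in vals:
        ctype, cval = conversions[type(v)](v)
        arg_types.append(ctype)
        args.append(cval)  # list keeps converted values alive during the call
    func.argtypes = arg_types
    func(*args)
    return buf.value.decode()  # snprintf always null-terminates within size

print(snprintf_str(40, "%s has %d items costing %.2f", ["cart", 3, 9.5]))
```

For functions that expect a terminating `NULL`, one could similarly append `ct.c_char_p` to the argument types with `None` as its value, since `ctypes` passes `None` as a null pointer.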

## Example: Wrapper For Cairo ##

In order to offer specific examples, the following discussion will refer to details of the implementation of **Qahirah**, my Python wrapper for the [**Cairo**](http://cairographics.org/) graphics library. This can be obtained from [GitLab](https://gitlab.com/ldo/qahirah) or [GitHub](https://github.com/ldo/qahirah).

## Reference-Counted Objects ##

Some libraries have their own systems for keeping track of reference counts of objects. It’s probably easier to avoid library objects having a reference count greater than 1; just let Python itself keep track of references to your Python wrapper object, and have your `__del__()` method dispose of the library object. Then the library object automatically exists for as long as your Python wrapper object exists.
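Here is a minimal sketch of this ownership model. `libc` `malloc`/`free` stand in for a real library’s create and destroy routines, and the `Block` class is hypothetical. The wrapper holds the only library-side reference, and its `__del__()` releases it when Python drops the last reference to the wrapper.

```python
import ctypes as ct

# Assumption: glibc; malloc/free stand in for a library's create/destroy calls.
libc = ct.cdll.LoadLibrary("libc.so.6")
libc.malloc.restype = ct.c_void_p
libc.malloc.argtypes = (ct.c_size_t,)
libc.free.restype = None
libc.free.argtypes = (ct.c_void_p,)

class Block:
    "hypothetical wrapper that is the sole owner of one library object."

    def __init__(self, size):
        self._ptr = libc.malloc(size)
        if self._ptr is None:  # c_void_p restype turns NULL into None
            raise MemoryError("allocation failed")

    def __del__(self):
        # the single library-side reference belongs to this wrapper
        if getattr(self, "_ptr", None) is not None:
            libc.free(self._ptr)
            self._ptr = None

blk = Block(64)
del blk  # in CPython, __del__() frees the block right here
```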

## Round-Trip Object Mapping ##

The underlying library may implement several different types of objects, and objects of one type may hold references to objects of another type. There will likely be API calls not only to set such references to particular objects, but also to retrieve the current object references from the referencing objects. How do you map this back to Python?

For example, in Cairo, a drawing `Context` can have a current _source_ for computing rendered pixel values, which is a reference to a `Pattern`. You have a `set_source` call to set the source to a specified `Pattern`, and you have a `get_source` call to retrieve a reference to the last-set `Pattern`.

It is easy enough to wrap the `set_source` call in an equivalent Python method on the `Context` wrapper, which takes a `Pattern` wrapper and extracts and passes both underlying objects to the library call. But what about `get_source`? Ideally, if you already have a Python wrapper object for that `Pattern`, it would be good to return that, rather than create a new wrapper.

One way to do this would be to have a separate field in your `Context` wrapper which saves the last-set `Pattern` wrapper when the `set_source` method is called. Your `get_source` method can then “cheat”: rather than calling the Cairo routine at all, it simply returns the value of this field. This has some consequences I find undesirable:
* it means the wrapper object has to be kept around, even if the user doesn’t need it.
* it runs the risk of getting out of sync, if the user calls the Cairo `set_source` routine directly by some other means.

There is a more elegant solution, where you keep a class variable in the referenced class (`Pattern`, in this case) which holds a mapping from underlying object references to the corresponding Python objects; then the `get_source` method can simply look up the returned value in this dictionary; if an entry already exists, it can return that existing wrapper, otherwise it can automatically construct a new wrapper, add it to the dictionary, and return it.

A further twist is to make this dictionary a `WeakValueDictionary`. That way, entries in it only exist for as long as the caller has other references to those Python wrapper objects; if the user gets rid of their own references, then they disappear from this mapping as well. This keeps memory usage for unwanted objects from growing without bounds.

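In order to make this work, the constructor for the wrapper object has to be implemented entirely within the `__new__()` method, rather than using `__init__()`. `__new__()` can decide whether to create a new wrapper object or return an existing one. By contrast, an `__init__()` would be run again on whatever object `__new__()` returns, reinitializing an existing wrapper.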
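The `WeakValueDictionary` mapping described above can be sketched as follows. `Pattern` here is a hypothetical class, and plain integers stand in for the addresses a library would hand back, for instance from a `get_source` call. Because there is no `__init__()`, nothing reinitializes an existing wrapper when it is looked up again.

```python
import weakref

class Pattern:
    """Hypothetical wrapper: _instances maps a library object's address
    to the single live Python wrapper for it."""

    _instances = weakref.WeakValueDictionary()

    def __new__(cls, lib_ptr):
        self = cls._instances.get(lib_ptr)
        if self is None:
            # no live wrapper yet for this library object: make one
            self = super().__new__(cls)
            self._lib_ptr = lib_ptr
            cls._instances[lib_ptr] = self
        return self

# Pretend these integers are pointers returned by the underlying library.
p1 = Pattern(0x1000)
p2 = Pattern(0x1000)  # same library object, so same wrapper
p3 = Pattern(0x2000)
print(p1 is p2, p1 is p3)  # True False

del p1, p2
print(0x1000 in Pattern._instances)  # False in CPython: last reference gone
```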


## Constructors Versus Create Methods ##

Related to the previous topic, I find it best to reserve the constructors for my wrapper classes for internal use, namely to construct wrapper objects around already-created library objects. Then I provide a separate set of `create()` methods (typically defined as `classmethod`s), which do the creation of library objects and call the constructor internally as appropriate, returning the newly-created wrapper for the newly-created library object. This way, the constructor centralizes all the mechanism for mapping underlying library objects to corresponding wrapper objects.
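This sketch shows that split, extending the same lookup-in-`__new__()` idea with a `create()` classmethod. The `Surface` class is hypothetical, and `libc` `malloc` stands in for a real library’s creation call. Disposal is left out to keep the example short, so it leaks its block.

```python
import ctypes as ct
import weakref

libc = ct.cdll.LoadLibrary("libc.so.6")  # assumes glibc; malloc stands in for "create"
libc.malloc.restype = ct.c_void_p
libc.malloc.argtypes = (ct.c_size_t,)

class Surface:
    "hypothetical wrapper; the constructor only wraps existing library objects."

    _instances = weakref.WeakValueDictionary()

    def __new__(cls, lib_ptr):
        # internal use: wrap an already-created library object
        self = cls._instances.get(lib_ptr)
        if self is None:
            self = super().__new__(cls)
            self._lib_ptr = lib_ptr
            cls._instances[lib_ptr] = self
        return self

    @classmethod
    def create(cls, size):
        # public API: create a new library object, then wrap it
        lib_ptr = libc.malloc(size)
        if lib_ptr is None:
            raise MemoryError("library object creation failed")
        return cls(lib_ptr)

s = Surface.create(32)
print(Surface(s._lib_ptr) is s)  # True: wrapping the same pointer finds s
```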


## Avoiding Segfaults With Transient Python Objects ##

There are certain situations where you might construct a `ctypes` object on the fly, directly in a call to a low-level library routine. This can mean that the only reference to the object disappears, and the object is reclaimed, before the library routine gets a chance to use it.

I have found this can happen particularly with callbacks. For example:

    callback_type = ct.CFUNCTYPE(...)

    def my_callback(...) :
        ...
    #end my_callback

    result = library.libroutine(..., callback_type(my_callback), ...)

It is best to avoid this. Instead, keep the callback wrapper constructed by `callback_type()` in a Python variable for at least the duration of the library call:

    my_callback_wrapper = callback_type(my_callback)
    result = library.libroutine(..., my_callback_wrapper, ...)

Beyond this, you might be installing the callback for later use, which means the library might try to call it after the function in which `my_callback_wrapper` was defined has exited. In this situation, you will need to keep the wrapper object somewhere more permanent, such as an instance variable, where it will stay in existence for as long as it is needed:

    self._my_callback_wrapper = callback_type(my_callback)
    result = library.install_callback(..., self._my_callback_wrapper, ...)


## Avoiding Segfaults On Program Exit ##

There is a drawback with custom `__del__()` methods on Python objects, which surfaces at program exit time: **there are no guarantees about the order in which they are called at exit time**. For example, if the underlying library object has already been deleted, then attempts to call disposal routines can cause intermittent segfaults. Of course, at times other than program termination, `__del__()` cleanup should continue to work OK.

The answer is for your wrapper module to install an `atexit` cleanup routine which **deletes the `__del__()` methods from all your classes that have them**. At the time that `atexit` routines are called, the state of things is still guaranteed to be sane; they will only go to pieces afterwards. For example, in my Cairo wrapper above, among the last few lines of code is something like

    def _atexit() :
        for cls in Context, Surface, Device, Pattern, ... etc ... :
            delattr(cls, "__del__")
        #end for
    #end _atexit
    atexit.register(_atexit)

Now, any of these objects that are disposed after this will no longer try to call any `__del__()` methods, which should be OK because everything is going away anyway.