# Low-level hackery

One function I've used without much comment is `numpy.frombuffer`, which lets us wrap arbitrary regions of memory as Numpy arrays. We can "peek" at any memory we want; we can also "poke" it, changing values, byte by byte.

Consider, for instance, a byte string. Byte strings are immutable (cannot be changed) in Python:

In [1]:
hello = b"Hello, world!"

In [2]:
try:
    hello[4:8] = b"????"
except TypeError as err:
    print("Nope: " + str(err))

Nope: 'bytes' object does not support item assignment


In [3]:
import numpy
a = numpy.frombuffer(hello, dtype=numpy.uint8)
a

array([ 72, 101, 108, 108, 111,  44,  32, 119, 111, 114, 108, 100,  33],
      dtype=uint8)

In [4]:
a.view("S1")

array([b'H', b'e', b'l', b'l', b'o', b',', b' ', b'w', b'o', b'r', b'l',
       b'd', b'!'], dtype='|S1')

By default, Numpy tries to protect you from doing evil things.

In [5]:
try:
    a[4:8] = [69, 86, 73, 76]
except ValueError as err:
    print("Nope: " + str(err))

Nope: assignment destination is read-only
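The read-only flag reflects the underlying buffer rather than `numpy.frombuffer` itself. As a minimal sketch (the `bytearray` and the variable names here are illustrative, not part of the session above), wrapping a mutable buffer yields a writable array that shares its memory:

```python
import numpy

# A bytearray is a mutable buffer, unlike bytes.
buf = bytearray(b"Hello, world!")
arr = numpy.frombuffer(buf, dtype=numpy.uint8)

# The array is a view, not a copy: writing through it changes buf.
arr[0:5] = numpy.frombuffer(b"HELLO", dtype=numpy.uint8)
print(bytes(buf))

# The same call on an immutable bytes object gives a read-only array.
readonly = numpy.frombuffer(b"Hello, world!", dtype=numpy.uint8)
print(arr.flags.writeable, readonly.flags.writeable)
```

No rules are bent in this version: the buffer was mutable to begin with, so nothing in Python's data model is violated.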


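But this is Python: we can shoot ourselves in the foot if we want to. (Newer Numpy releases refuse to set this flag on an array backed by a read-only buffer, so the next cell may raise a `ValueError` instead; the outputs below come from a version that allowed it.)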

In [6]:
a.flags.writeable = True

In [7]:
a[4:8] = [69, 86, 73, 76]

In [8]:
hello

b'HellEVILorld!'

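This breaks an assumption the interpreter relies on: that `bytes` objects never change. The next two cells run the same five lines and print different answers.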

In [9]:
hello = b"Hello, world!"
a = numpy.frombuffer(hello, dtype=numpy.uint8)
a.flags.writeable = True
a[4:8] = [69, 86, 73, 76]
print(hello == b"Hello, world!")

False


In [10]:
exec("""
hello = b"Hello, world!"
a = numpy.frombuffer(hello, dtype=numpy.uint8)
a.flags.writeable = True
a[4:8] = [69, 86, 73, 76]
print(hello == b"Hello, world!")
""")

True


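(The second example was compiled as a single unit, the way a script is compiled into a `.pyc` file, so every occurrence of the literal `b"Hello, world!"` was replaced by a single shared object: modifying that object in line 4 changed it in line 5! In the first example, the two literals evidently ended up as separate objects, so the comparison saw the change.)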
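The constant sharing can be observed directly, without mutating anything. This is a small sketch that assumes CPython's usual behavior of merging equal constants within one code object; the names `source`, `first`, and `second` are made up for the illustration:

```python
# Two statements that use the same bytes literal, compiled as one unit.
source = '''
first = b"Hello, world!"
second = b"Hello, world!"
'''
code = compile(source, "<demo>", "exec")

namespace = {}
exec(code, namespace)

# Both names refer to one object, stored once in the code's constants.
same_object = namespace["first"] is namespace["second"]
copies = code.co_consts.count(b"Hello, world!")
print(same_object, copies)
```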

With the help of ctypes, part of Python's standard library, Numpy can wrap any address at all. (Some will cause segmentation faults, so be careful!)

In [11]:
x = 12345

In [12]:
import ctypes
import sys

ptr = ctypes.cast(id(x), ctypes.POINTER(ctypes.c_uint8))
a = numpy.ctypeslib.as_array(ptr, (sys.getsizeof(x),))
a

array([  1,   0,   0,   0,   0,   0,   0,   0,  32, 165, 225,  27, 240,
        95,   0,   0,   1,   0,   0,   0,   0,   0,   0,   0,  57,  48,
         0,   0], dtype=uint8)

Do you see it? We're looking at a Python object header (an 8-byte reference count, then an 8-byte pointer to the `int` type, which is also a Python object), an 8-byte size field, and then the number itself. Here's a hint: it's the last four bytes.

In [13]:
a[-4:].view(numpy.int32)

array([12345], dtype=int32)
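The same fields can be pulled apart with ctypes alone. This sketch assumes a standard 64-bit CPython build, where the type pointer sits one machine word into the object and a small `int` keeps its value in the last four bytes; other builds (free-threaded ones, for example) lay objects out differently. The names `raw`, `word`, `type_addr`, and `value` are only illustrative:

```python
import ctypes
import sys

x = 12345

# Copy the object's bytes out of memory (reading only, no poking).
raw = ctypes.string_at(id(x), sys.getsizeof(x))

# One machine word: the size of a pointer on this platform.
word = ctypes.sizeof(ctypes.c_void_p)

# Second word: address of the type object, which should be id(int).
type_addr = int.from_bytes(raw[word:2 * word], sys.byteorder)

# Last four bytes: the single internal "digit" holding the value.
value = int.from_bytes(raw[-4:], sys.byteorder)

print(type_addr == id(int), value)
```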

Let's try a string.

In [14]:
y = "Hey there."
ptr = ctypes.cast(id(y), ctypes.POINTER(ctypes.c_uint8))
a = numpy.ctypeslib.as_array(ptr, (sys.getsizeof(y),))
a

array([  2,   0,   0,   0,   0,   0,   0,   0,  32, 113, 225,  27, 240,
        95,   0,   0,  10,   0,   0,   0,   0,   0,   0,   0,  80, 211,
        73,  49,  70,  52, 185,   3, 228,   9, 101,   6, 101,  10, 160,
        11,   0,   0,   0,   0,   0,   0,   0,   0,  72, 101, 121,  32,
       116, 104, 101, 114, 101,  46,   0], dtype=uint8)

In [15]:
a[-11:].tobytes()   # tostring() is a deprecated alias and is gone from recent Numpy

b'Hey there.\x00'

By wrapping a pointer as a Numpy array (and maybe setting `array.flags.writeable = True`), we can do anything.

We can break anything.

Seriously, just letting `numpy.ctypeslib.as_array(0, (8,))` repr itself on the screen would cause a segmentation fault. 

Is there anything useful we can do with this power?

The original purpose of the ctypes library was to run code in compiled C libraries (not C++, which is much more complicated).

For a busy data analyst, the advantage of that is that you don't need to write (or wait for) official bindings to use a C library.

In [23]:
import ctypes
libc = ctypes.cdll.LoadLibrary("/lib64/libc.so.6")

We can call any function in `libc.so.6`, but we have to tell Python each function's argument and return types (they are not stored in the shared object file).

In [30]:
libc.malloc.argtypes = (ctypes.c_size_t,)                      # argument types (only one)
libc.malloc.restype = ctypes.POINTER(ctypes.c_double)          # return type
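The same declaration pattern applies to any other C function. As a hedged sketch on a simpler function, here is `strlen`; it assumes a POSIX system and locates the C library with `ctypes.util.find_library` instead of the hard-coded path used above:

```python
import ctypes
import ctypes.util

# Locate and load the C library (assumes a POSIX platform).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# size_t strlen(const char *s);
libc.strlen.argtypes = (ctypes.c_char_p,)
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"Hello, world!")
print(length)
```

Without `restype`, ctypes assumes the function returns a C `int`, which would truncate pointers and `size_t` values on 64-bit systems.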

In [31]:
ptr = libc.malloc(100 * numpy.dtype(numpy.float64).itemsize)   # pass number of bytes
ptr

<__main__.LP_c_double at 0x76a2f0a58d90>

In [32]:
a = numpy.ctypeslib.as_array(ptr, (100,))
a

array([6.44471624e-310, 6.44471624e-310, 5.21164549e-310, 5.21164549e-310,
       6.44470069e-310, 5.43472210e-323, 0.00000000e+000, 3.95252517e-323,
       6.44471342e-310, 0.00000000e+000, 4.44659081e-323, 6.44470068e-310,
       5.21164510e-310, 0.00000000e+000, 0.00000000e+000,             nan,
       0.00000000e+000, 5.16069376e-310, 3.55727265e-322, 5.43472210e-323,
       6.44470068e-310, 5.21164510e-310, 1.48219694e-323, 1.18575755e-322,
                   nan, 0.00000000e+000, 6.44471342e-310, 7.11454530e-322,
       5.43472210e-323, 6.44470068e-310, 5.21164510e-310, 0.00000000e+000,
       1.38338381e-322,             nan, 0.00000000e+000, 4.94065646e-324,
       1.06718180e-321, 5.43472210e-323, 6.44470068e-310, 5.21164510e-310,
       0.00000000e+000, 1.38338381e-322,             nan, 4.94065646e-323,
       4.94065646e-324, 1.42290906e-321, 3.95252517e-323, 6.44470068e-310,
       5.21164510e-310, 0.00000000e+000, 5.92878775e-323,             nan,
       0.00000000e+000, ...

In [35]:
a[:] = 0.0
a

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [36]:
a.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

We've just used low-level `libc.malloc` to allocate data for an array. This array doesn't "own" its memory (`OWNDATA : False`), so the memory is not released when the Python object (`a`) gets garbage collected. We'd have to call `libc.free` ourselves, like any C programmer.
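A sketch of the full allocate, wrap, and free cycle might look like the following. It assumes a POSIX system (the library is located with `ctypes.util.find_library`, not the fixed path above), and the variable names are illustrative:

```python
import ctypes
import ctypes.util
import numpy

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.argtypes = (ctypes.c_size_t,)
libc.malloc.restype = ctypes.POINTER(ctypes.c_double)
libc.free.argtypes = (ctypes.c_void_p,)
libc.free.restype = None

n = 100
ptr = libc.malloc(n * ctypes.sizeof(ctypes.c_double))
if not ptr:                      # malloc returns NULL on failure
    raise MemoryError("malloc failed")

arr = numpy.ctypeslib.as_array(ptr, shape=(n,))
arr[:] = numpy.arange(n)         # initialize: malloc'ed memory is garbage
total = float(arr.sum())
owns = bool(arr.flags.owndata)   # False: Numpy will not free this memory

# Drop the view first, then release the memory exactly once.
del arr
libc.free(ctypes.cast(ptr, ctypes.c_void_p))
print(total, owns)
```

Any array still viewing the buffer after `libc.free` would be a dangling reference, which is why the view is deleted first.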

Why do this? I've used it to allocate arrays on NUMA hardware and Knights Landing MCDRAM. Specialty memory allocations will probably get more important, not less, as architectures get more heterogeneous.

(No reason to write an entire analysis in C just to get special allocators.)

Another possible reason is to access special devices; device drivers are often written in C and distributed with C programmers in mind, but they may be the only thing between you and an important dataset.

Another example: you can wrap structs. (The snake eats its own tail again.)

In [37]:
class PyObject(ctypes.Structure): pass
PyObject._fields_ = [("ob_refcnt", ctypes.c_ssize_t),
                     ("ob_type", ctypes.POINTER(PyObject))]

which is equivalent to

```c
struct PyObject {
    Py_ssize_t ob_refcnt;         // signed, pointer-sized reference count
    struct PyObject* ob_type;     // pointer to the object's type
    // there's more, but you only need to encode as much as you need...
};
```

In [38]:
hello = b"Hello, world!"

In [39]:
ptr = PyObject.from_address(id(hello))
ptr

<__main__.PyObject at 0x76a2f0a0c158>

In [43]:
ptr.ob_refcnt

1

This `ob_refcnt` is the number of Python references to a given object. The `sys` module can report it, too:

In [46]:
sys.getrefcount(hello)

2

but it's always one too high because you create a reference to pass it to that function!

In [47]:
biglist = [hello] * 1000

In [48]:
ptr.ob_refcnt

1001

In [49]:
sys.getrefcount(hello)

1002
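Absolute reference counts vary between Python versions, so a more portable check is to compare how much the count grows. The sketch below assumes a standard (not free-threaded) CPython build, where the count is the first field of every object; `PyObjectHead` and the other names are hypothetical:

```python
import ctypes
import sys

class PyObjectHead(ctypes.Structure):
    # Only the first two fields of the object header are described.
    _fields_ = [("ob_refcnt", ctypes.c_ssize_t),
                ("ob_type", ctypes.c_void_p)]

obj = object()
head = PyObjectHead.from_address(id(obj))

before_ctypes = head.ob_refcnt
before_sys = sys.getrefcount(obj)

holders = [obj] * 1000           # 1000 new references to the same object

grew_ctypes = head.ob_refcnt - before_ctypes
grew_sys = sys.getrefcount(obj) - before_sys
print(grew_ctypes, grew_sys)
```

Both views of the count should grow by the same amount, whatever their starting values were.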

The ctypes library can wrap numbers, strings, pointers, arrays, and structs, which is just about everything you might encounter in C. C++ isn't supported because C++ shoehorns its much larger type system into shared object files by "name mangling."

Even if you reverse the name mangling with the `c++filt` program, those names cannot be uniquely identified without interpreting the C++ headers. By that point, you might as well use Cython or pybind11.

**One last example:** the quickest, dirtiest way possible to call out to compiled code:

In [66]:
import os

with open("tmp.cpp", "w") as cfile:
    cfile.write("""
#include <complex>
    
extern "C" {    
    void quick(int height, int width, int maxiterations, double* c, int* fractal) {
        for (int h = 0;  h < height;  h++) {
            for (int w = 0;  w < width;  w++) {
                // Numpy arrays are row-major by default: element (h, w) is at w + width*h
                double creal = c[2 * (w + width*h)];
                double cimag = c[2 * (w + width*h) + 1];
                std::complex<double> ci = std::complex<double>(creal, cimag);
                std::complex<double> z = ci;
                for (int i = 0;  i < maxiterations;  i++) {
                    z = z * z + ci;
                    if (std::abs(z) > 2) {
                        fractal[w + width*h] = i;
                        break;
                    }
                }
            }
        }
    }
}
""")

# g++ (rather than gcc) also links the C++ standard library
assert os.system("g++ -fPIC -shared tmp.cpp -o libtmp.so") == 0

In [67]:
libtmp = ctypes.cdll.LoadLibrary(os.path.join(os.getcwd(), "libtmp.so"))
libtmp.quick.argtypes = (ctypes.c_int, ctypes.c_int, ctypes.c_int,
                         ctypes.POINTER(ctypes.c_double),
                         ctypes.POINTER(ctypes.c_int))
libtmp.quick.restype = None

In [75]:
def run_dirty(height, width, maxiterations=20):
    y, x = numpy.ogrid[-1:0:height*1j, -1.5:0:width*1j]
    c = x + y*1j
    fractal = numpy.zeros(c.shape, dtype=numpy.int32) + maxiterations
    libtmp.quick(height, width, maxiterations,
                 ctypes.cast(c.ctypes.data, ctypes.POINTER(ctypes.c_double)),
                 ctypes.cast(fractal.ctypes.data, ctypes.POINTER(ctypes.c_int)))
    return fractal
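The explicit `ctypes.cast` calls in `run_dirty` could alternatively be replaced by `numpy.ctypeslib.ndpointer`, which lets a function accept Numpy arrays directly and checks their dtype and layout on every call. Since `libtmp.so` may not exist on your machine, this sketch demonstrates the idea on libc's `memcpy` instead (assuming a POSIX system); the names are illustrative:

```python
import ctypes
import ctypes.util
import numpy
from numpy.ctypeslib import ndpointer

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# An argument type that only accepts contiguous float64 arrays.
double_array = ndpointer(dtype=numpy.float64, flags="C_CONTIGUOUS")

# void *memcpy(void *dest, const void *src, size_t n);
libc.memcpy.argtypes = (double_array, double_array, ctypes.c_size_t)
libc.memcpy.restype = ctypes.c_void_p

src = numpy.arange(5, dtype=numpy.float64)
dst = numpy.zeros(5, dtype=numpy.float64)

# Arrays are passed as-is; no ctypes.cast is needed.
libc.memcpy(dst, src, src.nbytes)
print(dst)
```

Passing an array of the wrong dtype is rejected before the C function runs, which removes one easy way to corrupt memory.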

In [None]:
import time
starttime = time.time()
fractal = run_dirty(8000, 12000)
time.time() - starttime
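When calling into hand-compiled code like this, it helps to have a slow reference to compare against on a small grid. The following is a pure-Numpy sketch of the same iteration (`run_numpy` is a hypothetical helper, not part of the session above); floating-point differences could in principle change a pixel right at the escape threshold, so treat the comparison as a sanity check, not a proof:

```python
import numpy

def run_numpy(height, width, maxiterations=20):
    # Same grid as run_dirty: imaginary part in [-1, 0], real in [-1.5, 0].
    y, x = numpy.ogrid[-1:0:height*1j, -1.5:0:width*1j]
    c = x + y*1j
    fractal = numpy.full(c.shape, maxiterations, dtype=numpy.int32)
    z = c.copy()
    # Diverged points overflow to inf/nan; silence those warnings.
    with numpy.errstate(over="ignore", invalid="ignore"):
        for i in range(maxiterations):
            z = z*z + c
            # Record only the first iteration at which |z| exceeds 2.
            newly_diverged = (numpy.absolute(z) > 2) & (fractal == maxiterations)
            fractal[newly_diverged] = i
    return fractal

small = run_numpy(4, 6)
print(small)
```

If `libtmp.so` built successfully, something like `numpy.array_equal(run_dirty(4, 6), run_numpy(4, 6))` would be a quick consistency check.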