## Numpy

Numpy is a convenient, Pythonic _toolkit_ for manipulating raw memory. It's primarily intended for data analysis applications:

In [1]:
import numpy

array = numpy.array([0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])
array[5:7]

array([ 5.5,  6.6])

In [2]:
array[array > 3]

array([ 3.3,  4.4,  5.5,  6.6,  7.7,  8.8,  9.9])

But it also lets you do bare-metal stuff, like byte-swapping and reinterpreting the same bytes as a different type (the equivalent of a C pointer cast).

In [3]:
array.byteswap()

array([  0.00000000e+000,  -1.54234871e-180,  -1.54234871e-180,
         1.90359857e+185,  -1.54234871e-180,   2.81419792e-320,
         1.90359857e+185,  -6.06598800e+066,  -1.54234871e-180,
        -6.06598800e+066])

In [4]:
array.view(dtype="int32")

array([          0,           0, -1717986918,  1072798105, -1717986918,
        1073846681,  1717986918,  1074423398, -1717986918,  1074895257,
                 0,  1075183616,  1717986918,  1075471974,  -858993459,
        1075760332, -1717986918,  1075943833,  -858993459,  1076088012], dtype=int32)
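
To make the two cells above less mysterious, here is a minimal sketch. It assumes Python 3 and a current NumPy, and the names `as_ints`, `swapped`, and `relabeled` are only illustrative. It shows that `view` reinterprets the same bytes without copying, and that `byteswap` reverses the bytes of each element while leaving the declared byte order alone, which is why the swapped values look like garbage.

```python
import numpy

array = numpy.array([0.0, 1.1, 2.2])

# view() reinterprets the same buffer: each 8-byte float64 becomes two int32s.
as_ints = array.view(dtype="int32")
print(as_ints.size, numpy.shares_memory(array, as_ints))

# byteswap() returns a copy whose elements have their bytes reversed,
# but the dtype still claims native byte order.
swapped = array.byteswap()

# Relabelling the dtype with the opposite byte order makes the values readable again.
relabeled = swapped.view(swapped.dtype.newbyteorder())
print(numpy.array_equal(relabeled, array))
```

Swapping the bytes and swapping the byte-order label cancel each other out, so `relabeled` compares equal to the original array.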

A Numpy array object (`ndarray`) is essentially just a C array with a Python object attached. The Python object manages everything that is ordinarily unsafe about C arrays:

   * the length (to prevent reads and writes past the end of the buffer)
   * the type (to prevent unintended casting), including unsigned integers
   * the byte order (ditto)
   * C vs Fortran order for multidimensional arrays (e.g. which index runs contiguously in memory: the first or last?)
   * whether this object owns the array or is just a "view" of someone else's.
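
Each item in the list above corresponds to something you can inspect on the `ndarray` itself. The following is a small sketch, assuming Python 3 and a current NumPy; the example values and the names `fortran` and `row` are made up for illustration.

```python
import numpy

array = numpy.array([[1, 2, 3], [4, 5, 6]], dtype=numpy.uint16)

print(array.shape)               # length along each dimension
print(array.dtype)               # element type, here an unsigned 16-bit integer
print(array.dtype.byteorder)     # '=' means native byte order
print(array.flags.c_contiguous)  # is the last index contiguous in memory (C order)?
print(array.flags.f_contiguous)  # is the first index contiguous in memory (Fortran order)?
print(array.flags.owndata)       # does this object own its buffer?

fortran = numpy.asfortranarray(array)  # same values, Fortran memory order
row = array[0]                         # a view into the first row
print(fortran.flags.f_contiguous, row.flags.owndata)
```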

Usually, when you create a new Numpy array (sometimes implicitly in an expression involving arrays), you want Numpy to allocate a new memory buffer and let the `ndarray` object own it. That is, when the `ndarray` is deleted, the buffer gets freed.

For performance, some Numpy operations give you a "view" into another array, rather than a copy:

In [5]:
subarray = array[5:]
subarray

array([ 5.5,  6.6,  7.7,  8.8,  9.9])

In [6]:
subarray[2] = 999.99

In [7]:
array

array([   0.  ,    1.1 ,    2.2 ,    3.3 ,    4.4 ,    5.5 ,    6.6 ,
        999.99,    8.8 ,    9.9 ])

You can identify a "view" because it has a "base" reference to the array that it's viewing. By holding that reference, the view ensures that the base doesn't get garbage collected while the view is still in use.

In [8]:
subarray.base is array

True

In [9]:
array.base is None

True
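
If you are unsure whether an operation handed you a view or a copy, you can ask NumPy directly. This is a minimal sketch, assuming Python 3 and a current NumPy; the variable names are illustrative. It contrasts basic slicing (a view) with boolean-mask indexing and arithmetic (both new buffers), and shows an explicit `copy()` for when aliasing is not wanted.

```python
import numpy

array = numpy.arange(10.0)

view = array[5:]            # basic slicing gives a view
masked = array[array > 3]   # boolean-mask indexing gives a copy
doubled = array * 2         # an expression allocates a new buffer

print(numpy.shares_memory(array, view))
print(numpy.shares_memory(array, masked))
print(doubled.base is None and doubled.flags.owndata)

independent = array[5:].copy()   # explicit copy when you do not want aliasing
independent[2] = 999.99          # leaves `array` untouched
print(array[7])
```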

But there's yet another case: sometimes you have a buffer already and want Numpy to wrap it. Maybe you want to use some of Numpy's vectorized functions on the data, or maybe you want to pass it to some software that only recognizes data in Numpy format (`<cough>` machine learning `<cough>`).

Anything that satisfies Python's "buffer" interface can become an `ndarray`.

In [10]:
string = "hello there"
array = numpy.frombuffer(string, dtype=numpy.uint8)
array

array([104, 101, 108, 108, 111,  32, 116, 104, 101, 114, 101], dtype=uint8)

In [11]:
map(chr, array)

['h', 'e', 'l', 'l', 'o', ' ', 't', 'h', 'e', 'r', 'e']

In [12]:
array.base is string

True
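
The cells above are Python 2, where `str` is a byte string. In Python 3, `str` does not expose the buffer interface, so the same idea needs a `bytes` or `bytearray` object. Here is a minimal sketch under that assumption; the names `data`, `mutable`, and `writable` are illustrative.

```python
import numpy

data = b"hello there"                        # immutable bytes
array = numpy.frombuffer(data, dtype=numpy.uint8)
print(array.flags.writeable)                 # read-only, because bytes is immutable
print([chr(x) for x in array])

mutable = bytearray(b"hello there")          # mutable buffer
writable = numpy.frombuffer(mutable, dtype=numpy.uint8)
writable[0] = ord("j")                       # writes through to the bytearray
print(mutable)
```

Because no copy is made, a write through the array changes the `bytearray` itself, just as writing through `subarray` changed `array` earlier.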

With some effort, Numpy arrays can even wrap arbitrary regions of memory, given by an integer-valued pointer.

In [13]:
import ctypes
libc = ctypes.cdll.LoadLibrary("libc.so.6")
libc.malloc.restype = ctypes.POINTER(ctypes.c_double)
ptr = libc.malloc(4096 * ctypes.sizeof(ctypes.c_double))  # bytes for 4096 doubles
ptr

<__main__.LP_c_double at 0x7c1b48034c20>

In [14]:
ptr.__array_interface__ = {
    "version": 3,
    "typestr": numpy.ctypeslib._dtype(type(ptr.contents)).str,
    "data": (ctypes.addressof(ptr.contents), False),
    "shape": (4096,)
}
array = numpy.array(ptr, copy=False)
array

array([  6.74186042e-310,   6.74186042e-310,   2.09542885e-316, ...,
         6.01346953e-154,   0.00000000e+000,   4.03651633e-321])
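
The cell above relies on `numpy.ctypeslib._dtype`, a private helper. A more conservative route is the public `numpy.ctypeslib.as_array`, which accepts a ctypes pointer plus a shape. The sketch below assumes Python 3 and a current NumPy, and it swaps `malloc` for a ctypes-owned array so that it runs on any platform and the memory is released automatically; `buffer` and the size of 8 are arbitrary choices for illustration.

```python
import ctypes

import numpy

# A zero-initialised C array of 8 doubles, owned by ctypes (stand-in for malloc).
buffer = (ctypes.c_double * 8)()
ptr = ctypes.cast(buffer, ctypes.POINTER(ctypes.c_double))

# Wrap the pointed-to memory without copying; the shape must be given explicitly
# because a bare pointer carries no length.
array = numpy.ctypeslib.as_array(ptr, shape=(8,))

array[3] = 3.14          # write through the NumPy array...
print(buffer[3])         # ...and read the same memory back through ctypes
print(array.flags.owndata)
```

As with any wrapped pointer, the shape you pass is taken on trust, so it must not exceed what was actually allocated.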

## Snake eating its tail again

Have you ever wondered what the C structs behind Python objects look like? You don't have to use the C API to delve into this. In CPython, the `id(obj)` for some `obj` happens to be a numerical pointer to the object in memory. This is an implementation detail: it is not guaranteed in future versions of Python (nor is it true in alternate implementations, such as Jython), but it's true for now.

In [15]:
string = "hello there"
id(string)

136456361385200

In [16]:
ptr = ctypes.cast(id(string), ctypes.POINTER(ctypes.c_uint8))
ptr.__array_interface__ = {
    "version": 3,
    "typestr": numpy.ctypeslib._dtype(type(ptr.contents)).str,
    "data": (ctypes.addressof(ptr.contents), False),
    "shape": (64,)
}
array = numpy.array(ptr, copy=False)
print map(chr, array)

['\x01', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x80', '\xba', '\x91', '\x00', '\x00', '\x00', '\x00', '\x00', '\x0b', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', ']', '\xfc', 'i', '\xae', 'K', '\xc1', '\xc5', '\x90', '\x00', '\x00', '\x00', '\x00', 'h', 'e', 'l', 'l', 'o', ' ', 't', 'h', 'e', 'r', 'e', '\x00', '\r', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\xe0', '\xb8', '\x91', '\x00', '\x00', '\x00', '\x00', '\x00']


Can you spot it?
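
The same experiment can be sketched in Python 3, with two caveats that are assumptions of this example rather than anything guaranteed: it only works on CPython, and it uses a `bytes` object because Python 3's `str` has a different internal layout. Instead of guessing a length of 64, it asks `sys.getsizeof` how many bytes the object occupies and copies exactly that many with `ctypes.string_at`.

```python
import ctypes
import sys

import numpy

payload = b"hello there"

# CPython-specific assumption: id() is the object's address, and
# sys.getsizeof() covers the header plus the inline character data.
raw = ctypes.string_at(id(payload), sys.getsizeof(payload))
array = numpy.frombuffer(raw, dtype=numpy.uint8)

offset = raw.find(payload)   # where the characters start inside the struct
print(offset, bytes(array[offset:offset + len(payload)]))
```

Everything before `offset` is the object header (reference count, type pointer, length, cached hash), which is the part you would normally never see from Python.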