<br><br><br><br><br>

# Jailbreaking Python

<br><br><br><br><br>

### Tale of two VMs

**Java** and **Python** are both high-level programming environments based on virtual machines. Your programs compile to virtual machine instructions that are emulated at runtime. This limits the consequences of mistakes, making it easier to develop interactively.

However, **Java** and **Python**'s relationships to the outside world are very different.

   * **Java** lives almost entirely inside its virtual machine. There's a pure Java library for just about everything, while its C API is very hard to use. Consequently, there are many Java Runtime Environment implementations, all satisfying the same specification.
   * **Python** has extension modules for a lot of functionality. Its C API is relatively easy to use and there are dozens of wrappers. However, this makes it difficult for alternative implementations, like PyPy, to compete with the standard CPython because most of those extension modules only work with CPython.

**Python**, through its extension modules, is a leaky abstraction.

<br>

In [1]:
# Let's break Python!

hello = b"Hello, world!"         # Python strings are immutable so they can be referenced, not copied.
try:                             # A lot of code depends on strings not changing after they're made.
    hello[4:8] = b"????"
except TypeError as err:
    print("Not allowed to change it: " + str(err))

import numpy                     # Wrap it with a Numpy array (not a copy).
a = numpy.frombuffer(hello, dtype=numpy.uint8)
print("a =", a)
print("a.view('S1') =", a.view('S1'))

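# (This worked when the notebook was run; more recent Numpy releases may refuse with a ValueError.)
a.flags.writeable = True         # Set the writeable flag to True,
a[4:8] = [69, 86, 73, 76]        # and we can write to it.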

print("hello =", repr(hello))    # And it is evil.

Not allowed to change it: 'bytes' object does not support item assignment
a = [ 72 101 108 108 111  44  32 119 111 114 108 100  33]
a.view('S1') = [b'H' b'e' b'l' b'l' b'o' b',' b' ' b'w' b'o' b'r' b'l' b'd' b'!']
hello = b'HellEVILorld!'
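
For contrast, here is a minimal sketch of the supported route, using only the standard library: the buffer protocol itself reports `bytes` as read-only, and a `bytearray` copy is the sanctioned way to get something mutable. The variable names are just for illustration.

```python
hello = b"Hello, world!"

# The buffer protocol advertises that bytes must not be written to.
view = memoryview(hello)
print("view.readonly =", view.readonly)

# A bytearray is a mutable *copy*, so editing it leaves the original alone.
buf = bytearray(hello)
buf[4:8] = b"EVIL"
print("buf   =", bytes(buf))
print("hello =", hello)
```

The trick in the cell above works by overriding exactly this read-only marker.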


In [2]:
# See how evil this is:

# Interpreted as it looks: we change one b"Hello, world!" but not the other.
hello = b"Hello, world!"
a = numpy.frombuffer(hello, dtype=numpy.uint8)
a.flags.writeable = True
a[4:8] = [69, 86, 73, 76]
print("case 1:", hello == b"Hello, world!")

# Compiled as a single unit (like a .pyc): equal b"Hello, world!" literals are merged into one shared object.
exec("""
hello = b"Hello, world!"
a = numpy.frombuffer(hello, dtype=numpy.uint8)
a.flags.writeable = True
a[4:8] = [69, 86, 73, 76]
print("case 2:", hello == b"Hello, world!")
""")

case 1: False
case 2: True
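
Why does case 2 come out `True`? In CPython, equal constants inside one compiled code object are stored once, so every occurrence of the literal refers to the same object, and corrupting one corrupts "both". A small sketch that shows the sharing without mutating anything (this is a CPython implementation detail, not a language guarantee; the function name is made up for the example):

```python
def same_literal_twice():
    first = b"Hello, world!"
    second = b"Hello, world!"
    # Both names refer to the single constant stored in this function's code object.
    return first is second

consts = same_literal_twice.__code__.co_consts
print("shared object:", same_literal_twice())
print("copies in co_consts:", consts.count(b"Hello, world!"))
```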


<br><br><br><br><br>

### Python is an environment in which you can poke individual bytes

<br><br><br><br><br>

In [3]:
x = 12345

import ctypes
import sys

ptr = ctypes.cast(id(x), ctypes.POINTER(ctypes.c_uint8))
a = numpy.ctypeslib.as_array(ptr, (sys.getsizeof(x),))
print("a =", a)

# We're looking at a Python object header (an 8-byte reference count, then an 8-byte pointer to
# the int type, itself a Python object), an 8-byte size field counting the number's digits, and
# then the number itself: 12345 in little endian bytes is [57, 48, 0, 0]. Do you see it?

a = [  2   0   0   0   0   0   0   0  32  22 161 221  72  86   0   0   1   0
   0   0   0   0   0   0  57  48   0   0]
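
The same bytes can be read with the standard library alone. This sketch copies the object's memory with `ctypes.string_at` (read-only, so nothing gets corrupted) and looks for the little-endian encoding of 12345. It assumes a 64-bit, little-endian CPython build, where `id(x)` is the object's address; the exact header layout differs between versions, so the code searches for the digits instead of hard-coding an offset.

```python
import ctypes
import sys

x = 12345

# Copy the raw bytes of the int object; string_at only reads memory.
raw = ctypes.string_at(id(x), sys.getsizeof(x))

# 12345 == 0x3039, so its little-endian bytes are 0x39 (57), 0x30 (48), 0, 0.
digits = (12345).to_bytes(4, "little")
print("digits   =", list(digits))
print("found at =", raw.find(digits))
```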


In [4]:
# Now for a string.

y = "Hey there."
ptr = ctypes.cast(id(y), ctypes.POINTER(ctypes.c_uint8))
a = numpy.ctypeslib.as_array(ptr, (sys.getsizeof(y),))
print("a =", a)

print("\na[-11:]            =", a[-11:])

print("\na[-11:].tobytes()  =", repr(a[-11:].tobytes()))   # tobytes() replaces the deprecated tostring()

a = [  2   0   0   0   0   0   0   0  32 226 160 221  72  86   0   0  10   0
   0   0   0   0   0   0  79  79 192   9  38   9 212  73 228 171 183 242
 166 127   0   0   0   0   0   0   0   0   0   0  72 101 121  32 116 104
 101 114 101  46   0]

a[-11:]            = [ 72 101 121  32 116 104 101 114 101  46   0]

a[-11:].tobytes()  = b'Hey there.\x00'


In [7]:
# The snake eats its own tail: reading PyObject refcount without changing it.
class PyObject(ctypes.Structure): pass
PyObject._fields_ = [("ob_refcnt", ctypes.c_size_t),
                     ("ob_type", ctypes.POINTER(PyObject))]

hello = b"Hello, world!"
ptr = PyObject.from_address(id(hello))
print("ptr =", ptr)

print("\nBefore:")
print("ptr.ob_refcnt =", ptr.ob_refcnt, "sys.getrefcount(hello) =", sys.getrefcount(hello))

biglist = [hello] * 1000
print("\nAfter:")
print("ptr.ob_refcnt =", ptr.ob_refcnt, "sys.getrefcount(hello) =", sys.getrefcount(hello))

ptr = <__main__.PyObject object at 0x7fa6f2b67d08>

Before:
ptr.ob_refcnt = 2 sys.getrefcount(hello) = 3

After:
ptr.ob_refcnt = 1002 sys.getrefcount(hello) = 1003
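
The bookkeeping can also be followed through the supported API alone. A minimal sketch, using a fresh `object()` so that no other references exist; only the differences between readings are meaningful, since the absolute value reported by `sys.getrefcount` depends on the interpreter version.

```python
import sys

obj = object()
before = sys.getrefcount(obj)

biglist = [obj] * 1000            # 1000 new references to the same object
during = sys.getrefcount(obj)

del biglist                       # dropping the list releases all of them
after = sys.getrefcount(obj)

print("added by list  =", during - before)
print("left after del =", after - before)
```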


<br><br><br><br><br>

### Doing something _useful_ with this power

<br><br><br><br><br>

In [5]:
# Suppose you're on a supercomputer with Non-Uniform Memory Access (NUMA) and you want Numpy arrays in node-local memory.

import ctypes.util
libnuma = ctypes.cdll.LoadLibrary(ctypes.util.find_library("numa"))

libnuma.numa_alloc_local.argtypes = (ctypes.c_size_t,)  # manually set the expected types (no .h file)
libnuma.numa_alloc_local.restype = ctypes.POINTER(ctypes.c_double)

ptr  = libnuma.numa_alloc_local(8*1024)                 # allocate it: 1024 doubles at 8 bytes each!
a    = numpy.ctypeslib.as_array(ptr, (1024,))           # wrap it as an array!
a[:] = numpy.linspace(-511.5, 511.5, 1024)              # assign to it; use it!

print("a =", a)

a = [-511.5 -510.5 -509.5 ...  509.5  510.5  511.5]
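
Two details of this pattern are easy to get wrong: the allocation size is in bytes, not elements, and memory obtained this way is never freed by Python. The sketch below shows both, with `malloc`/`free` from the C library standing in for libnuma so that it runs on machines without NUMA. It assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library. With libnuma, the matching release call is `numa_free(ptr, size)`.

```python
import ctypes

libc = ctypes.CDLL(None)                       # assumption: POSIX; stand-in for libnuma
libc.malloc.argtypes = (ctypes.c_size_t,)
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = (ctypes.c_void_p,)
libc.free.restype = None

n = 1024
nbytes = n * ctypes.sizeof(ctypes.c_double)    # size in bytes: 8 per double
addr = libc.malloc(nbytes)
if not addr:
    raise MemoryError("malloc failed")

buf = (ctypes.c_double * n).from_address(addr) # view the memory as n doubles (no copy)
for i in range(n):
    buf[i] = -511.5 + i                        # same values as the linspace above

first, last = buf[0], buf[n - 1]
print("first =", first, "last =", last)

del buf                                        # drop the view before releasing the memory
libc.free(addr)
```

Any array or view still pointing at the memory after the free is a dangling pointer, so the release has to come last.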


In [8]:
# Of course, if you're allocating CPU-local memory in Numpy arrays, you'll want to ensure that
# your thread doesn't switch to another CPU.

import psutil
psutil.Process().cpu_affinity([0])
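
On Linux, the standard library offers the same control without `psutil`. A hedged sketch: `os.sched_getaffinity` and `os.sched_setaffinity` are not available on every platform, so the code checks for them first, and it picks a CPU from the set the process is actually allowed to use rather than assuming CPU 0 is among them.

```python
import os

if hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)      # CPUs this process may run on (0 = current process)
    target = min(allowed)                  # pick one of them
    os.sched_setaffinity(0, {target})      # pin to that single CPU
    pinned = os.sched_getaffinity(0)
    os.sched_setaffinity(0, allowed)       # restore the original set
else:
    pinned = None                          # platform without sched_* calls

print("pinned to:", pinned)
```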

<br><br><br><br><br>

### Interacting with C and C++

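With <tt>ctypes</tt>, we can call any function in any shared library that exposes a C-compatible Foreign Function Interface (FFI). Almost every compiled language can produce such a library. C++ is the notable exception: name mangling, overloading, and classes have no C equivalent, so C++ code is reachable only through functions declared <tt>extern "C"</tt>.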
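
As a minimal sketch of such a call, here is `strlen` from the C library. It assumes a POSIX system, where `ctypes.CDLL(None)` gives access to symbols already loaded into the process. As with libnuma earlier, the argument and return types have to be declared by hand.

```python
import ctypes

libc = ctypes.CDLL(None)                      # assumption: POSIX system

libc.strlen.argtypes = (ctypes.c_char_p,)     # no header file, so declare the signature ourselves
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"Hello, world!")
print("strlen =", length)
```

Nothing checks these declarations against the real function: if `argtypes` or `restype` is wrong, ctypes passes along whatever it was told.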

Both to reach C++ and to stay safe from low-level hacks like the ones above, we should use a binding library rather than raw pointers.

<br><br><br><br><br>