## Off-heap Java memory

In the Java world, "off-heap" means memory that the Java garbage collector doesn't know about and can't manage. It is the only way to get a stable pointer to data, because the garbage collector is free to move objects: it is generational, copying long-lived objects to regions that it checks less frequently, as an optimization.

The standard library has a set of functions for dealing with off-heap memory, which it uses internally for performance. These functions were _supposed_ to be off-limits to ordinary users like us. The class, encouragingly named "`sun.misc.Unsafe`", has a singleton field named "`theUnsafe`" that is declared private, so we can't access it directly:

In [None]:
sun.misc.Unsafe.theUnsafe

There's also a public getter method that checks whether the caller is trusted library code (it inspects the calling class's class loader). We're not, so it throws a `SecurityException`.

In [None]:
sun.misc.Unsafe.getUnsafe

But there's a sneaky way. Java reflection makes it possible to bypass the "private" modifier on class fields, which we can then use to get at the hidden object.

In [None]:
val privateField = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
privateField.setAccessible(true)
val unsafe = privateField.get(null).asInstanceOf[sun.misc.Unsafe]

Heh, heh, heh! We have nefariously gained the ability to "malloc."

In [None]:
val ptr = unsafe.allocateMemory(4096)

Yes, it's just a long integer. It really is unsafe.

In [None]:
unsafe.getByte(ptr)

Although this technique looks like (okay, _is_) a hack, it is now a well-established hack. Many important libraries, such as Spark, heavily rely upon it. Since it's a glitch in the specification, not the implementation, Sun/Oracle has to live with the consequences.

From [this list of functions](http://www.docjar.com/docs/api/sun/misc/Unsafe.html), we see that we can set bytes, get integers, etc. as well as "`copyMemory`" (memcpy) and "`freeMemory`" (free).

See, for example, [this blog](http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/) and others for a guide to this magical class.
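
The `copyMemory` and `freeMemory` calls mentioned above can be sketched as follows. This is a minimal illustration, not a recommended pattern: `loadUnsafe` and `copyDemo` are hypothetical helper names, and the block repeats the reflection trick so that it stands alone.

```scala
// Hypothetical helper: fetch the Unsafe singleton via reflection.
def loadUnsafe(): sun.misc.Unsafe = {
    val field = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    field.get(null).asInstanceOf[sun.misc.Unsafe]
}

// Hypothetical demo: fill one buffer, memcpy it to another, read it back.
def copyDemo(): Seq[Byte] = {
    val u = loadUnsafe()
    val src = u.allocateMemory(4)
    val dst = u.allocateMemory(4)
    try {
        // Write the bytes 1, 2, 3, 4 into the source buffer.
        for (i <- 0 until 4) u.putByte(src + i, (i + 1).toByte)
        // copyMemory(source address, destination address, number of bytes)
        u.copyMemory(src, dst, 4L)
        (0 until 4).map(i => u.getByte(dst + i))
    } finally {
        // Nothing frees these for us, so release both by hand.
        u.freeMemory(src)
        u.freeMemory(dst)
    }
}
```

Calling `copyDemo()` returns the four bytes read back from the destination buffer.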

In [None]:
unsafe.putByte(ptr + 0, 0)
unsafe.putByte(ptr + 1, 1)
unsafe.putByte(ptr + 2, 0)
unsafe.putByte(ptr + 3, 0)

In [None]:
unsafe.getInt(ptr)

Did you notice that that's little endian? According to the specification, the JVM is big-endian, but that only covers class files and how Java's public functions interact with externals. The unsafe class reads and writes raw memory in the hardware's native byte order, which is little endian on x86 and most other common platforms. Using the native order is a reasonable choice for performance.
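
Whatever the byte order turns out to be on a given machine, it can be checked explicitly rather than inferred from a result. The sketch below uses `java.nio.ByteOrder.nativeOrder()` and `Integer.reverseBytes` from the standard library; `loadUnsafe` and `getIntBigEndian` are hypothetical helper names introduced only for this illustration.

```scala
import java.nio.ByteOrder

// Hypothetical helper: fetch the Unsafe singleton via reflection.
def loadUnsafe(): sun.misc.Unsafe = {
    val field = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    field.get(null).asInstanceOf[sun.misc.Unsafe]
}

// Hypothetical helper: read four bytes at `address` as a big-endian Int,
// regardless of the machine's native byte order.
def getIntBigEndian(u: sun.misc.Unsafe, address: Long): Int = {
    val raw = u.getInt(address)  // read in native byte order
    if (ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN) Integer.reverseBytes(raw)
    else raw
}
```

For the bytes 0, 1, 0, 0, this helper returns 65536 on any machine, whereas a plain `getInt` returns 256 on a little-endian one.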

Java also doesn't have unsigned integer types (apart from the 16-bit `char`), which makes it awkward to deal with file formats. You have to do the bit twiddling yourself:

In [None]:
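// Reinterpret a signed 32-bit Int as the unsigned value with the same bits.
def castSignedAsUnsigned(x: Int): Long = x match {
    case x if x < 0 => x.toLong + (1L << 32)
    case x => x.toLong
}

// Reinterpret an unsigned 32-bit value (held in a Long) as the signed Int with the same bits.
def castUnsignedAsSigned(x: Long): Int = x match {
    case x if x < 0 => throw new Exception("negative")
    case x if x >= (1L << 32) => throw new Exception("too big")
    // Bit 31 set: the value wraps around to a negative Int.
    case x if (x & (1L << 31)) != 0 => (x - (1L << 32)).toInt
    case x => x.toInt
}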

In [None]:
castUnsignedAsSigned(3000000000L)
castSignedAsUnsigned(castUnsignedAsSigned(3000000000L))

In [None]:
castSignedAsUnsigned(-123)
castUnsignedAsSigned(castSignedAsUnsigned(-123))

In [None]:
unsafe.putInt(ptr, castUnsignedAsSigned(3000000000L))
castSignedAsUnsigned(unsafe.getInt(ptr))
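
As an alternative sketch of the same conversions: since Java 8, `java.lang.Integer` provides `toUnsignedLong`, and a bit mask gives the same result. The function names below are hypothetical and chosen only for this illustration.

```scala
// Signed Int -> unsigned value, using the Java 8+ standard library.
def toUnsigned(x: Int): Long = java.lang.Integer.toUnsignedLong(x)

// The same conversion by masking off the sign extension.
def toUnsignedByMask(x: Int): Long = x & 0xFFFFFFFFL

// Unsigned value -> signed Int with the same 32 bits.
def fromUnsigned(x: Long): Int = {
    require(x >= 0 && x < (1L << 32), "not an unsigned 32-bit value")
    x.toInt  // truncation keeps the low 32 bits
}
```

Both `toUnsigned(-123)` and `toUnsignedByMask(-123)` give 4294967173, and `fromUnsigned(3000000000L)` gives -1294967296.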

One thing to note: `ptr` is just a long integer, so it could point anywhere. Just as with the Numpy examples, we can wrap other libraries' data and view or manipulate them. We just need to be careful about ownership rules.

Unlike Numpy, which has ownership built into the `ndarray` that wraps the memory, `Unsafe` leaves ownership to you: every allocation must be released by hand with `freeMemory`.
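
One way to make the manual free harder to forget is to tie it to a scope with `try`/`finally`. The following is a minimal sketch under that assumption, not an established API: `loadUnsafe` and `withOffHeap` are hypothetical names.

```scala
// Hypothetical helper: fetch the Unsafe singleton via reflection.
def loadUnsafe(): sun.misc.Unsafe = {
    val field = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    field.get(null).asInstanceOf[sun.misc.Unsafe]
}

// Hypothetical helper: allocate `bytes` of off-heap memory, lend the address
// to `body`, and free the memory afterwards even if `body` throws.
def withOffHeap[A](u: sun.misc.Unsafe, bytes: Long)(body: Long => A): A = {
    val address = u.allocateMemory(bytes)
    try body(address)
    finally u.freeMemory(address)
}
```

For example, `withOffHeap(u, 8) { p => u.putLong(p, 42L); u.getLong(p) }` returns 42 and frees the eight bytes on the way out. The address must not escape the block, since it is dangling once the memory is freed.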