# Limiting RAM (again)

Linux offers a few ways to place OS limits on the amount of RAM one can use. Unfortunately, none of them are ideal for our purposes. The `getrlimit` man page describes several options:

* Resident Set Size: this limits only the pages held in physical RAM, so a process that goes over can simply have pages swapped out to disk. For anything computational that should have reasonable performance characteristics, we *really* don't want to be swapping, and when running experiments we don't want to measure swapping time either. The man page also notes that this limit has no effect on current kernels, so this option is out.
* Addressable Space: this would be reasonable to set a limit on, but unfortunately once a process hits this limit, allocation calls fail with `ENOMEM`, and if the stack cannot grow the process is killed with a segmentation fault. Not exactly a recoverable error!
* Data: this is the maximum size of the process's data segment, which includes the heap and is *exactly* what we want to limit. Unfortunately for us, it also [tends to be ignored](http://venkateshabbarapu.blogspot.ca/2012/09/linux-memory-limits-rlimits.html): before Linux 4.7 it applied only to `brk`-based allocations, so large blocks obtained through `mmap` slip past it. This left me banging my head on the wall for quite some time.
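
For reference, the three limits above can be inspected from Python with the standard `resource` module. This is only a sketch of how one might read and lower them, assuming a Linux system; the names `LIMITS`, `describe_limits`, and `set_soft_limit` are made up for illustration.

```python
import resource

# The three limits discussed above, as named in the `resource` module.
LIMITS = {
    'resident set size': resource.RLIMIT_RSS,
    'address space': resource.RLIMIT_AS,
    'data segment': resource.RLIMIT_DATA,
}

def describe_limits():
    """Return {name: (soft, hard)}; resource.RLIM_INFINITY means unlimited."""
    return {name: resource.getrlimit(rlim) for name, rlim in LIMITS.items()}

def set_soft_limit(rlim, nbytes):
    """Set the soft limit for this process, leaving the hard limit alone."""
    _, hard = resource.getrlimit(rlim)
    resource.setrlimit(rlim, (nbytes, hard))

for name, (soft, hard) in describe_limits().items():
    print('{}: soft={} hard={}'.format(name, soft, hard))
```

Calling something like `set_soft_limit(resource.RLIMIT_AS, 4 * 1024 ** 3)` would cap the address space of the current process at 4 GiB, with the drawbacks described in the list above.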

Since the kernel doesn't help us, we'll just simulate a `MemoryError` by writing a context manager that checks for us. You can pass the context manager a memory limit and it will raise a `MemoryError` if you exceed your own limit before or after some block of code.

This solution isn't *ideal*, because what you really want is for a `MemoryError` to be raised at the first operation that takes you over the limit. Ours has a coarser resolution: the whole block. Still, we should be able to make it work for our purposes.

In [1]:
import os
import resource

In [2]:
from collections import namedtuple
from contextlib import contextmanager

MemUsage = namedtuple('MemUsage', 'size resident share text lib data dt')

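def get_mem_usage():
    # Read this process's memory statistics from procfs (Linux only).
    # Note that the kernel reports every field in pages, not bytes.
    pid = os.getpid()
    with open('/proc/{}/statm'.format(pid)) as f:
        string = f.read()
    return MemUsage._make([int(x) for x in string.split()])


@contextmanager
def mem_limiter(bytes_):
    # `bytes_` is compared directly against the `data` field of statm,
    # so despite its name it is effectively a page count as well.
    # Check once before the block runs...
    used = get_mem_usage().data
    if used > bytes_:
        raise MemoryError("You're already using {} bytes which is over the limit of {} bytes".format(used, bytes_))
    yield
    # ...and once more after it finishes.
    used = get_mem_usage().data
    if used > bytes_:
        raise MemoryError("FAIL! You're using {} bytes which is over the limit of {} bytes".format(used, bytes_))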


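You can see how `get_mem_usage` works below. It returns a named tuple with the various measures of memory usage from `/proc/<pid>/statm`. Note that the kernel reports these values in pages, not bytes, so the limit passed to `mem_limiter` is effectively a page count, despite the wording of its error messages.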

In [3]:
get_mem_usage()

MemUsage(size=130795, resident=6984, share=1660, text=852, lib=0, data=101428, dt=0)
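
To express the same measurements in bytes, the page counts can be multiplied by the system page size. Here is a minimal sketch, assuming Linux and using `resource.getpagesize()` from the standard library; the helper name `get_mem_usage_bytes` is hypothetical and not part of the code above.

```python
import os
import resource
from collections import namedtuple

MemUsage = namedtuple('MemUsage', 'size resident share text lib data dt')

def get_mem_usage_bytes():
    """Return the statm fields converted from pages to bytes (Linux only)."""
    page_size = resource.getpagesize()  # bytes per page, commonly 4096
    with open('/proc/{}/statm'.format(os.getpid())) as f:
        fields = f.read().split()
    return MemUsage._make(int(x) * page_size for x in fields)
```

With a helper like this, a limit could be stated in bytes and compared against `get_mem_usage_bytes().data` directly.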

In [4]:
import numpy as np

with mem_limiter(1024 ** 2):
    a = np.empty((1024, 1024, 512))

MemoryError: FAIL! You're using 1151468 bytes which is over the limit of 1048576 bytes

Great! We simulate a `MemoryError`. Another disadvantage of this approach is that the object does actually get created and the memory is still being used. So, if we try to do anything else, even without creating objects:

In [5]:
with mem_limiter(1024 ** 2):
    pass

MemoryError: You're already using 1151634 bytes which is over the limit of 1048576 bytes

it will fail. So, for now we have to remember to clean up any objects that we leave lying around after a MemoryError.

In [6]:
del a
with mem_limiter(1024 ** 2):
    pass
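
One way to make the cleanup harder to forget is to catch the simulated error and release the offending object right there. The following is a minimal sketch, assuming Linux. `data_bytes` and `byte_limiter` are hypothetical names for a byte-based variant of `mem_limiter`, and the limit is set relative to current usage so that the example does not depend on how much memory the interpreter already holds.

```python
import gc
import os
import resource
from contextlib import contextmanager

def data_bytes():
    """Size of the statm `data` field in bytes (Linux only)."""
    with open('/proc/{}/statm'.format(os.getpid())) as f:
        pages = int(f.read().split()[5])  # sixth field is `data`, in pages
    return pages * resource.getpagesize()

@contextmanager
def byte_limiter(limit):
    """Raise MemoryError if usage exceeds `limit` bytes before or after the block."""
    used = data_bytes()
    if used > limit:
        raise MemoryError('already using {} bytes, limit is {}'.format(used, limit))
    yield
    used = data_bytes()
    if used > limit:
        raise MemoryError('using {} bytes, limit is {}'.format(used, limit))

limit = data_bytes() + 16 * 1024 ** 2    # allow 16 MiB on top of current usage
big = None
raised = False
try:
    with byte_limiter(limit):
        big = bytearray(64 * 1024 ** 2)  # 64 MiB buffer, well over the allowance
except MemoryError:
    raised = True
    big = None      # drop the reference so the buffer can be freed
    gc.collect()

# With the buffer released, an empty block passes the check again.
with byte_limiter(limit):
    pass
```

The check still only happens at block boundaries, so this does not change the coarse resolution; it only keeps the cleanup next to the code that caused the error.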