# Theano Tutorial - Python Tutorial

In [1]:
import theano
import theano.tensor as T

Following http://deeplearning.net/software/theano/tutorial/python-memory-management.html#python-memory-management

## Python Memory Management

Python allocates memory transparently, manages objects using a reference counting system, and frees memory when an object's reference count falls to zero.
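As a small illustration of the reference counting described above, the sketch below uses `sys.getrefcount` and a weak reference to watch a count rise, fall, and reach zero. It assumes CPython and is written in Python 3 syntax (the notebook itself runs Python 2). The `Payload` class is a hypothetical placeholder, not part of the tutorial.

```python
import sys
import weakref


class Payload(object):
    """Hypothetical placeholder class; its instances support weak references."""


obj = Payload()
base = sys.getrefcount(obj)        # includes the temporary reference held by the call

alias = obj                        # a second name bound to the same object
with_alias = sys.getrefcount(obj)  # one higher than base

del alias                          # dropping the name decrements the count
after_del = sys.getrefcount(obj)   # back to base

probe = weakref.ref(obj)           # a weak reference does not keep the object alive
del obj                            # the last strong reference is gone
freed = probe() is None            # on CPython the object is reclaimed right away

print(with_alias - base, after_del - base, freed)
```

The absolute counts vary between interpreter versions, so only the differences are worth reading.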

### Basic Objects

What is the size of an `int`?

In [2]:
import sys

def show_sizeof(x, level=0):
    print "\t" * level, x.__class__, sys.getsizeof(x), x
    
    if hasattr(x, '__iter__'):
        if hasattr(x, 'items'):
            for xx in x.items():
                show_sizeof(xx, level + 1)
        else:
            for xx in x:
                show_sizeof(xx, level + 1)

In [3]:
show_sizeof(None)
show_sizeof(3)
show_sizeof(2**63)
show_sizeof(1857574678926579826398562938)
show_sizeof(185757467892657982639856293887653985698276439856928346598236598623984569)

 <type 'NoneType'> 16 None
 <type 'int'> 24 3
 <type 'long'> 36 9223372036854775808
 <type 'long'> 40 1857574678926579826398562938
 <type 'long'> 56 185757467892657982639856293887653985698276439856928346598236598623984569


In [4]:
show_sizeof(3.14159265358979323846264338327950288)

 <type 'float'> 24 3.14159265359


A `float` takes 24 bytes, three times the 8 bytes a C programmer would expect for a `double`!

In [5]:
show_sizeof("")
show_sizeof("My hovercraft is full of eels")

 <type 'str'> 37 
 <type 'str'> 66 My hovercraft is full of eels


In [6]:
show_sizeof([])
show_sizeof([4, "toaster", 230.1])

 <type 'list'> 72 []
 <type 'list'> 96 [4, 'toaster', 230.1]
	<type 'int'> 24 4
	<type 'str'> 44 toaster
	<type 'float'> 24 230.1


The size of an empty C++ `std::list()` is only 16 bytes, 4-5 times less than the Python "equivalent."

In [7]:
show_sizeof({})
show_sizeof({'a': 213, 'b': 2131})

 <type 'dict'> 280 {}
 <type 'dict'> 280 {'a': 213, 'b': 2131}
	<type 'tuple'> 72 ('a', 213)
		<type 'str'> 38 a
		<type 'int'> 24 213
	<type 'tuple'> 72 ('b', 2131)
		<type 'str'> 38 b
		<type 'int'> 24 2131


The dictionary doesn't quite "add up" because it carries internal storage for the hash table that backs it. The C++ `std::map()` takes 48 bytes.
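Note that `sys.getsizeof` reports only a container's own footprint, not the objects it refers to. The following is a minimal sketch of a recursive helper that adds the referenced objects in, counting each distinct object once. The name `total_size` is hypothetical, it handles only the built-in containers used above, and it is written in Python 3 syntax, so the byte counts will differ from the Python 2 output shown in this notebook.

```python
import sys


def total_size(obj, seen=None):
    """Approximate footprint of obj plus everything it contains (hypothetical helper)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:            # count a shared object only once
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)      # the object's own (shallow) size
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)
    return size


d = {'a': 213, 'b': 2131}
print(sys.getsizeof(d), total_size(d))
```

The second number is always at least as large as the first, and the gap is the memory held by the keys and values themselves.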

What does this mean? If we need to scale, we need to be careful about how many objects we create to limit the quantity of memory our program uses. However, to devise a good memory management strategy, we need to consider not only the sizes of the objects, but also how many we create and in what order. A key element to understand is how Python allocates memory internally.

### Internal Memory Management

To speed up memory allocation (and reuse), Python uses a number of lists for small objects. Each list contains objects of similar size: a list for objects 1-8 bytes in size, a list for 9-16, and so on. When a small object needs to be created, we either re-use a free block in the list or allocate a new one.

There are details to that management, but they aren't important. If interested, see

http://www.evanjones.ca/memoryallocator/

The important point is that those lists _never shrink_.

If an item (of size `x`) is deallocated (freed by lack of reference), its location is not returned to Python's global memory pool, but merely marked as free and added to the free list of items of size `x`. The dead object's location will be re-used if another object of similar size is needed, and if there are no dead objects, new space is allocated.

Therefore, the memory footprint of the application is dominated by the largest number of small objects allocated at any given point.

Consequently, we should allocate only the number of small objects necessary for one task, favoring (otherwise _non-Pythonic_) loops in which only a small number of elements are created or processed over the (more Pythonic) patterns in which lists are built with list comprehension syntax and then processed.
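To make the trade-off concrete, here is a rough sketch that compares the peak number of bytes allocated by a list-building version and a plain loop computing the same sum. It relies on `tracemalloc`, which is in the standard library only from Python 3.4 onward, so it will not run in the Python 2 kernel used elsewhere in this notebook. The function names and the value of `N` are illustrative assumptions.

```python
import tracemalloc

N = 200000  # illustrative problem size


def peak_bytes(func):
    """Call func once and return (result, peak bytes traced during the call)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def with_list():
    # Builds all N integer objects first, then processes them.
    squares = [i * i for i in range(N)]
    return sum(squares)


def with_loop():
    # Keeps only a handful of objects alive at any moment.
    total = 0
    for i in range(N):
        total += i * i
    return total


list_total, list_peak = peak_bytes(with_list)
loop_total, loop_peak = peak_bytes(with_loop)
print(list_total == loop_total, list_peak, loop_peak)
```

Both versions return the same total, but the list version must hold every intermediate object at once, which is exactly the peak that the small-object lists will retain.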

The free lists only ever growing may not seem like a problem, because the memory is still available to the Python program. However, because Python returns heap memory to the OS only on Windows, on Linux the total memory used by the program will only ever appear to increase.

See https://github.com/fabianp/memory_profiler

In [9]:
run -m memory_profiler memory-profile-me.py

Filename: memory-profile-me.py

Line #    Mem usage    Increment   Line Contents
     5   62.418 MiB    0.000 MiB   @profile
     6                             def function():
     7  101.098 MiB   38.680 MiB       x = list(range(1000000))
     8  188.633 MiB   87.535 MiB       y = copy.deepcopy(x)
     9  188.633 MiB    0.000 MiB       del x
    10  188.633 MiB    0.000 MiB       return y




Or, at the command line:

    (python2)TheanoScratch$ python -m memory_profiler memory-profile-me.py

Memory can increase surprisingly quickly if you are not careful!

### Pickle

Is `pickle` wasteful?

In [10]:
run -m memory_profiler test-pickle.py

Filename: test-pickle.py

Line #    Mem usage    Increment   Line Contents
    10  188.781 MiB    0.000 MiB   @profile
    11                             def create_file():
    12  188.781 MiB    0.000 MiB       x = [(random.random(),
    13                                       random_string(),
    14                                       random.randint(0, 2 ** 64))
    15  331.352 MiB  142.570 MiB            for _ in xrange(1000000)]
    16                             
    17  578.938 MiB  247.586 MiB       pickle.dump(x, open('machin.pkl', 'w'))


Filename: test-pickle.py

Line #    Mem usage    Increment   Line Contents
    20  317.871 MiB    0.000 MiB   @profile
    21                             def load_file():
    22  621.039 MiB  303.168 MiB       y = pickle.load(open('machin.pkl', 'r'))
    23  621.039 MiB    0.000 MiB       return y




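_Pickling_ is very bad for memory consumption: in this run, `pickle.dump` added about 248 MiB on top of the roughly 143 MiB used by the list itself. Unpickling, on the other hand, seems fairly efficient by comparison, although `pickle.load` still added about 303 MiB here. Overall, pickling should be avoided for memory-sensitive applications.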

If we profile `test-flat.py`, we can see we use a lot less memory (but, we won't here because it takes a long time).

In [8]:
cat test-flat.py

import memory_profiler
import random


def random_string():
    return "".join([chr(64 + random.randint(0, 25)) for _ in xrange(20)])


@profile
def create_file():
    x = [(random.random(),
          random_string(),
          random.randint(0, 2 ** 64))
         for _ in xrange(1000000)]

    f = open('machin.flat', 'w')
    for xx in x:
        print >>f, xx
    f.close()


@profile
def load_file():
    y = []
    f = open('machin.flat', 'r')
    for line in f:
        y.append(eval(line))
    f.close()
    return y


if __name__ == '__main__':
    create_file()
    load_file()


This all generalizes to strategies where we don't load the whole dataset at once, but rather load a bit at a time. Loading data into a NumPy array, for example, should involve creating the array and then reading the file line by line to fill it, which allocates only one copy of the whole data. If we use `pickle`, we allocate the whole data set (at least) twice: once by `pickle` and once by NumPy.
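The fill-as-you-read pattern described above can be sketched as follows. To stay dependency-free, this example swaps in the standard library's `array.array` as a stand-in for a NumPy array. The file layout (one float per line), the helper name `load_column`, and the assumption that the row count is known in advance are all hypothetical. It is written in Python 3 syntax.

```python
import os
import tempfile
from array import array


def load_column(path, n_rows):
    """Fill a preallocated array of doubles from a one-value-per-line file."""
    values = array('d', [0.0]) * n_rows    # allocated once, up front
    with open(path) as f:
        for i, line in enumerate(f):       # only one line is held at a time
            values[i] = float(line)
    return values


# Write a small hypothetical data file to demonstrate the loader.
expected = [0.5 * i for i in range(1000)]
fd, path = tempfile.mkstemp(suffix='.flat')
with os.fdopen(fd, 'w') as f:
    for v in expected:
        f.write(repr(v) + '\n')

loaded = load_column(path, len(expected))
os.remove(path)
print(len(loaded), loaded[0], loaded[-1])
```

With NumPy, the preallocation step would be `numpy.empty(n_rows)` and the fill loop would stay the same. The point of the design is that no intermediate list of Python objects is ever built.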

See the tutorial on loading and saving:

http://deeplearning.net/software/theano/tutorial/loading_and_saving.html

Python's design goals are very different from C's. C is designed for granular control, Python for programmer speed.