# pypersist
### Persistent memoisation framework for Python
This notebook shows some of the features available in pypersist.

Find more information at https://github.com/mtorpey/pypersist

## 1 - Persist

The main feature of pypersist is the `persist` decorator:

In [None]:
from pypersist import persist

Let us also import some other libraries, just for testing:

In [None]:
from time import sleep
from os import listdir

The following function waits for a second, then returns three times the given number.  We memoise it by decorating it with `@persist`:

In [None]:
@persist
def triple(x):
    sleep(1)
    return 3 * x

triple.clear()  # empty the cache so the timings below start from scratch

We can call the function, and it behaves as if there were no decorator, waiting for a second each time:

In [None]:
triple(1)

In [None]:
triple(4)

But if we call it again with an argument whose result has already been computed, it completes in no time at all:

In [None]:
triple(1)

This is because the actual function wasn't called.  Instead, pypersist retrieved the stored result from earlier.

These results were stored in the `persist/triple` directory, as we can see if we examine the files there:

In [None]:
files = listdir('persist/triple')
files

In [None]:
open('persist/triple/' + files[0]).read()

Each file contains one result, in a pickled and compressed format.  The filename is a hash of the arguments.  We can customise this to make it more human-readable, as we will see later.
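
To make the mechanism concrete, here is a minimal sketch of a disk-backed memoiser in the same spirit. It is not pypersist's implementation: the decorator name `simple_persist`, the choice to hash `repr(args)`, and the exact encoding are assumptions for illustration, and the `.out` suffix simply mirrors file names seen later in this notebook. Unlike pypersist, it handles only positional arguments and ignores defaults.

```python
import hashlib
import os
import pickle
import tempfile
import zlib
from functools import wraps


def simple_persist(directory):
    """Hypothetical decorator: store each result of func as a file in directory."""
    def decorator(func):
        os.makedirs(directory, exist_ok=True)

        @wraps(func)
        def wrapper(*args):
            # Assumption: hash the repr of the argument tuple with SHA-256
            # to get a fixed-length, filesystem-safe filename.
            digest = hashlib.sha256(repr(args).encode("utf-8")).hexdigest()
            path = os.path.join(directory, digest + ".out")
            if os.path.exists(path):
                # Cache hit: read, decompress and unpickle the stored result.
                with open(path, "rb") as f:
                    return pickle.loads(zlib.decompress(f.read()))
            # Cache miss: compute the result, then pickle, compress and store it.
            result = func(*args)
            with open(path, "wb") as f:
                f.write(zlib.compress(pickle.dumps(result)))
            return result
        return wrapper
    return decorator


calls = []  # records every real execution of the function body


@simple_persist(tempfile.mkdtemp())
def times_three(x):
    calls.append(x)
    return 3 * x


print(times_three(1), times_three(4), times_three(1))  # 3 12 3
print(calls)  # [1, 4]: the second times_three(1) was read from disk
```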

We can also manipulate the cache manually, interacting with it like a dictionary:

In [None]:
triple.cache[(('x', 4),)]

In [None]:
triple.cache[(('x', 10),)] = 31

In [None]:
triple(10)

In [None]:
try:
    print(triple.cache[(('x', 5),)])
except KeyError as e:
    print("bad key:", e.args[0])

In [None]:
len(triple.cache)
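
Dictionary-like behaviour of this kind is what Python's `collections.abc.MutableMapping` provides once five methods are defined. The class `DirCache` below is a hypothetical sketch, not pypersist's cache class: it stores each value in a file named by the SHA-256 hash of the key's `repr` (an assumption). Because only hashes reach the disk, it cannot list its keys, which is the problem that storing keys (section 6) addresses.

```python
import hashlib
import os
import pickle
import tempfile
from collections.abc import MutableMapping


class DirCache(MutableMapping):
    """Hypothetical dict-like cache: one pickled file per key, named by hash."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Assumption: keys are hashed via their repr.
        digest = hashlib.sha256(repr(key).encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".out")

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __len__(self):
        return len(os.listdir(self.directory))

    def __iter__(self):
        # Only hashes are stored, so the original keys cannot be recovered.
        raise TypeError("keys are not stored; cannot iterate over this cache")

    def clear(self):
        # The inherited clear() iterates over keys, so delete the files directly.
        for name in os.listdir(self.directory):
            os.remove(os.path.join(self.directory, name))


dir_cache = DirCache(tempfile.mkdtemp())
dir_cache[(("x", 4),)] = 12
dir_cache[(("x", 10),)] = 31
print(dir_cache[(("x", 4),)])    # 12
print((("x", 5),) in dir_cache)  # False: Mapping.__contains__ catches the KeyError
print(len(dir_cache))            # 2
```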

## 2 - Custom pickling

We can make the contents of these files human-readable by specifying custom `pickle` and `unpickle` functions.  In this example, we write the output of the `double` function to a string using `repr`, and we read it back in using `eval`.  Since `eval` executes whatever code it is given, only try this at home if nobody else can write to the cache directory!

In [None]:
@persist(pickle=repr, unpickle=eval)
def double(x):
    sleep(1)
    return 2 * x

double.clear()

In [None]:
double(2)

In [None]:
double(2)

In [None]:
double(x=2)

In [None]:
double(0)

In [None]:
double("hello!")

The files still have complicated names:

In [None]:
files = listdir('persist/double')
files

But their contents are human-readable!

In [None]:
for file in files:
    print(open('persist/double/' + file).read())
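
The `pickle`/`unpickle` pair only needs to form a round trip: turning a result into a string and back must give an equal value. The sketch below checks this for the values used above, and shows `ast.literal_eval` from the standard library as a possible replacement for `eval` that accepts only Python literals, so it will not execute code planted in a cache file. Passing it as `unpickle=ast.literal_eval` is an untested assumption here, although it has the same string-to-value shape as `eval`.

```python
import ast

# Values like those returned by double() above.
samples = [4, 0, "hello!hello!"]

for value in samples:
    text = repr(value)                       # what `pickle=repr` would store
    assert eval(text) == value               # the round trip used above
    assert ast.literal_eval(text) == value   # same result, literals only

# literal_eval refuses anything that is not a plain literal, such as a call.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as err:
    print("rejected:", err)
```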

## 3 - Storage location

Instead of using `persist/`, we can choose a different directory for storage by specifying `cache`.  We can also change the name of the subdirectory used for a particular function (by default, the function's own name) by specifying `funcname`.

In this example, Bob wants to save results in a directory he is sharing with Alice.  He is concerned that there may be another function called `foo`, so he specifies a more specific name for the function:

In [None]:
@persist(cache='file://results_for_alice/', funcname='foofighters')
def foo(x, y, z=1, *, a=3):
    sleep(1)
    return x + y + z + a

foo.clear()

Note that the `cache` string starts with `file://` to indicate that we want to store the results as local files and not to a remote server.
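
One way to interpret such a `cache` string is to split it at the first `://` and choose a storage backend from the prefix. The helper `split_cache_url` below is a hypothetical illustration of that idea, not pypersist's own parsing code; the set of accepted prefixes is taken from the two used in this notebook.

```python
def split_cache_url(cache):
    """Hypothetical helper: split a cache string into (backend, location)."""
    scheme, sep, location = cache.partition("://")
    if not sep:
        raise ValueError(f"cache string has no '://' prefix: {cache!r}")
    if scheme not in ("file", "mongodb"):
        raise ValueError(f"unsupported backend: {scheme!r}")
    return scheme, location


print(split_cache_url("file://results_for_alice/"))
# ('file', 'results_for_alice/')
print(split_cache_url("mongodb://localhost:5000/persist/"))
# ('mongodb', 'localhost:5000/persist/')
```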

It doesn't matter whether we pass arguments positionally or by keyword, nor in what order we give them:

In [None]:
foo(1,4,z=3)

In [None]:
foo(1,y=4,z=3)

In [None]:
foo(1,z=3,y=4)

If an argument is equal to its default value, it will be treated as if it were not specified at all:

In [None]:
foo(1,4,3,a=3)  # Last arg has the default value, so it is ignored

In [None]:
foo(1,4,3,a=7)  # Last arg is keyword-only and non-default, so it is used

In [None]:
foo(1,4,a=3,z=3)  # Default arg in non-canonical order

In [None]:
foo(1,4,a=7,z=3)  # Non-default arg in non-canonical order

Behind the scenes, a key is created from a list of arguments using a pypersist function called `arg_tuple`:

In [None]:
from pypersist.preprocessing import arg_tuple
def baz(a, b, c=3, d=4, *, e, f=6, g=7):
    return a+b+c+d+e+f+g
arg_tuple(baz, 10, 2, 3, e=50, g=3, f=6)

Note that arguments are sorted alphabetically, and that `f=6` is removed since it is equal to the default value.  This ensures that equivalent sets of arguments are treated as equal.  This behaviour can be customised by specifying a `key` function, as we will see in the next example.
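
The normalisation rules described above can be sketched with the standard library's `inspect.signature`: bind the arguments to parameter names, drop any that equal their default, and sort by name. This is an illustrative reimplementation under those stated rules, not pypersist's `arg_tuple`; the name `canonical_key` is hypothetical. Following the rule from earlier in this section, the sketch also drops `c=3`, which equals its default, so its output may be compared with that of `arg_tuple` above.

```python
import inspect


def canonical_key(func, *args, **kwargs):
    """Hypothetical normaliser: sorted (name, value) pairs, defaults dropped."""
    sig = inspect.signature(func)
    bound = sig.bind(*args, **kwargs)  # maps each passed value to its parameter
    pairs = []
    for name, value in bound.arguments.items():
        default = sig.parameters[name].default
        # Skip arguments that were passed but equal their default value.
        if default is not inspect.Parameter.empty and value == default:
            continue
        pairs.append((name, value))
    # Sorting by parameter name makes the order of keyword arguments irrelevant.
    return tuple(sorted(pairs))


def baz(a, b, c=3, d=4, *, e, f=6, g=7):
    return a + b + c + d + e + f + g


print(canonical_key(baz, 10, 2, 3, e=50, g=3, f=6))
# (('a', 10), ('b', 2), ('e', 50), ('g', 3))
```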

## 4 - Custom key function

Here we have a function `sum` that takes a variable number of arguments and adds them all up.  We can memoise it as usual:

In [None]:
@persist
def sum(*args):
    sleep(1)
    acc = 0
    for x in args:
        acc += x
    return acc

sum.clear()

Any mathematician knows that this function returns the same answer whatever order its arguments are given in.  However, pypersist doesn't know this, so it produces a different key for each ordering of the arguments and recomputes the result each time:

In [None]:
sum(1,4,3,7,3,12)

In [None]:
sum(4,12,7,3,3,1)  # recomputed since the args are in different order

We can improve on this by specifying a `key` function.  In this case we use a function that returns a sorted list of all the arguments:

In [None]:
@persist(key=lambda *args: sorted(args))
def sum(*args):
    sleep(1)
    acc = 0
    for x in args:
        acc += x
    return acc

sum.clear()

Now when we try two different orderings, they hash to the same file, and the answer from the first one can be reused:

In [None]:
sum(1,4,3,7,3,12)

In [None]:
sum(4,12,7,3,3,1)  # args sorted and answer retrieved from cache
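
The effect of `key` can be reproduced with a small in-memory memoiser. The decorator `memoise_with_key` below is hypothetical, a sketch rather than pypersist's code: it looks results up by `key(*args)` instead of by the raw arguments, and it assumes `key` returns a sequence, which it converts to a tuple because lists cannot be dictionary keys.

```python
from functools import wraps


def memoise_with_key(key):
    """Hypothetical in-memory memoiser that looks results up by key(*args)."""
    def decorator(func):
        results = {}

        @wraps(func)
        def wrapper(*args):
            # sorted() returns a list, which cannot be a dict key, so use a tuple.
            k = tuple(key(*args))
            if k not in results:
                results[k] = func(*args)
            return results[k]
        return wrapper
    return decorator


body_runs = []  # one entry per real execution of the function body


@memoise_with_key(key=lambda *args: sorted(args))
def add_all(*args):
    body_runs.append(args)
    total = 0
    for x in args:
        total += x
    return total


print(add_all(1, 4, 3, 7, 3, 12))  # 30
print(add_all(4, 12, 7, 3, 3, 1))  # 30, returned from the cache
print(len(body_runs))              # 1
```

A key like this is only safe when the function really gives the same answer for every ordering; using it on a function such as subtraction would silently return wrong cached results.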

## 5 - Custom hash function

In the next example, we memoise a function `pow` that raises one integer to the power of another, and we store its results in a human-readable way:

In [None]:
@persist(hash=lambda k: '%s to the %s' % (k[0][1], k[1][1]),
         pickle=str,
         unpickle=int)
def pow(x,y):
    return x**y
pow.clear()

The key, which has the form `(('x', 2), ('y', 3))`, is hashed to the string `2 to the 3`, which will be used in the filename.  The integer result is converted to a string using `str`, and converted back using `int`.
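
Since `hash`, `pickle` and `unpickle` are ordinary Python callables, they can be checked on their own before being passed to `@persist`. This sketch gives the same hash function a name (`pow_hash` is introduced only for illustration) and applies it to the key shape described above. Note that it reads the values by position and ignores the parameter names, so it relies on `pow` always being keyed by exactly `x` and `y`.

```python
def pow_hash(k):
    # Same formula as the `hash=` argument above.
    return '%s to the %s' % (k[0][1], k[1][1])


key = (('x', 2), ('y', 3))   # the key shape for pow(2, 3)
print(pow_hash(key))         # 2 to the 3

result = 2 ** 3
stored = str(result)         # what `pickle=str` writes to the file
assert int(stored) == result  # `unpickle=int` recovers the value
```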

In [None]:
pow(2,3)

Let us run a few examples, and examine the output:

In [None]:
pow(7,4)
pow(1,3)
pow(10,5)
pow(0,0)
pow(2,16)

In [None]:
listdir('persist/pow')

In [None]:
open('persist/pow/7 to the 4.out', 'r').read()

These results can now be inspected by anyone, without them even needing to know about pypersist.

## 6 - Storing keys

So far we have used only a hash of each key to look up its result.  The chance of a hash collision is tiny (by default we use SHA-256), but for absolute correctness we may choose to store each key along with its result:

In [None]:
@persist(storekey=True)
def square(x):
    return x*x

square.clear()

In [None]:
square(12)

In [None]:
square(0)

Not only does this ensure correctness, it also allows us to iterate through the keys in the cache:

In [None]:
for key in square.cache:
    print(key)

In [None]:
for key in square.cache.keys():
    print(key)

In [None]:
for val in square.cache.values():
    print(val)

In [None]:
for pair in square.cache.items():
    print(pair)

Let us create a deliberate hash collision and see how pypersist handles it.

Michael uses pypersist to memoise his `square` function, but foolishly uses a hash function that maps every key to the same string!

In [None]:
@persist(hash=lambda k: 'hello world')
def square(x):
    return x*x

square.clear()

The first call to `square` works just fine:

In [None]:
square(3)

But then the same result is retrieved whatever input we give:

In [None]:
square(4)

This is because every set of arguments points to `hello world.out`:

In [None]:
listdir('persist/square')

What if we choose to store keys?

In [None]:
@persist(hash=lambda k: 'hello world',
         storekey=True)
def square(x):
    return x*x

square.clear()

The first call to `square` writes a file to disk:

In [None]:
square(3)

If we call `square` again with a different argument, pypersist compares the stored key with the new one and raises a `HashCollisionError`:

In [None]:
try:
    square(4)
except Exception as hce:
    print("Hash collision for keys", hce.args[0], 'and', hce.args[1])
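
The key-checking behaviour can be sketched as follows: storing the key next to the value lets the cache both detect a collision and list its keys. Everything here is hypothetical and in-memory; the class `KeyedCache` and the local `HashCollisionError` stand in for pypersist's own classes, whose details may differ. The deliberately bad hash mimics the `'hello world'` example above.

```python
class HashCollisionError(Exception):
    """Local stand-in for pypersist's exception of the same name."""


class KeyedCache:
    """Hypothetical cache that stores (key, value) pairs under hashed names."""

    def __init__(self, hash_func):
        self.hash_func = hash_func
        self.files = {}  # simulates the directory: hashed name -> (key, value)

    def get(self, key):
        name = self.hash_func(key)
        if name not in self.files:
            raise KeyError(key)
        stored_key, value = self.files[name]
        # Compare the full stored key, not just its hash.
        if stored_key != key:
            raise HashCollisionError(stored_key, key)
        return value

    def put(self, key, value):
        self.files[self.hash_func(key)] = (key, value)

    def keys(self):
        # Possible only because the keys themselves were stored.
        return [stored_key for stored_key, _ in self.files.values()]


keyed = KeyedCache(hash_func=lambda k: "hello world")  # every key collides
keyed.put((("x", 3),), 9)
print(keyed.keys())  # [(('x', 3),)]
try:
    keyed.get((("x", 4),))
except HashCollisionError as hce:
    print("Hash collision for keys", hce.args[0], "and", hce.args[1])
```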

## 7 - Unhashing

Another way of preventing hash collisions is to use an injective `hash` function (i.e. one that never maps two different keys to the same string).  If our `hash` function is injective, and we specify its inverse, `unhash`, then we can iterate through its keys just as if `storekey` were set to `True`.

In the following example, we memoise the exponential function.  As the key, we take the input converted to a `float` (so that `2.0` and `2` give the same key), and we hash it to a string of the form `"e to the 2.0"`.  Since this hash function is injective, we can specify an `unhash` function that strips the prefix `"e to the "` and calls `float` on the remainder.  This gives back the original key.
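
Because `unhash` must invert `hash` exactly, it is worth testing the pair on its own. This sketch restates the two lambdas from the next cell as named functions (`exp_hash` and `exp_unhash` are illustrative names) and checks the round trip on the inputs used in this section. Python's `str` of a float round-trips exactly, which is what makes this pair injective and invertible.

```python
PREFIX = 'e to the '


def exp_hash(k):
    return f'{PREFIX}{k}'            # e.g. 2.0 -> 'e to the 2.0'


def exp_unhash(s):
    return float(s[len(PREFIX):])    # strip the prefix, parse the number


for x in [2, 2.0, -1, 3.14]:
    k = float(x)                     # the `key=float` step
    assert exp_unhash(exp_hash(k)) == k

print(exp_hash(float(2)))            # e to the 2.0
print(len(PREFIX))                   # 9, the slice offset used in the cell
```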

In [None]:
@persist(key=float,
         hash=lambda k: f'e to the {k}',
         unhash=lambda s: float(s[len('e to the '):]))
def exp(x):
    return 2.71828 ** x

exp.clear()

The function works as expected:

In [None]:
print(exp(2))
print(exp(2.0))
print(exp(-1))
print(exp(3.14))

And we can iterate over its keys, just as if we had used `storekey=True`:

In [None]:
for key in exp.cache:
    print(key)

In [None]:
for key,val in exp.cache.items():
    print('e to the', key, 'equals', val)

## 8 - Using a server
All our examples so far have written their results to disk.  But what if we want to store them on a server instead?

The `mongodb_server/` directory in pypersist lets us set up a MongoDB server with a REST interface, which pypersist users can interact with to store results.

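Uncomment and execute the following block, or navigate to `mongodb_server` and execute `run.py`, to start a server that can be used to store results.  Note that `app.run()` blocks, so while it runs inside this notebook's kernel no further cells can execute; running `run.py` in a separate terminal avoids this: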

In [None]:
#from eve import Eve
#import os
#fname = os.path.join(os.getcwd(), 'mongodb_server', 'settings.py')
#app = Eve(settings=fname)
#app.run()
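
Before running the cells below, you can check whether anything is listening on the address used in the `cache` string. The helper `server_is_listening` is a hypothetical convenience, not part of pypersist; it only tests that a TCP connection to `localhost:5000` succeeds, not that the service on that port is the pypersist MongoDB/REST server.

```python
import socket


def server_is_listening(host="localhost", port=5000, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unknown host, ...
        return False


print(server_is_listening())  # False until a server is running on port 5000
```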

If such a server is running, we can connect to it by specifying a `cache` string that starts with `mongodb://`, as follows:

In [None]:
@persist(cache="mongodb://localhost:5000/persist/")
def start_and_end(string):
    return string[0] + string[-1]

start_and_end.clear()

It can be used in exactly the same way as a file cache:

In [None]:
start_and_end('Hello World!')

In [None]:
start_and_end('Doctor')

In [None]:
start_and_end('Doctor')

In [None]:
del start_and_end.cache[(('string','Hello World!'),)]

In [None]:
start_and_end.clear()

We can also use all the options that have been described so far:

In [None]:
@persist(cache="mongodb://localhost:5000/persist/", storekey=True)
def alternating(string):
    return string[::2]

alternating.cache.clear()

In [None]:
alternating('steadfastness')

In [None]:
alternating('steadfastness')

In [None]:
words = ['ballooned', 'biannually', 'curliness', 'pursuance', 'situation', 'thesaurus']
[alternating(word) for word in words]

In [None]:
for pair in alternating.cache.items():
    print(pair)

In [None]:
@persist(cache="mongodb://localhost:5000/persist/",
         key=float,
         hash=lambda k: f'e to the {k}',
         unhash=lambda s: float(s[len('e to the '):]))
def exp(x):
    return 2.71828 ** x
exp.clear()

In [None]:
print(exp(2))
print(exp(-1))
print(exp(2.0))
print(exp(3.14))

In [None]:
for key in exp.cache:
    print(key)

In [None]:
for key,val in exp.cache.items():
    print('e to the', key, 'equals', val)