# Modeling Cash Flows in Python

In [1]:
import pandas as pd

## Framework Development--First Steps
The goal with this framework is to create a way to easily model cash flows in Python. It should be easy for people coming from a traditional Excel-based underwriting background to understand. As a result, heavy inspiration comes from typical Excel model patterns.

Modeling Excel rows as functions with time periods as arguments is a natural place to start. The capital account below is a basic example. Even for a simple model though, function execution ends up being highly iterative.

In [2]:
call_counter = 0

def starting_balance(period: int):
    global call_counter
    call_counter += 1
    return 100 if period == 0 else ending_balance(period - 1)

def draw(period: int):
    global call_counter
    call_counter += 1
    return 5

def accrual(period: int):
    global call_counter
    call_counter += 1
    return round(starting_balance(period) * 0.08, 2)

def ending_balance(period: int):
    global call_counter
    call_counter += 1
    return starting_balance(period) + draw(period) + accrual(period)

%time ending_balance(20)
print(f'Total function executions: {call_counter}')

CPU times: user 2.93 s, sys: 12.7 ms, total: 2.94 s
Wall time: 2.97 s
Total function executions: 10485755


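The runtime for 20 periods executed locally is just under three seconds. The total number of function calls grows exponentially with the number of periods (O(2^n)) in this simple example, because each call to `ending_balance` reaches `starting_balance` twice (once directly and once through `accrual`), and each of those calls recurses into the previous period. This clearly doesn't work for a typical 10-year monthly pro forma with hundreds of line items.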
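
As a rough check on that growth rate, the call count can be derived from the structure of the four functions above. The sketch below is illustrative only; the helper names `calls_by_recurrence` and `calls_closed_form` are assumptions and not part of the framework.

```python
def calls_by_recurrence(period: int) -> int:
    """Count the calls made by an uncached ending_balance(period)."""
    starting = 1                # starting_balance(0) is a single call
    ending = 3 + 2 * starting   # itself + draw + accrual, plus starting_balance twice
    for _ in range(period):
        starting = 1 + ending   # starting_balance(n) calls ending_balance(n - 1)
        ending = 3 + 2 * starting
    return ending


def calls_closed_form(period: int) -> int:
    """Closed form of the recurrence above: 10 * 2**n - 5."""
    return 10 * 2 ** period - 5


print(calls_by_recurrence(20))  # 10485755, matching the counter output above
print(calls_closed_form(20))    # 10485755
```
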
Using `cache` (available in Python 3.9+ and equivalent to `lru_cache(maxsize=None)`) cuts the runtime to approximately 50 microseconds.

In [3]:
from functools import cache

call_counter = 0

@cache
def starting_balance(period: int):
    global call_counter
    call_counter += 1
    return 100 if period == 0 else ending_balance(period - 1)

@cache
def draw(period: int):
    global call_counter
    call_counter += 1
    return 5

@cache
def accrual(period: int):
    global call_counter
    call_counter += 1
    return round(starting_balance(period) * 0.08, 2)

@cache
def ending_balance(period: int):
    global call_counter
    call_counter += 1
    return starting_balance(period) + draw(period) + accrual(period)

%time ending_balance(20)
print(f'Total function executions: {call_counter}')

CPU times: user 49 µs, sys: 0 ns, total: 49 µs
Wall time: 50.8 µs
Total function executions: 84


`cache`/`lru_cache` wrappers store results in a dictionary keyed on the call arguments. This creates two challenges: i) arguments must be hashable (which in practice usually means immutable), and ii) when a method is wrapped at the class level, a single cache is shared across all instances.

We will ignore the first issue for now since it likely isn't a problem in many cases. The second issue means that wrapping methods could lead to memory leaks.
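
For reference, the hashability constraint can be seen with a small example. This is a minimal sketch; the `total` function is a hypothetical stand-in and not part of the model.

```python
from functools import cache


@cache
def total(values):
    """Hypothetical cached function that sums a sequence of cash flows."""
    return sum(values)


print(total((1, 2, 3)))  # a tuple is hashable, so the result can be cached

error_message = None
try:
    total([1, 2, 3])     # a list is not hashable, so building the cache key fails
except TypeError as exc:
    error_message = str(exc)

print(error_message)     # unhashable type: 'list'
```

A common workaround is to convert mutable arguments to tuples before the call.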

In [4]:
import gc

class Property:
    
    def __init__(self, sf, rental_rate):
        self.sf = sf
        self.rental_rate = rental_rate
    
    @cache
    def rent(self, period):
        return self.sf * self.rental_rate


prop1 = Property(100_000, 40)
prop2 = Property(50_000, 60)
prop1.rent(1)
prop2.rent(1)
print(f'Cache size with both instances: {prop2.rent.cache_info()}')
del prop1
gc.collect()  # Force a garbage collection cycle to run
print(f'Cache size after first instance removed: {prop2.rent.cache_info()}')

Cache size with both instances: CacheInfo(hits=0, misses=2, maxsize=None, currsize=2)
Cache size after first instance removed: CacheInfo(hits=0, misses=2, maxsize=None, currsize=2)


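Even after a `Property` instance is deleted, the cached values of its methods are never purged. The cache key includes the instance itself (the `self` argument), so the cache also holds a strong reference that keeps the instance from being garbage collected. Including `self` in the key is also why there is no cache key collision between instances.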
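
One way to observe this is to hold a weak reference to an instance and check whether it survives deletion. The sketch below uses a hypothetical `Building` class standing in for `Property`, and the final check relies on CPython's reference counting.

```python
import gc
import weakref
from functools import cache


class Building:  # hypothetical stand-in for the Property class above
    def __init__(self, sf, rental_rate):
        self.sf = sf
        self.rental_rate = rental_rate

    @cache
    def rent(self, period):
        return self.sf * self.rental_rate


building = Building(100_000, 40)
building.rent(1)
ref = weakref.ref(building)

del building
gc.collect()
alive_after_del = ref() is not None    # the class-level cache still references the instance
print(alive_after_del)                 # True

Building.rent.cache_clear()            # dropping the cache releases the instance
gc.collect()
alive_after_clear = ref() is not None
print(alive_after_clear)               # False
```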

An alternative to using `cache` would be `lru_cache` with some defined max size. This would eventually purge old values, but it's impossible to know how large the max size should be since it depends on the number of instances created and unique method calls.
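
For illustration, a bounded cache might look like the sketch below. The `Unit` class and the `maxsize=2` value are arbitrary assumptions chosen to make eviction visible.

```python
from functools import lru_cache


class Unit:  # hypothetical class used only for this illustration
    def __init__(self, sf, rental_rate):
        self.sf = sf
        self.rental_rate = rental_rate

    @lru_cache(maxsize=2)  # arbitrary bound; the right size is hard to know in advance
    def rent(self, period):
        return self.sf * self.rental_rate


units = [Unit(1_000 * i, 40) for i in range(1, 4)]
for unit in units:
    unit.rent(1)

info = Unit.rent.cache_info()
print(info)  # three misses, but only two entries are retained

units[0].rent(1)  # the oldest entry was evicted, so this is computed again
misses_after_repeat = Unit.rent.cache_info().misses
print(misses_after_repeat)  # 4
```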

Ideally, the cache would be separate for each instance and would be removed when the instance is removed. [Descriptors](https://docs.python.org/3/howto/descriptor.html) allow us to generate a new cached method for each instance the first time it is called. This approach also speeds up initialization since caches aren't created until they are needed.

In [5]:
from types import MethodType

class memoize:
    
    def __init__(self, func):
        self.func = func
    
    def __set_name__(self, owner, name):
        self.name = name
    
    def __get__(self, instance=None, cls=None):
        if instance is None:
            return self
        
        bound_func = MethodType(cache(self.func), instance)
        setattr(instance, self.name, bound_func)
        return bound_func

In [6]:
import weakref

class Property:
    
    def __init__(self, sf, rental_rate):
        self.sf = sf
        self.rental_rate = rental_rate
    
    @memoize
    def rent(self, period):
        return self.sf * self.rental_rate

prop = Property(100_000, 40)
# Create weak reference to the func with caching wrapper so that gc will remove it if no other references
cache_ref = weakref.ref(prop.rent)
print(cache_ref())

del prop
gc.collect()
print(cache_ref())

<bound method Property.rent of <__main__.Property object at 0x11aca7fa0>>
None
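
As a further check, the sketch below repeats the `memoize` descriptor from above and uses a hypothetical `Lease` class to suggest that each instance ends up with its own independent cache, stored on the instance itself.

```python
from functools import cache
from types import MethodType


class memoize:
    """Same descriptor as above, repeated so the example is self-contained."""

    def __init__(self, func):
        self.func = func

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance=None, cls=None):
        if instance is None:
            return self
        # Build a fresh cache for this instance and shadow the descriptor with it
        bound_func = MethodType(cache(self.func), instance)
        setattr(instance, self.name, bound_func)
        return bound_func


class Lease:  # hypothetical class used only for this illustration
    def __init__(self, sf, rental_rate):
        self.sf = sf
        self.rental_rate = rental_rate

    @memoize
    def rent(self, period):
        return self.sf * self.rental_rate


lease_a = Lease(100_000, 40)
lease_b = Lease(50_000, 60)
lease_a.rent(1)
lease_a.rent(1)  # second call is served from lease_a's own cache
lease_b.rent(1)

info_a = lease_a.rent.cache_info()
info_b = lease_b.rent.cache_info()
print(info_a)                   # hits=1, misses=1, currsize=1
print(info_b)                   # hits=0, misses=1, currsize=1
print('rent' in vars(lease_a))  # True: the cached method lives in the instance __dict__
```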


Using a callable object as a method wrapper works perfectly fine. Some IDEs have trouble finding the appropriate type hints and function signatures though. For example, PyCharm shows the correct function signature hint for the underlying wrapped function, but VS Code/Pylance simply shows the generic function signature for the `MethodType` wrapper. This seems to be an issue for many function wrappers, including builtins that are part of the base Python distribution. Pylance issues with wrappers seem to be a common question. To be reviewed at some point later.

## Framework Development--Reusable Components and Groupings
There are a few challenges with the single-method-per-line-item approach, including:
* the whole class must be re-defined if any of the functions need to be changed
* code isn't reusable
* it's difficult to navigate for large projects

To start, the `memoize` class is redefined below (as `memoized_series`) to inherit from a generic type and attach a `name` attribute to the wrapped function. These changes allow us to treat functions similarly to the callable container class `SeriesGroup` defined below.

In [7]:
class AbstractSeries:
    pass

class memoized_series(AbstractSeries):  # subclass of new generic type
    
    def __init__(self, func) -> None:
        super().__init__()
        self.func = func
    
    def __set_name__(self, owner, name) -> None:
        self.name = name
        self.func.name = name  # give the function a name property with the same name
    
    def __get__(self, instance=None, cls=None):
        if instance is None:
            return self
        
        bound_func = MethodType(cache(self.func), instance)
        setattr(instance, self.name, bound_func)
        return bound_func

The approach below builds out a composite tree. When a `SeriesGroup` node is called, it will propagate the call down to any child nodes and return the results of all the `memoized_series` methods. The response is a dictionary of `name: value` pairs for each `memoized_series` of the tree.

Whenever an attribute is added to a `SeriesGroup` object, its type is checked. If it is also a `SeriesGroup`, then its `parent` property is automatically set to the containing `SeriesGroup`. Similarly, when an attribute is removed, its type is checked and the `parent` property is cleared if necessary. This allows automatic navigation up the composite tree.
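
The parent bookkeeping can be sketched in isolation. The `Node` class below is a hypothetical, stripped-down stand-in that only tracks parents; it is not the full container class.

```python
class Node:
    """Hypothetical minimal container that keeps `parent` links up to date."""

    def __init__(self, name):
        self.name = name
        self.parent = None

    def __setattr__(self, attr_name, value):
        # When a Node is attached as an attribute, record this node as its parent
        if isinstance(value, Node):
            object.__setattr__(value, 'parent', self)
        object.__setattr__(self, attr_name, value)

    def __delattr__(self, attr_name):
        # When a Node attribute is detached, clear its parent link
        value = getattr(self, attr_name)
        if isinstance(value, Node):
            object.__setattr__(value, 'parent', None)
        object.__delattr__(self, attr_name)

    def root(self):
        # Walk parent links until reaching a node without a parent
        return self if self.parent is None else self.parent.root()


statement = Node('statement')
statement.revenue = Node('revenue')
statement.revenue.americas = Node('americas')

leaf = statement.revenue.americas
root_name = leaf.root().name
print(root_name)  # statement

del statement.revenue.americas
detached = leaf.parent is None
print(detached)   # True
```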

To navigate down the composite tree, attribute types are checked. Any `SeriesGroup` attributes (other than the `parent` property) and any `memoized_series` attributes are considered children.

Since all nodes of the composite tree are linked, any node can reference any other node even if it's on another branch of the tree. The `series` method searches the entire tree structure and returns any series that meets the optional filtering function. If there's no filter, it returns every `memoized_series` in the tree. The `child_series` method searches only the current node and its descendants.

In [8]:
from __future__ import annotations

from typing import Any, Callable, Dict, List

class SeriesGroup(AbstractSeries):
    
    def __init__(self, name: str, parent: SeriesGroup=None) -> None:
        super().__init__()
        self.name = name
        self.parent = parent
    
    @classmethod  # convenience function for initializing with children nodes
    def with_children(cls, name: str, children: Dict[str, SeriesGroup]) -> SeriesGroup:
        new_sg = cls(name)
        # use distinct loop names so the `name` argument is not shadowed
        for child_name, child in children.items():
            setattr(new_sg, child_name, child)
        return new_sg
    
    def __setattr__(self, __name: str, __value: Any) -> None:
        """If setting a SeriesGroup as an attribute, set the attribute's `parent` property to self"""
        if isinstance(__value, SeriesGroup):
            super(SeriesGroup, __value).__setattr__('parent', self)
        return super().__setattr__(__name, __value)
    
    def __delattr__(self, __name: str) -> None:
        """If removing a SeriesGroup attribute, set the `parent` property to None"""
        attr = getattr(self, __name)
        if isinstance(attr, SeriesGroup):
            attr.parent = None
        return super().__delattr__(__name)
    
    def __call__(self, *args: Any, **kwds: Any) -> Any:
        series_values = {series.name: series(*args, **kwds) for series in self._series()}
        return flatten_dict(series_values)
    
    def _series(self) -> List[AbstractSeries]:
        """Get node's children series"""
        attrs = []
        seen_names = set()  # attribute names already collected, to avoid duplicates
        is_child_series = lambda name, attr: False if name == 'parent' else isinstance(attr, AbstractSeries)

        # get instance attributes
        if hasattr(self, '__dict__'):
            for name, value in self.__dict__.items():
                if is_child_series(name, value):
                    seen_names.add(name)
                    attrs.append(getattr(self, name))

        # get type attributes (descriptors defined on the class or its bases)
        bases = type(self).__mro__

        for base in bases:
            for name, attr in base.__dict__.items():
                if is_child_series(name, attr) and (name not in seen_names):
                    seen_names.add(name)
                    attrs.append(getattr(self, name))

        return attrs
    
    def root(self) -> SeriesGroup:
        """Return tree root node"""
        return self.parent.root() if self.parent is not None else self
    
    def series(self, function: Callable | None = None) -> List[SeriesGroup]:
        """
        Return series in the tree where the test function returns `True` or all tree series if function is None
        The test function should take a single AbstractSeries parameter and return a bool
        """
        return self.root().child_series(function=function)
    
    def child_series(self, function: Callable | None = None) -> List[SeriesGroup]:
        """
        Return children series where the test function returns `True` or all children series if function is None
        The test function should take a single AbstractSeries parameter and return a bool
        """
        series = []
        
        children = self._series()
        
        for child in children:
            # If there's a test condition, add child if it passes the test regardless of whether it's a SG or memoized_series
            # If SG, propagate call to children as well
            if function is not None:
                if function(child):
                    series.append(child)
                
                if isinstance(child, SeriesGroup):
                    series.extend(child.child_series(function=function))
            
            # If no test condition, then propagate call if SG, otherwise add to series
            else:
                if isinstance(child, SeriesGroup):
                    series.extend(child.child_series(function=function))
                else:
                    series.append(child)
        
        return series
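
Note that `flatten_dict`, used in `__call__` above, is not defined in this section. One possible implementation is sketched below, assuming the intent is to merge nested group results into a single level of `name: value` pairs. The sample values are made up for illustration.

```python
from typing import Any, Dict


def flatten_dict(nested: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: merge nested dictionaries into one level, keeping leaf keys."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            # A nested group result: pull its leaf values up to the top level
            flat.update(flatten_dict(value))
        else:
            flat[key] = value
    return flat


# Illustrative result of calling a group that contains one series and one subgroup
group_result = {'rent': 4_000_000, 'capital': {'draw': 5, 'accrual': 8.0}}
flat_result = flatten_dict(group_result)
print(flat_result)  # {'rent': 4000000, 'draw': 5, 'accrual': 8.0}
```

With this approach, series that share a name on different branches would overwrite each other, so a variant that prefixes keys with the group name may be preferable.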

One great benefit to holding children as regular attributes--rather than, say, in a list--is that it allows navigation over the tree both by directly accessing attributes _and_ by search conditions. For example, if you had an income statement model for a multi-national consumer business and wanted to see retention for a specific cohort of a specific product, you might access that data directly with something like:

```
statement.revenue.americas.productyyy.cohortzzz.retention(...)
```

Alternatively, if you didn't know the structure of the model (or were creating a component class that was agnostic to model structure), you might find the same values by searching the tree with something like:

```
statement.series(lambda s: isinstance(s, RetentionClass) and (s.region == 'americas') and ...)
```

This approach isn't without tradeoffs. Since all children are regular attributes, they must have unique names within their parent node, which adds complexity.

Additionally, search results currently aren't stored. Searching the entire tree every time a series is called with new parameters can significantly slow down results depending on the complexity of the tree. Accessing references directly when they are known ahead of time (for example, within the same class or required child attributes) improves performance.

Currently there isn't an easy way to clear `memoized_series` caches. This means the flexibility of dynamically looking up references to other series is compromised by the fact that functions within the tree may already store results from previous references. In general, accidentally using outdated results is a significant risk when caching functions that are not pure. Some ideas to help manage this risk include i) running calculation cycles in a context manager that clears cached results on entering and exiting the context, and ii) creating a new instance for every calculation cycle that is thrown away afterwards.
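
The first idea might look something like the sketch below. It uses a plain cached function rather than the framework classes, and the `calculation_cycle` helper and `inputs` dictionary are hypothetical names introduced only for this illustration.

```python
from contextlib import contextmanager
from functools import cache

inputs = {'rental_rate': 40}  # mutable input, which makes rent() impure


@cache
def rent(period: int) -> int:
    return 100_000 * inputs['rental_rate']


@contextmanager
def calculation_cycle(*cached_funcs):
    """Hypothetical helper: clear the given caches on entering and exiting."""
    for func in cached_funcs:
        func.cache_clear()
    try:
        yield
    finally:
        for func in cached_funcs:
            func.cache_clear()


with calculation_cycle(rent):
    first = rent(1)           # computed with rental_rate = 40

rent(1)                       # cached outside of any cycle
inputs['rental_rate'] = 50
stale = rent(1)               # still reflects the old rental rate

with calculation_cycle(rent):
    second = rent(1)          # cache was cleared on entry, so the new rate is used

print(first, stale, second)   # 4000000 4000000 5000000
```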