# Modeling Cash Flows in Python

In [1]:
import pandas as pd

## Framework Development--First Steps
The goal with this framework is to create a way to easily model cash flows in Python. It should be easy for people coming from a traditional Excel-based underwriting background to understand. As a result, heavy inspiration comes from typical Excel model patterns.

Modeling Excel rows as functions with time periods as arguments is a natural place to start. The capital account below is a basic example. Even for a simple model though, function execution ends up being highly iterative.

In [2]:
call_counter = 0

def starting_balance(period: int):
    global call_counter
    call_counter += 1
    return 100 if period == 0 else ending_balance(period - 1)

def draw(period: int):
    global call_counter
    call_counter += 1
    return 5

def accrual(period: int):
    global call_counter
    call_counter += 1
    return round(starting_balance(period) * 0.08, 2)

def ending_balance(period: int):
    global call_counter
    call_counter += 1
    return starting_balance(period) + draw(period) + accrual(period)

%time ending_balance(20)
print(f'Total function executions: {call_counter}')

CPU times: user 2.86 s, sys: 7.89 ms, total: 2.86 s
Wall time: 2.87 s
Total function executions: 10485755


The runtime for 20 periods executed locally is just under three seconds. The total number of function calls is exponential (O(2^n)) in this simple example. This clearly doesn't work for a typical 10 monthly pro forma with hundreds of line items.
Using `cache` (equivalent to `lru_cache(maxsize=None)` available in 3.10+) cuts runtime down to approximately 60 microseconds.

In [3]:
from functools import cache

call_counter = 0

@cache
def starting_balance(period: int):
    global call_counter
    call_counter += 1
    return 100 if period == 0 else ending_balance(period - 1)

@cache
def draw(period: int):
    global call_counter
    call_counter += 1
    return 5

@cache
def accrual(period: int):
    global call_counter
    call_counter += 1
    return round(starting_balance(period) * 0.08, 2)

@cache
def ending_balance(period: int):
    global call_counter
    call_counter += 1
    return starting_balance(period) + draw(period) + accrual(period)

%time ending_balance(20)
print(f'Total function executions: {call_counter}')

CPU times: user 47 µs, sys: 1 µs, total: 48 µs
Wall time: 51 µs
Total function executions: 84


`cache`/`lru_cache` wrappers store cached values in a dictionary with keys equal to the hashed argument parameters. This creates two challenges: i) arguements must be hashable (i.e. immutable), and ii) caches will be stored across instances if wrapped at the class level.

We will ignore the first issue for now since it likely isn't an issue in many cases. The second issue means wrapping methods will could lead to memory leaks.

In [4]:
import gc

class Property:
    
    def __init__(self, sf, rental_rate):
        self.sf = sf
        self.rental_rate = rental_rate
    
    @cache
    def rent(self, period):
        return self.sf * self.rental_rate


prop1 = Property(100_000, 40)
prop2 = Property(50_000, 60)
prop1.rent(1)
prop2.rent(1)
print(f'Cache size with both instances: {prop2.rent.cache_info()}')
del prop1
gc.collect()  # Force a garbage collection cycle to run
print(f'Cache size after first instance removed: {prop2.rent.cache_info()}')

Cache size with both instances: CacheInfo(hits=0, misses=2, maxsize=None, currsize=2)
Cache size after first instance removed: CacheInfo(hits=0, misses=2, maxsize=None, currsize=2)


Even if a property instance is deleted, the cached values of its methods are never purged even after it is removed. Note that there isn't cache key collision because the instance `self` parameter is included in the key.

An alternative to using `cache` would be `lru_cache` with some defined max size. This would eventually purge old values, but it's impossible to know how large the max size should be since it depends on the number of instances created and unique method calls.

Ideally, the cache would be separate for each instance and would be removed when the instance is removed. [Descriptors](https://docs.python.org/3/howto/descriptor.html) allow us to generate a new cached method for each instance the first time it is called. This approach also speeds up initialization since caches aren't created until they are needed.

In [5]:
from types import MethodType

class memoize:
    
    def __init__(self, func):
        self.func = func
    
    def __set_name__(self, owner, name):
        self.name = name
    
    def __get__(self, instance=None, cls=None):
        if instance is None:
            return self
        
        bound_func = MethodType(cache(self.func), instance)
        setattr(instance, self.name, bound_func)
        return bound_func

In [6]:
import weakref

class Property:
    
    def __init__(self, sf, rental_rate):
        self.sf = sf
        self.rental_rate = rental_rate
    
    @memoize
    def rent(self, period):
        return self.sf * self.rental_rate

prop = Property(100_000, 40)
# Create weak reference to the func with caching wrapper so that gc will remove it if no other references
cache_ref = weakref.ref(prop.rent)
print(cache_ref())

del prop
gc.collect()
print(cache_ref())

<bound method Property.rent of <__main__.Property object at 0x11ca0a080>>
None


Using a callable object as a method wrapper works perfectly fine. Some IDE's have trouble finding the appropriate type hints and function signatures though. For example, PyCharm shows correct function signature hint for the underlying wrapped function, but VS Code/Pylance simply shows the generic function signature for the `MethodType` wrapper. This seems to be an issue for many function wrappers, including builtins that are part of the base Python distribution. Pylance issues with wrappers seems to be a common question. To be reviewed at some point later.

## Framework Development--Reusable Components and Groupings
There are a few challenges with the single-method-per-line-item approach, including:
* the whole class must be re-defined if any of the functions need to be changed
* code isn't reusable
* it's difficult to navigate for large projects

The two approaches below explore different ways of creating a container object to group row functions. In both cases, if the object is called it will pass any arguments to any `memoize`d methods or recursively to any sub-container objects and return the results in a `attr_name: value` dictionary.

### A. Dynamic attribute look-up approach
This approach finds all `memoize`d and `SeriesGroup` attributes over the instance's dict and entire type hierarchy when the object is called. This approach ensures the relevant attributes are never missed. 

In models with hundreds of lines, it's much easier to read the results if they are in the expected top-down order from revenues to net cash flows. One challenge with this approach is that the order of outputs can't be easily customized. The easiest orders to implements are either i) some variant of attribute definition order or ii) alphabetical.

In [7]:
from utils import AbstractSeries, flatten_dict, memoize

In [8]:
class SeriesGroupIterator:  # Iterator for SeriesGroup
    
    def __init__(self, series_group) -> None:
        self._series = iter(series_group.get_series().values())
    
    def __next__(self):
        try:
            return next(self._series)
        except StopIteration:
            raise StopIteration


class SeriesGroup(AbstractSeries):
    
    def __init__(self, name) -> None:
        super().__init__()
        self.name = name
    
    def __call__(self, *args, **kwargs):
        series = self.get_series()
        series_values = {name: s(*args, **kwargs) for name, s in series.items()}
        return flatten_dict(series_values)
    
    def __iter__(self):
        return SeriesGroupIterator(self)
    
    def get_series(self):
        attrs = {name: getattr(self, name) for name in dir(self)}
        series = {name: attr for name, attr in attrs.items() if isinstance(attr, AbstractSeries) or (isinstance(getattr(type(self), name, None), AbstractSeries))}
        return series
    
    @classmethod
    def with_series(cls, name, series):
        new_group = cls(name)
        for s in series:
            setattr(new_group, s.name, s)
        return new_group

`AbstractSeries` is an empty parent class for both `memoize` and `SeriesGroup` that makes it easier to identify which object attributes should be rolled up. The `flatten_dict` function flattens nested dictionarys with keys set equal to concatenated sub-dict keys. Concatenated keys could lead to name collision (future note to avoid collisions).

The code snippet below shows how `SeriesGroups` can be used to build up more complex cash flows.

In [9]:
class GrossLease(SeriesGroup):
    
    def __init__(self, name, sf, rent_psf) -> None:
        super().__init__(name)
        self.sf = sf
        self.rent_psf = rent_psf

    @memoize
    def base_rent(self, period):
        return self.sf * self.rent_psf


class DynamicProperty(SeriesGroup):
    
    def __init__(self, name, leases, opex_pct) -> None:
        super().__init__(name)
        self.opex_pct = opex_pct
        self.leases = leases
    
    @memoize
    def total_rent(self, period):
        return sum([l.base_rent(period) for l in self.leases])
    
    @memoize
    def opex(self, period):
        return self.total_rent(period) * self.opex_pct
    
    @memoize
    def noi(self, period):
        return self.total_rent(period) - self.opex(period)

property = DynamicProperty(
    name='123mainst',
    leases=SeriesGroup('leases', 
                       GrossLease('tenant_a', sf=50_000, rent_psf=40), 
                       GrossLease('tenant_b', sf=25_000, rent_psf=50)),
    opex_pct=0.55)

In [10]:
pd.DataFrame([property(y) for y in range(10)]).T.style.format('{:,.0f}')

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
leases.tenant_a.base_rent,2000000,2000000,2000000,2000000,2000000,2000000,2000000,2000000,2000000,2000000
leases.tenant_b.base_rent,1250000,1250000,1250000,1250000,1250000,1250000,1250000,1250000,1250000,1250000
noi,1462500,1462500,1462500,1462500,1462500,1462500,1462500,1462500,1462500,1462500
opex,1787500,1787500,1787500,1787500,1787500,1787500,1787500,1787500,1787500,1787500
total_rent,3250000,3250000,3250000,3250000,3250000,3250000,3250000,3250000,3250000,3250000


### B. Stored attribute list approach
This approach stores the names of attributes that should be rolled up in a list property. The approach does not require searching through every attribute on every call and gives users an easy way to customize the order of outputs. However, it means the list must be updated anytime attributes are added or removed.

In [11]:
def get_hierarchy_attr(cls, filter=None):
    bases = cls.__mro__
    if bases:
        bases = list(bases)
        bases.reverse()
    
    attributes = []

    if filter:
        for base in bases:
            for name, attr in vars(base).items():
                if (name not in attributes) and filter(name, attr):
                    attributes.append(name)
    else:
        for base in bases:
            for name, attr in vars(base).items():
                if (name not in attributes):
                    attributes.append(name)
    return attributes


In [12]:
class SeriesIterator:  # Iterator class
    
    def __init__(self, series) -> None:
        self._series = series
        self._names_iter = iter(series.sub_series)
        self._sub_series_lenth = len(series.sub_series)
    
    def __next__(self):
        if len(self._series.sub_series) != self._sub_series_lenth:
            raise RuntimeError('Number of sub-series changed during iteration')
        try:
            name = next(self._names_iter)
            return getattr(self._series, name)
        except StopIteration:
            raise StopIteration


class Series(AbstractSeries):
    
    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name
        self.sub_series = get_hierarchy_attr(type(self), lambda name, value: isinstance(value, AbstractSeries))
    
    def __call__(self, *args, **kwds):
        series_values = {name: getattr(self, name)(*args, **kwds) for name in self.sub_series}
        return flatten_dict(series_values)
    
    def __iter__(self):
        return SeriesIterator(self)
    
    def add_series(self, series, index: int = -1):
        for s in series:
            setattr(self, s.name, s)
            self.sub_series.insert(index, s.name)
    
    @classmethod
    def with_sub_series(cls, name, series):
        new_series = cls(name)
        new_series.add_series(series)
        return new_series

This approach stores attributes that should be returned when the object is called in a list named `sub_series`. If any attributes are added, they need be be added to the list either directly or with the convenience method `add_series`. The order that names appear in the `sub_series` list determines the order of results when the object is called.

In [13]:
class GrossLease(Series):
    
    def __init__(self, name, sf, rent_psf) -> None:
        super().__init__(name)
        self.sf = sf
        self.rent_psf = rent_psf
        
    @memoize
    def base_rent(self, period):
        return self.sf * self.rent_psf    


class StoredAttrProperty(Series):
    
    def __init__(self, name, leases, opex_pct) -> None:
        super().__init__(name)
        self.opex_pct = opex_pct
        self.add_series([Series.with_sub_series('leases', leases)], index=0)
    
    @memoize
    def total_rent(self, period):
        return sum([l.base_rent(period) for l in self.leases])
    
    @memoize
    def opex(self, period):
        return self.total_rent(period) * self.opex_pct
    
    @memoize
    def noi(self, period):
        return self.total_rent(period) - self.opex(period)


property = StoredAttrProperty(
    name='123mainst',
    leases = [GrossLease('tenant_a', sf=50_000, rent_psf=40), 
              GrossLease('tenant_b', sf=25_000, rent_psf=50)],
    opex_pct=0.45)

In [14]:
pd.DataFrame([property(y) for y in range(10)]).T.style.format('{:,.0f}')

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
leases.tenant_b.base_rent,1250000,1250000,1250000,1250000,1250000,1250000,1250000,1250000,1250000,1250000
leases.tenant_a.base_rent,2000000,2000000,2000000,2000000,2000000,2000000,2000000,2000000,2000000,2000000
total_rent,3250000,3250000,3250000,3250000,3250000,3250000,3250000,3250000,3250000,3250000
opex,1462500,1462500,1462500,1462500,1462500,1462500,1462500,1462500,1462500,1462500
noi,1787500,1787500,1787500,1787500,1787500,1787500,1787500,1787500,1787500,1787500


### Approach comparison
There's no meaningful difference in performance between the two approaches. If it turns out that the ordering issue with the dynamic approach isn't a big deal (or there's a way to add custom orders), then its simplicity is preferrable over maintaining a list of sub-series attributes.

In [15]:
def dynamic_look_up():
    DynamicProperty(
        name='123mainst',
        leases=SeriesGroup('leases', 
                           GrossLease('tenant_a', sf=50_000, rent_psf=40), 
                           GrossLease('tenant_b', sf=25_000, rent_psf=50)),
        opex_pct=0.55)
    
    return [property(p) for p in range(120)]

%timeit dynamic_look_up()

1.9 ms ± 21.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [16]:
def stored_attr():
    StoredAttrProperty(
        name='123mainst',
        leases=SeriesGroup('leases', 
                           GrossLease('tenant_a', sf=50_000, rent_psf=40), 
                           GrossLease('tenant_b', sf=25_000, rent_psf=50)),
        opex_pct=0.55)
    
    return [property(p) for p in range(120)]

%timeit stored_attr()

1.93 ms ± 19.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
