# Aggregating entities

This notebook is a proof of concept of portfolios, which are just collections of workbooks. Basically:
    
* Portfolios are a tree structure: a portfolio can contain multiple books or other portfolios.
* A portfolio should have the same API as a single book, specifically: the tickets in a portfolio are the set of tickets in
the underlying books, the workItems in a portfolio are the net of the workItems in the underlying books.
* Books can appear in multiple portfolios: the books hold activity, each portfolio is a view useful to somebody 
(e.g. a manager).


A real-world example might be a helpdesk with a group of workbooks, for example:
* US-L1 support
* US-L2 support
* US-hardware-group
* HongKong-L1 support
* HongKong-L2 support
* HongKong-hardware-group
* Singapore-L1 support
* Singapore-L2 support
* Singapore-hardware-group
* Global-support

In addition to the people monitoring the individual workflows, various managers, auditors, etc., might want to
monitor aggregations, e.g.:
* HongKong-all: all HongKong books
* Asia-all: all HongKong and Singapore books
* Hardware-global: all the regional hardware books
* Global-help: all L1 and all L2 and all hardware and global support
        
Note that there is not a single tree of aggregations: depending on the use case, people may want to aggregate workbooks
in different ways. 
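To make the "no single tree" point concrete, here is a minimal standalone sketch in plain Python. It does not use the `mand.core` API; the abbreviated book names and the helpers `select` and `books_in` are assumptions made up for illustration. It shows one book being reachable from several independent aggregation views.

```python
# Hypothetical stand-in data: abbreviated versions of the helpdesk books above.
BOOKS = [
    'US-L1', 'US-L2', 'US-hardware',
    'HongKong-L1', 'HongKong-L2', 'HongKong-hardware',
    'Singapore-L1', 'Singapore-L2', 'Singapore-hardware',
    'Global-support',
]

def select(prefix=None, suffix=None):
    """Return the books whose names match the given prefix and/or suffix."""
    return [b for b in BOOKS
            if (prefix is None or b.startswith(prefix))
            and (suffix is None or b.endswith(suffix))]

# Each view is a named collection; a view may also nest other views.
views = {
    'HongKong-all': select(prefix='HongKong'),
    'Singapore-all': select(prefix='Singapore'),
    'Hardware-global': select(suffix='hardware'),
}
views['Asia-all'] = ['HongKong-all', 'Singapore-all']

def books_in(name):
    """Flatten a view (or a single book) into a sorted list of book names."""
    if name not in views:
        return [name]              # a leaf: an actual book
    found = set()
    for child in views[name]:
        found.update(books_in(child))
    return sorted(found)

# The same book shows up in three unrelated views.
hk_hw_views = [v for v in sorted(views) if 'HongKong-hardware' in books_in(v)]
print(hk_hw_views)  # ['Asia-all', 'Hardware-global', 'HongKong-all']
```

The views overlap freely because they only refer to books; none of them owns the activity.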
    
*This notebook also introduces a few implementation concepts, so look for the [Core] or [Test] tags below.*

## Multiple timelines [DBA]

This notebook is about managing the trees of portfolios rather than implementing the mechanics of workItem
aggregation.

In any event, the shape of the portfolio trees should be on a different timeline from the items contained within
the portfolios themselves: that way we can rearrange our portfolio aggregations without worrying about messing up
our ability to regenerate old reports.

This is a key idea. While I hate to introduce accounting systems (yes, we will build one soon), we are trying to
head off two problems by implementing multiple trees and multiple timelines. Here are the problems (and proposed
solutions) from an accounting website that is explaining the idea of a chart of accounts:
    
* *Consistency. It is of some importance to initially create a chart of accounts that is unlikely to change for 
several years, so that you can compare the results in the same account over a multi-year period.
If you start with a small number of accounts and then gradually expand the number of accounts over time, 
it becomes increasingly difficult to obtain comparable financial information for more than the past year.*
* *Lock down. Do not allow subsidiaries to change the standard chart of accounts without a very good reason, 
since having many versions in use makes it more difficult to consolidate the results of the business.*

That seems like two hacks. We fix the first with multiple timelines, and we fix the second with multiple portfolio
trees.
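The idea of keeping tree shape on its own timeline can be sketched without any database. The following is a minimal illustration under stated assumptions: `ShapeTimeline`, `total`, the integer times, and the sample amounts are all hypothetical and are not how `mand.core` stores events. The point is only that a report takes two cutoffs, one for shape and one for items, so rearranging the tree later does not change a regenerated old report.

```python
import bisect

class ShapeTimeline(object):
    """Hypothetical: the history of one portfolio's children, kept apart from item history."""

    def __init__(self):
        self._times = []       # sorted event times
        self._children = []    # children recorded at each time

    def set_children(self, time, children):
        i = bisect.bisect_right(self._times, time)
        self._times.insert(i, time)
        self._children.insert(i, list(children))

    def children(self, cutoff=None):
        """Children as of `cutoff`; with no cutoff, the latest shape."""
        if cutoff is None:
            i = len(self._times)
        else:
            i = bisect.bisect_right(self._times, cutoff)
        return self._children[i - 1] if i else []

# Item activity lives on a separate timeline: (time, book, amount).
items = [(1, 'checking', 100), (2, 'savings', 50), (5, 'checking', -30)]

def total(shape, shape_cutoff, item_cutoff):
    """Sum item amounts up to item_cutoff for the books visible at shape_cutoff."""
    books = set(shape.children(shape_cutoff))
    return sum(amt for t, book, amt in items
               if t <= item_cutoff and book in books)

banking = ShapeTimeline()
banking.set_children(0, ['checking', 'savings'])
banking.set_children(4, ['checking'])      # a later rearrangement

# Regenerating the time-3 report still uses the time-3 shape.
old_report = total(banking, shape_cutoff=3, item_cutoff=3)
new_report = total(banking, shape_cutoff=None, item_cutoff=5)
print(old_report)  # 150
print(new_report)  # 70
```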

In [1]:
import mand.core

from mand.core import Entity, node, Context

from mand.core import ObjectDb, DynamoDbDriver, ddb, _tr, Timestamp
from mand.core import RefData, RefDataUpdateEvent, Workbook, PrintMonitor, SummaryMonitor

rawdb = DynamoDbDriver(ddb)
db = ObjectDb(rawdb)

pClock = _tr.Clock('Portfolio', db=db).write()

## Implementing trees in an OO-database [BA]

Trees of objects are easy to implement in object databases. These would be a struggle in a relational database.
Note, however, that a naive implementation (and this is a naive one) results in a lot of small round-trip messages
to the underlying database.
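To illustrate the round-trip cost, here is a small standalone sketch. `CountingStore` is a hypothetical in-memory stand-in for a remote store, and its `get_many` batch call is an assumption for illustration, not something the driver used in this notebook is known to provide. A naive depth-first walk pays one request per node, while a level-by-level walk pays one request per tree level.

```python
class CountingStore(object):
    """Hypothetical in-memory stand-in for a remote object store."""

    def __init__(self, children):
        self._children = children    # name -> list of child names
        self.round_trips = 0

    def get(self, name):
        self.round_trips += 1
        return self._children.get(name, [])

    def get_many(self, names):
        self.round_trips += 1        # one request, many objects
        return dict((n, self._children.get(n, [])) for n in names)

# Illustrative tree; names missing from the dict are leaves.
TREE = {
    'Family': ['Banking', 'Derivs', 'Kids', 'misc'],
    'Banking': ['checking', 'savings'],
    'Derivs': ['Trading'],
    'Trading': ['brokerage', 'margin', '401K'],
    'Kids': ['trust1', 'trust2'],
}

def walk_naive(store, name):
    """Depth-first walk: one round trip per node visited."""
    names = [name]
    for child in store.get(name):
        names.extend(walk_naive(store, child))
    return names

def walk_batched(store, name):
    """Breadth-first walk: one round trip per tree level."""
    names, level = [], [name]
    while level:
        names.extend(level)
        fetched = store.get_many(level)
        level = [c for n in level for c in fetched[n]]
    return names

naive, batched = CountingStore(TREE), CountingStore(TREE)
naive_names = walk_naive(naive, 'Family')
batched_names = walk_batched(batched, 'Family')
print(naive.round_trips)    # 13
print(batched.round_trips)  # 4
```

Both walks visit the same 13 nodes; only the number of requests differs.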

In [2]:
class Book(Workbook):
    
    def books(self):
        return [self]

    def prn(self, depth=0):
        print '  '*depth, self.meta.name()
            
class PortfolioUpdateEvent(RefDataUpdateEvent):

    @node(stored=True)
    def children(self):
        return []
    
class Portfolio(RefData):

    evCls = PortfolioUpdateEvent
    
    @node
    def clock(self):
        return _tr.Clock.get('Portfolio', db=self.meta.db)
    
    @node
    def children(self):
        evs = self.activeEvents()
        if evs:
            return evs[-1].children()
        else:
            return []
        
    def setChildren(self, children, validTime=None, amends=None):
        # Avoid a shared mutable default argument; an omitted amends means no amendments.
        ev = self.evCls(entity=self, amends=amends or [], children=children, db=self.meta.db)
        ev.write(validTime=validTime)
        return ev
    
    @node
    def books(self):
        books = set()
        for c in self.children():
            for b in c.books():
                if b in books:
                    print 'LogMessage: Oops, book appears multiple times'
                books.add(b)
        return list(books)

    def prn(self, depth=0):
        print '  '*depth, self.meta.name()
        for c in self.children():
            c.prn(depth+1)
            
_tr.add(Book)
_tr.add(Portfolio)
_tr.add(PortfolioUpdateEvent)

## Example: creating some books and portfolios [User]

Just a set of accounts that an upper-middle-class family might use to manage its finances.

In [3]:
with db:
    checking = Book('checking').write()
    savings = Book('savings').write()
    brokerage = Book('brokerage').write()
    misc = Book('misc').write()
    margin = Book('margin').write()
    retirement = Book('401K').write()
    kid1Trust = Book('trust1').write()
    kid2Trust = Book('trust2').write()

    pAll = Portfolio('Family').write()
    pKids = Portfolio('Kids').write()
    pBanking = Portfolio('Banking').write()
    pTrading = Portfolio('Trading').write()
    pDerivs = Portfolio('Derivs').write()

def info():
    c = [ o.meta.name() for o in pAll.children() ]
    b = [ o.meta.name() for o in pAll.books() ]
    print 'All children:', c
    print 'All books   :', b

### The top account (Family) should have nothing in it yet [User]

In [4]:
info()

All children: []
All books   : []


### Looking at the family account in a different context [Test]

This should return empty trees as well.

This is here as a test: if we start mutating our portfolio tree, we would expect the tree in *ctx2* to respect
the changes.

In [5]:
ctx2 = Context({})
with ctx2:
    info()

All children: []
All books   : []


### Making a tree of accounts [User]

Just adds some events to establish a simple chart of accounts...

In [6]:
pKids.setChildren([kid1Trust, kid2Trust])
pBanking.setChildren([checking, savings])
pTrading.setChildren([brokerage, margin, retirement])
pDerivs.setChildren([pTrading])
pAll.setChildren([pBanking, pDerivs, pKids, misc])

<__main__.PortfolioUpdateEvent at 0x107f0ad90>

### Everything is lazy [Test]

We have set up the tree, but we haven't evaluated what our books are yet, so we don't yet have our dependencies:

In [7]:
n = Context.current().getCBM(pAll.books)
n.printInputGraph()

 <Portfolio@107f15c10/Portfolio:books in Root> *not evaluated*


### Showing information about the top portfolio [Test]

In [8]:
pAll.prn()
print

info()
print

 Family
   Banking
     checking
     savings
   Derivs
     Trading
       brokerage
       margin
       401K
   Kids
     trust1
     trust2
   misc

All children: ['Banking', 'Derivs', 'Kids', 'misc']
All books   : ['misc', 'trust2', 'brokerage', 'checking', '401K', 'trust1', 'savings', 'margin']



# Current time vs bounded time state [Core]

The current system has the nice property that querying the head state of the datastore is similar to querying
a historical version of the datastore: the only difference is that a historical query is executed within a 
context that has dynamically established cutoffs in time, thus fixing database object visibility.

Because we may cache the results of computations, a naive implementation could result in our current state being
cached and thus out-of-date as the database has received new updates. That seems counter-intuitive when our own
process is doing something reasonable such as creating portfolio trees.

Alternatively, we could avoid caching any computation results that relied upon database state. That might
cause a lot of unneeded, if conservative, recomputation.

We could also try to make events be aware of the compute cache, and let them invalidate calculations as appropriate
when they are written to the database. That seems good in terms of keeping the local process reasonable and 
consistent, but it sounds like an ugly implementation.

A reasonable design for current state would seem to be:
* what you read or compute is what you get. And it gets cached.
* if our process writes to the underlying database, values that depended on the db state will become invalid and 
need recompute; values that didn't depend on it remain unaffected.
* if you want to recalculate everything based on the current db state, there should be an easy way to describe that.
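The three rules above can be sketched as a toy cache. This is a minimal illustration, not the notebook's implementation: `TinyCache`, its `reads_db` flag, and `refresh` are hypothetical names, and a real system would track dependencies automatically rather than asking the caller to declare them.

```python
class TinyCache(object):
    """Hypothetical cache: each value records whether it depended on db state."""

    def __init__(self):
        self.db = {}
        self._cache = {}           # key -> (value, depends_on_db)
        self.computations = 0

    def get(self, key, fn, reads_db):
        # Rule 1: what you compute is what you get, and it gets cached.
        if key not in self._cache:
            self.computations += 1
            self._cache[key] = (fn(self.db), reads_db)
        return self._cache[key][0]

    def write(self, name, value):
        # Rule 2: a write invalidates only values that depended on db state.
        self.db[name] = value
        self._cache = dict((k, v) for k, v in self._cache.items() if not v[1])

    def refresh(self):
        # Rule 3: an easy way to recompute everything from current state.
        self._cache.clear()

def count_children(db):
    return len(db['children'])

def constant(db):
    return 42

cache = TinyCache()
cache.write('children', ['checking'])

first = cache.get('n', count_children, reads_db=True)   # computed
cache.get('c', constant, reads_db=False)                 # computed
cache.get('n', count_children, reads_db=True)            # served from cache
before_write = cache.computations

cache.write('children', ['checking', 'savings'])
second = cache.get('n', count_children, reads_db=True)  # recomputed
cache.get('c', constant, reads_db=False)                 # still cached
after_write = cache.computations

print(first, second)              # 1 2
print(before_write, after_write)  # 2 3
```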

The design is provisionally implemented with a special object, *cosmicAll*: time-unbounded reads depend on it,
bounded-time reads don't. Note the dependencies below: the first one is on line 5 or so...

In [9]:
n.printInputGraph()

 <Portfolio@107f15c10/Portfolio:books in Root>
   <Portfolio@107f15c10/Portfolio:children in Root>
     <Clock@107e2c450/Clock:cutoffs in Root>
       <Clock@107e2c450/Clock:parent in Root>
         <RootClock@107e2c210/RootClock:cutoffs in Root>
           <RootClock@107e2c210/RootClock:cosmicAll in Root>
           <CosmicAll@107df4350/CosmicAll:dbState in Root>
         <Clock@107e2c450/Entity:clock in Root>
       <RootClock@107e2c210/RootClock:cutoffs in Root>
         <RootClock@107e2c210/RootClock:cosmicAll in Root>
         <CosmicAll@107df4350/CosmicAll:dbState in Root>
     <PortfolioUpdateEvent@107f0ad90/PortfolioUpdateEvent:children in Root>
     <Portfolio@107f15c10/Portfolio:clock in Root>
     <PortfolioUpdateEvent@107f0ad90/Event:amends in Root>
   <Portfolio@107f15790/Portfolio:books in Root>
     <Portfolio@107f15790/Portfolio:children in Root>
       <Clock@107e2c450/Clock:cutoffs in Root>
         <Clock@107e2c450/Clock:parent in Root>
           <RootClock@107e2c21

## Bounded-time read [Test]

Like above, but now we should not be depending on the cosmic all.

In [10]:
ts = Timestamp()
with Context({pClock.cutoffs: ts}):
    b = pAll.books()
    print
    print
    n = Context.current().getCBM(pAll.books)
    n.printInputGraph()
    info()



 <Portfolio@107f15c10/Portfolio:books in Root:4350922648>
   <Portfolio@107f15c10/Portfolio:children in Root:4350922648>
     <PortfolioUpdateEvent@107f0ad90/PortfolioUpdateEvent:children in Root:4350922648>
     <Portfolio@107f15c10/Portfolio:clock in Root:4350922648>
     <PortfolioUpdateEvent@107f0ad90/Event:amends in Root:4350922648>
     <Clock@107e2c450/Clock:cutoffs in Root:4350922648>
   <Portfolio@107f15790/Portfolio:books in Root:4350922648>
     <Portfolio@107f15790/Portfolio:children in Root:4350922648>
       <PortfolioUpdateEvent@107e2cd50/PortfolioUpdateEvent:children in Root:4350922648>
       <Portfolio@107f15790/Portfolio:clock in Root:4350922648>
       <PortfolioUpdateEvent@107e2cd50/Event:amends in Root:4350922648>
       <Clock@107e2c450/Clock:cutoffs in Root:4350922648>
     <Portfolio@107f0a590/Portfolio:books in Root:4350922648>
       <Portfolio@107f0a590/Portfolio:children in Root:4350922648>
         <PortfolioUpdateEvent@107f48e90/PortfolioUpdateEvent:chi

### CosmicAll [Core]

CosmicAll is an out-of-context solution. Haha. 

Basically, it lives in all contexts so that any invalidation on one of its nodes (e.g. writing an object) affects
all contexts that used it in a computation.

In [11]:
# ctx2 has open time cutoffs, so it should depend on the dbState, thus this should recompute...
with SummaryMonitor():
    with ctx2:
        info()

All children: ['Banking', 'Derivs', 'Kids', 'misc']
All books   : ['misc', 'trust2', 'brokerage', 'checking', '401K', 'trust1', 'savings', 'margin']
Compute activity:
              GetValue:    49
         GetValue/Calc:    17


In [12]:
# and now be cached...
with PrintMonitor():
    with ctx2:
        info()

 Context enter ctx: Root:4350922648
   GetValue begin ctx: Root:4350922648, key: Portfolio@107f15c10/Portfolio:children
     GetValue from ctx value: [<__main__.Portfo..., ctx: Root:4350922648, key: Portfolio@107f15c10/Portfolio:children
   GetValue begin ctx: Root:4350922648, key: Portfolio@107f15c10/Portfolio:books
     GetValue from ctx value: [<__main__.Book o..., ctx: Root:4350922648, key: Portfolio@107f15c10/Portfolio:books
All children: ['Banking', 'Derivs', 'Kids', 'misc']
All books   : ['misc', 'trust2', 'brokerage', 'checking', '401K', 'trust1', 'savings', 'margin']


### Quick recap:

In [13]:
pAll.prn()

 Family
   Banking
     checking
     savings
   Derivs
     Trading
       brokerage
       margin
       401K
   Kids
     trust1
     trust2
   misc


## A problem [Core]

So, what do we do when a user writes data that causes a consistency problem?

Here, we want a tree of books, but the user has caused the book *misc* to be included twice in the Family portfolio. 
It's not fatal, but it does mean that expected reasonable constraints 
(e.g. Family.fact() == aggregate(Family.books().fact())) will be violated.

The scope of the problem, and the solution, will be presented in later notebooks.
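Independently of whatever solution is presented later, the inconsistency itself is easy to detect. The sketch below is a standalone illustration with hypothetical helpers (`leaf_paths`, `duplicates`) over a plain dict that mirrors the tree after *misc* is added under Derivs; it is not the author's approach. It counts how many routes lead to each book, which also shows why a total taken by walking the tree differs from a total over the distinct set of books.

```python
from collections import Counter

# Plain-dict mirror of the tree with misc under both Family and Derivs.
TREE = {
    'Family': ['Banking', 'Derivs', 'Kids', 'misc'],
    'Banking': ['checking', 'savings'],
    'Derivs': ['Trading', 'misc'],
    'Trading': ['brokerage', 'margin', '401K'],
    'Kids': ['trust1', 'trust2'],
}

def leaf_paths(name, path=()):
    """Yield (book, path) for every route from `name` down to a book."""
    path = path + (name,)
    if name not in TREE:
        yield name, path
        return
    for child in TREE[name]:
        for found in leaf_paths(child, path):
            yield found

def duplicates(root):
    """Return the books reachable from `root` by more than one route."""
    counts = Counter(book for book, _ in leaf_paths(root))
    return sorted(b for b, n in counts.items() if n > 1)

routes = list(leaf_paths('Family'))
distinct = set(book for book, _ in routes)

print(duplicates('Family'))         # ['misc']
print(len(routes), len(distinct))   # 9 8
```

With nine routes but eight distinct books, any aggregate computed per route counts *misc* twice, while an aggregate over `books()` counts it once.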

In [14]:
pDerivs.setChildren([pTrading, misc])     
info()
pAll.prn()

LogMessage: Oops, book appears multiple times
All children: ['Banking', 'Derivs', 'Kids', 'misc']
All books   : ['misc', 'trust2', 'brokerage', 'checking', '401K', 'trust1', 'savings', 'margin']
 Family
   Banking
     checking
     savings
   Derivs
     Trading
       brokerage
       margin
       401K
     misc
   Kids
     trust1
     trust2
   misc
