# pyg.base.cell
cell is a dict that forms part of a calculation graph. Most usefully, db_cell is implemented to maintain persistency of the function output in MongoDB. 
Before we start, we will show a few examples of how a cell works. Then, we will build a toy example of trading stocks based on an exponentially weighted crossover. 


* We will start by creating the system using pyg.base.dictable and pyg.timeseries. 
* We then repeat the same code, this time modifying it slightly to save the data and calculation graph in MongoDB while running the calculation.
* We conclude by discussing the two approaches

## Cell 101

In [1]:
from pyg import *
a = cell(lambda x, y: x + y,  x = 1, y = 2)
b = cell(lambda x, y: x * y,  x = 2, y = a)
b

cell
x:
    2
y:
    cell
    {'x': 1, 'y': 2, 'function': <function <lambda> at 0x000002984BB7FB50>}
function:
    <function <lambda> at 0x000002984BB7F7F0>

In [2]:
b.keys() ## b is a dict

['x', 'y', 'function']

In [3]:
b._args ## inputs

['x', 'y']

In [4]:
b._output ## where the output will go once we calculate it

['data']

In [5]:
assert b.run() ## b has not calculated yet... please run it

In [6]:
b() # calculated object note b().data

2023-02-13 11:52:57,953 - pyg - INFO - None
2023-02-13 11:52:57,960 - pyg - INFO - None


cell
x:
    2
y:
    cell
    x:
        1
    y:
        2
    latest:
        None
    updated:
        2023-02-13 11:52:57.963174
    function:
        <function <lambda> at 0x000002984BB7FB50>
function:
    <function <lambda> at 0x000002984BB7F7F0>
data:
    6
latest:
    None
updated:
    2023-02-13 11:52:57.964154

In [7]:
assert not b().run() ## b has calculated now... no need to run it

2023-02-13 11:52:59,976 - pyg - INFO - None
2023-02-13 11:52:59,978 - pyg - INFO - None


In [8]:
cell(lambda x, y: x ** y)(x = a, y = 2) # you can define the cell and then call it with the values

2023-02-13 11:53:00,825 - pyg - INFO - None
2023-02-13 11:53:00,828 - pyg - INFO - None


cell
function:
    <function <lambda> at 0x000002984BB7F5B0>
x:
    cell
    x:
        1
    y:
        2
    latest:
        None
    updated:
        2023-02-13 11:53:00.830519
    function:
        <function <lambda> at 0x000002984BB7FB50>
y:
    2
data:
    9
latest:
    None
updated:
    2023-02-13 11:53:00.830519

## Workflow without using cell or persisting

In [9]:
from pyg import *; 
import yfinance as yf # see https://github.com/ranaroussi/yfinance
constituents = dictable(read_csv('c:/github/pyg/docs/constituents.csv')).rename(lower) # downloaded from <https://datahub.io/core/s-and-p-500-companies#resource-constituents>
constituents

dictable[503 x 3]
symbol|name                |sector     
MMM   |3M                  |Industrials
AOS   |A. O. Smith         |Industrials
ABT   |Abbott              |Health Care
...503 rows...
ZBH   |Zimmer Biomet       |Health Care
ZION  |Zions Bancorporation|Financials 
ZTS   |Zoetis              |Health Care

In [10]:
def download(symbol):
    return yf.download(tickers = symbol)
    

In [11]:
stocks = constituents.inc(sector = 'Energy')
stocks

dictable[23 x 3]
name               |sector|symbol
APA Corporation    |Energy|APA   
Baker Hughes       |Energy|BKR   
Chevron Corporation|Energy|CVX   
...23 rows...
Targa Resources    |Energy|TRGP  
Valero Energy      |Energy|VLO   
Williams Companies |Energy|WMB   

In [12]:
stocks = stocks(history = download) # download takes symbol as an input and symbol is a column in stocks, so this provides symbol to download and puts output in 'history' column

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%********

In [13]:
stocks = stocks.inc(lambda history: len(history)>0)

In [14]:
stocks = stocks(adj = lambda history: getitem(value = history, key = 'Adj Close'))

In [15]:
stocks = stocks(rtn = lambda adj: diff(a = adj))

In [16]:
stocks = stocks(vol = lambda rtn: ewmstd(a = rtn, n = 30))

In [17]:
_data = 'data'
def crossover_(a, fast, slow, vol, instate = None):
    state = Dict(fast = {}, slow = {}, vol = {}) if instate is None else instate
    fast_ewma_ = ewma_(a, fast, instate = state.fast)
    slow_ewma_ = ewma_(a, slow, instate = state.slow)    
    raw_signal = fast_ewma_.data - slow_ewma_.data
    signal_rms = ewmrms_(raw_signal, vol, instate = state.vol)
    normalized = raw_signal/v2na(signal_rms.data)
    return Dict(data = normalized, state = Dict(fast = fast_ewma_.state, slow = slow_ewma_.state, vol = signal_rms.state))

crossover_.output = ['data', 'state']

def crossover(a, fast, slow, vol, state = None):
    return crossover_(a, fast, slow, vol, instate = state)

### some more functions to calculate the profits & loss as well as the signal/noise ratio

In [18]:
def signal_pnl(signal, rtn, vol):
    return shift(signal) * (rtn/vol)

def information_ratio(pnl):
    return 16 * ts_mean(pnl) / ts_std(pnl)

In [19]:
forecasts = stocks * dictable(fast = [2,4,8], slow = [6,12,24], forecast = ['fast', 'medium', 'slow'])

In [20]:
forecasts = forecasts(signal = lambda rtn, fast, slow: crossover_(rtn, fast = fast, slow = slow, vol = 30).data)

In [21]:
forecasts = forecasts(pnl = lambda signal, rtn, vol: signal_pnl(signal = signal, rtn = rtn, vol = vol))

In [22]:
forecasts = forecasts(ir = lambda pnl: information_ratio(pnl = pnl))

In [23]:
print(forecasts.pivot('symbol', 'forecast', 'ir', [last, f12]))

symbol|fast |medium|slow 
APA   |0.12 |-0.06 |-0.12
BKR   |0.04 |-0.08 |-0.10
COP   |-0.14|-0.15 |-0.16
CTRA  |0.24 |0.14  |-0.02
CVX   |0.45 |0.30  |0.11 
DVN   |0.12 |0.10  |0.11 
EOG   |0.09 |0.00  |-0.05
EQT   |0.38 |0.25  |0.09 
FANG  |-0.13|-0.14 |-0.23
HAL   |0.69 |0.45  |0.29 
HES   |0.16 |0.04  |0.01 
KMI   |0.49 |0.19  |0.08 
MPC   |0.22 |0.45  |0.67 
MRO   |0.22 |0.14  |0.12 
OKE   |-0.22|-0.17 |-0.08
OXY   |-0.26|-0.31 |-0.24
PSX   |0.08 |0.23  |0.49 
PXD   |0.23 |0.17  |0.21 
SLB   |-0.06|-0.22 |-0.30
TRGP  |0.25 |0.01  |-0.04
VLO   |0.29 |0.29  |0.42 
WMB   |0.18 |-0.05 |-0.18
XOM   |-0.04|-0.28 |-0.42


## Workflow while saving to in-memory graph
We can use pyg-cell to persists nodes in memory

In [24]:
stocks = stocks(history = lambda symbol: db_cell(download, symbol = symbol, pk = ['symbol', 'key'], key = 'history')())

2023-02-13 11:57:36,693 - pyg - INFO - get_cell(key = 'history', symbol = 'APA')()


[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:37,406 - pyg - INFO - get_cell(key = 'history', symbol = 'BKR')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:37,842 - pyg - INFO - get_cell(key = 'history', symbol = 'CVX')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:38,464 - pyg - INFO - get_cell(key = 'history', symbol = 'COP')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:39,005 - pyg - INFO - get_cell(key = 'history', symbol = 'CTRA')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:39,455 - pyg - INFO - get_cell(key = 'history', symbol = 'DVN')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:39,860 - pyg - INFO - get_cell(key = 'history', symbol = 'FANG')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:40,174 - pyg - INFO - get_cell(key = 'history', symbol = 'EOG')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:40,690 - pyg - INFO - get_cell(key = 'history', symbol = 'EQT')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:41,196 - pyg - INFO - get_cell(key = 'history', symbol = 'XOM')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:41,799 - pyg - INFO - get_cell(key = 'history', symbol = 'HAL')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:42,342 - pyg - INFO - get_cell(key = 'history', symbol = 'HES')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:42,848 - pyg - INFO - get_cell(key = 'history', symbol = 'KMI')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:43,192 - pyg - INFO - get_cell(key = 'history', symbol = 'MRO')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:43,934 - pyg - INFO - get_cell(key = 'history', symbol = 'MPC')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:44,325 - pyg - INFO - get_cell(key = 'history', symbol = 'OXY')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:44,932 - pyg - INFO - get_cell(key = 'history', symbol = 'OKE')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:45,430 - pyg - INFO - get_cell(key = 'history', symbol = 'PSX')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:45,755 - pyg - INFO - get_cell(key = 'history', symbol = 'PXD')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:46,146 - pyg - INFO - get_cell(key = 'history', symbol = 'SLB')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:46,610 - pyg - INFO - get_cell(key = 'history', symbol = 'TRGP')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:46,923 - pyg - INFO - get_cell(key = 'history', symbol = 'VLO')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:57:47,422 - pyg - INFO - get_cell(key = 'history', symbol = 'WMB')()



[*********************100%***********************]  1 of 1 completed


The data is persisted in the graph under the keys symbol,key and can be retrieved:

In [25]:
get_cell(key = 'history', symbol = 'WMB')

db_cell
db:
    None
symbol:
    WMB
pk:
    ['symbol', 'key']
key:
    history
function:
    <function download at 0x000002986DBF4F70>
data:
                     Open       High        Low      Close  Adj Close   Volume
    Date                                                                      
    1981-12-31   3.437749   3.507908   3.367591   3.414363   0.720358   521946
    1982-01-04   3.414363   3.429954   3.383182   3.429954   0.723647   224493
    1982-01-05   3.305228   3.398773   3.196094   3.274047   0.690754   686307
    1982-01-06   3.227275   3.274047   3.180503   3.180503   0.671018  1118455
    1982-01-07   3.164912   3.227275   3.086959   3.164912   0.667729   831425
    ...               ...        ...        ...        ...        ...      ...
    2023-02-06  32.020000  32.139999  31.430000  31.700001  31.700001  5668800
    2023-02-07  31.740000  32.230000  31.430000  32.110001  32.110001  8534500
    2023-02-08  32.000000  32.150002  31.700001  31.809999  31.80999

In [28]:
stocks = stocks(history = lambda symbol: db_cell(download, symbol = symbol, db = ['symbol', 'key'], key = 'history'))

In [30]:
c = stocks[0].history

In [32]:
c.run()

True

In [34]:
c = c.load()

In [35]:
c.run()

True

In [36]:
c

db_cell
db:
    ['symbol', 'key']
symbol:
    APA
key:
    history
function:
    <function download at 0x000002986DBF4F70>
pk:
    ['symbol', 'key']
data:
                     Open       High        Low      Close  Adj Close   Volume
    Date                                                                      
    1979-05-15   3.559404   3.607504   3.511304   3.559404   1.946916    22349
    1979-05-16   3.583454   3.823954   3.583454   3.799904   2.078464    66008
    1979-05-17   3.799904   3.848004   3.799904   3.848004   2.104774    57692
    1979-05-18   3.848004   3.992304   3.848004   3.968254   2.170550   119023
    1979-05-21   3.968254   4.064454   3.920154   4.040404   2.210012   106549
    ...               ...        ...        ...        ...        ...      ...
    2023-02-06  42.130001  42.619999  41.349998  42.020000  42.020000  7520600
    2023-02-07  42.320000  43.750000  41.509998  43.689999  43.689999  5262300
    2023-02-08  43.900002  44.009998  41.840000  42.119

In [38]:
get_cell(key = 'history', symbol = 'VLO')()

2023-02-13 12:02:41,127 - pyg - INFO - get_cell(key = 'history', symbol = 'VLO')()


[*********************100%***********************]  1 of 1 completed


db_cell
db:
    None
symbol:
    VLO
pk:
    ['symbol', 'key']
key:
    history
function:
    <function download at 0x000002986DBF4F70>
data:
                      Open        High         Low       Close   Adj Close  \
    Date                                                                     
    1982-01-04    5.484461    5.541590    5.398766    5.513026    1.700056   
    1982-01-05    5.398766    5.513026    5.255941    5.427331    1.673629   
    1982-01-06    5.170247    5.341636    4.970293    5.113117    1.576735   
    1982-01-07    5.027422    5.113117    4.913163    4.998857    1.541501   
    1982-01-08    5.027422    5.084552    4.970293    5.027422    1.550308   
    ...                ...         ...         ...         ...         ...   
    2023-02-06  131.699997  132.770004  126.059998  128.089996  128.089996   
    2023-02-07  129.000000  135.300003  128.740005  134.520004  134.520004   
    2023-02-08  134.639999  137.339996  132.589996  134.119995  134.119995   


In [40]:
c = stocks[-2].history

In [43]:
c.load()

db_cell
db:
    ['symbol', 'key']
symbol:
    VLO
key:
    history
function:
    <function download at 0x000002986DBF4F70>
pk:
    ['symbol', 'key']
data:
                      Open        High         Low       Close   Adj Close  \
    Date                                                                     
    1982-01-04    5.484461    5.541590    5.398766    5.513026    1.700056   
    1982-01-05    5.398766    5.513026    5.255941    5.427331    1.673629   
    1982-01-06    5.170247    5.341636    4.970293    5.113117    1.576735   
    1982-01-07    5.027422    5.113117    4.913163    4.998857    1.541501   
    1982-01-08    5.027422    5.084552    4.970293    5.027422    1.550308   
    ...                ...         ...         ...         ...         ...   
    2023-02-06  131.699997  132.770004  126.059998  128.089996  128.089996   
    2023-02-07  129.000000  135.300003  128.740005  134.520004  134.520004   
    2023-02-08  134.639999  137.339996  132.589996  134.119995  1

In [44]:
c.load??

[1;31mSignature:[0m [0mc[0m[1;33m.[0m[0mload[0m[1;33m([0m[0mmode[0m[1;33m=[0m[1;36m0[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[1;31mDocstring:[0m
loads a document from the database and updates various keys.

Example:Persistency:
-------------
Since we want to avoid hitting the database, there is a singleton GRAPH, a dict, storing the cells by their address.
Every time we load/save from/to Mongo, we also update GRAPH.

We use the GRAPH often so if you want to FORCE the cell to go to the database when loading, use this:

>>> cell.load(-1) 
>>> cell.load(-1).load(0)  # clear GRAPH and load from db
>>> cell.load([0])     # same thing: clear GRAPH and then load if available

Example:forcing update on change in function inputs:
---------------------------------------------
If the current cell, has inputs that is different from the saved document inputs, we force a recalculation by setting updated = None

>>> from pyg import *
>>> db = partial(sql_table, server = 'DESKTOP-LU

## Workflow while saving to SQL
We can use pyg-sql to save both documents and data

In [25]:
server = 'DESKTOP-GOQ0NSM' ## use your own server or set it in the config file

In [35]:
idb = partial(sql_table, server = server, db = 'demo', table = 'items', pk = 'key', doc = True, create = True, writer = 'c:/demo/items/%key.pickle') 
sdb = partial(sql_table, server = server, db = 'demo', table = 'stock', pk = ['key', 'symbol'], doc = True, writer = 'c:/demo/stock/%symbol/%key.pickle')
fdb = partial(sql_table, server = server, db = 'demo', table = 'forecast', pk = ['key', 'symbol', 'forecast'], doc = True, writer = 'c:/demo/forecast/%symbol/%forecast/%key.pickle')

In [28]:
idb()

2023-02-13 10:56:50,521 - pyg - INFO - creating database: demo
2023-02-13 10:56:50,674 - pyg - INFO - creating table: demo.dbo.items['key', 'doc']


sql_cursor: demo.dbo.items['key'] DOCSTORE[doc] 
SELECT key, doc 
FROM dbo.items
0 records

In [31]:
_ = idb().insert_one(db_cell(key = 'constituents', data = constituents))

2023-02-13 10:58:30,899 - pyg - INFO - creating schema: archived_dbo
2023-02-13 10:58:30,959 - pyg - INFO - creating table: demo.archived_dbo.items['key', 'doc', 'deleted', 'doc']


In [33]:
get_data(table = 'items', server = server, db = 'demo', key = 'constituents')

dictable[503 x 3]
symbol|name                |sector     
MMM   |3M                  |Industrials
AOS   |A. O. Smith         |Industrials
ABT   |Abbott              |Health Care
...503 rows...
ZBH   |Zimmer Biomet       |Health Care
ZION  |Zions Bancorporation|Financials 
ZTS   |Zoetis              |Health Care

In [36]:
stocks = stocks(history = lambda symbol: periodic_cell(download, symbol = symbol, db = sdb, key = 'history')())

2023-02-13 11:44:38,217 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'BKR')()


[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:38,762 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'CVX')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:39,526 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'COP')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:40,126 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'CTRA')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:40,722 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'DVN')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:41,262 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'FANG')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:41,659 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'EOG')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:42,198 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'EQT')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:42,764 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'XOM')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:43,399 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'HAL')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:43,967 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'HES')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:44,494 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'KMI')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:44,852 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'MRO')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:45,525 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'MPC')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:45,908 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'OXY')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:46,450 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'OKE')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:47,107 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'PSX')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:47,431 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'PXD')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:47,904 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'SLB')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:48,433 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'TRGP')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:48,765 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'VLO')()



[*********************100%***********************]  1 of 1 completed

2023-02-13 11:44:49,328 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history', symbol = 'WMB')()



[*********************100%***********************]  1 of 1 completed


In [37]:
sdb()

sql_cursor: demo.dbo.stock['key', 'symbol'] DOCSTORE[doc] 
writer: c:/demo/stock/%symbol/%key.pickle

SELECT key, symbol, doc 
FROM dbo.stock
22 records

In [39]:
stocks = stocks(history_diff = lambda symbol, history: periodic_cell(diff, a = history, symbol = symbol, db = sdb, key = 'history_diff')())

2023-02-13 11:46:42,275 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history_diff', symbol = 'APA')()
2023-02-13 11:46:42,752 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history_diff', symbol = 'BKR')()
2023-02-13 11:46:42,834 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history_diff', symbol = 'CVX')()
2023-02-13 11:46:42,906 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history_diff', symbol = 'COP')()
2023-02-13 11:46:42,981 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history_diff', symbol = 'CTRA')()
2023-02-13 11:46:43,065 - pyg - INFO - get_cell(server = 'DESKTOP-GOQ0NSM', db = 'demo', schema = 'dbo', table = 'stock', key = 'history_diff', symbol = 'DVN')()
2023-02-13 11:46:43,161 - p

This is the work pattern: we feed a function that creates and run a cell to a dictable that creates the cells. This repeats itself so we can automate it:

In [40]:
S = cell_runner

NameError: name 'cell_runner' is not defined

## Workflow while saving to MongoDB

### Table creation
We create three tables dependending on the primary keys we will be using. 

In [19]:
idb = partial(mongo_table, db = 'demo', table = 'items', pk = 'item')
sdb = partial(mongo_table, db = 'demo', table = 'stock', pk = ['item', 'symbol'])
fdb = partial(mongo_table, db = 'demo', table = 'forecast', pk = ['item', 'symbol', 'forecast'])

In [20]:
idb().raw.drop(); sdb().raw.drop(); fdb().raw.drop # we first frop all existing data

2021-10-28 17:42:28,435 - pyg - INFO - INFO: deleting 18 documents based on M{}
2021-10-28 17:42:28,460 - pyg - INFO - INFO: deleting 3946 documents based on M{}


<bound method mongo_cursor.delete_many of <class 'pyg.mongo._cursor.mongo_cursor'> for Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'demo'), 'forecast') 
M{} None
documents count: 1516 
dict_keys(['_id', 'updated', '_period', 'db', 'output', 'a', 'fast', 'slow', 'vol', 'symbol', 'forecast', 'item', 'data', 'function', '_obj', '_pk'])>

In [21]:
idb().insert_one(db_cell(item = 'constituents', data = constituents))

ObjectId('617ad2f83e25e90135406a4b')

In [22]:
get_data('items','demo', item = 'constituents')

dictable[505 x 3]
symbol|name                  |sector     
MMM   |3M Company            |Industrials
AOS   |A.O. Smith Corp       |Industrials
ABT   |Abbott Laboratories   |Health Care
...505 rows...
ZBH   |Zimmer Biomet Holdings|Health Care
ZION  |Zions Bancorp         |Financials 
ZTS   |Zoetis                |Health Care

### Any code differences?
Most of the code remains the same as above, except:

* We wrap it inside a periodic_cell so it is calculated daily
* We add reference to where we want to store it in MongoDB by specifying the db as well as the primary keys of that table
* To run the function, we need to call the cell. This: loads the cell from the database (if found), checking if it even needs running and if so, runs it.

In [23]:
stocks = stocks(history = lambda symbol: periodic_cell(download, symbol = symbol,           # these are the inputs for the function
                                            db = sdb, item = 'history')())                                # these define where the data goes to. Note that symbol is in both!

2021-10-28 17:42:59,748 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'APA')()


[*********************100%***********************]  1 of 1 completed


2021-10-28 17:43:00,685 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'BKR')()


[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:01,245 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'COG')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:01,889 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'CVX')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:02,578 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'COP')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:03,132 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'DVN')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:03,638 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'FANG')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:03,997 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'EOG')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:04,449 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'XOM')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:05,095 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'HAL')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:05,687 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'HES')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:06,205 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'HFC')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:06,678 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'KMI')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:07,042 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'MRO')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:07,684 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'MPC')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:08,010 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'NOV')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:08,368 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'OXY')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:08,886 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'OKE')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:09,372 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'PSX')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:09,658 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'PXD')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:10,081 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'SLB')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:10,645 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'FTI')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:11,010 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'VLO')()



[*********************100%***********************]  1 of 1 completed

2021-10-28 17:43:11,566 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'history', symbol = 'WMB')()



[*********************100%***********************]  1 of 1 completed


In [25]:
stocks[0].history

periodic_cell
updated:
    2021-10-28 17:43:00.412933
period:
    1b
db:
    functools.partial(<function mongo_table at 0x00000184D2AEC9D0>, db='demo', table='stock', pk=['item', 'symbol'])
symbol:
    APA
item:
    history
function:
    <function download at 0x00000184D50575E0>
data:
                     Open       High        Low      Close  Adj Close    Volume
    Date                                                                       
    1979-05-15   3.559404   3.607504   3.511304   3.559404   1.989832     22349
    1979-05-16   3.583454   3.823954   3.583454   3.799904   2.124280     66008
    1979-05-17   3.799904   3.848004   3.799904   3.848004   2.151171     57692
    1979-05-18   3.848004   3.992304   3.848004   3.968254   2.218394    119023
    1979-05-21   3.968254   4.064454   3.920154   4.040404   2.258728    106549
    ...               ...        ...        ...        ...        ...       ...
    2021-10-22  27.200001  27.850000  27.080000  27.680000  27.680000  106

### Accessing the data in MongoDB
The data is now in the database and can be accessed:

In [27]:
get_data('stock', 'demo', symbol = 'APA', item = 'history')

Unnamed: 0_level_0,Open,High,Low,Close,Adj Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1979-05-15,3.559404,3.607504,3.511304,3.559404,1.989832,22349
1979-05-16,3.583454,3.823954,3.583454,3.799904,2.124280,66008
1979-05-17,3.799904,3.848004,3.799904,3.848004,2.151171,57692
1979-05-18,3.848004,3.992304,3.848004,3.968254,2.218394,119023
1979-05-21,3.968254,4.064454,3.920154,4.040404,2.258728,106549
...,...,...,...,...,...,...
2021-10-22,27.200001,27.850000,27.080000,27.680000,27.680000,10646300
2021-10-25,28.280001,28.780001,27.920000,28.219999,28.219999,9091200
2021-10-26,28.680000,28.680000,28.020000,28.150000,28.150000,8633700
2021-10-27,27.500000,28.120001,26.740000,26.900000,26.900000,11501800


In [28]:
stocks = stocks.inc(lambda history: len(history.data)>0)

In [29]:
stocks = stocks(adj = lambda history, symbol: periodic_cell(getitem, value = history, key = 'Adj Close', 
                                                            db = sdb, symbol = symbol, item = 'adj')()) 

2021-10-28 17:44:44,267 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'adj', symbol = 'APA')()
2021-10-28 17:44:44,734 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'adj', symbol = 'BKR')()
2021-10-28 17:44:45,267 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'adj', symbol = 'COG')()
2021-10-28 17:44:45,676 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'adj', symbol = 'CVX')()
2021-10-28 17:44:46,185 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'adj', symbol = 'COP')()
2021-10-28 17:44:46,907 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'adj', symbol = 'DVN')()
2021-10-28 17:44:47,355 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'adj', symbol = 'FANG')()
2021-10-28 17:44:47,597 - pyg - INFO - get_cell(url = 

In [30]:
stocks = stocks(rtn = lambda adj, symbol: periodic_cell(diff, a = adj, 
                                                        db = sdb, symbol = symbol, item = 'rtn')())

2021-10-28 19:29:48,814 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'rtn', symbol = 'APA')()
2021-10-28 19:29:49,161 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'rtn', symbol = 'BKR')()
2021-10-28 19:29:49,550 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'rtn', symbol = 'COG')()
2021-10-28 19:29:49,810 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'rtn', symbol = 'CVX')()
2021-10-28 19:29:50,141 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'rtn', symbol = 'COP')()
2021-10-28 19:29:50,494 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'rtn', symbol = 'DVN')()
2021-10-28 19:29:50,978 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'rtn', symbol = 'FANG')()
2021-10-28 19:29:51,208 - pyg - INFO - get_cell(url = 

In [31]:
stocks = stocks(vol = lambda rtn, symbol: periodic_cell(ewmstd, a = rtn, n =  30,  
                                                        db = sdb, symbol = symbol, item = 'vol')())

2021-10-28 19:29:56,725 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'vol', symbol = 'APA')()
2021-10-28 19:29:56,918 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'vol', symbol = 'BKR')()
2021-10-28 19:29:57,142 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'vol', symbol = 'COG')()
2021-10-28 19:29:57,336 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'vol', symbol = 'CVX')()
2021-10-28 19:29:57,559 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'vol', symbol = 'COP')()
2021-10-28 19:29:57,776 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'vol', symbol = 'DVN')()
2021-10-28 19:29:58,031 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'stock', item = 'vol', symbol = 'FANG')()
2021-10-28 19:29:58,225 - pyg - INFO - get_cell(url = 

In [32]:
get_data('stock', 'demo', symbol = 'APA', item = 'vol')

Date
1979-05-15         NaN
1979-05-16         NaN
1979-05-17         NaN
1979-05-18         NaN
1979-05-21         NaN
                ...   
2021-10-22    0.712025
2021-10-25    0.703151
2021-10-26    0.693454
2021-10-27    0.729227
2021-10-28    0.724516
Length: 10708, dtype: float64

### Calculating the forecasts & saving them

In [33]:
forecasts = stocks * dictable(fast = [2,4,8], slow = [6,12,24], forecast = ['fast', 'medium', 'slow'])

In [46]:
forecasts = forecasts(signal = lambda rtn, fast, slow, symbol, forecast: periodic_cell(crossover_, a = rtn, fast = fast, slow = slow, vol = 30,
                                            db = fdb, symbol = symbol, forecast = forecast, item = 'signal')(go = 1))

2021-10-28 19:33:17,919 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'fast', item = 'signal', symbol = 'APA')()
2021-10-28 19:33:18,316 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'medium', item = 'signal', symbol = 'APA')()
2021-10-28 19:33:18,541 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'slow', item = 'signal', symbol = 'APA')()
2021-10-28 19:33:18,844 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'fast', item = 'signal', symbol = 'BKR')()
2021-10-28 19:33:19,093 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'medium', item = 'signal', symbol = 'BKR')()
2021-10-28 19:33:19,313 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'slow', item = 'signal', symbol = 'BKR')()
2021-10-28 19:33:19,603 - pyg - INFO

In [47]:
forecasts = forecasts(pnl = lambda signal, rtn, vol, symbol, forecast: periodic_cell(signal_pnl, signal = signal, rtn = rtn, vol = vol,
                                                        db = fdb, symbol = symbol, forecast = forecast, item = 'pnl')(go = 1))

2021-10-28 19:34:00,197 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'fast', item = 'pnl', symbol = 'APA')()
2021-10-28 19:34:01,170 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'medium', item = 'pnl', symbol = 'APA')()
2021-10-28 19:34:01,652 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'slow', item = 'pnl', symbol = 'APA')()
2021-10-28 19:34:02,204 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'fast', item = 'pnl', symbol = 'BKR')()
2021-10-28 19:34:02,810 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'medium', item = 'pnl', symbol = 'BKR')()
2021-10-28 19:34:03,291 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'slow', item = 'pnl', symbol = 'BKR')()
2021-10-28 19:34:03,805 - pyg - INFO - get_cell(url = 

In [48]:
forecasts = forecasts(ir = lambda pnl: information_ratio(pnl = pnl.data))

In [49]:
print(forecasts.pivot('symbol', 'forecast', 'ir', [last, f12]))

symbol|fast |medium|slow 
APA   |0.13 |-0.04 |-0.10
BKR   |0.06 |-0.08 |-0.11
COG   |0.22 |0.13  |-0.01
COP   |-0.14|-0.16 |-0.17
CVX   |0.46 |0.29  |0.10 
DVN   |0.14 |0.13  |0.16 
EOG   |0.13 |0.03  |-0.03
FANG  |-0.09|-0.10 |-0.18
FTI   |-0.19|-0.36 |-0.35
HAL   |0.72 |0.46  |0.29 
HES   |0.14 |0.02  |0.00 
HFC   |0.84 |0.79  |0.69 
KMI   |0.39 |0.09  |0.02 
MPC   |0.17 |0.42  |0.69 
MRO   |0.22 |0.15  |0.13 
NOV   |0.08 |-0.06 |-0.05
OKE   |-0.25|-0.18 |-0.09
OXY   |-0.25|-0.30 |-0.23
PSX   |0.07 |0.21  |0.49 
PXD   |0.23 |0.17  |0.23 
SLB   |-0.07|-0.23 |-0.31
VLO   |0.29 |0.29  |0.42 
WMB   |0.17 |-0.07 |-0.20
XOM   |-0.04|-0.29 |-0.43


## Accessing & running the graph once the graph has been created
We can access the data or the cell:

In [50]:
get_cell('forecast', 'demo', symbol = 'APA', forecast = 'fast', item = 'signal')

periodic_cell
updated:
    2021-10-28 19:34:00.314000
period:
    1b
db:
    functools.partial(<function mongo_table at 0x00000184D2AEC9D0>, db='demo', table='forecast', pk=['item', 'symbol', 'forecast'])
_id:
    602a86fb4de6ffaf1c045c8f
_period:
    1b
_pk:
    ['forecast', 'item', 'symbol']
a:
    periodic_cell
    updated:
        None
    period:
        1b
    db:
        functools.partial(<function mongo_table at 0x00000184D2AEC9D0>, db='demo', table='stock', pk=['item', 'symbol'])
    _period:
        1b
    item:
        rtn
    symbol:
        APA
    function:
        None
data:
    Date
    1979-05-15         NaN
    1979-05-16         NaN
    1979-05-17   -1.402762
    1979-05-18   -0.939363
    1979-05-21   -1.194294
                    ...   
    2021-10-22   -0.111961
    2021-10-25    0.171604
    2021-10-26   -0.441283
    2021-10-27   -2.128037
    2021-10-28   -1.925680
    Length: 10708, dtype: float64
fast:
    2
forecast:
    fast
instate:
    None
item:
    sign

And now that the graph has been created, you can actually trigger it just by loading. i.e. The code below will give you the fast signal for APA and will ensure it is up-to-date too:

In [51]:
c = get_cell('forecast', 'demo', symbol = 'APA', forecast = 'fast', item = 'signal')
c = c.go()
print(c.data)

2021-10-28 19:35:11,016 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'forecast', forecast = 'fast', item = 'signal', symbol = 'APA')()


Date
1979-05-15         NaN
1979-05-16         NaN
1979-05-17   -1.402762
1979-05-18   -0.939363
1979-05-21   -1.194294
                ...   
2021-10-22   -0.111961
2021-10-25    0.171604
2021-10-26   -0.441283
2021-10-27   -2.128037
2021-10-28   -1.925680
Length: 10708, dtype: float64


## Point-in-time, cache and persistency
The pk-tables save a full history of all your data. 
To avoid hitting the database all the time, we also have a local GRAPH singleton that caches all the cells by their address. The cell has few basic operations we need to understand:

* **cell.run()**: Returns True/False if the cell needs to be calculated. db_cell() just check for values in its output, periodic_cell will also check if it is a new business day.
* **cell.go()**: This calculates the cell and saves the result to the database (and to GRAPH). The cell itself is not loaded but all its inputs are loaded
    - cell.go(0) : calculate only if there is a need
    - cell.go(1) : calculate me but my parents only if there is a need
    - cell.go(2) : calculate me & my parents but my grandparents only if there is a need
    - cell.go(-1) : calculate everything
<br>
* **cell.load()**: This loads the data from GRAPH, if not in GRAPH, loads it from MongoDB (and updates also the GRAPH) 
    - cell.load(-1)  : Clear the data from the GRAPH
    - cell.load(0)   : Load & update me from GRAPH, if not, from MongoDB, if not, just return good old me
    - cell.load(1)   : Even if the document does not exist, but the data saved in files exist, load these.
    - cell.load(date) : Load my version as valid on date. If none exists, throw.
* **cell()**: This loads & then go

In [54]:
from pyg import *; from functools import partial
db = partial(mongo_table, db = 'demo', table = 'persistency', pk = 'key')
db().raw.drop()

def f(a, b):
    return a+b

2021-10-28 19:36:34,396 - pyg - INFO - INFO: deleting 0 documents based on M{}


In [55]:
## now we set up a fake calculation tree:
x = db_cell(f, a = 1, b = 2, db = db, key = 'x')
y = db_cell(f, a = x, b = 2, db = db, key = 'y')
z = db_cell(f, a = x, b = y, db = db, key = 'z')

## and run it by running the final value we want
z = z()

2021-10-28 19:36:34,480 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'persistency', key = 'z')()
2021-10-28 19:36:34,577 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'persistency', key = 'x')()
2021-10-28 19:36:34,647 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'persistency', key = 'y')()


In [56]:
## we can access the data:
get_data('persistency', 'demo', key = 'x')

3

In [57]:
t0 = dt()  ## first breakpoint

In [58]:
x = db_cell(f, a = 10, b = 20, db = db, key = 'x').go()
y = db_cell(f, a = x,  b = 20, db = db, key = 'y').go()
z = db_cell(f, a = x,  b = y, db = db, key = 'z').go()

2021-10-28 19:36:39,012 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'persistency', key = 'x')()
2021-10-28 19:36:39,090 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'persistency', key = 'y')()
2021-10-28 19:36:39,255 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'persistency', key = 'z')()


In [59]:
## and here is the new data
get_data('persistency', 'demo', key = 'x')

30

In [60]:
## and here is the data valid at our first breakpoint. get_data/get_cell always go to the database to load values
get_data('persistency', 'demo', key = 'x', _deleted = t0)

3

We can ask a cell to load itself, but remember: it will go to GRAPH first by default. The GRAPH has only one copy of the cell, while in MongoDB, every time we recalculate/save a new version of the cell, we mark the old version in the database as "deleted" but otherwise keep it. To force a cell to load itself from the database, use load_cell and load_data instead.

In [61]:
db_cell(db = db, key = 'x').load(t0)

db_cell
db:
    functools.partial(<function mongo_table at 0x00000184D2AEC9D0>, db='demo', table='persistency', pk='key')
_id:
    617aedb73e25e90135408419
_pk:
    ['key']
a:
    1
b:
    2
data:
    3
key:
    x
updated:
    2021-10-28 19:36:34.667000
_deleted:
    2021-10-28 19:36:39.066000
function:
    <function f at 0x00000184D6E87C10>

We can force a full recalculation of the tree in a single line of code:

In [62]:
db_cell(db = db, key = 'z')(go = -1, mode = [t0]).data ## Should be 8, same as the old value

2021-10-28 19:36:44,980 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'persistency', key = 'z')()
2021-10-28 19:36:45,144 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'persistency', key = 'x')()
2021-10-28 19:36:45,228 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'persistency', key = 'y')()
2021-10-28 19:36:45,273 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'demo', table = 'persistency', key = 'x')()


8

## Comparison of the two workflows
Saving to the database has negatives:

- does require some (but really not much) additional code to specify where each data item goes to
- slows down the calculation

Conversely, 

+ We get full persistency: We can access each part of the graph with full visibility on the inputs, the function used to calculate the result, the function output(s), the location of where the data is stored and the time it was last updated as well as the periodicity it is calculated.
+ We get full audit, past calculations remain available to track (and indeed, rerun) if anything goes wrong
+ Each node will manage its schedule, ensuring data is up-to-date 
+ We can run just the parts of the graph we are interested in (and can run in parallel)

## To save or not to save?
Luckily we don't really need to decide on one workflow or the other as both can happily coexist. 
<br> We can build a calculation graph and decide that some key points in the calculation we want to save while intermediate calculations we can calculate on the fly and not save at all. 
<br> We have met the crossover function. Here we implement it 'on the fly' while saving just final value to db

In [63]:
from pyg import *; import pandas as pd; import numpy as np; from functools import partial

In [64]:
def fake_ts(ticker):
    return pd.Series(np.random.normal(0,1,1000), drange(-999))
db = partial(mongo_table, db = 'test', table = 'test', pk = ['key'])
db().raw.drop()

2021-10-28 19:36:53,103 - pyg - INFO - INFO: deleting 11 documents based on M{}


<class 'pyg.mongo._cursor.mongo_cursor'> for Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'test'), 'test') 
M{} None
documents count: 0

In [65]:
appl = db_cell(fake_ts, ticker = 'appl', key = 'appl_rtn', db = db)()

#I am never saving these, In fact, I don't want to see these in calculation log.
a = cell(ewma, a = appl, n = 30) 
b = cell(ewma, a = appl, n = 50)

# I may want to save these nodes but haven't made up my mind
# I do want to see the calculations in the log though...
# I replace db by the primary keys of table (here 'key'). 
# This allows as to see the calculation log as it happens. 
# data is not saved to db though until I switch to db = db as opposed to db = 'key'

c = db_cell(sub_, a = a, b = b, key = 'calculate difference of ewma', db = 'key') 
d = db_cell(ewmrms, a = c, n = 100, key = 'root mean square of difference', db = 'key')

# The final crossover I definitely want to save to db: 
final_value = db_cell(div_, a = c, b = d, key = 'appl_crossover', db = db)()

2021-10-28 19:36:53,502 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'test', table = 'test', key = 'appl_rtn')()
2021-10-28 19:36:53,703 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'test', table = 'test', key = 'appl_crossover')()
2021-10-28 19:36:53,705 - pyg - INFO - get_cell(key = 'calculate difference of ewma')()
2021-10-28 19:36:53,856 - pyg - INFO - get_cell(key = 'root mean square of difference')()
2021-10-28 19:36:53,858 - pyg - INFO - get_cell(key = 'calculate difference of ewma')()


In [66]:
db().key

['appl_crossover', 'appl_rtn']

So although we had several intermediate steps, we decided to save just the final crossover in the database, If we look at the inputs for the function, you can see that the values are not saved in the database, though the full calculation tree _is_. 
<br> Therefore, one can reload the node and then recalculate all the intermediate values on the fly

In [67]:
loaded_and_recalculated = (db()[dict(key = 'appl_crossover')] - 'data').go(1) ## but once recalculated, we can assert we got the same result
assert eq(loaded_and_recalculated.data, final_value.data)

2021-10-28 19:36:58,631 - pyg - INFO - get_cell(url = 'localhost:27017', db = 'test', table = 'test', key = 'appl_crossover')()
2021-10-28 19:36:58,633 - pyg - INFO - get_cell(key = 'calculate difference of ewma')()
2021-10-28 19:36:58,784 - pyg - INFO - get_cell(key = 'root mean square of difference')()
2021-10-28 19:36:58,786 - pyg - INFO - get_cell(key = 'calculate difference of ewma')()


## Behind the scene: cell_func
Behind the scene of cell, there is machinary designed to make it work smoothly and transparently in most cases. However, sometimes the user may need to dig deeper. Here is an example for code that fails...

In [68]:
from pyg import *
import pytest

def twox(x):
    return x*2
a = cell(a = 1)
c = cell(twox, x = a)

with pytest.raises(KeyError):
    c()

c tries to run the function. The function demands parameter x. When looking at the cells provided, cell 'a' does not contain anything like 'x' so the function fails.

In [69]:
a = cell(data = 1)
cell(twox, x = a)()

cell
x:
    cell
    {'data': 1, 'function': None}
function:
    <function twox at 0x00000184D6CA7AF0>
data:
    2

'data' key has a preferred status so although 'x' is not in the cell, we assume but default that 'data' parameter is the one the cell wants to present to the world. This is controlled by cell_output function:

In [70]:
cell_output(a)

['data']

In [71]:
a = cell(data = 1, myoutput = 3, output = 'myoutput') ## you can decide your output is different
cell_output(a), cell_item(a)

(['myoutput'], 3)

In [72]:
cell(twox, x = a)()

cell
x:
    cell
    {'data': 1, 'myoutput': 3, 'function': None, 'output': 'myoutput'}
function:
    <function twox at 0x00000184D6CA7AF0>
data:
    6

That is good but what happens if the cell has MORE than one output or we want to direct the function to grab another key?

In [73]:
a = cell(a = 1) ## this has failed...
cell(cell_func(twox, x = 'a'), x = a)() ## when you grab x, use 'a' as key

cell
x:
    cell
    {'a': 1, 'function': None}
function:
    cell_func
    relabels:
        {'x': 'a'}
    unloaded:
        []
    unitemized:
        []
    uncalled:
        []
    function:
        <function twox at 0x00000184D6CA7AF0>
data:
    2

What if you need the cell itself rather than the items in it?

In [74]:
def add_a_and_b(x):
    return x.a + x.b

x = cell(a = 1, b = 2)

cell(cell_func(add_a_and_b, unitemized = 'x'), x = x)()

cell
x:
    cell
    {'a': 1, 'b': 2, 'function': None}
function:
    cell_func
    relabels:
        {}
    unloaded:
        []
    unitemized:
        ['x']
    uncalled:
        []
    function:
        <function add_a_and_b at 0x00000184D6E6EDC0>
data:
    3

We can see that the cell x itself is presented to the function and x.a + x.b is calculated and data == 3