### Caching

The goal:
- Detect whether a given query has already been calculated by a data scientist (DS) and, if so, return the same result without spending any additional privacy budget (PB).

Constraints:
- The DS should be able to choose to spend PB again and recalculate the query
    - Could add a `--force-publish` flag or equivalent to force recalculation?
- If you publish a tensor directly and then publish the result of an operation on the same tensor, should the operation be performed on the previously published result?
- Should be quickly retrievable
    - Could we store this in memory? (maybe temporarily?)
- Should be deleted when a DS account is deleted or when the DataSubjectLedger is deleted
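
The constraints above could be sketched roughly as follows. Everything here is a hypothetical illustration, not part of syft: the `PublishCache` class, the `publish_fn` callback, the `force` parameter (standing in for a `--force-publish` flag), and the choice to group entries by a DS account identifier so they can be dropped together.

```python
from typing import Any, Callable, Dict, Hashable


class PublishCache:
    """Hypothetical in-memory cache of published results, grouped per DS account."""

    def __init__(self) -> None:
        # ds_id -> {query_key -> published result}
        self._store: Dict[Hashable, Dict[Hashable, Any]] = {}

    def publish(
        self,
        ds_id: Hashable,
        query_key: Hashable,
        publish_fn: Callable[[], Any],
        force: bool = False,
    ) -> Any:
        ds_cache = self._store.setdefault(ds_id, {})
        if not force and query_key in ds_cache:
            # Cache hit: return the earlier result, no PB spent
            return ds_cache[query_key]
        # Cache miss or forced recalculation: publish_fn is where PB would be spent
        result = publish_fn()
        ds_cache[query_key] = result
        return result

    def delete_account(self, ds_id: Hashable) -> None:
        # Drop every cached result belonging to this DS account
        self._store.pop(ds_id, None)


# Stand-in for a real publish call; counts how often it runs
calls = []

def fake_publish() -> int:
    calls.append(1)
    return len(calls)

cache = PublishCache()
key = ("tensor_id", "sum", None)
first = cache.publish("ds_1", key, fake_publish)               # publishes
second = cache.publish("ds_1", key, fake_publish)              # served from cache
forced = cache.publish("ds_1", key, fake_publish, force=True)  # publishes again
cache.delete_account("ds_1")
after_delete = cache.publish("ds_1", key, fake_publish)        # cache was cleared
```

Keeping the store in memory makes lookups fast but means it is lost on restart; whether that is acceptable is one of the open questions above.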


Could store it in a nested dictionary for fast retrieval:
- `GammaTensor.id` maps to a dictionary of operations
    - `no_op` would mean publishing the entire GammaTensor itself
- Each operation maps to a dictionary of arguments (including `None`)
    - The `None` case covers operations called with no additional arguments (e.g. `tensor.sum()`)
- Each argument maps to the result of publishing that operation on that GammaTensor with those arguments.



Arguments could also be other GammaTensors, in which case we could use their `.id` as the key.
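
A minimal sketch of the nested layout described above, using plain dictionaries. The helper names (`make_arg_key`, `store`, `lookup`), the `NO_OP` marker, and the `FakeTensor` stand-in are assumptions made for illustration; operations are keyed by name here rather than by function object.

```python
from typing import Any, Dict, Hashable, Optional, Tuple

ArgKey = Optional[Tuple[Hashable, ...]]
# tensor id -> operation name -> argument key -> published result
Cache = Dict[str, Dict[str, Dict[ArgKey, Any]]]

NO_OP = "no_op"  # hypothetical marker: the tensor itself was published


def make_arg_key(*args: Any) -> ArgKey:
    """Build a hashable key; tensor-like arguments are replaced by their id."""
    if not args:
        return None  # the "no additional arguments" case
    return tuple(getattr(arg, "id", arg) for arg in args)


def store(cache: Cache, tensor_id: str, op: str, result: Any, *args: Any) -> None:
    # setdefault keeps entries already stored for this tensor or operation
    cache.setdefault(tensor_id, {}).setdefault(op, {})[make_arg_key(*args)] = result


def lookup(cache: Cache, tensor_id: str, op: str, *args: Any) -> Any:
    # Returns None on a miss, so a real cache would need a separate "missing" signal
    return cache.get(tensor_id, {}).get(op, {}).get(make_arg_key(*args))


class FakeTensor:
    """Stand-in for a GammaTensor; only the id attribute matters here."""

    def __init__(self, id: str) -> None:
        self.id = id


cache: Cache = {}
store(cache, "350628696", NO_OP, 5)                           # published as-is
store(cache, "350628696", "sum", 7, 0)                        # one plain argument
store(cache, "350628696", "add", 9, FakeTensor("357131025"))  # tensor argument
```

Each level is an ordinary dictionary lookup, so retrieval takes three hash lookups regardless of how many results are stored.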


In [1]:
import syft as sy
import numpy as np

In [18]:
from syft.core.tensor.autodp.phi_tensor import PhiTensor
from syft.core.tensor.autodp.gamma_tensor import GammaTensor

In [3]:
data = np.random.random((10, 10))

In [17]:
data

array([[0.3911607 , 0.98592627, 0.33820702, 0.02963337, 0.24452058,
        0.41329897, 0.04815035, 0.48070176, 0.29473544, 0.50122643],
       [0.06626371, 0.59942671, 0.14740046, 0.04934191, 0.56365619,
        0.89462541, 0.48128314, 0.3240633 , 0.55338792, 0.41130165],
       [0.13919691, 0.69189855, 0.67913157, 0.01779266, 0.84575859,
        0.41688369, 0.88947748, 0.07823131, 0.09196529, 0.53534729],
       [0.16922522, 0.13575003, 0.74192282, 0.76545072, 0.88544838,
        0.49943269, 0.93689261, 0.68560796, 0.37479221, 0.57739026],
       [0.00688592, 0.70094057, 0.66412832, 0.88781842, 0.62214698,
        0.98282953, 0.25232646, 0.69065589, 0.76593035, 0.07832368],
       [0.79588142, 0.35475581, 0.03782302, 0.33643987, 0.22897223,
        0.95788443, 0.48233922, 0.44960531, 0.5243275 , 0.14761585],
       [0.11034538, 0.53468282, 0.71399612, 0.46200998, 0.90892998,
        0.79781825, 0.36167335, 0.08702174, 0.04972668, 0.11119535],
       [0.37944737, 0.81413721, 0.2098084

In [4]:
tensor1 = PhiTensor(child=data, data_subjects="Ishan", min_vals=0, max_vals=1)
tensor2 = PhiTensor(child=data, data_subjects="Not Ishan", min_vals=0, max_vals=1)

Could this be a potential problem? Repeating the same operation on the same inputs produces a result with a different `id` each time:

In [5]:
result1 = tensor1 + tensor2
print(result1.id)

350628696


In [6]:
result2 = tensor1 + tensor2
print(result2.id)

357131025


If we publish result1 and then publish result2, caching wouldn't work because the IDs are different, and we'd spend the PB twice.
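
One possible way around this, offered only as a sketch: derive the cache key from the operation and the ids of its input tensors instead of the id of the result, so that repeating the same computation yields the same key. The `query_key` function, its `commutative` flag, and the placeholder id strings are all hypothetical; how input ids would be read off a real GammaTensor is not shown.

```python
import hashlib
from typing import Sequence


def query_key(op: str, input_ids: Sequence[str], commutative: bool = False) -> str:
    """Deterministic key for an operation applied to the given input tensors."""
    # For commutative operations, a + b and b + a should share a key
    ids = sorted(input_ids) if commutative else list(input_ids)
    payload = op + "(" + ",".join(ids) + ")"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder ids standing in for tensor1.id and tensor2.id
key1 = query_key("add", ["tensor1_id", "tensor2_id"], commutative=True)
key2 = query_key("add", ["tensor2_id", "tensor1_id"], commutative=True)
same_key = key1 == key2  # True: both describe the same query
```

Order is preserved for non-commutative operations, so `a - b` and `b - a` get different keys.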

In [7]:
result1.func

<function syft.core.tensor.autodp.gamma_tensor.GammaTensor.__add__.<locals>._add(state: 'dict') -> 'jax.numpy.DeviceArray'>

In [12]:
a = dict()
a[result1.id] = {result1.func:{None:result1.value}}

In [26]:
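existing_cache = dict()
def add_to_cache(tensor: GammaTensor, result, cache: dict = existing_cache, *args) -> dict:
    # Merge into any existing entries for this tensor instead of overwriting them
    ops = cache.setdefault(tensor.id, {})
    # args is a tuple; an empty tuple is stored under the None key
    ops.setdefault(tensor.func, {})[args or None] = result
    return cache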

In [28]:
existing_cache = add_to_cache(result1, 5)
existing_cache

{'350628696': {<function syft.core.tensor.autodp.gamma_tensor.GammaTensor.__add__.<locals>._add(state: 'dict') -> 'jax.numpy.DeviceArray'>: {None: 5}}}

In [29]:
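def get_cached(tensor: GammaTensor, cache: dict = existing_cache, *args):
    try:
        # Cache hit: return the stored result without spending PB again
        return cache[tensor.id][tensor.func][args or None]
    except KeyError:
        # Cache miss: the regular publish method would run here;
        # a random integer stands in for the published result
        result = np.random.randint(0, 100)
        # Unpack args so they are stored under the same key used for lookup
        add_to_cache(tensor, result, cache, *args)
        return result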

In [None]:
get_cached(tensor=result1, cache=exi

In [13]:
a

{'350628696': {<function syft.core.tensor.autodp.gamma_tensor.GammaTensor.__add__.<locals>._add(state: 'dict') -> 'jax.numpy.DeviceArray'>: {None: array([[ 51270, 129226,  44328,   3884,  32048,  54170,   6310,  63006,
            38630,  65696],
          [  8684,  78568,  19320,   6466,  73878, 117260,  63082,  42474,
            72532,  53910],
          [ 18244,  90688,  89014,   2332, 110854,  54640, 116584,  10252,
            12054,  70168],
          [ 22180,  17792,  97244, 100328, 116056,  65460, 122800,  89864,
            49124,  75678],
          [   902,  91872,  87048, 116368,  81546, 128820,  33072,  90524,
           100392,  10266],
          [104316,  46498,   4956,  44096,  30010, 125550,  63220,  58930,
            68724,  19348],
          [ 14462,  70080,  93584,  60556, 119134, 104570,  47404,  11406,
             6516,  14574],
          [ 49734, 106710,  27500,  25838,  14506,  90338, 110034,  48154,
           108196,  70442],
          [110890,  75424,   983

In [14]:
result1.value

array([[ 51270, 129226,  44328,   3884,  32048,  54170,   6310,  63006,
         38630,  65696],
       [  8684,  78568,  19320,   6466,  73878, 117260,  63082,  42474,
         72532,  53910],
       [ 18244,  90688,  89014,   2332, 110854,  54640, 116584,  10252,
         12054,  70168],
       [ 22180,  17792,  97244, 100328, 116056,  65460, 122800,  89864,
         49124,  75678],
       [   902,  91872,  87048, 116368,  81546, 128820,  33072,  90524,
        100392,  10266],
       [104316,  46498,   4956,  44096,  30010, 125550,  63220,  58930,
         68724,  19348],
       [ 14462,  70080,  93584,  60556, 119134, 104570,  47404,  11406,
          6516,  14574],
       [ 49734, 106710,  27500,  25838,  14506,  90338, 110034,  48154,
        108196,  70442],
       [110890,  75424,   9830,  36204,  35372,  93560, 103154, 126180,
        114354,  15868],
       [ 22478, 103662, 130246,  11902, 108970,  66702,  24358,  48586,
         83858, 124150]])

In [None]:
def get_cached(