Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cache mechanism to graph nodes #35

Merged
merged 4 commits into from
Oct 6, 2022
Merged

Add cache mechanism to graph nodes #35

merged 4 commits into from
Oct 6, 2022

Conversation

nvaytet
Copy link
Member

@nvaytet nvaytet commented Oct 5, 2022

Fixes #27

Small benchmark:

import plopp as pp
from plopp.widgets import SliceWidget, slice_dims, Box
from plopp import figure, input_node, widget_node, node, Node, View
import scipp as sc

N = 10000
M = 10000
xx = sc.arange('x', float(N))
yy = sc.arange('y', float(M))
da = sc.DataArray(data=xx, coords={'x': xx})
db = sc.DataArray(data=yy, coords={'y': yy})


class DataView(View):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.data = None

    def notify_view(self, message):
        node_id = message["node_id"]
        self.data = self.graph_nodes[node_id].request_data()


a = input_node(da)
b = input_node(db)

c = node(lambda x, y: x * y)(x=a, y=b)

nodes = []
views = []
for i in range(20):
    nodes.append(node(lambda x: x[x.dims[-1], 0])(x=c))

for j in range(2):
    for n in list(nodes):
        for i in range(2):
            nd = node(lambda x: x * (i + j))(x=n)
            nodes.append(nd)
            views.append(DataView(nd))

print(views[-1].data)

a.notify_children(message='hello from a')

print(views[-1].data)

Before:
real 0m17.526s
user 0m28.001s

After:
real 0m1.117s
user 0m1.638s

I saw no difference in RAM usage, but maybe we can think of an example where more RAM would be used?
In all the ones I tried, references to large data were being kept by either the views/figures, or the children nodes themselves.

@nvaytet nvaytet marked this pull request as ready for review October 5, 2022 19:48
@@ -19,6 +19,8 @@ def __init__(self, func, *parents, **kwparents):
self.kwparents = dict(kwparents)
for parent in chain(self.parents, self.kwparents.values()):
parent.add_child(self)
self._data = None
self._needs_update = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could do away with _needs_update and use _data is None to indicate that the data is out of date. This would prevent accidental use of outdated data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice suggestion thanks.

@nvaytet nvaytet merged commit 10d14f0 into main Oct 6, 2022
@nvaytet nvaytet deleted the node_caching branch October 6, 2022 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement caching in graph nodes
2 participants