Duplicate edges in signal graph #46
Yes, you are correct. Although classically this isn't much of an issue: the need to use a getter/accessor to create dependencies and explicit setters to trigger updates means native built-in methods don't work anyway. MobX and S.js have specialized array objects with their own versions of these methods, so it typically isn't a problem, since those specialized methods account for this. That being said, especially with Solid's state proxies, a user can use native methods, and since they work, not realize they are suboptimal. So having more protection here is definitely a consideration.

I will likely be introducing a series of function operators to perform simple operations like map, filter, sort, etc. It's the same motivation I had for having a Control Flow DSL to abstract away the complexity of handling the reactivity. I'm in the process of changing Control Flow into normal components, which means I have to build the basis for this new set of function operators. Nothing as extensive as RxJS, but a similar sort of thing.

The reactive system here is basically Adam Haile's S.js, very minimally modified to add a Context API. I think I might probe him further on this decision in terms of tradeoffs.
S.js is also "broken" in the same way (it works, but creates a lot of unnecessary edges), so specialized array methods won't solve the problem. And MobX is using a `Set` internally to deduplicate dependencies.
It doesn't solve people accidentally doing something inefficient. More that libraries like S.js and MobX offer methods like `map`, `filter`, and `sort` on their observable arrays that account for this.
But you need to create a graph with all dependencies, otherwise what is the point of using this library if it won't be able to propagate changes? It is not about arrays at all; it should build a graph by observing all values involved in a data transformation algorithm (in the sorting example, the values read by every comparison).
maybe of interest: https://github.com/Riim/cellx
@leeoniya It is using linear search to check for duplicates 🤦‍♂️ https://github.com/Riim/cellx/blob/b32f5cc6aab8f112c70ddc027e5ad459ffdfef7c/src/Cell.ts#L440-L442
if …
@leeoniya I don't understand what type of use cases this benchmark tries to test; I haven't seen any real use cases like this. What I see really often is taking a list of N items, performing some filtering, and displaying ~10 items.
@localvoid wouldn't the example below solve the issue you explain?

```js
import { sample } from "solid-js"; // reads inside sample() are not tracked

let i = 0;
class Item {
  _a = Math.random();
  get a() {
    i++; // count how many times values are read during the sort
    return this._a;
  }
}
const items = [];
for (let n = 0; n < 5000; n++) {
  items.push(new Item());
}
sample(() => items.sort((a, b) => a.a - b.a));
console.log(i);
```
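For scale (back-of-envelope arithmetic, not a measurement): sorting 5000 items takes roughly N·log2(N) ≈ 61,000 comparisons, and each comparison reads two values, so `i` lands in the low hundreds of thousands, consistent with the ~110,000-edge figure quoted in this issue.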
@luwes And why should I use this library if it won't be able to propagate changes through computeds?
Right, I imagined the `sample` approach. I created a codesandbox here: https://codesandbox.io/s/empty-smoke-wyins. Would that be a solution, to use a `sample` around the sort?
@luwes yes, it solves the issue, but in this example you are using linear search to check for duplicates, so this computed function will have quadratic complexity. Vue and MobX are using `Set`s to deduplicate dependencies.

EDIT: actually, your example only solves it partially; you'll still get a lot of unnecessary directed edges from the observables to the computed.
The easiest way to do the sort is to use `sample`. That being said, just because specific scenarios are solvable doesn't mean it's intuitive or should be that way. I am interested in seeing if we can do better here and what the tradeoffs are. I think this is a valid concern. S.js was mostly optimized over the 2017 time period, when `Set`/`Map` performance was considerably worse than it is today.
I don't usually find it productive to respond to Boris, but for Ryan's benefit and anyone else interested, I benchmarked this design decision pretty thoroughly back when I made it. De-duping edges during creation requires a hashtable or Set, and at the speeds we want, those are both slow. Or at least they were when I was doing the benchmarking: at the time, a basic read ran about 300% slower. The relevant metric isn't the number of edges but the number of reads. It was considerably faster overall to create a duplicate edge than to detect whether it was a duplicate. That was still true even if I accounted both for the time to traverse the extra edges during update and for the extra GC time of creating/freeing the larger arrays of dependencies. In fact, this, plus the fact that the S data structures were designed not to need try/finally statements in the update loop, was where S got most of its speed over other reactive libraries.

@leeoniya it was a while ago that I looked at cellx. At the time, the try/finally statements in its core loops gave S a considerable advantage. That was before TurboFan made try/finally optimizable, though, so the picture may have changed since then.
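A minimal sketch of the tradeoff being described, with illustrative names (`observers`, `sources`, `observerSet` are stand-ins, not the actual S.js internals):

```js
// Hot path when a computation reads a signal.

// Variant A (what S does): always append. May record a duplicate edge,
// but each read is just a couple of array pushes.
function readAppend(signal, computation) {
  signal.observers.push(computation);
  computation.sources.push(signal);
  return signal.value;
}

// Variant B: deduplicate on the way in. Pays for a hashtable probe on
// every single read, which is what benchmarked dramatically slower.
function readDedup(signal, computation) {
  if (!signal.observerSet.has(computation)) {
    signal.observerSet.add(computation);
    computation.sources.push(signal);
  }
  return signal.value;
}
```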
If someone doesn't understand how bad it is, here are conservative estimates of how much memory it would consume each time an observable property is modified:

In these estimates I didn't take into account that arrays grow dynamically (probably by a factor of 1.5 or 2), so it would consume even more memory. And if someone wants to benchmark this correctly, array presizing should be taken into account[1], because in a real application the allocation sites for these arrays will produce arrays of different sizes.
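One illustrative back-of-envelope figure (my arithmetic, using the edge count quoted in this issue): if each edge occupies one slot in the observable's observer list and one in the computation's source list, then at 8 bytes per pointer the ~110,000 edges from a single 5000-item `sort()` come to roughly 110,000 × 2 × 8 ≈ 1.8 MB of array slots per change round, before any growth-factor overhead.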
Tell me again why we should make our libs slower. Tell me again why we should optimize for a troll's hypothetical worst cases rather than profiling actual apps. How would ivi know the sort was stale? Oh that's right, it doesn't; it punts the whole issue back to the user. Total fail.
@adamhaile it seems that you are still more interested in microbenchmarks than in finding solutions to problems. Recently I tried to solve a use case that is much easier to solve with fine-grained observables and autotracking (MobX). I understand all the tradeoffs, and I am willing to sacrifice some performance to significantly improve DX by using MobX; I just checked how Solid/S.js handles the same use cases and then reported this issue. I don't understand what ivi has to do with any of this.
Assuming good intent ... The method @ryansolid was alluding to looks something like the following. It's not a general-purpose solution, just a sketch. [Warning, untested or even parsed code below.]

```ts
import S from "s-js";

function sortBy<T, V>(array: () => T[], by: (a: T) => V, cmp?: (a: V, b: V) => number) {
  return S(() => {
    const _array = array(),
      len = _array.length,
      indices = new Array(len) as number[],
      vals = new Array(len) as V[];
    for (let i = 0; i < len; i++) {
      indices[i] = i;
      // extract values to base sort upon, thereby avoiding creating
      // O(N log N) dependencies during the call to sort()
      vals[i] = by(_array[i]);
    }
    const _cmp = cmp
      ? (i: number, j: number) => cmp(vals[i], vals[j])
      : (i: number, j: number) => (vals[i] < vals[j] ? -1 : vals[i] > vals[j] ? 1 : 0);
    // run sort() inside sample so the comparisons register no reads
    S.sample(() => indices.sort(_cmp));
    const sorted = new Array(len) as T[];
    for (let i = 0; i < len; i++) {
      sorted[i] = _array[indices[i]];
    }
    return sorted;
  });
}
```
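Hypothetical usage of the sketch above, assuming S.js's documented `S.root`/`S.data` API:

```js
S.root(() => {
  const items = S.data([{ name: "b" }, { name: "a" }, { name: "c" }]);
  const sorted = sortBy(items, item => item.name);
  // The array read is tracked; the O(N log N) comparisons run untracked
  // inside sample.
  S(() => console.log(sorted().map(i => i.name))); // ["a", "b", "c"]
  items([{ name: "d" }, { name: "a" }]); // logs ["a", "d"]
});
```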
Yes, this solution solves the sorting problem, and I understand how to create such workarounds because I know how everything works underneath, but I wouldn't expect an average developer to understand the inner workings of something like MobX/Vue/S.js. The problem is that it doesn't provide a general-purpose solution. For example, a use case that highlights entries matching a list of rules is easy to solve in MobX/Vue/S.js with a naive algorithm that has O(N·M) complexity (N entries, M rules), and it won't be so easy to create a workaround that deals with duplicated edges in such a use case (it is possible, but then I just don't see the point in using fine-grained+autotracking, and I'll just solve it much more efficiently with indexes).
Yes, I sometimes feel that was the greatest shortcoming of these libraries historically: the need for specialized solutions to optimize more complicated scenarios. This was true when I used Knockout 9 years ago and hasn't really changed. It was always possible to do something inefficiently if you are not careful. Detecting duplicate edges seems possible, and I honestly am unclear at this point what the impact is. But if @adamhaile says that, in comparing the options, this was the more performant one, then maybe it's the cost of doing business. To me, the need for specialized helper methods isn't much different from the desire to normalize immutable data on identifiers. You could skip that and store arrays, but it might not scale.

So I don't know what to tell you. This doesn't automagically solve all problems, and at certain points it's better not to attack certain problems fine-grained and to use diffing instead. You know that our list reconciliation is based on the algorithm you use in ivi. In theory, if we are mutating state, we know exactly what has changed, since it is localized compared to a big immutable tree, so we should be able to avoid diffing at all; but since we can't guarantee people do that, the generalized reconciler still diffs. Since the DOM is the bottleneck, it doesn't seem to matter anyway, even if it is less efficient.

The thing that attracts people to fine-grained is the same set of things that attract people to React Hooks: declarative data blocks, composability, extendable primitives. Except without the Hook Rules or the inconsistent abstraction (with Hooks you are declaring your data requirements and view, but the code actually re-executes every re-render).
With MobX I can use basic array methods and everything just works.
I experienced the same with Knockout, but it was in a scenario for which S (and most of the modern libs) do have a general solution. Aka something "has really changed" here :). Knockout did naive depth-first updates, and it also didn't have any facility (at the time; maybe it has gained it?) for batching observable changes. There are common use cases where that leads to O(N^2) or even O(N!) computation updates for N data updates. S and others solve that by being transactional and glitchless: change rounds can have multiple data updates, and each affected computation runs only once per change round. That solved a ton of perf issues in real-world apps we converted to S.
It'd be interesting to see if that's still the case.

Just to be clear, it's (IMHO) a misunderstanding to say that S is supra-linear here. From the lib's perspective, the relevant 'N' is the number of signal reads, not the number of items in an upstream array. S scales linearly in both time and memory for that. Deduping edges is only a gain in scenarios (like `sort()`) where the same signals are read repeatedly within a single computation.

Did I ever show you some of the stuff we use internally about computable indices? I've never released anything, because the API isn't necessarily clean. It's a useful tool for some remaining O(N^2) cases, at the cost of breaking synchronicity -- the stuff about perf and subclocks on the S homepage is inspired by that scenario.
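A minimal sketch of the transactional "runs once per change round" behavior described above, using S.js's documented `S.data`/`S.freeze` API (my example, not from the thread):

```js
import S from "s-js";

S.root(() => {
  const a = S.data(1);
  const b = S.data(2);
  S(() => {
    console.log("sum:", a() + b()); // runs once on creation: "sum: 3"
  });
  // Two data updates in one change round: the computation re-runs once
  // with the final values ("sum: 30"), not once per write.
  S.freeze(() => {
    a(10);
    b(20);
  });
});
```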
That's where conversations with @localvoid always wind up. He's a tech sea lion. The objections aren't about what they claim to be. Objections will continue until ivi is at the top of js-framework-benchmark again.
@adamhaile I meant no offence. I recognize that S.js has progressed things significantly: the approach to batching and synchronization, and actually the approach to garbage collection as well. But for the sake of argument, that argument could still be made. I was just trying to come at it in a holistic manner, to be agreeable with @localvoid in the hope of finding some sort of common ground. I tend to agree that in most real-world scenarios this is not a thing, or it's one of those "you need to use X" situations and you move on. If my tone sounds uncertain, it is because I'm trying to keep an open mind in discussing a potential critique.

That being said, this thread probably ran its course a while ago, and where we've arrived now further emphasizes that. I do want to keep it open until another investigation can be done, but I don't think it's terribly urgent.

And yes, you did share that index approach, I believe; I'm glad you are reposting this information for others who read this thread. I believe this is the post: adamhaile/S#19 (comment). Thank you again for shedding some light on this, and I look forward to whenever you have time to do more work in this area. As you have noted, I've diverted a little off the path to support some features that I wanted, but I still love what you've done with S.js. Even if Solid has strayed a little, it's still the first reactive library I point people to, and I use it in all examples and tests for the DOM Expressions rendering library.
Oh hey, @ryansolid no offense at all! Sorry if that part came through that way. I'm glad to talk about the tradeoffs here (so long as the audience is well-intentioned ;)).
So this is cool -- I was thinking about this, and realized there's a simple test I can do which will eliminate virtually all the duplicate edges. Since new edges are added to the end of the nodes and sources lists, I just have to check whether the last edge recorded from a source is the same as the one we're adding. If so, bail out, as there's no need to add it. That takes just an array read and an equality check, much faster than a hashtable lookup. I'm going to explore it with some benchmarks.

The only cases where it wouldn't be perfect are ones where we pause the update of one computation to run another AND both read the same source AND the original reads that source both before and after the pause. The second computation would have gone onto the end of the source's `nodes` list in the meantime, so the check wouldn't see the earlier edge.

We only pause an update in two scenarios: either we encounter a read of a stale computation during an update, or a computation creates a child computation. It takes some fairly specific (aka rare) graph shapes and update patterns to make those two achieve all the conditions for a duplicate. I'm betting this will render duplicated edges virtually non-existent. It would be perfect for Boris's `sort()` scenario.

I don't think I've seen any other libs use this optimization. Might be novel ... probably not, someone probably did it in the 80s, but maybe :). It's also more memory efficient: a `Set` entry costs at least three words[1], while this reuses the arrays we already have.

1 - https://github.com/v8/v8/blob/master/src/objects/ordered-hash-table.h - 1 word pointer to object, 1 word pointer to next entry in chain, 1 word pointer to head of bucket.

PS - Ryan, feel free to close this thread whenever you feel like it.

Cool, this seems like a neat solution. Let me see how the code looks.
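A sketch of that last-edge check, with illustrative field names (`nodes`, `sources`, `value` are stand-ins, not the actual S.js/Solid source):

```js
// Called when a computation (node) reads a signal (source).
function logRead(source, node) {
  const nodes = source.nodes;
  // Edges only ever append, so a repeated read within the same update
  // almost always finds itself at the tail: one array read + one compare.
  if (nodes.length > 0 && nodes[nodes.length - 1] === node) return source.value;
  nodes.push(node);
  node.sources.push(source);
  return source.value;
}
```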
MobX is using an edge-reconciliation algorithm; it doesn't create new Sets each time a computed is re-run. It will create an array, add all non-duplicated dependencies (using a simple check of the last accessed listener id, so in some weird scenario this array could still contain duplicates), then it will diff and add/remove dependencies (at which point no duplicates remain).
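A rough sketch of that reconciliation approach as described (illustrative only, not MobX's actual code; `deps` and `observers` are assumed fields, with `observers` as a `Set`):

```js
let nextRunId = 1;
let currentRun = null; // { id, newDeps }

// Called when an observable is read during a derivation's run.
function reportObserved(observable) {
  if (currentRun && observable.lastAccessedBy !== currentRun.id) {
    observable.lastAccessedBy = currentRun.id; // cheap near-dedup per run
    currentRun.newDeps.push(observable);
  }
}

// Re-run a derivation, then reconcile its dependency edges.
function track(derivation, fn) {
  currentRun = { id: nextRunId++, newDeps: [] };
  const result = fn(); // getters inside call reportObserved()
  const { newDeps } = currentRun;
  currentRun = null;
  // Diff: drop edges no longer observed, add newly observed ones.
  // (MobX does this diff more cleverly than a linear includes() scan.)
  for (const dep of derivation.deps) {
    if (!newDeps.includes(dep)) dep.observers.delete(derivation);
  }
  for (const dep of newDeps) dep.observers.add(derivation);
  derivation.deps = newDeps;
  return result;
}
```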
I said "it will allocate ~40KB when I trigger an update"; after diffing it won't add a new entry. I've never talked in this thread about the overall memory overhead of observables, and that is the main reason why I started looking at alternative solutions: MobX is allocating dynamic strings per observable and doing a lot of other weird stuff that I don't understand (but maybe there is a reason; they have significantly more experience in this field than me).
Don't know, maybe they've tried to prevent memory leaks, but they could just assign `null`.
thanks @adamhaile and @localvoid
@luwes I think it is possible to get rid of all duplicates from observables to computeds at the cost of one additional property on observables; it can also improve reset performance (in most cases it won't be necessary to remove edges from observables to computeds).

Just an idea; maybe I've missed something, maybe it isn't worth it :)

EDIT: Also, it is possible to detect when there aren't any nested computeds by checking that …

EDIT2: …
This prevents most cases of duplicate edges, e.g. when observables are accessed in a `sort`, as explained here: solidjs/solid#46
Thanks @localvoid, I'll check this out soon!
Thanks, everyone, for this discussion. It actually ended up productive in the end. I did the simple last-edge detection here, and will look at more sophisticated methods in the future.
If I understand correctly, when a signal value is accessed, it doesn't perform a check whether the edge already exists; see `current()` reading `value`: https://github.com/ryansolid/solid/blob/8fe0c11360387a325d64e040a756585a69f6da63/src/signals.ts#L189-L192

Without such checks, use cases that involve sorting/filtering in data tables will create insanely huge graphs. For example, a simple `sort()` on 5000 items will create ~110,000 edges.