Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating observable array should be optimized #68

Closed
kmalakoff opened this issue Dec 6, 2015 · 8 comments
Closed

Creating observable array should be optimized #68

kmalakoff opened this issue Dec 6, 2015 · 8 comments

Comments

@kmalakoff
Copy link
Contributor

I added one observable array property to an object and the creation cycle is taking up large amount of time.

observable array

I create them like (I really like the idea of shallowObservables for arrays to replace asFlat since I'm using it quite a bit as I optimize):

    this.tags = observable(asFlat(options.tags || []));

I think that the creation path for observable arrays should be streamlined rather than using the generic splicing logic...nothing can observe yet, no childObserving logic is needed if the values are flat, etc.

@mweststrate
Copy link
Member

Hi @kmalakoff

  findOrCreate(path) {
    let projectNode = this.projectNode(); // this makes sure there is a root node
    let node = projectNode.find(path);
    if (node) return node;

    // this traverses splits the path and traverses the tree to find the node and will generate the full tree down to it if the path does not exist. It will create the default observable array of tags for each new node based on the name of the node and assign the passed tags.
    node = this.projectNode().findOrCreate(path, (attributes) => {
      return new LocalNode(_.extend({project: this.project, path, attributes, tags: [NAME_TO_TYPE[attributes.name] || 'module'])); // this only gets called if the node is not found. It extends properties into the class
    });
    return node;
  }

This part seems quite expensive, how / why is this triggered during the creating of the tags array? (just trying to get the complete image). Because making an object observable in itself should basically side effect free, so how is this one triggered? Does simplifying this code make a difference for the performance?

I have no problem cutting the childObserving logic out, but in general that is so cheap that I doubt whether that makes a big difference. But as said I'm a bit puzzled how adding a tag triggers all this logic?

@kmalakoff
Copy link
Contributor Author

@mweststrate I'm using chokidar to scan and sync with the filesystem and depending on the files it encounters, it adds tags to a folder node (eg. files map to tags and folders map to nodes).

Very asynchronous / event-driven..but it should be cheap because most of the time it finds an existing folder node (the find before findOrCreate so the create should only happen the first time the folder is encountered).

My hypothesis based on comparing using and not using an observable array is that this is just a scale issue that needs an optimization (1000s of nodes being created). I could have a bug somewhere and can investigate further, but the stack trace looked like it confirmed my hypothesis (I could interpreting this incorrectly!)...it looks like there is something slow in there, but I think it is probably just the overhead of doing setup (needed) and tracking (not needed), but it could be something else? (not sure if a computed calculation dependency is somehow slipping in and getting triggered)

Which part looks expensive to you?

@mweststrate
Copy link
Member

@kmalakoff Eh... wait I was reading the chart upside down, hence I was totally confused and wondering how findOrCreate is a side effect of replace :). I usually use the top down view ;). The call stack is findOrCreate > replace > splice > updateLength so that makes perfect sense. I'll look into it! I was cleaning up the code base of the core algorithm a bit anyway, so I did already spot some additional optimization opportunities (both in performance and memory consumption).

@mweststrate
Copy link
Member

@kmalakoff I've played around with short-circuiting the array initialization, see:
a6ac0c1

In my tests it was a bit faster (but not extremely significant, < 10%). Turns out the actual expensive part is defining the properties on the object. Now basically they can also be defined on the prototype, which saves in my tests 90% of the initialization time:
94151c2

Yet the down side of the latter optimization is that the indexes of the array are no longer reported as being enumerable. This the point where the abstraction of observable arrays is a bit leaky until ES6 proxies are general available; mobservable arrays tries their best to be normal arrays but are in fact objects. For most things this doesn't matter; index or length assignments, array methods etc all work correctly. However, Array.isArray will still report false on them, and Array.concat or lodash methods might not handle them correctly, so in these cases it is best to slice() them anyway before passing along.

Currently JSON.stringify and Object.keys do still work correctly, and debuggers are ok-ish thanks to the correct enumerability. But maybe this half working arrays are just confusing people and shouldn't I care too much about the correct enumerability, but just make it more clear that these aren't arrays (just like immutablejs lists aren't arrays) despite the fact that the api and syntax the same as for real arrays.

Anyway, would you mind checking both commits above to see whether this makes the performance of arrays acceptable? The first option can be safely applied without breaking anything. This branch is based on the latest transform branch that includes your commits.

EDIT: accidentally swapped the commits

@kmalakoff
Copy link
Contributor Author

Excellent! I was away from a computer yesterday and will run some performance evaluations this first thing in the morning.

As for the tradeoffs:

  • indexes of the array are no longer reported as being enumerable seems like a rare, edge case that can probably be solved another way in client code if needed
  • But maybe this half working arrays are just confusing people and shouldn't I care too much about the correct enumerability. I ran into this and read your docs which made me wonder if there is an easier way to do this. I don't have a solution and need to play around with ES6 features more to better understand the syntactic sugar limits, but it is on my list to think about (not sure if there is a way to add this functionality onto the array instance instead of returning a different type? can it be handled like in knockout where dereferencing the array gets the underlying array? - I realize these are a bit naive, but just would need to go through the thought process and see if I end up in the same place as you).

Definitely, rather than always using slice because creates a copy (this is not guaranteed to scale well with large array sizes due to memory management and the corresponding reference copying operations), I would prefer peek or value for read-only use with libraries and I would know that if it needs to be edited, it should use slice and sync it back. Obviously there is some risk to this, but it seems acceptable (perhaps in a debug mode, you try to identify modifications to peeked values if it is reported as a common source of bugs). I was a video games developer in a past life so I'm keenly aware that a slice is not free, but a peek basically is.

@kmalakoff
Copy link
Contributor Author

I've run some tests...

(1) tags are not observable - it still tracks as expensive from other parts of my code

not observable

(2) tags are observable - no optimizations yet

unoptimized

(3) tags are observable, short-circuiting the array initialization (a6ac0c1) - like you say, not much improvement

a6ac0c1f59d9e2efd03f805ecc9866d7ce4ca996

(4) tags are observable, defining the properties on the object (94151c2) - it doesn't track at all!

94151c24d74d1bd605c8f2a8e6e7ddcb98995399

So I've confirmed like you discovered (4) has removed the expensive part and updateLength / spliceWithArray doesn't even track as significant. I'm going to use this version until it is released and start using transformations today!

Thank you for looking into and optimizing this!

@mweststrate
Copy link
Member

Merged this back to feature/transform and released it as 1.1.6 (without the transform feature). Api changes:

  • Introduced mobservable.fastArray(array), in addition to mobservable.observable(array). Which is much faster when adding items but doesn't support enumerability (for (var idx in ar) .. loops). Fast arrays are by default initialized with asFlat.
  • Introduced observableArray.peek(), for fast access to the array values. Should be used read-only.

@kmalakoff
Copy link
Contributor Author

Excellent! Thank you so much...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants