
Memory consumption explodes with large number of nodes. 40MB of data becomes ~16GB in MST #1683

Open
3 tasks done
techknowfile opened this issue Apr 7, 2021 · 24 comments
Labels
brainstorming/wild idea, breaking change

Comments

@techknowfile

techknowfile commented Apr 7, 2021

Bug report
Memory consumption explodes with large number of nodes in MST.
2MB of data -> ~600MB in MST
40MB of data -> ~16GB in MST

  • I've checked documentation and searched for existing issues
  • I've made sure my project is based on the latest MST version
  • I've provided a minimal reproduction

Sandbox link or minimal reproduction code
Github Repo
CodeSandbox fails long before being able to show anything meaningful. Clone repo > yarn install > yarn start

Describe the expected behavior
I'm not sure what the expected overhead of mobx-state-tree is. I've found several posts that mention that mobx-state-tree was never intended to handle hundreds of thousands of nodes, and also see that a couple developers have improved upon the performance for their own use cases. Even still, I'm surprised that 40MB of data could possibly take up 16GB when processed into MST.

Describe the observed behavior
Memory usage grows far out of proportion to the data size, and MST is unable to scale to hundreds of thousands of nodes; the growth looks like a memory leak.

I'll continue to update this bug report with a reproduction of the issue over the next few days, but wanted to get this posted to get the discussion started.

@techknowfile
Author

I can't make a CodeSandbox for this, because CodeSandbox stops functioning at around 10k nodes.

@stewartlord

@techknowfile Does the problem show at 10k nodes, or only when you get into higher numbers? Maybe it's enough to demonstrate the issue with fewer nodes?

@techknowfile
Author

techknowfile commented May 18, 2021

@stewartlord CodeSandbox failed before I could show anything meaningful, so I made a minimal reproduction in a github repo, instead. https://github.com/techknowfile/mobx-vs-mst (requires using Chrome as browser)

I decided to try using MobX instead of Mobx-State-Tree, but first needed to confirm that I wasn't going to have the same issue as MST.

I noted that:
100,000 plain javascript objects = ~10MB
100,000 MobX objects = ~100MB
100,000 mobx-state-tree objects = ~600MB

Moreover, I can add about 2.9 MILLION MobX objects before crashing the browser, and can add them very quickly.
mobx-state-tree crashes the browser before reaching 500,000, and takes longer and longer to add each successive node.

I was honestly surprised to see that even MobX has a 10x memory footprint over plain javascript.
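Numbers like these can be sanity-checked outside the browser. The following is a rough heap-cost sketch for Node.js, not a rigorous benchmark; exact figures vary by engine, GC timing, and flags, and the `measure` helper is invented for illustration:

```javascript
// Rough per-object heap cost in Node.js, mirroring the browser comparison above.
// Run with `node --expose-gc` for more stable numbers. Figures are estimates only.
function measure(label, makeObject, count = 100_000) {
  global.gc?.(); // no-op unless --expose-gc is set
  const before = process.memoryUsage().heapUsed;
  const items = [];
  for (let i = 0; i < count; i++) {
    items.push(makeObject(i));
  }
  const after = process.memoryUsage().heapUsed;
  const mb = ((after - before) / 1024 / 1024).toFixed(1);
  console.log(`${label}: ~${mb} MB for ${count} objects`);
  return items;
}

const plain = measure('plain objects', (i) => ({ id: i, value: i * 2 }));
```

The same harness can wrap MobX or MST constructors to reproduce the 10x/60x ratios reported above.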

@stewartlord

Thanks for sharing your numbers @techknowfile! I don't know enough about MobX or MST internals to speculate as to why, but it's helpful just to know the cost of having many entries and that MobX is more efficient at this than MST. Maybe one of the maintainers knows where it is being spent?

@HaveF
Contributor

HaveF commented May 21, 2021

@techknowfile very interesting. Regarding the mobx-state-tree part, I believe you've misunderstood the usage of create.

https://github.com/techknowfile/mobx-vs-mst/blob/main/src/mst-models/RootStore.js#L44

    addMetricItem(json) {
      // let item = MetricItem.create({ id: json.id, value: json.value });
      self.metricItems.set(json.id, { id: json.id, value: json.value });
    },

You can try this.

create is only meant for creating the root model; if you call it for every item, it will not leverage MST's structural-sharing ability.

@stewartlord

@HaveF is this documented anywhere? Can you elaborate on the ‘share structure’?

@techknowfile
Author

@HaveF Thank you, but that doesn't change anything in terms of this issue specifically (I was only using that in the sandbox to reproduce, but in my actual application mst-gql is handling the creation)

@HaveF
Contributor

HaveF commented May 25, 2021

@stewartlord I forget where I saw it, but I remember reading similar wording about shared structure. Maybe it works like JS prototypes? I don't know the details.

@techknowfile Maybe you can use a function like getSnapshot?
Create a root model and, whenever an mst-gql model is created or updated, send the snapshot to the root model?

@techknowfile
Author

@HaveF I can't use mst-gql, as it will automatically load the data into MST and therefore explode the memory

@HaveF
Contributor

HaveF commented May 25, 2021

I just viewed the mst-gql docs.

Maybe you need to set up the right ModelBase, XXXModel, and RootStore.

@techknowfile
Author

@HaveF

create is only meant for creating the root model; if you call it for every item, it will not leverage MST's structural-sharing ability.

I don't think this is true. You can see that mst-gql's merge function calls .create() for every model instance
https://github.com/mobxjs/mst-gql/blob/7f2933a538abef815f9bffbff8ec04142920314c/src/mergeHelper.ts#L28

@HaveF
Contributor

HaveF commented May 26, 2021

Besides your 10k nodes repo, do you have any other mst-gql example which consumes lots of memory?

I don't know how mst-gql's merge works, but it seems to be a method used only when combining mst-gql with other GraphQL clients.
I do suspect there is some limitation on it, such as not using too many raw GraphQL queries.

@weglov

weglov commented Jun 9, 2021

I'm experiencing the same problem; in fact, I even prepared a small codesandbox example: https://p968x.csb.app/.
You can see how fast the JS heap grows when you add 3k books to the store.
I'm still looking for a way to optimize it somehow.

@techknowfile
Author

techknowfile commented Jun 25, 2021

@weglov I'm currently having to use a hacky solution, where I'm using both pure MobX with a structure that imitates mobx-state-tree and mobx-state-tree itself. Using MobX for the objects that I have 100,000's of, and mst/mst-gql for other parts. I replicated the mst-gql merge function in my MobX root store to be able to parse json responses into the normalized store, and then resolve references that go from MobX objects to mobx-state-tree nodes.
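The upsert-by-id idea behind such a merge function can be sketched without any library. All class and method names below are hypothetical, loosely modeled on mst-gql's merge behavior of reusing existing instances instead of creating a new node per payload:

```javascript
// Hypothetical normalized store with an mst-gql-style merge (upsert by id).
// Reusing the existing instance keeps references stable and avoids the
// per-payload node creation that drives up MST's memory cost.
class NormalizedStore {
  constructor() {
    this.items = new Map(); // id -> instance
  }

  merge(json) {
    let item = this.items.get(json.id);
    if (!item) {
      item = { id: json.id };
      this.items.set(json.id, item);
    }
    Object.assign(item, json); // update in place
    return item;
  }
}

const store = new NormalizedStore();
const first = store.merge({ id: 'a', value: 1 });
const second = store.merge({ id: 'a', value: 2 });
console.log(first === second); // true: same instance, updated in place
console.log(store.items.size); // 1
```

In a real MobX version, `item` would be made observable on first insert; the upsert logic is unchanged.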

I find that even the 10x increase in footprint when loading into mobx is also a bit excessive, and has me wondering if I should ditch these libraries entirely and roll my own... but I love how they work. And the more I dig into the internals, the more impressed I become.

@k-g-a
Member

k-g-a commented Jul 6, 2021

There is a problem with memory consumption in MST. I did some research in the past.
There are three ways of reducing the footprint:

  1. The first is the "selfless model", which would allow us to store actions on the prototype rather than on each instance. This would save tons of closures and speed up execution as well, but the total memory footprint reduction is only ~20%.
  2. The second is rethinking the references API; it's hard to recall all the details at the moment, but in general:
  • the current implementation relies on identifiersCache, which consumes a lot of memory by itself;
  • identifiersCache is filled on observable instance creation, which means that if there is a single types.reference in your tree, lazy initialization is disabled for the whole subtree from the root up to that node;
  • to make things work as expected, tons of internal hooks are bound/called, which also adds to the memory consumption.
    I've proposed using special types (types.arrayReference/types.mapReference) - moving to those is a breaking change, but it would allow us to remove identifiersCache completely (which also gives us a chance to make identifiers mutable).
  3. There are unnecessary didChange subscriptions which ought to produce patches/snapshots but could be omitted if there are no onPatch()/onSnapshot() handlers up the tree. I have not investigated, but it seems those could save some % too.
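The "selfless model" trade-off in point 1 can be sketched in plain JavaScript; the names below are illustrative, not MST internals:

```javascript
// Per-instance closures: every node allocates its own function objects.
// This is roughly the shape MST's actions({...}) with `self` produces today.
function makeClosureNode(id) {
  const self = { id };
  self.setId = (next) => { self.id = next; };
  return self;
}

// Prototype-based ("selfless") actions: one function object shared by all nodes.
class ProtoNode {
  constructor(id) { this.id = id; }
  setId(next) { this.id = next; }
}

const a = makeClosureNode(1);
const b = makeClosureNode(2);
console.log(a.setId === b.setId); // false: two closures, two allocations

const c = new ProtoNode(1);
const d = new ProtoNode(2);
console.log(c.setId === d.setId); // true: one shared function on the prototype
```

With hundreds of thousands of nodes and multiple actions per model, those per-instance closures add up, which is where the estimated ~20% saving comes from.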

Making all the refactorings mentioned above will also lead to great ObjectNode/ScalarNode simplification and further micro-optimizations.

It's hard to judge without measurements, but I assume that we could reduce MST memory footprint to about x1.2-x1.5 of pure mobx - which seems like a reasonable tradeoff for the benefits.

Unfortunately, the amount of work to be done is quite huge, tightly coupled, and spread across ~60-80% of the codebase. There is also a risk that the performance gains will not be as positive as assumed. Additionally, even if everything goes as planned, there are several breaking changes.
Taking all of the above into consideration, I'm not sure how to implement those changes. There are three possible approaches:

  1. a new branch with complete rewrite and separate release cycle (like mobx@4/mobx@5); provide migration guide; drop current version support and so on...
  2. a fork + new package (i.e. mobx-state-tree-lite);
  3. a new entry point in current package, this could come in two flavors:
    a. import { types, ... } from 'mobx-state-tree/lite';
    b. import { typeFactories_or_whatever_we_name_it } from 'mobx-state-tree';

I'd prefer 3.a (it has the drawback of poor Node "exports" support by tooling atm) or 3.b - but either will lead to a significant package-size increase for the period in which new and old code co-exist (which is not a concern for tree-shaking-aware developers). It can also cause some misunderstanding about which should I use?, can I mix them? and so on.

Would be great to hear @jamonholmgren thoughts.

@techknowfile
Author

techknowfile commented Jul 9, 2021

Thank you so much for the thorough response, @k-g-a!

When I realized the memory issue initially, I first created a hacky class-based solution for MobX that replicated the normalized RootStore structure of mst-gql for the object types that I had a lot of (hundreds of thousands), and then hackily combined my mst root store to my MobXRootStore. This was done primarily to confirm that pure MobX would solve our problem. As stated above, it did of course resolve the memory issue.

However, I also need to run my MobX data structure on the backend, and needed to get rid of this MST + MobX Frankenstein.

Really liking the way that MST built "types" instead of classes (in part because of some issues I encountered with class inheritance), I then created an oversimplified javascript implementation of mst's ComplexType builder (don't know what you actually call this design pattern), where I'm then leveraging the mst-gql-inspired .merge() function to hydrate my MobX Root Store.

So, sans-TS, sans-lifecycle, sans-identifier cache, sans-snapshot support... just instances of ComplexType containing {properties, initializers} that are used to define an "ObjectNode"... the memory explosion issue is back!

And it has me thinking that maybe one of the major problems with MST has been somehow replicated into my code, which is a tiny subset of what MST is.

I'd love to upload a git repo example or codesandbox with my implementation if you'd be interested in glancing through it to see where the memory could possibly be coming from (I am using the "self" closure solution to avoid binding "this"...). I also have a ton of questions regarding how MST and MobX work that I've encountered while trying to reverse engineer it, and would be very thankful to speak to one of the few people that seem to have a good grasp on the internals and underlying theory.

I'd be interested in trying to go the self-less route (on the off chance that the memory savings turn out to be better than 20%), though frankly I didn't realize that .prototype even existed for per-type object nodes... my understanding of JavaScript isn't super strong.

@k-g-a
Member

k-g-a commented Jul 9, 2021

I'd love to upload a git repo example or codesandbox with my implementation if you'd be interested in glancing through it to see where the memory could possibly be coming from

Yep, git repo example would be great to look at.

@techknowfile
Author

techknowfile commented Jul 9, 2021

@k-g-a

Repo: https://github.com/techknowfile/mobx-cube-example

Here's a simple example using my MobX store implementation. If you run yarn install; yarn start, open the Chrome DevTools Memory tab, and then click the "Add 10000 Foos" button, you can watch the footprint skyrocket.

You'll notice that my implementation of props is naive and a little broken. I'm using a props initializer which first creates empty props and makes them observable, and then as part of the ObjectNode constructor the "snapshot" is parsed into the properties. The whole thing is a little silly right now, but I couldn't find where mobx-state-tree was setting the properties on the ObjectNode instances.

Anyway, instead of the data being loadable via store.create(SNAPSHOT), all data is loaded into the RootStore using store.merge(json).

Edit: I also added a "Bar" type that doesn't have any actions or views. The memory footprint is smaller, but still much larger than pure mobx.

@k-g-a
Member

k-g-a commented Jul 12, 2021

@techknowfile , thanks for the repo, it's a great starting point for the investigation!
I could not find anything worthwhile at a glance, but I will dig deeper this week. It would be great if you could add a "pure mobx" implementation to compare side-by-side.

@techknowfile
Author

techknowfile commented Jul 13, 2021

@k-g-a Done! That repo now has very simple models implemented with (1) my mst-like solution, and (2) a class-based mobx solution. If you add views/actions to them, you'll notice the memory delta becomes even more significant. Another thing to note is how very fast it is to add models to the class-based store, and how slow it is to add to the mst-like store (exactly the same issues we faced when comparing mst itself to this class-based solution in the previously posted repo).

[Screen recording: Kapture 2021-07-13 at 14 46 39]

@akatechis akatechis mentioned this issue Sep 2, 2021
@eseQ

eseQ commented Sep 23, 2021

  1. a new branch with complete rewrite and separate release cycle (like mobx@4/mobx@5); provide migration guide; drop current version support and so on...

It sounds like the best option. For MST users, migration would look like changing the default map and array to mapReference and arrayReference.

the amount of work to be done is quite huge, coupled and spreads over ~60-80% of the codebase
That's a real obstacle, but this issue is of unquestionable importance.

@BrianHung
Contributor

I've proposed to use special types (types.arrayReference/types.mapReference) - moving to those is a breaking change, but will allow us to remove identifiersCache completely (which also gives us a chance to make identifiers mutable).

Is there some code that I can adapt in user land? I'm currently creating thousands of nodes (e.g. imagine copying and pasting a spreadsheet) and am thinking about performant solutions, since I keep all nodes in a map with normalized ids.
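One user-land direction, sketched in plain JavaScript (all names here are illustrative, not an MST API): keep nodes in a normalized Map and resolve references lazily on access, rather than maintaining an eager identifier cache:

```javascript
// Normalized node registry with lazy reference resolution.
// No per-node bookkeeping happens until a reference is actually read.
const nodes = new Map(); // id -> node

function addNode(node) {
  nodes.set(node.id, node);
  return node;
}

// A "reference" stores only the id; the target is looked up on access.
function makeRef(id) {
  return {
    get current() {
      return nodes.get(id);
    },
  };
}

addNode({ id: 'cell-1', value: 42 });
const ref = makeRef('cell-1');
console.log(ref.current.value); // 42
```

This is roughly the shape of the types.arrayReference/types.mapReference proposal above: references resolve through a shared map instead of forcing eager instantiation of the subtree.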

@coolsoftwaretyler
Collaborator

Hey folks - I don't have enough context here to help move this issue forward, but I am going to tag it, since there's obviously desire and room for some major improvements, but the path forward looks somewhat challenging.

I don't expect we'll get around to this any time soon, but I do hope we can find a satisfactory path forward in the coming months/year.

@coolsoftwaretyler added the brainstorming/wild idea and breaking change labels Jun 28, 2023
@ibelehai

ibelehai commented Oct 9, 2023

@weglov I'm currently having to use a hacky solution, where I'm using both pure MobX with a structure that imitates mobx-state-tree and mobx-state-tree itself. Using MobX for the objects that I have 100,000's of, and mst/mst-gql for other parts. I replicated the mst-gql merge function in my MobX root store to be able to parse json responses into the normalized store, and then resolve references that go from MobX objects to mobx-state-tree nodes.

I find that even the 10x increase in footprint when loading into mobx is also a bit excessive, and has me wondering if I should ditch these libraries entirely and roll my own... but I love how they work. And the more I dig into the internals, the more impressed I become.

Hey! I encountered the same issue where I need to combine mobx and mobx-state-tree in order to increase the performance. I would appreciate if you could share some of your findings. Thanks in advance!


9 participants