Memory leaks with oboe.drop #68
+1 When I do a regular HTTP request and take a heap snapshot on one of my pages in Chrome I get 20 MB, but when I run oboe the snapshot jumps up to 100 MB. And that is on a very small JSON object; on a large one I get up to 1000 MB.

+1 Also, awesome issue reporting.

@Amberlamps Thanks :-)

BTW… for everyone who has the same problems that we had: we used Oboe.js for transferring data using the JSON Lines protocol. For this very special use case we have written a replacement that does not suffer from the memory issue. If you have this use case too, you might be interested in our modules json-lines and json-lines-client.
So @ar7em figured this out for me. When I pushed the node data to an array, the node itself wasn't being garbage collected, even with `return oboe.drop`. However, by doing the following the memory leak disappeared:

```js
.node('{scores info}', function (node) {
  node = JSON.stringify(node);
  node = JSON.parse(node);
  resultsData.push(node);
  return oboe.drop;
})
```

Not entirely sure why this changes anything, but it reduces memory use by up to 300 MB. @jimhigson Is there any plan to fix this?
@badisa Here's my thought: I think it might be because your array stores a reference to the node, which keeps the node from being garbage collected. Doing the stringify/parse round-trip stores a copy instead, so the original node can be collected.
I am experiencing the same issue. I have a 400 MB JSON file. It contains an array of arrays. Each sub-array contains anywhere from 1 to 500 objects. There are probably 12,000-13,000 objects in total if you hypothetically flattened the arrays. Depending on how I have my server chunk it, I can read in around 4,300 of those 13,000 objects before I get an "Aw, Snap" message within Chrome. And it does this because of the same problem. I am using the drop return result, just as above, but the memory is not getting garbage collected. This is a very serious bug. Is it fixed?
I will try it now and report back. However - like you - I don't see how this would fix the problem. It also feels hacky. I'm kind of concerned about it, and I'm tempted to create a very sub-standard, low-level parser that simply nulls and deletes the values manually, just to see what it actually does and whether it's different from what oboe is doing. If that also doesn't work - or if that's what oboe is doing - then maybe there's a very bad bug in v8 itself.

Okay, I tried the above solution and it still didn't solve it :( So, so sad.
Does the format of the JSON file have anything to do with things being released? For example, I am sending JSON such as:

Should I instead send it as:

?
@egervari What exactly are you doing with the nodes? I had no memory leak until I was pushing the nodes into a scoped Angular array.

Each node (in my case) is an array, since I am trying to process an array of arrays. For the sake of clarity, let's call these partitions. What I'd like to do is put all of the objects in this partition into PouchDB. However, even if I ignore pouch altogether and simply do a `console.log(partition[0].whatever)`, it'll crash after 6,000+ objects are processed. That's just a little more than 1/3 of the objects to process. When it's processing the nodes, at first it goes really fast. Then it just slows down and keeps going slower until the "Aw, Snap" message shows up. Essentially, my oboe code is doing nothing:
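The snippet itself did not survive extraction, but from the description it likely looked roughly like the sketch below. The names (`handlePartition`, the `whatever` field) are modeled on the comment, not the real code; a `drop` parameter stands in for `oboe.drop` so the callback's shape can be shown without a live oboe stream.

```javascript
// Hypothetical reconstruction of the callback described above.
// `partition` is one sub-array of objects from the array-of-arrays file.
function handlePartition(partition, drop) {
  // Only inspect the first object; keep no reference to anything.
  console.log(partition[0].whatever);
  return drop; // in the real client: `return oboe.drop;`
}

// In the real client this would be wired up roughly as:
// oboe('/big.json').node('![*]', function (partition) {
//   return handlePartition(partition, oboe.drop);
// });
```

The point of the comment is that even this do-nothing callback, which retains no references, still ran out of memory.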
@egervari So reading more closely, it sounds like you are managing to read about 250 MB into your browser before it quits. That is quite a bit of memory to use, so there is a possibility that it is dying due to that and not because of a memory leak. Have you done what the first poster did, with the heap snapshot? Also, did you try:
Yes, I tried exactly that :) It did not work. I would say though that I am not getting 250 MB on each pass... each 'documents' variable probably has 13 MB worth of data on average, although I've tried smaller chunks too. But here's the thing: I have tried streaming and parsing one document at a time too, and it still bombs; it can just process more documents before it bombs (perhaps 3,000 more, but there are still so many left that it didn't get to). I didn't get a graph of the heap, although I saw the Buffer % in Chrome slowly go up to 100% before it crashed.

Okay, I saw the heap graph and it was the exact same - the graph you can see in Chrome.
Any news? @egervari have you solved your issue? If yes, how?

In my mind, this is a pretty important issue, because Oboe claims on its website that it can handle JSON that is bigger than the available memory. This is an awesome claim, and I think it'll be totally true after this bug is handled. I'll take a look and see what I can figure out!
@magic890 No, I never solved it, and I gave up on it. I implemented my own from scratch - it was just easier for me - and I got it to work that way.

@egervari Do you have it up on GitHub, or would you be willing to put it there? I'd love to compare what you have and what Oboe does, to try to figure out where this memory leak is.

@JuanCaicedo Mine is not a framework or anything like that - it is just something small I put directly into my project. It is not an all-encompassing solution. Regardless, it's not a personal project, so I'm reluctant to share it. Honestly, I just did the simplest possible thing: I had the server send the JSON in chunks, converted each chunk to a real JSON object when it got to the client, sent those objects to PouchDB, and then removed them from memory with null. It works for data up to 1.8 GB on Chrome, Firefox and Safari. A good tip is not to deal with 1000+ non-trivial objects at the same time; that will kill it on small devices. Chunk up the data and stream it and you will be fine. You don't need a framework/library.
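The code in question was never shared, but the approach described (chunk, parse, store, null out) can be sketched like this. `processChunk` and `saveToDb` are hypothetical names, not egervari's actual code:

```javascript
// Sketch of the chunk-at-a-time approach described above. One chunk
// arrives as a JSON string, is parsed into real objects, handed to
// storage (e.g. PouchDB's bulkDocs), and then the local reference is
// released so GC can reclaim the objects once storage is done with them.
function processChunk(jsonText, saveToDb) {
  var docs = JSON.parse(jsonText); // one chunk = one small JSON array
  saveToDb(docs);                  // e.g. db.bulkDocs(docs) with PouchDB
  docs = null;                     // drop the reference explicitly
  return docs;                     // always null
}
```

Keeping each chunk well under the "1000+ non-trivial objects" mark, as the comment suggests, bounds peak memory regardless of total file size.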
@egervari I'm trying to get some data to try to recreate your issue and some of the other ones on here. Do you have any tips on acquiring something that size? And then how to reformat it if I have it? Or were you producing your own data?
I am exporting large JSON documents intended to be put into PouchDB.
@egervari Totally understand that. I'm going to try with https://github.com/zeMirco/sf-city-lots-json. From what I can tell, it's a JSON document with a `features` property holding a large array.
My situation is about 12x worse than that, haha.
I'm working on a repo to reproduce these errors. Right now, as a sanity check, I've been able to establish that I'll have to play around with either the data or the front-end code to try to reproduce. GitHub doesn't let you upload files larger than 100 MB, so I'm going to have to find an alternate way of hosting them if it comes down to needing bigger data. Any ideas welcome!
In my case, I have a Java application using Spring running on Tomcat that serves the data.
Hmm, setting up Spring and Tomcat is a whole extra level of complexity (I can probably get it though, I come from a Java background), so I think I'll try to repro @goloroden's case first.
For what it's worth, I have some code using oboe.js to parse about 750 MB of JSON with Node.js. Without `oboe.drop`:

When I add `oboe.drop`:
@lukeasrodgers Awesome, I got the same results. I suspect there might be something else going on that's causing the problem. @Amberlamps @badisa did the two of you also have a problem with this? If so, could you share any other information that could help pin down what's happening? (i.e. what type of server you're running, how you're sending the data, and if possible what your oboe client-side code looks like). Thanks!
@egervari Do you still have a version of your oboe code you could try something on? If so, on this line of code: `}).node('![*]', function(documents) {` Could you change
@goloroden I have a suspicion your error might be because of the event notation you're using on the client side. If you check out that test repo I made, there's a branch where changing

```js
.node('!.features[*]', function (feature) {
```

to

```js
.on('node:!.features[*]', function (feature) {
```

causes Chrome to run out of memory and display an error message. By the way, once you start the server off that repo, be sure to go to http://localhost:3000/home?drop=true, which causes the client side to use `oboe.drop`.
I can't find a way to recreate this, so I'm going to close the issue. I'll wait until Feb 28 in case anyone in the thread can help me reproduce the bug 😃 @goloroden @badisa @Amberlamps @egervari @magic890 @lukeasrodgers
@JuanCaicedo For me the issue reproduces on slow connections; you can use a recent Chrome to set network throttling at 4 Mb/s. Currently I work around it with the stringify/parse trick mentioned above.
Sorry for the late answer, I'm currently investigating a few ideas and will report back… thanks so far for your help :-)
Yay, we have a result :-))) When you run the old demo code as shown in the original post, the memory leak is still there. When you change the line

```js
}).on('node:!.*', function (event) {
```

to

```js
}).node('!.*', function (event) {
```

the memory leak is gone. This is really good news, as it not only means that there is a workaround, but also that the bug can be narrowed down to the difference between the two notations. So the essential question is: what is (and why is there) a difference between `.on('node:…', …)` and `.node(…)`?
Oh wow, that's really interesting and definitely a bug! Would it be possible for you to share the code you used to profile this? Ideally, if I could clone a repo and reproduce the same results as you, I could look into this 😃
Oh, it's just the code of the original post to this issue. What I did is the following:
I am experiencing the memory leak with `.node()` as well, unless I do the stringify/parse. So it seems like it is not limited to just the `.on('node:…')` notation.
@badisa I guess it's because of your line `resultsData.addData(node);`, where you explicitly keep a reference to the node you just received. Dropping it then of course does not have an effect.
@JuanCaicedo Any insights on this?

Haven't been able to look at it, I'm hoping for some time on Saturday 😃

Don't want to be pushy, but I am curious: any news on this?

Not at all, thanks for the reminder. I've been prioritizing making a gh-pages version of the website, but that should be up soon and then I'll look into this. I'm going to assign this to myself so I'll remember it.

I'm also curious if there has been any progress on this.

Any progress on it? I still can't load big data, even using `}).node('!.*', function (event) {`.
I've also been experiencing this memory leak; however, I have found that making a copy of the node seems to provide a workaround:
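The snippet was lost from the thread, but the copy workaround matches the stringify/parse trick reported earlier. A minimal sketch, with a hypothetical `copyNode` helper:

```javascript
// Deep-copies a node via a JSON round-trip so that no reference to the
// parser-owned object is retained (hypothetical helper name).
function copyNode(node) {
  return JSON.parse(JSON.stringify(node));
}

// In an oboe callback this would be used roughly as:
// oboe(url).node('!.*', function (node) {
//   results.push(copyNode(node)); // keep only the copy
//   return oboe.drop;             // let oboe forget the original
// });
```

Note that a JSON round-trip loses functions, `undefined` values, and `Date` objects, so this only suits plain data.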
I thought I'd share an update. I'm currently the only one actively working on the project, and I've been dedicating most of my open source time towards a workshop I'm giving. I expect I should have more time to dedicate to oboe by the end of next week. My first priority after that is to improve how the tests and build processes work. Right now these things make it fairly challenging for me to work on the source code, and I think fixing them will make the issue easier to diagnose. If anyone is interested in helping me do that, especially to get to know the codebase to narrow down where this might be, I would love the help 😄
@JuanCaicedo Any updates here?

Please refer to #137 (comment)
I have seen that there already was an issue on memory leaks (it's #45), and that things should be resolved by returning `oboe.drop`, as documented. Unfortunately, 2.1.1 seems to still have memory issues (or I am getting it absolutely wrong how to use `oboe.drop` correctly).

My setup is as follows: I have a server based on Express that delivers an endless JSON array, and I have a client that uses Oboe to stream these data. I measure the memory consumption of this client, and within a few hours it uses hundreds of MBytes, and GC obviously does not clean up as expected.
The server looks like this:
The client looks like this:
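The client code is likewise missing; judging from the description, it did nothing with each object except drop it. A sketch, using a stand-in `oboe` object so the callback's contract can be shown without a live server (the real client would `require('oboe')` from npm):

```javascript
// Stand-in for require('oboe') so the callback can be exercised alone.
const oboe = { drop: function drop() {} };

// The node callback described in the issue: do nothing with the value,
// just ask oboe to discard it by returning the oboe.drop sentinel.
function onNode(node) {
  return oboe.drop;
}

// Real usage (needs oboe installed and an endless-JSON-array endpoint):
// require('oboe')('http://localhost:3000/').node('!.*', onNode);
```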
So, the server sends a new object every 10 ms, and the client should do nothing with it but drop it. I am using stethoskop to measure the client's fitness: It sends the CPU and memory data to a StatsD server.
I have run these two processes for 4.5 hours, and the memory consumption looks like this:
I also tried to run it returning `null` instead of `oboe.drop`, with the same result (the left part is the same test as above, the right part is with `null`, so basically both options show the very same behavior; the drop to 0 in the middle is not because of GC, it's because I stopped and restarted the processes).

So, to cut a long story short, basically I have two questions:
Moreover, the documentation states:
My guess is that dropping nodes only works (wrt memory consumption) for objects, but not for arrays. Since I am using an array as outer container here, and since sparse arrays seem to capture more memory than dense ones, this might be the cause of the problem. Please note that I'm not too sure about memory behavior of sparse arrays, so any answer in this direction will be appreciated.