Description
Node version: 16.17.0
Platform: Microsoft Windows NT 10.0.19043.0 x64
Sorry if the title isn't the clearest, I struggled to word it in one sentence. I also did not choose the bug report template, as I'd rather have a "blank canvas" so I can fully explain the problem. I have included relevant system information at the top of this post.
So I can't go into the specifics of the application I'm working on due to NDA. As a brief outline, we basically have a whole bunch of conditions we need to test against a whole bunch of data. These conditions are themselves generated by code. The idea is that we generate a function string, which contains the condition to test, then eval() it and assign the resulting function to an object.
So the loop of the application is:
- Generate conditions (e.g. a > b && c > d && e > f)
- Make a function string for the condition
- eval, assign to an object
- Add object to array
- Loop through array along with our data set and test each condition, store results
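The loop above can be sketched roughly as follows. This is only a minimal illustration, not the code from the real app or the gist: the names makeConditionSource, buildConditions and countMatches, the field names on the data row, and the numeric offset used to make each function unique are all assumptions.

```javascript
// Hypothetical sketch of the generate -> eval -> store -> test loop.
// All names and the exact shape of the condition are assumptions.

function makeConditionSource(id) {
  // Three comparisons joined with &&, like "a > b && c > d && e > f".
  // The offset only exists to make every function string unique.
  return `(function (d) { return d.a > d.b + ${id} && d.c > d.d && d.e > d.f; })`;
}

function buildConditions(count) {
  const conditions = [];
  for (let id = 0; id < count; id++) {
    const source = makeConditionSource(id);
    // Indirect eval evaluates the string in the global scope.
    const test = (0, eval)(source);
    // Assign the function to an object, then add the object to the array.
    conditions.push({ id, source, test });
  }
  return conditions;
}

function countMatches(conditions, row) {
  // Test every stored condition against one row of data.
  let matches = 0;
  for (const condition of conditions) {
    if (condition.test(row)) matches++;
  }
  return matches;
}

const conditions = buildConditions(1000);
const row = { a: 500, b: 0, c: 2, d: 1, e: 2, f: 1 };
console.log(countMatches(conditions, row)); // 500 (ids 0..499 satisfy 500 > id)
```

In the scenario described here, buildConditions would be called with 30,000 or more, and countMatches would run once per row of the data set.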
To give some real numbers here, I had been working my way up the number of conditions to test: 10, 100, 1,000, 10,000, 20,000... It was when I tested 30,000 conditions/functions that the issue arose. The application would start to stutter, pausing for a period of time, seemingly at "random" (it's not, I know computers aren't random). And then the stuttering turns into a complete halt. My app becomes unresponsive indefinitely; it doesn't crash, and node continues to run, using 100% of one CPU core. This all happens after the eval()s have taken place. Obviously there is a brief, expected pause of a second or so whilst it evaluates 30k functions. But this halting/stuttering happens minutes later. The app will run consistently for a while, chugging through the data at a consistent rate, and then, seemingly out of the blue, the issue will start to happen. I have left the application running overnight for 10+ hours to see if it becomes "unstuck" at some point, which it does not.
So far, this is all in the context of the application I'm working on. After trying to figure it out for about a day, I then decided to make a standalone test. Because there's a lot of other stuff going on in the app, I wanted to eliminate anything external to make sure there wasn't some sort of other issue going on in the app. I have been able to recreate the problem in this test, which is why I'm reasonably sure this issue isn't something else.
I've tried to mimic the functionality of the original app in this test, which is why you may see parts and be like "why is this needed in this test?". I just wanted to match the original as closely as possible while keeping it as simple as possible. This test performs the same patterns of behaviour as the main app, with some slight differences. Using 30k unique functions, the original app will grind to a halt at "X loops" for 10+ hours, whereas the test app will grind to a halt for 53 minutes and then continue on. But change the 30k to, let's say, 100k, and it will stop for 10+ hours. There's obviously more complexity overall in the original app compared to the test, but the behaviour is the same.
The test does 1 million "base loops"; these loops represent the data set. Each loop will have a new set of data to test. Within each of these 1 million loops, it then loops over the 30k conditions and tests each one. The 1 million number is arbitrary and doesn't matter too much, it just needs to be a high number so the app will run for a while.
Within the "base loop", there's a simple 1 second timer, to log some details each second (roughly), which for example looks like this:
Done 33 of 1000000, diff: 24, MS elsapsed: 1025
That is: done 33 of the 1 million base loops, a difference of 24 loops since the last log line, over 1025 ms. I'd just like to throw out that I think the cause of the issue is some optimising shenanigans, but that's just my guess.
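For reference, a once-per-second logger of this kind could look something like the sketch below. It is an illustration under assumptions rather than the gist's actual code: createProgressLogger and its injectable clock parameter are hypothetical, and only the log format is copied from the output above.

```javascript
// Hypothetical progress logger; the names and the injectable clock are assumptions.
// Returns a log line at most once per interval, otherwise null.
function createProgressLogger(total, intervalMs = 1000, now = Date.now) {
  let lastTime = now();
  let lastDone = 0;
  return function report(done) {
    const current = now();
    const elapsed = current - lastTime;
    if (elapsed < intervalMs) return null; // too soon, nothing to log
    const line = `Done ${done} of ${total}, diff: ${done - lastDone}, MS elsapsed: ${elapsed}`;
    lastTime = current;
    lastDone = done;
    return line;
  };
}

// Demo with a fake clock so the result does not depend on real time.
let fakeTime = 0;
const report = createProgressLogger(1000000, 1000, () => fakeTime);
fakeTime = 1025;
console.log(report(33)); // Done 33 of 1000000, diff: 33, MS elsapsed: 1025
```

Because the check only runs between base loops, a single base loop that takes 51 seconds shows up as one log line with a very large elapsed value, which matches the kind of output shown in this post.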
So we get some output every second to let us know what's going on... It chugs away for a bit, and then we get to a point when the first stutter happens:
Done 2075 of 1000000, diff: 4, MS elsapsed: 51314
A 51 second pause happens... And then interestingly, when it continues on, it is now faster...
Done 2115 of 1000000, diff: 40, MS elsapsed: 1022
And we start to get a difference of around ~40 instead of ~25. Then the app will continue on for a bit more, until...
Done 3568 of 1000000, diff: 10, MS elsapsed: 3194857
We get a 53 minute pause. In the original app, this is the point at which it will halt indefinitely (well, for at least 10 hours). And then, funnily enough, once it resumes, it's a bit faster... again
Done 3619 of 1000000, diff: 51, MS elsapsed: 1003
And we get an average of ~50 per second from now on. The test app will just go along until the end now, but the original app will not. If you play around with the test app (change the number of functions, change the complexity, etc.), you can get it to halt indefinitely. You can also get it to not halt: for example, changing the complexity of the generated function from 3 && conditions to just 1 or 2 will not cause the problem.
I've uploaded my test as a gist. It has no dependencies, and I have tried to keep it as clean as possible, with comments to help explain. This post along with the script should be enough to get the idea across.
I think it's also worth noting that when I was investigating in the original app, I tried generating a .js file instead of using eval, and then importing it using require(). Interestingly, performance on all fronts was actually much worse, and the same halting also happened. What blew my mind was that overall performance was worse... but it's just a JavaScript file running in node in the most standard way possible...
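To illustrate what the require() variant means, here is a minimal sketch under assumptions. The file layout, the writeConditionModule name and the shape of the generated functions are hypothetical and not taken from the original app; the sketch just writes the functions to a temporary .js file and loads it as a normal CommonJS module.

```javascript
// Hypothetical sketch of the "generate a .js file and require() it" variant.
// The helper name, file name and condition shape are assumptions.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

function writeConditionModule(count) {
  // Write into a fresh temporary directory to avoid clobbering anything.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'conditions-'));
  const file = path.join(dir, 'conditions.js');
  const lines = ['module.exports = ['];
  for (let id = 0; id < count; id++) {
    lines.push(`  function (d) { return d.a > d.b + ${id} && d.c > d.d && d.e > d.f; },`);
  }
  lines.push('];');
  fs.writeFileSync(file, lines.join('\n'));
  return file;
}

const generatedFile = writeConditionModule(100);
// mkdtempSync returns an absolute path here, so require() can load it directly.
const generatedConditions = require(generatedFile);
console.log(generatedConditions.length); // 100
```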
Anyway, here is the link for the test script
https://gist.github.com/NoUJoe/8a009c36d7f3633abe0f1103100408ad
If anyone has any ideas for how I can get around this for now, it would be appreciated. As a final note, I have tried different versions of node, uninstalling/reinstalling, etc. I've tried all the "basic" stuff.