VM: vm.runInNewContext with timeout got perf impact #52261
Labels
performance
Issues and PRs related to the performance of Node.js.
vm
Issues and PRs related to the vm subsystem.
Version
v20.10.0
Platform
Linux one-87-c49d8fd65-xg6j9 5.4.247-162.350.amzn2.x86_64 #1 SMP Tue Jun 27 22:03:59 UTC 2023 x86_64 GNU/Linux
Subsystem
vm
What steps will reproduce the bug?
We encountered a big performance difference in one particular scenario. Using vm script with timeout options, like:
Then do a apache ab test:
ab -n1000 -c100 http://localhost:8080/
.The programs are running in pods of 2 different k8s clusters. So I also made an image, which we can build with Dockerfile like:
the entrypointh.sh is like:
If docker is installed, you can directly test it with this command
docker run -it lellansin/test-vm-scripts:0.1
.We got:
How often does it reproduce? Is there a required condition?
Must appear on the problem machine now.
What is the expected behavior? Why is that the expected behavior?
The performance of normal machines and problem machines should not be so different.
What do you see instead?
During my troubleshooting process, I noticed that this Tuesday (2024-03-26), Node.js released the v20.12.0 version, which included a vm: fix V8 compilation cache support for vm.Script.
Then I tested the upgrade from v2.10.0 -> v2.12.0, and the new version did fix this problem. it's seem that hitting the V8 compilation cache is more of a workaround than a final solution to the current problem. With the desire to know the root cause of the problem, I still want to confirm what specifically causes this problem.
According to this code, the difference between setting timeout or not is that setting it will initialize a watchdog object. This object will initialize the event loop and corresponding thread in the constructor.
The CPU utilization on the problem machine is always below 10%. It looks like a certain resource is restricted, or the main thread of Node.js is waiting for something.
Through
ps -eLf
, I observed that the Node.js process can scale from 11 to 28 on the normal machine, and the number of threads on the problem machine is always 11.I tried to write the following libuv test code:
And found that there is not much difference between the normal machine and the problem machine.
Additional information
No response
The text was updated successfully, but these errors were encountered: