Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VM: vm.runInNewContext with timeout got perf impact #52261

Open
Lellansin opened this issue Mar 29, 2024 · 1 comment
Open

VM: vm.runInNewContext with timeout got perf impact #52261

Lellansin opened this issue Mar 29, 2024 · 1 comment
Labels
performance Issues and PRs related to the performance of Node.js. vm Issues and PRs related to the vm subsystem.

Comments

@Lellansin
Copy link

Lellansin commented Mar 29, 2024

Version

v20.10.0

Platform

Linux one-87-c49d8fd65-xg6j9 5.4.247-162.350.amzn2.x86_64 #1 SMP Tue Jun 27 22:03:59 UTC 2023 x86_64 GNU/Linux

Subsystem

vm

What steps will reproduce the bug?

We encountered a big performance difference in one particular scenario. Using vm script with timeout options, like:

const { runInContext, createContext } = require('vm');
const http = require('http');
const context = createContext({});
const server = http.createServer(async (req, res) => {
  await runInContext(
    ``, // empty code
    context, // empty context
    {
      timeout: 300, // but with default timeout
    });
  res.write('Hello, World!\n');
  res.end();
});
server.listen(8080, () => {
  console.log('Server is running on http://localhost:8080');
});

Then do a apache ab test: ab -n1000 -c100 http://localhost:8080/.

The programs are running in pods of 2 different k8s clusters. So I also made an image, which we can build with Dockerfile like:

# FROM amd64/node:21.7.1
# FROM amd64/node:16.16.0
FROM amd64/node:20.10.0
# FROM amd64/node:20.11.0
# FROM node:20.10.0

RUN apt-get update && apt-get install apache2-utils

WORKDIR /root/home

COPY . .

# generate application version
RUN TZ=Asia/Shanghai date +%Y_%m%d_%H%M > .version

# startup cmd
ENTRYPOINT [ "/bin/bash", "/root/home/entrypoint.sh" ]

the entrypointh.sh is like:

#!/bin/bash

(sleep 3 && time curl localhost:8080/) &
(sleep 3 && echo "start apache ab benchmark") &
(sleep 4 && ab -n1000 -c100 localhost:8080/) &

node /root/home/test.js

If docker is installed, you can directly test it with this command docker run -it lellansin/test-vm-scripts:0.1.

We got:

  • Total time 60s with 15 QPS: on the machine in question
    • uname: Linux xx 5.4.247-162.350.amzn2.x86_64 # 1 SMP Tue Jun 27 22:03:59 UTC 2023 x86_64 GNU/Linux
    • CPU lower than 10%
  • Total time 10s with 100 QPS: on the machine in normal
    • uname: Linux xx 5.10.184-175.731.amzn2.x86_64 # 1 SMP Tue Jun 27 21:48:55 UTC 2023 x86_64 GNU/Linux
    • CPU can reach 100%

How often does it reproduce? Is there a required condition?

Must appear on the problem machine now.

What is the expected behavior? Why is that the expected behavior?

The performance of normal machines and problem machines should not be so different.

What do you see instead?

During my troubleshooting process, I noticed that this Tuesday (2024-03-26), Node.js released the v20.12.0 version, which included a vm: fix V8 compilation cache support for vm.Script.

Then I tested the upgrade from v2.10.0 -> v2.12.0, and the new version did fix this problem. it's seem that hitting the V8 compilation cache is more of a workaround than a final solution to the current problem. With the desire to know the root cause of the problem, I still want to confirm what specifically causes this problem.

According to this code, the difference between setting timeout or not is that setting it will initialize a watchdog object. This object will initialize the event loop and corresponding thread in the constructor.

The CPU utilization on the problem machine is always below 10%. It looks like a certain resource is restricted, or the main thread of Node.js is waiting for something.

Through ps -eLf, I observed that the Node.js process can scale from 11 to 28 on the normal machine, and the number of threads on the problem machine is always 11.

I tried to write the following libuv test code:

#include <iostream>
#include <pthread.h>
#include <vector>
#include <chrono>
#include <uv.h>

using namespace std;

struct ThreadData {
    uv_loop_t* loop;
    uv_timer_t timer;
    bool timed_out;
};

void timer_cb(uv_timer_t* handle) {
    ThreadData* data = static_cast<ThreadData*>(handle->data);
    data->timed_out = true;
    uv_stop(data->loop);
}

void* threadFunc(void* arg) {
    ThreadData* data = static_cast<ThreadData*>(arg);

    data->loop = new uv_loop_t;
    uv_loop_init(data->loop);

    data->timed_out = false;
    uv_timer_init(data->loop, &data->timer);
    data->timer.data = data;

    uv_timer_start(&data->timer, timer_cb, 100, 0);

    uv_run(data->loop, UV_RUN_DEFAULT);

    uv_loop_close(data->loop);
    delete data->loop;

    return nullptr;
}

int main() {
    const int NUM_THREADS = 10;
    pthread_t threads[NUM_THREADS];
    ThreadData threadData[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; ++i) {
        if (pthread_create(&threads[i], nullptr, threadFunc, &threadData[i])) {
            cerr << "Error creating thread " << i << endl;
            return 1;
        }
    }

    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_join(threads[i], nullptr);
    }

    int timed_out_count = 0;
    for (int i = 0; i < NUM_THREADS; ++i) {
        if (threadData[i].timed_out) {
            timed_out_count++;
        }
    }

    cout << "Timed out threads: " << timed_out_count << endl;

    return 0;
}

And found that there is not much difference between the normal machine and the problem machine.

Additional information

No response

@Lellansin
Copy link
Author

The both k8s clusters normal machines and problem machines mentioned are configured with CPU overcommit.

@mertcanaltin mertcanaltin added the vm Issues and PRs related to the vm subsystem. label Mar 29, 2024
@RedYetiDev RedYetiDev added the performance Issues and PRs related to the performance of Node.js. label Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues and PRs related to the performance of Node.js. vm Issues and PRs related to the vm subsystem.
Projects
None yet
Development

No branches or pull requests

3 participants