Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient looping in PluginLoader "load" function cause new NodeTracerProvider() to take few seconds in serverless framework #1805

Closed
blumamir opened this issue Jan 10, 2021 · 4 comments · Fixed by #1807
Labels
bug Something isn't working

Comments

@blumamir
Copy link
Member

What version of OpenTelemetry are you using?

v0.14.0 (latest)

What version of Node are you using?

12

Please provide the code you used to setup the OpenTelemetry SDK

handler.js

const { NodeTracerProvider } = require('@opentelemetry/node');

console.time('new NodeTracerProvider()');
const provider = new NodeTracerProvider();
console.timeEnd('new NodeTracerProvider()');

module.exports.hello = async (event) => {
  return {
    statusCode: 200,
    body: 'Go Serverless v1.0! Your function executed successfully!',
  };
};

package.json

{
  "devDependencies": {
    "serverless-offline": "^6.8.0"
  },
  "dependencies": {
    "@opentelemetry/api": "^0.14.0",
    "@opentelemetry/node": "^0.14.0"
  }
}

serverless.yml

service: serverless-pg
frameworkVersion: '2'

provider:
  name: aws
  runtime: nodejs12.x
  lambdaHashingVersion: 20201221

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: post

plugins:
  - serverless-offline

What did you do?

run this super basic serverless offline setup with sls offline, and sent a request to the exposed endpoint: curl -X POST http://localhost:3000/dev/hello

What did you expect to see?

expect a small increase in the function Duration

What did you see instead?

offline: POST /dev/hello (λ: hello)
Some modules (@hapi/hapi) were already required when their respective plugin was loaded, some plugins might not work. Make sure the SDK is setup before you require in other modules.
new NodeTracerProvider(): 5764.306ms
offline: (λ: hello) RequestId: ckjr4lmce0005sep3fg492vr1  Duration: 5904.73 ms  Billed Duration: 6000 ms

Additional context

As you can see, calling new NodeTracerProvider(); is taken ~5 seconds (tested multiple times on my setup).
Profiling the issue shows that this section from PluginLoader in @opentelemetry/node is to taking most of the time:

      const alreadyRequiredModules = Object.keys(require.cache);
      const requiredModulesToHook = modulesToHook.filter(
        name =>
          alreadyRequiredModules.find(cached => {
            try {
              return require.resolve(name) === cached;
            } catch (err) {
              return false;
            }
          }) !== undefined
      );

This algorithm is very inefficient, running in O(requiredModules)*O(modulesToHook) which can be quite large (in the setup for the above example, alreadyRequiredModules.length is 4687 and modulesToHook.length is 14 (and expected to grow over time as more plugins are added).

The points of inefficiency are:

  1. calling the io-blocking, synchronous require.resolve(name) function from within a loop (alreadyRequiredModules.find), inside a try-catch, where the value is not expected to change over iterations.
  2. since alreadyRequiredModules is an array of strings, finding over it takes O(n) (in the above case, 4687 string comparisons per plugin), where we can directly search for the value on the keys of require.cache in O(1) with hasOwnProperty.

I'll create a PR soon with a suggestion for performence fix, which on my setup, reduces the above section runtime from ~5000ms to just 3ms

@vmarchaud
Copy link
Member

I'm guilty for this code, however i'm surpised that alreadyRequiredModules.length is that high since we are supposed to be the first module required (so require.cache should be quite small in theory).

Anyway that would explain a behavior i have with my one of my applications running at @reelevant-tech where it would take >20s to startup :/

@blumamir
Copy link
Member Author

@vmarchaud I assume that in most cases the required array is small.
In my case, the serverless framework is responsible for running the lambda and initializing otel, which happens after 4800 packages have already been loaded by the framework.

@vmarchaud
Copy link
Member

@blumamir Well we only have one or two packages loaded before otel is loaded in our case and we still saw massive slowdown when enabling otel, i guess that "most cases" is smaller than we can assume with common sense.

Thanks for investigating this anyway !

@blumamir
Copy link
Member Author

@vmarchaud I agree.
Thanks for commenting and reviewing the issue.
Could be great if you can take a look on the PR when you have time, so it can go into the next release :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants