
perf(common): faster logs by caching intl.datetimeformat #10823

Merged
1 commit merged into nestjs:master on Feb 1, 2023

Conversation


@H4ad H4ad commented Jan 7, 2023

PR Checklist

Please check if your PR fulfills the following requirements:

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Other... Please describe:

What is the current behavior?

This is the profiler information when we call the #log method 10,000 times:

[Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 1.0% are not shown.

   ticks parent  name
    958   56.8%  /home/h4ad/.asdf/installs/nodejs/12.22.12/bin/node
    763   79.6%    v8::internal::Builtin_DatePrototypeToLocaleString(int, unsigned long*, v8::internal::Isolate*)
    560   73.4%      LazyCompile: *getTimestamp /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/console-logger.service.js:104:17
    510   91.1%        LazyCompile: *log /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/console-logger.service.js:28:8
    510  100.0%          LazyCompile: *log /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/logger.service.js:36:8
    510  100.0%            LazyCompile: ~descriptor.value /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/logger.service.js:155:33
     50    8.9%        LazyCompile: *printMessages /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/console-logger.service.js:115:18
     50  100.0%          LazyCompile: ~log /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/console-logger.service.js:28:8
     50  100.0%            LazyCompile: *log /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/logger.service.js:36:8
    203   26.6%      LazyCompile: ~getTimestamp /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/console-logger.service.js:104:17
    179   88.2%        LazyCompile: ~formatMessage /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/console-logger.service.js:131:18
    179  100.0%          LazyCompile: ~<anonymous> /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/console-logger.service.js:116:26
    179  100.0%            /home/h4ad/.asdf/installs/nodejs/12.22.12/bin/node
     24   11.8%        LazyCompile: *printMessages /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/console-logger.service.js:115:18
     24  100.0%          LazyCompile: ~log /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/console-logger.service.js:28:8
     24  100.0%            LazyCompile: ~log /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/node_modules/@nestjs/common/services/logger.service.js:36:8

Source

As you can see, getTimestamp consumes most of the execution time, which doesn't make sense. I did some CPU profiling and found this:

[screenshot: CPU profile on Node.js 12]

normal_logger_nodejs12.cpuprofile.zip

The #getTimestamp method took 1ms in this test.

When we run the same test on Node.js 16 or Node.js 19:

[screenshot: CPU profile on Node.js 16/19]

The #getTimestamp method took 13ms (I think I found a regression in Node.js).

normal_logger_nodejs16.cpuprofile.zip
normal_logger_nodejs19.cpuprofile.zip

Also, decorators like WrapBuffer tend not to be optimized by V8, so this is another optimization opportunity.

Issue Number: N/A

What is the new behavior?

So, to solve the issue with #getTimestamp, I cached and reused an Intl.DateTimeFormat instance,
and for the WrapBuffer decorator, I refactored it into a plain function.
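The caching idea can be sketched like this (a minimal illustration, not the actual ConsoleLogger code; the formatter options shown are assumptions). Creating the Intl.DateTimeFormat once avoids re-resolving the locale and rebuilding the formatter on every log call, which is what Date#toLocaleString does internally:

```typescript
// Create the formatter once at module load and reuse it for every log call.
const dateTimeFormatter = new Intl.DateTimeFormat(undefined, {
  year: 'numeric',
  month: '2-digit',
  day: '2-digit',
  hour: 'numeric',
  minute: 'numeric',
  second: 'numeric',
});

function getTimestamp(): string {
  // Fast path: reuse the cached formatter.
  return dateTimeFormatter.format(Date.now());
}

function getTimestampSlow(): string {
  // Slow path for comparison: toLocaleString effectively builds a new
  // formatter on each invocation.
  return new Date().toLocaleString(undefined, {
    year: 'numeric', month: '2-digit', day: '2-digit',
    hour: 'numeric', minute: 'numeric', second: 'numeric',
  });
}

console.log(getTimestamp());
```

Both paths produce a locale-dependent timestamp string; only the cost per call differs.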

The final profiler information became:

[Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 1.0% are not shown.

   ticks parent  name
    207   45.3%  /home/h4ad/.asdf/installs/nodejs/12.22.12/bin/node
     61   29.5%    /home/h4ad/.asdf/installs/nodejs/12.22.12/bin/node
      6    9.8%      /home/h4ad/.asdf/installs/nodejs/12.22.12/bin/node
      2   33.3%        LazyCompile: ~readPackage internal/modules/cjs/loader.js:231:21
      2  100.0%          LazyCompile: ~readPackageScope internal/modules/cjs/loader.js:262:26
      1   50.0%            LazyCompile: ~trySelf internal/modules/cjs/loader.js:383:17
      1   50.0%            LazyCompile: ~Module._extensions..js internal/modules/cjs/loader.js:1008:37
      2   33.3%        LazyCompile: ~Module._findPath internal/modules/cjs/loader.js:436:28
      2  100.0%          LazyCompile: ~Module._resolveFilename internal/modules/cjs/loader.js:731:35
      2  100.0%            LazyCompile: ~Module._load internal/modules/cjs/loader.js:648:24
      1   16.7%        LazyCompile: ~stat internal/modules/cjs/loader.js:123:14
      1  100.0%          LazyCompile: ~tryFile internal/modules/cjs/loader.js:326:17
      1  100.0%            LazyCompile: ~tryExtensions internal/modules/cjs/loader.js:342:23
      1   16.7%        LazyCompile: ~realpathSync fs.js:1568:22
      1  100.0%          LazyCompile: ~toRealPath internal/modules/cjs/loader.js:335:20
      1  100.0%            LazyCompile: ~tryFile internal/modules/cjs/loader.js:326:17
      5    8.2%      LazyCompile: ~toRealPath internal/modules/cjs/loader.js:335:20
      4   80.0%        LazyCompile: ~tryFile internal/modules/cjs/loader.js:326:17
      4  100.0%          LazyCompile: ~tryExtensions internal/modules/cjs/loader.js:342:23
      3   75.0%            LazyCompile: ~Module._findPath internal/modules/cjs/loader.js:436:28
      1   25.0%            LazyCompile: ~tryPackage internal/modules/cjs/loader.js:279:20
      1   20.0%        LazyCompile: ~Module._findPath internal/modules/cjs/loader.js:436:28
      1  100.0%          LazyCompile: ~resolveMainPath internal/modules/run_main.js:8:25
      1  100.0%            LazyCompile: ~executeUserEntryPoint internal/modules/run_main.js:53:31
      5    8.2%      LazyCompile: *log /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/dist/perf/better-console-logger.service.js:47:8
      4   80.0%        LazyCompile: *log /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/dist/perf/better-logger.service.js:52:8
      4  100.0%          Eval: ~<anonymous> /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/dist/perf/perf-better-logger.js:1:1
      4  100.0%            LazyCompile: ~Module._compile internal/modules/cjs/loader.js:953:37
      1   20.0%        LazyCompile: *<anonymous> /home/h4ad/Projects/opensource/performance-test-nestjs/test-performance/dist/perf/perf-better-logger.js:1:1
      1  100.0%          LazyCompile: ~Module._compile internal/modules/cjs/loader.js:953:37
      1  100.0%            LazyCompile: ~Module._extensions..js internal/modules/cjs/loader.js:1008:37

Source

And the CPU profiling now looks like:

[screenshot: CPU profile after the change]

And the time to perform the log is now consistent across Node.js versions.

perf-better-nodejs12.cpuprofile.zip
perf-better-nodejs16.cpuprofile.zip
perf-better-nodejs19.cpuprofile.zip

Finally, about the performance improvements, this is the result running on NodeJS 16:

Logger x 7,127 ops/sec ±1.55% (87 runs sampled)
BetterLogger x 60,055 ops/sec ±1.90% (65 runs sampled)
Console x 133,689 ops/sec ±1.14% (90 runs sampled)

Console is just console.log.

That is roughly an 8x performance improvement when logging.

I didn't make any breaking changes in the log output, but if we switch from the Intl.DateTimeFormat API to just new Date().toISOString(), the benchmark looks like:

Logger x 7,165 ops/sec ±1.80% (80 runs sampled)
BetterLogger x 79,513 ops/sec ±2.74% (62 runs sampled)
Console x 140,275 ops/sec ±1.61% (88 runs sampled)

So let me know if it's worth introducing a breaking change just to get better logging performance.
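The trade-off above can be reproduced with a rough micro-benchmark sketch (process.hrtime-based rather than the Benchmark.js setup from the gist, so absolute numbers will differ from the results quoted here):

```typescript
// Compare a cached Intl.DateTimeFormat against Date#toISOString.
const fmt = new Intl.DateTimeFormat();

function bench(label: string, fn: () => string, iterations = 100_000): number {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedNs = Number(process.hrtime.bigint() - start);
  const opsPerSec = Math.round(iterations / (elapsedNs / 1e9));
  console.log(`${label}: ~${opsPerSec} ops/sec`);
  return opsPerSec;
}

bench('cached Intl.DateTimeFormat', () => fmt.format(Date.now()));
bench('new Date().toISOString()', () => new Date().toISOString());
```

toISOString is typically the cheaper of the two since it needs no locale machinery, but it changes the timestamp format users see, hence the breaking-change question.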

To reproduce all those tests, see this gist: https://gist.github.com/H4ad/da838dfc33b2060504ca980644d804b5

Does this PR introduce a breaking change?

  • Yes
  • No

Other information

@micalevisk

micalevisk commented Jan 7, 2023

Can we split those two optimizations into two separate commits (then force-push)?

@H4ad

H4ad commented Jan 7, 2023

@micalevisk Done!

@H4ad H4ad force-pushed the perf/logger branch 2 times, most recently from d1d1bbf to 65e8704 Compare January 8, 2023 16:13
@H4ad

H4ad commented Jan 11, 2023

Based on jmcdo29/ogma#1493: if we use the sonic-boom library, I think we can achieve 200k~300k ops/sec.

In ogma, we went from 100k to 400k ops/sec.

sonic-boom is part of PinoJS; it buffers the logs and then writes them directly to the file descriptor, which is faster than writing to process.stdout on every call.

Do you think it's worth optimizing?

@kamilmysliwiec

Sounds good @H4ad. Would you like to create a separate PR for this?

@kamilmysliwiec kamilmysliwiec merged commit 66a76e2 into nestjs:master Feb 1, 2023
@H4ad H4ad deleted the perf/logger branch May 24, 2023 00:48