Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf_hooks: emit user timing marks, measures and timerify to trace events #18789

Closed
wants to merge 3 commits into from

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Feb 14, 2018

Emit user timing marks/measures and performance.timerify() measures to trace events.

image

/cc @nodejs/diagnostics @mcollina

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

perf_hooks, trace_events

@jasnell jasnell added trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. perf_hooks Issues and PRs related to the implementation of the Performance Timing API. labels Feb 14, 2018
@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Feb 14, 2018
@AndreasMadsen
Copy link
Member

Didn’t review closely but tests are missing.

src/node_perf.cc Outdated
// TODO(jasnell): Once Tracing API is fully implemented, this should
// record a trace event also.
TRACE_EVENT_COPY_MARK_WITH_TIMESTAMP(
"node.perf,node.perf.usertiming", *name, now / 1e3);
Copy link
Contributor

@ofrobots ofrobots Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this is doing a floating point divide which is more expensive than an integer divide. 1000 would be an integral constant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah, good point :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed. Floating point divide + 2 conversions between integer and floating point.

#include <stdint.h>
uint64_t foo(uint64_t nanos) {
   return nanos / 1e3;
}
// clang -O3 -mllvm --x86-asm-syntax=intel -S foo.c
	movq	xmm0, rdi
	punpckldq	xmm0, xmmword ptr [rip + LCPI0_0] ## xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
	subpd	xmm0, xmmword ptr [rip + LCPI0_1]
	haddpd	xmm0, xmm0
	divsd	xmm0, qword ptr [rip + LCPI0_2]
	movsd	xmm1, qword ptr [rip + LCPI0_3] ## xmm1 = mem[0],zero
	movapd	xmm2, xmm0
	subsd	xmm2, xmm1
	cvttsd2si	rax, xmm2
	movabs	rcx, -9223372036854775808
	xor	rcx, rax
	cvttsd2si	rax, xmm0
	ucomisd	xmm0, xmm1
	cmovae	rax, rcx
	pop	rbp
	ret

@jasnell
Copy link
Member Author

jasnell commented Feb 15, 2018

@ofrobots ... nit fixed!
@AndreasMadsen ... test added.

src/node_perf.cc Outdated
env->performance_entry_callback(),
1, &object);
1, &object, {0, 0});
Copy link
Contributor

@ofrobots ofrobots Feb 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: what would the the correct context here? To me it seems that an AsyncResource or async_context be created when the observer is constructed, and that should be used on the callback. Do you agree?

Aside: Do our compilers support async_context{0,0} syntax yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still thinking about that. PerformanceObserver currently is not an AsyncResource and only actually exists on the JS side. Making it an AsyncResource would be a minor divergence from the spec but is certainly not out of the question.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I don't know about the compilers question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't want to inherit from AsyncResource, you can attache an async_context to the PerformanceObserver instances for the same effect.

I'm okay if you want to do this in a follow-on, as the issue existed before this PR, and is orthogonal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added example to #18810 in case it is useful. If the CI passes there, then that would answer the compiler question :).

@jasnell
Copy link
Member Author

jasnell commented Feb 16, 2018

@ofrobots ... just went ahead and made PerformanceObserver an AsyncResource.

@jasnell
Copy link
Member Author

jasnell commented Feb 16, 2018

@jasnell
Copy link
Member Author

jasnell commented Feb 17, 2018

@jasnell
Copy link
Member Author

jasnell commented Feb 17, 2018

CI is looking good with one unrelated failure.

@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 17, 2018
#define INTERNAL_TRACE_EVENT_ADD_WITH_ID_TID_AND_TIMESTAMP( \
phase, category_group, name, id, thread_id, timestamp, flags, ...) \
UNIMPLEMENTED()
#define INTERNAL_TRACE_EVENT_ADD_WITH_ID_TID_AND_TIMESTAMP( \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a sync with upstream?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In sync with upstream? Nope, not yet. But that reminds me...

@ofrobots ... ^^ To implement this I needed this macro implemented also.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasnell created https://chromium-review.googlesource.com/c/v8/v8/+/924507 that adds the same macro upstream.

Note that for the trace-event macros, a sync with upstream is no longer a necessity. Our copy of trace_event.h is derived from upstream, but has a slightly different implementation. It is okay for them to pick up features at a different pace. Please do continue to ping if we add more macros locally. There is no need to block on upstream.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a nit


The set of categories for which traces are recorded can be specified using the
`--trace-event-categories` flag followed by a list of comma separated category names.
`--trace-event-categories` flag followed by a list of comma separated category
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag should be named --trace-events-categories  for consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag already exists and is not introduced by this PR

Adds the `node.perf.usertiming` trace events category for recording
usertiming marks and measures (e.g. `perf_hooks.performance.mark()`)
in the trace events timeline.

Adds the `node.perf.function` trace events category for recording
`perf_hooks.performance.timerify()` durations in the trace events
timeline.
@jasnell jasnell added the semver-minor PRs that contain new features and should be released in the next minor version. label Feb 26, 2018
jasnell added a commit that referenced this pull request Feb 26, 2018
PR-URL: #18789
Reviewed-By: Ali Ijaz Sheikh <ofrobots@google.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
jasnell added a commit that referenced this pull request Feb 26, 2018
Adds the `node.perf.usertiming` trace events category for recording
usertiming marks and measures (e.g. `perf_hooks.performance.mark()`)
in the trace events timeline.

Adds the `node.perf.function` trace events category for recording
`perf_hooks.performance.timerify()` durations in the trace events
timeline.

PR-URL: #18789
Reviewed-By: Ali Ijaz Sheikh <ofrobots@google.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
jasnell added a commit that referenced this pull request Feb 26, 2018
PR-URL: #18789
Reviewed-By: Ali Ijaz Sheikh <ofrobots@google.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@jasnell
Copy link
Member Author

jasnell commented Feb 26, 2018

Landed in aca8e76, 9e509b6, and 009e418

@jasnell jasnell closed this Feb 26, 2018
@MylesBorins
Copy link
Member

Should this be backported to v9.x-staging? If yes please follow the guide and raise a backport PR, if not let me know or add the dont-land-on label.

@targos
Copy link
Member

targos commented Apr 2, 2018

This depends on #17640 which can't go in v9.x.

@targos targos added dont-land-on-v9.x and removed author ready PRs that have at least one approval, no pending requests for changes, and a CI started. backport-requested-v9.x labels Apr 2, 2018
MayaLekova pushed a commit to MayaLekova/node that referenced this pull request May 8, 2018
PR-URL: nodejs#18789
Reviewed-By: Ali Ijaz Sheikh <ofrobots@google.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
MayaLekova pushed a commit to MayaLekova/node that referenced this pull request May 8, 2018
Adds the `node.perf.usertiming` trace events category for recording
usertiming marks and measures (e.g. `perf_hooks.performance.mark()`)
in the trace events timeline.

Adds the `node.perf.function` trace events category for recording
`perf_hooks.performance.timerify()` durations in the trace events
timeline.

PR-URL: nodejs#18789
Reviewed-By: Ali Ijaz Sheikh <ofrobots@google.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
MayaLekova pushed a commit to MayaLekova/node that referenced this pull request May 8, 2018
PR-URL: nodejs#18789
Reviewed-By: Ali Ijaz Sheikh <ofrobots@google.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Trott pushed a commit to ZauberNerd/node that referenced this pull request Dec 11, 2020
This reverts commit 009e418.

AFAIU the discussion at [1], PerformanceObserver had been made to
inherit from AsyncResource more or less as a band-aid in lack of a
better async_context candidate to invoke it in. In order to enable
access to AsyncLocalStores from PerformanceObservers invoked
synchronously through e.g. measure() or mark(), the current
async_context, if any, should be retained.

Note that this is a breaking change, but
- as has been commented at [1], PerformanceObserver being derived from
  AsyncResource is a "minor divergence from the spec" anyway,
- to my knowledge this is an internal implementation detail which has
  never been documented and
- I can't think of a good reason why existing PerformanceObserver
  implementations would possibly rely on it.

OTOH, it's probably worthwhile to not potentially invoke before() and
after() async_hooks for each and every PerformanceObserver notification.

[1] nodejs#18789

Co-Authored-By: ZauberNerd <zaubernerd@zaubernerd.de>

PR-URL: nodejs#36343
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Rich Trott <rtrott@gmail.com>
targos pushed a commit that referenced this pull request Dec 21, 2020
This reverts commit 009e418.

AFAIU the discussion at [1], PerformanceObserver had been made to
inherit from AsyncResource more or less as a band-aid in lack of a
better async_context candidate to invoke it in. In order to enable
access to AsyncLocalStores from PerformanceObservers invoked
synchronously through e.g. measure() or mark(), the current
async_context, if any, should be retained.

Note that this is a breaking change, but
- as has been commented at [1], PerformanceObserver being derived from
  AsyncResource is a "minor divergence from the spec" anyway,
- to my knowledge this is an internal implementation detail which has
  never been documented and
- I can't think of a good reason why existing PerformanceObserver
  implementations would possibly rely on it.

OTOH, it's probably worthwhile to not potentially invoke before() and
after() async_hooks for each and every PerformanceObserver notification.

[1] #18789

Co-Authored-By: ZauberNerd <zaubernerd@zaubernerd.de>

PR-URL: #36343
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. perf_hooks Issues and PRs related to the implementation of the Performance Timing API. semver-minor PRs that contain new features and should be released in the next minor version. trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants