Skip to content
This repository has been archived by the owner on Jul 27, 2022. It is now read-only.

About cost of TRACE_EVENT when tracing is not enabled #494

Closed
natduca opened this issue Sep 22, 2014 · 14 comments
Closed

About cost of TRACE_EVENT when tracing is not enabled #494

natduca opened this issue Sep 22, 2014 · 14 comments
Labels

Comments

@natduca
Copy link
Contributor

natduca commented Sep 22, 2014

From wangxianzhu@chromium.org on October 21, 2013 16:39:13

The cost of TRACE_EVENT when tracing is not enabled is supposed to be very trivial just like checking the value of a single boolean variable in memory. Perhaps it should cost just like the following three instructions:
load variable -> register
compare register, 0
jmp_if_zero

However, now it seems to generate much more instructions in the disabled-path, especially when TRACE_MEMORY is enabled.

Original issue: http://code.google.com/p/trace-viewer/issues/detail?id=489

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From jamescook@chromium.org on October 21, 2013 17:19:34

nat, it looks like my implementation of TRACE_EVENT_MEMORY might create a function call into the ScopedTraceMemory constructor to see if memory tracing is enabled (or a bunch more instructions if inlined). This will happen for TRACE_EVENT{0,1,2}, basically anything that adds a scoped trace event.

It might be possible to avoid this by changing the ScopedTraceMemory constructor to be inline in the header file and do something like:

ScopedTraceMemory(foo, bar) {
if (!enabled_) return;
Init();
}

OTOH, I haven't yet used this system to diagnose any real-world problems, so maybe I should revert it until I have time to work on it some more.

Thoughts?

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From wangxianzhu@chromium.org on October 21, 2013 18:03:47

@jamescook: https://codereview.chromium.org/32823005/ :)

Omitting TRACE_MEMORY, on x86, TRACE_EVENT0 will generate the following instructions in the disabled path:

Begin of scope:

mov ...(%rip), %rbx ; category_group_enabled = NoBarrier_Load(&atomic);
test %rbx,%rbx ; if (!category_group_enabled) ...
jne ...
moveq $0, ...(%rsp) ; ScopedTracer scopedTracer;
cmpb $0, (%rbx) ; if (*category_group_enabled) ...
je ...

End of scope:

mov %rbx,...(%rsp)
cmpb $0, (%rbx) ; if (pdata_ ...) ...
je ...

That is, on begin of scope, the number of instructions is twice of that of simple-variable-checking; on end of scope, the number of instructions seems already minimum.

Thought of a method using a global variable indicating if tracing is enabled:
ScopedTracer scopedTracer;
if (NoBarrier_Load(&g_tracing_enabled)) ...

In this way, the numbers of overhead instructions in different cases are:
On begin of scope (always need to construct ScopedTracer which uses 1 instruction):

  • Tracing is not enabled: 4 (3 for the above check, 1 for constructing ScopedTracer)
  • Tracing is enabled, category is not enabled: 9
    On end of scope: not changed because checking ScopedTracer::pdata_ is enough.

Though the benefit seems not that much, it makes the overhead comparable with simple-variable-checking when tracing is not enabled (making https://codereview.chromium.org/29963002/ feasible). Another benefit is that TraceLog::GetCategoryGroupEnabled() won't be called until tracing is enabled, which may reduce the trace_event overhead during chrome startup.

In addition, the benefit looks bigger considering TRACE_MEMORY because extra checking in it can be eliminated.

Status: Assigned
Owner: wangxianzhu@chromium.org

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From wangxianzhu@chromium.org on October 21, 2013 18:11:17

Forgot to mention, the above "current" data is before my X Event change or after the extra constructor issue fixed by ( https://codereview.chromium.org/33423006/ ). The extra constructors cause 3 more mov instructions on begin of the scope.

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From wangxianzhu@chromium.org on October 22, 2013 09:44:36

The method in #2 would cause the difficulty of TraceLog::GetKnownCategoryGroups(). Perhaps we could collect the known categories with a build step which generates the categories in a 'static const char*[]' array. wdyt?

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From wangxianzhu@chromium.org on October 22, 2013 14:17:08

Measured the cost of TraceLog::GetCategoryGroupEnabled() during chrome startup with https://codereview.chromium.org/30253003 .

Linux:
Browser process: 0.169ms total; 0.001ms / call;
The first renderer process: 0.097ms total; 0.0015ms / call;

Android:
Browser process: 1.134ms total; 0.004ms / call;
The first renderer process: 1.865ms total; 0.008ms / call;

The cost looks trivial, but I'm wondering how much cost during startup could be considered as significant.

However, static category_groups would be required by the method in #2, and reducing startup cost would be a good side-effect of it.

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From piman@chromium.org on October 22, 2013 14:22:38

@#5: 1us (if that number is correct) isn't trivial at all for https://codereview.chromium.org/29963002/ because heavy pages can do 10k to 100k GL calls per frame, i.e. 10-100ms per frame. Or 40-400ms on Android.

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From wangxianzhu@chromium.org on October 22, 2013 14:26:57

The 1us cost is only for the first call of TRACE_EVENT at each place. #5 is about the impact of the first calls chrome startup performance.

The normal cost should still be as low as 3~6 instructions as shown in first half of #2.

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From wangxianzhu@chromium.org on October 22, 2013 15:08:51

piman@, do you think the overhead acceptable for CommandParser::ProcessCommand(): 6 (or 9 if TRACE_MEMORY is enabled) instructions on scope begin and 3 instructions on scope end, after https://codereview.chromium.org/33423006/ and https://codereview.chromium.org/32823005/? Currently with trace_gl_commands_ the overhead is 3/3 on scope begin/end.

Would the 3- or 6-instruction extra cost be worth it considering the benefits of https://codereview.chromium.org/29963002/? - it allows the category to be selected at runtime in chrome://tracing;

  • it would reduce the trace overhead by nearly half when the category is enabled.

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From piman@chromium.org on October 22, 2013 15:38:57

What is the cost in real time, on a real world benchmark? Instruction cound can be highly misleading.

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From wangxianzhu@chromium.org on October 22, 2013 15:44:25

I think the cost is so trivial and not actually measurable (less than several ns), so just use number of instructions.

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From piman@chromium.org on October 22, 2013 16:21:42

In the past, we reverted a similar patch because, even though it was thought it would have no measurable impact, it actually significantly regressed performance. So excuse me for being extra careful.
You just have to show me the data that it doesn't regress performance in a measurable way, and I'll ack the patch.

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From wangxianzhu@chromium.org on October 22, 2013 17:09:01

Here are the measurement results. Test method:

  • On Android Nexus 4 Latest build
  • Measure the total time of CommandParser::ProcessCommand and average time of each call
  • Load smashcat.org/av/canvas_test on startup
  • Record the result at the 500,000th call
  • Repeated 3 times with and without my CL

Without the change:
11350.8 0.0227015
11993.0 0.0239861
11217.9 0.0224358

With the change:
11544.4 0.0230889
12111.4 0.0242227
10646.5 0.021293

I knew the change ( https://chromiumcodereview.appspot.com/10442090 ) to disable gl tracing by default when I made the change. I believe the reason of it was not the overhead of the not-enabled path, but because at that time we didn't have DISABLED_BY_DEFAULT tracing category, so "cb_command" category would be enabled by default during tracing, and the too many events made tracing unusable for normal use.

Forgot to mention another biggest benefit of my CL: it can reduce the number of gl tracing events by half :)

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From piman@chromium.org on October 22, 2013 17:19:58

Ok, thanks, that looks good. Please keep an eye on possible perf regressions on the bots.

@natduca
Copy link
Contributor Author

natduca commented Sep 22, 2014

From wangxianzhu@chromium.org on October 23, 2013 11:57:45

https://codereview.chromium.org/33423006/ Remove constructors of TraceEventHandle https://codereview.chromium.org/32823005/ Reduce overhead of TRACE_MEMORY when disable

The overhead of initialization and not-enabled path seem acceptable now.

Status: Fixed

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

1 participant