Trace2 stopwatch timers and global counters #1099

jeffhostetler · 2021-12-17T21:20:56Z

Here is version 2 of my series to add stopwatch timers and global counters
to Trace2. I think this version address all of the comments made on V1.

I moved the Trace2 "thread_name" field into a flex-array at the bottom
of the thread local storage block. This avoids the issue of whether it should
be allocated and by whom and its const-ness.
I moved the truncation of the "thread_name" into the "_perf" target (which
was the only target that actually cared) so that columns still line up.
Started phasing out the TLS and CTX acronyms in the Trace2 code.
There is an ambiguity between "thread local storage" and "transport layer security"
that caused some confusion. In this patch series, I eliminated new uses of the
TLS term. A future series will be needed to actually rename variables, functions,
and data types to fully eliminate the TLS term.
In V1 I included a change to the "_event" target version number. I've rolled
this back in favor of Ævar's new proposal describing when/why we change it.
(That proposal is independent of this series.)
In V1 I had reported timer values {total, min, max} in floating point seconds
with microsecond precision (using a "%9.6f" format) and was internally accumulating
interval times in microseconds. After some discussion, I've changed this to
accumulate in nanoseconds and report integer nanoseconds. This may avoid
some accumulated round off error. (However, on some platforms getnanotime()
only has microsecond accuracy, so this increased precision may be misleading.)
Refactor the pattern model used in the unit tests to make it easier to
visually parse.
Some cosmetic cleanup of the private timer and counter API.

There were additional requests/comments that I have not addressed in this
version because I think they should be in their own top-level topic in a future
series rather than appended onto this series:

The full elimination of the TLS and CTX terms.
Ævar proposed a new test_trace2 test function to parse trace output.
This would be similar to (or a generalization of) the test_region function that
we already have in test-lib-functions.sh.
Ævar proposed a large refactor of the "_perf" target to have a "fmt()"
varargs function to reduce the amount of copy-n-pasted code in many of
the "fn" event handlers. This looks like a good change based on the mockup
but is a large refactor.
Ævar proposed a new rationale for when/why we change the "_event"
version number. That text can be added to the design document independently.

cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Jeff Hostetler git@jeffhostetler.com
cc: Derrick Stolee stolee@gmail.com
cc: Matheus Tavares matheus.bernardino@usp.br
cc: Johannes Sixt j6t@kdbg.org

Use "size_t" rather than "int" for the "alloc" and "nr_open_regions" fields in the "tr2tls_thread_ctx". These are used by ALLOC_GROW(). This was discussed in: https://lore.kernel.org/all/YULF3hoaDxA9ENdO@nand.local/ Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

jeffhostetler · 2021-12-20T14:59:05Z

/submit

gitgitgadget · 2021-12-20T15:01:23Z

Submitted as pull.1099.git.1640012469.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1099/jeffhostetler/trace2-stopwatch-v2-v1

To fetch this version to local tag pr-1099/jeffhostetler/trace2-stopwatch-v2-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1099/jeffhostetler/trace2-stopwatch-v2-v1

gitgitgadget · 2021-12-20T16:39:09Z

trace2/tr2_tgt_event.c

@@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,



On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Use a 'char *' to hold the thread name rather than a 'struct strbuf'. > The thread name is set when the thread is created and should not be > be modified afterwards. Replace the strbuf with an allocated pointer > to make that more clear. > > This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/ > > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > --- > trace2/tr2_tgt_event.c | 2 +- > trace2/tr2_tgt_perf.c | 2 +- > trace2/tr2_tls.c | 16 +++++++++------- > trace2/tr2_tls.h | 2 +- > 4 files changed, 12 insertions(+), 10 deletions(-) > > diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c > index 3a0014417cc..ca48d00aebc 100644 > --- a/trace2/tr2_tgt_event.c > +++ b/trace2/tr2_tgt_event.c > @@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file, > > jw_object_string(jw, "event", event_name); > jw_object_string(jw, "sid", tr2_sid_get()); > - jw_object_string(jw, "thread", ctx->thread_name.buf); > + jw_object_string(jw, "thread", ctx->thread_name); > > /* > * In brief mode, only emit <time> on these 2 event types. > diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c > index e4acca13d64..c3e57fcb3c0 100644 > --- a/trace2/tr2_tgt_perf.c > +++ b/trace2/tr2_tgt_perf.c > @@ -106,7 +106,7 @@ static void perf_fmt_prepare(const char *event_name, > > strbuf_addf(buf, "d%d | ", tr2_sid_depth()); > strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME, > - ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME, > + ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME, > event_name); > > len = buf->len + TR2FMT_PERF_REPO_WIDTH; > diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c > index 7da94aba522..cd8b9f2f0a0 100644 > --- a/trace2/tr2_tls.c > +++ b/trace2/tr2_tls.c > @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, > uint64_t us_thread_start) > { > struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx)); > + struct strbuf buf_name = STRBUF_INIT; > > /* > * Implicitly "tr2tls_push_self()" to capture the thread's start > @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, > > ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id); > > - strbuf_init(&ctx->thread_name, 0); > if (ctx->thread_id) > - strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id); > - strbuf_addstr(&ctx->thread_name, thread_name); > - if (ctx->thread_name.len > TR2_MAX_THREAD_NAME) > - strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME); > + strbuf_addf(&buf_name, "th%02d:", ctx->thread_id); > + strbuf_addstr(&buf_name, thread_name); > + if (buf_name.len > TR2_MAX_THREAD_NAME) > + strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME); > + > + ctx->thread_name = strbuf_detach(&buf_name, NULL); > > pthread_setspecific(tr2tls_key, ctx); > > @@ -95,7 +97,7 @@ void tr2tls_unset_self(void) > > pthread_setspecific(tr2tls_key, NULL); > > - strbuf_release(&ctx->thread_name); > + free(ctx->thread_name); > free(ctx->array_us_start); > free(ctx); > } > @@ -113,7 +115,7 @@ void tr2tls_pop_self(void) > struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); > > if (!ctx->nr_open_regions) > - BUG("no open regions in thread '%s'", ctx->thread_name.buf); > + BUG("no open regions in thread '%s'", ctx->thread_name); > > ctx->nr_open_regions--; > } > diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h > index a90bd639d48..d968da6a679 100644 > --- a/trace2/tr2_tls.h > +++ b/trace2/tr2_tls.h > @@ -9,7 +9,7 @@ > #define TR2_MAX_THREAD_NAME (24) > > struct tr2tls_thread_ctx { > - struct strbuf thread_name; > + char *thread_name; > uint64_t *array_us_start; > size_t alloc; > size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */ Junio's suggestion in the linked E-Mail was to make this a "const char *". Narrowly, I don't see why not just add a "const" to the "struct strbuf *" instead. But less narrowly if we're not going to change it why malloc a new one at all? Can't we just use the "const char *" passed into tr2tls_create_self(), and for the "th%02d:" case have the code that's formatting it handle that case? I.e. have the things that use it as a "%s" now call a function that formats things as a function of the "ctx->thread_id" (which may be 0) and limit it by TR2_MAX_THREAD_NAME?

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/20/21 11:31 AM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Use a 'char *' to hold the thread name rather than a 'struct strbuf'. >> The thread name is set when the thread is created and should not be >> be modified afterwards. Replace the strbuf with an allocated pointer >> to make that more clear. >> >> This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/ >>... >> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c >> index 7da94aba522..cd8b9f2f0a0 100644 >> --- a/trace2/tr2_tls.c >> +++ b/trace2/tr2_tls.c >> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, >> uint64_t us_thread_start) >> { >> struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx)); >> + struct strbuf buf_name = STRBUF_INIT; >> >> /* >> * Implicitly "tr2tls_push_self()" to capture the thread's start >> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, >> >> ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id); >> >> - strbuf_init(&ctx->thread_name, 0); >> if (ctx->thread_id) >> - strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id); >> - strbuf_addstr(&ctx->thread_name, thread_name); >> - if (ctx->thread_name.len > TR2_MAX_THREAD_NAME) >> - strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME); >> + strbuf_addf(&buf_name, "th%02d:", ctx->thread_id); >> + strbuf_addstr(&buf_name, thread_name); >> + if (buf_name.len > TR2_MAX_THREAD_NAME) >> + strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME); >> + >> + ctx->thread_name = strbuf_detach(&buf_name, NULL); >> >> pthread_setspecific(tr2tls_key, ctx); >> >>.. >> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h >> index a90bd639d48..d968da6a679 100644 >> --- a/trace2/tr2_tls.h >> +++ b/trace2/tr2_tls.h >> @@ -9,7 +9,7 @@ >> #define TR2_MAX_THREAD_NAME (24) >> >> struct tr2tls_thread_ctx { >> - struct strbuf thread_name; >> + char *thread_name; >> uint64_t *array_us_start; >> size_t alloc; >> size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */ > > Junio's suggestion in the linked E-Mail was to make this a "const char *". Yes, it was. To me a "const char *" in a structure means that the structure does not own the pointer and must not free it. Whereas as "char *" means that the structure might own it and should maybe free it when the structure is freed. My usage here is that the structure does own it (because it took it from the temporary strbuf using strbuf_detach()) and so it must free it. Therefore it should not be "const". This has nothing to do with whether or not we allow the thread name to be changed after the fact. (We don't, but that is a different issue). > > Narrowly, I don't see why not just add a "const" to the "struct strbuf > *" instead. Adding "const" to a strbuf would be wrong in this case, since the structure owns the strbuf and needs to strbuf_release the contained buffer and (now) free the strbuf pointer, right? This also makes things confusing -- all callers of tr2tls_create_self() would now be responsible for allocating a strbuf to pass in -- and who would own those. This would also create opportunities for mistakes if they pass in the address of a stack-based strbuf, right? This is being used to initialize thread-based data, so the caller can't just use a "function local static" or a "global static" strbuf. > > But less narrowly if we're not going to change it why malloc a new one > at all? Can't we just use the "const char *" passed into > tr2tls_create_self(), and for the "th%02d:" case have the code that's > formatting it handle that case? > > I.e. have the things that use it as a "%s" now call a function that > formats things as a function of the "ctx->thread_id" (which may be 0) > and limit it by TR2_MAX_THREAD_NAME? > This would be less efficient, right? That thread name is included in *EVERY* _perf and _event message emitted. If we were to change the design to have basically a callback to get the formatted value based on the `ctx` or `cts->thread_id` and dynamically formatting the name, then we would have to hit that callback once (or twice) for every Trace2 message, right? That would be much slower than just having a fixed string (formatted when the thread is created) that we can just use. And even if we said that the callback could cache the result (like we do when we lookup env vars), where would it cache it? It would have to cache it in the `ctx`, which is where it currently is and without any of the unnecessary overhead, right? I think you're assuming that callers of `tr2tls_create_self()` always pass a literal string such that that string value is always safe to reference later. Nothing would prevent a caller from passing the address of a stack buffer. It is not safe to assume that that string pointer will always be valid, such as after the thread exits. It is better for _create_self() to copy the given string (whether we format it immediately or not) than to assume that the pointer will always be valid, right? So I don't think we should deviate from the patch that I submitted. Jeff

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Dec 20 2021, Jeff Hostetler wrote: > On 12/20/21 11:31 AM, Ævar Arnfjörð Bjarmason wrote: >> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: >> >>> From: Jeff Hostetler <jeffhost@microsoft.com> >>> >>> Use a 'char *' to hold the thread name rather than a 'struct strbuf'. >>> The thread name is set when the thread is created and should not be >>> be modified afterwards. Replace the strbuf with an allocated pointer >>> to make that more clear. >>> >>> This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/ >>>... >>> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c >>> index 7da94aba522..cd8b9f2f0a0 100644 >>> --- a/trace2/tr2_tls.c >>> +++ b/trace2/tr2_tls.c >>> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, >>> uint64_t us_thread_start) >>> { >>> struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx)); >>> + struct strbuf buf_name = STRBUF_INIT; >>> /* >>> * Implicitly "tr2tls_push_self()" to capture the thread's start >>> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, >>> ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id); >>> - strbuf_init(&ctx->thread_name, 0); >>> if (ctx->thread_id) >>> - strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id); >>> - strbuf_addstr(&ctx->thread_name, thread_name); >>> - if (ctx->thread_name.len > TR2_MAX_THREAD_NAME) >>> - strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME); >>> + strbuf_addf(&buf_name, "th%02d:", ctx->thread_id); >>> + strbuf_addstr(&buf_name, thread_name); >>> + if (buf_name.len > TR2_MAX_THREAD_NAME) >>> + strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME); >>> + >>> + ctx->thread_name = strbuf_detach(&buf_name, NULL); >>> pthread_setspecific(tr2tls_key, ctx); >>> >>>.. >>> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h >>> index a90bd639d48..d968da6a679 100644 >>> --- a/trace2/tr2_tls.h >>> +++ b/trace2/tr2_tls.h >>> @@ -9,7 +9,7 @@ >>> #define TR2_MAX_THREAD_NAME (24) >>> struct tr2tls_thread_ctx { >>> - struct strbuf thread_name; >>> + char *thread_name; >>> uint64_t *array_us_start; >>> size_t alloc; >>> size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */ >> Junio's suggestion in the linked E-Mail was to make this a "const >> char *". > > Yes, it was. To me a "const char *" in a structure means that > the structure does not own the pointer and must not free it. > Whereas as "char *" means that the structure might own it and > should maybe free it when the structure is freed. My usage here > is that the structure does own it (because it took it from the > temporary strbuf using strbuf_detach()) and so it must free it. > Therefore it should not be "const". This has nothing to do with > whether or not we allow the thread name to be changed after the > fact. (We don't, but that is a different issue). We use the pattern of having a "const char *" that's really a "char *" with a cast to free() in many existing APIs for this scenario. Maybe the cast for free would be more correct here, see my recent 9081a421a6d (checkout: fix "branch info" memory leaks, 2021-11-16) & the discussion it referencese. I.e. in that case we didn't go for the "free((char *)ptr)" cast as it was a private API. >> Narrowly, I don't see why not just add a "const" to the "struct >> strbuf >> *" instead. > > Adding "const" to a strbuf would be wrong in this case, since the > structure owns the strbuf and needs to strbuf_release the contained > buffer and (now) free the strbuf pointer, right? > > This also makes things confusing -- all callers of tr2tls_create_self() > would now be responsible for allocating a strbuf to pass in -- and who > would own those. This would also create opportunities for mistakes if > they pass in the address of a stack-based strbuf, right? > > This is being used to initialize thread-based data, so the caller > can't just use a "function local static" or a "global static" strbuf. Right, I meant that in the context of who/where you'd have your casts. >> But less narrowly if we're not going to change it why malloc a new >> one >> at all? Can't we just use the "const char *" passed into >> tr2tls_create_self(), and for the "th%02d:" case have the code that's >> formatting it handle that case? >> I.e. have the things that use it as a "%s" now call a function that >> formats things as a function of the "ctx->thread_id" (which may be 0) >> and limit it by TR2_MAX_THREAD_NAME? >> > > This would be less efficient, right? That thread name is included in > *EVERY* _perf and _event message emitted. If we were to change the > design to have basically a callback to get the formatted value based > on the `ctx` or `cts->thread_id` and dynamically formatting the name, > then we would have to hit that callback once (or twice) for every Trace2 > message, right? That would be much slower than just having a fixed > string (formatted when the thread is created) that we can just use. > And even if we said that the callback could cache the result (like > we do when we lookup env vars), where would it cache it? It would have > to cache it in the `ctx`, which is where it currently is and without > any of the unnecessary overhead, right? Aren't we per https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/ doing a lot of that formatting (and sometimes allocation) anyway in a way that's easily avoidable for the "perf" backend? And for tr2_tgt_event.c we'll call jw_object_string(), which calls append_quoted_string() for each event. That'll be re-quoting (presumably always needlessly) the thread_name every time. So just deferring a single strbuf_addf() doesn't seem like it would slow things down. > I think you're assuming that callers of `tr2tls_create_self()` always > pass a literal string such that that string value is always safe to > reference later. Nothing would prevent a caller from passing the > address of a stack buffer. It is not safe to assume that that string > pointer will always be valid, such as after the thread exits. It is > better for _create_self() to copy the given string (whether we format > it immediately or not) than to assume that the pointer will always be > valid, right? Sure, if that's the API we can xstrdup() it, and/or xstrfmt() it etc. as we're doing now. > So I don't think we should deviate from the patch that I submitted. I'm not saying anything needs to change here, these were really just read-through suggestion, but I think per the above (about the casts & optimization) that some of your assumptions here may not hold.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> struct tr2tls_thread_ctx { >> - struct strbuf thread_name; >> + char *thread_name; >> uint64_t *array_us_start; >> size_t alloc; >> size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */ > > Junio's suggestion in the linked E-Mail was to make this a "const char *". Sorry, but in that linked E-Mail, I wasn't picking between "const char *" and "char *" at all. What I cared was *not* to keep a long-term constant string in a member whose type is "struct strbuf".

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/20/21 2:35 PM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Dec 20 2021, Jeff Hostetler wrote: > >> On 12/20/21 11:31 AM, Ævar Arnfjörð Bjarmason wrote: >>> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: >>> >>>> From: Jeff Hostetler <jeffhost@microsoft.com> >>>> >> >> Yes, it was. To me a "const char *" in a structure means that >> the structure does not own the pointer and must not free it. >> Whereas as "char *" means that the structure might own it and >> should maybe free it when the structure is freed. My usage here >> is that the structure does own it (because it took it from the >> temporary strbuf using strbuf_detach()) and so it must free it. >> Therefore it should not be "const". This has nothing to do with >> whether or not we allow the thread name to be changed after the >> fact. (We don't, but that is a different issue). > > We use the pattern of having a "const char *" that's really a "char *" > with a cast to free() in many existing APIs for this scenario. As I mention later in this thread, I'm going to convert the field into a flex-array, so most of the discussion in this part of the thread no longer applies.

gitgitgadget · 2021-12-20T16:39:12Z

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

gitgitgadget · 2021-12-20T16:45:08Z

Documentation/technical/api-trace2.txt

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
 {


On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Teach Trace2 "perf" and "event" formats to handle "timer" events for > stopwatch timers. Update API documentation accordingly. > > In a future commit, stopwatch timers will be added to the Trace2 API > and it will emit these "timer" events. > > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > --- > Documentation/technical/api-trace2.txt | 25 +++++++++++++++- > trace2/tr2_tgt.h | 25 ++++++++++++++++ > trace2/tr2_tgt_event.c | 40 +++++++++++++++++++++++++- > trace2/tr2_tgt_normal.c | 1 + > trace2/tr2_tgt_perf.c | 29 +++++++++++++++++++ > 5 files changed, 118 insertions(+), 2 deletions(-) > > diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt > index bb13ca3db8b..e6ed94ba814 100644 > --- a/Documentation/technical/api-trace2.txt > +++ b/Documentation/technical/api-trace2.txt > @@ -391,7 +391,7 @@ only present on the "start" and "atexit" events. > { > "event":"version", > ... > - "evt":"3", # EVENT format version > + "evt":"4", # EVENT format version > "exe":"2.20.1.155.g426c96fcdb" # git version > } FWIW this seems like a time not to bump the version per the proposed approach in: https://lore.kernel.org/git/211201.86zgpk9u3t.gmgdl@evledraar.gmail.com/ Not directly related to this series, which just preserves the status quo, but it would be nice to get feedback on that proposal from you. > [...] > + * Verison 1: original version A typo of "Version". > + * Version 2: added "too_many_files" event > + * Version 3: added "child_ready" event > + * Version 4: added "timer" event > */ > -#define TR2_EVENT_VERSION "3" > +#define TR2_EVENT_VERSION "4" > > /* > * Region nesting limit for messages written to the event target. > @@ -615,6 +620,38 @@ static void fn_data_json_fl(const char *file, int line, > } > } > > +static void fn_timer(uint64_t us_elapsed_absolute, > + const char *thread_name, > + const char *category, > + const char *timer_name, > + uint64_t interval_count, > + uint64_t us_total_time, > + uint64_t us_min_time, > + uint64_t us_max_time) > +{ > + const char *event_name = "timer"; > + struct json_writer jw = JSON_WRITER_INIT; > + double t_abs = (double)us_elapsed_absolute / 1000000.0; > + nit: Odd placement of \n\n > + double t_total = (double)us_total_time / 1000000.0; > + double t_min = (double)us_min_time / 1000000.0; > + double t_max = (double)us_max_time / 1000000.0; Both for this... > + jw_object_begin(&jw, 0); > + event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name); > + jw_object_double(&jw, "t_abs", 6, t_abs); > + jw_object_string(&jw, "name", timer_name); > + jw_object_intmax(&jw, "count", interval_count); > + jw_object_double(&jw, "t_total", 6, t_total); > + jw_object_double(&jw, "t_min", 6, t_min); > + jw_object_double(&jw, "t_max", 6, t_max); [...] > +static void fn_timer(uint64_t us_elapsed_absolute, > + const char *thread_name, > + const char *category, > + const char *timer_name, > + uint64_t interval_count, > + uint64_t us_total_time, > + uint64_t us_min_time, > + uint64_t us_max_time) > +{ > + const char *event_name = "timer"; > + struct strbuf buf_payload = STRBUF_INIT; > + > + double t_total = (double)us_total_time / 1000000.0; > + double t_min = (double)us_min_time / 1000000.0; > + double t_max = (double)us_max_time / 1000000.0; > + > + strbuf_addf(&buf_payload, "name:%s", timer_name); > + strbuf_addf(&buf_payload, " count:%"PRIu64, interval_count); > + strbuf_addf(&buf_payload, " total:%9.6f", t_total); > + strbuf_addf(&buf_payload, " min:%9.6f", t_min); > + strbuf_addf(&buf_payload, " max:%9.6f", t_max); ...and this, wouldn't it be better/more readable to retain the uint64_t for the math, and just cast if needed when we're doing the formatting?

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/20/21 11:39 AM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Teach Trace2 "perf" and "event" formats to handle "timer" events for >> stopwatch timers. Update API documentation accordingly. >> >> In a future commit, stopwatch timers will be added to the Trace2 API >> and it will emit these "timer" events. >> >> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> >>... >> diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt >> index bb13ca3db8b..e6ed94ba814 100644 >> --- a/Documentation/technical/api-trace2.txt >> +++ b/Documentation/technical/api-trace2.txt >> @@ -391,7 +391,7 @@ only present on the "start" and "atexit" events. >> { >> "event":"version", >> ... >> - "evt":"3", # EVENT format version >> + "evt":"4", # EVENT format version >> "exe":"2.20.1.155.g426c96fcdb" # git version >> } > > FWIW this seems like a time not to bump the version per the proposed > approach in: > https://lore.kernel.org/git/211201.86zgpk9u3t.gmgdl@evledraar.gmail.com/ > > Not directly related to this series, which just preserves the status > quo, but it would be nice to get feedback on that proposal from you. Frankly, my eyes glazed over every time I tried to read it.... Your proposal looks fine. And yes, our assumptions are that because we have structured data, new event types and/or new fields can be added and safely ignored by JSON parsers, so we should be OK. So we're assuming that only if we drop events or fields or change the meaning of one of them, would a parser need to react and so we can limit version bumps to those instances. I'm OK with this. I'll let you draft the wording in api-trace2.txt to explain the how/when/why we want to update the version number in the future. Thanks. > >> [...] >> + * Verison 1: original version > > A typo of "Version". > >> + * Version 2: added "too_many_files" event >> + * Version 3: added "child_ready" event >> + * Version 4: added "timer" event >> */ >> -#define TR2_EVENT_VERSION "3" >> +#define TR2_EVENT_VERSION "4" >> I'll roll this back in my next version. >> /* >> * Region nesting limit for messages written to the event target. >> @@ -615,6 +620,38 @@ static void fn_data_json_fl(const char *file, int line, >> } >> } >> >> +static void fn_timer(uint64_t us_elapsed_absolute, >> + const char *thread_name, >> + const char *category, >> + const char *timer_name, >> + uint64_t interval_count, >> + uint64_t us_total_time, >> + uint64_t us_min_time, >> + uint64_t us_max_time) >> +{ >> + const char *event_name = "timer"; >> + struct json_writer jw = JSON_WRITER_INIT; >> + double t_abs = (double)us_elapsed_absolute / 1000000.0; >> + > > nit: Odd placement of \n\n > >> + double t_total = (double)us_total_time / 1000000.0; >> + double t_min = (double)us_min_time / 1000000.0; >> + double t_max = (double)us_max_time / 1000000.0; > > Both for this... > >> + jw_object_begin(&jw, 0); >> + event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name); >> + jw_object_double(&jw, "t_abs", 6, t_abs); >> + jw_object_string(&jw, "name", timer_name); >> + jw_object_intmax(&jw, "count", interval_count); >> + jw_object_double(&jw, "t_total", 6, t_total); >> + jw_object_double(&jw, "t_min", 6, t_min); >> + jw_object_double(&jw, "t_max", 6, t_max); > > [...] > >> +static void fn_timer(uint64_t us_elapsed_absolute, >> + const char *thread_name, >> + const char *category, >> + const char *timer_name, >> + uint64_t interval_count, >> + uint64_t us_total_time, >> + uint64_t us_min_time, >> + uint64_t us_max_time) >> +{ >> + const char *event_name = "timer"; >> + struct strbuf buf_payload = STRBUF_INIT; >> + >> + double t_total = (double)us_total_time / 1000000.0; >> + double t_min = (double)us_min_time / 1000000.0; >> + double t_max = (double)us_max_time / 1000000.0; >> + >> + strbuf_addf(&buf_payload, "name:%s", timer_name); >> + strbuf_addf(&buf_payload, " count:%"PRIu64, interval_count); >> + strbuf_addf(&buf_payload, " total:%9.6f", t_total); >> + strbuf_addf(&buf_payload, " min:%9.6f", t_min); >> + strbuf_addf(&buf_payload, " max:%9.6f", t_max); > > ....and this, wouldn't it be better/more readable to retain the uint64_t > for the math, and just cast if needed when we're doing the formatting? > I had those expressions inline at first and it really junked up the lines and made things hard to read -- partially because of the need to wrap the lines a lot. I went with the local t_* temp vars to make it more clear what we were doing. This style also matched the existing code in _tgt_event.c for `t_abs` and `t_rel` in all of the fn_*. Jeff

gitgitgadget · 2021-12-20T16:53:59Z

Documentation/technical/api-trace2.txt

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
 {


On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > [...] > +static void tr2main_emit_summary_timers(uint64_t us_elapsed_absolute) > +{ > + struct tr2_tgt *tgt_j; > + int j; > + struct tr2tmr_block merged; > + > + memset(&merged, 0, sizeof(merged)); Nit: just do a " = { 0 }" assignment above instead. > + if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS) > + BUG("invalid timer id: %d", tid); > + > + tr2tmr_start(tid); > +} > + > +void trace2_timer_stop(enum trace2_timer_id tid) > +{ > + if (!trace2_enabled) > + return; > + > + if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS) > + BUG("invalid timer id: %d", tid); nit / style: maybe assert() instead for cases where assert() produces better info than BUG(). I.e. it would quote the whole expression, and show you what condition it violated.... > +void tr2tmr_stop(enum trace2_timer_id tid) > +{ > + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); > + struct tr2tmr_timer *t = &ctx->timers.timer[tid]; > + uint64_t us_now; > + uint64_t us_interval; > + > + assert(t->recursion_count > 0); ...as you opted to do here. > + > + t->recursion_count--; > + if (t->recursion_count > 0) > + return; /* still in recursive call */ > + > + us_now = getnanotime() / 1000; > + us_interval = us_now - t->start_us; > + > + t->total_us += us_interval; > + > + if (!t->interval_count) { > + t->min_us = us_interval; > + t->max_us = us_interval; > + } else { > + if (us_interval < t->min_us) > + t->min_us = us_interval; > + if (us_interval > t->max_us) > + t->max_us = us_interval; > + } Perhaps more readable/easily understood as just a (untested): if (!t->interval_count || us_interval >= t->min_us) t->min_us = us_interval; if (!t->interval_count || us_interval >= t->max_us) t->max_us = us_interval; I.e. to avoid duplicating the identical assignment... > [...] > + if (!t->interval_count) > + continue; /* this timer was not used by this thread. */ > + > + t_merged->total_us += t->total_us; > + > + if (!t_merged->interval_count) { > + t_merged->min_us = t->min_us; > + t_merged->max_us = t->max_us; > + } else { > + if (t->min_us < t_merged->min_us) > + t_merged->min_us = t->min_us; > + if (t->max_us > t_merged->max_us) > + t_merged->max_us = t->max_us; > + } ...ditto, maybe since it's used at least twice factor it out to some trivial "static" helper here (maybe not worth it..)> > + /* > + * Number of nested starts on the stack in this thread. (We > + * ignore recursive starts and use this to track the recursive > + * calls.) > + */ > + unsigned int recursion_count; Earlier we have various forms of: if (t->recursion_count > 1) But since it's unsigned can we just make those a: if (t->recursion_count) ?

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/20/21 11:42 AM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> [...] >> + >> +void trace2_timer_stop(enum trace2_timer_id tid) >> +{ >> + if (!trace2_enabled) >> + return; >> + >> + if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS) >> + BUG("invalid timer id: %d", tid); > > nit / style: maybe assert() instead for cases where assert() produces > better info than BUG(). I.e. it would quote the whole expression, and > show you what condition it violated.... I'd rather leave it a BUG() so that we always get the guard code. assert() goes away in non-debug builds and a little while later "tid" will be used as a subscript. I'll add the function name to the BUG message to make it a little clearer. [...] > > Perhaps more readable/easily understood as just a (untested): > > if (!t->interval_count || us_interval >= t->min_us) > t->min_us = us_interval; > if (!t->interval_count || us_interval >= t->max_us) > t->max_us = us_interval; > > I.e. to avoid duplicating the identical assignment... [...] I'll look at something here to make this a little less messy. Probably add a MIN() and MAX() to the mixture. > >> + /* >> + * Number of nested starts on the stack in this thread. (We >> + * ignore recursive starts and use this to track the recursive >> + * calls.) >> + */ >> + unsigned int recursion_count; > > Earlier we have various forms of: > > if (t->recursion_count > 1) > > But since it's unsigned can we just make those a: > > if (t->recursion_count) > The places that are > 0, yes. But the > 1 instances are different since we're counting how many calls are on the stack and want to handle recursive calls differently than the first.

gitgitgadget · 2021-12-20T17:15:14Z

Documentation/technical/api-trace2.txt

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
 {


On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > [...] > diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c > index 9b3905b920c..ca36d44dfd7 100644 > --- a/trace2/tr2_tgt_event.c > +++ b/trace2/tr2_tgt_event.c > @@ -23,7 +23,7 @@ static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0, 0 }; > * Verison 1: original version > * Version 2: added "too_many_files" event > * Version 3: added "child_ready" event > - * Version 4: added "timer" event > + * Version 4: added "timer" and "counter" events > */ > #define TR2_EVENT_VERSION "4" > Nit on series structure: Earlier we bumped the version to 4, but here we're changing existing version 4 behavior. Would be better IMO just bump it at the end (if at all needed, per: https://lore.kernel.org/git/211201.86zgpk9u3t.gmgdl@evledraar.gmail.com/) > +static void fn_counter(uint64_t us_elapsed_absolute, > + const char *thread_name, > + const char *category, > + const char *counter_name, > + uint64_t value) > +{ > + const char *event_name = "counter"; > + struct strbuf buf_payload = STRBUF_INIT; > + > + strbuf_addf(&buf_payload, "name:%s", counter_name); > + strbuf_addf(&buf_payload, " value:%"PRIu64, value); Odd to have these be two seperate strbuf_addf()... > + > + perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, > + &us_elapsed_absolute, NULL, > + category, &buf_payload, thread_name); > + strbuf_release(&buf_payload); ...but more generally, and I see from e.g. the existing fn_version_fl that you're just using existing patterns, but it seems odd not to have a trivial varargs fmt helper for perf_io_write_fl that would avoid the whole strbuf/addf/release dance. I did a quick experiment to do that, patch on "master" below. A lot of the boilerplate could be simplified by factoring out the sq_quote_buf_pretty() case, and even this approach (re)allocs in a way that looks avoidable in many cases if perf_fmt_prepare() were improved (but it looks like it nedes its if/while loops in some cases still): diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c index 2ff9cf70835..bcbb0d8a250 100644 --- a/trace2/tr2_tgt_perf.c +++ b/trace2/tr2_tgt_perf.c @@ -153,16 +153,33 @@ static void perf_io_write_fl(const char *file, int line, const char *event_name, strbuf_release(&buf_line); } +__attribute__((format (printf, 8, 9))) +static void perf_io_write_fl_fmt(const char *file, int line, const char *event_name, + const struct repository *repo, + uint64_t *p_us_elapsed_absolute, + uint64_t *p_us_elapsed_relative, + const char *category, + const char *fmt, ...) +{ + va_list ap; + struct strbuf sb = STRBUF_INIT; + + va_start(ap, fmt); + strbuf_vaddf(&sb, fmt, ap); + va_end(ap); + + perf_io_write_fl(file, line, event_name, repo, p_us_elapsed_absolute, + p_us_elapsed_relative, category, &sb); + + strbuf_release(&sb); +} + static void fn_version_fl(const char *file, int line) { const char *event_name = "version"; - struct strbuf buf_payload = STRBUF_INIT; - - strbuf_addstr(&buf_payload, git_version_string); - perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL, - &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL, + "%s", git_version_string); } static void fn_start_fl(const char *file, int line, @@ -182,37 +199,25 @@ static void fn_exit_fl(const char *file, int line, uint64_t us_elapsed_absolute, int code) { const char *event_name = "exit"; - struct strbuf buf_payload = STRBUF_INIT; - strbuf_addf(&buf_payload, "code:%d", code); - - perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute, - NULL, NULL, &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute, + NULL, NULL, "code:%d", code); } static void fn_signal(uint64_t us_elapsed_absolute, int signo) { const char *event_name = "signal"; - struct strbuf buf_payload = STRBUF_INIT; - strbuf_addf(&buf_payload, "signo:%d", signo); - - perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, - &us_elapsed_absolute, NULL, NULL, &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(__FILE__, __LINE__, event_name, NULL, + &us_elapsed_absolute, NULL, NULL, "signo:%d", signo); } static void fn_atexit(uint64_t us_elapsed_absolute, int code) { const char *event_name = "atexit"; - struct strbuf buf_payload = STRBUF_INIT; - - strbuf_addf(&buf_payload, "code:%d", code); - perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, - &us_elapsed_absolute, NULL, NULL, &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(__FILE__, __LINE__, event_name, NULL, + &us_elapsed_absolute, NULL, NULL, "code:%d", code); } static void maybe_append_string_va(struct strbuf *buf, const char *fmt, @@ -244,13 +249,9 @@ static void fn_error_va_fl(const char *file, int line, const char *fmt, static void fn_command_path_fl(const char *file, int line, const char *pathname) { const char *event_name = "cmd_path"; - struct strbuf buf_payload = STRBUF_INIT; - - strbuf_addstr(&buf_payload, pathname); - perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL, - &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL, + "%s", pathname); } static void fn_command_ancestry_fl(const char *file, int line, const char **parent_names) @@ -286,13 +287,9 @@ static void fn_command_name_fl(const char *file, int line, const char *name, static void fn_command_mode_fl(const char *file, int line, const char *mode) { const char *event_name = "cmd_mode"; - struct strbuf buf_payload = STRBUF_INIT; - - strbuf_addstr(&buf_payload, mode); - perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL, - &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL, + "%s", mode); } static void fn_alias_fl(const char *file, int line, const char *alias, @@ -351,13 +348,10 @@ static void fn_child_exit_fl(const char *file, int line, int code, uint64_t us_elapsed_child) { const char *event_name = "child_exit"; - struct strbuf buf_payload = STRBUF_INIT; - strbuf_addf(&buf_payload, "[ch%d] pid:%d code:%d", cid, pid, code); - - perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute, - &us_elapsed_child, NULL, &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute, + &us_elapsed_child, NULL, "[ch%d] pid:%d code:%d", + cid, pid, code); } static void fn_child_ready_fl(const char *file, int line, @@ -365,24 +359,19 @@ static void fn_child_ready_fl(const char *file, int line, const char *ready, uint64_t us_elapsed_child) { const char *event_name = "child_ready"; - struct strbuf buf_payload = STRBUF_INIT; - - strbuf_addf(&buf_payload, "[ch%d] pid:%d ready:%s", cid, pid, ready); - perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute, - &us_elapsed_child, NULL, &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute, + &us_elapsed_child, NULL, + "[ch%d] pid:%d ready:%s", cid, pid, ready); } static void fn_thread_start_fl(const char *file, int line, uint64_t us_elapsed_absolute) { const char *event_name = "thread_start"; - struct strbuf buf_payload = STRBUF_INIT; - perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute, - NULL, NULL, &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute, + NULL, NULL, "%s", ""); /* TODO: No payload, support NULL? */ } static void fn_thread_exit_fl(const char *file, int line, @@ -390,11 +379,9 @@ static void fn_thread_exit_fl(const char *file, int line, uint64_t us_elapsed_thread) { const char *event_name = "thread_exit"; - struct strbuf buf_payload = STRBUF_INIT; - perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute, - &us_elapsed_thread, NULL, &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute, + &us_elapsed_thread, NULL, "%s", ""); /* TODO: No payload, support NULL ? */ } static void fn_exec_fl(const char *file, int line, uint64_t us_elapsed_absolute, @@ -438,13 +425,9 @@ static void fn_param_fl(const char *file, int line, const char *param, const char *value) { const char *event_name = "def_param"; - struct strbuf buf_payload = STRBUF_INIT; - - strbuf_addf(&buf_payload, "%s:%s", param, value); - perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL, - &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL, + "%s:%s", param, value); } static void fn_repo_fl(const char *file, int line, @@ -525,13 +508,10 @@ static void fn_data_json_fl(const char *file, int line, const struct json_writer *value) { const char *event_name = "data_json"; - struct strbuf buf_payload = STRBUF_INIT; - - strbuf_addf(&buf_payload, "%s:%s", key, value->json.buf); - perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute, - &us_elapsed_region, category, &buf_payload); - strbuf_release(&buf_payload); + perf_io_write_fl_fmt(file, line, event_name, repo, &us_elapsed_absolute, + &us_elapsed_region, category, + "%s:%s", key, value->json.buf); } static void fn_printf_va_fl(const char *file, int line,

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/20/21 11:51 AM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> [...] > >> +static void fn_counter(uint64_t us_elapsed_absolute, >> + const char *thread_name, >> + const char *category, >> + const char *counter_name, >> + uint64_t value) >> +{ >> + const char *event_name = "counter"; >> + struct strbuf buf_payload = STRBUF_INIT; >> + >> + strbuf_addf(&buf_payload, "name:%s", counter_name); >> + strbuf_addf(&buf_payload, " value:%"PRIu64, value); > > Odd to have these be two seperate strbuf_addf()... yeah, i'll combine. and in the body of fn_timer in 6/9. > ....but more generally, and I see from e.g. the existing fn_version_fl > that you're just using existing patterns, but it seems odd not to have a > trivial varargs fmt helper for perf_io_write_fl that would avoid the > whole strbuf/addf/release dance. [...] yeah, cut-n-paste was used here and i was maintaining consistency with the other functions -- rather than inventing something new and refactoring stuff that didn't need be refactored in the middle of an on-going patch series. > I did a quick experiment to do that, patch on "master" below. A lot of > the boilerplate could be simplified by factoring out the > sq_quote_buf_pretty() case, and even this approach (re)allocs in a way > that looks avoidable in many cases if perf_fmt_prepare() were improved > (but it looks like it nedes its if/while loops in some cases still): > [...] > > +__attribute__((format (printf, 8, 9))) > +static void perf_io_write_fl_fmt(const char *file, int line, const char *event_name, > + const struct repository *repo, > + uint64_t *p_us_elapsed_absolute, > + uint64_t *p_us_elapsed_relative, > + const char *category, > + const char *fmt, ...) > +{ > + va_list ap; > + struct strbuf sb = STRBUF_INIT; > + > + va_start(ap, fmt); > + strbuf_vaddf(&sb, fmt, ap); > + va_end(ap); > + > + perf_io_write_fl(file, line, event_name, repo, p_us_elapsed_absolute, > + p_us_elapsed_relative, category, &sb); > + > + strbuf_release(&sb); > +} > + > static void fn_version_fl(const char *file, int line) > { > const char *event_name = "version"; > - struct strbuf buf_payload = STRBUF_INIT; > - > - strbuf_addstr(&buf_payload, git_version_string); > > - perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL, > - &buf_payload); > - strbuf_release(&buf_payload); > + perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL, > + "%s", git_version_string); > } [...] Yes, it might be nice to have a _fmt() version as you suggest and simplify many of the existing fn_*() function bodies. It seems like I keep saying this today, but can we discuss that in a new top-level topic and not down inside commit 8/9 of this series? Thanks, Jeff

gitgitgadget · 2021-12-20T17:28:33Z

Documentation/technical/api-trace2.txt

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
 {


On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > [...] > +struct ut_011_data { > + int v1, v2; > +}; > [...] > + struct ut_011_data data = { 0, 0 }; Nit: Just "{ 0 }" is OK for zero'd out initialization. No need to keep extending this for every field. For things that want to exhaustively list fields for clarity, designated would be preferred: o { .v1 = 0, .v2 = 0 } > + int nr_threads = 0; > + int k; > + pthread_t *pids = NULL; > + > + if (argc != 3) > + die("%s", usage_error); > + if (get_i(&data.v1, argv[0])) > + die("%s", usage_error); > + if (get_i(&data.v2, argv[1])) > + die("%s", usage_error); > + if (get_i(&nr_threads, argv[2])) > + die("%s", usage_error); A partial nit on existing code, as this just extends the pattern, but couldn't much of this get_i() etc. just be made redundant by simply using the parse-options.c API here? I.e. OPTION_INTEGER and using named arguments would do the validation or you. > +# Exercise the global counter in a loop and confirm that we get the > +# expected sum in an event record. > +# > + > +have_counter_event () { > + thread=$1 > + name=$2 > + value=$3 > + file=$4 > + > + grep "\"event\":\"counter\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"value\":${value}" $file > + > + return $? > +} It looks like there's no helper, but this is the Nth thing I see where wish our "test_region" helper were just a bit more generalized. I.e.: test_trace2 --match=counter --match=thread=$thread --match=name=$name --match=value=$value <trace> With test_region just being a wrapper for something like: test_trace2 --match=region_enter --match=category=$category --match=label=$label <trace> && test_trace2 --match=region_leave --match=category=$category --match=label=$label <trace> > +static void tr2main_emit_summary_counters(uint64_t us_elapsed_absolute) > +{ > + struct tr2_tgt *tgt_j; > + int j; > + struct tr2ctr_block merged; > + > + memset(&merged, 0, sizeof(merged)); nit: more memset v.s. "{ 0 }".

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/20/21 12:14 PM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> [...] >> + int nr_threads = 0; >> + int k; >> + pthread_t *pids = NULL; >> + >> + if (argc != 3) >> + die("%s", usage_error); >> + if (get_i(&data.v1, argv[0])) >> + die("%s", usage_error); >> + if (get_i(&data.v2, argv[1])) >> + die("%s", usage_error); >> + if (get_i(&nr_threads, argv[2])) >> + die("%s", usage_error); > > A partial nit on existing code, as this just extends the pattern, but > couldn't much of this get_i() etc. just be made redundant by simply > using the parse-options.c API here? I.e. OPTION_INTEGER and using named > arguments would do the validation or you. I suppose. It just seemed like a little overkill setting things up for such a simple and isolated test. And the cut-n-paste was quick enough for my purposes. > >> +# Exercise the global counter in a loop and confirm that we get the >> +# expected sum in an event record. >> +# >> + >> +have_counter_event () { >> + thread=$1 >> + name=$2 >> + value=$3 >> + file=$4 >> + >> + grep "\"event\":\"counter\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"value\":${value}" $file >> + >> + return $? >> +} > > It looks like there's no helper, but this is the Nth thing I see where > wish our "test_region" helper were just a bit more generalized. I.e.: > > test_trace2 --match=counter --match=thread=$thread --match=name=$name --match=value=$value <trace> > > With test_region just being a wrapper for something like: > > test_trace2 --match=region_enter --match=category=$category --match=label=$label <trace> && > test_trace2 --match=region_leave --match=category=$category --match=label=$label <trace> Yes, that would be nice. But I don't think we should start that in the middle of this patch series. Perhaps you could start a top-level message with a fleshed out proposal and let everyone discuss it. > >> +static void tr2main_emit_summary_counters(uint64_t us_elapsed_absolute) >> +{ >> + struct tr2_tgt *tgt_j; >> + int j; >> + struct tr2ctr_block merged; >> + >> + memset(&merged, 0, sizeof(merged)); > > nit: more memset v.s. "{ 0 }". Yeah, but lldb wouldn't stop complaining until it was "= { { { 0 } } }" Jeff

gitgitgadget · 2021-12-20T19:09:20Z

User Jeff Hostetler <git@jeffhostetler.com> has been added to the cc: list.

gitgitgadget · 2021-12-21T07:26:28Z

trace2/tr2_tgt_event.c

@@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,



On the Git mailing list, Junio C Hamano wrote (reply to this):

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Use a 'char *' to hold the thread name rather than a 'struct strbuf'. > The thread name is set when the thread is created and should not be > be modified afterwards. Replace the strbuf with an allocated pointer > to make that more clear. Sounds good. Use of strbuf is perfectly fine while you compute the final value of the string, but as a more permanent location to store the result, it often is unsuitable (and strbuf_split_buf() is a prime example of how *not* to design your API function around the type). > diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c > index 7da94aba522..cd8b9f2f0a0 100644 > --- a/trace2/tr2_tls.c > +++ b/trace2/tr2_tls.c > @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, > uint64_t us_thread_start) > { > struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx)); > + struct strbuf buf_name = STRBUF_INIT; > > /* > * Implicitly "tr2tls_push_self()" to capture the thread's start > @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, > > ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id); > > - strbuf_init(&ctx->thread_name, 0); > if (ctx->thread_id) > - strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id); > - strbuf_addstr(&ctx->thread_name, thread_name); > - if (ctx->thread_name.len > TR2_MAX_THREAD_NAME) > - strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME); > + strbuf_addf(&buf_name, "th%02d:", ctx->thread_id); > + strbuf_addstr(&buf_name, thread_name); > + if (buf_name.len > TR2_MAX_THREAD_NAME) > + strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME); > + > + ctx->thread_name = strbuf_detach(&buf_name, NULL); This is not exactly a new problem, but if we use a mechanism to allow arbitrary long string (like composing with strbuf and detaching the resulting string as is), instead of having a fixed name[] array embedded in the ctx structure, I wonder if applying the maximum length this early makes sense. Such a truncation would allow more than one ctx structures to share the same name, which somehow feels error prone, inviting a mistake to use .thread_name member as an identifier, when its only intended use is to give a human-readable and not necessarily unique label. Of course, if the maximum is reasonably low, like a few dozen bytes, it may even make sense to embed an array of the fixed size and not worry about an extra pointer. > diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h > index a90bd639d48..d968da6a679 100644 > --- a/trace2/tr2_tls.h > +++ b/trace2/tr2_tls.h > @@ -9,7 +9,7 @@ > #define TR2_MAX_THREAD_NAME (24) > > struct tr2tls_thread_ctx { > - struct strbuf thread_name; > + char *thread_name; That is, something like char thread_name[TR2_MAX_THREAD_NAME + 1]; perhaps with moving it to the end of the struct to avoid padding waste, would make more sense than the posted patch, if we accept an early truncation and information loss. The other extreme would also make equally more sense than the posted patch. Just grab strbuf_detach() result without truncation and point at it with "char *thread_name" here, and if the output layer wants to limit the names to some reasonable length, deal with the TR2_MAX_THREAD_NAME at that layer, without losing information too early. It might be a much bigger surgery, I am afraid, because the users of ctx->thread_name (and old ctx->thread_name.buf) all are relying on the string being shorter than TR2_MAX_THREAD_NAME. > uint64_t *array_us_start; > size_t alloc; > size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/21/21 2:22 AM, Junio C Hamano wrote: > "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Use a 'char *' to hold the thread name rather than a 'struct strbuf'. >> The thread name is set when the thread is created and should not be >> be modified afterwards. Replace the strbuf with an allocated pointer >> to make that more clear. > > Sounds good. Use of strbuf is perfectly fine while you compute the > final value of the string, but as a more permanent location to store > the result, it often is unsuitable (and strbuf_split_buf() is a prime > example of how *not* to design your API function around the type). > >> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c >> index 7da94aba522..cd8b9f2f0a0 100644 >> --- a/trace2/tr2_tls.c >> +++ b/trace2/tr2_tls.c >> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, >> uint64_t us_thread_start) >> { >> struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx)); >> + struct strbuf buf_name = STRBUF_INIT; >> >> /* >> * Implicitly "tr2tls_push_self()" to capture the thread's start >> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, >> >> ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id); >> >> - strbuf_init(&ctx->thread_name, 0); >> if (ctx->thread_id) >> - strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id); >> - strbuf_addstr(&ctx->thread_name, thread_name); >> - if (ctx->thread_name.len > TR2_MAX_THREAD_NAME) >> - strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME); >> + strbuf_addf(&buf_name, "th%02d:", ctx->thread_id); >> + strbuf_addstr(&buf_name, thread_name); >> + if (buf_name.len > TR2_MAX_THREAD_NAME) >> + strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME); >> + >> + ctx->thread_name = strbuf_detach(&buf_name, NULL); > > This is not exactly a new problem, but if we use a mechanism to > allow arbitrary long string (like composing with strbuf and > detaching the resulting string as is), instead of having a fixed > name[] array embedded in the ctx structure, I wonder if applying the > maximum length this early makes sense. Such a truncation would > allow more than one ctx structures to share the same name, which > somehow feels error prone, inviting a mistake to use .thread_name > member as an identifier, when its only intended use is to give a > human-readable and not necessarily unique label. Of course, if the > maximum is reasonably low, like a few dozen bytes, it may even make > sense to embed an array of the fixed size and not worry about an > extra pointer. > I'll convert it to a flex-array at the bottom of the CTX structure and then defer the truncation to the _perf target (which only does that to keep the columns lined up). That will simplify things considerably. Thanks Jeff

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff Hostetler <git@jeffhostetler.com> writes: > I'll convert it to a flex-array at the bottom of the CTX structure > and then defer the truncation to the _perf target (which only does > that to keep the columns lined up). > > That will simplify things considerably. I am not sure if the complexity of flex-array is worth it. You have been storing an up-to-24-byte human readable name by embedding a strbuf that has two size_t plus a pointer (i.e. 24-bytes even on Windows), and the posted patch changes it to a pointer plus a on-heap allocation with malloc() overhead. An embedded fixed-size thread_name[TR2_MAX_THREAD_NAME+1] member may be the simplest thing to do, I suspect.

gitgitgadget · 2021-12-21T07:32:21Z

trace2/tr2_tls.c

@@ -15,7 +15,16 @@ static uint64_t tr2tls_us_start_process;
 static pthread_mutex_t tr2tls_mutex;


On the Git mailing list, Junio C Hamano wrote (reply to this):

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Defer freeing of the Trace2 thread CTX data until program exit. > Create a global list of thread CTX data to own the storage. > > TLS CTX data is allocated when a thread is created and associated > with that thread. Previously, that storage was deleted when the > thread exited. Now we simply disassociate the CTX data from the > thread when it exits and let the global CTX list manage the cleanup. By the way, TLS CTX sounds embarrassingly close and confusing to some function that we may find in say openssl or some crypto stuff X-<. Was there a strong reason to avoid calling these functions and types something like tr2_thread_ctx instead of tr2tls_thread_ctx?

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/21/21 2:30 AM, Junio C Hamano wrote: > "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Defer freeing of the Trace2 thread CTX data until program exit. >> Create a global list of thread CTX data to own the storage. >> >> TLS CTX data is allocated when a thread is created and associated >> with that thread. Previously, that storage was deleted when the >> thread exited. Now we simply disassociate the CTX data from the >> thread when it exits and let the global CTX list manage the cleanup. > > By the way, TLS CTX sounds embarrassingly close and confusing to > some function that we may find in say openssl or some crypto stuff > X-<. Was there a strong reason to avoid calling these functions and > types something like tr2_thread_ctx instead of tr2tls_thread_ctx? > I hadn't really thought about the term "TLS" in the context of crypto -- I had "thread local storage" on the brain. I guess I've spent too much of my youth using Win32 thread APIs. :-) Let me take a look at removing those terms. Jeff

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff Hostetler <git@jeffhostetler.com> writes: > I hadn't really thought about the term "TLS" in the context > of crypto -- I had "thread local storage" on the brain. I guess > I've spent too much of my youth using Win32 thread APIs. :-) > > Let me take a look at removing those terms. Nah, it may be just me. As long as what TLS stands for is clear in the context, it is fine.

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/22/21 5:56 PM, Junio C Hamano wrote: > Jeff Hostetler <git@jeffhostetler.com> writes: > >> I hadn't really thought about the term "TLS" in the context >> of crypto -- I had "thread local storage" on the brain. I guess >> I've spent too much of my youth using Win32 thread APIs. :-) >> >> Let me take a look at removing those terms. > > Nah, it may be just me. As long as what TLS stands for is clear in > the context, it is fine. > ok thanks. i took a quick look at scrubbing the code of TLS and even though most of the uses are in private (or protected) tr2_*.[ch] files, it will be a large churn-type change and i'm not sure it's worth the effort. thanks jeff

On the Git mailing list, Johannes Sixt wrote (reply to this):

Am 22.12.21 um 23:56 schrieb Junio C Hamano: > Jeff Hostetler <git@jeffhostetler.com> writes: > >> I hadn't really thought about the term "TLS" in the context >> of crypto -- I had "thread local storage" on the brain. I guess >> I've spent too much of my youth using Win32 thread APIs. :-) >> >> Let me take a look at removing those terms. > > Nah, it may be just me. As long as what TLS stands for is clear in > the context, it is fine. No, really, my first reaction was, too: what the hack has crypto to do with trace2? Are we now sending around trace output by email? Please use "TLS" next to "CTX" only when it means "Transport Layer Security". -- Hannes

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Sixt <j6t@kdbg.org> writes: > Am 22.12.21 um 23:56 schrieb Junio C Hamano: >> Jeff Hostetler <git@jeffhostetler.com> writes: >> >>> I hadn't really thought about the term "TLS" in the context >>> of crypto -- I had "thread local storage" on the brain. I guess >>> I've spent too much of my youth using Win32 thread APIs. :-) >>> >>> Let me take a look at removing those terms. >> >> Nah, it may be just me. As long as what TLS stands for is clear in >> the context, it is fine. > > No, really, my first reaction was, too: what the hack has crypto to do > with trace2? Are we now sending around trace output by email? Ok, then it is not just me ;-) > > Please use "TLS" next to "CTX" only when it means "Transport Layer > Security".

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/23/21 1:18 PM, Junio C Hamano wrote: > Johannes Sixt <j6t@kdbg.org> writes: > >> Am 22.12.21 um 23:56 schrieb Junio C Hamano: >>> Jeff Hostetler <git@jeffhostetler.com> writes: >>> >>>> I hadn't really thought about the term "TLS" in the context >>>> of crypto -- I had "thread local storage" on the brain. I guess >>>> I've spent too much of my youth using Win32 thread APIs. :-) >>>> >>>> Let me take a look at removing those terms. >>> >>> Nah, it may be just me. As long as what TLS stands for is clear in >>> the context, it is fine. >> >> No, really, my first reaction was, too: what the hack has crypto to do >> with trace2? Are we now sending around trace output by email? > > Ok, then it is not just me ;-) >> >> Please use "TLS" next to "CTX" only when it means "Transport Layer >> Security". I'll make a note to go thru and remove/change these terms in a future series rather than mix it in with this one. Jeff

gitgitgadget · 2021-12-21T14:23:54Z

Documentation/technical/api-trace2.txt

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
 {


On the Git mailing list, Derrick Stolee wrote (reply to this):

On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote: > +`"timer"`:: > + This event is generated at the end of the program and contains > + statistics for a global stopwatch timer. > ++ > +------------ > +{ > + "event":"timer", > + ... > + "name":"test", # timer name > + "count":42, # number of start+stop intervals > + "t_total":1.234, # sum of all intervals (by thread or globally) > + "t_min":0.1, # shortest interval > + "t_max":0.9, # longest interval Could you specify the units for these t_* entries? I'm guessing seconds based on the example, but I've seen similar timers using milliseconds instead so it's best to be super clear here. > +/* > + * Stopwatch timer event. This function writes the previously accumlated s/accumlated/accumulated/ > + * stopwatch timer values to the event streams. Unlike other Trace2 API > + * events, this is decoupled from the data collection. > + * > + * This does not take a (file,line) pair because a timer event reports > + * the cummulative time spend in the timer over a series of intervals s/cummulative/cumulative/ > + * -- it does not represent a single usage (like region or data events > + * do). > + * > + * The thread name is optional. If non-null it will override the > + * value inherited from the caller's TLS CTX. This allows data > + * for global timers to be reported without associating it with a > + * single thread. > + */ > +typedef void(tr2_tgt_evt_timer_t)(uint64_t us_elapsed_absolute, > + const char *thread_name, > + const char *category, > + const char *timer_name, > + uint64_t interval_count, > + uint64_t us_total_time, > + uint64_t us_min_time, > + uint64_t us_max_time); > diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c > index 4ce50944298..9b3905b920c 100644 > --- a/trace2/tr2_tgt_event.c > +++ b/trace2/tr2_tgt_event.c ... > +static void fn_timer(uint64_t us_elapsed_absolute, (I was going to complain about the generic name here, but it's static to the tr2_tgt_event.c file, so that's fine.) > + const char *thread_name, > + const char *category, > + const char *timer_name, > + uint64_t interval_count, > + uint64_t us_total_time, > + uint64_t us_min_time, > + uint64_t us_max_time) > +{ > + const char *event_name = "timer"; > + struct json_writer jw = JSON_WRITER_INIT; > + double t_abs = (double)us_elapsed_absolute / 1000000.0; > + > + double t_total = (double)us_total_time / 1000000.0; > + double t_min = (double)us_min_time / 1000000.0; > + double t_max = (double)us_max_time / 1000000.0; Looks like seconds here. At first glance, I thought this large division might cause some loss of precision. However, the structure of floating point numbers means we probably don't lose that much. It might be worth _considering_ using milliseconds (only divide by 1000.0) but I'm probably just being paranoid here. Thanks, -Stolee

gitgitgadget · 2021-12-21T14:23:57Z

User Derrick Stolee <stolee@gmail.com> has been added to the cc: list.

gitgitgadget · 2021-12-21T14:47:47Z

Documentation/technical/api-trace2.txt

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
 {


On the Git mailing list, Derrick Stolee wrote (reply to this):

On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Add a stopwatch timer mechanism to Git. > +static void *ut_009timer_thread_proc(void *_ut_009_data) > +{ > + struct ut_009_data *data = _ut_009_data; > + int k; > + > + trace2_thread_start("ut_009"); > + > + for (k = 0; k < data->count; k++) { > + trace2_timer_start(TRACE2_TIMER_ID_TEST2); > + sleep_millisec(data->delay); > + trace2_timer_stop(TRACE2_TIMER_ID_TEST2); > + } > + > + trace2_thread_exit(); > + return NULL; > +} > + > + nit: double newline. > +# Exercise the stopwatch timer "test" in a loop and confirm that it was > +# we have as many start/stop intervals as expected. We cannot really test > +# the (elapsed, min, max) timer values, so we assume they are good. We can't check their values, but we could check that their labels are emitted. > +test_expect_success 'test stopwatch timers - summary only' ' > + test_when_finished "rm trace.perf actual" && > + test_config_global trace2.perfBrief 1 && > + test_config_global trace2.perfTarget "$(pwd)/trace.perf" && > + test-tool trace2 008timer 5 10 && > + perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual && > + grep "d0|summary|timer||_T_ABS_||test|name:test1 count:5" actual adding something like " total:.* min: .* max:.*" to the end of this pattern might be good. You could even get really specific about the ".*" being a floating point number, but I'm not too concerned about that. I just want to see that these other labels stay consistent in future Git versions. > +# Exercise the stopwatch timer "test" in a loop and confirm that it was > +# we have as many start/stop intervals as expected. We cannot really test > +# the (t_timer, t_min, t_max) timer values, so we assume they are good. Similar, we can do something such as... > +have_timer_event () { > + thread=$1 > + name=$2 > + count=$3 > + file=$4 > + > + grep "\"event\":\"timer\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"count\":${count}" $file Adding more detail to this pattern. This helper could probably benefit from constructing the regex across multiple string concatenations, so we can see the different pieces. Something like pattern="\"event\":\"timer\"" pattern="$pattern.*\"thread\":\"${thread}\"" pattern="$pattern.*\"name\":\"${name}\"" pattern="$pattern.*\"count\":\"${count}\"" pattern="$pattern.*\"t_total\":" pattern="$pattern.*\"t_min\":" pattern="$pattern.*\"t_max\":" grep "$pattern" $file > + > + return $? If we used && throughout this method, would this return not be necessary? > +static void tr2main_emit_summary_timers(uint64_t us_elapsed_absolute) > +{ > + struct tr2_tgt *tgt_j; > + int j; > + struct tr2tmr_block merged; > + > + memset(&merged, 0, sizeof(merged)); > + > + /* > + * Sum across all of the per-thread stopwatch timer data into > + * a single composite block of timer values. > + */ > + tr2tls_aggregate_timer_blocks(&merged); > + > + /* > + * Emit "summary" timer events for each composite timer value > + * that had activity. > + */ > + for_each_wanted_builtin (j, tgt_j) > + if (tgt_j->pfn_timer) > + tr2tmr_emit_block(tgt_j->pfn_timer, > + us_elapsed_absolute, > + &merged, "summary"); I'd put braces at the for-loop level, even though this is semantically correct without them. > +} > + > +static void tr2main_emit_thread_timers(uint64_t us_elapsed_absolute) > +{ > + struct tr2_tgt *tgt_j; > + int j; > + > + for_each_wanted_builtin (j, tgt_j) > + if (tgt_j->pfn_timer) > + tr2tls_emit_timer_blocks_by_thread(tgt_j->pfn_timer, > + us_elapsed_absolute); (same here) > +/* > + * Define the set of stopwatch timers. > + * > + * We can add more at any time, but they must be defined at compile > + * time (to avoid the need to dynamically allocate and synchronize > + * them between different threads). > + * > + * These must start at 0 and be contiguous (because we use them > + * elsewhere as array indexes). I was worried at first about using an array here, but this is essentially one chunk of global memory per process that will not be very large, even if we add a lot of timer IDs here. If we use this API enough that that memory is a problem, then we can refactor the memory to be a hashmap that only populates entries for IDs that are used by the process. > + * Any values added to this enum must also be added to the timer definitions > + * array. See `trace2/tr2_tmr.c:tr2tmr_def_block[]`. > + */ > +enum trace2_timer_id { > + /* > + * Define two timers for testing. See `t/helper/test-trace2.c`. > + * These can be used for ad hoc testing, but should not be used > + * for permanent analysis code. > + */ > + TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */ > + TRACE2_TIMER_ID_TEST2, /* emits summary and thread events */ > + > + > + /* Add additional timer definitions before here. */ > + TRACE2_NUMBER_OF_TIMERS > +}; ... > +static struct tr2tmr_def tr2tmr_def_block[TRACE2_NUMBER_OF_TIMERS] = { > + [TRACE2_TIMER_ID_TEST1] = { "test", "test1", 0 }, > + [TRACE2_TIMER_ID_TEST2] = { "test", "test2", 1 }, > +}; Although this will always be populated, so maybe my thoughts about how to reduce memory load in the hypothetical future are worthless. > +void tr2tmr_start(enum trace2_timer_id tid) > +{ > + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); > + struct tr2tmr_timer *t = &ctx->timers.timer[tid]; > + > + t->recursion_count++; > + if (t->recursion_count > 1) > + return; /* ignore recursive starts */ > + > + t->start_us = getnanotime() / 1000; Using nanotime gives us the best precision available, and dividing by 1000 will lose some precision. This is likely why we saw some 0.000000 values for t_min in some of your experiments. That should be rare for real uses of this API (such as wrapping lstat() calls). But why do we divide by 1000 here at all? 2^63 nanoseconds is still 292 years, so we don't risk overflow. You specify uint64_t so this isn't different on 32-bit machines. Thanks, -Stolee

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/21/21 9:45 AM, Derrick Stolee wrote: > On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote: >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Add a stopwatch timer mechanism to Git. [...] > >> +# Exercise the stopwatch timer "test" in a loop and confirm that it was >> +# we have as many start/stop intervals as expected. We cannot really test >> +# the (elapsed, min, max) timer values, so we assume they are good. > > We can't check their values, but we could check that their labels are > emitted. good point. i'll add that to the patterns in the grep. > [...] >> + grep "\"event\":\"timer\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"count\":${count}" $file > > Adding more detail to this pattern. > > This helper could probably benefit from constructing the regex across > multiple string concatenations, so we can see the different pieces. > Something like > > pattern="\"event\":\"timer\"" > pattern="$pattern.*\"thread\":\"${thread}\"" > pattern="$pattern.*\"name\":\"${name}\"" > pattern="$pattern.*\"count\":\"${count}\"" > pattern="$pattern.*\"t_total\":" > pattern="$pattern.*\"t_min\":" > pattern="$pattern.*\"t_max\":" > > grep "$pattern" $file > yeah, that helps a lot. thanks. [...] >> +/* >> + * Define the set of stopwatch timers. >> + * >> + * We can add more at any time, but they must be defined at compile >> + * time (to avoid the need to dynamically allocate and synchronize >> + * them between different threads). >> + * >> + * These must start at 0 and be contiguous (because we use them >> + * elsewhere as array indexes). > > I was worried at first about using an array here, but this is essentially > one chunk of global memory per process that will not be very large, even s/process/thread/ > if we add a lot of timer IDs here. If we use this API enough that that > memory is a problem, then we can refactor the memory to be a hashmap that > only populates entries for IDs that are used by the process. we're only talking about 48 bytes per timer being added to the thread context. and it is allocated, not stack based, so i'm not worried about it. and besides, we get constant time lookups when starting/stopping a timer. And when we get ready to sum across the thread pool, we can do it efficiently. > >> + * Any values added to this enum must also be added to the timer definitions >> + * array. See `trace2/tr2_tmr.c:tr2tmr_def_block[]`. >> + */ >> +enum trace2_timer_id { >> + /* >> + * Define two timers for testing. See `t/helper/test-trace2.c`. >> + * These can be used for ad hoc testing, but should not be used >> + * for permanent analysis code. >> + */ >> + TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */ >> + TRACE2_TIMER_ID_TEST2, /* emits summary and thread events */ >> + >> + >> + /* Add additional timer definitions before here. */ >> + TRACE2_NUMBER_OF_TIMERS >> +}; > > .... > >> +static struct tr2tmr_def tr2tmr_def_block[TRACE2_NUMBER_OF_TIMERS] = { >> + [TRACE2_TIMER_ID_TEST1] = { "test", "test1", 0 }, >> + [TRACE2_TIMER_ID_TEST2] = { "test", "test2", 1 }, >> +}; > > Although this will always be populated, so maybe my thoughts about how > to reduce memory load in the hypothetical future are worthless. yeah, i think this model works well for us. and it is lock free. > >> +void tr2tmr_start(enum trace2_timer_id tid) >> +{ >> + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); >> + struct tr2tmr_timer *t = &ctx->timers.timer[tid]; >> + >> + t->recursion_count++; >> + if (t->recursion_count > 1) >> + return; /* ignore recursive starts */ >> + >> + t->start_us = getnanotime() / 1000; > > Using nanotime gives us the best precision available, and dividing > by 1000 will lose some precision. This is likely why we saw some > 0.000000 values for t_min in some of your experiments. That should > be rare for real uses of this API (such as wrapping lstat() calls). > > But why do we divide by 1000 here at all? 2^63 nanoseconds is > still 292 years, so we don't risk overflow. You specify uint64_t > so this isn't different on 32-bit machines. When I did the original Trace2 parts, I made absolute and relative elapsed times be in microseconds. With the overhead of logging and etc, the lower bits weren't really useful. And then I converted those to "%9.6f" in the trace logs, so that we always have "seconds" in the traces. I just copied that model when I did timers. But I could see keeping nanoseconds around for these timers (since they don't log on every start/stop, like regions do). <grin> While drafting this reply I've been fixing up the code in parallel. I converted timers to report integer ns values rather than floats. And in every line of trace output the total/min/max timer values all end in 000 -- because (at least on MacOS) getnanotime() calls gettimeofday() and computes (tv.tv_sec * 1000000000 + tv.tv_usec). </grin> So maybe I did the original "%9.6" for other reasons.... I'll try it on another OS later and see if it is useful. Thanks, Jeff

gitgitgadget · 2021-12-21T14:53:49Z

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> Extend Trace2 to provide multiple "stopwatch timers" and "global counters".

>  3. Rationale
> 
> Timers and counters are an alternative to the existing "region" and "data"
> events. The latter are intended to trace the major flow (or phases) of the
> program and possibly capture the amount of work performed within a loop, for
> example. The former are offered as a way to measure activity that is not
> localized, such as the time spent in zlib or lstat, which may be called from
> many different parts of the program.
> 
> There are currently several places in the Git code where we want to measure
> such activity -- changed-path Bloom filter stats, topo-walk commit counts,
> and tree-walk counts and max-depths. A conversation in [1] suggested that we
> should investigate a more general mechanism to collect stats so that each
> instance doesn't need to recreate their own atexit handling mechanism.
> 
> This is an attempt to address that and let us easily explore other areas in
> the future.
> 
> This patch series does not attempt to refactor those three instances to use
> the new timers and counters. That should be a separate effort -- in part
> because we may want to retool them rather than just translate them. For
> example, rather than just translating the existing four Bloom filter counts
> (in revision.c) into Trace2 counters, we may instead want to have a "happy
> path timer" and a "sad path timer" if that would provide more insight.

I'm excited for these API features. It might be nice to have an RFC-
quality series demonstrating how these examples could work with the
new API. Makes sense to delay in case there were recommended changes
to the API from review in this v1.

I also like your attention to thread contexts. I think these timers
would be very interesting to use in parallel checkout. CC'ing Matheus
for his thoughts on where he would want timer summaries for that
feature. I would probably want the per-thread summary to know if we
are blocked on one really long thread while the others finish quickly.
Within that: what are the things causing us to be slow? Is it zlib?
Is it lstat()?

Thanks,
-Stolee

gitgitgadget · 2021-12-21T23:42:25Z

On the Git mailing list, Matheus Tavares wrote (reply to this):

On Tue, Dec 21, 2021 at 11:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> >
> >  3. Rationale
> >
> > Timers and counters are an alternative to the existing "region" and "data"
> > events. The latter are intended to trace the major flow (or phases) of the
> > program and possibly capture the amount of work performed within a loop, for
> > example. The former are offered as a way to measure activity that is not
> > localized, such as the time spent in zlib or lstat, which may be called from
> > many different parts of the program.
>
> I'm excited for these API features.

Me too! This would have been very useful on some experiments I had to
run in the past.

Thanks for working on it, Jeff :)

> I also like your attention to thread contexts. I think these timers
> would be very interesting to use in parallel checkout. CC'ing Matheus
> for his thoughts on where he would want timer summaries for that
> feature.

For parallel checkout, I think it would be interesting to have timer
summaries for open/close, fstat/lstat, write, and
inflation/delta-reconstruction. Perhaps pkt-line routines too, so that
we can see how much time we spend in inter-process communication.

It would be nice to have timer information for disk reading as well
(more on that below), but I don't think it is possible since we read
the objects through mmap() and thus, we cannot easily isolate the
actual reading time from the decompression time :(

> I would probably want the per-thread summary to know if we
> are blocked on one really long thread while the others finish quickly.

That would be interesting. Parallel checkout actually uses
subprocesses, but I can see the per-thread summary being useful on
grep, for example. (Nevertheless, the use case you mentioned for the
timers -- to evaluate the work balance on parallel checkout -- seems
very interesting.)

> Within that: what are the things causing us to be slow? Is it zlib?
> Is it lstat()?

On my tests, the bottleneck on checkout heavily depended on the
underlying storage type. On HDDs, the bottleneck was object reading
(i.e. page faults on mmap()-ed files), with about 70% to 80% of the
checkout runtime.

On SSDs, reading was much faster, so CPU (i.e. inflation) became the
bottleneck, with 50% of the runtime. (Inflation only lost to reading
when checking out from *many* loose objects.)

Finally, on NFS, file creation with open(O_CREAT | O_EXCL) and fstat()
(which makes the NFS client flush previously cached writes to the
server) were the bottlenecks, with about 40% of the total runtime
each.

These numbers come from a (sequential) `git checkout .` execution on
an empty working tree of the Linux kernel (v5.12), and they were
gathered using eBPF-based profilers. For other operations, especially
ones that require many file removals or more laborious tree merging in
unpack_trees(), I suspect the bottlenecks may change.

If anyone would be interested in seeing the flamegraphs and other
plots for these profiling numbers, I have them at:
https://matheustavares.gitlab.io/annexes/parallel-checkout/profiling

And there is a bit more context at:
https://matheustavares.gitlab.io/posts/parallel-checkout

gitgitgadget · 2021-12-21T23:42:27Z

User Matheus Tavares <matheus.bernardino@usp.br> has been added to the cc: list.

gitgitgadget · 2021-12-23T07:42:53Z

User Johannes Sixt <j6t@kdbg.org> has been added to the cc: list.

Move the thread name to a flex array at the bottom of the Trace2 thread local storage data and get rid of the strbuf. Let the flex array have the full computed value of the thread name without truncation. Change the PERF target to truncate the thread name so that the columns still line up. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

Defer freeing of the Trace2 per-thread thread local storage until program exit. Create a global list to own them. Trace2 thread local storage data is allocated when a thread is created and associated with that thread. Previously, that storage was deleted when the thread exited. Now at thread exit, we simply disassociate the data from the thread and let the global list manage the cleanup. This will be used by a later commit when we add "counters" and stopwatch-style "timers" to the Trace2 API. We will add those fields to the thread local storage block and allow each thread to efficiently (without locks) accumulate counter and timer data. At program exit, the main thread will run thru the global list and compute and report totals before freeing the list. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

Teach the message formatter in the Trace2 event target to take an optional thread-name argument. This overrides the thread name inherited from the thread local storage data. This will be used in a future commit for global events that should not be tied to a particular thread, such as a global stopwatch timer that aggregates data from all threads. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

Teach the message formatter in the Trace2 perf target to accept an optional thread name argument. This will override the thread name inherited from the thread local storage data block. This will be used in a future commit for global events that should not be tied to a particular thread, such as a global stopwatch timer. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

Teach Trace2 "perf" and "event" formats to handle "timer" events for stopwatch timers. Update API documentation accordingly. In a future commit, stopwatch timers will be added to the Trace2 API and it will emit these "timer" events. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

Add a stopwatch timer mechanism to Git. Timers are an alternative to regions. Timers can capture a series of intervals, such as calls to a library routine or a span of code. They are intended for code that is not necessarily associated with a particular phase of the command. Timer data is accumulated throughout the command and a timer "summary" event is logged (one per timer) at program exit. Optionally, timer data may also be reported by thread for certain timers. (See trace2/tr2_tmr.c:tr2_timer_def_block[].) Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

Teach Trace2 "perf" and "event" formats to handle "counter" events for global counters. Update the API documentation accordingly. In a future commit, global counters will be added to the Trace2 API and it will emit these "counter" events at program exit. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

Add global counters to Trace2. Create a mechanism in Trace2 to count an activity and emit a single "counter" event at the end of the program. This is an alternative to the existing "data" events that are emitted immediately. Create an array of counters (indexed by `enum trace2_counter_id`) to allow various activites to be tracked as desired. Preload the array with two counters for testing purposes. Create unit tests to demonstrate and verify. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

jeffhostetler · 2021-12-28T19:36:01Z

/submit

gitgitgadget · 2021-12-28T19:36:58Z

Submitted as pull.1099.v2.git.1640720202.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1099/jeffhostetler/trace2-stopwatch-v2-v2

To fetch this version to local tag pr-1099/jeffhostetler/trace2-stopwatch-v2-v2:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1099/jeffhostetler/trace2-stopwatch-v2-v2

gitgitgadget · 2021-12-29T00:53:51Z

trace2/tr2_tls.h

@@ -11,8 +11,8 @@
 struct tr2tls_thread_ctx {


On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Use "size_t" rather than "int" for the "alloc" and "nr_open_regions" > fields in the "tr2tls_thread_ctx". These are used by ALLOC_GROW(). > > This was discussed in: https://lore.kernel.org/all/YULF3hoaDxA9ENdO@nand.local/ Let's keep commit messages self-contained when possible. It's fine to reference on-list discussion (and I often do), but in this case all that's being referenced just seems to be Taylor saying we might as well change this while we're at it. So I'd think a short sentence saying we generally prefer "size_t" for these these days and we might as well change it here while we're at it would suffice over the ML link. > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > --- > trace2/tr2_tls.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h > index b1e327a928e..a90bd639d48 100644 > --- a/trace2/tr2_tls.h > +++ b/trace2/tr2_tls.h > @@ -11,8 +11,8 @@ > struct tr2tls_thread_ctx { > struct strbuf thread_name; > uint64_t *array_us_start; > - int alloc; > - int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */ > + size_t alloc; > + size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */ > int thread_id; > };

gitgitgadget · 2021-12-29T01:17:51Z

trace2/tr2_tgt_event.c

@@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,



On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Move the thread name to a flex array at the bottom of the Trace2 > thread local storage data and get rid of the strbuf. > > Let the flex array have the full computed value of the thread name > without truncation. > > Change the PERF target to truncate the thread name so that the columns > still line up. This commit message really doesn't help in explaining what we're trying to do here and why it's needed. I'm not saying it's not, but why not a strbuf, why a flex array? The diff below also shows changes unrelated to this. I tried this local fixup on top of this series which works, so I wonder if we're just trying to get rid of the strbuf to signal that this shouldn't change why not just strbuf_detach() and keep a "const char *thread_name"? diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c index 28ea55863d1..35d49b27b2e 100644 --- a/trace2/tr2_tls.c +++ b/trace2/tr2_tls.c @@ -48,7 +48,7 @@ void tr2tls_start_process_clock(void) struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, uint64_t us_thread_start) { - struct tr2tls_thread_ctx *ctx; + struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(struct tr2tls_thread_ctx)); struct strbuf buf_name = STRBUF_INIT; int thread_id = tr2tls_locked_increment(&tr2_next_thread_id); @@ -56,8 +56,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, strbuf_addf(&buf_name, "th%02d:", thread_id); strbuf_addstr(&buf_name, thread_name); - FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len); - strbuf_release(&buf_name); + ctx->thread_name = strbuf_detach(&buf_name, NULL); ctx->thread_id = thread_id; @@ -188,6 +187,7 @@ void tr2tls_release(void) while (ctx) { struct tr2tls_thread_ctx *next = ctx->next_ctx; + free((char *)ctx->thread_name); free(ctx->array_us_start); free(ctx); diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h index 503829bbd44..bc6c6f12e38 100644 --- a/trace2/tr2_tls.h +++ b/trace2/tr2_tls.h @@ -6,6 +6,7 @@ #include "trace2/tr2_tmr.h" struct tr2tls_thread_ctx { + const char *thread_name; struct tr2tls_thread_ctx *next_ctx; uint64_t *array_us_start; size_t alloc; @@ -14,8 +15,6 @@ struct tr2tls_thread_ctx { struct tr2_timer_block timers; struct tr2_counter_block counters; - - char thread_name[FLEX_ARRAY]; }; /* > [...] > index 7da94aba522..ed99a234b95 100644 > --- a/trace2/tr2_tls.c > +++ b/trace2/tr2_tls.c > @@ -34,7 +34,18 @@ void tr2tls_start_process_clock(void) > struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, > uint64_t us_thread_start) > { > - struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx)); > + struct tr2tls_thread_ctx *ctx; > + struct strbuf buf_name = STRBUF_INIT; > + int thread_id = tr2tls_locked_increment(&tr2_next_thread_id); Here's the looks-to-be-unrelated to this strbuf conversion code I mentioned above. > + > + if (thread_id) > + strbuf_addf(&buf_name, "th%02d:", thread_id); > + strbuf_addstr(&buf_name, thread_name); > + > + FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len); > + strbuf_release(&buf_name); > + > + ctx->thread_id = thread_id; > > /* > [...]

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/28/21 8:11 PM, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Move the thread name to a flex array at the bottom of the Trace2 >> thread local storage data and get rid of the strbuf. >> >> Let the flex array have the full computed value of the thread name >> without truncation. >> >> Change the PERF target to truncate the thread name so that the columns >> still line up. > > This commit message really doesn't help in explaining what we're trying > to do here and why it's needed. I'm not saying it's not, but why not a > strbuf, why a flex array? The diff below also shows changes unrelated to > this. > > I tried this local fixup on top of this series which works, so I wonder > if we're just trying to get rid of the strbuf to signal that this > shouldn't change why not just strbuf_detach() and keep a "const char > *thread_name"? > > diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c > index 28ea55863d1..35d49b27b2e 100644 > --- a/trace2/tr2_tls.c > +++ b/trace2/tr2_tls.c > @@ -48,7 +48,7 @@ void tr2tls_start_process_clock(void) > struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, > uint64_t us_thread_start) > { > - struct tr2tls_thread_ctx *ctx; > + struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(struct tr2tls_thread_ctx)); > struct strbuf buf_name = STRBUF_INIT; > int thread_id = tr2tls_locked_increment(&tr2_next_thread_id); > > @@ -56,8 +56,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, > strbuf_addf(&buf_name, "th%02d:", thread_id); > strbuf_addstr(&buf_name, thread_name); > > - FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len); > - strbuf_release(&buf_name); > + ctx->thread_name = strbuf_detach(&buf_name, NULL); > > ctx->thread_id = thread_id; > > @@ -188,6 +187,7 @@ void tr2tls_release(void) > while (ctx) { > struct tr2tls_thread_ctx *next = ctx->next_ctx; > > + free((char *)ctx->thread_name); > free(ctx->array_us_start); > free(ctx); > > diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h > index 503829bbd44..bc6c6f12e38 100644 > --- a/trace2/tr2_tls.h > +++ b/trace2/tr2_tls.h > @@ -6,6 +6,7 @@ > #include "trace2/tr2_tmr.h" > > struct tr2tls_thread_ctx { > + const char *thread_name; > struct tr2tls_thread_ctx *next_ctx; > uint64_t *array_us_start; > size_t alloc; > @@ -14,8 +15,6 @@ struct tr2tls_thread_ctx { > > struct tr2_timer_block timers; > struct tr2_counter_block counters; > - > - char thread_name[FLEX_ARRAY]; > }; > > /* I have to admit that I really don't know how to please you. In V1 I converted the "strbuf" to a "char *" inside the structure because there was concern that one might assume that the thread name could be changed after the thread was created. You complained that I made it a "char *" rather than a "const char *". I explained pointer ownership and you completely ignored that. You explained that I should just "cast away the const during the free" because other places in the code use that "anti-pattern". You also complained that I didn't use a callback to get the thread name dynamically rather than having a string field in the thread's TLS. I explained that it was faster to compute it once than to generate it on every logging call. You ignored that and hinted that the message formatting in each of the target destinations would make that cost irrelevant. I convert the field to a flex-array to avoid all of the allocation and ownership issues and now you send me a "fixup" patch that undoes the flex-array change and makes it look mostly like my previous version -- but WITH the "const" and the "cast" (that I've already talked about in this paragraph). So, where does this leave us? I'm really trying to "assume good intentions" here, but we've spent way toooooooo long discussing this thread_name field. It's starting to feel like you're going to just keep nagging me about this field until I make it look exactly like you would have written it. So, sorry to rant, but I don't know what else to say about this field. It is especially troubling that this "issue" has taken so much time -- time that would be better spent actually looking at the new timers and counters feature. > >> [...] >> index 7da94aba522..ed99a234b95 100644 >> --- a/trace2/tr2_tls.c >> +++ b/trace2/tr2_tls.c >> @@ -34,7 +34,18 @@ void tr2tls_start_process_clock(void) >> struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, >> uint64_t us_thread_start) >> { >> - struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx)); >> + struct tr2tls_thread_ctx *ctx; >> + struct strbuf buf_name = STRBUF_INIT; >> + int thread_id = tr2tls_locked_increment(&tr2_next_thread_id); > > Here's the looks-to-be-unrelated to this strbuf conversion code I > mentioned above. In the flex-array version, we defer the alloc of "ctx" until after we have computed the thread name -- we to do that so that we know the length of the thread name (and thus the size of the flex-array). To do that we need to know the thread id that we will be formatting into the thread name. And to do that we need to reserve a thread id -- which is a global and requires a lock. So the call to tr2tls_locked_increment() (as well as the formatting of the name itself) was moved up to the top of the function rather than after the "ctx" was allocated. > >> + >> + if (thread_id) >> + strbuf_addf(&buf_name, "th%02d:", thread_id); >> + strbuf_addstr(&buf_name, thread_name); >> + >> + FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len); >> + strbuf_release(&buf_name); >> + >> + ctx->thread_id = thread_id; >> >> /* >> [...] Jeff

gitgitgadget · 2021-12-29T01:56:44Z

trace2/tr2_tgt_perf.c

@@ -23,6 +23,7 @@ static int tr2env_perf_be_brief;



On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Teach the message formatter in the Trace2 perf target to accept an > optional thread name argument. This will override the thread name > inherited from the thread local storage data block. > > This will be used in a future commit for global events that should > not be tied to a particular thread, such as a global stopwatch timer. We already have a "ctx", and that "ctx" has a "thread_name", but here and in the preceding commit we're adding a "thread_name" to every caller of these functions in case we'd like to override it. Wouldn't it make more sense to just pass a "ctx" to these functions? One of them already takes it, here's an (obviously incomplete) fixup on top of your series to make the one that doesn't take a "ctx", and for the only non-NULL users of "thread_name" to just use a trivial helper to pass in a "ctx" with a new "thread_name", then to swap it back. It would make for a smaller diffstat for this already large series, or we could do exactly what we're doing now, but avoid the churn of adjusting every caller by introducing a new sister function for those who want this parameter to be non-NULL. (The below patch is "broken" in that __FILE__ and __LINE__ need to be passed in as parameters, but this is just a trivial change for show/commentary) diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c index b9eb2cdb77a..7aaec83dff7 100644 --- a/trace2/tr2_tgt_event.c +++ b/trace2/tr2_tgt_event.c @@ -82,16 +82,15 @@ static void fn_term(void) static void event_fmt_prepare(const char *event_name, const char *file, int line, const struct repository *repo, struct json_writer *jw, - const char *thread_name_override) + struct tr2tls_thread_ctx *ctx) { struct tr2_tbuf tb_now; + if (!ctx) + ctx = tr2tls_get_self(); jw_object_string(jw, "event", event_name); jw_object_string(jw, "sid", tr2_sid_get()); - jw_object_string(jw, "thread", - ((thread_name_override && *thread_name_override) - ? thread_name_override - : tr2tls_get_self()->thread_name)); + jw_object_string(jw, "thread", ctx->thread_name); /* * In brief mode, only emit <time> on these 2 event types. @@ -111,6 +110,20 @@ static void event_fmt_prepare(const char *event_name, const char *file, jw_object_intmax(jw, "repo", repo->trace2_repo_id); } +static void event_fmt_prepare_tn(const char *event_name, const char *file, + int line, const struct repository *repo, + struct json_writer *jw, + const char *thread_name) +{ + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); + const char *tmp; + + tmp = ctx->thread_name; + ctx->thread_name = thread_name; + event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, jw, ctx); + ctx->thread_name = tmp; +} + static void fn_too_many_files_fl(const char *file, int line) { const char *event_name = "too_many_files"; @@ -629,7 +642,7 @@ static void fn_timer(uint64_t us_elapsed_absolute, double t_abs = (double)us_elapsed_absolute / 1000000.0; jw_object_begin(&jw, 0); - event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name); + event_fmt_prepare_tn(event_name, __FILE__, __LINE__, NULL, &jw, thread_name); jw_object_double(&jw, "t_abs", 6, t_abs); jw_object_string(&jw, "name", timer_name); jw_object_intmax(&jw, "count", interval_count); @@ -654,7 +667,7 @@ static void fn_counter(uint64_t us_elapsed_absolute, double t_abs = (double)us_elapsed_absolute / 1000000.0; jw_object_begin(&jw, 0); - event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name); + event_fmt_prepare_tn(event_name, __FILE__, __LINE__, NULL, &jw, thread_name); jw_object_double(&jw, "t_abs", 6, t_abs); jw_object_string(&jw, "name", counter_name); jw_object_intmax(&jw, "value", value);

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/28/21 8:48 PM, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Teach the message formatter in the Trace2 perf target to accept an >> optional thread name argument. This will override the thread name >> inherited from the thread local storage data block. >> >> This will be used in a future commit for global events that should >> not be tied to a particular thread, such as a global stopwatch timer. > > We already have a "ctx", and that "ctx" has a "thread_name", but here > and in the preceding commit we're adding a "thread_name" to every caller > of these functions in case we'd like to override it. > > Wouldn't it make more sense to just pass a "ctx" to these functions? One > of them already takes it, here's an (obviously incomplete) fixup on top > of your series to make the one that doesn't take a "ctx", and for the > only non-NULL users of "thread_name" to just use a trivial helper to > pass in a "ctx" with a new "thread_name", then to swap it back. > > It would make for a smaller diffstat for this already large series, or > we could do exactly what we're doing now, but avoid the churn of > adjusting every caller by introducing a new sister function for those > who want this parameter to be non-NULL. I suppose it is possible to have a helper version of `event_fmt_prepare()` that takes the extra argument and fixup the existing function to call it with NULL. I'll see if that makes sense. [...] > > +static void event_fmt_prepare_tn(const char *event_name, const char *file, > + int line, const struct repository *repo, > + struct json_writer *jw, > + const char *thread_name) > +{ > + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); > + const char *tmp; > + > + tmp = ctx->thread_name; > + ctx->thread_name = thread_name; > + event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, jw, ctx); > + ctx->thread_name = tmp; > +} [...] This only works if we agree that thread name is a pointer inside the structure and not a flex-array. Personally, I think this is trying to do things backwards by temporarily changing the ctx->thread_name field. I think it would be better to `event_fmt_prepare_tn()` do the actual work with the supplied thread name and have the existing `event_fmt_prepare()` just call it with ctx->thread_name. Then we don't need to hack up the ctx. I'll see if this makes the diffs a little cleaner. Jeff

gitgitgadget · 2021-12-29T02:17:51Z

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:

I left some other comments on the series inline, just on the notes in
the CL:

>  * Ævar proposed a large refactor of the "_perf" target to have a "fmt()"
>    varargs function to reduce the amount of copy-n-pasted code in many of
>    the "fn" event handlers. This looks like a good change based on the
>    mockup but is a large refactor.

FWIW what I meant with [1] was not that this series needed to take the
detour of refactoring trace2/tr2_tgt_perf.c to use such a helper, but
that for the function additions in this series it might make sense to
introduce one and use it for the new functions.

For this series I think it's probably not worth it, so I'm fine with
leaving this for some other time. Just pointing out that rather than
your reading of:

 1. We have some refactorable verbosity
 2. Refactor all callers
 3. Change existing code to use that refactoring
 4. Add new code to use the refactoring

It's also perfectly fine to do just:

 1. We have some refactorable verbosity
 2. Introduce a less verbose
 3. Add new code to use the helper

And leave the "refactor all callers" for some other time.

Anyway, I think for the two callers just leaving it entirely for this
series is the right thing to do. It was more of a "hrm, that's some odd
and avoidable verbosity..." comment on me read-through of v1.

1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/

>  * Ævar proposed a new rationale for when/why we change the "_event" version
>    number. That text can be added to the design document independently.

Hrm, no. In [1] I linked to some earlier musings of mine about what we
should do about the TR2_EVENT_VERSION (mainly as an FYI since you added
it, but hadn't commented on that post).

But my main comment there was that the series wasn't progressing as
atomic changes. I.e. we promise to change the TR2_EVENT_VERSION version
every time we change the event format, but v1 first changed the format
and bumped the version, then made some more changes.

I think that's probably fine per-se within a git release cycle, but it
might be a symtom of commits that could be split up to be more atomic (I
don't know, didn't look in detail).

However, in this v2 of the series the TR2_EVENT_VERSION bump is entirely
gone.

Maybe that means that you so vehemently agree with my proposal in [1] it
that you'd like to start taking that view for trace2 changes right away
:-)

For me it's fine either way, I think TR2_EVENT_VERSION probably isn't
that important.

But if that's the case it should probably be called out more explictly
in the CL/commit. I.e. even if our "policy" (such as it is) about
TR2_EVENT_VERSION currently says X we're going to start doing Y here
intentionally.

And in that case I should probably turn that suggestion in [1] into a an
actual PATCH sooner than later...

1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/

gitgitgadget · 2021-12-30T16:45:51Z

On the Git mailing list, Jeff Hostetler wrote (reply to this):

On 12/28/21 8:54 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:
> 
> I left some other comments on the series inline, just on the notes in
> the CL:
> 
>>   * Ævar proposed a large refactor of the "_perf" target to have a "fmt()"
>>     varargs function to reduce the amount of copy-n-pasted code in many of
>>     the "fn" event handlers. This looks like a good change based on the
>>     mockup but is a large refactor.
> 
> FWIW what I meant with [1] was not that this series needed to take the
> detour of refactoring trace2/tr2_tgt_perf.c to use such a helper, but
> that for the function additions in this series it might make sense to
> introduce one and use it for the new functions.
> 
> For this series I think it's probably not worth it, so I'm fine with
> leaving this for some other time. Just pointing out that rather than
> your reading of:
> 
>   1. We have some refactorable verbosity
>   2. Refactor all callers
>   3. Change existing code to use that refactoring
>   4. Add new code to use the refactoring
> 
> It's also perfectly fine to do just:
> 
>   1. We have some refactorable verbosity
>   2. Introduce a less verbose
>   3. Add new code to use the helper
> 
> And leave the "refactor all callers" for some other time.
> 
> Anyway, I think for the two callers just leaving it entirely for this
> series is the right thing to do. It was more of a "hrm, that's some odd
> and avoidable verbosity..." comment on me read-through of v1.
> 
> 1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/

Sorry, but I'm going to call BS on this.  You sent a ~200 line diff
showing how we could refactor and reduce some of the duplicated code.
You have a history of introducing unnecessary refactorings in the middle
of other topics, and this looks like another example of that.  Another
example of distracting everyone from reviewing the actual new code.

And when I say that it should be an independent topic in its own
series, you fall back to the your "oh, it was just a drive-by comment."
and/or "i didn't mean for you to actually do it." and/or "you just
read my email incorrectly."

Drive-by comments don't usually have ~200 line diffs attached....

A drive-by comment would just say that "there is an opportunity to
create a varargs version of the existing io function and reduce
some duplication in the bodies of the existing callers" and be done.
I don't need a 200 line diff to see how you spell that.

Again, sorry to rant, but I'm tired looking like the stupid half
in these conversations.

> 
>>   * Ævar proposed a new rationale for when/why we change the "_event" version
>>     number. That text can be added to the design document independently.
> 
> Hrm, no. In [1] I linked to some earlier musings of mine about what we
> should do about the TR2_EVENT_VERSION (mainly as an FYI since you added
> it, but hadn't commented on that post).
> 
> But my main comment there was that the series wasn't progressing as
> atomic changes. I.e. we promise to change the TR2_EVENT_VERSION version
> every time we change the event format, but v1 first changed the format
> and bumped the version, then made some more changes.

Did you really expect me to change it twice within a single 9 commit
patch series?

This series creates both "timers" and "counters" and will both appear
together if/when they are merged.  From an external point of view,
users would see version 4 added two new event types.  So I either
increment it for "timers" or I increment it for "counters" or I squash
the two commits together and increment it then.

I didn't want to squash them, so I chose the former.

> 
> I think that's probably fine per-se within a git release cycle, but it
> might be a symtom of commits that could be split up to be more atomic (I
> don't know, didn't look in detail).
> 
> However, in this v2 of the series the TR2_EVENT_VERSION bump is entirely
> gone.

You complained when/how I bumped it in V1.  So I removed it.

And I suggested that you commit your "earlier musings".  With
that in place, there would be no need for me to change the
version number (which is what you wanted all along, right?)

> 
> Maybe that means that you so vehemently agree with my proposal in [1] it
> that you'd like to start taking that view for trace2 changes right away
> :-)

s/so vehemently agree with/are tired of debating/

> 
> For me it's fine either way, I think TR2_EVENT_VERSION probably isn't
> that important.
> 
> But if that's the case it should probably be called out more explictly
> in the CL/commit. I.e. even if our "policy" (such as it is) about
> TR2_EVENT_VERSION currently says X we're going to start doing Y here
> intentionally.
> 
> And in that case I should probably turn that suggestion in [1] into a an
> actual PATCH sooner than later...
> 
> 1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/
> 

Right, I'll add a note to V3 stating that I did not update
the version number.

Jeff

jeffhostetler · 2022-10-03T19:59:10Z

I'm going to close this and send a new (rewritten) series to the list in #1373.

jeffhostetler force-pushed the trace2-stopwatch-v2 branch from 12fd901 to 596caed Compare December 17, 2021 22:09

gitgitgadget bot reviewed Dec 20, 2021

View reviewed changes

gitgitgadget bot reviewed Dec 21, 2021

View reviewed changes

jeffhostetler force-pushed the trace2-stopwatch-v2 branch 3 times, most recently from f5f7086 to ee2dba6 Compare December 22, 2021 22:59

jeffhostetler added 7 commits December 27, 2021 14:53

jeffhostetler force-pushed the trace2-stopwatch-v2 branch from ee2dba6 to 4d6155e Compare December 27, 2021 23:05

gitgitgadget bot reviewed Dec 29, 2021

View reviewed changes

jeffhostetler closed this Oct 3, 2022

		@@ -88,7 +88,7 @@ static void event_fmt_prepare(const char event_name, const char file,

		@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
		{

		@@ -15,7 +15,16 @@ static uint64_t tr2tls_us_start_process;
		static pthread_mutex_t tr2tls_mutex;

Trace2 stopwatch timers and global counters #1099

Trace2 stopwatch timers and global counters #1099

Conversation

jeffhostetler commented Dec 17, 2021 • edited

jeffhostetler commented Dec 20, 2021

gitgitgadget bot commented Dec 20, 2021

gitgitgadget bot Dec 20, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 20, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 20, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 21, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 22, 2021

Choose a reason for hiding this comment

gitgitgadget bot commented Dec 20, 2021

gitgitgadget bot Dec 20, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 20, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 20, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 22, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 20, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 22, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 20, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 22, 2021

Choose a reason for hiding this comment

gitgitgadget bot commented Dec 20, 2021

gitgitgadget bot Dec 21, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 22, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 22, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 21, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 22, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 22, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 22, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 23, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 23, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 27, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 21, 2021

Choose a reason for hiding this comment

gitgitgadget bot commented Dec 21, 2021

gitgitgadget bot Dec 21, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 22, 2021

Choose a reason for hiding this comment

gitgitgadget bot commented Dec 21, 2021

gitgitgadget bot commented Dec 21, 2021

gitgitgadget bot commented Dec 21, 2021

gitgitgadget bot commented Dec 23, 2021

jeffhostetler commented Dec 28, 2021

gitgitgadget bot commented Dec 28, 2021

gitgitgadget bot Dec 29, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 29, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 29, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 29, 2021

Choose a reason for hiding this comment

gitgitgadget bot Dec 29, 2021

Choose a reason for hiding this comment

gitgitgadget bot commented Dec 29, 2021

gitgitgadget bot commented Dec 30, 2021

jeffhostetler commented Oct 3, 2022

jeffhostetler commented Dec 17, 2021 •

edited