Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace2 stopwatch timers and global counters #1099

Closed

Conversation

jeffhostetler
Copy link

@jeffhostetler jeffhostetler commented Dec 17, 2021

Here is version 2 of my series to add stopwatch timers and global counters
to Trace2. I think this version address all of the comments made on V1.

  • I moved the Trace2 "thread_name" field into a flex-array at the bottom
    of the thread local storage block. This avoids the issue of whether it should
    be allocated and by whom and its const-ness.

  • I moved the truncation of the "thread_name" into the "_perf" target (which
    was the only target that actually cared) so that columns still line up.

  • Started phasing out the TLS and CTX acronyms in the Trace2 code.
    There is an ambiguity between "thread local storage" and "transport layer security"
    that caused some confusion. In this patch series, I eliminated new uses of the
    TLS term. A future series will be needed to actually rename variables, functions,
    and data types to fully eliminate the TLS term.

  • In V1 I included a change to the "_event" target version number. I've rolled
    this back in favor of Ævar's new proposal describing when/why we change it.
    (That proposal is independent of this series.)

  • In V1 I had reported timer values {total, min, max} in floating point seconds
    with microsecond precision (using a "%9.6f" format) and was internally accumulating
    interval times in microseconds. After some discussion, I've changed this to
    accumulate in nanoseconds and report integer nanoseconds. This may avoid
    some accumulated round off error. (However, on some platforms getnanotime()
    only has microsecond accuracy, so this increased precision may be misleading.)

  • Refactor the pattern model used in the unit tests to make it easier to
    visually parse.

  • Some cosmetic cleanup of the private timer and counter API.

There were additional requests/comments that I have not addressed in this
version because I think they should be in their own top-level topic in a future
series rather than appended onto this series:

  • The full elimination of the TLS and CTX terms.

  • Ævar proposed a new test_trace2 test function to parse trace output.
    This would be similar to (or a generalization of) the test_region function that
    we already have in test-lib-functions.sh.

  • Ævar proposed a large refactor of the "_perf" target to have a "fmt()"
    varargs function to reduce the amount of copy-n-pasted code in many of
    the "fn
    " event handlers. This looks like a good change based on the mockup
    but is a large refactor.

  • Ævar proposed a new rationale for when/why we change the "_event"
    version number. That text can be added to the design document independently.

cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Jeff Hostetler git@jeffhostetler.com
cc: Derrick Stolee stolee@gmail.com
cc: Matheus Tavares matheus.bernardino@usp.br
cc: Johannes Sixt j6t@kdbg.org

Use "size_t" rather than "int" for the "alloc" and "nr_open_regions"
fields in the "tr2tls_thread_ctx".  These are used by ALLOC_GROW().

This was discussed in: https://lore.kernel.org/all/YULF3hoaDxA9ENdO@nand.local/

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
@jeffhostetler
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 20, 2021

Submitted as pull.1099.git.1640012469.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1099/jeffhostetler/trace2-stopwatch-v2-v1

To fetch this version to local tag pr-1099/jeffhostetler/trace2-stopwatch-v2-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1099/jeffhostetler/trace2-stopwatch-v2-v1

@@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
> The thread name is set when the thread is created and should not be
> be modified afterwards.  Replace the strbuf with an allocated pointer
> to make that more clear.
>
> This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  trace2/tr2_tgt_event.c |  2 +-
>  trace2/tr2_tgt_perf.c  |  2 +-
>  trace2/tr2_tls.c       | 16 +++++++++-------
>  trace2/tr2_tls.h       |  2 +-
>  4 files changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
> index 3a0014417cc..ca48d00aebc 100644
> --- a/trace2/tr2_tgt_event.c
> +++ b/trace2/tr2_tgt_event.c
> @@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,
>  
>  	jw_object_string(jw, "event", event_name);
>  	jw_object_string(jw, "sid", tr2_sid_get());
> -	jw_object_string(jw, "thread", ctx->thread_name.buf);
> +	jw_object_string(jw, "thread", ctx->thread_name);
>  
>  	/*
>  	 * In brief mode, only emit <time> on these 2 event types.
> diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
> index e4acca13d64..c3e57fcb3c0 100644
> --- a/trace2/tr2_tgt_perf.c
> +++ b/trace2/tr2_tgt_perf.c
> @@ -106,7 +106,7 @@ static void perf_fmt_prepare(const char *event_name,
>  
>  	strbuf_addf(buf, "d%d | ", tr2_sid_depth());
>  	strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME,
> -		    ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME,
> +		    ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME,
>  		    event_name);
>  
>  	len = buf->len + TR2FMT_PERF_REPO_WIDTH;
> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
> index 7da94aba522..cd8b9f2f0a0 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  					     uint64_t us_thread_start)
>  {
>  	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
> +	struct strbuf buf_name = STRBUF_INIT;
>  
>  	/*
>  	 * Implicitly "tr2tls_push_self()" to capture the thread's start
> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  
>  	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>  
> -	strbuf_init(&ctx->thread_name, 0);
>  	if (ctx->thread_id)
> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
> -	strbuf_addstr(&ctx->thread_name, thread_name);
> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
> +		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
> +	strbuf_addstr(&buf_name, thread_name);
> +	if (buf_name.len > TR2_MAX_THREAD_NAME)
> +		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
> +
> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);
>  
>  	pthread_setspecific(tr2tls_key, ctx);
>  
> @@ -95,7 +97,7 @@ void tr2tls_unset_self(void)
>  
>  	pthread_setspecific(tr2tls_key, NULL);
>  
> -	strbuf_release(&ctx->thread_name);
> +	free(ctx->thread_name);
>  	free(ctx->array_us_start);
>  	free(ctx);
>  }
> @@ -113,7 +115,7 @@ void tr2tls_pop_self(void)
>  	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
>  
>  	if (!ctx->nr_open_regions)
> -		BUG("no open regions in thread '%s'", ctx->thread_name.buf);
> +		BUG("no open regions in thread '%s'", ctx->thread_name);
>  
>  	ctx->nr_open_regions--;
>  }
> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index a90bd639d48..d968da6a679 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -9,7 +9,7 @@
>  #define TR2_MAX_THREAD_NAME (24)
>  
>  struct tr2tls_thread_ctx {
> -	struct strbuf thread_name;
> +	char *thread_name;
>  	uint64_t *array_us_start;
>  	size_t alloc;
>  	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */

Junio's suggestion in the linked E-Mail was to make this a "const char *".

Narrowly, I don't see why not just add a "const" to the "struct strbuf
*" instead.

But less narrowly if we're not going to change it why malloc a new one
at all? Can't we just use the "const char *" passed into
tr2tls_create_self(), and for the "th%02d:" case have the code that's
formatting it handle that case?

I.e. have the things that use it as a "%s" now call a function that
formats things as a function of the "ctx->thread_id" (which may be 0)
and limit it by TR2_MAX_THREAD_NAME?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/20/21 11:31 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
>> The thread name is set when the thread is created and should not be
>> be modified afterwards.  Replace the strbuf with an allocated pointer
>> to make that more clear.
>>
>> This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/
 >>...
>> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
>> index 7da94aba522..cd8b9f2f0a0 100644
>> --- a/trace2/tr2_tls.c
>> +++ b/trace2/tr2_tls.c
>> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>   					     uint64_t us_thread_start)
>>   {
>>   	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
>> +	struct strbuf buf_name = STRBUF_INIT;
>>   
>>   	/*
>>   	 * Implicitly "tr2tls_push_self()" to capture the thread's start
>> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>   
>>   	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>>   
>> -	strbuf_init(&ctx->thread_name, 0);
>>   	if (ctx->thread_id)
>> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
>> -	strbuf_addstr(&ctx->thread_name, thread_name);
>> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
>> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
>> +		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
>> +	strbuf_addstr(&buf_name, thread_name);
>> +	if (buf_name.len > TR2_MAX_THREAD_NAME)
>> +		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
>> +
>> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);
>>   
>>   	pthread_setspecific(tr2tls_key, ctx);
>>   
 >>..
>> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
>> index a90bd639d48..d968da6a679 100644
>> --- a/trace2/tr2_tls.h
>> +++ b/trace2/tr2_tls.h
>> @@ -9,7 +9,7 @@
>>   #define TR2_MAX_THREAD_NAME (24)
>>   
>>   struct tr2tls_thread_ctx {
>> -	struct strbuf thread_name;
>> +	char *thread_name;
>>   	uint64_t *array_us_start;
>>   	size_t alloc;
>>   	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
> 
> Junio's suggestion in the linked E-Mail was to make this a "const char *".

Yes, it was.  To me a "const char *" in a structure means that
the structure does not own the pointer and must not free it.
Whereas as "char *" means that the structure might own it and
should maybe free it when the structure is freed.  My usage here
is that the structure does own it (because it took it from the
temporary strbuf using strbuf_detach()) and so it must free it.
Therefore it should not be "const".  This has nothing to do with
whether or not we allow the thread name to be changed after the
fact.  (We don't, but that is a different issue).

> 
> Narrowly, I don't see why not just add a "const" to the "struct strbuf
> *" instead.

Adding "const" to a strbuf would be wrong in this case, since the
structure owns the strbuf and needs to strbuf_release the contained
buffer and (now) free the strbuf pointer, right?

This also makes things confusing -- all callers of tr2tls_create_self()
would now be responsible for allocating a strbuf to pass in -- and who
would own those.  This would also create opportunities for mistakes if
they pass in the address of a stack-based strbuf, right?

This is being used to initialize thread-based data, so the caller
can't just use a "function local static" or a "global static" strbuf.


> 
> But less narrowly if we're not going to change it why malloc a new one
> at all? Can't we just use the "const char *" passed into
> tr2tls_create_self(), and for the "th%02d:" case have the code that's
> formatting it handle that case?
> 
> I.e. have the things that use it as a "%s" now call a function that
> formats things as a function of the "ctx->thread_id" (which may be 0)
> and limit it by TR2_MAX_THREAD_NAME?
> 

This would be less efficient, right?  That thread name is included in
*EVERY* _perf and _event message emitted.  If we were to change the
design to have basically a callback to get the formatted value based
on the `ctx` or `cts->thread_id` and dynamically formatting the name,
then we would have to hit that callback once (or twice) for every Trace2
message, right?  That would be much slower than just having a fixed
string (formatted when the thread is created) that we can just use.
And even if we said that the callback could cache the result (like
we do when we lookup env vars), where would it cache it?  It would have
to cache it in the `ctx`, which is where it currently is and without
any of the unnecessary overhead, right?

I think you're assuming that callers of `tr2tls_create_self()` always
pass a literal string such that that string value is always safe to
reference later.  Nothing would prevent a caller from passing the
address of a stack buffer.  It is not safe to assume that that string
pointer will always be valid, such as after the thread exits.  It is
better for _create_self() to copy the given string (whether we format
it immediately or not) than to assume that the pointer will always be
valid, right?


So I don't think we should deviate from the patch that I submitted.

Jeff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Dec 20 2021, Jeff Hostetler wrote:

> On 12/20/21 11:31 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
>> 
>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>
>>> Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
>>> The thread name is set when the thread is created and should not be
>>> be modified afterwards.  Replace the strbuf with an allocated pointer
>>> to make that more clear.
>>>
>>> This was discussed in: https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/
>>>...
>>> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
>>> index 7da94aba522..cd8b9f2f0a0 100644
>>> --- a/trace2/tr2_tls.c
>>> +++ b/trace2/tr2_tls.c
>>> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>>   					     uint64_t us_thread_start)
>>>   {
>>>   	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
>>> +	struct strbuf buf_name = STRBUF_INIT;
>>>     	/*
>>>   	 * Implicitly "tr2tls_push_self()" to capture the thread's start
>>> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>>     	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>>>   -	strbuf_init(&ctx->thread_name, 0);
>>>   	if (ctx->thread_id)
>>> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
>>> -	strbuf_addstr(&ctx->thread_name, thread_name);
>>> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
>>> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
>>> +		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
>>> +	strbuf_addstr(&buf_name, thread_name);
>>> +	if (buf_name.len > TR2_MAX_THREAD_NAME)
>>> +		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
>>> +
>>> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);
>>>     	pthread_setspecific(tr2tls_key, ctx);
>>>   
>>>..
>>> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
>>> index a90bd639d48..d968da6a679 100644
>>> --- a/trace2/tr2_tls.h
>>> +++ b/trace2/tr2_tls.h
>>> @@ -9,7 +9,7 @@
>>>   #define TR2_MAX_THREAD_NAME (24)
>>>     struct tr2tls_thread_ctx {
>>> -	struct strbuf thread_name;
>>> +	char *thread_name;
>>>   	uint64_t *array_us_start;
>>>   	size_t alloc;
>>>   	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
>> Junio's suggestion in the linked E-Mail was to make this a "const
>> char *".
>
> Yes, it was.  To me a "const char *" in a structure means that
> the structure does not own the pointer and must not free it.
> Whereas as "char *" means that the structure might own it and
> should maybe free it when the structure is freed.  My usage here
> is that the structure does own it (because it took it from the
> temporary strbuf using strbuf_detach()) and so it must free it.
> Therefore it should not be "const".  This has nothing to do with
> whether or not we allow the thread name to be changed after the
> fact.  (We don't, but that is a different issue).

We use the pattern of having a "const char *" that's really a "char *"
with a cast to free() in many existing APIs for this scenario.

Maybe the cast for free would be more correct here, see my recent
9081a421a6d (checkout: fix "branch info" memory leaks, 2021-11-16) & the
discussion it referencese. I.e. in that case we didn't go for the
"free((char *)ptr)" cast as it was a private API.

>> Narrowly, I don't see why not just add a "const" to the "struct
>> strbuf
>> *" instead.
>
> Adding "const" to a strbuf would be wrong in this case, since the
> structure owns the strbuf and needs to strbuf_release the contained
> buffer and (now) free the strbuf pointer, right?
>
> This also makes things confusing -- all callers of tr2tls_create_self()
> would now be responsible for allocating a strbuf to pass in -- and who
> would own those.  This would also create opportunities for mistakes if
> they pass in the address of a stack-based strbuf, right?
>
> This is being used to initialize thread-based data, so the caller
> can't just use a "function local static" or a "global static" strbuf.

Right, I meant that in the context of who/where you'd have your casts.

>> But less narrowly if we're not going to change it why malloc a new
>> one
>> at all? Can't we just use the "const char *" passed into
>> tr2tls_create_self(), and for the "th%02d:" case have the code that's
>> formatting it handle that case?
>> I.e. have the things that use it as a "%s" now call a function that
>> formats things as a function of the "ctx->thread_id" (which may be 0)
>> and limit it by TR2_MAX_THREAD_NAME?
>> 
>
> This would be less efficient, right?  That thread name is included in
> *EVERY* _perf and _event message emitted.  If we were to change the
> design to have basically a callback to get the formatted value based
> on the `ctx` or `cts->thread_id` and dynamically formatting the name,
> then we would have to hit that callback once (or twice) for every Trace2
> message, right?  That would be much slower than just having a fixed
> string (formatted when the thread is created) that we can just use.
> And even if we said that the callback could cache the result (like
> we do when we lookup env vars), where would it cache it?  It would have
> to cache it in the `ctx`, which is where it currently is and without
> any of the unnecessary overhead, right?

Aren't we per
https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/
doing a lot of that formatting (and sometimes allocation) anyway in a
way that's easily avoidable for the "perf" backend?

And for tr2_tgt_event.c we'll call jw_object_string(), which calls
append_quoted_string() for each event. That'll be re-quoting (presumably
always needlessly) the thread_name every time.

So just deferring a single strbuf_addf() doesn't seem like it would slow
things down.

> I think you're assuming that callers of `tr2tls_create_self()` always
> pass a literal string such that that string value is always safe to
> reference later.  Nothing would prevent a caller from passing the
> address of a stack buffer.  It is not safe to assume that that string
> pointer will always be valid, such as after the thread exits.  It is
> better for _create_self() to copy the given string (whether we format
> it immediately or not) than to assume that the pointer will always be
> valid, right?

Sure, if that's the API we can xstrdup() it, and/or xstrfmt() it etc. as
we're doing now.

> So I don't think we should deviate from the patch that I submitted.

I'm not saying anything needs to change here, these were really just
read-through suggestion, but I think per the above (about the casts &
optimization) that some of your assumptions here may not hold.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>>  struct tr2tls_thread_ctx {
>> -	struct strbuf thread_name;
>> +	char *thread_name;
>>  	uint64_t *array_us_start;
>>  	size_t alloc;
>>  	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
>
> Junio's suggestion in the linked E-Mail was to make this a "const char *".

Sorry, but in that linked E-Mail, I wasn't picking between "const
char *" and "char *" at all.  What I cared was *not* to keep a
long-term constant string in a member whose type is "struct strbuf".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/20/21 2:35 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler wrote:
> 
>> On 12/20/21 11:31 AM, Ævar Arnfjörð Bjarmason wrote:
>>> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
>>>
>>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>>
>>
>> Yes, it was.  To me a "const char *" in a structure means that
>> the structure does not own the pointer and must not free it.
>> Whereas as "char *" means that the structure might own it and
>> should maybe free it when the structure is freed.  My usage here
>> is that the structure does own it (because it took it from the
>> temporary strbuf using strbuf_detach()) and so it must free it.
>> Therefore it should not be "const".  This has nothing to do with
>> whether or not we allow the thread name to be changed after the
>> fact.  (We don't, but that is a different issue).
> 
> We use the pattern of having a "const char *" that's really a "char *"
> with a cast to free() in many existing APIs for this scenario.


As I mention later in this thread, I'm going to convert the
field into a flex-array, so most of the discussion in this
part of the thread no longer applies.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 20, 2021

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach Trace2 "perf" and "event" formats to handle "timer" events for
> stopwatch timers.  Update API documentation accordingly.
>
> In a future commit, stopwatch timers will be added to the Trace2 API
> and it will emit these "timer" events.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  Documentation/technical/api-trace2.txt | 25 +++++++++++++++-
>  trace2/tr2_tgt.h                       | 25 ++++++++++++++++
>  trace2/tr2_tgt_event.c                 | 40 +++++++++++++++++++++++++-
>  trace2/tr2_tgt_normal.c                |  1 +
>  trace2/tr2_tgt_perf.c                  | 29 +++++++++++++++++++
>  5 files changed, 118 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
> index bb13ca3db8b..e6ed94ba814 100644
> --- a/Documentation/technical/api-trace2.txt
> +++ b/Documentation/technical/api-trace2.txt
> @@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
>  {
>  	"event":"version",
>  	...
> -	"evt":"3",		       # EVENT format version
> +	"evt":"4",		       # EVENT format version
>  	"exe":"2.20.1.155.g426c96fcdb" # git version
>  }

FWIW this seems like a time not to bump the version per the proposed
approach in:
https://lore.kernel.org/git/211201.86zgpk9u3t.gmgdl@evledraar.gmail.com/

Not directly related to this series, which just preserves the status
quo, but it would be nice to get feedback on that proposal from you.

> [...]
> + * Verison 1: original version

A typo of "Version".

> + * Version 2: added "too_many_files" event
> + * Version 3: added "child_ready" event
> + * Version 4: added "timer" event
>   */
> -#define TR2_EVENT_VERSION "3"
> +#define TR2_EVENT_VERSION "4"
>  
>  /*
>   * Region nesting limit for messages written to the event target.
> @@ -615,6 +620,38 @@ static void fn_data_json_fl(const char *file, int line,
>  	}
>  }
>  
> +static void fn_timer(uint64_t us_elapsed_absolute,
> +		     const char *thread_name,
> +		     const char *category,
> +		     const char *timer_name,
> +		     uint64_t interval_count,
> +		     uint64_t us_total_time,
> +		     uint64_t us_min_time,
> +		     uint64_t us_max_time)
> +{
> +	const char *event_name = "timer";
> +	struct json_writer jw = JSON_WRITER_INIT;
> +	double t_abs = (double)us_elapsed_absolute / 1000000.0;
> +

nit: Odd placement of \n\n

> +	double t_total = (double)us_total_time / 1000000.0;
> +	double t_min   = (double)us_min_time   / 1000000.0;
> +	double t_max   = (double)us_max_time   / 1000000.0;

Both for this...

> +	jw_object_begin(&jw, 0);
> +	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
> +	jw_object_double(&jw, "t_abs", 6, t_abs);
> +	jw_object_string(&jw, "name", timer_name);
> +	jw_object_intmax(&jw, "count", interval_count);
> +	jw_object_double(&jw, "t_total", 6, t_total);
> +	jw_object_double(&jw, "t_min", 6, t_min);
> +	jw_object_double(&jw, "t_max", 6, t_max);

[...]

> +static void fn_timer(uint64_t us_elapsed_absolute,
> +		     const char *thread_name,
> +		     const char *category,
> +		     const char *timer_name,
> +		     uint64_t interval_count,
> +		     uint64_t us_total_time,
> +		     uint64_t us_min_time,
> +		     uint64_t us_max_time)
> +{
> +	const char *event_name = "timer";
> +	struct strbuf buf_payload = STRBUF_INIT;
> +
> +	double t_total = (double)us_total_time / 1000000.0;
> +	double t_min   = (double)us_min_time   / 1000000.0;
> +	double t_max   = (double)us_max_time   / 1000000.0;
> +
> +	strbuf_addf(&buf_payload, "name:%s", timer_name);
> +	strbuf_addf(&buf_payload, " count:%"PRIu64, interval_count);
> +	strbuf_addf(&buf_payload, " total:%9.6f", t_total);
> +	strbuf_addf(&buf_payload, " min:%9.6f", t_min);
> +	strbuf_addf(&buf_payload, " max:%9.6f", t_max);

...and this, wouldn't it be better/more readable to retain the uint64_t
for the math, and just cast if needed when we're doing the formatting?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/20/21 11:39 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach Trace2 "perf" and "event" formats to handle "timer" events for
>> stopwatch timers.  Update API documentation accordingly.
>>
>> In a future commit, stopwatch timers will be added to the Trace2 API
>> and it will emit these "timer" events.
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
 >>...
>> diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
>> index bb13ca3db8b..e6ed94ba814 100644
>> --- a/Documentation/technical/api-trace2.txt
>> +++ b/Documentation/technical/api-trace2.txt
>> @@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
>>   {
>>   	"event":"version",
>>   	...
>> -	"evt":"3",		       # EVENT format version
>> +	"evt":"4",		       # EVENT format version
>>   	"exe":"2.20.1.155.g426c96fcdb" # git version
>>   }
> 
> FWIW this seems like a time not to bump the version per the proposed
> approach in:
> https://lore.kernel.org/git/211201.86zgpk9u3t.gmgdl@evledraar.gmail.com/
> 
> Not directly related to this series, which just preserves the status
> quo, but it would be nice to get feedback on that proposal from you.

Frankly, my eyes glazed over every time I tried to read it....

Your proposal looks fine.  And yes, our assumptions are that because
we have structured data, new event types and/or new fields can be
added and safely ignored by JSON parsers, so we should be OK.
So we're assuming that only if we drop events or fields or change
the meaning of one of them, would a parser need to react and so
we can limit version bumps to those instances.

I'm OK with this.

I'll let you draft the wording in api-trace2.txt to explain the
how/when/why we want to update the version number in the future.
Thanks.


> 
>> [...]
>> + * Verison 1: original version
> 
> A typo of "Version".
> 
>> + * Version 2: added "too_many_files" event
>> + * Version 3: added "child_ready" event
>> + * Version 4: added "timer" event
>>    */
>> -#define TR2_EVENT_VERSION "3"
>> +#define TR2_EVENT_VERSION "4"
>>   

I'll roll this back in my next version.

>>   /*
>>    * Region nesting limit for messages written to the event target.
>> @@ -615,6 +620,38 @@ static void fn_data_json_fl(const char *file, int line,
>>   	}
>>   }
>>   
>> +static void fn_timer(uint64_t us_elapsed_absolute,
>> +		     const char *thread_name,
>> +		     const char *category,
>> +		     const char *timer_name,
>> +		     uint64_t interval_count,
>> +		     uint64_t us_total_time,
>> +		     uint64_t us_min_time,
>> +		     uint64_t us_max_time)
>> +{
>> +	const char *event_name = "timer";
>> +	struct json_writer jw = JSON_WRITER_INIT;
>> +	double t_abs = (double)us_elapsed_absolute / 1000000.0;
>> +
> 
> nit: Odd placement of \n\n
> 
>> +	double t_total = (double)us_total_time / 1000000.0;
>> +	double t_min   = (double)us_min_time   / 1000000.0;
>> +	double t_max   = (double)us_max_time   / 1000000.0;
> 
> Both for this...
> 
>> +	jw_object_begin(&jw, 0);
>> +	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
>> +	jw_object_double(&jw, "t_abs", 6, t_abs);
>> +	jw_object_string(&jw, "name", timer_name);
>> +	jw_object_intmax(&jw, "count", interval_count);
>> +	jw_object_double(&jw, "t_total", 6, t_total);
>> +	jw_object_double(&jw, "t_min", 6, t_min);
>> +	jw_object_double(&jw, "t_max", 6, t_max);
> 
> [...]
> 
>> +static void fn_timer(uint64_t us_elapsed_absolute,
>> +		     const char *thread_name,
>> +		     const char *category,
>> +		     const char *timer_name,
>> +		     uint64_t interval_count,
>> +		     uint64_t us_total_time,
>> +		     uint64_t us_min_time,
>> +		     uint64_t us_max_time)
>> +{
>> +	const char *event_name = "timer";
>> +	struct strbuf buf_payload = STRBUF_INIT;
>> +
>> +	double t_total = (double)us_total_time / 1000000.0;
>> +	double t_min   = (double)us_min_time   / 1000000.0;
>> +	double t_max   = (double)us_max_time   / 1000000.0;
>> +
>> +	strbuf_addf(&buf_payload, "name:%s", timer_name);
>> +	strbuf_addf(&buf_payload, " count:%"PRIu64, interval_count);
>> +	strbuf_addf(&buf_payload, " total:%9.6f", t_total);
>> +	strbuf_addf(&buf_payload, " min:%9.6f", t_min);
>> +	strbuf_addf(&buf_payload, " max:%9.6f", t_max);
> 
> ....and this, wouldn't it be better/more readable to retain the uint64_t
> for the math, and just cast if needed when we're doing the formatting?
> 

I had those expressions inline at first and it really junked up the
lines and made things hard to read -- partially because of the need
to wrap the lines a lot.  I went with the local t_* temp vars to make
it more clear what we were doing.  This style also matched the existing
code in _tgt_event.c for `t_abs` and `t_rel` in all of the fn_*.

Jeff

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> [...]
> +static void tr2main_emit_summary_timers(uint64_t us_elapsed_absolute)
> +{
> +	struct tr2_tgt *tgt_j;
> +	int j;
> +	struct tr2tmr_block merged;
> +
> +	memset(&merged, 0, sizeof(merged));

Nit: just do a " = { 0 }" assignment above instead.

> +	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
> +		BUG("invalid timer id: %d", tid);
> +
> +	tr2tmr_start(tid);
> +}
> +
> +void trace2_timer_stop(enum trace2_timer_id tid)
> +{
> +	if (!trace2_enabled)
> +		return;
> +
> +	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
> +		BUG("invalid timer id: %d", tid);

nit / style: maybe assert() instead for cases where assert() produces
better info than BUG(). I.e. it would quote the whole expression, and
show you what condition it violated....

> +void tr2tmr_stop(enum trace2_timer_id tid)
> +{
> +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
> +	struct tr2tmr_timer *t = &ctx->timers.timer[tid];
> +	uint64_t us_now;
> +	uint64_t us_interval;
> +
> +	assert(t->recursion_count > 0);

...as you opted to do here.

> +
> +	t->recursion_count--;
> +	if (t->recursion_count > 0)
> +		return; /* still in recursive call */
> +
> +	us_now = getnanotime() / 1000;
> +	us_interval = us_now - t->start_us;
> +
> +	t->total_us += us_interval;
> +
> +	if (!t->interval_count) {
> +		t->min_us = us_interval;
> +		t->max_us = us_interval;
> +	} else {
> +		if (us_interval < t->min_us)
> +			t->min_us = us_interval;
> +		if (us_interval > t->max_us)
> +			t->max_us = us_interval;
> +	}

Perhaps more readable/easily understood as just a (untested):

    if (!t->interval_count || us_interval >= t->min_us)
	    t->min_us = us_interval;
    if (!t->interval_count || us_interval >= t->max_us)
	    t->max_us = us_interval;

I.e. to avoid duplicating the identical assignment...

> [...]
> +		if (!t->interval_count)
> +			continue; /* this timer was not used by this thread. */
> +
> +		t_merged->total_us += t->total_us;
> +
> +		if (!t_merged->interval_count) {
> +			t_merged->min_us = t->min_us;
> +			t_merged->max_us = t->max_us;
> +		} else {
> +			if (t->min_us < t_merged->min_us)
> +				t_merged->min_us = t->min_us;
> +			if (t->max_us > t_merged->max_us)
> +				t_merged->max_us = t->max_us;
> +		}

...ditto, maybe since it's used at least twice factor it out to some
trivial "static" helper here (maybe not worth it..)>

> +	/*
> +	 * Number of nested starts on the stack in this thread.  (We
> +	 * ignore recursive starts and use this to track the recursive
> +	 * calls.)
> +	 */
> +	unsigned int recursion_count;

Earlier we have various forms of:

    if (t->recursion_count > 1)

But since it's unsigned can we just make those a:

    if (t->recursion_count)

?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/20/21 11:42 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>> [...]
>> +
>> +void trace2_timer_stop(enum trace2_timer_id tid)
>> +{
>> +	if (!trace2_enabled)
>> +		return;
>> +
>> +	if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS)
>> +		BUG("invalid timer id: %d", tid);
> 
> nit / style: maybe assert() instead for cases where assert() produces
> better info than BUG(). I.e. it would quote the whole expression, and
> show you what condition it violated....

I'd rather leave it a BUG() so that we always get the
guard code.  assert() goes away in non-debug builds and
a little while later "tid" will be used as a subscript.

I'll add the function name to the BUG message to make
it a little clearer.


[...]
> 
> Perhaps more readable/easily understood as just a (untested):
> 
>      if (!t->interval_count || us_interval >= t->min_us)
> 	    t->min_us = us_interval;
>      if (!t->interval_count || us_interval >= t->max_us)
> 	    t->max_us = us_interval;
> 
> I.e. to avoid duplicating the identical assignment...
[...]

I'll look at something here to make this a little less
messy.  Probably add a MIN() and MAX() to the mixture.

> 
>> +	/*
>> +	 * Number of nested starts on the stack in this thread.  (We
>> +	 * ignore recursive starts and use this to track the recursive
>> +	 * calls.)
>> +	 */
>> +	unsigned int recursion_count;
> 
> Earlier we have various forms of:
> 
>      if (t->recursion_count > 1)
> 
> But since it's unsigned can we just make those a:
> 
>      if (t->recursion_count)
> 

The places that are > 0, yes.  But the > 1 instances
are different since we're counting how many calls are
on the stack and want to handle recursive calls differently
than the first.

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> [...]
> diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
> index 9b3905b920c..ca36d44dfd7 100644
> --- a/trace2/tr2_tgt_event.c
> +++ b/trace2/tr2_tgt_event.c
> @@ -23,7 +23,7 @@ static struct tr2_dst tr2dst_event = { TR2_SYSENV_EVENT, 0, 0, 0, 0 };
>   * Verison 1: original version
>   * Version 2: added "too_many_files" event
>   * Version 3: added "child_ready" event
> - * Version 4: added "timer" event
> + * Version 4: added "timer" and "counter" events
>   */
>  #define TR2_EVENT_VERSION "4"
>  

Nit on series structure: Earlier we bumped the version to 4, but here
we're changing existing version 4 behavior. Would be better IMO just
bump it at the end (if at all needed, per:
https://lore.kernel.org/git/211201.86zgpk9u3t.gmgdl@evledraar.gmail.com/)

> +static void fn_counter(uint64_t us_elapsed_absolute,
> +		       const char *thread_name,
> +		       const char *category,
> +		       const char *counter_name,
> +		       uint64_t value)
> +{
> +	const char *event_name = "counter";
> +	struct strbuf buf_payload = STRBUF_INIT;
> +
> +	strbuf_addf(&buf_payload, "name:%s", counter_name);
> +	strbuf_addf(&buf_payload, " value:%"PRIu64, value);

Odd to have these be two seperate strbuf_addf()...
> +
> +	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
> +			 &us_elapsed_absolute, NULL,
> +			 category, &buf_payload, thread_name);
> +	strbuf_release(&buf_payload);

...but more generally, and I see from e.g. the existing fn_version_fl
that you're just using existing patterns, but it seems odd not to have a
trivial varargs fmt helper for perf_io_write_fl that would avoid the
whole strbuf/addf/release dance.

I did a quick experiment to do that, patch on "master" below. A lot of
the boilerplate could be simplified by factoring out the
sq_quote_buf_pretty() case, and even this approach (re)allocs in a way
that looks avoidable in many cases if perf_fmt_prepare() were improved
(but it looks like it nedes its if/while loops in some cases still):

diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index 2ff9cf70835..bcbb0d8a250 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -153,16 +153,33 @@ static void perf_io_write_fl(const char *file, int line, const char *event_name,
 	strbuf_release(&buf_line);
 }
 
+__attribute__((format (printf, 8, 9)))
+static void perf_io_write_fl_fmt(const char *file, int line, const char *event_name,
+				 const struct repository *repo,
+				 uint64_t *p_us_elapsed_absolute,
+				 uint64_t *p_us_elapsed_relative,
+				 const char *category,
+				 const char *fmt, ...)
+{
+	va_list ap;
+	struct strbuf sb = STRBUF_INIT;
+
+	va_start(ap, fmt);
+	strbuf_vaddf(&sb, fmt, ap);
+	va_end(ap);
+
+	perf_io_write_fl(file, line, event_name, repo, p_us_elapsed_absolute,
+			 p_us_elapsed_relative, category, &sb);
+
+	strbuf_release(&sb);
+}
+
 static void fn_version_fl(const char *file, int line)
 {
 	const char *event_name = "version";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addstr(&buf_payload, git_version_string);
 
-	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL,
+			     "%s", git_version_string);
 }
 
 static void fn_start_fl(const char *file, int line,
@@ -182,37 +199,25 @@ static void fn_exit_fl(const char *file, int line, uint64_t us_elapsed_absolute,
 		       int code)
 {
 	const char *event_name = "exit";
-	struct strbuf buf_payload = STRBUF_INIT;
 
-	strbuf_addf(&buf_payload, "code:%d", code);
-
-	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute,
+			     NULL, NULL, "code:%d", code);
 }
 
 static void fn_signal(uint64_t us_elapsed_absolute, int signo)
 {
 	const char *event_name = "signal";
-	struct strbuf buf_payload = STRBUF_INIT;
 
-	strbuf_addf(&buf_payload, "signo:%d", signo);
-
-	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
-			 &us_elapsed_absolute, NULL, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(__FILE__, __LINE__, event_name, NULL,
+			 &us_elapsed_absolute, NULL, NULL, "signo:%d", signo);
 }
 
 static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 {
 	const char *event_name = "atexit";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addf(&buf_payload, "code:%d", code);
 
-	perf_io_write_fl(__FILE__, __LINE__, event_name, NULL,
-			 &us_elapsed_absolute, NULL, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(__FILE__, __LINE__, event_name, NULL,
+			 &us_elapsed_absolute, NULL, NULL, "code:%d", code);
 }
 
 static void maybe_append_string_va(struct strbuf *buf, const char *fmt,
@@ -244,13 +249,9 @@ static void fn_error_va_fl(const char *file, int line, const char *fmt,
 static void fn_command_path_fl(const char *file, int line, const char *pathname)
 {
 	const char *event_name = "cmd_path";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addstr(&buf_payload, pathname);
 
-	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL,
+			     "%s", pathname);
 }
 
 static void fn_command_ancestry_fl(const char *file, int line, const char **parent_names)
@@ -286,13 +287,9 @@ static void fn_command_name_fl(const char *file, int line, const char *name,
 static void fn_command_mode_fl(const char *file, int line, const char *mode)
 {
 	const char *event_name = "cmd_mode";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addstr(&buf_payload, mode);
 
-	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL,
+			     "%s", mode);
 }
 
 static void fn_alias_fl(const char *file, int line, const char *alias,
@@ -351,13 +348,10 @@ static void fn_child_exit_fl(const char *file, int line,
 			     int code, uint64_t us_elapsed_child)
 {
 	const char *event_name = "child_exit";
-	struct strbuf buf_payload = STRBUF_INIT;
 
-	strbuf_addf(&buf_payload, "[ch%d] pid:%d code:%d", cid, pid, code);
-
-	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_child, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute,
+			     &us_elapsed_child, NULL, "[ch%d] pid:%d code:%d",
+			     cid, pid, code);
 }
 
 static void fn_child_ready_fl(const char *file, int line,
@@ -365,24 +359,19 @@ static void fn_child_ready_fl(const char *file, int line,
 			      const char *ready, uint64_t us_elapsed_child)
 {
 	const char *event_name = "child_ready";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addf(&buf_payload, "[ch%d] pid:%d ready:%s", cid, pid, ready);
 
-	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_child, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute,
+			     &us_elapsed_child, NULL,
+			     "[ch%d] pid:%d ready:%s", cid, pid, ready);
 }
 
 static void fn_thread_start_fl(const char *file, int line,
 			       uint64_t us_elapsed_absolute)
 {
 	const char *event_name = "thread_start";
-	struct strbuf buf_payload = STRBUF_INIT;
 
-	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 NULL, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute,
+			     NULL, NULL, "%s", ""); /* TODO: No payload, support NULL? */
 }
 
 static void fn_thread_exit_fl(const char *file, int line,
@@ -390,11 +379,9 @@ static void fn_thread_exit_fl(const char *file, int line,
 			      uint64_t us_elapsed_thread)
 {
 	const char *event_name = "thread_exit";
-	struct strbuf buf_payload = STRBUF_INIT;
 
-	perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute,
-			 &us_elapsed_thread, NULL, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, &us_elapsed_absolute,
+			     &us_elapsed_thread, NULL, "%s", ""); /* TODO: No payload, support NULL ? */
 }
 
 static void fn_exec_fl(const char *file, int line, uint64_t us_elapsed_absolute,
@@ -438,13 +425,9 @@ static void fn_param_fl(const char *file, int line, const char *param,
 			const char *value)
 {
 	const char *event_name = "def_param";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addf(&buf_payload, "%s:%s", param, value);
 
-	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
-			 &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL,
+			     "%s:%s", param, value);
 }
 
 static void fn_repo_fl(const char *file, int line,
@@ -525,13 +508,10 @@ static void fn_data_json_fl(const char *file, int line,
 			    const struct json_writer *value)
 {
 	const char *event_name = "data_json";
-	struct strbuf buf_payload = STRBUF_INIT;
-
-	strbuf_addf(&buf_payload, "%s:%s", key, value->json.buf);
 
-	perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute,
-			 &us_elapsed_region, category, &buf_payload);
-	strbuf_release(&buf_payload);
+	perf_io_write_fl_fmt(file, line, event_name, repo, &us_elapsed_absolute,
+			     &us_elapsed_region, category,
+			     "%s:%s", key, value->json.buf);
 }
 
 static void fn_printf_va_fl(const char *file, int line,

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/20/21 11:51 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>> [...]
> 
>> +static void fn_counter(uint64_t us_elapsed_absolute,
>> +		       const char *thread_name,
>> +		       const char *category,
>> +		       const char *counter_name,
>> +		       uint64_t value)
>> +{
>> +	const char *event_name = "counter";
>> +	struct strbuf buf_payload = STRBUF_INIT;
>> +
>> +	strbuf_addf(&buf_payload, "name:%s", counter_name);
>> +	strbuf_addf(&buf_payload, " value:%"PRIu64, value);
> 
> Odd to have these be two seperate strbuf_addf()...

yeah, i'll combine.  and in the body of fn_timer in 6/9.


> ....but more generally, and I see from e.g. the existing fn_version_fl
> that you're just using existing patterns, but it seems odd not to have a
> trivial varargs fmt helper for perf_io_write_fl that would avoid the
> whole strbuf/addf/release dance.
[...]

yeah, cut-n-paste was used here and i was maintaining
consistency with the other functions -- rather than inventing
something new and refactoring stuff that didn't need be refactored
in the middle of an on-going patch series.


> I did a quick experiment to do that, patch on "master" below. A lot of
> the boilerplate could be simplified by factoring out the
> sq_quote_buf_pretty() case, and even this approach (re)allocs in a way
> that looks avoidable in many cases if perf_fmt_prepare() were improved
> (but it looks like it nedes its if/while loops in some cases still):
> 
[...]
>   
> +__attribute__((format (printf, 8, 9)))
> +static void perf_io_write_fl_fmt(const char *file, int line, const char *event_name,
> +				 const struct repository *repo,
> +				 uint64_t *p_us_elapsed_absolute,
> +				 uint64_t *p_us_elapsed_relative,
> +				 const char *category,
> +				 const char *fmt, ...)
> +{
> +	va_list ap;
> +	struct strbuf sb = STRBUF_INIT;
> +
> +	va_start(ap, fmt);
> +	strbuf_vaddf(&sb, fmt, ap);
> +	va_end(ap);
> +
> +	perf_io_write_fl(file, line, event_name, repo, p_us_elapsed_absolute,
> +			 p_us_elapsed_relative, category, &sb);
> +
> +	strbuf_release(&sb);
> +}
> +
>   static void fn_version_fl(const char *file, int line)
>   {
>   	const char *event_name = "version";
> -	struct strbuf buf_payload = STRBUF_INIT;
> -
> -	strbuf_addstr(&buf_payload, git_version_string);
>   
> -	perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
> -			 &buf_payload);
> -	strbuf_release(&buf_payload);
> +	perf_io_write_fl_fmt(file, line, event_name, NULL, NULL, NULL, NULL,
> +			     "%s", git_version_string);
>   }
[...]

Yes, it might be nice to have a _fmt() version as you suggest
and simplify many of the existing fn_*() function bodies.

It seems like I keep saying this today, but can we discuss that
in a new top-level topic and not down inside commit 8/9 of this
series?

Thanks,
Jeff

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> [...]
> +struct ut_011_data {
> +	int v1, v2;
> +};
> [...]
> +	struct ut_011_data data = { 0, 0 };

Nit: Just "{ 0 }" is OK for zero'd out initialization. No need to keep
extending this for every field.

For things that want to exhaustively list fields for clarity, designated
would be preferred:
o
    { .v1 = 0, .v2 = 0 }

> +	int nr_threads = 0;
> +	int k;
> +	pthread_t *pids = NULL;
> +
> +	if (argc != 3)
> +		die("%s", usage_error);
> +	if (get_i(&data.v1, argv[0]))
> +		die("%s", usage_error);
> +	if (get_i(&data.v2, argv[1]))
> +		die("%s", usage_error);
> +	if (get_i(&nr_threads, argv[2]))
> +		die("%s", usage_error);

A partial nit on existing code, as this just extends the pattern, but
couldn't much of this get_i() etc. just be made redundant by simply
using the parse-options.c API here?  I.e. OPTION_INTEGER and using named
arguments would do the validation or you.

> +# Exercise the global counter in a loop and confirm that we get the
> +# expected sum in an event record.
> +#
> +
> +have_counter_event () {
> +	thread=$1
> +	name=$2
> +	value=$3
> +	file=$4
> +
> +	grep "\"event\":\"counter\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"value\":${value}" $file
> +
> +	return $?
> +}

It looks like there's no helper, but this is the Nth thing I see where
wish our "test_region" helper were just a bit more generalized. I.e.:

    test_trace2 --match=counter --match=thread=$thread --match=name=$name --match=value=$value <trace>

With test_region just being a wrapper for something like:

    test_trace2 --match=region_enter --match=category=$category --match=label=$label <trace> &&
    test_trace2 --match=region_leave --match=category=$category --match=label=$label <trace>

> +static void tr2main_emit_summary_counters(uint64_t us_elapsed_absolute)
> +{
> +	struct tr2_tgt *tgt_j;
> +	int j;
> +	struct tr2ctr_block merged;
> +
> +	memset(&merged, 0, sizeof(merged));

nit: more memset v.s. "{ 0 }".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/20/21 12:14 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>> [...]

>> +	int nr_threads = 0;
>> +	int k;
>> +	pthread_t *pids = NULL;
>> +
>> +	if (argc != 3)
>> +		die("%s", usage_error);
>> +	if (get_i(&data.v1, argv[0]))
>> +		die("%s", usage_error);
>> +	if (get_i(&data.v2, argv[1]))
>> +		die("%s", usage_error);
>> +	if (get_i(&nr_threads, argv[2]))
>> +		die("%s", usage_error);
> 
> A partial nit on existing code, as this just extends the pattern, but
> couldn't much of this get_i() etc. just be made redundant by simply
> using the parse-options.c API here?  I.e. OPTION_INTEGER and using named
> arguments would do the validation or you.

I suppose.  It just seemed like a little overkill setting things
up for such a simple and isolated test.  And the cut-n-paste was
quick enough for my purposes.

> 
>> +# Exercise the global counter in a loop and confirm that we get the
>> +# expected sum in an event record.
>> +#
>> +
>> +have_counter_event () {
>> +	thread=$1
>> +	name=$2
>> +	value=$3
>> +	file=$4
>> +
>> +	grep "\"event\":\"counter\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"value\":${value}" $file
>> +
>> +	return $?
>> +}
> 
> It looks like there's no helper, but this is the Nth thing I see where
> wish our "test_region" helper were just a bit more generalized. I.e.:
> 
>      test_trace2 --match=counter --match=thread=$thread --match=name=$name --match=value=$value <trace>
> 
> With test_region just being a wrapper for something like:
> 
>      test_trace2 --match=region_enter --match=category=$category --match=label=$label <trace> &&
>      test_trace2 --match=region_leave --match=category=$category --match=label=$label <trace>

Yes, that would be nice.  But I don't think we should
start that in the middle of this patch series.  Perhaps
you could start a top-level message with a fleshed out
proposal and let everyone discuss it.

> 
>> +static void tr2main_emit_summary_counters(uint64_t us_elapsed_absolute)
>> +{
>> +	struct tr2_tgt *tgt_j;
>> +	int j;
>> +	struct tr2ctr_block merged;
>> +
>> +	memset(&merged, 0, sizeof(merged));
> 
> nit: more memset v.s. "{ 0 }".

Yeah, but lldb wouldn't stop complaining until it was "= { { { 0 } } }"

Jeff

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 20, 2021

User Jeff Hostetler <git@jeffhostetler.com> has been added to the cc: list.

@@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
> The thread name is set when the thread is created and should not be
> be modified afterwards.  Replace the strbuf with an allocated pointer
> to make that more clear.

Sounds good.  Use of strbuf is perfectly fine while you compute the
final value of the string, but as a more permanent location to store
the result, it often is unsuitable (and strbuf_split_buf() is a prime
example of how *not* to design your API function around the type).

> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
> index 7da94aba522..cd8b9f2f0a0 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  					     uint64_t us_thread_start)
>  {
>  	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
> +	struct strbuf buf_name = STRBUF_INIT;
>  
>  	/*
>  	 * Implicitly "tr2tls_push_self()" to capture the thread's start
> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  
>  	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>  
> -	strbuf_init(&ctx->thread_name, 0);
>  	if (ctx->thread_id)
> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
> -	strbuf_addstr(&ctx->thread_name, thread_name);
> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
> +		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
> +	strbuf_addstr(&buf_name, thread_name);
> +	if (buf_name.len > TR2_MAX_THREAD_NAME)
> +		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
> +
> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);

This is not exactly a new problem, but if we use a mechanism to
allow arbitrary long string (like composing with strbuf and
detaching the resulting string as is), instead of having a fixed
name[] array embedded in the ctx structure, I wonder if applying the
maximum length this early makes sense.  Such a truncation would
allow more than one ctx structures to share the same name, which
somehow feels error prone, inviting a mistake to use .thread_name
member as an identifier, when its only intended use is to give a
human-readable and not necessarily unique label.  Of course, if the
maximum is reasonably low, like a few dozen bytes, it may even make
sense to embed an array of the fixed size and not worry about an
extra pointer.

> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index a90bd639d48..d968da6a679 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -9,7 +9,7 @@
>  #define TR2_MAX_THREAD_NAME (24)
>  
>  struct tr2tls_thread_ctx {
> -	struct strbuf thread_name;
> +	char *thread_name;

That is, something like

	char thread_name[TR2_MAX_THREAD_NAME + 1];

perhaps with moving it to the end of the struct to avoid padding
waste, would make more sense than the posted patch, if we accept
an early truncation and information loss.

The other extreme would also make equally more sense than the posted
patch.  Just grab strbuf_detach() result without truncation and
point at it with "char *thread_name" here, and if the output layer
wants to limit the names to some reasonable length, deal with the
TR2_MAX_THREAD_NAME at that layer, without losing information too
early.  It might be a much bigger surgery, I am afraid, because the
users of ctx->thread_name (and old ctx->thread_name.buf) all are
relying on the string being shorter than TR2_MAX_THREAD_NAME.

>  	uint64_t *array_us_start;
>  	size_t alloc;
>  	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/21/21 2:22 AM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Use a 'char *' to hold the thread name rather than a 'struct strbuf'.
>> The thread name is set when the thread is created and should not be
>> be modified afterwards.  Replace the strbuf with an allocated pointer
>> to make that more clear.
> 
> Sounds good.  Use of strbuf is perfectly fine while you compute the
> final value of the string, but as a more permanent location to store
> the result, it often is unsuitable (and strbuf_split_buf() is a prime
> example of how *not* to design your API function around the type).
> 
>> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
>> index 7da94aba522..cd8b9f2f0a0 100644
>> --- a/trace2/tr2_tls.c
>> +++ b/trace2/tr2_tls.c
>> @@ -35,6 +35,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>   					     uint64_t us_thread_start)
>>   {
>>   	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
>> +	struct strbuf buf_name = STRBUF_INIT;
>>   
>>   	/*
>>   	 * Implicitly "tr2tls_push_self()" to capture the thread's start
>> @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>   
>>   	ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>>   
>> -	strbuf_init(&ctx->thread_name, 0);
>>   	if (ctx->thread_id)
>> -		strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id);
>> -	strbuf_addstr(&ctx->thread_name, thread_name);
>> -	if (ctx->thread_name.len > TR2_MAX_THREAD_NAME)
>> -		strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME);
>> +		strbuf_addf(&buf_name, "th%02d:", ctx->thread_id);
>> +	strbuf_addstr(&buf_name, thread_name);
>> +	if (buf_name.len > TR2_MAX_THREAD_NAME)
>> +		strbuf_setlen(&buf_name, TR2_MAX_THREAD_NAME);
>> +
>> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);
> 
> This is not exactly a new problem, but if we use a mechanism to
> allow arbitrary long string (like composing with strbuf and
> detaching the resulting string as is), instead of having a fixed
> name[] array embedded in the ctx structure, I wonder if applying the
> maximum length this early makes sense.  Such a truncation would
> allow more than one ctx structures to share the same name, which
> somehow feels error prone, inviting a mistake to use .thread_name
> member as an identifier, when its only intended use is to give a
> human-readable and not necessarily unique label.  Of course, if the
> maximum is reasonably low, like a few dozen bytes, it may even make
> sense to embed an array of the fixed size and not worry about an
> extra pointer.
> 

I'll convert it to a flex-array at the bottom of the CTX structure
and then defer the truncation to the _perf target (which only does
that to keep the columns lined up).

That will simplify things considerably.

Thanks
Jeff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff Hostetler <git@jeffhostetler.com> writes:

> I'll convert it to a flex-array at the bottom of the CTX structure
> and then defer the truncation to the _perf target (which only does
> that to keep the columns lined up).
>
> That will simplify things considerably.

I am not sure if the complexity of flex-array is worth it.

You have been storing an up-to-24-byte human readable name by
embedding a strbuf that has two size_t plus a pointer (i.e. 24-bytes
even on Windows), and the posted patch changes it to a pointer plus
a on-heap allocation with malloc() overhead.

An embedded fixed-size thread_name[TR2_MAX_THREAD_NAME+1] member
may be the simplest thing to do, I suspect.

@@ -15,7 +15,16 @@ static uint64_t tr2tls_us_start_process;
static pthread_mutex_t tr2tls_mutex;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Defer freeing of the Trace2 thread CTX data until program exit.
> Create a global list of thread CTX data to own the storage.
>
> TLS CTX data is allocated when a thread is created and associated
> with that thread.  Previously, that storage was deleted when the
> thread exited.  Now we simply disassociate the CTX data from the
> thread when it exits and let the global CTX list manage the cleanup.

By the way, TLS CTX sounds embarrassingly close and confusing to
some function that we may find in say openssl or some crypto stuff
X-<.  Was there a strong reason to avoid calling these functions and
types something like tr2_thread_ctx instead of tr2tls_thread_ctx?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/21/21 2:30 AM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Defer freeing of the Trace2 thread CTX data until program exit.
>> Create a global list of thread CTX data to own the storage.
>>
>> TLS CTX data is allocated when a thread is created and associated
>> with that thread.  Previously, that storage was deleted when the
>> thread exited.  Now we simply disassociate the CTX data from the
>> thread when it exits and let the global CTX list manage the cleanup.
> 
> By the way, TLS CTX sounds embarrassingly close and confusing to
> some function that we may find in say openssl or some crypto stuff
> X-<.  Was there a strong reason to avoid calling these functions and
> types something like tr2_thread_ctx instead of tr2tls_thread_ctx?
> 

I hadn't really thought about the term "TLS" in the context
of crypto -- I had "thread local storage" on the brain.  I guess
I've spent too much of my youth using Win32 thread APIs. :-)

Let me take a look at removing those terms.

Jeff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff Hostetler <git@jeffhostetler.com> writes:

> I hadn't really thought about the term "TLS" in the context
> of crypto -- I had "thread local storage" on the brain.  I guess
> I've spent too much of my youth using Win32 thread APIs. :-)
>
> Let me take a look at removing those terms.

Nah, it may be just me.  As long as what TLS stands for is clear in
the context, it is fine.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/22/21 5:56 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> I hadn't really thought about the term "TLS" in the context
>> of crypto -- I had "thread local storage" on the brain.  I guess
>> I've spent too much of my youth using Win32 thread APIs. :-)
>>
>> Let me take a look at removing those terms.
> 
> Nah, it may be just me.  As long as what TLS stands for is clear in
> the context, it is fine.
> 

ok thanks.  i took a quick look at scrubbing the
code of TLS and even though most of the uses are
in private (or protected) tr2_*.[ch] files, it
will be a large churn-type change and i'm not
sure it's worth the effort.

thanks
jeff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Sixt wrote (reply to this):

Am 22.12.21 um 23:56 schrieb Junio C Hamano:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> I hadn't really thought about the term "TLS" in the context
>> of crypto -- I had "thread local storage" on the brain.  I guess
>> I've spent too much of my youth using Win32 thread APIs. :-)
>>
>> Let me take a look at removing those terms.
> 
> Nah, it may be just me.  As long as what TLS stands for is clear in
> the context, it is fine.

No, really, my first reaction was, too: what the hack has crypto to do
with trace2? Are we now sending around trace output by email?

Please use "TLS" next to "CTX" only when it means "Transport Layer
Security".

-- Hannes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Sixt <j6t@kdbg.org> writes:

> Am 22.12.21 um 23:56 schrieb Junio C Hamano:
>> Jeff Hostetler <git@jeffhostetler.com> writes:
>> 
>>> I hadn't really thought about the term "TLS" in the context
>>> of crypto -- I had "thread local storage" on the brain.  I guess
>>> I've spent too much of my youth using Win32 thread APIs. :-)
>>>
>>> Let me take a look at removing those terms.
>> 
>> Nah, it may be just me.  As long as what TLS stands for is clear in
>> the context, it is fine.
>
> No, really, my first reaction was, too: what the hack has crypto to do
> with trace2? Are we now sending around trace output by email?

Ok, then it is not just me ;-)
>
> Please use "TLS" next to "CTX" only when it means "Transport Layer
> Security".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/23/21 1:18 PM, Junio C Hamano wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>> Am 22.12.21 um 23:56 schrieb Junio C Hamano:
>>> Jeff Hostetler <git@jeffhostetler.com> writes:
>>>
>>>> I hadn't really thought about the term "TLS" in the context
>>>> of crypto -- I had "thread local storage" on the brain.  I guess
>>>> I've spent too much of my youth using Win32 thread APIs. :-)
>>>>
>>>> Let me take a look at removing those terms.
>>>
>>> Nah, it may be just me.  As long as what TLS stands for is clear in
>>> the context, it is fine.
>>
>> No, really, my first reaction was, too: what the hack has crypto to do
>> with trace2? Are we now sending around trace output by email?
> 
> Ok, then it is not just me ;-)
>>
>> Please use "TLS" next to "CTX" only when it means "Transport Layer
>> Security".

I'll make a note to go thru and remove/change these
terms in a future series rather than mix it in with
this one.

Jeff

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> +`"timer"`::
> +	This event is generated at the end of the program and contains
> +	statistics for a global stopwatch timer.
> ++
> +------------
> +{
> +	"event":"timer",
> +	...
> +	"name":"test",      # timer name
> +	"count":42,         # number of start+stop intervals
> +	"t_total":1.234,    # sum of all intervals (by thread or globally)
> +	"t_min":0.1,        # shortest interval
> +	"t_max":0.9,        # longest interval

Could you specify the units for these t_* entries? I'm guessing seconds
based on the example, but I've seen similar timers using milliseconds
instead so it's best to be super clear here.

> +/*
> + * Stopwatch timer event.  This function writes the previously accumlated

s/accumlated/accumulated/

> + * stopwatch timer values to the event streams.  Unlike other Trace2 API
> + * events, this is decoupled from the data collection.
> + *
> + * This does not take a (file,line) pair because a timer event reports
> + * the cummulative time spend in the timer over a series of intervals

s/cummulative/cumulative/

> + * -- it does not represent a single usage (like region or data events
> + * do).
> + *
> + * The thread name is optional.  If non-null it will override the
> + * value inherited from the caller's TLS CTX.  This allows data
> + * for global timers to be reported without associating it with a
> + * single thread.
> + */
> +typedef void(tr2_tgt_evt_timer_t)(uint64_t us_elapsed_absolute,
> +				  const char *thread_name,
> +				  const char *category,
> +				  const char *timer_name,
> +				  uint64_t interval_count,
> +				  uint64_t us_total_time,
> +				  uint64_t us_min_time,
> +				  uint64_t us_max_time);

> diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
> index 4ce50944298..9b3905b920c 100644
> --- a/trace2/tr2_tgt_event.c
> +++ b/trace2/tr2_tgt_event.c
...
> +static void fn_timer(uint64_t us_elapsed_absolute,

(I was going to complain about the generic name here, but it's static
to the tr2_tgt_event.c file, so that's fine.)

> +		     const char *thread_name,
> +		     const char *category,
> +		     const char *timer_name,
> +		     uint64_t interval_count,
> +		     uint64_t us_total_time,
> +		     uint64_t us_min_time,
> +		     uint64_t us_max_time)
> +{
> +	const char *event_name = "timer";
> +	struct json_writer jw = JSON_WRITER_INIT;
> +	double t_abs = (double)us_elapsed_absolute / 1000000.0;
> +
> +	double t_total = (double)us_total_time / 1000000.0;
> +	double t_min   = (double)us_min_time   / 1000000.0;
> +	double t_max   = (double)us_max_time   / 1000000.0;

Looks like seconds here. At first glance, I thought this large division
might cause some loss of precision. However, the structure of floating
point numbers means we probably don't lose that much. It might be worth
_considering_ using milliseconds (only divide by 1000.0) but I'm
probably just being paranoid here.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 21, 2021

User Derrick Stolee <stolee@gmail.com> has been added to the cc: list.

@@ -391,7 +391,7 @@ only present on the "start" and "atexit" events.
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Add a stopwatch timer mechanism to Git.

> +static void *ut_009timer_thread_proc(void *_ut_009_data)
> +{
> +	struct ut_009_data *data = _ut_009_data;
> +	int k;
> +
> +	trace2_thread_start("ut_009");
> +
> +	for (k = 0; k < data->count; k++) {
> +		trace2_timer_start(TRACE2_TIMER_ID_TEST2);
> +		sleep_millisec(data->delay);
> +		trace2_timer_stop(TRACE2_TIMER_ID_TEST2);
> +	}
> +
> +	trace2_thread_exit();
> +	return NULL;
> +}
> +
> +

nit: double newline.

> +# Exercise the stopwatch timer "test" in a loop and confirm that it was
> +# we have as many start/stop intervals as expected.  We cannot really test
> +# the (elapsed, min, max) timer values, so we assume they are good.

We can't check their values, but we could check that their labels are
emitted.

> +test_expect_success 'test stopwatch timers - summary only' '
> +	test_when_finished "rm trace.perf actual" &&
> +	test_config_global trace2.perfBrief 1 &&
> +	test_config_global trace2.perfTarget "$(pwd)/trace.perf" &&
> +	test-tool trace2 008timer 5 10 &&
> +	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual &&
> +	grep "d0|summary|timer||_T_ABS_||test|name:test1 count:5" actual

adding something like " total:.* min: .* max:.*" to the end of this
pattern might be good. You could even get really specific about the ".*"
being a floating point number, but I'm not too concerned about that. I
just want to see that these other labels stay consistent in future Git
versions.

> +# Exercise the stopwatch timer "test" in a loop and confirm that it was
> +# we have as many start/stop intervals as expected.  We cannot really test
> +# the (t_timer, t_min, t_max) timer values, so we assume they are good.
Similar, we can do something such as...

> +have_timer_event () {
> +	thread=$1
> +	name=$2
> +	count=$3
> +	file=$4
> +
> +	grep "\"event\":\"timer\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"count\":${count}" $file

Adding more detail to this pattern.

This helper could probably benefit from constructing the regex across
multiple string concatenations, so we can see the different pieces.
Something like

	pattern="\"event\":\"timer\""
	pattern="$pattern.*\"thread\":\"${thread}\""
	pattern="$pattern.*\"name\":\"${name}\""
	pattern="$pattern.*\"count\":\"${count}\""
	pattern="$pattern.*\"t_total\":"
	pattern="$pattern.*\"t_min\":"
	pattern="$pattern.*\"t_max\":"

	grep "$pattern" $file

> +
> +	return $?

If we used && throughout this method, would this return not be
necessary?

> +static void tr2main_emit_summary_timers(uint64_t us_elapsed_absolute)
> +{
> +	struct tr2_tgt *tgt_j;
> +	int j;
> +	struct tr2tmr_block merged;
> +
> +	memset(&merged, 0, sizeof(merged));
> +
> +	/*
> +	 * Sum across all of the per-thread stopwatch timer data into
> +	 * a single composite block of timer values.
> +	 */
> +	tr2tls_aggregate_timer_blocks(&merged);
> +
> +	/*
> +	 * Emit "summary" timer events for each composite timer value
> +	 * that had activity.
> +	 */
> +	for_each_wanted_builtin (j, tgt_j)
> +		if (tgt_j->pfn_timer)
> +			tr2tmr_emit_block(tgt_j->pfn_timer,
> +					  us_elapsed_absolute,
> +					  &merged, "summary");

I'd put braces at the for-loop level, even though this is semantically
correct without them.

> +}
> +
> +static void tr2main_emit_thread_timers(uint64_t us_elapsed_absolute)
> +{
> +	struct tr2_tgt *tgt_j;
> +	int j;
> +
> +	for_each_wanted_builtin (j, tgt_j)
> +		if (tgt_j->pfn_timer)
> +			tr2tls_emit_timer_blocks_by_thread(tgt_j->pfn_timer,
> +							   us_elapsed_absolute);

(same here)

> +/*
> + * Define the set of stopwatch timers.
> + *
> + * We can add more at any time, but they must be defined at compile
> + * time (to avoid the need to dynamically allocate and synchronize
> + * them between different threads).
> + *
> + * These must start at 0 and be contiguous (because we use them
> + * elsewhere as array indexes).

I was worried at first about using an array here, but this is essentially
one chunk of global memory per process that will not be very large, even
if we add a lot of timer IDs here. If we use this API enough that that
memory is a problem, then we can refactor the memory to be a hashmap that
only populates entries for IDs that are used by the process.

> + * Any values added to this enum must also be added to the timer definitions
> + * array.  See `trace2/tr2_tmr.c:tr2tmr_def_block[]`.
> + */
> +enum trace2_timer_id {
> +	/*
> +	 * Define two timers for testing.  See `t/helper/test-trace2.c`.
> +	 * These can be used for ad hoc testing, but should not be used
> +	 * for permanent analysis code.
> +	 */
> +	TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */
> +	TRACE2_TIMER_ID_TEST2,     /* emits summary and thread events */
> +
> +
> +	/* Add additional timer definitions before here. */
> +	TRACE2_NUMBER_OF_TIMERS
> +};

...

> +static struct tr2tmr_def tr2tmr_def_block[TRACE2_NUMBER_OF_TIMERS] = {
> +	[TRACE2_TIMER_ID_TEST1] = { "test", "test1", 0 },
> +	[TRACE2_TIMER_ID_TEST2] = { "test", "test2", 1 },
> +};

Although this will always be populated, so maybe my thoughts about how
to reduce memory load in the hypothetical future are worthless.

> +void tr2tmr_start(enum trace2_timer_id tid)
> +{
> +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
> +	struct tr2tmr_timer *t = &ctx->timers.timer[tid];
> +
> +	t->recursion_count++;
> +	if (t->recursion_count > 1)
> +		return; /* ignore recursive starts */
> +
> +	t->start_us = getnanotime() / 1000;

Using nanotime gives us the best precision available, and dividing
by 1000 will lose some precision. This is likely why we saw some
0.000000 values for t_min in some of your experiments. That should
be rare for real uses of this API (such as wrapping lstat() calls).

But why do we divide by 1000 here at all? 2^63 nanoseconds is
still 292 years, so we don't risk overflow. You specify uint64_t
so this isn't different on 32-bit machines.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/21/21 9:45 AM, Derrick Stolee wrote:
> On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Add a stopwatch timer mechanism to Git.
[...]
> 
>> +# Exercise the stopwatch timer "test" in a loop and confirm that it was
>> +# we have as many start/stop intervals as expected.  We cannot really test
>> +# the (elapsed, min, max) timer values, so we assume they are good.
> 
> We can't check their values, but we could check that their labels are
> emitted.

good point.  i'll add that to the patterns in the grep.


> 
[...]
>> +	grep "\"event\":\"timer\".*\"thread\":\"${thread}\".*\"name\":\"${name}\".*\"count\":${count}" $file
> 
> Adding more detail to this pattern.
> 
> This helper could probably benefit from constructing the regex across
> multiple string concatenations, so we can see the different pieces.
> Something like
> 
> 	pattern="\"event\":\"timer\""
> 	pattern="$pattern.*\"thread\":\"${thread}\""
> 	pattern="$pattern.*\"name\":\"${name}\""
> 	pattern="$pattern.*\"count\":\"${count}\""
> 	pattern="$pattern.*\"t_total\":"
> 	pattern="$pattern.*\"t_min\":"
> 	pattern="$pattern.*\"t_max\":"
> 
> 	grep "$pattern" $file
> 

yeah, that helps a lot.  thanks.


[...]
>> +/*
>> + * Define the set of stopwatch timers.
>> + *
>> + * We can add more at any time, but they must be defined at compile
>> + * time (to avoid the need to dynamically allocate and synchronize
>> + * them between different threads).
>> + *
>> + * These must start at 0 and be contiguous (because we use them
>> + * elsewhere as array indexes).
> 
> I was worried at first about using an array here, but this is essentially
> one chunk of global memory per process that will not be very large, even

s/process/thread/

> if we add a lot of timer IDs here. If we use this API enough that that
> memory is a problem, then we can refactor the memory to be a hashmap that
> only populates entries for IDs that are used by the process.

we're only talking about 48 bytes per timer being added to the thread
context.  and it is allocated, not stack based, so i'm not worried
about it.

and besides, we get constant time lookups when starting/stopping
a timer.  And when we get ready to sum across the thread pool, we
can do it efficiently.

> 
>> + * Any values added to this enum must also be added to the timer definitions
>> + * array.  See `trace2/tr2_tmr.c:tr2tmr_def_block[]`.
>> + */
>> +enum trace2_timer_id {
>> +	/*
>> +	 * Define two timers for testing.  See `t/helper/test-trace2.c`.
>> +	 * These can be used for ad hoc testing, but should not be used
>> +	 * for permanent analysis code.
>> +	 */
>> +	TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */
>> +	TRACE2_TIMER_ID_TEST2,     /* emits summary and thread events */
>> +
>> +
>> +	/* Add additional timer definitions before here. */
>> +	TRACE2_NUMBER_OF_TIMERS
>> +};
> 
> ....
> 
>> +static struct tr2tmr_def tr2tmr_def_block[TRACE2_NUMBER_OF_TIMERS] = {
>> +	[TRACE2_TIMER_ID_TEST1] = { "test", "test1", 0 },
>> +	[TRACE2_TIMER_ID_TEST2] = { "test", "test2", 1 },
>> +};
> 
> Although this will always be populated, so maybe my thoughts about how
> to reduce memory load in the hypothetical future are worthless.

yeah, i think this model works well for us.  and it is lock free.

> 
>> +void tr2tmr_start(enum trace2_timer_id tid)
>> +{
>> +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
>> +	struct tr2tmr_timer *t = &ctx->timers.timer[tid];
>> +
>> +	t->recursion_count++;
>> +	if (t->recursion_count > 1)
>> +		return; /* ignore recursive starts */
>> +
>> +	t->start_us = getnanotime() / 1000;
> 
> Using nanotime gives us the best precision available, and dividing
> by 1000 will lose some precision. This is likely why we saw some
> 0.000000 values for t_min in some of your experiments. That should
> be rare for real uses of this API (such as wrapping lstat() calls).
> 
> But why do we divide by 1000 here at all? 2^63 nanoseconds is
> still 292 years, so we don't risk overflow. You specify uint64_t
> so this isn't different on 32-bit machines.

When I did the original Trace2 parts, I made absolute and relative
elapsed times be in microseconds.  With the overhead of logging
and etc, the lower bits weren't really useful.  And then I converted
those to "%9.6f" in the trace logs, so that we always have "seconds"
in the traces.

I just copied that model when I did timers.  But I could see keeping
nanoseconds around for these timers (since they don't log on every
start/stop, like regions do).

<grin> While drafting this reply I've been fixing up the code in
parallel.  I converted timers to report integer ns values rather
than floats.  And in every line of trace output the total/min/max
timer values all end in 000 -- because (at least on MacOS) getnanotime()
calls gettimeofday() and computes (tv.tv_sec * 1000000000 + tv.tv_usec).
</grin>

So maybe I did the original "%9.6" for other reasons....
I'll try it on another OS later and see if it is useful.


Thanks,
Jeff


@gitgitgadget
Copy link

gitgitgadget bot commented Dec 21, 2021

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> Extend Trace2 to provide multiple "stopwatch timers" and "global counters".

>  3. Rationale
> 
> Timers and counters are an alternative to the existing "region" and "data"
> events. The latter are intended to trace the major flow (or phases) of the
> program and possibly capture the amount of work performed within a loop, for
> example. The former are offered as a way to measure activity that is not
> localized, such as the time spent in zlib or lstat, which may be called from
> many different parts of the program.
> 
> There are currently several places in the Git code where we want to measure
> such activity -- changed-path Bloom filter stats, topo-walk commit counts,
> and tree-walk counts and max-depths. A conversation in [1] suggested that we
> should investigate a more general mechanism to collect stats so that each
> instance doesn't need to recreate their own atexit handling mechanism.
> 
> This is an attempt to address that and let us easily explore other areas in
> the future.
> 
> This patch series does not attempt to refactor those three instances to use
> the new timers and counters. That should be a separate effort -- in part
> because we may want to retool them rather than just translate them. For
> example, rather than just translating the existing four Bloom filter counts
> (in revision.c) into Trace2 counters, we may instead want to have a "happy
> path timer" and a "sad path timer" if that would provide more insight.

I'm excited for these API features. It might be nice to have an RFC-
quality series demonstrating how these examples could work with the
new API. Makes sense to delay in case there were recommended changes
to the API from review in this v1.

I also like your attention to thread contexts. I think these timers
would be very interesting to use in parallel checkout. CC'ing Matheus
for his thoughts on where he would want timer summaries for that
feature. I would probably want the per-thread summary to know if we
are blocked on one really long thread while the others finish quickly.
Within that: what are the things causing us to be slow? Is it zlib?
Is it lstat()?

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 21, 2021

On the Git mailing list, Matheus Tavares wrote (reply to this):

On Tue, Dec 21, 2021 at 11:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> >
> >  3. Rationale
> >
> > Timers and counters are an alternative to the existing "region" and "data"
> > events. The latter are intended to trace the major flow (or phases) of the
> > program and possibly capture the amount of work performed within a loop, for
> > example. The former are offered as a way to measure activity that is not
> > localized, such as the time spent in zlib or lstat, which may be called from
> > many different parts of the program.
>
> I'm excited for these API features.

Me too! This would have been very useful on some experiments I had to
run in the past.

Thanks for working on it, Jeff :)

> I also like your attention to thread contexts. I think these timers
> would be very interesting to use in parallel checkout. CC'ing Matheus
> for his thoughts on where he would want timer summaries for that
> feature.

For parallel checkout, I think it would be interesting to have timer
summaries for open/close, fstat/lstat, write, and
inflation/delta-reconstruction. Perhaps pkt-line routines too, so that
we can see how much time we spend in inter-process communication.

It would be nice to have timer information for disk reading as well
(more on that below), but I don't think it is possible since we read
the objects through mmap() and thus, we cannot easily isolate the
actual reading time from the decompression time :(

> I would probably want the per-thread summary to know if we
> are blocked on one really long thread while the others finish quickly.

That would be interesting. Parallel checkout actually uses
subprocesses, but I can see the per-thread summary being useful on
grep, for example. (Nevertheless, the use case you mentioned for the
timers -- to evaluate the work balance on parallel checkout -- seems
very interesting.)

> Within that: what are the things causing us to be slow? Is it zlib?
> Is it lstat()?

On my tests, the bottleneck on checkout heavily depended on the
underlying storage type. On HDDs, the bottleneck was object reading
(i.e. page faults on mmap()-ed files), with about 70% to 80% of the
checkout runtime.

On SSDs, reading was much faster, so CPU (i.e. inflation) became the
bottleneck, with 50% of the runtime. (Inflation only lost to reading
when checking out from *many* loose objects.)

Finally, on NFS, file creation with open(O_CREAT | O_EXCL) and fstat()
(which makes the NFS client flush previously cached writes to the
server) were the bottlenecks, with about 40% of the total runtime
each.

These numbers come from a (sequential) `git checkout .` execution on
an empty working tree of the Linux kernel (v5.12), and they were
gathered using eBPF-based profilers. For other operations, especially
ones that require many file removals or more laborious tree merging in
unpack_trees(), I suspect the bottlenecks may change.

If anyone would be interested in seeing the flamegraphs and other
plots for these profiling numbers, I have them at:
https://matheustavares.gitlab.io/annexes/parallel-checkout/profiling

And there is a bit more context at:
https://matheustavares.gitlab.io/posts/parallel-checkout

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 21, 2021

User Matheus Tavares <matheus.bernardino@usp.br> has been added to the cc: list.

@jeffhostetler jeffhostetler force-pushed the trace2-stopwatch-v2 branch 3 times, most recently from f5f7086 to ee2dba6 Compare December 22, 2021 22:59
@gitgitgadget
Copy link

gitgitgadget bot commented Dec 23, 2021

User Johannes Sixt <j6t@kdbg.org> has been added to the cc: list.

Move the thread name to a flex array at the bottom of the Trace2
thread local storage data and get rid of the strbuf.

Let the flex array have the full computed value of the thread name
without truncation.

Change the PERF target to truncate the thread name so that the columns
still line up.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Defer freeing of the Trace2 per-thread thread local storage until
program exit.  Create a global list to own them.

Trace2 thread local storage data is allocated when a thread is created
and associated with that thread.  Previously, that storage was deleted
when the thread exited.  Now at thread exit, we simply disassociate
the data from the thread and let the global list manage the cleanup.

This will be used by a later commit when we add "counters" and
stopwatch-style "timers" to the Trace2 API.  We will add those fields
to the thread local storage block and allow each thread to efficiently
(without locks) accumulate counter and timer data.  At program exit,
the main thread will run thru the global list and compute and report
totals before freeing the list.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach the message formatter in the Trace2 event target to take an
optional thread-name argument.  This overrides the thread name
inherited from the thread local storage data.

This will be used in a future commit for global events that should
not be tied to a particular thread, such as a global stopwatch timer
that aggregates data from all threads.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach the message formatter in the Trace2 perf target to accept an
optional thread name argument.  This will override the thread name
inherited from the thread local storage data block.

This will be used in a future commit for global events that should
not be tied to a particular thread, such as a global stopwatch timer.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach Trace2 "perf" and "event" formats to handle "timer" events for
stopwatch timers.  Update API documentation accordingly.

In a future commit, stopwatch timers will be added to the Trace2 API
and it will emit these "timer" events.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add a stopwatch timer mechanism to Git.

Timers are an alternative to regions.  Timers can capture a series of
intervals, such as calls to a library routine or a span of code.  They
are intended for code that is not necessarily associated with a
particular phase of the command.

Timer data is accumulated throughout the command and a timer "summary"
event is logged (one per timer) at program exit.

Optionally, timer data may also be reported by thread for certain
timers.  (See trace2/tr2_tmr.c:tr2_timer_def_block[].)

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach Trace2 "perf" and "event" formats to handle "counter" events
for global counters.  Update the API documentation accordingly.

In a future commit, global counters will be added to the Trace2 API
and it will emit these "counter" events at program exit.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add global counters to Trace2.

Create a mechanism in Trace2 to count an activity and emit a single
"counter" event at the end of the program.  This is an alternative
to the existing "data" events that are emitted immediately.

Create an array of counters (indexed by `enum trace2_counter_id`)
to allow various activites to be tracked as desired.

Preload the array with two counters for testing purposes.

Create unit tests to demonstrate and verify.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
@jeffhostetler
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 28, 2021

Submitted as pull.1099.v2.git.1640720202.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1099/jeffhostetler/trace2-stopwatch-v2-v2

To fetch this version to local tag pr-1099/jeffhostetler/trace2-stopwatch-v2-v2:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1099/jeffhostetler/trace2-stopwatch-v2-v2

@@ -11,8 +11,8 @@
struct tr2tls_thread_ctx {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Use "size_t" rather than "int" for the "alloc" and "nr_open_regions"
> fields in the "tr2tls_thread_ctx".  These are used by ALLOC_GROW().
>
> This was discussed in: https://lore.kernel.org/all/YULF3hoaDxA9ENdO@nand.local/

Let's keep commit messages self-contained when possible. It's fine to
reference on-list discussion (and I often do), but in this case all
that's being referenced just seems to be Taylor saying we might as well
change this while we're at it.

So I'd think a short sentence saying we generally prefer "size_t" for
these these days and we might as well change it here while we're at it
would suffice over the ML link.

> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  trace2/tr2_tls.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index b1e327a928e..a90bd639d48 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -11,8 +11,8 @@
>  struct tr2tls_thread_ctx {
>  	struct strbuf thread_name;
>  	uint64_t *array_us_start;
> -	int alloc;
> -	int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
> +	size_t alloc;
> +	size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */
>  	int thread_id;
>  };

@@ -88,7 +88,7 @@ static void event_fmt_prepare(const char *event_name, const char *file,

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Move the thread name to a flex array at the bottom of the Trace2
> thread local storage data and get rid of the strbuf.
>
> Let the flex array have the full computed value of the thread name
> without truncation.
>
> Change the PERF target to truncate the thread name so that the columns
> still line up.

This commit message really doesn't help in explaining what we're trying
to do here and why it's needed. I'm not saying it's not, but why not a
strbuf, why a flex array? The diff below also shows changes unrelated to
this.

I tried this local fixup on top of this series which works, so I wonder
if we're just trying to get rid of the strbuf to signal that this
shouldn't change why not just strbuf_detach() and keep a "const char
*thread_name"?

diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
index 28ea55863d1..35d49b27b2e 100644
--- a/trace2/tr2_tls.c
+++ b/trace2/tr2_tls.c
@@ -48,7 +48,7 @@ void tr2tls_start_process_clock(void)
 struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 					     uint64_t us_thread_start)
 {
-	struct tr2tls_thread_ctx *ctx;
+	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(struct tr2tls_thread_ctx));
 	struct strbuf buf_name = STRBUF_INIT;
 	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
 
@@ -56,8 +56,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
 		strbuf_addf(&buf_name, "th%02d:", thread_id);
 	strbuf_addstr(&buf_name, thread_name);
 
-	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
-	strbuf_release(&buf_name);
+	ctx->thread_name = strbuf_detach(&buf_name, NULL);
 
 	ctx->thread_id = thread_id;
 
@@ -188,6 +187,7 @@ void tr2tls_release(void)
 	while (ctx) {
 		struct tr2tls_thread_ctx *next = ctx->next_ctx;
 
+		free((char *)ctx->thread_name);
 		free(ctx->array_us_start);
 		free(ctx);
 
diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
index 503829bbd44..bc6c6f12e38 100644
--- a/trace2/tr2_tls.h
+++ b/trace2/tr2_tls.h
@@ -6,6 +6,7 @@
 #include "trace2/tr2_tmr.h"
 
 struct tr2tls_thread_ctx {
+	const char *thread_name;
 	struct tr2tls_thread_ctx *next_ctx;
 	uint64_t *array_us_start;
 	size_t alloc;
@@ -14,8 +15,6 @@ struct tr2tls_thread_ctx {
 
 	struct tr2_timer_block timers;
 	struct tr2_counter_block counters;
-
-	char thread_name[FLEX_ARRAY];
 };
 
 /*

> [...]
> index 7da94aba522..ed99a234b95 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -34,7 +34,18 @@ void tr2tls_start_process_clock(void)
>  struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>  					     uint64_t us_thread_start)
>  {
> -	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
> +	struct tr2tls_thread_ctx *ctx;
> +	struct strbuf buf_name = STRBUF_INIT;
> +	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);

Here's the looks-to-be-unrelated to this strbuf conversion code I
mentioned above.

> +
> +	if (thread_id)
> +		strbuf_addf(&buf_name, "th%02d:", thread_id);
> +	strbuf_addstr(&buf_name, thread_name);
> +
> +	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
> +	strbuf_release(&buf_name);
> +
> +	ctx->thread_id = thread_id;
>  
>  	/*
> [...]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/28/21 8:11 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Move the thread name to a flex array at the bottom of the Trace2
>> thread local storage data and get rid of the strbuf.
>>
>> Let the flex array have the full computed value of the thread name
>> without truncation.
>>
>> Change the PERF target to truncate the thread name so that the columns
>> still line up.
> 
> This commit message really doesn't help in explaining what we're trying
> to do here and why it's needed. I'm not saying it's not, but why not a
> strbuf, why a flex array? The diff below also shows changes unrelated to
> this.
> 
> I tried this local fixup on top of this series which works, so I wonder
> if we're just trying to get rid of the strbuf to signal that this
> shouldn't change why not just strbuf_detach() and keep a "const char
> *thread_name"?
> 
> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
> index 28ea55863d1..35d49b27b2e 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -48,7 +48,7 @@ void tr2tls_start_process_clock(void)
>   struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>   					     uint64_t us_thread_start)
>   {
> -	struct tr2tls_thread_ctx *ctx;
> +	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(struct tr2tls_thread_ctx));
>   	struct strbuf buf_name = STRBUF_INIT;
>   	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
>   
> @@ -56,8 +56,7 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>   		strbuf_addf(&buf_name, "th%02d:", thread_id);
>   	strbuf_addstr(&buf_name, thread_name);
>   
> -	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
> -	strbuf_release(&buf_name);
> +	ctx->thread_name = strbuf_detach(&buf_name, NULL);
>   
>   	ctx->thread_id = thread_id;
>   
> @@ -188,6 +187,7 @@ void tr2tls_release(void)
>   	while (ctx) {
>   		struct tr2tls_thread_ctx *next = ctx->next_ctx;
>   
> +		free((char *)ctx->thread_name);
>   		free(ctx->array_us_start);
>   		free(ctx);
>   
> diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h
> index 503829bbd44..bc6c6f12e38 100644
> --- a/trace2/tr2_tls.h
> +++ b/trace2/tr2_tls.h
> @@ -6,6 +6,7 @@
>   #include "trace2/tr2_tmr.h"
>   
>   struct tr2tls_thread_ctx {
> +	const char *thread_name;
>   	struct tr2tls_thread_ctx *next_ctx;
>   	uint64_t *array_us_start;
>   	size_t alloc;
> @@ -14,8 +15,6 @@ struct tr2tls_thread_ctx {
>   
>   	struct tr2_timer_block timers;
>   	struct tr2_counter_block counters;
> -
> -	char thread_name[FLEX_ARRAY];
>   };
>   
>   /*

I have to admit that I really don't know how to please you.

In V1 I converted the "strbuf" to a "char *" inside the structure
because there was concern that one might assume that the thread
name could be changed after the thread was created.  You complained
that I made it a "char *" rather than a "const char *".  I explained
pointer ownership and you completely ignored that.  You explained
that I should just "cast away the const during the free" because
other places in the code use that "anti-pattern".  You also complained
that I didn't use a callback to get the thread name dynamically rather
than having a string field in the thread's TLS.  I explained that it
was faster to compute it once than to generate it on every logging
call.  You ignored that and hinted that the message formatting in
each of the target destinations would make that cost irrelevant.
I convert the field to a flex-array to avoid all of the allocation and
ownership issues and now you send me a "fixup" patch that undoes
the flex-array change and makes it look mostly like my previous
version -- but WITH the "const" and the "cast" (that I've already
talked about in this paragraph).

So, where does this leave us?  I'm really trying to "assume good
intentions" here, but we've spent way toooooooo long discussing
this thread_name field.  It's starting to feel like you're going
to just keep nagging me about this field until I make it look
exactly like you would have written it.

So, sorry to rant, but I don't know what else to say about this
field.  It is especially troubling that this "issue" has taken
so much time -- time that would be better spent actually looking
at the new timers and counters feature.


> 
>> [...]
>> index 7da94aba522..ed99a234b95 100644
>> --- a/trace2/tr2_tls.c
>> +++ b/trace2/tr2_tls.c
>> @@ -34,7 +34,18 @@ void tr2tls_start_process_clock(void)
>>   struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name,
>>   					     uint64_t us_thread_start)
>>   {
>> -	struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx));
>> +	struct tr2tls_thread_ctx *ctx;
>> +	struct strbuf buf_name = STRBUF_INIT;
>> +	int thread_id = tr2tls_locked_increment(&tr2_next_thread_id);
> 
> Here's the looks-to-be-unrelated to this strbuf conversion code I
> mentioned above.

In the flex-array version, we defer the alloc of "ctx" until
after we have computed the thread name -- we to do that so that
we know the length of the thread name (and thus the size of the
flex-array).  To do that we need to know the thread id that we
will be formatting into the thread name.  And to do that we need
to reserve a thread id -- which is a global and requires a lock.

So the call to tr2tls_locked_increment() (as well as the formatting
of the name itself) was moved up to the top of the function rather
than after the "ctx" was allocated.

> 
>> +
>> +	if (thread_id)
>> +		strbuf_addf(&buf_name, "th%02d:", thread_id);
>> +	strbuf_addstr(&buf_name, thread_name);
>> +
>> +	FLEX_ALLOC_MEM(ctx, thread_name, buf_name.buf, buf_name.len);
>> +	strbuf_release(&buf_name);
>> +
>> +	ctx->thread_id = thread_id;
>>   
>>   	/*
>> [...]

Jeff

@@ -23,6 +23,7 @@ static int tr2env_perf_be_brief;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach the message formatter in the Trace2 perf target to accept an
> optional thread name argument.  This will override the thread name
> inherited from the thread local storage data block.
>
> This will be used in a future commit for global events that should
> not be tied to a particular thread, such as a global stopwatch timer.

We already have a "ctx", and that "ctx" has a "thread_name", but here
and in the preceding commit we're adding a "thread_name" to every caller
of these functions in case we'd like to override it.

Wouldn't it make more sense to just pass a "ctx" to these functions? One
of them already takes it, here's an (obviously incomplete) fixup on top
of your series to make the one that doesn't take a "ctx", and for the
only non-NULL users of "thread_name" to just use a trivial helper to
pass in a "ctx" with a new "thread_name", then to swap it back.

It would make for a smaller diffstat for this already large series, or
we could do exactly what we're doing now, but avoid the churn of
adjusting every caller by introducing a new sister function for those
who want this parameter to be non-NULL.

(The below patch is "broken" in that __FILE__ and __LINE__ need to be
passed in as parameters, but this is just a trivial change for
show/commentary)

diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index b9eb2cdb77a..7aaec83dff7 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -82,16 +82,15 @@ static void fn_term(void)
 static void event_fmt_prepare(const char *event_name, const char *file,
 			      int line, const struct repository *repo,
 			      struct json_writer *jw,
-			      const char *thread_name_override)
+			      struct tr2tls_thread_ctx *ctx)
 {
 	struct tr2_tbuf tb_now;
+	if (!ctx)
+		ctx = tr2tls_get_self();
 
 	jw_object_string(jw, "event", event_name);
 	jw_object_string(jw, "sid", tr2_sid_get());
-	jw_object_string(jw, "thread",
-			 ((thread_name_override && *thread_name_override)
-			  ? thread_name_override
-			  : tr2tls_get_self()->thread_name));
+	jw_object_string(jw, "thread", ctx->thread_name);
 
 	/*
 	 * In brief mode, only emit <time> on these 2 event types.
@@ -111,6 +110,20 @@ static void event_fmt_prepare(const char *event_name, const char *file,
 		jw_object_intmax(jw, "repo", repo->trace2_repo_id);
 }
 
+static void event_fmt_prepare_tn(const char *event_name, const char *file,
+				 int line, const struct repository *repo,
+				 struct json_writer *jw,
+				 const char *thread_name)
+{
+	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
+	const char *tmp;
+
+	tmp = ctx->thread_name;
+	ctx->thread_name = thread_name;
+	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, jw, ctx);
+	ctx->thread_name = tmp;
+}
+
 static void fn_too_many_files_fl(const char *file, int line)
 {
 	const char *event_name = "too_many_files";
@@ -629,7 +642,7 @@ static void fn_timer(uint64_t us_elapsed_absolute,
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
+	event_fmt_prepare_tn(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_string(&jw, "name", timer_name);
 	jw_object_intmax(&jw, "count", interval_count);
@@ -654,7 +667,7 @@ static void fn_counter(uint64_t us_elapsed_absolute,
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
 
 	jw_object_begin(&jw, 0);
-	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
+	event_fmt_prepare_tn(event_name, __FILE__, __LINE__, NULL, &jw, thread_name);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
 	jw_object_string(&jw, "name", counter_name);
 	jw_object_intmax(&jw, "value", value);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/28/21 8:48 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach the message formatter in the Trace2 perf target to accept an
>> optional thread name argument.  This will override the thread name
>> inherited from the thread local storage data block.
>>
>> This will be used in a future commit for global events that should
>> not be tied to a particular thread, such as a global stopwatch timer.
> 
> We already have a "ctx", and that "ctx" has a "thread_name", but here
> and in the preceding commit we're adding a "thread_name" to every caller
> of these functions in case we'd like to override it.
> 
> Wouldn't it make more sense to just pass a "ctx" to these functions? One
> of them already takes it, here's an (obviously incomplete) fixup on top
> of your series to make the one that doesn't take a "ctx", and for the
> only non-NULL users of "thread_name" to just use a trivial helper to
> pass in a "ctx" with a new "thread_name", then to swap it back.
> 
> It would make for a smaller diffstat for this already large series, or
> we could do exactly what we're doing now, but avoid the churn of
> adjusting every caller by introducing a new sister function for those
> who want this parameter to be non-NULL.

I suppose it is possible to have a helper version of
`event_fmt_prepare()` that takes the extra argument and
fixup the existing function to call it with NULL.

I'll see if that makes sense.


[...]
>   
> +static void event_fmt_prepare_tn(const char *event_name, const char *file,
> +				 int line, const struct repository *repo,
> +				 struct json_writer *jw,
> +				 const char *thread_name)
> +{
> +	struct tr2tls_thread_ctx *ctx = tr2tls_get_self();
> +	const char *tmp;
> +
> +	tmp = ctx->thread_name;
> +	ctx->thread_name = thread_name;
> +	event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, jw, ctx);
> +	ctx->thread_name = tmp;
> +}
[...]

This only works if we agree that thread name is a pointer inside
the structure and not a flex-array.

Personally, I think this is trying to do things backwards by
temporarily changing the ctx->thread_name field.  I think it
would be better to `event_fmt_prepare_tn()` do the actual
work with the supplied thread name and have the existing
`event_fmt_prepare()` just call it with ctx->thread_name.
Then we don't need to hack up the ctx.

I'll see if this makes the diffs a little cleaner.

Jeff

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 29, 2021

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:

I left some other comments on the series inline, just on the notes in
the CL:

>  * Ævar proposed a large refactor of the "_perf" target to have a "fmt()"
>    varargs function to reduce the amount of copy-n-pasted code in many of
>    the "fn" event handlers. This looks like a good change based on the
>    mockup but is a large refactor.

FWIW what I meant with [1] was not that this series needed to take the
detour of refactoring trace2/tr2_tgt_perf.c to use such a helper, but
that for the function additions in this series it might make sense to
introduce one and use it for the new functions.

For this series I think it's probably not worth it, so I'm fine with
leaving this for some other time. Just pointing out that rather than
your reading of:

 1. We have some refactorable verbosity
 2. Refactor all callers
 3. Change existing code to use that refactoring
 4. Add new code to use the refactoring

It's also perfectly fine to do just:

 1. We have some refactorable verbosity
 2. Introduce a less verbose
 3. Add new code to use the helper

And leave the "refactor all callers" for some other time.

Anyway, I think for the two callers just leaving it entirely for this
series is the right thing to do. It was more of a "hrm, that's some odd
and avoidable verbosity..." comment on me read-through of v1.

1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/

>  * Ævar proposed a new rationale for when/why we change the "_event" version
>    number. That text can be added to the design document independently.

Hrm, no. In [1] I linked to some earlier musings of mine about what we
should do about the TR2_EVENT_VERSION (mainly as an FYI since you added
it, but hadn't commented on that post).

But my main comment there was that the series wasn't progressing as
atomic changes. I.e. we promise to change the TR2_EVENT_VERSION version
every time we change the event format, but v1 first changed the format
and bumped the version, then made some more changes.

I think that's probably fine per-se within a git release cycle, but it
might be a symtom of commits that could be split up to be more atomic (I
don't know, didn't look in detail).

However, in this v2 of the series the TR2_EVENT_VERSION bump is entirely
gone.

Maybe that means that you so vehemently agree with my proposal in [1] it
that you'd like to start taking that view for trace2 changes right away
:-)

For me it's fine either way, I think TR2_EVENT_VERSION probably isn't
that important.

But if that's the case it should probably be called out more explictly
in the CL/commit. I.e. even if our "policy" (such as it is) about
TR2_EVENT_VERSION currently says X we're going to start doing Y here
intentionally.

And in that case I should probably turn that suggestion in [1] into a an
actual PATCH sooner than later...

1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 30, 2021

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 12/28/21 8:54 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Dec 28 2021, Jeff Hostetler via GitGitGadget wrote:
> 
> I left some other comments on the series inline, just on the notes in
> the CL:
> 
>>   * Ævar proposed a large refactor of the "_perf" target to have a "fmt()"
>>     varargs function to reduce the amount of copy-n-pasted code in many of
>>     the "fn" event handlers. This looks like a good change based on the
>>     mockup but is a large refactor.
> 
> FWIW what I meant with [1] was not that this series needed to take the
> detour of refactoring trace2/tr2_tgt_perf.c to use such a helper, but
> that for the function additions in this series it might make sense to
> introduce one and use it for the new functions.
> 
> For this series I think it's probably not worth it, so I'm fine with
> leaving this for some other time. Just pointing out that rather than
> your reading of:
> 
>   1. We have some refactorable verbosity
>   2. Refactor all callers
>   3. Change existing code to use that refactoring
>   4. Add new code to use the refactoring
> 
> It's also perfectly fine to do just:
> 
>   1. We have some refactorable verbosity
>   2. Introduce a less verbose
>   3. Add new code to use the helper
> 
> And leave the "refactor all callers" for some other time.
> 
> Anyway, I think for the two callers just leaving it entirely for this
> series is the right thing to do. It was more of a "hrm, that's some odd
> and avoidable verbosity..." comment on me read-through of v1.
> 
> 1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/

Sorry, but I'm going to call BS on this.  You sent a ~200 line diff
showing how we could refactor and reduce some of the duplicated code.
You have a history of introducing unnecessary refactorings in the middle
of other topics, and this looks like another example of that.  Another
example of distracting everyone from reviewing the actual new code.

And when I say that it should be an independent topic in its own
series, you fall back to the your "oh, it was just a drive-by comment."
and/or "i didn't mean for you to actually do it." and/or "you just
read my email incorrectly."

Drive-by comments don't usually have ~200 line diffs attached....

A drive-by comment would just say that "there is an opportunity to
create a varargs version of the existing io function and reduce
some duplication in the bodies of the existing callers" and be done.
I don't need a 200 line diff to see how you spell that.

Again, sorry to rant, but I'm tired looking like the stupid half
in these conversations.

> 
>>   * Ævar proposed a new rationale for when/why we change the "_event" version
>>     number. That text can be added to the design document independently.
> 
> Hrm, no. In [1] I linked to some earlier musings of mine about what we
> should do about the TR2_EVENT_VERSION (mainly as an FYI since you added
> it, but hadn't commented on that post).
> 
> But my main comment there was that the series wasn't progressing as
> atomic changes. I.e. we promise to change the TR2_EVENT_VERSION version
> every time we change the event format, but v1 first changed the format
> and bumped the version, then made some more changes.

Did you really expect me to change it twice within a single 9 commit
patch series?

This series creates both "timers" and "counters" and will both appear
together if/when they are merged.  From an external point of view,
users would see version 4 added two new event types.  So I either
increment it for "timers" or I increment it for "counters" or I squash
the two commits together and increment it then.

I didn't want to squash them, so I chose the former.

> 
> I think that's probably fine per-se within a git release cycle, but it
> might be a symtom of commits that could be split up to be more atomic (I
> don't know, didn't look in detail).
> 
> However, in this v2 of the series the TR2_EVENT_VERSION bump is entirely
> gone.

You complained when/how I bumped it in V1.  So I removed it.

And I suggested that you commit your "earlier musings".  With
that in place, there would be no need for me to change the
version number (which is what you wanted all along, right?)


> 
> Maybe that means that you so vehemently agree with my proposal in [1] it
> that you'd like to start taking that view for trace2 changes right away
> :-)

s/so vehemently agree with/are tired of debating/

> 
> For me it's fine either way, I think TR2_EVENT_VERSION probably isn't
> that important.
> 
> But if that's the case it should probably be called out more explictly
> in the CL/commit. I.e. even if our "policy" (such as it is) about
> TR2_EVENT_VERSION currently says X we're going to start doing Y here
> intentionally.
> 
> And in that case I should probably turn that suggestion in [1] into a an
> actual PATCH sooner than later...
> 
> 1. https://lore.kernel.org/git/211220.86czlrurm6.gmgdl@evledraar.gmail.com/
> 

Right, I'll add a note to V3 stating that I did not update
the version number.

Jeff

@jeffhostetler
Copy link
Author

I'm going to close this and send a new (rewritten) series to the list in #1373.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant