Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Builtin FSMonitor Part 1 #1040

Closed

Conversation

jeffhostetler
Copy link

@jeffhostetler jeffhostetler commented Sep 15, 2021

Here is V2 of Part 1 of my Builtin FSMonitor series.

Changes since V1 include:

  • Drop the Trace2 memory leak.
  • Added a new "child_ready" event to Trace2 as an alternative to the "child_exit" event for background processes.
  • Convert the Trace2-related NEEDSWORK items in "start_bg_command()" to use the new "child_ready" event.
  • Various minor code and documentation cleanups.

cc: Eric Sunshine sunshine@sunshineco.com
cc: Taylor Blau me@ttaylorr.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Jeff Hostetler git@jeffhostetler.com
cc: Carlo Arenas carenas@gmail.com

cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Jeff Hostetler git@jeffhostetler.com
cc: Taylor Blau me@ttaylorr.com
cc: Adam Dinwoodie adam@dinwoodie.org
cc: Ramsay Jones ramsay@ramsayjones.plus.com
cc: Johannes Schindelin Johannes.Schindelin@gmx.de

@jeffhostetler
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 15, 2021

Submitted as pull.1040.git.1631738177.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1040/jeffhostetler/builtin-fsmonitor-part1-v1

To fetch this version to local tag pr-1040/jeffhostetler/builtin-fsmonitor-part1-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1040/jeffhostetler/builtin-fsmonitor-part1-v1

@@ -168,15 +168,16 @@ void ipc_client_close_connection(struct ipc_client_connection *connection)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Wed, Sep 15, 2021 at 4:36 PM Jeff Hostetler via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Add `command_len` argument to the Simple IPC API.
>
> In my original Simple IPC API, I assumed that the request
> would always be a null-terminated string of text characters.
> The command arg was just a `const char *`.
>
> I found a caller that would like to pass a binary command
> to the daemon, so I want to ammend the Simple IPC API to

s/ammend/amend/

> take `const char *command, size_t command_len` and pass
> that to the daemon.  (Really, the first arg should just be
> a `void *` or `const unsigned byte *` to make that clearer.)

The reader is left wondering why you didn't also change it to `const
void *` (or one of the other choices) while at it.

> Note, the response side has always been a `struct strbuf`
> which includes the buffer and length, so we already support
> returning a binary answer.  (Yes, it feels a little weird
> returning a binary buffer in a `strbuf`, but it works.)
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 9/15/21 4:43 PM, Eric Sunshine wrote:
> On Wed, Sep 15, 2021 at 4:36 PM Jeff Hostetler via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Add `command_len` argument to the Simple IPC API.
>>
>> In my original Simple IPC API, I assumed that the request
>> would always be a null-terminated string of text characters.
>> The command arg was just a `const char *`.
>>
>> I found a caller that would like to pass a binary command
>> to the daemon, so I want to ammend the Simple IPC API to
> 
> s/ammend/amend/
> 
>> take `const char *command, size_t command_len` and pass
>> that to the daemon.  (Really, the first arg should just be
>> a `void *` or `const unsigned byte *` to make that clearer.)
> 
> The reader is left wondering why you didn't also change it to `const
> void *` (or one of the other choices) while at it.
> 

The simple ipc layer just passes the buffer to the pkt-line layer
and it takes a "const char *", so to avoid confusion I just left
the type is it was.  If later we want to fix pkt-line, we can
investigate passing a "const unsigned byte *" value down the
call chain, but that is more than I want to do right now.

>> Note, the response side has always been a `struct strbuf`
>> which includes the buffer and length, so we already support
>> returning a binary answer.  (Yes, it feels a little weird
>> returning a binary buffer in a `strbuf`, but it works.)
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 15, 2021

User Eric Sunshine <sunshine@sunshineco.com> has been added to the cc: list.

trace2/tr2_tls.c Outdated
@@ -95,6 +95,7 @@ void tr2tls_unset_self(void)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Do not leak the thread name (contained within the thread context) when
> a thread terminates.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  trace2/tr2_tls.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
> index 067c23755fb..7da94aba522 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -95,6 +95,7 @@ void tr2tls_unset_self(void)
>  
>  	pthread_setspecific(tr2tls_key, NULL);
>  
> +	strbuf_release(&ctx->thread_name);
>  	free(ctx->array_us_start);
>  	free(ctx);
>  }

So the idea is create allocates a new instance, and unset is to
release the resource held by it?

This is not a problem in _this_ patch but more about the base code
that is being fixed here, but using strbuf as thread_name member
sends a strong signal that it is designed to be inexpensive to
change thread_name after a context is created by create_self.  If
it is not the case, the member should be "const char *", which may
be computed using a temporary strbuf in create_self() and taken from
the strbuf with strbuf_detach(), I would think.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Wed, Sep 15, 2021 at 02:01:39PM -0700, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Jeff Hostetler <jeffhost@microsoft.com>
> >
> > Do not leak the thread name (contained within the thread context) when
> > a thread terminates.
> >
> > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> > ---
> >  trace2/tr2_tls.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
> > index 067c23755fb..7da94aba522 100644
> > --- a/trace2/tr2_tls.c
> > +++ b/trace2/tr2_tls.c
> > @@ -95,6 +95,7 @@ void tr2tls_unset_self(void)
> >
> >  	pthread_setspecific(tr2tls_key, NULL);
> >
> > +	strbuf_release(&ctx->thread_name);
> >  	free(ctx->array_us_start);
> >  	free(ctx);
> >  }
>
> So the idea is create allocates a new instance, and unset is to
> release the resource held by it?
>
> This is not a problem in _this_ patch but more about the base code
> that is being fixed here, but using strbuf as thread_name member
> sends a strong signal that it is designed to be inexpensive to
> change thread_name after a context is created by create_self.  If
> it is not the case, the member should be "const char *", which may
> be computed using a temporary strbuf in create_self() and taken from
> the strbuf with strbuf_detach(), I would think.

It looks like we do not change the contents of the thread_name buffer
anywhere. I assume that we store the strbuf in struct tr2tls_thread_ctx
because we do some string formatting in tr2tls_create_self().

But there we could easily say ctx->thread_name = strbuf_release(&buf).

More concerning to me is that we use signed integers to keep track of nr
and alloc of array_us_start. But both of these are separate issues and
don't need to be dealt with here.

Thanks,
Taylor

@@ -5,13 +5,6 @@
* See Documentation/technical/api-simple-ipc.txt
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> From: Carlo Marcelo Arenas Belón <carenas@gmail.com>
>
> Move the declartion of the `enum ipc_active_state` type outside of
> the SUPPORTS_SIMPLE_IPC ifdef.

The second one is not an in-body header since there is already a
blank line that signals the end of in-body headers after the first
one.

This _may_ be a bug in GGG, perhaps?


> A later commit will introduce the `fsmonitor_ipc__*()` API and stub in
> a "mock" implementation that requires this enum in some function
> signatures.
>
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  simple-ipc.h | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/simple-ipc.h b/simple-ipc.h
> index 9c7330fcda0..b396293bdfc 100644
> --- a/simple-ipc.h
> +++ b/simple-ipc.h
> @@ -5,13 +5,6 @@
>   * See Documentation/technical/api-simple-ipc.txt
>   */
>  
> -#ifdef SUPPORTS_SIMPLE_IPC
> -#include "pkt-line.h"
> -
> -/*
> - * Simple IPC Client Side API.
> - */
> -
>  enum ipc_active_state {
>  	/*
>  	 * The pipe/socket exists and the daemon is waiting for connections.
> @@ -43,6 +36,13 @@ enum ipc_active_state {
>  	IPC_STATE__OTHER_ERROR,
>  };
>  
> +#ifdef SUPPORTS_SIMPLE_IPC
> +#include "pkt-line.h"
> +
> +/*
> + * Simple IPC Client Side API.
> + */
> +
>  struct ipc_client_connect_options {
>  	/*
>  	 * Spin under timeout if the server is running but can't

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 9/15/21 5:06 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> From: Carlo Marcelo Arenas Belón <carenas@gmail.com>
>>
>> Move the declartion of the `enum ipc_active_state` type outside of
>> the SUPPORTS_SIMPLE_IPC ifdef.
> 
> The second one is not an in-body header since there is already a
> blank line that signals the end of in-body headers after the first
> one.
> 
> This _may_ be a bug in GGG, perhaps?

Maybe.  I'll make a note to ask @dscho when he gets back from vacation.

Credit for the commit should go to Carlo.  I just added it to the series
with his "From:" line and it looks like GGG added an extra one before
it.

Jeff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Carlo Arenas wrote (reply to this):

On Fri, Sep 17, 2021 at 11:39 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> Maybe.  I'll make a note to ask @dscho when he gets back from vacation.

I am actually more worried about GGG introducing a typo in my flawless
commit message.

Carlo

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff Hostetler <git@jeffhostetler.com> writes:

> On 9/15/21 5:06 PM, Junio C Hamano wrote:
>> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> 
>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>
>>> From: Carlo Marcelo Arenas Belón <carenas@gmail.com>
>>>
>>> Move the declartion of the `enum ipc_active_state` type outside of
>>> the SUPPORTS_SIMPLE_IPC ifdef.
>> The second one is not an in-body header since there is already a
>> blank line that signals the end of in-body headers after the first
>> one.
>> This _may_ be a bug in GGG, perhaps?
>
> Maybe.  I'll make a note to ask @dscho when he gets back from vacation.
>
> Credit for the commit should go to Carlo.  I just added it to the series
> with his "From:" line and it looks like GGG added an extra one before
> it.

Thanks for a clarification.  I'll tweak the authorship when queuing.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 15, 2021

This branch is now known as jh/builtin-fsmonitor-part1.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 15, 2021

This patch series was integrated into seen via git@83835d9.

@gitgitgadget gitgitgadget bot added the seen label Sep 15, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Sep 16, 2021

User Taylor Blau <me@ttaylorr.com> has been added to the cc: list.

@@ -1901,3 +1901,126 @@ void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Wed, Sep 15, 2021 at 08:36:16PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Create a variation of `run_command()` and `start_command()` to launch a command
> into the background and optionally wait for it to become "ready" before returning.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  run-command.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  run-command.h |  48 ++++++++++++++++++++
>  2 files changed, 171 insertions(+)
>
> diff --git a/run-command.c b/run-command.c
> index 3e4e082e94d..fe75fd08f74 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1901,3 +1901,126 @@ void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
>  	}
>  	strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
>  }
> +
> +enum start_bg_result start_bg_command(struct child_process *cmd,
> +				      start_bg_wait_cb *wait_cb,
> +				      void *cb_data,
> +				      unsigned int timeout_sec)
> +{
> +	enum start_bg_result sbgr = SBGR_ERROR;
> +	int ret;
> +	int wait_status;
> +	pid_t pid_seen;
> +	time_t time_limit;
> +
> +	/*
> +	 * Silently disallow child cleanup -- even if requested.
> +	 * The child process should persist in the background
> +	 * and possibly/probably after this process exits.  That
> +	 * is, don't kill the child during our atexit routine.
> +	 */
> +	cmd->clean_on_exit = 0;
> +
> +	ret = start_command(cmd);
> +	if (ret) {
> +		/*
> +		 * We assume that if `start_command()` fails, we
> +		 * either get a complete `trace2_child_start() /
> +		 * trace2_child_exit()` pair or it fails before the
> +		 * `trace2_child_start()` is emitted, so we do not
> +		 * need to worry about it here.
> +		 *
> +		 * We also assume that `start_command()` does not add
> +		 * us to the cleanup list.  And that it calls
> +		 * calls `child_process_clear()`.
> +		 */
> +		sbgr = SBGR_ERROR;
> +		goto done;
> +	}
> +
> +	time(&time_limit);
> +	time_limit += timeout_sec;

This jumped out to me as unsafe, since POSIX guarantees time_t to be an
integral value holding a number of seconds (so += timeout_sec is safe
there), but it isn't in the C standard.

But we have lots of other examples of adding a number of seconds
directly the value filled in by time(2), so I think this is fine.

> +
> +wait:
> +	pid_seen = waitpid(cmd->pid, &wait_status, WNOHANG);
> +
> +	if (pid_seen == 0) {

Small nit, probably better to write this as if (!pid_seen), but not
worth a reroll alone.

> +		/*
> +		 * The child is currently running.  Ask the callback
> +		 * if the child is ready to do work or whether we
> +		 * should keep waiting for it to boot up.
> +		 */

This comment is simple and informative, thank you!

> +		ret = (*wait_cb)(cb_data, cmd);
> +		if (!ret) {
> +			/*
> +			 * The child is running and "ready".
> +			 *
> +			 * NEEDSWORK: As we prepare to orphan (release to
> +			 * the background) this child, it is not appropriate
> +			 * to emit a `trace2_child_exit()` event.  Should we
> +			 * create a new event for this case?

Probably. Maybe trace2_child_orphaned() or trace2_child_background()?

> +			 */
> +			sbgr = SBGR_READY;
> +			goto done;
> +		} else if (ret > 0) {
> +			time_t now;
> +
> +			time(&now);
> +			if (now < time_limit)
> +				goto wait;
> +
> +			/*
> +			 * Our timeout has expired.  We don't try to
> +			 * kill the child, but rather let it continue
> +			 * (hopefully) trying to startup.
> +			 *
> +			 * NEEDSWORK: Like the "ready" case, should we
> +			 * log a custom child-something Trace2 event here?
> +			 */
> +			sbgr = SBGR_TIMEOUT;
> +			goto done;
> +		} else {
> +			/*
> +			 * The cb gave up on this child.
> +			 *
> +			 * NEEDSWORK: Like above, should we log a custom
> +			 * Trace2 child-something event here?
> +			 */
> +			sbgr = SBGR_CB_ERROR;
> +			goto done;
> +		}

OK, so assuming that the child is running, then we ask wait_cb what to
do. Returning zero from the callback means to background it, a positive
value means to give it more time, and negative means to cause an error.
And those match the documentation below, good.

> +	if (pid_seen == cmd->pid) {

This could be an "else if", no?

> +		int child_code = -1;
> +
> +		/*
> +		 * The child started, but exited or was terminated
> +		 * before becoming "ready".
> +		 *
> +		 * We try to match the behavior of `wait_or_whine()`
> +		 * and convert the child's status to a return code for
> +		 * tracing purposes and emit the `trace2_child_exit()`
> +		 * event.
> +		 */
> +		if (WIFEXITED(wait_status))
> +			child_code = WEXITSTATUS(wait_status);
> +		else if (WIFSIGNALED(wait_status))
> +			child_code = WTERMSIG(wait_status) + 128;

Do we care about emitting the same error (when it was signaled with
something other than SIGINT/SIGQUIT/SIGPIPE) as is reported by
wait_or_whine()?

If we want that error here, too, we could probably share the same code
here from here and in wait_or_whine(). I would probably write something
like:

    static int handle_awaited_status(int status, int *code)
    {
      if (WIFSIGNALED(status)) {
        *code = WTERMSIG(status);
        if (*code != SIGINT && *code != SIGQUIT && *code != SIGPIPE)
               error("%s died of signal %d", argv0, *code);
        /*
         * This return value is chosen so that code & 0xff
         * mimics the exit code that a POSIX shell would report for
         * a program that died from this signal.
         */
        *code += 128;
        return 1;
      } else if (WIFEXITED(status)) {
        *code = WEXITSTATUS(status);
        return 1;
      }
      return 0;
    }

so that we could call it in wait_or_whine() like:

    } else if (!handle_awaited_status(status, &code)) {
      error("waitpid is confused (%s)", argv0);
    }

and similarly here in this new function. Alternatively, if we don't want
that error, then it may help future readers to add a short comment
explaining why not.

> +/**
> + * Callback used by `start_bg_command()` to ask whether the
> + * child process is ready or needs more time to become ready.
> + *
> + * Returns 1 is child needs more time (subject to the requested timeout).
> + * Returns 0 if child is ready.
> + * Returns -1 on any error and cause `start_bg_command()` to also error out.
> + */
> +typedef int(start_bg_wait_cb)(void *cb_data,
> +			      const struct child_process *cmd);

Nitpicking, but typically I would assume that the "extra" void pointer
is the last argument in a callback. It definitely does not matter,
though.

Thanks,
Taylor

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Thu, Sep 16, 2021 at 12:53:07AM -0400, Taylor Blau wrote:
> > +/**
> > + * Callback used by `start_bg_command()` to ask whether the
> > + * child process is ready or needs more time to become ready.
> > + *
> > + * Returns 1 is child needs more time (subject to the requested timeout).
> > + * Returns 0 if child is ready.
> > + * Returns -1 on any error and cause `start_bg_command()` to also error out.
> > + */
> > +typedef int(start_bg_wait_cb)(void *cb_data,
> > +			      const struct child_process *cmd);
>
> Nitpicking, but typically I would assume that the "extra" void pointer
> is the last argument in a callback. It definitely does not matter,
> though.

Looking at the last patch (which adds the first implementation of one of
these callbacks) it appears that this cb_data pointer is unused. I
assume that it is used in later patches which aren't in this topic?

If so, then it may help future readers to indicate as much in the patch
message. Perhaps "the cb_data argument in the start_bg_wait_cb callback
is unused in this series, but will be useful in later patches".

Thanks,
Taylor

@@ -9,6 +9,7 @@
#include "parse-options.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Wed, Sep 15, 2021 at 08:36:17PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Convert test helper to use `start_bg_command()` when spawning a server
> daemon in the background rather than blocks of platform-specific code.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  t/helper/test-simple-ipc.c | 193 ++++++++-----------------------------
>  1 file changed, 40 insertions(+), 153 deletions(-)
>
> diff --git a/t/helper/test-simple-ipc.c b/t/helper/test-simple-ipc.c
> index 91345180750..59a950f3b00 100644
> --- a/t/helper/test-simple-ipc.c
> +++ b/t/helper/test-simple-ipc.c
> @@ -9,6 +9,7 @@
>  #include "parse-options.h"
>  #include "thread-utils.h"
>  #include "strvec.h"
> +#include "run-command.h"
>
>  #ifndef SUPPORTS_SIMPLE_IPC
>  int cmd__simple_ipc(int argc, const char **argv)
> @@ -274,178 +275,64 @@ static int daemon__run_server(void)
>  	return ret;
>  }
>
> -#ifndef GIT_WINDOWS_NATIVE
> -/*
> - * This is adapted from `daemonize()`.  Use `fork()` to directly create and
> - * run the daemon in a child process.
> - */
> -static int spawn_server(pid_t *pid)
> -{
> -	struct ipc_server_opts opts = {
> -		.nr_threads = cl_args.nr_threads,
> -	};
> +static start_bg_wait_cb bg_wait_cb;

This whole patch is delightful to read, as the new implementation is so
much cleaner as a result of the earlier work in this series.

Am I correct in assuming that this is to encourage a compiler error if
bg_wait_cb does not satisfy the type of start_bg_wait_cb? If so, then I
think we are already getting that by trying to pass bg_wait_cb to
start_bg_command().

E.g., applying this (intentionally broken) diff on top:

--- 8< ---

diff --git a/t/helper/test-simple-ipc.c b/t/helper/test-simple-ipc.c
index 59a950f3b0..3aed787206 100644
--- a/t/helper/test-simple-ipc.c
+++ b/t/helper/test-simple-ipc.c
@@ -275,9 +275,7 @@ static int daemon__run_server(void)
 	return ret;
 }

-static start_bg_wait_cb bg_wait_cb;
-
-static int bg_wait_cb(void *cb_data, const struct child_process *cp)
+static int bg_wait_cb(const void *cb_data, const struct child_process *cp)
 {
 	int s = ipc_get_active_state(cl_args.path);

--- >8 ---

and then compiling still warns of a mismatched type when calling
start_bg_command().

> -	*pid = fork();
> -
> -	switch (*pid) {
> -	case 0:
> -		if (setsid() == -1)
> -			error_errno(_("setsid failed"));
> -		close(0);
> -		close(1);
> -		close(2);
> -		sanitize_stdfds();
> +static int bg_wait_cb(void *cb_data, const struct child_process *cp)
> +{
> +	int s = ipc_get_active_state(cl_args.path);
>
> -		return ipc_server_run(cl_args.path, &opts, test_app_cb,
> -				      (void*)&my_app_data);
> +	switch (s) {
> +	case IPC_STATE__LISTENING:
> +		/* child is "ready" */
> +		return 0;
>
> -	case -1:
> -		return error_errno(_("could not spawn daemon in the background"));
> +	case IPC_STATE__NOT_LISTENING:
> +	case IPC_STATE__PATH_NOT_FOUND:
> +		/* give child more time */
> +		return 1;
>
>  	default:

I'm always a little hesitant to have default cases when switch over enum
types, since it suppresses the warning when there's a new value of that
type. But we already have a similar default in client__probe_server().

> -		else if (pid_seen == pid_child) {
> -			/*
> -			 * The new child daemon process shutdown while
> -			 * it was starting up, so it is not listening
> -			 * on the socket.
> -			 *
> -			 * Try to ping the socket in the odd chance
> -			 * that another daemon started (or was already
> -			 * running) while our child was starting.
> -			 *
> -			 * Again, we don't care who services the socket.
> -			 */
> -			s = ipc_get_active_state(cl_args.path);
> -			if (s == IPC_STATE__LISTENING)
> -				return 0;
> +	default:

Ditto.

Thanks,
Taylor

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 9/16/21 1:06 AM, Taylor Blau wrote:
> On Wed, Sep 15, 2021 at 08:36:17PM +0000, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Convert test helper to use `start_bg_command()` when spawning a server
>> daemon in the background rather than blocks of platform-specific code.
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>> ---
>>   t/helper/test-simple-ipc.c | 193 ++++++++-----------------------------
>>   1 file changed, 40 insertions(+), 153 deletions(-)
>>
>> diff --git a/t/helper/test-simple-ipc.c b/t/helper/test-simple-ipc.c
>> index 91345180750..59a950f3b00 100644
>> --- a/t/helper/test-simple-ipc.c
>> +++ b/t/helper/test-simple-ipc.c
>> @@ -9,6 +9,7 @@
>>   #include "parse-options.h"
>>   #include "thread-utils.h"
>>   #include "strvec.h"
>> +#include "run-command.h"
>>
>>   #ifndef SUPPORTS_SIMPLE_IPC
>>   int cmd__simple_ipc(int argc, const char **argv)
>> @@ -274,178 +275,64 @@ static int daemon__run_server(void)
>>   	return ret;
>>   }
>>
>> -#ifndef GIT_WINDOWS_NATIVE
>> -/*
>> - * This is adapted from `daemonize()`.  Use `fork()` to directly create and
>> - * run the daemon in a child process.
>> - */
>> -static int spawn_server(pid_t *pid)
>> -{
>> -	struct ipc_server_opts opts = {
>> -		.nr_threads = cl_args.nr_threads,
>> -	};
>> +static start_bg_wait_cb bg_wait_cb;
> 
> This whole patch is delightful to read, as the new implementation is so
> much cleaner as a result of the earlier work in this series.
> 
> Am I correct in assuming that this is to encourage a compiler error if
> bg_wait_cb does not satisfy the type of start_bg_wait_cb? If so, then I
> think we are already getting that by trying to pass bg_wait_cb to
> start_bg_command().

I use that trick to get the compiler to give me a compiler error at the
point of the function declaration.

For example, If I add an arg to the function that doesn't match what's
in the prototype definition, I get:

t/helper/test-simple-ipc.c:280:12: error: conflicting types for 'bg_wait_cb'
static int bg_wait_cb(const struct child_process *cp, void *cb_data, int 
foo)
            ^
t/helper/test-simple-ipc.c:278:25: note: previous declaration is here
static start_bg_wait_cb bg_wait_cb;
                         ^
1 error generated.

Yes, we may get an error when the function pointer is referenced in
start_bg_command() or if we're using it to initialize a vtable or
something, but those errors are further away from the actual error
(and sometimes they can be a little cryptic).

Also, it helps document that this function's signature is predefined
for a reason.

It's a quirky trick I know, but it has served me well over the years.
	
> 
> E.g., applying this (intentionally broken) diff on top:
> 
> --- 8< ---
> 
> diff --git a/t/helper/test-simple-ipc.c b/t/helper/test-simple-ipc.c
> index 59a950f3b0..3aed787206 100644
> --- a/t/helper/test-simple-ipc.c
> +++ b/t/helper/test-simple-ipc.c
> @@ -275,9 +275,7 @@ static int daemon__run_server(void)
>   	return ret;
>   }
> 
> -static start_bg_wait_cb bg_wait_cb;
> -
> -static int bg_wait_cb(void *cb_data, const struct child_process *cp)
> +static int bg_wait_cb(const void *cb_data, const struct child_process *cp)
>   {
>   	int s = ipc_get_active_state(cl_args.path);
> 
> --- >8 ---
> 
> and then compiling still warns of a mismatched type when calling
> start_bg_command().
> 
>> -	*pid = fork();
>> -
>> -	switch (*pid) {
>> -	case 0:
>> -		if (setsid() == -1)
>> -			error_errno(_("setsid failed"));
>> -		close(0);
>> -		close(1);
>> -		close(2);
>> -		sanitize_stdfds();
>> +static int bg_wait_cb(void *cb_data, const struct child_process *cp)
>> +{
>> +	int s = ipc_get_active_state(cl_args.path);
>>
>> -		return ipc_server_run(cl_args.path, &opts, test_app_cb,
>> -				      (void*)&my_app_data);
>> +	switch (s) {
>> +	case IPC_STATE__LISTENING:
>> +		/* child is "ready" */
>> +		return 0;
>>
>> -	case -1:
>> -		return error_errno(_("could not spawn daemon in the background"));
>> +	case IPC_STATE__NOT_LISTENING:
>> +	case IPC_STATE__PATH_NOT_FOUND:
>> +		/* give child more time */
>> +		return 1;
>>
>>   	default:
> 
> I'm always a little hesitant to have default cases when switch over enum
> types, since it suppresses the warning when there's a new value of that
> type. But we already have a similar default in client__probe_server().

Do all compilers now handle switching over an enum and detect unhandled
cases?  Once upon a time that wasn't the case IIRC.

>...
> 
> Ditto.
> 
> Thanks,
> Taylor
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Fri, Sep 17 2021, Jeff Hostetler wrote:

> On 9/16/21 1:06 AM, Taylor Blau wrote:
>> On Wed, Sep 15, 2021 at 08:36:17PM +0000, Jeff Hostetler via GitGitGadget wrote:
>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>
>>> Convert test helper to use `start_bg_command()` when spawning a server
>>> daemon in the background rather than blocks of platform-specific code.
>>>
>>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>>> ---
>>>   t/helper/test-simple-ipc.c | 193 ++++++++-----------------------------
>>>   1 file changed, 40 insertions(+), 153 deletions(-)
>>>
>>> diff --git a/t/helper/test-simple-ipc.c b/t/helper/test-simple-ipc.c
>>> index 91345180750..59a950f3b00 100644
>>> --- a/t/helper/test-simple-ipc.c
>>> +++ b/t/helper/test-simple-ipc.c
>>> @@ -9,6 +9,7 @@
>>>   #include "parse-options.h"
>>>   #include "thread-utils.h"
>>>   #include "strvec.h"
>>> +#include "run-command.h"
>>>
>>>   #ifndef SUPPORTS_SIMPLE_IPC
>>>   int cmd__simple_ipc(int argc, const char **argv)
>>> @@ -274,178 +275,64 @@ static int daemon__run_server(void)
>>>   	return ret;
>>>   }
>>>
>>> -#ifndef GIT_WINDOWS_NATIVE
>>> -/*
>>> - * This is adapted from `daemonize()`.  Use `fork()` to directly create and
>>> - * run the daemon in a child process.
>>> - */
>>> -static int spawn_server(pid_t *pid)
>>> -{
>>> -	struct ipc_server_opts opts = {
>>> -		.nr_threads = cl_args.nr_threads,
>>> -	};
>>> +static start_bg_wait_cb bg_wait_cb;
>> This whole patch is delightful to read, as the new implementation is
>> so
>> much cleaner as a result of the earlier work in this series.
>> Am I correct in assuming that this is to encourage a compiler error
>> if
>> bg_wait_cb does not satisfy the type of start_bg_wait_cb? If so, then I
>> think we are already getting that by trying to pass bg_wait_cb to
>> start_bg_command().
>
> I use that trick to get the compiler to give me a compiler error at the
> point of the function declaration.
>
> For example, If I add an arg to the function that doesn't match what's
> in the prototype definition, I get:
>
> t/helper/test-simple-ipc.c:280:12: error: conflicting types for 'bg_wait_cb'
> static int bg_wait_cb(const struct child_process *cp, void *cb_data,
> int foo)
>            ^
> t/helper/test-simple-ipc.c:278:25: note: previous declaration is here
> static start_bg_wait_cb bg_wait_cb;
>                         ^
> 1 error generated.
>
> Yes, we may get an error when the function pointer is referenced in
> start_bg_command() or if we're using it to initialize a vtable or
> something, but those errors are further away from the actual error
> (and sometimes they can be a little cryptic).
>
> Also, it helps document that this function's signature is predefined
> for a reason.
>
> It's a quirky trick I know, but it has served me well over the years.

I haven't seen this idiom before. I think it's best to avoid patterns
designed to massage messages out of any specific compilers/versions.

It seems inevitable that it'll either be counter-productive or
redundant. Here with clang v11 doing this makes the warning
worse. I.e. without the forward declaration:

    t/helper/test-simple-ipc.c:315:31: error: incompatible function
    pointer types passing 'int (void *, const struct child_process *,
    int)' to parameter of type ' start_bg_wait_cb *' (aka 'int (*)(void
    *, const struct child_process *)')
    [-Werror,-Wincompatible-function-pointer-types]
            sbgr = start_bg_command(&cp, bg_wait_cb, NULL, cl_args.max_wait_sec);
                                         ^~~~~~~~~~
    ./run-command.h:564:29: note: passing argument to parameter 'wait_cb' here
                                      start_bg_wait_cb *wait_cb,
                                                        ^
    1 error generated.

I.e. we get the specific warning category for this type of error
(-Werror,-Wincompatible-function-pointer-types), and we're pointed at
the caller in question (which to be fair, it seems you don't prefer),
but also a reference to the run-command.h definition.

Most importantly, we get quoted what the type is/should be, which is
missing with the forward declaration. It's the equivalent of saying "you
did bad!" instead of "you did bad X, do Y instead!".

>> E.g., applying this (intentionally broken) diff on top:
>> --- 8< ---
>> diff --git a/t/helper/test-simple-ipc.c b/t/helper/test-simple-ipc.c
>> index 59a950f3b0..3aed787206 100644
>> --- a/t/helper/test-simple-ipc.c
>> +++ b/t/helper/test-simple-ipc.c
>> @@ -275,9 +275,7 @@ static int daemon__run_server(void)
>>   	return ret;
>>   }
>> -static start_bg_wait_cb bg_wait_cb;
>> -
>> -static int bg_wait_cb(void *cb_data, const struct child_process *cp)
>> +static int bg_wait_cb(const void *cb_data, const struct child_process *cp)
>>   {
>>   	int s = ipc_get_active_state(cl_args.path);
>> --- >8 ---
>> and then compiling still warns of a mismatched type when calling
>> start_bg_command().
>> 
>>> -	*pid = fork();
>>> -
>>> -	switch (*pid) {
>>> -	case 0:
>>> -		if (setsid() == -1)
>>> -			error_errno(_("setsid failed"));
>>> -		close(0);
>>> -		close(1);
>>> -		close(2);
>>> -		sanitize_stdfds();
>>> +static int bg_wait_cb(void *cb_data, const struct child_process *cp)
>>> +{
>>> +	int s = ipc_get_active_state(cl_args.path);
>>>
>>> -		return ipc_server_run(cl_args.path, &opts, test_app_cb,
>>> -				      (void*)&my_app_data);
>>> +	switch (s) {
>>> +	case IPC_STATE__LISTENING:
>>> +		/* child is "ready" */
>>> +		return 0;
>>>
>>> -	case -1:
>>> -		return error_errno(_("could not spawn daemon in the background"));
>>> +	case IPC_STATE__NOT_LISTENING:
>>> +	case IPC_STATE__PATH_NOT_FOUND:
>>> +		/* give child more time */
>>> +		return 1;
>>>
>>>   	default:
>> I'm always a little hesitant to have default cases when switch over
>> enum
>> types, since it suppresses the warning when there's a new value of that
>> type. But we already have a similar default in client__probe_server().
>
> Do all compilers now handle switching over an enum and detect unhandled
> cases?  Once upon a time that wasn't the case IIRC.

I don't think so, but the ones we widely use do, i.e. clang and gcc at
least.

For this sort of thing it really doesn't matter if *all* compilers
support it, since we'll only need to catch such "missing enum arm"
issues with one of them.

E.g. in my 338abb0f045 (builtins + test helpers: use return instead of
exit() in cmd_*, 2021-06-08) I fixed something that I've only gotten
Oracle SunCC to emit (gcc and clang don't detect it), but as long as
that one compiler does & someone checks it regularly...

By having a "default" case you're hiding that detection from the
compilers capable of detecting a logic error in this code, whereas if
the compiler can't do that it'll just ignore it.

trace2/tr2_tls.c Outdated
@@ -95,6 +95,7 @@ void tr2tls_unset_self(void)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Wed, Sep 15 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Do not leak the thread name (contained within the thread context) when
> a thread terminates.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  trace2/tr2_tls.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c
> index 067c23755fb..7da94aba522 100644
> --- a/trace2/tr2_tls.c
> +++ b/trace2/tr2_tls.c
> @@ -95,6 +95,7 @@ void tr2tls_unset_self(void)
>  
>  	pthread_setspecific(tr2tls_key, NULL);
>  
> +	strbuf_release(&ctx->thread_name);
>  	free(ctx->array_us_start);
>  	free(ctx);
>  }

As noted in your cover letter:

 * A fix for a memory leak in the Trace2 code. (This was independently
   reported in last week in "ab/tr2-leaks-and-fixes".)

So I think this patch can be dropped from this series, since it's exact
duplicate of my 48f68715b14 (tr2: stop leaking "thread_name" memory,
2021-08-27) in ab/tr2-leaks-and-fixes, currently in "next" and marked
for a merge with master.

When submitting a series that depends on another one it's best to rebase
it on top of it & indicate it as such in the cover letter, Junio can
queue such a series on top of another one.

In this case I'm still not sure why this fix is here, i.e. surely
nothing later in the series absolutely needs this stray memory leak
fix...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Thu, Sep 16, 2021 at 07:35:59AM +0200, Ævar Arnfjörð Bjarmason wrote:
> So I think this patch can be dropped from this series, since it's exact
> duplicate of my 48f68715b14 (tr2: stop leaking "thread_name" memory,
> 2021-08-27) in ab/tr2-leaks-and-fixes, currently in "next" and marked
> for a merge with master.

I agree it can be dropped.

> When submitting a series that depends on another one it's best to rebase
> it on top of it & indicate it as such in the cover letter, Junio can
> queue such a series on top of another one.
>
> In this case I'm still not sure why this fix is here, i.e. surely
> nothing later in the series absolutely needs this stray memory leak
> fix...

But there's no need for Jeff to depend on your branch, since (as you
mentioned) this cleanup isn't relevant for anything else in this series,
which is a sort of grab-bag of miscellaneous clean-ups.

Thanks,
Taylor

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Thu, Sep 16 2021, Taylor Blau wrote:

> On Thu, Sep 16, 2021 at 07:35:59AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> So I think this patch can be dropped from this series, since it's exact
>> duplicate of my 48f68715b14 (tr2: stop leaking "thread_name" memory,
>> 2021-08-27) in ab/tr2-leaks-and-fixes, currently in "next" and marked
>> for a merge with master.
>
> I agree it can be dropped.
>
>> When submitting a series that depends on another one it's best to rebase
>> it on top of it & indicate it as such in the cover letter, Junio can
>> queue such a series on top of another one.
>>
>> In this case I'm still not sure why this fix is here, i.e. surely
>> nothing later in the series absolutely needs this stray memory leak
>> fix...
>
> But there's no need for Jeff to depend on your branch, since (as you
> mentioned) this cleanup isn't relevant for anything else in this series,
> which is a sort of grab-bag of miscellaneous clean-ups.

Indeed, to be clear it was just general advice about queue-on-top.

But to clarify what I was getting at here: If we just came up with the
same diff I'd have assumed Jeff just hadn't need the change in "next",
but since he clearly has I was confused by it being here.

I.e. it doesn't *seem* like anything in the rest of the series depends
on it, so why have it here at all since the bug is being fixed anyway?
Or if it does depend on it in some subtle way I've missed, perhaps it
does need to be queued on top of ab/tr2-leaks-and-fixes, and the
relevant commit/subtle dependency needs to be called out in a commit
message.

Or maybe Jeff had just come up with this independently, noticed it just
before submission and just updated the CL, not the patch or series
itself :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 9/16/21 4:01 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Sep 16 2021, Taylor Blau wrote:
> 
>> On Thu, Sep 16, 2021 at 07:35:59AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>> So I think this patch can be dropped from this series, since it's exact
>>> duplicate of my 48f68715b14 (tr2: stop leaking "thread_name" memory,
>>> 2021-08-27) in ab/tr2-leaks-and-fixes, currently in "next" and marked
>>> for a merge with master.
>>
>> I agree it can be dropped.
>>
>>> When submitting a series that depends on another one it's best to rebase
>>> it on top of it & indicate it as such in the cover letter, Junio can
>>> queue such a series on top of another one.
>>>
>>> In this case I'm still not sure why this fix is here, i.e. surely
>>> nothing later in the series absolutely needs this stray memory leak
>>> fix...
>>
>> But there's no need for Jeff to depend on your branch, since (as you
>> mentioned) this cleanup isn't relevant for anything else in this series,
>> which is a sort of grab-bag of miscellaneous clean-ups.
> 
> Indeed, to be clear it was just general advice about queue-on-top.
> 
> But to clarify what I was getting at here: If we just came up with the
> same diff I'd have assumed Jeff just hadn't need the change in "next",
> but since he clearly has I was confused by it being here.
> 
> I.e. it doesn't *seem* like anything in the rest of the series depends
> on it, so why have it here at all since the bug is being fixed anyway?
> Or if it does depend on it in some subtle way I've missed, perhaps it
> does need to be queued on top of ab/tr2-leaks-and-fixes, and the
> relevant commit/subtle dependency needs to be called out in a commit
> message.
> 
> Or maybe Jeff had just come up with this independently, noticed it just
> before submission and just updated the CL, not the patch or series
> itself :)
> 

I'll drop this commit since your version is already queued up
and headed to master.  I've been carrying it in my dev branch
for a while and was using it to make leak reporting a little
quieter.

And yes, I just noticed that yours had advanced when I wrote the
cover letter and ACKd it rather than dropping it.

And no, nothing in the rest of the whole FSMonitor series depends
on this, so I can leave my series based upon master rather than
your branch.

Thanks
Jeff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Thu, Sep 16 2021, Jeff Hostetler wrote:

> On 9/16/21 4:01 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Sep 16 2021, Taylor Blau wrote:
>> 
>>> On Thu, Sep 16, 2021 at 07:35:59AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>> So I think this patch can be dropped from this series, since it's exact
>>>> duplicate of my 48f68715b14 (tr2: stop leaking "thread_name" memory,
>>>> 2021-08-27) in ab/tr2-leaks-and-fixes, currently in "next" and marked
>>>> for a merge with master.
>>>
>>> I agree it can be dropped.
>>>
>>>> When submitting a series that depends on another one it's best to rebase
>>>> it on top of it & indicate it as such in the cover letter, Junio can
>>>> queue such a series on top of another one.
>>>>
>>>> In this case I'm still not sure why this fix is here, i.e. surely
>>>> nothing later in the series absolutely needs this stray memory leak
>>>> fix...
>>>
>>> But there's no need for Jeff to depend on your branch, since (as you
>>> mentioned) this cleanup isn't relevant for anything else in this series,
>>> which is a sort of grab-bag of miscellaneous clean-ups.
>> Indeed, to be clear it was just general advice about queue-on-top.
>> But to clarify what I was getting at here: If we just came up with
>> the
>> same diff I'd have assumed Jeff just hadn't need the change in "next",
>> but since he clearly has I was confused by it being here.
>> I.e. it doesn't *seem* like anything in the rest of the series
>> depends
>> on it, so why have it here at all since the bug is being fixed anyway?
>> Or if it does depend on it in some subtle way I've missed, perhaps it
>> does need to be queued on top of ab/tr2-leaks-and-fixes, and the
>> relevant commit/subtle dependency needs to be called out in a commit
>> message.
>> Or maybe Jeff had just come up with this independently, noticed it
>> just
>> before submission and just updated the CL, not the patch or series
>> itself :)
>> 
>
> I'll drop this commit since your version is already queued up
> and headed to master.  I've been carrying it in my dev branch
> for a while and was using it to make leak reporting a little
> quieter.
>
> And yes, I just noticed that yours had advanced when I wrote the
> cover letter and ACKd it rather than dropping it.
>
> And no, nothing in the rest of the whole FSMonitor series depends
> on this, so I can leave my series based upon master rather than
> your branch.
>
> Thanks
> Jeff

Sounds good, thanks for clarifying.

In any case by the time you'll re-roll this (or soon thereafter) Junio
will probably have merged it down anyway.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Indeed, to be clear it was just general advice about queue-on-top.
>
> But to clarify what I was getting at here: If we just came up with the
> same diff I'd have assumed Jeff just hadn't need the change in "next",
> but since he clearly has I was confused by it being here.
>
> I.e. it doesn't *seem* like anything in the rest of the series depends
> on it, so why have it here at all since the bug is being fixed anyway?

I'd imagine that it was there just for the same reason series from
some people (yours included) tend to bloat, either over iterations
or from day one, by including "this is not necessary for the end
goal of this topic at all, but since I noticed it, I am including
this fix, which should be obvious enough" unrelated fix.

Here is a lesson to be learned.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 16, 2021

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

@@ -49,6 +49,9 @@ static enum ipc_active_state get_active_state(wchar_t *pipe_path)
if (GetLastError() == ERROR_FILE_NOT_FOUND)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Wed, Sep 15 2021, Jeff Hostetler via GitGitGadget wrote:

> @@ -112,6 +115,11 @@ static enum ipc_active_state connect_to_server(
>  				if (GetLastError() == ERROR_SEM_TIMEOUT)
>  					return IPC_STATE__NOT_LISTENING;
>  
> +				gle = GetLastError();
> +				trace2_data_intmax("ipc-debug", NULL,
> +						   "connect/waitpipe/gle",
> +						   (intmax_t)gle);
> +
>  				return IPC_STATE__OTHER_ERROR;
>  			}
>  

I've never used this Win32 API (or well, any Win32 API) but I'm guessing
that GetLastError() isn't here to check an error in GetLastError() itself.

Earlier in this function added in your 59c7b88198a (simple-ipc: add
win32 implementation, 2021-03-15) we assign to "gle", I'd really expect...:

>  	if (!SetNamedPipeHandleState(hPipe, &mode, NULL, NULL)) {
> +		gle = GetLastError();
> +		trace2_data_intmax("ipc-debug", NULL,
> +				   "connect/setpipestate/gle",
> +				   (intmax_t)gle);
> +
>  		CloseHandle(hPipe);
>  		return IPC_STATE__OTHER_ERROR;
>  	}
>  
>  	*pfd = _open_osfhandle((intptr_t)hPipe, O_RDWR|O_BINARY);
>  	if (*pfd < 0) {
> +		gle = GetLastError();
> +		trace2_data_intmax("ipc-debug", NULL,
> +				   "connect/openosfhandle/gle",
> +				   (intmax_t)gle);
> +
>  		CloseHandle(hPipe);
>  		return IPC_STATE__OTHER_ERROR;
>  	}

...something that looks exactly like this. I.e. as shown by the below
hunk-at-the-end, as it is I'm either missing some subtlety that could
really use explaining. I.e. this reads like:

    int saved_errno = errno;
    if (syscall()) {
        if (errno)
            die("bad");
        saved_errno = errno;
        log_it("...%d", saved_errno);
    }

When surely we want either of:

    int saved_errno = errno;
    if (syscall()) {
        if (errno)
            die("bad");
        log_it("...%d", errno);
    }

Or better yet (and consistent with the rest of your code):

    int saved_errno = errno;
    if (syscall()) {
        saved_errno = errno;
        if (saved_errno)
            die("bad");
        log_it("...%d", saved_errno);
    }

diff --git a/compat/simple-ipc/ipc-win32.c b/compat/simple-ipc/ipc-win32.c
index 8dc7bda087d..b0c422e4867 100644
--- a/compat/simple-ipc/ipc-win32.c
+++ b/compat/simple-ipc/ipc-win32.c
@@ -109,9 +109,12 @@ static enum ipc_active_state connect_to_server(
 			t_start_ms = (DWORD)(getnanotime() / 1000000);
 
 			if (!WaitNamedPipeW(wpath, timeout_ms)) {
-				if (GetLastError() == ERROR_SEM_TIMEOUT)
+				gle = GetLastError();
+				if (gle == ERROR_SEM_TIMEOUT)
 					return IPC_STATE__NOT_LISTENING;
 
+				/* ...rest of your patch */
+
 				return IPC_STATE__OTHER_ERROR;
 			}
 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 9/16/21 1:40 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Sep 15 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> @@ -112,6 +115,11 @@ static enum ipc_active_state connect_to_server(
>>   				if (GetLastError() == ERROR_SEM_TIMEOUT)
>>   					return IPC_STATE__NOT_LISTENING;
>>   
>> +				gle = GetLastError();
>> +				trace2_data_intmax("ipc-debug", NULL,
>> +						   "connect/waitpipe/gle",
>> +						   (intmax_t)gle);
>> +
>>   				return IPC_STATE__OTHER_ERROR;
>>   			}
>>   
>... 
 >
> @@ -109,9 +109,12 @@ static enum ipc_active_state connect_to_server(
>   			t_start_ms = (DWORD)(getnanotime() / 1000000);
>   
>   			if (!WaitNamedPipeW(wpath, timeout_ms)) {
> -				if (GetLastError() == ERROR_SEM_TIMEOUT)
> +				gle = GetLastError();
> +				if (gle == ERROR_SEM_TIMEOUT)
>   					return IPC_STATE__NOT_LISTENING;
>   
> +				/* ...rest of your patch */
> +
>   				return IPC_STATE__OTHER_ERROR;
>   			}
>   
> 

Yeah, I was just trying to minimize the size of the diff
and at the time was considering those debug messages to be
temporary, but I kind of like having them so it makes sense
to clean up a bit as you've indicated.

Thanks
Jeff

@@ -3,6 +3,8 @@
#include "strbuf.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Wed, Sep 15 2021, Jeff Hostetler via GitGitGadget wrote:

> +struct my_sa_data
> +{
> +	PSID pEveryoneSID;
> +	PACL pACL;
> +	PSECURITY_DESCRIPTOR pSD;
> +	LPSECURITY_ATTRIBUTES lpSA;
> +};
> +
> +static void init_sa(struct my_sa_data *d)
> +{
> +	memset(d, 0, sizeof(*d));
> +}
> +
> +static void release_sa(struct my_sa_data *d)
> +{
> +	if (d->pEveryoneSID)
> +		FreeSid(d->pEveryoneSID);
> +	if (d->pACL)
> +		LocalFree(d->pACL);
> +	if (d->pSD)
> +		LocalFree(d->pSD);
> +	if (d->lpSA)
> +		LocalFree(d->lpSA);
> +
> +	memset(d, 0, sizeof(*d));
> +}

[...]

> +fail:
> +	release_sa(d);
> +	return NULL;
> +}
> +
>  static HANDLE create_new_pipe(wchar_t *wpath, int is_first)
>  {
>  	HANDLE hPipe;
>  	DWORD dwOpenMode, dwPipeMode;
> -	LPSECURITY_ATTRIBUTES lpsa = NULL;
> +	struct my_sa_data my_sa_data;
> +
> +	init_sa(&my_sa_data);
>  

[...]

>  	hPipe = CreateNamedPipeW(wpath, dwOpenMode, dwPipeMode,
> -				 PIPE_UNLIMITED_INSTANCES, 1024, 1024, 0, lpsa);
> +				 PIPE_UNLIMITED_INSTANCES, 1024, 1024, 0,
> +				 my_sa_data.lpSA);
> +
> +	release_sa(&my_sa_data);
>  
>  	return hPipe;
>  }

Just a nit on the init pattern, since we always allocate this on the
stack (this as all the relevant "struct my_sa_data") I'd have thought to
see:

    #define INIT_MY_SA_DATA { 0 }
    [...]
    struct my_sa_data my_sa_data = INIT_MY_SA_DATA;

Which gets rid of the need for an init_sa() function.

Also having the release_sa() do a memset() is a bit odd, usually we have
a reset*() function do that if the intent is to re-use, but it doesn't
appear to be in this case, and we don't return this data anywhere, do
we?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 9/16/21 1:47 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Sep 15 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> +struct my_sa_data
>> +{
>> +	PSID pEveryoneSID;
>> +	PACL pACL;
>> +	PSECURITY_DESCRIPTOR pSD;
>> +	LPSECURITY_ATTRIBUTES lpSA;
>> +};
>> +
>> +static void init_sa(struct my_sa_data *d)
>> +{
>> +	memset(d, 0, sizeof(*d));
>> +}
>> +
>> +static void release_sa(struct my_sa_data *d)
>> +{
>> +	if (d->pEveryoneSID)
>> +		FreeSid(d->pEveryoneSID);
>> +	if (d->pACL)
>> +		LocalFree(d->pACL);
>> +	if (d->pSD)
>> +		LocalFree(d->pSD);
>> +	if (d->lpSA)
>> +		LocalFree(d->lpSA);
>> +
>> +	memset(d, 0, sizeof(*d));
>> +}
> 
> [...]
> 
>> +fail:
>> +	release_sa(d);
>> +	return NULL;
>> +}
>> +
>>   static HANDLE create_new_pipe(wchar_t *wpath, int is_first)
>>   {
>>   	HANDLE hPipe;
>>   	DWORD dwOpenMode, dwPipeMode;
>> -	LPSECURITY_ATTRIBUTES lpsa = NULL;
>> +	struct my_sa_data my_sa_data;
>> +
>> +	init_sa(&my_sa_data);
>>   
> 
> [...]
> 
>>   	hPipe = CreateNamedPipeW(wpath, dwOpenMode, dwPipeMode,
>> -				 PIPE_UNLIMITED_INSTANCES, 1024, 1024, 0, lpsa);
>> +				 PIPE_UNLIMITED_INSTANCES, 1024, 1024, 0,
>> +				 my_sa_data.lpSA);
>> +
>> +	release_sa(&my_sa_data);
>>   
>>   	return hPipe;
>>   }
> 
> Just a nit on the init pattern, since we always allocate this on the
> stack (this as all the relevant "struct my_sa_data") I'd have thought to
> see:
> 
>      #define INIT_MY_SA_DATA { 0 }
>      [...]
>      struct my_sa_data my_sa_data = INIT_MY_SA_DATA;
> 
> Which gets rid of the need for an init_sa() function.

The current "my_sa_data" is just a set of 4 pointers, so yes your
INIT_MY_SA_DATA macro and my init_sa function are equivalent.
(And assuming that mine and memset are inlined by the compiler, they
are both probably exactly equivalent.)   So it really doesn't matter
one way or the other.

> 
> Also having the release_sa() do a memset() is a bit odd, usually we have
> a reset*() function do that if the intent is to re-use, but it doesn't
> appear to be in this case, and we don't return this data anywhere, do
> we?
> 

I use the release_sa() function inside my "fail:" label in get_sa()
to cleanup if any of the component parts of the SA cannot be created.
So the return value of get_sa() is either fully constructed or contains
NULL pointers.

(The docs are jargon heavy and little obscure and circular and it
is not at all clear if/when we might fail to create some of these
sub-structures.)

So I allow the SA failure to be silently ignored and we create the named
pipe without an ACL.  Normally, this is fine since the daemon usually
isn't elevated.

In create_new_pipe we always call release_sa to free the pointers if
necessary.  (So in case of a failure in get_sa, we won't double free
the pointers when create_new_pipe calls release_sa.)

Another way to think of it is that in release_sa, I'd like to have
FREE_AND_NULL() variants for FreeSid() and LocalFree(), but a simple
memset is sufficient.


Jeff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Fri, Sep 17 2021, Jeff Hostetler wrote:

> On 9/16/21 1:47 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Sep 15 2021, Jeff Hostetler via GitGitGadget wrote:
>> 
>>> +struct my_sa_data
>>> +{
>>> +	PSID pEveryoneSID;
>>> +	PACL pACL;
>>> +	PSECURITY_DESCRIPTOR pSD;
>>> +	LPSECURITY_ATTRIBUTES lpSA;
>>> +};
>>> +
>>> +static void init_sa(struct my_sa_data *d)
>>> +{
>>> +	memset(d, 0, sizeof(*d));
>>> +}
>>> +
>>> +static void release_sa(struct my_sa_data *d)
>>> +{
>>> +	if (d->pEveryoneSID)
>>> +		FreeSid(d->pEveryoneSID);
>>> +	if (d->pACL)
>>> +		LocalFree(d->pACL);
>>> +	if (d->pSD)
>>> +		LocalFree(d->pSD);
>>> +	if (d->lpSA)
>>> +		LocalFree(d->lpSA);
>>> +
>>> +	memset(d, 0, sizeof(*d));
>>> +}
>> [...]
>> 
>>> +fail:
>>> +	release_sa(d);
>>> +	return NULL;
>>> +}
>>> +
>>>   static HANDLE create_new_pipe(wchar_t *wpath, int is_first)
>>>   {
>>>   	HANDLE hPipe;
>>>   	DWORD dwOpenMode, dwPipeMode;
>>> -	LPSECURITY_ATTRIBUTES lpsa = NULL;
>>> +	struct my_sa_data my_sa_data;
>>> +
>>> +	init_sa(&my_sa_data);
>>>   
>> [...]
>> 
>>>   	hPipe = CreateNamedPipeW(wpath, dwOpenMode, dwPipeMode,
>>> -				 PIPE_UNLIMITED_INSTANCES, 1024, 1024, 0, lpsa);
>>> +				 PIPE_UNLIMITED_INSTANCES, 1024, 1024, 0,
>>> +				 my_sa_data.lpSA);
>>> +
>>> +	release_sa(&my_sa_data);
>>>     	return hPipe;
>>>   }
>> Just a nit on the init pattern, since we always allocate this on the
>> stack (this as all the relevant "struct my_sa_data") I'd have thought to
>> see:
>>      #define INIT_MY_SA_DATA { 0 }
>>      [...]
>>      struct my_sa_data my_sa_data = INIT_MY_SA_DATA;
>> Which gets rid of the need for an init_sa() function.
>
> The current "my_sa_data" is just a set of 4 pointers, so yes your
> INIT_MY_SA_DATA macro and my init_sa function are equivalent.
> (And assuming that mine and memset are inlined by the compiler, they
> are both probably exactly equivalent.)   So it really doesn't matter
> one way or the other.

Yes it's the same to the compiler, see 5726a6b4012 (*.c *_init(): define
in terms of corresponding *_INIT macro, 2021-07-01).

But being the same to the compiler != the same to human readers. I was
going for the latter here, i.e. in git.git init with a macro tends to be
used if the init is simple enough to be run like that, and with a
function if it really does need to run some function (and then after
5726a6b4012 the init-via-function-via-macro for things that need init
after malloc() or whatever).

Whereas this one's just on the stack, and doesn't need anything special,
so the nit was about just preferring the simplest construct for the job.

>> Also having the release_sa() do a memset() is a bit odd, usually we
>> have
>> a reset*() function do that if the intent is to re-use, but it doesn't
>> appear to be in this case, and we don't return this data anywhere, do
>> we?
>> 
>
> I use the release_sa() function inside my "fail:" label in get_sa()
> to cleanup if any of the component parts of the SA cannot be created.
> So the return value of get_sa() is either fully constructed or contains
> NULL pointers.
>
> (The docs are jargon heavy and little obscure and circular and it
> is not at all clear if/when we might fail to create some of these
> sub-structures.)
>
> So I allow the SA failure to be silently ignored and we create the named
> pipe without an ACL.  Normally, this is fine since the daemon usually
> isn't elevated.
>
> In create_new_pipe we always call release_sa to free the pointers if
> necessary.  (So in case of a failure in get_sa, we won't double free
> the pointers when create_new_pipe calls release_sa.)
>
> Another way to think of it is that in release_sa, I'd like to have
> FREE_AND_NULL() variants for FreeSid() and LocalFree(), but a simple
> memset is sufficient.

*nod*, I meant if functions are doing that sort of intent-to-reuse we
 usually call them *_reset(), with a *_release() that's just a plain
 free() and forget.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 16, 2021

This patch series was integrated into seen via git@5248496.

@@ -9,6 +9,7 @@
#include "parse-options.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Wed, Sep 15 2021, Jeff Hostetler via GitGitGadget wrote:

[...]

> +	default:
> +	case SBGR_ERROR:
> +	case SBGR_CB_ERROR:
> +		return error("daemon failed to start");
>  
> -			/*
> -			 * We don't care about the WEXITSTATUS() nor
> -			 * any of the WIF*(status) values because
> -			 * `cmd__simple_ipc()` does the `!!result`
> -			 * trick on all function return values.
> -			 *
> -			 * So it is sufficient to just report the
> -			 * early shutdown as an error.
> -			 */
> -			return error(_("daemon failed to start"));
> -		}
> +	case SBGR_TIMEOUT:
> +		return error("daemon not online yet");
>  
> -		else
> -			return error(_("waitpid is confused"));
> +	case SBGR_DIED:
> +		return error("daemon terminated");
>  	}
>  }

It's not mentioned in the commit message, but the while-we're-at-it
dropping of _() makes sense here, it shouldn't have been used in a test
helper to begin with. I.e. translators don't need to be translating
stuff purely internal to the test suite.

@@ -1901,3 +1901,126 @@ void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Wed, Sep 15 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Create a variation of `run_command()` and `start_command()` to launch a command
> into the background and optionally wait for it to become "ready" before returning.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  run-command.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  run-command.h |  48 ++++++++++++++++++++
>  2 files changed, 171 insertions(+)
>
> diff --git a/run-command.c b/run-command.c
> index 3e4e082e94d..fe75fd08f74 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1901,3 +1901,126 @@ void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
>  	}
>  	strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
>  }
> +
> +enum start_bg_result start_bg_command(struct child_process *cmd,
> +				      start_bg_wait_cb *wait_cb,
> +				      void *cb_data,
> +				      unsigned int timeout_sec)
> +{
> +	enum start_bg_result sbgr = SBGR_ERROR;
> +	int ret;
> +	int wait_status;
> +	pid_t pid_seen;
> +	time_t time_limit;
> +
> +	/*
> +	 * Silently disallow child cleanup -- even if requested.
> +	 * The child process should persist in the background
> +	 * and possibly/probably after this process exits.  That
> +	 * is, don't kill the child during our atexit routine.
> +	 */
> +	cmd->clean_on_exit = 0;

Since we have no existing users, can we change this to:

if (cmd->clean_on_exit)
    BUG("start_bg_command() doesn't support non-zero clean_on_exit");

Just silently discarding what the caller asked for seems like the wrong
thing to do, why the silence?

> +
> +	ret = start_command(cmd);
> +	if (ret) {
> +		/*
> +		 * We assume that if `start_command()` fails, we
> +		 * either get a complete `trace2_child_start() /
> +		 * trace2_child_exit()` pair or it fails before the
> +		 * `trace2_child_start()` is emitted, so we do not
> +		 * need to worry about it here.
> +		 *
> +		 * We also assume that `start_command()` does not add
> +		 * us to the cleanup list.  And that it calls
> +		 * calls `child_process_clear()`.
> +		 */

These all look like sensible things to assume, but I think commentary /
writing on this would be much better just documenting that the
start_command() API does this in its comment in run-command.h, or
perhaps the comment starting with "The functions: child_process_init".

> +		sbgr = SBGR_ERROR;
> +		goto done;
> +	}
> +
> +	time(&time_limit);
> +	time_limit += timeout_sec;
> +
> +wait:
> +	pid_seen = waitpid(cmd->pid, &wait_status, WNOHANG);
> +
> +	if (pid_seen == 0) {
> +		/*
> +		 * The child is currently running.  Ask the callback
> +		 * if the child is ready to do work or whether we
> +		 * should keep waiting for it to boot up.
> +		 */
> +		ret = (*wait_cb)(cb_data, cmd);
> +		if (!ret) {
> +			/*
> +			 * The child is running and "ready".
> +			 *
> +			 * NEEDSWORK: As we prepare to orphan (release to
> +			 * the background) this child, it is not appropriate
> +			 * to emit a `trace2_child_exit()` event.  Should we
> +			 * create a new event for this case?
> +			 */
> +			sbgr = SBGR_READY;
> +			goto done;

Per api-trace2.txt:

    `"child_exit"`::
            This event is generated after the current process has returned
            from the waitpid() and collected the exit information from the
            child.

My (perhaps wrong) reading of that is that yes, we should do that, after
all if we've released the child are we otherwise going to hear from them
again in any way trace2 could log with a child_exit later?

Ditto the "commentary better elsewhere" whatever the result of this
discussion is, let's add a note to api-trace2.txt about how detached
children are handled by child_exit events.

> [...]
> +			 * NEEDSWORK: Like the "ready" case, should we
> +			 * log a custom child-something Trace2 event here?
> +			 */
> +			sbgr = SBGR_TIMEOUT;
> +			goto done;

[...]

> +		} else {
> +			/*
> +			 * The cb gave up on this child.
> +			 *
> +			 * NEEDSWORK: Like above, should we log a custom
> +			 * Trace2 child-something event here?
> +			 */
> +			sbgr = SBGR_CB_ERROR;
> +			goto done;

*ditto*

> +		}
> +	}
> +
> +	if (pid_seen == cmd->pid) {
> +		int child_code = -1;
> +
> +		/*
> +		 * The child started, but exited or was terminated
> +		 * before becoming "ready".
> +		 *
> +		 * We try to match the behavior of `wait_or_whine()`
> +		 * and convert the child's status to a return code for
> +		 * tracing purposes and emit the `trace2_child_exit()`
> +		 * event.
> +		 */
> +		if (WIFEXITED(wait_status))
> +			child_code = WEXITSTATUS(wait_status);
> +		else if (WIFSIGNALED(wait_status))
> +			child_code = WTERMSIG(wait_status) + 128;
> +		trace2_child_exit(cmd, child_code);

Although with the above: Perhaps I've missed some subtleties...

> [...]
> diff --git a/run-command.h b/run-command.h
> index af1296769f9..58065197d1b 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -496,4 +496,52 @@ int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
>   */
>  void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir);
>  
> +/**
> + * Possible return values for `start_bg_command()`.
> + */
> +enum start_bg_result {
> +	/* child process is "ready" */
> +	SBGR_READY = 0,

Clarity nit, whenever I see an explicit enum assignment I start hunting
for things that depend on the specific value, as we sometimes do, but
there appears to be nothing...

> +	/* child process could not be started */
> +	SBGR_ERROR,
> +
> +	/* callback error when testing for "ready" */
> +	SBGR_CB_ERROR,
> +
> +	/* timeout expired waiting for child to become "ready" */
> +	SBGR_TIMEOUT,
> +
> +	/* child process exited or was signalled before becomming "ready" */
> +	SBGR_DIED,

...I'd think if we were tweaking the value we'd want to put all the
error values at < 0, but I'm fine with this pattern of them being
positive, it encourages users to use switch/case & compiler checks, and
since it's all new users...

> + * This is a combination of `start_command()` and `finish_command()`, but

Doc nit: I think with whatever syntax standard we use for /** comments
that we prefer functions like_this() not quoted `like_this()`, that
being for values like `123` or whatever. But I'm just going by a quick
grep there & what Emacs is highlighting.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 16, 2021

User Jeff Hostetler <git@jeffhostetler.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 16, 2021

This patch series was integrated into seen via git@bc456a3.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 17, 2021

This patch series was integrated into seen via git@c69c7eb.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 17, 2021

This patch series was integrated into seen via git@8618c90.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 17, 2021

There was a status update in the "New Topics" section about the branch jh/builtin-fsmonitor-part1 on the Git mailing list:

Built-in fsmonitor (part 1).

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 18, 2021

User Carlo Arenas <carenas@gmail.com> has been added to the cc: list.

Create "child_ready" event to capture the state of a child process
created in the background.

When a child command is started a "child_start" event is generated in
the Trace2 log.  For normal synchronous children, a "child_exit" event
is later generated when the child exits or is terminated.  The two events
include information, such as the "child_id" and "pid", to allow post
analysis to match-up the command line and exit status.

When a child is started in the background (and may outlive the parent
process), it is not possible for the parent to emit a "child_exit"
event.  Create a new "child_ready" event to indicate whether the
child was successfully started.  Also include the "child_id" and "pid"
to allow similar post processing.

This will be used in a later commit with the new "start_bg_command()".

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Add `command_len` argument to the Simple IPC API.

In my original Simple IPC API, I assumed that the request would always
be a null-terminated string of text characters.  The `command`
argument was just a `const char *`.

I found a caller that would like to pass a binary command to the
daemon, so I am amending the Simple IPC API to receive `const char
*command, size_t command_len` arguments.

I considered changing the `command` argument to be a `void *`, but the
IPC layer simply passes it to the pkt-line layer which takes a `const
char *`, so to avoid confusion I left it as is.

Note, the response side has always been a `struct strbuf` which
includes the buffer and length, so we already support returning a
binary answer.  (Yes, it feels a little weird returning a binary
buffer in a `strbuf`, but it works.)

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
From: Carlo Marcelo Arenas Belón <carenas@gmail.com>

Move the declartion of the `enum ipc_active_state` type outside of
the SUPPORTS_SIMPLE_IPC ifdef.

A later commit will introduce the `fsmonitor_ipc__*()` API and stub in
a "mock" implementation that requires this enum in some function
signatures.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
@gitgitgadget
Copy link

gitgitgadget bot commented Sep 29, 2021

This patch series was integrated into seen via git@f4843ee.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 29, 2021

This patch series was integrated into seen via git@bfa6279.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 30, 2021

This patch series was integrated into seen via git@4397e7a.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 30, 2021

This patch series was integrated into seen via git@f26331a.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 1, 2021

There was a status update in the "Cooking" section about the branch jh/builtin-fsmonitor-part1 on the Git mailing list:

Built-in fsmonitor (part 1).

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 1, 2021

This patch series was integrated into seen via git@8fb49be.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 2, 2021

This patch series was integrated into seen via git@a4a84b8.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 2, 2021

This patch series was integrated into seen via git@43b0645.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 4, 2021

This patch series was integrated into seen via git@e5aaca7.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 4, 2021

There was a status update in the "Cooking" section about the branch jh/builtin-fsmonitor-part1 on the Git mailing list:

Built-in fsmonitor (part 1).

Will merge to 'next'.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 6, 2021

This patch series was integrated into seen via git@b8c52be.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 6, 2021

This patch series was integrated into next via git@021f633.

@gitgitgadget gitgitgadget bot added the next label Oct 6, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Oct 7, 2021

There was a status update in the "Cooking" section about the branch jh/builtin-fsmonitor-part1 on the Git mailing list:

Built-in fsmonitor (part 1).

Will merge to 'master'.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 7, 2021

This patch series was integrated into seen via git@cfc0578.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 8, 2021

This patch series was integrated into seen via git@b35b679.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 11, 2021

This patch series was integrated into seen via git@e4998dc.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 12, 2021

There was a status update in the "Cooking" section about the branch jh/builtin-fsmonitor-part1 on the Git mailing list:

Built-in fsmonitor (part 1).

Will merge to 'master'.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 12, 2021

This patch series was integrated into seen via git@07994bd.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 14, 2021

This patch series was integrated into seen via git@af303ee.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 14, 2021

This patch series was integrated into next via git@af303ee.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 14, 2021

This patch series was integrated into master via git@af303ee.

@gitgitgadget gitgitgadget bot added the master label Oct 14, 2021
@gitgitgadget gitgitgadget bot closed this Oct 14, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Oct 14, 2021

Closed via af303ee.

@@ -9,6 +9,7 @@
#include "parse-options.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Adam Dinwoodie wrote (reply to this):

On Monday 20 September 2021 at 03:36 pm +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Convert test helper to use `start_bg_command()` when spawning a server
> daemon in the background rather than blocks of platform-specific code.
> 
> Also, while here, remove _() translation around error messages since
> this is a test helper and not Git code.

As part of testing the v2.34.0-rc0 release candidate on Cygwin, I've
found this commit -- 05881a6fc9 (t/helper/simple-ipc: convert
test-simple-ipc to use start_bg_command, 2021-09-20), according to my
bisecting -- is causing t0052.1 to fail on 32-bit Cygwin.  Somehow this
is only affecting the 32-bit Cygwin build; the 64-bit build is working
as expected.

Specifically, the failure I'm seeing is as below:

```
$ GIT_TRACE=1 sh t0052-simple-ipc.sh -vix
trace: built-in: git init '/cygdrive/d/git-32/t/trash directory.t0052-simple-ipc'
Initialized empty Git repository in /cygdrive/d/git-32/t/trash directory.t0052-simple-ipc/.git/
expecting success of 0052.1 'start simple command server': 
	test_atexit stop_simple_IPC_server &&
	test-tool simple-ipc start-daemon --threads=8 &&
	test-tool simple-ipc is-active

++ test_atexit stop_simple_IPC_server
++ test 0 = 0
++ test_atexit_cleanup='{ stop_simple_IPC_server
		} && (exit "$eval_ret"); eval_ret=$?; :'
++ test-tool simple-ipc start-daemon --threads=8
trace: run_command: test-tool simple-ipc run-daemon --name=ipc-test --threads=8
error: daemon failed to start
error: last command exited with $?=1
not ok 1 - start simple command server
#	
#		test_atexit stop_simple_IPC_server &&
#		test-tool simple-ipc start-daemon --threads=8 &&
#		test-tool simple-ipc is-active
#	
++ stop_simple_IPC_server
++ test-tool simple-ipc stop-daemon
++ exit 1
++ eval_ret=1
++ :
```

I've had a look at the code changes, and cannot work out what might be
being handled differently in 32-bit and 64-bit Cygwin environments.
Given the Cygwin project is considering dropping support for 32-bit
Cygwin anyway, it might not be worth doing anything about this.  But I
thought it worth reporting in case there's something obvious to folk
more familiar with this code.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ramsay Jones wrote (reply to this):

Hi Adam,

On 04/11/2021 19:46, Adam Dinwoodie wrote:
> On Monday 20 September 2021 at 03:36 pm +0000, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Convert test helper to use `start_bg_command()` when spawning a server
>> daemon in the background rather than blocks of platform-specific code.
>>
>> Also, while here, remove _() translation around error messages since
>> this is a test helper and not Git code.
> 
> As part of testing the v2.34.0-rc0 release candidate on Cygwin, I've
> found this commit -- 05881a6fc9 (t/helper/simple-ipc: convert
> test-simple-ipc to use start_bg_command, 2021-09-20), according to my
> bisecting -- is causing t0052.1 to fail on 32-bit Cygwin.  Somehow this
> is only affecting the 32-bit Cygwin build; the 64-bit build is working
> as expected.

Hmmm, I am seeing exactly the same, but on 64-bit cygwin!

I haven't found time to look at this in detail yet (except for
what you have already done). Unfortunately, about an hour ago,
I did a 'make test' for the '-rc1' build, so I won't be able to
take a look for hours yet, ... :(

ATB,
Ramsay Jones

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff Hostetler wrote (reply to this):



On 11/4/21 3:46 PM, Adam Dinwoodie wrote:
> On Monday 20 September 2021 at 03:36 pm +0000, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Convert test helper to use `start_bg_command()` when spawning a server
>> daemon in the background rather than blocks of platform-specific code.
>>
>> Also, while here, remove _() translation around error messages since
>> this is a test helper and not Git code.
> 
> As part of testing the v2.34.0-rc0 release candidate on Cygwin, I've
> found this commit -- 05881a6fc9 (t/helper/simple-ipc: convert
> test-simple-ipc to use start_bg_command, 2021-09-20), according to my
> bisecting -- is causing t0052.1 to fail on 32-bit Cygwin.  Somehow this
> is only affecting the 32-bit Cygwin build; the 64-bit build is working
> as expected.
> 
> Specifically, the failure I'm seeing is as below:
> 
> ```
> $ GIT_TRACE=1 sh t0052-simple-ipc.sh -vix
> trace: built-in: git init '/cygdrive/d/git-32/t/trash directory.t0052-simple-ipc'
> Initialized empty Git repository in /cygdrive/d/git-32/t/trash directory.t0052-simple-ipc/.git/
> expecting success of 0052.1 'start simple command server':
> 	test_atexit stop_simple_IPC_server &&
> 	test-tool simple-ipc start-daemon --threads=8 &&
> 	test-tool simple-ipc is-active
> 
> ++ test_atexit stop_simple_IPC_server
> ++ test 0 = 0
> ++ test_atexit_cleanup='{ stop_simple_IPC_server
> 		} && (exit "$eval_ret"); eval_ret=$?; :'
> ++ test-tool simple-ipc start-daemon --threads=8
> trace: run_command: test-tool simple-ipc run-daemon --name=ipc-test --threads=8
> error: daemon failed to start
> error: last command exited with $?=1
> not ok 1 - start simple command server
> #	
> #		test_atexit stop_simple_IPC_server &&
> #		test-tool simple-ipc start-daemon --threads=8 &&
> #		test-tool simple-ipc is-active
> #	
> ++ stop_simple_IPC_server
> ++ test-tool simple-ipc stop-daemon
> ++ exit 1
> ++ eval_ret=1
> ++ :
> ```
> 
> I've had a look at the code changes, and cannot work out what might be
> being handled differently in 32-bit and 64-bit Cygwin environments.
> Given the Cygwin project is considering dropping support for 32-bit
> Cygwin anyway, it might not be worth doing anything about this.  But I
> thought it worth reporting in case there's something obvious to folk
> more familiar with this code.
> 

How odd!  Thanks for the report.  I'll investigate.
Jeff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Adam,

On Thu, 4 Nov 2021, Adam Dinwoodie wrote:

> On Monday 20 September 2021 at 03:36 pm +0000, Jeff Hostetler via GitGitGadget wrote:
> > From: Jeff Hostetler <jeffhost@microsoft.com>
> >
> > Convert test helper to use `start_bg_command()` when spawning a server
> > daemon in the background rather than blocks of platform-specific code.
> >
> > Also, while here, remove _() translation around error messages since
> > this is a test helper and not Git code.
>
> As part of testing the v2.34.0-rc0 release candidate on Cygwin, I've
> found this commit -- 05881a6fc9 (t/helper/simple-ipc: convert
> test-simple-ipc to use start_bg_command, 2021-09-20), according to my
> bisecting -- is causing t0052.1 to fail on 32-bit Cygwin.  Somehow this
> is only affecting the 32-bit Cygwin build; the 64-bit build is working
> as expected.
>
> Specifically, the failure I'm seeing is as below:
>
> ```
> $ GIT_TRACE=1 sh t0052-simple-ipc.sh -vix
> trace: built-in: git init '/cygdrive/d/git-32/t/trash directory.t0052-simple-ipc'
> Initialized empty Git repository in /cygdrive/d/git-32/t/trash directory.t0052-simple-ipc/.git/
> expecting success of 0052.1 'start simple command server':
> 	test_atexit stop_simple_IPC_server &&
> 	test-tool simple-ipc start-daemon --threads=8 &&
> 	test-tool simple-ipc is-active
>
> ++ test_atexit stop_simple_IPC_server
> ++ test 0 = 0
> ++ test_atexit_cleanup='{ stop_simple_IPC_server
> 		} && (exit "$eval_ret"); eval_ret=$?; :'
> ++ test-tool simple-ipc start-daemon --threads=8
> trace: run_command: test-tool simple-ipc run-daemon --name=ipc-test --threads=8
> error: daemon failed to start
> error: last command exited with $?=1
> not ok 1 - start simple command server
> #
> #		test_atexit stop_simple_IPC_server &&
> #		test-tool simple-ipc start-daemon --threads=8 &&
> #		test-tool simple-ipc is-active
> #
> ++ stop_simple_IPC_server
> ++ test-tool simple-ipc stop-daemon
> ++ exit 1
> ++ eval_ret=1
> ++ :
> ```
>
> I've had a look at the code changes, and cannot work out what might be
> being handled differently in 32-bit and 64-bit Cygwin environments.
> Given the Cygwin project is considering dropping support for 32-bit
> Cygwin anyway, it might not be worth doing anything about this.  But I
> thought it worth reporting in case there's something obvious to folk
> more familiar with this code.

I had a look at this and could reproduce... partially. I only managed to
make it fail every once in a while.

Digging deeper, it turns out that the `lstat()` call in
`ipc_get_active_state()` does not receive an `st_mode` indicating a
socket, but rather a file (in my tests, it was usually 0100644, but
sometimes even 0100755).

The reason is, of course, that Cygwin _emulates_ Unix sockets. What is in
the file system is just a special file, it is marked with the `system` bit
(which only exists on Windows), and its contents start with the tell-tale
`!<socket>`.

And as you might have guessed, there is a race going on between Cygwin
writing that file _and_ flipping that `system` bit, and Git trying to
access the Unix socket and encountering an unexpected file.

Now, why this only happens in your 32-bit setup, I have no idea.

In my tests, the following patch works around the issue. Could I ask you
to test it in your environment?

-- snip --
diff --git a/compat/simple-ipc/ipc-unix-socket.c
b/compat/simple-ipc/ipc-unix-socket.c
index 4e28857a0a..1c591b2adf 100644
--- a/compat/simple-ipc/ipc-unix-socket.c
+++ b/compat/simple-ipc/ipc-unix-socket.c
@@ -36,6 +36,23 @@ enum ipc_active_state ipc_get_active_state(const char
*path)
 	}

 	/* also complain if a plain file is in the way */
+#ifdef __CYGWIN__
+	{
+		static const int delay[] = { 1, 10, 20, 40, -1 };
+		int i;
+
+		for (i = 0; S_ISREG(st.st_mode) && delay[i] > 0; i++) {
+			/*
+			 * Cygwin might still be in the process of marking the
+			 * underlying file as a system file.
+			 */
+			sleep_millisec(delay[i]);
+			if (lstat(path, &st) == -1)
+				return IPC_STATE__INVALID_PATH;
+		}
+	}
+#endif
+
 	if ((st.st_mode & S_IFMT) != S_IFSOCK)
 		return IPC_STATE__INVALID_PATH;

-- snap --

FWIW it looks as if the loop might be a bit of an overkill, as I could not
get the code to need more than a single one-millisecond sleep, but it's
probably safer to just keep the delay loop in place as-is.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ramsay Jones wrote (reply to this):



On 08/11/2021 23:59, Johannes Schindelin wrote:
[snip]
> I had a look at this and could reproduce... partially. I only managed to
> make it fail every once in a while.
> 
> Digging deeper, it turns out that the `lstat()` call in
> `ipc_get_active_state()` does not receive an `st_mode` indicating a
> socket, but rather a file (in my tests, it was usually 0100644, but
> sometimes even 0100755).
> 
> The reason is, of course, that Cygwin _emulates_ Unix sockets. What is in
> the file system is just a special file, it is marked with the `system` bit
> (which only exists on Windows), and its contents start with the tell-tale
> `!<socket>`.
> 
> And as you might have guessed, there is a race going on between Cygwin
> writing that file _and_ flipping that `system` bit, and Git trying to
> access the Unix socket and encountering an unexpected file.
> 
> Now, why this only happens in your 32-bit setup, I have no idea.
> 
> In my tests, the following patch works around the issue. Could I ask you
> to test it in your environment?

Just FYI, I just tried the patch below (on 64-bit cygwin) and this test
now works fine for me. (well, run 5 times by hand - not with --stress).

This is on windows 10 21H1 and cygwin:

  $ uname -a
  CYGWIN_NT-10.0 satellite 3.3.2(0.341/5/3) 2021-11-08 16:55 x86_64 Cygwin
  $ 

[Yes, I updated last night!]

ATB,
Ramsay Jones

> -- snip --
> diff --git a/compat/simple-ipc/ipc-unix-socket.c
> b/compat/simple-ipc/ipc-unix-socket.c
> index 4e28857a0a..1c591b2adf 100644
> --- a/compat/simple-ipc/ipc-unix-socket.c
> +++ b/compat/simple-ipc/ipc-unix-socket.c
> @@ -36,6 +36,23 @@ enum ipc_active_state ipc_get_active_state(const char
> *path)
>  	}
> 
>  	/* also complain if a plain file is in the way */
> +#ifdef __CYGWIN__
> +	{
> +		static const int delay[] = { 1, 10, 20, 40, -1 };
> +		int i;
> +
> +		for (i = 0; S_ISREG(st.st_mode) && delay[i] > 0; i++) {
> +			/*
> +			 * Cygwin might still be in the process of marking the
> +			 * underlying file as a system file.
> +			 */
> +			sleep_millisec(delay[i]);
> +			if (lstat(path, &st) == -1)
> +				return IPC_STATE__INVALID_PATH;
> +		}
> +	}
> +#endif
> +
>  	if ((st.st_mode & S_IFMT) != S_IFSOCK)
>  		return IPC_STATE__INVALID_PATH;
> 
> -- snap --
> 
> FWIW it looks as if the loop might be a bit of an overkill, as I could not
> get the code to need more than a single one-millisecond sleep, but it's
> probably safer to just keep the delay loop in place as-is.
> 
> Ciao,
> Dscho
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ramsay Jones wrote (reply to this):



On 10/11/2021 12:27, Johannes Schindelin wrote:
[snip]
>> I cannot give the change any better test than they can, and it is
>> their platform to improve, or break by accident while trying to do
>> so.
> 
> Right. I tested this as well as I could, via the `--stress` option, and am
> fairly confident that it is correct. Since the patch touches only
> `simply-ipc` code, the only test that could possibly affected is t0052,
> and it passes with `--stress` over here (when it failed without the
> patch).
> 
> Ciao,
> Dscho
> 
> P.S.: in case you wondered, no, I did not run the entire test suite. With
> the performance characteristics of the POSIX emulation provided by the
> Cygwin runtime, this would simply take too long. It's not the first time I
> wish our test suite was more efficient, across _all_ supported platforms.


[I seem to have lost Adam's reply about this now being Hunky-Dory, but ...]

I ran the test-suite on -rc2 on thursday night (note _not_ rc2 + this patch)
and it deadlocked on me; I didn't notice for 4 hours, so (in the early hours)
I simply Ctl+C-ed it and went to bed. I haven't had the test-suite deadlock
for many many years - I've been spoilt! ;-)

I tried -rc2 again last night; this time it finished, but I gained another
test failure: t0301-credential-cache.sh. I have _never_ had this test fail
before, so that was unexpected. :(

[Yes, t0052-simple-ipc.sh failed as expected, since this patch was not
applied].

Also, I was half expecting a small speed-up due to the new pipe code in
v3.3.2 of the cygwin dll, but it actually took an hour longer than normal. :(

The only change to my setup, between -rc1 and -rc2, was the cygwin update
to v3.3.2, so this may point to some more fallout from the new pipe code
(maybe?).

Anyway, I haven't even looked at the new failure (see below), which we will
probably not have time to fix before release, so I am just now building
current master (v2.34.0-rc2-16-g5a73c6bdc7) to give that a try. (So, I won't
have anything to report until tomorrow).

Just FYI:

  $ ./t0301-credential-cache.sh
  ...
  not ok 13 - socket defaults to ~/.cache/git/credential/socket
  #
  #               test_when_finished "
  #                       git credential-cache exit &&
  #                       rmdir -p .cache/git/credential/
  #               " &&
  #               test_path_is_missing "$HOME/.git-credential-cache" &&
  #               test_path_is_socket "$HOME/.cache/git/credential/socket"
  #
  ...
  not ok 26 - use custom XDG_CACHE_HOME if set and default sockets are not created
  #
  #               test_when_finished "git credential-cache exit" &&
  #               test_path_is_socket "$XDG_CACHE_HOME/git/credential/socket" &&
  #               test_path_is_missing "$HOME/.git-credential-cache/socket" &&
  #               test_path_is_missing "$HOME/.cache/git/credential/socket"
  #
  not ok 27 - credential-cache --socket option overrides default location
  #
  #               test_when_finished "
  #                       git credential-cache exit --socket \"\$HOME/dir/socket\" &&
  #                       rmdir \"\$HOME/dir\"
  #               " &&
  #               check approve "cache --socket \"\$HOME/dir/socket\"" <<-\EOF &&
  #               protocol=https
  #               host=example.com
  #               username=store-user
  #               password=store-pass
  #               EOF
  #               test_path_is_socket "$HOME/dir/socket"
  #
  not ok 28 - use custom XDG_CACHE_HOME even if xdg socket exists
  #
  #               test_when_finished "
  #                       git credential-cache exit &&
  #                       sane_unset XDG_CACHE_HOME
  #               " &&
  #               check approve cache <<-\EOF &&
  #               protocol=https
  #               host=example.com
  #               username=store-user
  #               password=store-pass
  #               EOF
  #               test_path_is_socket "$HOME/.cache/git/credential/socket" &&
  #               XDG_CACHE_HOME="$HOME/xdg" &&
  #               export XDG_CACHE_HOME &&
  #               check approve cache <<-\EOF &&
  #               protocol=https
  #               host=example.com
  #               username=store-user
  #               password=store-pass
  #               EOF
  #               test_path_is_socket "$XDG_CACHE_HOME/git/credential/socket"
  #
  not ok 29 - use user socket if user directory exists
  #
  #               test_when_finished "
  #                       git credential-cache exit &&
  #                       rmdir \"\$HOME/.git-credential-cache/\"
  #               " &&
  #               mkdir -p "$HOME/.git-credential-cache/" &&
  #               chmod 700 "$HOME/.git-credential-cache/" &&
  #               check approve cache <<-\EOF &&
  #               protocol=https
  #               host=example.com
  #               username=store-user
  #               password=store-pass
  #               EOF
  #               test_path_is_socket "$HOME/.git-credential-cache/socket"
  #
  not ok 30 - use user socket if user directory is a symlink to a directory
  #
  #               test_when_finished "
  #                       git credential-cache exit &&
  #                       rmdir \"\$HOME/dir/\" &&
  #                       rm \"\$HOME/.git-credential-cache\"
  #               " &&
  #               mkdir -p -m 700 "$HOME/dir/" &&
  #               ln -s "$HOME/dir" "$HOME/.git-credential-cache" &&
  #               check approve cache <<-\EOF &&
  #               protocol=https
  #               host=example.com
  #               username=store-user
  #               password=store-pass
  #               EOF
  #               test_path_is_socket "$HOME/.git-credential-cache/socket"
  #
  ok 31 - helper (cache --timeout=1) times out
  # failed 6 among 31 test(s)
  1..31
  $
  
ATB,
Ramsay Jones

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ramsay Jones wrote (reply to this):



On 13/11/2021 19:11, Ramsay Jones wrote:
[snip]
> The only change to my setup, between -rc1 and -rc2, was the cygwin update
> to v3.3.2, so this may point to some more fallout from the new pipe code
> (maybe?).
> 
> Anyway, I haven't even looked at the new failure (see below), which we will
> probably not have time to fix before release, so I am just now building
> current master (v2.34.0-rc2-16-g5a73c6bdc7) to give that a try. (So, I won't
> have anything to report until tomorrow).

So, current 'master' fixes t0052-simple-ipc.sh, which is good, but the
t0301-credential-cache.sh test is still failing. Also, I can confirm
that cygwin v3.3.2 adds an additional hour to a test-suite run. :(

[ie it now takes 6 hours rather than 5 hours to run - I remember a time
when it used to only take 2 hours; those were the days!]

ATB,
Ramsay Jones


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Adam Dinwoodie wrote (reply to this):

On Sat, 13 Nov 2021 at 19:11, Ramsay Jones <ramsay@ramsayjones.plus.com> wrote:
> On 10/11/2021 12:27, Johannes Schindelin wrote:
> [snip]
> >> I cannot give the change any better test than they can, and it is
> >> their platform to improve, or break by accident while trying to do
> >> so.
> >
> > Right. I tested this as well as I could, via the `--stress` option, and am
> > fairly confident that it is correct. Since the patch touches only
> > `simply-ipc` code, the only test that could possibly affected is t0052,
> > and it passes with `--stress` over here (when it failed without the
> > patch).
> >
> > Ciao,
> > Dscho
> >
> > P.S.: in case you wondered, no, I did not run the entire test suite. With
> > the performance characteristics of the POSIX emulation provided by the
> > Cygwin runtime, this would simply take too long. It's not the first time I
> > wish our test suite was more efficient, across _all_ supported platforms.
>
>
> [I seem to have lost Adam's reply about this now being Hunky-Dory, but ...]
>
> I ran the test-suite on -rc2 on thursday night (note _not_ rc2 + this patch)
> and it deadlocked on me; I didn't notice for 4 hours, so (in the early hours)
> I simply Ctl+C-ed it and went to bed. I haven't had the test-suite deadlock
> for many many years - I've been spoilt! ;-)
>
> I tried -rc2 again last night; this time it finished, but I gained another
> test failure: t0301-credential-cache.sh. I have _never_ had this test fail
> before, so that was unexpected. :(
>
> [Yes, t0052-simple-ipc.sh failed as expected, since this patch was not
> applied].
>
> Also, I was half expecting a small speed-up due to the new pipe code in
> v3.3.2 of the cygwin dll, but it actually took an hour longer than normal. :(
>
> The only change to my setup, between -rc1 and -rc2, was the cygwin update
> to v3.3.2, so this may point to some more fallout from the new pipe code
> (maybe?).
>
> Anyway, I haven't even looked at the new failure (see below), which we will
> probably not have time to fix before release, so I am just now building
> current master (v2.34.0-rc2-16-g5a73c6bdc7) to give that a try. (So, I won't
> have anything to report until tomorrow).

I'm seeing the same failure. It isn't caused by a change in Git --
I've rebuilt and re-run the test on old versions where that test was
passing, and it's now failing -- so this is clearly something in the
Cygwin environment. I've not investigated further, but it's clearly
caused by a Cygwin change rather than a Git change, so I don't think
there's any reason to hold up the Git release.

(I should probably report it on the Cygwin mailing list, but I haven't
got around to that yet...)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi,

On Fri, 12 Nov 2021, Adam Dinwoodie wrote:

> On Fri, 12 Nov 2021 at 16:01, Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Out of curiosity, are the use cases and user base of Cygwin waning,
> > or are there still viable cases where Cygwin is a more preferred
> > solution over WSL (the question is not limited to use of Git)?
>
> No formal research here, just impressions as someone who has used
> Cygwin for a long time and who hangs out on the Cygwin mailing list:
> for a lot of use cases, WSL is at least as good, if not better, than
> Cygwin. There are a few areas where Cygwin is still a better solution,
> though:
>
> - WSL requires essentially installing an entire operating system. Disk
> space is relatively cheap, so that's not nearly the obstacle it used
> to be, but it is an obstacle. This is more relevant for people who
> want to distribute packaged installers to Windows users: most
> non-technical users won't want to get WSL working, but if you've
> written code for *nix and don't want to port it manually to Windows,
> it's relatively straightforward to compile it using Cygwin and bundle
> the cygwin1.dll file with the installer. That'll mostly get your code
> working with a user experience that doesn't differ too much from a
> fully native Windows application. (This is essentially what Git for
> Windows is doing, albeit with an increasingly distant Cygwin fork.)
>
> - There are some functions that Cygwin offers that WSL doesn't. The
> key one for me is the ability to access Windows network file shares,
> which WSL doesn't support (or at least didn't last time I checked). I
> expect some of these gaps will disappear as WSL gets more features,
> but I expect some of them are fairly fundamental restrictions: Cygwin
> applications can have code specifically to handle the fact that
> there's a Windows OS there, so they can -- with care -- interact with
> the Windows OS directly to (say) use Windows file access APIs or the
> Windows clipboard. WSL applications generally don't have that ability;
> if I install something from apt on my Debian WSL installation, it'll
> pull exactly the same binary as if I'd installed it on a normal Debian
> system. I guess in theory people could write code to detect that
> they're running in WSL and handle that specially, in the same way that
> it's normally possible to detect and handle when you're running in a
> VM versus running on bare metal. I expect that'll be much less common,
> though, just as Git has code for handling Cygwin specially but doesn't
> have code for handling Linux-within-WSL specially, even though both
> could be used to access a Git repository stored in the same Windows
> NTFS directory.

I would like to add two additional scenarios where Cygwin needs to be used
instead of WSL:

- In order to install WSL, you need to switch your machine to Developer
  Mode, which requires administrative privileges (which not evey developer
  enjoys, and given recent and not so recent security news, maybe that's a
  good thing, too). Cygwin does not require administrative privileges.

- There is (finally!) a way to run graphical Linux applications in WSL,
  but it requires Windows 11. That excludes many existing Windows users.

So yeah, I think Cygwin is here to stay. Besides, as long as I don't find
any better way to have a POSIX-compliant shell (Git will continue to
depend on one for a long, long time, I expect), Git for Windows will
indirectly _have_ to depend on Cygwin (Git for Windows bundles a Bash
using the MSYS2 runtime, which is a very close fork of the Cygwin
runtime).

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Adam,

On Fri, 12 Nov 2021, Adam Dinwoodie wrote:

> [...] the (lack of) speed of running the Git test suite on Cygwin
> is one of the reasons I run the tests on high-spec Azure VMs rather
> than my own systems. Unfortunately the Cygwin compatibility layer plus
> the overheads of NTFS mean things are unlikely to get significantly
> quicker any time soon, and between WSL and Git for Windows, I expect
> interest in improving Cygwin's performance is going to continue to
> wane.

Well, at least from the Git point of view, there is still _some_ hope. At
the Git Contributor Summit, we talked (very, very briefly) about moving
parts of Git's test infrastructure from shell code to C.

This would definitely help not only save electricity (and we all _do_ have
to get used to the idea that we cannot continue spending as much energy as
we do right now) when running the CI/PR builds, but also accelerate
running the test suite on Cygwin (or for that matter, I suspect HP NonStop
to be helped tremendously, too).

Ciao,
Dscho

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 4, 2021

User Adam Dinwoodie <adam@dinwoodie.org> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 4, 2021

User Ramsay Jones <ramsay@ramsayjones.plus.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 9, 2021

User Johannes Schindelin <Johannes.Schindelin@gmx.de> has been added to the cc: list.

@jeffhostetler jeffhostetler deleted the builtin-fsmonitor-part1 branch June 21, 2023 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant