ci: make Git's GitHub workflow output much more helpful #1117

Closed
wants to merge 12 commits into from

Conversation

dscho
Member

@dscho dscho commented Jan 18, 2022

Changes since v2:

  • Logs for successful test cases are no longer shown, which improves the time to load pages (thanks Victoria!).
  • The preamble for each test case is no longer shown twice (thanks Victoria!).
  • We now explicitly mention where the full logs can be found.
  • Some patches were reordered to make the story line of this patch series more coherent.
  • Rebased onto main to resolve merge conflicts with ab/test-tap-fix-for-immediate.

I cannot thank Victoria enough for the thorough investigation; it was exactly what I had hoped for, and if I had not been pulled in too many directions at once, I would have incorporated her suggestions and provided a new iteration much earlier.

It might not be all bad that this iteration had to wait a little longer, though: In the meantime, the errors on the summary page are now deep-linked into the part of the logs where the corresponding error message was generated (just click on the job name above the error message).

Note: I tried to add another patch that would turn GCC's compile errors into GitHub workflow commands that would list the error messages on the summary page. However, that would have required piping the output of make through a sed call, which in turn would have required set -o pipefail (which is not supported by all the shells that are used in our CI). I even dabbled with using process substitution, but that made things even worse: the sed process would continue outputting after make was finished and after the ::endgroup:: command, meaning that the output was garbled. I'll probably continue investigating at some stage, but for now I'll call my time-boxed experiment a wash.
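To illustrate the exit-status problem described in the note above, here is a minimal sketch. It is not part of the patch series: `fake_make` is a hypothetical stand-in for a failing `make`, the `sed` expression is only an example of rewriting GCC-style errors into `::error::` workflow commands, and the status-file workaround (which assumes `mktemp` is available) is one commonly used portable trick rather than something this series adopts.

```shell
# Hypothetical stand-in for a failing `make`: prints a GCC-style error, exits 2.
fake_make () {
    echo "foo.c:3:1: error: expected ';' before '}' token"
    return 2
}

# The problem: without `set -o pipefail`, the pipeline's exit status is
# that of `sed`, so the build failure is silently lost.
lossy () {
    fake_make | sed 's/^\(.*: error: .*\)$/::error::\1/'
}

# One portable workaround (an assumption, not what the series does):
# stash the producer's exit status in a temporary file.
careful () {
    status_file=$(mktemp) &&
    { rc=0; fake_make || rc=$?; echo "$rc" >"$status_file"; } |
        sed 's/^\(.*: error: .*\)$/::error::\1/'
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}
```

Calling `lossy` reports success even though the stand-in build failed, whereas `careful` prints the same rewritten output and returns the original exit code.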

Changes since v1:

  • In the patch that removed MAKE_TARGETS, a stale comment about that variable is also removed.
  • The comment about set -x has been adjusted because it no longer applies as-is.
  • The commit message of "ci: make it easier to find failed tests' logs in the GitHub workflow" has been adjusted to motivate the improvement better.

Background

Using CI and in general making it easier for new contributors is an area I'm passionate about, and one I'd like to see improved.

The current situation

Let me walk you through the current experience when a PR build fails: I get a notification mail that only says that a certain job failed. There's no indication of which test failed (or was it the build?). I can click on a link and it takes me to the workflow run. Once there, all it says is "Process completed with exit code 1", or even "code 2". Sure, I can click on one of the failed jobs. It even expands the failed step's log (collapsing the other steps). And what do I see there?

Let's look at an example of a failed linux-clang (ubuntu-latest) job:

[...]
Test Summary Report
-------------------
t1092-sparse-checkout-compatibility.sh           (Wstat: 256 Tests: 53 Failed: 1)
  Failed test:  49
  Non-zero exit status: 1
t3701-add-interactive.sh                         (Wstat: 0 Tests: 71 Failed: 0)
  TODO passed:   45, 47
Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr  1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU)
Result: FAIL
make[1]: *** [Makefile:53: prove] Error 1
make[1]: Leaving directory '/home/runner/work/git/git/t'
make: *** [Makefile:3018: test] Error 2

That's it. I count myself lucky not to be a new contributor being faced with something like this.

Now, since I am active in the Git project for a couple of days or so, I can make sense of the "TODO passed" label and know that for the purpose of fixing the build failures, I need to ignore this, and that I need to focus on the "Failed test" part instead.

I also know that I do not have to get myself an ubuntu-latest box just to reproduce the error, I do not even have to check out the code and run it just to learn what that "49" means.

I know, and I do not expect any new contributor, not even most seasoned contributors, to know, that I have to patiently collapse the "Run ci/run-build-and-tests.sh" step's log, and instead expand the "Run ci/print-test-failures.sh" step's log (which did not fail and hence does not draw any attention to itself).

I know, and again: I do not expect many others to know this, that I then have to click into the "Search logs" box (not the regular web browser's search via Ctrl+F!) and type in "not ok" to find the log of the failed test case (and this might still be a "known broken" one that is marked via test_expect_failure and once again needs to be ignored).

To be excessively clear: This is not a great experience!
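For readers who download the logs, the manual search described above can be approximated on the command line. This is only a sketch under stated assumptions: `sample_log` is a hypothetical excerpt, and the filter assumes that known-broken test cases are reported with a trailing `# TODO known breakage` marker.

```shell
# Hypothetical excerpt of a verbose test log.
sample_log () {
    printf '%s\n' \
        'ok 47 - some passing test' \
        'not ok 48 - a known-broken test # TODO known breakage' \
        'not ok 49 - the test case that actually failed'
}

# Keep only the real failures: "not ok" lines that are not marked
# as known breakages.
real_failures () {
    grep '^not ok ' | grep -v '# TODO known breakage$'
}

sample_log | real_failures
```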

Improved output

Our previous Azure Pipelines-based CI builds had a much nicer UI, one that even showed flaky tests, and trends e.g. how long the test cases ran. When I ported Git's CI over to GitHub workflows (to make CI more accessible to new contributors), I knew full well that we would leave this very nice UI behind, and I had hoped that we would get something similar back via new, community-contributed GitHub Actions that can be used in GitHub workflows. However, most likely because we use a home-grown test framework implemented in opinionated POSIX shell scripts, that did not happen.

So I had a look at what standards exist e.g. when testing PowerShell modules, in the way of marking up their test output in GitHub workflows, and I was not disappointed: GitHub workflows support "grouping" of output lines, i.e. marking sections of the output as a group that is then collapsed by default and can be expanded. And it is this feature I've decided to use in this patch series, along with GitHub workflows' commands to display errors or notices that are also shown on the summary page of the workflow run. Now, in addition to "Process completed with exit code" on the summary page, we also read something like:

⊗ linux-clang (ubuntu-latest)
   failed: t3400.22 rebase --apply -q is quiet

Even better, this message is a link, and following that, the reader is presented with something like this:

[...]
=== Failed test: t3420-rebase-autostash ===
The full logs are in the artifacts attached to this run.
Error: failed: t3420.12 rebase --apply: --quit
⏵ failure: t3420.12 rebase --apply: --quit 
Error: failed: t3420.13 rebase --apply: non-conflicting rebase, conflicting stash
⏵ failure: t3420.13 rebase --apply: non-conflicting rebase, conflicting stash 
Error: failed: t3420.14 rebase --apply: check output with conflicting stash
⏵ failure: t3420.14 rebase --apply: check output with conflicting stash 
Error: failed: t3420.23 rebase --merge: --quit
⏵ failure: t3420.23 rebase --merge: --quit 
Error: failed: t3420.24 rebase --merge: non-conflicting rebase, conflicting stash
⏵ failure: t3420.24 rebase --merge: non-conflicting rebase, conflicting stash 
Error: failed: t3420.25 rebase --merge: check output with conflicting stash
⏵ failure: t3420.25 rebase --merge: check output with conflicting stash 
Error: failed: t3420.34 rebase --interactive: --quit
⏵ failure: t3420.34 rebase --interactive: --quit 
Error: failed: t3420.35 rebase --interactive: non-conflicting rebase, conflicting stash
⏵ failure: t3420.35 rebase --interactive: non-conflicting rebase, conflicting stash 
Error: failed: t3420.36 rebase --interactive: check output with conflicting stash
⏵ failure: t3420.36 rebase --interactive: check output with conflicting stash 
Error: failed: t3420.39 autostash is saved on editor failure with conflict
⏵ failure: t3420.39 autostash is saved on editor failure with conflict 
[...]

The "Failed test:" lines are colored in yellow to give a better visual clue about the logs' structure, the "Error:" label is colored in red to draw attention to the important part of the log, and the "⏵" characters indicate that part of the log is collapsed and can be expanded by clicking on it.
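As a rough illustration of how such output can be produced, the sketch below uses GitHub's `::group::`, `::endgroup::` and `::error::` workflow commands. The trace later in this thread shows a `group Build make` call, but the function bodies here are assumptions for illustration, and `report_failure` is a hypothetical name; the actual patches may differ.

```shell
# Sketch of a helper that runs a command with its output wrapped in a
# collapsible group, preserving the command's exit code.
group () {
    title=$1
    shift
    echo "::group::$title"
    rc=0
    "$@" || rc=$?
    echo "::endgroup::"
    return "$rc"
}

# Hypothetical helper: emit an error annotation, which is also listed
# on the summary page of the workflow run.
report_failure () {
    echo "::error::failed: $1"
}

report_failure "t3420.12 rebase --apply: --quit"
group "failure: t3420.12 rebase --apply: --quit" \
    echo "(verbose log of the failed test case)"
```

Outside of GitHub workflows these commands are just printed verbatim, so a real implementation would presumably only emit them when running in that environment.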

To drill down, the reader merely needs to expand the test case's log by clicking on it, and then study the log. If needed (e.g. when the test case relies on side effects from previous test cases), the logs of preceding test cases can be expanded as well. If the full logs are needed, including those of the successful test cases, they are available in the artifacts that are attached to the CI/PR run.

Is this the best UI we can have for test failures in CI runs? I hope we can do better. Having said that, this patch series presents a pretty good start, and offers a basis for future improvements.

cc: Eric Sunshine sunshine@sunshineco.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Phillip Wood phillip.wood123@gmail.com
cc: Victoria Dye vdye@github.com

@dscho dscho self-assigned this Jan 18, 2022
@dscho
Member Author

dscho commented Jan 20, 2022

/preview

@gitgitgadget

gitgitgadget bot commented Jan 20, 2022

Preview email sent as pull.1117.git.1642696986.gitgitgadget@gmail.com

@dscho
Member Author

dscho commented Jan 24, 2022

/submit

@gitgitgadget

gitgitgadget bot commented Jan 24, 2022

Submitted as pull.1117.git.1643050574.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1117/dscho/use-grouping-in-ci-v1

To fetch this version to local tag pr-1117/dscho/use-grouping-in-ci-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1117/dscho/use-grouping-in-ci-v1

@@ -10,7 +10,7 @@ windows*) cmd //c mklink //j t\\.prove "$(cygpath -aw "$cache_dir/.prove")";;
*) ln -s "$cache_dir/.prove" t/.prove;;

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Mon, Jan 24, 2022 at 3:02 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> In the web UI of GitHub workflows, failed runs are presented with the
> job step that failed auto-expanded. In the current setup, this is not
> helpful at all because that shows only the output of `prove`, which says
> which test failed, but not in what way.
>
> What would help the reader understand what went wrong is the verbose
> test output of the failed test.
>
> The logs of the failed runs do contain that verbose test output, but it
> is shown in the _next_ step (which is marked as succeeding, and is
> therefore _not_ auto-expanded). Anyone not intimately familiar with this
> would completely miss the verbose test output, being left mostly
> puzzled with the test failures.
>
> We are about to show the failed test cases' output in the _same_ step,
> so that the user has a much easier time to figure out what was going
> wrong.
>
> But first, we must partially revert the change that tried to improve the
> CI runs by combining the `Makefile` targets to build into a single
> `make` invocation. That might have sounded like a good idea at the time,
> but it does make it rather impossible for the CI script to determine
> whether the _build_ failed, or the _tests_. If the tests were run at
> all, that is.
>
> So let's go back to calling `make` for the build, and call `make test`
> separately so that we can easily detect that _that_ invocation failed,
> and react appropriately.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
> @@ -10,7 +10,7 @@ windows*) cmd //c mklink //j t\\.prove "$(cygpath -aw "$cache_dir/.prove")";;
> -export MAKE_TARGETS="all test"
> +run_tests=t
>
>  case "$jobname" in
>  linux-gcc)
> @@ -41,14 +41,18 @@ pedantic)
>         # Don't run the tests; we only care about whether Git can be
>         # built.
>         export DEVOPTS=pedantic
> -       export MAKE_TARGETS=all
> +       run_tests=
>         ;;
>  esac
>
>  # Any new "test" targets should not go after this "make", but should
>  # adjust $MAKE_TARGETS. Otherwise compilation-only targets above will
>  # start running tests.
> -make $MAKE_TARGETS

The comment talking about MAKE_TARGETS seems out of date now that
MAKE_TARGETS has been removed from this script.

> +make
> +if test -n "$run_tests"
> +then
> +       make test
> +fi
>  check_unignored_build_artifacts

This changes behavior, doesn't it? With the original "make all test",
if the `all` target failed, then the `test` target would not be
invoked. However, with the revised code, `make test` is invoked even
if `make all` fails. Is that behavior change significant? Do we care
about it?


On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Eric,

On Mon, 24 Jan 2022, Eric Sunshine wrote:

> On Mon, Jan 24, 2022 at 3:02 PM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > In the web UI of GitHub workflows, failed runs are presented with the
> > job step that failed auto-expanded. In the current setup, this is not
> > helpful at all because that shows only the output of `prove`, which says
> > which test failed, but not in what way.
> >
> > What would help the reader understand what went wrong is the verbose
> > test output of the failed test.
> >
> > The logs of the failed runs do contain that verbose test output, but it
> > is shown in the _next_ step (which is marked as succeeding, and is
> > therefore _not_ auto-expanded). Anyone not intimately familiar with this
> > would completely miss the verbose test output, being left mostly
> > puzzled with the test failures.
> >
> > We are about to show the failed test cases' output in the _same_ step,
> > so that the user has a much easier time to figure out what was going
> > wrong.
> >
> > But first, we must partially revert the change that tried to improve the
> > CI runs by combining the `Makefile` targets to build into a single
> > `make` invocation. That might have sounded like a good idea at the time,
> > but it does make it rather impossible for the CI script to determine
> > whether the _build_ failed, or the _tests_. If the tests were run at
> > all, that is.
> >
> > So let's go back to calling `make` for the build, and call `make test`
> > separately so that we can easily detect that _that_ invocation failed,
> > and react appropriately.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
> > @@ -10,7 +10,7 @@ windows*) cmd //c mklink //j t\\.prove "$(cygpath -aw "$cache_dir/.prove")";;
> > -export MAKE_TARGETS="all test"
> > +run_tests=t
> >
> >  case "$jobname" in
> >  linux-gcc)
> > @@ -41,14 +41,18 @@ pedantic)
> >         # Don't run the tests; we only care about whether Git can be
> >         # built.
> >         export DEVOPTS=pedantic
> > -       export MAKE_TARGETS=all
> > +       run_tests=
> >         ;;
> >  esac
> >
> >  # Any new "test" targets should not go after this "make", but should
> >  # adjust $MAKE_TARGETS. Otherwise compilation-only targets above will
> >  # start running tests.
> > -make $MAKE_TARGETS
>
> The comment talking about MAKE_TARGETS seems out of date now that
> MAKE_TARGETS has been removed from this script.

Good catch!

> > +make
> > +if test -n "$run_tests"
> > +then
> > +       make test
> > +fi
> >  check_unignored_build_artifacts
>
> This changes behavior, doesn't it? With the original "make all test",
> if the `all` target failed, then the `test` target would not be
> invoked. However, with the revised code, `make test` is invoked even
> if `make all` fails. Is that behavior change significant? Do we care
> about it?

That is actually not the case. Compare to what 25715419bf4 (CI: don't run
"make test" twice in one job, 2021-11-23) did: it removed code that _also_
did not specifically prevent `make test` from running when `make all`
failed.

The clue to the riddle is this line in `ci/lib.sh`:

	set -ex

The `-e` part lets the script fail whenever any command fails (unless it
is part of an `if`/`while` condition, or properly chained with `||`).
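The effect of `-e` can be reproduced in isolation with a minimal sketch. The script text below is a hypothetical stand-in for `ci/run-build-and-tests.sh`, with `false` playing the role of a failing `make`; it runs in a separate `sh` process so that it cannot terminate the calling shell.

```shell
# Hypothetical stand-in for the CI script: the second `echo` plays the
# role of `make test` and is never reached.
ci_script='
set -e
echo "make"
false
echo "make test"
'

# Prints "make", then "script failed with exit code 1".
sh -c "$ci_script" || echo "script failed with exit code $?"

# Commands chained with `||` are exempt from `-e`:
# prints "handled", then "still running".
sh -c 'set -e; false || echo "handled"; echo "still running"'
```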

This line is actually touched by the "ci/run-build-and-tests: add some
structure to the GitHub workflow output" patch in this patch series, which
breaks it apart into the `set -e` and the `set -x` part (so that the
latter can be called later in GitHub workflows, to unclutter the output a
bit).

Ciao,
Dscho

@gitgitgadget

gitgitgadget bot commented Jan 25, 2022

User Eric Sunshine <sunshine@sunshineco.com> has been added to the cc: list.

@gitgitgadget

gitgitgadget bot commented Jan 26, 2022

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Jan 24 2022, Johannes Schindelin via GitGitGadget wrote:

> Background
> ==========
>
> Recent patches intended to help readers figure out CI failures much quicker
> than before. Unfortunately, they haven't been entirely positive for me. For
> example, they broke the branch protections in Microsoft's fork of Git, where
> we require Pull Requests to pass a certain set of Checks (which are
> identified by their names) and therefore caused follow-up work.

This seems to be a reference to my df7375d7728 (CI: use shorter names
that fit in UX tooltips, 2021-11-23) merged as part of ab/ci-updates,
and I understand from this summary that you had some custom job
somewhere that scraped the job names which broke.

That's unfortunate, I do think being able to actually read the tooltips
in the GitHub UI was a worthwhile trade-off in the end though.

But I'm entirely confused about what any of that has to do with this
series, which is about changing how the job output itself is presented
and summarized, and not about the job names, and making them fit in
tooltips.

Later in the summary you note: 

> Using CI and in general making it easier for new contributors is an area I'm
> passionate about, and one I'd like to see improved.
> [...]
> ⊗ linux-gcc (ubuntu-latest)
>    failed: t9800.20 submit from detached head

Which has one of the new and shorter jobnames, but in a part of the UX
where the length didn't matter, and I can't find a way where it does.

@gitgitgadget

gitgitgadget bot commented Jan 26, 2022

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

@gitgitgadget

gitgitgadget bot commented Jan 27, 2022

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


[CC-ing some people who've been interested in CI architecture]

On Mon, Jan 24 2022, Johannes Schindelin via GitGitGadget wrote:

> [...]
> The current situation
> =====================
>
> Let me walk you through the current experience when a PR build fails: I get
> a notification mail that only says that a certain job failed. There's no
> indication of which test failed (or was it the build?). I can click on a
> link and it takes me to the workflow run. Once there, all it says is "Process
> completed with exit code 1", or even "code 2". Sure, I can click on one of
> the failed jobs. It even expands the failed step's log (collapsing the other
> steps). And what do I see there?
>
> Let's look at an example of a failed linux-clang (ubuntu-latest) job
> [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]:
>
> [...]
> Test Summary Report
> -------------------
> t1092-sparse-checkout-compatibility.sh           (Wstat: 256 Tests: 53 Failed: 1)
>   Failed test:  49
>   Non-zero exit status: 1
> t3701-add-interactive.sh                         (Wstat: 0 Tests: 71 Failed: 0)
>   TODO passed:   45, 47
> Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr  1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU)
> Result: FAIL
> make[1]: *** [Makefile:53: prove] Error 1
> make[1]: Leaving directory '/home/runner/work/git/git/t'
> make: *** [Makefile:3018: test] Error 2
>

Firstly I very much applaud any effort to move the CI UX forward. I know
we haven't seen eye-to-eye on some of the trade-offs there, but I think
something like this series is a step in the right direction. I.e. trying
harder to summarize the output for the user, and making use of some CI
platform-specific features.

I sent a reply in this thread purely on some implementation concerns
related to that in
https://lore.kernel.org/git/220126.86sftbfjl4.gmgdl@evledraar.gmail.com/,
but let's leave that aside for now...

> [...]
> So I had a look at what standards exist e.g. when testing PowerShell
> modules, in the way of marking up their test output in GitHub workflows, and
> I was not disappointed: GitHub workflows support "grouping" of output lines,
> i.e. marking sections of the output as a group that is then collapsed by
> default and can be expanded. And it is this feature I decided to use in this
> patch series, along with GitHub workflows' commands to display errors or
> notices that are also shown on the summary page of the workflow run. Now, in
> addition to "Process completed with exit code" on the summary page, we also
> read something like:
>
> ⊗ linux-gcc (ubuntu-latest)
>    failed: t9800.20 submit from detached head
>
> Even better, this message is a link, and following that, the reader is
> presented with something like this
> [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]:

This series is doing several different things, at least:

 1) "Grouping" the ci/ output, i.e. "make" from "make test"
 2) Doing likewise for t/test-lib.sh
 3) In doing that for t/test-lib.sh, also "signalling" the GitHub CI,
    to e.g. get the "submit from detached head" output you quote just
    a few lines above

I'd like to focus on just #1 here.

Where I was going with that in my last CI series was to make a start at
eventually being able to run simply "make" at the top-level
"step". I.e. to have a recipe that looks like:

    - run: make
    - run: make test

I feel strongly that that's where we should be heading, and the #1 part
of this series is basically trying to emulate what you'd get for free if
we simply did that.

I.e. if you run single commands at the "step" level (in GitHub CI
nomenclature) you'll get what you're doing with groupings in this series
for free, and without any special code in ci/*, better yet if you then
do want grouping *within* that step you're free to do so without having
clobbered your one-level of grouping already on distinguishing "make"
from "make test".

IOW our CI now looks like this (pseudocode):

     - job:
       - step1:
         - use ci/lib.sh to set env vars
         - run a script like ci/run-build-and-tests.sh
       - step2:
         - use ci/lib.sh to set env vars
         - run a script like print-test-failures.sh

But should instead look like:

     - job:
       - step1:
         - set variables in $GITHUB_ENV using ci/lib.sh
       - step2:
         - make
       - step3:
         - make test
       - step4:
         - run a script like print-test-failures.sh

Well, we can quibble about "step4", but again, let's focus on #1 here,
that's more #2-#3 territory.

I had some WIP code to do that which I polished up, here's what e.g. a
build failure looks like in your implementation (again, just focusing on
how "make" and "make test" is divided out, not the rest):

    https://github.com/dscho/git/runs/4840190622?check_suite_focus=true#step:4:62

I.e. you've made "build" an expandable group at the same level as a
single failed test, and still all under the opaque
ci/run-build-and-tests.sh script.

And here's mine. This is using a semi-recent version of my patches that
happened to have a failure, not quite what I've got now, but close
enough for this E-Mail:

    https://github.com/avar/git/runs/4956260395?check_suite_focus=true#step:7:1

Now, notice two things. One, we've made "make" and "make test" top-level
steps, but more importantly if you expand that "make test" step on yours
you'll get the full "make test" output.

And two, it's got something you don't have at all, which is that we're
now making use of the GitHub CI feature of having pre-declared an
environment for "make test", which the CI knows about (you need to click
to expand it):

    https://github.com/avar/git/runs/4956260395?check_suite_focus=true#step:7:4

Right now that's something we hardly make use of at all, but with my
changes the environment is the *only* special sauce we specify before
the step, i.e. GIT_PROVE_OPTS=.. DEFAULT_TEST_TARGET=... etc.
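A minimal sketch of what such a variable-setting step could look like follows. On GitHub's runners, `$GITHUB_ENV` names a file, and `NAME=value` lines appended to it become environment variables in subsequent steps. The `setenv` helper name and the values shown are assumptions for illustration, not taken from the branch referenced in this mail.

```shell
# Outside of GitHub Actions, fall back to a scratch file so that the
# sketch can be tried locally (assumes `mktemp` is available).
: "${GITHUB_ENV:=$(mktemp)}"

# Hypothetical helper: make a variable available to all subsequent
# steps of the job.
setenv () {
    echo "$1=$2" >>"$GITHUB_ENV"
}

# Illustrative values only.
setenv DEFAULT_TEST_TARGET prove
setenv GIT_PROVE_OPTS "--timer --jobs 2"
```

With the environment declared this way, later steps could then consist of nothing but `make` or `make test`.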

I think I've run out of my ML quota for now, but here's the branch that implements it:

    https://github.com/git/git/compare/master...avar:avar/ci-unroll-make-commands-to-ci-recipe

That's "282 additions and 493 deletions.", much of what was required to
do this was to eject the remaining support for the dead Travis and Azure
CI's that we don't run, i.e. to entirely remove any sort of state
management or job control from ci/lib.sh, and have it *only* be tasked
with setting variables for subsequent steps to use.

That makes it much simpler: my end-state of it is ~170 lines vs. your
~270 (but to be fair some of that's deleted Travis code):

    https://github.com/avar/git/blob/avar/ci-unroll-make-commands-to-ci-recipe/ci/lib.sh
    https://github.com/gitgitgadget/git/blob/pr-1117/dscho/use-grouping-in-ci-v1/ci/lib.sh

And much of the rest is just gone, e.g. ci/run-build-and-tests.sh isn't
there anymore, instead you simply run "make" or "make test" (or the
equivalent on Windows, which also works):

    https://github.com/avar/git/tree/avar/ci-unroll-make-commands-to-ci-recipe/ci
    https://github.com/gitgitgadget/git/tree/pr-1117/dscho/use-grouping-in-ci-v1/ci

Anyway, I hope we can find some sort of joint way forward with this,
because I think your #1 at least is going in the opposite direction we
should be going to achieve much the same ends you'd like to achieve.

We can really just do this in a much simpler way once we stop treating
ci/lib.sh and friends as monolithic ball of mud entry points.

But I'd really like us not to go in this direction of using markup to
"sub-divide" the "steps" within a given job, when we can relatively
easily just ... divide the steps.

As shown above that UI plays much more naturally into the CI's native
features & how it likes to arrange & present things.

And again, all of this is *only* discussing the "step #1" noted
above. Using "grouping" for presenting the test failures themselves or
sending summaries to the CI "Summary" is a different matter.

Thanks!



@gitgitgadget

gitgitgadget bot commented Feb 19, 2022

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

I notice that you did not take this into `seen` yet. I find that a little
sad because it would potentially have helped others to figure out the
failure in the latest `seen`:
https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162

Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.

Ciao,
Dscho


On Mon, 24 Jan 2022, Johannes Schindelin via GitGitGadget wrote:

>
> Background
> ==========
>
> Recent patches intended to help readers figure out CI failures much quicker
> than before. Unfortunately, they haven't been entirely positive for me. For
> example, they broke the branch protections in Microsoft's fork of Git, where
> we require Pull Requests to pass a certain set of Checks (which are
> identified by their names) and therefore caused follow-up work.
>
> Using CI and in general making it easier for new contributors is an area I'm
> passionate about, and one I'd like to see improved.
>
>
> The current situation
> =====================
>
> Let me walk you through the current experience when a PR build fails: I get
> a notification mail that only says that a certain job failed. There's no
> indication of which test failed (or was it the build?). I can click on a
> link and it takes me to the workflow run. Once there, all it says is "Process
> completed with exit code 1", or even "code 2". Sure, I can click on one of
> the failed jobs. It even expands the failed step's log (collapsing the other
> steps). And what do I see there?
>
> Let's look at an example of a failed linux-clang (ubuntu-latest) job
> [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]:
>
> [...]
> Test Summary Report
> -------------------
> t1092-sparse-checkout-compatibility.sh           (Wstat: 256 Tests: 53 Failed: 1)
>   Failed test:  49
>   Non-zero exit status: 1
> t3701-add-interactive.sh                         (Wstat: 0 Tests: 71 Failed: 0)
>   TODO passed:   45, 47
> Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr  1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU)
> Result: FAIL
> make[1]: *** [Makefile:53: prove] Error 1
> make[1]: Leaving directory '/home/runner/work/git/git/t'
> make: *** [Makefile:3018: test] Error 2
>
>
> That's it. I count myself lucky not to be a new contributor being faced with
> something like this.
>
> Now, since I am active in the Git project for a couple of days or so, I can
> make sense of the "TODO passed" label and know that for the purpose of
> fixing the build failures, I need to ignore this, and that I need to focus
> on the "Failed test" part instead.
>
> I also know that I do not have to get myself an ubuntu-latest box just to
> reproduce the error, I do not even have to check out the code and run it
> just to learn what that "49" means.
>
> I know, and I do not expect any new contributor, not even most seasoned
> contributors to know, that I have to patiently collapse the "Run
> ci/run-build-and-tests.sh" job's log, and instead expand the "Run
> ci/print-test-failures.sh" job log (which did not fail and hence does not
> draw any attention to it).
>
> I know, and again: I do not expect many others to know this, that I then
> have to click into the "Search logs" box (not the regular web browser's
> search via Ctrl+F!) and type in "not ok" to find the log of the failed test
> case (and this might still be a "known broken" one that is marked via
> test_expect_failure and once again needs to be ignored).
>
> To be excessively clear: This is not a great experience!
>
>
> Improved output
> ===============
>
> Our previous Azure Pipelines-based CI builds had a much nicer UI, one that
> even showed flaky tests, and trends e.g. how long the test cases ran. When I
> ported Git's CI over to GitHub workflows (to make CI more accessible to new
> contributors), I knew full well that we would leave this very nice UI
> behind, and I had hoped that we would get something similar back via new,
> community-contributed GitHub Actions that can be used in GitHub workflows.
> However, most likely because we use a home-grown test framework implemented
> in opinionated POSIX shell scripts, that did not happen.
>
> So I had a look at what standards exist e.g. when testing PowerShell
> modules, in the way of marking up their test output in GitHub workflows, and
> I was not disappointed: GitHub workflows support "grouping" of output lines,
> i.e. marking sections of the output as a group that is then collapsed by
> default and can be expanded. And it is this feature I decided to use in this
> patch series, along with GitHub workflows' commands to display errors or
> notices that are also shown on the summary page of the workflow run. Now, in
> addition to "Process completed with exit code" on the summary page, we also
> read something like:
>
> ⊗ linux-gcc (ubuntu-latest)
>    failed: t9800.20 submit from detached head
>
>
> Even better, this message is a link, and following that, the reader is
> presented with something like this
> [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]:
>
> ⏵ Run ci/run-build-and-tests.sh
> ⏵ CI setup
>   + ln -s /home/runner/none/.prove t/.prove
>   + run_tests=t
>   + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
>   + group Build make
>   + set +x
> ⏵ Build
> ⏵ Run tests
>   === Failed test: t9800-git-p4-basic ===
> ⏵ ok: t9800.1 start p4d
> ⏵ ok: t9800.2 add p4 files
> ⏵ ok: t9800.3 basic git p4 clone
> ⏵ ok: t9800.4 depot typo error
> ⏵ ok: t9800.5 git p4 clone @all
> ⏵ ok: t9800.6 git p4 sync uninitialized repo
> ⏵ ok: t9800.7 git p4 sync new branch
> ⏵ ok: t9800.8 clone two dirs
> ⏵ ok: t9800.9 clone two dirs, @all
> ⏵ ok: t9800.10 clone two dirs, @all, conflicting files
> ⏵ ok: t9800.11 clone two dirs, each edited by submit, single git commit
> ⏵ ok: t9800.12 clone using non-numeric revision ranges
> ⏵ ok: t9800.13 clone with date range, excluding some changes
> ⏵ ok: t9800.14 exit when p4 fails to produce marshaled output
> ⏵ ok: t9800.15 exit gracefully for p4 server errors
> ⏵ ok: t9800.16 clone --bare should make a bare repository
> ⏵ ok: t9800.17 initial import time from top change time
> ⏵ ok: t9800.18 unresolvable host in P4PORT should display error
> ⏵ ok: t9800.19 run hook p4-pre-submit before submit
>   Error: failed: t9800.20 submit from detached head
> ⏵ failure: t9800.20 submit from detached head
>   Error: failed: t9800.21 submit from worktree
> ⏵ failure: t9800.21 submit from worktree
>   === Failed test: t9801-git-p4-branch ===
>   [...]
>
>
> The "Failed test:" lines are colored in yellow to give a better visual clue
> about the logs' structure, the "Error:" label is colored in red to draw
> attention to the important part of the log, and the "⏵" characters indicate
> that part of the log is collapsed and can be expanded by clicking on it.
>
> To drill down, the reader merely needs to expand the (failed) test case's
> log by clicking on it, and then study the log. If needed (e.g. when the test
> case relies on side effects from previous test cases), the logs of preceding
> test cases can be expanded as well. In this example, when expanding
> t9800.20, it looks like this (for ease of reading, I cut a few chunks of
> lines, indicated by "[...]"):
>
> [...]
> ⏵ ok: t9800.19 run hook p4-pre-submit before submit
>   Error: failed: t9800.20 submit from detached head
> ⏷ failure: t9800.20 submit from detached head
>       test_when_finished cleanup_git &&
>       git p4 clone --dest="$git" //depot &&
>         (
>           cd "$git" &&
>           git checkout p4/master &&
>           >detached_head_test &&
>           git add detached_head_test &&
>           git commit -m "add detached_head" &&
>           git config git-p4.skipSubmitEdit true &&
>           git p4 submit &&
>             git p4 rebase &&
>             git log p4/master | grep detached_head
>         )
>     [...]
>     Depot paths: //depot/
>     Import destination: refs/remotes/p4/master
>
>     Importing revision 9 (100%)Perforce db files in '.' will be created if missing...
>     Perforce db files in '.' will be created if missing...
>
>     Traceback (most recent call last):
>       File "/home/runner/work/git/git/git-p4", line 4455, in <module>
>         main()
>       File "/home/runner/work/git/git/git-p4", line 4449, in main
>         if not cmd.run(args):
>       File "/home/runner/work/git/git/git-p4", line 2590, in run
>         rebase.rebase()
>       File "/home/runner/work/git/git/git-p4", line 4121, in rebase
>         if len(read_pipe("git diff-index HEAD --")) > 0:
>       File "/home/runner/work/git/git/git-p4", line 297, in read_pipe
>         retcode, out, err = read_pipe_full(c, *k, **kw)
>       File "/home/runner/work/git/git/git-p4", line 284, in read_pipe_full
>         p = subprocess.Popen(
>       File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
>         self._execute_child(args, executable, preexec_fn, close_fds,
>       File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
>         raise child_exception_type(errno_num, err_msg, err_filename)
>     FileNotFoundError: [Errno 2] No such file or directory: 'git diff-index HEAD --'
>     error: last command exited with $?=1
>     + cleanup_git
>     + retry_until_success rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
>     + nr_tries_left=60
>     + rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
>     + test_path_is_missing /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
>     + test 1 -ne 1
>     + test -e /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
>     + retry_until_success mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
>     + nr_tries_left=60
>     + mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
>     + exit 1
>     + eval_ret=1
>     + :
>     not ok 20 - submit from detached head
>     #
>     #        test_when_finished cleanup_git &&
>     #        git p4 clone --dest="$git" //depot &&
>     #        (
>     #            cd "$git" &&
>     #            git checkout p4/master &&
>     #            >detached_head_test &&
>     #            git add detached_head_test &&
>     #            git commit -m "add detached_head" &&
>     #            git config git-p4.skipSubmitEdit true &&
>     #            git p4 submit &&
>     #            git p4 rebase &&
>     #            git log p4/master | grep detached_head
>     #        )
>     #
>   Error: failed: t9800.21 submit from worktree
>   [...]
>
>
> Is this the best UI we can have for test failures in CI runs? I hope we can
> do better. Having said that, this patch series presents a pretty good start,
> and offers a basis for future improvements.
>
> Johannes Schindelin (9):
>   ci: fix code style
>   ci/run-build-and-tests: take a more high-level view
>   ci: make it easier to find failed tests' logs in the GitHub workflow
>   ci/run-build-and-tests: add some structure to the GitHub workflow
>     output
>   tests: refactor --write-junit-xml code
>   test(junit): avoid line feeds in XML attributes
>   ci: optionally mark up output in the GitHub workflow
>   ci: use `--github-workflow-markup` in the GitHub workflow
>   ci: call `finalize_test_case_output` a little later
>
>  .github/workflows/main.yml           |  12 ---
>  ci/lib.sh                            |  81 ++++++++++++++--
>  ci/run-build-and-tests.sh            |  11 ++-
>  ci/run-test-slice.sh                 |   5 +-
>  t/test-lib-functions.sh              |   4 +-
>  t/test-lib-github-workflow-markup.sh |  50 ++++++++++
>  t/test-lib-junit.sh                  | 132 +++++++++++++++++++++++++++
>  t/test-lib.sh                        | 128 ++++----------------------
>  8 files changed, 287 insertions(+), 136 deletions(-)
>  create mode 100644 t/test-lib-github-workflow-markup.sh
>  create mode 100644 t/test-lib-junit.sh
>
>
> base-commit: af4e5f569bc89f356eb34a9373d7f82aca6faa8a
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1117%2Fdscho%2Fuse-grouping-in-ci-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1117/dscho/use-grouping-in-ci-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1117
> --
> gitgitgadget
>
>
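
For readers unfamiliar with the mechanism the quoted cover letter relies on, here is a minimal sketch. GitHub workflow commands are plain lines written to standard output: `::group::<title>` and `::endgroup::` delimit a collapsible section, and `::error::<message>` produces an annotation that also appears on the summary page. The helper names `begin_group`, `end_group`, `report_failure` and `run_in_group` are hypothetical and are not claimed to match what the patch series adds to ci/lib.sh:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the series may do this differently.

# Open a collapsible section titled "$1" in the GitHub workflow log.
begin_group () {
	printf '::group::%s\n' "$1"
}

# Close the most recently opened section.
end_group () {
	printf '::endgroup::\n'
}

# Emit an error annotation for the failed test case described by "$1".
report_failure () {
	printf '::error::failed: %s\n' "$1"
}

# Run a command inside a group titled "$1", preserving its exit code.
run_in_group () {
	title=$1
	shift
	begin_group "$title"
	"$@"
	ret=$?
	end_group
	return $ret
}
```

With these definitions, something like `run_in_group Build make` would wrap the build output in a section that is collapsed by default, while `report_failure "t9800.20 submit from detached head"` would surface the failure as an annotation.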

@gitgitgadget

gitgitgadget bot commented Feb 20, 2022

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I notice that you did not take this into `seen` yet. I find that a little
> sad because it would potentially have helped others to figure out the
> failure in the latest `seen`:
> https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
>
> Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.

I saw the thread, I saw a few patches were commented on, and a few
were left unanswered, but one was replied to by the original submitter
with a "Good catch!", making me expect the topic to be discussed or
rerolled to become ready relatively soon.

But nothing happened, so I even forgot to take a look myself by
picking it up in 'seen'.  It does sound sad that the topic was left
hanging there for 3 weeks or so in that state without any reroll or
response.

@gitgitgadget

gitgitgadget bot commented Feb 20, 2022

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Sun, Feb 20 2022, Johannes Schindelin wrote:

> Hi Junio,
>
> I notice that you did not take this into `seen` yet. I find that a little
> sad because it would potentially have helped others to figure out the
> failure in the latest `seen`:
> https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
>
> Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.

I left some feedback on your submission ~3 weeks ago that you haven't
responded to:
https://lore.kernel.org/git/220127.86ilu5cdnf.gmgdl@evledraar.gmail.com/

I think you should really reply to that before this moves forward,
i.e. those are not trivial concerns. I think that to get from our
current "X" to your aim of "Y", your way of doing that (for part of
this series) is an overly complex way of getting there; we can do it
much more simply, and the simpler way integrates much better with the
GitHub CI UI.

The feedback I left is on the part of this that's not directly relevant
to what you're pointing out here (which is the grouping of the per-test
failure output), but if your series is picked up as-is we'd need to undo
rather big parts of it to get to what I consider a better state for the
"grouping" of the "make" vs. "make test" etc. output.

I can just submit my version of that & we can hash out what direction
makes sense there, how does that sound? I've been running with it for
about a month, and really think that part of the failure output is much
better.

Here's an example of that part:
https://github.com/avar/git/runs/5259000590?check_suite_focus=true

I.e. note how we'll now just have a "make" and "make test" step, and we
failed there on the "make".

So we'd get to the point of simply invoking those build steps as 1:1
mapped CI steps, as opposed to "improving" ci/run-build-and-tests.sh to
emulate that (I've just git rm'd it in my version).


@gitgitgadget

gitgitgadget bot commented Feb 20, 2022

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Sat, 19 Feb 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > I notice that you did not take this into `seen` yet. I find that a little
> > sad because it would potentially have helped others to figure out the
> > failure in the latest `seen`:
> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
> >
> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.
>
> I saw the thread, I saw a few patches were commented on, and a few
> were left unanswered, but one was replied by the original submitter
> with a "Good catch!", making me expect the topic to be discussed or
> rerolled to become ready relatively soon.

Yes, I have local changes, but I had really hoped that this patch series
would get a chance to prove its point by example, i.e. by offering the
improved output for the failures in `seen`. I hoped that because I think
that those improvements speak for themselves when you see them.

Ciao,
Dscho

@gitgitgadget

gitgitgadget bot commented Feb 22, 2022

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Ævar,

On Mon, 21 Feb 2022, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Feb 20 2022, Johannes Schindelin wrote:
>
> > On Sat, 19 Feb 2022, Junio C Hamano wrote:
> >
> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >>
> >> > I notice that you did not take this into `seen` yet. I find that a little
> >> > sad because it would potentially have helped others to figure out the
> >> > failure in the latest `seen`:
> >> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
> >> >
> >> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.
> >>
> >> I saw the thread, I saw a few patches were commented on, and a few
> >> were left unanswered, but one was replied by the original submitter
> >> with a "Good catch!", making me expect the topic to be discussed or
> >> rerolled to become ready relatively soon.
> >
> > Yes, I have local changes, but I had really hoped that this patch series
> > would get a chance to prove its point by example, i.e. by offering the
> > improved output for the failures in `seen`. I hoped that because I think
> > that those improvements speak for themselves when you see them.
>
> I think it's a good idea to get wider exposure in "seen", "next" etc. for
> topics where the bottleneck is lack of feedback due to lack of wider
> exposure.

Having this in `seen` will give the patch series a chance to show in real
life how it improves the process of analyzing regressions.

Ciao,
Johannes

@gitgitgadget

gitgitgadget bot commented Feb 22, 2022

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Ævar,

On Sun, 20 Feb 2022, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Feb 20 2022, Johannes Schindelin wrote:
>
> > I notice that you did not take this into `seen` yet. I find that a little
> > sad because it would potentially have helped others to figure out the
> > failure in the latest `seen`:
> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
> >
> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.
>
> I left some feedback on your submission ~3 weeks ago that you haven't
> responded to:
> https://lore.kernel.org/git/220127.86ilu5cdnf.gmgdl@evledraar.gmail.com/

You answered my goal of making it easier to figure out regressions by
doubling down on hiding the logs even better. That's not feedback, that's
just ignoring the goal.

You answered my refactor of the Azure Pipelines support with the question
"why?" that I had answered already a long time ago. That's not feedback,
that's ignoring the answers I already provided.

I don't know how to respond to that, therefore I didn't.

Ciao,
Johannes

@gitgitgadget

gitgitgadget bot commented Feb 25, 2022

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Wed, Feb 23 2022, Phillip Wood wrote:

> On 22/02/2022 13:31, Ævar Arnfjörð Bjarmason wrote:
>> [...]
>> So just to make the point about one of those mentioned in my [1] with
>> some further details (I won't go into the whole thing to avoid repeating
>> myself):
>>
>> I opened both of:
>>
>>      https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true
>>      https://github.com/dscho/git/runs/4840190622?check_suite_focus=true
>>
>> Just now in Firefox 91.5.0esr-1. Both having been opened before, so
>> they're in cache, and I've got a current 40MB/s real downlink speed etc.
>>
>> The former fully loads in around 5100ms, with your series here that's
>> just short of 18000ms.
>>
>> So your CI changes are making the common case of just looking at a CI
>> failure more than **3x as slow as before**.
>
> I don't think that is the most useful comparison between the two.[...]

I'm not saying that it's the most useful comparison between the two, but
that there's a major performance regression introduced in this series
that so far isn't addressed or noted.

> [...]When
> I am investigating a test failure the time that matters to me is the
> time it takes to display the output of the failing test case. With the
> first link above the initial page load is faster but to get to the
> output of the failing test case I have to click on "Run
> ci/print-test-failures.sh" then wait for that to load and then search
> for "not ok" to actually get to the information I'm after. With the
> second link the initial page load does feel slower but then I'm
> presented with the test failures nicely highlighted in red, all I
> have to do is click on one and I've got the information I'm
> after. Overall that is much faster and easier to use.

Whether you think the regression is worth the end result is a subjective
judgement. I don't think it is, but I don't think you or anyone else is
wrong if they don't agree.

If you think it's OK to spend ~20s instead of ~5s on rendering the new
output that's something that clearly depends on how much you value the
new output, and how much you're willing to wait.

What I am saying, and what I hope you'll agree with, is that it's
something that should be addressed in some way by this series.

One way to do that would be to note the performance regression in a
commit message, and argue that despite the slowdown it's worth it.
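
For reference, the slowdown implied by the two load times quoted above can be checked with a one-liner. This is only a sketch: the two figures are the measurements given earlier in this message, and `slowdown` is a hypothetical helper name:

```shell
#!/bin/sh
# Ratio of two durations given in milliseconds, rounded to one decimal.
# Arguments: $1 = load time before, $2 = load time after.
slowdown () {
	awk -v before="$1" -v after="$2" \
		'BEGIN { printf "%.1f\n", after / before }'
}

# The two page-load times quoted in this thread.
slowdown 5100 18000
```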

@gitgitgadget

gitgitgadget bot commented Feb 25, 2022

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Phillip,

On Wed, 23 Feb 2022, Phillip Wood wrote:

> On 22/02/2022 13:31, Ævar Arnfjörð Bjarmason wrote:
> > [...]
> > So just to make the point about one of those mentioned in my [1] with
> > some further details (I won't go into the whole thing to avoid repeating
> > myself):
> >
> > I opened both of:
> >
> > https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true
> > https://github.com/dscho/git/runs/4840190622?check_suite_focus=true
> >
> > Just now in Firefox 91.5.0esr-1. Both having been opened before, so
> > they're in cache, and I've got a current 40MB/s real downlink speed etc.
> >
> > The former fully loads in around 5100ms, with your series here that's
> > just short of 18000ms.
> >
> > So your CI changes are making the common case of just looking at a CI
> > failure more than **3x as slow as before**.
>
> I don't think that is the most useful comparison between the two. When I
> am investigating a test failure the time that matters to me is the time
> it takes to display the output of the failing test case.

Thank you for expressing this so clearly. I will adopt a variation of this
phrasing in my commit message, if you don't mind?

> With the first link above the initial page load is faster but to get to
> the output of the failing test case I have to click on "Run
> ci/print-test-failures.sh" then wait for that to load and then search
> for "not ok" to actually get to the information I'm after.

And that's only because you are familiar with what you have to do.

Any new contributor would be stuck with the information presented on the
initial load, without any indication that more information _is_ available,
just hidden away in the next step's log (which is marked as "succeeding",
therefore misleading the inclined reader into thinking that this cannot
potentially contain any information pertinent to the _failure_ that needs
to be investigated).
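
To make that hidden step concrete, here is a minimal sketch of the search a seasoned contributor performs by hand. It assumes a log in the TAP format that Git's test suite writes, where failures start with `not ok` and expected failures declared via `test_expect_failure` carry a `# TODO known breakage` suffix. The function name `list_real_failures` is hypothetical:

```shell
#!/bin/sh
# Read a TAP log on stdin and print only the genuine failures:
# lines starting with "not ok" that are not marked as known breakages.
list_real_failures () {
	# "|| :" keeps the exit code at 0 when nothing matches.
	grep '^not ok ' | grep -v '# TODO known breakage' || :
}
```

Fed the raw log of a failing test script, this would print lines such as `not ok 20 - submit from detached head` and skip the known-broken ones that need to be ignored.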

> With the second link the initial page load does feel slower but then I'm
> presented with the test failures nicely highlighted in red, all I have
> to do is click on one and I've got the information I'm after.
>
> Overall that is much faster and easier to use.

Thank you for your comment. I really started to doubt myself, getting the
idea that it's just a case of me holding this thing wrong.

For what it's worth, I did make a grave mistake by using that particular
`seen` CI failure with all of those failing p4 tests, which obviously
resulted in an incredibly large amount of logs. Obviously that _must_ be
slow to load. I just did not have the time to fabricate a CI failure.

However, I do agree with you that the large amount of logs would have to
be looked at _anyway_, whether it is shown upon loading the job's logs or
only when expanding the `print-test-failures` step's logs. The amount of
logs is constant, after all; I did not change anything there (nor would
I).

So a better example might be my concrete use case yesterday: the CI build
of `seen` failed. Here is the link to the regular output:

	https://github.com/git/git/actions/runs/1890665968

On that page, you see the following:


	Annotations
	8 errors and 1 warning

	ⓧ win test (3)
	  Process completed with exit code 2.

	ⓧ win test (6)
	  Process completed with exit code 2.

	ⓧ win test (2)
	  Process completed with exit code 2.

	ⓧ win+VS test (3)
	  Process completed with exit code 2.

	ⓧ win+VS test (6)
	  Process completed with exit code 2.

	ⓧ win+VS test (2)
	  Process completed with exit code 2.

	ⓧ osx-gcc (macos-latest)
	  Process completed with exit code 2.

	ⓧ osx-clang (macos-latest)
	  Process completed with exit code 2.

	⚠ CI: .github#L1
	  windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856

So I merged my branch into `seen` and pushed it. The corresponding run can
be seen here:

	https://github.com/dscho/git/actions/runs/1892982393

On that page, you see the following:

	Annotations
	50 errors and 1 warning

	ⓧ win test (3)
	  failed: t7527.1 explicit daemon start and stop

	ⓧ win test (3)
	  failed: t7527.2 implicit daemon start

	ⓧ win test (3)
	  failed: t7527.3 implicit daemon stop (delete .git)

	ⓧ win test (3)
	  failed: t7527.4 implicit daemon stop (rename .git)

	ⓧ win test (3)
	  failed: t7527.5 implicit daemon stop (rename GIT~1)

	ⓧ win test (3)
	  failed: t7527.6 implicit daemon stop (rename GIT~2)

	ⓧ win test (3)
	  failed: t7527.8 cannot start multiple daemons

	ⓧ win test (3)
	  failed: t7527.10 update-index implicitly starts daemon

	ⓧ win test (3)
	  failed: t7527.11 status implicitly starts daemon

	ⓧ win test (3)
	  failed: t7527.12 edit some files

	ⓧ win test (2)
	  failed: t0012.81 fsmonitor--daemon can handle -h

	ⓧ win test (2)
	  Process completed with exit code 1.

	ⓧ win test (6)
	  failed: t7519.2 run fsmonitor-daemon in bare repo

	ⓧ win test (6)
	  failed: t7519.3 run fsmonitor-daemon in virtual repo

	ⓧ win test (6)
	  Process completed with exit code 1.

	ⓧ win+VS test (3)
	  failed: t7527.1 explicit daemon start and stop

	ⓧ win+VS test (3)
	  failed: t7527.2 implicit daemon start

	ⓧ win+VS test (3)
	  failed: t7527.3 implicit daemon stop (delete .git)

	ⓧ win+VS test (3)
	  failed: t7527.4 implicit daemon stop (rename .git)

	ⓧ win+VS test (3)
	  failed: t7527.5 implicit daemon stop (rename GIT~1)

	ⓧ win+VS test (3)
	  failed: t7527.6 implicit daemon stop (rename GIT~2)

	ⓧ win+VS test (3)
	  failed: t7527.8 cannot start multiple daemons

	ⓧ win+VS test (3)
	  failed: t7527.10 update-index implicitly starts daemon

	ⓧ win+VS test (3)
	  failed: t7527.11 status implicitly starts daemon

	ⓧ win+VS test (3)
	  failed: t7527.12 edit some files

	ⓧ win+VS test (2)
	  failed: t0012.81 fsmonitor--daemon can handle -h

	ⓧ win+VS test (2)
	  Process completed with exit code 1.

	ⓧ win+VS test (6)
	  failed: t7519.2 run fsmonitor-daemon in bare repo

	ⓧ win+VS test (6)
	  failed: t7519.3 run fsmonitor-daemon in virtual repo

	ⓧ win+VS test (6)
	  Process completed with exit code 1.

	ⓧ osx-clang (macos-latest)
	  failed: t0012.81 fsmonitor--daemon can handle -h

	ⓧ osx-clang (macos-latest)
	  failed: t7519.2 run fsmonitor-daemon in bare repo

	ⓧ osx-clang (macos-latest)
	  failed: t7527.1 explicit daemon start and stop

	ⓧ osx-clang (macos-latest)
	  failed: t7527.2 implicit daemon start

	ⓧ osx-clang (macos-latest)
	  failed: t7527.3 implicit daemon stop (delete .git)

	ⓧ osx-clang (macos-latest)
	  failed: t7527.4 implicit daemon stop (rename .git)

	ⓧ osx-clang (macos-latest)
	  failed: t7527.7 MacOS event spelling (rename .GIT)

	ⓧ osx-clang (macos-latest)
	  failed: t7527.8 cannot start multiple daemons

	ⓧ osx-clang (macos-latest)
	  failed: t7527.10 update-index implicitly starts daemon

	ⓧ osx-clang (macos-latest)
	  failed: t7527.11 status implicitly starts daemon

	ⓧ osx-gcc (macos-latest)
	  failed: t0012.81 fsmonitor--daemon can handle -h

	ⓧ osx-gcc (macos-latest)
	  failed: t7519.2 run fsmonitor-daemon in bare repo

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.1 explicit daemon start and stop

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.2 implicit daemon start

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.3 implicit daemon stop (delete .git)

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.4 implicit daemon stop (rename .git)

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.7 MacOS event spelling (rename .GIT)

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.8 cannot start multiple daemons

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.10 update-index implicitly starts daemon

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.11 status implicitly starts daemon

	⚠ CI: .github#L1
	  windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856

In my mind, this is already an improvement. (Even if this is a _lot_ of
output, and a lot of individual errors, given that all of them are fixed
with a single, small patch to adjust an option usage string, but that's
not the fault of my patch series, but of the suggestion to put the check
for the option usage string linting into the `parse_options()` machinery
instead of into the static analysis job.)
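
For readers who have not seen the mechanism: annotations such as
`failed: t7527.1 ...` come from GitHub workflow commands, which are plain
lines written to the job's output. What follows is only a minimal sketch,
assuming two hypothetical helper names (`annotate_failure` and
`print_failed_test_log`); it is not the code in this patch series.

```shell
# Hypothetical helpers, for illustration only.

# Emit an error annotation; GitHub lists these on the run's summary page.
annotate_failure () {
	printf '::error::failed: %s\n' "$1"
}

# Wrap the log of one failed test case ($2) in a collapsible group
# whose heading is the test title ($1).
print_failed_test_log () {
	printf '::group::%s\n' "$1"
	printf '%s\n' "$2"
	printf '::endgroup::\n'
}

annotate_failure "t7527.1 explicit daemon start and stop"
```

A real implementation would also have to escape characters such as `%`
and newlines in the message, which this sketch does not attempt.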

Since there are still plenty of failures, the page admittedly does load
relatively slowly. But that's not the time I was trying to optimize for.
My time comes at quite a premium these days, so if the computer has to
work a little harder while I can do something else, as long as it saves
_me_ time, I'll take that time. Every time.

Ciao,
Dscho

@gitgitgadget

gitgitgadget bot commented Feb 25, 2022

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> So I merged my branch into `seen` and pushed it. The corresponding run can
> be seen here:
>
> 	https://github.com/dscho/git/actions/runs/1892982393

I visited this page (while logged in to GitHub---I am saying this
for others who may not know the output is shown differently for
visitors that are not logged in and for logged-in users).

> On that page, you see the following:
>
> 	Annotations
> 	50 errors and 1 warning
>
> 	ⓧ win test (3)
> 	  failed: t7527.1 explicit daemon start and stop
> ...
>
> 	⚠ CI: .github#L1
> 	  windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856
>
> In my mind, this is already an improvement. (Even if this is a _lot_ of
> output, and a lot of individual errors, given that all of them are fixed
> with a single, small patch to adjust an option usage string, but that's
> not the fault of my patch series, but of the suggestion to put the check
> for the option usage string linting into the `parse_options()` machinery
> instead of into the static analysis job.)

It is not obvious to your readers what aspect of the new output
_you_ found "an improvement", because you didn't spell it out.  That
makes "in my mind, this is already an improvement" a claim that is
unnecessarily weaker than it really is.

Let me tell my experience:

 - Clicking on macos+clang in the map-looking thing, it did show and
   scroll down automatically to show the last failure link ready to
   be clicked after a few seconds, which was nice, but made me
   scroll back to see the first failure, which could have been
   better.

 - Clicking on win+VS test (2), the failed <test> part was
   automatically opened, and a circle spun for several dozens of
   seconds to make me wait, but after that, nothing happened.  It
   was somewhat hard to know if I were expected to do something to
   view the first error and when the UI is ready to let me do so, or
   if I were just expected to wait a bit longer for it to all happen
   automatically.

In either case, folding all the pieces that finished successfully
made the page usable, as that saved the human time otherwise spent
scanning for where failures are shown.

I personally do not care about the initial latency when viewing the
output from CI run that may have happened a few dozens of minutes
ago (I do not sit in front of GitHub CI UI and wait until it
finishes). As long as it is made clear when I can start interacting
with it, I can just open the page and let it load while I am working
on something else.

Thanks.

@gitgitgadget

gitgitgadget bot commented Feb 26, 2022

This branch is now known as js/ci-github-workflow-markup.

@gitgitgadget

gitgitgadget bot commented Feb 26, 2022

This patch series was integrated into seen via git@d552efd.

@gitgitgadget gitgitgadget bot added the seen label Feb 26, 2022
@gitgitgadget

gitgitgadget bot commented Feb 26, 2022

This patch series was integrated into seen via git@82dd0cb.

@gitgitgadget

gitgitgadget bot commented Feb 26, 2022

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> Let me tell my experience:
>
>  - Clicking on macos+clang in the map-looking thing, it did show and
>    scroll down automatically to show the last failure link ready to
>    be clicked after a few seconds, which was nice, but made me
>    scroll back to see the first failure, which could have been
>    better.
>
>  - Clicking on win+VS test (2), the failed <test> part was
>    automatically opened, and a circle spun for several dozens of
>    seconds to make me wait, but after that, nothing happened.  It
>    was somewhat hard to know if I were expected to do something to
>    view the first error and when the UI is ready to let me do so, or
>    if I were just expected to wait a bit longer for it to all happen
>    automatically.
>
> In either case, folding all the pieces that finished successfully
> made the page usable, as that saved the human time otherwise spent
> scanning for where failures are shown.
>
> I personally do not care about the initial latency when viewing the
> output from CI run that may have happened a few dozens of minutes
> ago (I do not sit in front of GitHub CI UI and wait until it
> finishes). As long as it is made clear when I can start interacting
> with it, I can just open the page and let it load while I am working
> on something else.

FWIW, CI run on "seen" uses this series.

When I highlight a failure at CI, I often give a URL like this:

https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520

I notice that this "hide by default" forces the recipient of the URL
to click the line after the line with a red highlight before they
can view the breakage.

For example, a URL to show a similar breakage from the old run
(without this series) looks like this:

https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968

This directly jumps to the error and the recipient of the URL does
not have to do anything special, which I have been using as a
convenient way to give developers a starting point.
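
The two URLs above share one shape. A minimal sketch of assembling such a
link, assuming a hypothetical helper name `log_line_url` and taking the
format purely from the two examples in this message (repository, run ID,
step number, line number):

```shell
# Hypothetical helper: build a deep link to one line of a step's log.
# $1 = owner/repo, $2 = run ID, $3 = step number, $4 = line number.
log_line_url () {
	printf 'https://github.com/%s/runs/%s?check_suite_focus=true#step:%s:%s\n' \
		"$1" "$2" "$3" "$4"
}

# Reproduces the link to the old run quoted above.
log_line_url git/git 5341052811 5 3968
```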

I haven't compared the implementation of this one and Ævar's series
that aims for a different goal, so I do not yet have an opinion on
which one should come first (if we want to achieve both of what each
of them wants to achieve, that is).

Thanks.

@gitgitgadget

gitgitgadget bot commented Feb 28, 2022

This patch series was integrated into seen via git@e26e6d5.

@gitgitgadget

gitgitgadget bot commented Mar 1, 2022

This patch series was integrated into seen via git@7974a23.

@gitgitgadget

gitgitgadget bot commented Mar 1, 2022

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> FWIW, CI run on "seen" uses this series.

Another "early impression".  I had to open this one today,

    https://github.com/git/git/runs/5367854000?check_suite_focus=true

which was a jarring experience.  It correctly painted the fourth
circle "Run ci/run-build-and-tests.sh" in red with X in it, and
after waiting for a while (which I already said that I do not mind
at all), showed a bunch of lines, and then auto-scrolled down to the
end of that section.

It _looked_ like that it was now ready for me to interact with it,
so I started to scroll up to the beginning of that section, but I
had to stare at blank space for several minutes before lines were
shown to occupy that space.  During the repainting, unlike the
initial delay-wait that lets me know that it is not ready by showing
the spinning circle, there was no indication that it wants me to
wait until it fills the blank space with lines.  Not very pleasant.

I do not think it is bad enough to call it less pleasant than
opening the large "print test failures" section and looking for "not
ok", which is what the original CI UI we had before this series
required. But at least with the old one, once the UI became ready for
me to interact with, I didn't have to wait for (for lack of a better
phrase) such UI hiccups.  Responses to looking for the next instance
of "not ok" were predictable.

Thanks.

@gitgitgadget

gitgitgadget bot commented May 23, 2022

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Mon, 23 May 2022, Junio C Hamano wrote:

> [...] the test output got a lot shorter by discarding the "ok" output
> and keeping only the failures and skips.  Because the readers are mostly
> interested in seeing failures (they can download the full log if
> they want to), this design decision probably makes sense to me.

For the record, Victoria suggested grouping by file rather than by failed
test case.

However, I do speak from a lot of experience diagnosing test failures in
CI/PR runs when I say: it is frequently very helpful to have a look at one
failed test case at a time. I'd much rather suffer a minor lag while
scrolling than having to find the boundaries manually, in particular when
`test_expect_failure` test cases are present (which are reported as
"broken" in the current iteration instead of "failed").
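
To illustrate the distinction: a minimal sketch of such a classification,
assuming the usual TAP-style result lines of the test suite (`ok`,
`not ok`, `# TODO known breakage`, `# skip`). The function name
`classify_tap` is hypothetical, and this is not the code in the patch
series.

```shell
# Hypothetical filter: read TAP result lines on stdin, drop plain "ok"
# lines, and label everything else as broken, failed or skipped.
classify_tap () {
	while IFS= read -r line
	do
		case "$line" in
		"not ok "*"# TODO known breakage"*)
			printf 'broken: %s\n' "$line" ;;
		"not ok "*)
			printf 'failed: %s\n' "$line" ;;
		"ok "*"# skip"*)
			printf 'skipped: %s\n' "$line" ;;
		esac
	done
}
```

The known-breakage pattern has to come before the generic `not ok`
pattern, because `case` takes the first match.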

Besides, the scroll issue is probably similar between both approaches to
grouping (and may be independent of the grouping, as you pointed out by
reporting similar issues in the current `print-test-failures` step), and
is something I hope the Actions engineers are working on.

> Common to the both approaches, folding output from each test piece
> to one line (typically "ok" but sometimes "failed" heading) may be
> the source of UI responsiveness irritation I have been observing,
> but I wonder, with the removal of all "ok" pieces, it may make sense
> not to fold anything and instead give a flat "here are the traces of
> all failed and skipped tests".

As I mentioned above, I'd rather keep the grouping by failed test case.

Obviously, the ideal way to decide would be to set up some A/B testing
with real people, but I have no way to set up anything like that.

> In any case, either implementation seems to give us a good improvement
> over what is in 'master'.

There are two things I would like to add:

- In the current iteration's summary page, you will see the failed test
  cases' titles in the errors, and they are clickable (and will get you to
  the corresponding part of the logs). I find this very convenient.

- The addition of the suggestion to look at the run's artifacts for the
  full logs might not look like a big deal, but I bet that it will help in
  particular new contributors. This was yet another great suggestion by
  Victoria.

Thanks,
Dscho

@gitgitgadget

gitgitgadget bot commented May 23, 2022

This branch is now known as js/ci-github-workflow-markup.

@gitgitgadget

gitgitgadget bot commented May 23, 2022

This patch series was integrated into seen via git@ecd5bba.

@gitgitgadget

gitgitgadget bot commented May 23, 2022

This patch series was integrated into seen via git@8907954.

@gitgitgadget

gitgitgadget bot commented May 24, 2022

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, May 23 2022, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Mon, 23 May 2022, Ævar Arnfjörð Bjarmason wrote:
>
>> Re selling point & feature comparison: The point of the ab/* variant was
>> to re-roll Johannes's onto a "base" topic that made much of his
>> unnecessary, because the building up of features to emit GitHub markup
>> can be replaced by unrolling things like "make" and "make test" to the
>> top-level.
>>
>> That has its own UX benefits, e.g. you can see at a glance what command
>> was run and what the environment was, and "make" and "make test" are now
>> split up from one monolithic "build and test" step.
>>
>> But the primary intention was not to provide a prettier UX, but to show
>> that this arrangement made sense. I was hoping that Johannes would reply
>> with some variant of "ah, I see what you mean, that does make things
>> simpler!" and run with it, but alas...
>
> I believe that we share the goal to make the Git project more welcoming
> and easier to navigate for new contributors.

Yes, definitely.

> The patch series you wanted me to look at claims to make the CI/PR
> definitions/scripts simpler. As it matters more to contributors how to
> investigate test failures, i.e. what information they are provided about
> the failures, I disagree that that patch series needs to be connected to
> my patch series in any way.

Our two sets of patches change different parts of the CI UX, so no. The
set of patches I've been proposing isn't just making CI/PR
definitions/scripts simpler, although it also does that.

So e.g. in your patches you need to massage the CI output to split the
"build" step from the "test" step. As you can see in an earlier RFC
re-roll of them on top of my topic, that's something you'd get for free:
https://lore.kernel.org/git/RFC-cover-v5-00.10-00000000000-20220421T183001Z-avarab@gmail.com/

> Further, the result does not look like a simplification to me. For
> example, I consider it an absolute no-go to remove the remnants of Azure
> Pipelines support. As I had hinted, and as you saw on the git-security
> list, I require this support for embargoed releases. That’s what I did
> when working on the patches that made it into v2.35.2. In my book,
> removing such vital (if dormant) code is not a simplification, but a
> Chesterton’s Fence. While we do not need to use Azure Pipelines for our
> regular CI, we definitely need it for embargoed releases. “Simply revert
> it back” is not an excuse for removing something that should not be
> removed in the first place.

Can you please reply to this 3 month old and still-waiting-on-your-reply
E-Mail on this topic so we can figure out a way forward with this:
https://lore.kernel.org/git/220222.86y2236ndp.gmgdl@evledraar.gmail.com/

> As another example where I have a different concept of what constitutes
> “simple”: In Git for Windows’ fork, we carry a patch that integrates the
> `git-subtree` tests into the CI builds. This patch touches two places,
> `ci/run-build-and-tests.sh` and `ci/run-test-slice.sh`. These changes
> would be inherited by any CI definition that uses the scripts in `ci/`.
> With the proposed patches, there are four places to patch, and they are
> all limited to the GitHub workflow definition. Since you asked me for my
> assessment: this is de-DRYing the code, making it more cumbersome instead
> of simpler.

No, you'd still have two places to patch:

 1. The top-level Makefile to have "make test" run those subtree tests
    depending on some flag, i.e. the same as your
    ci/run-build-and-tests.sh.

 2. ci/run-test-slice.sh as before (which is only needed for the
    Windows-specific tests).

Because we'd be having the Makefile drive the logic you could also run
such a "make test" locally, which is something we should have
anyway. E.g. when I build my own git I run the subtree tests, and would
like to eventually make "run contrib tests too" some configurable
option.

So it is exactly the DRY principle. By avoiding making things needlessly
CI-specific we can just control this behavior with flags, both in and
outside CI.
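
As a rough sketch of that idea, expressed in shell rather than in the
Makefile: the flag name `RUN_CONTRIB_TESTS` and the function
`test_commands` are hypothetical, and the function only prints the
commands it would run instead of running them.

```shell
# Hypothetical: one flag decides whether the subtree tests are part of
# a test run, whether that run happens in CI or locally.
test_commands () {
	echo "make test"
	if test "${RUN_CONTRIB_TESTS:-}" = "YesPlease"
	then
		echo "make -C contrib/subtree test"
	fi
}

test_commands
```

With the flag unset only the regular test run is listed; setting
`RUN_CONTRIB_TESTS=YesPlease` adds the subtree tests.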

> In other words, I have fundamental objections about the approach and about
> tying it to the patches that improve the output of Git’s CI/PR runs.

I would too if after my series you needed to patch every place we run
"make test" or whatever to run your subtree tests, but as noted above
that's not the case. So hopefully this addresses that.

More generally: I noted a while ago that if you pointed out issues like
that I'd be happy to address them for you.  Based on this I see
d08496f2c40 (ci: run `contrib/subtree` tests in CI builds, 2021-08-05),
and that would be easy to generalize.

>> In Chrome/Firefox the time to load the page (as in the spinner stops,
>> and we "focus" on the right content) is:
>>
>>     JS: ~60s / ~80s
>>     Æ: ~25s / ~18s
>
> My focus is on the experience of occasional and new contributors who need
> to investigate test failures in the CI/PR runs. In this thread, we already
> discussed the balance between speed of loading the page on the one hand
> and how well the reader is guided toward the relevant parts on the other
> hand.

First, your re-roll claims that it "improves the time to load pages",
but based on the sort of testing I'd done before when I reported the
severe slowness introduced by this topic I can't reproduce that.

So how exactly are you testing the performance of these load times, and
can you share the numbers you have for master, your previous iteration
and this re-roll?

> I disagree with you that the former should be prioritized over the
> latter, on the contrary, guiding the readers along a path to success is
> much more important than optimizing for a quick page load.

I think a better UX is certainly worth some cost to load times, so I'm
not trying to be difficult in saying that this costs us some
milliseconds so it's a no-go.

But really, this is making it so slow that it's borderline unusable.

The main way I use this interface is that I'll get an E-Mail with a
failure report, or see the "X" in the UX and click through to the
failure, then see the logs etc, and hopefully be able to see from that
what's wrong, or how I could begin to reproduce it.

Right now that's fast enough that I'll do that all in one browser
click-through session, but if I'm having to wait *more than a minute*
vs. the current 10-20 seconds (which is already quite bad)?

Your latest series also seems to either be buggy (or trigger some bug in
GitHub Actions?) where even after that minute you'll see almost nothing
on your screen. So a user who doesn't know the UX would end up waiting
much longer than that.

You seemingly need to know that it's done when it shows you that blank
screen, and trigger a re-render by scrolling up or down, which will show
you your actual failures.

That's not an issue I saw in any iteration of this before this v3.

> Most contributors who chimed in seemed to not mind a longer page load time
> anyway, as long as the result would help them identify quickly what causes
> the test failures.

Wasn't much of that discussion a follow-up to your initial demos of this
topic?

I don't think those were as slow as what I'm pointing out above, which I
think is just because those failures happened to involve far fewer
lines of log. The slowness seems to be correlated with how many lines
we're dealing with in total.

> Besides, the page load times are only likely to become
> better anyway, as GitHub engineers continuously improve Actions.

Sure, and if this were all magically made better by GH engineers these
concerns would be addressed.

But right now that isn't the case, and we don't know if/when that would
happen, so we need to review these proposed changes on the basis of how
they'd change the current GitHub CI UX overall.

@@ -0,0 +1,54 @@
# Library of functions to mark up test scripts' output suitable for

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Sat, May 21 2022, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> [...]
> Co-authored-by: Victoria Dye <vdye@github.com>

Missing SOB here for Victoria.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---

@gitgitgadget

gitgitgadget bot commented May 25, 2022

This patch series was integrated into seen via git@d7d74b2.

@gitgitgadget

gitgitgadget bot commented May 25, 2022

This patch series was integrated into seen via git@d2ee9d5.

@gitgitgadget

gitgitgadget bot commented May 25, 2022

This patch series was integrated into seen via git@18bd8fd.

@gitgitgadget

gitgitgadget bot commented May 26, 2022

This patch series was integrated into seen via git@e494d64.

@gitgitgadget

gitgitgadget bot commented May 26, 2022

There was a status update in the "Cooking" section about the branch js/ci-github-workflow-markup on the Git mailing list:

Update the GitHub workflow support to make it quicker to get to the
failing test.

Will merge to 'next'?
source: <pull.1117.v3.git.1653171536.gitgitgadget@gmail.com>

@gitgitgadget

gitgitgadget bot commented May 26, 2022

This patch series was integrated into seen via git@7cfbacf.

@gitgitgadget

gitgitgadget bot commented May 27, 2022

This patch series was integrated into seen via git@dd7f543.

@gitgitgadget

gitgitgadget bot commented May 28, 2022

This patch series was integrated into seen via git@b1eff10.

@gitgitgadget

gitgitgadget bot commented May 31, 2022

This patch series was integrated into seen via git@8edcf4a.

@gitgitgadget

gitgitgadget bot commented May 31, 2022

This patch series was integrated into next via git@bd37e9e.

@gitgitgadget gitgitgadget bot added the next label May 31, 2022
@gitgitgadget

gitgitgadget bot commented Jun 1, 2022

This patch series was integrated into seen via git@0a2e6f6.

@gitgitgadget

gitgitgadget bot commented Jun 2, 2022

There was a status update in the "Cooking" section about the branch js/ci-github-workflow-markup on the Git mailing list:

Update the GitHub workflow support to make it quicker to get to the
failing test.

Will merge to 'master'.
source: <pull.1117.v3.git.1653171536.gitgitgadget@gmail.com>

@gitgitgadget

gitgitgadget bot commented Jun 2, 2022

This patch series was integrated into seen via git@d86b507.

@gitgitgadget

gitgitgadget bot commented Jun 3, 2022

This patch series was integrated into seen via git@fe88afc.

@gitgitgadget

gitgitgadget bot commented Jun 3, 2022

This patch series was integrated into seen via git@d639d8f.

@gitgitgadget

gitgitgadget bot commented Jun 7, 2022

This patch series was integrated into seen via git@c56db02.

@gitgitgadget

gitgitgadget bot commented Jun 7, 2022

This patch series was integrated into seen via git@fc5a070.

@gitgitgadget

gitgitgadget bot commented Jun 7, 2022

This patch series was integrated into master via git@fc5a070.

@gitgitgadget

gitgitgadget bot commented Jun 7, 2022

This patch series was integrated into next via git@fc5a070.

@gitgitgadget gitgitgadget bot added the master label Jun 7, 2022
@gitgitgadget gitgitgadget bot closed this Jun 7, 2022
@gitgitgadget

gitgitgadget bot commented Jun 7, 2022

Closed via fc5a070.

@dscho dscho deleted the use-grouping-in-ci branch June 7, 2022 22:04