ci: make Git's GitHub workflow output much more helpful #1117
Conversation
Force-pushed from 74278cf to 992b157.
/preview
Preview email sent as pull.1117.git.1642696986.gitgitgadget@gmail.com
/submit
Submitted as pull.1117.git.1643050574.gitgitgadget@gmail.com
To fetch this version into
To fetch this version to local tag
@@ -10,7 +10,7 @@ windows*) cmd //c mklink //j t\\.prove "$(cygpath -aw "$cache_dir/.prove")";;
*) ln -s "$cache_dir/.prove" t/.prove;;
On the Git mailing list, Eric Sunshine wrote (reply to this):
On Mon, Jan 24, 2022 at 3:02 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> In the web UI of GitHub workflows, failed runs are presented with the
> job step that failed auto-expanded. In the current setup, this is not
> helpful at all because that shows only the output of `prove`, which says
> which test failed, but not in what way.
>
> What would help the reader understand what went wrong is the verbose
> test output of the failed test.
>
> The logs of the failed runs do contain that verbose test output, but it
> is shown in the _next_ step (which is marked as succeeding, and is
> therefore _not_ auto-expanded). Anyone not intimately familiar with this
> would completely miss the verbose test output, being left mostly
> puzzled with the test failures.
>
> We are about to show the failed test cases' output in the _same_ step,
> so that the user has a much easier time to figure out what was going
> wrong.
>
> But first, we must partially revert the change that tried to improve the
> CI runs by combining the `Makefile` targets to build into a single
> `make` invocation. That might have sounded like a good idea at the time,
> but it does make it rather impossible for the CI script to determine
> whether the _build_ failed, or the _tests_. If the tests were run at
> all, that is.
>
> So let's go back to calling `make` for the build, and call `make test`
> separately so that we can easily detect that _that_ invocation failed,
> and react appropriately.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
> @@ -10,7 +10,7 @@ windows*) cmd //c mklink //j t\\.prove "$(cygpath -aw "$cache_dir/.prove")";;
> -export MAKE_TARGETS="all test"
> +run_tests=t
>
> case "$jobname" in
> linux-gcc)
> @@ -41,14 +41,18 @@ pedantic)
> # Don't run the tests; we only care about whether Git can be
> # built.
> export DEVOPTS=pedantic
> - export MAKE_TARGETS=all
> + run_tests=
> ;;
> esac
>
> # Any new "test" targets should not go after this "make", but should
> # adjust $MAKE_TARGETS. Otherwise compilation-only targets above will
> # start running tests.
> -make $MAKE_TARGETS
The comment talking about MAKE_TARGETS seems out of date now that
MAKE_TARGETS has been removed from this script.
> +make
> +if test -n "$run_tests"
> +then
> + make test
> +fi
> check_unignored_build_artifacts
This changes behavior, doesn't it? With the original "make all test",
if the `all` target failed, then the `test` target would not be
invoked. However, with the revised code, `make test` is invoked even
if `make all` fails. Is that behavior change significant? Do we care
about it?
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Eric,
On Mon, 24 Jan 2022, Eric Sunshine wrote:
> [...]
> > # Any new "test" targets should not go after this "make", but should
> > # adjust $MAKE_TARGETS. Otherwise compilation-only targets above will
> > # start running tests.
> > -make $MAKE_TARGETS
>
> The comment talking about MAKE_TARGETS seems out of date now that
> MAKE_TARGETS has been removed from this script.
Good catch!
> > +make
> > +if test -n "$run_tests"
> > +then
> > + make test
> > +fi
> > check_unignored_build_artifacts
>
> This changes behavior, doesn't it? With the original "make all test",
> if the `all` target failed, then the `test` target would not be
> invoked. However, with the revised code, `make test` is invoked even
> if `make all` fails. Is that behavior change significant? Do we care
> about it?
That is actually not the case. Compare to what 25715419bf4 (CI: don't run
"make test" twice in one job, 2021-11-23) did: it removed code that _also_
did not specifically prevent `make test` from running when `make all`
failed.
The clue to the riddle is this line in `ci/lib.sh`:
set -ex
The `-e` part lets the script fail whenever any command fails (unless it
is part of an `if`/`while` condition, or properly chained with `||`).
This line is actually touched by the "ci/run-build-and-tests: add some
structure to the GitHub workflow output" patch in this patch series, which
breaks it apart into the `set -e` and the `set -x` part (so that the
latter can be called later in GitHub workflows, to unclutter the output a
bit).
Ciao,
Dscho
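The `set -e` behavior described here can be sketched in a few lines of standalone shell (an illustration only, not Git's actual `ci/` code):

```shell
# Illustration of `set -e` (a sketch, not Git's ci/ scripts): once a
# command fails in a shell running under `set -e`, the shell aborts,
# so a `make test` placed after a failed `make` never runs.
sh -c '
  set -e
  echo "build step"
  false            # stands in for a failing `make`
  echo "test step" # never reached
'
echo "child exited with status $?"
```

Running this prints only "build step" before reporting the child's exit status 1: the `false` aborts the child shell before the second `echo`, just as a failed `make` aborts the CI script before `make test` is reached.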
Force-pushed from 992b157 to 9e6e49f.
Force-pushed from 9e6e49f to be2a83f.
This branch is now known as
This patch series was integrated into seen via git@d552efd.
This patch series was integrated into seen via git@82dd0cb.
This patch series was integrated into seen via git@e26e6d5.
This patch series was integrated into seen via git@7974a23.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Mon, 23 May 2022, Junio C Hamano wrote:
> [...] the test output got a lot shorter by discarding the "ok" output
> and keeping only the failures and skips. Because the readers are mostly
> interested in seeing failures (they can download the full log if
> they want to), and this design decision probably makes sense to me.
For the record, Victoria suggested to group by file rather than by failed
test case.
However, I do speak from a lot of experience diagnosing test failures in
CI/PR runs when I say: it is frequently very helpful to have a look at one
failed test case at a time. I'd much rather suffer a minor lag while
scrolling than having to find the boundaries manually, in particular when
`test_expect_failure` test cases are present (which are reported as
"broken" in the current iteration instead of "failed").
Besides, the scroll issue is probably similar between both approaches to
grouping (and may be independent of the grouping, as you pointed out by
reporting similar issues in the current `print-test-failures` step), and
is something I hope the Actions engineers are working on.
> Common to the both approaches, folding output from each test piece
> to one line (typically "ok" but sometimes "failed" heading) may be
> the source of UI responsiveness irritation I have been observing,
> but I wonder, with the removal of all "ok" pieces, it may make sense
> not to fold anything and instead give a flat "here are the traces of
> all failed and skipped tests".
As I mentioned above, I'd rather keep the grouping by failed test case.
Obviously, the ideal way to decide would be to set up some A/B testing
with real people, but I have no way to set up anything like that.
> In any case, either implementation seems to give us a good improvement
> over what is in 'master'.
There are two things I would like to add:
- In the current iteration's summary page, you will see the failed test
cases' titles in the errors, and they are clickable (and will get you to
the corresponding part of the logs). I find this very convenient.
- The addition of the suggestion to look at the run's artifacts for the
full logs might not look like a big deal, but I bet that it will help in
particular new contributors. This was yet another great suggestion by
Victoria.
Thanks,
Dscho
This branch is now known as
This patch series was integrated into seen via git@ecd5bba.
This patch series was integrated into seen via git@8907954.
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):
On Mon, May 23 2022, Johannes Schindelin wrote:
> Hi Ævar,
>
> On Mon, 23 May 2022, Ævar Arnfjörð Bjarmason wrote:
>
>> Re selling point & feature comparison: The point of the ab/* variant was
>> to re-roll Johannes's onto a "base" topic that made much of his
>> unnecessary, because the building up of features to emit GitHub markup
>> can be replaced by unrolling things like "make" and "make test" to the
>> top-level.
>>
>> That has its own UX benefits, e.g. you can see at a glance what command
>> was run and what the environment was, and "make" and "make test" are now
>> split up from one monolithic "build and test" step.
>>
>> But the primary intention was not to provide a prettier UX, but to show
>> that this arrangement made sense. I was hoping that Johannes would reply
>> with some variant of "ah, I see what you mean, that does make things
>> simpler!" and run with it, but alas...
>
> I believe that we share the goal to make the Git project more welcoming
> and easier to navigate for new contributors.
Yes, definitely.
> The patch series you wanted me to look at claims to make the CI/PR
> definitions/scripts simpler. As it matters more to contributors how to
> investigate test failures, i.e. what information they are provided about
> the failures, I disagree that that patch series needs to be connected to
> my patch series in any way.
Our two set of patches change different parts of the CI UX, so no. The
set of patches I've been proposing isn't just making CI/PR
definitions/scripts simpler, although it also does that.
So e.g. in your patches you need to massage the CI output to split the
"build" step from the "test" step. As you can see in an earlier RFC
re-roll of them on top of my topic that something you'd get for free:
https://lore.kernel.org/git/RFC-cover-v5-00.10-00000000000-20220421T183001Z-avarab@gmail.com/
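In workflow-YAML terms, the step-splitting under discussion amounts to something like this hypothetical fragment (not either series' actual workflow file; step names are made up):

```yaml
# Hypothetical fragment: with build and test as separate top-level
# steps, the web UI reports which of the two failed on its own,
# without any massaging of the combined step's output.
steps:
  - name: build
    run: make
  - name: test
    run: make test
```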
> Further, the result does not look like a simplification to me. For
> example, I consider it an absolute no-go to remove the remnants of Azure
> Pipelines support. As I had hinted, and as you saw on the git-security
> list, I require this support for embargoed releases. That’s what I did
> when working on the patches that made it into v2.35.2. In my book,
> removing such vital (if dormant) code is not a simplification, but a
> Chesterton’s Fence. While we do not need to use Azure Pipelines for our
> regular CI, we definitely need it for embargoed releases. “Simply revert
> it back” is not an excuse for removing something that should not be
> removed in the first place.
Can you please reply to this 3 month old and still-waiting-on-your-reply
E-Mail on this topic so we can figure out a way forward with this:
https://lore.kernel.org/git/220222.86y2236ndp.gmgdl@evledraar.gmail.com/
> As another example where I have a different concept of what constitutes
> “simple”: In Git for Windows’ fork, we carry a patch that integrates the
> `git-subtree` tests into the CI builds. This patch touches two places,
> `ci/run-build-and-tests.sh` and `ci/run-test-slice.sh`. These changes
> would be inherited by any CI definition that uses the scripts in `ci/`.
> With the proposed patches, there are four places to patch, and they are
> all limited to the GitHub workflow definition. Since you asked me for my
> assessment: this is de-DRYing the code, making it more cumbersome instead
> of simpler.
No, you'd still have two places to patch:
1. The top-level Makefile to have "make test" run those subtree tests
depending on some flag, i.e. the same as your
ci/run-build-and-tests.sh.
2. ci/run-test-slice.sh as before (which is only needed for the
Windows-specific tests).
Because we'd be having the Makefile drive the logic you could also run
such a "make test" locally, which is something we should have
anyway. E.g. when I build my own git I run the subtree tests, and would
like to eventually make "run contrib tests too" some configurable
option.
So it is exactly the DRY principle. By avoiding making things needlessly
CI-specific we can just control this behavior with flags, both in and
outside CI.
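The flag-driven arrangement Ævar describes could look something like this hypothetical Makefile fragment (the `TEST_SUBTREE` knob is made up for illustration; Git's real Makefile differs):

```make
# Hypothetical sketch, not Git's actual Makefile: an opt-in flag lets
# the same "make test" invocation also run the subtree tests, both
# locally and in CI, with no CI-specific scripting.
TEST_SUBTREE ?=

test: all
	$(MAKE) -C t all
	$(if $(TEST_SUBTREE),$(MAKE) -C contrib/subtree test)
```

With such a knob, a fork that wants the subtree tests in CI sets one variable rather than patching several places in the workflow definition.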
> In other words, I have fundamental objections about the approach and about
> tying it to the patches that improve the output of Git’s CI/PR runs.
I would too if after my series you needed to patch every place we run
"make test" or whatever to run your subtree tests, but as noted above
that's not the case. So hopefully this addresses that.
More generally: I noted a while ago that if you pointed out issues like
that I'd be happy to address them for you. Based on this I see
d08496f2c40 (ci: run `contrib/subtree` tests in CI builds, 2021-08-05),
and that would be easy to generalize.
>> In Chrome/Firefox the time to load the page (as in the spinner stops,
>> and we "focus" on the right content) is:
>>
>> JS: ~60s / ~80s
>> Æ: ~25s / ~18s
>
> My focus is on the experience of occasional and new contributors who need
> to investigate test failures in the CI/PR runs. In this thread, we already
> discussed the balance between speed of loading the page on the one hand
> and how well the reader is guided toward the relevant parts on the other
> hand.
First, your re-roll claims that it "improves the time to load pages",
but based on the sort of testing I'd done before when I reported the
severe slowness introduced by this topic I can't reproduce that.
So how exactly are you testing the performance of these load times, and
can you share the numbers you have for master, your previous iteration
and this re-roll?
> I disagree with you that the former should be prioritized over the
> latter, on the contrary, guiding the readers along a path to success is
> much more important than optimizing for a quick page load.
I think a better UX is certainly worth some cost to load times, so I'm
not trying to be difficult in saying that this costs us some
milliseconds so it's a no-go.
But really, this is making it so slow that it's borderline unusable.
The main way I use this interface is that I'll get an E-Mail with a
failure report, or see the "X" in the UX and click through to the
failure, then see the logs etc, and hopefully be able to see from that
what's wrong, or how I could begin to reproduce it.
Right now that's fast enough that I'll do that all in one browser
click-through session, but if I'm having to wait *more than a minute*
v.s. the current 10-20 seconds (which is already quite bad)?
Your latest series also seems to either be buggy (or trigger some bug in
GitHub Actions?) where even after that minute you'll see almost nothing
on your screen. So a user who doesn't know the UX would end up waiting
much longer than that.
You seemingly need to know that it's done when it shows you that blank
screen, and trigger a re-render by scrolling up or down, which will show
you your actual failures.
That's not an issue I saw in any iteration of this before this v3.
> Most contributors who chimed in seemed to not mind a longer page load time
> anyway, as long as the result would help them identify quickly what causes
> the test failures.
Wasn't much of that discussion a follow-up to your initial demos of this
topic?
I don't think those were as slow as what I'm pointing out above, which I
think is just because those failures happened to involve much fewer
lines of log. The slowness seems to be at correlated with how many lines
we're dealing with in total.
> Besides, the page load times are only likely to become
> better anyway, as GitHub engineers continuously improve Actions.
Sure, and if this were all magically made better by GH engineers these
concerns would be addressed.
But right now that isn't the case, and we don't know if/when that would
happen, so we need to review these proposed changes on the basis of how
they'd change the current GitHub CI UX overall.
@@ -0,0 +1,54 @@
# Library of functions to mark up test scripts' output suitable for
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):
On Sat, May 21 2022, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> [...]
> Co-authored-by: Victoria Dye <vdye@github.com>
Missing SOB here for Victoria.
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
This patch series was integrated into seen via git@d7d74b2.
This patch series was integrated into seen via git@d2ee9d5.
This patch series was integrated into seen via git@18bd8fd.
This patch series was integrated into seen via git@e494d64.
There was a status update in the "Cooking" section about the branch. Update the GitHub workflow support to make it quicker to get to the failing test. Will merge to 'next'? source: <pull.1117.v3.git.1653171536.gitgitgadget@gmail.com>
This patch series was integrated into seen via git@7cfbacf.
This patch series was integrated into seen via git@dd7f543.
This patch series was integrated into seen via git@b1eff10.
This patch series was integrated into seen via git@8edcf4a.
This patch series was integrated into next via git@bd37e9e.
This patch series was integrated into seen via git@0a2e6f6.
There was a status update in the "Cooking" section about the branch. Update the GitHub workflow support to make it quicker to get to the failing test. Will merge to 'master'. source: <pull.1117.v3.git.1653171536.gitgitgadget@gmail.com>
This patch series was integrated into seen via git@d86b507.
This patch series was integrated into seen via git@fe88afc.
This patch series was integrated into seen via git@d639d8f.
This patch series was integrated into seen via git@c56db02.
This patch series was integrated into seen via git@fc5a070.
This patch series was integrated into master via git@fc5a070.
This patch series was integrated into next via git@fc5a070.
Closed via fc5a070.
Changes since v2:
- Rebased onto main, due to the need to resolve merge conflicts with ab/test-tap-fix-for-immediate.
- I cannot thank Victoria enough for the thorough investigation; it was exactly what I had hoped for, and if I had not been pulled into too many directions at once, I would have incorporated her suggestions and provided a new iteration much earlier.
It might not be all bad that this iteration had to wait a little longer, though: In the meantime, the errors on the summary page are now deep-linked into the part of the logs where the corresponding error message was generated (just click on the job name above the error message).
Note: I tried to add another patch that would turn GCC's compile errors into GitHub workflow commands that would list the error messages on the summary page. However, that would have required piping the output of `make` through a `sed` call, which in turn would have required `set -o pipefail` (which is not supported by all the shells that are used in our CI). I even dabbled with using process substitution, but that made things even worse: the `sed` process would continue outputting after `make` was finished and after the `::endgroup::` command, meaning that the output was garbled. I'll probably continue investigating at some stage, but for now I'll call my time-boxed experiment a wash.

Changes since v1:
- Along with the `MAKE_TARGETS` variable, a stale comment about that variable is also removed.
- The comment about `set -x` has been adjusted because it no longer applies as-is.

Background
Using CI and in general making it easier for new contributors is an area I'm passionate about, and one I'd like to see improved.
The current situation
Let me walk you through the current experience when a PR build fails: I get a notification mail that only says that a certain job failed. There's no indication of which test failed (or was it the build?). I can click on a link and it takes me to the workflow run. Once there, all it says is "Process completed with exit code 1", or even "code 2". Sure, I can click on one of the failed jobs. It even expands the failed step's log (collapsing the other steps). And what do I see there?
Let's look at an example of a failed `linux-clang (ubuntu-latest)` job:

That's it. I count myself lucky not to be a new contributor being faced with something like this.
Now, since I have been active in the Git project for a couple of days or so, I can make sense of the "TODO passed" label and know that for the purpose of fixing the build failures, I need to ignore this, and that I need to focus on the "Failed test" part instead.
I also know that I do not have to get myself an ubuntu-latest box just to reproduce the error, I do not even have to check out the code and run it just to learn what that "49" means.
I know, and I do not expect any new contributor, not even most seasoned contributors to know, that I have to patiently collapse the "Run ci/run-build-and-tests.sh" job's log, and instead expand the "Run ci/print-test-failures.sh" job log (which did not fail and hence does not draw any attention to it).
I know, and again: I do not expect many others to know this, that I then have to click into the "Search logs" box (not the regular web browser's search via Ctrl+F!) and type in "not ok" to find the log of the failed test case (and this might still be a "known broken" one that is marked via `test_expect_failure` and once again needs to be ignored).

To be excessively clear: This is not a great experience!
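For context, the log being searched is the test suite's TAP output, where each failing test case appears as a "not ok" line; roughly like this (the numbers and titles are made up, with "49" echoing the bare number mentioned above):

```
ok 48 - some passing test case
not ok 49 - some failing test case
# failed 1 among 49 test(s)
```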
Improved output
Our previous Azure Pipelines-based CI builds had a much nicer UI, one that even showed flaky tests, and trends e.g. how long the test cases ran. When I ported Git's CI over to GitHub workflows (to make CI more accessible to new contributors), I knew full well that we would leave this very nice UI behind, and I had hoped that we would get something similar back via new, community-contributed GitHub Actions that can be used in GitHub workflows. However, most likely because we use a home-grown test framework implemented in opinionated POSIX shell scripts, that did not happen.
So I had a look at what standards exist e.g. when testing PowerShell modules, in the way of marking up their test output in GitHub workflows, and I was not disappointed: GitHub workflows support "grouping" of output lines, i.e. marking sections of the output as a group that is then collapsed by default and can be expanded. And it is this feature I've decided to use in this patch series, along with GitHub workflows' commands to display errors or notices that are also shown on the summary page of the workflow run. Now, in addition to "Process completed with exit code" on the summary page, we also read something like:
Even better, this message is a link, and following that, the reader is presented with something like this:
The "Failed test:" lines are colored in yellow to give a better visual clue about the logs' structure, the "Error:" label is colored in red to draw the attention to the important part of the log, and the "⏵" characters indicate that part of the log is collapsed and can be expanded by clicking on it.
To drill down, the reader merely needs to expand the test case's log by clicking on it, and then study the log. If needed (e.g. when the test case relies on side effects from previous test cases), the logs of preceding test cases can be expanded as well. In case the full log is needed, including the successful test cases, they are included in the artifacts that are attached to the CI/PR run.
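For reference, the grouping, error, and notice markup described above is produced with GitHub's workflow commands: plain lines written to standard output that the Actions runner interprets. A minimal sketch (the test title is a made-up placeholder):

```shell
# GitHub workflow commands are ordinary stdout lines; the Actions
# runner collapses ::group::...::endgroup:: spans and surfaces
# ::error:: lines on the run's summary page.
echo "::group::Failed test: t1234.5 some made-up test title"
echo "verbose trace of the failed test case goes here"
echo "::endgroup::"
echo "::error::failed: t1234.5 some made-up test title"
```

Outside of a GitHub-hosted runner these lines are simply printed literally, so the sketch is safe to run anywhere.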
Is this the best UI we can have for test failures in CI runs? I hope we can do better. Having said that, this patch series presents a pretty good start, and offers a basis for future improvements.
cc: Eric Sunshine sunshine@sunshineco.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Phillip Wood phillip.wood123@gmail.com
cc: Victoria Dye vdye@github.com