
Annotate pytest failures with GitHub Actions #1707

Merged: 10 commits into develop on Jun 25, 2020

Conversation

@huonw (Member) commented Jun 22, 2020

This replaces the JUnit test summary annotation used on Buildkite with annotations on the individual tests that failed. In particular, a custom script walks through the JUnit XML file to find any failures or errors, and adds a summary of each failure to the corresponding test via the ::error file=...::... workflow command (https://help.github.com/en/actions/reference/workflow-commands-for-github-actions#setting-an-error-message).
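As a rough, hedged sketch (not the exact script added in this PR; the path and message below are purely illustrative), emitting one of these annotations from Python looks like:

# Minimal sketch: print a GitHub Actions "error" workflow command for one
# failing test. Newlines (and % characters) in the message are escaped so
# the whole command stays on a single line.
def emit_error(path, line, message):
    escaped = message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    print(f"::error file={path},line={line}::{escaped}")

emit_error("tests/test_example.py", 42, "AssertionError: assert 1 == 2")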

For reference, a JUnit XML file emitted by pytest looks something like:

<?xml version="1.0" ?>
<testsuites>
	<testsuite errors="4" failures="18" hostname="fv-az28" name="pytest" skipped="12" tests="890" time="916.014" timestamp="2020-06-22T01:56:26.278020">
		<testcase classname="tests.test_aaa_on_gpu" name="test_on_gpu_when_requested" time="0.000">
			<error message="STELLARGRAPH_MUST_USE_GPU is not set to 1, so a GPU does not have to be used" type="pytest.skip">tests\test_aaa_on_gpu.py:26: STELLARGRAPH_MUST_USE_GPU is not set to 1, so a GPU does not have to be used</error>
		</testcase>
		<testcase classname="tests.test_calibration" name="test_temperature_scaling_bad_input_type" time="3.927"/>
		<testcase classname="tests.test_calibration" name="test_temperature_scaling_fit_predict" time="11.547">
			<system-out>Using Early Stopping based on performance evaluated on given validation set.
</system-out>
		</testcase>
[...]
		<testcase classname="tests.datasets.test_datasets" name="test_cora_load[True-True-True]" time="0.798">
			<failure message="AssertionError: assert dtype('int64') == int  +  where dtype('int64') = Int64Index([  31336, 1061127, 1106406,   13195,   37879, 1126012, 1107140,\n            1102850,   31349, 1106418,\n    ...227, 1131180, 1128974, 1128975, 1128977,\n            1128978,  117328,   24043],\n           dtype='int64', length=2485).dtype  +    where Int64Index([  31336, 1061127, 1106406,   13195,   37879, 1126012, 1107140,\n            1102850,   31349, 1106418,\n    ...227, 1131180, 1128974, 1128975, 1128977,\n            1128978,  117328,   24043],\n           dtype='int64', length=2485) = &lt;bound method StellarGraph.nodes of &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;&gt;()  +      where &lt;bound method StellarGraph.nodes of &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;&gt; = &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;.nodes">is_directed = True, largest_cc_only = True, subject_as_feature = True

    @pytest.mark.parametrize(&quot;is_directed&quot;, [False, True])
    @pytest.mark.parametrize(&quot;largest_cc_only&quot;, [False, True])
[...]    
&gt;       assert g.nodes().dtype == int
E       AssertionError: assert dtype('int64') == int
E        +  where dtype('int64') = Int64Index([  31336, 1061127, 1106406,   13195,   37879, 1126012, 1107140,\n            1102850,   31349, 1106418,\n    ...227, 1131180, 1128974, 1128975, 1128977,\n            1128978,  117328,   24043],\n           dtype='int64', length=2485).dtype
E        +    where Int64Index([  31336, 1061127, 1106406,   13195,   37879, 1126012, 1107140,\n            1102850,   31349, 1106418,\n    ...227, 1131180, 1128974, 1128975, 1128977,\n            1128978,  117328,   24043],\n           dtype='int64', length=2485) = &lt;bound method StellarGraph.nodes of &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;&gt;()
E        +      where &lt;bound method StellarGraph.nodes of &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;&gt; = &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;.nodes

tests\datasets\test_datasets.py:194: AssertionError</failure>
		</testcase>
[...]
		<testcase classname="tests.utils.test_hyperbolic" name="test_poincare_ball_exp_specialisation" time="0.000">
			<error message="test setup failure">@pytest.fixture
    def seeded():
&gt;       seed = np.random.randint(2 ** 32)
[...]
&gt;   ???
E   ValueError: high is out of bounds for int32

_bounded_integers.pyx:1343: ValueError</error>
		</testcase>
[...]
	</testsuite>
</testsuites>

Note that <testcase> errors and failures are indicated by <failure message=...>...</failure> and <error message=...>...</error> child elements.

The JUnit files emitted by pytest don't include the test location at all, so we have to estimate it by searching the stack trace (the text contents of the <failure>/<error> element) for the first match of a <file>:<line>: pattern. This is generally StellarGraph code, and will usually be the call or assert within the test function that hit the error.
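A minimal sketch of that heuristic (an assumed shape, not necessarily the exact regex in the script):

import re

# Sketch: take the first "<file>:<line>:" match in the traceback text of a
# <failure>/<error> element. pytest on Windows emits backslash paths (e.g.
# tests\datasets\test_datasets.py), so normalise them to forward slashes.
LOCATION_RE = re.compile(r"(?m)^(?P<file>[^\n:]+?):(?P<line>\d+):")

def guess_location(traceback_text):
    match = LOCATION_RE.search(traceback_text or "")
    if match is None:
        return None
    return match.group("file").replace("\\", "/"), int(match.group("line"))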

This is an optional extra that only triggers when tests fail, so this heuristic/approximate approach is okay: the code detects failures and highlights them so that things can be fixed in future.

For instance, for https://github.com/stellargraph/stellargraph/pull/1707/checks?check_run_id=798011967 :

[screenshot: failure annotations shown on the checks page]

The annotations are also included for files that aren't modified. (Unfortunately, there seem to be more bugs here, because only a small fraction of the failure annotations are actually listed.) For instance, if a change to code breaks an existing test:

[screenshot: failure annotation on a file not modified by the PR]

(It's unclear to me why the cards are listed twice, even for a single OS/Python-version pair. I think it's a GitHub bug, because it's definitely only in the logs once: https://github.com/stellargraph/stellargraph/pull/1707/checks?check_run_id=798011967#step:8:31 )

Finally, as warned by Code Climate, the xml.etree.ElementTree module is apparently unsafe for untrusted XML data. This doesn't particularly concern us, because (a) the JUnit XML files are generated by us and so are trusted, and (b) an attack here wouldn't add anything new (anyone able to attack this part of the CI can already run arbitrary code in it).
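If untrusted input ever did become a concern, a hedged option, assuming the defusedxml package is available, would be to swap in its drop-in replacement parser:

# Sketch only: defusedxml.ElementTree mirrors the stdlib API but rejects
# entity-expansion and external-entity tricks; "junit.xml" is a placeholder path.
import defusedxml.ElementTree as ET

root = ET.parse("junit.xml").getroot()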

See: #1687

@codeclimate (bot) commented Jun 22, 2020

Code Climate has analyzed commit 03ae57e and detected 2 issues on this pull request.

Here's the issue category breakdown:

Security: 2

View more on Code Climate.

@huonw force-pushed the feature/junit-annotations branch from 6efa2fc to 1ad1fda on June 22, 2020 23:37
@huonw mentioned this pull request on Jun 23, 2020
@huonw marked this pull request as ready for review on June 23, 2020 05:37
@huonw requested a review from timpitman on June 23, 2020 05:38
@huonw (Member, Author) commented Jun 23, 2020

Only the changes to ci.yml and the new scripts/ci/junit_to_github_checks.py file are real. The example changes to gcn.py and test_graph.py will be reverted before landing.

@timpitman (Contributor) left a review comment:

This looks good given the lack of mature plugins for GitHub Actions and pytest. There are many duplicate annotations in the sample test failure; a future extension could be to group and filter duplicates.
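An illustrative, hedged sketch of that extension (not part of this PR): key the annotations by (file, line, message) and emit each key once.

# Illustrative sketch of the suggested grouping: drop exact-duplicate
# annotations before emitting the workflow commands.
def deduplicate(annotations):
    seen = set()
    for file, line, message in annotations:
        key = (file, line, message)
        if key not in seen:
            seen.add(key)
            yield file, line, message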

invalid = []
for testcase in root.findall(".//testcase"):
    for child in testcase:
        if child.tag in ("error", "failure"):
Contributor comment: does this skip annotating warnings?

@huonw (Member, Author) replied on Jun 24, 2020:
Yeah, it doesn't annotate them. Unfortunately, I don't think warnings are explicitly captured as a separate element associated with each test. For instance, the summary of the tags, their attributes, and the counts ([...]) in a random pytest JUnit file I have lying around is:

<testsuites> [1] 
  <testsuite> [1] errors, failures, hostname, name, skipped, tests, time, timestamp
    <testcase> [890] classname, name, time
      <skipped> [12] message, type
      <system-out> [138] 
      <failure> [18] message
      <system-err> [31] 
      <error> [4] message

Also, at the moment we end up with a lot of warnings from tests, so we'd have a lot of noise here.
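For reference, a summary like the one above can be computed with a short snippet along these lines (an assumed approach, with "junit.xml" as a placeholder path; not necessarily how that listing was generated):

from collections import Counter, defaultdict
import xml.etree.ElementTree as ET

# Sketch: count each tag and collect the attribute names seen on it, roughly
# reproducing the listing above.
counts = Counter()
attrs = defaultdict(set)
for element in ET.parse("junit.xml").getroot().iter():
    counts[element.tag] += 1
    attrs[element.tag].update(element.attrib)

for tag, count in counts.items():
    print(f"<{tag}> [{count}] {' '.join(sorted(attrs[tag]))}")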

@huonw (Member, Author) commented Jun 24, 2020

There are many duplicate annotations in the sample test failure - a future extension could be to group and filter duplicates

The duplicates are either different parameterisations of the test (which are vaguely useful to repeat?), or the apparent GitHub bug where a single annotation is shown more than once.

@huonw merged commit 7d49d92 into develop on Jun 25, 2020
@huonw deleted the feature/junit-annotations branch on June 25, 2020 05:11