
Annotate pytest failures with GitHub Actions #1707

Merged: 10 commits into develop on Jun 25, 2020

Conversation

@huonw (Member) commented Jun 22, 2020

This replaces the JUnit test summary annotation used on Buildkite with annotations on the individual tests that failed. In particular, a custom script walks through the JUnit XML file to find any failures or errors, and adds a summary of each failure to the corresponding test via the ::error file=...::... workflow command (https://help.github.com/en/actions/reference/workflow-commands-for-github-actions#setting-an-error-message).
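As a rough, hedged sketch (not the exact script added in this PR; the path and message below are purely illustrative), emitting one of these annotations from Python looks like:

# Minimal sketch: print a GitHub Actions "error" workflow command for one
# failing test. Newlines (and % characters) in the message are escaped so
# the whole command stays on a single line.
def emit_error(path, line, message):
    escaped = message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    print(f"::error file={path},line={line}::{escaped}")

emit_error("tests/test_example.py", 42, "AssertionError: assert 1 == 2")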

For reference, a JUnit XML file emitted by pytest looks something like:

<?xml version="1.0" ?>
<testsuites>
	<testsuite errors="4" failures="18" hostname="fv-az28" name="pytest" skipped="12" tests="890" time="916.014" timestamp="2020-06-22T01:56:26.278020">
		<testcase classname="tests.test_aaa_on_gpu" name="test_on_gpu_when_requested" time="0.000">
			<error message="STELLARGRAPH_MUST_USE_GPU is not set to 1, so a GPU does not have to be used" type="pytest.skip">tests\test_aaa_on_gpu.py:26: STELLARGRAPH_MUST_USE_GPU is not set to 1, so a GPU does not have to be used</error>
		</testcase>
		<testcase classname="tests.test_calibration" name="test_temperature_scaling_bad_input_type" time="3.927"/>
		<testcase classname="tests.test_calibration" name="test_temperature_scaling_fit_predict" time="11.547">
			<system-out>Using Early Stopping based on performance evaluated on given validation set.
</system-out>
		</testcase>
[...]
		<testcase classname="tests.datasets.test_datasets" name="test_cora_load[True-True-True]" time="0.798">
			<failure message="AssertionError: assert dtype('int64') == int  +  where dtype('int64') = Int64Index([  31336, 1061127, 1106406,   13195,   37879, 1126012, 1107140,\n            1102850,   31349, 1106418,\n    ...227, 1131180, 1128974, 1128975, 1128977,\n            1128978,  117328,   24043],\n           dtype='int64', length=2485).dtype  +    where Int64Index([  31336, 1061127, 1106406,   13195,   37879, 1126012, 1107140,\n            1102850,   31349, 1106418,\n    ...227, 1131180, 1128974, 1128975, 1128977,\n            1128978,  117328,   24043],\n           dtype='int64', length=2485) = &lt;bound method StellarGraph.nodes of &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;&gt;()  +      where &lt;bound method StellarGraph.nodes of &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;&gt; = &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;.nodes">is_directed = True, largest_cc_only = True, subject_as_feature = True

    @pytest.mark.parametrize(&quot;is_directed&quot;, [False, True])
    @pytest.mark.parametrize(&quot;largest_cc_only&quot;, [False, True])
[...]    
&gt;       assert g.nodes().dtype == int
E       AssertionError: assert dtype('int64') == int
E        +  where dtype('int64') = Int64Index([  31336, 1061127, 1106406,   13195,   37879, 1126012, 1107140,\n            1102850,   31349, 1106418,\n    ...227, 1131180, 1128974, 1128975, 1128977,\n            1128978,  117328,   24043],\n           dtype='int64', length=2485).dtype
E        +    where Int64Index([  31336, 1061127, 1106406,   13195,   37879, 1126012, 1107140,\n            1102850,   31349, 1106418,\n    ...227, 1131180, 1128974, 1128975, 1128977,\n            1128978,  117328,   24043],\n           dtype='int64', length=2485) = &lt;bound method StellarGraph.nodes of &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;&gt;()
E        +      where &lt;bound method StellarGraph.nodes of &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;&gt; = &lt;stellargraph.core.graph.StellarDiGraph object at 0x000001EC160766D8&gt;.nodes

tests\datasets\test_datasets.py:194: AssertionError</failure>
		</testcase>
[...]
		<testcase classname="tests.utils.test_hyperbolic" name="test_poincare_ball_exp_specialisation" time="0.000">
			<error message="test setup failure">@pytest.fixture
    def seeded():
&gt;       seed = np.random.randint(2 ** 32)
[...]
&gt;   ???
E   ValueError: high is out of bounds for int32

_bounded_integers.pyx:1343: ValueError</error>
		</testcase>
[...]
	</testsuite>
</testsuites>

Note that <testcase> errors and failures are indicated by <failure message=...>...</failure> and <error message=...>...</error> child elements.

The JUnit files emitted by pytest don't include the test location at all, so we have to estimate it by searching the stack trace (the text contents of the <failure>/<error> element) for the first match of a <file>:<line>: pattern. This is generally StellarGraph code, and will usually be the call or assert within the test function that hit the error.
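A minimal sketch of that heuristic (an assumed shape, not necessarily the exact regex in the script):

import re

# Sketch: take the first "<file>:<line>:" match in the traceback text of a
# <failure>/<error> element. pytest on Windows emits backslash paths (e.g.
# tests\datasets\test_datasets.py), so normalise them to forward slashes.
LOCATION_RE = re.compile(r"(?m)^(?P<file>[^\n:]+?):(?P<line>\d+):")

def guess_location(traceback_text):
    match = LOCATION_RE.search(traceback_text or "")
    if match is None:
        return None
    return match.group("file").replace("\\", "/"), int(match.group("line"))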

This is an optional extra that only triggers when tests fail, so this heuristic/approximate approach is okay: the code detects failures and highlights them so that things can be fixed in future.

For instance, for https://github.com/stellargraph/stellargraph/pull/1707/checks?check_run_id=798011967 :

[screenshot: failure annotations shown on the checks page]

The annotations are also included for files that aren't modified. (Unfortunately, there seem to be more bugs here, because only a small fraction of the failure annotations are actually listed.) For instance, if a change to code breaks an existing test:

[screenshot: failure annotation on a file not modified by the PR]

(It's unclear to me why the cards are listed twice, even for a single OS/Python-version pair. I think it's a GitHub bug, because it's definitely only in the logs once: https://github.com/stellargraph/stellargraph/pull/1707/checks?check_run_id=798011967#step:8:31 )

Finally, as warned by Code Climate, the xml.etree.ElementTree module is apparently unsafe for untrusted XML data. This doesn't particularly concern us, because (a) the JUnit XML files are generated by us and so are trusted, and (b) an attack here wouldn't add anything new (anyone able to attack this part of the CI can already run arbitrary code in it).
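If untrusted input ever did become a concern, a hedged option, assuming the defusedxml package is available, would be to swap in its drop-in replacement parser:

# Sketch only: defusedxml.ElementTree mirrors the stdlib API but rejects
# entity-expansion and external-entity tricks; "junit.xml" is a placeholder path.
import defusedxml.ElementTree as ET

root = ET.parse("junit.xml").getroot()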

See: #1687

@codeclimate (bot) commented Jun 22, 2020

Code Climate has analyzed commit 03ae57e and detected 2 issues on this pull request.

Here's the issue category breakdown:

Security: 2

View more on Code Climate.

@huonw force-pushed the feature/junit-annotations branch from 6efa2fc to 1ad1fda on June 22, 2020 23:37
@huonw mentioned this pull request on Jun 23, 2020
@huonw marked this pull request as ready for review on June 23, 2020 05:37
@huonw requested a review from timpitman on June 23, 2020 05:38
@huonw (Member, Author) commented Jun 23, 2020

Only the changes to ci.yml and the new scripts/ci/junit_to_github_checks.py file are real. The example changes to gcn.py and test_graph.py will be reverted before landing.

@timpitman (Contributor) left a review comment:

This looks good given the lack of mature plugins for GitHub Actions and pytest. There are many duplicate annotations in the sample test failure; a future extension could be to group and filter duplicates.
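An illustrative, hedged sketch of that extension (not part of this PR): key the annotations by (file, line, message) and emit each key once.

# Illustrative sketch of the suggested grouping: drop exact-duplicate
# annotations before emitting the workflow commands.
def deduplicate(annotations):
    seen = set()
    for file, line, message in annotations:
        key = (file, line, message)
        if key not in seen:
            seen.add(key)
            yield file, line, message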

invalid = []
for testcase in root.findall(".//testcase"):
    for child in testcase:
        if child.tag in ("error", "failure"):
Contributor comment: does this skip annotating warnings?

@huonw (Member, Author) replied on Jun 24, 2020:
Yeah, it doesn't annotate them. Unfortunately, I don't think warnings are explicitly captured as a separate element associated with each test. For instance, the summary of the tags, their attributes, and the counts ([...]) in a random pytest JUnit file I have lying around is:

<testsuites> [1] 
  <testsuite> [1] errors, failures, hostname, name, skipped, tests, time, timestamp
    <testcase> [890] classname, name, time
      <skipped> [12] message, type
      <system-out> [138] 
      <failure> [18] message
      <system-err> [31] 
      <error> [4] message

Also, at the moment we end up with a lot of warnings from tests, so we'd have a lot of noise here.
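For reference, a summary like the one above can be computed with a short snippet along these lines (an assumed approach, with "junit.xml" as a placeholder path; not necessarily how that listing was generated):

from collections import Counter, defaultdict
import xml.etree.ElementTree as ET

# Sketch: count each tag and collect the attribute names seen on it, roughly
# reproducing the listing above.
counts = Counter()
attrs = defaultdict(set)
for element in ET.parse("junit.xml").getroot().iter():
    counts[element.tag] += 1
    attrs[element.tag].update(element.attrib)

for tag, count in counts.items():
    print(f"<{tag}> [{count}] {' '.join(sorted(attrs[tag]))}")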

@huonw (Member, Author) commented Jun 24, 2020

There are many duplicate annotations in the sample test failure - a future extension could be to group and filter duplicates

The duplicates are either different parameterisations of the test (which are vaguely useful to repeat?), or the apparent GitHub bug where a single annotation is shown more than once.

@huonw merged commit 7d49d92 into develop on Jun 25, 2020
@huonw deleted the feature/junit-annotations branch on June 25, 2020 05:11