Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various extensions and improvements for asserts #12528

Merged
merged 41 commits into from
Jan 25, 2022

Conversation

bernt-matthias
Copy link
Contributor

@bernt-matthias bernt-matthias commented Sep 22, 2021

Changes to text, size, tabular, archive, and xml assertions:

  • Implement n, delta, min, max for all of them (previously only delta in has_size). I hope that we can save lots of test data files, since this can replace <output compare="sim_size">.
    • for xml only the new generalizing xml_element assertion and has_n_elements_with_path support n, delta, min, max
    • added new Bytes type to xsd to allow for bytes suffixes for all these nubers (might be useful also at other places)
  • Implement negate attribute

Changes to has_n_columns

wanted to use a has_n_columns for this file https://github.com/galaxyproject/tools-iuc/blob/master/tools/stacks2/test-data/populations/populations.sumstats.tsv which is not useful due to the leading comment lines which have different numbers of columns. This PR adds a new attribute comment to the tag which allows to ignore comment lines for the assertion.

Furthermore:

  • only non-empty lines are considered (this can be changed)

xml assertions

  • add xml_element assertion which generalizes all other xml assertions
  • all xml assertions are now implemented by a call to assert_xml_element

Tests and xsd changes

  • adds unit tests for all assertions
  • correctly represent assertions in xsd which allows for better linting
    • also show schema doc for assertions not as table but analogous to what is done for param (more readable), each with a table of allowed attributes
  • annotate types (int and float) for assertion functions
  • add linting of asserts

More

  • fix a bug in element_text_is and attribute_is: they also allowed only a prefix matching

TODO (input needed):

  • create a pr with the assert tests from dev to check if functionality has changed (only those that already worked, checking for len(x) == 1 to allow for reformulation)
  • command, stdout and stderr assertions should maybe only allow text (possibly also tabular) assertions?
    • would break backward compatibility
  • assertions with sub-assertions make no sense if negate=True
    • should be covered by linter
  • would actually be cool to allow n, delta, min, max to allow for 500k, 50M, ...
  • the introduced type annotations in the assertion functions are kind of odd, since the functions still get str (because the parser does not parse them to int,...),
    • but they are helpful for the linter which now uses the annotated types
    • which brings me to the question if the linter needs to check the types because they are now also annotated in xsd: -> xsd covers this .. is this enough?
  • some of the text assertions check for None (i.e. this is currently not consistent) .. but I have no idea what it is good for
    assert output is not None, "Checking has_text assertion on empty output (None)"
    .. we should use it in all or none of the assertions.
    • does it make sense to have this assert not affected by negate?
    • should be reflected in the xsd docs
    • This is most likely because the has_archive_member and element_text passed None to contained assertions. This has been changed .. so not necessary anymore
  • add delta and min / max to all assertions having n/value
    • text, column, general
    • archive
    • xml
      • only for xml_element assertion and has_n_elements_with_path
  • implement negate
    • text, column, general
    • archive
    • xml

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

@github-actions github-actions bot added this to the 22.01 milestone Sep 22, 2021
@bernt-matthias bernt-matthias changed the title make has_n_columns check only non-empty and non-comment lines allow has_n_columns assertion to ignore comment lines Sep 22, 2021
@bernt-matthias bernt-matthias changed the title allow has_n_columns assertion to ignore comment lines Extend functionality of has_n_columns, has_size and has_n_lines assertions Sep 23, 2021
@jmchilton
Copy link
Member

Amazing!

@simonbray
Copy link
Member

I'm not sure I see the benefit of delta_frac for has_size and has_n_lines - I did consider implementing it at one point. Isn't it always possible to set delta to n/delta_frac, for whatever value of delta_frac you want? E.g. if n is 200 and you want delta_frac to be 10, you can set delta to 20.

For sim_size it is different because you do not necessarily know the value of n.

@bernt-matthias
Copy link
Contributor Author

I'm not sure I see the benefit of delta_frac for has_size and has_n_lines - I did consider implementing it at one point. Isn't it always possible to set delta to n/delta_frac, for whatever value of delta_frac you want? E.g. if n is 200 and you want delta_frac to be 10, you can set delta to 20.

For sim_size it is different because you do not necessarily know the value of n.

You have a point here .. but for sim_size we also know the size of the file in test-data, or?

Maybe we can leave it because it makes it a bit easier for tool developers to switch from sim_size to assertions.

@bernt-matthias
Copy link
Contributor Author

While we are at it we can also think about other useful extensions. Just thought that asserting a min/max number of lines/size/columns might be nice. But these could be separate new assertions as well.

@bernt-matthias
Copy link
Contributor Author

Ping @jmchilton since you wrote this TODO

# TODO: Verify all needed function arguments are specified.
I would like to ask if having a linter now for the asserts is sufficient, or if I should implement something along the lines of
assert_function_sig = signature(asserts.assertion_functions[assert_function_name])

@bernt-matthias bernt-matthias changed the title Extend functionality of has_n_columns, has_size and has_n_lines assertions Extend functionality of has_n_columns, has_size and has_n_lines assertions; and other improvements for asserts Oct 20, 2021
@bernt-matthias bernt-matthias changed the title Extend functionality of has_n_columns, has_size and has_n_lines assertions; and other improvements for asserts Extend functionality of has_n_columns, has_size and has_n_lines assertions; add linting for assertions and other improvements for asserts Oct 21, 2021
@bernt-matthias bernt-matthias force-pushed the topic/n-col-assert branch 9 times, most recently from f52a798 to 44ba8d0 Compare December 22, 2021 10:04
@bernt-matthias
Copy link
Contributor Author

@simonbray I removed delta_frac for now. But we it should be easy to readd.

For the sim_size test my rationale was: for test data of different sizes

  • the lazy tool developer can copy paste assertions and use the same delta_frac unchanged
  • even better it can be used in macros

.. delta would need to change with the size of the test data.

For the assertions the argumentation does not work that well. Because the expected size n/value needs to be set in the tool xml as well.

@simonbray
Copy link
Member

I removed delta_frac for now. But we it should be easy to readd.

I guess it is no problem to add it, if it is more intuitive to use than the way I suggested.

@bernt-matthias bernt-matthias force-pushed the topic/n-col-assert branch 2 times, most recently from 5f25d9e to fbe06fb Compare December 24, 2021 09:40
Copy link
Member

@mvdbeek mvdbeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this is ready to merge @bernt-matthias ?

@bernt-matthias
Copy link
Contributor Author

@mvdbeek .. nearly. I still need to execute my first TODO, i.e. check if I did not change functionality of the assertions: by creating a backport PR of the new linter tests (minus those that check new arguments)

Guess I can do this today and will ping you. Would also be handy for the other linter PR (guess this needs a bit of merging).

@@ -0,0 +1,1012 @@
import os
import shutil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing!

bernt-matthias added a commit to bernt-matthias/galaxy that referenced this pull request Jan 25, 2022
@bernt-matthias
Copy link
Contributor Author

I opened a PR applying most of the new tests on the old assertion code #13241

There a few tests failing. In my opinion all of these cases are bugs of the old code .. but other people might think its change of functionality. So here are the cases and my explanation

element_text_is failure

The text of the ./elem/more tags is BAR and BAZ. So the
following should fail:

<assert_contents>
	<element_text_is path="./elem/more" text="BA"/>
</assert_contents>

It did not because

assert_element_text_is(output, path, text) just called
assert_element_text_matches(output, path, re.escape(text))
and the matching function allows for prefixes.

Now re.escape(text) + '$' is used which fixes the bug.

element_text failure

<element_text path="./absent"> was successful for the test XML
even though ./absent is not present in it.
This worked because the assertion function just applies the
contained assertions (here there is none) to the text.

I think in the old code this would also work if there
would be certain subassertions (exactly those that check
for None at first), i.e. this would be inconsistent.

We can easily change the new implementation to also pass
on non-existent paths (and not to check subassertions). But
I think its more convenient since otherwise the tool dev
needs to add an assertion making sure that path exists
to make this useful.

has_archive_member non-archive

Applying a has_archive_member assertion on a non-archive
file resulted in a UnboundLocalError, because the contents
variable did not exist if unzipping & untaring failed.

Now the assertion just fails.

has_archive_member zip absent member

Applying a has_archive_member using a path that is not in
the archive failed with an AttributeError because
the contents variable was None if unzipping and untarring
of the member failed (so contents.read() does not work).

has_archive_member tar absent member

Same as above

has_archive_member tar non-file member

Here the used path matches only for a directory in the zipfile.
The old code failed with the same error as the previous two cases,
i.e. AttributeError due to contents.read().

This is due to an inconsistency how zip and tar handle
directories. In zip they return the empty string,
in the new implementation of the assertion this is now
emulated for tar as well.

has_archive_member zip with content assertion

If there are multiple members matching the path (note that
the path is an regular expression) then the assertion
did check only the first returned.

Since the order of the matches was undefined this easily
fails, like here.

In the new implementation the matching members are sorted.
Also there is a new attribute all which allows to
check the subassertions on all matching members.

@mvdbeek
Copy link
Member

mvdbeek commented Jan 25, 2022

Sooo, we merge ? :shipit:

@bernt-matthias
Copy link
Contributor Author

Fine with merging from my side.

@dannon dannon merged commit 2547da5 into galaxyproject:dev Jan 25, 2022
@github-actions
Copy link

This PR was merged without a "kind/" label, please correct.

@bernt-matthias bernt-matthias deleted the topic/n-col-assert branch January 25, 2022 14:19
bernt-matthias added a commit to bernt-matthias/galaxy that referenced this pull request Jan 26, 2022
since the allow extensions: (K|M|G|T|P|E)i

follow up on galaxyproject#12528
bernt-matthias added a commit to bernt-matthias/galaxy that referenced this pull request Jan 28, 2022
with the added typing -- in particular Optional[int] (ie Union[int, None]) --
the tests in the assertions would have gone much more complicated

since galaxyproject#12528 they are also covered
by the xsd and therefor not really needed
bernt-matthias added a commit to bernt-matthias/galaxy that referenced this pull request Feb 1, 2022
since the allow extensions: (K|M|G|T|P|E)i

follow up on galaxyproject#12528
bernt-matthias added a commit to bernt-matthias/galaxy that referenced this pull request Feb 1, 2022
with the added typing -- in particular Optional[int] (ie Union[int, None]) --
the tests in the assertions would have gone much more complicated

since galaxyproject#12528 they are also covered
by the xsd and therefor not really needed
bernt-matthias added a commit to bernt-matthias/planemo that referenced this pull request Aug 23, 2022
bernt-matthias added a commit to bernt-matthias/planemo that referenced this pull request Aug 23, 2022
bernt-matthias added a commit to bernt-matthias/planemo that referenced this pull request Aug 23, 2022
bernt-matthias added a commit to bernt-matthias/galaxy that referenced this pull request Mar 14, 2023
bernt-matthias added a commit to bernt-matthias/galaxy that referenced this pull request Jun 16, 2023
forgotten here galaxyproject#12528
but I guess we do not need to backport further since it only affects
the tool linting xsd check
bernt-matthias added a commit to bernt-matthias/galaxy that referenced this pull request Jun 16, 2023
forgotten here galaxyproject#12528
but I guess we do not need to backport further since it only affects
the tool linting xsd check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants