subtests #61201
subtests are a light alternative to parameterized tests as in bpo-7897. They don't generate the tests for you; they simply allow you to partition a given test case into several logical units. Meaning, when a subtest fails, the other subtests in the test will still run (and the failures will print their respective parameters). Concretely, running the following example:

class MyTest(unittest.TestCase):

    def test_b(self):
        """some test"""
        for i in range(2, 5):
            for j in range(0, 3):
                with self.subTest(i=i, j=j):
                    self.assertNotEqual(i % 3, j)

will give the following output:

======================================================================
Traceback (most recent call last):
  File "subtests.py", line 11, in test_b
    self.assertNotEqual(i % 3, j)
AssertionError: 2 == 2
======================================================================
Traceback (most recent call last):
  File "subtests.py", line 11, in test_b
    self.assertNotEqual(i % 3, j)
AssertionError: 0 == 0
======================================================================
Traceback (most recent call last):
  File "subtests.py", line 11, in test_b
    self.assertNotEqual(i % 3, j)
AssertionError: 1 == 1 |
Attaching patch. |
+1. I was going to suggest something similar for displaying the

for arg, expected in [(...), ...]:
    with self.somegoodname(msg="arg=%s" % arg):
        self.assertEqual(foo(arg), expected)

But your idea is even better. |
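For reference, the subTest API that eventually landed covers this msg-based idea directly: its first argument is an optional message and any keyword arguments are reported as parameters. A minimal sketch assuming the released unittest.TestCase.subTest(msg, **params) signature, with a hypothetical foo() and made-up test data:

import unittest


def foo(x):
    # Hypothetical function under test, for illustration only.
    return x * 2


class FooTest(unittest.TestCase):
    def test_foo(self):
        # The last case is deliberately wrong to show the failure report.
        for arg, expected in [(1, 2), (2, 4), (3, 7)]:
            # Both the message and the keyword parameters are attached to
            # any failure reported from inside the block.
            with self.subTest("checking foo", arg=arg, expected=expected):
                self.assertEqual(foo(arg), expected)


if __name__ == "__main__":
    unittest.main()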
Since I was asked on IRC, an example of converting an existing test. It's quite trivial really:

diff --git a/Lib/test/test_codecs.py b/Lib/test/test_codecs.py
--- a/Lib/test/test_codecs.py
+++ b/Lib/test/test_codecs.py
@@ -630,9 +630,10 @@ class UTF16BETest(ReadTest, unittest.Tes
             (b'\xdc\x00\x00A', '\ufffdA'),
         ]
         for raw, expected in tests:
-            self.assertRaises(UnicodeDecodeError, codecs.utf_16_be_decode,
-                              raw, 'strict', True)
-            self.assertEqual(raw.decode('utf-16be', 'replace'), expected)
+            with self.subTest(raw=raw, expected=expected):
+                self.assertRaises(UnicodeDecodeError, codecs.utf_16_be_decode,
+                                  raw, 'strict', True)
+                self.assertEqual(raw.decode('utf-16be', 'replace'), expected)

     def test_nonbmp(self):
         self.assertEqual("\U00010203".encode(self.encoding), |
I like the idea, and I think this would be a useful addition to unittest. OTOH while this would be applicable to most of the tests (almost every test has a "for" loop to check valid/invalid values, or a few related "subtests" in the same test method), I'm not sure I would use it too often. |
This looks very nice. For cases where you decide you don't want it, some |
Updated patch makes subtests play nice with unittest's failfast flag: |
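For context, failfast can be enabled with the -f command-line option or the failfast argument to unittest.main(); the point of the updated patch is that a failing subtest should honour it too. A small sketch of how that interaction is typically exercised (the early-stop behaviour is an assumption about the patch, not something demonstrated here):

import unittest


class FailFastDemo(unittest.TestCase):
    def test_many(self):
        for i in range(10):
            with self.subTest(i=i):
                self.assertLess(i, 3)  # starts failing at i == 3


if __name__ == "__main__":
    # With failfast enabled, the run should stop at the first failing
    # subtest instead of reporting every remaining failure.
    unittest.main(failfast=True)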
Nice/elegant idea. A couple comments:

(1) What will be the semantics of TestCase/subtest failures? Currently, it looks like each subtest failure registers as an additional failure, meaning that the number of test failures can exceed the number of test cases. For example (with a single test case with 2 subtests):

$ ./python.exe test_subtest.py
FF
======================================================================
Traceback (most recent call last):
  File "test_subtest.py", line 9, in test
    self.assertEqual(0, 1)
AssertionError: 0 != 1
======================================================================
Traceback (most recent call last):
  File "test_subtest.py", line 9, in test
    self.assertEqual(0, 1)
AssertionError: 0 != 1

Ran 1 test in 0.001s

FAILED (failures=2)

With the way I understand it, it seems like a subtest failure should register as a failure of the TestCase as a whole, unless the subtests should be enumerated and considered tests in their own right (in which case the total test count should reflect this).

(2) Related to (1), it doesn't seem like decorators like expectedFailure are being handled correctly. For example:

@unittest.expectedFailure
def test(self):
    for i in range(2):
        with self.subTest(i=i):
            self.assertEqual(i, 0)

This results in:

In other words, it seems like the decorator is being applied to each subtest as opposed to the test case as a whole (though actually, I think the first should read "expected failures=1"). It seems like one subtest failing should qualify as an expected failure, or are the semantics such that expectedFailure means that every subtest must fail? |
On Saturday 19 January 2013 at 00:33 +0000, Chris Jerdonek wrote:

It does register as a failure of the TestCase as a whole. Simply, a test
Perhaps unittest should be made to show better reporting, e.g. show the

I don't know. I never use expectedFailure. There doesn't seem to be any |
Either way, something isn't right about how it's integrated now. With the patch, this:

@unittest.expectedFailure
def test(self):
    with self.subTest():
        self.assertEqual(0, 1)

gives:

FAILED (failures=1, unexpected successes=1)

And this:

@unittest.expectedFailure
def test(self):
    with self.subTest():
        self.assertEqual(0, 0)

gives:

OK (unexpected successes=1)

But without the patch, this:

@unittest.expectedFailure
def test(self):
    self.assertEqual(0, 1)

gives:

OK (expected failures=1) |
I think we're going to have to separate out two counts in the metrics - the total number of tests (the current counts), and the total number of subtests (the executed subtest blocks). (Other parameterisation solutions can then choose whether to treat each pair of parameters as a distinct test case or as a subtest - historical solutions would appear as distinct test cases, while new approaches might choose to use the subtest machinery). The aggregation of subtest results to test case results would then be that the test case fails if either:
The interpretation of "expected failure" in a world with subtests is then clear: as long as at least one subtest or assertion fails, the decorator is satisfied that the expected test case failure has occurred. |
The way expectedFailure is currently implemented (it's a decorator which knows nothing about test cases and test results, it only expects an exception to be raised by its callee), it's gonna be difficult to make it participate with subtests without breaking compatibility. (that said, I also think it's a useless feature) |
This is a reasonable proposal. On the other hand, it was already the |
New patch attached:
|
After thinking about this more, it seems this lets you do two orthogonal things:
Both of these seem independently useful and more generally applicable, so it seems worth discussing them in isolation before exposing them only together. Maybe there is a nice API or combination of APIs that lets you do one or the other or both.

Also, for (1) above, I'm wondering about the choice to put the extra data in the id/shortDescription of a pseudo-TestCase instead of just adding it to the exception message, for example. Adding it to the message seems more consistent with unittest's current API. Including the info in a new name/id seems potentially to be misusing the concept of TestCase, because the test names created from this API need not be unique, and the resulting tests are not addressable/runnable.

Incidentally, I noticed that the runnability of TestCases was removed from the documentation in an unreviewed change shortly after the last patch was posted:

-An instance of a :class:`TestCase`\ -derived class is an object that can

(from http://hg.python.org/cpython/rev/d1e6a48dfb0d#l1.111 )

whereas subtest TestCases from the last patch are not runnable:

+class _SubTest(TestCase):

A way around these issues would be to pass the original, runnable TestCase object to TestResult.errors, etc. instead of a pseudo-TestCase. Alternatively, subtests could be made independently addressable and runnable, but that route seems more challenging and less certain. |
I don't understand what you mean. 1 is pointless without 2 (if you let the exception bubble up, unittest already deals with it). 2 without 1 doesn't make much sense either (if you only want to silence an exception, a simple try...except will suffice).
If e.g. someone lists all failed tests without detailing the exception messages, it's better if subtests are disambiguated in that list. Also, changing the exception message might be confusing, as readers expect the displayed message to be exactly str(exc).
runTest() is still mentioned in the TestCase constructor: but indeed it was removed from the module overview, because it doesn't
I would say impossible, unless you know a way to run a |
Right, if you want independently addressable/runnable, then you're back to parameterised tests as discussed in bpo-7897. What I like about Antoine's subtest idea is that I think it can be used to split the execution/reporting part of parameterised testing from the addressing/selection part. That is, while *this* patch doesn't make subtests addressable, a future enhancement or third party test runner could likely do so. (It wouldn't work with arbitrary subtests, it would only be for data driven variants like those described in bpo-7897) |
I'll explain.

(1) is useful without (2) because it lets you add information to the failure data for a group of asserts without having to use the "msg" parameter in every call to assert(). This is useful, for example, if you're testing a number of cases in a loop (with the current behavior of ending the test on first failure), and it's not clear from the default exception message which iteration of the loop failed. Your original example is such a case (minus the part about continuing in case of failure). This use case is basically the one addressed by Serhiy's suggestion in this message: http://bugs.python.org/issue16997#msg180225

(2) is useful without (1) if you'd like to get information about more than one assertion failure in a TestCase (just as in your proposal), but the assertions aren't necessarily coming from a "parametrization" or different iterations of a loop. With the proposed API, you'd do something like:

with self.subTest():
    # First assertion
    ...
with self.subTest():
    # Second assertion
    ...

The difference here is that I wouldn't call these "subtests," and you don't need parameter data to know which assertion is at fault. The meaning here is more like "with self.continueTesting()".
If the API is more like self.assert*()'s "msg" parameter which appends data to the usual exception, then it will be the same as what people are already used to. Also see the "longMessage" attribute (which defaults to True), which separates the extra message data from the default exception message: http://docs.python.org/dev/library/unittest.html#unittest.TestCase.longMessage
I'm not advocating independent addressability/runnability of subtests or the following approach, but a naive way to do this would be to run the TestCase as usual, but skip over any subTest blocks if the parameter data isn't an exact match. |
It might be a good idea to allow both this and the arbitrary parameter kwargs, then.
Well, I still don't know how to skip a |
I like the idea of the subTest API being something like:

def subTest(self, _id, **params):
    ...

However, I'd still factor that into the reported test ID, not into the exception message. |
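The signature that was eventually committed is close to this, but with an optional message rather than a required _id, and the parameters are indeed folded into the reported test description rather than the exception message. A small sketch of inspecting that programmatically (the exact description format shown in the comment is an assumption):

import unittest


class IdDemo(unittest.TestCase):
    def test_even(self):
        for i in range(3):
            with self.subTest(i=i):
                self.assertEqual(i % 2, 0)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(IdDemo)
    result = unittest.TestResult()
    suite.run(result)
    for failed, _tb in result.failures:
        # Prints something like "test_even (__main__.IdDemo) (i=1)";
        # the parameters live in the test description, not the message.
        print(failed)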
I am concerned that this feature changes the TestResult API in a backwards incompatible way. There are (quite a few) custom TestResult objects that just implement the API and don't inherit from TestResult. I'd like to try this new code with (for example) the test result provided by the junitxml project and see what happens.

I have a broader concern too. I have seen lots of requests for "test parameterization" and none *specifically* for sub tests. They are very similar mechanisms (with key differences), so I don't think we want *both* in unittest. If test parameterization is more powerful / flexible then I would rather see that *instead*. On the other hand if subtests are *better* then let's have them instead - but I'd like to see that discussion happen before we just jump into subtests. |
Note, some brief discussion on the "testing in python" mailing list: http://lists.idyll.org/pipermail/testing-in-python/2013-January/005356.html |
Feel free to do it, of course :-) (and I'm not sure how it's backwards incompatible: as long as you don't
The underlying idea of subtests is "if you want to parameterize a test, |
My suggestion to add the original TestCase object to TestResult.errors, etc. instead and add the extra failure data to the longDescription would address this concern, which is why I suggested it. The former is what currently happens with multiple failures per TestCase (e.g. in runTest() and tearDown()).
The current API doesn't seem like a good building block because it bundles orthogonal features (i.e. to add loop failure data to a block of asserts you have to use the continuance feature). Why not expose *those* as the building blocks? The API can be something like:

with self.addMessage(msg):
    # Add msg to the longDescription of any assertion failure within.
    ...

with self.continueTest(msg=''):
    # Keep running the TestCase on any assertion failure within.
    ...

(The current subTest() is basically equivalent to continueTest() with a specialized message. It could be added, too, if desired.) Accepting a string message is more basic and flexible than allowing only a **kwargs dict, which seems a bit "cute" and specialized to me. |
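Neither addMessage() nor continueTest() was ever added to unittest; the names are proposals from this discussion. Purely as an illustration of the proposed building blocks, here is a rough, self-contained sketch in which collected failures are re-raised as a single aggregated failure at the end of the test rather than reported individually:

import contextlib
import traceback
import unittest


class ContinuingTestCase(unittest.TestCase):
    """Illustrative sketch only: continueTest()/addMessage() are proposed
    names from this discussion, not real unittest APIs."""

    def setUp(self):
        self._collected_failures = []

    @contextlib.contextmanager
    def continueTest(self, msg=''):
        # Swallow an assertion failure inside the block so the test method
        # keeps going; everything is re-reported at the end of the test.
        try:
            yield
        except self.failureException:
            self._collected_failures.append((msg, traceback.format_exc()))

    @contextlib.contextmanager
    def addMessage(self, msg):
        # Prefix msg onto any assertion failure raised inside the block.
        try:
            yield
        except self.failureException as exc:
            raise self.failureException('%s: %s' % (msg, exc)) from None

    def tearDown(self):
        if self._collected_failures:
            details = '\n'.join('%s\n%s' % pair
                                for pair in self._collected_failures)
            self.fail('%d failure(s) collected:\n%s'
                      % (len(self._collected_failures), details))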
I've already replied to all this. |
You didn't respond to the idea of exposing both features separately after saying you didn't understand what I meant and saying that they were pointless and didn't make sense. So I explained and also proposed a specific API to make the suggestion clearer and more concrete. These don't seem pointless to me at all. |
Well, suffice to say that I wasn't convinced at all. There are multiple

I'm making this proposal to solve a concrete issue, not in the interest |
Right. I have *heaps* of tests that would be very easy to migrate to |
"However, I think you're making a mistaking by seeing them as Parameterized tests are done at test collection time and sub-tests at run time. You can't layer parameterization on top of subtests. |
It depends how you define countTests(). sub-tests, as the name implies, |
A comment from lifeless on IRC (Robert Collins):

[12:15:46] <lifeless> please consider automated analysis. How can someone tell which test actually failed? |
My concern is that this re-uses the existing TestResult.add* methods in a different way (including calling addError multiple times). This can break existing tools.

Fix suggested by lifeless on IRC. A sub-test failure / success / exception calls the following TestResult method:

addSubTest(test, sub_id, err_or_None)

If we're using a TestResult object that doesn't have this method (an "old" result object) then the test can *stop* (i.e. revert to the old behaviour before sub-tests existed). |
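This addSubTest() hook is essentially what ended up on TestResult (receiving the subtest object itself rather than a bare id). A minimal sketch of a result class implementing it while still delegating to the default aggregation; the released signature addSubTest(test, subtest, err), with err being None on success or an exc_info tuple otherwise, is stated to the best of my knowledge:

import unittest


class Demo(unittest.TestCase):
    def test_values(self):
        for i in range(3):
            with self.subTest(i=i):
                self.assertNotEqual(i, 1)


class SubTestAwareResult(unittest.TextTestResult):
    def addSubTest(self, test, subtest, err):
        # err is None for a successful subtest, otherwise an exc_info tuple.
        if err is not None:
            self.stream.writeln('subtest failed: %s' % subtest.id())
        # Delegate so the failure/error lists are still populated as usual.
        super().addSubTest(test, subtest, err)


if __name__ == "__main__":
    runner = unittest.TextTestRunner(resultclass=SubTestAwareResult)
    unittest.main(testRunner=runner)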
And on the "superior implementation strategy", both nose and py.test used to have runtime test generation and both have deprecated them and moved to collection time parameterization. (But I guess we know better.) You don't need PEP-422 for parameterization. The TestLoader creates multiple test cases for the parameter sets. |
I don't really have strong feelings about this, but I will just note as a data point that I implemented parameterized tests for the email package, and have no interest myself in subtests. This is for exactly the collection time vs runtime reason that Michael is talking about (I always want to be able to run single tests by name). |
Here is a patch implementing Michael's and lifeless' proposed strategy. |
I'm still opposed to exposing these features only together. Annotating the failure message with parametrization data is useful in its own right, but what if there are 500 subtests in a loop and you don't want 500 failures to be registered for that test case? This is related to Ezio's comment near the top about adding too much noise. addMessage was just one suggestion. A different, functionally equivalent suggestion would be to add a "failFast" (default: False) keyword parameter to subTest() or alternatively a "maxFailures" (default: None) keyword parameter. |
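A maxFailures/failFast argument on subTest() was never added, so the following is only a hypothetical illustration of the idea: the cap is emulated by hand, counting failures inside the loop and bailing out once the limit is reached.

import unittest


class CappedSubtestDemo(unittest.TestCase):
    def test_many_cases(self):
        max_failures = 3  # hypothetical cap, enforced manually
        failures = 0
        for i in range(500):
            with self.subTest(i=i):
                try:
                    self.assertEqual(i % 7, 0)
                except self.failureException:
                    failures += 1
                    raise  # still recorded as a subtest failure
            if failures >= max_failures:
                # Stop registering further failures for this test case.
                break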
It seems like the last patch (subtests5.patch) dropped the parameter data from the failure output as described in the docs. For example, the example in the docs yields the following:
instead of the documented:
subtests4.patch is okay. |
Weird, since there are unit tests for that. I'll take a look. |
On Sun, Feb 10, 2013 at 12:41 PM, Antoine Pitrou <report@bugs.python.org> wrote:
Parametrized testing wasn't introduced because I or others like formal

That being said, I have plans to support some form of "subtests" for pytest

best, |
On Sun, Feb 10, 2013 at 12:43 PM, Nick Coghlan <report@bugs.python.org> wrote:

I doubt you can implement parametrized tests via subtests especially for

The standard library can also

I can see that and for this reason and the stdlib's use cases it might make

best, |
Parameterized tests have the same issue. In this case you simply don't use subtests
It is quite orthogonal, actually, and could be added independently. Also, it is not clear how you would limit the addMessage to a subtest, rather than the whole test case. http://testtools.readthedocs.org/en/latest/for-test-authors.html#details |
Right, but then you lose out on both of the benefits documented for subtests:

+Without using a subtest, execution would stop after the first failure,

Why tie these together? I'm suggesting that we expose a way to benefit from the "easy to diagnose" portion without the "suspend stoppage" portion. (The way we do this doesn't have to be one of the ways I'm suggesting, though I've suggested a few.)
It's not orthogonal because, the way I suggested it, subTest() would be the composition of the addMessage() and continueTest() context managers. (addMessage limits itself by being a context manager just like subTest.) So if we added addMessage() later, we would want to refactor subTest() to be using it. The equivalence would be something like the following:

with self.subTest(msg=msg, i=i):
    self.assertEqual(i % 2, 0)

with self.continueTest():
    with self.addMessage(msg, i=i):
        self.assertEqual(i % 2, 0)

However, since it looks like we're going with changing the test case ID instead of putting the extra data only in the exception message (TestCase.longMessage) like I was suggesting before, I think adding a failFast=True or maxFailures=n parameter to subTest() has higher importance, e.g.:

with self.subTest(msg=msg, maxFailures=1, i=i):
    self.assertEqual(i % 2, 0) |
googletest (an xUnit-style C++ test framework) has an interesting feature: in addition to assertions like ASSERT_EQ(x, y) that stop the test, it has EXPECT_EQ(x, y) etc. that will cause the test to fail without causing it to stop. I think this decoupling of “test failed” and “test execution stopped” is very useful. (Note this also implies a single test can have multiple failures, or if you prefer that a single test can have multiple messages attached to explain why its state is 'FAILED'.)

I wouldn't like to see a duplication of all assert* methods as expect* methods, but there are alternatives. A single self.expectThat method that takes a value and a matcher, for instance. Or you could have a context manager:

with self.continueOnFailure():
    self.assertEqual(x, y)

In fact, I suppose that's more-or-less what the subtests patch offers? Except the subtests feature seems to want to get involved in knowing about parameters and the like too, which feels weird to me. Basically, I really don't like the “subtests” name, but if instead it's named something that directly says its only effect is that failures don't abort the test, then I'd be happy. |
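continueOnFailure() and expectThat() above are hypothetical names; a parameter-free subTest block is the closest thing the patch offers to googletest's EXPECT_* behaviour, decoupling "this check failed" from "stop the test". A small sketch under that reading, with made-up data:

import unittest


class ExpectStyleDemo(unittest.TestCase):
    def test_shape_and_content(self):
        data = {'name': 'widget', 'count': 3}  # made-up data
        # Each block acts like an EXPECT_*: a failure is recorded, but the
        # remaining checks still run.
        with self.subTest('has a name'):
            self.assertIn('name', data)
        with self.subTest('count is positive'):
            self.assertGreater(data['count'], 0)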
My day job these days is to work on the Beaker test system (http://beaker-project.org). I realised today that it actually includes a direct parallel to Antoine's proposed subtest concept: whereas in unittest, each test currently has exactly one result, in Beaker a given task may have *multiple* results. The overall result of the task is then based on the individual results (so if any result is a failure, the overall test is a failure). "tasks" are the smallest addressable unit in deciding what to run, but each task may report multiple results, allowing fine-grained reporting of what succeeded and what failed.

That means there's a part of Antoine's patch I disagree with: the change to eliminate the derived "overall" result attached to the aggregate test. I think Beaker's model, where there's a single result for the overall task (so you can ignore the individual results if you don't care), and the individual results are reported separately (if you do care), will make it easier to provide something that integrates cleanly with existing test runners. The complexity involved in attempting to get expectedFailure() to behave as expected is also a strong indication that there are still problems with the way these results are being aggregated. |
The patch doesn't eliminate it, there are even tests for it.
No, the complexity stems from the fact that the expectedFailure decorator knows nothing about the test running machinery and instead blindly raises an exception. |
I think I have figured out what bothers me about the expectedfailure changes, and they actually relate to how expectedfailure was implemented in the first place: I had previously assumed that decorator was an *annotating* decorator - that it set an attribute on the function to indicate it was expected to fail, and the test execution machinery then took that into account when deciding how to interpret the test result. The fact that it is instead a *wrapping* decorator is what made the addition of subtest support far more complicated than I expected.

However, I'm wondering if it might still be possible to avoid the need for a thread-local context to handle the combination of expected failures and subtests when we have access to the test case, by adding the annotation that I expected to be there in the first place. Can't we:
|
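For reference, the annotating approach sketched here is roughly what was committed: the decorator just marks the test function, and the outcome machinery checks the flag when classifying the result. The attribute name below matches the released implementation as far as I can tell, but treat the details as an assumption:

def expectedFailure(test_item):
    # Annotating decorator: no wrapping, no exception swallowing; the test
    # runner looks for this flag when deciding how to report the outcome.
    test_item.__unittest_expecting_failure__ = True
    return test_item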
Getting rid of the thread local would be an improvement, and the change to how expected failures is done sounds good too. |
But that would break use cases where you use @expectedfailure on a |
That's a use case that I'm not very *interested* in supporting personally - however it may well be a use case that was "designed in" and that others have a need for (I didn't implement expectedFailure support). I think expectedFailure should be used sparingly, and for a utility function to be *able* to turn a failure into an expectedFailure sounds actively dangerous. |
The docs are fairly explicit about the intended use case: "Mark the test as an expected failure. If the test fails when run, the test is not counted as a failure." (from http://docs.python.org/3/library/unittest#unittest.expectedFailure) Nothing there about being able to call some other function and have doing so incidentally mark your test as an expected failure (which I actually consider highly *un*desirable behaviour) |
So, if it's not documented behaviour I think it's fine to lose it. |
Updated patch simplifying the expectedFailure implementation, as suggested by Nick and Michael. (admire how test_socket was importing a private exception class!) |
New changeset 5c09e1c57200 by Antoine Pitrou in branch 'default': |
Finally committed :) Thanks everyone for the reviews and suggestions. |
We just discovered that this change breaks testtools. I will file a new bug on that. |
See issue bpo-20687 |