Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subtests #61201

Closed
pitrou opened this issue Jan 18, 2013 · 62 comments
Closed

subtests #61201

pitrou opened this issue Jan 18, 2013 · 62 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@pitrou
Copy link
Member

pitrou commented Jan 18, 2013

BPO 16997
Nosy @warsaw, @terryjreedy, @ncoghlan, @pitrou, @vstinner, @ezio-melotti, @merwok, @bitdancer, @voidspace, @florentx, @cjerdonek, @Julian, @ericsnowcurrently, @serhiy-storchaka
Files
  • subtests.patch
  • subtests2.patch
  • subtests3.patch
  • subtests4.patch
  • subtests5.patch
  • subtests6.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-03-20.19:24:01.578>
    created_at = <Date 2013-01-18.21:07:11.671>
    labels = ['type-feature', 'library']
    title = 'subtests'
    updated_at = <Date 2014-02-19.18:20:52.434>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2014-02-19.18:20:52.434>
    actor = 'barry'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-03-20.19:24:01.578>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2013-01-18.21:07:11.671>
    creator = 'pitrou'
    dependencies = []
    files = ['28776', '28780', '28785', '28942', '29029', '29428']
    hgrepos = []
    issue_num = 16997
    keywords = ['patch']
    message_count = 62.0
    messages = ['180221', '180222', '180225', '180226', '180227', '180230', '180232', '180233', '180234', '180235', '180247', '180253', '180255', '180259', '180822', '180835', '180839', '180891', '180893', '180899', '180977', '180978', '180980', '180985', '180986', '180989', '180991', '181003', '181262', '181786', '181787', '181788', '181789', '181790', '181791', '181792', '181794', '181800', '181811', '181814', '181817', '181866', '181871', '181875', '181881', '181882', '181888', '181930', '181938', '182924', '183289', '183450', '183452', '183454', '183468', '183469', '183470', '184349', '184781', '184782', '211634', '211637']
    nosy_count = 25.0
    nosy_names = ['barry', 'terry.reedy', 'spiv', 'exarkun', 'ncoghlan', 'pitrou', 'vstinner', 'ezio.melotti', 'eric.araujo', 'Arfrever', 'r.david.murray', 'michael.foord', 'hpk', 'flox', 'fperez', 'chris.jerdonek', 'santoso.wijaya', 'nchauvat', 'Julian', 'abingham', 'python-dev', 'eric.snow', 'serhiy.storchaka', 'borja.ruiz', 'bfroehle']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16997'
    versions = ['Python 3.4']

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 18, 2013

    subtests are a light alternative to parametered tests as in bpo-7897. They don't generate the tests for you, they simply allow to partition a given test case in several logical units. Meaning, when a subtest fails, the other subtests in the test will still run (and the failures will print their respective parameters).

    Concretely, running the follow example:

    class MyTest(unittest.TestCase):
    
        def test_b(self):
            """some test"""
            for i in range(2, 5):
                for j in range(0, 3):
                    with self.subTest(i=i, j=j):
                        self.assertNotEqual(i % 3, j)

    will give the following output:

    ======================================================================
    FAIL: test_b (main.MyTest) (i=2, j=2)
    some test
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "subtests.py", line 11, in test_b
        self.assertNotEqual(i % 3, j)
    AssertionError: 2 == 2

    ======================================================================
    FAIL: test_b (main.MyTest) (i=3, j=0)
    some test
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "subtests.py", line 11, in test_b
        self.assertNotEqual(i % 3, j)
    AssertionError: 0 == 0

    ======================================================================
    FAIL: test_b (main.MyTest) (i=4, j=1)
    some test
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "subtests.py", line 11, in test_b
        self.assertNotEqual(i % 3, j)
    AssertionError: 1 == 1

    @pitrou pitrou added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 18, 2013
    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 18, 2013

    Attaching patch.

    @serhiy-storchaka
    Copy link
    Member

    +1. I was going to suggest something similar for displaying the
    I was going to suggest something similar to display the clarification message in case of a fail:

        for arg, expected in [(...),...]:
            with self.somegoodname(msg="arg=%s"%arg):
                self.assertEqual(foo(arg), expected)

    But your idea is even better.

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 18, 2013

    Since I was asked on IRC, an example of converting an existing test. It's quite trivial really:

    diff --git a/Lib/test/test_codecs.py b/Lib/test/test_codecs.py
    --- a/Lib/test/test_codecs.py
    +++ b/Lib/test/test_codecs.py
    @@ -630,9 +630,10 @@ class UTF16BETest(ReadTest, unittest.Tes
                 (b'\xdc\x00\x00A', '\ufffdA'),
             ]
             for raw, expected in tests:
    -            self.assertRaises(UnicodeDecodeError, codecs.utf_16_be_decode,
    -                              raw, 'strict', True)
    -            self.assertEqual(raw.decode('utf-16be', 'replace'), expected)
    +            with self.subTest(raw=raw, expected=expected):
    +                self.assertRaises(UnicodeDecodeError, codecs.utf_16_be_decode,
    +                                  raw, 'strict', True)
    +                self.assertEqual(raw.decode('utf-16be', 'replace'), expected)
     
         def test_nonbmp(self):
             self.assertEqual("\U00010203".encode(self.encoding),

    @ezio-melotti
    Copy link
    Member

    I like the idea, and I think this would be a useful addition to unittest.

    OTOH while this would be applicable to most of the tests (almost every test has a "for" loop to check valid/invalid values, or a few related "subtests" in the same test method), I'm not sure I would use it too often.
    The reason is that often there's a single bug that causes the failure(s), and most likely the error reported by unittest is the same (or anyway it's very similar) for all the subtests, so using subtests would just add more noise (at least for me).
    However it might be useful in case of sporadic failures or whenever having the full set of failures would be useful to diagnose the problem.

    @ncoghlan
    Copy link
    Contributor

    This looks very nice. For cases where you decide you don't want it, some
    kind of "fail fast" mechanism would be helpful (e.g. when you've seen the
    full set of failures, and are now just trying to resolve the first one)

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 18, 2013

    Updated patch make subtests play nice with unittest's failfast flag:
    http://docs.python.org/dev/library/unittest.html#cmdoption-unittest-f

    @cjerdonek
    Copy link
    Member

    Nice/elegant idea. A couple comments:

    (1) What will be the semantics of TestCase/subtest failures? Currently, it looks like each subtest failure registers as an additional failure, meaning that the number of test failures can exceed the number of test cases. For example (with a single test case with 2 subtests):

    $ ./python.exe test_subtest.py 
    FF

    ======================================================================
    FAIL: test (main.MyTests) (i=0)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "test_subtest.py", line 9, in test
        self.assertEqual(0, 1)
    AssertionError: 0 != 1

    ======================================================================
    FAIL: test (main.MyTests) (i=1)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "test_subtest.py", line 9, in test
        self.assertEqual(0, 1)
    AssertionError: 0 != 1

    Ran 1 test in 0.001s

    FAILED (failures=2)

    With the way I understand it, it seems like a subtest failure should register as a failure of the TestCase as a whole, unless the subtests should be enumerated and considered tests in their own right (in which case the total test count should reflect this).

    (2) Related to (1), it doesn't seem like decorators like expectedFailure are being handled correctly. For example:

        @unittest.expectedFailure
        def test(self):
            for i in range(2):
                with self.subTest(i=i):
                    self.assertEqual(i, 0)

    This results in:

    Ran 1 test in 0.001s
    
    FAILED (failures=1, unexpected successes=1)
    

    In other words, it seems like the decorator is being applied to each subtest as opposed to the test case as a whole (though actually, I think the first should read "expected failures=1"). It seems like one subtest failing should qualify as an expected failure, or are the semantics such that expectedFailure means that every subtest must fail?

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 19, 2013

    Le samedi 19 janvier 2013 à 00:33 +0000, Chris Jerdonek a écrit :

    With the way I understand it, it seems like a subtest failure should
    register as a failure of the TestCase as a whole, unless the subtests
    should be enumerated and considered tests in their own right (in which
    case the total test count should reflect this).

    It does register as a failure of the TestCase as a whole. Simply, a test
    case can encounter several failures: for example, one in the test method
    and one in the tearDown method.

    Perhaps unittest should be made to show better reporting, e.g. show the
    number of successful and failed tests. I suppose there is a reason for
    the current behaviour, though.

    This results in:

    Ran 1 test in 0.001s
    
    FAILED (failures=1, unexpected successes=1)
    

    In other words, it seems like the decorator is being applied to each
    subtest as opposed to the test case as a whole (though actually, I
    think the first should read "expected failures=1"). It seems like one
    subtest failing should qualify as an expected failure, or are the
    semantics such that expectedFailure means that every subtest must
    fail?

    I don't know. I never use expectedFailure. There doesn't seem to be any
    obviously preferable answer here.

    @cjerdonek
    Copy link
    Member

    Either way, something isn't right about how it's integrated now. With
    the patch, this:

        @unittest.expectedFailure
        def test(self):
            with self.subTest():
                self.assertEqual(0, 1)

    gives: FAILED (failures=1, unexpected successes=1)

    And this:

        @unittest.expectedFailure
        def test(self):
            with self.subTest():
                self.assertEqual(0, 0)

    gives: OK (unexpected successes=1)

    But without the patch, this:

        @unittest.expectedFailure
        def test(self):
            self.assertEqual(0, 1)

    gives: OK (expected failures=1)

    @ncoghlan
    Copy link
    Contributor

    I think we're going to have to separate out two counts in the metrics - the total number of tests (the current counts), and the total number of subtests (the executed subtest blocks). (Other parameterisation solutions can then choose whether to treat each pair of parameters as a distinct test case or as a subtest - historical solutions would appear as distinct test cases, while new approaches might choose to use the subtest machinery).

    The aggregation of subtest results to test case results would then be that the test case fails if either:

    • an assertion directly in the test case fails
    • an assertion fails in at least one subtest of that test case

    The interpretation of "expected failure" in a world with subtests is then clear: as long as at least one subtest or assertion fails, the decorator is satisfied that the expected test case failure is occurred.

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 19, 2013

    The way expectedFailure is currently implemented (it's a decorator which knows nothing about test cases and test results, it only expects an exception to be raised by its callee), it's gonna be difficult to make it participate with subtests without breaking compatibility.

    (that said, I also think it's a useless feature)

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 19, 2013

    I think we're going to have to separate out two counts in the metrics

    • the total number of tests (the current counts), and the total number
      of subtests (the executed subtest blocks).

    This is a reasonable proposal. On the other hand, it was already the
    case that a single test could count for several failures (for example,
    an exception in the test method and another in tearDown). Therefore, I
    think it is independent from the subtest proposal and could be tackled
    separately (which would also help keep this issue focussed :-)).

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 19, 2013

    New patch attached:

    • makes _SubTest a TestCase subclass
    • clarifies test skipping for subtests (skipTest() only skips the subtest)
    • makes expected failures work as expected by resorting to a thread-local storage hack

    @cjerdonek
    Copy link
    Member

    After thinking about this more, it seems this lets you do two orthogonal things:

    1. Easily append data to failure messages coming from a block of asserts
    2. Continue running a test case after a failure from a block of asserts

    Both of these seem independently useful and more generally applicable, so it seems worth discussing them in isolation before exposing them only together. Maybe there is a nice API or combination of API's that lets you do one or the other or both.

    Also, for (1) above, I'm wondering about the choice to put the extra data in the id/shortDescription of a pseudo-TestCase instead of just adding it to the exception message, for example. Adding it to the message seems more consistent with unittest's current API. Including the info in a new name/id seems potentially to be misusing the concept of TestCase because the test names created from this API need not be unique, and the resulting tests are not addressable/runnable.

    Incidentally, I noticed that the runnability of TestCases was removed from the documentation in an unreviewed change shortly after the last patch was posted:

    -An instance of a :class:`TestCase`\ -derived class is an object that can
    -completely run a single test method, together with optional set-up and tidy-up
    -code.

    (from http://hg.python.org/cpython/rev/d1e6a48dfb0d#l1.111 )

    whereas subtest TestCases from the last patch are not runnable:

    +class _SubTest(TestCase):
    + ...
    + def runTest(self):
    + raise NotImplementedError("subtests cannot be run directly")

    A way around these issues would be to pass the original, runnable TestCase object to TestResult.errors, etc. instead of a pseudo-TestCase. Alternatively, subtests could be made independently addressable and runnable, but that route seems more challenging and less certain.

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 28, 2013

    1. Easily append data to failure messages coming from a block of asserts
    2. Continue running a test case after a failure from a block of asserts

    Both of these seem independently useful and more generally applicable,

    I don't understand what you mean. 1 is pointless without 2 (if you let the exception bubble up, unittest already deals with it). 2 without 1 doesn't make much sense either (if you only want to silence an exception, a simple try...except will suffice).

    I'm wondering about the choice to put the extra data in the
    id/shortDescription of a pseudo-TestCase instead of just adding it to the
    exception message, for example. Adding it to the message seems more
    consistent with unittest's current API.

    If e.g. someone lists all failed tests without detailing the exception messages, it's better if subtests are disambiguished in that list.

    Also, changing the exception message might be confusing as readers expect the displayed message to be exactly str(exc).

    Incidentally, I noticed that the runnability of TestCases was removed from
    the documentation in an unreviewed change shortly after the last patch was > posted:

    runTest() is still mentioned in the TestCase constructor:
    http://docs.python.org/dev/library/unittest.html#unittest.TestCase

    but indeed it was removed from the module overview, because it doesn't
    correspond to modern unit testing practices. I guess it could still be covered in the API docs.

    Alternatively, subtests could be made independently addressable and
    runnable, but that route seems more challenging and less certain.

    I would say impossible, unless you know a way to run a with block in isolation ;-)

    @ncoghlan
    Copy link
    Contributor

    Right, if you want independently addressable/runnable, then you're back to parameterised tests as discussed in bpo-7897.

    What I like about Antoine's subtest idea is that I think it can be used to split the execution/reporting part of parameterised testing from the addressing/selection part.

    That is, while *this* patch doesn't make subtests addressable, a future enhancement or third party test runner could likely do so. (It wouldn't work with arbitrary subtests, it would only be for data driven variants like those described in bpo-7897)

    @cjerdonek
    Copy link
    Member

    > 1. Easily append data to failure messages coming from a block of asserts
    > 2. Continue running a test case after a failure from a block of asserts

    > Both of these seem independently useful and more generally applicable,

    I don't understand what you mean. 1 is pointless without 2 (if you let the exception bubble up, unittest already deals with it). 2 without 1 doesn't make much sense either (if you only want to silence an exception, a simple try...except will suffice).

    I'll explain. (1) is useful without (2) because it lets you add information to the failure data for a group of asserts without having to use the "msg" parameter in every call to assert(). This is useful, for example, if you're testing a number of cases in a loop (with the current behavior of ending the test on first failure), and it's not clear from the default exception message which iteration of the loop failed. Your original example is such a case (minus the part about continuing in case of failure). This use case is basically the one addressed by Serhiy's suggestion in this message:

    http://bugs.python.org/issue16997#msg180225

    (2) is useful without (1) if you'd like to get information about more than one assertion failure in a TestCase (just as in your proposal), but the assertions aren't necessarily coming from a "parametrization" or different iterations of a loop. With the proposed API, you'd do something like:

        with self.subTest():
            # First assertion
            ...
        with self.subTest():
            # Second assertion
            ...

    The difference here is that I wouldn't call these "subtests," and you don't need parameter data to know which assertion is at fault. The meaning here is more like "with self.continueTesting()".

    Also, changing the exception message might be confusing as readers expect the displayed message to be exactly str(exc).

    If the API is more like self.assert*()'s "msg" parameter which appends data to the usual exception, then it will be the same as what people are already used to. Also see the "longMessage" attribute (which defaults to True), which separates the extra message data from the default exception message:

    http://docs.python.org/dev/library/unittest.html#unittest.TestCase.longMessage

    Alternatively, subtests could be made independently addressable and
    runnable, but that route seems more challenging and less certain.

    I would say impossible, unless you know a way to run a with block in isolation ;-)

    I'm not advocating independent addressability/runnability of subtests or the following approach, but a naive way to do this would be to run the TestCase as usual, but skip over any subTest blocks if the parameter data isn't an exact match.

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 29, 2013

    If the API is more like self.assert*()'s "msg" parameter which appends
    data to the usual exception, then it will be the same as what people
    are already used to.

    It might be a good idea to allow both this and the arbitrary parameter kwargs, then.

    I'm not advocating independent addressability/runnability of subtests
    or the following approach, but a naive way to do this would be to run
    the TestCase as usual, but skip over any subTest blocks if the
    parameter data isn't an exact match.

    Well, I still don't know how to skip a with block (short of raising an exception that will terminate the entire test).

    @ncoghlan
    Copy link
    Contributor

    I like the idea of the subTest API being something like:

        def subTest(self, _id, *, **params):

    However, I'd still factor that in to the reported test ID, not into the exception message.

    @voidspace
    Copy link
    Contributor

    I am concerned that this feature changes the TestResult API in a backwards incompatible way. There are (quite a few) custom TestResult objects that just implement the API and don't inherit from TestResult. I'd like to try this new code with (for example) the test result provided by the junitxml project and see what happens.

    I have a broader concern too. I have seen lots of requests for "test parameterization" and none *specifically* for sub tests. They are very similar mechanisms (with key differences), so I don't think we want *both* in unittest. If test parameterization is more powerful / flexible then I would rather see that *instead*.

    On the other hand if subtests are *better* then lets have them instead - but I'd like to see that discussion happen before we just jump into subtests.

    @voidspace
    Copy link
    Contributor

    Note, some brief discussion on the "testing in python" mailing list:

    http://lists.idyll.org/pipermail/testing-in-python/2013-January/005356.html

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 30, 2013

    I am concerned that this feature changes the TestResult API in a
    backwards incompatible way. There are (quite a few) custom TestResult
    objects that just implement the API and don't inherit from TestResult.
    I'd like to try this new code with (for example) the test result
    provided by the junitxml project and see what happens.

    Feel free to do it, of course :-)
    In any case, we can add a hook to the TestCase subclass so that
    specialized reporters can get at the parent TestCase.

    (and I'm not sure how it's backwards incompatible: as long as you don't
    use subtests, nothing should break - so existing software shouldn't be
    affected)

    I have a broader concern too. I have seen lots of requests for "test
    parameterization" and none *specifically* for sub tests. They are very
    similar mechanisms (with key differences), so I don't think we want
    *both* in unittest. If test parameterization is more powerful /
    flexible then I would rather see that *instead*.

    The underlying idea of subtests is "if you want to parameterize a test,
    here is a useful building block". Generic parameterization APIs are
    cumbersome and actually harder to write (and read!) than the
    corresponding for loop. With subtests, you just write your for loop
    and use a subtest to isolate each iteration's failures.

    @cjerdonek
    Copy link
    Member

    I am concerned that this feature changes the TestResult API in a backwards incompatible way.

    My suggestion to add the original TestCase object to TestResult.errors, etc. instead and add the extra failure data to the longDescription would address this concern, which is why I suggested it. The former is what currently happens with multiple failures per TestCase (e.g. in runTest() and tearDown()).

    The underlying idea of subtests is "if you want to parameterize a test,
    here is a useful building block".

    The current API doesn't seem like a good building block because it bundles orthogonal features (i.e. to add loop failure data to a block of asserts you have to use the continuance feature). Why not expose *those* as the building blocks? The API can be something like--

        with self.addMessage(msg):
            # Add msg to the longDescription of any assertion failure within.
        
        with self.continueTest(msg=''):
            # Keep running the TestCase on any assertion failure within.

    (The current subTest() is basically equivalent to continueTest() with a specialized message. It could be added, too, if desired.) Accepting a string message is more basic and flexible than allowing only a **kwargs dict, which seems a bit "cute" and specialized to me.

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 30, 2013

    The current API doesn't seem like a good building block because it
    bundles orthogonal features (i.e. to add loop failure data to a block
    of asserts you have to use the continuance feature). Why not expose
    *those* as the building blocks? The API can be something like--

    with self.addMessage(msg):
        # Add msg to the longDescription of any assertion failure
    

    within.

    with self.continueTest(msg=''):
        # Keep running the TestCase on any assertion failure within.
    

    (The current subTest() is basically equivalent to continueTest() with
    a specialized message. It could be added, too, if desired.)
    Accepting a string message is more basic and flexible than allowing
    only a **kwargs dict, which seems a bit "cute" and specialized to me.

    I've already replied to all this.

    @cjerdonek
    Copy link
    Member

    I've already replied to all this.

    You didn't respond to the idea of exposing both features separately after saying you didn't understand what I meant and saying that they were pointless and didn't make sense. So I explained and also proposed a specific API to make the suggestion clearer and more concrete. These don't seem pointless to me at all.

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 30, 2013

    You didn't respond to the idea of exposing both features separately
    after saying you didn't understand what I meant and saying that they
    were pointless and didn't make sense. So I explained and also
    proposed a specific API to make the suggestion clearer and more
    concrete.

    Well, suffice to say that I wasn't convinced at all. There are multiple
    use cases for subtests in the Python test suite, but I can't think of
    any for your proposed API separation. That's why I find it
    uninteresting.

    I'm making this proposal to solve a concrete issue, not in the interest
    of minimalism. "Building block" was to be understood in that sense. Unit
    testing is one of those areas where purity is a secondary concern.

    @ncoghlan
    Copy link
    Contributor

    Right. I have *heaps* of tests that would be very easy to migrate to
    Antoine's subtest API. A separate "addMessage" API could conceivably be
    helpful for debugging, but it's not the obvious improvement to my existing
    looping tests that subtests would be.

    @voidspace
    Copy link
    Contributor

    "However, I think you're making a mistaking by seeing them as
    *competing* APIs, rather than seeing subtests as a superior
    implementation strategy for the possible later introduction of a
    higher level parameterized tests API."

    Parameterized tests are done at test collection time and sub-tests at run time. You can't layer parameterization on top of subtests.

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 10, 2013

    Subtests break the current unittest api of suite.countTests() and I
    fear they will also break tools that use the existing test result api
    to generate junit xml for continuous integration.

    It depends how you define countTests(). sub-tests, as the name implies,
    are not independent test cases. As for breaking tools, I don't really
    understand what's special about junit xml. Why would a subtest failure
    be different from a test failure in that regard?

    @voidspace
    Copy link
    Contributor

    A comment from lifeless on IRC (Robert Collins):

    [12:15:46] <lifeless> please consider automated analysis. How can someone tell which test actually failed ?
    [12:15:55] <lifeless> How can they run just that test in future ?

    @voidspace
    Copy link
    Contributor

    My concern is that this re-uses the existing TestResult.add* methods in a different way (including calling addError multiple times). This can break existing tools.

    Fix suggested by lifeless on IRC. A sub test failure / success / exception calls the following TestResult method:

       addSubTest(test, sub_id, err_or_None)

    If we're using a TestResult object that doesn't have these methods (an "old" result objects) then the test can *stop* (i.e. revert to the old behaviour before sub tests existed).

    @voidspace
    Copy link
    Contributor

    And on the "superior implementation strategy", both nose and py.test used to have runtime test generation and both have deprecated them and moved to collection time parameterization. (But I guess we know better.)

    You don't need PEP-422 for parameterization. The TestLoader creates multiple test cases for the parameter sets.

    @bitdancer
    Copy link
    Member

    I don't really have strong feelings about this, but I will just note as a data point that I implemented parameterized tests for the email package, and have no interest myself in subtests. This is for exactly the collection time vs runtime reason that Michael is talking about (I always want to be able to run single tests by name).

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 10, 2013

    Here is a patch implementing Michael's and lifeless' proposed strategy.

    @cjerdonek
    Copy link
    Member

    I'm still opposed to exposing these features only together. Annotating the failure message with parametrization data is useful in its own right, but what if there are 500 subtests in a loop and you don't want 500 failures to be registered for that test case? This is related to Ezio's comment near the top about adding too much noise.

    addMessage was just one suggestion. A different, functionally equivalent suggestion would be to add a "failFast" (default: False) keyword parameter to subTest() or alternatively a "maxFailures" (default: None) keyword parameter.

    @cjerdonek
    Copy link
    Member

    It seems like the last patch (subtests5.patch) dropped the parameter data from the failure output as described in the docs. For example, the example in the docs yields the following:

    FAIL: test_even (__main__.NumbersTest)
    

    instead of the documented:

    FAIL: test_even (__main__.NumbersTest) (i=1)
    

    subtests4.patch is okay.

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 11, 2013

    It seems like the last patch (subtests5.patch) dropped the parameter
    data from the failure output as described in the docs. For example,
    the example in the docs yields the following:

    FAIL: test_even (__main__.NumbersTest)
    

    Weird, since there are unit tests for that. I'll take a look.

    @hpk
    Copy link
    Mannequin

    hpk mannequin commented Feb 11, 2013

    On Sun, Feb 10, 2013 at 12:41 PM, Antoine Pitrou <report@bugs.python.org>wrote:

    Antoine Pitrou added the comment:

    > Please don't commit I think we still need a discussion as to whether
    > subtests or paramaterized tests are a better approach. I certainly
    > don't think we need both and there are a lot of people asking for
    > parameterized tests.

    I think they don't cater to the same crowd. I see parameterized tests as
    primarily used by people who like adding formal complexity to their
    tests in the name of architectural elegance (decorators, non-intuitive
    constructs and other additional boilerplate). Subtests are meant to not
    get in the way. IMHO, this makes them more suitable for stdlib
    inclusion, while the testing-in-python people can still rely on their
    additional frameworks.

    Parametrized testing wasn't introduced because I or others like formal
    complexity. I invented the "yield" syntax that both pytest and nose still
    support and it was enhanced for several years until it was deemed not fit
    for a general purpose testing approach. More specifically, if you have
    functional parametrized tests, they each run relatively slow. People
    often want to re-run a single failing test or otherwise select tests which
    use a certain fixture, or send tests to different CPUs to speed up
    execution. That's all not possible with subtests or am i missing
    something?

    That being said, I have plans to support some form of "subtests" for pytest
    as well, as there are indeed cases where a more lightweight approach fits
    well, especially for unit- or integration tests where one doesn't care if a
    group of tests need to be re-run when working on fixing a failure to a
    single subtest. And where it's usually more about reporting, getting nice
    debuggable output on failures. I suspect the cases which Antoine refers
    satisfy this pattern.

    best,
    holger

    @hpk
    Copy link
    Mannequin

    hpk mannequin commented Feb 11, 2013

    On Sun, Feb 10, 2013 at 12:43 PM, Nick Coghlan <report@bugs.python.org>wrote:

    Nick Coghlan added the comment:

    You can use subtests to build parameterized tests, you can't use
    parameterized tests to build subtests.

    I doubt you can implement parametrized tests via subtests especially for
    functional testing and its fixture needs.

    The standard library can also

    be converted to using subtests *far* more readily than it could be
    converted to parameterized tests.

    I can see that and for this reason and the stdlib's use cases it might make
    sense to support it. The unittest module is also used in many other
    contexts so it shouldn't be the only verification criterium.

    best,
    holger

    @pitrou
    Copy link
    Member Author

    pitrou commented Feb 11, 2013

    what if there are 500 subtests in a loop and you don't want 500 failures to be
    registered for that test case?

    Parametered tests have the same issue. In this case you simply don't use subtests
    or test cases. On the other hand, the issue doesn't exist in most cases where you
    iterate over three or four different cases.

    addMessage was just one suggestion.

    It is quite orthogonal, actually, and could be added independently. Also, it is not clear how you would limit the addMessage to a subtest, rather than the whole test case.
    We could re-use testtools' addDetail idea, although their content-type thing
    is a bit heavyweight: I'd rather duck-type the API.

    http://testtools.readthedocs.org/en/latest/for-test-authors.html#details

    @cjerdonek
    Copy link
    Member

    > what if there are 500 subtests in a loop and you don't want 500 failures to be
    > registered for that test case?

    Parametered tests have the same issue. In this case you simply don't use subtests
    or test cases.

    Right, but then you lose out on both of the benefits documented for subtests:

    +Without using a subtest, execution would stop after the first failure,
    +and the error would be less easy to diagnose because the value of i
    +wouldn't be displayed::

    Why tie these together? I'm suggesting that we expose a way to benefit from the "easy to diagnose" portion without the "suspend stoppage" portion. (The way we do this doesn't have to be one of the ways I'm suggesting, though I've suggested a few.)

    > addMessage was just one suggestion.

    It is quite orthogonal, actually, and could be added independently. Also, it is not clear how you would limit the addMessage to a subtest, rather than the whole test case.

    It's not orthogonal because the way I suggested it, subTest() would be the composition of addMessage() and continueTest() context managers. (addMessage limits itself by being a context manager just like subTest.) So if we added addMessage() later, we would want to refactor subTest() to be using it. The equivalence would be something like the following:

        with self.subTest(msg=msg, i=i):
            self.assertEqual(i % 2, 0)
    
        with self.continueTest():
            with self.addMessage(msg, i=i):
                self.assertEqual(i % 2, 0)

    However, since it looks like we're going with changing the test case ID instead of putting the extra data only in the exception message (TestCase.longMessage) like I was suggesting before, I think adding a failFast=True or maxFailures=n parameter to subTest() has higher importance, e.g.

        with self.subTest(msg=msg, maxFailures=1, i=i):
            self.assertEqual(i % 2, 0)

    @spiv
    Copy link
    Mannequin

    spiv mannequin commented Feb 12, 2013

    googletest (an xUnit style C++ test framework) has an interesting feature: in addition to assertions like ASSERT_EQ(x, y) that stop the test, it has EXPECT_EQ(x, y) etc that will cause the test to fail without causing it to stop. I think this decoupling of “test failed” and “test execution stopped” is very useful. (Note this also implies a single test can have multiple failures, or if you prefer that a single test can have multiple messages attached to explain why its state is 'FAILED'.)

    I wouldn't like to see a duplication of all assert* methods as expect* methods, but there are alternatives. A single self.expectThat method that takes a value and a matcher, for instance.

    Or you could have a context manager:

    with self.continueOnFailure():
        self.assertEqual(x, y)

    In fact, I suppose that's more-or-less what the subtests patch offers? Except the subtests feature seems to want to get involved in knowing about parameters and the like too, which feels weird to me.

    Basically, I really don't like the “subtests” name, but if instead it's named something that directly says its only effect is that failures don't abort the test, then I'd be happy.

    @ncoghlan
    Copy link
    Contributor

    My day job these days is to work on the Beaker test system (http://beaker-project.org).

    I realised today that it actually includes a direct parallel to Antoine's proposed subtest concept: whereas in unittest, each test currently has exactly one result, in Beaker a given task may have *multiple* results. The overall result of the task is then based on the individual results (so if any result is a failure, the overall test is a failure).

    "tasks" are the smallest addressable unit in deciding what to run, but each task may report multiple results, allowing fine-grained reporting of what succeeded and what failed.

    That means there's a part of Antoine's patch I disagree with: the change to eliminate the derived "overall" result attached to the aggregate test. I think Beaker's model, where there's a single result for the overall task (so you can ignore the individual results if you don't care), and the individual results are reported separately (if you do care), will make it easier to provide something that integrates cleanly with existing test runners.

    The complexity involved in attempting to get expectedFailure() to behave as expected is also a strong indication that there are still problems with the way these results are being aggregated.

    @pitrou
    Copy link
    Member Author

    pitrou commented Mar 1, 2013

    That means there's a part of Antoine's patch I disagree with: the
    change to eliminate the derived "overall" result attached to the
    aggregate test.

    The patch doesn't eliminate it, there are even tests for it.
    (see the various call order tests)

    The complexity involved in attempting to get expectedFailure() to
    behave as expected is also a strong indication that there are still
    problems with the way these results are being aggregated.

    No, the complexity stems from the fact that the expectedFailure decorator knows nothing about the test running machinery and instead blindly raises an exception.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Mar 4, 2013

    I think I have figured out what bothers me about the expectedfailure changes, and they actually relate to how expectedfailure was implemented in the first place: I had previously assumed that decorator was an *annotating* decorator - that it set an attribute on the function to indicate it was expected to fail, and the test execution machinery then took that into account when deciding how to interpret the test result.

    The fact that it is instead a *wrapping* decorator is what made the addition of subtest support far more complicated than I expected.

    However, I'm wondering if it might still be possible to avoid the need for a thread local context to handle the combination of expected failures and subtests when we have access to the test caseby adding the annotation that I expected to be there in the first place. Can't we:

    1. Update the expectedfailure decorator to also set "_expecting_failure = True" on the wrapped test case function
    2. Update unittest.TestCase.__init__ to do: "self._expecting_failure = hasattr(testMethod, '__func__') and getattr(testMethod.__func__, '_expecting_failure', False)"
    3. Update unittest._SubTest.__init__ to do: "self._expecting_failure = test_case._expecting_failure"
    4. Update testPartExecutor to check "if test_case._expecting_failure:" instead of the current check of the thread local context

    @voidspace
    Copy link
    Contributor

    Getting rid of the thread local would be an improvement, and the change to how expected failures is done sounds good too.

    @pitrou
    Copy link
    Member Author

    pitrou commented Mar 4, 2013

    However, I'm wondering if it might still be possible to avoid the
    need for a thread local context to handle the combination of
    expected failures and subtests when we have access to the test
    caseby adding the annotation that I expected to be there in the
    first place.

    But that would break use cases where you use @expectedfailure on a
    function called by the test method, not directly on the test method
    itself. I don't really care about those use cases myself, but not
    breaking them is the reason I chose not to change the @expectedfailure
    implementation. I'll let Michael decide :-)

    @voidspace
    Copy link
    Contributor

    That's a use case that I'm not very *interested* in supporting personally - however it may well be a use case that was "designed in" and that others have a need for (I didn't implement expectedFailure support).

    I think expectedFailure should be used sparingly and for a utility function to be *able* to turn a failure into a expectedFailure sounds actively dangerous.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Mar 4, 2013

    The docs are fairly explicit about the intended use case: "Mark the test as an expected failure. If the test fails when run, the test is not counted as a failure." (from http://docs.python.org/3/library/unittest#unittest.expectedFailure)

    Nothing there about being able to call some other function and have doing so incidentally mark your test as an expected failure (which I actually consider highly *un*desirable behaviour)

    @voidspace
    Copy link
    Contributor

    So, if it's not documented behaviour I think it's fine to lose it.

    @pitrou
    Copy link
    Member Author

    pitrou commented Mar 16, 2013

    Updated patch simplifying the expectedFailure implementation, as suggested by Michael and Nick and Michael.

    (admire how test_socket was importing a private exception class!)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 20, 2013

    New changeset 5c09e1c57200 by Antoine Pitrou in branch 'default':
    Issue bpo-16997: unittest.TestCase now provides a subTest() context manager to procedurally generate, in an easy way, small test instances.
    http://hg.python.org/cpython/rev/5c09e1c57200

    @pitrou
    Copy link
    Member Author

    pitrou commented Mar 20, 2013

    Finally committed :) Thanks everyone for the reviews and suggestions.

    @pitrou pitrou closed this as completed Mar 20, 2013
    @warsaw
    Copy link
    Member

    warsaw commented Feb 19, 2014

    We just discovered that this change breaks testtools. I will file a new bug on that.

    @warsaw
    Copy link
    Member

    warsaw commented Feb 19, 2014

    See issue bpo-20687

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    This issue was closed.
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants