Show who tests what #170
Comments
This is a very interesting idea, one that figleaf pioneered with "sections". Right now we don't collect this information. The trace function would have to be modified to walk up the stack to identify "the test", then that information would have to be stored somehow. Then the reporting would have to be changed to somehow display the information. That's three significant problems, but only three! Do you have ideas how to do them? |
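A minimal sketch of the stack-walking idea described above (this is not coverage.py's actual trace function, and the test_* naming convention is an assumption for illustration):

```python
import sys

who_tests_what = {}  # (filename, lineno) -> set of test identifiers

def find_current_test(frame):
    """Walk outward from `frame` and return the first test-like function found."""
    while frame is not None:
        name = frame.f_code.co_name
        if name.startswith("test_"):
            return "{}:{}".format(frame.f_code.co_filename, name)
        frame = frame.f_back
    return None

def tracer(frame, event, arg):
    if event == "line":
        test = find_current_test(frame)
        if test is not None:
            key = (frame.f_code.co_filename, frame.f_lineno)
            who_tests_what.setdefault(key, set()).add(test)
    return tracer

sys.settrace(tracer)  # walking the stack on every executed line is why this is slow
```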
Original comment by andrea crotti (Bitbucket: andrea_crotti, GitHub: Unknown) Well, I need to dive more into the internals to suggest something that … which probably doesn't help much my idea, because I think we would need … So for example, with a silly_module.py and two test files silly_test.py and silly_test2.py, I should have that silly_func:0 = [silly_test.py:0, silly_test2:0]. I'm afraid that it would be an awful lot of information to store if the … For the reporting I imagine just to add a clickable button near every … That should probably be the easier part, even if I'm not really a good … |
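The example files referred to above did not survive the import; a hypothetical reconstruction of what they might have looked like, collapsed into one block for brevity (everything here is invented except the file names):

```python
# silly_module.py
def silly_func():
    return 42

# silly_test.py
def test_silly():
    assert silly_func() == 42

# silly_test2.py
def test_silly_again():
    assert silly_func() == 42

# Desired output of the feature: silly_func's line maps back to the line in
# each test file that exercised it, roughly
#   silly_func:0 = [silly_test.py:0, silly_test2:0]
```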
I don't think we need to collect all the lines that test product lines, we need to collect the tests that test product lines, which reduces the data collection a bit, but it will still be a challenge. |
Original comment by andrea crotti (Bitbucket: andrea_crotti, GitHub: Unknown) For the tests you mean the code object of the test function? In that case I agree, because it should keep track of the original file/line where it's defined, if I remember correctly. Anyway, another possible use case of this feature is checking if unit tests are really unit tests. If I see for example that module a.py is tested by test_a.py but also … |
Issue #185 was marked as a duplicate of this issue. |
Original comment by Kevin Qiu (Bitbucket: kevinjqiu, GitHub: kevinjqiu) I have a prototype that does just as the OP described: https://github.com/kevinjqiu/nostrils It's currently a nosetest plugin, but I'd love to see coverage.py do this. |
Issue #311 was marked as a duplicate of this issue. |
Original comment by Thomas Güttler (Bitbucket: thomas-guettler, GitHub: Unknown) I guess you need this data structure to implement this: … I use the Django ORM since it is what I know best, but SQLAlchemy might be a better solution.
This structure needs to be filled with every line that gets executed by coverage. An HTML report could be created from this data. I guess this is really slow... but who cares? For me it would be enough to run … |
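The models themselves did not survive the import; a hypothetical reconstruction of the kind of schema being described might look like this (model and field names are invented):

```python
from django.db import models

class TestCaseRun(models.Model):
    # e.g. "tests/test_a.py::TestFoo::test_bar"
    test_id = models.CharField(max_length=500, unique=True)

class ExecutedLine(models.Model):
    test = models.ForeignKey(TestCaseRun, on_delete=models.CASCADE)
    filename = models.CharField(max_length=500)
    lineno = models.IntegerField()

    class Meta:
        unique_together = [("test", "filename", "lineno")]
```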
Original comment by andrea crotti (Bitbucket: andrea_crotti, GitHub: Unknown) Yes, agreed, that would probably be too slow. |
Original comment by Thomas Güttler (Bitbucket: thomas-guettler, GitHub: Unknown) About the ORM: Linus Torvalds once said that good programmers care about data structures. That's why I would implement this first. Yes, the execution time would increase a lot. But I don't think an alternative to sqlite would be much faster. And: this is not intended to be run every time. We can optimize later. |
If you are going to record which line was tested by each test, what will you do as the code shifts around due to insertion and deletion of lines? |
Original comment by xcombelle (Bitbucket: xcombelle, GitHub: xcombelle) Instead of inspecting the stack at each call to the trace function, I thought of something which could be faster: at the start of a test, record which test we are in; during each call of the trace function, check which test is current; at the end of the test, forget the current test. |
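A minimal sketch of this bookkeeping, assuming a cooperating test runner (e.g. a pytest or nose plugin) that calls these hypothetical functions:

```python
_current_test = None
_data = {}  # (filename, lineno) -> set of test identifiers

def start_test(test_id):
    """Called by the runner just before a test starts."""
    global _current_test
    _current_test = test_id

def end_test():
    """Called by the runner when the test finishes."""
    global _current_test
    _current_test = None

def record_line(filename, lineno):
    """Called from the trace function: no stack walking, just a dict update."""
    if _current_test is not None:
        _data.setdefault((filename, lineno), set()).add(_current_test)
```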
Original comment by Florian Bruhin (Bitbucket: The-Compiler, GitHub: The-Compiler) @xcombelle that would work with extensible test frameworks (like pytest and nose), but how are you going to do this with e.g. unittest? |
Original comment by Tibor (Bitbucket: tibor_arpas, GitHub: Unknown) I think the conceptual problem here is that coverage.py has avoided the concept of "test case" and "test". It's the job of a test runner to define, discover, instantiate, and execute them. And each test runner has a slightly different definition of what is a test and what is not. For example, unittest has this definition of a test: methods of a unittest.TestCase subclass beginning with the letters "test". Other test runners have different definitions. E.g. pytest is very flexible and you can configure almost anything to be a test. The practical solution might be that coverage.py: … @nedbat do you see b) as a challenge also, or were you referring to a) as not being easy? :) |
Original comment by Ronny Pfannschmidt (Bitbucket: RonnyPfannschmidt, GitHub: RonnyPfannschmidt) Providing a contextmanager to record test setup/execution/teardown would be nice; then all test runners could extend upon it. |
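One possible shape for that idea, sketched as a contextmanager a test runner could wrap around each test; it happens to line up with the switch_context() API that coverage.py 5.x eventually added, but nothing like it existed when this comment was written:

```python
from contextlib import contextmanager

@contextmanager
def measured_context(cov, label):
    """Attribute all coverage measured inside the block to `label`."""
    cov.switch_context(label)   # coverage.py 5.x API; did not exist back then
    try:
        yield
    finally:
        cov.switch_context("")  # return to the default (empty) context

# A runner would wrap each test, e.g.:
# with measured_context(cov, "tests/test_a.py::test_one"):
#     run_the_test()
```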
Original comment by Paul Sargent (Bitbucket: PaulS, GitHub: PaulS) I do this kind of analysis in my day job all the time with other tools. What we normally do is store separate coverage results files for each test, and then we can do various bits of analysis like:
It all starts with having identifiable coverage for each test. |
Original comment by Thomas Güttler (Bitbucket: thomas-guettler, GitHub: Unknown) @PaulS you do this type of analysis in your day job? How do you do this? |
Original comment by Laurens Timmermans (Bitbucket: lauwe, GitHub: lauwe) @tibor_arpas : A while back I made a small proof of concept which basically does what you described under 'b'. I've uploaded the (extended) htmlcov of this proof of concept here. It basically provides a count ('covered by how many unique test cases') and a heatmap-style visualization to get an idea of which parts of your code are touched most. The dropdown (called 'label') at the top and mouse-over in the column on the right allow selection/highlighting of test cases. The test suite and unittest.TestCase-derived class which produced these results can be found here. The changes I made in coverage.py to support this are not there, since they are really hacky and incomplete, but if anyone is interested, let me know. |
I think my preference would be to provide a plugin interface that would let a plugin define the boundaries between tests. In fact, it need not be "tests" at all. Perhaps someone wants to distinguish not between specific tests, but between directories of tests, or between unit and integration tests. Figleaf implemented a feature like this and called it generically, "sections". So the plugin could demarcate the tests (runners? callers? regions? sections? what's a good name?) any way it liked. Coverage.py can ship with a simple one that looks for test_* methods, for the common case. Any ideas about how to present the data? I'd like it to scale to 10k tests... |
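As a thought experiment for what such a plugin could look like (coverage.py later grew a dynamic-context plugin hook roughly along these lines, but treat the method and registration names here as illustrative rather than authoritative):

```python
import coverage

class TestFunctionContexts(coverage.CoveragePlugin):
    """Start a new measurement context whenever a test_* function is entered."""

    def dynamic_context(self, frame):
        name = frame.f_code.co_name
        if name.startswith("test_"):
            return "{}.{}".format(frame.f_globals.get("__name__", "?"), name)
        return None  # no new context; keep whatever is current

def coverage_init(reg, options):
    reg.add_dynamic_context(TestFunctionContexts())
```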
@RonnyPfannschmidt I'd rather not rely on the test runners updating to add the feature, though if the test runners want to add support, it'd be good to offer it to them in a way that provides the best support. |
Original comment by Thomas Güttler (Bitbucket: thomas-guettler, GitHub: Unknown) @nedbat "Perhaps someone wants to distinguish not between specific tests, but between directories of tests, or between unit and integration tests" I think doing "separation of concerns" here would be nice: first, collect the data as detailed as possible. This way both can be done: distinguish between test methods, and distinguish between directories/sections. |
Original comment by Paul Sargent (Bitbucket: PaulS, GitHub: PaulS) @thomas-guettler So my day job is verification of hardware designs, but really the fact that it's hardware is not important. We have tests and we have code under test. The analysis is done with the commercial hardware design tools we use, but the principles of what's done are relatively straightforward. Rather than put a lot of detail here, I've written a snippet |
@thomas-guettler I agree about separation of concerns. That's one of the reasons I'm leaning toward a plugin approach: it isn't even clear to me that "test methods" is always the finest granularity we need. Some people use coverage.py without a test suite at all, and they may have their own idea about where interesting slices begin and end. BTW: I like the name "slice" for this concept. It's the same word as "string slicing", but I don't think that collision is a problem. "Segment" is similar, but not as nice. |
Original comment by Thomas Güttler (Bitbucket: thomas-guettler, GitHub: Unknown) @nedbat coverage.py usage without tests.... good catch. Yes, that was not on my mind. You are right, it should be flexible. A method stacktrace_to_??? (unsure what to call it). Input: a stacktrace (a list of nested method calls). The above use case would go down the stacktrace until it sees a method which starts with "test_...". |
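A sketch of the pluggable mapping being suggested; the name and signature are hypothetical:

```python
def stacktrace_to_slice(stack):
    """Map a stack (outermost to innermost, as (filename, function) pairs)
    to a label for the slice this execution belongs to."""
    for filename, func in stack:
        if func.startswith("test_"):
            return "{}::{}".format(filename, func)
    return None  # not inside any test; attribute to the default slice
```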
I'd have to play around with possible plugin semantics. The challenge will be to support it in a way that doesn't require invoking a Python function too often, as that will kill performance. |
I appreciate the "make it work, then make it fast" approach. In the case of designing a plugin API, though, the details of the API could have a big effect on the speed. But I hear you: it could be fine for this to be only enabled occasionally, and slow is fine. @lauwe Hmm, "context" is a good (if boring!) word... :) |
Original comment by Chris Beaumont (Bitbucket: chris_beaumont, GitHub: Unknown) Hey there. I've been thinking about this issue lately, and thought it might be worth leaving some notes here. I've been working on a coverage wrapper called smother (https://github.com/chrisbeaumont/smother) based on the ideas I've seen on this ticket, @kevinjqiu's nostrils repo, and the experimental WTW code in coverage.py's source. A quick summary of smother's approach:
In answer to some of the questions in this thread: …
This is primarily relevant for …
The inspiration for smother was an 11K-test suite for a 100K-line legacy codebase, and smother is reasonably performant (negligible time overhead, a somewhat-ungainly 100MB data file that could easily be optimized for size, and ~5 sec query times). I've experimented with different visualizations of smother's CSV output, but ultimately found that the … |
Original comment by Tibor (Bitbucket: tibor_arpas, GitHub: Unknown) @chris_beaumont For reference, I'll link here also http://testmon.org . I haven't had time to look at smother yet. testmon uses a notion of "python code blocks" (probably something similar to smother's "semantic region"). pytest-testmon also takes into account holes in the blocks, which is described here in the second half: https://github.com/tarpas/pytest-testmon/wiki/Determining-affected-tests My answer to the question:
… would be that I think coverage.py doesn't need to care, but if it really wants to, it can store checksums of code blocks as implemented in testmon. |
I wrote a blog post laying out the challenges: http://nedbatchelder.com/blog/201612/who_tests_what.html |
Original comment by xcombelle (Bitbucket: xcombelle, GitHub: xcombelle) I don't get how you arrive at the figure of C/4 more information to store. (I don't know how it is stored now, either.) As I understand it, right now you have to store all the lines executed. With the new way you would also have to store the contexts in which each line is executed, so O(n) more information, where n is the average number of simultaneous contexts. |
The way I've implemented the contexts so far, there is a separate data structure for each context. So I don't store a list of contexts for each line. Instead, each context has a subset of the line data. So the question is, what fraction of the full product's coverage will a single context be. I took a crude guess at 25%. Hence, C/4. |
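To make the 25% guess concrete, a back-of-the-envelope calculation with made-up numbers:

```python
lines_in_full_coverage = 10000   # line records from one ordinary run
contexts = 1000                  # e.g. one context per test, so C = 1000
fraction_per_context = 0.25      # the crude 25% guess above

total_records = contexts * fraction_per_context * lines_in_full_coverage
print(total_records)             # 2500000.0, i.e. C/4 = 250x the original data
```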
Original comment by xcombelle (Bitbucket: xcombelle, GitHub: xcombelle) I realize both ways to store the data are equivalent, and that how much of the full project coverage one test reaches depends heavily on the granularity of the test. For a unit test, only a small part of the codebase is exercised, but for an integration test a bigger part is covered. So you are totally right that two orders of magnitude more data might be necessary. |
Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary) For the record some discussions on that topic at https://bitbucket.org/ned/coveragepy/pull-requests/120/wip-list-of-contexts-instead-of-none/diff |
Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary) For the record, a failed hack can be found at https://bitbucket.org/ned/coveragepy/pull-requests/121/wtw-draft/diff |
Original comment by Tibor (Bitbucket: tibor_arpas, GitHub: Unknown) How about, for every line, collecting the filename and line number of the previously executed line? This would be a set, of course, which would grow only if there is a new occurrence (a new caller). No idea how much slower this would be. At the measurement stage it would be enough to just work with a hash of the filename or an index into an array, of course, so hopefully there is no difference to the current "wtw/context" drafts. Advantages: … |
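A compact sketch of the data model being proposed here; the names are illustrative, not coverage.py internals:

```python
callers = {}   # (filename, lineno) -> {(prev_filename, prev_lineno), ...}
prev = None

def note_line(filename, lineno):
    """Would be called from the trace function for each executed line."""
    global prev
    here = (filename, lineno)
    if prev is not None:
        callers.setdefault(here, set()).add(prev)
    prev = here
```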
Original comment by Tibor (Bitbucket: tibor_arpas, GitHub: Unknown) There is now a good UI designed for this feature: http://bit.ly/livetestdemo |
@tibor_arpas Do you have information about who made the UI, and what it is built on? |
What is the status of this? |
I'm currently working on switching to SQLite data storage, as part of this work: https://nedbatchelder.com/blog/201808/sqlite_data_storage_for_coveragepy.html . |
@zooko @massich @schettino72 @tarpas @xcombelle @ChrisBeaumont @lauwe @PaulS @sdamon @guettli @RonnyPfannschmidt @kevinjqiu This is now available in v5.0a3: https://nedbatchelder.com/blog/201810/who_tests_what_is_here.html Please let me know what you think! |
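In case it helps anyone trying the alpha, here is a minimal sketch of enabling the feature through the API, based on my reading of the announcement; the option and flag names ("run:dynamic_context", --show-contexts) should be double-checked against the 5.x docs:

```python
import coverage

cov = coverage.Coverage()
cov.set_option("run:dynamic_context", "test_function")  # record data per test function
cov.start()
# ... run the test suite here ...
cov.stop()
cov.save()
# "coverage html --show-contexts" should then annotate each line with the
# contexts (tests) that executed it.
```

The same thing can be done purely from configuration rather than the API, which is probably the more common setup.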
I think it's beyond awesome! It took me hours to figure out how to get it working, though. Would you consider a PR to expand the documentation for this, with some examples? EDIT: realistically, I'm not going to get a chance to do this any time soon, I'm afraid. |
I will gladly take a pull request for that. It's hard sometimes to step back and see things as a newcomer would, so you have expertise I don't. Thanks. |
Originally reported by andrea crotti (Bitbucket: andrea_crotti, GitHub: Unknown)
I was just using the awesome HTML report to see my test coverage and I had the following thought.
Wouldn't it be nice to be able to see easily what parts of the test suites are actually testing my code?
I guess this information is collected while doing the annotation, right?
This way we could very easily see whether the tests are actually good, which is especially important when working with other people's code.