test.py: handle the --markers pytest arguments #11713
Conversation
test/topology/pytest.ini (outdated):
@@ -4,3 +4,7 @@ asyncio_mode = auto
 log_cli = true
 log_format = %(asctime)s %(levelname)s %(message)s
 log_date_format = %Y-%m-%d %H:%M:%S
+markers =
+    medium: tests that take several minutes to run
It seems that we need to filter markers from the test.py command line by the markers from pytest.ini, since different suites can have different sets of markers. Or we should declare a common set of markers for all Python tests that run from test.py.
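For illustration, such filtering could look roughly like this; a minimal sketch, assuming each suite's pytest.ini is read with configparser (the function names registered_markers and filter_expression are hypothetical, not part of this patch):

```python
import configparser
import re

def registered_markers(pytest_ini_text: str) -> set[str]:
    """Parse the 'markers' option of a pytest.ini and return the marker names."""
    cfg = configparser.ConfigParser()
    cfg.read_string(pytest_ini_text)
    lines = cfg.get("pytest", "markers", fallback="").strip().splitlines()
    # Each line looks like "medium: tests that take several minutes to run".
    return {line.split(":", 1)[0].strip() for line in lines if line.strip()}

def filter_expression(expr: str, known: set[str]) -> bool:
    """Return True if every marker name in a '-m' expression is registered."""
    names = set(re.findall(r"\b\w+\b", expr)) - {"and", "or", "not"}
    return names <= known

ini = """\
[pytest]
markers =
    medium: tests that take several minutes to run
    large: tests that take more than ten minutes to run
"""
markers = registered_markers(ini)
```

With this, test.py could skip passing a `-m` expression to a suite whose pytest.ini doesn't register the markers it mentions.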
https://docs.pytest.org/en/7.1.x/example/markers.html#registering-markers
has the following note:
Note
It is recommended to explicitly register markers so that:
- There is one place in your test suite defining your markers
- Asking for existing markers via pytest --markers gives good output
- Typos in function markers are treated as an error if you use the --strict-markers option.
So registration seems to be recommended rather than required, and if we don't use --strict-markers, it's fine not to register the markers.
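For illustration, explicitly registering the markers discussed here would look roughly like this (a sketch of a pytest.ini; the commented addopts line is only needed if a suite opts into strict checking):

```ini
[pytest]
markers =
    medium: tests that take several minutes to run
    large: tests that take more than ten minutes to run
# Optional: make typos in @pytest.mark.* decorators an error.
# addopts = --strict-markers
```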
Pytest emits a warning if unregistered markers are used as decorators, and that makes sense: it's easy to mistype them. On the other hand, it doesn't care about unrecognized markers in the -m command line switch; it just skips them. So there is actually no problem here: each directory may decide for itself whether to declare the markers in pytest.ini or not.
Another question is the uniformity of markers in different directories. I suggested medium and large; the markers line from pytest.ini will need to be copy-pasted to other directories, but I guess we can postpone solving this problem until it really feels like a problem.
test/topology/suite.yaml (outdated):
@@ -4,3 +4,4 @@ cluster_size: 3
 extra_scylla_config_options:
   authenticator: AllowAllAuthenticator
   authorizer: AllowAllAuthorizer
+default_markers: not medium and not large
When do we plan to run these tests? We should run them regularly, if not during gating. Maybe the daily build.
@fruch Hi, long tests are starting to appear in test.py; we want to run them separately from all other tests, like once a day. Could you advise how to do it correctly in Jenkins?
Talk with the releng team to help with such a thing
I can merge, unless you decide not to register these markers explicitly (up to you).
A thought (I am not rejecting this patch, I just want to raise a thought):
As I noted in the past, I have a problem with test.py continuing to grow and grow with code specific to the "topology" framework which should have been in the topology/ directory instead of test.py.
However, in this case you might be making the point that these "-k" and "-m" options would be common to all test suites, and not specific to topology, so they really belong in test.py. If this is the intention, I am afraid I don't understand how these options apply to other test suites, such as C++ tests, cql tests, and Alternator tests. Do you intend to support these options for those suites as well and you just didn't do it yet? Do you intend to support the full pytest syntax (like "a or b") as you described in comments? And if we add such options, should we really call them "key" and "marker", two concepts which don't exist in test.py? (We just discussed earlier this week the 4 different meanings of "test name" in test.py; now we also have the term "key"?)
test/topology/pytest.ini (outdated):
+markers =
+    medium: tests that take several minutes to run
+    large: tests that take more than ten minutes to run
Nitpick: Do you really want two different markers "medium" and "large"?
In test/alternator, running more than 600 tests takes less than 30 seconds, so I considered any test that takes 30 seconds on its own to be "very slow", and added just one marker, "veryslow". I think that a test that takes 9 minutes and a test that takes 19 minutes are equally problematic, and I don't want to run either of those during development. So I thought that one level of "slow" marker might be enough. But I don't have strong feelings in this matter.
I don't have strong feelings here either; we just used medium and large at my previous job. I'm ok with just slow.
The two designations "medium" and "large" are fine if you have in mind some scenario where you want to run all "medium" tests but not "large" tests - e.g., run "medium" tests every day but "large" tests every week, or run "medium" tests on every next promotion and "large" tests just once a day, or something like that. I think we should first consider if we have such a plan, and then can come up with tags that fit it.
By the way, ideally, we shouldn't need tags that say how long each test is, because the test framework can calculate the amount of time that each test (and here I mean "test" in the pytest sense, i.e., a test function) takes on average. But we can leave this till much later.
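As a sketch of that idea, assuming per-test durations were collected from previous runs (the data shape, function names, and the 30-second cutoff are all hypothetical):

```python
from collections import defaultdict

def average_durations(runs: list[dict[str, float]]) -> dict[str, float]:
    """Average each test's duration (in seconds) over past runs."""
    totals: dict[str, list[float]] = defaultdict(list)
    for run in runs:
        for test, seconds in run.items():
            totals[test].append(seconds)
    return {test: sum(xs) / len(xs) for test, xs in totals.items()}

def classify_slow(avg: dict[str, float], slow_cutoff: float = 30.0) -> set[str]:
    """Tests whose average duration exceeds the cutoff could get the
    'slow' treatment automatically, without a manual marker."""
    return {test for test, seconds in avg.items() if seconds > slow_cutoff}

# Hypothetical duration history: one dict per past run.
history = [
    {"test_bootstrap": 120.0, "test_schema": 1.5},
    {"test_bootstrap": 180.0, "test_schema": 2.5},
]
avg = average_durations(history)
```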
The only scenario I have right now is to run several long tests daily instead of every commit, so I'll change these to just "slow".
The only scenario I have right now is to run several long tests daily instead of every commit, so I'll change these to just "slow".
I liked that you tried to define what exactly slow/large/medium would mean in terms of test length. In the test/cql-pytest world, I considered anything taking more than one or two seconds to be too slow for ordinary use by developers. In the test/boost world, we have several tests much slower than that that are tolerated by developers, perhaps because developers never run all the Boost tests themselves. So if this option is used in practice in Jenkins or by developers, we will need to decide how slow the cutoff is and write it somewhere (like you wrote in your patch above for the medium/large definitions).
We discussed with Kostya two options for separating long tests:
We can choose the third option: take pytest syntax as a basis, promise to implement it for all tests in the future (perhaps partially), but for now support it only in Python. The parameters in this case should be called
@gusev-p @kostja I don't like the first option (declaring this functionality Python-specific), because if that's the case, and remains the case, I think it should be out of test.py and in the Python tests which support it. Moreover, it's not just about "supporting" these parameters; it's also about the meaning of specific tags. It will not be helpful if you use the name "medium" in this patch while other existing tests use the name "verylong"; we will need all the Python tests to agree on at least some common set of tags for it to be useful. The second and third options are both related to the same philosophy: you have to begin by making a "promise" (or at least state an intention) that eventually these abilities (to run one specific test function and to exclude test functions with a particular tag) will be supported in all test suites, including C++ tests, cql tests, Alternator tests, etc. Once you say that, when you go to explain the "syntax" of the -k option (for example), you need to think carefully about whether you want the -k parameter to be exactly the same thing as it is in pytest, or not. If it is to be the same syntax as pytest's "-k", you should say this (you explained the syntax but, if I remember correctly, not that it's supposed to be based on pytest).
@nyh @kostja I'm trying to decide how to go about this particular PR:
So regarding tags we could do the following:
Force-pushed 7dffcbb to 6ca40cd.
v2:
Force-pushed 6ca40cd to e90c05d.
- I didn't understand the reason for the exit-code change, please explain it separately or rethink it.
- The commit message should say that --markers is "currently only supported by topology tests" - or mention a specific test suite class, because the phrase "Python tests" is inaccurate (it doesn't apply to test/cql-pytest, test/alternator, etc., which are also in Python).
- I'm still not very happy about introducing pytest-specific syntax to a test framework which isn't supposed to be only for pytest. I would have used a simpler syntax that will be easier to implement for non-pytest frameworks. BUT I think we can still merge your version as-is, and rethink this syntax later if we want. It's not like end-users will get used to this syntax and we can't change it later.
# https://docs.pytest.org/en/7.1.x/reference/exit-codes.html
no_tests_selected_exit_code = 5
self.valid_exit_codes = [0, no_tests_selected_exit_code]
I think this change deserves a separate explanation (and perhaps a separate patch). Pytest deliberately complains when there are no tests to run, because this usually means you did something wrong (e.g., asked to run only a specific test or marker which doesn't exist). So why do we want to mask this error?
Some subdirectories of scylla/test may not contain @slow tests, so running scylla/test.py --markers slow will fail, since a single run of test.py executes pytest for all subdirectories. A better way to handle this would be to check that, among all runs of pytest, at least one returned an exit code != 5, but this is somewhat new behavior that must be handled for all types of tests, including C++ ones, and it is not obvious whether it is worth doing, so I left this workaround for now. This problem hasn't arisen before, since there was no way to filter running tests and each directory contains at least one test.
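The check described above ("among all runs of pytest, at least one returned an exit code != 5") could be sketched like this; a hypothetical helper, not code from the patch:

```python
# Pytest exit codes: https://docs.pytest.org/en/7.1.x/reference/exit-codes.html
OK = 0
NO_TESTS_COLLECTED = 5

def marker_run_succeeded(exit_codes: list[int]) -> bool:
    """True if every per-directory pytest run either passed or collected
    nothing, and at least one directory actually ran (and passed) tests."""
    if not exit_codes:
        return False
    # Any real failure (1, 2, 3, 4, ...) fails the whole run.
    if any(code not in (OK, NO_TESTS_COLLECTED) for code in exit_codes):
        return False
    # If every directory returned 5, the marker probably matched nothing.
    return any(code == OK for code in exit_codes)
```

This keeps exit code 5 acceptable per directory while still surfacing a likely typo in the marker expression when no directory selected any tests.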
The default should be to run everything. Otherwise, someone without special knowledge (me, continuous integration) will unknowingly skip tests.
I understand what you're saying, but I'm afraid that the result of being too strict on this philosophy is that users will not run tests at all (like they do today with dtest - no developer runs those unless they are developing a dtest!), or perhaps developers will have a text file with a 100-character command (like I have for dtest) to run only the tests that they want to run.
@avi The whole point of the markers is adding "long" tests to test.py, which are normally skipped. So the default is to skip the long tests at least. The CI is a program and can provide whatever defaults it needs.
I understand that's the point, but I disagree with it. Default off = invisible to developers. It's better that a developer who doesn't know that the test can be skipped wait some more than not get the benefit of the tests. If you want, you can add an advertisement. Also, why do we have such long tests?
@avikivity my view is that what you're saying is a "false dilemma" (https://en.wikipedia.org/wiki/False_dilemma): you're basically saying that we have just two options, that the developer either runs all tests each time or runs just the fast tests each time. But there is a third option: that the user never manually runs any tests at all, because it has become too inconvenient or frustrating to run them. I am worried that this is exactly what will happen if we allow very slow tests to enter the test suite.

As to what exactly "very slow" tests are, I'm not sure we have an accurate number yet. But as an example, I gave the Alternator test suite, where all 700 tests run today (serially!) in around 30 seconds. A test suite that takes 30 seconds makes it easy for the developer to run the entire Alternator test suite after every small modification to Alternator. If we suddenly added 20 tests taking 30 seconds each, the time to run the test suite (serially) would increase from 30 seconds to 10 minutes. 10 minutes is no longer reasonable to wait after each small modification. So developers will be tempted not to run the tests themselves at all, and let the CI run them later. I claim that this is what most developers do regarding dtests nowadays (please correct me if I'm wrong): they never run them manually and just assume the CI will run them.

Continuing the Alternator example, I currently have 7 tests (out of 700) in the Alternator test suite marked with the "veryslow" marker. These tests range between a few seconds and a minute in length. Running these tests (test/alternator/run --runveryslow) increases the entire Alternator test time on my laptop from 24 seconds to 278 seconds (a bit less than 5 minutes). I hope you agree that this is a big difference, although arguably not a complete deal-breaker in this example (still "just" 5 minutes).
You're right that it will be somewhat surprising if a test fails in CI after you thought you ran the entire test suite manually, but the CI will tell you exactly which test failed, and you can then run that specific test manually (and you can also see that it is marked "slow", which is why it didn't run for you).
This is a very good, and very important, question. One of the reasons why the Alternator tests are so fast is that I made a lot of effort to avoid long tests almost completely. And many times, if you keep test shortness as a priority, you can indeed avoid very long tests. For example, you need to avoid the arbitrary sleeps that plague our dtest suite, you need to configure Scylla to avoid its own long sleeps (e.g., we configure Alternator TTL to run the scanning thread every half a second, instead of every 24 hours), and you should pick minimal reproducers (if a bug reproduces with 10 rows of data, you don't need to fill the database with one million rows, which takes a lot of time in the test).

But unfortunately, it's not always possible to avoid very long tests. One example I have in the Alternator test suite is checking that a certain item does NOT expire due to a TTL. It's easy to wait for an item to expire (on a developer machine, it will happen in roughly half a second), but how much time do we wait for an item to not expire? Waiting half a second is not enough because of our ridiculously slow debug runs on Jenkins: in those runs, Alternator TTL expiration often takes as much as 60 seconds (!), so our wait for an item to not expire also needs to be at least 60 seconds. So this test will take 60 seconds to run even on a fast laptop, more than twice as long as all other Alternator tests combined :-(

Another cause of very long tests, more common in the Raft tests (I personally don't like this kind of tests, so they are not common in my own tests), is tests which aim for "blanket" coverage of a large space of parameters: instead of the developer trying to define the interesting corner cases manually, we have tests that try all possible schemas with all possible types of items, or alternatively generate a large number of these possibilities at random and test them (each time a different random subset). These tests take a long time not because they sleep a lot, but because they do a lot. I'm not a fan of such tests, but other people swear by them, and if I understand correctly, the new "slow" markers would allow them to write such tests.
I understand that the goal is to have "gating" tests (the fast ones) which run by default - they'll run in CI and next promotions,
Because: starting Scylla nodes, stopping them, bootstrapping, decommissioning, etc. all takes time. All tests we had until now worked as follows: they started a single Scylla node and ran all test cases on it. So the single Scylla node's start duration was amortized across all test cases. It's easy to make tests fast with this assumption. If we want to test topology operations, then sorry, but long tests are the tradeoff we have to make.
I don't think that's the main reason, at least for Raft tests. A test that performs a single bootstrap operation already takes a few seconds. That's not covering a large space of parameters. That's a trivial test that checks whether one of the most basic administrative Scylla operations works on the happy path. Can you write a test that performs multiple topology operations and runs under one second? Please show me.
But @kbr- what you are explaining here is why the fastest Raft test is not as fast as the fastest Alternator test, taking a couple of seconds instead of a couple of milliseconds. But this doesn't explain the existence of much slower tests, which the new "marker" concept is supposed to mark. Supposedly these "very slow" tests don't just create a cluster and reboot one node, but do one of the other things I describe, like have long sleeps, loop over a large space of parameters, and so on. No?
Yes, I guess so. The first use case that this PR was created for is: #11734 Still, if we weren't talking about topology operations but about ops that take a few milliseconds, doing them 20 times would still give a fast test; here, performing just 20 topology operations gives us a "very slow" test.
Avi, why not address all these concerns and comments to dtest? dtest has all kinds of tests: long, disabled, marked, not marked, flaky, etc. This feature is adding tools to test.py to make the impossible possible. It doesn't mean it will immediately be abused.
We have long topology tests because we still don't use raft fully for topology changes. And we still don't use it because we can't test it well - a good feature development starts with writing a test. |
No,
Agree.
There's no need to get philosophical here; this is just a discussion about defaults and who gets to type a few more characters or create a shell alias. Since @gusev-p's advertising budget is low (correct me if I'm wrong), few people will hear of the change, and so the default for them will be that some tests become invisible. This includes CI and the standard build process. It's less risky not to change the default. Tests for people who aren't aware will take more time. People who are aware, and choose to skip the slow tests (why? a slow test doesn't mean it's unimportant), can type the extra few characters, or execute a shell alias that expands to running the fast tests only.
@avikivity it's fine, avi, we can have a slow option that's enabled by default. But can we have another slow option that would be disabled by default? People will use the slow option that they like. |
As to CI and global awareness, we don't have the tests which are slow right now. We're only adding a tag. So we're not regressing anything. So why not let it happen, and later update CI to run with the option, and later decide on a case-by-case basis whether they are opt-in or opt-out?
BTW, we already have tests in test/ which are not run by default. Why is there so much opposition now? |
Agree, but right now there are no slow tests, and if we include them by default, this will feel like "changing the default", since the
Some tests may take longer than a few seconds to run. We want to mark such tests in some way, so that we can run them selectively. This patch proposes to use pytest markers for this. The markers from the test.py command line are passed to pytest as is via the -m parameter.

By default, the marker filter is not applied and all tests will be run without exception. To exclude e.g. slow tests you can write --markers 'not slow'.

The --markers parameter is currently only supported by Python tests; other tests ignore it. We intend to support this parameter for other types of tests in the future.

Another possible improvement is not to run suites for which all tests have been filtered out by markers. The markers are currently handled by pytest, which means that the logic in test.py (e.g., running a scylla test cluster) will be run for such suites.
Force-pushed e90c05d to 54906d6.
v3:
Some tests may take longer than a few seconds to run. We want to mark such tests in some way, so that we can run them selectively. This patch proposes to use pytest markers for this. The markers from the test.py command line are passed to pytest.

A special marker with the reserved name 'default' is supported. Its value is replaced by the fixed expression '(not slow)'. This marker is also used as the default value for --markers, which means that slow tests won't be run by default.

The --markers parameter is currently only supported by Python tests; other tests ignore it. We intend to support this parameter for other types of tests in the future.

Another possible improvement is not to run suites for which all tests have been filtered out by markers. The markers are currently handled by pytest, which means that the logic in test.py (e.g., running a scylla test cluster) will be run for such suites.
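The 'default' substitution described in this commit message can be illustrated with a small sketch (the function name expand_markers is hypothetical; the real logic lives in test.py):

```python
import re

# Per the commit message, the reserved marker name 'default'
# expands to this fixed expression.
DEFAULT_EXPANSION = "(not slow)"

def expand_markers(expr: str) -> str:
    """Replace the reserved marker name 'default' with '(not slow)'
    before passing the expression on to pytest's -m option."""
    return re.sub(r"\bdefault\b", DEFAULT_EXPANSION, expr)
```

Because 'default' is also the default value of --markers, running test.py with no marker arguments effectively passes '(not slow)' to pytest, so slow tests are skipped unless explicitly requested.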