-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable tests/integration/standard #219
Enable tests/integration/standard #219
Conversation
@@ -37,8 +38,8 @@ | |||
|
|||
|
|||
def setup_module(): | |||
os.environ['SCYLLA_EXT_OPTS'] = "--smp 4 --memory 2048M" | |||
use_cluster('shared_aware', [3], start=True) | |||
os.environ['SCYLLA_EXT_OPTS'] = "--smp 2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you want to fit the test in GitHub action, but changing the number of shards changes the test logic.
The test is querying specific keys, that should make the query go to specific shard, when smp changes we need to pick new keys, or the target shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I did change the keys - I took them from Java driver, as you can see in commit message. Java driver has very similar tests, and uses smp=2: https://github.com/scylladb/java-driver/blob/b3f3ebaf161b21e5c4840ec294595d4e4b39d9bf/driver-core/src/test/java/com/datastax/driver/core/ShardAwarenessTest.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I've written them initially
Anyhow we are using exactly the tracing for confirmation,
I would remove the assert before figuring out what changed in Scylla that we can't trust them anymore.
It might hide a bug in the driver code.
Also I was quite sure the java tests had them
@@ -62,7 +63,6 @@ def verify_same_shard_in_tracing(self, results, shard_name): | |||
LOGGER.info("%s %s", event.thread_name, event.description) | |||
for event in events: | |||
self.assertEqual(event.thread_name, shard_name) | |||
self.assertIn('querying locally', "\n".join([event.description for event in events])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those assert at critical to this test, they are the confirmation that our query landed on the expected shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_closing_connections
was consistently failing for me. I investigated why that is happening, and came to a conclusion that sometimes even when query is sent to a correct node and correct shard, Scylla can decide to delegate it's execution to another query - in that case 'querying locally' will not be present in the tracing log.
This was confirmed to me by @kbr-scylla on Slack.
The test still checks the shard number for each entry in tracing log - see 2 lines above the one referenced by this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a specific case, I would change it only for that case.
2d6dee5
to
d650905
Compare
Rebased on master, reverted one changed requested by @fruch. There was one more failing test, and it passes for me locally - if it fails again I'll investigate. Regarding removal of "querying locally" assert: I propose to merge this without restoring the assert - as it causes the test to fail. I'll open an issue about it, and look into it after coming back from PTO. |
I don't get why this failed - the branch contains the @edit |
Oh, it's probably not creating a new cluster but reusing the existing one |
Ok, so |
9f92a4b
to
b069485
Compare
b069485
to
2ab4d2c
Compare
Rebased on master - to see if my (now reverted) PR wasn't the cause of test failures. |
Without `-e`, driver is installed into site-packages, and then installed version conflicts with source. This has caused me problems many times during development, because source version doesn't work (native libraries) are not compiled. `-e` causes driver to be installed in-place, so there are no more conflicts and import issues.
Tests marked as xfailing should be failing - if they are not, it means something in Scylla changed and we need to adapt the tests. This attribute means that a passing xtest will cause a failure.
Renamed some decorators (those that I previously added) from lowercase to lowercase with underscores (as advised by PEP 8 for functions - which a decorator is). I changed newly added decorators, didn't yet touch older ones - still need to decide wheter to do it.
It has some advantages, explained in the comment them. For completeness, I'm copying this comment here: # pytest.mark.xfail instead of unittest.expectedFailure because # 1. unittest doesn't skip setUpClass when used on class and we need it sometimes # 2. unittest doesn't have conditional xfail, and I prefer to use pytest than custom decorator # 3. unittest doesn't have a reason argument, so you don't see the reason in pytest report In the future all decorators should probably be switched over.
Previous code was incorrect and couldn't possibly work.
Scylla need more time to start which caused the test to fail. Increasing to 60 seconds was not enough, 120 seems to work.
This test was previously disabled because required functionality was not implemented in CCM. This tests currently passes - and I manually inspected the logs to make sure node3 really has authentication enabled.
Many tests failing on Scylla were labeled with unconditional xfail. This has some problems: - It's hard to tell why a test was marked. - When some functionality is implemented in Scylla, we don't have an easy way to reenable tests that use this functionality. - Test is skipped also when testing with Cassandra This commit introduces more labels for failing tests. It fixes those problems: - Label name and reason string explain why test is disabled - We can edit label definition to enable tests on newer Scylla version - Tests are only skipped in environment where they are expected to fail
instead of unittest.expectedFailure. The former has a `reason` argument, and this reason is shown in test report - so it's easier to judge wheter the test should really fail.
Tests running `sudo` are problematic to run locally during development, as you can't just run the tests - you need to watch them and wait for sudo prompt. In this test, it would be better to use some kind of proxy. As the test was already marked xfail, skip it to make development easier.
Use shard values (keys, shard number) from Java driver tests. Java driver uses values compatible with smp=2, so with those values we'll be able to run those tests in CI (it has only 2 cores).
All the tests should be passing now, so they can be enabled.
One of the tests was failing in CI, even after previous fix. This is probably caused by cluster being reused, instead of being recreated with newly set env option.
When analyzing output of failed test it is useful to know which node emitted which message.
test_closing_connections is failing, for multiple reasons. Skip it until proper investigation can be performed.
2ab4d2c
to
40120f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I don't want to merge it yet. There are flaky tests which I want to disable (and open issues about them), but the old logs expired so I rerun CI a few times - and now they started to pass of course ;_; |
I don't get it - there were 1-3 tests that were failing with high probability, but now there are I think 5 successful runs in a row. Magic? I guess we can merge this, if failures reappear I'll disable the tests and open the issues. |
Some integration tests were disabled in CI, and so have rotten a bit and were not passing - most notably, our shard-awareness tests.
This PR fixes / disables failing tests in tests/integration/standard and enables them in CI.
There are also some minor general fixes in tests (strict xfail,
pip install -e
). See commit messages for descriptions of each change.