Enable tests/integration/standard #219

Lorak-mmk · 2023-03-28T15:25:55Z

Some integration tests were disabled in CI, and so have rotten a bit and were not passing - most notably, our shard-awareness tests.
This PR fixes / disables failing tests in tests/integration/standard and enables them in CI.

There are also some minor general fixes in tests (strict xfail, pip install -e). See commit messages for descriptions of each change.

ci/run_integration_test.sh

fruch · 2023-03-28T15:49:54Z

tests/integration/standard/test_shard_aware.py

@@ -37,8 +38,8 @@


 def setup_module():
-    os.environ['SCYLLA_EXT_OPTS'] = "--smp 4 --memory 2048M"
-    use_cluster('shared_aware', [3], start=True)
+    os.environ['SCYLLA_EXT_OPTS'] = "--smp 2"


I understand you want to fit the test in GitHub action, but changing the number of shards changes the test logic.

The test is querying specific keys, that should make the query go to specific shard, when smp changes we need to pick new keys, or the target shard.

Yes, and I did change the keys - I took them from Java driver, as you can see in commit message. Java driver has very similar tests, and uses smp=2: https://github.com/scylladb/java-driver/blob/b3f3ebaf161b21e5c4840ec294595d4e4b39d9bf/driver-core/src/test/java/com/datastax/driver/core/ShardAwarenessTest.java

I know I've written them initially

Anyhow we are using exactly the tracing for confirmation,

I would remove the assert before figuring out what changed in Scylla that we can't trust them anymore.

It might hide a bug in the driver code.

Also I was quite sure the java tests had them

fruch · 2023-03-28T15:50:53Z

tests/integration/standard/test_shard_aware.py

@@ -62,7 +63,6 @@ def verify_same_shard_in_tracing(self, results, shard_name):
            LOGGER.info("%s %s", event.thread_name, event.description)
        for event in events:
            self.assertEqual(event.thread_name, shard_name)
-        self.assertIn('querying locally', "\n".join([event.description for event in events]))


Those assert at critical to this test, they are the confirmation that our query landed on the expected shard.

test_closing_connections was consistently failing for me. I investigated why that is happening, and came to a conclusion that sometimes even when query is sent to a correct node and correct shard, Scylla can decide to delegate it's execution to another query - in that case 'querying locally' will not be present in the tracing log.

This was confirmed to me by @kbr-scylla on Slack.

The test still checks the shard number for each entry in tracing log - see 2 lines above the one referenced by this comment.

If it's a specific case, I would change it only for that case.

Lorak-mmk · 2023-03-30T13:09:18Z

Rebased on master, reverted one changed requested by @fruch. There was one more failing test, and it passes for me locally - if it fails again I'll investigate.

Regarding removal of "querying locally" assert: I propose to merge this without restoring the assert - as it causes the test to fail. I'll open an issue about it, and look into it after coming back from PTO.

Lorak-mmk · 2023-03-30T14:30:42Z

I don't get why this failed - the branch contains the --smp 1 fix that was merged to master.

@edit 2023-03-30 13:26:42.165 DEBUG [thread:57]: Connected to 2/2 shards on host 127.0.0.2:9042 (0 missing or needs replacement) <- a message from logs. There should be 1 shard, maybe the fix doesn't work for some reason?

Lorak-mmk · 2023-03-30T14:48:00Z

Oh, it's probably not creating a new cluster but reusing the existing one

Lorak-mmk · 2023-03-30T16:39:19Z

Ok, so test_authentication_misconfiguration.py seems to be flaky, I'll have to fix it...

Lorak-mmk · 2023-04-25T10:26:26Z

Rebased on master - to see if my (now reverted) PR wasn't the cause of test failures.

Without `-e`, driver is installed into site-packages, and then installed version conflicts with source. This has caused me problems many times during development, because source version doesn't work (native libraries) are not compiled. `-e` causes driver to be installed in-place, so there are no more conflicts and import issues.

Tests marked as xfailing should be failing - if they are not, it means something in Scylla changed and we need to adapt the tests. This attribute means that a passing xtest will cause a failure.

Renamed some decorators (those that I previously added) from lowercase to lowercase with underscores (as advised by PEP 8 for functions - which a decorator is). I changed newly added decorators, didn't yet touch older ones - still need to decide wheter to do it.

It has some advantages, explained in the comment them. For completeness, I'm copying this comment here: # pytest.mark.xfail instead of unittest.expectedFailure because # 1. unittest doesn't skip setUpClass when used on class and we need it sometimes # 2. unittest doesn't have conditional xfail, and I prefer to use pytest than custom decorator # 3. unittest doesn't have a reason argument, so you don't see the reason in pytest report In the future all decorators should probably be switched over.

Previous code was incorrect and couldn't possibly work.

Scylla need more time to start which caused the test to fail. Increasing to 60 seconds was not enough, 120 seems to work.

This test was previously disabled because required functionality was not implemented in CCM. This tests currently passes - and I manually inspected the logs to make sure node3 really has authentication enabled.

Many tests failing on Scylla were labeled with unconditional xfail. This has some problems: - It's hard to tell why a test was marked. - When some functionality is implemented in Scylla, we don't have an easy way to reenable tests that use this functionality. - Test is skipped also when testing with Cassandra This commit introduces more labels for failing tests. It fixes those problems: - Label name and reason string explain why test is disabled - We can edit label definition to enable tests on newer Scylla version - Tests are only skipped in environment where they are expected to fail

instead of unittest.expectedFailure. The former has a `reason` argument, and this reason is shown in test report - so it's easier to judge wheter the test should really fail.

Tests running `sudo` are problematic to run locally during development, as you can't just run the tests - you need to watch them and wait for sudo prompt. In this test, it would be better to use some kind of proxy. As the test was already marked xfail, skip it to make development easier.

Use shard values (keys, shard number) from Java driver tests. Java driver uses values compatible with smp=2, so with those values we'll be able to run those tests in CI (it has only 2 cores).

All the tests should be passing now, so they can be enabled.

One of the tests was failing in CI, even after previous fix. This is probably caused by cluster being reused, instead of being recreated with newly set env option.

When analyzing output of failed test it is useful to know which node emitted which message.

test_closing_connections is failing, for multiple reasons. Skip it until proper investigation can be performed.

fruch

LGTM

Lorak-mmk · 2023-05-30T10:46:24Z

I don't want to merge it yet. There are flaky tests which I want to disable (and open issues about them), but the old logs expired so I rerun CI a few times - and now they started to pass of course ;_;

Lorak-mmk · 2023-05-30T18:03:36Z

I don't get it - there were 1-3 tests that were failing with high probability, but now there are I think 5 successful runs in a row. Magic? I guess we can merge this, if failures reappear I'll disable the tests and open the issues.

Lorak-mmk requested a review from fruch March 28, 2023 15:25

Lorak-mmk assigned avelanarius and unassigned avelanarius Mar 28, 2023

Lorak-mmk requested a review from avelanarius March 28, 2023 15:26

fruch reviewed Mar 28, 2023

View reviewed changes

Lorak-mmk force-pushed the fix-integration-standard branch from 2d6dee5 to d650905 Compare March 30, 2023 13:06

Lorak-mmk mentioned this pull request Apr 11, 2023

Resurrect integration tests #207

Open

6 tasks

Lorak-mmk force-pushed the fix-integration-standard branch from 9f92a4b to b069485 Compare April 13, 2023 14:23

Lorak-mmk force-pushed the fix-integration-standard branch from b069485 to 2ab4d2c Compare April 25, 2023 10:25

Lorak-mmk mentioned this pull request Apr 25, 2023

regression running scylladb.git test.py topology/test_concurrent_schema #225

Closed

Lorak-mmk added 13 commits May 25, 2023 16:11

pytest.ini: Enable strict xfail

392057e

Tests marked as xfailing should be failing - if they are not, it means something in Scylla changed and we need to adapt the tests. This attribute means that a passing xtest will cause a failure.

test_query.py: Fix usage of MockLoggingHandler

5122080

Previous code was incorrect and couldn't possibly work.

test_custom_cluster.py: Increase startup timeout

8060489

Scylla need more time to start which caused the test to fail. Increasing to 60 seconds was not enough, 120 seems to work.

Reenable test_authentication_misconfiguration.py

45c4f50

This test was previously disabled because required functionality was not implemented in CCM. This tests currently passes - and I manually inspected the logs to make sure node3 really has authentication enabled.

Use pytest.mark.xfail

a8faa90

instead of unittest.expectedFailure. The former has a `reason` argument, and this reason is shown in test report - so it's easier to judge wheter the test should really fail.

Fix a typo in shard aware tests

dca3ab7

test_shard_aware: Copy values from Java driver.

1c886fd

Use shard values (keys, shard number) from Java driver tests. Java driver uses values compatible with smp=2, so with those values we'll be able to run those tests in CI (it has only 2 cores).

Enable tests/integration/standard/ in CI

dadf949

All the tests should be passing now, so they can be enabled.

Lorak-mmk added 3 commits May 25, 2023 16:54

Fix failing test in test_cluster.py

6837233

One of the tests was failing in CI, even after previous fix. This is probably caused by cluster being reused, instead of being recreated with newly set env option.

test_shard_aware.py: print tracing source

fd44512

When analyzing output of failed test it is useful to know which node emitted which message.

test_shard_aware.py: skip failing test

40120f2

test_closing_connections is failing, for multiple reasons. Skip it until proper investigation can be performed.

Lorak-mmk force-pushed the fix-integration-standard branch from 2ab4d2c to 40120f2 Compare May 25, 2023 14:55

fruch approved these changes May 30, 2023

View reviewed changes

avelanarius merged commit 5afb73d into scylladb:master May 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable tests/integration/standard #219

Enable tests/integration/standard #219

Lorak-mmk commented Mar 28, 2023

fruch Mar 28, 2023

Lorak-mmk Mar 28, 2023

fruch Mar 28, 2023

fruch Mar 28, 2023

Lorak-mmk Mar 28, 2023

fruch Mar 28, 2023

Lorak-mmk commented Mar 30, 2023

Lorak-mmk commented Mar 30, 2023 •

edited

Loading

Lorak-mmk commented Mar 30, 2023

Lorak-mmk commented Mar 30, 2023

Lorak-mmk commented Apr 25, 2023

fruch left a comment

Lorak-mmk commented May 30, 2023

Lorak-mmk commented May 30, 2023

Enable tests/integration/standard #219

Enable tests/integration/standard #219

Conversation

Lorak-mmk commented Mar 28, 2023

fruch Mar 28, 2023

Choose a reason for hiding this comment

Lorak-mmk Mar 28, 2023

Choose a reason for hiding this comment

fruch Mar 28, 2023

Choose a reason for hiding this comment

fruch Mar 28, 2023

Choose a reason for hiding this comment

Lorak-mmk Mar 28, 2023

Choose a reason for hiding this comment

fruch Mar 28, 2023

Choose a reason for hiding this comment

Lorak-mmk commented Mar 30, 2023

Lorak-mmk commented Mar 30, 2023 • edited Loading

Lorak-mmk commented Mar 30, 2023

Lorak-mmk commented Mar 30, 2023

Lorak-mmk commented Apr 25, 2023

fruch left a comment

Choose a reason for hiding this comment

Lorak-mmk commented May 30, 2023

Lorak-mmk commented May 30, 2023

Lorak-mmk commented Mar 30, 2023 •

edited

Loading