dtest test_resetlocalschema_api_issue_7811 AssertionError: Schema versions are different on the nodes #11643

Closed
bhalevy opened this issue Sep 26, 2022 · 12 comments

@bhalevy
Member

bhalevy commented Sep 26, 2022

This is seen consistently since https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/93/testReport/junit/nodetool_additional_test/TestNodetool/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split004___test_resetlocalschema_api_issue_7811/
scylla version 2a74a00
scylladb/scylla-dtest@ef8537c64f22a9df05137369cc24212ee717f3cd

AssertionError: Schema versions are different on the nodes: 68fa8f59-d325-3290-b77c-37ab17181527: [127.0.83.1, 127.0.83.2] unexpectedly
assert 1 == 2
  +1
  -2

I'm not sure what could have introduced the regression.

last consistently passed with 6799e76

scylla$ git log --oneline 6799e766ca95..2a74a0086f89
2a74a0086f docs: fix typos
cf30432715 Merge 'test: add a topology suite with Raft disabled' from Kamil Braun
43131976e9 updateable_value: Update comment about cross-shard copying
9b6fc553b4 db: commitlog: don't print INFO logs on shutdown
a24a8fd595 Update seastar submodule
ce7bb8b6d0 test.py: PythonTestSuite: sum default config params with user-provided ones
1661fe9f37 test: add a topology suite with Raft disabled
311806244d test: pylib: use Python dicts to manipulate `ScyllaServer` configuration
fd19825eaa test: pylib: store `config_options` in `ScyllaServer`

scylladb/scylla-dtest@250ea614e947429463538d749655782e13ba73e7

scylla-dtest$ git log --oneline 250ea614e9..ef8537c64f22
ef8537c6 secondary_indexes_test: Remove DTCS
@bhalevy
Member Author

bhalevy commented Sep 26, 2022

@tgrabiec can you please look into this?
The dtest #92 run is incomplete, so the regression probably sneaked in with an earlier merge and we need to bisect it.

@bhalevy
Member Author

bhalevy commented Sep 26, 2022

I bisected this issue to a3356e8

Cc @fee-mendes @nyh

@bhalevy
Member Author

bhalevy commented Sep 26, 2022

I think this change inadvertently fixed #7811
causing the test to fail.

Cc @raphaelsc @juliayakovlev

@fee-mendes
Member

> I bisected this issue to a3356e8
>
> Cc @fee-mendes @nyh

I can't reproduce it. Well, at least not an error. :-)

N1:

./tools/toolchain/dbuild ./build/dev/scylla --developer-mode 1 --listen-address 127.0.0.1 --workdir tmp1 --skip-wait-for-gossip-to-settle 1

N2:

./tools/toolchain/dbuild ./build/dev/scylla --developer-mode 1 --workdir tmp2 --listen-address 127.0.0.2 --api-address 127.0.0.2 --seed-provider-parameters seeds=127.0.0.1 --rpc-address 127.0.0.2

schema.cql:

CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
CREATE TABLE ks.cf (pk int PRIMARY KEY) WITH compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'NOPE'};
CREATE TABLE ks.cf (pk int PRIMARY KEY) WITH compaction = {'class': 'TimeWindowCompactionStrategy'};

Manual run:

$ cqlsh 127.0.0.1 9042 -f schema.cql ; sleep 2; curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/schema_versions'; echo; sleep 2; curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' 'http://127.0.0.2:10000/storage_service/relocal_schema'; sleep 2; curl -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_proxy/schema_versions'; echo

schema.cql:3:SyntaxException: Invalid window unit NOPE for compaction_window_unit

Warnings :
TimeWindowCompactionStrategy tables without a default_time_to_live may potentially introduce too many windows. Ensure that insert statements specify a TTL (via USING TTL), when inserting data to this table. The restrict_twcs_without_default_ttl configuration option can be changed to silence this warning or make it into an error

[{"key": "702bc2b8-763d-3f0b-bd90-0bbf8f1c1a08", "value": ["127.0.0.2","127.0.0.1"]}]
[{"key": "702bc2b8-763d-3f0b-bd90-0bbf8f1c1a08", "value": ["127.0.0.2","127.0.0.1"]}]
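The two responses above can also be compared mechanically. A minimal sketch, assuming the `/storage_proxy/schema_versions` response always has the shape shown in this output (a JSON list of `{"key": <version>, "value": [<addresses>]}` entries); the helper name `schema_versions_agree` is hypothetical and not part of dtest:

```python
import json


def schema_versions_agree(response_text):
    """Return True when all nodes report a single schema version.

    Assumes the response body is a JSON list with one entry per distinct
    schema version, as in the curl output above.
    """
    versions = json.loads(response_text)
    return len(versions) == 1


# Response body copied from the manual run above.
body = '[{"key": "702bc2b8-763d-3f0b-bd90-0bbf8f1c1a08", "value": ["127.0.0.2","127.0.0.1"]}]'
print(schema_versions_agree(body))
```

Here both nodes are listed under one key before and after the `relocal_schema` call, so no schema disagreement is visible at either point in this run.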

@fee-mendes
Member

> I think this change inadvertently fixed #7811 causing the test to fail.
>
> Cc @raphaelsc @juliayakovlev

Yes and no. There are other compaction strategies which aren't verified.

@bhalevy
Member Author

bhalevy commented Sep 26, 2022

I'll adjust the test to support both cases

fruch closed this as completed Sep 28, 2022
DoronArazii added this to the 5.2 milestone Nov 16, 2022
@bhalevy
Member Author

bhalevy commented Nov 23, 2023

Hit in 5.2.11: https://jenkins.scylladb.com/job/scylla-5.2/job/dtest-release/34/testReport/junit/nodetool_additional_test/TestNodetool/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split004___test_resetlocalschema_api_issue_7811_SizeTieredCompactionStrategy_/

>       self.send_storage_restful_api(node2, 'relocal_schema')

nodetool_additional_test.py:1345: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

node = <ccmlib.scylla_node.ScyllaNode object at 0x7fcdd6268950>
option = 'relocal_schema'

    @staticmethod
    def send_storage_restful_api(node, option):
        api_cmd = f"http://{node.address()}:10000/storage_service/{option}"
        logger.debug("Send restful api: " + api_cmd)
        response = requests.post(api_cmd)
>       assert response.status_code == 200, response.text
E       AssertionError: {"message": "data_dictionary::no_such_column_family (Can't find a column family with UUID 372e23e0-894a-11ee-9a14-e929b7e0ec6c)", "code": 500}
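
For triage, the error body in the assertion above can be decoded to pull out the status code and the table UUID. A minimal sketch, assuming the body is always JSON with `message` and `code` fields as in this failure; `describe_api_error` is a hypothetical helper, not an existing dtest function:

```python
import json
import re


def describe_api_error(body):
    """Return (code, table_uuid) from a REST API error body.

    table_uuid is None when the message does not mention a UUID.
    """
    err = json.loads(body)
    match = re.search(r"UUID ([0-9a-f-]{36})", err.get("message", ""))
    table_uuid = match.group(1) if match else None
    return err.get("code"), table_uuid


# Error body copied from the failure above.
body = """{"message": "data_dictionary::no_such_column_family (Can't find a column family with UUID 372e23e0-894a-11ee-9a14-e929b7e0ec6c)", "code": 500}"""
print(describe_api_error(body))
```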

@tgrabiec please look into this.
I'm not sure if it should block the release or not.
It was tested with scylladb/scylla-dtest@49c70ebdd40bea2a873bd708a4a35d4049cc8708 that includes scylladb/scylla-dtest@02a072e681a2724d3ff44e9ee59f95173dc8b019

@mykaul
Contributor

mykaul commented Dec 21, 2023

@bhalevy , @tgrabiec - did we get to a conclusion here? Should we re-open this, or is there a new issue to track for this?

@bhalevy
Member Author

bhalevy commented Dec 22, 2023

> @bhalevy , @tgrabiec - did we get to a conclusion here? Should we re-open this, or is there a new issue to track for this?

There's a missing backport.
I pinged #14710 (comment)
Let's start with that.

@tgrabiec
Contributor

Lack of #14710 could explain it. relocal schema invokes schema merge path now, so if we corrupted schema so that the table is on disk but not in database object, relocal schema will fail the same way "create table" would fail.
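
The failure mode described here can be illustrated with a toy model. This is only a sketch of the stated condition (a table recorded on disk but missing from the in-memory database object), not Scylla's actual merge code; `ToyNode`, `merge_schema`, and `NoSuchColumnFamily` are all hypothetical names:

```python
class NoSuchColumnFamily(Exception):
    """Stand-in for data_dictionary::no_such_column_family."""


class ToyNode:
    def __init__(self):
        self.on_disk = set()    # table ids recorded in the persisted schema
        self.in_memory = set()  # table ids known to the in-memory database object

    def merge_schema(self, incoming):
        """Merge incoming table ids, treating on-disk tables as already existing."""
        for table_id in incoming:
            if table_id in self.on_disk:
                # The merge believes the table exists and looks it up in memory.
                if table_id not in self.in_memory:
                    raise NoSuchColumnFamily(
                        f"Can't find a column family with UUID {table_id}")
            else:
                # A genuinely new table is recorded in both places.
                self.on_disk.add(table_id)
                self.in_memory.add(table_id)


node = ToyNode()
node.on_disk.add("t1")  # corrupted state: on disk but not in memory
try:
    node.merge_schema(["t1"])
except NoSuchColumnFamily as exc:
    print(exc)
```

In this model any operation that goes through `merge_schema` hits the same exception, which mirrors the point that relocal schema and "create table" would fail the same way.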

@bhalevy
Member Author

bhalevy commented Dec 25, 2023

> Lack of #14710 could explain it. relocal schema invokes schema merge path now, so if we corrupted schema so that the table is on disk but not in database object, relocal schema will fail the same way "create table" would fail.

Yes. The test started failing right after bfd8401 was backported.
