TRUNCATE TABLE XXX is not working on v2.2 #3694

Closed
eranhazout opened this Issue Aug 15, 2018 · 10 comments


eranhazout commented Aug 15, 2018

  1. Create a simple table:

         CREATE TABLE gurushots.test5 (
             member_id ascii,
             challenge_id int,
             views counter,
             votes counter,
             PRIMARY KEY (member_id, challenge_id)
         ) WITH CLUSTERING ORDER BY (challenge_id DESC);

  2. Perform some random inserts (counter updates in my case) of 200 rows.
  3. Run TRUNCATE TABLE on it: the command does not work and the data is still there (see the sketch after this list).
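
A minimal shell sketch of these steps against one node (the schema is copied from step 1; the row values and the single counter update are illustrative, the report used ~200 rows):

    cqlsh -e "CREATE TABLE gurushots.test5 (
                member_id ascii, challenge_id int,
                views counter, votes counter,
                PRIMARY KEY (member_id, challenge_id)
              ) WITH CLUSTERING ORDER BY (challenge_id DESC);"
    # Counter columns cannot be INSERTed; they are written with UPDATE.
    cqlsh -e "UPDATE gurushots.test5 SET views = views + 1, votes = votes + 1
              WHERE member_id = 'm1' AND challenge_id = 1;"
    cqlsh -e "TRUNCATE gurushots.test5;"
    # Expected: no rows. With this bug, the old rows are still returned.
    cqlsh -e "SELECT * FROM gurushots.test5;"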

Additional info:
I run the TRUNCATE TABLE command from cqlsh.
When I run it from a client driver, it succeeds only after 4-6 attempts, after which the table is truncated.
The keyspace uses SimpleStrategy with RF=3.
Experimental mode is enabled.
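
For reference, a keyspace matching that description would be created roughly like this (the keyspace definition was not posted; the name is inferred from the table above):

    cqlsh -e "CREATE KEYSPACE IF NOT EXISTS gurushots
              WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"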

Scylla version: upgraded cluster from 2.1 to 2.2
Cluster size: 3 nodes
OS: CentOS

Hardware details:
Hardware: 16 cores and 120 GB of memory per node
Disks: 1 SSD per node

tgrabiec added the bug label Aug 16, 2018

slivne (Contributor) commented Sep 6, 2018

@eranhazout - sorry for the late response.

I tried to reproduce this locally on a 3-node cluster and failed. Can you please share the journalctl logs?
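
For anyone collecting these, a sketch of capturing the journal on a node (scylla-server is the default service unit name; the date is illustrative):

    journalctl -u scylla-server --since "2018-08-15" > scylla-node.log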

eranhazout commented Sep 6, 2018

Sorry, already killed this cluster.

dorlaor (Contributor) commented Sep 7, 2018

skpotnuru commented Sep 12, 2018

Thank you @dorlaor for referencing me here. I am the one who reported this issue on Stack Overflow.
I have a three-node cluster, and I truncated a table using the command:

truncate table students;

The command executes fine, meaning I get no error, but if I query the table the data comes back. I think this is ghost replication in my case. I even tried all consistency levels, but the data still remains after the truncation. I found a suggested workaround of running a compaction after the truncation, and performed it with nodetool on all three nodes, but the data still remains.
I also observed that the truncate sometimes works fine: in a single shot I am able to truncate the table. But at other times the data remains even though I execute the TRUNCATE command multiple times.
Can someone please help me with this issue?
Thank you.
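
What was tried above corresponds roughly to this sequence (the keyspace name myks and the node hostnames are placeholders; CONSISTENCY is a cqlsh shell command):

    # Force the strongest consistency level for the truncate.
    cqlsh -e "CONSISTENCY ALL; TRUNCATE myks.students;"
    # Then compact on every node, as described above.
    for node in node1 node2 node3; do
        ssh "$node" nodetool compact myks students
    done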

skpotnuru commented Sep 12, 2018

Today I found something: I truncated the table and compacted it with nodetool on all three nodes. The data was removed on two nodes but remained on one. After some time, the data was replicated back onto the two nodes that had been empty after the compaction.
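
One way to see which replica still holds data after such a compaction, assuming SSH access to the nodes (hostnames and keyspace are placeholders; the exact stat labels vary between versions):

    for node in node1 node2 node3; do
        echo "== $node =="
        ssh "$node" nodetool cfstats myks.students | grep -i estimate
    done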

slivne (Contributor) commented Sep 12, 2018

@skpotnuru can you please upload the log files from the 3 nodes? If the logs are large, you can follow https://docs.scylladb.com/operating-scylla/troubleshooting/report_scylla_problem/
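
Collecting the journals from all three nodes could look like this (hostnames are placeholders; the linked page describes the full reporting procedure):

    for node in node1 node2 node3; do
        ssh "$node" journalctl -u scylla-server > "scylla-$node.log"
    done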

tomazbracic commented Sep 30, 2018

I have the same issue on a production 3-node cluster running ScyllaDB 2.2.0: TRUNCATE doesn't work. I then tested on a small Docker container running ScyllaDB 2.3.0, and there TRUNCATE works.

tgrabiec (Contributor) commented Oct 2, 2018

I analyzed the data provided by @tomazbracic, and the problem seems to be that one of the replicas does not process the truncate request, so it still has the data. The request doesn't seem to be received at all; it's not that it's blocked somewhere.

By analyzing the core dumps I noticed that none of the nodes sent any requests to themselves, which implies that the truncate coordinator didn't consider itself an eligible target for truncate. gossiper::get_live_members() excludes the local node when it is in the shutdown state. And in fact, on one of the nodes the non-zero shards have STATUS set to shutdown for the current node:

shard 0:
  gms::application_state::STATUS: {version=19, value="NORMAL,9133786472009474734"}
shard 1:
  gms::application_state::STATUS: {version=68230, value="shutdown,true"}
shard 2:
  gms::application_state::STATUS: {version=68230, value="shutdown,true"}
shard 3:
  gms::application_state::STATUS: {version=68230, value="shutdown,true"}

The local node is not actually shutting down.

After restarting the node, truncate succeeds.

I am still figuring out how we could end up in such a state.
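
A rough way to check the advertised gossip state and apply the restart workaround described above (the hostname is a placeholder; note that nodetool gossipinfo may not expose the per-shard divergence shown above, which was found in core dumps):

    # Inspect the gossip STATUS each endpoint advertises.
    nodetool gossipinfo | grep -E 'STATUS|^/'
    # Workaround observed above: restart the affected node, then retry.
    ssh node1 'sudo systemctl restart scylla-server'
    cqlsh -e "TRUNCATE gurushots.test5;"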

tgrabiec (Contributor) commented Oct 2, 2018

I suspect that this is caused by #3798

tgrabiec (Contributor) commented Oct 3, 2018

This is caused by effects described in #3798 (comment)

slivne added this to the 3.0 milestone Oct 4, 2018

duarten added a commit that referenced this issue Oct 8, 2018

Merge 'Fix issues with endpoint state replication to other shards' from Tomasz

Fixes #3798
Fixes #3694

Tests:

  unit(release), dtest([new] cql_tests.py:TruncateTester.truncate_after_restart_test)

* tag 'fix-gossip-shard-replication-v1' of github.com:tgrabiec/scylla:
  gms/gossiper: Replicate endpoint states in add_saved_endpoint()
  gms/gossiper: Make reset_endpoint_state_map() have effect on all shards
  gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards
  gms/gossiper: Always override states from older generations

avikivity added a commit that referenced this issue Oct 9, 2018

Merge 'Fix issues with endpoint state replication to other shards' from Tomasz

Fixes #3798
Fixes #3694

Tests:

  unit(release), dtest([new] cql_tests.py:TruncateTester.truncate_after_restart_test)

* tag 'fix-gossip-shard-replication-v1' of github.com:tgrabiec/scylla:
  gms/gossiper: Replicate endpoint states in add_saved_endpoint()
  gms/gossiper: Make reset_endpoint_state_map() have effect on all shards
  gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards
  gms/gossiper: Always override states from older generations

(cherry picked from commit 48ebe65)

tgrabiec added a commit that referenced this issue Oct 17, 2018

Merge 'Fix issues with endpoint state replication to other shards' from Tomasz

Fixes #3798
Fixes #3694

Tests:

  unit(release), dtest([new] cql_tests.py:TruncateTester.truncate_after_restart_test)

* tag 'fix-gossip-shard-replication-v1' of github.com:tgrabiec/scylla:
  gms/gossiper: Replicate endpoint states in add_saved_endpoint()
  gms/gossiper: Make reset_endpoint_state_map() have effect on all shards
  gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards
  gms/gossiper: Always override states from older generations

(cherry picked from commit 48ebe65)