Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replica/database: stop compaction before closing system tables during shutdown #17218

Conversation

lkshminarayanan
Copy link
Member

During shutdown, as all system tables are closed in parallel, there is a possibility of a race condition between compaction stoppage and the closure of the compaction_history table. So, stop all the compaction tasks before attempting to close the tables.

Fixes #15721

@lkshminarayanan
Copy link
Member Author

The database::close_tables methods are already protected by a _stop_barrier.arrive_and_wait call : database.cc:2231.

So, the only possible race condition is the one that happens when all the system tables are closed in parallel - there is a chance that compaction_history table closes early when other system tables are still compacting. So, to prevent that this patch just stops all the compaction of the system tables before starting to closing them in parallel. This is now protected by a barrier as well to prevent another shard from updating compaction_history after it has been closed. I tried to reproduce this race locally using seastar::sleep()s in various fibers but it never occurred, so I was not able to test the patch.

replica/database.cc Outdated Show resolved Hide resolved
@lkshminarayanan lkshminarayanan force-pushed the fix_test_can_limit_compaction_throughput branch from 445ed1c to d2bd963 Compare February 8, 2024 05:54
@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Container Test
✅ - dtest
❌ - Unit Tests

Failed Tests (2/24453):

Build Details:

  • Duration: 2 hr 38 min
  • Builder: i-0d6e5e41bc6d8aa99 (m6id.8xlarge)

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Container Test
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 3 hr 55 min
  • Builder: spider5.cloudius-systems.com

replica/database.cc Outdated Show resolved Hide resolved
@lkshminarayanan lkshminarayanan force-pushed the fix_test_can_limit_compaction_throughput branch from d2bd963 to b76aa4e Compare February 12, 2024 15:08
@lkshminarayanan lkshminarayanan force-pushed the fix_test_can_limit_compaction_throughput branch from b76aa4e to eaa9d0b Compare February 12, 2024 15:28
raphaelsc

This comment was marked as outdated.

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Container Test
✅ - dtest
❌ - Unit Tests

Failed Tests (2/24588):

Build Details:

  • Duration: 2 hr 0 min
  • Builder: spider3.cloudius-systems.com

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Container Test
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 2 hr 24 min
  • Builder: spider5.cloudius-systems.com

@denesb
Copy link
Contributor

denesb commented Feb 13, 2024

…ing shutdown

During shutdown, as all system tables are closed in parallel, there is a
possibility of a race condition between compaction stoppage and the
closure of the compaction_history table. So, quiesce all the compaction
tasks before attempting to close the tables.

Fixes scylladb#15721

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
@lkshminarayanan lkshminarayanan force-pushed the fix_test_can_limit_compaction_throughput branch from eaa9d0b to 2903d26 Compare February 14, 2024 13:27
Copy link
Member

@raphaelsc raphaelsc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! thanks.

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Container Test
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 2 hr 45 min
  • Builder: i-04ce631b2fee2137a (m5d.12xlarge)

@lkshminarayanan lkshminarayanan deleted the fix_test_can_limit_compaction_throughput branch February 15, 2024 13:06
denesb pushed a commit that referenced this pull request Feb 19, 2024
…ing shutdown

During shutdown, as all system tables are closed in parallel, there is a
possibility of a race condition between compaction stoppage and the
closure of the compaction_history table. So, quiesce all the compaction
tasks before attempting to close the tables.

Fixes #15721

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes #17218

(cherry picked from commit 3b7b315)
dgarcia360 pushed a commit to dgarcia360/scylla that referenced this pull request Apr 30, 2024
…ing shutdown

During shutdown, as all system tables are closed in parallel, there is a
possibility of a race condition between compaction stoppage and the
closure of the compaction_history table. So, quiesce all the compaction
tasks before attempting to close the tables.

Fixes scylladb#15721

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb#17218
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants