Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal_keyspace extention: enhance the semantics also to flushes (dirty_memory_manager) #14529

Closed
eliransin opened this issue Jul 5, 2023 · 10 comments
Assignees
Labels
Milestone

Comments

@eliransin
Copy link
Contributor

Commit 7c8c020 introduced a new type of a keyspace, an internal keyspace:

It is said in the commit message:

To be populated early by extensions. Such a keyspace should be
1.) Started before user keyspaces
2.) Flushed/closed after user keyspaces
3.) For all other regards be considered "user".

It should be extended also to:
4) should not compete with user keyspace for flushes

This is to prevent deadlocks,
The explanation, this extension was created in order to allow for "soft system_tables", which means that their schema as well
as the data is not local. The semantics didn't concern at all if flushes should compete with user tables of not.
pure system tables can be assumed not to compete with user table flushes thus if a specific user table flush would like to
force a system table flush as a condition to it's own flush it can do so without the fear of a deadlock that will be resulted if
the flush of the system table is serialized after the user table flush.
We actually do this in enterprise in tables that are encrypted using replicated_key_provider and we indeed deadlock since
internal keyspaces does compete with user table flushes.

Since they are considered to be a "high priority keyspaces" anyway we should extend the semantics as aforementioned above.
I am also labeling it as a bug since this missing additional semantic causes us a real deadlock in replicated_key_provide.

@eliransin eliransin self-assigned this Jul 5, 2023
@eliransin
Copy link
Contributor Author

/cc @elcallio

eliransin pushed a commit to eliransin/scylla that referenced this issue Jul 6, 2023
commit 7c8c020 introduced a new type of a keyspace, an internal keyspace
It defined the semantics for this internal keyspace, this keyspace is
somewhat a hybrid between system and user keyspace.

Here we extend the semantics to include also flushes, meaning that
flushes will be done using the system dirty_mamory_manager. This is
in order to allow inter dependencies between internal tables and user
tables and prevent deadlocks.

One example of such a deadlock is our `replicated_key_provider`
encryption on the enterprise version. The deadlock occur because in some
circumstances, an encrypted user table flush is dependant upon the
`encrypted_keys` table being flushed but since the requests are
serialized, we get a deadlock.

Tests: unit tests dev + debug
The deadlock dtest reproducer:
encryption_at_rest_test.py::TestEncryptionAtRest::test_reboot

Fixes scylladb#14529

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
@DoronArazii DoronArazii added this to the 5.4 milestone Jul 9, 2023
raphaelsc pushed a commit to raphaelsc/scylla that referenced this issue Aug 22, 2023
commit 7c8c020 introduced a new type of a keyspace, an internal keyspace
It defined the semantics for this internal keyspace, this keyspace is
somewhat a hybrid between system and user keyspace.

Here we extend the semantics to include also flushes, meaning that
flushes will be done using the system dirty_mamory_manager. This is
in order to allow inter dependencies between internal tables and user
tables and prevent deadlocks.

One example of such a deadlock is our `replicated_key_provider`
encryption on the enterprise version. The deadlock occur because in some
circumstances, an encrypted user table flush is dependant upon the
`encrypted_keys` table being flushed but since the requests are
serialized, we get a deadlock.

Tests: unit tests dev + debug
The deadlock dtest reproducer:
encryption_at_rest_test.py::TestEncryptionAtRest::test_reboot

Fixes scylladb#14529

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes scylladb#14547
raphaelsc pushed a commit to raphaelsc/scylla that referenced this issue Aug 29, 2023
commit 7c8c020 introduced a new type of a keyspace, an internal keyspace
It defined the semantics for this internal keyspace, this keyspace is
somewhat a hybrid between system and user keyspace.

Here we extend the semantics to include also flushes, meaning that
flushes will be done using the system dirty_mamory_manager. This is
in order to allow inter dependencies between internal tables and user
tables and prevent deadlocks.

One example of such a deadlock is our `replicated_key_provider`
encryption on the enterprise version. The deadlock occur because in some
circumstances, an encrypted user table flush is dependant upon the
`encrypted_keys` table being flushed but since the requests are
serialized, we get a deadlock.

Tests: unit tests dev + debug
The deadlock dtest reproducer:
encryption_at_rest_test.py::TestEncryptionAtRest::test_reboot

Fixes scylladb#14529

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes scylladb#14547
@denesb
Copy link
Contributor

denesb commented Sep 29, 2023

Backport to 2023.1 queued as 0954c5d18d998b90213589d32b35f54f5e81a2a9.
@elcallio do any other versions need this?

@denesb
Copy link
Contributor

denesb commented Sep 29, 2023

Backport to 2023.1 queued as 0954c5d18d998b90213589d32b35f54f5e81a2a9. @elcallio do any other versions need this?

Dequeued, breaks the build.

@mykaul mykaul reopened this Sep 29, 2023
@mykaul
Copy link
Contributor

mykaul commented Oct 1, 2023

Backport to 2023.1 queued as 0954c5d18d998b90213589d32b35f54f5e81a2a9. @elcallio do any other versions need this?

Dequeued, breaks the build.

@eliransin - who's looking at this?

@mykaul mykaul added the P1 Urgent label Oct 1, 2023
@eliransin
Copy link
Contributor Author

Backport to 2023.1 queued as 0954c5d18d998b90213589d32b35f54f5e81a2a9. @elcallio do any other versions need this?

Dequeued, breaks the build.

@eliransin - who's looking at this?

I will look at this, missed the fact it breaks 2023.1

@elcallio
Copy link
Contributor

Where is the broken build?

@eliransin
Copy link
Contributor Author

This is in 2023.1 (I am not sure about the dequing...)
scylladb/scylla-enterprise@308d41be02c405abac608e113fc7f9af8417e362

Also - we have https://github.com/scylladb/scylla-enterprise/issues/3334 to manage 2023.1 backport of this patch.
I think this issue can be closed.

@mykaul mykaul closed this as completed Oct 18, 2023
@avikivity
Copy link
Member

Anything related to enterprise should be discussed on its own tracker.

@avikivity
Copy link
Member

Should this be backported, if so, why?

Issues like "X should be Y" are unhelpful, it's impossible to say if they are fixes or enhancements.

@elcallio
Copy link
Contributor

elcallio commented Nov 8, 2023

Not further than to 2023, in which it is already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants