-
Notifications
You must be signed in to change notification settings - Fork 1.3k
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Untyped result sets may cause segfaults when parsing disengaged optionals #6915
Comments
In untyped_result_set::get_view, there exists a silent assumption that the underlying data, which is an optional, to always be engaged. In case the value happens to be disengaged it may lead to creating an incorrect bytes view from a disengaged optional. In order to make the code safer (since values parsed by this code often come from the network and can contain virtually anything) a segfault is replaced with an exception, by calling optional's value() function, which throws when called on disengaged optionals. Fixes scylladb#6915 Message-Id: <090d3f323f693eaa833cad551a044a934b06678d.1595602089.git.sarna@scylladb.com>
@psarna should this be backported? I don't think we have a trigger. |
We don't have a trigger, but corrupt system tables would suffice. I think I'll backport, since it's a bug (segfault instead of graciously failing with a recoverable exception) and the backporting cost is cheap. |
Neither 4.1 nor 4.2 backports cleanly, because CDC relied on dereferencing disengaged optionals somewhere, and it shows in |
Do you know which optional was the problem in CDC, @psarna ? |
Not precisely, this is the output of the failed test: -- generates 3 rows: preimage(static), delta(static), delta(clustering)
update ks.t set vs = 2, vc = 2 where pk = 0 and ck = 0;
{
- "status" : "ok"
+ "message" : "std::bad_optional_access (bad optional access)",
+ "status" : "error"
}
-- generates 4 rows: preimage(static), preimage(clustering), delta(static), delta(clustering)
update ks.t set vs = 3, vc = 3 where pk = 0 and ck = 0;
{
- "status" : "ok"
+ "message" : "std::bad_optional_access (bad optional access)",
+ "status" : "error"
}
select "cdc$batch_seq_no", "cdc$operation", ck, vs, vc from ks.t_scylla_cdc_log where pk = 0 and ck = 0 allow filtering;
{
- "rows" :
- [
- {
- "cdc$batch_seq_no" : "2",
- "cdc$operation" : "1",
- "ck" : "0",
- "pk" : "0",
- "vc" : "2"
- },
- {
- "cdc$batch_seq_no" : "2",
- "cdc$operation" : "0",
- "ck" : "0",
- "pk" : "0",
- "vc" : "2"
- },
- {
- "cdc$batch_seq_no" : "3",
- "cdc$operation" : "1",
- "ck" : "0",
- "pk" : "0",
- "vc" : "3"
- }
- ]
+ "rows" : null
} |
Does the test fail every time or are there some special conditions? |
every single time on 4.1/4.2, never on master |
Could you try removing "vs = 2" and "vs = 3" from updates and see if this fixes he exception? |
Well, it definitely fixes |
Cool. That narrows down the search to the code related to static row. If we want to find the patch that fixed the problem. |
Do we have to do anything? It seems that the patch is fine on master is backporting necessary? |
@haaawk, according to this #6915 (comment) I was under the impression |
You're right, @eliransin - we don't know which CDC patch affects the behavior. I was referring to the fact that it's not clear we want to backport this fix at all. @avikivity wrote:
and then @psarna wrote:
This left me with the impression there's no need for backporting and spending time on looking for this CDC patch wouldn't be the best use of our time when we can do something more useful instead. |
Thanks @haaawk for clarification. I will update if we have a trigger. |
Looks like there was no trigger, so removing the backport candidate label. |
In
untyped_result_set.hh
, functionget_view
silently assumes that the optional value kept in its internal map is always engaged. It's not always the case, since the optional values come from the tables (they are often a result of theSELECT
statement), and thus they shouldn't assume that the values are always present.For instance,
system_keyspace::load_view_build_progress
can fail when a particular column -"first_token"
- is not present in thesystem.view_builds_in_progress
table - which is entirely possible, since we cannot assume any particular contents inside a CQL table.Current code creates an incorrect
bytes_view
object when it's given a disengaged optional, and it may lead to unexpected segfaults later. It's better to check if the optional value is engaged, and throw if it's not.The text was updated successfully, but these errors were encountered: