[dtest-raft] test_single_column_blob_max_size_with_cdc_preimage_full_postimage fails with [lsa - Aborting due to allocation failure] #15278
Comments
@bhalevy - does that belong to your team?
ping @bhalevy
I don't know. We need to understand the root cause first.
It's an OOM during a table scan query; the table has some columns with large blobs (the test is test_single_column_blob_max_size_with_cdc_preimage_full_postimage, after all). It doesn't seem related to Raft at all. The table is a CDC log table, but I'm not sure that matters: the table just contains large data blobs, which could happen with regular tables too, and we're fetching those blobs during table scans. It could be some regression in the readers (not sure if there were any changes there recently?), or perhaps something elsewhere is creating large memory pressure while the readers are running. @fruch, do we know when this first started appearing?
Not sure I understand -- so it might have happened without Raft, but we don't know?
create_table_stmt = "CREATE TABLE ks.cf (pk bigint, ck bigint, v blob, PRIMARY KEY (pk, ck)) \
WITH cdc={'enabled': true, 'preimage': 'full', 'postimage': true}"
session.execute(create_table_stmt)
insert_value = bytes("1".encode()) * 7 * MB
if prepare_statements:
insert_statement = session.prepare("INSERT INTO ks.cf (pk, ck, v) VALUES (?, ?, ?)")
else:
insert_statement = SimpleStatement("INSERT INTO ks.cf (pk, ck, v) VALUES (%(pk)s, %(ck)s, %(v)s)")
insert_parameters = [{"pk": i, "ck": j, "v": insert_value} for i in range(4) for j in range(3)]
update_value = bytes("2".encode()) * 4 * MB
if prepare_statements:
update_statement = session.prepare("UPDATE ks.cf set v = ? where pk = ? and ck = ?")
else:
update_statement = SimpleStatement("UPDATE ks.cf set v = %(v)s where pk = %(pk)s and ck=%(ck)s")
update_parameters = [{"pk": i, "ck": j, "v": update_value} for i in range(4) for j in range(3)]
>       self.execute_case(node, session,
                          insert_data={
                              "insert_statement": insert_statement,
                              "insert_parameters": insert_parameters
                          },
                          update_data={
                              "update_statement": update_statement,
                              "update_parameters": update_parameters
                          },
                          check_stalls=prepare_statements)
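For a rough sense of scale, here is a back-of-the-envelope sketch (my arithmetic and assumptions, not from the issue) of how much blob data the test pushes through CDC, assuming preimage='full' records the full pre-update row and the postimage records the full post-update row:

MB = 1024 * 1024  # the test's MB constant, assumed to be 2**20

# Base table: 4 partitions x 3 clustering rows, each inserted with a 7 MB blob.
base_rows = 4 * 3
base_bytes = base_rows * 7 * MB                      # ~84 MB of base-table blobs

# Rough per-update CDC log footprint: 7 MB preimage + 4 MB delta + 4 MB postimage.
cdc_bytes_per_update = 7 * MB + 4 * MB + 4 * MB      # ~15 MB
cdc_update_bytes = base_rows * cdc_bytes_per_update  # ~180 MB across 12 updates

print(f"base: ~{base_bytes // MB} MB, CDC updates: ~{cdc_update_bytes // MB} MB")

Under these assumptions, even a LIMIT 10 scan over the CDC log can pull tens of megabytes of cells at once, which is in the same ballpark as the 191348736-byte reclaim failure quoted later in this thread.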
The failing scan query is: …
Yes. Since the older jobs were evicted by Jenkins, we don't have any logs.
Or with semaphores -- which should be preventing this large memory consumption, IIUC.
We're also fetching rows from the base table during the test, but those are single-partition and even single-row queries, compared to the CDC log table queries, which are scans:
select_statement = SimpleStatement("SELECT * FROM ks.cf WHERE pk = %(pk)s and ck = %(ck)s")
select_parameters = [{"pk": i, "ck": j} for i in range(4) for j in range(3)]
cdc_select_statement = SimpleStatement("SELECT * FROM ks.cf_scylla_cdc_log LIMIT 10")
It looks like the semaphore's kill limit was triggered. This was part of the OOM protection PR, which is not new. The LSA alloc failure is unrelated; the semaphore cannot poison LSA allocations.
I couldn't make out the likely cause from the logs.
Do we have a log when we cross the soft limit? We can search the cloud to see how common that happens. |
No, reads are just throttled when the soft limit is reached (but, again, this requires cooperation from the reads, and in practice it only applies to reads that go to disk).
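For illustration, here is a toy Python model of the two-limit scheme described above (hypothetical names and behavior; Scylla's actual reader concurrency semaphore is C++ and more involved): below the soft limit reads are admitted freely, above it cooperating disk reads are throttled, and an allocation that would cross the kill limit fails outright:

class ToyReaderSemaphore:
    """Toy model of a reader memory semaphore with soft and kill limits."""

    def __init__(self, soft_limit: int, kill_limit: int):
        assert soft_limit < kill_limit
        self.soft_limit = soft_limit
        self.kill_limit = kill_limit
        self.consumed = 0

    def may_admit_disk_read(self) -> bool:
        # Throttling past the soft limit only works for reads that check
        # in with the semaphore, i.e. reads going to disk.
        return self.consumed < self.soft_limit

    def consume(self, nbytes: int) -> None:
        # Crossing the kill limit fails the allocation; if this happens
        # inside an LSA-managed allocation, it would surface as the LSA
        # allocation failure seen in this issue ("poisoning").
        if self.consumed + nbytes > self.kill_limit:
            raise MemoryError("kill limit exceeded")
        self.consumed += nbytes

    def release(self, nbytes: int) -> None:
        self.consumed -= nbytes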
Another occurrence: #15371.
@denesb - who should look at the coredump?
I will have a look.
AssertionError: ERROR 2023-09-22 18:28:25,531 [shard 0:stat] lsa - Aborting due to allocation failure: failed to reclaim 191348736 bytes of memory, while attempting to ensure an std reserve of 536870912
A reserve of 536M seems excessive. There could be a very large allocation somewhere.
We want a reserve larger than all of memory. |
Attempting to increase the std reserve means that we have an allocation larger than a segment (128K). Those are passed through to the std allocator.
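A sketch of that pass-through rule as I read it (the 128K segment size is from the comment above; the code is illustrative, not the LSA implementation):

SEGMENT_SIZE = 128 * 1024  # LSA segment size

def route_allocation(size: int) -> str:
    """Toy routing: small allocations fit in LSA segments; larger ones
    bypass LSA and go to the std allocator, hence the std reserve."""
    if size <= SEGMENT_SIZE:
        return "lsa-segment"
    return "std-allocator"

assert route_allocation(64 * 1024) == "lsa-segment"
assert route_allocation(7 * 1024 * 1024) == "std-allocator"  # e.g. a 7 MB blob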
The test uses 1MB blobs, although the data type is a …
The failing read is a data query (single partition read) into …
I was wrong. The OOM kill can poison LSA allocations. The site of the LSA crash is cache_flat_mutation_reader.hh, lines 376-400 (at commit 2cc37eb).
In this scope we call … Looking at the semaphore: …
The current consumption is just 4M from the kill limit, which can easily be tripped with the large blobs this test uses.
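To make that concrete (my arithmetic, based on the numbers above): with 4 MB of headroom left below the kill limit, a single 7 MB blob allocation is enough to trip it:

MB = 1024 * 1024
headroom_to_kill_limit = 4 * MB  # consumption is 4M short of the kill limit
blob_size = 7 * MB               # blob size the test inserts
assert blob_size > headroom_to_kill_limit  # one blob crosses the kill limit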
While working on a unit-test reproducer for this, I stepped on yet another bug: #15578. |
@denesb - can you confirm if that's the same issue? From https://jenkins.scylladb.com/job/scylla-5.4/job/longevity/job/longevity-lwt-3h-test/2/consoleFull#-1083344827fcc21424-66d2-4bd8-8e0d-9746405e5b16 :
Just before that we see: …
No, this is #15485.
The OOM killer was added in 5.4, so no backport is required.
test_single_column_blob_max_size_with_cdc_preimage_full_postimage fails from time to time, with the following error:
It's with Raft enabled (the other cases can't be verified, since they happened more than 2 weeks ago).
decoded backtrace:
https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/57/testReport/junit/test_cdc_large_values/TestLargeColumnsWithCDC/FullDtest___full_split000___test_single_column_blob_max_size_with_cdc_preimage_full_postimage_prepared_statements__2/
Logs:
https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/57/artifact/logs-full.release.000/1693859448888_test_cdc_large_values.py%3A%3ATestLargeColumnsWithCDC%3A%3Atest_single_column_blob_max_size_with_cdc_preimage_full_postimage%5Bprepared_statements%5D/