📝 Walkthrough

This PR introduces runtime validation for checksum buffer configuration, adjusts buffer allocation sizing to prevent zero-sized allocations, and adds regression tests for issue #10809.
Actionable comments posted: 1
🧹 Nitpick comments (6)
dbms/src/Storages/DeltaMerge/File/tests/gtest_dm_file.cpp (1)
1110-1193: LGTM — solid regression test for the empty-array substream path.

The test correctly exercises the "non-null but empty Array" case: `std::vector<std::optional<Array>>(total_rows, Array{})` constructs engaged optionals holding empty arrays (not `nullopt`), so the null map is all-zero while each row's nested array has size 0 — exactly the scenario that would otherwise trigger a zero-sized allocation. The three metadata invariants that are asserted (`data_bytes == 0`, an absent empty-elements file on disk, and a `merged_sub_file_infos` entry with `size == 0`) correspond 1:1 to the behavior in `DMFileMetaV2::finalizeSmallFiles` (the 0-byte substream is copied into the merged file, recorded with `size == 0`, and the original sub-file is deleted), so the assertions should be robust across environments.

One tiny nit (optional): the `if (!block) break;` guard on lines 1168-1169 is dead code — `while (Block block = stream->read())` already terminates on a falsy block. It matches the pre-existing pattern in `WriteReadNullableVectorColumn`, so feel free to ignore.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbms/src/Storages/DeltaMerge/File/tests/gtest_dm_file.cpp` around lines 1110-1193, the loop in `TEST_F(DMFileMetaV2Test, WriteReadNullableEmptyArrayColumn)` contains a redundant guard `if (!block) break;` immediately after `while (Block block = stream->read())` — remove that dead check so the loop relies on the while-condition (the block's falsiness already terminates the loop).

tests/fullstack-test2/ddl/issue_10809_int_decimal.test (2)
27-33: Consider verifying the read also succeeds before compaction.

The sibling test `issue_10809.test` (vector case) asserts the count both before and after `compact tiflash replica` to distinguish a pre-compaction regression from a compaction-time regression. For consistency and tighter coverage, consider adding a pre-compaction `select count(*), count(v)` check for the int table (and similarly for the decimal table at lines 40-46).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/fullstack-test2/ddl/issue_10809_int_decimal.test` around lines 27-33, add a pre-compaction verification step to the int and decimal tests by running and asserting the same SELECT before the ALTER, so you can distinguish pre- vs post-compaction regressions: locate the SQL around `select count(*), count(v) from test.t_issue10809_int` and `select count(*), count(v) from test.t_issue10809_decimal`, and insert an identical SELECT and assertion immediately before the corresponding `alter table ... compact tiflash replica` statements, ensuring the test checks counts both before and after compaction.
18-20: Minor: comment phrasing is confusing.

"Negative regressions" typically means tests that verify something should NOT happen. Here the test verifies that compaction DOES continue to work for ordinary nullable scalar types (i.e., a positive/sanity regression test for types other than `Array`). Consider rephrasing to something like "Sanity regressions for `#10809` on non-Array nullable scalar types to ensure compaction still succeeds."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/fullstack-test2/ddl/issue_10809_int_decimal.test` around lines 18-20, replace the confusing header "Negative regressions for `#10809`." with a clearer phrasing that reflects the test intent, e.g. "Sanity regressions for `#10809` on non-Array nullable scalar types to ensure compaction still succeeds.", so the test purpose (verifying compaction still works for ordinary nullable scalar types) is explicit; edit the top comment where that header string appears.

tests/fullstack-test-index/vector/issue_10809.test (1)
22-43: SQL keyword casing is inconsistent with sibling tests.

This file uses uppercase (`CREATE TABLE`, `ALTER TABLE`, `INSERT INTO`, `DEFAULT NULL`, `SELECT ... FROM`) while `issue_10809_int_decimal.test` and `issue_10809_varchar.test`, added in the same PR, use lowercase. Not a functional issue — just worth aligning for readability across the `#10809` test suite.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/fullstack-test-index/vector/issue_10809.test` around lines 22-43, the SQL keywords in this test (e.g., `CREATE TABLE`, `ALTER TABLE`, `INSERT INTO`, `DEFAULT NULL`, `SELECT ... FROM`, `set tidb_isolation_read_engines`, and the helper call `wait_table test t`) should be changed to lowercase to match the sibling tests (`issue_10809_int_decimal.test` and `issue_10809_varchar.test`); update every SQL statement in this file to use lowercase keywords while keeping identifiers and spacing unchanged so the test behavior is identical.

dbms/src/IO/Checksum/tests/gtest_dm_checksum_buffer.cpp (1)
511-522: Add a `try`/`CATCH` wrapper and at least one explicit assertion.

Unlike the sibling `runCompressedSeekableReaderBufferTest` at line 387 (wrapped with `try { ... } CATCH`), this new test has no exception-to-test-failure translation, so a `DB::Exception` from `build` or `seek` will surface as an uncaught-exception crash rather than a readable gtest failure. The test also has no explicit assertions — since the whole point is to pin down the empty-file regression fixed in `ChecksumReadBufferBuilder.cpp`, an explicit `ASSERT_NO_THROW`/state check makes the contract self-documenting.

♻️ Proposed fix

```diff
 template <ChecksumAlgo D>
 void runEmptyCompressedSeekableReaderBufferTest()
+try
 {
     auto config = DM::DMChecksumConfig{{}, TIFLASH_DEFAULT_CHECKSUM_FRAME_SIZE, D};
     auto compressed_in = CompressedReadBufferFromFileBuilder::build(
         String{},
         "empty-compressed-buffer",
         config.getChecksumAlgorithm(),
         config.getChecksumFrameLength());
-    compressed_in->seek(0, 0);
+    ASSERT_NO_THROW(compressed_in->seek(0, 0));
 }
+CATCH
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbms/src/IO/Checksum/tests/gtest_dm_checksum_buffer.cpp` around lines 511-522, the test `runEmptyCompressedSeekableReaderBufferTest` lacks exception handling and assertions; wrap the body in the same `try { ... } CATCH` pattern used by `runCompressedSeekableReaderBufferTest` and convert the operations that may throw (`CompressedReadBufferFromFileBuilder::build` and `compressed_in->seek`) into a gtest assertion such as `ASSERT_NO_THROW(...)`, or explicitly assert post-conditions (e.g. that `compressed_in` is non-null and the seek succeeds), so a `DB::Exception` is translated into a readable test failure instead of an uncaught crash.

dbms/src/IO/FileProvider/ChecksumReadBufferBuilder.cpp (1)
37-40: Fix looks correct; optional: collapse to `std::max`.

The guard plus the `allocation_size >= 1` floor cleanly eliminates the zero-sized `FramedChecksumReadBuffer` construction that callers like `loadColMarkWithChecksumTo`/`loadMinMaxIndexWithChecksum` could hit on empty files. Optional readability tweak:

♻️ Optional refactor

```diff
-    RUNTIME_CHECK_MSG(checksum_frame_size > 0, "Invalid checksum frame size for {}", filename_);
-    auto allocation_size = std::min(estimated_size, checksum_frame_size);
-    if (allocation_size == 0)
-        allocation_size = 1;
+    RUNTIME_CHECK_MSG(checksum_frame_size > 0, "Invalid checksum frame size for {}", filename_);
+    auto allocation_size = std::max<size_t>(1, std::min(estimated_size, checksum_frame_size));
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbms/src/IO/FileProvider/ChecksumReadBufferBuilder.cpp` around lines 37-40, replace the two-step `allocation_size` computation with a single `std::max` so that `allocation_size` is at least 1: compute it as the max of 1 and the min of `estimated_size` and `checksum_frame_size` (i.e., replace `auto allocation_size = std::min(...); if (allocation_size == 0) allocation_size = 1;` with `auto allocation_size = std::max<size_t>(1, std::min(estimated_size, checksum_frame_size));`), so empty files still produce a non-zero allocation for the `FramedChecksumReadBuffer` construction used by callers like `loadColMarkWithChecksumTo`/`loadMinMaxIndexWithChecksum`.
📒 Files selected for processing (6)

- dbms/src/IO/Checksum/tests/gtest_dm_checksum_buffer.cpp
- dbms/src/IO/FileProvider/ChecksumReadBufferBuilder.cpp
- dbms/src/Storages/DeltaMerge/File/tests/gtest_dm_file.cpp
- tests/fullstack-test-index/vector/issue_10809.test
- tests/fullstack-test2/ddl/issue_10809_int_decimal.test
- tests/fullstack-test2/ddl/issue_10809_varchar.test
In `tests/fullstack-test-index/vector/issue_10809.test`:

```
# Cleanup.
mysql> drop table if exists test.t
```
Missing terminating semicolon.

Every other `mysql>` statement in this file (and in sibling tests) ends with `;`. The trailing `drop table if exists test.t` is missing one.

Proposed fix

```diff
-mysql> drop table if exists test.t
+mysql> drop table if exists test.t;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/fullstack-test-index/vector/issue_10809.test` at line 55, the SQL statement `drop table if exists test.t` is missing its terminating semicolon; update that statement (the line containing the literal `drop table if exists test.t`) to include a trailing `;` so it matches the style of the other `mysql>` statements and terminates properly.
What problem does this PR solve?
Issue Number: close #10809
Problem Summary:
- For a column defined as `VECTOR(N) DEFAULT NULL`, TiFlash stores the column as `Nullable(Array(Float32))`.
- When every value is `NULL`, the nested array defaults to empty arrays, so the `ArrayElements` substream can be a valid zero-byte file.
- `ChecksumReadBufferBuilder` used `allocation_size = min(data.size(), checksum_frame_size)`, which yields `allocation_size == 0` for such a file; that zero was then passed into `FramedChecksumReadBuffer` as its internal frame size.
- A later `seek(0)` triggered a divide-by-zero in `FramedChecksumReadBuffer::doSeek`.

What is changed and how it works?
- Add a runtime check that `checksum_frame_size > 0`.
- Clamp `allocation_size` to at least `1`.

Check List
Tests
- Unit test covering `Nullable(Array(Float32))` with empty arrays under DMFile meta v2
- Fullstack test for `VECTOR(N) DEFAULT NULL`
- Fullstack tests for `INT`, `DECIMAL`, and `VARCHAR`

Side effects
Documentation
Release note
Summary by CodeRabbit
Release Notes
Bug Fixes
Tests