WIP #10819

Open
JinheLin wants to merge 1 commit into pingcap:master from JinheLin:fix_empty_file

Conversation


@JinheLin JinheLin commented Apr 24, 2026

What problem does this PR solve?

Issue Number: close #10809

Problem Summary:

  • For VECTOR(N) DEFAULT NULL, TiFlash stores the column as Nullable(Array(Float32)).
  • When all rows are NULL, the nested array defaults to empty arrays, so the ArrayElements substream can be a valid zero-byte file.
  • With meta v2 small-file merge, that zero-byte substream may still be recorded as a merged subfile. During read/compact, the merged slice becomes empty, and ChecksumReadBufferBuilder used:
    • allocation_size = min(data.size(), checksum_frame_size)
  • For an empty merged slice, this produced allocation_size == 0, which was then passed into FramedChecksumReadBuffer as its internal frame size.
  • The first seek(0) later triggered a divide-by-zero in FramedChecksumReadBuffer::doSeek.

What is changed and how it works?

  • Make the read path tolerate empty merged substreams safely:
    • validate checksum_frame_size > 0
    • clamp the computed allocation_size to at least 1
  • This keeps old DMFiles with zero-size merged entries readable and avoids the compact-time crash, while preserving EOF behavior for empty inputs.

Check List

Tests

  • Unit test
    • unit test for empty checksum/compressed reader seek
    • unit test for Nullable(Array(Float32)) with empty arrays under DMFile meta v2
  • Integration test
    • fullstack regression reproducing the issue with VECTOR(N) DEFAULT NULL
    • negative fullstack regressions for INT, DECIMAL, and VARCHAR
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

Release Notes

Bug Fixes

  • Improved validation of checksum configuration for buffer allocation
  • Enhanced handling of NULL and empty values in nullable columns (arrays, vectors, decimals, strings)

Tests

  • Added regression tests for proper handling of NULL values in vectors, integers, decimals, and string columns
  • Extended test coverage for empty array columns and file operations

@ti-chi-bot ti-chi-bot Bot added the do-not-merge/needs-linked-issue and release-note-none (Denotes a PR that doesn't merit a release note.) labels Apr 24, 2026

ti-chi-bot Bot commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign flowbehappy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 24, 2026

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

This PR introduces runtime validation for checksum buffer configuration, adjusts buffer allocation sizing to prevent zero-sized allocations, and adds regression tests for issue #10809 covering NULL handling in vector, numeric, and string columns, plus tests for empty array column scenarios.

Changes

Cohort / File(s): Summary

  • Checksum Configuration Validation (dbms/src/IO/FileProvider/ChecksumReadBufferBuilder.cpp): Added runtime validation requiring checksum_frame_size to be strictly positive in both build overloads; adjusted allocation sizing to prevent zero-sized buffers.
  • Checksum Algorithm Tests (dbms/src/IO/Checksum/tests/gtest_dm_checksum_buffer.cpp): Added an EmptyCompressedSeekable GoogleTest suite for each checksum variant (None, CRC32, CRC64, City128, XXH3) exercising seek behavior on empty inputs.
  • DMFile Array Column Test (dbms/src/Storages/DeltaMerge/File/tests/gtest_dm_file.cpp): Added a test validating DMFile handling of nullable array columns with all empty arrays, verifying metadata, empty-elements substream absence, and data restoration.
  • Issue #10809 Regression Tests (tests/fullstack-test-index/vector/issue_10809.test, tests/fullstack-test2/ddl/issue_10809_int_decimal.test, tests/fullstack-test2/ddl/issue_10809_varchar.test): Added three fullstack regression tests for NULL handling in VECTOR, nullable INT/DECIMAL, and nullable VARCHAR columns, with pre- and post-compaction validation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

size/XL

Suggested reviewers

  • CalvinNeo
  • gengliqi
  • windtalker

Poem

🐰 Buffers and nulls, so empty and vast,
Validation logic holds firm and fast!
🥕 With tests in abundance for vectors so null,
The storage is steady, complete and full!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check ❓ Inconclusive: The title 'WIP' is vague and generic, providing no meaningful information about the changeset content or intent. Resolution: Replace 'WIP' with a descriptive title that summarizes the main change, such as 'Fix empty file handling in checksum buffers and DMFile' or similar based on the actual intent.
✅ Passed checks (4 passed)
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Description check ✅ Passed: The PR description is largely complete with clear problem statement, changes explanation, and comprehensive test coverage checklist. All required template sections are addressed.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
dbms/src/Storages/DeltaMerge/File/tests/gtest_dm_file.cpp (1)

1110-1193: LGTM — solid regression test for the empty-array substream path.

The test correctly exercises the "non-null but empty Array" case: std::vector<std::optional<Array>>(total_rows, Array{}) constructs engaged optionals holding empty arrays (not nullopt), so the null map is all-zero while each row's nested array has size 0 — exactly the scenario that would otherwise trigger a zero-sized allocation. The three metadata invariants that are asserted (data_bytes == 0, absent empty-elements file on disk, and a merged_sub_file_infos entry with size == 0) correspond 1:1 to the behavior in DMFileMetaV2::finalizeSmallFiles (the 0-byte substream is copied into the merged file, recorded with size == 0, and the original sub-file is deleted), so the assertions should be robust across environments.

One tiny nit (optional): the if (!block) break; guard on lines 1168–1169 is dead code — while (Block block = stream->read()) already terminates on a falsy block. Matches the pre-existing pattern in WriteReadNullableVectorColumn, so feel free to ignore.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Storages/DeltaMerge/File/tests/gtest_dm_file.cpp` around lines 1110
- 1193, The loop in TEST_F(DMFileMetaV2Test, WriteReadNullableEmptyArrayColumn)
contains a redundant guard "if (!block) break;" immediately after "while (Block
block = stream->read())" — remove that dead check so the loop relies on the
while-condition (the block falsiness already terminates the loop); update the
loop body accordingly in the test where "stream->read()" is used to populate
"block".
tests/fullstack-test2/ddl/issue_10809_int_decimal.test (2)

27-33: Consider verifying the read also succeeds before compaction.

The sibling test issue_10809.test (vector case) asserts the count both before and after compact tiflash replica to distinguish a pre-compaction regression from a compaction-time regression. For consistency and tighter coverage, consider adding a pre-compaction select count(*), count(v) check for the int table (and similarly for the decimal table at Line 40–46).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/fullstack-test2/ddl/issue_10809_int_decimal.test` around lines 27 - 33,
Add a pre-compaction verification step to the int and decimal tests by running
and asserting the same SELECT before the ALTER so you can distinguish pre- vs
post-compaction regressions: locate the SQL sequence around the calls to "select
count(*), count(v) from test.t_issue10809_int" and "select count(*), count(v)
from test.t_issue10809_decimal" and insert an identical SELECT+assertion
immediately before the corresponding "alter table ... compact tiflash replica"
statements, ensuring the test checks counts both before and after compaction.

18-20: Minor: comment phrasing is confusing.

"Negative regressions" typically means tests that verify something should NOT happen. Here the test verifies that compaction DOES continue to work for ordinary nullable scalar types (i.e., a positive/sanity regression for types other than Array). Consider rephrasing to something like "Sanity regressions for #10809 on non-Array nullable scalar types to ensure compaction still succeeds."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/fullstack-test2/ddl/issue_10809_int_decimal.test` around lines 18 - 20,
Replace the confusing header "Negative regressions for `#10809`." with a clearer
phrasing that reflects the test intent; for example, update the comment to
"Sanity regressions for `#10809` on non-Array nullable scalar types to ensure
compaction still succeeds." so the test purpose (verifying compaction still
works for ordinary nullable scalar types) is explicit; edit the top comment in
tests/fullstack-test2/ddl/issue_10809_int_decimal.test where that header string
appears.
tests/fullstack-test-index/vector/issue_10809.test (1)

22-43: SQL keyword casing is inconsistent with sibling tests.

This file uses uppercase (CREATE TABLE, ALTER TABLE, INSERT INTO, DEFAULT NULL, SELECT ... FROM) while issue_10809_int_decimal.test and issue_10809_varchar.test added in the same PR use lowercase. Not a functional issue — just worth aligning for readability across the #10809 test suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/fullstack-test-index/vector/issue_10809.test` around lines 22 - 43, The
SQL keywords in this test (e.g., CREATE TABLE, ALTER TABLE, INSERT INTO, DEFAULT
NULL, SELECT ... FROM, set tidb_isolation_read_engines, and the helper call
wait_table test t) should be changed to lowercase to match the sibling tests
(issue_10809_int_decimal.test and issue_10809_varchar.test); update every SQL
statement in this file to use lowercase keywords (create table, alter table,
insert into, default null, select ... from, set tidb_isolation_read_engines)
while keeping identifiers and spacing unchanged so the test behavior is
identical.
dbms/src/IO/Checksum/tests/gtest_dm_checksum_buffer.cpp (1)

511-522: Add try/CATCH wrapper and at least one explicit assertion.

Unlike the sibling runCompressedSeekableReaderBufferTest at Line 387 (wrapped with try { ... } CATCH), this new test has no exception-to-test-failure translation, so a DB::Exception from build or seek will surface as an uncaught-exception crash rather than a readable gtest failure. The test also has no explicit assertions — since the whole point is to pin down the empty-file regression fixed in ChecksumReadBufferBuilder.cpp, an explicit ASSERT_NO_THROW/state check makes the contract self-documenting.

♻️ Proposed fix
 template <ChecksumAlgo D>
 void runEmptyCompressedSeekableReaderBufferTest()
+try
 {
     auto config = DM::DMChecksumConfig{{}, TIFLASH_DEFAULT_CHECKSUM_FRAME_SIZE, D};
     auto compressed_in = CompressedReadBufferFromFileBuilder::build(
         String{},
         "empty-compressed-buffer",
         config.getChecksumAlgorithm(),
         config.getChecksumFrameLength());
 
-    compressed_in->seek(0, 0);
+    ASSERT_NO_THROW(compressed_in->seek(0, 0));
 }
+CATCH
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/IO/Checksum/tests/gtest_dm_checksum_buffer.cpp` around lines 511 -
522, The test runEmptyCompressedSeekableReaderBufferTest lacks exception
handling and assertions; wrap the body in the same try { ... } CATCH(...)
pattern used by runCompressedSeekableReaderBufferTest and convert the operations
that may throw (CompressedReadBufferFromFileBuilder::build and
compressed_in->seek) into a gtest assertion such as ASSERT_NO_THROW(...) or
explicitly assert post-conditions (e.g. that compressed_in is non-null and seek
succeeds) so DB::Exception is translated into a readable test failure instead of
an uncaught crash.
dbms/src/IO/FileProvider/ChecksumReadBufferBuilder.cpp (1)

37-40: Fix looks correct; optional: collapse to std::max.

The guard plus the allocation_size >= 1 floor cleanly eliminates the zero-sized FramedChecksumReadBuffer construction that callers like loadColMarkWithChecksumTo / loadMinMaxIndexWithChecksum could hit on empty files. Optional readability tweak:

♻️ Optional refactor
-    RUNTIME_CHECK_MSG(checksum_frame_size > 0, "Invalid checksum frame size for {}", filename_);
-    auto allocation_size = std::min(estimated_size, checksum_frame_size);
-    if (allocation_size == 0)
-        allocation_size = 1;
+    RUNTIME_CHECK_MSG(checksum_frame_size > 0, "Invalid checksum frame size for {}", filename_);
+    auto allocation_size = std::max<size_t>(1, std::min(estimated_size, checksum_frame_size));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/IO/FileProvider/ChecksumReadBufferBuilder.cpp` around lines 37 - 40,
Replace the two-step allocation_size computation with a single std::max to
ensure allocation_size is at least 1: compute allocation_size as the max of 1
and the min of estimated_size and checksum_frame_size (i.e., replace
allocation_size = std::min(...); if (allocation_size==0) allocation_size=1 with
allocation_size = std::max<size_t>(1, std::min(estimated_size,
checksum_frame_size))). Update the code in ChecksumReadBufferBuilder where
checksum_frame_size and allocation_size are used (affects
FramedChecksumReadBuffer construction and callers like loadColMarkWithChecksumTo
/ loadMinMaxIndexWithChecksum) so empty files still produce a non-zero
allocation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cadad68e-9c68-41a5-8679-65ae95c832b5

📥 Commits

Reviewing files that changed from the base of the PR and between 0dc254b and 0f514a2.

📒 Files selected for processing (6)
  • dbms/src/IO/Checksum/tests/gtest_dm_checksum_buffer.cpp
  • dbms/src/IO/FileProvider/ChecksumReadBufferBuilder.cpp
  • dbms/src/Storages/DeltaMerge/File/tests/gtest_dm_file.cpp
  • tests/fullstack-test-index/vector/issue_10809.test
  • tests/fullstack-test2/ddl/issue_10809_int_decimal.test
  • tests/fullstack-test2/ddl/issue_10809_varchar.test

+----+

# Cleanup.
mysql> drop table if exists test.t

⚠️ Potential issue | 🟡 Minor

Missing terminating semicolon.

Every other mysql> statement in this file (and in sibling tests) ends with ;. The trailing drop table if exists test.t is missing one.

Proposed fix
-mysql> drop table if exists test.t
+mysql> drop table if exists test.t;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/fullstack-test-index/vector/issue_10809.test` at line 55, The SQL
statement "drop table if exists test.t" in the test file is missing its
terminating semicolon; update that statement (the line containing the literal
drop table if exists test.t) to include a trailing ";" so it matches the style
of the other mysql> statements and terminates properly.

@JinheLin JinheLin changed the title from "WIP" to "Storages: fix divide-by-zero when reading empty merged substreams in DMFile" Apr 24, 2026
@JinheLin JinheLin changed the title from "Storages: fix divide-by-zero when reading empty merged substreams in DMFile" back to "WIP" Apr 24, 2026

Labels

  • do-not-merge/needs-triage-completed
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TiFlash crashed with error Integer divide by zero

1 participant