Next-gen columnar: Enum clustered PK returns wrong ordinals on next-gen columnar read path #10851

@JaySon-Huang

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Prerequisites:

  • TiDB cluster with next-gen / disaggregated storage: TiKV + tikv-worker + TiFlash compute, with columnar enabled (ENABLE_NEXT_GEN_COLUMNAR).
  • TiFlash replica on the test table.

DROP TABLE IF EXISTS test.t_enum;

CREATE TABLE test.t_enum (
    pk ENUM('tidb', 'pd', 'tikv', 'tiflash') PRIMARY KEY CLUSTERED
);

INSERT INTO test.t_enum VALUES ('tidb'), ('tiflash');

ALTER TABLE test.t_enum SET TIFLASH REPLICA 1;

-- Wait until information_schema.tiflash_replica.AVAILABLE = 1 for test.t_enum

SET tidb_isolation_read_engines = 'tiflash';

SELECT * FROM test.t_enum;

SELECT pk, pk + 0 FROM test.t_enum ORDER BY pk + 0;

-- Optional: force MPP + TiFlash scan
SET tidb_enforce_mpp = 1;
SELECT /*+ READ_FROM_STORAGE(TIFLASH[test.t_enum]) */ * FROM test.t_enum;
SELECT /*+ READ_FROM_STORAGE(TIFLASH[test.t_enum]) */ pk, pk + 0 FROM test.t_enum ORDER BY pk + 0;

Then rerun the same queries on the same table with SET tidb_isolation_read_engines = 'tikv' (or the default TiKV read path) and compare the results.

2. What did you expect to see? (Required)

Same results as TiKV / TiDB:

| Query | Expected |
|---|---|
| SELECT * | tidb, tiflash |
| SELECT pk, pk + 0 ORDER BY pk + 0 | tidb / 1, tiflash / 4 |

MPP / columnar path should return 2 rows with correct enum ordinals (1 and 4).

3. What did you see instead (Required)

On TiFlash (columnar / disaggregated read path), SELECT * returns 2 rows, so no row is lost, but the second row is wrong: its enum ordinal is decoded as 0, which displays as an empty string instead of tiflash.

Example (SELECT *):

+------+
| pk   |
+------+
| tidb |
|      |   -- empty enum (internal ordinal 0); expected: tiflash
+------+
2 rows in set

| Query | Actual (buggy) |
|---|---|
| SELECT * | 2 rows: tidb, then empty pk (length(pk) = 0, pk + 0 = 0); tiflash not shown |
| SELECT pk, pk + 0 ORDER BY pk + 0 | 2 rows: e.g. empty pk / 0.0, tidb / 1.0; tiflash / 4.0 missing |
| SELECT count(*) | 2 (row count matches TiKV; values are wrong) |

Observations from debugging (not visible to end users but useful for triage):

  • Proxy RowMvccReader can read two KV rows for tidb and tiflash (enum common handle, is_int_handle=false).
  • TiFlash MPP logs show Finished reading proxy snapshots, rows=2 — row count at the proxy→TiFlash boundary is correct; the second row’s enum ordinal is mis-decoded (0 instead of 4).

Root cause (summary): In kvengine columnar encoding (components/kvengine/src/table/columnar/), enum PK values were written into the TiFlash block bulk buffer as 8 bytes per row: get_fixed_size(Enum) returned 8 and each ordinal was written with u64::to_le_bytes(). TiFlash maps TiDB TypeEnum to Enum16 (TiDBTypes.h: M(Enum, 0xf7, VarUInt, Enum16)) and bulk-decodes 2 bytes per row. Because of this width mismatch, only the first row lines up: with ordinals 1 and 4, the second 2-byte read lands on the zero padding of the first 8-byte value and decodes as 0, which breaks enum display and pk + 0.
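
The width mismatch can be reproduced outside the cluster with a few lines of Rust. This is a minimal sketch, not the kvengine or TiFlash code: `encode_as_u64` and `decode_as_u16` are hypothetical helpers that only mimic the 8-byte-per-row write and the 2-byte-per-row bulk read described in the root cause.

```rust
/// Hypothetical stand-in for the buggy writer:
/// each ordinal becomes 8 little-endian bytes.
fn encode_as_u64(ordinals: &[u64]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(ordinals.len() * 8);
    for o in ordinals {
        buf.extend_from_slice(&o.to_le_bytes());
    }
    buf
}

/// Hypothetical stand-in for an Enum16 bulk read:
/// consume 2 little-endian bytes per row for `rows` rows.
fn decode_as_u16(buf: &[u8], rows: usize) -> Vec<u16> {
    buf.chunks_exact(2)
        .take(rows)
        .map(|c| u16::from_le_bytes([c[0], c[1]]))
        .collect()
}
```

For the ordinals in the reproduction, `decode_as_u16(&encode_as_u64(&[1, 4]), 2)` yields `[1, 0]`, which is consistent with the tidb / empty-string result seen on TiFlash.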

Fix direction: Use get_fixed_size(Enum) = 2 and write enum ordinals as (ordinal as u16).to_le_bytes() in push_uint_col_value (including UINT_FLAG common-handle paths in reader.rs).
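
The 2-byte encoding could look roughly like the sketch below. It is written as a standalone function rather than as a patch to push_uint_col_value, and the name `encode_enum_ordinals` is hypothetical. The checked `u16::try_from` is an assumption layered on top of the plain `as u16` cast: it reports an out-of-range ordinal instead of silently truncating it. A MySQL-compatible ENUM holds at most 65,535 members, so valid ordinals fit in a u16.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper: write each enum ordinal as 2 little-endian bytes,
/// matching the Enum16 width that TiFlash bulk-decodes.
fn encode_enum_ordinals(ordinals: &[u64]) -> Result<Vec<u8>, TryFromIntError> {
    let mut buf = Vec::with_capacity(ordinals.len() * 2);
    for &o in ordinals {
        // Fails for values above u16::MAX instead of wrapping around.
        let narrow = u16::try_from(o)?;
        buf.extend_from_slice(&narrow.to_le_bytes());
    }
    Ok(buf)
}
```

With the ordinals from the reproduction, `encode_enum_ordinals(&[1, 4])` produces the bytes `01 00 04 00`, so a 2-byte-per-row reader sees 1 and 4.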

4. What is your TiFlash version? (Required)

  • Build: Local dev build with next-gen columnar (-DENABLE_NEXT_GEN=1 -DENABLE_NEXT_GEN_COLUMNAR=1), tiflash-proxy-columnar + cloud-storage-engine kvengine.
  • Reproduced on: Disaggregated test cluster (TiKV + tikv-worker + TiFlash compute).
  • Affected component: kvengine columnar → TiFlash RNProxyInputStream / StorageDisaggregatedColumnar (not classic DeltaMerge storage).

Note for reviewers: A local fix in columnar.rs / reader.rs was validated: SQL results match TiKV, and MPP debug log shows proxy block enum ... ordinals=[1,4] with rows=2. Consider compatibility with existing L2 columnar files written under the old 8-byte-per-row layout when rolling out the encoding change.
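
One possible shape for the compatibility handling is sketched below. It is purely illustrative: `decode_enum_column` is a hypothetical helper, and it assumes the reader knows the row count and sees a buffer that holds only this column. Whether existing L2 columnar files expose that information is not established here, and an explicit format version may well be the better mechanism.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: infer the per-row width from the buffer length.
/// Returns None if the length matches neither layout or an ordinal
/// in the legacy layout does not fit in a u16.
fn decode_enum_column(buf: &[u8], rows: usize) -> Option<Vec<u16>> {
    if buf.len() == rows.checked_mul(2)? {
        // New layout: 2 little-endian bytes per row.
        Some(
            buf.chunks_exact(2)
                .map(|c| u16::from_le_bytes([c[0], c[1]]))
                .collect(),
        )
    } else if buf.len() == rows.checked_mul(8)? {
        // Legacy layout: 8 little-endian bytes per row, narrowed with a check.
        buf.chunks_exact(8)
            .map(|c| {
                let mut raw = [0u8; 8];
                raw.copy_from_slice(c);
                u16::try_from(u64::from_le_bytes(raw)).ok()
            })
            .collect()
    } else {
        None
    }
}
```

Under these assumptions, both a 4-byte buffer and a 16-byte legacy buffer for the two rows in the reproduction decode to ordinals 1 and 4, while a buffer of any other length is rejected.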
