feat(executor): add StringVector for TEXT column support#55
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 19 minutes and 21 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis pull request implements support for variable-length string storage through a new Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant VectorBatch
participant StringVector
participant ColumnarTable
participant Storage as Storage Layer
Client->>VectorBatch: create(schema with TEXT/VARCHAR/CHAR)
VectorBatch->>VectorBatch: init_from_schema()
VectorBatch->>StringVector: new StringVector(column_type)
StringVector-->>VectorBatch: created
Client->>VectorBatch: append_tuple(values)
VectorBatch->>StringVector: append(text_value)
StringVector->>StringVector: store string + update null_bitmap
StringVector-->>VectorBatch: appended
Client->>ColumnarTable: append_batch(vector_batch)
ColumnarTable->>StringVector: iterate columns
ColumnarTable->>Storage: write null flags + length prefix + string bytes
Storage-->>ColumnarTable: persisted
Client->>ColumnarTable: read_batch(row_range)
ColumnarTable->>Storage: read null flags + length prefixes + string bytes
Storage-->>ColumnarTable: data loaded
ColumnarTable->>StringVector: populate with deserialized strings
StringVector-->>ColumnarTable: batch ready
ColumnarTable-->>Client: VectorBatch returned
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/storage/columnar_table.cpp`:
- Around line 152-173: read_batch's TEXT branch reads variable-length data
starting at d_in offset 0, so when start_row>0 the string stream isn't advanced
and strings get misaligned with the null bitmap; fix by advancing d_in to the
beginning of the requested row before reading actual_rows. In the branch
handling common::ValueType::TYPE_TEXT/TYPE_VARCHAR/TYPE_CHAR (where you have
d_in and n_in and str_vec/dynamic_cast<executor::StringVector&>), if start_row >
0 first seek d_in to 0 and loop r from 0 to start_row-1: read the 4-byte length
(uint32_t) and then seek forward by that length (use d_in.read to consume or
d_in.seekg(len, std::ios::cur)), handling EOF/errors; after skipping start_row
records proceed to the existing loop that reads actual_rows lengths+data and
appends values. Ensure you still read the null bitmap from n_in as currently
done and add error checks when skipping/reading lengths.
In `@tests/string_vector_tests.cpp`:
- Around line 175-183: The second append in TEST_F
StringVectorTests.SpecialCharacters is a no-op because the literal "emoji: 🎉
NULL: \0 embedded" is truncated at the embedded NUL; update the test to either
build the string with an explicit length so the embedded NUL is preserved (e.g.
construct a std::string with the bytes including '\0' and pass that to
Value::make_text) and then assert vec.get(1).as_text() contains the embedded
NUL, or remove the second vec.append(Value::make_text(...)) and its misleading
comment; modify the StringVector vec, vec.append, and Value::make_text usage
accordingly to reflect the chosen approach.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e55c50ef-3620-4cee-a002-5ac820eecdb0
📒 Files selected for processing (6)
CMakeLists.txtdocs/phases/PHASE_8_ANALYTICS.mdinclude/executor/types.hppsrc/storage/columnar_table.cpptests/columnar_table_tests.cpptests/string_vector_tests.cpp
Summary
StringVectorclass for variable-length string storage in columnar tablesTYPE_TEXT,TYPE_VARCHAR, andTYPE_CHARcolumns inVectorBatchColumnarTablewith length-prefixed formatStringVectorColumnarTableTest plan
string_vector_tests: 14/14 passedcolumnar_table_tests: 12/12 passed (including TEXT tests)analytics_tests: 5/5 passedSummary by CodeRabbit
New Features
Tests
Documentation