Actionable comments posted: 8
🧹 Nitpick comments (8)
include/testing/temp_file.h (2)
38-39: Redundant `public:` label. The `public:` label on line 38 is redundant: the section is already opened on line 23.
♻️ Suggested fix
```diff
     ~TempFile() { Remove(); }

-  public:
     const std::filesystem::path& Path() const { return path_; }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/testing/temp_file.h` around lines 38 - 39, the code contains a redundant `public:` access label before the `Path()` method; remove this extra `public:` (keeping only the single opening `public:` section already declared earlier) to avoid duplication, and keep the `const std::filesystem::path& Path() const` method and the `path_` member in the existing public section.
41-48: `Remove()` is marked `const`: allowed, but semantically debatable. The method does not modify `path_`, so `const` is technically correct. However, it has a side effect (deleting a file), which can mislead readers; dropping `const` makes the intent clearer. Also: the removal error is ignored. For an RAII destructor this is acceptable (exceptions in destructors are undesirable), but for a test utility class it may be worth at least logging failures.
♻️ Suggested fix
```diff
 private:
-    void Remove() const {
+    void Remove() {
         if (path_.empty()) {
             return;
         }
         std::error_code ec;
         std::filesystem::remove(path_, ec);
+        // Optional: logging for debugging
+        // if (ec) { std::cerr << "Failed to remove temp file: " << path_ << '\n'; }
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/testing/temp_file.h` around lines 41 - 48, The Remove() method is marked const but performs filesystem side-effects and currently swallows errors; change Remove() to be non-const (remove the const qualifier from the method declaration/definition for Remove()) to reflect its mutating/side-effecting behavior, and enhance error handling by checking the std::error_code from std::filesystem::remove(path_, ec) and logging or reporting failures (use the class's logging facility or std::cerr) including path_ and ec.message() so failed deletions are visible during tests.
CMakeLists.txt (1)
26-26: Make `-O2` configuration-dependent, not global.
Right now optimization is enabled even in Debug and in sanitizer targets, which hurts debuggability and diagnostic quality.
Suggested fix
```diff
-add_compile_options(-Wall -Wextra -Werror -Wpedantic -O2)
+add_compile_options(
+    -Wall -Wextra -Werror -Wpedantic
+    $<$<OR:$<CONFIG:Release>,$<CONFIG:RelWithDebInfo>,$<CONFIG:MinSizeRel>>:-O2>
+)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CMakeLists.txt` at line 26, The global add_compile_options line currently forces -O2 for all builds; remove -O2 from add_compile_options(-Wall -Wextra -Werror -Wpedantic -O2) and instead apply optimization only for Release builds: either set CMAKE_C_FLAGS_RELEASE/CMAKE_CXX_FLAGS_RELEASE to include -O2 or add target_compile_options(...) with a generator expression $<CONFIG:Release> or check CMAKE_BUILD_TYPE and append -O2 for Release; update references to add_compile_options and ensure Debug/Sanitizer configs do not receive -O2.
include/model/batch.h (1)
16-22: The rule of five is implemented correctly. The move operations are marked `noexcept`, which is right for use with STL containers. Copy semantics via `Clone()` will provide deep copying. One small note: line 22 duplicates the `public:` specifier already declared on line 12.
♻️ Remove the duplicate access specifier
```diff
     ~Batch() = default;

-  public:
     const Schema& GetSchema() const { return schema_; }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/model/batch.h` around lines 16 - 22, remove the duplicate `public:` access specifier in class Batch (the second instance after the constructor/operator/destructor declarations), keeping the single `public:` earlier in the declaration; adjust the declarations of Batch(const Batch&), operator=, etc. so the current semantics are preserved but the extra `public:` marker is removed.
src/support/parsing.cpp (1)
38-49: Missing bounds check in `ParseUnsignedPart`. The function does not verify that `offset + length <= input.size()`. Although the callers (ParseDatePoint, ParseTimestamp) check the overall string length, the function is unsafe on its own.
🛡️ Suggested fix
```diff
 static unsigned ParseUnsignedPart(const std::string_view input, const size_t offset,
                                   const size_t length, const std::string_view type_name) {
+    if (offset + length > input.size()) {
+        throw Error::InvalidData("parsing", "invalid " + std::string(type_name) + " value");
+    }
     unsigned value = 0;
     const char* begin = input.data() + offset;
     const char* end = begin + length;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/support/parsing.cpp` around lines 38 - 49, ParseUnsignedPart lacks bounds checking for offset+length against input.size(); add an explicit check at the start of ParseUnsignedPart (using parameters input, offset, length) that validates offset <= input.size() and offset + length <= input.size(), and if it fails throw Error::InvalidData("parsing", "invalid " + std::string(type_name) + " value") (same error used for parse failures) before computing begin/end or calling std::from_chars so the function is safe to call independently from ParseDatePoint/ParseTimestamp.
src/io/batch_io.cpp (1)
43-50: Potential out-of-bounds access in `EstimateRowBytes`. The function assumes `columns.size() == row.size()`, but this is not checked. Even though the caller in `ReadNext()` checks `row.size() != column_count` before the call, EstimateRowBytes itself is unsafe when used independently.
🛡️ Recommended fix
```diff
 static uint64_t EstimateRowBytes(const std::vector<std::unique_ptr<Column>>& columns,
                                  const std::vector<std::string>& row) {
+    assert(columns.size() == row.size());
     uint64_t bytes = 0;
     for (size_t i = 0; i < columns.size(); ++i) {
         bytes = AddChecked(bytes, columns[i]->EstimateSizeFromString(row[i]));
     }
     return bytes;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/io/batch_io.cpp` around lines 43 - 50, EstimateRowBytes currently assumes columns.size() == row.size() and can read out-of-bounds; add an explicit guard at the start of EstimateRowBytes that compares columns.size() and row.size() and handles mismatches (e.g., throw a std::invalid_argument or log+return) instead of indexing past the end. Keep the rest of the loop using AddChecked and Column::EstimateSizeFromString unchanged; update callers such as ReadNext only if you choose a non-exception error path so they propagate or handle the new error behavior.
include/model/variant_column.h (1)
36-61: Potential `total_size` overflow when writing. In the loop on lines 37-42, `total_size` can overflow with a very large number of rows. While unlikely in practice, for consistency with the other overflow checks (line 38) it is worth adding one.
🛡️ Suggested fix
```diff
 if constexpr (std::endian::native == std::endian::little) {
     size_t total_size = 0;
     for (const auto& value : values_) {
         if (value.size() > std::numeric_limits<uint32_t>::max()) {
             throw Error::Overflow(Derived::ModuleName(), "value exceeds supported size");
         }
+        const size_t entry_size = sizeof(uint32_t) + value.size();
+        if (entry_size > std::numeric_limits<size_t>::max() - total_size) {
+            throw Error::Overflow(Derived::ModuleName(), "total column size overflow");
+        }
-        total_size += sizeof(uint32_t) + value.size();
+        total_size += entry_size;
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/model/variant_column.h` around lines 36 - 61, total_size accumulation can overflow when summing sizeof(uint32_t) + value.size() across many entries; in the loop that iterates values_ (the block computing total_size before creating buffer), add an explicit overflow check before doing total_size += sizeof(uint32_t) + value.size(): ensure total_size + sizeof(uint32_t) + value.size() does not exceed std::numeric_limits<size_t>::max() (or at least std::numeric_limits<uint32_t>::max() if you want symmetric checks), and if it would overflow, throw Error::Overflow(Derived::ModuleName(), "total size exceeds supported size"); keep the rest of the logic (buffer construction, memcpy, WriteBytes) unchanged.
include/model/fixed_column.h (1)
46-61: Potential problem on a failed read. After `values_.resize(row_count)` on line 52, if `ReadBytes` throws, the vector is left holding uninitialized/garbage values. Recommend using a temporary buffer, or clearing the vector on error.
♻️ Variant with a temporary buffer
```diff
 void ReadFrom(std::istream& in, const uint32_t row_count, const uint64_t size) override {
     const uint64_t expected = static_cast<uint64_t>(row_count) * sizeof(T);
     if (size != expected) {
         throw Error::Mismatch(Derived::ModuleName(), "column chunk size mismatch");
     }
-    values_.resize(row_count);
     if constexpr (std::endian::native == std::endian::little) {
+        std::vector<T> temp(row_count);
-        ReadBytes(in, reinterpret_cast<char*>(values_.data()), values_.size() * sizeof(T));
+        ReadBytes(in, reinterpret_cast<char*>(temp.data()), temp.size() * sizeof(T));
+        values_ = std::move(temp);
         return;
     }
+    std::vector<T> temp;
+    temp.reserve(row_count);
     for (uint32_t row_index = 0; row_index < row_count; ++row_index) {
-        values_[row_index] = ReadStream<T>(in);
+        temp.push_back(ReadStream<T>(in));
     }
+    values_ = std::move(temp);
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/model/fixed_column.h` around lines 46 - 61, the size check in ReadFrom is correct, but after values_.resize(row_count) a possible exception from ReadBytes would leave values_ in a partially initialized state; fix this by reading the data into a temporary buffer and only assigning/swapping it into values_ after a successful read. For the little-endian branch, create a temporary std::vector<T> tmp(row_count), call ReadBytes(reinterpret_cast<char*>(tmp.data()), tmp.size()*sizeof(T)), then values_.swap(tmp); for the big-endian branch, likewise fill tmp using ReadStream<T>(in) in a loop and swap with values_ at the end; this guarantees values_ stays untouched if an exception is thrown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmake/dependencies_package.cmake`:
- Around line 3-6: The code currently forces ENABLE_TESTS off into the CMake
cache when GTest_FOUND is false (using set(ENABLE_TESTS OFF CACHE BOOL "Build
tests" FORCE)), which persists across runs; change this so the temporary disable
only affects the current configure: remove the CACHE ... FORCE usage and set
ENABLE_TESTS with a non-cached value (e.g., set(ENABLE_TESTS OFF)) or otherwise
only modify the variable for this scope when GTest_FOUND is false; ensure the
check still uses GTest_FOUND and that any user-visible option for ENABLE_TESTS
remains controllable by the cache/options mechanism.
In `@include/support/int128.h`:
- Around line 17-18: Replace the legacy typedef spelling with the canonical one: in
the declaration of the Int128 type (symbol Int128), use __int128 instead of
__int128_t, making it consistent with the existing UInt128 alias (symbol
UInt128) and ensuring GCC/Clang compatibility; simply change the definition
of the Int128 type to use __int128.
In `@src/convert/csv_columnar.cpp`:
- Around line 10-12: Add regression unit tests for the changed logic in
ConvertColumnarToCsv: (1) a test that calls ConvertColumnarToCsv (or the public
wrapper that validates max_rows_per_group) with max_rows_per_group == 0 and
asserts that Error::InvalidArgument is thrown and that its message contains "row
group size must be > 0" (match the new contract), and (2) a test that invokes
ConvertColumnarToCsv on a small columnar input that does not fill a full group
and asserts that the internal buffer is flushed at the end (e.g., by verifying
the output contains the remaining rows or by mocking/inspecting the Flush()
side-effect). Ensure tests reference the ConvertColumnarToCsv entry point and
validate both exception type/message and final Flush behavior.
In `@src/io/csv.cpp`:
- Around line 16-37: Add regression tests covering the CSV parser and post-move
behavior: create unit tests that exercise CsvReader::CsvReader(CsvReader&&),
CsvReader::operator=(CsvReader&&) and RebindAfterMove by moving readers and
verifying they continue to parse correctly; include parsing cases for escaped
quotes (e.g. fields with doubled quotes), CRLF line endings, a final line
without a trailing '\n', empty fields, and combinations thereof to ensure the
character-by-character parser and moved-from/moved-to streams behave correctly
after the refactor.
- Around line 42-57: The parser currently calls streambuf methods
(buffer->sbumpc()/sgetc()), which bypass istream error and exception handling;
replace those direct buffer calls with istream methods: use in_->get() instead
of sbumpc() and in_->peek() instead of sgetc() at each of the four sites
referenced, capture the result into Traits::int_type (e.g. Traits::int_type next
= in_->get()), and then check for EOF/error either via Traits::eq_int_type(next,
Traits::eof()) or by testing in_->eof()/in_->fail() so the stream's
failbit/badbit/exception mask are honored (update code around variables buffer,
sbumpc, sgetc, and uses of Traits accordingly).
- Around line 16-18: The move constructor currently nulls other.in_/other.out_
causing UB when methods like ReadRow(), WriteRow(), Flush() dereference them;
instead, after moving owned_in_/owned_out_ keep other.in_/other.out_ bound to
&other.owned_in_ and &other.owned_out_ (i.e. call RebindAfterMove to point the
moved-from object's in_/out_ to its owned_* members rather than setting them to
nullptr) so moved-from CsvReader objects remain in a valid state; update
CsvReader::CsvReader(CsvReader&&), any move-assignment logic and RebindAfterMove
usage to rebind to &other.owned_in_/&other.owned_out_ rather than nulling
pointers and ensure ReadRow, WriteRow, Flush keep assuming non-null in_/out_.
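The istream-based rewrite described above can be sketched with a simplified record reader. Everything below is illustrative: `ReadRecord` is a hypothetical stand-in, quote handling is omitted, and it does not reproduce the real `CsvReader`; it only shows `get()`/`peek()` replacing `sbumpc()`/`sgetc()` with `Traits`-based EOF checks.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: one CSV record read via istream::get()/peek(),
// so failbit/badbit and the stream's exception mask are honored.
// Quote handling is omitted; names do not match the real CsvReader.
std::vector<std::string> ReadRecord(std::istream& in) {
    using Traits = std::istream::traits_type;
    std::vector<std::string> fields;
    std::string field;
    for (;;) {
        const Traits::int_type next = in.get();  // instead of buffer->sbumpc()
        if (Traits::eq_int_type(next, Traits::eof())) {
            break;  // covers a final line without a trailing '\n'
        }
        const char ch = Traits::to_char_type(next);
        if (ch == ',') {
            fields.push_back(field);
            field.clear();
        } else if (ch == '\r') {
            if (in.peek() == '\n') {  // instead of buffer->sgetc(): consume CRLF
                in.get();
            }
            break;
        } else if (ch == '\n') {
            break;
        } else {
            field.push_back(ch);
        }
    }
    fields.push_back(field);
    return fields;
}
```

Because every read goes through the istream, a caller that sets an exception mask on the stream gets failures surfaced instead of silently bypassed.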
In `@src/model/column_int32.cpp`:
- Around line 5-7: Add unit tests that exercise Int32Column::AppendFromString
and Int32Column::ValueAsString for both successful round-trips and parsing
failures: cover valid decimal strings, negative numbers, INT32_MIN and INT32_MAX
boundary values, and invalid inputs (empty string, non-numeric, out-of-range) to
assert that ParseInt32 produces the expected values or errors; mirror the same
pattern for all other new column_* implementations in this PR so each
AppendFromString/ValueAsString pair has tests for round-trip correctness and
error handling. Ensure tests call AppendFromString, then verify ValueAt(row) or
ValueAsString(row) as appropriate and assert that parsing errors are
raised/handled consistently.
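To make the boundary cases concrete, here is a minimal stand-in parser and the kind of assertions such tests would make. `ParseInt32` below is a hypothetical simplification built on `std::from_chars`; it is not the project's Int32Column implementation, only an illustration of the inputs worth covering.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical stand-in: parse an entire string as int32, returning nullopt
// on empty input, non-numeric text, trailing junk, or out-of-range values.
std::optional<std::int32_t> ParseInt32(std::string_view text) {
    std::int32_t value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;  // reject partial parses and errors alike
    }
    return value;
}
```

A real test suite would drive the same inputs through AppendFromString and then compare ValueAsString for round-trip correctness.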
In `@src/support/parsing.cpp`:
- Around line 246-268: TimestampToString can receive negative subseconds for
pre-1970 timestamps (from ParseTimestamp), causing a "-" to appear in the
fractional part; fix by normalizing subseconds before formatting: compute
int64_t fractional = tod.subseconds().count(); if (fractional < 0) fractional =
-fractional; then zero-pad and trim trailing zeros as currently done (use the
absolute value when calling std::to_string/frac.insert/frac.pop_back), and add
unit tests for pre-epoch dates (e.g., "1969-12-31 23:59:59.*") to cover
ParseTimestamp -> TimestampToString roundtrip; alternatively, if you prefer to
reject unsupported ranges, add range validation at the start of
TimestampToString and document the limitation instead of normalizing.
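The normalization path can be sketched as follows. `FormatFraction` is a hypothetical helper assuming a nanosecond-resolution subsecond count; the real `TimestampToString` has a different shape, and this only shows the absolute-value, zero-pad, trim order the comment describes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: format a (possibly negative) subsecond count as the
// fractional part of a timestamp. Pre-1970 timestamps can yield negative
// counts, so normalize via the absolute value before padding and trimming.
std::string FormatFraction(std::int64_t subseconds) {
    if (subseconds < 0) {
        subseconds = -subseconds;  // the sign belongs to the whole timestamp
    }
    std::string frac = std::to_string(subseconds);
    if (frac.size() < 9) {
        frac.insert(0, 9 - frac.size(), '0');  // zero-pad to nanosecond width
    }
    while (!frac.empty() && frac.back() == '0') {
        frac.pop_back();  // trim trailing zeros, as the current code does
    }
    return frac;  // empty string means there is no fractional part to print
}
```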
---
Nitpick comments:
In `@CMakeLists.txt`:
- Line 26: The global add_compile_options line currently forces -O2 for all
builds; remove -O2 from add_compile_options(-Wall -Wextra -Werror -Wpedantic
-O2) and instead apply optimization only for Release builds: either set
CMAKE_C_FLAGS_RELEASE/CMAKE_CXX_FLAGS_RELEASE to include -O2 or add
target_compile_options(...) with a generator expression $<CONFIG:Release> or
check CMAKE_BUILD_TYPE and append -O2 for Release; update references to
add_compile_options and ensure Debug/Sanitizer configs do not receive -O2.
In `@include/model/batch.h`:
- Around line 16-22: In class Batch, remove the duplicate `public:` access specifier
(the second instance after the constructor/operator/destructor declarations),
keeping the single `public:` earlier in the declaration; adjust the declarations
of Batch(const Batch&), operator=, etc. so the current semantics are preserved
but the extra `public:` marker is removed.
In `@include/model/fixed_column.h`:
- Around line 46-61: The size check in ReadFrom is correct, but after
values_.resize(row_count) a possible exception from ReadBytes would leave
values_ in a partially initialized state; fix this by reading the data into a
temporary buffer and only assigning/swapping it into values_ after a successful
read. For the little-endian branch, create a temporary std::vector<T>
tmp(row_count), call ReadBytes(reinterpret_cast<char*>(tmp.data()),
tmp.size()*sizeof(T)), then values_.swap(tmp); for the big-endian branch,
likewise fill tmp using ReadStream<T>(in) in a loop and swap with values_ at
the end; this guarantees values_ stays untouched if an exception is thrown.
In `@include/model/variant_column.h`:
- Around line 36-61: total_size accumulation can overflow when summing
sizeof(uint32_t) + value.size() across many entries; in the loop that iterates
values_ (the block computing total_size before creating buffer), add an explicit
overflow check before doing total_size += sizeof(uint32_t) + value.size():
ensure total_size + sizeof(uint32_t) + value.size() does not exceed
std::numeric_limits<size_t>::max() (or at least
std::numeric_limits<uint32_t>::max() if you want symmetric checks), and if it
would overflow, throw Error::Overflow(Derived::ModuleName(), "total size exceeds
supported size"); keep the rest of the logic (buffer construction, memcpy,
WriteBytes) unchanged.
In `@include/testing/temp_file.h`:
- Around line 38-39: The code contains a redundant `public:` access label before
the `Path()` method; remove this extra `public:` (keep only the single opening
`public:` section already declared earlier) to avoid duplication, keeping the
`const std::filesystem::path& Path() const` method and the `path_` member in
the existing public section.
- Around line 41-48: The Remove() method is marked const but performs filesystem
side-effects and currently swallows errors; change Remove() to be non-const
(remove the const qualifier from the method declaration/definition for Remove())
to reflect its mutating/side-effecting behavior, and enhance error handling by
checking the std::error_code from std::filesystem::remove(path_, ec) and logging
or reporting failures (use the class's logging facility or std::cerr) including
path_ and ec.message() so failed deletions are visible during tests.
In `@src/io/batch_io.cpp`:
- Around line 43-50: EstimateRowBytes currently assumes columns.size() ==
row.size() and can read out-of-bounds; add an explicit guard at the start of
EstimateRowBytes that compares columns.size() and row.size() and handles
mismatches (e.g., throw a std::invalid_argument or log+return) instead of
indexing past the end. Keep the rest of the loop using AddChecked and
Column::EstimateSizeFromString unchanged; update callers such as ReadNext only
if you choose a non-exception error path so they propagate or handle the new
error behavior.
In `@src/support/parsing.cpp`:
- Around line 38-49: ParseUnsignedPart lacks bounds checking for offset+length
against input.size(); add an explicit check at the start of ParseUnsignedPart
(using parameters input, offset, length) that validates offset <= input.size()
and offset + length <= input.size(), and if it fails throw
Error::InvalidData("parsing", "invalid " + std::string(type_name) + " value")
(same error used for parse failures) before computing begin/end or calling
std::from_chars so the function is safe to call independently from
ParseDatePoint/ParseTimestamp.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7f870d37-099e-4342-8e13-eaa1f94d99bd
⛔ Files ignored due to path filters (16)
- `.gitignore` is excluded by none and included by none
- `benchmarks/CMakeLists.txt` is excluded by none and included by none
- `benchmarks/benchmark_csv_to_columnar.cpp` is excluded by none and included by none
- `examples/CMakeLists.txt` is excluded by none and included by none
- `examples/data.csv` is excluded by `!**/*.csv` and included by none
- `examples/example_batch.cpp` is excluded by none and included by none
- `examples/example_columnar.cpp` is excluded by none and included by none
- `examples/example_columns.cpp` is excluded by none and included by none
- `examples/example_csv.cpp` is excluded by none and included by none
- `examples/example_fileio.cpp` is excluded by none and included by none
- `examples/schema.csv` is excluded by `!**/*.csv` and included by none
- `tests/test_batch.cpp` is excluded by none and included by none
- `tests/test_columnar.cpp` is excluded by none and included by none
- `tests/test_columns.cpp` is excluded by none and included by none
- `tests/test_csv.cpp` is excluded by none and included by none
- `tests/test_fileio.cpp` is excluded by none and included by none
📒 Files selected for processing (47)
CMakeLists.txt, cmake/dependencies_package.cmake, include/io/batch_io.h, include/io/columnar_batch_io.h, include/io/csv.h, include/io/csv_batch_io.h, include/io/stream.h, include/model/batch.h, include/model/column.h, include/model/column_boolean.h, include/model/column_character.h, include/model/column_date.h, include/model/column_int128.h, include/model/column_int16.h, include/model/column_int32.h, include/model/column_int64.h, include/model/column_string.h, include/model/column_timestamp.h, include/model/fixed_column.h, include/model/schema.h, include/model/variant_column.h, include/support/error.h, include/support/int128.h, include/support/parsing.h, include/testing/temp_file.h, src/CMakeLists.txt, src/app/main.cpp, src/convert/csv_columnar.cpp, src/io/batch_io.cpp, src/io/csv.cpp, src/io/fileio.cpp, src/io/stream.cpp, src/model/batch.cpp, src/model/column.cpp, src/model/column_boolean.cpp, src/model/column_character.cpp, src/model/column_date.cpp, src/model/column_int128.cpp, src/model/column_int16.cpp, src/model/column_int32.cpp, src/model/column_int64.cpp, src/model/column_string.cpp, src/model/column_timestamp.cpp, src/model/metadata.cpp, src/model/schema.cpp, src/support/error.cpp, src/support/parsing.cpp
```cmake
if (NOT GTest_FOUND)
    message(WARNING "GTest not found; tests will be disabled for this configure run.")
    set(ENABLE_TESTS OFF CACHE BOOL "Build tests" FORCE)
endif ()
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's look at the file context
cat -n cmake/dependencies_package.cmake | head -20
```
Repository: ik4rd/columnar-engine
Length of output: 360
🏁 Script executed:
```shell
# Search for ENABLE_TESTS usage in the codebase
rg "ENABLE_TESTS" -n
```
Repository: ik4rd/columnar-engine
Length of output: 349
🏁 Script executed:
```shell
# Look for any CMakeLists.txt or other cmake files that might be affected
git ls-files | grep -i cmake | head -20
```
Repository: ik4rd/columnar-engine
Length of output: 245
🏁 Script executed:
```shell
# Check if ENABLE_TESTS is used elsewhere that might depend on cache behavior
rg -B 3 -A 3 "ENABLE_TESTS" CMakeLists.txt | head -40
```
Repository: ik4rd/columnar-engine
Length of output: 707
🏁 Script executed:
```shell
# Check if there's any code that modifies or checks CMakeCache.txt
rg "CMakeCache" -i
```
Repository: ik4rd/columnar-engine
Length of output: 47
🏁 Script executed:
```shell
# Look at the workflow file to understand how configure is run
cat -n .github/workflows/cmake-single-platform.yml
```
Repository: ik4rd/columnar-engine
Length of output: 2143
Do not use CACHE ... FORCE to temporarily disable ENABLE_TESTS.
The message claims the disable applies only to the current configure run, but CACHE ... FORCE persists the OFF value into CMakeCache.txt, where it will keep applying on subsequent cmake runs. During local development, running cmake once without GTest leaves tests disabled in the cache even after GTest is installed later.
Fix
```diff
-    set(ENABLE_TESTS OFF CACHE BOOL "Build tests" FORCE)
+    set(ENABLE_TESTS OFF)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 if (NOT GTest_FOUND)
     message(WARNING "GTest not found; tests will be disabled for this configure run.")
-    set(ENABLE_TESTS OFF CACHE BOOL "Build tests" FORCE)
+    set(ENABLE_TESTS OFF)
 endif ()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmake/dependencies_package.cmake` around lines 3 - 6, The code currently
forces ENABLE_TESTS off into the CMake cache when GTest_FOUND is false (using
set(ENABLE_TESTS OFF CACHE BOOL "Build tests" FORCE)), which persists across
runs; change this so the temporary disable only affects the current configure:
remove the CACHE ... FORCE usage and set ENABLE_TESTS with a non-cached value
(e.g., set(ENABLE_TESTS OFF)) or otherwise only modify the variable for this
scope when GTest_FOUND is false; ensure the check still uses GTest_FOUND and
that any user-visible option for ENABLE_TESTS remains controllable by the
cache/options mechanism.
```cpp
using Int128 = __int128_t;
using UInt128 = unsigned __int128;
```
🧩 Analysis chain
🏁 Script executed:
```shell
# Check the current state of the file
cat -n include/support/int128.h | head -30
```
Repository: ik4rd/columnar-engine
Length of output: 784
🏁 Script executed:
```shell
# Search for usages of Int128 and UInt128 in the codebase
rg "Int128|UInt128" --type cpp --type h -n
```
Repository: ik4rd/columnar-engine
Length of output: 2264
🌐 Web query:
__int128 vs __int128_t portability toolchain compatibility
💡 Result:
What they are
- `__int128` / `unsigned __int128` are GCC/Clang language extensions (a builtin 128-bit integer type) available only on targets where the compiler has a 128-bit integer mode. (gcc.gnu.org)
- `__int128_t` / `__uint128_t` are non-standard typedef names used by some toolchains/headers; they're not the "canonical" spelling in GCC docs, and historically were considered "old typedefs" compared to the `__int128` keyword form. (gcc.gnu.org)
Portability / toolchain compatibility (practical guidance)
- GCC + Clang (Linux/macOS, many embedded targets): `__int128` is commonly supported; library support is inconsistent (e.g., printing needs custom code; `std::numeric_limits` / traits support has had historical caveats). (gcc.gnu.org)
- MSVC (Visual C++): no native `__int128` (so code using it won't be portable to MSVC without a fallback type). (lists.llvm.org)
- C23 / "real" `int128_t`: C23 ties `int128_t`/`uint128_t` in `<stdint.h>` to the existence of a suitable 128-bit type and also implies a bunch of library surface area (macros, format macros in `<inttypes.h>`, etc.), so availability depends on both compiler + C library support, not just the frontend. (gcc.gnu.org)
Recommendation for maximum portability
- Prefer the canonical extension spelling when available: `__int128` and `unsigned __int128` (not `__int128_t`).
- Gate it with a feature check and provide a fallback:
  - Common practice is to test `__SIZEOF_INT128__` (widely used, though not perfectly standardized/documented across all historical versions). (stackoverflow.com)
- For MSVC builds, use a fallback such as:
  - a library type (`boost::multiprecision::int128_t` / `cpp_int`), or
  - a custom 128-bit struct implementation.
Minimal pattern
```c
#if defined(__SIZEOF_INT128__)
typedef __int128 i128;
typedef unsigned __int128 u128;
#else
/* fallback: boost::multiprecision or custom implementation */
#endif
```
Sources (index)
- GCC docs: `__int128` extension (gcc.gnu.org)
- GCC mailing list: prefers the `__int128` keyword; calls `__int128_t`/`__uint128_t` "old typedefs" (gcc.gnu.org)
- LLVM-dev: MSVC doesn't support `__int128` (lists.llvm.org)
- GCC libstdc++ notes: evolving "integral type" handling for `__int128` (gcc.gnu.org)
- C23/C++ papers/notes on `int128_t` requirements & library implications (gcc.gnu.org)
- Usage note mentioning `__SIZEOF_INT128__` detection (stackoverflow.com)
Citations:
- 1: https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gcc/_005f_005fint128.html
- 2: https://gcc.gnu.org/pipermail/gcc-patches/2010-May/284424.html?utm_source=openai
- 3: https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gcc/_005f_005fint128.html
- 4: https://lists.llvm.org/pipermail/llvm-dev/2016-December/108407.html?utm_source=openai
- 5: https://gcc.gnu.org/pipermail/libstdc%2B%2B/2024-January/058116.html?utm_source=openai
- 6: https://stackoverflow.com/questions/16088282/is-there-a-128-bit-integer-in-gcc?utm_source=openai
- 7: https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gcc/_005f_005fint128.html
- 8: https://gcc.gnu.org/pipermail/gcc-patches/2010-May/284424.html?utm_source=openai
- 9: https://lists.llvm.org/pipermail/llvm-dev/2016-December/108407.html?utm_source=openai
- 10: https://gcc.gnu.org/pipermail/gcc-cvs/2025-July/427930.html?utm_source=openai
- 11: https://gcc.gnu.org/pipermail/libstdc%2B%2B/2024-January/058116.html?utm_source=openai
- 12: https://stackoverflow.com/questions/16088282/is-there-a-128-bit-integer-in-gcc?utm_source=openai
Use the canonical form __int128 instead of __int128_t for better compatibility.
Line 17 uses __int128_t, while line 18 already correctly uses unsigned __int128. Per GCC/Clang recommendations, __int128 is the canonical spelling, and __int128_t is considered a legacy typedef. The change also removes an inconsistency in the code.
Suggested fix
```diff
-using Int128 = __int128_t;
+using Int128 = __int128;
 using UInt128 = unsigned __int128;
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-using Int128 = __int128_t;
+using Int128 = __int128;
 using UInt128 = unsigned __int128;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@include/support/int128.h` around lines 17 - 18, replace the legacy
typedef spelling with the canonical one: in the declaration of the Int128 type
(symbol Int128), use __int128 instead of __int128_t to make it consistent with
the existing UInt128 alias (symbol UInt128) and ensure GCC/Clang compatibility;
simply change the definition of the Int128 type to use __int128.
```diff
 if (max_rows_per_group == 0) {
-    throw error::MakeError("columnar", "row group size must be > 0");
+    throw Error::InvalidArgument("columnar", "row group size must be > 0");
 }
```
Add regression tests for the changed logic.
Line 11 changed the error contract, and line 41 added an explicit Flush(). At minimum, tests are needed for:
- `max_rows_per_group == 0` (expected exception type/message),
- the final buffer flush in `ConvertColumnarToCsv`.
As per coding guidelines: "If logic changes, ask for tests."
Also applies to: 37-41
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/convert/csv_columnar.cpp` around lines 10 - 12, Add regression unit tests
for the changed logic in ConvertColumnarToCsv: (1) a test that calls
ConvertColumnarToCsv (or the public wrapper that validates max_rows_per_group)
with max_rows_per_group == 0 and asserts that Error::InvalidArgument is thrown
and that its message contains "row group size must be > 0" (match the new
contract), and (2) a test that invokes ConvertColumnarToCsv on a small columnar
input that does not fill a full group and asserts that the internal buffer is
flushed at the end (e.g., by verifying the output contains the remaining rows or
by mocking/inspecting the Flush() side-effect). Ensure tests reference the
ConvertColumnarToCsv entry point and validate both exception type/message and
final Flush behavior.
CsvReader::CsvReader(CsvReader&& other) noexcept : owned_in_(std::move(other.owned_in_)) {
    RebindAfterMove(other.in_ == &other.owned_in_, other.in_);
    other.in_ = nullptr;
}

CsvReader& CsvReader::operator=(CsvReader&& other) noexcept {
    if (this != &other) {
        owned_in_ = std::move(other.owned_in_);
        RebindAfterMove(other.in_ == &other.owned_in_, other.in_);
        other.in_ = nullptr;
    }
    return *this;
}

void CsvReader::RebindAfterMove(const bool uses_owned_stream, std::istream* source_stream) noexcept {
    if (uses_owned_stream) {
        in_ = &owned_in_;
        return;
    }
    in_ = source_stream;
}
Regression tests are needed for the new CSV parsing state machine and the move scenarios.
This changes both the character-by-character parser and the post-move behavior of the reader/writer. Without cases for escaped quotes, CRLF, a final line without \n, empty fields, and the move ctor/assignment, this refactoring is hard to keep stable.
As per coding guidelines, **/*.{h,hpp,cpp,cc,cxx}: "If logic changes — ask for tests."
Also applies to: 38-112, 118-173
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/io/csv.cpp` around lines 16 - 37, Add regression tests covering the CSV
parser and post-move behavior: create unit tests that exercise
CsvReader::CsvReader(CsvReader&&), CsvReader::operator=(CsvReader&&) and
RebindAfterMove by moving readers and verifying they continue to parse
correctly; include parsing cases for escaped quotes (e.g. fields with doubled
quotes), CRLF line endings, a final line without a trailing '\n', empty fields,
and combinations thereof to ensure the character-by-character parser and
moved-from/moved-to streams behave correctly after the refactor.
CsvReader::CsvReader(CsvReader&& other) noexcept : owned_in_(std::move(other.owned_in_)) {
    RebindAfterMove(other.in_ == &other.owned_in_, other.in_);
    other.in_ = nullptr;
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check: show that moved-from pointers are nulled and then
# used without a null-guard in public methods.
rg -n -C2 'other\.(in_|out_)\s*=\s*nullptr|in_->|out_->' src/io/csv.cpp
Repository: ik4rd/columnar-engine
🏁 Script executed:
# Retrieve the full RebindAfterMove implementation
rg -n -A 10 'void RebindAfterMove' src/io/csv.cpp
🏁 Script executed:
# Retrieve the full ReadRow implementation
rg -n -A 30 'bool CsvReader::ReadRow' src/io/csv.cpp
🏁 Script executed:
# Retrieve the full WriteRow and Flush implementations
rg -n -A 15 'void CsvWriter::(WriteRow|Flush)' src/io/csv.cpp
🏁 Script executed:
# Check the class definition for structure
fd -e h csv | xargs cat -n | head -100
🏁 Script executed:
# Search for RebindAfterMove implementation more broadly
rg -n 'RebindAfterMove' src/io/csv.cpp
🏁 Script executed:
# Get line count and structure of csv.cpp
wc -l src/io/csv.cpp && echo "---" && head -50 src/io/csv.cpp
🏁 Script executed:
# Get the full RebindAfterMove implementations for both classes
sed -n '30,35p;132,137p' src/io/csv.cpp
🏁 Script executed:
# Verify no null-checks exist in the methods that dereference in_ and out_
rg -n 'in_\s*!=\s*nullptr|out_\s*!=\s*nullptr|if.*in_|if.*out_' src/io/csv.cpp | head -20
Moved-from objects are left in an unsafe state that dereferences null pointers.
After a move, other.in_ and other.out_ are set to nullptr, yet ReadRow() (line 42: in_->rdbuf()), WriteRow() (lines 143, 148, 152, 155, 162: *out_ << ...), and Flush() (lines 169–170: out_->flush(), *out_) dereference those pointers without null checks. Any method call on a moved-from object is therefore UB.
Instead of nulling the pointer, rebind it to &other.owned_*_. Although other.owned_*_ is itself moved-from, it is still a valid object and safe to use (for example, rdbuf() may be called on a moved-from std::ifstream).
Recommended fix
CsvReader::CsvReader(CsvReader&& other) noexcept : owned_in_(std::move(other.owned_in_)) {
RebindAfterMove(other.in_ == &other.owned_in_, other.in_);
- other.in_ = nullptr;
+ other.in_ = &other.owned_in_;
}
CsvReader& CsvReader::operator=(CsvReader&& other) noexcept {
if (this != &other) {
owned_in_ = std::move(other.owned_in_);
RebindAfterMove(other.in_ == &other.owned_in_, other.in_);
- other.in_ = nullptr;
+ other.in_ = &other.owned_in_;
}
return *this;
}
CsvWriter::CsvWriter(CsvWriter&& other) noexcept : owned_out_(std::move(other.owned_out_)) {
RebindAfterMove(other.out_ == &other.owned_out_, other.out_);
- other.out_ = nullptr;
+ other.out_ = &other.owned_out_;
}
CsvWriter& CsvWriter::operator=(CsvWriter&& other) noexcept {
if (this != &other) {
owned_out_ = std::move(other.owned_out_);
RebindAfterMove(other.out_ == &other.owned_out_, other.out_);
- other.out_ = nullptr;
+ other.out_ = &other.owned_out_;
}
return *this;
}

Also applies to: lines 23–25, 118–120, 125–127.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/io/csv.cpp` around lines 16 - 18, The move constructor currently nulls
other.in_/other.out_ causing UB when methods like ReadRow(), WriteRow(), Flush()
dereference them; instead, after moving owned_in_/owned_out_ keep
other.in_/other.out_ bound to &other.owned_in_ and &other.owned_out_ (i.e. call
RebindAfterMove to point the moved-from object's in_/out_ to its owned_* members
rather than setting them to nullptr) so moved-from CsvReader objects remain in a
valid state; update CsvReader::CsvReader(CsvReader&&), any move-assignment logic
and RebindAfterMove usage to rebind to &other.owned_in_/&other.owned_out_ rather
than nulling pointers and ensure ReadRow, WriteRow, Flush keep assuming non-null
in_/out_.
auto* buffer = in_->rdbuf();
using Traits = std::streambuf::traits_type;
/* Reading previously went through istream::get() and peek().
 * Traits is needed only as a helper type for:
 * - eof()
 * - to_char_type()
 * - eq_int_type()
 */

std::string field;
bool in_quotes = false;
bool saw_data = false;

while (true) {
-    const int next = in_->get();
-    if (next == EOF) {
+    const Traits::int_type next = buffer->sbumpc();
+
+    if (Traits::eq_int_type(next, Traits::eof())) {
🧩 Analysis chain
🌐 Web query:
Do direct calls to std::streambuf::sbumpc/sgetc update the owning std::istream's failbit/badbit and honor the stream exception mask, or is that state management only done by std::istream operations such as get()/peek()?
💡 Result:
Direct calls on the buffer (e.g. is.rdbuf()->sbumpc() / sgetc()) do not update the owning std::istream’s rdstate() (so they won’t set failbit/badbit/eofbit for you), and they do not “honor” the stream’s exception mask (they won’t automatically translate conditions into setstate(...) and then throw ios_base::failure).
That state/exception-mask management is specified as behavior of std::basic_istream input functions: when an input function calls rdbuf()->sbumpc()/sgetc() and gets traits::eof(), the input function does setstate(eofbit) (and other bits as required) and that setstate “may throw” according to the exception mask; similarly, if a called buffer operation throws, the input function sets badbit and rethrows only if exceptions() & badbit is set. [1]
This is also reflected in the description of basic_istream::sentry: during whitespace skipping it may call sbumpc/sgetc(), and then the sentry/istream logic calls setstate(failbit|eofbit) (which may throw). [2]
So: istream operations like get()/peek() (and generally anything specified as an istream “input function”) manage the stream state and exception mask; raw streambuf calls don’t. [1][2]
Sources: [1] [2]
🏁 Script executed:
# Examine the file and specific line ranges
wc -l src/io/csv.cpp
🏁 Script executed:
# View the code around lines 42-57
sed -n '42,57p' src/io/csv.cpp
🏁 Script executed:
# View the code around lines 72-75
sed -n '70,76p' src/io/csv.cpp
🏁 Script executed:
# View the code around lines 103-105
sed -n '101,107p' src/io/csv.cpp
🏁 Script executed:
# Search for all sbumpc and sgetc calls in the file
rg -n 'sbumpc|sgetc' src/io/csv.cpp
Direct rdbuf() calls lose I/O error signaling.
sbumpc() and sgetc() bypass the istream state-management machinery: they do not set failbit/badbit/eofbit and do not honor the exception mask. As a result, real read errors can be misinterpreted as EOF and silently truncate the input. A CSV parser should use get()/peek(), which manage stream state and exceptions correctly.
Replacement is needed at all four sites: lines 55, 72, 74, 104–105.
Suggested fix
- auto* buffer = in_->rdbuf();
- using Traits = std::streambuf::traits_type;
+ using Traits = std::char_traits<char>;
...
- const Traits::int_type next = buffer->sbumpc();
+ const Traits::int_type next = in_->get();
...
- const Traits::int_type peek = buffer->sgetc();
- if (Traits::to_char_type(peek) == '"') {
- buffer->sbumpc();
+ const Traits::int_type peek = in_->peek();
+ if (Traits::eq_int_type(peek, Traits::to_int_type('"'))) {
+ in_->get();
row.back().push_back('"');
} else {
in_quotes = false;
}
...
- if (Traits::to_char_type(buffer->sgetc()) == '\n') {
- buffer->sbumpc();
+ if (const Traits::int_type peek = in_->peek();
+ Traits::eq_int_type(peek, Traits::to_int_type('\n'))) {
+ in_->get();
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/io/csv.cpp` around lines 42 - 57, The parser currently calls streambuf
methods (buffer->sbumpc()/sgetc()), which bypass istream error and exception
handling; replace those direct buffer calls with istream methods: use in_->get()
instead of sbumpc() and in_->peek() instead of sgetc() at each of the four sites
referenced, capture the result into Traits::int_type (e.g. Traits::int_type next
= in_->get()), and then check for EOF/error either via Traits::eq_int_type(next,
Traits::eof()) or by testing in_->eof()/in_->fail() so the stream's
failbit/badbit/exception mask are honored (update code around variables buffer,
sbumpc, sgetc, and uses of Traits accordingly).
void Int32Column::AppendFromString(const std::string& value) { AppendValue(ParseInt32(value)); }

std::string Int32Column::ValueAsString(const size_t row) const { return std::to_string(ValueAt(row)); }
Round-trip and parse-error tests are needed for the new fixed-column implementations.
Add tests for Int32Column (valid/invalid strings, boundary values), and likewise for the other new column_* implementations in this PR, to pin down the AppendFromString/ValueAsString behavior.
As per coding guidelines: "If logic changes — ask for tests."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/model/column_int32.cpp` around lines 5 - 7, Add unit tests that exercise
Int32Column::AppendFromString and Int32Column::ValueAsString for both successful
round-trips and parsing failures: cover valid decimal strings, negative numbers,
INT32_MIN and INT32_MAX boundary values, and invalid inputs (empty string,
non-numeric, out-of-range) to assert that ParseInt32 produces the expected
values or errors; mirror the same pattern for all other new column_*
implementations in this PR so each AppendFromString/ValueAsString pair has tests
for round-trip correctness and error handling. Ensure tests call
AppendFromString, then verify ValueAt(row) or ValueAsString(row) as appropriate
and assert that parsing errors are raised/handled consistently.
std::string TimestampToString(const int64_t value) {
    const std::chrono::sys_time<std::chrono::microseconds> timestamp{std::chrono::microseconds{value}};
    const auto day_point = std::chrono::floor<std::chrono::days>(timestamp);
    const std::chrono::hh_mm_ss<std::chrono::microseconds> tod{timestamp - day_point};

    std::ostringstream out;
    out << FormatDateParts(std::chrono::year_month_day{day_point}) << ' ' << std::setfill('0') << std::setw(2)
        << tod.hours().count() << ':' << std::setw(2) << tod.minutes().count() << ':' << std::setw(2)
        << tod.seconds().count();

    const int64_t fractional = tod.subseconds().count();

    if (fractional != 0) {
        std::string frac = std::to_string(fractional);
        frac.insert(frac.begin(), 6 - static_cast<std::ptrdiff_t>(frac.size()), '0');
        while (!frac.empty() && frac.back() == '0') {
            frac.pop_back();
        }
        out << '.' << frac;
    }

    return out.str();
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for TimestampToString tests with negative values
rg -n "TimestampToString" --type cpp -C 3
🏁 Script executed:
# Search for test files
find . -type f -name "*test*" | grep -E "\.(cpp|cc|cxx|h)$" | head -20
🏁 Script executed:
# Search for all uses of TimestampToString in tests
rg -l "TimestampToString" --type cpp
🏁 Script executed:
# Search for any tests related to parsing or timestamps
find . -type f \( -name "*test*.cpp" -o -name "*test*.cc" \) | head -10
🏁 Script executed:
# Search for TimestampToString in the test files
rg "TimestampToString" tests/ --type cpp -A 5 -B 5
🏁 Script executed:
# Search for negative-timestamp or related tests
rg -i "negative|epoch|timestamp" tests/ --type cpp -l
🏁 Script executed:
# Look at test_columns.cpp, where the TimestampColumn tests most likely live
wc -l tests/test_columns.cpp
🏁 Script executed:
# View the contents of test_columns.cpp
cat -n tests/test_columns.cpp
🏁 Script executed:
# View test_columnar.cpp
cat -n tests/test_columnar.cpp | head -100
🏁 Script executed:
# Search test_columnar more specifically for timestamp
rg -n "timestamp|Timestamp" tests/test_columnar.cpp -i -A 3 -B 1
🏁 Script executed:
# Check for comments about negative-value support
rg -n "negative|before.*epoch|Unix epoch" src/support/parsing.cpp -A 2 -B 2
🏁 Script executed:
# Look at the full TimestampToString code once more for detailed analysis
sed -n '246,268p' src/support/parsing.cpp
🏁 Script executed:
# Find where negative timestamp values are defined or used
rg -rn "< 0|negative.*timestamp|timestamp.*negative" src/ tests/ --type cpp
🏁 Script executed:
# Look at ParseTimestamp, which produces timestamp values
rg -n "ParseTimestamp" src/support/parsing.cpp -A 30 | head -60
🏁 Script executed:
# Look for information about possible timestamp values
rg -n "int64_t.*timestamp|timestamp.*int64" src/support/parsing.cpp -B 2 -A 2
🏁 Script executed:
# Check whether negative values are validated when a timestamp is created
cat -n src/support/parsing.cpp | sed -n '1,250p' | tail -50
🏁 Script executed:
# Look at the ParseDatePoint function
rg -n "ParseDatePoint" src/support/parsing.cpp -A 20
🏁 Script executed:
# Check whether a value range or timestamp limits are mentioned
rg -n "Timestamp" include/support/parsing.h -B 2 -A 2
🏁 Script executed:
# Create a simple test to check std::chrono behavior with negative values
cat > /tmp/test_chrono.cpp << 'EOF'
#include <chrono>
#include <iostream>
int main() {
    // Negative timestamp: -1000000 microseconds (1 second before epoch)
    const int64_t negative_value = -1000000;
    const std::chrono::sys_time<std::chrono::microseconds> timestamp{
        std::chrono::microseconds{negative_value}
    };
    const auto day_point = std::chrono::floor<std::chrono::days>(timestamp);
    const std::chrono::hh_mm_ss<std::chrono::microseconds> tod{timestamp - day_point};
    std::cout << "Negative timestamp: " << negative_value << std::endl;
    std::cout << "Day point (days since epoch): " << day_point.time_since_epoch().count() << std::endl;
    std::cout << "Hours: " << tod.hours().count() << std::endl;
    std::cout << "Minutes: " << tod.minutes().count() << std::endl;
    std::cout << "Seconds: " << tod.seconds().count() << std::endl;
    std::cout << "Subseconds: " << tod.subseconds().count() << std::endl;
    return 0;
}
EOF
# Try to compile and understand the behavior
cat /tmp/test_chrono.cpp
🏁 Script executed:
# Check every call site of TimestampToString
rg -n "TimestampToString\(" src/ tests/ --type cpp -B 3 -A 1
🏁 Script executed:
# Search for comments or documentation about the timestamp value range
rg -n "range|limit|support" include/support/parsing.h -i
🏁 Script executed:
# Check whether the supported values are documented anywhere
head -50 include/support/parsing.h
🏁 Script executed:
# Check DateToString: is it similar to TimestampToString?
sed -n '241,244p' src/support/parsing.cpp
🏁 Script executed:
# Look at ValueAt in TimestampColumn to confirm where the values come from
rg -n "class TimestampColumn" src/model/ --type cpp -A 20
🏁 Script executed:
# Search for the TimestampColumn definition
rg -n "TimestampColumn" include/ --type cpp -B 2 -A 10
🏁 Script executed:
# Look at column_timestamp.h
find . -name "*column_timestamp*" -type f
🏁 Script executed:
# Search for the Column class definition and its descendants
rg -n "class.*Column" include/ --type hpp -A 5 | head -60
🏁 Script executed:
# Look at FixedColumn to understand ValueAt
rg -n "class FixedColumn|ValueAt" include/model/ --type cpp -A 3 | head -40
🏁 Script executed:
# View the whole column_timestamp.cpp file
cat -n src/model/column_timestamp.cpp
🏁 Script executed:
# Check whether negative values are validated anywhere on append
rg -n "AppendValue|negative|< 0" src/model/ --type cpp -B 2 -A 2 | head -50
🏁 Script executed:
# Final check: direct calls to TimestampToString with variables
rg -n "TimestampToString\(" . --type cpp -B 5 | grep -E "(TimestampToString|int64|value|ValueAt)"
🏁 Script executed:
# Check for documentation about negative-value support
rg -n "before.*epoch|pre-epoch|negative.*timestamp" . --type cpp --type h
🏁 Script executed:
# Final check: how ParseTimestamp validates dates (can a date be before 1970?)
sed -n '51,71p' src/support/parsing.cpp
🏁 Script executed:
# Final check: how strict ParseUnsignedPart is about the year
rg -n "ParseUnsignedPart" src/support/parsing.cpp -B 5 -A 10 | head -40
🏁 Script executed:
# Check a concrete example: what happens for a 1969 date
python3 << 'EOF'
import datetime
# Unix epoch
epoch = datetime.datetime(1970, 1, 1)
# Before epoch
before_epoch = datetime.datetime(1969, 12, 31, 23, 59, 59)
# Microseconds since epoch
td = before_epoch - epoch
microseconds = int(td.total_seconds() * 1_000_000)
print(f"1969-12-31 23:59:59 -> {microseconds} microseconds (negative: {microseconds < 0})")
EOF
The issue with negative timestamp values needs a fix.
ParseTimestamp can produce negative values for dates before 1970-01-01 (for example, "1969-12-31 23:59:59"). When such a negative value is passed to TimestampToString, tod.subseconds().count() returns a negative number, which std::to_string() turns into a string like "-1000000", corrupting the fractional-part formatting (a minus sign inside the fraction).
Either:

- add validation for the unsupported range in TimestampToString and document the limitation,
- implement correct handling of negative timestamp values,
- add tests for pre-epoch dates to expose and prevent a regression.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/support/parsing.cpp` around lines 246 - 268, TimestampToString can
receive negative subseconds for pre-1970 timestamps (from ParseTimestamp),
causing a "-" to appear in the fractional part; fix by normalizing subseconds
before formatting: compute int64_t fractional = tod.subseconds().count(); if
(fractional < 0) fractional = -fractional; then zero-pad and trim trailing zeros
as currently done (use the absolute value when calling
std::to_string/frac.insert/frac.pop_back), and add unit tests for pre-epoch
dates (e.g., "1969-12-31 23:59:59.*") to cover ParseTimestamp ->
TimestampToString roundtrip; alternatively, if you prefer to reject unsupported
ranges, add range validation at the start of TimestampToString and document the
limitation instead of normalizing.
Summary by CodeRabbit
Release notes
New features
Performance improvements