
Conversation


@amunra amunra commented Nov 25, 2025

Provides support for Arrow column types. These are likely to be encountered when deserializing from Parquet or when converting a Polars dataframe to Pandas.

Also added support for microsecond precision columns in Pandas. These are sent to the DB with microsecond precision if the protocol_version is at least 2, even for the designated timestamp.

Addresses #115 and #104.
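
For illustration, a minimal sketch of the kind of ingestion this enables (hedged: the connection string and column names are invented; Sender.from_conf and Sender.dataframe are the existing client APIs):

    import pandas as pd
    import questdb.ingress as qi

    # Converting a Polars dataframe to Pandas typically yields pyarrow-backed
    # dtypes such as timestamp[us][pyarrow]; plain datetime64[us] also works now.
    df = pd.DataFrame({
        'sym': pd.Series(['a', 'b'], dtype='string[pyarrow]'),
        'price': [1.5, 2.5],
        'ts': pd.to_datetime(
            ['2025-01-01T00:00:00.000001',
             '2025-01-01T00:00:00.000002']).astype('datetime64[us]'),
    })

    with qi.Sender.from_conf('http::addr=localhost:9000;') as sender:
        # 'ts' is the designated timestamp; with protocol_version >= 2 it is
        # sent at microsecond precision.
        sender.dataframe(df, table_name='trades', at='ts')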

Summary by CodeRabbit

  • New Features

    • Added microsecond-resolution timestamp support for NumPy and PyArrow (including timezone-aware variants) with improved datetime classification and consistent serialization paths.
  • Tests

    • Replaced an experimental failure test with comprehensive end-to-end tests covering NumPy and PyArrow microsecond timestamps, mixed column types, nulls, symbol/string handling, and cross-version serialization validation.


@amunra amunra requested a review from RaphDal November 25, 2025 19:10

coderabbitai bot commented Nov 25, 2025


📥 Commits

Reviewing files that changed from the base of the PR and between ee3e674 and 13a76d5.

📒 Files selected for processing (2)
  • src/questdb/dataframe.pxi (18 hunks)
  • test/test_dataframe.py (1 hunks)

Walkthrough

Added microsecond-resolution datetime (datetime64[us] / timestamp[us]) support paths for NumPy and PyArrow in dataframe serialization: new enum sources and dispatch codes, dtype classification updates, microsecond serializers and buffer resolution logic, plus end-to-end tests for µs/ns timestamp encodings.

Changes

  • Datetime enum, dispatch & globals (src/questdb/dataframe.pxi): Added col_source_dt64us_numpy = 601000 and col_source_dt64us_tz_arrow = 602000; added corresponding col_dispatch_code_* entries for dt64us variants; renumbered decimal-related enum entries; added global _NUMPY_DATETIME64_US; updated _SUPPORTED_DATETIMES.
  • Type detection & resolution (src/questdb/dataframe.pxi): Introduced _dataframe_classify_timestamp_dtype(dtype) to map numpy datetime64[us/ns], pandas tz dtypes, and pyarrow timestamp[us/ns] to col_source_t; updated _TARGET_TO_SOURCES and resolution logic to select pybuf vs. arrow buffer paths for dt64us variants and to error on unsupported units.
  • Serialization handlers & dispatch (src/questdb/dataframe.pxi): Added internal serializers _dataframe_serialize_cell_column_ts__dt64us_numpy, _dataframe_serialize_cell_column_ts__dt64us_tz_arrow, _dataframe_serialize_cell_at_dt64us_numpy, and _dataframe_serialize_cell_at_dt64us_tz_arrow; wired the new dispatch codes into the _dataframe_serialize_cell switch and extended the series/arrow resolve logic.
  • Tests — end-to-end timestamp & Arrow types (test/test_dataframe.py): Removed a test block that expected an Arrow ingress error; added test_numpy_micros_col, test_arrow_micros_col, test_arrow_types, and test_arrow_strings_as_symbols to validate microsecond and nanosecond timestamp serialization and mixed numpy/pyarrow encodings across protocol versions.

Sequence Diagram(s)

sequenceDiagram
    participant C as Caller
    participant R as TypeResolver
    participant M as SourceMapper
    participant D as Dispatcher
    participant S as Serializer
    rect rgba(220,245,230,0.18)
    C->>R: submit column dtype
    R->>R: classify dtype (np/us, np/ns, pa/us, pa/ns, tz)
    end
    R->>M: return col_source_t
    M->>D: map source -> dispatch code
    D->>S: invoke serializer for dispatch
    alt µs path (dt64us)
        S->>S: serialize using microsecond logic (pybuf or arrow buffer)
    else ns path (dt64ns)
        S->>S: serialize using nanosecond logic
    end
    S-->>C: return encoded bytes

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

  • Verify enum renumbering and no value collisions across col_source_t and col_dispatch_code_t.
  • Review _dataframe_classify_timestamp_dtype for correct unit detection and error paths for unsupported units.
  • Confirm all new dispatch codes are reachable and handled in _dataframe_serialize_cell and resolver logic.
  • Run the added tests and validate deterministic timestamp encoding across protocol versions.

Poem

🐇
I munched the microticks at dawn,
From NumPy fields to Arrow lawn,
Micros danced with nanos near,
Rows marched tidy, bytes sincere,
A rabbit hums: "New timestamps are on!"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately describes the main changes: adding support for Arrow datatypes and microsecond timestamps, which align with the key modifications across both the dataframe implementation and test files.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
test/test_dataframe.py (4)

1859-1862: Remove debug print statements.

The commented-out print statements (lines 1860-1861) appear to be leftover debug code. Consider removing them before merging to keep the test file clean.

-            # print(df)
-            # print(df.dtypes)
-
             act = _dataframe(self.version, df, table_name='tbl1', at='ts2')

1865-1866: Non-breaking space in comment.

The comment on line 1865 contains a non-breaking space (Unicode U+00A0) instead of a regular space. This was flagged by static analysis (RUF003). Consider replacing with a standard space.


1907-1908: Non-breaking space in comment and debug prints.

Line 1907 contains a non-breaking space (RUF003). Also, consider removing the debug print statements at lines 1919-1920.


1988-1991: Remove debug print statements.

Consider removing the commented-out debug prints at lines 1989-1990 and line 2040.

             })
 
-            # print(df)
-            # print(df.dtypes)
-
             # format a timestamp
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4711b4 and 8cc9ab2.

📒 Files selected for processing (2)
  • src/questdb/dataframe.pxi (17 hunks)
  • test/test_dataframe.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
test/test_dataframe.py

1865-1865: Comment contains ambiguous   (NO-BREAK SPACE). Did you mean (SPACE)?

(RUF003)


1907-1907: Comment contains ambiguous   (NO-BREAK SPACE). Did you mean (SPACE)?

(RUF003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel windows_i686)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel windows_x86_64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel macos_x64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_cpython_manylinux_x86_64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_pypy)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_cpython_musllinux)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel start_linux_arm64_agent_aws)
  • GitHub Check: questdb.py-questdb-client (Building and testing on windows-msvc-2019)
  • GitHub Check: questdb.py-questdb-client (Building and testing on mac)
  • GitHub Check: questdb.py-questdb-client (Building and testing on linux-qdb-master)
  • GitHub Check: questdb.py-questdb-client (Building and testing on linux-old-pandas)
  • GitHub Check: questdb.py-questdb-client (Building and testing TestsAgainstVariousNumpyVersion2x)
  • GitHub Check: questdb.py-questdb-client (Building and testing on linux)
  • GitHub Check: questdb.py-questdb-client (Building and testing TestsAgainstVariousNumpyVersion1x)
🔇 Additional comments (13)
test/test_dataframe.py (1)

1923-2041: Good test coverage for Arrow types.

This test comprehensively validates the serialization of various Arrow-backed types including timestamps, booleans, strings, floats, and integers with proper null handling across protocol versions.

src/questdb/dataframe.pxi (12)

151-158: LGTM - Source enum extensions.

The new microsecond timestamp sources follow the established numbering pattern, and the renumbering of array/decimal sources maintains proper separation.


244-265: LGTM - Target mappings updated correctly.

The new microsecond timestamp sources are correctly mapped to both column_ts and at targets.


395-410: LGTM - Dispatch codes for microsecond paths.

The new dispatch codes correctly combine target and source values following the established pattern.


528-528: LGTM - Global variable for datetime64[us] type.

Correctly declares the new global following the existing pattern for datetime64[ns].


807-821: LGTM - Extended datetime type detection.

The datetime type detection correctly supports both nanosecond and microsecond precision for numpy datetime64 and Arrow timestamps.


999-1013: Good extension of Arrow type support.

The new handling for boolean, timestamp (with unit validation), and large string types follows the existing patterns. The timestamp unit validation (lines 1004-1013) properly rejects unsupported units with a clear error message.


2217-2229: LGTM - Microsecond serialization for numpy datetime64.

The implementation correctly uses line_sender_buffer_column_ts_micros for microsecond precision timestamps, following the same pattern as the nanosecond variant.


2388-2403: LGTM - Microsecond serialization for Arrow timestamps.

The implementation correctly handles Arrow timestamp columns with microsecond precision using line_sender_buffer_column_ts_micros.


2424-2441: LGTM - Designated timestamp serialization for numpy microseconds.

The implementation correctly handles NAT values and uses line_sender_buffer_at_micros for valid timestamps.


2465-2485: LGTM - Designated timestamp serialization for Arrow microseconds.

The implementation correctly handles null values (using at_now) and valid timestamps with line_sender_buffer_at_micros.


2585-2610: LGTM - Dispatch logic extended for microsecond paths.

All new dispatch codes are correctly wired to their respective serialization functions, following the established pattern.


1218-1235: Redundant isinstance check reduces code clarity; the unit-detection pattern works but is inefficient.

The review comment correctly identifies that _NUMPY_DATETIME64_NS and _NUMPY_DATETIME64_US refer to the same type class (both are type(numpy.dtype('datetime64[...]'))). In NumPy, datetime64[ns] and datetime64[us] instances both have the same dtype type class; they differ only in the unit metadata stored in the dtype's name string.

As a result:

  • Line 1218's isinstance(dtype, _NUMPY_DATETIME64_NS) returns True for both datetime64[ns] and datetime64[us] dtypes
  • Line 1232's isinstance(dtype, _NUMPY_DATETIME64_US) is unreachable dead code—a datetime64[us] dtype will never fall through to it because it matches the first elif at line 1218
  • The unit parsing at lines 1220–1224 correctly handles both cases, making the second isinstance check redundant

The code functions correctly but has poor clarity. Consider consolidating into a single datetime64 check that validates the unit string without relying on redundant isinstance branches.
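
To make the dtype behavior concrete, a standalone numpy snippet (no client code involved):

    import numpy as np

    ns = np.dtype('datetime64[ns]')
    us = np.dtype('datetime64[us]')

    # Both dtype instances share the same type class; only the unit metadata
    # differs, so an isinstance() check on the class cannot tell them apart.
    assert type(ns) is type(us)

    # The unit must be read from the dtype itself:
    assert np.datetime_data(ns) == ('ns', 1)
    assert np.datetime_data(us) == ('us', 1)
    assert ns.name == 'datetime64[ns]' and us.name == 'datetime64[us]'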

Comment on lines 1218 to 1235
    elif (isinstance(dtype, _NUMPY_DATETIME64_NS) and
            _dataframe_is_supported_datetime(dtype)):
        col.setup.source = col_source_t.col_source_dt64ns_numpy
        unit = dtype.name.split('[')[-1].strip(']')
        if unit == 'ns':
            col.setup.source = col_source_t.col_source_dt64ns_numpy
        elif unit == 'us':
            col.setup.source = col_source_t.col_source_dt64us_numpy
        else:
            raise IngressError(
                IngressErrorCode.BadDataFrame,
                'Unsupported unit for datetime64 numpy column type ' +
                f'for column {pandas_col.name} of dtype {dtype}. ' +
                f' Must be "us" or "ns", not {unit}')
        _dataframe_series_as_pybuf(pandas_col, col)
    elif (isinstance(dtype, _NUMPY_DATETIME64_US) and
            _dataframe_is_supported_datetime(dtype)):
        col.setup.source = col_source_t.col_source_dt64us_numpy
        _dataframe_series_as_pybuf(pandas_col, col)
Contributor

You should be able to get rid of the unit logic by inlining the checks from _dataframe_is_supported_datetime (str(dtype) == 'datetime64[ns]') instead of calling the whole function (it would also remove a redundant isinstance check).

Collaborator Author

Refactored the logic into a "classify timestamp dtype" function, deduplicated.

Comment on lines 1238 to 1245
    if dtype.unit != 'ns':
        # Docs say this should always be nanos, but best assert.
        # https://pandas.pydata.org/docs/reference/api/pandas.DatetimeTZDtype.html
        raise IngressError(
            IngressErrorCode.BadDataFrame,
            f'Unsupported dtype {dtype} unit {dtype.unit} for column {pandas_col.name!r}. ' +
            'Raise an issue if you think it should be supported: ' +
            'https://github.com/questdb/py-questdb-client/issues.')
Contributor

In _dataframe_is_supported_datetime, this check is already done:

    elif isinstance(dtype, _PANDAS.DatetimeTZDtype):
        return dtype.unit == 'ns'

I don't think this branch can be taken otherwise.

Collaborator Author

Moved into the classify function.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/questdb/dataframe.pxi (1)

1241-1247: Consider caching the classification result.

The code calls _dataframe_classify_timestamp_dtype(dtype) twice—once to check if it's non-zero and once to retrieve the value. While the performance impact is minimal, you could cache the result:

else:
    source_result = _dataframe_classify_timestamp_dtype(dtype)
    if source_result != 0:
        col.setup.source = <col_source_t>source_result

However, the current implementation is arguably clearer, so this is purely optional.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cc9ab2 and 64ffa71.

📒 Files selected for processing (1)
  • src/questdb/dataframe.pxi (18 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: RaphDal
Repo: questdb/py-questdb-client PR: 114
File: test/system_test.py:317-324
Timestamp: 2025-11-07T15:46:43.284Z
Learning: In QuestDB, when decimal values are ingested, the server creates DECIMAL columns with a default precision of 18 and scale of 3, regardless of the scale declared in the client-side data type (e.g., pyarrow.decimal128(18, 2)). Values are stored and returned with 3 decimal places.
🔇 Additional comments (6)
src/questdb/dataframe.pxi (6)

151-158: Enum reorganization looks good.

The new microsecond timestamp source entries follow the established pattern, and the renumbering of decimal and array sources improves logical grouping.


804-804: Error message string properly updated.

The _SUPPORTED_DATETIMES constant now correctly advertises both nanosecond and microsecond precision support, addressing the previous review feedback.


1008-1067: Arrow type support significantly expanded.

The function now handles BOOL, TIMESTAMP (both us and ns units), LARGE_STRING, FLOAT, DOUBLE, and various integer types from PyArrow. The decimal handling refactor using is_decimal_col improves code clarity by consolidating scale validation.
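
As a rough illustration of the pandas-level dtypes this maps to (column names invented; the dtype strings are standard pandas 2.x ArrowDtype aliases):

    import pandas as pd

    # Each column below is backed by a pyarrow array under the hood.
    df = pd.DataFrame({
        'flag': pd.Series([True, None], dtype='bool[pyarrow]'),
        'name': pd.Series(['x', None], dtype='large_string[pyarrow]'),
        'f32': pd.Series([1.0, 2.0], dtype='float[pyarrow]'),
        'f64': pd.Series([1.0, 2.0], dtype='double[pyarrow]'),
        'i64': pd.Series([1, None], dtype='int64[pyarrow]'),
        'ts': pd.Series([pd.Timestamp('2025-01-01'), None],
                        dtype='timestamp[us][pyarrow]'),
    })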


2217-2229: Microsecond timestamp column serializers implemented correctly.

Both the NumPy and Arrow timezone-aware variants properly use line_sender_buffer_column_ts_micros for microsecond precision and correctly handle NaT values and validity bitmaps.

Also applies to: 2388-2403


2424-2441: Designated timestamp microsecond serializers look correct.

The serializers properly use line_sender_buffer_at_micros and correctly handle the NaT → server_now fallback for NumPy sources and the validity bitmap for Arrow sources.

Also applies to: 2465-2485


2585-2586: All microsecond dispatch codes correctly integrated.

The four new dispatch codes for microsecond timestamps (both column and designated timestamp, both NumPy and Arrow) are properly integrated into the main serialization switch statement.

Also applies to: 2601-2602, 2605-2606, 2609-2610

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e228e71 and 9c93f74.

📒 Files selected for processing (1)
  • test/test_dataframe.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test/test_dataframe.py (2)
test/test.py (1)
  • encode (1411-1413)
test/test_tools.py (1)
  • _float_binary_bytes (14-23)
🪛 Ruff (0.14.5)
test/test_dataframe.py

1862-1862: Comment contains ambiguous   (NO-BREAK SPACE). Did you mean (SPACE)?

(RUF003)


1901-1901: Comment contains ambiguous   (NO-BREAK SPACE). Did you mean (SPACE)?

(RUF003)

🔇 Additional comments (3)
test/test_dataframe.py (3)

1845-1875: Numpy microsecond timestamp test matches intent and edge cases

The construction of df and exp correctly exercises datetime64[us] columns and an at='ts2' designated timestamp across protocol versions, including a None in ts2 (no designated timestamp on the last row). The fdtm helper nicely captures the protocol-version-dependent unit conversion and suffix behavior.


1876-1913: Arrow microsecond timestamp test is well aligned with numpy variant

This test mirrors test_numpy_micros_col for timestamp[us][pyarrow], validating:

  • correct microsecond values in ts1,
  • correct handling of a None in ts2 (no trailing designated timestamp on the last row),
  • and protocol-version-dependent behavior via fdtm.

The expectations look consistent with the numpy test and the stated PR goal.
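
For orientation, the rough shape of the frames these tests build (values illustrative, not copied from the test file):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        # regular microsecond-precision column
        'ts1': np.array(['2023-01-01T00:00:00.000001',
                         '2023-01-01T00:00:00.000002'],
                        dtype='datetime64[us]'),
        # designated timestamp column with a null on the last row
        'ts2': pd.Series([pd.Timestamp('2023-01-01'), None],
                         dtype='datetime64[us]'),
    })
    # Serialized with at='ts2'; the None row carries no designated timestamp
    # and falls back to the server-assigned time.

The Arrow variant swaps the dtypes for 'timestamp[us][pyarrow]'.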


2031-2054: Arrow string columns as symbols test looks correct

The test_arrow_strings_as_symbols setup and exp expectations correctly validate mixed large_string[pyarrow] and string[pyarrow] columns used as symbols, including None values (omitted symbol on that side). This directly exercises the new Arrow string‑as‑symbol handling.
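
A hedged sketch of the scenario being validated (assuming the dataframe API's symbols parameter; names and connection string invented):

    import pandas as pd
    import questdb.ingress as qi

    df = pd.DataFrame({
        'a': pd.Series(['x', None], dtype='large_string[pyarrow]'),
        'b': pd.Series([None, 'y'], dtype='string[pyarrow]'),
        'n': [1.0, 2.0],
    })

    with qi.Sender.from_conf('http::addr=localhost:9000;') as sender:
        # Both pyarrow string flavors can serve as symbol columns; a None
        # simply omits that symbol on the affected row.
        sender.dataframe(df, table_name='tbl1', symbols=['a', 'b'],
                         at=qi.ServerTimestamp)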

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
test/test_dataframe.py (1)

1845-1874: Micros timestamp tests look right; fix NO‑BREAK SPACE in comments flagged by Ruff

The new test_numpy_micros_col and test_arrow_micros_col tests exercise:

  • datetime64[us] and timestamp[us][pyarrow] columns,
  • an at column backed by the same micros dtype,
  • correct handling of nulls (falling back to server timestamp), and
  • protocol‑dependent formatting of the designated timestamp via fdtm (mirroring TimestampEncodingMixin.enc_des_ts_t behavior).

That coverage aligns well with the new dt64us serializers in dataframe.pxi.

One small issue: both # format designated timestamp micros comments contain a NO‑BREAK SPACE (U+00A0) after #, which Ruff flags as RUF003. Replace those with a normal ASCII space:

-            # format designated timestamp micros
+            # format designated timestamp micros

Apply this to both the numpy and Arrow micros tests.

Also applies to: 1877-1913, 1862-1869, 1901-1907

🧹 Nitpick comments (1)
src/questdb/dataframe.pxi (1)

1008-1067: Arrow resolver changes (decimals, BOOL, timestamp units) are sound; timestamp branch is now effectively redundant

Using is_decimal_col to guard decimal branches and to populate col.scale (with explicit 0–76 bounds) makes decimal handling clearer and safer. The additional branches for Type_BOOL, Type_TIMESTAMP (with explicit 'us'/'ns' validation), and Type_LARGE_STRING all map cleanly to existing col_source_t values and match how other Arrow numerics are handled.

Given that _dataframe_resolve_source_and_buffers now classifies timestamp Arrow dtypes via _dataframe_classify_timestamp_dtype and routes them through _dataframe_series_as_arrow directly, the Type_TIMESTAMP case in _dataframe_series_resolve_arrow is currently unreachable. Consider removing that branch (or adding a brief comment noting it’s kept for potential future call‑sites) to avoid confusion.

Also applies to: 1221-1225

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c93f74 and ee3e674.

📒 Files selected for processing (2)
  • src/questdb/dataframe.pxi (18 hunks)
  • test/test_dataframe.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T07:31:37.016Z
Learnt from: RaphDal
Repo: questdb/py-questdb-client PR: 114
File: src/questdb/ingress.pyx:192-195
Timestamp: 2025-10-22T07:31:37.016Z
Learning: In src/questdb/ingress.pyx, the IngressErrorCode.BadDataFrame enum member is Python-internal only and does not map to any C error code from the underlying library. It is defined as `<int>line_sender_error_invalid_decimal + 1` and should not be added to the c_err_code_to_py function, which only handles C-to-Python error code translation.

Applied to files:

  • src/questdb/dataframe.pxi
🧬 Code graph analysis (1)
test/test_dataframe.py (3)
test/test.py (2)
  • Timestamp (1315-1356)
  • encode (1411-1413)
src/questdb/ingress.pyi (2)
  • value (127-128)
  • value (178-179)
test/test_tools.py (1)
  • _float_binary_bytes (14-23)
🪛 Ruff (0.14.5)
test/test_dataframe.py

1862-1862: Comment contains ambiguous   (NO-BREAK SPACE). Did you mean (SPACE)?

(RUF003)


1901-1901: Comment contains ambiguous   (NO-BREAK SPACE). Did you mean (SPACE)?

(RUF003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_arm64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel windows_i686)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel windows_x86_64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel macos_x64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_pypy)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_cpython_musllinux)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_cpython_manylinux_x86_64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel start_linux_arm64_agent_aws)
  • GitHub Check: questdb.py-questdb-client (Building and testing on windows-msvc-2019)
  • GitHub Check: questdb.py-questdb-client (Building and testing on mac)
  • GitHub Check: questdb.py-questdb-client (Building and testing on linux-qdb-master)
  • GitHub Check: questdb.py-questdb-client (Building and testing on linux-old-pandas)
  • GitHub Check: questdb.py-questdb-client (Building and testing TestsAgainstVariousNumpyVersion1x)
  • GitHub Check: questdb.py-questdb-client (Building and testing on linux)
  • GitHub Check: questdb.py-questdb-client (Building and testing TestsAgainstVariousNumpyVersion2x)
🔇 Additional comments (5)
src/questdb/dataframe.pxi (4)

151-158: dt64us source/target/dispatch wiring looks coherent

The new col_source_dt64us_* enum values, their inclusion in _TARGET_TO_SOURCES for both col_target_column_ts and col_target_at, and the corresponding col_dispatch_code_* entries for column_ts__dt64us_* and at__dt64us_* form a consistent mapping. The GIL flagging scheme and decimal/array re-numbering remain internally self‑consistent, and I don’t see functional issues here.

Also applies to: 185-267, 293-410


528-529: NUMPY datetime64[us] support is added consistently

The _NUMPY_DATETIME64_US global is initialized in lock‑step with _NUMPY_DATETIME64_NS and wired into _dataframe_may_import_deps correctly, so the later isinstance(dtype, _NUMPY_DATETIME64_US) checks will behave analogously to the existing ns path. No further changes needed here.

Also applies to: 549-563, 588-590


804-805: Timestamp dtype classifier and its call‑sites look correct, but please validate across pandas/pyarrow versions

The new _dataframe_classify_timestamp_dtype cleanly centralizes handling for:

  • numpy datetime64[ns] / datetime64[us],
  • DatetimeTZDtype (unit strictly 'ns'), and
  • ArrowDtype timestamp[ns/us],

returning the appropriate col_source_t or raising IngressError for unsupported units. Its use in both _dataframe_resolve_at and _dataframe_resolve_source_and_buffers (with pybuf for numpy and Arrow export for tz/Arrow dtypes) gives a single source of truth for what’s considered a “timestamp column”, and the _SUPPORTED_DATETIMES message now matches the actual supported set.

One minor note: _dataframe_classify_timestamp_dtype now gates all Arrow timestamp dtypes; _dataframe_series_resolve_arrow is no longer responsible for those, which is a good separation.

Please run the dataframe test suite against the pandas/pyarrow versions you support (especially around DatetimeTZDtype and ArrowDtype(timestamp[us/ns])) to confirm there are no surprises in dtype string representations or .unit values.

Also applies to: 807-845, 881-889, 1152-1160
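
A pure-Python analogue of what such a classifier does (hypothetical sketch; the real Cython function returns col_source_t enum values and raises IngressError on bad units):

    import numpy as np
    import pandas as pd
    import pyarrow as pa

    def classify_timestamp_dtype(dtype):
        # numpy datetime64: only 'ns' and 'us' units are accepted
        if isinstance(dtype, np.dtype) and dtype.kind == 'M':
            unit = np.datetime_data(dtype)[0]
            if unit in ('ns', 'us'):
                return f'dt64{unit}_numpy'
            raise ValueError(f'unsupported datetime64 unit: {unit!r}')
        # pandas tz-aware dtype: documented as nanosecond-based, asserted anyway
        if isinstance(dtype, pd.DatetimeTZDtype):
            if dtype.unit == 'ns':
                return 'dt64ns_tz_arrow'
            raise ValueError(f'unsupported tz unit: {dtype.unit!r}')
        # pyarrow-backed timestamps, e.g. timestamp[us][pyarrow]
        if isinstance(dtype, pd.ArrowDtype) and \
                pa.types.is_timestamp(dtype.pyarrow_dtype):
            unit = dtype.pyarrow_dtype.unit
            if unit in ('ns', 'us'):
                return f'dt64{unit}_tz_arrow'
            raise ValueError(f'unsupported arrow timestamp unit: {unit!r}')
        return None  # not a timestamp dtype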


2218-2230: Micros serializers for dt64us (columns and at) appear correct; rely on tests to confirm protocol semantics

The new serializers:

  • _dataframe_serialize_cell_column_ts__dt64us_numpy → line_sender_buffer_column_ts_micros
  • _dataframe_serialize_cell_column_ts__dt64us_tz_arrow → line_sender_buffer_column_ts_micros
  • _dataframe_serialize_cell_at_dt64us_numpy → line_sender_buffer_at_micros (with _NAT mapped to at_now)
  • _dataframe_serialize_cell_at_dt64us_tz_arrow → line_sender_buffer_at_micros / at_now on null

are all wired into the dispatch switch via the new col_dispatch_code_* entries and share the same _NAT / validity semantics as the existing ns variants. This matches what the new tests expect for both column timestamps and designated timestamps coming from datetime64[us] / timestamp[us][pyarrow] columns.

Please run the test matrix for protocol versions 1–3 to confirm that the underlying line_sender_buffer_*_micros functions produce the expected wire format in each ILP version (especially the version‑specific differences for designated timestamps).

Also applies to: 2372-2404, 2425-2442, 2466-2485, 2585-2603, 2605-2611

test/test_dataframe.py (1)

1915-2029: Arrow type and symbol tests provide good end‑to‑end coverage

test_arrow_types and test_arrow_strings_as_symbols together validate:

  • Arrow‑backed timestamps (timestamp[ns][pyarrow]) as both regular and designated timestamps,
  • Arrow bool, large/small strings, float32/float64, and signed integer dtypes (8/16/32/64),
  • mixed null patterns in both values and symbols, and
  • the expected serialized ILP layout across protocol versions.

These tests line up with the updated Arrow resolution and dispatch logic in dataframe.pxi and should catch regressions in Arrow handling going forward.

Also applies to: 2031-2054

@RaphDal RaphDal self-requested a review November 26, 2025 12:37
@amunra amunra merged commit 3c259e6 into main Nov 26, 2025
18 checks passed