Description
What language are you using?
Python
What version are you using?
0.4.4
What database are you using?
MySQL, Oracle (issue confirmed on both)
What dataframe are you using?
Arrow (stream)
Can you describe your bug?
When using return_type="arrow_stream" in ConnectorX 0.4.4, timestamp columns from the database are incorrectly returned as the date64[ms] Arrow type instead of a timestamp type. As a result, timestamp data is converted to dates and the time component is lost. The issue does not occur when using return_type="arrow" (non-streaming mode).
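For reference, here is a minimal sketch of how the affected columns can be spotted programmatically (assuming pyarrow is available; the helper name is just for illustration):

import pyarrow as pa

def find_date64_columns(schema: pa.Schema) -> list[str]:
    # Columns that came back as date64 instead of a timestamp type
    return [f.name for f in schema if pa.types.is_date64(f.type)]

# e.g. find_date64_columns(reader.schema) on the RecordBatchReader from the repro below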
What are the steps to reproduce the behavior?
Database setup if the error only happens on specific data or data type
Using the public Rfam MySQL database which contains timestamp columns:
- Host: mysql-rfam-public.ebi.ac.uk:4497
- Database: Rfam
- Table: family (contains timestamp columns such as created and updated)
Example query / code
import connectorx as cx
# Connection string - public test database
conn_str = "mysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam"
query = "SELECT * FROM family"
# Read with arrow_stream backend - returns RecordBatchReader
reader = cx.read_sql(
    conn_str,
    query,
    return_type="arrow_stream"
)
schema = reader.schema
print("Column Types:")
print("-" * 50)
for field in schema:
    print(f"Column '{field.name}': type '{field.type}'")
# You'll observe that timestamp columns are returned as date64[ms]
# instead of timestamp[us] or timestamp[ms]
What is the error?
The schema shows timestamp columns as date64[ms] instead of the expected timestamp type. For example:
- Expected: timestamp[us] or timestamp[ms, tz=UTC]
- Actual: date64[ms]
This appears to be a regression: the issue was fixed in earlier versions but has reappeared with the arrow_stream implementation. When switching back to return_type="arrow", the types are correctly returned as timestamps.
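To make the divergence between the two code paths easy to confirm, here is a minimal sketch against the same public Rfam database (the LIMIT and the restriction to the created/updated columns are mine, added only to keep the query small):

import connectorx as cx

conn_str = "mysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam"
query = "SELECT created, updated FROM family LIMIT 10"

# Non-streaming path: timestamp columns come back as Arrow timestamp types
table = cx.read_sql(conn_str, query, return_type="arrow")
print({f.name: str(f.type) for f in table.schema})

# Streaming path: the same columns come back as date64[ms]
reader = cx.read_sql(conn_str, query, return_type="arrow_stream")
print({f.name: str(f.type) for f in reader.schema})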
Related issue: This was discovered while using dlt (data load tool) with the connectorx backend: dlt-hub/dlt#3186