Timestamp columns incorrectly converted to Date64 with arrow_stream return_type #866

@SebZbp

What language are you using?

Python

What version are you using?

0.4.4

What database are you using?

MySQL, Oracle (issue confirmed on both)

What dataframe are you using?

Arrow (stream)

Can you describe your bug?

With return_type="arrow_stream" in ConnectorX 0.4.4, timestamp columns from the database are returned as the Arrow type date64[ms] instead of a timestamp type. As a result, timestamp data is converted to dates and the time component is lost. The issue does not occur with return_type="arrow" (non-streaming mode).

What are the steps to reproduce the behavior?

Database setup if the error only happens on specific data or data type

Using the public Rfam MySQL database which contains timestamp columns:

  • Host: mysql-rfam-public.ebi.ac.uk:4497
  • Database: Rfam
  • Table: family (contains timestamp columns like created and updated)

Example query / code

import connectorx as cx

# Connection string - public test database
conn_str = "mysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam"
query = "SELECT * FROM family"

# Read with arrow_stream backend - returns RecordBatchReader
reader = cx.read_sql(
    conn_str,
    query,
    return_type="arrow_stream"
)

schema = reader.schema

print("Column Types:")
print("-" * 50)
for field in schema:
    print(f"Column '{field.name}': type '{field.type}'")
    
# You'll observe that timestamp columns are returned as date64[ms]
# instead of timestamp[us] or timestamp[ms]

What is the error?

The schema shows timestamp columns as date64[ms] instead of the expected timestamp type. For example:

  • Expected: timestamp[us] or timestamp[ms, tz=UTC]
  • Actual: date64[ms]

This appears to be a regression, as the issue was previously fixed in earlier versions but has reappeared with the arrow_stream implementation. When switching back to return_type="arrow", the types are correctly returned as timestamps.
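
To narrow down which columns are affected, the schemas from the two return types can be compared field by field. The following is a minimal sketch, not part of ConnectorX: `diff_field_types` is a hypothetical helper, and the field lists are hand-written stand-ins (the `id` column and all type strings are illustrative, not actual output from the Rfam database) so the snippet runs without a connection. It relies only on each field exposing `.name` and `.type`, as the fields in the reproduction script above do.

```python
from types import SimpleNamespace


def diff_field_types(expected_fields, actual_fields):
    """Return {column: (expected_type, actual_type)} for columns whose types differ.

    Both arguments are iterables of objects exposing `.name` and `.type`.
    Types are compared by their string form, so this helper itself does not
    need pyarrow installed.
    """
    expected = {f.name: str(f.type) for f in expected_fields}
    mismatches = {}
    for f in actual_fields:
        actual_type = str(f.type)
        if f.name in expected and expected[f.name] != actual_type:
            mismatches[f.name] = (expected[f.name], actual_type)
    return mismatches


# Hand-written stand-ins (assumption): what the two schemas might look like.
arrow_fields = [
    SimpleNamespace(name="id", type="string"),
    SimpleNamespace(name="created", type="timestamp[us]"),
    SimpleNamespace(name="updated", type="timestamp[us]"),
]
stream_fields = [
    SimpleNamespace(name="id", type="string"),
    SimpleNamespace(name="created", type="date64[ms]"),
    SimpleNamespace(name="updated", type="date64[ms]"),
]

mismatches = diff_field_types(arrow_fields, stream_fields)
for column, (expected_type, actual_type) in mismatches.items():
    print(f"Column '{column}': expected '{expected_type}', got '{actual_type}'")
```

Against a real database, the stand-in lists would presumably be replaced by `cx.read_sql(conn_str, query, return_type="arrow").schema` and `reader.schema` from the script above.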

Related issue: This was discovered while using dlt (data load tool) with the connectorx backend: dlt-hub/dlt#3186

Labels: bug (Something isn't working)