
Support reading datetime2 as timestamp[ms] #69

Closed
jonashaag opened this issue Dec 22, 2023 · 9 comments

Comments

@jonashaag

I have a column that is datetime2(prec=27, scale=7, length=8). However, all values are actually dates:

-- Result: 0
SELECT COUNT(*)
FROM ...
WHERE (DATEDIFF_BIG(nanosecond, '1800-01-01', col) % 86400000000000) > 0

If you have dates outside of the range of timestamp[ns], loading data with arrow-odbc-py fails.

I wonder if and how arrow-odbc-py could automatically use a timestamp type with larger range than timestamp[ns] in this case.

For example, we could add an option to allow truncation of timestamps.

@pacman82
Owner

The Rust crate allows passing an arrow Schema in case the application developer has insight that is not reflected in the schema reported by the driver. Arrow does have a Date type, doesn't it?

@jonashaag
Author

I'd be more interested in having something done automatically, since I'm writing generic/library code.

@pacman82
Copy link
Owner

pacman82 commented Dec 23, 2023

It already works automatically if the type reported by the driver is SQL_TYPE_DATE. If the schema information is not okay, or the driver has problems relaying it precisely, input from the application developer is needed.

@pacman82
Owner

> I'd be more interested in having something done automatically, since I'm writing generic/library code.

I think I need to better understand what you mean by this. Would a function be helpful which returns the automatically deduced arrow schema?

You could then apply generic logic to that schema to your heart's content, together with the ability to feed it back into the creation of the reader.

@pacman82
Owner

Alternatively, what do you think about a function you could pass in which manipulates the schema?

What I currently rule out are any decisions based on the values of the fields in the table themselves. This would really mess with the stream-based nature of the package. Still, applications are free to restart the stream based on what they see.

@jonashaag
Author

jonashaag commented Dec 25, 2023

Yeah, that would work well! If there were a way to pass the first batch of data to the callback, that might be even more general purpose, although I don't know what representation could be used for the data in that case, outside of all-strings.

Just a note: I'm not aware of any other library that uses a callback to modify the schema. Usually you can only pass the final schema directly, without a callback. It probably still is the better design for arrow-odbc-py!

@jonashaag
Author

Btw, I'm also fine fully solving this problem on my end by dynamically getting the schema and adding casts to the query.
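That workaround can be as simple as rewriting the SELECT list; a sketch assuming a column-to-cast mapping (the table and column names here are made up for illustration):

```python
# Map each column to an optional SQL cast target (hypothetical example data,
# e.g. obtained by inspecting INFORMATION_SCHEMA.COLUMNS beforehand).
casts = {"id": None, "created": "date"}

select_list = ", ".join(
    f"CAST({name} AS {target}) AS {name}" if target else name
    for name, target in casts.items()
)
query = f"SELECT {select_list} FROM my_table"
print(query)  # SELECT id, CAST(created AS date) AS created FROM my_table
```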

@pacman82
Owner

arrow-odbc 2.1.0 has been released. It allows for specifying a schema parameter in order to overwrite the desired target arrow schema. In theory a generic application could already be written on top of that: one could instantiate the reader twice, the first time to obtain the deduced schema, and the second time with the manipulated, actually desired target schema. However, this of course involves an extra roundtrip to the database and is wasteful.

If I were to go a second step, I would probably provide a function that just creates the cursor. The user could then inspect and manipulate the schema. A second function call would then turn the cursor into the reader.

@jonashaag
Author

Thank you! For now, I consider this fixed (assuming it works, which I'll try out soon).
