
Support reading datetime2 as timestamp[ms] #69

Closed
jonashaag opened this issue Dec 22, 2023 · 9 comments

Comments

@jonashaag

I have a column that is datetime2(prec=27, scale=7, length=8). However, all values are actually dates:

-- Result: 0
SELECT COUNT(*)
FROM ...
WHERE (DATEDIFF_BIG(nanosecond, '1800-01-01', col) % 86400000000000) > 0

If you have dates outside of the range of timestamp[ns], loading data with arrow-odbc-py fails.

I wonder if and how arrow-odbc-py could automatically use a timestamp type with larger range than timestamp[ns] in this case.

For example, we could add an option to allow truncation of timestamps.

@pacman82
Owner

The Rust crate allows passing an arrow Schema in case the application developer has insight that is not reflected in the schema reported by the driver. Arrow does have a Date type, doesn't it?

@jonashaag
Author

I'd be more interested in having something done automatically, since I'm writing generic/library code.

@pacman82
Copy link
Owner

pacman82 commented Dec 23, 2023

It already works automatically if the type reported by the driver is SQL_TYPE_DATE. If the schema information is not okay, or the driver has problems relaying it precisely, input from the application developer is needed.

@pacman82
Owner

> I'd be more interested in having something done automatically, since I'm writing generic/library code.

I think I need to better understand what you mean by this. Would a function be helpful which returns the automatically deduced arrow schema?

You could then apply generic logic to that schema to your heart's content, together with the ability to feed it back into the creation of the reader.

@pacman82
Owner

Alternatively, what do you think about a function you could pass in which manipulates the schema?

What I currently rule out are any decisions based on the values of the fields in the table themselves. This would really mess with the stream-based nature of the package. Still, applications are free to restart the stream based on what they see.

@jonashaag
Author

jonashaag commented Dec 25, 2023

Yeah, that would work well! If there were a way to pass the first batch of data to the callback, that might be even more general purpose, although I don't know what representation could be used for the data in that case, outside of all-strings.

Just a note: I'm not aware of any other library that uses a callback to modify the schema. Usually you can only pass the final schema directly, without a callback. It probably still is the better design for arrow-odbc-py!

@jonashaag
Author

Btw, I'm also fine fully solving this problem on my end by dynamically getting the schema and adding casts to the query.
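That workaround can be as simple as rewriting the SELECT list; a sketch assuming a column-to-cast mapping (the table and column names here are made up for illustration):

```python
# Map each column to an optional SQL cast target (hypothetical example data,
# e.g. obtained by inspecting INFORMATION_SCHEMA.COLUMNS beforehand).
casts = {"id": None, "created": "date"}

select_list = ", ".join(
    f"CAST({name} AS {target}) AS {name}" if target else name
    for name, target in casts.items()
)
query = f"SELECT {select_list} FROM my_table"
print(query)  # SELECT id, CAST(created AS date) AS created FROM my_table
```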

@pacman82
Owner

arrow-odbc 2.1.0 has been released. It allows for specifying a schema parameter in order to overwrite the desired target arrow schema. In theory a generic application could already be written on top of that: one could instantiate the reader twice, the first time to obtain the deduced schema, and the second time with the manipulated, actually desired target schema. However, this of course involves an extra roundtrip to the database and is wasteful.

If I were to go a second step, I would probably provide a function that just creates the cursor. The user could then inspect and manipulate the schema. A second function call would then turn the cursor into the reader.

@jonashaag
Author

Thank you! For now, I consider this fixed (assuming it works, which I'll try out soon).
