Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_sql from Oracle to polars truncates timestamp #273

Closed
wKollendorf opened this issue Apr 26, 2022 · 8 comments
Closed

read_sql from Oracle to polars truncates timestamp #273

wKollendorf opened this issue Apr 26, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@wKollendorf
Copy link
Contributor

wKollendorf commented Apr 26, 2022

What language are you using?

Python

What version are you using?

0.2.5

What database are you using?

Oracle

What dataframe are you using?

polars, pandas, arrow2

Can you describe your bug?

When quering data from Oralce DB, timestamp columns are trunctated to date format (DD-MM-YYYY 00:00:00).

Example query / code
query = 'select timestamp_col from sample_db'
cx.read_sql(conn, query, return_type='polars')

Result:
2022-04-27 00:00:00

Converting timestamp to varchar (within query) and then to datetime (polars) it works:

query = 'select to_char(timestamp_col, 'YYYY-MM-DD HH24:MI:SS') as str_col from sample_db'
cx.read_sql(conn, query, return_type='polars').with_column(
          pl.col('str_col').str.strptime(pl.Datetime, '%F %T').alias('datetime_col') 
)

Result:
2022-04-27 18:21:22

What is the error?

No error message, but wrong dateime format.

@wKollendorf wKollendorf added the bug Something isn't working label Apr 26, 2022
@wKollendorf wKollendorf changed the title read_sql from Oracle to polars truncates datetime read_sql from Oracle to polars truncates timestamp Apr 28, 2022
@wangxiaoying
Copy link
Contributor

Hi @wKollendorf , thanks a lot for the example for error reproduction!

Currently we are using arrow as intermediate result before converting to polars. And it seems like we have some issue in the date/time handling in arrow. We are currently switching to arrow2. A quick fix could be explicitly setting the return_type to arrow2 and then convert to polars like this:

query = 'select timestamp_col from sample_db'
df = cx.read_sql(conn, query, return_type='arrow2')
df = pl.from_arrow(df)

Can you have a try and see whether it works?

@wKollendorf
Copy link
Contributor Author

wKollendorf commented Apr 29, 2022

Hi @wangxiaoying,

your fix works fine, thanks!. While testing this, another issue came up:
When using arrow2 as return_type and we have a column with datatype BinaryFloat, there is an conversion error. I can fix this by converting BinaryFloat to number in the sql statement, but maybe you can take a look at this too.

Error message:
No conversion rule form BinaryFloat(true) to connectorx::destinations::arrow2::typesystem::Arrow2TypeSystem

The same is true for BinaryDouble.

@wangxiaoying
Copy link
Contributor

Hi @wKollendorf , thanks for the report! Can you try 0.2.6-alpha.5 to see whether it fix the issue? You can set return_type=polars2, we just add the above conversion internally in this newest alpha version.

@wKollendorf
Copy link
Contributor Author

Hi @wangxiaoying,
I tried to build it with this command:
python3 -m pip install git+https://github.com/sfu-db/connector-x.git@main#subdirectory=connectorx-python

but failed with following error:

....
Compiling connectorx v0.2.6-alpha.5 (/tmp/pip-req-build-6ibt1gbt/connectorx)
      error: could not compile `connectorx` due to 2 previous errors
      💥 maturin failed
        Caused by: Failed to build a native library through cargo
        Caused by: Cargo build finished with "exit status: 101": `cargo rustc --manifest-path Cargo.toml --message-format json --release --lib --`
      🍹 Building a mixed python/rust project
      🔗 Found pyo3 bindings
      🐍 Found CPython 3.9 at /usr/bin/python3
      error[E0554]: `#![feature]` may not be used on the stable release channel
       --> /tmp/pip-req-build-6ibt1gbt/connectorx/src/lib.rs:1:1
        |
      1 | #![feature(generic_associated_types)]
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


      error: aborting due to previous error


      For more information about this error, try `rustc --explain E0554`.

      Error: command ['maturin', 'pep517', 'build-wheel', '-i', '/usr/bin/python3', '--compatibility', 'off'] returned non-zero exit status 1
      [end of output]

Sorry I am not experienced in building python modules myself. Maybe you have a hint, otherwise I can wait for the release version. Your workaround is working fine.

@wangxiaoying
Copy link
Contributor

wangxiaoying commented May 2, 2022

Hi @wKollendorf , connectorx is written in rust and we are using the nightly version (for features like GAT). Currently our main branch is using nightly-2021-11-18. Here is how to build connectorx from source code, the generated wheel file can be found in ./connectorx-python/target/wheel. You need to install our development dependencies first before following the steps.

@wKollendorf
Copy link
Contributor Author

wKollendorf commented May 3, 2022

Hi @wangxiaoying,
that worked flawlessly on my Linux machine. But on the Windows workstation, where I can test the Oracle query, I am having a hard time. After configuring rust and openssl there is following error, when trying to build the package:

         ...
         Compiling stringprep v0.1.2
         Compiling idna v0.2.3
         Compiling paste-impl v0.1.18
error: failed to run custom build command for `openssl-sys v0.9.72`

Got openssl from http://slproweb.com/products/Win32OpenSSL.html
Set env variables:
OPENSSL_DIR to C:\Program Files\OpenSSL-Win64
SSL_CERT_FILE to C:\Program Files\OpenSSL-Win64\certs\cacert-2022-04-26.pem

Found the solution:

set OPENSSL_NO_VENDOR=1

Instruct the openssl-sys crate to use a pre-compiled openssl library. If this is not set it will try to compile it and fail (because perl is typically not available on Windows). (link)

Back to the root issue:

Tried it with:
cx.read_sql('oracle://'+username+':'+password+'@'+server+':'+port+'/'+database, query, return_type='polars2')
The error with conversion of BinaryFloat and Double does not occure, but the timestamp is again trunctated.

Oracle timestamp is converted to polars.datatypes.Date

@wangxiaoying
Copy link
Contributor

Hi @wKollendorf , thanks for the information. We will add your solution to the document for windows users.

The error with conversion of BinaryFloat and Double does not occure, but the timestamp is again trunctated.

That's weird, it works fine in my environment on query SELECT test_timestamp FROM table with test_timestamp of type TIMESTAMP:

image

Can you try the previous workaround to see whether it still works on your side?

@wKollendorf
Copy link
Contributor Author

Can you try the previous workaround to see whether it still works on your side?

Okay...I don't know what happend, but both issues somehow disappeared?!

query = 'select timestamp_col as zpt_fs from sample_db'

Current version

C:\Windows\system32>py -m pip show connectorx
Name: connectorx
Version: 0.2.5

cx.read_sql('oracle://'+username+':'+password+'@'+server+':'+port+'/'+database, query, return_type='polars'))

Untitled
Initially I got 2022-05-01 00:00:00....

New version

C:\Windows\system32>py -m pip show connectorx
Name: connectorx
Version: 0.2.6a5

pl.from_arrow(cx.read_sql('oracle://'+username+':'+password+'@'+server+':'+port+'/'+database, query, return_type='arrow2'))
cx.read_sql('oracle://'+username+':'+password+'@'+server+':'+port+'/'+database, query, return_type='polars2'))
Both return the same result:
Untitled

At the moment I can't explain it....sorry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants