Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_database fails to parse the URI with a question mark in the password #12645

Closed
2 tasks done
danielgafni opened this issue Nov 23, 2023 · 10 comments · Fixed by #13514
Closed
2 tasks done

read_database fails to parse the URI with a question mark in the password #12645

danielgafni opened this issue Nov 23, 2023 · 10 comments · Fixed by #13514
Labels
accepted Ready for implementation documentation Improvements or additions to documentation python Related to Python Polars

Comments

@danielgafni
Copy link

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

pl.read_database("SELECT * FROM table LIMIT 1", "postgres://user:password_with_?@host:port/database")

Log output

File /usr/local/lib/python3.9/site-packages/polars/io/database.py:737, in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
    734 except BaseException as err:
    735     # basic sanitisation of /user:pass/ credentials exposed in connectorx errs
    736     errmsg = re.sub("://[^:]+:[^:]+@", "://***:***@", str(err))
--> 737     raise type(err)(errmsg) from err
    739 return from_arrow(tbl, schema_overrides=schema_overrides)

RuntimeError: parse error: invalid port number

Issue description

pl.read_database started failing after the password inside the URI string got changed. It now contains a question mark symbol - ?. sqlalchemy is able to use this string correctly, but polars fails to parse it.

Expected behavior

Question marks should not break URI parsing

Installed versions

--------Version info---------
Polars:              0.19.15
Index type:          UInt32
Platform:            Linux-6.3.5-arch1-1-x86_64-with-glibc2.38
Python:              3.9.13 (main, Apr 13 2023, 18:49:10) 
[GCC 12.2.0]

----Optional dependencies----
adbc_driver_sqlite:  <not installed>
cloudpickle:         2.2.1
connectorx:          0.3.1
deltalake:           0.13.0
fsspec:              2023.6.0
gevent:              <not installed>
matplotlib:          <not installed>
numpy:               1.24.4
openpyxl:            <not installed>
pandas:              2.0.3
pyarrow:             12.0.1
pydantic:            1.10.12
pyiceberg:           <not installed>
pyxlsb:              <not installed>
sqlalchemy:          1.4.41
xlsx2csv:            <not installed>
xlsxwriter:          <not installed>
@danielgafni danielgafni added bug Something isn't working python Related to Python Polars labels Nov 23, 2023
@deanm0000
Copy link
Collaborator

Try putting a \ in front of the ?

@Julian-J-S
Copy link
Contributor

@danielgafni
Copy link
Author

Thanks for the workarounds!
Seems like polars should handle this internally tho?

@Julian-J-S
Copy link
Contributor

I think manually escaping the pw is no future proof solution but using my linked method should work fine.
Both ways are possible. One can say it's your job to create a valid connection string ;)
But I agree, if polars does NOT do this it should be documented that the user has to do this and maybe give an easy example :)

@deanm0000
Copy link
Collaborator

deanm0000 commented Nov 23, 2023

Not all databases have the same connection string syntax and the focus is on making the queries work better rather than making connections string builders.

Once they make simple connection string builders then every option is fair game for someone to ask for.

Better to just leave it to the user or maybe a different library, maybe sqlalchemy can make a connection without doing anything else. I don't know.

@deanm0000
Copy link
Collaborator

@stinodego should this be closed or just turned into a feature request?

@stinodego
Copy link
Member

I'd like to phone a friend on this one. @alexander-beedie what's your take on this?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Dec 14, 2023

I'd like to phone a friend on this one. @alexander-beedie what's your take on this?

I think @deanm0000 is right; SQLAlchemy supports only itself - we support multiple engines/backends, and I agree that we don't want to get into the business of taking connection strings, parsing them, working out which specific rules apply for the given engine/backend combination, escaping accordingly, and then recombining them and handing off. The caller should know what is valid, and provide it in the correct form.

We should also probably assume that there are plenty of people already escaping their password correctly. If we now start applying quoting rules we will break them.

Clearly documenting that you are expected to supply a valid connection string "ready for use" when passed into the given engine is a good move, and I think that's sufficient 👍

@stinodego
Copy link
Member

stinodego commented Dec 14, 2023

Thanks Alex!

Right, so we'll accept a PR that improves the documentation on this, but we will not support it. I'll mark this as accepted.

@isaacnfairplay
Copy link

This issue does not impact when using polars 0.18.11 with connectorx 0.3.2a7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation documentation Improvements or additions to documentation python Related to Python Polars
Projects
Archived in project
6 participants