Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for multiple hosts in postgresql connection string #4392

Closed
wingc-china opened this issue Dec 1, 2018 · 33 comments
Closed

Support for multiple hosts in postgresql connection string #4392

wingc-china opened this issue Dec 1, 2018 · 33 comments
Labels
blocker issue that must be resolved asap as it is preventing things from working bug Something isn't working near-term release addition to the milestone which indicates this should be in a near-term release postgresql use case not really a feature or a bug; can be support for new DB features or user use cases not anticipated
Milestone

Comments

@wingc-china
Copy link

PostgresSQL supports connecting with multiple hosts in connection string.
https://www.postgresql.org/docs/current/static/libpq-connect.html#libpq-multiple-hosts

The multiple hosts in connection string is like
postgresql+psycopg2://user:password@host1:port1,user:password@host2:port2/dbname
Unfortunately current SQLAlchemy behavior is parsing password@host1:port1,user:password out as password to try connecting.

In fact both libpq and psycopg2(psycopg/psycopg2#602) support multi-host already. We might want to add more code to the engine/url.py to add this support in SQLAlchemy.

@zzzeek
Copy link
Member

zzzeek commented Dec 1, 2018

OK, we can do this, but that URL format would be impossible to integrate because it is not RFC-1738 (unless it is, and I'm just ignorant of that variety, though the doc you point out says "URIs generally follow RFC 3986, except that multi-host connection strings are allowed"). we have a homegrown URL parser which I'd prefer not to mess with at this point.

Also the doc you refer to says, " the host, hostaddr, and port options accept a comma-separated list of values. " so user:pass is not part of it.

Propose postgresql+psycopg2://user:pass@/dbname?host=host1:port1&host=host2&host=..., since we do support repeated query parameters.

Also note if you have an application that needs to do this right now, you can use creator, so this capability is present now just not nicely within the URL.

@zzzeek zzzeek added postgresql feature engine engines, connections, transactions, isolation levels, execution options labels Dec 1, 2018
@zzzeek zzzeek added this to the 1.3.xx milestone Dec 1, 2018
@zzzeek zzzeek added the easy a one / two liner type of thing that anyone can do in short order. also see "fairly easy" label Dec 1, 2018
@wingc-china
Copy link
Author

wingc-china commented Dec 4, 2018

Thanks! I didn't know I could use the creator. Will give it a try.
In fact I did some hack to make code in url.py try not parsing port to integer, but pass the port parameter as string to psycopg2 layer. And I tried a connection string as "postgresql+psycopg2://[username]:[password]@Host1,host2:port1,port2/[db name]". It seems working well :)
Let me know if anything else I can help, like testing or something.

Connection string sample was mis-quoted . Just corrected. :)

@zzzeek
Copy link
Member

zzzeek commented Dec 4, 2018

well anyone can volunteer to implement on this one it's pretty easy...

@wronglink
Copy link

Hi there. I'm here for the same issue. Actually one another workaround format for multi-host connection string is: postgresql+psycopg2://user:pass@/dbname?host=host1,host2,host3&port=111,222,333. But I think that the postgres docs format is way more readable: postgresql+psycopg2://user:pass@host1:111,host2:222,host3:333/dbname as it is easy to mix host parameters with some additional parameters like ssl or writable replica requirement.

And we should also take into account that it's really common way to connect sqlalchemy via URL inside some libraries and frameworks, like flask-sqlalchemy or celery. It would be way more convenient if public API url.make_url would support the example from the postgres docs.

@zzzeek
Copy link
Member

zzzeek commented Dec 6, 2018

@wronglink this goes beyond "easy to read". We are talking about the URL object which intends to provide a "good enough" URL not just for postgresql using libpq, but for any other database or database driver as well. Consider that this object presents a unified view of the common elements we use to connect to a database, including that there is a scalar ".host" and ".port" attribute. Turning these into lists to suit a specific feature of a single driver on a single database platform is not appropriate, it would add confusion and ambiguity to the URL object in terms of all other platforms.

An obvious workaround to turning ".hosts" and ".port" into lists is that we could instead keep .host and .port and then add new collections like .additional_hosts and .additional_ports, which is more awkward but doesn't break backwards compatibility for the world. Still, complexities arise, such as, what if the length of the .additional_hosts and the .additional_ports list are not the same ? This is also an issue with the approach you suggest of "?host=a&host=b?port=a&port=b". OK, perhaps this should be a list of tuples, .additional_host_ports or something like that, so if a port were missing from one of the host specifications, it's just blank.

So other than the parsing and generation having to be rewritten as well as all the new tests that would need to be written, and that there's this awkward ".additional_hosts_ports" attribute that seems to be pretty hardcoded to one particular driver, that's how it would look. All of this effort is unnecessary with my proposal that we just keep this simple and localized to the psycopg2 module and use the existing query string functionality. As I am already months if not years behind in the workload of issues that need to be attended towards, I'm not in a position to add these efforts to said workload when a much simpler approach is easily available.

After all that effort and rewriting, we have still bolted straight into URL a whole way of working that is entirely arbitrary and specific to one driver. Don't other drivers support multiple hosts? Well sure. The main example that comes to mind is odbc for SQL Server. The pyodbc client driver doesn't support any of that, so the issue is kind of moot, but if it did, note that in SQL Server's case, the multiple hosts are actually defining a more complex concept of primary and failover hosts. Which means if we in theory wanted our URL to work with those, all the work we did to support multiple hosts is still useless, because the different hosts need to be qualified as to what their role is. To be more general, if lots of DBAPIs supported multiple host/port combinations, this would be much easier to do. But they don't, so the reality of how such a feature should be generalized for other drivers is unknown. When we have an unknown as to how an API should be organized, the best answer to what we should do is to avoid guessing, which I'm sensitive towards because I'm typically the person that has to rip out bad decisions. My proposal avoids this guessing by keeping the change entirely localized to the psycopg2 dialect and nothing else.

In the general sense, a driver that can connect to multiple hosts may need to define any number of qualities for each host, like weighting, failover, failback, etc. It would look pretty amateurish that SQLAlchemy's URL bolted on a multiple-host/port concept that doesn't even work in the general sense.

the fact is, if you truly want to use postgresql's native URL format exactly as written, that is not a problem, that's why create_engine accepts the creator argument. Take the URL and pass it right through to psycopg2.

And we should also take into account that it's really common way to connect sqlalchemy via URL inside some libraries and frameworks, like flask-sqlalchemy or celery

this proposal does not impact that ability in any way, you'd be able to pass ?host&port along on the URL as you normally would. There are even ways to plug in the URL parsing to create_engine() right now, using engine strategies, so you could intercept the custom PG style URL up front and do what you want with it. I would sooner make these URL interception strategies more robust and documented so that custom URL parsing schemes can be plugged in before I would want to hardcode onto URL itself.

@jvanasco
Copy link
Member

jvanasco commented Dec 6, 2018

I agree with @zzzeek's implementation idea, but wanted to suggest an alternative that may take the same amount of work - but offer wider benefits.

What if the creator argument accepted a "dotted notation" string which specifies a callable? (SqlAlchemy does this on relationships, Pyramid does this all over). That would handle the concern for flask-sqlalchemy, celery and other configuration-file based apps to enjoy the benefit of this pyscopg2 feature, WHILE also have general control over the creator within configs as well (yay!).

That could be accompanied by an example in the PostgreSQL dialect docs on how to use the creator to handle the multiple host arguments.

@zzzeek
Copy link
Member

zzzeek commented Dec 6, 2018

I agree with @zzzeek's implementation idea, but wanted to suggest an alternative that may take the same amount of work - but offer wider benefits.

What if the creator argument accepted a "dotted notation" string which specifies a callable? (SqlAlchemy does this on relationships, Pyramid does this all over). That would handle the concern for flask-sqlalchemy, celery and other configuration-file based apps to enjoy the benefit of this pyscopg2 feature, WHILE also have general control over the creator within configs as well (yay!).

you mean if "creator" were perhaps a name to another entrypoint-linked module; we have a lot of entrypoint stuff going on already. let me publicize that there is already a plugin system for engines that allows any module or package to be called up based on an entrypoint name where it can then affect the process of building up the engine in any way it needs. I would have suggested using this immediately here, except that at the moment, the plugin system still works after the URL has been parsed. So a system by which the CreateEnginePlugin can also be called up before the URL is parsed would be needed. I think a way this can work is if in create_engine we do some kind of partial URL parsing, e.g. we look only at the "dbname+driver://" part first, or perhaps we only look at the query string part first, then we locate plugins based on that side of things first and give them the chance to give us a URL object of some kind, and only then when we didn't get a URL object do we fall back to our normal parsing. that way drivers and third party apps can make for more arbitrary formats of URL to come in. this would just involve a little bit more tinkering at the above code.

but again, query params are pretty open ended already so I still think a feature like this should be done the simple way first.

That could be accompanied by an example in the PostgreSQL dialect docs on how to use the creator to handle the multiple host arguments.

@shreyashag
Copy link

Is this still open? I would like to take this up!

@zzzeek
Copy link
Member

zzzeek commented Oct 30, 2019

@shreyashag sure this is open. steps are:

  1. support URL as postgresql+psycopg2://user:pass@/dbname?host=host1:port1&host=host2&host=...
  2. tests added to https://github.com/sqlalchemy/sqlalchemy/blob/master/test/dialect/postgresql/test_dialect.py#L53 , see test_psycopg2_empty_connection_string_w_query_one, test_psycopg2_nonempty_connection_string_w_query etc for example

thanks!

@zzzeek zzzeek added use case not really a feature or a bug; can be support for new DB features or user use cases not anticipated and removed feature labels Oct 30, 2019
@shreyashag
Copy link

Thanks, @zzzeek !

Picking this up!
Will put submit a PR shortly!

@zzzeek zzzeek modified the milestones: 1.3.xx, 1.3.x Dec 18, 2019
@cbueche
Copy link

cbueche commented Apr 24, 2020

Hello devs, any progress on this ? I have a redundant PostgreSQL server setup and would be very interested by this feature, as well with the more options permitted by libpq, like target_session_attrs=read-write.

@zzzeek
Copy link
Member

zzzeek commented Apr 24, 2020

Hi there -

SQLAlchemy is an open source project, which means any "progress" you would see right here so there is no need to ask, and as you will note, this issue is open for any contributors from the open source community to take it on, so you would see when any such contributor has the interest and motivation to see this through. thanks!

@cbueche
Copy link

cbueche commented Apr 27, 2020

Hello zzzeek, I asked because I thought @shreyashag was maybe having some unsubmitted code that I could take over, and make further tests on, possibly ending up in a new feature. My knowledge of Python and SQLAlchemy is too low to pretend writing any code you would not laugh at, so sorry if I do not pursue your "proposal" to take this on. It might look as Easy and good first issue for seasoned developers, but trust me, it already took me a significant effort to understand SQLALchemy and use it in my small web-service. For future guys like me, and until someone with more know-how takes this over, issue 4562 has a workaround in form of passing the connection info to libpq using environment variables. In my particular case, I had to set a few variables to get it working :

SQLALCHEMY_DATABASE_URI = 'postgresql://'
PGPORT, 
PGDATABASE, 
PGUSER, 
PGPASSWORD, 
PGSSLMODE, 
PGSSLROOTCERT, 
PGTARGETSESSIONATTRS = 'any'.

They are all listed in libpq-envars. By the way thanks for your efforts in providing this ORM. Great stuff !

@zzzeek
Copy link
Member

zzzeek commented Apr 27, 2020

@cbueche if it works for you, you can also pass a "creator" callable to create_engine, send your libpq URL right there:

import psycopg2
e = create_engine(
    "postgresql://", 
    creator=lambda: psycopg2.connect(
      "postgresql://host1:123,host2:456/somedb?target_session_attrs=any&application_name=myapp"
)
)

@cbueche
Copy link

cbueche commented Apr 27, 2020

Thx, but in my specific project I'm using Flask-SQLAlchemy which hide the whole connection stuff in SQLALCHEMY_DATABASE_URI so I haven't identified how to use a creator. The env-variable approach works well as it sort of short-circuit all layers between the application and libpq.

@shreyashag
Copy link

Hey @cbueche @zzzeek, sorry have been a little caught up, submitting a PR over the weekend!

sqlalchemy-bot pushed a commit that referenced this issue Sep 30, 2020
Provide support for multiple hosts in the PostgreSQL connection string.

A user requested for SQLAlchemy to support multiple hosts within a PostgreSQL URL string. The proposed fix allows this. In the event that the url contains multiple hosts the proposed code will convert the query["hosts"] tuple into a single string. This allows the hosts to then get converted into a valid dsn variable in the psycopg2 connect function.

This pull request is:

- [ ] A documentation / typographical error fix
	- Good to go, no issue or tests are needed
- [X ] A short code fix
	- please include the issue number, and create an issue if none exists, which
	  must include a complete example of the issue.  one line code fixes without an
	  issue and demonstration will not be accepted.
	- Please include: `Fixes: #<issue number>` in the commit message
	- please include tests.   one line code fixes without tests will not be accepted.
- [ ] A new feature implementation
	- please include the issue number, and create an issue if none exists, which must
	  include a complete example of how the feature would look.
	- Please include: `Fixes: #<issue number>` in the commit message
	- please include tests.

**Have a nice day!**
Fixes: #4392

Closes: #5554
Pull-request: #5554
Pull-request-sha: 3f7a0ab

Change-Id: I3f3768d51b8331de786ffdc025b7ecfc662eafe5
(cherry picked from commit a3640ae933a80f7c98faf6223cd9376c5deb588a)
@zzzeek
Copy link
Member

zzzeek commented Aug 1, 2022

it looks like this has never worked

>>> from sqlalchemy.dialects.postgresql import psycopg2
>>> from sqlalchemy.engine import url
>>> u = url.make_url("postgresql+psycopg2://user:pass@/test?host=host1:port1&host=host2:port2&host=host3:port3")
>>> psycopg2.dialect().create_connect_args(u)
([], {'database': 'test', 'user': 'user', 'password': 'pass', 'host': 'host1:port1,host2:port2,host3:port3'})

is wrong. per the comment at psycopg/psycopg2#602 (comment) it should be:

([], {'database': 'test', 'user': 'user', 'password': 'pass', 'host': 'host1,host2,host3', 'port':'port1,port2,port3'})

cc @CaselIT

@zzzeek zzzeek reopened this Aug 1, 2022
@zzzeek zzzeek added blocker issue that must be resolved asap as it is preventing things from working and removed easy a one / two liner type of thing that anyone can do in short order. also see "fairly easy" labels Aug 1, 2022
@zzzeek zzzeek modified the milestones: 1.3.x, 1.4.x Aug 1, 2022
@zzzeek zzzeek added near-term release addition to the milestone which indicates this should be in a near-term release bug Something isn't working and removed engine engines, connections, transactions, isolation levels, execution options labels Aug 1, 2022
@CaselIT
Copy link
Member

CaselIT commented Aug 1, 2022

oh you went and re-opened this one.

I commented in the other.

personally I think we could follow libpq here and just use host=host1,host2&port=port1,port2 (note that it would require no change since it's currently working) instead of existing syntax

@CaselIT CaselIT removed the blocker issue that must be resolved asap as it is preventing things from working label Aug 1, 2022
@CaselIT
Copy link
Member

CaselIT commented Aug 1, 2022

(not really a blocker since this never worked an no-one reported it for 2 years :) )

@zzzeek
Copy link
Member

zzzeek commented Aug 1, 2022

the syntax does work as long as you have only zero or one "port" specified. so it is very likely people are using this w/ default port.

also the syntax here is emulating libpq's URI syntax which is what people expect.

@zzzeek
Copy link
Member

zzzeek commented Aug 1, 2022

it's a blocker because the feature works until someone needs a non-standard port on more than one host

@CaselIT
Copy link
Member

CaselIT commented Aug 1, 2022

the syntax does work as long as you have only zero or one "port" specified. so it is very likely people are using this w/ default port.

I can't make it work with one port. Only if the port is never specified it works for me

also the syntax here is emulating libpq's URI syntax which is what people expect.

Libpq has no ?host=host1:port1&host=host2:port2 or even ?host=host1:port1,host2:port2 in the query params

it's a blocker because the feature works until someone needs a non-standard port on more than one host

Ok

@CaselIT CaselIT added the blocker issue that must be resolved asap as it is preventing things from working label Aug 1, 2022
@zzzeek
Copy link
Member

zzzeek commented Aug 1, 2022

yeah this is really bad because if any of the hosts "works" without a port, it wastes time trying the bad host in some cases before falling back

try this

from sqlalchemy import create_engine

for url in [
    "postgresql://scott:tiger@/test?host=localhost&host=localhost&host=localhost",
    "postgresql://scott:tiger@/test?host=localhost:5432&host=localhost&host=localhost:5432",
    "postgresql://scott:tiger@/test?host=localhost:5432&host=localhost:5432&host=localhost:5432",
]:

    e = create_engine(url)
    print(f"trying url {url}")
    e.connect().close()

@CaselIT
Copy link
Member

CaselIT commented Aug 1, 2022

this one works by chance "postgresql://scott:tiger@/test?host=localhost:5432&host=localhost&host=localhost:5432". Something like "postgresql://scott:tiger@/test?host=localhost:5432" will break

In any case I'm personally -1 on the host:port thing. I think we could remove the code and document the libpq way of using host and port separated by comma.
At most we could combine multiple host and port in a single string separated by comma.

@sqla-tester
Copy link
Collaborator

Mike Bayer has proposed a fix for this issue in the main branch:

repair psycopg2 (and psycopg) multiple hosts format https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/4017

@sqla-tester
Copy link
Collaborator

Mike Bayer has proposed a fix for this issue in the rel_1_4 branch:

repair psycopg2 (and psycopg) multiple hosts format https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/4018

@zzzeek
Copy link
Member

zzzeek commented Aug 1, 2022

I prefer host:port because it makes it clear how the port is assocaited with the host. libpq raises errors like "can't match 3 hosts to 2 ports" if the two lists dont match up, which IMO points to that format being a bit more clumsy

@CaselIT
Copy link
Member

CaselIT commented Aug 1, 2022

It's documented that it needs to have the same number of values in each of query param, so I guess it's ok if they error

@zzzeek
Copy link
Member

zzzeek commented Aug 1, 2022

yes I know, it's just on their end, it's clumsy that their format necessitates that kind of check.

sqlalchemy-bot pushed a commit that referenced this issue Aug 2, 2022
Fixed issue in psycopg2 dialect where the "multiple hosts" feature
implemented for 🎫`4392`, where multiple ``host:port`` pairs could be
passed in the query string as
``?host=host1:port1&host=host2:port2&host=host3:port3`` was not implemented
correctly, as it did not propagate the "port" parameter appropriately.
Connections that didn't use a different "port" likely worked without issue,
and connections that had "port" for some of the entries may have
incorrectly passed on that hostname. The format is now corrected to pass
hosts/ports appropriately.

As part of this change, maintained support for another multihost style that
worked unintentionally, which is comma-separated
``?host=h1,h2,h3&port=p1,p2,p3``. This format is more consistent with
libpq's query-string format, whereas the previous format is inspired by a
different aspect of libpq's URI format but is not quite the same thing.

If the two styles are mixed together, an error is raised as this is
ambiguous.

Fixes: #4392
Change-Id: Ic9cc0b0e6e90725e158d9efe73e042853dd1263f
(cherry picked from commit 93e6f4f05ba885b16accf0ad811160dd7d0eec70)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker issue that must be resolved asap as it is preventing things from working bug Something isn't working near-term release addition to the milestone which indicates this should be in a near-term release postgresql use case not really a feature or a bug; can be support for new DB features or user use cases not anticipated
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants