Support for multiple hosts in postgresql connection string #4392
Comments
|
OK, we can do this, but that URL format would be impossible to integrate because it is not RFC-1738 (unless it is and I'm just ignorant of that variety, though the doc you point out says "URIs generally follow RFC 3986, except that multi-host connection strings are allowed"). We have a homegrown URL parser which I'd prefer not to mess with at this point. Also, the doc you refer to says "the host, hostaddr, and port options accept a comma-separated list of values", so user:pass is not part of it. Propose Also note that if you have an application that needs to do this right now, you can use creator, so this capability is present now, just not nicely within the URL. |
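The comma-separated form quoted from the libpq docs pairs the Nth `host` entry with the Nth `port` entry. A minimal sketch that builds such a libpq keyword/value string from (host, port) pairs; `libpq_conninfo` is a hypothetical helper, and it does no quoting, so it assumes plain hostnames and values without spaces or quotes.

```python
from typing import List, Optional, Tuple


def libpq_conninfo(
    hosts_ports: List[Tuple[str, Optional[int]]],
    dbname: str,
    user: Optional[str] = None,
) -> str:
    """Build a libpq keyword/value string with comma-separated host/port lists.

    The Nth item of ``port`` belongs to the Nth item of ``host``; a port of
    None becomes an empty list item, which libpq reads as "default port".
    """
    hosts = ",".join(host for host, _ in hosts_ports)
    ports = ",".join("" if port is None else str(port) for _, port in hosts_ports)
    parts = [f"host={hosts}"]
    if any(port is not None for _, port in hosts_ports):
        # Omit "port" entirely when no host names one; an unquoted empty
        # value would confuse libpq's keyword/value parser.
        parts.append(f"port={ports}")
    parts.append(f"dbname={dbname}")
    if user is not None:
        parts.append(f"user={user}")
    # No quoting or escaping: values are assumed to contain no spaces,
    # quotes or backslashes.
    return " ".join(parts)


print(libpq_conninfo([("host1", 5432), ("host2", None), ("host3", 5433)], "test", "scott"))
# host=host1,host2,host3 port=5432,,5433 dbname=test user=scott
```

A string like this can be handed to `psycopg2.connect()` as its dsn, which passes it through to libpq.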
|
Thanks! I didn't know I could use the creator. Will give it a try. The connection string sample was misquoted; just corrected. :) |
|
Well, anyone can volunteer to implement this one; it's pretty easy... |
|
Hi there. I'm here for the same issue. Actually, another workaround format for a multi-host connection string is: And we should also take into account that connecting SQLAlchemy via a URL is a really common pattern inside some libraries and frameworks, like |
|
@wronglink this goes beyond "easy to read". We are talking about the URL object, which is intended to provide a "good enough" URL not just for PostgreSQL using libpq, but for any other database or database driver as well. Consider that this object presents a unified view of the common elements we use to connect to a database, including scalar ".host" and ".port" attributes. Turning these into lists to suit a specific feature of a single driver on a single database platform is not appropriate; it would add confusion and ambiguity to the URL object for all other platforms.

An obvious alternative to turning ".host" and ".port" into lists is to keep .host and .port and add new collections like .additional_hosts and .additional_ports, which is more awkward but doesn't break backwards compatibility for the world. Still, complexities arise: what if the .additional_hosts and .additional_ports lists are not the same length? This is also an issue with the approach you suggest of "?host=a&host=b&port=a&port=b". OK, perhaps this should be a list of tuples, .additional_host_ports or something like that, so if a port were missing from one of the host specifications, it's just blank.

So, other than the parsing and generation having to be rewritten, all the new tests that would need to be written, and an awkward ".additional_host_ports" attribute that seems pretty hardcoded to one particular driver, that's how it would look. All of this effort is unnecessary with my proposal that we keep this simple, localized to the psycopg2 module, and use the existing query string functionality. As I am already months if not years behind in the workload of issues that need to be attended to, I'm not in a position to add these efforts to that workload when a much simpler approach is easily available.
After all that effort and rewriting, we would still have bolted straight into URL a whole way of working that is entirely arbitrary and specific to one driver. Don't other drivers support multiple hosts? Well, sure. The main example that comes to mind is ODBC for SQL Server. The pyodbc client driver doesn't support any of that, so the issue is kind of moot, but if it did, note that in SQL Server's case the multiple hosts actually define a more complex concept of primary and failover hosts. That means if we in theory wanted our URL to work with those, all the work we did to support multiple hosts would still be useless, because the different hosts need to be qualified as to what their role is.

To be more general: if lots of DBAPIs supported multiple host/port combinations, this would be much easier to do. But they don't, so how such a feature should be generalized for other drivers is unknown. When we don't know how an API should be organized, the best answer is to avoid guessing, which I'm sensitive to because I'm typically the person who has to rip out bad decisions. My proposal avoids this guessing by keeping the change entirely localized to the psycopg2 dialect. In the general sense, a driver that can connect to multiple hosts may need to define any number of qualities for each host, like weighting, failover, failback, etc. It would look pretty amateurish if SQLAlchemy's URL bolted on a multiple-host/port concept that doesn't even work in the general sense. The fact is, if you truly want to use PostgreSQL's native URL format exactly as written, that is not a problem; that's why create_engine accepts the creator argument. Take the URL and pass it right through to psycopg2.
This proposal does not impact that ability in any way; you'd be able to pass ?host&port along on the URL as you normally would. There are even ways to plug URL parsing into create_engine() right now, using engine strategies, so you could intercept the custom PG-style URL up front and do what you want with it. I would sooner make these URL interception strategies more robust and documented, so that custom URL parsing schemes can be plugged in, than hardcode this onto URL itself. |
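The list-of-tuples idea raised in this comment (a hypothetical `.additional_host_ports`, where a missing port is just blank) can be sketched as a small parser. `parse_host_ports` is an illustrative name, not part of SQLAlchemy's URL object; it assumes plain hostnames or IPv4 addresses, not bracketed IPv6 literals.

```python
from typing import List, Optional, Tuple


def parse_host_ports(values: List[str]) -> List[Tuple[str, Optional[int]]]:
    """Turn ["h1:5432", "h2"] into [("h1", 5432), ("h2", None)].

    Keeping each port next to its own host means two separate lists can
    never disagree in length; a host with no port simply gets None.
    """
    pairs: List[Tuple[str, Optional[int]]] = []
    for value in values:
        host, sep, port = value.rpartition(":")
        if not sep:
            # No colon: the whole value is the hostname.
            pairs.append((value, None))
        else:
            # "h1:" (empty port) is treated the same as no port at all.
            pairs.append((host, int(port) if port else None))
    return pairs
```

The design choice is that a missing port is represented explicitly rather than by a shorter list, which is the ambiguity the comment points out with parallel `.additional_hosts` / `.additional_ports` collections.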
|
I agree with @zzzeek's implementation idea, but wanted to suggest an alternative that may take the same amount of work but offer wider benefits. What if the That could be accompanied by an example in the PostgreSQL dialect docs on how to use the |
You mean if "creator" were perhaps the name of another entrypoint-linked module; we have a lot of entrypoint stuff going on already. Let me publicize that there is already a plugin system for engines that allows any module or package to be called up based on an entrypoint name, where it can then affect the process of building up the engine in any way it needs. I would have suggested using this immediately here, except that at the moment the plugin system still works after the URL has been parsed. So a system by which the but again, query params are pretty open-ended already, so I still think a feature like this should be done the simple way first.
|
|
Is this still open? I would like to take this up! |
|
@shreyashag sure, this is open. Steps are:
thanks! |
|
Thanks, @zzzeek ! Picking this up! |
|
Hello devs, any progress on this? I have a redundant PostgreSQL server setup and would be very interested in this feature, as well as in the additional options permitted by libpq, like |
|
Hi there. SQLAlchemy is an open source project, which means any "progress" would be visible right here, so there is no need to ask. As you will note, this issue is open for any contributor from the open source community to take on, so you would see it when such a contributor has the interest and motivation to see this through. Thanks! |
|
Hello zzzeek, I asked because I thought @shreyashag might have some unsubmitted code that I could take over and test further, possibly ending up in a new feature. My knowledge of Python and SQLAlchemy is too limited for me to write any code you would not laugh at, so sorry if I do not pursue your "proposal" to take this on. It might look as They are all listed in libpq-envars. By the way, thanks for your efforts in providing this ORM. Great stuff! |
|
@cbueche if it works for you, you can also pass a "creator" callable to create_engine and send your libpq URL right there:
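A minimal sketch of the creator approach, assuming psycopg2 and SQLAlchemy are installed: `psycopg2.connect()` receives the libpq URI unchanged, so libpq's own multi-host syntax (user info written once, then a comma-separated `host:port` list) applies. The hosts, credentials and the `make_engine` helper are placeholders for illustration.

```python
# Placeholder hosts and credentials; user info appears once, before the host list.
LIBPQ_URL = "postgresql://scott:tiger@host1:5432,host2:5433/test"


def make_engine(libpq_url: str = LIBPQ_URL):
    # Imported inside the function so this snippet can be loaded even where
    # the driver is not installed; create_engine itself imports psycopg2.
    import psycopg2
    from sqlalchemy import create_engine

    def connect():
        # libpq parses the multi-host URI itself and tries the hosts in order.
        return psycopg2.connect(libpq_url)

    # With creator set, the URL passed here mainly selects the dialect;
    # the actual connections come from connect().
    return create_engine("postgresql+psycopg2://", creator=connect)
```

Usage would be `engine = make_engine()` followed by `with engine.connect() as conn: ...` as with any other engine.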
|
Thx, but in my specific project I'm using Flask-SQLAlchemy, which hides the whole connection stuff in |
Provide support for multiple hosts in the PostgreSQL connection string.

A user requested that SQLAlchemy support multiple hosts within a PostgreSQL URL string. The proposed fix allows this. In the event that the URL contains multiple hosts, the proposed code converts the query["hosts"] tuple into a single string. This allows the hosts to then be converted into a valid dsn variable in the psycopg2 connect function.

This pull request is:
- [ ] A documentation / typographical error fix
  - Good to go, no issue or tests are needed
- [X] A short code fix
  - please include the issue number, and create an issue if none exists, which must include a complete example of the issue. one line code fixes without an issue and demonstration will not be accepted.
  - Please include: `Fixes: #<issue number>` in the commit message
  - please include tests. one line code fixes without tests will not be accepted.
- [ ] A new feature implementation
  - please include the issue number, and create an issue if none exists, which must include a complete example of how the feature would look.
  - Please include: `Fixes: #<issue number>` in the commit message
  - please include tests.

**Have a nice day!**

Fixes: #4392
Closes: #5554
Pull-request: #5554
Pull-request-sha: 3f7a0ab
Change-Id: I3f3768d51b8331de786ffdc025b7ecfc662eafe5
(cherry picked from commit a3640ae933a80f7c98faf6223cd9376c5deb588a)
|
It looks like this has never worked:

```python
>>> from sqlalchemy.dialects.postgresql import psycopg2
>>> from sqlalchemy.engine import url
>>> u = url.make_url("postgresql+psycopg2://user:pass@/test?host=host1:port1&host=host2:port2&host=host3:port3")
>>> psycopg2.dialect().create_connect_args(u)
([], {'database': 'test', 'user': 'user', 'password': 'pass', 'host': 'host1:port1,host2:port2,host3:port3'})
```

This is wrong. Per the comment at psycopg/psycopg2#602 (comment), it should be: cc @CaselIT |
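In the output above, each `host:port` string ends up inside libpq's `host` list, so the ports are never passed as ports. A minimal sketch of the intended conversion, splitting each value into libpq's parallel `host` and `port` lists; `split_host_port_values` is a hypothetical name, not the dialect's actual code, and it does not handle bracketed IPv6 addresses.

```python
from typing import Dict, List


def split_host_port_values(values: List[str]) -> Dict[str, str]:
    """Convert repeated ?host=h:p values into libpq's parallel host/port lists."""
    hosts: List[str] = []
    ports: List[str] = []
    for value in values:
        host, _, port = value.partition(":")
        hosts.append(host)
        # An empty item means "default port" to libpq.
        ports.append(port)
    result = {"host": ",".join(hosts)}
    if any(ports):
        # Only emit "port" when at least one entry actually named a port.
        result["port"] = ",".join(ports)
    return result


print(split_host_port_values(["host1:5432", "host2:5433", "host3:5434"]))
# {'host': 'host1,host2,host3', 'port': '5432,5433,5434'}
```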
|
Oh, you went and reopened this one. I commented in the other. Personally, I think we could follow libpq here and just use |
|
(Not really a blocker, since this never worked and no one reported it for 2 years :) ) |
|
The syntax does work as long as you have only zero or one "port" specified, so it is very likely people are using this with the default port. Also, the syntax here emulates libpq's URI syntax, which is what people expect. |
|
It's a blocker because the feature works until someone needs a non-standard port on more than one host.
I can't make it work with one port. It only works for me if the port is never specified.
Libpq has no
Ok |
|
Yeah, this is really bad, because if any of the hosts "works" without a port, in some cases it wastes time trying the bad host before falling back. Try this:

```python
from sqlalchemy import create_engine

# The same server listed three times; the URLs differ only in which
# entries carry an explicit ":5432".
for url in [
    "postgresql://scott:tiger@/test?host=localhost&host=localhost&host=localhost",
    "postgresql://scott:tiger@/test?host=localhost:5432&host=localhost&host=localhost:5432",
    "postgresql://scott:tiger@/test?host=localhost:5432&host=localhost:5432&host=localhost:5432",
]:
    e = create_engine(url)
    # Printed before connecting, so a slow or failing URL is easy to spot.
    print(f"trying url {url}")
    e.connect().close()
```
|
This one works by chance. In any case, I'm personally -1 on the host:port thing. I think we could remove the code and document the libpq way of passing host and port as comma-separated lists. |
|
Mike Bayer has proposed a fix for this issue in the main branch: repair psycopg2 (and psycopg) multiple hosts format https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/4017 |
|
Mike Bayer has proposed a fix for this issue in the rel_1_4 branch: repair psycopg2 (and psycopg) multiple hosts format https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/4018 |
|
I prefer host:port because it makes it clear how the port is associated with the host. libpq raises errors like "can't match 3 hosts to 2 ports" if the two lists don't match up, which IMO points to that format being a bit more clumsy. |
|
It's documented that each query param needs to have the same number of values, so I guess it's OK if they error. |
|
Yes, I know; it's just that, on their end, it's clumsy that their format necessitates that kind of check.
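To make the count check concrete, here is a sketch of how comma-separated host and port values pair up under libpq's documented rule: `port` lists one entry per host, or a single port used for every host (an empty item means the default port). `pair_libpq_lists` and its error text are illustrative, not libpq's or SQLAlchemy's code.

```python
from typing import List, Tuple


def pair_libpq_lists(host: str, port: str = "") -> List[Tuple[str, str]]:
    """Pair comma-separated host and port values, libpq-style."""
    hosts = host.split(",")
    ports = port.split(",") if port else [""]
    if len(ports) == 1:
        # A single port (or none at all) applies to every host.
        ports = ports * len(hosts)
    elif len(ports) != len(hosts):
        # Illustrative message; libpq reports this mismatch in its own words.
        raise ValueError(f"could not match {len(ports)} ports to {len(hosts)} hosts")
    return list(zip(hosts, ports))


print(pair_libpq_lists("h1,h2,h3", "5432"))
# [('h1', '5432'), ('h2', '5432'), ('h3', '5432')]
```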
Fixed issue in psycopg2 dialect where the "multiple hosts" feature implemented for 🎫`4392`, where multiple ``host:port`` pairs could be passed in the query string as ``?host=host1:port1&host=host2:port2&host=host3:port3``, was not implemented correctly, as it did not propagate the "port" parameter appropriately. Connections that didn't use a different "port" likely worked without issue, and connections that had "port" for some of the entries may have incorrectly passed on that hostname. The format is now corrected to pass hosts/ports appropriately.

As part of this change, support was maintained for another multihost style that worked unintentionally, which is comma-separated ``?host=h1,h2,h3&port=p1,p2,p3``. This format is more consistent with libpq's query-string format, whereas the previous format is inspired by a different aspect of libpq's URI format but is not quite the same thing.

If the two styles are mixed together, an error is raised, as this is ambiguous.

Fixes: #4392
Change-Id: Ic9cc0b0e6e90725e158d9efe73e042853dd1263f
(cherry picked from commit 93e6f4f05ba885b16accf0ad811160dd7d0eec70)
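The mixed-style check described in this commit message could look roughly like the following sketch. It works on a dict of lists such as `urllib.parse.parse_qs` returns; the function name, the return labels and the exact rule for what counts as "mixed" are assumptions, not the dialect's actual code.

```python
from typing import Dict, List
from urllib.parse import parse_qs


def multihost_style(query: Dict[str, List[str]]) -> str:
    """Classify how multiple hosts are given, rejecting ambiguous mixes."""
    hosts = query.get("host", [])
    ports = query.get("port", [])
    # Style 1: repeated ?host=h1:p1&host=h2:p2 entries.
    uses_pairs = len(hosts) > 1 or any(":" in h for h in hosts)
    # Style 2: ?host=h1,h2&port=p1,p2 comma lists, as libpq uses.
    uses_commas = any("," in v for v in hosts + ports)
    if uses_pairs and (uses_commas or ports):
        raise ValueError(
            "ambiguous multihost format: use either host:port pairs "
            "or comma-separated host/port lists"
        )
    if uses_pairs:
        return "host:port pairs"
    if uses_commas:
        return "comma-separated"
    return "single host"


print(multihost_style(parse_qs("host=h1,h2,h3&port=5432,5433,5434")))
# comma-separated
```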
PostgreSQL supports connecting to multiple hosts in a connection string.
https://www.postgresql.org/docs/current/static/libpq-connect.html#libpq-multiple-hosts
A multi-host connection string looks like:
postgresql+psycopg2://user:password@host1:port1,user:password@host2:port2/dbname
Unfortunately, SQLAlchemy currently parses password@host1:port1,user:password out as the password when trying to connect. In fact, both libpq and psycopg2 (psycopg/psycopg2#602) already support multiple hosts. We might want to add code to engine/url.py to support this in SQLAlchemy.
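SQLAlchemy uses its own URL parser, but Python's standard `urllib.parse` shows the same effect: a generic parser splits the user info from the host at the last `@`, so everything before it becomes the password. Ports 5432/5433 stand in for `port1`/`port2` so the example parses.

```python
from urllib.parse import urlsplit

# The multi-host URL from the issue, with numeric ports filled in.
url = "postgresql+psycopg2://user:password@host1:5432,user:password@host2:5433/dbname"
parts = urlsplit(url)

# userinfo is everything before the *last* "@" in the netloc.
print(parts.username)  # user
print(parts.password)  # password@host1:5432,user:password
print(parts.hostname)  # host2
print(parts.port)      # 5433
print(parts.path)      # /dbname
```

libpq's own multi-host URI writes the user info once, as in `postgresql://user:password@host1:5432,host2:5433/dbname`.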