Add load_balance_hosts parameter to db config #736
| int max_db_connections = -1; | ||
| int dbname_ofs; | ||
| int pool_mode = POOL_INHERIT; | ||
| int host_strategy = ROUND_ROBIN; |
| int host_strategy = ROUND_ROBIN; | |
| enum HostStrategy host_strategy = ROUND_ROBIN; |
| res_pool_size = atoi(val); | ||
| } else if (strcmp("max_db_connections", key) == 0) { | ||
| max_db_connections = atoi(val); | ||
| } else if (strcmp("host_strategy", key) == 0) { |
This needs an entry in config.md to describe its usage.
Documentation is in. Feedback welcome.
| if (!server->pool->last_connect_failed && server->pool->db->host_strategy == LAST_SUCCESSFUL) | ||
| server->pool->rrcounter = server->pool->last_successful_rrcounter; |
The current PR seems to have the desired effect based on the tests you wrote, but I have the feeling the same could probably be achieved without the two new fields you added: server->rrcounter and pool->last_successful_rrcounter.
How about we simply increase pool->rrcounter whenever a connection attempt fails?
Thanks for the feedback. I simplified the logic as you suggest and it looks better to me.
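For readers following the thread, the simplified single-counter idea can be sketched roughly as below. This is an illustrative Python model, not the PR's C code: the `Pool` class and the `next_host` and `report` methods are hypothetical names, and the exact placement of the increments is an assumption based on the commit messages later in this thread.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical model of a pool that picks a host from a host list."""

    hosts: list
    host_strategy: str = "round_robin"  # or "last_successful"
    rrcounter: int = 0
    last_connect_failed: bool = False

    def next_host(self) -> str:
        # last_successful: only move on when the previous attempt failed.
        if self.host_strategy == "last_successful" and self.last_connect_failed:
            self.rrcounter += 1
        host = self.hosts[self.rrcounter % len(self.hosts)]
        # round_robin: always advance after choosing, so the first
        # connection still starts at the left-most host.
        if self.host_strategy == "round_robin":
            self.rrcounter += 1
        return host

    def report(self, ok: bool) -> None:
        # Record the outcome of the most recent connection attempt.
        self.last_connect_failed = not ok


pool = Pool(hosts=["primary", "replica"], host_strategy="last_successful")
first = pool.next_host()   # left-most host
pool.report(False)         # pretend the connection failed
second = pool.next_host()  # moves on to the next host
pool.report(True)
third = pool.next_host()   # sticks with the host that worked
```

With a single counter, "sticking" to a host simply means not advancing the counter, which is why the two extra fields are no longer needed in this model.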
6087208 to
5c93a29
|
@JelteF Touching base. Happy to incorporate any additional feedback on the new implementation. |
| db->res_pool_size = res_pool_size; | ||
| db->pool_mode = pool_mode; | ||
| db->max_db_connections = max_db_connections; | ||
| db->host_strategy = host_strategy; |
If the host_strategy of a database can change, should we tag it as changed?
That's a good call. Yes, I think we should tag it so behavior changes after a reload. I will make that change. Thanks for the feedback!
host_strategy is now reloadable.
show databases/show pools also now include host_strategy in their output, which makes this testable and is useful on its own.
b35c9a6 to
5d3104f
|
I think the code in the PR is fine now (apart from the merge conflict that occurred due to another merged PR). But after playing with these changes a bit I found one other issue, and I'm wondering how useful this feature is without also addressing this issue: @dpirotte How are you using this change, that it is useful in its current form? |
|
Hi @dpirotte, github notified me that you rebased this on a branch of your fork. I think a rebased version of this can be merged pretty easily. Could you still explain how you use this in practice? Because when I played with it, another bug in pgbouncer was causing serious issues for me. |
|
Hi @JelteF. Yeah, I’m working on rebasing this and another patch on top of 1.19. The code works but I haven’t updated to your new (much improved!) test framework.
The purpose here is to support faster failovers in environments where 1/N hosts in the list will be the primary and (N-1)/N hosts will be replicas and where DNS updates are potentially too slow. RDS/Aurora is one such environment, and I believe some Patroni configurations are as well.
Given the following configurations:
After a failover,
A key part here is the
I have another patch that adds lightweight
Another option would be to keep the round-robin behavior and make
I’ll bump once the tests are ported to pytest and we’ll go from there. |
|
@JelteF Rebased |
| Set the pool mode specific to this database. If not set, | ||
| the default `pool_mode` is used. | ||
|
|
||
| ### host_strategy |
I think it would make sense to change the name of the option to load_balance_hosts. This option got added to libpq for PG16 (by me). It seems nice to use consistently named options across projects. The last_successful option name should then be changed to disable. And let's change round_robin to round-robin, to be consistent with other arguments like verify-full and read-write.
PS. I especially think consistency is nice here, because you mentioned you also want to add target_session_attrs to PgBouncer. Since target_session_attrs and load_balance_hosts can be used nicely together, I think it's especially useful to have the same names for both Postgres and PgBouncer.
Btw, adding random as an option would be a nice future improvement to make sure PgBouncer supports the same options as libpq. But that should definitely be a separate follow-up PR. We can merge this PR easily without the random functionality.
I think it would make sense to change the name of the option to load_balance_hosts. This option got added to libpq for PG16 (by me).
Ha, good call. I noticed that you contributed this feature in the release notes. +100 to consistent naming with libpq. I'll take care of that.
I prefer the initial naming. It seems clear. load_balance_hosts seems to be a misnomer, since no "load" and no "balancing" is involved?
I prefer the initial naming. It seems clear.
I think the initial name was a fine name too. But I think being in line with naming of libpq, jdbc and npgsql is a big advantage of the load_balance_hosts naming.
load_balance_hosts seems to be a misnomer, since no "load" and no "balancing" is involved?
The "load" is the number of connections that are made by pgbouncer to the database, and those connections are "balanced" across the different servers in a round-robin fashion (or random fashion when implemented in the future). And you can disable this balancing by using disable, which will select a working host and send all traffic there.
IMHO, the old naming describes how pgbouncer chooses hosts. The new naming describes what the user will see happening to their setup by changing the setting. I think I personally like names that describe the effects instead of the method.
Title changed: "host_strategy parameter to db config" to "load_balance_hosts parameter to db config"
|
@JelteF Updated naming per discussion above. The macOS build passes now. The mingw{32,64} builds failed a couple times but seem to pass now, so those might have been random failures. The most recent build failures look like Cirrus issues: |
| hostlist2 = port=6666 host=127.0.0.1,127.0.0.1 dbname=p0 user=bouncer | ||
| hostlist_good_first = port=6666 host=127.0.0.1,127.0.0.3 dbname=p0 user=bouncer load_balance_hosts=disable | ||
| hostlist_bad_first = port=6666 host=unresolvable-hostname,127.0.0.1 dbname=p0 user=bouncer load_balance_hosts=disable | ||
| load_balance_hosts_update = port=6666 host=127.0.0.1,127.0.0.3 dbname=p0 user=bouncer load_balance_hosts=disable |
The load_balance_hosts_update database isn't used (afaict). Did you have a test in mind for it?
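To make the shape of the database entries in the hunk above concrete, here is a deliberately simplified Python sketch that pulls the host list and the `load_balance_hosts` value out of one such line. It is not PgBouncer's parser (which is written in C and handles quoting): `parse_db_line` and `VALID_MODES` are hypothetical names, and the default of `round-robin` is assumed from the discussion in this thread.

```python
# Values discussed in this thread; "random" is explicitly left for a follow-up.
VALID_MODES = {"round-robin", "disable"}


def parse_db_line(line):
    """Split 'name = key=value key=value ...' into (name, hosts, mode)."""
    name, _, params = line.partition("=")
    # Naive whitespace split: assumes no quoted values containing spaces.
    opts = dict(item.split("=", 1) for item in params.split())
    mode = opts.get("load_balance_hosts", "round-robin")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported load_balance_hosts value: {mode}")
    return name.strip(), opts.get("host", "").split(","), mode


entry = parse_db_line(
    "hostlist_good_first = port=6666 host=127.0.0.1,127.0.0.3 "
    "dbname=p0 user=bouncer load_balance_hosts=disable"
)
```

Here `entry` holds the database name, the two hosts in list order, and the mode `disable`.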
| When a comma-separated list is specified in `host`, `load_balance_hosts` controls | ||
| which entry is chosen for a new connection. |
It would be great if this same logic would be used when receiving multiple IP addresses from DNS. The easiest way to test this is to create lines in /etc/hosts. libpq's version of load_balance_hosts does this too. So if we don't implement this, we should at least make it explicit in the docs that this setting currently doesn't impact DNS-based load balancing (but that this might change in the future).
Yeah. I'm wondering if this feature also includes #448, or whether we should at least keep it in mind.
Interesting, I didn't see that PR before (I guess I didn't go more than 2 years into PR history when joining the project). Right now #448 is completely distinct from this PR:
1. This PR does not implement the `disable` mode for DNS load balancing. It still uses round robin even if `disable` is used.
2. This PR does not implement the `random` mode. PR add server_shuffle_hosts #448 implements `random` mode for DNS load balancing only, not for host lists (certainly because host lists were not available in PgBouncer at the time that PR was written).
The design of the config option would allow for implementing both 1 and 2 though. I think 2 isn't required to implement for this PR. A completely missing option isn't very confusing. But I think 1 is quite confusing because the existing option does not apply to all the places that you would expect, so preferably this PR would implement 1 too. If that's hard for some reason, then I think a note in the docs is also sufficient.
Hey @JelteF,
I work with the author and he's been a bit too busy to finish responding to your feedback so I'd like to take a stab at it.
I don't quite understand 1. Maybe I don't understand pgbouncer internals well enough yet but can you give me some pointers on implementing the disable mode for DNS load balancing? Sorry for such a vague ask for clarification but I do want to complete this feature as soon as I can.
So to clarify: PgBouncer does round-robin load balancing in two ways:
1. If you have multiple hostnames in the `host` key, then it will round-robin across these.
2. If a single DNS entry returns multiple IPs, then it load balances using round-robin across these IPs: https://www.pgbouncer.org/faq.html#how-to-load-balance-queries-between-several-servers
This same thing is true for libpq and load_balance_hosts can be used to control the behaviour for both of these there. The implementation of load_balance_hosts in this PR only controls the behaviour of 1, not 2. And I think it should control both.
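The two layers described above can be modelled with a small Python simulation. This is only a sketch of the distinction being drawn, not PgBouncer's implementation: `simulate`, `balance_hosts`, and `balance_ips` are hypothetical names, the DNS table is a plain dict, and every connection attempt is assumed to succeed.

```python
def simulate(host_key, dns_table, attempts, balance_hosts=True, balance_ips=True):
    """Return the address picked for each of `attempts` new connections."""
    hosts = host_key.split(",")
    host_rr = 0   # layer 1: position in the comma-separated host list
    ip_rr = {}    # layer 2: per-hostname position in the resolved IP list
    picked = []
    for _ in range(attempts):
        name = hosts[host_rr % len(hosts)]
        if balance_hosts:
            host_rr += 1
        # Names missing from the table are treated as literal addresses.
        ips = dns_table.get(name, [name])
        i = ip_rr.get(name, 0)
        picked.append(ips[i % len(ips)])
        if balance_ips:
            ip_rr[name] = i + 1
    return picked


dns = {"db": ["10.0.0.1", "10.0.0.2"]}
# Layer 1 switched off, layer 2 still rotating: the situation the review
# comment describes for this PR.
still_rotating = simulate("db", dns, 3, balance_hosts=False, balance_ips=True)
# Both layers switched off: what controlling both would look like.
pinned = simulate("db", dns, 3, balance_hosts=False, balance_ips=False)
```

In this model, `still_rotating` alternates between the two IPs even though host-list balancing is off, while `pinned` stays on the first IP.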
I was mistaken. Looks like getent hosts XX returns all IPs listed in /etc/hosts for XX.
OK then the only thing that makes this hard @JelteF is my own understanding. Is this where the DNS rr needs to be disabled? https://github.com/cosgroveb/pgbouncer/blob/host-strategy-cos-rebased/src/objects.c#L1647-L1649
Okay, while getent hosts XX returns all the IPs listed in /etc/hosts, should I expect to see all the IPs in /etc/hosts for my database host? I put together this demonstration:
def test_load_balance_hosts_disable_with_dns(bouncer, pg):
    bouncer.default_db = "dns_load_balance_hosts_disable"
    hosts = f"""
127.0.0.53 dnsdbhost
127.0.0.54 dnsdbhost
127.0.0.55 dnsdbhost
    """
    # I have modified utils.py to set listen_addresses='*' for this
    # demonstration, but would use pg.configure/pg.reload for a real test
    with bouncer.run_with_appended_etc_hosts(hosts):
        subprocess_result = capture(
            ["getent", "hosts", "dnsdbhost"],
        )
        getent_result = subprocess_result.split("\n")
        assert "127.0.0.53 dnsdbhost" == getent_result[0]
        assert "127.0.0.54 dnsdbhost" == getent_result[1]
        assert "127.0.0.55 dnsdbhost" == getent_result[2]
        bouncer.sql(query="SELECT 1", user="bouncer", password="zzzz", dbname="dns_load_balance_hosts_disable")
        with bouncer.admin_runner.cur() as cur:
            results = cur.execute("SHOW DNS_HOSTS").fetchall()
            result = [r for r in results if r[0] == 'dnsdbhost'][0]
            assert result[2] != "127.0.0.53:0"

______________________________________ test_load_balance_hosts_disable_with_dns _______________________________________
/home/admin/code/pgbouncer/test/test_load_balance_hosts.py:70: in test_load_balance_hosts_disable_with_dns
assert result[2] != "127.0.0.53:0"
E AssertionError: assert '127.0.0.53:0' != '127.0.0.53:0'
bouncer = <test.utils.Bouncer object at 0x7f0499fbb2e0>
cur = <psycopg.Cursor [closed] [BAD] at 0x7f049b445460>
getent_result = ['127.0.0.53 dnsdbhost', '127.0.0.54 dnsdbhost', '127.0.0.55 dnsdbhost', '127.0.0.53 dnsdbhost', '127.0.0.54 dnsdbhost', '127.0.0.55 dnsdbhost', ...]
hosts = '\n127.0.0.53 dnsdbhost\n127.0.0.54 dnsdbhost\n127.0.0.55 dnsdbhost\n '
pg = <test.utils.Postgres object at 0x7f0499fbbac0>
result = ('dnsdbhost', 14, '127.0.0.53:0')
results = [('dnsdbhost', 14, '127.0.0.53:0')]
subprocess_result = '127.0.0.53 dnsdbhost\n127.0.0.54 dnsdbhost\n127.0.0.55 dnsdbhost\n127.0.0.53 dnsdbhost\n127.0.0.54 dnsdbhost\n127.0.0.55 dnsdbhost\n'
Because the work on the related feature, target_session_attrs, is substantially complete and in need of review, I do favor merging this branch as soon as possible. As you can see, the testing situation is a bit icky at the moment for the DNS sub-feature of load_balance_hosts, but with some time I can complete it. It would be super beneficial to me if we could start working on getting target_session_attrs into pgbouncer in parallel, and I think it would be to other users as well.
I'll get push access to @dpirotte's fork, push up a more recent rebase, and address your docs feedback below: it is recommended to set server_login_retry lower than the default
657a815 to
af174ed
|
@JelteF I rebased this and addressed the docs feedback above. Let me know if you run into any issues merging this! |
PGBouncer's default behavior is to round-robin between comma-separated hosts on each new connection attempt. This is desirable when all hosts are similar, such as load balancing over multiple replicas, but undesirable when only one host out of the list is expected to successfully connect and login. In this scenario, every other connection attempt would hit an invalid host and fail. The `host_strategy` parameter controls this host selection behavior. The default behavior is labeled `round_robin` and a new strategy called `last_successful` is introduced. This new strategy instructs PGBouncer to prefer the host with most recent successful connections. When that host fails, new connections will round-robin through the list until a successful connection occurs.
In order to maintain behavior that connections always start from the left-most host in the host list, increment rrcounter either before choosing a host (in the case of last_successful) or after choosing a host (in the case of round_robin). (This incorporates PR feedback from @JelteF.)
Previously, host_strategy could only be set at pgbouncer startup, which is unintuitive and inconsistent with other per-database configuration. Also, add host_strategy to both `show databases` and `show pools` output for visibility into the current setting, and to facilitate testing that the configuration properly reloads.
Previously, the host_strategy tests used 127.0.0.2,127.0.0.1, which had a "bad IP" (127.0.0.2) as the first entry to force the first backend server connection to fail and verify that `host_strategy=last_successful` directs new connections to the second "good" entry. macOS doesn't treat 127.0.0.2 in the same manner as Linux by default, so we need a different way to fail the first host in the list. Switching to an unresolvable DNS hostname works fine on both Linux and macOS.
I incorrectly ported these tests from bash to pytest and so they were not actually verifying that subsequent connections were reusing the last successful host in the list. The tests now verify behavior correctly.
This feature does almost the same thing as the upcoming libpq param called `load_balance_hosts`, so use that name for consistency and rename the configuration options accordingly (`last_successful` => `disable`)
Lost in translation from bash tests to pytest
ff49e96 to
829ff98
JelteF
left a comment
I'll need to give this another round of testing, but from what I remember the functionality worked as advertised in the past.
I left a few small comments on the documentation. Also, the formatting check is still failing.
|
Thanks @JelteF. Will keep an eye on CI. |
Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
3d201c8 to
0d2835c
|
Hey @JelteF did you run into any trouble testing this? Just checking as I'm rebasing and prepping the |
|
Sorry for the delay here. I'll try to play around with this soonish. Aside from that, since today there's now a discord channel to discuss PgBouncer development. It would be nice if you could join that: https://discordapp.com/channels/1258108670710124574/1300532992304742481 |
|
@JelteF can you link the Discord server? I wonder if I need to join the server first. Your channel link takes me here:
|
|
Aha it must be PostgreSQL Hackers https://lnkd.in/g8n3dZfx |
PGBouncer's default behavior is to round-robin between comma-separated hosts on each new connection attempt. This is desirable when all hosts are similar, such as load balancing over multiple replicas, but undesirable when only one host out of the list is expected to successfully connect and login. In this scenario, every other connection attempt would hit an invalid host and fail. The `host_strategy` parameter controls this host selection behavior. The default behavior is labeled `round_robin` and a new strategy called `last_successful` is introduced. This new strategy instructs PGBouncer to prefer the host with most recent successful connections. When that host fails, new connections will round-robin through the list until a successful connection occurs. --------- Co-authored-by: Brian Cosgrove <cosgroveb@gmail.com>
This was missed in pgbouncer#736

PGBouncer's default behavior is to round-robin between comma-separated
hosts on each new connection attempt. This is desirable when all hosts
are similar, such as load balancing over multiple replicas, but
undesirable when only one host out of the list is expected to
successfully connect and login. In this scenario, every other connection
attempt would hit an invalid host and fail.
The `host_strategy` parameter controls this host selection behavior. The
default behavior is labeled `round_robin` and a new strategy called
`last_successful` is introduced. This new strategy instructs PGBouncer
to prefer the host with most recent successful connections. When that
host fails, new connections will round-robin through the list until a
successful connection occurs.