Skip to content

Deprecate online restart #894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 19, 2023
Merged

Conversation

JelteF
Copy link
Member

@JelteF JelteF commented Jul 17, 2023

We decided to deprecate online restarts. They have many known problems:

  1. It does not work with TLS connections. Most production PgBouncer
    setups use TLS connections in some way.
  2. application_name is not copied over (it's not part of SHOW FDS like
    the other tracked GUCs)
  3. Any data that's in the SBuf its iobuf is lost during takeover
    (this seems quite bad)
  4. It shows this warning during tests right after takover:
    `WARNING C-0x5620eb57bd90: p1/postgres@127.0.0.1:57066 FIXME: transaction end, but xact_start == 0
    
  5. If the config file is different, but has not been loaded yet. Then
    many things will almost certainly break.
  6. The parameters in track_extra_parameters are not copied over to the
    new process.
  7. It does not work well with SystemD.

The new recommended way to do online restarts is by doing a rolling
restart of PgBouncer processes that listen on the same port using
so_reuseport.

This adds a warning to to the docs about this. And starts reporting a warning when the online restart feature is used.

We decided to deprecate online restarts. They have many known problems:
1. It does not work with TLS connections. Most production PgBouncer
   setups use TLS connections in some way.
2. `application_name` is not copied over (it's not part of SHOW FDS like
   the other tracked GUCs)
3. Any data that's in the `SBuf` its `iobuf` is lost during takeover
   (this seems quite bad)
4. It shows this warning during tests right after takover:
   ```
   `WARNING C-0x5620eb57bd90: p1/postgres@127.0.0.1:57066 FIXME: transaction end, but xact_start == 0
   ```
5. If the config file is different, but has not been loaded yet. Then
   many things will almost certainly break.
6. The parameters in `track_extra_parameters` are not copied over to the
   new process.
7. It does not work well with SystemD.

The new recommended way to do online restarts is by doing a rolling
restart of pgbouncer processes that listen on the same port using
`so_reuseport`.
@JelteF JelteF force-pushed the deprecate-online-restart branch from e9c6030 to 3727c7e Compare July 17, 2023 16:04
@JelteF JelteF requested a review from eulerto July 18, 2023 06:52
@JelteF JelteF merged commit b687220 into pgbouncer:master Jul 19, 2023
@JelteF JelteF deleted the deprecate-online-restart branch July 19, 2023 22:14
@wguerram
Copy link

I used to execute pgbouncer --reboot to reload config changes.

How do i execute restart by using so_reuseport in the terminal?

@JelteF
Copy link
Member Author

JelteF commented Jan 22, 2024

On unix based systems you can reload the config by sending a SIGHUP. e.g. pkill pgbouncer -SIGHUP. On windows you'd want to use the RELOAD command as the pgbouncer admin user.

JelteF added a commit to JelteF/pgbouncer that referenced this pull request Feb 16, 2024
JelteF added a commit that referenced this pull request May 6, 2024
#902)

In 1.20 we announced deprecation of online restarts using the `-R` flag
and recommended for people to use `so_reuseport` instead. But the safe
shutdown logic that we executed when receiving a `SIGINT` signal didn't
actually allow for zero-downtime rolling restarts.

This PR changes the behaviour of our shutdown signals to allow for
rolling restarts. The changes that this PR makes are:

1. SIGTERM does not do "immediate shutdown" anymore, but instead
triggers a newly introduced "super safe shutdown". This "super safe
shutdown" waits for all clients to disconnect before shutting down the
server. This can be used for rolling restarts as described in the newly
added documentation. A second SIGTERM will cause an "immediate
shutdown".
2. SIGINT mostly keeps doing what it was doing previously: waiting until
all servers are released before shutting down PgBouncer. But it
incorporates a few improvements from the new SIGTERM logic:
1. It also stops listening for new connections while PgBouncer is
waiting for servers to be released. This does have the downside that
SIGUSR2 cannot be used to cancel a safe shutdown anymore, but that was
quite a weird/niche feature anyway.
2. Clients that have released their server and then do another query
will be disconnected, instead of putting them in pause mode until either
`query_wait_timeout` triggers or PgBouncer shuts down. This way they can
reconnect quicker to another PgBouncer, because they were never going to
get a new server anyway.
     3.  A that a second SIGINT will trigger an "immediate shutdown"
3. SIGQUIT handling is introduced and now causes an immediate shutdown.

Since this changes the existing behaviour of SIGTERM and SIGINT, this
can be considered a breaking change. But the amount of breakage should
be minimal since most of the time people would not want an immediate
shutdown, and the new SIGINT behaviour is closer to the behaviour most
people would expect.

Related to #894
Related to #900

Since this changes SIGTERM to not do a fast shutdown anymore this also
indirectly addresses #806 and #770.

Fixes #806 
Fixes #770
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants