
handle mismatch between required logical replication slot and wal_level option #920

Closed
redbaron opened this issue Jan 9, 2019 · 7 comments

@redbaron (Contributor) commented Jan 9, 2019:

When the Patroni config has a logical replication slot, it should either automatically "bump" wal_level to "logical" or reject the config altogether.

Right now it reports errors:

2019-01-09 23:38:23,680 ERROR: Failed to create logical replication slot 'permanent_logical_1' plugin='pgoutput'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/postgresql.py", line 1617, in sync_replication_slots
    (name, value['plugin'], name))
psycopg2.OperationalError: logical decoding requires wal_level >= logical
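
For reference, the slot from the error above would be declared in the Patroni/DCS configuration roughly like this (the slot name and plugin come from the error; the database name is an assumption):

slots:
  permanent_logical_1:
    type: logical
    database: mydb   # assumed; the database is not shown in the error
    plugin: pgoutput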
@CyberDem0n (Member) commented:

Explicit is better than implicit. Basically it is your responsibility to provide a correct configuration (wal_level: logical). Also, changing wal_level requires a restart of postgres, and Patroni will not do that automatically.
Rejecting a config with logical slots is also not an option. We would still have to emit a warning or error, and it's better if we do it on every HA cycle, otherwise it gets lost.
Right now the error is absolutely clear, and it also contains a hint on how to fix it.

@redbaron (Contributor, Author) commented:

> Also, changing wal_level requires a restart of postgres, and Patroni will not do that automatically.

Makes sense, so rejecting is the remaining option.

> Explicit is better than implicit.

Agreed, that's why synchronously returning an error is the safest bet.

> Rejecting a config with logical slots is also not an option. We would still have to emit a warning or error, and it's better if we do it on every HA cycle, otherwise it gets lost.

An incompatible set of settings shouldn't be treated any differently from incorrectly formatted YAML: either way, some validation happens when the config is read, before the result is accepted and acted on.

I didn't test it, but I assume that if the bootstrap config is bad YAML, then Patroni refuses to start. The same thing happens when config is submitted over the API: incorrect YAML generates an HTTP error code, and it is clear to the caller that no changes were made and that there is a problem with the requested configuration. Shouldn't any set of config properties that are incompatible with each other or with the current runtime config (some "set-on-bootstrap-only" properties, for instance) trigger the same feedback, rather than being accepted and then silently failing to apply?
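
For example, a runtime config change is submitted as a PATCH to Patroni's REST /config endpoint (localhost:8008 is the default listen address; the slot definition is the hypothetical one from above). Under this proposal, an incompatible combination would come back as an HTTP error instead of being accepted:

curl -s -XPATCH -d \
    '{"slots": {"permanent_logical_1": {"type": "logical", "database": "mydb", "plugin": "pgoutput"}}}' \
    http://localhost:8008/config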

@CyberDem0n (Member) commented:

> if the bootstrap config is bad YAML, then Patroni refuses to start

Not necessarily the bootstrap config, but config in general. And this is an entirely different issue. It is possible to change the permanent slots setting at runtime (patronictl edit-config).

> Shouldn't any set of config properties that are incompatible with each other or with the current runtime config (some "set-on-bootstrap-only" properties, for instance) trigger the same feedback

My point is, implementing something that provides feedback requires some effort, while right now we get it for free.

> silently failing to apply

This is not true, it is not failing silently: it emits error messages which are absolutely clear and contain hints on how to fix the issue.

I agree, at the moment we rely entirely on the postgres configuration from the DCS or the Patroni config not containing anything harmful that would prevent postgres from starting, and we can only do partial validation while postgres is running, although it would be really good to do the same even when postgres is down. Unfortunately, describing 276+ postgres parameters is quite a bit of work and I don't have time for it.

@redbaron (Contributor, Author) commented:

> Not necessarily the bootstrap config, but config in general. And this is an entirely different issue. It is possible to change the permanent slots setting at runtime (patronictl edit-config).

Unless one goes and edits the DCS directly, Patroni has a chance to reject incorrect edits, and it would be quite an improvement if it did so. kubectl edit is an example of such a feature: should the updated configuration be found incompatible with the previously applied one (say, editing an already-set clusterIP on a Service), it refuses to accept the edit and the resource in the Kubernetes API remains unchanged. Obviously it can't prevent every kind of error from going through, but at least those which can be checked upfront are caught.
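
Roughly what that looks like (my-svc is a made-up Service name, and the exact error wording varies between Kubernetes versions):

$ kubectl edit service my-svc
# change spec.clusterIP in the editor and save; the API server refuses the edit:
error: services "my-svc" is invalid
# (with a reason along the lines of: spec.clusterIP: Invalid value: ...: field is immutable)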

> My point is, implementing something that provides feedback requires some effort, while right now we get it for free.
> This is not true, it is not failing silently: it emits error messages which are absolutely clear and contain hints on how to fix the issue.

We don't have feedback at configuration change time: any configuration change receives "success" feedback, and the requesting side then needs to use other channels to check for real success. It emits an error, yes, but it is as if a PUT request to an S3 object always returned success and you had to do a GET to know whether it actually happened.

> Unfortunately, describing 276+ postgres parameters is quite a bit of work and I don't have time for it.

It doesn't need to happen all at once; scaffolding around a new validateNewConfig(), which can reject a newly proposed configuration, would be a good start, even if it always returns success at first. It can then be populated with checks as misconfigurations like the one I discovered turn up; see the sketch below.
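
A minimal sketch of what such scaffolding could look like, in Python since that's what Patroni is written in; validate_new_config and every check in it are hypothetical, not an existing Patroni API:

def validate_new_config(config):
    """Return a list of human-readable problems; an empty list means the config is accepted."""
    errors = []

    # First check to populate it with: the misconfiguration from this issue,
    # a permanent logical slot declared while wal_level is not 'logical'.
    slots = config.get('slots') or {}
    params = (config.get('postgresql') or {}).get('parameters') or {}
    if any(s.get('type') == 'logical' for s in slots.values()) \
            and params.get('wal_level') != 'logical':
        errors.append('permanent logical slots require wal_level=logical, '
                      'got {!r}'.format(params.get('wal_level')))

    return errors

The REST API handler and patronictl edit-config could then refuse the proposed config (with an HTTP 400, say) whenever the returned list is non-empty, and further checks can be added one at a time.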

@ghost commented Apr 3, 2020:

@redbaron, did you manage to make wal_level logical work with Patroni? I have to use the wal2json extension and am not able to do so because wal_level is always being overwritten by Patroni to replica.
Any help or pointers would be appreciated.
Thanks,
Aman

@ghost commented Apr 6, 2020:

@redbaron please help.

@setevoy commented May 18, 2020:

> @redbaron please help.

@ghost Make sure you put wal_level under parameters, not under postgresql directly:

loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    wal_level: logical
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
synchronous_mode: true

Then you'll have to run patronictl reload and then patronictl restart (because changing wal_level requires a restart), for example:
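
Something like this, where my-cluster and /etc/patroni.yml are placeholders for the actual cluster name and Patroni config path:

patronictl -c /etc/patroni.yml reload my-cluster
patronictl -c /etc/patroni.yml restart my-cluster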
