Postgres replica node restarts with blank data directory #385

Closed
the-redback opened this issue Jan 3, 2019 · 21 comments

@the-redback
Contributor

the-redback commented Jan 3, 2019

Currently, when the replica node (pod) starts/restarts, it deletes its data directory, takes a pg_basebackup of the primary node, and then starts as a standby server. ref: https://github.com/kubedb/postgres/blob/b0ed4c6ee2ab737001ef9fc2e3e1b1d48f6bfa24/hack/docker/postgres/9.6.7/scripts/replica/run.sh#L8

Though we could avoid this 'deletion & pg_basebackup' on replica pod restarts, a problem can still arise if the replica node is unavailable for a long time and some WAL segments are deleted in the meantime. After becoming available again, it will lack those WAL files, so data loss can happen.
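For context, a rough sketch of what that replica run.sh does (this is a paraphrase, not the actual script; variable names like PRIMARY_HOST and REPLICATION_USER are assumptions):

# wipe the local data directory, re-seed it from the primary, then start as a standby
rm -rf "$PGDATA"/*
pg_basebackup -h "$PRIMARY_HOST" -U "$REPLICATION_USER" -D "$PGDATA" -X stream -R
exec postgres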

Some references regarding this topic.

Though a replication_slot is a good alternative, it needs to be dropped when the corresponding replica node removes itself from the cluster (a rough sketch is below). The problem is, it's hard to determine which pod is terminating for good and which one is just restarting.
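A hedged sketch of the replication_slot alternative (slot name and connection details are assumptions): the primary retains WAL as long as the slot exists, which is why a stale slot from a permanently removed replica must be cleaned up.

# on the primary: create one physical slot per replica
psql -h "$PRIMARY_HOST" -U postgres -c "SELECT pg_create_physical_replication_slot('replica_0');"
# the replica then streams with primary_slot_name = 'replica_0' in its recovery.conf
# when the replica leaves the cluster for good, the slot must be dropped,
# otherwise the primary keeps WAL around indefinitely
psql -h "$PRIMARY_HOST" -U postgres -c "SELECT pg_drop_replication_slot('replica_0');"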

@the-redback
Contributor Author

cc: @stormmore

@the-redback
Contributor Author

the-redback commented Jan 14, 2019

When a previous master becomes a new standby, it may throw an error due to a timeline conflict.
In this case, pg_rewind is a good tool to sync the old master to the new master without deleting the data directory.

pg_rewind will also help detect whether any WAL is missing from the primary or the standby, so the standby can be resumed.
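A minimal pg_rewind invocation for this old-master-to-standby case might look like the following (paths and the connection string are assumptions); the target server must be stopped cleanly first:

pg_ctl -D "$PGDATA" stop -m fast
pg_rewind --target-pgdata="$PGDATA" --source-server="host=$NEW_PRIMARY_HOST user=postgres dbname=postgres"
# afterwards, point recovery.conf at the new primary and start the server as a standby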

@the-redback
Contributor Author

Why pg_rewind? refs:

Why not just pull all WAL files?

  • It doesn't solve conflicts between different timelines (mostly in the failover scenario, where a standby node becomes the primary).
    So, after pulling WAL files, it is still required to run pg_rewind.

pg_rewind possible errors:

  1. target server must be shut down cleanly
  2. could not find previous WAL record at 0/30000F8
  3. could not find common ancestor of the source and target cluster's timelines
  4. target server needs to use either data checksums or "wal_log_hints = on"

@the-redback
Contributor Author

  • Error 1: target server must be shut down cleanly

    The target database (localhost) must be shut down cleanly before pg_rewind can run.
    So, check and, if necessary, stop the database gracefully again. Then, run pg_rewind again.

  • Error 2: could not find previous WAL record at 0/30000F8

    If the server diverged from the primary and the diverged WAL doesn't exist anymore,
    pg_rewind will throw an error similar to "could not find previous WAL record at 0/30000F8".
    We have to manually fetch WALs starting from the missing point.
    At this point, we will take a pg_basebackup.
    TODO: for wal-g or other kinds of WAL archiving, fetch the missing WALs from archive storage (maybe). Then, run pg_rewind again.

  • Error 3: could not find common ancestor of the source and target cluster's timelines

    Since 9.6, pg_rewind is quite good at finding the common ancestor while running against a non-promoted master.
    ref: https://www.enterprisedb.com/blog/pgrewind-improvements-postgresql-96
    Still, if this error shows up, taking a pg_basebackup is a must.

  • Error 4: target server needs to use either data checksums or "wal_log_hints = on"

    This error may occur after an upgrade from a previous database version where 'wal_log_hints' was not turned on.
    It will not occur again after adding 'wal_log_hints = on' to the config file.
    We could skip pg_rewind here and manually pull WAL files so that this node can replay them,
    but, again, that would not resolve conflicts between timelines.
    So, take a base backup and continue.

  • Error 5: the error is none of the above.

    In other scenarios, pg_rewind fails for a reason that is neither a non-existing WAL nor a missing common ancestor.
    The workaround could be deleting the $PGDATA directory and taking a pg_basebackup again.
    But since the reason is not missing WAL, safely exit without processing further.
    (A rough sketch of this overall fallback flow is shown after this list.)
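Putting the cases above together, a rough sketch of the fallback flow (simplified; per the notes above, Error 5 should exit instead of re-seeding, and Errors 1 and 4 should be fixed and retried):

if pg_rewind --target-pgdata="$PGDATA" --source-server="host=$PRIMARY_HOST user=postgres dbname=postgres"; then
  echo "pg_rewind succeeded; configure recovery.conf and start as standby"
else
  # missing WAL or no common ancestor: fall back to a fresh base backup
  rm -rf "$PGDATA"/*
  pg_basebackup -h "$PRIMARY_HOST" -U "$REPLICATION_USER" -D "$PGDATA" -X stream -R
fi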

@Excds

Excds commented Jul 16, 2019

We're also experiencing this problem and getting the database wiped. We're running with one primary and two replicas.

@Excds

Excds commented Sep 3, 2019

I can now confirm that using WAL-G archiving does not prevent the data loss.

Setup: one primary and two replicas.

@the-redback
Contributor Author

  1. How Zalando's Patroni works: https://jobs.zalando.com/tech/blog/zalandos-patroni-a-template-for-high-availability-postgresql/?gh_src=4n3gxh1
  2. KubeCon talk on Patroni: https://www.youtube.com/watch?v=CftcVhFMGSY
  3. patroni + pg = spilo docker image
  4. Helm chart: https://github.com/helm/charts/tree/master/incubator/patroni

Install: helm install --name my-release incubator/patroni --namespace demo

A few essential commands to test failover [wip]:

$ PGPASSWORD_SUPERUSER=$(kubectl get secret --namespace demo my-release-patroni -o jsonpath="{.data.password-superuser}" | base64 --decode)
$ PGPASSWORD_ADMIN=$(kubectl get secret --namespace demo my-release-patroni -o jsonpath="{.data.password-admin}" | base64 --decode)
$ kubectl run -i --tty --rm psql --image=postgres \
      --env "PGPASSWORD=$PGPASSWORD_SUPERUSER" \
      --command -- psql -U postgres \
      -h my-release-patroni.demo.svc.cluster.local postgres

select pg_is_in_recovery(), txid_current(); \watch
kubectl exec -it -n demo my-release-patroni-3 -- su postgres --command 'patronictl list my-release-patroni'
watch -n 2 "kubectl exec -it -n demo my-release-patroni-3 -- su postgres --command 'patronictl list my-release-patroni'"

Controlled failover - manually change the leader (maybe for maintenance):

patronictl list my-release-patroni
patronictl failover my-release-patroni

@Excds

Excds commented Nov 28, 2019

3. patroni + pg = spilo docker image

I spent the last two days testing out Zalando's Postgres operator (which uses the spilo image, also theirs) and I can't say that I'm super happy. Sure, the failover setup seems to work nicely without nuking my data (which KubeDB does), but trying to use cloning from wal-g and such does not work great. The Python code for managing the configuration of spilo is a bit convoluted.

I don't think using the spilo image as a base for anything would be a good idea (well, except that it does contain the timescaledb extensions by default...).

@Excds

Excds commented Nov 28, 2019

@tamalsaha @the-redback

Is there any chance that looking into the data loss will be prioritized soon?

@schmitch

schmitch commented Nov 28, 2019

Btw @the-redback, basically pg_rewind is the solution. The first versions of the Postgres operator from Zalando had a similar problem (I used both projects, and even Zalando Patroni without k8s at all). They also used different solutions for leader election; the newest one is totally different from what KubeDB uses (they only change endpoints). And, as already said, the problem can't happen if wal-g is used.

@Excds

Excds commented Nov 29, 2019

@schmitch I had the problem happening with wal-g.

@the-redback
Contributor Author

@schmitch, I tried using pg_rewind back in version 9.6, maybe. Is it more mature now? Because it didn't work at that time. :(

@schmitch

schmitch commented Dec 2, 2019

Well, it obviously works better than losing data.

@the-redback
Contributor Author

Patroni:

Pros:

Cons:

  • Written in Python.
  • Not enough documentation to know the steps performed by spilo.

@Excds

Excds commented Dec 10, 2019

This is where spilo starts deciding whether to use wal-e or wal-g. I'm not confident in this way of configuring something without a single test that it is actually accurate: https://github.com/zalando/spilo/blob/09b4cc3c18c0b1d8e666157171794a2a181cdc14/postgres-appliance/scripts/configure_spilo.py#L718

@Adriien-M

Adriien-M commented Jan 6, 2020

What is the status of the PR https://github.com/kubedb/postgres/pull/248?
Is it currently preferable to use Patroni rather than KubeDB for Postgres databases?

@thevkevindk

Guys, my vrp_vehicle table gets deleted. Might it have something to do with mysql-async? I read a post that mysql-async is not supported anymore.

Plus, I see that I need to alter the table because the maximum length in the database is too long. I think it has to do with mysql or lscustoms...

@ghost

ghost commented Feb 1, 2020

Guys, my vrp_vehicle table gets deleted. Might it have something to do with mysql-async? I read a post that mysql-async is not supported anymore.

Plus, I see that I need to alter the table because the maximum length in the database is too long. I think it has to do with mysql or lscustoms...

Please delete this and your account, thanks.

@the-redback the-redback removed their assignment Feb 20, 2020
@kubedb kubedb locked and limited conversation to collaborators Mar 10, 2020
@tamalsaha
Member

This issue has been resolved in KubeDB release v2021.03.17. Please open a new bug if you still see this issue.
