Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider plain SQL output for database backups #1307

Closed
spwoodcock opened this issue Feb 29, 2024 · 1 comment
Closed

Consider plain SQL output for database backups #1307

spwoodcock opened this issue Feb 29, 2024 · 1 comment

Comments

@spwoodcock
Copy link
Member

Is your feature request related to a problem? Please describe.

  • Currently we use --format c for a custom binary compressed format.
  • This works fine if restored with the matching version of pg_restore for the pg_dump it was made with.
  • Perhaps we should consider outputting to the default plain SQL output, then gzipping the file ourselves.

Describe the solution you'd like

  • Compare the file size between --format c and without + gzip.
  • Assess which is best from a longevity perspective (probably raw SQL).
@spwoodcock
Copy link
Member Author

spwoodcock commented Apr 17, 2024

It isn't so clear what the best approach could be.

While "the custom format is much more flexible, and while you can convert a custom format dump into an SQL format dump you can't do the reverse easily".

Custom format has the following advantages:

  • Restore can be done via pg_restore, with multiple configuration flags for maximum flexibility.
  • E.g. selectively restore only some tables/schema, can choose whether to include only schema, only data, or both at restore time.
  • It's faster, as it can do simultaneous jobs during the restore, significantly improving restore time.

The raw SQL format has two advantages:

  • For custom format you can only restore the dump file with a version of PostgreSQL that's higher (more recent). Raw SQL has such no restriction.
  • Raw SQL backups can be inspected as they are plain text.

Data Corruption

  • It would appear that using the custom format has the best advantages, with the only main concern being around data corruption.
  • So as long as the custom format does not become corrupted, it would be the best option.
  • A solution to avoid (the unlikely event of) corruption, is to store the backups on S3.
  • AWS has a ridiculous level of redundancy and protection against things like bit-rot.
  • MinIO uses erasure coding on an object level to protect data from loss and corruption.

Closing, using the pg_dump custom format for backups seems like a good tradeoff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

No branches or pull requests

1 participant