Add distributed snapshot support to pg_export_snapshot #13201
I have outlined some key changes that need to be made due to the nature of the master branch and have some comments on the tests.
Also, we should probably disallow export/import of distributed snapshots in utility mode connections (until we ever do gp_export_snapshot) - we can discuss this.
PS: I took the opportunity to do a small refactor that may help during your testing. See attached patch:
0001-Refactor-setDistributedTransactionContext.patch.txt
You can apply it with `git am`. Please order it so that it comes before your 2 commits in this PR.
So we're replacing the local snapshot logic with distributed snapshots for the |
Not replacing really. The way to think about this is: when we export a snapshot, if it has a distributed snapshot component we export that component too. If it doesn't (like when we are in utility mode), we don't export it. The same reasoning goes behind import. This kind of reflects the whole-part relationship between I personally don't like the idea of isolating this to a separate function (when would we want to export just the distributed snapshot and not the other fields in the snapshot, assuming thats what gp_export_distributed_snapshot() will do? If so, then would we need Thoughts? |
32cb360
to
0e67de9
Compare
For context, the on-disk representation looks like this if we've exported a distributed snapshot.
Related to catalog implications, should we consider adding transaction information functions for distributed transaction details as well? If so, should they be new i.e. |
Do you have an immediate use case in mind? IMO, let's not introduce it without one. |
0e67de9
to
c026013
Compare
I don't think we need to ERROR for this case. It's perfectly fine to use a distributed snapshot (if one wishes to) in a utility mode connection. What would go wrong? I thought this could be one of the use cases for distributed snapshots, so that they can be used for utility mode connections as well, if required and available. The only open aspect is how the snapshot data will be made available locally on the segments, but if it exists there, then it's fine to use it. |
This is precisely why I thought that we should |
Sounds good. Let's capture that context for the ban as a comment in the code (no real technical reason, just mechanics and all this background), so that years from now, if the need arises, we know the rationale for the ban/error. |
70d8960
to
8aabde7
Compare
LGTM! Just one outstanding comment about the import validation.
63f3e3d
to
aa2e8b7
Compare
This commit ensures that we use setDistributedTransactionContext() in sinval.c, the only place where we weren't. Also, this commit ensures that we log the act of setting the context, while respecting the debug_print_full_dtm GUC.
This commit adds distributed snapshot metadata when exporting and importing snapshots.

Calling `select pg_export_snapshot()` from the QD will write the following fields to the snapshot file in `pg_snapshots` in the coordinator data directory.

```
dsxminall
dsid
dsxmin
dsxmax
dscnt
dsxip
```

When a snapshot is imported via SET TRANSACTION SNAPSHOT from the QD, subsequent queries will have the distributed snapshot metadata dispatched to the QEs. This enables cluster-wide data visibility consistency.

Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Co-authored-by: Kate Dontsova <edontsova@pivotal.io>
Co-authored-by: Brent Doil <bdoil@vmware.com>
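
To make the field list in this commit message more concrete, here is a minimal Python sketch that pulls the distributed component out of an exported snapshot file. Only the six field names come from the commit message. The `key:value` line layout (borrowed from how upstream PostgreSQL writes its local snapshot fields), the meaning attributed to each field, the sample values, and the names `DistributedSnapshot` and `parse_distributed_snapshot` are all assumptions made for illustration, not the format this PR actually writes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Field names are taken from the commit message; everything else is assumed.
SCALAR_KEYS = ("dsxminall", "dsid", "dsxmin", "dsxmax", "dscnt")
ARRAY_KEY = "dsxip"


@dataclass
class DistributedSnapshot:
    """Hypothetical in-memory view of the distributed snapshot fields."""
    xmin_all: int      # dsxminall
    snapshot_id: int   # dsid
    xmin: int          # dsxmin
    xmax: int          # dsxmax
    in_progress: List[int] = field(default_factory=list)  # dsxip entries


def parse_distributed_snapshot(text: str) -> Optional[DistributedSnapshot]:
    """Return the distributed component, or None if the file has none."""
    scalars: Dict[str, int] = {}
    in_progress: List[int] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key:value line
        if key == ARRAY_KEY:
            in_progress.append(int(value))
        elif key in SCALAR_KEYS:
            scalars[key] = int(value)
        # Any other key (local snapshot fields) is ignored here.

    if not scalars and not in_progress:
        return None  # e.g. a snapshot exported without a distributed part

    missing = [k for k in SCALAR_KEYS if k not in scalars]
    if missing:
        raise ValueError(f"incomplete distributed snapshot, missing: {missing}")
    if scalars["dscnt"] != len(in_progress):
        raise ValueError("dscnt does not match the number of dsxip entries")

    return DistributedSnapshot(
        xmin_all=scalars["dsxminall"],
        snapshot_id=scalars["dsid"],
        xmin=scalars["dsxmin"],
        xmax=scalars["dsxmax"],
        in_progress=in_progress,
    )


# Made-up sample content; real files will contain different values.
sample = (
    "xmin:700\nxmax:705\n"
    "dsxminall:40\ndsid:7\ndsxmin:41\ndsxmax:45\ndscnt:2\ndsxip:41\ndsxip:43\n"
)
snapshot = parse_distributed_snapshot(sample)
local_only = parse_distributed_snapshot("xmin:700\nxmax:705\n")
```

Returning `None` when no `ds*` fields are present mirrors the whole-part idea discussed in the review: a snapshot without a distributed component simply carries no distributed fields.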
In SetTransactionSnapshot, if the source snapshot already has a distributed snapshot, pass in DTX_CONTEXT_LOCAL_ONLY to GetSnapshotData(). This prevents a new distributed snapshot from being created in GetSnapshotData() and ensures that we can use the distributed snapshot from the source snapshot.
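
The decision described in this commit message can be modelled in a few lines. This is a Python toy model, not the C code from the PR: `LOCAL_ONLY` stands in for `DTX_CONTEXT_LOCAL_ONLY`, while `DtxContext.CURRENT` and the function `context_for_import` are hypothetical names introduced only to show the branch.

```python
from enum import Enum, auto


class DtxContext(Enum):
    LOCAL_ONLY = auto()  # stands in for DTX_CONTEXT_LOCAL_ONLY
    CURRENT = auto()     # hypothetical placeholder for the context already in effect


def context_for_import(source_has_distributed_snapshot: bool,
                       current: DtxContext) -> DtxContext:
    """Pick the context to hand to the snapshot-taking step during import.

    If the imported snapshot already carries a distributed snapshot, ask for
    a local-only snapshot so that no new distributed snapshot is created and
    the imported one can be reused.
    """
    if source_has_distributed_snapshot:
        return DtxContext.LOCAL_ONLY
    return current


with_distributed = context_for_import(True, DtxContext.CURRENT)
without_distributed = context_for_import(False, DtxContext.CURRENT)
```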
aa2e8b7
to
f360379
Compare
Add distributed snapshot support to pg_export_snapshot
This PR adds distributed snapshot metadata when exporting and
importing snapshots.
Calling `select pg_export_snapshot()` from the QD will write the following fields to the snapshot file in `pg_snapshots` in the coordinator data directory: `dsxminall`, `dsid`, `dsxmin`, `dsxmax`, `dscnt`, `dsxip`.
When a snapshot is imported via SET TRANSACTION SNAPSHOT from the QD, subsequent
queries will have the distributed snapshot metadata dispatched to the QEs.
This enables cluster-wide data visibility consistency.
Reference: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/C6cY8yIbcps