-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce concept of phase1 commit [RHELDST-20490] #596
Conversation
Internally, items have always been categorized into either phase1 or phase2 during commit, with phase2 being committed last. This is needed to ensure, for example, that a repomd.xml is not committed until after all other files referenced by it have been committed. This change now documents that concept and exposes it via the API. By default, clients get the same behavior as always, but they can now also explicitly request a phase1 commit. A phase1 commit writes phase1 items to the DB, same as always, but then stops without proceeding to write phase2 items. It also leaves the publish open for later modifications and a later phase2 commit. === Why? === This will be used to solve the following Pub/rhsm-pulp/exodus-gw integration problem: Imagine that you need to publish multiple Pulp yum repos. You want this to be atomic, so you want them all to use the same exodus-gw publish. You ask the repos to publish, and they succeed, but then something goes wrong during the later exodus-gw commit which is performed outside of Pulp. So, you restart the whole process and republish the repos again using a new exodus-gw publish, and commit that successfully. Problem: as far as Pulp is concerned, the publish tasks from the first attempt were completely successful, as it does not "see" the later failure to commit. Therefore, Pulp incorrectly thinks that the RPMs processed by those tasks are fully published to the CDN, and so skips publishing them during later tasks. This leads to missing content on the CDN. Solution: exodus-rsync, as invoked by Pulp during publish, will request a phase1 commit during publish of each repo. This ensures the processed RPMs (or other non-entrypoint files) are fully published on the CDN by the time the Pulp publish task succeeds, matching Pulp's expectations. The publish of repomd.xml is still held back until a later phase2 commit, retaining the atomic semantics across multiple repos.
I'll get the exodus-rsync side of this up for review as well, so they can be reviewed together before submitting this if desired. |
This adds support to exodus-rsync for using the new commit_mode argument added to the exodus-gw commit API in [1]. The use-case is to allow rhsm-pulp to do a phase1 commit during Pulp publish tasks, to ensure that all RPMs have reached the CDN storage and later fast-forward publishes will work correctly even if the current exodus-gw publish later fails. The default behavior is not to pass any `commit_mode`, which ensures that exodus-rsync remains compatible with older versions of exodus-gw. This change is aiming for flexibility: - the argument's value is intentionally not validated as being "phase1" or "phase2"; validation is left to the server. This avoids exodus-rsync needing further code changes if more commit modes become supported later. - the mode can be configured both using command-line arguments and the config file. The intent is to use the command-line argument from pubtools-exodus, but it seems wise to also support using the config file for cases not yet anticipated. [1] release-engineering/exodus-gw#596
This adds support to exodus-rsync for using the new commit_mode argument added to the exodus-gw commit API in [1]. The use-case is to allow rhsm-pulp to do a phase1 commit during Pulp publish tasks, to ensure that all RPMs have reached the CDN storage and later fast-forward publishes will work correctly even if the current exodus-gw publish later fails. The default behavior is not to pass any `commit_mode`, which ensures that exodus-rsync remains compatible with older versions of exodus-gw. This change is aiming for flexibility: - the argument's value is intentionally not validated as being "phase1" or "phase2"; validation is left to the server. This avoids exodus-rsync needing further code changes if more commit modes become supported later. - the mode can be configured both using command-line arguments and the config file. The intent is to use the command-line argument from pubtools-exodus, but it seems wise to also support using the config file for cases not yet anticipated. [1] release-engineering/exodus-gw#596
The different parts of this are up for review in case anyone wants to look at them all together. |
Another question related to the other two parts you submitted for review:
Where/by what means do the phase 2 commits occur? |
This part hasn't changed, so it's at the same place it always was, which is: https://github.com/release-engineering/pubtools-exodus/blob/032079efd17029957e408688b35971462df021bd/pubtools/exodus/_hooks/pulp.py#L80 Note, the code has not been updated to explicitly add "phase2" option since that's the default anyway, and specifying it just makes the code incompatible with current versions of exodus-gw. |
...and to expand on that a little: since pubtools-exodus is both the place which asks rhsm-pulp to do a phase1 commit, and also the place which manages the final phase2 commit, this is meant to mitigate the risk that some code might ask for a phase1 commit and forget to do the later phase2 commit. As the responsibility for triggering both of them lies in the same place. |
Internally, items have always been categorized into either phase1 or phase2 during commit, with phase2 being committed last. This is needed to ensure, for example, that a repomd.xml is not committed until after all other files referenced by it have been committed.
This change now documents that concept and exposes it via the API. By default, clients get the same behavior as always, but they can now also explicitly request a phase1 commit.
A phase1 commit writes phase1 items to the DB, same as always, but then stops without proceeding to write phase2 items. It also leaves the publish open for later modifications and a later phase2 commit.
Why?
This will be used to solve the following Pub/rhsm-pulp/exodus-gw integration problem:
Imagine that you need to publish multiple Pulp yum repos. You want this to be atomic, so you want them all to use the same exodus-gw publish.
You ask the repos to publish, and they succeed, but then something goes wrong during the later exodus-gw commit which is performed outside of Pulp.
So, you restart the whole process and republish the repos again using a new exodus-gw publish, and commit that successfully.
Problem: as far as Pulp is concerned, the publish tasks from the first attempt were completely successful, as it does not "see" the later failure to commit. Therefore, Pulp incorrectly thinks that the RPMs processed by those tasks are fully published to the CDN, and so skips publishing them during later tasks. This leads to missing content on the CDN.
Solution: exodus-rsync, as invoked by Pulp during publish, will request a phase1 commit during publish of each repo. This ensures the processed RPMs (or other non-entrypoint files) are fully published on the CDN by the time the Pulp publish task succeeds, matching Pulp's expectations. The publish of repomd.xml is still held back until a later phase2 commit, retaining the atomic semantics across multiple repos.