Introduce concept of phase1 commit [RHELDST-20490] #596

rohanpm · 2023-10-02T05:05:04Z

Internally, items have always been categorized into either phase1 or phase2 during commit, with phase2 being committed last. This is needed to ensure, for example, that a repomd.xml is not committed until after all other files referenced by it have been committed.

This change now documents that concept and exposes it via the API. By default, clients get the same behavior as always, but they can now also explicitly request a phase1 commit.

A phase1 commit writes phase1 items to the DB, same as always, but then stops without proceeding to write phase2 items. It also leaves the publish open for later modifications and a later phase2 commit.

Why?

This will be used to solve the following Pub/rhsm-pulp/exodus-gw integration problem:

Imagine that you need to publish multiple Pulp yum repos. You want this to be atomic, so you want them all to use the same exodus-gw publish.

You ask the repos to publish, and they succeed, but then something goes wrong during the later exodus-gw commit which is performed outside of Pulp.

So, you restart the whole process and republish the repos again using a new exodus-gw publish, and commit that successfully.

Problem: as far as Pulp is concerned, the publish tasks from the first attempt were completely successful, as it does not "see" the later failure to commit. Therefore, Pulp incorrectly thinks that the RPMs processed by those tasks are fully published to the CDN, and so skips publishing them during later tasks. This leads to missing content on the CDN.

Solution: exodus-rsync, as invoked by Pulp during publish, will request a phase1 commit during publish of each repo. This ensures the processed RPMs (or other non-entrypoint files) are fully published on the CDN by the time the Pulp publish task succeeds, matching Pulp's expectations. The publish of repomd.xml is still held back until a later phase2 commit, retaining the atomic semantics across multiple repos.

Internally, items have always been categorized into either phase1 or phase2 during commit, with phase2 being committed last. This is needed to ensure, for example, that a repomd.xml is not committed until after all other files referenced by it have been committed. This change now documents that concept and exposes it via the API. By default, clients get the same behavior as always, but they can now also explicitly request a phase1 commit. A phase1 commit writes phase1 items to the DB, same as always, but then stops without proceeding to write phase2 items. It also leaves the publish open for later modifications and a later phase2 commit. === Why? === This will be used to solve the following Pub/rhsm-pulp/exodus-gw integration problem: Imagine that you need to publish multiple Pulp yum repos. You want this to be atomic, so you want them all to use the same exodus-gw publish. You ask the repos to publish, and they succeed, but then something goes wrong during the later exodus-gw commit which is performed outside of Pulp. So, you restart the whole process and republish the repos again using a new exodus-gw publish, and commit that successfully. Problem: as far as Pulp is concerned, the publish tasks from the first attempt were completely successful, as it does not "see" the later failure to commit. Therefore, Pulp incorrectly thinks that the RPMs processed by those tasks are fully published to the CDN, and so skips publishing them during later tasks. This leads to missing content on the CDN. Solution: exodus-rsync, as invoked by Pulp during publish, will request a phase1 commit during publish of each repo. This ensures the processed RPMs (or other non-entrypoint files) are fully published on the CDN by the time the Pulp publish task succeeds, matching Pulp's expectations. The publish of repomd.xml is still held back until a later phase2 commit, retaining the atomic semantics across multiple repos.

exodus_gw/worker/publish.py

rohanpm · 2023-10-02T21:45:08Z

I'll get the exodus-rsync side of this up for review as well, so they can be reviewed together before submitting this if desired.

This adds support to exodus-rsync for using the new commit_mode argument added to the exodus-gw commit API in [1]. The use-case is to allow rhsm-pulp to do a phase1 commit during Pulp publish tasks, to ensure that all RPMs have reached the CDN storage and later fast-forward publishes will work correctly even if the current exodus-gw publish later fails. The default behavior is not to pass any `commit_mode`, which ensures that exodus-rsync remains compatible with older versions of exodus-gw. This change is aiming for flexibility: - the argument's value is intentionally not validated as being "phase1" or "phase2"; validation is left to the server. This avoids exodus-rsync needing further code changes if more commit modes become supported later. - the mode can be configured both using command-line arguments and the config file. The intent is to use the command-line argument from pubtools-exodus, but it seems wise to also support using the config file for cases not yet anticipated. [1] release-engineering/exodus-gw#596

rohanpm · 2023-10-03T00:18:19Z

The different parts of this are up for review in case anyone wants to look at them all together.

crungehottman · 2023-10-03T21:46:00Z

Another question related to the other two parts you submitted for review:

exodus-rsync, as invoked by Pulp during publish, will request a phase1 commit during publish of each repo. This ensures the processed RPMs (or other non-entrypoint files) are fully published on the CDN by the time the Pulp publish task succeeds, matching Pulp's expectations. The publish of repomd.xml is still held back until a later phase2 commit, retaining the atomic semantics across multiple repos

Where/by what means do the phase 2 commits occur?

rohanpm · 2023-10-03T22:02:49Z

Another question related to the other two parts you submitted for review:

exodus-rsync, as invoked by Pulp during publish, will request a phase1 commit during publish of each repo. This ensures the processed RPMs (or other non-entrypoint files) are fully published on the CDN by the time the Pulp publish task succeeds, matching Pulp's expectations. The publish of repomd.xml is still held back until a later phase2 commit, retaining the atomic semantics across multiple repos

Where/by what means do the phase 2 commits occur?

This part hasn't changed, so it's at the same place it always was, which is: https://github.com/release-engineering/pubtools-exodus/blob/032079efd17029957e408688b35971462df021bd/pubtools/exodus/_hooks/pulp.py#L80

Note, the code has not been updated to explicitly add "phase2" option since that's the default anyway, and specifying it just makes the code incompatible with current versions of exodus-gw.

rohanpm · 2023-10-03T22:13:06Z

Where/by what means do the phase 2 commits occur?

This part hasn't changed, so it's at the same place it always was

...and to expand on that a little: since pubtools-exodus is both the place which asks rhsm-pulp to do a phase1 commit, and also the place which manages the final phase2 commit, this is meant to mitigate the risk that some code might ask for a phase1 commit and forget to do the later phase2 commit. As the responsibility for triggering both of them lies in the same place.

rohanpm force-pushed the phase1-commit branch from 9af04c5 to 017a2e9 Compare October 2, 2023 05:18

rohanpm marked this pull request as ready for review October 2, 2023 05:27

rohanpm requested review from negillett and crungehottman as code owners October 2, 2023 05:27

negillett reviewed Oct 2, 2023

View reviewed changes

exodus_gw/worker/publish.py Show resolved Hide resolved

negillett approved these changes Oct 2, 2023

View reviewed changes

crungehottman reviewed Oct 2, 2023

View reviewed changes

exodus_gw/worker/publish.py Show resolved Hide resolved

rohanpm mentioned this pull request Oct 3, 2023

Support different commit modes [RHELDST-20490] release-engineering/exodus-rsync#364

Merged

crungehottman self-requested a review October 3, 2023 21:06

crungehottman approved these changes Oct 3, 2023

View reviewed changes

rohanpm merged commit 93a3255 into release-engineering:master Oct 4, 2023
3 checks passed

rohanpm deleted the phase1-commit branch October 4, 2023 03:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce concept of phase1 commit [RHELDST-20490] #596

Introduce concept of phase1 commit [RHELDST-20490] #596

rohanpm commented Oct 2, 2023 •

edited

Loading

rohanpm commented Oct 2, 2023

rohanpm commented Oct 3, 2023

crungehottman commented Oct 3, 2023 •

edited

Loading

rohanpm commented Oct 3, 2023

rohanpm commented Oct 3, 2023

Introduce concept of phase1 commit [RHELDST-20490] #596

Introduce concept of phase1 commit [RHELDST-20490] #596

Conversation

rohanpm commented Oct 2, 2023 • edited Loading

Why?

rohanpm commented Oct 2, 2023

rohanpm commented Oct 3, 2023

crungehottman commented Oct 3, 2023 • edited Loading

rohanpm commented Oct 3, 2023

rohanpm commented Oct 3, 2023

rohanpm commented Oct 2, 2023 •

edited

Loading

crungehottman commented Oct 3, 2023 •

edited

Loading