
repo: Change locking for summary regeneration to be shared #2493

Merged
merged 1 commit into ostreedev:main on Dec 6, 2021

Conversation

cgwalters
Member

This is trying to address:
https://pagure.io/fedora-iot/issue/48

Basically we changed rpm-ostree to start doing a shared lock during
commit by default, but this broke because pungi is starting a process
doing a commit for each architecture, and then trying to regenerate
the summary after each one.

This patch is deleting a big comment with a rationale for why
summary regeneration should be exclusive.  Point by point:

> This makes sure the commits and deltas don't get
> deleted while generating the summary.

But prune operations require an exclusive lock, which means that
data still can't be deleted when the summary grabs a shared lock.

> It also means we can be sure refs
> won't be created/updated/deleted during the operation, without having to
> add exclusive locks to those operations which would prevent concurrent
> commits from working.

First: The status quo *has* prevented concurrent commits from working!

There is no real locking solution to this problem. What we really
need to do here is regenerate the summary after each commit *or*
when the caller decides to do it and e.g. include deltas at the same
time.

It's OK if multiple threads race to regenerate the summary;
last-one-wins behavior here is totally fine.
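To spell out the reader/writer semantics this argument relies on: commits and summary regeneration can all hold a shared lock at the same time, while prune needs an exclusive lock and therefore cannot delete objects while any shared holder is active. Here is a minimal, hypothetical sketch of that behavior using plain POSIX flock(2); the lock-file path is made up for the example and is not ostree's actual on-disk layout.

```c
/* Illustrative sketch of shared vs. exclusive repo locking using flock(2).
 * The "repo/.lock" path is invented for this example. */
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

/* "commit" and "summary -u" would both take a shared lock: multiple
 * processes may hold LOCK_SH concurrently, so concurrent commits and a
 * summary regeneration do not block each other. */
static int
take_shared_lock (int lock_fd)
{
  return flock (lock_fd, LOCK_SH);
}

/* "prune" would take an exclusive lock: LOCK_EX blocks until no shared
 * (or other exclusive) holders remain, so objects cannot be deleted out
 * from under a summary regeneration that holds LOCK_SH. */
static int
take_exclusive_lock (int lock_fd)
{
  return flock (lock_fd, LOCK_EX);
}

int
main (void)
{
  int fd = open ("repo/.lock", O_RDONLY | O_CREAT, 0644);
  if (fd < 0)
    return 1;
  if (take_shared_lock (fd) < 0)
    return 1;
  /* ... enumerate refs and write the summary here ... */
  flock (fd, LOCK_UN);
  close (fd);
  return 0;
}
```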
@cgwalters
Member Author

cc @dbnicholson

@jlebon
Member

jlebon commented Dec 3, 2021

If we're OK with multiple ostree summary -u invocations racing and the output file itself is not undefined when run under a changing set of refs (IOW, the worst might be that we output a ref that was just pruned, or just updated, and otherwise it doesn't output any garbage), then do we need any locking at all?

@jlebon
Member

jlebon commented Dec 3, 2021

> If we're OK with multiple ostree summary -u invocations racing and the output file itself is not undefined when run under a changing set of refs (IOW, the worst might be that we output a ref that was just pruned, or just updated, and otherwise it doesn't output any garbage)

OK right, looking at the code, the obvious problem with this is that this can trivially error out due to readdir(ref dir) + open(ref) race with a prune operation. So yeah, we do need at least a shared lock to not have things go away.
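To make that failure mode concrete: a summary run that enumerates the refs directory can lose the race against a prune or ref deletion that removes a file between the readdir() and the subsequent open(). A rough sketch of the racy pattern (paths and error handling are illustrative only, not ostree's actual code):

```c
/* Illustrative only: the readdir()+open() pattern that can race with a
 * concurrent prune/ref deletion when no shared lock is held. */
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main (void)
{
  DIR *d = opendir ("repo/refs/heads");
  if (d == NULL)
    return 1;

  struct dirent *ent;
  while ((ent = readdir (d)) != NULL)
    {
      if (ent->d_name[0] == '.')
        continue;

      char path[4096];
      snprintf (path, sizeof path, "repo/refs/heads/%s", ent->d_name);

      /* If another process deletes this ref between the readdir() above
       * and this open(), we get ENOENT and the whole summary run errors
       * out.  Holding a shared lock keeps the prune from running at all. */
      int fd = open (path, O_RDONLY);
      if (fd < 0)
        {
          fprintf (stderr, "ref %s vanished: %s\n", ent->d_name, strerror (errno));
          closedir (d);
          return 1;
        }
      /* ... read the checksum the ref points to ... */
      close (fd);
    }

  closedir (d);
  return 0;
}
```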

cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Dec 3, 2021
In https://pagure.io/fedora-iot/issue/48
we found a regression from
coreos@e0d2a95

I think this is really an ostree bug; see
ostreedev/ostree#2493
But for now we need to allow callers to choose either the old or new behavior.
This way e.g. pungi can be configured to lock or not without having
to upgrade/downgrade rpm-ostree.
@dbnicholson
Member

We talked about this at Endless a couple times and I think we talked about it here, but I can't find it. In my original PR I did have it shared as can be seen in eba8a17. However, when the locking was actually added in #1681 it was made exclusive. I believe that came from a conversation we'd had that I can't find.

I think the idea was that you want the data to be stable so that there aren't inconsistencies between refs, commits and deltas. But I think it's more in line with the rest of ostree for this to be shared and that you'll reach eventual consistency if your processes update the repo metadata consistently.

Anyways, I think it's fine to make this shared.

@dbnicholson
Member

@dbnicholson left a comment

LGTM

@dbnicholson
Member

One thing I did always want to address that was made possible with the exclusive lock was to ensure you weren't racing when trying to sign the summary. If the summary generation takes an exclusive lock and the summary signing takes an exclusive lock, then you can't accidentally publish a mismatching signature because one process will block the other.

Now that the locking API is public, you can take an exclusive lock before doing this whole process if you desire. The other thing I was trying to achieve in my unfinished #1946 was to make the summary generation and signing non-racy (at least on the repo side), but that's kind of separate.
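As a rough sketch of what "take an exclusive lock before doing this whole process" could look like with the public locking API. The function names and signatures below (ostree_repo_lock_push(), ostree_repo_lock_pop(), ostree_repo_regenerate_summary(), ostree_repo_add_gpg_signature_summary()) are my reading of recent libostree and should be checked against ostree-repo.h for the version you target:

```c
/* Sketch: hold an exclusive repo lock across summary regeneration and
 * signing so no concurrent writer can slip in between the two steps.
 * API names and signatures are assumptions based on recent libostree;
 * verify against the headers before relying on this. */
#include <ostree.h>

static gboolean
regenerate_and_sign_summary (OstreeRepo   *repo,
                             const char  **key_ids,
                             GCancellable *cancellable,
                             GError      **error)
{
  /* Exclusive: blocks commits, prunes and other summary writers. */
  if (!ostree_repo_lock_push (repo, OSTREE_REPO_LOCK_EXCLUSIVE, cancellable, error))
    return FALSE;

  gboolean ret = FALSE;

  if (!ostree_repo_regenerate_summary (repo, NULL /* additional metadata */,
                                       cancellable, error))
    goto out;

  if (!ostree_repo_add_gpg_signature_summary (repo, key_ids, NULL /* homedir */,
                                              cancellable, error))
    goto out;

  ret = TRUE;

out:
  /* Best-effort unlock; a real implementation would propagate this error
   * too if nothing failed earlier. */
  ostree_repo_lock_pop (repo, OSTREE_REPO_LOCK_EXCLUSIVE, cancellable, NULL);
  return ret;
}
```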

@cgwalters cgwalters merged commit a07b8d6 into ostreedev:main Dec 6, 2021