ostree/prune: Calculate reachability under exclusive lock #2808

Merged on Feb 1, 2023 (1 commit)

@jlebon jlebon commented Jan 30, 2023

When we calculate the reachability set in `ostree prune`, we do this
without any locking. This means that between the time we build the set
and when we call `ostree_repo_prune_from_reachable`, new content
might've been added. This then causes us to immediately prune that
content since it's not in the now outdated set.

Fix this by calculating the set under an exclusive lock.

I think this is what happened in
fedora-silverblue/issue-tracker#405. While
the pruner was running, the `new-updates-sync` script[1] was importing
content into the repo. The newly imported commits were immediately
deleted by the many `ostree prune --commit-only` calls the pruner does,
breaking the refs.

[1] https://pagure.io/fedora-infra/ansible/blob/35b35127e444/f/roles/bodhi2/backend/files/new-updates-sync#_18
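
The race described above can be reproduced in a toy model. This is only a sketch under stated assumptions: `ToyRepo`, its single mutex, and the `prune` helper are hypothetical stand-ins, not libostree APIs, and the real repo lock distinguishes shared from exclusive holders. It shows that building the reachable set before taking the lock lets a concurrent import land in the gap, while taking the lock first makes the importer wait.

```python
import threading


class ToyRepo:
    """Hypothetical stand-in for a repo: commit ids plus refs pointing at them."""

    def __init__(self):
        self.objects = {"c1", "c2"}
        self.refs = {"stable": "c2"}
        self.lock = threading.Lock()  # stands in for the repo lock

    def reachable(self):
        return set(self.refs.values())

    def import_commit(self, ref, commit):
        # Writers take the lock, so an exclusive holder blocks them.
        with self.lock:
            self.objects.add(commit)
            self.refs[ref] = commit

    def delete_unreachable(self, reachable):
        # Caller must already hold self.lock.
        self.objects &= reachable


def prune(repo, lock_early):
    """Prune while a writer imports commit c3 concurrently."""
    writer = threading.Thread(target=repo.import_commit, args=("stable", "c3"))
    if lock_early:
        repo.lock.acquire()      # fixed ordering: lock, then scan
    reachable = repo.reachable()
    writer.start()
    if not lock_early:
        writer.join()            # the import completes inside the unlocked window
        repo.lock.acquire()      # old ordering: lock only around the deletion
    repo.delete_unreachable(reachable)
    repo.lock.release()
    writer.join()                # with lock_early, the import runs only now
    return repo


broken = prune(ToyRepo(), lock_early=False)
fixed = prune(ToyRepo(), lock_early=True)
print("unlocked scan, ref target present:", broken.refs["stable"] in broken.objects)
print("locked scan, ref target present:", fixed.refs["stable"] in fixed.objects)
```

In the unlocked case the ref ends up pointing at a commit that was deleted, which matches the broken refs reported in the linked issue.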

@cgwalters cgwalters left a comment

Ah, yes. Classic GC issue. The obvious optimization here would be to skip pruning any commits created (filesystem timestamp I guess) after the scan started.

But for now, SGTM.

jlebon commented Jan 30, 2023

> Ah, yes. Classic GC issue. The obvious optimization here would be to skip pruning any commits created (filesystem timestamp I guess) after the scan started.
>
> But for now, SGTM.

Good idea. Let me try this and see how hard it'd be. It's definitely unfortunate to be locking for the whole operation since it could take a long time to calculate reachability.

Hmm, though I guess the timestamp trick won't work if you're somehow using `rsync` to get content into the repo while preserving timestamps (I think this has come up before). I wonder if what we actually want here is a new `OstreeRepoPruneOptions` field where the client also passes the list of objects to consider for pruning. So we'd list the objects, then calculate reachability, and then pass the objects and the set into the prune API.

@jlebon jlebon changed the title ostree/prune: Calculate reachability under exclusive lock ostree/prune: List objects to prune before calculating reachability Jan 30, 2023
jlebon commented Jan 30, 2023

> I wonder if what we actually want here is a new OstreeRepoPruneOptions field where the client also passes the list of objects to consider for pruning. So we'd list the objects, then calculate reachability, and then pass the objects and the set into the prune API.

OK, updated this to do that now! It seems safer than the timestamp trick but does require more invasive public API changes.

Review thread on src/libostree/ostree-repo.c (outdated, resolved)
jlebon commented Jan 30, 2023

Well this is awkward... `make gir` doesn't generate code that respects `rustfmt`. Fixed!

@jlebon jlebon marked this pull request as draft January 31, 2023 14:57
@jlebon jlebon changed the title ostree/prune: List objects to prune before calculating reachability ostree/prune: Calculate reachability under exclusive lock Jan 31, 2023
jlebon commented Jan 31, 2023

OK, I ended up going back to just using an exclusive lock for this. The issue with the previous approach (listing objects and passing the list in) is that it doesn't really solve the race: refs are usually the last thing that gets updated when content is imported into the repo, so there's still a window in which newly imported objects appear unreferenced.
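
To make that window concrete, here is a minimal sketch of the abandoned ordering (list objects, compute reachability, prune only listed objects). The function and the in-memory `objects`/`refs` structures are hypothetical illustrations, not the API from the earlier revision of this PR. The importer has already written commit `c3` but has not yet updated its ref when the pruner takes its snapshot.

```python
def prune_listed_first(objects, refs, finish_import):
    """Toy version of 'list objects, then compute reachability, then prune'."""
    candidates = set(objects)           # 1. snapshot the objects eligible for pruning
    reachable = set(refs.values())      # 2. compute the reachable set
    objects -= candidates - reachable   # 3. prune listed objects that are unreachable
    finish_import()                     # the writer updates its ref last


objects = {"c1", "c2", "c3"}   # c3 is already on disk...
refs = {"stable": "c2"}        # ...but no ref points at it yet
prune_listed_first(objects, refs, lambda: refs.update(stable="c3"))
print(refs["stable"] in objects)   # the ref target has been pruned
```

The candidate list only protects objects written after the snapshot; anything written before it and referenced after it is still lost.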

The timestamp trick could be made to work, but (1) it would only work on archive repos, where the object files don't have canonicalized timestamps (in the non-archive case, we could scope it to just handling the `--commit-only` case), and (2) it seems risky to rely too much on filesystem timestamps (e.g. NTP adjustments, or some promotion workflow where timestamps are copied from elsewhere, such as when using `rsync -a`). We could make it work in a follow-up, but I think it needs to be opt-in.
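
A small sketch of why copied timestamps defeat the trick, using plain numbers instead of real file metadata. The `prunable` helper, the object names, and the times are all made up for illustration. An object copied in during the scan with its source mtime preserved looks old, so the timestamp filter does not protect it.

```python
def prunable(mtimes, reachable, scan_start):
    """Unreachable objects whose mtime predates the start of the scan."""
    return {
        obj
        for obj, mtime in mtimes.items()
        if obj not in reachable and mtime < scan_start
    }


scan_start = 1000
mtimes = {
    "old_garbage": 100,     # genuinely stale, safe to prune
    "fresh_import": 1005,   # written during the scan, protected by its mtime
    "rsynced_import": 200,  # copied during the scan, source mtime preserved
}
# No ref points at either import yet, since refs are updated last.
print(sorted(prunable(mtimes, reachable=set(), scan_start=scan_start)))
```

Only `fresh_import` survives; `rsynced_import` is selected for pruning even though it arrived after the scan began.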

Also, benchmarking the reachability calculation step revealed that it's far less expensive than the actual `ostree_repo_prune_from_reachable` call (where we already take an exclusive lock anyway), so optimizing this is not as valuable as initially thought.

@jlebon jlebon marked this pull request as ready for review January 31, 2023 22:15
@cgwalters

/retest

@cgwalters cgwalters merged commit 8616c4d into ostreedev:main Feb 1, 2023
@jlebon jlebon deleted the pr/lock-prune branch April 24, 2023 02:50