Skip to content

Fix race condition when pulling manifest with duplicate digests#688

Merged
amisevsk merged 1 commit intokitops-ml:mainfrom
amisevsk:pull-dupe-digests
Jan 15, 2025
Merged

Fix race condition when pulling manifest with duplicate digests#688
amisevsk merged 1 commit intokitops-ml:mainfrom
amisevsk:pull-dupe-digests

Conversation

@amisevsk
Copy link
Copy Markdown
Contributor

Description

When a manifests contains two layers with the same digest, pulling that ModelKit can fail if we try to download both (identical) layers simultaneously. The first download to win the race moves the file from the ingest/ directory to blobs/sha256/, and the second fails as the source file does not exist.

To avoid this, we pull each digest only once.

Linked issues

Closes #687

When a manifests contains two layers with the same digest, pulling that
ModelKit can fail if we try to download both (identical) layers
simultaneously. The first download to win the race moves the file from
the ingest/ directory to blobs/sha256/, and the second fails as the
source file does not exist.

To avoid this, we pull each digest only once.
@amisevsk amisevsk requested a review from gorkem January 10, 2025 21:20
var semErr error
// In some cases, manifests can contain duplicate digests. If we try to concurrently pull the same digest
// twice, a race condition will cause the pull the fail.
pulledDigests := map[string]bool{}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be a preallocated map

	pulledDigests := make(map[string]bool, len(toPull))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but in the case we're trying to solve this would be overallocating -- this is a set used to check whether we've seen each element before. With v0.4.0, we can create ModelKits that have the same layer dozens of times (e.g. empty readme.md files in different directories)

@amisevsk amisevsk merged commit debb97a into kitops-ml:main Jan 15, 2025
@amisevsk amisevsk deleted the pull-dupe-digests branch January 15, 2025 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Concurrently pulling a ModelKit that has multiple layers with the same digest can fail

2 participants