Ensure that content IDs are unique in a Nessie repository #7757

Merged
merged 4 commits into from Dec 4, 2023

@snazy snazy commented Nov 24, 2023

Nessie content IDs are randomly generated, but nothing currently guarantees that they are actually unique.

This change adds a new object type to ensure that a generated ID is unique by leveraging existing functionality of the Persist framework that already provides "INSERT IF NOT EXIST" guarantees.

From this change on, newly generated content IDs are verified to be unique. This change does not include functionality to automatically register already existing content IDs. IMHO that is probably okay for now, given the practically nonexistent probability of content-ID conflicts.
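
To illustrate the idea described above, here is a minimal sketch of how an "insert if not exists" guarantee can be used to verify a freshly generated ID. It is not the code from this PR and does not use the Nessie `Persist` API: `UniqueIdRegistry`, `tryClaim`, `generateUnique` and `newContentId` are hypothetical names, and a `ConcurrentMap` stands in for the backing store.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch, NOT the Nessie Persist API: a ConcurrentMap stands in for
// a store that offers an atomic "insert if not exists" operation.
final class UniqueIdRegistry {
  private final ConcurrentMap<String, Boolean> stored = new ConcurrentHashMap<>();

  /** Atomically records (space, value); returns false if it was already recorded. */
  boolean tryClaim(String space, String value) {
    // The length prefix keeps ("a", "b:c") distinct from ("a:b", "c").
    String key = space.length() + ":" + space + ":" + value;
    return stored.putIfAbsent(key, Boolean.TRUE) == null;
  }

  /** Draws candidates from the generator until one is claimed, up to maxAttempts. */
  String generateUnique(String space, Supplier<String> generator, int maxAttempts) {
    for (int i = 0; i < maxAttempts; i++) {
      String candidate = generator.get();
      if (tryClaim(space, candidate)) {
        return candidate;
      }
    }
    throw new IllegalStateException("No unique ID after " + maxAttempts + " attempts");
  }

  /** Example usage: a random UUID verified within an assumed "content-id" space. */
  static String newContentId(UniqueIdRegistry registry) {
    return registry.generateUnique("content-id", () -> UUID.randomUUID().toString(), 5);
  }
}
```

Only IDs that pass through `tryClaim` are known to the registry, which mirrors the limitation mentioned above: IDs created before the check existed are not covered unless they are registered separately.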

@snazy snazy requested a review from adutra November 28, 2023 09:05
@snazy snazy force-pushed the ensure-unique-ids branch 2 times, most recently from 0ab4be2 to 2c94b17 Compare December 1, 2023 16:44

@adutra adutra left a comment

LGTM although there are multiple conflicts because of #7771.

@@ -122,4 +123,10 @@ static ObjId stringDataHash(
    hasher.putBytes(text.asReadOnlyByteBuffer());
    return hashAsObjId(hasher);
  }

  static ObjId uniqueIdHash(String space, String value) {

One general remark: wouldn't it be simpler to model the value as an opaque byte array? It seems we could save some conversions to and from string in the common case where the value is a UUID.
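
For illustration, a minimal sketch of what the byte-array representation of a UUID could look like, compared with its string form. `UuidBytes` and its methods are hypothetical helpers and not part of the PR; only `java.util.UUID` and `java.nio.ByteBuffer` from the JDK are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, not part of the PR: compares the compact binary form of a
// UUID with its textual form.
final class UuidBytes {
  /** Encodes a UUID as 16 bytes, most significant bits first. */
  static byte[] toBytes(UUID id) {
    return ByteBuffer.allocate(16)
        .putLong(id.getMostSignificantBits())
        .putLong(id.getLeastSignificantBits())
        .array();
  }

  /** Decodes the 16-byte form produced by toBytes. */
  static UUID fromBytes(byte[] bytes) {
    if (bytes.length != 16) {
      throw new IllegalArgumentException("Expected 16 bytes, got " + bytes.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    return new UUID(buf.getLong(), buf.getLong());
  }

  /** The string route: 36 characters that have to be formatted and re-encoded. */
  static byte[] toStringBytes(UUID id) {
    return id.toString().getBytes(StandardCharsets.UTF_8);
  }
}
```

The binary form takes 16 bytes against 36 for the UTF-8 string, and it avoids formatting and parsing the textual representation when the caller already holds a `UUID`.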

import org.projectnessie.versioned.storage.common.persist.Persist;

/**
* Describes the <em>internal</em> state of a reference when it has been created, managed by {@link

The javadoc seems to refer to RefObj.

@snazy snazy merged commit b921c82 into projectnessie:main Dec 4, 2023
17 checks passed
@snazy snazy deleted the ensure-unique-ids branch December 4, 2023 17:14