Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Override equals for IndividualUriValues #112

Merged
merged 4 commits into from
Oct 1, 2022
Merged

Override equals for IndividualUriValues #112

merged 4 commits into from
Oct 1, 2022

Conversation

goneall
Copy link
Member

@goneall goneall commented Sep 24, 2022

This will cause any individual URI values to be equal if their URI's are equal.

Fixes #108

Signed-off-by: Gary O'Neall gary@sourceauditor.com

This will cause any individual URI values to be equal if their URI's are
equal.

Fixes #108

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
@goneall
Copy link
Member Author

goneall commented Sep 25, 2022

@nicoweidner - Let me know if this change looks reasonable - I'll squash the commits before merging.

Also - I'm thinking about spinning a release in the next couple of days with the fixes, let me know if there are any issues you think need to be fixed before the release.

Copy link
Contributor

@nicoweidner nicoweidner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to understand how the classes relate. Assuming that any IndividualUriValue is wholly defined by its individual URI (and all other properties are irrelevant/unused), I think these changes make sense.

There are some technical cases that make me feel uneasy, though (I am a mathematician, I am always looking for pathological cases):

  • There are some cross-combinations where e.g. a ExternalExtractedLicenseInfo could be equal to a ExternalSpdxElement if their URIs agree. I think this cannot actually happen because of format requirements on their ids. I didn't check the patterns or combinations exhaustively.
  • A IndividualUriValue can never be equal to a ModelObject that is not an instance of IndividualUriValue - but a ModelObject can be equal to a IndividualUriValue if their ids agree (in cases where a class extends both).

The first one might actually be considered fine. If the URI is the defining property, then even if the situation is possible or becomes possible by future changes, I can see it being fine to consider such elements equal.

The second one seems problematic though. It is breaking symmetricity of equals (I am all about symmetricity these days 😀 ), and this scenario is actually possible if one picks somewhat artificial ids. I added the following lines to your first test case, and it passed, accepting the annotation as equal to the license info:

ExternalExtractedLicenseInfo eel = new ExternalExtractedLicenseInfo(modelStore, namespace, externalDocId + ":" + id, copyManager, true);
Annotation annotation = new Annotation(modelStore, namespace, "DocumentRef-externalDoc:LicenseRef-ID", copyManager, true);
assertEquals(annotation, eel);

I wonder if this could be solved by adding a more precise class check in ModelObject#equals, i.e.: Instead of

if (!(o instanceof ModelObject)) {
    return false;
}

use

if (o == null || o.getClass() == this.getClass()) {
	return false;
}

Do you see any cases (except IndividualUriValue) where instances of ModelObject should be considered equal even though they are not instances of the exact same class?

Comment on lines 228 to 239
@Override
public boolean equals(Object compare) {
if (!(compare instanceof IndividualUriValue)) {
return false;
}
return ((IndividualUriValue)compare).getIndividualURI().equals(documentNamespace);
}

@Override
public int hashCode() {
return documentNamespace.hashCode();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExternalDocumentRef does not implement IndividualUriValue, so requiring the comparison object to be an instance of IndividualUriValue feels wrong. Was this added by accident, or is there some implicit logic relating the two?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - I believe this is was an incorrect addition. I'll update the PR removing this check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In looking back at the code, the override is for an IndividualUriValue created inline for a property value, so I believe the code is correct:

			setPropertyValue(SpdxConstants.PROP_EXTERNAL_SPDX_DOCUMENT,
					new IndividualUriValue() {
						@Override ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you are totally right, I didn't look close enough 😳
But then I wonder why a SimpleUriValue with documentNamespace as uri isn't used here instead of the anonymous class? I assumed that SimpleUriValue was meant as a generic implementing class of IndividualUriValue, and I believe it would behave the same (apart from a technical difference in hashCode that shouldn't matter)

@goneall
Copy link
Member Author

goneall commented Sep 27, 2022

@nicoweidner Thanks for the extensive review. Below are a few responses:

There are some cross-combinations where e.g. a ExternalExtractedLicenseInfo could be equal to a ExternalSpdxElement if their URIs agree.

This cannot happen if the spec is followed since the ID's must have different prefixes. The code that re-inflates the objects from the individuals uses the unique prefixes to build the correct object, so it can not happen in practice either.

Do you see any cases (except IndividualUriValue) where instances of ModelObject should be considered equal even though they are not instances of the exact same class?

I can't think of any specific cases, but I would be nervous about making this assumption. Similar to IndividualUriValue, a ModelObject would be considered equal if the ID's are equal and they are in the same model store.

There is a fundamental assumption in the spec (and carried forward in the code) that the ID's are unique for each element/ModelObject.

Rather than changing equals in ModelObject, perhaps we should prevent new ModelObjects of a different type from being created using the same ID's. In your example above, the creation of the Annotation should fail for 2 reasons:

  • The ID uses a LicenseRef- prefix which is reserved for license types
  • The ID already exists in a different class

If you assume that ID's are unique, would the equals be considered symmetric?

@nicoweidner
Copy link
Contributor

@nicoweidner Thanks for the extensive review. Below are a few responses:

There are some cross-combinations where e.g. a ExternalExtractedLicenseInfo could be equal to a ExternalSpdxElement if their URIs agree.

This cannot happen if the spec is followed since the ID's must have different prefixes. The code that re-inflates the objects from the individuals uses the unique prefixes to build the correct object, so it can not happen in practice either.

Yeah, I think this case is fine.

Do you see any cases (except IndividualUriValue) where instances of ModelObject should be considered equal even though they are not instances of the exact same class?

I can't think of any specific cases, but I would be nervous about making this assumption.

Do you mean you are hesitant to change the ModelObject#equals method to require strict class equality? My idea was to put in this requirement for the method on the ModelObject level, and then if some subclass wants to use a different implementation, it can override it (as is done for the implementations of IndividualUriValue).

Similar to IndividualUriValue, a ModelObject would be considered equal if the ID's are equal and they are in the same model store.

Do you consider one model store as basically the same as one document? Because technically, it would check for documentUri, not the store, right? (unless it's an anonymous id)
I am asking because while working on some tooling, I dealt with conversion cases, where two stores for the same document can be in play (for two different formats), so they would have the same documentUri/namespace.

There is a fundamental assumption in the spec (and carried forward in the code) that the ID's are unique for each element/ModelObject.

Rather than changing equals in ModelObject, perhaps we should prevent new ModelObjects of a different type from being created using the same ID's. In your example above, the creation of the Annotation should fail for 2 reasons:

  • The ID uses a LicenseRef- prefix which is reserved for license types
  • The ID already exists in a different class

If you assume that ID's are unique, would the equals be considered symmetric?

Haha... I started writing a similar comment in response to your "id's are expected to be unique" statement, then realized you had already written basically the same thing.
I believe that should work as well, yes. I would prefer a solution that doesn't even leave technical loopholes, but it seems fine in practice.
The only situation I can think of that could produce weird results would be the conversion scenario mentioned above, where I have two model stores describing the same document, for different formats (say, tag and json). Then my "counterexample" above would still work. It does appear rather far-fetched at this point though, and if the spec requires specific id formats, I'd say it can be ignored (it would be nice if any id format requirements are also enforced, though!).

Offtopic remark: Thanks for being patient in your explanations of the backgrounds! As I'm still missing a lot of context knowledge, being pedantic helps me understand the precise relationships between objects.

@goneall
Copy link
Member Author

goneall commented Sep 27, 2022

Do you mean you are hesitant to change the ModelObject#equals method to require strict class equality? My idea was to put in this requirement for the method on the ModelObject level, and then if some subclass wants to use a different implementation, it can override it (as is done for the implementations of IndividualUriValue).

Yes - I'm a bit hesitant since the namespace + ID's must be unique and if the namespace + ID is the same, then the should be equal - even if they are instantiated in different classes. BTW - if everything is working properly, than the classes shouldn't be different. I would just want that namespace + ID to take precedence on equality.

Do you consider one model store as basically the same as one document? Because technically, it would check for documentUri, not the store, right? (unless it's an anonymous id)
I am asking because while working on some tooling, I dealt with conversion cases, where two stores for the same document can be in play (for two different formats), so they would have the same documentUri/namespace.

Typically, there is one document in one model store, but the design allows for multiple document URI's and multiple documents in the same store. Most IModelStore methods take a documentUri as one of the parameters. You could not, however, have 2 documents with the same URI in the same ModelStore The CopyManager takes care of conversions between the model stores.

it would be nice if any id format requirements are also enforced, though

This would be my preferred approach. There is a field strict which causes exceptions when an object is created or a parameter set which technically doesn't meet the spec. I'm thinking we should check the ID format and throw an exception if strict is true, otherwise report a validation error.

Offtopic remark: Thanks for being patient in your explanations of the backgrounds! As I'm still missing a lot of context knowledge, being pedantic helps me understand the precise relationships between objects.

No worries - I'm very appreciative of the review and comments. It really helps build a better library.

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
This reverts commit 7b78b28.

Commit was added to the incorrect branch
@sonatype-lift
Copy link
Contributor

sonatype-lift bot commented Sep 28, 2022

⚠️ 28 God Classes were detected by Lift in this project. Visit the Lift web console for more details.

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
@goneall
Copy link
Member Author

goneall commented Sep 29, 2022

@nicoweidner I did some refactoring of the common code for IndividualUriValue equals and hashCode.

Unless you see any major issues, I'll merge this after it passes the CI. I'll submit another PR that should help with the symmetric equals.

Copy link
Contributor

@nicoweidner nicoweidner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only left some small comments, looks good to me now!

@goneall goneall merged commit 3d15a78 into master Oct 1, 2022
@goneall goneall deleted the issue108 branch October 1, 2022 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Comparing IndividualUriValues in ModelObject#listsEquivalent may be broken?
2 participants