Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: remove equals method #150

Closed
RubenVerborgh opened this issue Jan 28, 2019 · 19 comments
Closed

Proposal: remove equals method #150

RubenVerborgh opened this issue Jan 28, 2019 · 19 comments

Comments

@RubenVerborgh
Copy link
Member

RubenVerborgh commented Jan 28, 2019

In #104, there was a suggestion to drop the .equals method from Term and Quad. Let's express opinions with arguments here. Please be clear and brief.

Depends on #151: the assumption for including an equals method at all is that we come to an agreement of what it would look like in #151. Failure to reach such an agreement would likely also mean that equals has to be removed, regardless of the outcomes of this thread. So I would suggest to assume here that #151 succeeds with one of the three possible outcomes in there. If you disagree with any of those outcomes, #151 is probably be the right place.

Please vote with 👍 or 👎 on this message.

@RubenVerborgh
Copy link
Member Author

RubenVerborgh commented Jan 28, 2019

I vote 👎 because our goal with RDF/JS is to make an idiomatic JavaScript API. In such APIs, objects can have properties, methods, etc. The idiomatic OO way to check for equality, in absence of operator overloading, is a member method.

Also, there are zero use case for using "plain objects", not even for exchange, given that better serialization formats for RDF exist. We should not aim to invent another RDF serialization.

@elf-pavlik
Copy link
Member

elf-pavlik commented Jan 28, 2019

I vote 👍 because I see different implementers having mutually exclusive expectations to how such member method should work (chat comment). As I understand implementations can still provide such custom member method behaving as this implementation expects, at the same time libraries which aim to work only based on what the spec defines can still use a function of their choice which will behave in a way they expect. (EDIT: It all depends on resolution of #151 )

I think we still haven't documented how different data factories supposed coexist (eg. https://github.com/rdfjs/representation-task-force/issues/79#issuecomment-457259704) and haven't followed up on converting instances created with one factory to instances of another factory #137 Possibly having more clarity in that aspect will allow implementers to evaluate difference between having a 'blessed' .equals() member methods in the spec and having them as custom addition of specific implementation. Currently as I understand without 'blessed' .equals() implementation would need to:

  • use function that takes two parameters and compares them
  • ensure that all instances come from the factory which implements custom member method which matches their expectations

not even for exchange, given that better serialization formats for RDF exist. We should not aim to invent another RDF serialization.

👍

@RubenVerborgh

This comment has been minimized.

@earthlyreason
Copy link

I vote 👍 because the API should have correct semantics.

The current proposal is incorrect to define equals as an instance method. Idiomatic JavaScript uses static functions for operations that cannot be meaningfully overridden, and equality is a static operation. Proof. Equality is symmetric: for any two implementations A and B, a.equals(b) and b.equals(a) must always give the same result. There is no possible room for implementations—let alone instances—to differentiate on comparison.

Example. Suppose that an implementation used persistent data structures to support reference equality for comparison. That would work only for values that it constructed, but would fail for values constructed by others. In supporting this kind of specialization, the current proposal assumes that "interop" means cross-runtime interop, neglecting shared-runtime interop.

Incidentally, values and mechanisms have distinct semantics, and JS does in fact have API's that use plain objects to represent plain data, which is what RDF structures are. For example, the iterator protocol uses plain { value, done } literals to communicate between producers and consumers.

@RubenVerborgh
Copy link
Member Author

RubenVerborgh commented Jan 28, 2019

equality is a static operation. Proof. Equality is symmetric: for any two implementations A and B, a.equals(b) and b.equals(a) must always give the same result.

Sorry for being the annoying one here 🙂
But I need to point out that the (valid) argument that you are providing is insufficient proof for your thesis. Staticness and symmetry are orthogonal.

Case in point: literal.equals(term) and term.equals(literal) are two different equals methods, because one is a Literal and the other is a Term. So they are symmetric, but not static.

There is no possible room for implementations—let alone instances—to differentiate on comparison.

I have a very good point, and that point is performance:
N3.js will do a quick internal check first before trying the official algorithm (https://github.com/rdfjs/N3.js/blob/v1.0.3/lib/N3DataFactory.js#L183-L187).

So I am exactly doing what you are saying here:

Suppose that an implementation used persistent data structures to support reference equality for comparison.

but

would fail for values constructed by others.

is incorrect because there obviously is a (slower) fallback path.

So definitely registering your vote, but those three particular arguments might need stronger evidence.

@awwright
Copy link
Member

awwright commented Jan 29, 2019

I tend to agree with @elf-pavlik. In #144 I pointed out that Quad#equals isn't very useful, because implementations of data stores will want to implement their own utility functions for identity and data lookups; and the fact that RDF defines equality only over triples and not quads makes Quad#equals somewhat useless in other contexts.

I feel better about Term#equals, however. I think it's appropriate for an RDF API to define a comprehensive API, the same way the DOM API defines an API over DOM. The only issue there is how we handle invalid literals that might occur because of datatype entailment. I'll be researching this in the near future, I don't have a suggestion at the moment.

@RubenVerborgh
Copy link
Member Author

More votes? @blake-regalia? Others?

@timbl
Copy link

timbl commented Jan 31, 2019

Equality between terms is a a pretty fundamental function and if you can't get interoperability between versions on this, then hope to actually be able to interchange libraries seems far off. rdflib.js used to use sameTerm() but has been changing that to equals() to follow this spec.

If @elf-pavlik by "libraries which aim to work only based on what the spec defines can still use a function of their choice which will behave in a way they expect." you mean libraries using this API will have to have implementation aware code then that would be failure - or do I misunderstand?

@earthlyreason
Copy link

earthlyreason commented Jan 31, 2019

@RubenVerborgh, thanks for your thoughtful critique, and indeed for opening this issue.

For context, I am working on a system where independent components can communicate intent, capabilities, and other metadata (y'know, semantic web) to support visibility, portability, composition, tooling, etc. RDF is the obvious choice for this. In some cases this will involve small, standalone functions that do one specific thing, and could themselves be defined as RDF literals (with a type indicating they are JS code). It would be preferable if such functions could emit usable triples without any outside references.

I would maintain that equals is semantically implementation-independent, certainly instance-independent, and from that perspective a static method would be expected.

Yes, there can be performance or other reasons for wanting to use reference identity for comparison. I have run into a case, using rdf.js terms with @thi.ng/rstream-query, which uses reference identity (before falling back to value comparison). That fallback would work in principle, but I also want to extend the terms with additional properties (e.g. runtime value).

However, you can still do these optimized comparisons using static (or ad-hoc) methods. Note that even your N3 implementation (which is full of good ideas) never depends on the equals method of incoming objects, i.e. all of the comparison is using your module's known functions. My implementation (for the above problem) is likewise internal to the rstream-query wrapper and doesn't rely on any .equals methods on incoming terms. Of course, implementations can provide instance methods as extensions, since clients will usually know which implementation they are using (I am extending terms with deserialized runtime value via deref()).

I don't agree that Literal.equals and Term.equals are "two different methods." A Literal.equals that only works on other literals is useless as such (since s, p, and o's will usually be mixed), i.e. there is only Term.equals. Indeed, @awwright brings up a good point that equality is not formally defined for quads. The document RDF 1.1: On Semantics of RDF Datasets makes it very clear that entailment across graphs was left unspecified.

The OP is to remove equals. I only vote for removing it from instances, although I think that in practice an optimized implementation will use its own method anyway. If this is the only instance method in the interface and you're not planning on making breaking changes to the spec, then moving it to a static method will (IMHO) dramatically increase interoperability by supporting plain objects, while still allowing implementation-defined extensions.

@elf-pavlik
Copy link
Member

If @elf-pavlik by "libraries which aim to work only based on what the spec defines can still use a function of their choice which will behave in a way they expect." you mean libraries using this API will have to have implementation aware code then that would be failure - or do I misunderstand?

I meant that some libraries (we could think of them 'lower lever') will choose to only rely on the core @rdfjs/data-model-spec and if we don't define .equals() or define it differently than they would expect #151 they will use equality function they choose to compare terms and quads. This way it should work on instances produced by any compliant rdfjs data factory.

Other applications (we could think of them 'higher level') may choose an implementation of data factory which provides some convenient extensions to complement the minimal spec. In that case, application will expect that all instances get created with that specific data factory and sometimes it may need to convert instances created by other factories #137. #79 also plays role here since it can allow application to provide its preferred data factory as soon as possible and this way minimize need for conversion.

So far we don't have anything on this page: https://github.com/rdfjs/representation-task-force/wiki/Additional-properties but i might seen some custom extensions in various factories

We might need to clarify our strategy for

  • how to write libraries which only depend on minimal @rdfjs/data-model-spec
  • how to write applications which use specific data factory with some custom extensions

@blake-regalia
Copy link
Contributor

blake-regalia commented Feb 1, 2019

re @RubenVerborgh

The idiomatic OO way to check for equality, in absence of operator overloading, is a member method.

Disregarding the operator overloading part, I agree that the idiomatic way for comparing values in these cases is to use instance methods, e.g., localeCompare is an instance method of String and not a static one. This might be the closest precedent from ES.

Also, there are zero use case for using "plain objects" not even for exchange, given that better serialization formats for RDF exist. We should not aim to invent another RDF serialization.

I completely disagree. term.equals({termType:'NamedNode', value:'http://ex.org/value'}) is a use case, whether or not you think it is worse/better than term.equals(factory.namedNode('http://ex.org/value')). We are not trying to invent another serialization format, but we are trying to invent a data structure and "plain objects" are marshaled versions of that. Being able to un/marshal a data structure quickly is done all the time in programming, seldomly for the purpose of interchange, and for good reasons!


re @gavinpc-mindgrub

The OP is to remove equals. I only vote for removing it from instances

and @timbl

Equality between terms is a a pretty fundamental function and if you can't get interoperability between versions on this, [...]

I think this thread may have accidentally conflated "remove equals" with "move equals to static method" and its possible some of the thumbs down votes are merely against the former?


Me: I have mixed feelings about this. IMO, @elf-pavlik makes a lot of good points even if a static method would be the "less idiomatic" way. I think it's dangerous to just defend with "my way is more idiomatic" if there are actual considerations for the alternative. A static method would indeed offer more potential benefits than an instance method, including not having to worry about the ordering of two subjects for comparison, e.g., a.equals(b) when b is allowed to be null/undefined is terrible IMO since this gives warrant to assume you can also do b.equals(a) whereas fac.sameTerm(a, b) or whatever avoids this problem.

@elf-pavlik elf-pavlik added this to the data-model-spec milestone Feb 7, 2019
@bergos
Copy link
Member

bergos commented Feb 11, 2019

There was enough time to exchange up arguments. Let's close this issue. Based on 2 👍 and 8 👎 votes:

RESOLVED: We keep the .equals method of Term and Quad.

@bergos bergos closed this as completed Feb 11, 2019
@awwright
Copy link
Member

@bergos Is voting on this really the best way to come to a consensus? My argument is a bit more nuanced than just a simple up or down vote.

@earthlyreason
Copy link

@bergos I would also add that per @blake-regalia's comment above, it may not have been clear what was being voted on.

The definition of equality is clear, the spec defines it, and implementations should include it. The objection leading to this issue (#104) was just about the fact that requiring instance methods (any instance methods) means that ad hoc objects (such as {termType: "BlankNode", value: "example"}) cannot be conformant, and why? Even if the spec only stipulates a static method for comparison, it's still perfectly possible to provide optimized (e.g. reference or id-based) equality where it matters for performance. In such cases, you would not be relying on methods of incoming objects, anyway.

@bergos
Copy link
Member

bergos commented Feb 11, 2019

Is voting on this really the best way to come to a consensus? My argument is a bit more nuanced than just a simple up or down vote.

@awwright The best way would have been to propose one or more other options to vote for, like we did it in #151, early after this issue was opened. But at some point we have to accept what the majority voted for doesn't go in that direction and discussing and voting again will not change the result.

I would also add that per @blake-regalia's comment above, it may not have been clear what was being voted on.

@gavinpc-mindgrub I think it was clear what we voted on and just the comment by @elf-pavlik was unclear to @timbl . Also keep in mind that even in the same comment @timbl mentioned the .sameTerm() and .equals() methods of rdflib.js. So it should be clear what his opinion is on that topic.

@elf-pavlik
Copy link
Member

For the record @rubensworks comment from rdfjs/stream-spec#16 (comment)

The equals() method on quads and terms is related to this,
which IMHO does more harm than good.
Personally, I ran into more problems related to it (WASM, NAN, ...),
than cases where I actually was able to use it.
As such, I strongly disagree on adding such a method here as well.

@rubensworks could you possibly share more details about problems you ran into?

@rubensworks
Copy link
Member

This issue illustrates an example of the difficulties that are encountered with the .equals() method: LinkedDataFragments/HDT-Node#35

In summary, RDF quads are created in C++ world, which should then be transferred into JS world through the NAN interface.
The .equals() method significantly complicates this process as the transferred objects are not just plain objects anymore.

Note that I'm not saying that the .equals() method should be removed from the spec. I think it's too late for that now, as RDFJS apps are now extensively making use of this.
But if we were ever to create a 2.0 version of the spec, then I would propose rethinking this issue.

@RubenVerborgh
Copy link
Member Author

Also, I've heard a similar argument for WebWorkers, which can only pass simple objects.

(That said, I think it would/should be straightforward to slap a prototype on them.)

@bergos
Copy link
Member

bergos commented Feb 5, 2020

I think this topic is a little bit bigger than just the equals method.

The possibility to pass around a factory to get instances of Term and Quad objects with custom methods without creating copies is part of foundation to find an agreement on this spec. Otherwise it would not have been possible to bring different parties together.

About the plain object problem: Why not use a factory that creates instances like this:

class NamedNode {
  constructor (iri) {
    this.plainObject = {
      termType: 'NamedNode',
      value: iri
    }
  }

  get value () {
    return this.plainObject .value
  }
}

If you need a plain object, just access term.plainObject. You know you need to create a copy if the property doesn't exist.

Btw. the plainObject property could be added to our list of custom properties so there are no name clashes and maybe other libraries will even adopt that property.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants