Document equality/equivalence semantics #36

Closed
ctrueden opened this issue Sep 22, 2019 · 6 comments · Fixed by scijava/incubator#94

@ctrueden
Member

ctrueden commented Sep 22, 2019

Historically, it has been a requirement of ops that two ops with the same name accepting the same inputs should produce equivalent output, where "equivalent output" means equality of content. This has nice consequences, including facilitating reproducibility and caching. However, there are questions:

It is too draconian with respect to "create" ops that implement Source, since two ops of the same name may return different sorts of objects that are not comparable with one another, given the same inputs (that is to say: no inputs).

If you consider the output type as part of the "OpRef uniqueness signature" then there is no conflict, because the output types differ. We could have e.g. a LongType create() with names="create, create.integerType, create.longType" and a DoubleType create() with names="create, create.floatingType, create.doubleType" (at higher priority, so that a request for a "create" op of type Source<T extends RealType<T>> prefers the DoubleType creator).
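
A minimal sketch of that idea, assuming Java 16 or later for records. `CreateOp` and `CreateOpMatcher` are hypothetical names invented for illustration and are not part of the SciJava Ops API; `Long`, `Double` and `Number` stand in for LongType, DoubleType and RealType, and the priority values are arbitrary.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical descriptor of a no-input "create" op (not the real Ops API).
record CreateOp(Set<String> names, Class<?> outputType, double priority, Supplier<?> source) {}

final class CreateOpMatcher {

    // Long and Double stand in for LongType and DoubleType.
    static final List<CreateOp> OPS = List.of(
        new CreateOp(Set.of("create", "create.integerType", "create.longType"),
            Long.class, 0.0, () -> 0L),
        new CreateOp(Set.of("create", "create.floatingType", "create.doubleType"),
            Double.class, 100.0, () -> 0.0));

    // Returns the highest-priority op registered under the given name
    // whose output type is assignable to the requested output type.
    static Optional<CreateOp> find(String name, Class<?> requestedOutput) {
        return OPS.stream()
            .filter(op -> op.names().contains(name))
            .filter(op -> requestedOutput.isAssignableFrom(op.outputType()))
            .max(Comparator.comparingDouble(CreateOp::priority));
    }

    public static void main(String[] args) {
        // A broad request matches both ops; priority decides.
        System.out.println(find("create", Number.class).get().outputType());
        // A narrow request matches only the Long creator.
        System.out.println(find("create", Long.class).get().outputType());
    }
}
```

Because the requested output type takes part in the match, the two ops never compete for the same narrow request, and priority only matters when the request is broad enough to admit both.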

However, "equality of content" is subtle and complex. Do a LongType and a DoubleType have equal content if their getRealDouble() methods return the same value? Do a CellImg and an ArrayImg have equal content if they are the same type with the same values and the same dimensionality? Intuitively that was the idea, although it is important to note that they will not compare as equal via equals, because those classes do not override equals. This invites the possibility of additional new ops: equals or perhaps equalContent, which tests two objects for this sort of content equality; and compatible or perhaps isContentComparable, which discerns whether it even makes sense to consider this question across two types.

We already have copy computer ops that require you to specify the preallocated output object into which the content of the input will be copied. It would make sense to formally state that doing equalContent(in, out) after calling copy(in, out) should always return true; similarly, if isContentComparable(in, out) returns false then calling copy(in, out) should not find any ops to perform the work, since it makes no sense.
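
The proposed contract could be written down roughly as follows. This is only a sketch: plain `double[]` arrays stand in for images, and `copy`, `equalContent` and `isContentComparable` are hypothetical static helpers named after the ops discussed above, not existing ops.

```java
import java.util.Arrays;

// Hypothetical helpers; double[] arrays stand in for image containers.
final class ContentContract {

    // In this toy model, content is comparable only when the lengths match.
    static boolean isContentComparable(double[] in, double[] out) {
        return in.length == out.length;
    }

    // Element-wise content equality, independent of Object.equals.
    static boolean equalContent(double[] in, double[] out) {
        return Arrays.equals(in, out);
    }

    // Computer-style copy: writes the content of in into the preallocated out.
    static void copy(double[] in, double[] out) {
        if (!isContentComparable(in, out)) {
            // Models "no op found to perform the work".
            throw new IllegalArgumentException("no matching copy op");
        }
        System.arraycopy(in, 0, out, 0, in.length);
    }

    public static void main(String[] args) {
        double[] in = {1.0, 2.0, 3.0};
        double[] out = new double[3];
        copy(in, out);
        System.out.println(equalContent(in, out)); // content matches after copy
        System.out.println(in.equals(out));        // identity comparison only
    }
}
```

A unit test over every registered copy op could assert the first property directly, which is the kind of programmatic validation suggested below.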

My feeling is that there are many subtleties and edge cases surrounding these behaviors, so: A) these design decisions are crucial, and we need to write them down as formally as possible; and B) where feasible, we need to programmatically validate them in unit tests etc.

@gselzer gselzer added this to the 1.0.0 milestone Jun 26, 2020
gselzer added a commit to scijava/incubator that referenced this issue Jun 30, 2020
This lines up better with expectation. It is otherwise impossible for a
user to create two equal (equivalent? See scijava/scijava#36)
OpRefs, even if they pass the exact same Objects to the matcher
@ctrueden ctrueden moved this from Backlog to To do in SciJava Ops Paper #1: Java Apr 29, 2021
@gselzer
Member

gselzer commented May 13, 2021

Time commitment: 3 weeks

@ctrueden @hinerm let's have a chat about this sometime in the summer

@gselzer
Member

gselzer commented Jun 21, 2022

@ctrueden let's keep this in mind when looking at imagej/imagej-ops#644

@ctrueden
Member Author

See also related discussion on Image.sc Zulip.

@ctrueden
Member Author

ctrueden commented Jun 23, 2022

@gselzer and I spoke about this question in more detail again today. Here is what we decided, some of which I am reiterating from the Zulip thread, and some of which is new.

Difference between "equality" and "equivalence": Equivalence – "In formal language theory, weak equivalence of two grammars means they generate the same set of strings, i.e. that the formal language they generate is the same. In compiler theory the notion is distinguished from strong (or structural) equivalence, which additionally means that the two parse trees are reasonably similar in that the same semantic interpretation can be assigned to both".

Here, we mostly use the term "equivalence" to avoid confusion with Java's formal equals method. The ideas being discussed here are not intended to be encoded into equals. We do not intend to imply that the forms of equivalence discussed here are completely analogous to the definitions above.

When it comes to op implementations, there are three kinds of equality/equivalence that occur to me, which I feel are worth discussing:

  1. The two ops implement the same algorithm, from a computer science theory perspective. For example, filter.gauss implements the idea of a Gaussian neighborhood filter. The details of that will depend on exactly which sorts of inputs are given (can you specify the neighborhood shape? can you specify the sigma value per image dimension, or only globally for the entire N-dimensional image? does this particular implementation even work beyond 2D or 3D? etc.)
  2. The two ops implement the same algorithm with the same input and output kinds, where a "kind" is an (extensible) collection of types. For example, filter.gauss(ImagePlus, double) -> ImagePlus and filter.gauss(RandomAccessibleInterval, RealType) -> RandomAccessibleInterval both boil down to filter.gauss(image, number) -> image. This is the concept of an OpListing as defined in imagej/imagej-ops#644 ("Rewrite OpSearchResults as signatures"), or an op simplification as implemented in scijava/incubator#25 ("Create simplification framework").
  3. The two ops implement the same algorithm with the same input and output kinds, and produce equal output values, by some definition of equality on those output kinds.
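
To make the first two notions concrete, here is a rough sketch assuming Java 16 or later. `OpDescriptor`, its fields, and the kind name "numbers" are hypothetical illustrations, not the actual OpListing or simplification API. The third notion is deliberately left out because it depends on a definition of output equality.

```java
import java.util.List;

// Hypothetical descriptor: an op name plus its simplified input/output kinds.
record OpDescriptor(String name, List<String> inputKinds, String outputKind) {

    // Concept 1: same theoretical algorithm, identified by name.
    boolean sameAlgorithm(OpDescriptor other) {
        return name().equals(other.name());
    }

    // Concept 2: same algorithm with the same input and output kinds.
    boolean sameSimpleSignature(OpDescriptor other) {
        return sameAlgorithm(other)
            && inputKinds().equals(other.inputKinds())
            && outputKind().equals(other.outputKind());
    }
}

final class EquivalenceDemo {
    public static void main(String[] args) {
        // filter.gauss(ImagePlus, double) -> ImagePlus, simplified to kinds.
        OpDescriptor ij = new OpDescriptor("filter.gauss", List.of("image", "number"), "image");
        // filter.gauss(RandomAccessibleInterval, RealType) -> RandomAccessibleInterval.
        OpDescriptor rai = new OpDescriptor("filter.gauss", List.of("image", "number"), "image");
        // A hypothetical variant taking one sigma per dimension.
        OpDescriptor perDim = new OpDescriptor("filter.gauss", List.of("image", "numbers"), "image");

        System.out.println(ij.sameAlgorithm(perDim));       // same name
        System.out.println(ij.sameSimpleSignature(rai));    // same kinds
        System.out.println(ij.sameSimpleSignature(perDim)); // different input kinds
    }
}
```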

Here are the things that @gselzer and I concluded in our conversation.

  • Equal algorithms. Equivalence concept (1) is achieved by giving the two ops the same name: filter.gauss. You can check whether two ops are equivalent in this "same theoretical algorithm" regard with something like op1.name().equals(op2.name()).

  • Equal simplifications. Equivalence concept (2) is achievable by checking that the two ops have the same simplification: op1.listing().equals(op2.listing()) in the case of the Ops v1 work in imagej/imagej-ops#644 ("Rewrite OpSearchResults as signatures"), or... some other way in the new SciJava Ops (I don't know off the top of my head as of this writing, but @gselzer feel free to comment on this if you know).

  • Equal outputs. Equivalence concept (3), the concept described in the summary of this issue, is very strict, and in practice will not be achievable in an extensible way, and we should not require this. We want to have, e.g., filter.gauss(RAI, double) backed by imglib2-algorithm, and filter.gauss(ndarray, float) backed by scikit-image, and the chances these produce identical results are low, even if we ensure they use e.g. the same neighborhood shape. There could be floating point rounding errors or differences across languages and platforms. I don't think we can even guarantee that you get exactly the same result, pixel for pixel, even for the same op across different operating system architectures.

    • For example, in Java on macOS, reading a JPEG into a BufferedImage object would (at least historically, and maybe still) produce slightly different values than doing so with non-Mac JVMs. This is a pragmatic limitation to the dream of "write once, run anywhere" that we need to accept and move on from, with the understanding that we can at least get close to identical in the vast majority of cases if not all cases.
    • Another example is OpenCL: back in 2011, the ImageJ2 team ported a Java implementation of the Sobel filter to OpenCL, with exactly the same sequence of math operations backing it. The results still differed, because OpenCL's default floating point rounding strategy (which is configurable in OpenCL) is not the same as Java's hardcoded strategy. Fortunately, one of the available OpenCL strategies is the same as Java, so once we configured that, the results were identical. But there is no guarantee that different implementations of the exact same algorithm, even when kept as similar as possible, will produce exactly the same results.
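
The rounding point can be reproduced in miniature without any image library. The sketch below is a generic illustration, not code from either example above: it sums the same three values in two different orders, which is enough to change the last bit of the result, and compares them with a tolerance instead of exact equality. The tolerance value is an arbitrary choice.

```java
// Two mathematically equivalent summations that differ in evaluation order.
final class RoundingDemo {

    static double sumLeftToRight(double[] values) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) sum += values[i];
        return sum;
    }

    static double sumRightToLeft(double[] values) {
        double sum = 0.0;
        for (int i = values.length - 1; i >= 0; i--) sum += values[i];
        return sum;
    }

    // Tolerance-based comparison, as opposed to exact equality.
    static boolean approxEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double[] values = {0.1, 0.2, 0.3};
        double a = sumLeftToRight(values);
        double b = sumRightToLeft(values);
        System.out.println(a == b);                  // exact comparison fails
        System.out.println(approxEqual(a, b, 1e-12)); // tolerant comparison passes
    }
}
```

If two loops in the same language disagree on three numbers, demanding bit-identical output from independently written implementations in different languages is clearly too strict.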

A remaining question is then: what is lost by not requiring the "equal outputs" form of equivalence? Above, I claimed that it "has nice consequences, including facilitating reproducibility and caching." But does it really?

  • Reproducibility in SciJava Ops is achieved via the OpChain mechanism: after executing an op, you can ask which op implementations were actually used, and at which versions, and then reuse that op chain to re-execute exactly the same operations at a later time. This provides reproducibility, even across heterogeneous environments to a limited extent, and across different object inputs, as long as all the ops in the declared chain exist in the current environment. And with e.g. OSGi, they could be made to exist in a broader swath of scenarios: you could download and load the classes of a particular implementation at the needed version in its own class loader, even if the runtime has already loaded that implementation at a different version.
  • For caching, it is sufficient to cache the computed output of a given op implementation for a given set of inputs. There is no need nor reason to cache the output value across op implementations, and in fact we could no longer do this if we discard this notion of two op implementations of the same name being required to produce equivalent outputs with equivalent inputs. This seems OK, though. I don't see a scenario where this sort of cross-caching is helpful in any significant way.
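
A per-implementation cache along those lines might look like this sketch, assuming Java 16 or later. `OpOutputCache`, its `Key` record and the implementation identifier strings are hypothetical and do not reflect how SciJava Ops actually caches; the sketch also assumes inputs have meaningful equals and hashCode.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache keyed on the concrete op implementation plus its inputs.
final class OpOutputCache {

    // The implementation id is part of the key, so outputs are never shared
    // between different implementations of the same op name.
    record Key(String implementationId, List<Object> inputs) {}

    private final Map<Key, Object> cache = new HashMap<>();
    int computations = 0; // counts actual op executions, for illustration

    Object run(String implementationId, Function<List<Object>, Object> op, List<Object> inputs) {
        return cache.computeIfAbsent(new Key(implementationId, inputs), key -> {
            computations++;
            return op.apply(inputs);
        });
    }

    public static void main(String[] args) {
        OpOutputCache cache = new OpOutputCache();
        Function<List<Object>, Object> doubler = in -> ((Integer) in.get(0)) * 2;
        cache.run("implA@1.0", doubler, List.of(21)); // computed
        cache.run("implA@1.0", doubler, List.of(21)); // served from the cache
        cache.run("implB@1.0", doubler, List.of(21)); // different implementation, recomputed
        System.out.println(cache.computations);
    }
}
```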

We also avoid thorny questions of what "equivalent output values" even means. Do we need output types to implement equals? They won't always do this. For example RandomAccessibleInterval implementations do not implement equals to mean "same dimensions and content"; you have to check them with some external method for this. It's only one kind of equality among many.

We also avoid the problems described above with ops that have the same input signature but different output type, notably the create ops that produce different sorts of objects in a no-args vacuum:

  • These create ops are equivalent by concept (1), same name/algorithm, but it's in sort of a vacuous way: they all share the job of creating an "empty container" object whose initial value(s) are not important.
  • Two such ops might be equivalent by concept (2), depending on the simplifications; for example create() -> DoubleType and create() -> LongType would be equivalent because both are create() -> number, but create() -> ARGBType would not be equivalent to the first two because ARGBType should probably not simplify to number.
  • Whether or not the outputs of these various create() ops are equal (with potential type conversion) according to some equality semantics now no longer matters, because we don't care about guaranteeing or enforcing or validating definition (3) anymore. And we hopefully now avoid the need to ever define a special equals op or similar to facilitate such enforcement/validation.
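
The create() comparison by concept (2) can be sketched as a lookup from output type to kind. Everything here is hypothetical: the nested classes are empty stand-ins for the ImgLib2 types, and the kind name "color" for ARGBType is an assumption made only so the example has something to contrast with "number".

```java
import java.util.Map;

final class CreateSimplification {

    // Empty stand-ins for the ImgLib2 types named in the discussion.
    static final class DoubleType {}
    static final class LongType {}
    static final class ARGBType {}

    // Hypothetical type-to-kind assignments.
    static final Map<Class<?>, String> KINDS = Map.of(
        DoubleType.class, "number",
        LongType.class, "number",
        ARGBType.class, "color");

    // Builds the simplified signature of a no-input create op.
    static String simpleSignature(String name, Class<?> outputType) {
        return name + "() -> " + KINDS.get(outputType);
    }

    public static void main(String[] args) {
        System.out.println(simpleSignature("create", DoubleType.class));
        System.out.println(simpleSignature("create", LongType.class));
        System.out.println(simpleSignature("create", ARGBType.class));
    }
}
```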

So with that, I think this issue is resolved. I am leaving it open, though, until the Ops documentation suitably explains this design decision, and these two supported concepts of equality. For concept (2), we also probably need a better name for the "equal simplifications". Maybe "same simple signature"?

@ctrueden ctrueden added the docs Issues relating to documentation of the Ops framework label Jun 23, 2022
@ctrueden ctrueden changed the title Decide on equality/equivalence semantics Document equality/equivalence semantics Jun 23, 2022
@gselzer
Member

gselzer commented Jun 23, 2022

  • Equal simplifications. Equivalence concept (2) is achievable by checking that the two ops have the same simplification: op1.listing().equals(op2.listing()) in the case of the Ops v1 work in imagej/imagej-ops#644 ("Rewrite OpSearchResults as signatures"), or... some other way in the new SciJava Ops (I don't know off the top of my head as of this writing, but @gselzer feel free to comment on this if you know).

We will want something along the lines of imagej/imagej-ops#644 in SciJava Ops, I think. We still need ops to be searchable. See this Zulip post for the difference between these OpListings and Op simplification à la scijava/incubator#25.

@gselzer
Member

gselzer commented Sep 26, 2023

What remains to solve this issue is to write the conclusions from this discussion into Javadoc, probably within a package-info.java on the base SciJava Ops API package.
