-
Notifications
You must be signed in to change notification settings - Fork 6
Data provenance mechanism #35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d176eb6 to
21b3fb5
Compare
Possible solution: make the |
This commmit makes two performance improvements. Firstly, it removes the need to store the inputs. What is the point of storing the inputs, if you know that the inputs are never modified? If you want to know how a particular Object came to be the way it is, just look at the Ops that modified the Object i.e. the Ops for which this Object was an output Secondly, change the backing Object of the OpHistory to a LinkedDeque. This allows us to add faster to the end, removing the need to iterate through the history in order to add something new
This commit removes the two new calls from the OpExecutionSummary API, resulting in less overhead in maintaining the Op history.
Not sure how this slipped past in the original work, but there were issues in the naming of Javassist classes generated by OpMethodInfos.
There are benefits to reworking the caching system, and to refactor the functionality of findOpInstance. The goal is to have findOpInstance place Ops DIRECTLY into the cache, and NOT return anything. Then Ops can be returned directly from the cache. This gives us more control over Op wrapping and retains a more "raw" version of the Op in the cache. We then push Op wrapping to the cache. TODO: Use the hints to only record executions of top-level Ops. What works is to be able to re-wrap an Op with a new UUID. What we DON'T yet do is re-wrap its dependencies with the same UUID.
We want to make sure that all external calls to the matcher generate a new UUID. Since we want to make a MatchingConditions anyways, let's just make all public API generate a MatchingConditions, which will generate a new UUID Also fixes a bug in the DependencyMatching Hint
This creates ONE OpHistory per context, and removes the static nature of the class in favor of state kept within the Context.
2c3250b to
ba2e33e
Compare
@hinerm thanks for the suggestion, I just implemented it. |
If
For lack of a better option, I say we keep this for now. @hinerm @ctrueden if you have other thoughts, feel free to express them. Otherwise, let me know what you think about merging @hinerm |
We instead create an OpHistoryService that can provide an OpHistory. This can be used as long as we have an OpService
ctrueden
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 🎸 🏆 But change everything. Just kidding, change some things. 😉
For others: we are planning to merge this as is, but later, after reviewing the other PRs in the chain. They will all merge as is (barring major issues), but then we'll have another iteration of work on top of it all, based on these reviews. 👍
scijava/scijava-ops/src/main/java/org/scijava/ops/conversionLoss/LossReporterWrapper.java
Show resolved
Hide resolved
scijava/scijava-ops/src/main/java/org/scijava/ops/hints/BaseOpHints.java
Show resolved
Hide resolved
scijava/scijava-ops/src/main/java/org/scijava/ops/provenance/OpHistoryService.java
Show resolved
Hide resolved
| * @return true iff {@code e} was successfully logged | ||
| */ | ||
| @Override | ||
| public boolean addExecution(OpExecutionSummary e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed how simplification and adaptation should generate IDs for their OpInfos such that they can be reconstructed from those IDs later. Therefore, we need to make sure we test the history functionality with simplification, adaptation, and any other cases relating to ID generation and reexecution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDs added with 9fcbc9e
This PR introduces a new mechanism,
OpHistory, to provide data provenance for Ops and Op outputs involved with the matcher. ThisOpHistoryprovides a static repository of information about Op executions. For each top-level call to the matcher, a universally-unique identifier (UUID) is created. Then, Ops stores:Graph<OpInfo>describing the hierarchy of Ops involved in every return produced by theDefaultOpEnvironmentOpWrapperresult (if used)OpWrapperwas utilized), along with the output producedUsing these recorded Objects, we are able to then get answers for the questions:
OpHistory.executionsUpon(Object o): For a givenObjectoutput, what Op(s) were responsible for creating/mutating thisObject?OpHistory.opExecutionChain(Object op): For a givenObjectop, whichOpInfo(s) were tasked with creating/populating theOpDependencyneeds to create thisObject?The answers to these questions allow us to obtain the
OpInfos (and thus the backing algorithms) needed to construct an Op output in a reproducible way.This Op also refactors the outputs of various
DefaultOpEnvironmentto better facilitate the population of theOpHistory. WhenDefaultOpEnvironment.opis called, the general scheme that should be followed is:OpInfoin the op cacheThis new process allows us to store the "raw" Op in the cache, instead of the wrapped Op, better facilitating Op cache hits.
TODO:
HACK, determine when/how the history should be resetOpHistory, possibly using scijava-persist : persisting objects with Gson and scijava discovery mechanism #30. The usage ofUUIDshould aid in the storage.OpEnvironment. Should data provenance be something that allOpEnvironments implement, or something limited to ourDefaultOpEnvironment?Graph<OpInfo>forOpHistory.opExecutionChain: this is exposing Google Guava data structures, which is not ideal. We need to be able to return a Tree of some sort, and this seems to be the best answer.Closes scijava/scijava#53