Deep copy of original files #20
I am not sure about the exact implementation of deep copy, but at the moment I think we are trying to solve four issues with it:
I think we could ask @imunro about the setup in their institute because they were trying to tackle the Public group case seriously, so we would have a concrete use case.
As far as publication goes, our current approach is to move data for publication into the public group.
@mtbc - I will try to formulate the use cases I have in a clear format in the next day or two and add them here.
This is how I conceptualise Deep vs Shallow copy (I may well be wrong!).

Deep Copy Use Cases

Publish 1:
Publish 2:
Analysis:
Notes: Once we have Deep Copy, the UI terminology will need to make the difference between them clear. It is not transparent from the UI what happens when I move a Shallow Copy (linked) to a different group from the original: both "copies" disappear, without any warning. Questions:
@mtbc - still need to think some more about all this.
From the point of view of what's technically implemented on the server (which may not be how things should be for the end user), deep copy is possible by other users even in read-only groups, so long as they don't expect the resulting image to be in the original owner's dataset, as the copy is owned by the copier.
@pwalczysko - is this what Ian needs?
I think we need to include here what @imunro actually wants:
Sorry, first commented, then read your above comment @gusferguson
I could use some discussion about "shallow copy". The "change to one is reflected in the other" sounds new to me, and also sounds quite expensive.
@joshmoore - I might be causing confusion by using "shallow copy" - I just mean the current "link" copy that exists already.
Ah, i.e. an image is linked into two datasets at the same time? Ok. I'll try to adjust my mental map.
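To illustrate the "link" copy semantics being discussed, here is a minimal sketch using toy in-memory classes (these are illustrative stand-ins, not OMERO API classes): the same image object is linked into two datasets, so a change seen through one link is visible through the other.

```python
# Toy model of "link" (shallow) copy: one Image object, two Dataset links.
class Image:
    def __init__(self, name):
        self.name = name

class Dataset:
    def __init__(self, name):
        self.name = name
        self.images = []

img = Image("plate1.tiff")
d1 = Dataset("original")
d2 = Dataset("shallow-copy")
d1.images.append(img)   # the *same* object is linked into both datasets
d2.images.append(img)

img.name = "plate1-renamed.tiff"          # a change via one link...
assert d2.images[0].name == "plate1-renamed.tiff"  # ...shows via the other
assert d1.images[0] is d2.images[0]       # both datasets point at one image
```

This is also why moving a linked "copy" to another group affects both: there is only one underlying object.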
From what I can see, that looks as if it would satisfy the requests we've had. Basically people want a method of easily making their data public, e.g. as supporting data when a paper is published.
@imunro - thanks for the clarification - I have added that as Publish use case 2.
@joshmoore: Does the above introduce any new dimensions? I am thinking we are facing exactly the issues we feared. 😃 I wonder how to proceed.
projection will probably fall into the category of deep copy: at least at the graph level (a new set of raw data is created)
Absolutely. If the pixels service weren't so annoyingly featureful I'd have fixed this already! /-:
We can probably allow deep-copy of files without any database changes. However, regarding data duplication in the server's binary repository: separately from the pyramid duplication fear above, if we are to avoid data duplication in the managed repository then there is the question of what happens when the original file is deleted. With the database change to a Boolean "is-a-copy" column that removes that row from the uniqueness constraint,
Without any database changes,
So, UX questions focus on: if the original is deleted, how good or bad is it if the new copy's pixel data (or copied attachments) disappear too? Does it suffice if at least the copier knows which kind of copy they got? (A question for @joshmoore might be: do you see a solution to pyramid duplication that also requires database changes? E.g., a pointer from a
Another potential factor: how well would this work with an object data store?
This is very very bad. If I delete the original and the copy is deleted as well, then this is not a deep copy by any stretch of the imagination. The user will be completely unable to absorb a concept of any in-betweens regarding deep copy (direct experience of the situation we have now).
Couldn't agree more!
If the user does a deep copy, he/she will not expect to be affected by deletion of the original.
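The expectation stated above can be pinned down with an analogy from Python's standard library (not OMERO code): after a true deep copy, mutating or even deleting the original must leave the duplicate untouched.

```python
import copy

# A deep copy must be fully independent of the original.
original = {"name": "fileset-1", "pixels": [0, 1, 2]}
duplicate = copy.deepcopy(original)

original["pixels"].append(3)   # mutating the original...
del original                   # ...or deleting it outright
assert duplicate["pixels"] == [0, 1, 2]  # leaves the duplicate untouched
```

Any scheme in which deleting the original also removes the copy's pixel data fails this expectation, which is the objection above.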
So, my current guess is:
However, I am not sure that these are exactly what we will be glad we did. Chat some more? Go for it? Postpone to >5.3?
(I assume we do still need to permit deletion of the original.)
If we don't want to prevent the deletion of the original, then I'd say the other recent comments above amount to a hard-linked mrepo-internal re-import which we could make available without the DB changes. The primary disadvantages would be:
From my side, I think we're still looking for features (or API breakages) which would require the DB changes, i.e. finding the related original fileset or as Mark asks, prevent an operation like delete on the original. Do we want any relationship between old & new? NB: The addition of |
is a big problem I'd have thought: after years of using the initial volume, the admin adds a new one and suddenly nobody can deep-copy any more.
So, while it requires a run of the import machinery, there's a reimport workaround available, perhaps best by a server-side script, broadly,
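The re-import workaround could be sketched as follows. Note this is a hypothetical illustration: the directory layout, paths, and the `reimport_copy` helper are invented for the sketch, and a real server-side script would invoke the actual OMERO import machinery on the staged files.

```python
import shutil
import tempfile
from pathlib import Path

def reimport_copy(fileset_dir: Path, staging_root: Path) -> Path:
    """Stage a byte-for-byte copy of a fileset's files for re-import."""
    staging = staging_root / (fileset_dir.name + "-copy")
    shutil.copytree(fileset_dir, staging)  # copy outside the original fileset
    # ...a real script would now run the importer on `staging`,
    # producing a new fileset owned by the copier.
    return staging

# Simulate a managed repository in a throwaway directory.
repo = Path(tempfile.mkdtemp())
fs = repo / "fileset-42"
fs.mkdir()
(fs / "image.tiff").write_bytes(b"pixels")

copy_dir = reimport_copy(fs, repo)
shutil.rmtree(fs)                          # deleting the original...
assert (copy_dir / "image.tiff").read_bytes() == b"pixels"  # ...copy intact
```

The cost is a full second copy of the bytes (and a second import), which is exactly the space trade-off discussed elsewhere in this thread.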
Alternatively, the duplicate machinery is already just about there and its genericity will tend to cover edge cases but it needs,
I'd like to think we're relying less on server-built pixel pyramids anyway, so we can brush that duplication issue under the rug. 😃
In chatting with @joshmoore, a third option came to mind, broadly,
This does not require database changes but should at least be able to copy filesets, attachments, and thumbnails. This might be the best trade-off between implementation effort and outcome.
In trying to figure where in the managed repository to put the copy of the original files: the |
I would assume the repository has examples of going from |
And the reverse too? Let's see what we can find. 👍 |
( |
Perhaps awkwardly, further |
It's true they don't like to be nested. Though once one is active, other methods shouldn't need to call them. If a separate transaction is necessary, then only
Aha, that might be exactly the clue I needed to get things working, even if I have to duplicate some
Quite possibly. The RepositoryDao, I think, has two copies of several methods (or did) for just that reason.
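The "transactions don't nest" problem above can be illustrated with SQLite savepoints as a stand-in (this is an analogy, not the server's Hibernate transaction code): once an outer transaction is active, inner units of work should use savepoints rather than trying to open their own transaction, and only the outermost level commits.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual txn control
conn.execute("CREATE TABLE originalfile (name TEXT)")

conn.execute("BEGIN")                    # the single outer transaction
conn.execute("INSERT INTO originalfile VALUES ('keep.tiff')")
conn.execute("SAVEPOINT inner_step")     # inner unit of work: a savepoint
conn.execute("INSERT INTO originalfile VALUES ('oops.tiff')")
conn.execute("ROLLBACK TO inner_step")   # back out only the inner step
conn.execute("COMMIT")                   # the outer transaction still commits

names = [r[0] for r in conn.execute("SELECT name FROM originalfile")]
assert names == ["keep.tiff"]            # inner failure undone, outer work kept
```

The duplicated-methods pattern mentioned above (one variant per calling context) is one way around the same constraint at the Java level.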
Current interesting problem is backing out from failures: how to track what

It could be useful for another devspace to be created for testing deep copy. Bugs might mess up the binary repository or database. Unless it's easy to restore merge-ci's data? cc: @pwalczysko
Not very easy - moderate difficulty: it needs manual reimports (there are scripts, but they need to be run manually).
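One common pattern for the back-out problem raised above is an undo log: record each side effect as it succeeds, and on failure roll them back in reverse order. This is a hedged sketch of the pattern only, not the repository's actual code; `copy_with_undo` and the failure trigger are invented for illustration.

```python
import os
import tempfile

def copy_with_undo(steps):
    """Write each (path, data) step; on failure, delete in reverse order."""
    created = []                     # the undo log
    try:
        for path, data in steps:
            with open(path, "wb") as f:
                f.write(data)
            created.append(path)     # only log what actually succeeded
            if data == b"FAIL":
                raise IOError("simulated mid-copy failure")
    except IOError:
        for path in reversed(created):   # back out in reverse order
            os.remove(path)
        return False
    return True

d = tempfile.mkdtemp()
a, b = os.path.join(d, "a.tiff"), os.path.join(d, "b.tiff")
ok = copy_with_undo([(a, b"pixels"), (b, b"FAIL")])
assert not ok
assert os.listdir(d) == []           # nothing left behind after back-out
```

A dedicated devspace would still be valuable, since bugs in the undo path itself are exactly what would corrupt a shared repository.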
In considering how to provide access to |
3 workflows for duplicate:
In short, all three flags highlighted in #20 (comment) are, under some circumstances, interesting and useful in the above workflows. @joshmoore @mtbc does that answer the question?
(A quick extra idea I had during the call: what about |
I would be afraid that it is very difficult to explain the behaviour of this option to the users, but maybe someone can come up with a scenario where this flag would be to a user's advantage?
Seems to me there are basically 2 workflows:
The only issue might be if you do 2, then decide to move it to another group later.
@will-moore If I take your lead on the workflow listing, then I would see a sub-option of your option 2 (say, 2b) with
Option 2b is very interesting for saving space for FileAnnotations I would imagine, but it could also help to keep an overview by not proliferating tags etc.
@pwalczysko Your 2b is the same as my 2 (do not duplicate annotations). In the 'don't duplicate annotations' scenario, do we mean ALL annotations? E.g. if I duplicate an image with Key-Value pairs, then edit the KV pairs on one image, would they update on the other image? I probably wouldn't expect that. Same for Comments (although you can't edit those in the webclient) and Ratings. Files and Tags are OK not to duplicate.
@mtbc might correct me, but I think the answer is yes. Not sure if any granularity is possible there, @mtbc?
Yes, to make the annotations yours, as you can duplicate objects of other users in 3 types of groups. The duplicate gives you back a nice one-owner tree with one-owner annotations and one-owner links.
@will-moore that depends on whether you chose option 1 or option 2b. Btw, option 2 (or 2a ...) was meant by me as "do not duplicate the annotations at all", i.e. have an unannotated duplicate Image/Dataset/Project as a result.
This assumes a granularity which I am not sure is a given; see my first sentence in this comment... cc @mtbc
One more use case for granular exclusion of certain annotations from duplication can be seen in ome/omero-blitz#100 (comment): ROIs might simply take too long to duplicate, and the user might choose not to duplicate them...
By type is available, e.g., duplication can treat tags differently from comments.
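The by-type granularity described above could look roughly like this. The annotation type names and the `duplicate_annotations` helper are illustrative only, not the server's model classes or API:

```python
# Sketch of by-type exclusion during duplication: annotations whose type
# is in skip_types are linked to the original only, not duplicated.
def duplicate_annotations(annotations, skip_types=frozenset()):
    """Return duplicates of all annotations except the skipped types."""
    return [dict(a) for a in annotations if a["type"] not in skip_types]

anns = [
    {"type": "TagAnnotation", "value": "metaphase"},
    {"type": "CommentAnnotation", "value": "check focus"},
    {"type": "FileAnnotation", "value": "analysis.csv"},
]
# e.g. skip tags and file attachments, duplicate comments:
kept = duplicate_annotations(anns, skip_types={"TagAnnotation", "FileAnnotation"})
assert [a["type"] for a in kept] == ["CommentAnnotation"]
```

This matches the workflow preferences above: Files and Tags are cheap to share by link, while Comments and Key-Value pairs are the ones users expect to own independently.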
OMERO's 5.2 branch already offers the Duplicate request to have the server copy model subgraphs; it is tested by https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroJava/test/integration/DuplicationTest.java. However, copying images is disappointing because the pixel data is missing: it depends on original files which are uniquely named and singly owned. For space reasons we don't want to actually copy the files in the binary repository, but we probably do want the duplicator to own their duplicate, which may be moved to groups different from the original's.

The initial use case, for which the existing duplicator probably already suffices, is described in http://trac.openmicroscopy.org/ome/ticket/11532, where it would be possible for scripts to duplicate instruments and suchlike instead of sharing them with derived images or, even better, the duplication could be done automatically as needed, as described by https://trello.com/c/ISnICsrC/16-auto-duplicate-in-graph-operations.
A more general deep copy could be arranged if we could duplicate original files. For instance, we could allow marked "copies" of original files to have the same name but be read-only, deleting the underlying file only when the last of that name is deleted from the database. Or, we could actually use filesystem links to seemingly copy the file, except that the new copy would have to be on the same volume if hardlinking, or would be lost if softlinking when the original is deleted. Then, there is the matter of pyramids: we want to avoid generating duplicate pyramids, but how can the owner of the duplicate have permissions to find the original's pyramids?
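The hardlink/softlink trade-off described above is easy to demonstrate in a throwaway directory: a hard link survives deletion of the original name (because it is a second name for the same inode, though it must live on the same filesystem volume), while a soft link is left dangling. This runs on POSIX systems; on Windows `os.symlink` may require extra privileges.

```python
import os
import tempfile

d = tempfile.mkdtemp()
orig = os.path.join(d, "original.tiff")
hard = os.path.join(d, "hard-copy.tiff")
soft = os.path.join(d, "soft-copy.tiff")

with open(orig, "wb") as f:
    f.write(b"pixels")
os.link(orig, hard)      # hard link: a second name for the same inode
os.symlink(orig, soft)   # soft link: a pointer to the original *name*

os.remove(orig)          # "delete the original"
with open(hard, "rb") as f:
    assert f.read() == b"pixels"     # hard-linked copy still readable
assert not os.path.exists(soft)      # soft link now dangles
```

So hardlinking gives the deletion semantics users expect from a deep copy without duplicating bytes, at the price of the single-volume constraint raised earlier in the thread.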
This issue exists to collect interesting points regarding: