Have DuplicateI able to handle underlying files #100
Conversation
Also removes a couple of unlikely, tricky legal targets.
Needed for fixing paths from old fileset to new duplicate.
Bug
Possibly just a problem of the CLI plugin? I guess this will be exposed though, as the feature is definitely going to be used in cross-group contexts. Edit: No, this is a bug. I cannot work around this easily. Note that the group with id Edit 2: Excerpt from Blitz log
|
Does it work if you first log into that group with |
No, please see my edit of #100 (comment) |
Very odd. I wonder why I never ran into this locally. I'll investigate. Could you provide the full stack trace around the error from |
Following workflow circumvented the error in #100 (comment)
This is not fixing the problem, but hopefully gives some insights. Edit: When trying to duplicate the "old" dataset (id 1, as in comment #100 (comment)), I get the same error as in comment #100 (comment) - thus, the import fixed only the freshly imported dataset's behaviour, not the old dataset's duplication |
An interesting question is why |
Tomorrow I'll make an artifact with #98 included to try to get more information. Update: Done. |
It looks as though the |
The problem with the Dataset:1 duplication and error reporting the group to be wrong #100 (comment) is fixed now. |
For now, tested duplication of P/D/I in one-owner hierarchies. All gets duplicated, also ROIs and attachments. The images, when deleted, do not influence the copies, whether the original or the copy is deleted. Brilliant. Will try to test more complex setups like
tomorrow. |
The user has to be logged into the correct group to attempt a duplicate using the CLI plugin. RA group: Simple members
RO group: Simple members
P group: Simple members
Subsequent moves of the duplicated objects into another group by duplicators were successful, as expected. I do not see a mistake in that. |
Very probably Checking the DB and filesystem: If you use the web client right-pane for checking where in the managed repository an image's files are, you should see from there that the duplicate is not where the original is. But if you do Whereas, you should see In |
Duplicating a MIF (a problem that probably also precedes this PR): it would be nice to have a "duplicate that MIF, picking the images as necessary" option (a kind of Move All, as implemented for MIFs in web). The workflow is also tedious because it does not tell you all the images necessary; instead it leaves you to re-run the command as many times as there are images inside the MIF, always reporting the failure only on the next image, so it is up to the user to add them all into one single command. |
I can see that the duplicates are in folders near the original, but in a subfolder with naming derived from the date, such as
for original and
for duplicate. Indeed, the number of hard links increased by one after the duplication.
|
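The hard-link count mentioned above can be verified directly from Java's NIO API. A minimal sketch (file names here are illustrative, not the actual repository paths) that reads the POSIX `unix:nlink` attribute before and after creating a hard link:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class LinkCount {
    /* Read the POSIX hard-link count of a file; on Linux/macOS the
     * "unix:nlink" attribute corresponds to stat's st_nlink. */
    static int linkCount(Path p) throws Exception {
        return (Integer) Files.getAttribute(p, "unix:nlink");
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("nlink-demo");
        Path original = Files.createFile(dir.resolve("original.tif"));
        System.out.println(linkCount(original));           // 1
        // duplicating by hard link adds a name, not a byte copy
        Files.createLink(dir.resolve("duplicate.tif"), original);
        System.out.println(linkCount(original));           // 2
    }
}
```

This is the same effect observed in the managed repository: after duplication, the original's link count increases by one while no extra file bytes are consumed.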
for original
for duplicate These paths are fairly similar, as indicated above. They match the report of the webclient and also they match what I found in ManagedRepo in #100 (comment) |
It should work to duplicate |
Great, thank you. I think an obvious RFE (for the CLI client?) would be to have the hint you just gave me in the output of the error - do you have an idea where I should create an issue, or what needs to be changed in the code to get such a helpful hint? |
This is a new aspect of the broader https://trello.com/c/F4cAE0k3/358-graphexception-subclasses - between the CLI, QA and UX boards, Trello captures many of the user-facing issues with graphs behavior. (Basically, far better to use the type system to communicate helpful information in a structured way that clients can easily interpret. Faffing with these textual messages generated from deep in the server is never going to be the route to happy users; it's a bug if they have to see them at all because they're not being shown anything better!) |
Duplicating in-place images works as expected, in principle.
|
Considering Ad 3 above, some option of |
Comments from my first few reads.
```java
 * @return the expanded template path
 * @throws ServerError if the new path could not be created
 */
public FsFile createTemplatePath(Object consistentData, EventContext ctx, ServiceFactory sf, Session session, SqlAction sql)
```
This method seems to do a surprising amount of heavy lifting.
It does. It's basically a from-inside-executor translation of the existing one. Though, looking at it, I've not seen clear low-hanging fruit for refactoring.
```java
final SqlAction sql = helper.getSql();
for (final OriginalFile wrongPathDuplicateFile : wrongPathDuplicateFiles) {
    String fromPathName = wrongPathDuplicateFile.getPath() + wrongPathDuplicateFile.getName();
    final boolean isLog = fromPathName.endsWith(".log");
```
Do log files ever get gzipped? cc: @sbesson
Not that I've ever seen; there is also the worry about casting the net too wide in trying to find fileset files outside the fileset's directory. (Would be much easier if people would just store their logfiles, pyramids, uncompressed TIFFs, etc. on a compressing filesystem!) At least this part should be easily tweaked as the occasion demands.
```java
    Files.createLink(to, from);
    LOGGER.debug("linked {} to {}", from, to);
} catch (FileSystemException fse) {
    LOGGER.debug("failed to link {} to {}", from, to, fse);
```
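For context, the link-then-fall-back-to-copy pattern behind this `catch` can be sketched in isolation. The `linkOrCopy` helper below is illustrative only, not the PR's actual code; it assumes linking fails with a `FileSystemException`, e.g. across filesystem partitions:

```java
import java.io.IOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LinkOrCopy {
    /* Try to hard-link "from" into place as "to"; fall back to a byte
     * copy when linking fails (e.g. the target is on another partition).
     * Returns true if a link was made, false if bytes were copied. */
    static boolean linkOrCopy(Path from, Path to) throws IOException {
        try {
            Files.createLink(to, from);   // no extra disk usage
            return true;
        } catch (FileSystemException fse) {
            Files.copy(from, to, StandardCopyOption.COPY_ATTRIBUTES);
            return false;                 // disk usage grows by the file size
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dup-demo");
        Path original = Files.write(dir.resolve("a.bin"), new byte[] {1, 2, 3});
        boolean linked = linkOrCopy(original, dir.resolve("b.bin"));
        System.out.println(linked ? "linked" : "copied");
    }
}
```

On a single partition the link succeeds; the copy branch is the case the DEBUG message above records.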
This should perhaps be at a higher level. Could it point to some form of misconfiguration?
Additionally, I still see an RFE for disallowing non-symlinks. The necessary rollback is in place in case this block throws an exception, right?
"Limit increase in disk usage" could definitely be a followup PR, though maybe with a subclass of the request type to non-breakingly specify that limit. Rollback should be good, but I definitely encourage testing that!
Update: Also, configuring what is linked (and how) or copied, e.g., for synthetic images that are not in the managed repository at all.
(I could create an issue elaborating a little on the further work I suggest in the PR description. Update: Now #106.)
To your other question, I'd expect this case to be most likely to occur for managed repositories that are split across multiple partitions; that's why I left it at DEBUG.
File copies in the managed repository are now quantified by mtbc@e2eb9bea at INFO level: "file bytes copied ... MANAGED ..."
```java
}
if (PublicRepositoryI.DIRECTORY_MIMETYPE.equals(file.getMimetype())) {
    /* Do not attempt to duplicate directories. */
    LOGGER.debug("not directly duplicating OriginalFile:{} on filesystem because it is a directory", file.getId());
```
What's the impact of not duplicating the directories?
Simplifies the code and adapts existing code. 😃 We can focus on just translating filenames and creating their parents (with PublicRepositoryI.makeCheckedDirs and parents==true) without also separately having to recognize and translate partial paths to their new location.
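As an illustration of the "translate the path, create the parents, then place the file" idea (this is not OMERO's actual makeCheckedDirs, which also registers each directory in the database; names and prefixes below are invented), a sketch using plain java.nio:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class TemplatePathSketch {
    /* Move a repository-relative path from under an old template prefix
     * to a new one, creating parent directories as needed (the analogue
     * of calling makeCheckedDirs with parents == true). */
    static Path translateAndPrepare(Path root, String oldPrefix, String newPrefix,
                                    String relativePath) throws Exception {
        // swap the template prefix, e.g. "user-1/old/..." -> "user-1/new/..."
        String translated = newPrefix + relativePath.substring(oldPrefix.length());
        Path target = root.resolve(translated);
        Files.createDirectories(target.getParent());  // no-op for existing dirs
        return target;  // caller may now link or copy the file into place
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("repo-demo");
        Path target = translateAndPrepare(root, "user-1/old/", "user-1/new/",
                "user-1/old/fileset/image.tif");
        System.out.println(root.relativize(target));
    }
}
```

The point of the simplification discussed above is that only full file paths need translating; partial directory paths fall out of `createDirectories` for free.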
```
@@ -660,6 +827,171 @@ private void linkToNewDuplicates() throws GraphException {
        }
    }

    /**
     * Handles duplication of a fileset, including creating filesystem directories for the new template path and
```
Does this method handle only the on-disk duplication, or also the database rows? If it exits early, will the Fileset still be created?
It'll create OriginalFile rows for directories down to the template prefix but not below; those are made later. Under the hood it all uses PublicRepositoryI.makeCheckedDirs, which does repositoryDao.register, so directories are created on the filesystem at the same time that they get noted in the database. In the event of rollback, not only should the transaction forget those rows, but the new directories should also be removed by the loop in abort().
Suggested further testing:
|
```
@@ -1086,6 +1123,12 @@ private void abort() {
    @Override
    public void finish() {
        managedRepository.clearFileCreationListener();
        if (diskUsage.isEmpty()) {
            LOGGER.info("no file bytes copied");
```
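A hypothetical sketch of the kind of per-repository byte tally that could back such logging; the class, method names, and the "MANAGED" label below are invented for illustration, loosely modeled on the "file bytes copied ... MANAGED" messages described earlier:

```java
import java.util.HashMap;
import java.util.Map;

public class DiskUsageTally {
    // bytes copied (not hard-linked), keyed by repository label
    private final Map<String, Long> diskUsage = new HashMap<>();

    /* Record a copy; hard links are deliberately not counted
     * because they consume no additional file bytes. */
    void noteCopy(String repository, long bytes) {
        diskUsage.merge(repository, bytes, Long::sum);
    }

    /* Summarize at the end of the request, as finish() might. */
    String summary() {
        if (diskUsage.isEmpty()) {
            return "no file bytes copied";
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : diskUsage.entrySet()) {
            sb.append("file bytes copied: ").append(e.getValue())
              .append(' ').append(e.getKey()).append('\n');
        }
        return sb.toString();
    }
}
```

Under this model an all-hard-link duplication leaves the map empty, which is exactly the "no file bytes copied" case logged above.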
Nice. This would be nice in the response, but I assume that's sufficiently troublesome water that it should be postponed?
At least for this PR I'm trying to avoid adjusting the API. I also figure we might need a round of UX thinking to be happy we're breaking it down in the desired way before exposing it in that form.
@mtbc, would you happen to know why this behaviour is occurring? Edit: The images which exhibit this behaviour are
They might be a product of a script, but I thought one of them is a product of the other. They elicit the error even singly, in addition to erroring when passed into one command as above. |
Is there perhaps an image that was projected from and put in a different dataset or somesuch? My guess is very much that there is some "intimately linked" image not included in the duplication. Does |
Maybe try it with images where you know for sure that one is the product of the other and no other images are involved? |
It is probably as you suspect. |
Retest as per #100 (comment): Duplication of an image with tags, attachments, comments, ratings and KVPs went as expected. The subsequent move into another group showed that the entities are independent from the original. Checking of paths: For the original
For the copy
This matches the report of OMERO.web RHP. The report in the Blitz log seems to be okay too, see below (did the copying on one image with one FileAttachment on two occasions; the timestamps match, except merge-ci seems to be on GMT, not BST).
|
Interestingly, the above screen duplicates fine in case the owner of the read-only-1 group does it. Note though that this is probably not a problem in principle with plates, see
|
I'll push a commit that may fix this. |
see AbstractFileSystemService.createSubpath
The retest of the bug #100 (comment) after the last commit is passing.
The plate looks okay also in webclient, and was successfully moved to another group. |
Thank you for all the testing. |
Alas, the error message in #100 (comment) is not as easily improved as I had initially hoped; it's very much a matter of where in the model subgraph an issue was detected, by machinery that behaves rather agnostically of the domain model. We know users (partly due to clients) probably know their image ID better than their channel IDs, but the graphs machinery doesn't favor one over the other. It's easy to find the associated image ID given the channel ID, e.g., |
In addressing ome/design#20, extend DuplicateI so that it can now handle various underlying files, allowing fuller duplication of images, among other things.

Functional testing:
Code review: I aimed for minimal changes to the repository implementations. A larger review of the API and its usage could follow.
Subsequent PRs could include configurability for: