New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import template #4367
Import template #4367
Conversation
`Dataset:id`, `Dataset:name:foo` and similar `Screen` values for `-T` now perform the same actions as `-r` and `-d` as previously. Other values are attached to the imported objects as a custom annotation with the namespace: NSTARGETTEMPLATE
There is not enough information client-side to properly parse out "dataset" or "screen" from the path. Bio-Formats needs to have already run on the files and declared the number of omitted levels. As a solution, this attempts to push the parsing of the file paths server-side. (Additionally, it moves to a regex- based representation). This however does not work as expected since the import mechanism strips off all directories. Likely we don't have much choice but to use the clientPath field.
Since server-side paths are likely to be stripped down to the minimum, there's not enough context for the regex to create a substantial dataset name.
The first found container with the given name will be used.
This is required to support the existing, now deprecated, flags which are used in gateway tests and CLI import tests.
Now with `--exclude=clientpath` its possible to use the client-side absolute filepath to determine whether or not an import has already taken place. This exclusion does *not* check for the checksum of the target file, but rather assumes that the client-side path is unique enough to prevent false positives.
A couple of observations from me @joshmoore
will create a screen and then import the image as an orphan.
will each create the named containers but then import the image as an orphan. I also had cases where the import failed due to the target but only after the upload. I will need to check my history to reproduce this but it may be worth thinking about what we do with these semi-imports. |
return null; | ||
} | ||
|
||
String name = m.group("C1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here C1
is used but in the unit tests for the pattern matching Container1
is used. The two should match. Should the more explicit Container
rather than the short form (or maybe allow both somehow) be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to go for the specific "Container". My hope had been to come up with a way to have "C1" + "C2" and in the screen case those are joined to "C2/C1" while in the PDI case it's Project "C2" and Dataset "C1". Could hold off on that for a RFE but these names are essentially public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest changing to Container1 and then later adding Container2, unless that would break anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, yes, Container1
is what I meant. 👍
My thinking on #4367 (comment) was to eventually define what makes the most sense ("oldest", "newest"?) but then provide further ways to help solve ambiguity:
|
@joshmoore : Agree with #4367 (comment) - in case of fail, fail but with nicer message please. |
Re #4367 (comment) should we define |
@ximenesuk : Would suggest ot default to |
Re: #4367 (comment) you need a dataset under to project to put the image into (cannot be in the project itself). When I run
Do I get this right ? These two are basic workflows covered in the importer though. |
Re: #4367 (comment) with respect to Screens : the creation of a screen should be allowed on import, and of course immediate import of the images into this screen, as long as they are in SPW format. Regarding the plate though, do we ever allow to import something into the existing plate ? Not using the UI Importer for sure - plate is as it is, no additions possible. |
@pwalczysko re #4367 (comment) yes, I think you have it right. At the moment this PR is limited to creating Datasets not Projects. The intention is to expand the functionality later. Do you mean the Insight importer? If so then this PR is not really trying to replicate that workflow. |
Yes, I do. Okay, understood. |
Re #4367 (comment) I was not trying to import into a Plate but demonstrate a bug (You could use Trying to import an image into a Screen is also a bug but is not yet fixed so the Image will be an orphan and an empty Screen will be created. |
All the other functionalities mentioned in the header of the PR are working. Interestingly, I cannot combine |
Turns out that the #4367 (comment) comment is just a problem with syntax - when the double-dash is used in front of the |
Confirmed with @pwalczysko that
does work. (Which raises that idea of passing everything through to Java to remove these minor gotchas). |
This test simply supports one of the targets being chosen but not which one.
The commits after @pwalczysko comments address three issues: First the name of the container in the regex version of the command.
Second, it limits containers, at present, to Dataset and Screen, and so,
should fail. It should not import the image as an orphan and not create the Project. And, finally,
will import into the most recent Dataset with that name, ie the highest ID. The bulk of the commits are refactoring and adding tests and thus the tests should pass. |
I have code on my local branch to take
I'm happy to push this if this usage is okay or do we want to consider a letter-based signifier? Something like |
@pwalczysko thanks for your testing on this. I'm closing the PR for 5.2.1 as following discussion with @joshmoore there needs to be some tweaks to the argument forms to fix some of these problems more robustly and avoid breaking the "API" later. Once 5.2.1 is out I'll re-open this for 5.2.2. |
This PR is superseded by gh-4393
This is a rebase of gh-4125 with the deprecated flags
-d
and-r
maintained. It also partially rebases gh-4239 (the other commit of this PR was already rebased in gh-4263).This PR introduces a template to the import workflow that will allow for elements within the imported files path name to be used to determine the destination of the import. At present this is limited to the Dataset or Screen.
Testing
Import target
This if from my understanding of the code, @joshmoore may add further clarification in comments below
The CLI import should work with the following examples. The examples all apply to datasets but screens should be tested as well using either
-r
or-T Screen:...
.The existing target flag should work:
importing an image to Dataset with id=1.
The new target flag should work with an ID:
importing an image to Dataset with id=2.
The new target flag should work with name:
importing an image to Dataset with name=namey. Note that if the a Dataset with that name does not exist it will be created. If it does exist it will be used. So, run this workflow multiple times with Datasets that do and don't exist.
More complicated names can be used, enclosing with quotes if necessary:
Finally, regular expressions can be used to match the Dataset name from the path name. Here the code
(?<C1>.*)
is providing the name.would use a Dataset with name being the path following
images/
, ie justdv
. And finally,would use a Dataset with name being the path following
Work/
, ieimages/dv
.The regex offers much more power than this but these basic examples should check the workflow.
Note that this does not yet address the group problem. The image will be imported into the default group and if a Dataset with that id does not exist in the default group the import will fail. For the name or regex argument this means that a new Dataset may be created in the default group rather than an existing one from another of the user's group being used.
Exclude on client path
A later commit (ximenesuk@0b0fce6) adds a new exclude option based on the client-side path. Thus
Assuming this image has not been imported from this path before then the first import should succeed while the second should fail: