Import template #4367

Closed
wants to merge 39 commits into from

ximenesuk
Contributor

This PR is superseded by gh-4393

This is a rebase of gh-4125 with the deprecated flags -d and -r maintained. It also partially rebases gh-4239 (the other commit of this PR was already rebased in gh-4263).

This PR introduces a template to the import workflow that allows elements of the imported file's path name to determine the destination of the import. At present the destination is limited to a Dataset or Screen.

Testing

Import target

This is from my understanding of the code; @joshmoore may add further clarification in the comments below.

The CLI import should work with the following examples. The examples all apply to datasets but screens should be tested as well using either -r or -T Screen:....

The existing target flag should work:

$ bin/omero import -d 1 ~/Work/images/dv/SMN10ul03_R3D_D3D.dv

importing an image to Dataset with id=1.

The new target flag should work with an ID:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:2

importing an image to Dataset with id=2.

The new target flag should work with name:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:name:namey

importing an image to Dataset with name=namey. Note that if a Dataset with that name does not exist it will be created. If it does exist it will be used. So, run this workflow multiple times with Datasets that do and don't exist.

More complicated names can be used, enclosing with quotes if necessary:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:name:namey/namey
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:name:"New Dataset"
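
To make the two non-regex forms above concrete, here is a rough sketch of how a `-T` value could be split into a container type plus either an ID or a name. It is not the code in this PR: the class `TargetSpec` and its `parse` method are hypothetical names, and restricting the type to Dataset and Screen simply follows the limitation stated in the description. Regex templates are out of scope here and are rejected.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for a parsed -T value; not the class used in this PR. */
final class TargetSpec {
    // Per the description, only these two containers are supported at present.
    private static final List<String> ALLOWED = Arrays.asList("Dataset", "Screen");

    final String type; // "Dataset" or "Screen"
    final Long id;     // set for the Dataset:2 form, otherwise null
    final String name; // set for the Dataset:name:namey form, otherwise null

    private TargetSpec(String type, Long id, String name) {
        this.type = type;
        this.id = id;
        this.name = name;
    }

    static TargetSpec parse(String arg) {
        // A limit of 3 keeps any ':' or '/' inside the name intact.
        String[] parts = arg.split(":", 3);
        if (parts.length < 2 || !ALLOWED.contains(parts[0])) {
            throw new IllegalArgumentException("Unsupported target: " + arg);
        }
        if (parts.length == 3 && parts[1].equals("name") && !parts[2].isEmpty()) {
            return new TargetSpec(parts[0], null, parts[2]);
        }
        if (parts.length == 2) {
            // Long.valueOf throws NumberFormatException (an IllegalArgumentException) on bad input.
            return new TargetSpec(parts[0], Long.valueOf(parts[1]), null);
        }
        throw new IllegalArgumentException("Unsupported target: " + arg);
    }

    public static void main(String[] args) {
        TargetSpec byId = parse("Dataset:2");
        TargetSpec byName = parse("Dataset:name:New Dataset");
        System.out.println(byId.type + " id=" + byId.id);
        System.out.println(byName.type + " name=" + byName.name);
    }
}
```

Note that the shell strips the quotes from `Dataset:name:"New Dataset"` before the argument reaches any parser, so the sketch only ever sees `Dataset:name:New Dataset`.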

Finally, regular expressions can be used to match the Dataset name from the path name. Here the named group (?<C1>.*?) provides the name.

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "^.*images/(?<C1>.*?)"

would use a Dataset whose name is the path following images/, i.e. just dv. Similarly,

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "^.*Work/(?<C1>.*?)"

would use a Dataset whose name is the path following Work/, i.e. images/dv.

The regex offers much more power than this but these basic examples should check the workflow.
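
For anyone checking the expected names by hand, the two templates above can be tried in plain Java. This is only a sketch under an assumption: that the template has to match the whole directory part of the client path (file name removed). The PR does not state this, but it is the reading that reproduces the `dv` and `images/dv` results described above. The class and method names are hypothetical, and the absolute path is just an illustrative expansion of `~`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not the matching code in this PR. */
final class TemplateMatch {
    /** Returns the text captured by the C1 group, or null if the template does not match. */
    static String containerName(String template, String clientPath) {
        // Assumption: only the directory part of the path is matched, and it must match in full.
        int slash = clientPath.lastIndexOf('/');
        String dir = slash < 0 ? "" : clientPath.substring(0, slash);
        Matcher m = Pattern.compile(template).matcher(dir);
        if (!m.matches()) {
            return null;
        }
        // Throws IllegalArgumentException if the template has no group named C1.
        return m.group("C1");
    }

    public static void main(String[] args) {
        String path = "/Users/colin/Work/images/dv/SMN10ul03_R3D_D3D.dv";
        System.out.println(containerName("^.*images/(?<C1>.*?)", path)); // dv
        System.out.println(containerName("^.*Work/(?<C1>.*?)", path));   // images/dv
    }
}
```

Because the match is anchored at both ends, the lazy `.*?` still has to extend to the end of the directory string, which is why it captures `dv` rather than an empty name.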

Note that this does not yet address the group problem. The image will be imported into the default group, and if a Dataset with that id does not exist in the default group the import will fail. For the name or regex argument this means that a new Dataset may be created in the default group rather than an existing one from another of the user's groups being used.

Exclude on client path

A later commit (ximenesuk@0b0fce6) adds a new exclude option based on the client-side path. For example, run the same import twice:

$ bin/omero import ~/Work/images/dv/IAGFP-Noc01_R3D.dv -- --exclude=clientpath
$ bin/omero import ~/Work/images/dv/IAGFP-Noc01_R3D.dv -- --exclude=clientpath

Assuming this image has not been imported from this path before, the first import should succeed while the second should fail:

...
2015-12-06 17:33:36,025 1959       [      main] INFO   .importer.exclusions.ClientPathExclusion - ClientPath match for filename: Users/colin/Work/images/dv/IAGFP-Noc01_R3D.dv

==> Summary
0 files uploaded, 0 filesets created, 0 images imported, 0 errors in 0:00:00.089

ximenesuk and others added 25 commits November 3, 2015 12:44
`Dataset:id`, `Dataset:name:foo` and similar `Screen`
values for `-T` now perform the same actions as `-r`
and `-d` as previously. Other values are attached to
the imported objects as a custom annotation with the
namespace: NSTARGETTEMPLATE

There is not enough information client-side to properly
parse out "dataset" or "screen" from the path. Bio-Formats
needs to have already run on the files and declared the
number of omitted levels.

As a solution, this attempts to push the parsing of the
file paths server-side. (Additionally, it moves to a regex-
based representation).

This however does not work as expected since the import
mechanism strips off all directories. Likely we don't
have much choice but to use the clientPath field.

Since server-side paths are likely to be stripped down to
the minimum, there's not enough context for the regex to
create a substantial dataset name.

The first found container with the given name will
be used.

This is required to support the existing, now deprecated, flags
which are used in gateway tests and CLI import tests.

Now with `--exclude=clientpath` it's possible to use
the client-side absolute filepath to determine whether
or not an import has already taken place. This exclusion
does *not* check for the checksum of the target file,
but rather assumes that the client-side path is unique
enough to prevent false positives.
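
The path-only check described in the last commit message can be illustrated roughly as follows. This is a sketch, not the `ClientPathExclusion` class from the PR: the class `ClientPathFilter` is hypothetical, and an in-memory set stands in for whatever lookup the importer actually performs against previously imported filesets.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a client-path exclusion; not the PR's implementation. */
final class ClientPathFilter {
    // Assumption: previously imported paths are held in memory for the sake of the example.
    private final Set<String> seenPaths = new HashSet<>();

    /** Returns true if the import should proceed, false if the path was already imported. */
    boolean shouldImport(String absoluteClientPath) {
        // Only the path string is compared; file contents and checksums are never read.
        return seenPaths.add(absoluteClientPath);
    }

    public static void main(String[] args) {
        ClientPathFilter filter = new ClientPathFilter();
        String path = "/Users/colin/Work/images/dv/IAGFP-Noc01_R3D.dv";
        System.out.println(filter.shouldImport(path)); // first attempt proceeds
        System.out.println(filter.shouldImport(path)); // second attempt is excluded
    }
}
```

The trade-off is the one the commit message names: a changed file at an unchanged path would still be skipped, because nothing but the path is compared.
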
@jburel jburel added the develop label Dec 2, 2015
@ximenesuk
Contributor Author

A couple of observations from me @joshmoore

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Screen:name:namey

will create a screen and then import the image as an orphan.

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Project:name:pname
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Plate:name:plate1

will each create the named containers but then import the image as an orphan.

I also had cases where the import failed due to the target but only after the upload. I will need to check my history to reproduce this but it may be worth thinking about what we do with these semi-imports.

return null;
}

String name = m.group("C1");
Contributor Author

Here C1 is used, but the unit tests for the pattern matching use Container1. The two should match. Should the more explicit Container be used rather than the short form (or maybe allow both somehow)?

Member

Happy to go for the specific "Container". My hope had been to come up with a way to have "C1" + "C2" and in the screen case those are joined to "C2/C1" while in the PDI case it's Project "C2" and Dataset "C1". Could hold off on that for a RFE but these names are essentially public API.

Contributor Author

I'd suggest changing to Container1 and then later adding Container2, unless that would break anything.

Member

Sorry, yes, Container1 is what I meant. 👍

@ximenesuk ximenesuk changed the title from "Import template dev" to "Import template" Dec 6, 2015
@joshmoore
Member

My thinking on #4367 (comment) was to eventually define what makes the most sense ("oldest", "newest"?) but then provide further ways to help solve ambiguity:

  • Dataset:>name - choose the newest (tip: "greatest")
  • Dataset:<name - choose the oldest (tip: "least")
  • Dataset:!name - require unique, fail as here in Petr's example
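
As a rough illustration of the three qualifiers listed above, the choice among several same-named containers could look like the sketch below. It is hypothetical throughout: `NameResolver` and `choose` are made-up names, and "newest" is taken to mean the highest ID, which is how a later comment in this thread describes the default. Unlike the import itself, which creates a missing Dataset, the sketch simply rejects an empty list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical resolver for same-named containers; not code from this PR. */
final class NameResolver {
    /**
     * @param ids       IDs of every container whose name matched
     * @param qualifier '>' for newest, '<' for oldest, '!' to require a unique match
     */
    static long choose(List<Long> ids, char qualifier) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("No container with that name");
        }
        switch (qualifier) {
            case '>':
                return Collections.max(ids); // assumption: highest ID is the newest
            case '<':
                return Collections.min(ids); // assumption: lowest ID is the oldest
            case '!':
                if (ids.size() != 1) {
                    throw new IllegalStateException(
                            "Name is not unique: " + ids.size() + " containers match");
                }
                return ids.get(0);
            default:
                throw new IllegalArgumentException("Unknown qualifier: " + qualifier);
        }
    }

    public static void main(String[] args) {
        List<Long> matches = Arrays.asList(12L, 7L, 31L);
        System.out.println(choose(matches, '>')); // 31
        System.out.println(choose(matches, '<')); // 7
    }
}
```

Failing in the `!` case with an explicit message is where a friendlier error for ambiguous names would naturally go.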

@pwalczysko
Member

@joshmoore : Agree with #4367 (comment) - in case of fail, fail but with nicer message please.

@ximenesuk
Contributor Author

Re #4367 (comment) should we define Dataset:name to default to one of these options now? (I'd suggest < or > rather than !.)

@pwalczysko
Member

@ximenesuk : Would suggest to default to newest

@pwalczysko
Member

Re: #4367 (comment) you need a dataset under the project to put the image into (it cannot be in the project itself). When I run

[pwalczysko@ls31619 ~/Downloads/OMERO.server-5.2.0-316-9996051-ice35-b175]$ bin/omero import ~/Desktop/Screen\ Shot\ 2015-11-27\ at\ 09.54.35.png -T Project:name:pname Dataset:name:dname

the Dataset is not recognized as an argument. This means I cannot

  • create a Dataset on import and link it to an existing Project
  • create a Project and Dataset on import and link the Dataset under the Project, then import into this

Do I get this right? These two are basic workflows covered in the importer though.

@pwalczysko
Member

Re: #4367 (comment) with respect to Screens: the creation of a screen should be allowed on import, and of course immediate import of the images into this screen, as long as they are in SPW format. Regarding the plate though, do we ever allow importing something into an existing plate? Not using the UI Importer for sure - plate is as it is, no additions possible.

@ximenesuk
Contributor Author

@pwalczysko re #4367 (comment) yes, I think you have it right. At the moment this PR is limited to creating Datasets not Projects. The intention is to expand the functionality later.

Do you mean the Insight importer? If so then this PR is not really trying to replicate that workflow.

@pwalczysko
Member

Do you mean the Insight importer? If so then this PR is not really trying to replicate that workflow.

Yes, I do. Okay, understood.

@ximenesuk
Contributor Author

Re #4367 (comment) I was not trying to import into a Plate but to demonstrate a bug. (You could use -T Image:name:namey and get even more interesting results!) This bug is now fixed locally and I will push the fix later.

Trying to import an image into a Screen is also a bug but is not yet fixed so the Image will be an orphan and an empty Screen will be created.

@pwalczysko
Member

All the other functionality mentioned in the header of the PR is working. Interestingly, I apparently cannot combine --exclude=clientpath with the -T option:

[pwalczysko@ls31619 ~/Downloads/OMERO.server-5.2.0-316-9996051-ice35-b175]$ bin/omero import /Users/pwalczysko/Desktop/Screen\ Shot\ 2015-11-27\ at\ 09.55.30.png -T "^.*pwalczysko/(?<C1>.*?)" -- --exclude=clientpath

fails with "unrecognized arguments: -- --exclude=clientpath".

@pwalczysko
Member

Turns out that the #4367 (comment) comment is just a problem with syntax - when the double-dash is used in front of the -T part (and not in front of the exclude part) then it works.

@ximenesuk
Contributor Author

Confirmed with @pwalczysko that

$ bin/omero import /Users/pwalczysko/Desktop/Screen\ Shot\ 2015-11-27\ at\ 09.55.30.png -- -T "^.*pwalczysko/(?<C1>.*?)" --exclude=clientpath

does work. (Which raises the idea of passing everything through to Java to remove these minor gotchas.)

@ximenesuk
Contributor Author

The commits after @pwalczysko's comments address three issues:

First the name of the container in the regex version of the command. Container1 must now be specified rather than C1. This brings two parts of the codebase into line and uses a more explicit name. So:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "^.*images/(?<Container1>.*?)"

Second, it limits containers, at present, to Dataset and Screen, and so,

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Project:name:pname

should fail. It should not import the image as an orphan and not create the Project.

And, finally,

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:name:namey

will import into the most recent Dataset with that name, i.e. the one with the highest ID.

The bulk of the commits are refactoring and adding tests and thus the tests should pass.

@ximenesuk
Contributor Author

I have code on my local branch to take <name, >name and !name as qualified name signifiers. As all three characters are special to the shell this means their use has to be quoted, thus:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T 'Dataset:!name:namey'

I'm happy to push this if this usage is okay or do we want to consider a letter-based signifier? Something like fname, lname and uname?

/cc @joshmoore @pwalczysko

@ximenesuk
Contributor Author

@pwalczysko thanks for your testing on this. I'm closing the PR for 5.2.1 because, following discussion with @joshmoore, there need to be some tweaks to the argument forms to fix some of these problems more robustly and avoid breaking the "API" later. Once 5.2.1 is out I'll re-open this for 5.2.2.

@ximenesuk ximenesuk closed this Dec 11, 2015
@ximenesuk ximenesuk mentioned this pull request Dec 18, 2015