Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import template #4393

Merged
merged 56 commits into from
Feb 9, 2016
Merged

Import template #4393

merged 56 commits into from
Feb 9, 2016

Conversation

ximenesuk
Copy link
Contributor

This is a rebase of gh-4125 with the deprecated flags -d and -r maintained. It also partially rebases gh-4239 (the other commit of this PR was already rebased in gh-4263). This PR supersedes gh-4367.

This PR introduces a template to the import workflow that will allow for elements within the imported files path name to be used to determine the destination of the import. At present this is limited to the Dataset or Screen.

The later commits on this PR formalise the template notation to:

<action or Class>:<discriminator>:<pattern>

or

<action or Class>:<pattern>

where the discriminator is implied. Note that if the pattern needs to contain a colon then an explicit discriminator must be used. The discriminator may have a further qualifying character as illustrated in the testing examples below.

Currently the following actions and classes are supported, regex, Dataset and Screen. The intention is that further classes will be defined and this also allows user-defined classes to be used.

For Dataset and Screen the currently supported discriminators and name and id, the implied discriminator is id. For regex the only discriminator is name and implied discriminator is the qualified form of this +name.

Testing

Import target

This if from my understanding of the code, @joshmoore may add further clarification in comments below

The CLI import should work with the following examples. The examples all apply to datasets but screens should be tested as well using either -r or -T Screen:....

The existing target flag should work:

$ bin/omero import -d 1 ~/Work/images/dv/SMN10ul03_R3D_D3D.dv

importing an image to Dataset with id=1.

The new target flag should work with an ID:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:2

importing an image to Dataset with id=2. The ID can be explicitely stated:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:id:2

The new target flag should work with the container name:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:name:aName

importing an image to Dataset with name=aName. Note that if the a Dataset with that name does not exist it will be created. If it does exist it will be used. If several Datasets with this name exist than the most recent will be used. So, run this workflow multiple times with Datasets that do and don't exist.

More complicated names can be used, enclosing with quotes if necessary:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:name:aNameWith/aSlash
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:name:"New Dataset"

Additionally the name of the container can be qualified to use the most recent (+), the oldest (-), to only import if there is a unique target (%) or to create a new container even if one with the correct name already exists (@):

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:+name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:-name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:%name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:@name:aName

In the third case the import should fail if multiple datasets with that name exist. In all cases a new Dataset should be created is none exists. In the last case a new Dataset should be created even if one or more already exist.

Finally, regular expressions can be used to match the Dataset name from the path name. Here the code (?<Container1>.*) is providing the name.

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:^.*images/(?<Container1>.*?)"

would use a Dataset with name being the path following images/, ie just dv.

As in the examples above the name discriminator can be explicitly used and also qualified

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:+name:^.*images/(?<Container1>.*?)"
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:-name:^.*images/(?<Container1>.*?)"
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:%name:^.*images/(?<Container1>.*?)"
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:@name:^.*images/(?<Container1>.*?)"

These work as in the Dataset examples above. But, in the third case the import will succeed if multiple datasets with that name exist but the import will be orphaned.

And finally,

$ bin/ omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:^.*Work/(?<Container1>.*?)"

would use a Dataset with name being the path following Work/, ie images/dv.

The regex offers much more power than this but these basic examples should check the workflow. Note that the explicit use of regex is not required but the colon is, so:

$ bin/ omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T ":^.*Work/(?<Container1>.*?)"

is the same as the first examples immediately above.

Note that this does not yet address the group problem. The image will be imported into the default group and if a Dataset with that id does not exist in the default group the import will fail. For the name or regex argument this means that a new Dataset may be created in the default group rather than an existing one from another of the user's group being used.

@joshmoore this re-formulation is in line with what we discussed (I think). It does mean that there could be significant refactoring. The TargetBuilder could parse the argument and pass discriminator and pattern to the appropriate classes, rather than letting the classes each do equivalent parsing.

Also, it would be possible to have the template target import fail client side rather than succeed server side. But there may be a reason you chose to do the container resolution server-side for the regex rather than client side as in the model target.

However, these potential changes can be made later.

Exclude on client path

One commit buried in this PR (ximenesuk@0b0fce6) adds a new exclude option based on the client-side path. Thus

$ bin/omero import ~/Work/images/dv/IAGFP-Noc01_R3D.dv -- --exclude=clientpath
$ bin/omero import ~/Work/images/dv/IAGFP-Noc01_R3D.dv -- --exclude=clientpath

Assuming this image has not been imported from this path before then the first import should succeed while the second should fail:

...
2015-12-06 17:33:36,025 1959       [      main] INFO   .importer.exclusions.ClientPathExclusion - ClientPath match for filename: Users/colin/Work/images/dv/IAGFP-Noc01_R3D.dv

==> Summary
0 files uploaded, 0 filesets created, 0 images imported, 0 errors in 0:00:00.089

ximenesuk and others added 30 commits December 17, 2015 14:02
`Dataset:id`, `Dataset:name:foo` and similar `Screen`
values for `-T` now perform the same actions as `-r`
and `-d` as previously. Other values are attached to
the imported objects as a custom annotation with the
namespace: NSTARGETTEMPLATE
There is not enough information client-side to properly
parse out "dataset" or "screen" from the path. Bio-Formats
needs to have already run on the files and declared the
number of omitted levels.

As a solution, this attempts to push the parsing of the
file paths server-side. (Additionally, it moves to a regex-
based representation).

This however does not work as expected since the import
mechanism strips off all directories. Likely we don't
have much choice but to use the clientPath field.
Since server-side paths are likely to be stripped down to
the minimum, there's not enough context for the regex to
create a substantial dataset name.
The first found container with the given name will
be used.
This is required to support the existing, now deprecated, flags
which are used in gateway tests and CLI import tests.
Now with `--exclude=clientpath` its possible to use
the client-side absolute filepath to determine whether
or not an import has already taken place. This exclusion
does *not* check for the checksum of the target file,
but rather assumes that the client-side path is unique
enough to prevent false positives.
@ximenesuk ximenesuk reopened this Feb 2, 2016
@ximenesuk
Copy link
Contributor Author

@pwalczysko you might want to re-review this if you have time.

@pwalczysko
Copy link
Member

Would you please list on Standup with my name ? Thank you . @ximenesuk

@pwalczysko
Copy link
Member

[pwalczysko@ls31619 ~/Downloads/OMERO.server-5.2.1-348-2b428d1-ice35-b221]$ bin/omero import  /Users/pwalczysko/Desktop/nested\ 1/nested\ 2/nested\ 3/Screen\ Shot\ 2016-02-01\ at\ 14.28.53.png -T "Dataset:!name:test-colin"
-bash: !name: event not found

Edit: as discussed with @ximenesuk this is a bug. Exclamation mark possibly cannot be used at all.

@ximenesuk
Copy link
Contributor Author

Thanks @pwalczysko So we need something other than ! here if more complex escaping is to be avoided.

@pwalczysko
Copy link
Member

All the other parts except the #4393 (comment) bug work fine. The only remark is more general fight with -- -- - again, repeatedly, hit the problem in

bin/omero import  /Users/pwalczysko/Desktop/nested\ 1/nested\ 2/Screen\ Shot\ 2016-02-01\ at\ 14.28.53.png -- -T ":^.*nested\ 1/(?<Container1>.*?)" --exclude=clientpath

works fine
but

bin/omero import  /Users/pwalczysko/Desktop/nested\ 1/nested\ 2/nested\ 3/Screen\ Shot\ 2016-02-01\ at\ 14.28.53.png -T ":^.*nested\ 1/(?<Container1>.*?)" -- --exclude=clientpath

fails. It would be a good idea to simplify this area (even better explanations here would not work really well I am afraid) cc @joshmoore

@joshmoore
Copy link
Member

Re: quoting: good chars. include ",._+:@%/-~", bad chars. include "``|&;()<>`" (and space). A potential mapping might be:

  • "the most recent (>)" --> +
  • "the oldest (<)" --> -
  • "a unique target (!)" --> %

Or we go for strings: rname, oname, uname shrugs

Re: -- I have a couple ideas on how to manage this that we can talk through, but there's going to be a question of which milestone to target. cc: @jburel

@ximenesuk
Copy link
Contributor Author

Useful suggestions @joshmoore, I'll leave @pwalczysko to comment but in conversation he was lukewarm about the alphabetic string forms.

One other case that I was going to add was "create regardless", ie create a new container whether or not one exists. Maybe @ for that?

@pwalczysko
Copy link
Member

I would say I am not lukewarm about alphabetic string forms, in this case I am afraid I am outright against them. As the concept is so complicated, you do not have a chance to convey the meaning even in 2 words. If the meaning is just hinted in the word, it will be inevitably misinterpreted by the user. Signs are better as the user will be forced to go to help and read what that really means.

@ximenesuk
Copy link
Contributor Author

Thanks @pwalczysko (I was being generous in my interpretation of your opposition!) I'll change to the symbols in the comments above unless anyone objects quickly.

@ximenesuk ximenesuk closed this Feb 5, 2016
@ximenesuk ximenesuk reopened this Feb 5, 2016
@ximenesuk
Copy link
Contributor Author

These later commits change the symbols used, add the create symbol and apply the same qualification to regex templates. Tests have been updated but are still not fully comprehensive. In line with these changes the description and testing details have been updated.

@ximenesuk ximenesuk closed this Feb 7, 2016
@ximenesuk ximenesuk reopened this Feb 7, 2016
@pwalczysko
Copy link
Member

Re-tested the changed commands

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:+name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:-name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:%name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:@name:aName

both with and without newly created datasets (i.e. also in cases where no dataset of the specified name was created previously).
All works as expected here.

@ximenesuk
Copy link
Contributor Author

@joshmoore thoughts on merging this if the approach is okay? Any further refactoring, tests and minor changes can be added later.

@joshmoore
Copy link
Member

Works for me. I can foresee us needing one more round of symbol/name changing if there are any objections, but merging to start using this on metadata52.

joshmoore added a commit that referenced this pull request Feb 9, 2016
@joshmoore joshmoore merged commit ecab50c into ome:develop Feb 9, 2016
@jburel jburel added this to the 5.2.2 milestone Feb 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants