Import template #4393

ximenesuk · 2015-12-18T10:33:43Z

This is a rebase of gh-4125 with the deprecated flags -d and -r maintained. It also partially rebases gh-4239 (the other commit of this PR was already rebased in gh-4263). This PR supersedes gh-4367.

This PR introduces a template to the import workflow that will allow for elements within the imported files path name to be used to determine the destination of the import. At present this is limited to the Dataset or Screen.

The later commits on this PR formalise the template notation to:

<action or Class>:<discriminator>:<pattern>

or

<action or Class>:<pattern>

where the discriminator is implied. Note that if the pattern needs to contain a colon then an explicit discriminator must be used. The discriminator may have a further qualifying character as illustrated in the testing examples below.

Currently the following actions and classes are supported, regex, Dataset and Screen. The intention is that further classes will be defined and this also allows user-defined classes to be used.

For Dataset and Screen the currently supported discriminators and name and id, the implied discriminator is id. For regex the only discriminator is name and implied discriminator is the qualified form of this +name.

Testing

Import target

This if from my understanding of the code, @joshmoore may add further clarification in comments below

The CLI import should work with the following examples. The examples all apply to datasets but screens should be tested as well using either -r or -T Screen:....

The existing target flag should work:

$ bin/omero import -d 1 ~/Work/images/dv/SMN10ul03_R3D_D3D.dv

importing an image to Dataset with id=1.

The new target flag should work with an ID:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:2

importing an image to Dataset with id=2. The ID can be explicitely stated:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:id:2

The new target flag should work with the container name:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:name:aName

importing an image to Dataset with name=aName. Note that if the a Dataset with that name does not exist it will be created. If it does exist it will be used. If several Datasets with this name exist than the most recent will be used. So, run this workflow multiple times with Datasets that do and don't exist.

More complicated names can be used, enclosing with quotes if necessary:

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:name:aNameWith/aSlash
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:name:"New Dataset"

Additionally the name of the container can be qualified to use the most recent (+), the oldest (-), to only import if there is a unique target (%) or to create a new container even if one with the correct name already exists (@):

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:+name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:-name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:%name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:@name:aName

In the third case the import should fail if multiple datasets with that name exist. In all cases a new Dataset should be created is none exists. In the last case a new Dataset should be created even if one or more already exist.

Finally, regular expressions can be used to match the Dataset name from the path name. Here the code (?<Container1>.*) is providing the name.

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:^.*images/(?<Container1>.*?)"

would use a Dataset with name being the path following images/, ie just dv.

As in the examples above the name discriminator can be explicitly used and also qualified

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:+name:^.*images/(?<Container1>.*?)"
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:-name:^.*images/(?<Container1>.*?)"
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:%name:^.*images/(?<Container1>.*?)"
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:@name:^.*images/(?<Container1>.*?)"

These work as in the Dataset examples above. But, in the third case the import will succeed if multiple datasets with that name exist but the import will be orphaned.

And finally,

$ bin/ omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T "regex:^.*Work/(?<Container1>.*?)"

would use a Dataset with name being the path following Work/, ie images/dv.

The regex offers much more power than this but these basic examples should check the workflow. Note that the explicit use of regex is not required but the colon is, so:

$ bin/ omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T ":^.*Work/(?<Container1>.*?)"

is the same as the first examples immediately above.

Note that this does not yet address the group problem. The image will be imported into the default group and if a Dataset with that id does not exist in the default group the import will fail. For the name or regex argument this means that a new Dataset may be created in the default group rather than an existing one from another of the user's group being used.

@joshmoore this re-formulation is in line with what we discussed (I think). It does mean that there could be significant refactoring. The TargetBuilder could parse the argument and pass discriminator and pattern to the appropriate classes, rather than letting the classes each do equivalent parsing.

Also, it would be possible to have the template target import fail client side rather than succeed server side. But there may be a reason you chose to do the container resolution server-side for the regex rather than client side as in the model target.

However, these potential changes can be made later.

Exclude on client path

One commit buried in this PR (ximenesuk@0b0fce6) adds a new exclude option based on the client-side path. Thus

$ bin/omero import ~/Work/images/dv/IAGFP-Noc01_R3D.dv -- --exclude=clientpath
$ bin/omero import ~/Work/images/dv/IAGFP-Noc01_R3D.dv -- --exclude=clientpath

Assuming this image has not been imported from this path before then the first import should succeed while the second should fail:

...
2015-12-06 17:33:36,025 1959       [      main] INFO   .importer.exclusions.ClientPathExclusion - ClientPath match for filename: Users/colin/Work/images/dv/IAGFP-Noc01_R3D.dv

==> Summary
0 files uploaded, 0 filesets created, 0 images imported, 0 errors in 0:00:00.089

`Dataset:id`, `Dataset:name:foo` and similar `Screen` values for `-T` now perform the same actions as `-r` and `-d` as previously. Other values are attached to the imported objects as a custom annotation with the namespace: NSTARGETTEMPLATE

There is not enough information client-side to properly parse out "dataset" or "screen" from the path. Bio-Formats needs to have already run on the files and declared the number of omitted levels. As a solution, this attempts to push the parsing of the file paths server-side. (Additionally, it moves to a regex- based representation). This however does not work as expected since the import mechanism strips off all directories. Likely we don't have much choice but to use the clientPath field.

Since server-side paths are likely to be stripped down to the minimum, there's not enough context for the regex to create a substantial dataset name.

The first found container with the given name will be used.

This is required to support the existing, now deprecated, flags which are used in gateway tests and CLI import tests.

Now with `--exclude=clientpath` its possible to use the client-side absolute filepath to determine whether or not an import has already taken place. This exclusion does *not* check for the checksum of the target file, but rather assumes that the client-side path is unique enough to prevent false positives.

ximenesuk · 2016-02-02T11:57:08Z

@pwalczysko you might want to re-review this if you have time.

pwalczysko · 2016-02-02T13:18:29Z

Would you please list on Standup with my name ? Thank you . @ximenesuk

pwalczysko · 2016-02-03T16:55:40Z

[pwalczysko@ls31619 ~/Downloads/OMERO.server-5.2.1-348-2b428d1-ice35-b221]$ bin/omero import  /Users/pwalczysko/Desktop/nested\ 1/nested\ 2/nested\ 3/Screen\ Shot\ 2016-02-01\ at\ 14.28.53.png -T "Dataset:!name:test-colin"
-bash: !name: event not found

Edit: as discussed with @ximenesuk this is a bug. Exclamation mark possibly cannot be used at all.

ximenesuk · 2016-02-03T17:05:32Z

Thanks @pwalczysko So we need something other than ! here if more complex escaping is to be avoided.

pwalczysko · 2016-02-03T17:31:42Z

All the other parts except the #4393 (comment) bug work fine. The only remark is more general fight with -- -- - again, repeatedly, hit the problem in

bin/omero import  /Users/pwalczysko/Desktop/nested\ 1/nested\ 2/Screen\ Shot\ 2016-02-01\ at\ 14.28.53.png -- -T ":^.*nested\ 1/(?<Container1>.*?)" --exclude=clientpath

works fine
but

bin/omero import  /Users/pwalczysko/Desktop/nested\ 1/nested\ 2/nested\ 3/Screen\ Shot\ 2016-02-01\ at\ 14.28.53.png -T ":^.*nested\ 1/(?<Container1>.*?)" -- --exclude=clientpath

fails. It would be a good idea to simplify this area (even better explanations here would not work really well I am afraid) cc @joshmoore

joshmoore · 2016-02-04T08:50:02Z

Re: quoting: good chars. include ",._+:@%/-~", bad chars. include "``|&;()<>`" (and space). A potential mapping might be:

"the most recent (>)" --> +
"the oldest (<)" --> -
"a unique target (!)" --> %

Or we go for strings: rname, oname, uname shrugs

Re: -- I have a couple ideas on how to manage this that we can talk through, but there's going to be a question of which milestone to target. cc: @jburel

ximenesuk · 2016-02-04T09:02:07Z

Useful suggestions @joshmoore, I'll leave @pwalczysko to comment but in conversation he was lukewarm about the alphabetic string forms.

One other case that I was going to add was "create regardless", ie create a new container whether or not one exists. Maybe @ for that?

pwalczysko · 2016-02-04T09:56:31Z

I would say I am not lukewarm about alphabetic string forms, in this case I am afraid I am outright against them. As the concept is so complicated, you do not have a chance to convey the meaning even in 2 words. If the meaning is just hinted in the word, it will be inevitably misinterpreted by the user. Signs are better as the user will be forced to go to help and read what that really means.

ximenesuk · 2016-02-04T10:07:08Z

Thanks @pwalczysko (I was being generous in my interpretation of your opposition!) I'll change to the symbols in the comments above unless anyone objects quickly.

ximenesuk · 2016-02-07T18:26:00Z

These later commits change the symbols used, add the create symbol and apply the same qualification to regex templates. Tests have been updated but are still not fully comprehensive. In line with these changes the description and testing details have been updated.

pwalczysko · 2016-02-08T11:55:51Z

Re-tested the changed commands

$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:+name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:-name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:%name:aName
$ bin/omero import ~/Work/images/dv/SMN10ul03_R3D_D3D.dv -T Dataset:@name:aName

both with and without newly created datasets (i.e. also in cases where no dataset of the specified name was created previously).
All works as expected here.

ximenesuk · 2016-02-09T13:43:35Z

@joshmoore thoughts on merging this if the approach is okay? Any further refactoring, tests and minor changes can be added later.

joshmoore · 2016-02-09T14:15:46Z

Works for me. I can foresee us needing one more round of symbol/name changing if there are any objections, but merging to start using this on metadata52.

Import template

ximenesuk and others added 30 commits December 17, 2015 14:02

Add a template to import config.

173b959

Allow template to be passed in via importer-cli, but don't document.

69609f4

Changes to ImportLibrary

7127873

Changes to OMEROMetadataStoreClient

506a164

Add ImportTemplate class.

bee4619

Create o.f.i.targets package

71296e9

Initial migration to ImportTarget interface

3cc0162

Re-collapse path targets to TemplateImportTarget

ac0e0d7

template: More moving methods

16be85b

model: implement and throw

5513a24

cli: rename template argument

e5983f6

cli: use target rather than -d and -r

909444e

targets: improve exceptions

d8150f1

fix: re-collapse

d8936ae

import.py: add -T arguments

8c50223

tests: work on unit and integration tests

d30b4af

Initial working -T flag

6588f55

`Dataset:id`, `Dataset:name:foo` and similar `Screen` values for `-T` now perform the same actions as `-r` and `-d` as previously. Other values are attached to the imported objects as a custom annotation with the namespace: NSTARGETTEMPLATE

Use client path for TemplateImportTarget

d817dcf

Since server-side paths are likely to be stripped down to the minimum, there's not enough context for the regex to create a substantial dataset name.

Working model & template sources

c88cc7f

This is no longer failing as expected

0afa54e

Working $Class:name:$Name target

c4cbea8

The first found container with the given name will be used.

Fix flake8

4281e4b

Allow deprecated -d and -r to work with new target

5d9fbf3

This is required to support the existing, now deprecated, flags which are used in gateway tests and CLI import tests.

Fix JavaDoc error and remove blank line

37beaa7

Use more explicit Container1 instead of C1

84351af

Restrict target classes to Dataset and Screen

5cf6438

Handle multiple containers with the same name

213c471

Extend helper method to take object name

33622b8

ximenesuk reopened this Feb 2, 2016

Colin Blackburn added 8 commits February 4, 2016 11:59

Change qualifier symbols and add create symbol

1839cc1

Use ModelTarget approach for TemplateTarget

cc687df

Use split in ModelTemplate and add id as discriminator

4360421

Add test for explicit id discriminator

462fe21

Add ModelTarget test bad discriminator

444cdcd

Add test for qualified discriminators in TemplateTarget

8ad0383

Whitespace fix

fca4b70

Expand tests to include create qualifier

13eed50

ximenesuk closed this Feb 5, 2016

ximenesuk reopened this Feb 5, 2016

ximenesuk closed this Feb 7, 2016

ximenesuk reopened this Feb 7, 2016

joshmoore added a commit that referenced this pull request Feb 9, 2016

Merge pull request #4393 from ximenesuk/import-template-dev

ecab50c

Import template

joshmoore merged commit ecab50c into ome:develop Feb 9, 2016

ximenesuk mentioned this pull request Feb 25, 2016

Add basic import template page ome/omero-documentation#1413

Merged

jburel added this to the 5.2.2 milestone Feb 29, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Import template #4393

Import template #4393

ximenesuk commented Dec 18, 2015

ximenesuk commented Feb 2, 2016

pwalczysko commented Feb 2, 2016

pwalczysko commented Feb 3, 2016

ximenesuk commented Feb 3, 2016

pwalczysko commented Feb 3, 2016

joshmoore commented Feb 4, 2016

ximenesuk commented Feb 4, 2016

pwalczysko commented Feb 4, 2016

ximenesuk commented Feb 4, 2016

ximenesuk commented Feb 7, 2016

pwalczysko commented Feb 8, 2016

ximenesuk commented Feb 9, 2016

joshmoore commented Feb 9, 2016

Import template #4393

Import template #4393

Conversation

ximenesuk commented Dec 18, 2015

Testing

Import target

Exclude on client path

ximenesuk commented Feb 2, 2016

pwalczysko commented Feb 2, 2016

pwalczysko commented Feb 3, 2016

ximenesuk commented Feb 3, 2016

pwalczysko commented Feb 3, 2016

joshmoore commented Feb 4, 2016

ximenesuk commented Feb 4, 2016

pwalczysko commented Feb 4, 2016

ximenesuk commented Feb 4, 2016

ximenesuk commented Feb 7, 2016

pwalczysko commented Feb 8, 2016

ximenesuk commented Feb 9, 2016

joshmoore commented Feb 9, 2016