Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full text additions #57

Merged
merged 10 commits into from Dec 20, 2019
Merged

Full text additions #57

merged 10 commits into from Dec 20, 2019

Conversation

@joshmoore
Copy link
Member

joshmoore commented Jun 20, 2019

As requested recently by @olatarkowska and @sukunis, the FullTextBridge could index a number of additional properties. This PR begins that processing by indexing both Filesets as well as (Logical)Channels for Images.

cc: @mtbc @jburel @dominikl @pwalczysko @will-moore (the search interested)

Simple clientPath test:

mkdir /tmp/abc
touch /tmp/abc/def.fake
bin/omero import /tmp/abc/def.fake
bin/omero search Image abc
if (object instanceof Image) {
final Image image = (Image) object;
final Fileset fileset = image.getFileset();
final Iterator<FilesetEntry> entryIterator = fileset.iterateUsedFiles();

This comment has been minimized.

Copy link
@mtbc

mtbc Jun 21, 2019

Member

This line threw NPE in CI.

This comment has been minimized.

Copy link
@joshmoore

joshmoore Jun 21, 2019

Author Member

Cheers. Did you happen to notice if that made the indexer particularly unhappy?

This comment has been minimized.

Copy link
@mtbc

mtbc Jun 21, 2019

Member

It did. Unfortunately, in this case restarting it wouldn't help as the exception's repeatable. (At least it logged the exception!)

At the moment truly unexpected things stop it in its tracks. It's an unpleasant tradeoff between that and possibly regarding a bunch of failed things as having been indexed; a difficult design issue.

This comment has been minimized.

Copy link
@joshmoore

joshmoore Jun 21, 2019

Author Member

That's what I assumed after running into exceptions while testing this. Happy to just be as safe as possible before merging this, but if we could find somewhere fairly close to FullTextBridge.set (perhaps in that method itself) for capturing all NPEs and similar it might be worth it.

This comment has been minimized.

Copy link
@mtbc

mtbc Jun 21, 2019

Member

For indexing failures better to WARN and actually move on? Sounds plausible; I can do that if desired.

This comment has been minimized.

Copy link
@mtbc

mtbc Jun 21, 2019

Member

(Actually, even ERROR could be good for those.)

This comment has been minimized.

Copy link
@mtbc

mtbc Jun 21, 2019

Member

It's an interestingly difficult one: if it instead throws UnresolvableObjectException then retrying will probably work. Have to give it some thought.

@mtbc mtbc mentioned this pull request Jun 21, 2019
@@ -23,8 +23,9 @@
import ome.model.annotations.TextAnnotation;
import ome.model.annotations.TermAnnotation;
import ome.model.containers.Folder;
import ome.model.core.Image;
import ome.model.core.OriginalFile;
import ome.model.core.*;

This comment has been minimized.

Copy link
@mtbc

mtbc Jun 21, 2019

Member

Okay to do this?

This comment has been minimized.

Copy link
@joshmoore

joshmoore Jun 21, 2019

Author Member

Yeah, I'm finding myself failing to be vigilant against intellij's defaults...

This comment has been minimized.

Copy link
@dominikl

dominikl Jun 24, 2019

Member

I'd find the .* import much cleaner. Would it actually make any difference at the compiled byte code level? I'd guess not.

This comment has been minimized.

Copy link
@rgozim

rgozim Jun 24, 2019

Member

You can adjust import class count in Intellij in the settings (Editor -> Code Style -> Java -- imports)

Screenshot 2019-06-24 at 09 42 30

This comment has been minimized.

Copy link
@joshmoore

joshmoore Jun 24, 2019

Author Member

Agreed, but there's the requirement to remember to do that in every repository (I have each jar as a separate project)

This comment has been minimized.

Copy link
@jburel

jburel Jun 24, 2019

Member

This is a matter of pref but we have avoided to have import .*.
I personally prefer to be explicit

This comment has been minimized.

Copy link
@joshmoore

joshmoore Jul 1, 2019

Author Member

Pushed a fix.

/**
* Walks the acquisition related metadata including channel names.
*
* @param name

This comment has been minimized.

Copy link
@jburel

jburel Jun 24, 2019

Member

Useful to have the description of the parameters at the public method level

@jburel

This comment has been minimized.

Copy link
Member

jburel commented Jun 24, 2019

Is there any tests for the new indexed fields e.g. channel name fluor etc.?

@joshmoore

This comment has been minimized.

Copy link
Member Author

joshmoore commented Jun 24, 2019

No, I'd like to get feedback on the fields etc. first before having a parallel branch to manage. Big question is likely the format of the channel name string. (I failed to find a simple way to re-use client code for this)

if (logical != null) {
addIfNotNull(document, "channel.name", logical.getName(), opts);
addIfNotNull(document, "channel.fluor", logical.getFluor(), opts);
addEnumIfNotNull(document, "channel.mode", logical.getMode(), opts);

This comment has been minimized.

Copy link
@jburel

jburel Jun 24, 2019

Member

the emission wavelength is used for "label" i.e. value displayed in the UI

public String getChannelLabeling()
    {
        String value = getName();
        if (StringUtils.isNotBlank(value)) return value;
        value = getFluor();
        if (StringUtils.isNotBlank(value)) return value;
        Length v = null;
        try {
            v = getEmissionWavelength(null);
        } catch (BigResult e) { }
        if (v != null) {
        	return ""+ DoubleMath.roundToInt(v.getValue(), RoundingMode.DOWN);
        }
        return ""+index;
    }

So that value should probably be indexed

This comment has been minimized.

Copy link
@joshmoore

joshmoore Jun 25, 2019

Author Member

As "channel.label"?

This comment has been minimized.

Copy link
@jburel

jburel Jun 25, 2019

Member

label encapsulating the logic described above or just wavelength?

This comment has been minimized.

Copy link
@joshmoore

joshmoore Jun 25, 2019

Author Member

can do either or both. So separating the questions:

  • do we want to introduce a user-facing field "channel.label" which is the OR of several other fields?
  • do we want to introduce channel.wavelength and if so, in what unit/format?
@joshmoore joshmoore force-pushed the joshmoore:full-text-additions branch from ad3b9ba to ba37935 Jun 25, 2019
@joshmoore joshmoore force-pushed the joshmoore:full-text-additions branch 2 times, most recently from 4ef4f17 to ca188f0 Jun 26, 2019
@joshmoore joshmoore force-pushed the joshmoore:full-text-additions branch from e03eb67 to 1b57f84 Jul 2, 2019
@joshmoore joshmoore mentioned this pull request Jul 2, 2019
0 of 2 tasks complete
@joshmoore

This comment has been minimized.

Copy link
Member Author

joshmoore commented Dec 17, 2019

I plan on coming back to this in the next while. Would anyone like to make a proposal for searching units?

I imagine to do it properly we'd need something like https://dzone.com/articles/build-a-custom-solr-filter-to-handle-unit-conversions . One option would be to move everything to a standard unit or to store the value in every unit (😱 ).

Alternatively, we just say we're not indexing any unit-based field. The same would likely then hold for hash.

@joshmoore joshmoore self-assigned this Dec 17, 2019
joshmoore added a commit to joshmoore/ome-documentation that referenced this pull request Dec 18, 2019
@joshmoore joshmoore force-pushed the joshmoore:full-text-additions branch from 5ac1a9f to da0c753 Dec 19, 2019
@joshmoore

This comment has been minimized.

Copy link
Member Author

joshmoore commented Dec 19, 2019

Tested the last commit with:

 # Returns nothing
 bin/omero search Image abcdef

 # Create a unique clientPath
 mkdir /tmp/abcdef
 touch /tmp/abcdef/search.fake
 bin/omero import /tmp/abcdef/search.fake

 # Check up-to-date
 less var/log/Indexer-0.log

 # Returns
 bin/omero search Image abcdef
 bin/omero search Image 2019
 bin/omero search Image "14-31-26.294"
 bin/omero search Image 14
 bin/omero search Image abc*

 bin/omero search Image d # No result
@jburel

This comment has been minimized.

Copy link
Member

jburel commented Dec 20, 2019

Tested the latest changes:
After importing the file /tmp/abcdef/search.fake

omero search Image abcdef
 # | Class  | Id   | details            | series | fileset       | name        
---+--------+------+--------------------+--------+---------------+-------------
 0 | ImageI | 4965 | owner=102;group=55 | 0      | FilesetI:4107 | search.fake 
(1 row
omero search Image 2019
 # | Class  | Id   | details            | series | fileset       | name                               | format      | objectiveSettings     | instrument      
---+--------+------+--------------------+--------+---------------+------------------------------------+-------------+-----------------------+-----------------
 0 | ImageI | 4965 | owner=102;group=55 | 0      | FilesetI:4107 | search.fake                        |             |                       |                 
 1 | ImageI | 979  | owner=102;group=55 | 1      | FilesetI:964  | MF-2CH-Z-T.tif [MF-2CH-Z-T.tif #2] | FormatI:170 |                       |                 
 2 | ImageI | 980  | owner=102;group=55 | 2      | FilesetI:964  | MF-2CH-Z-T.tif [MF-2CH-Z-T.tif #3] | FormatI:170 |                       |                 
 3 | ImageI | 981  | owner=102;group=55 | 3      | FilesetI:964  | MF-2CH-Z-T.tif [MF-2CH-Z-T.tif #4] | FormatI:170 |                       |                 
 4 | ImageI | 978  | owner=102;group=55 | 0      | FilesetI:964  | MF-2CH-Z-T.tif [MF-2CH-Z-T.tif #1] | FormatI:170 | ObjectiveSettingsI:20 | InstrumentI:909 
omero search Image abc*
 # | Class  | Id   | details            | series | fileset       | name        
---+--------+------+--------------------+--------+---------------+-------------
 0 | ImageI | 4965 | owner=102;group=55 | 0      | FilesetI:4107 | search.fake 
(1 row)
omero search Image d
No results found

Interestingly omero search Image a* is allowed but
omero search Image d* is not. error: unrecognized arguments: docs

@jburel

This comment has been minimized.

Copy link
Member

jburel commented Dec 20, 2019

I needed to add quote around "d*"

@jburel

This comment has been minimized.

Copy link
Member

jburel commented Dec 20, 2019

It works as expected

@jburel jburel merged commit a8e5dcc into ome:master Dec 20, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@joshmoore joshmoore deleted the joshmoore:full-text-additions branch Dec 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.