Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fileset indexing: address performance issues and add configurability #156

Merged
merged 11 commits into from
Mar 9, 2023
44 changes: 40 additions & 4 deletions src/main/java/ome/services/fulltext/FullTextBridge.java
Original file line number Diff line number Diff line change
Expand Up @@ -55,9 +55,9 @@
*
* @author Josh Moore, josh at glencoesoftware.com
* @since 3.0-Beta3
* @see <a href="https://trac.openmicroscopy.org.uk/ome/FileParsers">Parsers</a
* @see <a href="https://omero.readthedocs.io/en/stable/developers/Search/FileParsers.html">Parsers</a
* href>
sbesson marked this conversation as resolved.
Show resolved Hide resolved
* @see <a href="https://trac.openmicroscopy.org.uk/ome/SearchBridges">Bridges</a
* @see <a href="https://omero.readthedocs.io/en/stable/developers/Modules/Search/Bridges.html">Bridges</a
* href>
*/
@Deprecated
Expand All @@ -67,6 +67,7 @@ public class FullTextBridge extends BridgeHelper {
final protected OriginalFilesService files;
final protected Map<String, FileParser> parsers;
final protected Class<FieldBridge>[] classes;
final protected int maxFilesetSize;

/**
* Since this constructor provides the instance with no way of parsing
Expand Down Expand Up @@ -100,15 +101,41 @@ public FullTextBridge(OriginalFilesService files,
* set of {@link FieldBridge bridge classes} which will be
* instantiated via a no-arg constructor.
* @see <a
* href="https://trac.openmicroscopy.org.uk/ome/SearchBridges">Bridges</a
* href="https://omero.readthedocs.io/en/stable/developers/Modules/Search/Bridges.html">Bridges</a
* href>
*/
@SuppressWarnings("unchecked")
public FullTextBridge(OriginalFilesService files,
Map<String, FileParser> parsers, Class<FieldBridge>[] bridgeClasses) {
this(files, parsers, bridgeClasses, 1);
}

/**
* Main constructor.
*
* @param files
* {@link OriginalFilesService} for getting access to binary files.
* @param parsers
* List of {@link FileParser} instances which are currently
* configured.
* @param bridgeClasses
* set of {@link FieldBridge bridge classes} which will be
* instantiated via a no-arg constructor.
* @param maxFilesetSize
* maximum size of the fileset to be considered for indexing
* @see <a
* href="https://omero.readthedocs.io/en/stable/developers/Modules/Search/Bridges.html">Bridges</a
* href>
*/
@SuppressWarnings("unchecked")
public FullTextBridge(OriginalFilesService files,
Map<String, FileParser> parsers, Class<FieldBridge>[] bridgeClasses,
int maxFilesetSize) {
this.files = files;
this.parsers = parsers;
this.classes = bridgeClasses == null ? new Class[] {} : bridgeClasses;
this.maxFilesetSize = maxFilesetSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default in the constructor above is set to 1.
This could allow to pass any value including negative one. Maybe add a check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you and/or anyone else who was involved in the initial review of the feature foresee a situation where someone would like to preserve the current behavior i.e. index all files in a fileset irrespectively of its size?

If so, -1 (or any negative value?) could be used as a mechanism to skip the fileset size check, with all associated warnings re performance implications. Otherwise, I can make this default to 1 if it's negative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't involved in the initial review or discussion around the feature back in late 2019. However, purely from a technical perspective I can see justification for leaving this open ended. If we add range checks here in the constructor they will apply to all subclasses and that seems like a bad design decision. There are other ways of doing this with Spring such as using properties rather than constructor arguments but I don't know if that's needed here.

logger().info("Maximum fileset size: {}", maxFilesetSize);
}

/**
Expand Down Expand Up @@ -388,6 +415,7 @@ public void set_folders(final String name, final IObject object,
*
* - fileset.entry.clientPath
* - fileset.entry.name
* - fileset.templatePrefix
*/
public void set_fileset(final String name, final IObject object,
final Document document, final LuceneOptions opts) {
Expand All @@ -397,6 +425,15 @@ public void set_fileset(final String name, final IObject object,
if (fileset == null) {
return;
}
// Skip fileset indexing above a cut-off
// As the fileset indexing scales with the number of fileset entries for each
// images, this operation can quickly lead to performance degradation notable
sbesson marked this conversation as resolved.
Show resolved Hide resolved
// in domains like high-content screening where each of 1K-10K images in a plate
// can be associated with 10-100K files
if (fileset.sizeOfUsedFiles() > maxFilesetSize) {
return;
}
add(document, "fileset.templatePrefix", fileset.getTemplatePrefix(), opts);
final Iterator<FilesetEntry> entryIterator = fileset.iterateUsedFiles();
while (entryIterator.hasNext()) {
final FilesetEntry entry = entryIterator.next();
Expand All @@ -405,7 +442,6 @@ public void set_fileset(final String name, final IObject object,
}
add(document, "fileset.entry.clientPath", entry.getClientPath(), opts);
add(document, "fileset.entry.name", entry.getOriginalFile().getName(), opts);
add(document, "fileset.templatePrefix", fileset.getTemplatePrefix(), opts);
}
}
}
Expand Down
1 change: 1 addition & 0 deletions src/main/resources/ome/services/service-ome.api.Search.xml
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@
<constructor-arg ref="fileParsers"/>
<constructor-arg ref="/OMERO/Files"/>
<constructor-arg value="${omero.search.bridges}"/>
<constructor-arg value="${omero.search.max_fileset_size}"/>
</bean>

<!-- Use "*" as a wildcard parser -->
Expand Down
5 changes: 5 additions & 0 deletions src/main/resources/omero-server.properties
Original file line number Diff line number Diff line change
Expand Up @@ -197,6 +197,11 @@ ome.model.screen.PlateAcquisition,ome.model.screen.Well
# modified.
omero.search.include_actions=INSERT,UPDATE,REINDEX,DELETE

# Maximum size of the fileset to consider for indexing
# Increasing this cut-off can lead to indexing performance degradation
# notably in the high-content screening domain where plates typically
# contain 1K-10K images associated with 10-100K fileset entries each
omero.search.max_fileset_size=10

##
## Old loader: "persistentEventLogLoader"
Expand Down