New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make lucene's IntervalQuery available via the Query DSL #32406
Conversation
This is still something of a work-in-progress, in that it needs Yaml tests and docs to be added, but I wanted to put up a PR to make sure that I'm heading in the right direction with this. |
Pinging @elastic/es-search-aggs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good @romseygeek ! I left some comments.
public TokenStream tokenize(String field, String text) { | ||
throw new IllegalArgumentException("Can only tokenize text on text fields - not on [" + name + "] which is of type [" + typeName() + "]"); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this ? We could check the index_options
and fail match
parsing if the field is not indexed with positions or check tokenized
method on the field type ?
MappedFieldType fieldType = context.fieldMapper(field); | ||
if (fieldType == null) { | ||
throw new IllegalArgumentException("Cannot create IntervalQuery over non-existent field [" + field + "]"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check if the field is indexed with positions and fail the query if not ?
/** | ||
* The new {@link IntervalsSourceProvider}s added by this plugin | ||
*/ | ||
default List<IntervalSpec<?>> getIntervalsSourceProviders() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we provide a scripted intervals source first, I have the feeling that it will cover most of the use cases and that writing a plugin should be for rare cases where the scripted intervals is not enough ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep this. I will use this right out the gate for functionality I know will never be allowed to be contributed due to the potential performance risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing Matt wants this for wildcard support? Which wouldn't be covered by a script filter. I don't think we lose anything by making this pluggable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romseygeek yes that is my immediate use-case assuming we don't get a wildcard interval into lucene.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we lose anything by making this pluggable.
Sure, although this is really an expert use case that requires to know the internals of how minimum interval
works. For wildcard
we could have something in Lucene (assuming we limit the expansion to the max boolean clause limit) that would be complementary of the index_prefixes
option on the text
field ?
|
||
@Override | ||
protected IntervalQueryBuilder doCreateTestQueryBuilder() { | ||
assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this anymore, this method always run with a single registered type.
|
||
public void testMatchInterval() throws IOException { | ||
|
||
assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
|
||
public void testCombineInterval() throws IOException { | ||
|
||
assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here ;)
IntervalsSourceProvider match2 = new IntervalsSourceProvider.Match("floo", Integer.MAX_VALUE, true); | ||
IntervalsSourceProvider combi | ||
= new IntervalsSourceProvider.Combine(Arrays.asList(match1, match2), IntervalsSourceProvider.Combine.Type.ORDERED, 30); | ||
return new IntervalQueryBuilder(STRING_FIELD_NAME, combi); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some randomization ? We run this method multiple times and then perform some checks on the generated query (serialization, correctness, ...).
|
||
private final List<IntervalsSourceProvider> subSources; | ||
private final Type type; | ||
private final int maxWidth; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we extract this option to have a dedicated source for max_width
? Or could it be handled by a scripted intervals source ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally going to do this as a separate source, but it only really makes sense on ordered/unordered combinations, so I think it's useful to have it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a dedicated source but I can live with it. I agree that it makes sense on ordered/unordered sources but it can also be useful for others so we could also have a dedicated source and an option in the match
source ?
public static final String NAME = "match"; | ||
|
||
private final String text; | ||
private final int maxWidth; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we extract this option to have a dedicated source for max_width ? Or could it be handled by a scripted intervals source ?
Closed in favour of #36135 |
Relates to #29636