Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make lucene's IntervalQuery available via the Query DSL #32406

Closed
wants to merge 4 commits into from
Closed

Make lucene's IntervalQuery available via the Query DSL #32406

wants to merge 4 commits into from

Conversation

romseygeek
Copy link
Contributor

Relates to #29636

@romseygeek
Copy link
Contributor Author

This is still something of a work-in-progress, in that it needs Yaml tests and docs to be added, but I wanted to put up a PR to make sure that I'm heading in the right direction with this.

@romseygeek romseygeek self-assigned this Jul 26, 2018
@romseygeek romseygeek added >feature :Search/Search Search-related issues that do not fall into other categories v7.0.0 v6.5.0 labels Jul 26, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good @romseygeek ! I left some comments.

public TokenStream tokenize(String field, String text) {
throw new IllegalArgumentException("Can only tokenize text on text fields - not on [" + name + "] which is of type [" + typeName() + "]");
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this ? We could check the index_options and fail match parsing if the field is not indexed with positions or check tokenized method on the field type ?

MappedFieldType fieldType = context.fieldMapper(field);
if (fieldType == null) {
throw new IllegalArgumentException("Cannot create IntervalQuery over non-existent field [" + field + "]");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if the field is indexed with positions and fail the query if not ?

/**
* The new {@link IntervalsSourceProvider}s added by this plugin
*/
default List<IntervalSpec<?>> getIntervalsSourceProviders() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we provide a scripted intervals source first, I have the feeling that it will cover most of the use cases and that writing a plugin should be for rare cases where the scripted intervals is not enough ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep this. I will use this right out the gate for functionality I know will never be allowed to be contributed due to the potential performance risk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing Matt wants this for wildcard support? Which wouldn't be covered by a script filter. I don't think we lose anything by making this pluggable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romseygeek yes that is my immediate use-case assuming we don't get a wildcard interval into lucene.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we lose anything by making this pluggable.

Sure, although this is really an expert use case that requires to know the internals of how minimum interval works. For wildcard we could have something in Lucene (assuming we limit the expansion to the max boolean clause limit) that would be complementary of the index_prefixes option on the text field ?


@Override
protected IntervalQueryBuilder doCreateTestQueryBuilder() {
assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this anymore, this method always run with a single registered type.


public void testMatchInterval() throws IOException {

assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here


public void testCombineInterval() throws IOException {

assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here ;)

IntervalsSourceProvider match2 = new IntervalsSourceProvider.Match("floo", Integer.MAX_VALUE, true);
IntervalsSourceProvider combi
= new IntervalsSourceProvider.Combine(Arrays.asList(match1, match2), IntervalsSourceProvider.Combine.Type.ORDERED, 30);
return new IntervalQueryBuilder(STRING_FIELD_NAME, combi);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some randomization ? We run this method multiple times and then perform some checks on the generated query (serialization, correctness, ...).


private final List<IntervalsSourceProvider> subSources;
private final Type type;
private final int maxWidth;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we extract this option to have a dedicated source for max_width ? Or could it be handled by a scripted intervals source ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally going to do this as a separate source, but it only really makes sense on ordered/unordered combinations, so I think it's useful to have it here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a dedicated source but I can live with it. I agree that it makes sense on ordered/unordered sources but it can also be useful for others so we could also have a dedicated source and an option in the match source ?

public static final String NAME = "match";

private final String text;
private final int maxWidth;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we extract this option to have a dedicated source for max_width ? Or could it be handled by a scripted intervals source ?

@colings86 colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018
@romseygeek romseygeek mentioned this pull request Dec 1, 2018
@romseygeek romseygeek closed this Dec 1, 2018
@romseygeek
Copy link
Contributor Author

Closed in favour of #36135

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search/Search Search-related issues that do not fall into other categories v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants