Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntervalsSourceProvider and IntervalFilter Plugins #49519

Open
wants to merge 1 commit into
base: master
from

Conversation

@mattweber
Copy link
Contributor

mattweber commented Nov 24, 2019

Allow custom IntervalsSourceProviders and IntervalFilters via search plugins.

//cc @romseygeek @jimczi

Allow custom IntervalsSourceProviders and IntervalFilters via search plugins.
@mattweber

This comment has been minimized.

Copy link
Contributor Author

mattweber commented Nov 24, 2019

Unrelated test disabled in this PR, waiting for #49490.

@elasticmachine

This comment has been minimized.

Copy link
Collaborator

elasticmachine commented Nov 25, 2019

Pinging @elastic/es-search (:Search/Search)

@mattweber

This comment has been minimized.

Copy link
Contributor Author

mattweber commented Nov 26, 2019

@romseygeek while discussing, please take things like #49595 into consideration. Some use cases require very specific matching rules and requirements that will not / should not be part of elasticsearch itself.

In my case, one of the major issues I have is with the wildcard expansion limits that have put in place with no way to override even though it is possible.

@romseygeek

This comment has been minimized.

Copy link
Contributor

romseygeek commented Nov 27, 2019

Thanks for opening this @mattweber

When we first introduced intervals we went back and forth a bit over whether or not to make them pluggable. The main argument against is that the various interval algorithms require that all intervals produced by a source are minimal, and it's not obvious how to enforce this with pluggable sources - so it would be very easy to write a source that doesn't minimize its intervals, and then end up with broken behaviour when it's combined with a filter or a conjunction of some kind.

We think this still holds, and that we should instead be adding support for things like fuzziness or increased multiterm expansions into core. For IntervalFilters, can you explain what you need to do that isn't covered by scripting filters?

@mattweber

This comment has been minimized.

Copy link
Contributor Author

mattweber commented Nov 27, 2019

Thanks @romseygeek. Is that really something you should worry about as long as the expected behavior is documented? If someone does something crazy in a plugin, it really isn't something you would need to support. It would be on that plugin author to fix it. Same applies for all current plugin integration points (a custom query / score function that returns negative scores, an analysis plugin that doesn't advance properly, etc).

There are a couple reasons why I would like to see these pluggable:

  • To do things that are not in core. Regex intervals, more wildcard expansions, maybe something based on payloads and custom business logic. In general, just things that are too specialized or risky/heavy to make it into core. As mentioned in #49595, patent and legal search can have some crazy matching requirements and I can see this being one of the main use-cases for intervals as span queries are phased out. Even if span queries stick around, intervals are much nicer to work with.

  • Being able to expose functionality faster than you release or we can upgrade. Once we go to prod, it is typically hard to upgrade our ES version but installing plugins is relatively easy. There have been numerous cases where I have backported new functionality and/or bug fixes into plugins while I was waiting. Best case its only a couple weeks until a minor/patch release, worst case is waiting for a major release. I had synonym graph support as a plugin months before we actaully got it into an ES release. Same with a fix so nested aggreagations can be used in sorts. I can see this happening with intervals once I start a full integration. In fact, I did find an un-reported NPE bug in current code (fixed in this PR) that would be a prime example. I would fork this interval filter, fix the bug, expose via a plugin and open a PR to fix in core. Once you merge and release, I would remove the forked filter and plugin and move back to using the core filter as soon as we could upgrade to the fixed version. You guys do this exact thing with lucene and your X* forks too.

For pluggable interval filters, I don't have an immediate use for one but I imagine it would be more along the lines the 2nd point above. Ie. Something new popping up in lucene and needing to use it before you can get it into core. The unordered_no_overlaps filter I wrote as a test in this PR could be used as an example. In general, I am not a fan of scripting and try to avoid it if possible.

Sorry for being a pain.

@jimczi

This comment has been minimized.

Copy link
Member

jimczi commented Nov 27, 2019

If someone does something crazy in a plugin, it really isn't something you would need to support. It would be on that plugin author to fix it. Same applies for all current plugin integration points (a custom query / score function that returns negative scores, an analysis plugin that doesn't advance properly, etc).

I agree that users can write the query they want in plugins but in this case you're already inside a query where we'd like to ensure that some basic assumptions are met. The most important one is the minimal interval semantic that can be difficult to understand and more importantly difficult to prove that it works correctly with your logic. If you really want to implement new sources you could simply duplicate the entire intervals query and provide your own wild_intervals that would expose your custom filter/source ?
As we said we don't expect to add a lot more sources there, if you have an idea of one that is missing we should discuss it here and decide whether it should belong to core or not but I don't see any example in your comments that is worth adding. The only ones I see are for things that we don't want to expose for a reason (e.g.: limit multi-fields expansion).
We see great values in replacing plugins with scripts. You don't need to "maintain" your script and provide a new version for each minor release. You are also in a safe environment where we are sure that your customization will not hurt the overall feature. We are open to plugins, I don't think we restrict anything at the query level for instance but the pluggability that you're requesting is at a different level. We have restrictions in place for intervals, the same than in Lucene so we don't diverge from the original implementation. There are reasons for that that @romseygeek explained in his comment.

@romseygeek

This comment has been minimized.

Copy link
Contributor

romseygeek commented Nov 27, 2019

In fact, I did find an un-reported NPE bug in current code

Wow, yes, that toXContent method is entirely wrong isn't it? I'll open a separate fix for that one, thanks for pointing it out!

@romseygeek

This comment has been minimized.

Copy link
Contributor

romseygeek commented Nov 27, 2019

On the larger point - I wonder if it's worth exposing a limited form of provider, for example one that enforces that intervals only cover a single position? That way we don't get the problems with minimization but still allow people to do things with payloads or multiterm expansions that we don't want to support in core.

@jimczi

This comment has been minimized.

Copy link
Member

jimczi commented Nov 27, 2019

It's hard to think of plugins without real use-case. I see that we mention payloads but this would require much more work than just an interval source since we don't expose payloads at indexing time. This is also something that we want to add in Lucene so I don't see why we wouldn't expose a solution for this in core if we support it in Lucene.
Regarding the argument on limitations that we impose, as I said it's relatively easy to bypass by providing your own implementation of intervals query in a plugin so one level up. It shouldn't be too easy though so I think we have a good compromise here, if you're not happy with the restriction you can rewrite the integration of Lucene's intervals in a plugin.

@mattweber

This comment has been minimized.

Copy link
Contributor Author

mattweber commented Nov 27, 2019

Thanks @jimczi @romseygeek, forking the intervals query itself is an option I had thought about but was hoping we could get real plugin support. I completely understand your reservations and push back. Please feel free to close this if you don't see any path forward to supporting plugins.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.