Support field collapsing + rescore #7484

Open
wli-chwy opened this issue May 9, 2023 · 9 comments
Labels
decision: Issues requiring a decision
discuss: Issues intended to help drive brainstorming and decision making
enhancement: Enhancement or improvement to existing feature or request
feature: New feature or request
help wanted: Extra attention is needed
high hanging fruit
Search:Performance
Search:Relevance

Comments

@wli-chwy

wli-chwy commented May 9, 2023

Is your feature request related to a problem? Please describe.
OpenSearch fails with the error "cannot use `collapse` in conjunction with `rescore`" if a query contains both a collapse clause and a rescore clause. In the ecommerce space, my existing query relies on collapse (collapsing on the same parent ID) to deduplicate variations of the same product. Because of this limitation, I cannot use the Learning to Rank plugin, which needs rescore to re-rank results and improve my search relevance.
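For readers who want to reproduce this, here is a minimal sketch of a request body of the kind described above. The index fields `title` and `parent_id` and the model name are hypothetical placeholders; the `sltr` rescore query follows the Learning to Rank plugin's query syntax as I understand it, so treat the exact shape as an assumption.

```python
import json


def build_search_body(user_query: str, model_name: str, window_size: int = 100) -> dict:
    """Build a search body that combines `collapse` with an LTR `rescore`.

    Field names (`title`, `parent_id`) and the model name are illustrative only.
    """
    return {
        "query": {"match": {"title": user_query}},
        # Deduplicate product variations that share the same parent ID.
        "collapse": {"field": "parent_id"},
        # Re-rank the top `window_size` hits per shard with an LTR model.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {"params": {"keywords": user_query}, "model": model_name}
                }
            },
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a _search request.
    print(json.dumps(build_search_body("dog food", "my_ltr_model"), indent=2))
```

Sending a body with both top-level keys to `_search` is what is rejected with the error quoted above.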

Describe the solution you'd like
No error when using collapse in conjunction with rescore. Rescore should happen first, so that the face-out item for each collapsed group is ranked correctly.

Describe alternatives you've considered
N/A

Additional context
The same request exists for Elasticsearch: elastic/elasticsearch#27243

@wli-chwy wli-chwy added enhancement Enhancement or improvement to existing feature or request untriaged labels May 9, 2023
@macohen macohen added Search Search query, autocomplete ...etc and removed untriaged labels May 9, 2023
@macohen
Contributor

macohen commented May 9, 2023

@wli-chwy, are you working with Elasticsearch, or with OpenSearch 1.x or 2.x?

@macohen
Contributor

macohen commented May 16, 2023

@wli-chwy as you can see from the Elasticsearch issue, there's interest in doing something like this, but it has never been implemented. We need to investigate the feasibility and hopefully involve folks from the community on this. I searched through the code to see just how deep this goes and found the exception here:

throw new SearchException(shardTarget, "cannot use `collapse` in conjunction with `rescore`");

Are you able to share more about your use case including mapping and queries here?

I also see #6846 that may be related to your issue.

tagging some folks who may be able to offer some deeper insights/correct my thinking here: @nknize, @msfroh. Is there anyone else who could provide some guidance on this?

@macohen macohen added help wanted Extra attention is needed discuss Issues intended to help drive brainstorming and decision making decision Issues requiring a decision high hanging fruit feature New feature or request labels May 16, 2023
@macohen
Contributor

macohen commented Jul 16, 2023

@wli-chwy I believe you did try collapsing after reranking, which is typically what users do. How did that work for you? Is there more you can say about that case here to help us figure out what our options could be?

@msfroh
Collaborator

msfroh commented Aug 7, 2023

There might be a couple of options we can think about with regard to collapse + rescore:

  1. If all documents with the same collapsing key are routed to the same shard (via custom _routing), then it's possible to rescore + collapse on each shard while ensuring diversity of collapse keys.
  2. You can request more results from each shard while doing per-shard rescoring, then collapse after.
    1. This is traditionally done in the calling application, though you do need to be prepared to return (potentially) a lot more results from the cluster to the application, which may be expensive.
    2. One could implement collapsing in a SearchResponseProcessor as part of a search pipeline running on the coordinator node.
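As a rough illustration of option 2, the following sketch collapses an already rescored, over-fetched hit list by keeping only the best-ranked hit per collapse key. The function name and the `parent_id` field are hypothetical, and the hits are assumed to arrive sorted by their rescored score, in the usual `_source` shape of a search response.

```python
from typing import Any, Dict, List


def collapse_hits(hits: List[Dict[str, Any]], field: str, size: int) -> List[Dict[str, Any]]:
    """Keep the first (highest-ranked) hit for each value of `field`, up to `size` hits.

    Assumes `hits` is already sorted by the rescored score, best first.
    Hits that lack the field all share the key None in this sketch.
    """
    seen = set()
    collapsed: List[Dict[str, Any]] = []
    for hit in hits:
        key = hit["_source"].get(field)
        if key in seen:
            continue  # a better-ranked variation of this product was already kept
        seen.add(key)
        collapsed.append(hit)
        if len(collapsed) == size:
            break  # the page is full
    return collapsed


if __name__ == "__main__":
    rescored = [
        {"_id": "a1", "_score": 9.1, "_source": {"parent_id": "A"}},
        {"_id": "a2", "_score": 8.7, "_source": {"parent_id": "A"}},
        {"_id": "b1", "_score": 8.2, "_source": {"parent_id": "B"}},
    ]
    print([h["_id"] for h in collapse_hits(rescored, "parent_id", size=10)])
```

The same loop could run in the calling application (option 2.1) or, conceptually, inside a response processor on the coordinator node (option 2.2); only the place where the extra hits are discarded differs.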

@wli-chwy
Author

@macohen it was slow. We have to play a guessing game: we need to guess how many pre-collapse items it takes to fill one page. If we guess too few, we need to make another call. If we guess too many, we waste data transfer.

All in all, it added about 40% more latency (roughly 100 ms) in the application layer.
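To make the guessing game concrete, here is a minimal sketch of the over-fetch-and-retry loop an application ends up writing. Everything in it is hypothetical: `search` stands in for a call to the cluster that returns the top `n` rescored hits, and the over-fetch factor and doubling strategy are arbitrary choices, not what our application actually uses.

```python
from typing import Any, Callable, Dict, List, Tuple

Hit = Dict[str, Any]


def fetch_collapsed_page(
    search: Callable[[int], List[Hit]],
    field: str,
    page_size: int,
    overfetch: int = 3,
    max_rounds: int = 4,
) -> Tuple[List[Hit], int]:
    """Fetch until one page of unique `field` values is filled.

    Returns the page and the number of round trips it took.
    """
    fetch_size = page_size * overfetch  # the initial guess
    page: List[Hit] = []
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        hits = search(fetch_size)  # one round trip to the cluster
        seen = set()
        page = []
        for hit in hits:
            key = hit["_source"].get(field)
            if key not in seen:
                seen.add(key)
                page.append(hit)
        # Stop when the page is full or the index has no more hits to offer.
        if len(page) >= page_size or len(hits) < fetch_size:
            break
        fetch_size *= 2  # guessed too low: ask again for more
    return page[:page_size], rounds
```

Every extra round re-fetches and re-scores hits that were already transferred once, which is where the added latency comes from.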

@msfroh
Collaborator

msfroh commented Aug 28, 2023

With #9405, I added a CollapseResponseProcessor that could help with this.

Functionally, it's not that different from the application-layer collapsing that @macohen suggested above (which, as @wli-chwy replied, adds latency), but it avoids the round-trip between the application and the cluster, keeping the work on the coordinator node.

I don't know if keeping it in the cluster cuts the latency enough, but it may be worth trying out.
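For anyone who wants to try it, a search pipeline using the processor might be defined roughly as below. This is a sketch under assumptions: that the processor from #9405 is registered under the type name `collapse` and takes a `field` parameter, and that `parent_id` is the collapse field. Please check the search pipeline documentation for the release you are running before relying on it.

```python
import json


def collapse_pipeline_body(field: str) -> dict:
    """Build a search pipeline definition with a single collapse response processor.

    The processor type name `collapse` and its `field` parameter are assumptions
    to be verified against the documentation of the installed version.
    """
    return {
        "description": "Collapse rescored hits on the coordinator node",
        "response_processors": [
            {"collapse": {"field": field}},
        ],
    }


if __name__ == "__main__":
    # Body for: PUT /_search/pipeline/<pipeline-name>
    print(json.dumps(collapse_pipeline_body("parent_id"), indent=2))
```

The pipeline would then be referenced from a search request that contains `rescore` but no `collapse` clause, for example through the `search_pipeline` query parameter. The request still has to ask for enough hits to fill a page after collapsing.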

@macohen
Contributor

macohen commented Sep 11, 2023

Is there any specific integration needed to use the LTR results in the CollapseResponseProcessor?

cc: @msfroh

@msfroh
Collaborator

msfroh commented Sep 11, 2023

> Is there any specific integration needed to use the LTR results in the CollapseResponseProcessor?

Nope

@macohen
Contributor

macohen commented Sep 13, 2023

@wli-chwy, do you think this CollapseResponseProcessor can help?

@macohen macohen added Search:Performance Search:Relevance and removed Search Search query, autocomplete ...etc labels Oct 19, 2023
Projects
Status: 🏗 In progress
Development

No branches or pull requests

3 participants