Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove XMoreLikeThis #10626

Open
jpountz opened this issue Apr 16, 2015 · 8 comments
Open

Remove XMoreLikeThis #10626

jpountz opened this issue Apr 16, 2015 · 8 comments
Labels
help wanted adoptme high hanging fruit >non-issue :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch >tech debt

Comments

@jpountz
Copy link
Contributor

jpountz commented Apr 16, 2015

This class is a fork of Lucene's MoreLikeThis but already diverged quite a lot. If we want to be able to maintain this functionality in the long term, we need to merge improvements back to Lucene, otherwise at some point changes coming from Lucene upgrades will be impossible to reconciliate with changes of this fork.

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
@talevy talevy added :Search/Search Search-related issues that do not fall into other categories and removed :Search/Search Search-related issues that do not fall into other categories labels Mar 16, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@cbuescher
Copy link
Member

I did a first pass comparing the code of the Lucene MoreLikeThis and our XMoreLikeThis class an their differences. Apart from a few methods that the current Lucene class (as of 8.1.0 snapshot) has and that we don't use, the major difference is that we keep a set of "skipTerms" and have a few additional "like(...)" methods that we call in our version of the MoreLikeThisQuery. I think a first step for removal would be to try to move this code out of XMoreLikeThis to make the two classes more similar.

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@cbuescher
Copy link
Member

Had another closer look after a long time here. There are two major additions we have on the ES side in XMoreLikeThis:

  • we added the ability to have skip-terms per Field. These are used similar to stopwords that we already have in the Lucene version, but they work per Field. They are used for the 'unlike' option of the query, e.g. when documents are specified as negative examples.
  • we use a variant of the like(...) method that takes Fields as arguments, which is also used when using documents from an index as a "like" example
  • there a a few minor differences that can be attributed to changes on the Lucene side since the fork, but non of which should affect any functionality as far as I can see

The diff isn't big, I'm going to open a wip PR where I use a modified copy of the current Lucene impl on our side just to see if this passes tests and to show the diff.

I see two options here:

  • we can suggest adding the additional functionality on the Lucene side. Since these are only additions, they shouldn't interfere with any existing usages of the current Lucene implementation
  • we could ask for opening up the Lucene class (make it non-final, expose some currently private logic to create the actual bool queries for subclasses and add hooks for the skip-terms we have)

@cbuescher
Copy link
Member

I opened #71117 to better show the diff.

@cbuescher cbuescher removed their assignment Sep 23, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@Kiriakos1998
Copy link
Contributor

Hello @javanna @cbuescher @jpountz , is this issue still open. I would like to contribute.

@javanna
Copy link
Member

javanna commented Jun 28, 2023

otherwise at some point changes coming from Lucene upgrades will be impossible to reconciliate with changes of this fork.

Given this issue was opened 8 years ago, I am assuming we reached a place of no return here.

Thanks @Kiriakos1998 for your offer. This would involve contributing changes back to Lucene, or breaking the current more like this functionality in Elasticsearch. That is why it's marked high hanging fruit. I would suggest finding another issue, if you don't mind.

@javanna javanna added :Search Relevance/Search Catch all for Search Relevance and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted adoptme high hanging fruit >non-issue :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch >tech debt
Projects
None yet
Development

No branches or pull requests