Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to using the multi-termvectors API #7014

Merged
merged 1 commit into from Aug 21, 2014

Conversation

Projects
None yet
4 participants
@alexksikes
Copy link
Contributor

alexksikes commented Jul 24, 2014

The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of retrieving the texts using the multi-get API. The
terms are then directly passed to Lucene MLT.

@jpountz

View changes

src/test/java/org/elasticsearch/index/query/SimpleIndexQueryParserTests.java Outdated
private static Fields generateFields(String[] fieldNames, String text) throws IOException {
MemoryIndex index = new MemoryIndex();
for (String fieldName : fieldNames) {
index.addField(fieldName, text, new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_4_9));

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 30, 2014

Contributor

Maybe use Lucene.VERSION instead of hardcoding 4.9?

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jul 30, 2014

Left one minor comment but other than that it looks good to me!

@jpountz jpountz removed the review label Jul 30, 2014

@alexksikes

This comment has been minimized.

Copy link
Contributor Author

alexksikes commented Jul 30, 2014

Thank you. @s1monw would mind having a look at it as well?

@alexksikes alexksikes added the review label Jul 30, 2014

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Aug 21, 2014

LGTM

@jpountz jpountz removed the review label Aug 21, 2014

@javanna

View changes

src/main/java/org/elasticsearch/action/termvector/MultiTermVectorsRequest.java Outdated
@@ -52,6 +53,14 @@ public MultiTermVectorsRequest add(String index, @Nullable String type, String i
return this;
}

public MultiTermVectorsRequest add(MultiGetRequest.Item item) {
TermVectorRequest request = new TermVectorRequest(item.index(), item.type(), item.id())

This comment has been minimized.

Copy link
@javanna

javanna Aug 21, 2014

Member

This internal request creation loses the headers and the context of the original request that generated it (the multi_get one), can we make sure we pass in the original request, add a new constructor to TermVectorRequest and make sure we call super there, which will finally copy headers and context from the original request?

@alexksikes alexksikes force-pushed the alexksikes:feature/mlt-use-termvectors branch Aug 21, 2014

public List<LikeText> fetch(List<MultiGetRequest.Item> items) throws IOException {
MultiGetRequest request = new MultiGetRequest();
public Fields[] fetch(List<MultiGetRequest.Item> items) throws IOException {
MultiTermVectorsRequest request = new MultiTermVectorsRequest();

This comment has been minimized.

Copy link
@javanna

javanna Aug 21, 2014

Member

sorry for the misleading comment, here is where we lose headers and request context, the items don't need them, only the main request. This one should be created passing in the original request, but that would be MoreLikeThisRequest not multi_get. Actually I'm confused on why we still have multi_get items but we don't use multi_get internally anymore

More Like This Query: Switch to using the multi-termvectors API
The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of generating them from the texts retrieved using the
multi-get API.

Closes #7014

@alexksikes alexksikes force-pushed the alexksikes:feature/mlt-use-termvectors branch to f1a6b4e Aug 21, 2014

@alexksikes alexksikes merged commit f1a6b4e into elastic:master Aug 21, 2014

alexksikes added a commit that referenced this pull request Aug 21, 2014

More Like This Query: Switch to using the multi-termvectors API
The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of generating them from the texts retrieved using the
multi-get API.

Closes #7014

@alexksikes alexksikes deleted the alexksikes:feature/mlt-use-termvectors branch Aug 27, 2014

alexksikes added a commit that referenced this pull request Sep 8, 2014

More Like This Query: Switch to using the multi-termvectors API
The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of generating them from the texts retrieved using the
multi-get API.

Closes #7014

@clintongormley clintongormley changed the title More Like This Query: Switch to using the multi-termvectors API Switch to using the multi-termvectors API Jun 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.