New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISPN-15390 Allow to project the query score #11679
Conversation
bd8f6bc
to
f33c09c
Compare
@yrodiere @marko-bekhta I don't if you want to have a look at it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find aa lot of Hibernate Search-related things in this PR so cannot add much, but looks nice overall 😃
SearchSession mappingSession = searchMapping.getMappingSession(); | ||
SearchScope<Game> scope = mappingSession.scope(Game.class); | ||
|
||
SearchProjection<List<?>> projection = scope.projection().composite(scope.projection().score(), scope.projection().entity()).toProjection(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to share that not so long ago, Hibernate Search introduced a way to bind this type of projections like:
@ProjectionConstructor
public record MyProjection(@ScoreProjection float score, @EntityProjection Game entity) {
}
// and then
SearchProjection<MyProjection> projection = scope.projection().composite().as( MyProjection.class ).toProjection();
Not that it is needed here, but maybe you'll find it useful in some future tests 😃
See https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#search-dsl-projection-entity-mapping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @marko-bekhta nice API. Thanks for sharing it!
@@ -72,15 +71,12 @@ public LuceneSearchQuery<List<Object>> keyAndEntity() { | |||
SearchProjectionFactory<EntityReference, ?> projectionFactory = scope.projection(); | |||
SearchProjection<?>[] searchProjections = new SearchProjection<?>[]{ | |||
projectionFactory.entityReference().toProjection(), | |||
projectionFactory.entity().toProjection() | |||
projectionFactory.entity().toProjection(), | |||
projectionFactory.score().toProjection() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW computing the score does incur additional cost in some cases, in particular when not sorting by score: https://github.com/hibernate/hibernate-search/blob/578ca56deb06d72da49a2b50d9cb0da437c192ed/backend/lucene/src/main/java/org/hibernate/search/backend/lucene/search/extraction/impl/ExtractionRequirements.java#L81-L89, https://github.com/hibernate/hibernate-search/blob/7b6b15c04f0ce6ec98e8d4983605b97f36d34fd5/backend/lucene/src/main/java/org/hibernate/search/backend/lucene/search/extraction/impl/LuceneCollectors.java#L220-L231
Though I couldn't say how high the cost is exactly. "It depends", I guess :)
If it's easy, you might want to only do this when projecting on the score is actually required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yrodiere I wrongly thought that the score came for free, so I introduced it in all cases. Thanks for sharing it, I'm going to change the code to add it only if required by the current projections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yrodiere Fixed in the last commit. Thanks for catching it.
@fax4ever I'm going to build this PR and test it in langchain SPI |
query = cache.query("select g.name, score(g), g from org.infinispan.query.model.Game g where g.description : 'description'~2"); | ||
games = query.list(); | ||
assertThat(games).extracting(objects -> objects[0]).containsExactly("1", "2", "3", "4"); | ||
assertThat(games).extracting(objects -> objects[1]).hasOnlyElementsOfType(Float.class).isNotNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in langchain, the score is a double
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only 4 test pass now
java.lang.AssertionError:
Expecting actual:
0.7332834601402283
to be close to:
0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
at dev.langchain4j.store.embedding.EmbeddingStoreWithoutMetadataIT.should_return_correct_score(EmbeddingStoreWithoutMetadataIT.java:237)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
[ERROR] dev.langchain4j.store.embedding.infinispan.InfinispanEmbeddingStoreIT.should_find_with_min_score -- Time elapsed: 0.155 s <<< FAILURE!
java.lang.AssertionError:
Expecting actual:
0.7332834601402283
to be close to:
0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
at dev.langchain4j.store.embedding.EmbeddingStoreWithoutMetadataIT.should_find_with_min_score(EmbeddingStoreWithoutMetadataIT.java:188)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
[ERROR] dev.langchain4j.store.embedding.infinispan.InfinispanEmbeddingStoreIT.should_add_multiple_embeddings_with_segments -- Time elapsed: 0.152 s <<< FAILURE!
java.lang.AssertionError:
Expecting actual:
0.7332834601402283
to be close to:
0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
at dev.langchain4j.store.embedding.EmbeddingStoreWithoutMetadataIT.should_add_multiple_embeddings_with_segments(EmbeddingStoreWithoutMetadataIT.java:160)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
[ERROR] dev.langchain4j.store.embedding.infinispan.InfinispanEmbeddingStoreIT.should_add_multiple_embeddings -- Time elapsed: 0.145 s <<< FAILURE!
java.lang.AssertionError:
Expecting actual:
0.7332834601402283
to be close to:
0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
at dev.langchain4j.store.embedding.EmbeddingStoreWithoutMetadataIT.should_add_multiple_embeddings(EmbeddingStoreWithoutMetadataIT.java:121)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
[INFO]
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] InfinispanEmbeddingStoreIT>EmbeddingStoreWithoutMetadataIT.should_add_multiple_embeddings:121
Expecting actual:
0.7332834601402283
to be close to:
0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
[ERROR] InfinispanEmbeddingStoreIT>EmbeddingStoreWithoutMetadataIT.should_add_multiple_embeddings_with_segments:160
Expecting actual:
0.7332834601402283
to be close to:
0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
[ERROR] InfinispanEmbeddingStoreIT>EmbeddingStoreWithoutMetadataIT.should_find_with_min_score:188
Expecting actual:
0.7332834601402283
to be close to:
0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
[ERROR] InfinispanEmbeddingStoreIT>EmbeddingStoreWithoutMetadataIT.should_return_correct_score:237
Expecting actual:
0.7332834601402283
to be close to:
0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
[ERROR] Errors:
[ERROR] InfinispanEmbeddingStoreIT>EmbeddingStoreWithoutMetadataIT.beforeEach:24->clearStore:45 » Transport io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/[0:0:0:0:0:0:0:1]:11222
[INFO]
[ERROR] Tests run: 8, Failures: 4, Errors: 1, Skipped: 0
[INFO]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hibernate Search returns a float, as Lucene does. I don't know if it makes sense to cast it to double from our side...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it makes sense to cast it to double from our side...
I'm guessing Katia can do that in the Langchain4j SPI implementation?
Anyway, the problem here seems to be deeper: the test expects a certain value for the score, and that's not what it gets. No idea why... ? How was that expected score value determined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yrodiere IIUC we only need to change the distance function to cosine and after that all tests pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test don't pass with the current implem
Are you using |
@marko-bekhta @yrodiere thanks for the review I'm going to apply the changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It worked, all test pass in langchain integration
@karesti thanks for checking it directly with Langchain! |
Optimization / not strictly functional: This in the case of the score is not present neither in projection nor in sort
Thank you @tristantarrant |
https://issues.redhat.com/browse/ISPN-15390
Draft still to test/address: broadcast queries, remote queries.done!