Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISPN-15390 Allow to project the query score #11679

Merged
merged 6 commits into from Jan 18, 2024

Conversation

fax4ever
Copy link
Contributor

@fax4ever fax4ever commented Jan 15, 2024

https://issues.redhat.com/browse/ISPN-15390

Draft still to test/address: broadcast queries, remote queries. done!

@fax4ever fax4ever force-pushed the ISPN-15390 branch 2 times, most recently from bd8f6bc to f33c09c Compare January 16, 2024 09:06
@fax4ever fax4ever marked this pull request as ready for review January 16, 2024 09:06
@fax4ever
Copy link
Contributor Author

@yrodiere @marko-bekhta I don't if you want to have a look at it

Copy link

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find aa lot of Hibernate Search-related things in this PR so cannot add much, but looks nice overall 😃

SearchSession mappingSession = searchMapping.getMappingSession();
SearchScope<Game> scope = mappingSession.scope(Game.class);

SearchProjection<List<?>> projection = scope.projection().composite(scope.projection().score(), scope.projection().entity()).toProjection();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to share that not so long ago, Hibernate Search introduced a way to bind this type of projections like:

@ProjectionConstructor
public record MyProjection(@ScoreProjection float score, @EntityProjection Game entity) {
}
// and then
SearchProjection<MyProjection> projection = scope.projection().composite().as( MyProjection.class ).toProjection();

Not that it is needed here, but maybe you'll find it useful in some future tests 😃
See https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#search-dsl-projection-entity-mapping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @marko-bekhta nice API. Thanks for sharing it!

@@ -72,15 +71,12 @@ public LuceneSearchQuery<List<Object>> keyAndEntity() {
SearchProjectionFactory<EntityReference, ?> projectionFactory = scope.projection();
SearchProjection<?>[] searchProjections = new SearchProjection<?>[]{
projectionFactory.entityReference().toProjection(),
projectionFactory.entity().toProjection()
projectionFactory.entity().toProjection(),
projectionFactory.score().toProjection()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW computing the score does incur additional cost in some cases, in particular when not sorting by score: https://github.com/hibernate/hibernate-search/blob/578ca56deb06d72da49a2b50d9cb0da437c192ed/backend/lucene/src/main/java/org/hibernate/search/backend/lucene/search/extraction/impl/ExtractionRequirements.java#L81-L89, https://github.com/hibernate/hibernate-search/blob/7b6b15c04f0ce6ec98e8d4983605b97f36d34fd5/backend/lucene/src/main/java/org/hibernate/search/backend/lucene/search/extraction/impl/LuceneCollectors.java#L220-L231

Though I couldn't say how high the cost is exactly. "It depends", I guess :)

If it's easy, you might want to only do this when projecting on the score is actually required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yrodiere I wrongly thought that the score came for free, so I introduced it in all cases. Thanks for sharing it, I'm going to change the code to add it only if required by the current projections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yrodiere Fixed in the last commit. Thanks for catching it.

@karesti
Copy link
Contributor

karesti commented Jan 16, 2024

@fax4ever I'm going to build this PR and test it in langchain SPI

query = cache.query("select g.name, score(g), g from org.infinispan.query.model.Game g where g.description : 'description'~2");
games = query.list();
assertThat(games).extracting(objects -> objects[0]).containsExactly("1", "2", "3", "4");
assertThat(games).extracting(objects -> objects[1]).hasOnlyElementsOfType(Float.class).isNotNull();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in langchain, the score is a double

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only 4 test pass now

java.lang.AssertionError: 

Expecting actual:
  0.7332834601402283
to be close to:
  0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
	at dev.langchain4j.store.embedding.EmbeddingStoreWithoutMetadataIT.should_return_correct_score(EmbeddingStoreWithoutMetadataIT.java:237)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

[ERROR] dev.langchain4j.store.embedding.infinispan.InfinispanEmbeddingStoreIT.should_find_with_min_score -- Time elapsed: 0.155 s <<< FAILURE!
java.lang.AssertionError: 

Expecting actual:
  0.7332834601402283
to be close to:
  0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
	at dev.langchain4j.store.embedding.EmbeddingStoreWithoutMetadataIT.should_find_with_min_score(EmbeddingStoreWithoutMetadataIT.java:188)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

[ERROR] dev.langchain4j.store.embedding.infinispan.InfinispanEmbeddingStoreIT.should_add_multiple_embeddings_with_segments -- Time elapsed: 0.152 s <<< FAILURE!
java.lang.AssertionError: 

Expecting actual:
  0.7332834601402283
to be close to:
  0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
	at dev.langchain4j.store.embedding.EmbeddingStoreWithoutMetadataIT.should_add_multiple_embeddings_with_segments(EmbeddingStoreWithoutMetadataIT.java:160)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

[ERROR] dev.langchain4j.store.embedding.infinispan.InfinispanEmbeddingStoreIT.should_add_multiple_embeddings -- Time elapsed: 0.145 s <<< FAILURE!
java.lang.AssertionError: 

Expecting actual:
  0.7332834601402283
to be close to:
  0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
	at dev.langchain4j.store.embedding.EmbeddingStoreWithoutMetadataIT.should_add_multiple_embeddings(EmbeddingStoreWithoutMetadataIT.java:121)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   InfinispanEmbeddingStoreIT>EmbeddingStoreWithoutMetadataIT.should_add_multiple_embeddings:121 
Expecting actual:
  0.7332834601402283
to be close to:
  0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
[ERROR]   InfinispanEmbeddingStoreIT>EmbeddingStoreWithoutMetadataIT.should_add_multiple_embeddings_with_segments:160 
Expecting actual:
  0.7332834601402283
to be close to:
  0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
[ERROR]   InfinispanEmbeddingStoreIT>EmbeddingStoreWithoutMetadataIT.should_find_with_min_score:188 
Expecting actual:
  0.7332834601402283
to be close to:
  0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
[ERROR]   InfinispanEmbeddingStoreIT>EmbeddingStoreWithoutMetadataIT.should_return_correct_score:237 
Expecting actual:
  0.7332834601402283
to be close to:
  0.9090677379700546
by less than 1% but difference was 19.336763421211277%.
(a difference of exactly 1% being considered valid)
[ERROR] Errors: 
[ERROR]   InfinispanEmbeddingStoreIT>EmbeddingStoreWithoutMetadataIT.beforeEach:24->clearStore:45 » Transport io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/[0:0:0:0:0:0:0:1]:11222
[INFO] 
[ERROR] Tests run: 8, Failures: 4, Errors: 1, Skipped: 0
[INFO] 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hibernate Search returns a float, as Lucene does. I don't know if it makes sense to cast it to double from our side...

Copy link
Contributor

@yrodiere yrodiere Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it makes sense to cast it to double from our side...

I'm guessing Katia can do that in the Langchain4j SPI implementation?

Anyway, the problem here seems to be deeper: the test expects a certain value for the score, and that's not what it gets. No idea why... ? How was that expected score value determined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yrodiere IIUC we only need to change the distance function to cosine and after that all tests pass

Copy link
Contributor

@karesti karesti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test don't pass with the current implem

@fax4ever
Copy link
Contributor Author

test don't pass with the current implem

Are you using Query<Object[]>? Let's check it on Zulip chat

@fax4ever
Copy link
Contributor Author

@marko-bekhta @yrodiere thanks for the review I'm going to apply the changes

Copy link
Contributor

@karesti karesti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked, all test pass in langchain integration

@fax4ever
Copy link
Contributor Author

@karesti thanks for checking it directly with Langchain!

@tristantarrant tristantarrant merged commit bb07f62 into infinispan:main Jan 18, 2024
7 of 8 checks passed
@fax4ever
Copy link
Contributor Author

Thank you @tristantarrant

@fax4ever fax4ever deleted the ISPN-15390 branch January 18, 2024 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants