Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrieve Offsets of inner_hits #2521

Closed
JKatzwinkel opened this issue Apr 6, 2023 · 2 comments · Fixed by #2522
Closed

Retrieve Offsets of inner_hits #2521

JKatzwinkel opened this issue Apr 6, 2023 · 2 comments · Fixed by #2522
Labels
type: bug A general bug

Comments

@JKatzwinkel
Copy link
Contributor

JKatzwinkel commented Apr 6, 2023

  • Elasticsearch: 8.6.2
  • spring-data-elasticsearch: 5.1.0-M3

Hi, thanks again for the excellent work, I enjoy it very much!

I'm having an issue while trying to obtain the correct offset values from inner_hits of a nested query via SearchHit#getNestedMetaData. Even though Elasticsearch responds with the expected offsets when I run my queries directly against it, the #getOffset() return values from inner hits via #getInnerHits() are always the same, regardless of how many inner hits are returned within an individual top-level hit.

Allow me to illustrate this based on your integration test(s). They run a (nested) nested query and test the offsets of the inner hits, of which there are 1 on each nesting level:

		Inhabitant john = new Inhabitant("John", "Smith");
		Inhabitant carla = new Inhabitant("Carla", "Miller");
		House cornerHouse = new House("Round the corner", "7", Arrays.asList(john, carla));
		City metropole = new City("Metropole", Arrays.asList(cornerHouse));

		Inhabitant jack = new Inhabitant("Jack", "Wayne");
		Inhabitant emmy = new Inhabitant("Emmy", "Stone");
		House mainStreet = new House("Main Street", "42", Arrays.asList(jack, emmy));
		City village = new City("Village", Arrays.asList(mainStreet));

		operations.save(Arrays.asList(metropole, village));
	}

	@Test
	@Order(java.lang.Integer.MAX_VALUE)
	void cleanup() {
		operations.indexOps(IndexCoordinates.of(indexNameProvider.getPrefix() + "*")).delete();
	}

	@Test
	void shouldReturnInnerHits() {

		Query query = buildQueryForInnerHits("inner_hit_name", "hou-ses.in-habi-tants", "hou-ses.in-habi-tants.first-name",
				"Carla");

		SoftAssertions softly = new SoftAssertions();
		SearchHits<City> searchHits = operations.search(query, City.class);

		softly.assertThat(searchHits.getTotalHits()).isEqualTo(1);

		SearchHit<City> searchHit = searchHits.getSearchHit(0);
		softly.assertThat(searchHit.getInnerHits()).hasSize(1);

		SearchHits<?> innerHits = searchHit.getInnerHits("inner_hit_name");
		softly.assertThat(innerHits).hasSize(1);

		SearchHit<?> innerHit = innerHits.getSearchHit(0);
		Object content = innerHit.getContent();
		assertThat(content).isInstanceOf(Inhabitant.class);
		Inhabitant inhabitant = (Inhabitant) content;
		softly.assertThat(inhabitant.getFirstName()).isEqualTo("Carla");
		softly.assertThat(inhabitant.getLastName()).isEqualTo("Miller");

		NestedMetaData nestedMetaData = innerHit.getNestedMetaData();
		softly.assertThat(nestedMetaData.getField()).isEqualTo("houses");
		softly.assertThat(nestedMetaData.getOffset()).isEqualTo(0); // <-- this seems right
		softly.assertThat(nestedMetaData.getChild().getField()).isEqualTo("inhabitants");
		softly.assertThat(nestedMetaData.getChild().getOffset()).isEqualTo(1);  // this makes sense too

		softly.assertAll();

However ☝, imagine they would look like this, now finding 2 inner hits that (as far as I understand) should have different offsets:

		Inhabitant john = new Inhabitant("John", "Smith");
		Inhabitant carla1 = new Inhabitant("Carla", "Miller");
		Inhabitant carla2 = new Inhabitant("Carla", "Nguyen");  // <-- second match/inner hit!!
		House cornerHouse = new House("Round the corner", "7", Arrays.asList(john, carla1, carla2));
		City metropole = new City("Metropole", Arrays.asList(cornerHouse));

		Inhabitant jack = new Inhabitant("Jack", "Wayne");
		Inhabitant emmy = new Inhabitant("Emmy", "Stone");
		House mainStreet = new House("Main Street", "42", Arrays.asList(jack, emmy));
		City village = new City("Village", Arrays.asList(mainStreet));

		operations.save(Arrays.asList(metropole, village));
	}

	@Test
	@Order(java.lang.Integer.MAX_VALUE)
	void cleanup() {
		operations.indexOps(IndexCoordinates.of(indexNameProvider.getPrefix() + "*")).delete();
	}

	private static void testInnerHit(
		SoftAssertions softly, SearchHit<?> innerHit,
		String firstName, String lastName,
		int nestedOffsetLvl1, int nestedOffsetLvl2
	) {
		Object content = innerHit.getContent();
		assertThat(content).isInstanceOf(Inhabitant.class);
		Inhabitant inhabitant = (Inhabitant) content;
		softly.assertThat(inhabitant.getFirstName()).isEqualTo(firstName);
		softly.assertThat(inhabitant.getLastName()).isEqualTo(lastName);

		NestedMetaData nestedMetaData = innerHit.getNestedMetaData();
		softly.assertThat(nestedMetaData.getField()).isEqualTo("houses");
		softly.assertThat(nestedMetaData.getOffset()).isEqualTo(nestedOffsetLvl1);
		softly.assertThat(nestedMetaData.getChild().getField()).isEqualTo("inhabitants");
		softly.assertThat(nestedMetaData.getChild().getOffset()).isEqualTo(nestedOffsetLvl2);
	}

	@Test
	void shouldReturnInnerHits() {

		Query query = buildQueryForInnerHits("inner_hit_name", "hou-ses.in-habi-tants", "hou-ses.in-habi-tants.first-name",
				"Carla");

		SoftAssertions softly = new SoftAssertions();
		SearchHits<City> searchHits = operations.search(query, City.class);

		softly.assertThat(searchHits.getTotalHits()).isEqualTo(1);

		SearchHit<City> searchHit = searchHits.getSearchHit(0);
		softly.assertThat(searchHit.getInnerHits()).hasSize(1);

		SearchHits<?> innerHits = searchHit.getInnerHits("inner_hit_name");
		softly.assertThat(innerHits).hasSize(2);

		testInnerHit(softly, innerHits.getSearchHit(0), "Carla", "Miller", 0, 1);
		testInnerHit(softly, innerHits.getSearchHit(1), "Carla", "Nguyen", 0, 2); // <-- the important bit!

		softly.assertAll();
	}

❌ Now the tests (./mvnw clean verify) fail because all of the inner hits have the same offset (1). This is despite of Elasticsearch itself returning the offsets I expect (1 and 2 in this case). I might be doing something wrong, but on the other hand the implementation responsible for mapping the inner hit offsets does contain something that confuses me (first and last line):

			NestedMetaData nestedMetaData = searchHits.getSearchHit(0).getContent().getNestedMetaData();
			ElasticsearchPersistentEntityWithNestedMetaData persistentEntityWithNestedMetaData = getPersistentEntity(
					mappingContext.getPersistentEntity(type), nestedMetaData);

			if (persistentEntityWithNestedMetaData.entity != null) {
				List<SearchHit<Object>> convertedSearchHits = new ArrayList<>();
				Class<?> targetType = persistentEntityWithNestedMetaData.entity.getType();

				// convert the list of SearchHit<SearchDocument> to list of SearchHit<Object>
				searchHits.getSearchHits().forEach(searchHit -> {
					SearchDocument searchDocument = searchHit.getContent();

					Object targetObject = converter.read(targetType, searchDocument);
					convertedSearchHits.add(new SearchHit<>(searchDocument.getIndex(), //
							searchDocument.getId(), //
							searchDocument.getRouting(), //
							searchDocument.getScore(), //
							searchDocument.getSortValues(), //
							searchDocument.getHighlightFields(), //
							searchHit.getInnerHits(), //
							persistentEntityWithNestedMetaData.nestedMetaData, // <-- this

So for each inner hit, the same nested matadata are being used (persistendEntityWithNestedMetaData.nestedMetaData), even though each inner inner hit should have its own metadata (_nested object with field and offset inside) in the Elasticsearch response.
There is probably a good reason for it to be implemented like this, but this seems to cause the unexpected behaviour I described above. I actually managed to fix this very easily by mapping the actual metadata of the actual inner hits respectively:

					convertedSearchHits.add(new SearchHit<>(searchDocument.getIndex(), //
							searchDocument.getId(), //
							searchDocument.getRouting(), //
							searchDocument.getScore(), //
							searchDocument.getSortValues(), //
							searchDocument.getHighlightFields(), //
							searchHit.getInnerHits(), //
							getPersistentEntity(
								mappingContext.getPersistentEntity(type),
								searchHit.getContent().getNestedMetaData()  // <-- 👀
							).nestedMetaData,
							searchHit.getExplanation(), //
							searchHit.getMatchedQueries(), //
							targetObject));
				});

✅ After this change, the tests pass again.

Am I right to assume that the way in which I amended your integration test(s) is how you are supposed to retrieve the inner hits offsets, or am I doing something wrong?

Thanks again and happy Easter!

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Apr 6, 2023
@sothawo
Copy link
Collaborator

sothawo commented Apr 7, 2023

Thanks for finding, analyzing and providing a fix for this. Do you want to submit a Pull Request, otherwise I could do it as well, I copied and tested that code locally.

As for why this is so? Don't know. I probably did not seem to consider the offset when implementing it and so just missed it.

@sothawo sothawo added type: bug A general bug and removed status: waiting-for-triage An issue we've not yet triaged labels Apr 7, 2023
@JKatzwinkel
Copy link
Contributor Author

thanks for the feedback and yes, I'm happy to prepare a pr I hope to get to it tomorrow!

JKatzwinkel added a commit to JKatzwinkel/spring-data-elasticsearch that referenced this issue Apr 8, 2023
sothawo pushed a commit that referenced this issue Apr 10, 2023
Original Pull Request #2522
Closes #2521
@sothawo sothawo added this to the 5.1 RC1 (2023.0.0) milestone Apr 10, 2023
sothawo pushed a commit that referenced this issue Apr 10, 2023
Original Pull Request #2522
Closes #2521

(cherry picked from commit 1f7fa77)
sothawo pushed a commit that referenced this issue Apr 10, 2023
Original Pull Request #2522
Closes #2521

(cherry picked from commit 1f7fa77)
(cherry picked from commit e1da43c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A general bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants