Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SourceFilters Annotation. #2151

Conversation

nathan-t-lr
Copy link

@nathan-t-lr nathan-t-lr commented May 1, 2022

Created a new annotation to be used with @Query in order to specify source filters on an elasticsearch search.

  • You have read the Spring Data contribution guidelines.
  • There is a ticket in the bug tracker for the project in our [issue tracker](https://github.
    com/spring-projects/spring-data-elasticsearch/issues)
    . Add the issue number to the Closes #issue-number line below
  • You use the code formatters provided here and have them applied to your changes. Don’t submit any formatting related changes.
  • You submit test cases (unit or integration tests) that back your changes.
  • You added yourself as author in the headers of the classes you touched. Amend the date range in the Apache license header if needed. For new types, add the license header (copy from another file and set the current year only).

Solves issue found on this thread.
Relates #2062
Relates to #1280
Closes #2146
Closes #2147

@sothawo
Copy link
Collaborator

sothawo commented May 3, 2022

looks good at a first glance, I will have a deeper look, once the build and integration tests work again. It seems spring-data-commons introduced some breaking changes that causes tests to fail

@sothawo
Copy link
Collaborator

sothawo commented May 4, 2022

The issue in spring-data-commons is fixed, I'll do a detailed review of the PR after work - there's an aspect we didn't consider up to now: custom field names or field naming strategies. I'll explain that later.

Copy link
Collaborator

@sothawo sothawo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some comments in the code.

One additional thing: It is possible to define a fieldname in Elasticsearch being different from the Java property name. This can be done by either adding this fieldname to the @Field annotation or by defining a FieldNamingStrategy. When using a SnakeCaseFieldNamingStrategy` for example a property named someIntValue would be mapped to a name of some_int_value.

The user should be able to use the Java property name in her code and Spring Data Elasticsearch should automatically map this to the correct fieldnames in Elasticsearch.
The code in the two methods in ElasticsearchStringQuery would then be

private SourceFilter processSourceFilterParams(SourceFilters sourceFilters, ParameterAccessor parameterAccessor) {

	Assert.isTrue(sourceFilters.includes().length > 0 || sourceFilters.excludes().length > 0,
			"At least one includes or excludes must be provided.");

	ElasticsearchConverter elasticsearchConverter = elasticsearchOperations.getElasticsearchConverter();
	ElasticsearchPersistentEntity<?> persistentEntity = elasticsearchConverter.getMappingContext()
			.getPersistentEntity(queryMethod.getEntityInformation().getJavaType());

	StringQueryUtil stringQueryUtil = new StringQueryUtil(elasticsearchConverter.getConversionService());
	FetchSourceFilterBuilder fetchSourceFilterBuilder = new FetchSourceFilterBuilder();

	if (sourceFilters.includes().length > 0) {
		fetchSourceFilterBuilder
				.withIncludes(mapParameters(sourceFilters.includes(), parameterAccessor, stringQueryUtil, persistentEntity));
	}

	if (sourceFilters.excludes().length > 0) {
		fetchSourceFilterBuilder
				.withExcludes(mapParameters(sourceFilters.excludes(), parameterAccessor, stringQueryUtil, persistentEntity));
	}

	return fetchSourceFilterBuilder.build();
}

private String[] mapParameters(String[] source, ParameterAccessor parameterAccessor, StringQueryUtil stringQueryUtil,
		@Nullable ElasticsearchPersistentEntity<?> persistentEntity) {

	List<String> mapped = new ArrayList<>();

	for (String s : source) {

		if (!s.isBlank()) {
			String fieldName = stringQueryUtil.replacePlaceholders(s, parameterAccessor);
			ElasticsearchPersistentProperty property = persistentEntity != null
					? persistentEntity.getPersistentPropertyWithFieldName(fieldName)
					: null;
			mapped.add(property != null ? property.getFieldName() : fieldName);
		}
	}

	return mapped.toArray(new String[0]);
}	

As for the test: We then should test the case of custom fieldnames as well

@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.METHOD, ElementType.ANNOTATION_TYPE })
@Documented
@QueryAnnotation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why should this be a @QueryAnnotation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the usage of this annotation, I was just copying the template used in @Query. We can remove if it provides no utility.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need it on the method, but it's already part of @Query.

* Fields to include in the query search hits. (e.g. ["field1", "field2"])
* @return
*/
String includes() default "";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
String includes() default "";
String[] includes() default "";

making this an array seems much clearer and later makes the processing much simpler

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I make it into a string array then i need to interpolate the parameters for each string (or convert the array to a string and run the interpolation). It also leads to simpler checks downstream if I keep it as a string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also if you make this a string array then you can't interpolate the entire variable. (i.e. includes = ?0). You would then only be able to interpolate at the element level which can be inconvenient (i.e. includes=[?0])

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I make it into a string array then i need to interpolate the parameters for each string (or convert the array to a string and run the interpolation). It also leads to simpler checks downstream if I keep it as a string.

sure, interpolating for each string, that's was my code example does. We should keep it as simple as possible for the user, not for our implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also if you make this a string array then you can't interpolate the entire variable. (i.e. includes = ?0). You would then only be able to interpolate at the element level which can be inconvenient (i.e. includes=[?0])

This is not covered in my code, this interpolation would in the mapParameters method lead to a string like "[\"prop1\",\"prop2\"]" which needs to be split and processed before applying field name matching.

Comment on lines +124 to +148
private SourceFilter processSourceFilterParams(SourceFilters sourceFilters, ParameterAccessor parameterAccessor) {
StringQueryUtil stringQueryUtil = new StringQueryUtil(elasticsearchOperations.getElasticsearchConverter().getConversionService());
ObjectMapper objectMapper = new ObjectMapper();
FetchSourceFilterBuilder fetchSourceFilterBuilder = new FetchSourceFilterBuilder();
String errorMessage = null;
String includesInput = stringQueryUtil.replacePlaceholders(sourceFilters.includes(), parameterAccessor);
String excludesInput = stringQueryUtil.replacePlaceholders(sourceFilters.excludes(), parameterAccessor);
try {
if (!includesInput.equals("")) {
String[] includes = objectMapper.readValue(includesInput, String[].class);
fetchSourceFilterBuilder.withIncludes(includes);
}
if (!excludesInput.equals("")) {
String[] excludes = objectMapper.readValue(excludesInput, String[].class);
fetchSourceFilterBuilder.withExcludes(excludes);
}
} catch (JsonProcessingException e) {
errorMessage = e.getMessage();
}

SourceFilter sourceFilter = fetchSourceFilterBuilder.build();
Assert.isTrue(sourceFilter.getIncludes().length > 0 || sourceFilter.getExcludes().length > 0,
"At least one includes or excludes should be provided.\n Found error: " + errorMessage);
return sourceFilter;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when using String arrays in the annotation this can be refactored to this (first draft only, a more complete version is in the review comment):

 	private SourceFilter processSourceFilterParams(SourceFilters sourceFilters, ParameterAccessor parameterAccessor) {

		Assert.isTrue(sourceFilters.includes().length > 0 || sourceFilters.excludes().length > 0,
				"At least one includes or excludes must be provided.");

		StringQueryUtil stringQueryUtil = new StringQueryUtil(
				elasticsearchOperations.getElasticsearchConverter().getConversionService());
		FetchSourceFilterBuilder fetchSourceFilterBuilder = new FetchSourceFilterBuilder();

		if (sourceFilters.includes().length > 0) {
			fetchSourceFilterBuilder
					.withIncludes(mapParameters(sourceFilters.includes(), parameterAccessor, stringQueryUtil));
		}

		if (sourceFilters.excludes().length > 0) {
			fetchSourceFilterBuilder
					.withExcludes(mapParameters(sourceFilters.excludes(), parameterAccessor, stringQueryUtil));
		}

		return fetchSourceFilterBuilder.build();
	}

	private String[] mapParameters(String[] source, ParameterAccessor parameterAccessor,
			StringQueryUtil stringQueryUtil) {
		String[] mapped = new String[source.length];
		for (int i = 0; i < source.length; i++) {
			mapped[i] = stringQueryUtil.replacePlaceholders(source[i], parameterAccessor);
		}
		return mapped;
	}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the simplest solution, which in this case does not involve parameter interpolation at each element of the array. It is also in keeping with the @Query annotation of using string values instead of the underlying data structure (i.e. @Query.value is a string instead of a map)

nathan-t-lr and others added 3 commits May 4, 2022 19:33
…ns/SourceFilters.java

Co-authored-by: Peter-Josef Meisch <pj.meisch@sothawo.com>
…ns/SourceFilters.java

Co-authored-by: Peter-Josef Meisch <pj.meisch@sothawo.com>
…y/query/ElasticsearchQueryMethod.java

Co-authored-by: Peter-Josef Meisch <pj.meisch@sothawo.com>
@nathan-t-lr
Copy link
Author

nathan-t-lr commented May 5, 2022

I made some comments in the code.

One additional thing: It is possible to define a fieldname in Elasticsearch being different from the Java property name. This can be done by either adding this fieldname to the @Field annotation or by defining a FieldNamingStrategy. When using a SnakeCaseFieldNamingStrategy` for example a property named someIntValue would be mapped to a name of some_int_value.

The user should be able to use the Java property name in her code and Spring Data Elasticsearch should automatically map this to the correct fieldnames in Elasticsearch. The code in the two methods in ElasticsearchStringQuery would then be

I disagree.. If they are using the @Field to make an alternative name for the elasticsearch to store it in, then they have that name already in the code base. The proposed change is adding a lot of complexity for no real added value imo.

If this use case comes up in the future then we can implement and not over-engineer right now. However, source filter usage is usually very situation based. Odds are the user will know the exact path of what they're filtering in/out.

@sothawo
Copy link
Collaborator

sothawo commented May 5, 2022

I have no time to look at the code today, will be tomorrow.

As for String array vs. String: For a user who wants to pass in the values as a method parameter and interpolate with "?x" - as it seems to be your use case - just one String would be enough. When I read this for the first time, I didn't even think about parameter interpolation, I see this as

@SourceFilters(includes = {"prop1", "prop2"})

and not as

@SourceFilters(includes = "[\"prop1\",\"prop2\"]")

The second is error-prone and not easy readable. We should make it as easy to use for the user as possible and not take a simple approach. This would be overengineering if this would be code for a simple application that you use. Spring Data Elasticsearch is a library used by many users, and when adding a feature like source filters the chances are high, that other users will have the need to pass in multiple strings without using parameter replacement and we should be able to handle both.

Nothing prevents in the case of arrays to use @SourceFilters(includes = "?1") and pass the names as properties as a collection parameter. This builds in the mapParameters a String which must be split, but so we would need to do with the non-array solution.

As for parsing a string like "[\"prop1\",\"prop2\"]": Don't use an ObjectMapper for this. This is not JSON. This would be a simple function stripping the brackets, splitting at the comma and trimming the quotes.

I disagree.. If they are using the @field to make an alternative name for the elasticsearch to store it in, then they have that name already in the code base. The proposed change is adding a lot of complexity for no real added value imo.

They have the name in the code base, right. But exactly in one place. And if that field name mapping needs to be changed it should be done in exactly that one place and not in multiple places.

In Spring Data, the user works with entities and their properties, these are Java objects. That the underlying store might use a different name mapping should be transparent. We have this mapping in place wherever possible, not only when mapping the entities to documents and back, but as well when building queries with criteria, when returning highlighting results and so on.

Getting the PersistentEntity from the converter and then accessing the PersistentProperty to retrieve the fieldname does not seem overly complex to me. And if you already pass in the Elasticsearch fieldname, there would be no PersistentProperty found and the name that was passed in will be used.

@nathan-t-lr
Copy link
Author

The second is error-prone and not easy readable

I suppose value in @Query will then be a Map instead of a string in the next version then? I understand your concern but I'm trying to be consistent with the practice already in place.

This would be overengineering if this would be code for a simple application that you use

the over-engineering was referring to the snakecase use-case you referred to. Has anyone ever complained or raised an issue about snake case field mappings when using source filter on elasticsearchtemplate or its other variants? If not then why would they complain about it here? You keep talking about entities but you cannot pass an Entity into a source filter, only strings. It is always required to have the field names stored in local variables unless reflection is used, and even in that case it is very specific and unlikely.

Nothing prevents in the case of arrays to use @SourceFilters(includes = "?1") and pass the names as properties as a collection parameter.

When String[] includes = ?1 then sourceFilters.getIncludes() = [""]. How is interpolation supposed to be accomplished in this scenario? If it is not possible then it would be very inconvenient to have a long list of strings to include and/or exclude, and even more so if either of those lists were dynamic.

In summary, the code I have written answers what people have been asking for in many many threads. I already adjusted it as per your request to put it as it's own annotation. I would ask that we continue with this sufficient implementation and revisit it if the need arises.

@sothawo
Copy link
Collaborator

sothawo commented May 5, 2022

I suppose value in @query will then be a Map instead of a string in the next version then?

The value of @Query has been in string and will be a String, this is consistent behaviour across the different Spring Data modules.

You keep talking about entities but you cannot pass an Entity into a source filter,

Of course not. What sense would it make to pass en entity in a source filter? A source filter defines which properties of an entity are included or excluded. Including a whole entity would mean no filtering and excluding an entity would mean to return nothing.

And Spring Data is an abstraction of different data stores based on entities and their properties. The different modules like Spring Data Elasticsearch encapsulate the store specific details from the user. Therefore the user should be able to use name name of the Java object's properties where possible and not be forced to use store specific information.

It is always required to have the field names stored in local variables

Why is this required? It might be a requirement in your use case but it's not a general requirement. And I did not say that you must use the property names. I wrote that the code must be able to handle the use of property names (as for example defined by the @Field annotation or the field naming strategy) as well as using the Elasticsearch names.

When String[] includes = ?1 then sourceFilters.getIncludes() = [""]. How is interpolation supposed to be accomplished in this scenario?

When having a String-only value, a user that wants to include or exclude multiple fields must either use a value of includes ="?x" and use corresponding parameters in the method or write a String like includes="[\"prop1\",\"prop2\']". When using the parameterized variant internally the created string which looks like the second one if putting it directly in the annotation must be parsed - that's what you did with the ObjectMapper.

Using a String array you can either do the very same includes="?x" or includes={"prop1", "prop2"} which is much more readable. In the case of parameter substitution you'll get in mapping part of the code again a String like "[\"field1\",\"field2\']" containing the names from the parameter which needs to be split like in the previous case.

So using a String array can do what using a single String can do while being more flexible and user friendly.

In summary, the code I have written answers what people have been asking for in many many threads.

Threads where? Not in this repo.

When we are adding new features to Spring Data Elasticsearch we are doing this so as many users as possible can use use them. In this case this means that a user should be able to pass in values as either method parameters or as a list of Strings in the annotation.

@nathan-t-lr
Copy link
Author

Of course not. What sense would it make to pass en entity in a source filter? A source filter defines which properties of an entity are included or excluded. Including a whole entity would mean no filtering and excluding an entity would mean to return nothing.

my point exactly... you keep referring to Entities as if the user is going to pass one in which doesn't make sense

Using a String array you can either do the very same includes="?x" or includes={"prop1", "prop2"} which is much more readable

I just demonstrated that this is not the case... The ?x is lost after getting the includes property as it is then replaced by [""]

Threads where?

1 2 3, not to mention the issue this ticket is built on. The oldest ticket in this list is 6 years old, which means this has been an issue for quite sometime and the community still wants it.

I wrote that the code must be able to handle the use of property names (as for example defined by the @Field annotation or the field naming strategy) as well as using the Elasticsearch names.

this is the case already, as you may just hard code the name (which is probably what people do anyway when they run into this issue using Query.addSourceFilter)

I wrote that the code must be able to handle the use of property names (as for example defined by the @Field annotation or the field naming strategy) as well as using the Elasticsearch names

a) you should be able to justify this change with an issue or thread where people bring it up. If you can provide one then there is an actual use case. b) if this is the case, it wouldn't even be implemented here, it would be implemented further downstream like in the Query.addSourceFilters method or in one of the source filter builder classes.


I feel like we are going in circles. Please let's have a call to hash these things out off business hours to resolve all of these concerns.

@sothawo
Copy link
Collaborator

sothawo commented May 5, 2022

Interesting you are citing #1280. When this came up, I talked to Mark, who is the project lead of all Spring Data, and he then said that we should not implement this at all (with the reference to the question regarding Spring Data MongoDB - spring-projects/spring-data-mongodb#3465.). So I have not talked to him about this issue, but I think his answer would still be the same: to not add store-specific arguments and annotations to @Query methods but to use Elasticsearch(Rest)Template in combination with a repository fragment.

So the proper thing to do would probably be to add support for projections (there is a ticket for that #1378)

@nathan-t-lr
Copy link
Author

nathan-t-lr commented May 5, 2022

Interesting you are citing #1280

I never referenced this issue but still yet another example of the community asking for this feature.

to not add store-specific arguments and annotations to @Query methods but to use Elasticsearch(Rest)Template in combination with a repository fragment.

we are not modifying @Query, I made it into its own annotation as requested. And the alternate solution (repo fragment )is way more complex than it needs to be (probably the reason for all the threads asking for the feature in this pr). Given you keep saying the library should be easy, this pull request is still a valid solution.

but I think his answer would still be the same

I assume you are referring to @markfisher ? Why don't we get his opinion too then.

With the addition of #1280 and my unfiled issues, that is 5 issues of the community asking for this feature which I have implemented here.

@sothawo
Copy link
Collaborator

sothawo commented May 6, 2022

#1280 is referenced from the second StackOverflow issue you referred to. And this was closed with a reference to a ticket from Spring Data MongoDB

Mark Paluch commented

Spring Data Repositories aim to be technology-agnostic (CrudRepository) so that the caller doesn't have to deal with store-specific API. Introducing a method that accepts Query violates such a rule.

One could also argue that MongoRepository breaks this rule in the first place as it already defines methods that aren't available on CrudRepository and their claim is true. However, we don't want to follow that path of introducing additional store-specifics as at some time the interface would end as collection query methods.

Instead, we suggest using MongoTemplate directly. Alternatively, implement custom repository fragments that you add to one/multiple repositories in your code if you don't need to adhere to a strict technology separation.

Btw, I has another look into the code, field name mapping is indeed done on a lower level on the values that are set on the sourcefilter in the query, so that's not needed here.

But looking at the issues dealing with this topic that are already closed, it becomes questionable if we should add this at all.

@nathan-t-lr
Copy link
Author

But looking at the issues dealing with this topic that are already closed, it becomes questionable if we should add this at all.

#1280 was closed because of the reasoning in #3645 but it should not have been. The issue with #3645 is they were proposing adding a new method that accepts Query and thus deviating away from CrudRepository interface even more than MongoRepository already does.

In this instance, we are not making such breaking changes, the community just wants a new annotation to enable source filtering. The fact that Highlight is there should be justification enough since it is the same concept of implementing features this way.

This is in addition to the justification of time saved using this filter:
Let's say I want to write a source filter to only include one field, then with the annotation it is trivial. Without the annotation, a developer must use the elasticsearch template (possibly within repo fragment), replace @Query with the equivalent template (which might not be a simple task), and finally debug/write test code for coverage. The former takes mere moments, and the latter a couple hours. We need this change.

@sothawo
Copy link
Collaborator

sothawo commented May 7, 2022

I'll have a talk about this with Mark Paluch (Spring Data project lead) when he's back from his vacations after the next week.

@sothawo
Copy link
Collaborator

sothawo commented May 19, 2022

I had a call with Mark yesterday.

We can add the sourceIncludes and sourceExcludes to the @Query annotation like you had it first. Spring Data Mongo for example has a fields attribute in it's @Query, so there as well is store-specific information available.

The data type for these attributes should be String[], the same as the corresponding values in Elasticsearch are (https://www.elastic.co/guide/en/elasticsearch/reference/8.2/search-fields.html#source-filtering)

These would then be valid uses cases:

@Query(value="{ something with ?0 }", sourceIncludes = "?1")
SearchHits<E> method1(String arg1, String[] includes);

@Query(value="{ something with ?0 }", sourceIncludes = {"field1", "field2"})
SearchHits<E> method1(String arg1);

And although the second variant does not cover your use case, we should integrate it already.

@sothawo
Copy link
Collaborator

sothawo commented May 19, 2022

and I forgot: As 4.4 was released last week, this will target the next release (5.0)

@nathan-t-lr
Copy link
Author

awesome, should I resume work on the original pr and close this one out, or make a new pr so that it's cleaner commit history?

@sothawo
Copy link
Collaborator

sothawo commented May 20, 2022

do it in this one to keep this discussion. Rebase on main and after your changes squash your commits into one (although we'll do the squash anyway when merging)

sothawo added a commit to sothawo/spring-data-elasticsearch that referenced this pull request Aug 6, 2022
@sothawo sothawo closed this in #2254 Aug 6, 2022
@sothawo sothawo closed this in cf135f4 Aug 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting-for-triage An issue we've not yet triaged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add source filter configuration to @Query repository methods
3 participants