New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement EZP-23465: Elasticsearch: refactor FieldMap implementation for caching and multiple fields support #1046
Conversation
3693bfb
to
e35aebf
Compare
This needs refactoring for FullText visitor implementation: #1051. |
e35aebf
to
a3413ad
Compare
Updated Now ready for review again. |
+1 |
a3413ad
to
7db30ff
Compare
Rebased for ES 1.4.0 final, still one +1 missing... review ping @bdunogier, @lolautruche |
7db30ff
to
3c62cf4
Compare
3c62cf4
to
b613c69
Compare
Rebased on master, calling for one missing review. |
As I was afraid, I think there are too many changes in this PR. But I've looked around, and I don't think it should be blocked. So +1, but I personally think that the code looks a bit complex sometimes, and believe that with better granularity of changes, we'd have been able to make it more simple. But good job nonetheless :-) |
Hmmm, sorry for the comments that show without the context. I've reviewed your code commit by commit... :-) |
…ia and sort clauses
…eld map extraction
b613c69
to
2c016f2
Compare
Ok no worries, I'll keep the comments there then. |
On second thought, better to keep the discussion here as the links are now lost with rebase. See my comment directly above. Initially it implemented caching, but it worked only because LegacySolr test factory always recreated the service anew. Otherwise it can't work as the cache will not be invalidated. But we'll get there in https://jira.ez.no/browse/EZP-23941. |
Fixed in a352e62. |
Changed to injecting field type identifier/name params through constructor in 611dbcb. |
It is checked in the |
okay. I'd rather have each class independent enough to prevent warnings and notices, without relying on the caller, but it is not a huge deal either. +1 |
…oring Implement EZP-23465: Elasticsearch: refactor FieldMap implementation for caching and multiple fields support
This PR resolves issue https://jira.ez.no/browse/EZP-23465
Current implementation of field mapping does not cover several use-cases:
In case the FieldType indexes multiple fields, no way is provided to select particular field for search and sort
Indexable FieldType interface provides a way to index single field data in multiple fields. Currently we use this in the case of MapLocation FieldType, where both address and geo point is indexed. At the moment MapLocationDistance criterion and sort clause discriminate on the field to use through its type (see
eZ\Publish\SPI\Persistence\Content\Search\FieldType\*
). This mechanism can be ambiguous if multiple fields of the same type are defined. MapLocation address is not searchable with Field criterion nor sortable with Field sort clause, as there is no way to choose the default field from the FieldType's index definition.Additionally custom implementation of Indexable interface for the FieldType should be the main way to index custom fields by the users, so we need a standard mechanism to choose default field to search and sort on with provided
Field
criterion and sort clause, and also to choose other fields to be used by user-implemented criteria and sort clauses.Same as is possible to search on a custom field, it should be possible to sort on it
CustomFieldInterface documents that it is to be used on criteria only. However same as it is possible to define custom field for search, it should be possible to define it for sort, so it the interface must also be applicable on a (Field) sort clause. FWIW MapLocationDistance sort clause already implements it.
FieldMap is not cacheable
FieldMap provides mapping of a Field criterion or sort clause targets to a index storage field names. To achieve this it needs a complete mapping of ContentTypes, their FieldDefinitions and corresponding FieldTypes. This data can be a simple hash structure. In most cases there won't be that many ContentTypes and FieldDefinitions, but given they can easily come in hundreds, it would be opportune to cache this data even if cached storage is used. ATM this is not possible since FieldMap receives criterion and sort clause object which can vary on user input.
All of the above points pertain to the FieldMap implementation and are therefore handled in the same pull request.
FieldMap is here refactored so that it is possible to request a default field, to be used with Field criterion and sort clause, and also a "non default" field as for example used by MapLocationDistance criterion or sort clause, or custom field. The hash structure described in point 3. is extracted to a single method, which makes moving it to the ContentType storage handler and caching it there (when Search SPI is moved out of Persistence SPI) possible. Caching does not have to be granular, simple "timber" style would be sufficient.
Indexable interface gets a new method
getDefaultField
, which returns a field name from index definition that is to be used by default, i.e. by Field criterion and sort clause.Custom criteria and sort clauses like our MapLocationDistance can also, through the FieldMap implementation, request a field by name from the FieldType's index definition. This makes them limited to a specific FieldType if the provided FieldMap is used to obtain field names for search and sort. This is intentional to avoid ambiguity, as described above. If that is not desired user can use custom field feature or implement its own field mapping mechanism.
CustomFieldInterface doc is updated to describe that application on a sort clause is possible. Field sort clause is updated to implement CustomFieldInterface.
Tests: integration tests.
Post-review followups