Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement EZP-23465: Elasticsearch: refactor FieldMap implementation for caching and multiple fields support #1046

Merged
merged 13 commits into from Feb 3, 2015

Conversation

pspanja
Copy link
Contributor

@pspanja pspanja commented Oct 22, 2014

This PR resolves issue https://jira.ez.no/browse/EZP-23465

Current implementation of field mapping does not cover several use-cases:

  1. In case the FieldType indexes multiple fields, no way is provided to select particular field for search and sort

    Indexable FieldType interface provides a way to index single field data in multiple fields. Currently we use this in the case of MapLocation FieldType, where both address and geo point is indexed. At the moment MapLocationDistance criterion and sort clause discriminate on the field to use through its type (see eZ\Publish\SPI\Persistence\Content\Search\FieldType\*). This mechanism can be ambiguous if multiple fields of the same type are defined. MapLocation address is not searchable with Field criterion nor sortable with Field sort clause, as there is no way to choose the default field from the FieldType's index definition.

    Additionally custom implementation of Indexable interface for the FieldType should be the main way to index custom fields by the users, so we need a standard mechanism to choose default field to search and sort on with provided Field criterion and sort clause, and also to choose other fields to be used by user-implemented criteria and sort clauses.

  2. Same as is possible to search on a custom field, it should be possible to sort on it

    CustomFieldInterface documents that it is to be used on criteria only. However same as it is possible to define custom field for search, it should be possible to define it for sort, so it the interface must also be applicable on a (Field) sort clause. FWIW MapLocationDistance sort clause already implements it.

  3. FieldMap is not cacheable

    FieldMap provides mapping of a Field criterion or sort clause targets to a index storage field names. To achieve this it needs a complete mapping of ContentTypes, their FieldDefinitions and corresponding FieldTypes. This data can be a simple hash structure. In most cases there won't be that many ContentTypes and FieldDefinitions, but given they can easily come in hundreds, it would be opportune to cache this data even if cached storage is used. ATM this is not possible since FieldMap receives criterion and sort clause object which can vary on user input.

All of the above points pertain to the FieldMap implementation and are therefore handled in the same pull request.

FieldMap is here refactored so that it is possible to request a default field, to be used with Field criterion and sort clause, and also a "non default" field as for example used by MapLocationDistance criterion or sort clause, or custom field. The hash structure described in point 3. is extracted to a single method, which makes moving it to the ContentType storage handler and caching it there (when Search SPI is moved out of Persistence SPI) possible. Caching does not have to be granular, simple "timber" style would be sufficient.

Indexable interface gets a new method getDefaultField, which returns a field name from index definition that is to be used by default, i.e. by Field criterion and sort clause.

Custom criteria and sort clauses like our MapLocationDistance can also, through the FieldMap implementation, request a field by name from the FieldType's index definition. This makes them limited to a specific FieldType if the provided FieldMap is used to obtain field names for search and sort. This is intentional to avoid ambiguity, as described above. If that is not desired user can use custom field feature or implement its own field mapping mechanism.

CustomFieldInterface doc is updated to describe that application on a sort clause is possible. Field sort clause is updated to implement CustomFieldInterface.

Tests: integration tests.

Post-review followups

  • Implement similar FieldMap refactoring for Solr Storage Engine: https://jira.ez.no/browse/EZP-23694
  • Create an issue to move field map hash to storage and cache it there (depends on extracting Search SPI out of Persistence SPI)
  • Make field criteria, sort clauses and targets extend base Field implementation (MapLocationDistance ATM): https://jira.ez.no/browse/EZP-23695

@pspanja
Copy link
Contributor Author

pspanja commented Oct 23, 2014

This needs refactoring for FullText visitor implementation: #1051.

@pspanja pspanja force-pushed the fix-EZP-23465-fieldmap-refactoring branch from e35aebf to a3413ad Compare October 28, 2014 14:50
@pspanja
Copy link
Contributor Author

pspanja commented Oct 28, 2014

Updated FullText visitor implementation and added unit tests.

Now ready for review again.

@andrerom
Copy link
Contributor

+1

@pspanja pspanja force-pushed the fix-EZP-23465-fieldmap-refactoring branch from a3413ad to 7db30ff Compare November 9, 2014 21:32
@pspanja
Copy link
Contributor Author

pspanja commented Nov 9, 2014

Rebased for ES 1.4.0 final, still one +1 missing... review ping @bdunogier, @lolautruche

@pspanja
Copy link
Contributor Author

pspanja commented Jan 21, 2015

Rebased on master, calling for one missing review.

@bdunogier
Copy link
Member

As I was afraid, I think there are too many changes in this PR. But I've looked around, and I don't think it should be blocked.

So +1, but I personally think that the code looks a bit complex sometimes, and believe that with better granularity of changes, we'd have been able to make it more simple.

But good job nonetheless :-)

@bdunogier
Copy link
Member

Hmmm, sorry for the comments that show without the context. I've reviewed your code commit by commit... :-)

@pspanja pspanja force-pushed the fix-EZP-23465-fieldmap-refactoring branch from b613c69 to 2c016f2 Compare January 30, 2015 13:21
@pspanja
Copy link
Contributor Author

pspanja commented Jan 30, 2015

Ok no worries, I'll keep the comments there then.

@pspanja
Copy link
Contributor Author

pspanja commented Jan 30, 2015

On second thought, better to keep the discussion here as the links are now lost with rebase.

93de436#commitcomment-9515780

See my comment directly above. Initially it implemented caching, but it worked only because LegacySolr test factory always recreated the service anew. Otherwise it can't work as the cache will not be invalidated. But we'll get there in https://jira.ez.no/browse/EZP-23941.

@pspanja
Copy link
Contributor Author

pspanja commented Jan 30, 2015

1456579#commitcomment-9515837

Fixed in a352e62.

@pspanja
Copy link
Contributor Author

pspanja commented Jan 30, 2015

1456579#commitcomment-9515824

Changed to injecting field type identifier/name params through constructor in 611dbcb.

@pspanja
Copy link
Contributor Author

pspanja commented Jan 30, 2015

@bdunogier

42a358c#commitcomment-9515749

It is checked in the FieldMap implementation, or do you mean validating the type here?
All remarks are addressed now, let me know it you are fine with merging this.

@bdunogier
Copy link
Member

It is checked in the FieldMap implementation, or do you mean validating the type here?

okay. I'd rather have each class independent enough to prevent warnings and notices, without relying on the caller, but it is not a huge deal either.

+1

pspanja added a commit that referenced this pull request Feb 3, 2015
…oring

Implement EZP-23465: Elasticsearch: refactor FieldMap implementation for caching and multiple fields support
@pspanja pspanja merged commit e6f690c into master Feb 3, 2015
@andrerom andrerom deleted the fix-EZP-23465-fieldmap-refactoring branch February 27, 2015 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants