Implement EZP-23465: Elasticsearch: refactor FieldMap implementation for caching and multiple fields support #1046

pspanja · 2014-10-22T12:29:50Z

This PR resolves issue https://jira.ez.no/browse/EZP-23465

The current implementation of field mapping does not cover several use cases:

1. When a FieldType indexes multiple fields, there is no way to select a particular field for search and sort

The Indexable FieldType interface provides a way to index a single field's data in multiple fields. Currently we use this for the MapLocation FieldType, where both the address and the geo point are indexed. At the moment the MapLocationDistance criterion and sort clause discriminate on the field to use through its type (see eZ\Publish\SPI\Persistence\Content\Search\FieldType\*). This mechanism becomes ambiguous if multiple fields of the same type are defined. The MapLocation address is neither searchable with the Field criterion nor sortable with the Field sort clause, as there is no way to choose the default field from the FieldType's index definition.

Additionally, a custom implementation of the Indexable interface for a FieldType should be the main way for users to index custom fields. We therefore need a standard mechanism to choose the default field to search and sort on with the provided Field criterion and sort clause, and also to choose other fields to be used by user-implemented criteria and sort clauses.

2. Just as it is possible to search on a custom field, it should be possible to sort on it

CustomFieldInterface documents that it is to be used on criteria only. However, just as it is possible to define a custom field for search, it should be possible to define one for sort, so the interface must also be applicable to a (Field) sort clause. FWIW the MapLocationDistance sort clause already implements it.

3. FieldMap is not cacheable

FieldMap maps the targets of a Field criterion or sort clause to index storage field names. To achieve this it needs a complete mapping of ContentTypes, their FieldDefinitions and the corresponding FieldTypes. This data can be a simple hash structure. In most cases there won't be that many ContentTypes and FieldDefinitions, but given that they can easily number in the hundreds, it would be worthwhile to cache this data even when cached storage is used. ATM this is not possible, since FieldMap receives criterion and sort clause objects, which can vary with user input.
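As a rough illustration of why the hash structure is cacheable, here is a minimal, language-neutral sketch in Python (not the PHP code in this PR). The class names, the `build_field_map` helper and the sample content types are assumptions made up for the example. The point is only that the map is built from ContentType data alone, never from a criterion or sort clause object.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    identifier: str
    field_type_identifier: str


@dataclass(frozen=True)
class ContentType:
    identifier: str
    field_definitions: tuple


def build_field_map(content_types):
    """Flatten content types into {content type: {field definition: field type}}."""
    return {
        content_type.identifier: {
            definition.identifier: definition.field_type_identifier
            for definition in content_type.field_definitions
        }
        for content_type in content_types
    }


# Sample data, invented for the example.
content_types = [
    ContentType("article", (
        FieldDefinition("title", "ezstring"),
        FieldDefinition("location", "ezgmaplocation"),
    )),
    ContentType("place", (
        FieldDefinition("location", "ezgmaplocation"),
    )),
]

field_map = build_field_map(content_types)

# The result holds only strings, so it can be serialized and stored in a cache.
serialized = json.dumps(field_map, sort_keys=True)
restored = json.loads(serialized)
```

Because the structure does not depend on user input, a single cached copy could serve every query until a ContentType changes.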

All of the above points pertain to the FieldMap implementation and are therefore handled in the same pull request.

FieldMap is here refactored so that it is possible to request a default field, to be used with Field criterion and sort clause, and also a "non default" field as for example used by MapLocationDistance criterion or sort clause, or custom field. The hash structure described in point 3. is extracted to a single method, which makes moving it to the ContentType storage handler and caching it there (when Search SPI is moved out of Persistence SPI) possible. Caching does not have to be granular, simple "timber" style would be sufficient.

Indexable interface gets a new method getDefaultField, which returns a field name from index definition that is to be used by default, i.e. by Field criterion and sort clause.
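The idea can be sketched as follows, in Python rather than the project's PHP. The method names are snake_case stand-ins, and the index field names `value_address` and `value_location` are assumptions chosen for illustration, not taken from this PR.

```python
from abc import ABC, abstractmethod


class Indexable(ABC):
    """Stand-in for the Indexable FieldType interface (names are hypothetical)."""

    @abstractmethod
    def get_index_definition(self) -> dict:
        """Return {index field name: index field type}."""

    @abstractmethod
    def get_default_field(self) -> str:
        """Return the name from the index definition used by Field criterion and sort clause."""


class MapLocationIndexable(Indexable):
    def get_index_definition(self) -> dict:
        # One FieldType value indexed as two separate fields.
        return {"value_address": "string", "value_location": "geo_point"}

    def get_default_field(self) -> str:
        # Assumption: the address is the field a plain Field criterion should hit.
        return "value_address"


indexable = MapLocationIndexable()
default_field = indexable.get_default_field()
```

The default name has to be one of the keys of the index definition, otherwise the Field criterion and sort clause would have nothing to resolve to.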

Custom criteria and sort clauses like our MapLocationDistance can also, through the FieldMap implementation, request a field by name from the FieldType's index definition. This limits them to a specific FieldType if the provided FieldMap is used to obtain field names for search and sort. This is intentional, to avoid the ambiguity described above. If that is not desired, users can use the custom field feature or implement their own field mapping mechanism.
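A minimal sketch of the two lookup modes, written in Python for brevity and not mirroring the actual PHP signatures. The data tables, the `get_field_names` function and the `contenttype_field_name` naming scheme are all assumptions for the example.

```python
# Hypothetical index definitions: {field type: (default field, {name: index type})}
INDEX_DEFINITIONS = {
    "ezstring": ("value", {"value": "string"}),
    "ezgmaplocation": (
        "value_address",
        {"value_address": "string", "value_location": "geo_point"},
    ),
}

# Hypothetical field map: {content type: {field definition: field type}}
FIELD_MAP = {
    "article": {"title": "ezstring", "location": "ezgmaplocation"},
    "place": {"title": "ezstring", "location": "ezgmaplocation"},
}


def get_field_names(field_definition_identifier, field_type_identifier=None, name=None):
    """Resolve index field names for a field definition across all content types.

    Without `name`, the FieldType's default field is used (Field criterion case).
    With `field_type_identifier` and `name`, only that FieldType is considered
    (MapLocationDistance-style case).
    """
    names = []
    for content_type, definitions in sorted(FIELD_MAP.items()):
        field_type = definitions.get(field_definition_identifier)
        if field_type is None:
            continue
        if field_type_identifier is not None and field_type != field_type_identifier:
            continue  # the caller is tied to one specific FieldType
        default_name, index_definition = INDEX_DEFINITIONS[field_type]
        selected = default_name if name is None else name
        if selected not in index_definition:
            raise ValueError(f"'{selected}' is not indexed by '{field_type}'")
        names.append(f"{content_type}_{field_definition_identifier}_{selected}")
    return names


default_names = get_field_names("location")
geo_names = get_field_names("location", "ezgmaplocation", "value_location")
```

Requesting `value_location` for a field definition of another type simply yields no names, which is the restriction to a specific FieldType described above.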

CustomFieldInterface doc is updated to describe that application on a sort clause is possible. Field sort clause is updated to implement CustomFieldInterface.
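To illustrate what a custom field on a sort clause could look like, here is a small Python sketch. It only loosely follows the shape of CustomFieldInterface. The mixin, the `FieldSortClause` class and the `resolve_sort_field` helper are hypothetical names, not the classes changed in this PR.

```python
class CustomFieldMixin:
    """Lets a caller override the index field used for a content type / field pair."""

    def __init__(self):
        self._custom_fields = {}

    def set_custom_field(self, content_type, field, custom_field):
        self._custom_fields[(content_type, field)] = custom_field

    def get_custom_field(self, content_type, field):
        # None means "no override, fall back to the regular field mapping".
        return self._custom_fields.get((content_type, field))


class FieldSortClause(CustomFieldMixin):
    def __init__(self, content_type, field, direction="asc"):
        super().__init__()
        self.content_type = content_type
        self.field = field
        self.direction = direction


def resolve_sort_field(clause, default_resolver):
    """Prefer the custom field when one is set, otherwise use the default mapping."""
    custom = clause.get_custom_field(clause.content_type, clause.field)
    if custom is not None:
        return custom
    return default_resolver(clause.content_type, clause.field)


def default_resolver(content_type, field):
    # Stand-in for the FieldMap lookup; the naming scheme is invented.
    return f"{content_type}_{field}_value"


clause = FieldSortClause("article", "title")
before = resolve_sort_field(clause, default_resolver)

clause.set_custom_field("article", "title", "custom_title_sortable")
after = resolve_sort_field(clause, default_resolver)
```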

Tests: integration tests.

Post-review followups

Implement similar FieldMap refactoring for Solr Storage Engine: https://jira.ez.no/browse/EZP-23694
Create an issue to move field map hash to storage and cache it there (depends on extracting Search SPI out of Persistence SPI)
Make field criteria, sort clauses and targets extend base Field implementation (MapLocationDistance ATM): https://jira.ez.no/browse/EZP-23695

pspanja · 2014-10-23T20:04:52Z

This needs refactoring for FullText visitor implementation: #1051.

pspanja · 2014-10-28T14:52:14Z

Updated FullText visitor implementation and added unit tests.

Now ready for review again.

andrerom · 2014-10-30T13:36:23Z

+1

pspanja · 2014-11-09T21:33:47Z

Rebased for ES 1.4.0 final, still one +1 missing... review ping @bdunogier, @lolautruche

pspanja · 2015-01-21T12:17:51Z

Rebased on master, calling for one missing review.

bdunogier · 2015-01-30T11:49:12Z

As I was afraid, I think there are too many changes in this PR. But I've looked around, and I don't think it should be blocked.

So +1, but I personally think that the code looks a bit complex sometimes, and believe that with better granularity of changes, we'd have been able to make it more simple.

But good job nonetheless :-)

bdunogier · 2015-01-30T12:05:20Z

Hmmm, sorry for the comments that show without the context. I've reviewed your code commit by commit... :-)

pspanja · 2015-01-30T13:21:44Z

Ok no worries, I'll keep the comments there then.

pspanja · 2015-01-30T13:33:56Z

On second thought, better to keep the discussion here as the links are now lost with rebase.

93de436#commitcomment-9515780

See my comment directly above. Initially it implemented caching, but it worked only because LegacySolr test factory always recreated the service anew. Otherwise it can't work as the cache will not be invalidated. But we'll get there in https://jira.ez.no/browse/EZP-23941.
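The invalidation problem can be reproduced in a few lines. This is a Python sketch with invented names (`MemoizingFieldMap`, `store`), not the code that was removed. It shows an in-process cache going stale once the underlying ContentType data changes, and a freshly created service hiding the problem.

```python
class MemoizingFieldMap:
    """Caches the field map on first use and never invalidates it."""

    def __init__(self, load_content_types):
        self._load = load_content_types
        self._cached = None

    def get(self):
        if self._cached is None:
            # Copy so the cache is a snapshot of the data at first access.
            self._cached = dict(self._load())
        return self._cached


# Hypothetical ContentType storage.
store = {"article": {"title": "ezstring"}}

field_map = MemoizingFieldMap(lambda: store)
first = field_map.get()

# A new ContentType is created after the map was cached.
store["place"] = {"location": "ezgmaplocation"}

stale = field_map.get()                       # long-lived service: misses "place"
fresh = MemoizingFieldMap(lambda: store).get()  # recreated service: sees "place"
```

A test setup that rebuilds the service for every test only ever exercises the `fresh` path, which would explain why the caching appeared to work there.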

pspanja · 2015-01-30T14:32:17Z

1456579#commitcomment-9515837

Fixed in a352e62.

pspanja · 2015-01-30T15:30:51Z

1456579#commitcomment-9515824

Changed to injecting field type identifier/name params through constructor in 611dbcb.
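Constructor injection of the field type identifier and field name might look roughly like this. It is a Python sketch under assumptions: the visitor class, its method and the stub field map are hypothetical and do not reproduce the PHP visitors from commit 611dbcb.

```python
class FieldTargetVisitor:
    """Visitor bound at construction time to one FieldType and one index field name."""

    def __init__(self, field_map, field_type_identifier, field_name):
        # field_map is any callable (field definition, field type, name) -> list of names.
        self._field_map = field_map
        self._field_type_identifier = field_type_identifier
        self._field_name = field_name

    def field_names_for(self, field_definition_identifier):
        names = self._field_map(
            field_definition_identifier,
            self._field_type_identifier,
            self._field_name,
        )
        if not names:
            # Fail loudly on an invalid target instead of building an empty query.
            raise ValueError(
                f"No fields found for target '{field_definition_identifier}'"
            )
        return names


def stub_field_map(field_definition_identifier, field_type_identifier, name):
    # Stand-in lookup that knows a single map location field.
    if (field_definition_identifier, field_type_identifier) == ("location", "ezgmaplocation"):
        return [f"place_location_{name}"]
    return []


visitor = FieldTargetVisitor(stub_field_map, "ezgmaplocation", "value_location")
resolved = visitor.field_names_for("location")
```

With the identifier and name supplied by configuration, the same visitor class can be reused for another FieldType by wiring a second instance with different constructor arguments.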

pspanja · 2015-01-30T16:01:12Z

@bdunogier

42a358c#commitcomment-9515749

It is checked in the FieldMap implementation, or do you mean validating the type here?
All remarks are addressed now, let me know if you are fine with merging this.

bdunogier · 2015-02-03T09:22:43Z

> It is checked in the FieldMap implementation, or do you mean validating the type here?

okay. I'd rather have each class independent enough to prevent warnings and notices, without relying on the caller, but it is not a huge deal either.

+1

pspanja mentioned this pull request Oct 22, 2014

Implement EZP-23465: Elasticsearch: refactor FieldMap implementation for caching and multiple fields support #1044

Closed

pspanja added Ready for review Needs decision labels Oct 22, 2014

pspanja force-pushed the fix-EZP-23465-fieldmap-refactoring branch from 3693bfb to e35aebf Compare October 22, 2014 15:25

pspanja force-pushed the fix-EZP-23465-fieldmap-refactoring branch from e35aebf to a3413ad Compare October 28, 2014 14:50

pspanja force-pushed the fix-EZP-23465-fieldmap-refactoring branch from a3413ad to 7db30ff Compare November 9, 2014 21:32

pspanja mentioned this pull request Nov 26, 2014

Fix EZP-23129: sorting by field filters the result set #1101

Merged

pspanja force-pushed the fix-EZP-23465-fieldmap-refactoring branch from 7db30ff to 3c62cf4 Compare November 27, 2014 10:45

pspanja force-pushed the fix-EZP-23465-fieldmap-refactoring branch from 3c62cf4 to b613c69 Compare January 21, 2015 12:17

pspanja added 9 commits January 30, 2015 14:20

EZP-23465: document CustomFieldInterface as applicable on both criter…

821962d

…ia and sort clauses

EZP-23465: implement CustomFieldInterface on Field sort clause

0a1cb7f

EZP-23465: add getDefaultField() on Indexable and implement it

11507da

EZP-23465: refactor FieldMap to support custom/multiple fields and fi…

a1e1b45

…eld map extraction

EZP-23465: adapt criteria and sort clause visitors

a70cf04

EZP-23465: test custom sort field

6f72ceb

EZP-23465: update FullText visitor

d5cafa4

EZP-23465: expose method to make it testable

8130d06

EZP-23465: added unit tests

2c016f2

pspanja force-pushed the fix-EZP-23465-fieldmap-refactoring branch from b613c69 to 2c016f2 Compare January 30, 2015 13:21

EZP-23465: fixed: throw InvalidArgumentException on invalid target

a352e62

EZP-23465: updated: inject field type/name into field visitors

611dbcb

pspanja added 2 commits January 30, 2015 16:31

EZP-23465: fixed: remove unused import

0248739

EZP-23465: use abstract parent service definition for field criteria …

7fe478b

…visitors

andrerom removed the Needs decision label Jan 30, 2015

pspanja mentioned this pull request Jan 31, 2015

Fix EZP-23694: Solr: refactor FieldMap implementation for caching and multiple fields support #1154

Closed

pspanja added a commit that referenced this pull request Feb 3, 2015

Merge pull request #1046 from ezsystems/fix-EZP-23465-fieldmap-refact…

e6f690c

…oring Implement EZP-23465: Elasticsearch: refactor FieldMap implementation for caching and multiple fields support

pspanja merged commit e6f690c into master Feb 3, 2015

pspanja mentioned this pull request Feb 3, 2015

Fix EZP-23694: Solr: refactor FieldMap implementation for caching and multiple fields support #1163

Merged

andrerom deleted the fix-EZP-23465-fieldmap-refactoring branch February 27, 2015 11:34
