Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
4 changed files
with
220 additions
and
0 deletions.
There are no files selected for viewing
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
!Flow | ||
with: | ||
board: | ||
canvas: | ||
gateway: | ||
x: 528 | ||
y: 71 | ||
FieldsMapper: | ||
x: 527 | ||
y: 199 | ||
TitleFilteredEncoder: | ||
x: 377 | ||
y: 344 | ||
SumFilteredEncoder: | ||
x: 665 | ||
y: 334 | ||
TitleIndexer: | ||
x: 366 | ||
y: 539 | ||
SumIndexer: | ||
x: 660 | ||
y: 539 | ||
join: | ||
x: 516 | ||
y: 659 | ||
pods: | ||
gateway: {} | ||
FieldsMapper: | ||
needs: gateway | ||
TitleFilteredEncoder: | ||
needs: FieldsMapper | ||
SumFilteredEncoder: | ||
needs: FieldsMapper | ||
TitleIndexer: | ||
needs: TitleFilteredEncoder | ||
SumIndexer: | ||
needs: SumFilteredEncoder | ||
join: | ||
needs: | ||
- TitleIndexer | ||
- SumIndexer |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
JEP 1 --- Adding support for multi-fields search | ||
================================================================= | ||
|
||
.. contents:: Table of Contents | ||
:depth: 3 | ||
|
||
|
||
:Author: Nan Wang (nan.wang@jina.ai) | ||
:Created: May. 28, 2020 | ||
:Status: Proposal | ||
:Related JEPs: | ||
:Created on Jina VCS version: ``TBA`` | ||
:Merged to Jina VCS version: ``TBA`` | ||
:Released in Jina version: TBA | ||
:Discussions: https://github.com/jina-ai/jina/issues/441 | ||
|
||
.. contents:: Table of Contents | ||
:depth: 2 | ||
|
||
Abstract | ||
-------- | ||
|
||
[A short (~200 word) description of the technical issue being addressed.] | ||
|
||
Motivation | ||
--------- | ||
Multi-field search is commonly used in the production. | ||
Concretely, the use case is to limit the query within some fields that the user has selected. | ||
In the following case, there are three two fields in each document, i.e. ``title`` and ``summary``. | ||
The use case is to query only from the ``title`` field. Given the query, ``q='painter'``, | ||
the expected result is only ``hacker and painters``. | ||
|
||
.. highlight:: json | ||
.. code-block:: json | ||
{ | ||
"id": 10, | ||
"title": "the story of the art", | ||
"summary": "This is a book about the history of the art, and the stories of the great painters" | ||
}, { | ||
"id": 11, | ||
"title": "hackers and painters", | ||
"summary": "This book discusses hacking, start-up companies, and many other technological issues" | ||
} | ||
Rationale | ||
--------- | ||
The core issue of this use case is the need of marking the ``Chunks`` from different fields. | ||
During the query time, we would like to enable the users to change the selected fields in different queries without rebuilding the query ``Flow``. | ||
|
||
.. highlight:: json | ||
.. code-block:: json | ||
{ | ||
"data": "painter", | ||
"top_k": 10, | ||
"mime_type": "application/text" | ||
"fields_name": ["title"], | ||
} | ||
Flow | ||
^^^^ | ||
|
||
.. image:: JEP3-index-design.png | ||
:align: center | ||
:width: 60% | ||
|
||
To achieve this, we propose the following changes, | ||
|
||
1. Add a new field in the protobuf defination of the ``Chunk``. | ||
|
||
.. highlight:: proto | ||
.. code-block:: proto | ||
message Chunk { | ||
... | ||
string field_name = 13; | ||
} | ||
2. Add a new ``Crafter`` for adding ``field_name`` information to the ``Chunk``. | ||
|
||
.. highlight:: python | ||
.. code-block:: python | ||
class FieldMapper(BaseSegmenter): | ||
def craft(self, *args, **kwargs) -> List[Dict]: | ||
pass | ||
.. highlight:: python | ||
.. code-block:: python | ||
class MapperDriver(SegmentDriver): | ||
pass | ||
3. Add a new ``Driver`` for merging the messages defined by ``field_names`` in the request instead of merging all the messages defined by ``needs``. | ||
|
||
|
||
4. Add a CompoundExecutor, namely ``FieldEncoder``, which wraps up ``FieldMapper`` and ``Encoder`` as a common pattern for multi-field search. | ||
|
||
.. highlight:: yaml | ||
.. code-block:: yaml | ||
!FieldEncoder | ||
on: | ||
SearchRequest, IndexRequest: | ||
- !MapperDriver: | ||
with: | ||
executor: FieldMapper | ||
- !EncoderDriver | ||
with: | ||
executor: TransformerTFEncoder | ||
Specification | ||
------------- | ||
|
||
[Describe the syntax and semantics of any new feature.] | ||
|
||
Backwards Compatibility | ||
----------------------- | ||
|
||
[Describe potential impact and severity on pre-existing code.] | ||
|
||
|
||
Reference Implementation | ||
------------------------ | ||
|
||
[Link to any existing implementation and details about its state, e.g. proof-of-concept.] | ||
|
||
Open Issues | ||
----------- | ||
|
||
[Any points that are still being decided/discussed.] | ||
|
||
References | ||
---------- | ||
|
||
[A collection of URLs used as references through the JEP.] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
!Flow | ||
with: | ||
board: | ||
canvas: | ||
gateway: | ||
x: 528 | ||
y: 71 | ||
FieldsMapper: | ||
x: 526 | ||
y: 162 | ||
TitleFilteredEncoder: | ||
x: 333 | ||
y: 280 | ||
SumFilteredEncoder: | ||
x: 676 | ||
y: 262 | ||
TitleIndexer: | ||
x: 332 | ||
y: 404 | ||
SumIndexer: | ||
x: 669 | ||
y: 394 | ||
join: | ||
x: 507 | ||
y: 535 | ||
ranker: | ||
x: 505 | ||
y: 635 | ||
pods: | ||
gateway: {} | ||
FieldsMapper: | ||
needs: gateway | ||
TitleFilteredEncoder: | ||
needs: FieldsMapper | ||
SumFilteredEncoder: | ||
needs: FieldsMapper | ||
TitleIndexer: | ||
needs: TitleFilteredEncoder | ||
SumIndexer: | ||
needs: SumFilteredEncoder | ||
join: | ||
needs: | ||
- TitleIndexer | ||
- SumIndexer | ||
ranker: | ||
needs: join |