This repository has been archived by the owner on Nov 1, 2021. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Adapt cross modal search to 2.0 (#631)
* fix: comment out merge_all and discard compound indexer pattern
* feat: add merge root executor
* feat: adapt most executors and yml to 2.0
* feat: clean index flow with comments
* fix: tokenize text and input to CLIP text encoder
* fix: fix image reader and normalizer
* feat: query flow adapted to 2.0 with comment
* fix: change 0:0:0:0 to localhost in Readme
* feat: fix the requirements
* feat: fix the requirements
* feat: persistence
* fix: fix the index flow
* feat: fix index flow
* feat: query flow that supports text input
* help to adapt the cross modal search to 2.0 (#657)
* fix: fix the query part
* feat: fix the bug in the indexing text part
* fix: fix the query flow
* fix: fix the score for text2image matching
* fix: fix the query mode
* feat: clean up
* fix: fix the workspace
* fix: remove hello fashion dependency
* chore: clean up
* fix: switch to use mime_type for routing
* fix: adapt evaluation
* feat: remove kv indexer
* fix: remove the keyvalue indexer
* fix: remove the keyvalue indexer
* feat: add tests
* fix: revert kvindexer deletion
* chore: clean up
* chore: clean up
* chore: clean up

Co-authored-by: Nan Wang <nan.wang@jina.ai>
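Several items above concern text-to-image matching with CLIP: tokenizing text for the CLIP text encoder and fixing the score for text2image matching. CLIP is trained so that a caption and its matching image have embeddings with high cosine similarity. The following is a minimal sketch of ranking image embeddings against one text embedding by cosine similarity. The vectors, the `rank_images` helper and the "higher score is better" convention are assumptions for illustration; this is not the scoring code changed in this commit.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embeddings; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_images(text_emb: Vector, image_embs: Dict[str, Vector], top_k: int) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the top_k image ids, most similar first."""
    scored = [(img_id, cosine_similarity(text_emb, emb)) for img_id, emb in image_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    images = {"cat.jpg": [0.9, 0.1, 0.0], "car.jpg": [0.0, 0.2, 0.9], "dog.jpg": [0.7, 0.3, 0.1]}
    print(rank_images([1.0, 0.0, 0.0], images, top_k=2))
```

Because cosine similarity ignores vector length, embeddings do not need to be pre-normalized for this ranking. A real index would use a vectorized search instead of a Python loop.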
Showing 26 changed files with 444 additions and 300 deletions.
Empty file.
Empty file.
```
@@ -1,42 +1,43 @@
!Flow
jtype: Flow
version: '1'
with:
  prefetch: 10
  port_expose: 45678
  workspace: $JINA_WORKSPACE
pods:
  - name: loader
  - name: loader # load images from the dataset of image-caption pairs
    uses: pods/image-load.yml
    shards: $JINA_PARALLEL
    read_only: true
  - name: normalizer
    needs: [gateway]
  - name: image_normalizer # normalize the dimension of the images
    uses: pods/image-normalize.yml
    shards: $JINA_PARALLEL
    read_only: true
  - name: image_encoder
    uses: $JINA_IMAGE_ENCODER
  - name: image_encoder # encode images into embeddings with CLIP model
    uses: pods/clip/image-encoder.yml
    shards: $JINA_PARALLEL
    timeout_ready: 600000
    read_only: true
  - name: image_vector_indexer
  - name: image_vector_indexer # store image embeddings
    polling: any
    uses: pods/index-image-vector.yml
    shards: $JINA_SHARDS
  - name: image_kv_indexer
  - name: image_kv_indexer # store image documents
    polling: any
    uses: pods/index-image-kv.yml
    shards: $JINA_SHARDS
    needs: [gateway]
  - name: text_encoder
    uses: $JINA_TEXT_ENCODER
    uses_internal: $JINA_TEXT_ENCODER_INTERNAL
  - name: text_encoder # encode text into embeddings with CLIP model
    uses: pods/clip/text-encoder.yml
    shards: $JINA_PARALLEL
    timeout_ready: 600000
    read_only: true
    needs: [gateway]
  - name: text_indexer
  - name: text_indexer # index the text into documents
    polling: any
    uses: pods/index-text.yml
    uses: pods/index-text.yml #(numpy + binary pb indexer)
    shards: $JINA_SHARDS
  - name: join_all
    uses: _merge_root
    needs: [image_vector_indexer, image_kv_indexer, text_indexer]
    read_only: true
    needs: text_encoder
  - name: join_all # wait on the 3 executors to finish data processing with "needs"
    needs: [image_vector_indexer, image_kv_indexer, text_indexer]
```
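The index Flow wires pods together with `needs`, and `join_all` waits on the three indexers before a request completes. The sketch below derives an order consistent with such a dependency graph using Kahn's topological sort. The edges are our reading of the interleaved diff above and may not match the committed file exactly. It also assumes, as in Jina Flows, that a pod without `needs` follows the pod listed before it. `execution_order` is a hypothetical helper, not Jina's scheduler. Jina runs pods concurrently, so the result is only one valid sequence that respects the graph.

```python
from collections import deque
from typing import Dict, List

# Assumed reading of the new index Flow: pod -> pods it waits on.
# Assumption: a pod without `needs` waits on the pod listed before it.
INDEX_FLOW_NEEDS: Dict[str, List[str]] = {
    "loader": ["gateway"],
    "image_normalizer": ["loader"],
    "image_encoder": ["image_normalizer"],
    "image_vector_indexer": ["image_encoder"],
    "image_kv_indexer": ["gateway"],
    "text_encoder": ["gateway"],
    "text_indexer": ["text_encoder"],
    "join_all": ["image_vector_indexer", "image_kv_indexer", "text_indexer"],
}


def execution_order(needs: Dict[str, List[str]], root: str = "gateway") -> List[str]:
    """Kahn's topological sort: a pod is scheduled only after everything it needs."""
    nodes = set(needs) | {root}
    for deps in needs.values():
        nodes.update(deps)
    indegree = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    for pod, deps in needs.items():
        for dep in deps:
            indegree[pod] += 1
            children[dep].append(pod)
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:  # all of this pod's needs are done
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("cycle in `needs`")
    return order


if __name__ == "__main__":
    print(execution_order(INDEX_FLOW_NEEDS))
```

`join_all` always comes last here because it is the only pod that needs all three indexers, which is the merge behavior the YAML comment describes.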
```
@@ -1,49 +1,47 @@
!Flow
jtype: Flow
version: '1'
with:
  prefetch: 10
  port_expose: 45678
  workspace: $JINA_WORKSPACE
pods:
  - name: loader
  - name: loader # load query image
    uses: pods/image-load.yml
    shards: $JINA_PARALLEL
    read_only: true
  - name: normalizer
    needs: gateway
  - name: normalizer # normalize query image
    uses: pods/image-normalize.yml
    shards: $JINA_PARALLEL
    read_only: true
  - name: image_encoder
    needs: loader
  - name: image_encoder # encode query image into embeddings with CLIP model
    polling: any
    uses: $JINA_IMAGE_ENCODER
    uses: pods/clip/image-encoder.yml
    shards: $JINA_PARALLEL
    timeout_ready: 600000
    read_only: true
  - name: text_indexer
    needs: normalizer
  - name: text_indexer # index query text
    polling: all
    uses: pods/index-text.yml
    shards: $JINA_SHARDS
    uses_after: pods/merge_matches_sort_topk.yml
    remove_uses_ba: true
  - name: text_encoder
    uses: $JINA_TEXT_ENCODER
    uses_internal: $JINA_TEXT_ENCODER_INTERNAL
  - name: text_encoder # encode query text into embeddings with CLIP model
    uses: pods/clip/text-encoder.yml
    shards: $JINA_PARALLEL
    timeout_ready: 600000
    read_only: true
    needs: [gateway]
  - name: image_vector_indexer
  - name: image_vector_indexer # index query image embeddings
    polling: all
    uses: pods/index-image-vector.yml
    shards: $JINA_SHARDS
    uses_after: _merge_matches
    remove_uses_ba: true
  - name: image_kv_indexer
    needs: text_encoder
  - name: image_kv_indexer # index query image as kv
    polling: all
    uses: pods/index-image-kv.yml
    shards: $JINA_SHARDS
    uses_after: pods/merge_matches_sort_topk.yml
    remove_uses_ba: true
  - name: join_all
    uses: _merge_root
    needs: image_vector_indexer
  - name: join_all # combine text and image queries
    needs: [text_indexer, image_kv_indexer]
    read_only: true
```
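In the query Flow the indexers use `polling: all`, so each query reaches every shard, and `uses_after: pods/merge_matches_sort_topk.yml` combines the per-shard results. The contents of that YAML are not part of this diff. The following is a minimal pure-Python sketch of merging matches and keeping the top k. It assumes each shard returns `(doc_id, score)` pairs where a higher score means a better match. The function `merge_matches_sort_topk` is a hypothetical name borrowed from the file, not Jina's implementation.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Match = Tuple[str, float]  # (doc_id, score); assumption: higher score = better


def merge_matches_sort_topk(shard_results: Iterable[List[Match]], top_k: int) -> List[Match]:
    """Combine matches returned by all shards and keep the best top_k."""
    best: Dict[str, float] = {}
    for matches in shard_results:
        for doc_id, score in matches:
            # If the same document comes back from several shards, keep its best score.
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # nlargest avoids sorting every match when only top_k are needed.
    return heapq.nlargest(top_k, best.items(), key=lambda item: item[1])


if __name__ == "__main__":
    shard_a = [("img-1", 0.82), ("img-4", 0.40)]
    shard_b = [("img-2", 0.91), ("img-3", 0.55)]
    print(merge_matches_sort_topk([shard_a, shard_b], top_k=3))
```

Deduplicating by `doc_id` is a design choice of this sketch. With distance-based scores, where lower is better, `heapq.nsmallest` and a `<` comparison would replace their counterparts.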
Empty file.
This file was deleted.
This file was deleted.
```diff
@@ -0,0 +1,4 @@
+jtype: CLIPImageEncoder
+metas:
+  py_modules:
+    - '../executors.py'
```
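This new file declares the executor class by `jtype` and loads its code through `py_modules: '../executors.py'`. Suppose this file sits at `pods/clip/image-encoder.yml`, the path the Flows above reference in `uses:`. Suppose also that a relative `py_modules` entry is resolved against the directory containing the YAML file; that is our reading, so check the Jina documentation for the exact rule. Then the entry points to `pods/executors.py`. A small sketch of that resolution with `os.path` follows; `resolve_py_module` is hypothetical.

```python
import os.path


def resolve_py_module(yaml_path: str, module_path: str) -> str:
    """Hypothetical helper: resolve a py_modules entry against the YAML file's folder."""
    if os.path.isabs(module_path):
        return os.path.normpath(module_path)  # absolute paths are used as-is
    base_dir = os.path.dirname(yaml_path)
    # normpath collapses the '..' so 'pods/clip/../executors.py' -> 'pods/executors.py'
    return os.path.normpath(os.path.join(base_dir, module_path))


if __name__ == "__main__":
    print(resolve_py_module("pods/clip/image-encoder.yml", "../executors.py"))
```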
```diff
@@ -1,16 +1,5 @@
-!CLIPTextEncoder
+# encodes text into embeddings with CLIP model
+jtype: CLIPTextEncoder
 metas:
   py_modules:
-    - workspace/__init__.py
-requests:
-  on:
-    IndexRequest:
-      - !FilterQL
-        with:
-          lookups: {'modality': 'text'}
-      - !EncodeDriver {}
-    SearchRequest:
-      - !FilterQL
-        with:
-          lookups: {'mime_type__contains': 'text'}
-      - !EncodeDriver {}
+    - '../executors.py'
```
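The removed driver configuration restricted the text encoder to text documents. At index time it kept documents whose `modality` equals `'text'`; at search time it kept those whose `mime_type` contains `'text'`. With the drivers gone, that selection presumably moves into Python code in `executors.py`, whose contents are not shown here. The commit message also mentions switching to `mime_type` for routing. The following is a minimal sketch of the same two lookups over a hypothetical `Doc` record that stands in for a Jina Document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Jina Document with only the fields used here."""
    id: str
    mime_type: str = ""
    modality: Optional[str] = None
    text: str = ""


def filter_for_index(docs: List[Doc]) -> List[Doc]:
    """Mirror of the old IndexRequest lookup {'modality': 'text'}."""
    return [d for d in docs if d.modality == "text"]


def filter_for_search(docs: List[Doc]) -> List[Doc]:
    """Mirror of the old SearchRequest lookup {'mime_type__contains': 'text'}."""
    return [d for d in docs if "text" in d.mime_type]


if __name__ == "__main__":
    batch = [
        Doc("a", mime_type="text/plain", modality="text", text="a red car"),
        Doc("b", mime_type="image/jpeg", modality="image"),
    ]
    print([d.id for d in filter_for_index(batch)], [d.id for d in filter_for_search(batch)])
```

Filtering on `mime_type` alone, as in the search lookup, lets the same rule serve both request types once every document carries a MIME type.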