NumpyPostgresSearcher is a compound Searcher Executor for Jina, made up of NumpySearcher for performing similarity search on the embeddings, and of PostgresSearcher for retrieving the metadata of the Documents.
- This Executor works on Python 3.7 and 3.8.
- Make sure to install the requirements.
Additionally, you will need a running PostgreSQL database. This can be a local instance, a Docker image, or a virtual machine in the cloud. Make sure you have the credentials and connection parameters.
You can start one in a Docker container, like so:
docker run -e POSTGRES_PASSWORD=123456 -p 127.0.0.1:5432:5432/tcp postgres:13.2
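Before starting the Flow, it can help to confirm that the database port is reachable. The following is a minimal sketch using only the Python standard library, assuming the host and port from the `docker run` command above. The helper name `is_port_open` is hypothetical and is not part of Jina or this Executor. It only checks that something accepts TCP connections on that port, not that the credentials are valid.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port are assumptions taken from the docker run command above.
    print(is_port_open("127.0.0.1", 5432))
```

If this prints `False`, the container is probably not running or the port mapping differs from the one shown above.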
Check the integration tests for an example of how to use it.
Since this is a "Searcher"-type Executor, it does not index new data. Searchers are write-once classes that take a dump_path as their data source.
This can be provided in different ways:
- in the YAML definition
jtype: NumpyPostgresSearcher
with:
  dump_path: /tmp/your_dump_location
  ...
- from the Flow.rolling_update method. See README.
The folder needs to contain the data exported from your Indexer. Again, see README.
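Because the Searcher depends entirely on the contents of dump_path, a quick sanity check before starting the Flow can catch a wrong or empty path early. This is a minimal sketch, not part of the Executor: the helper `check_dump_path` is hypothetical, and since the internal layout of the dump is not described here, it only verifies that the folder exists and is not empty.

```python
from pathlib import Path


def check_dump_path(dump_path: str) -> Path:
    """Return dump_path as a Path, or raise if it is missing or empty."""
    path = Path(dump_path)
    if not path.is_dir():
        raise FileNotFoundError(f"dump_path is not a directory: {path}")
    # any() stops at the first entry, so large dumps are not fully listed.
    if not any(path.iterdir()):
        raise ValueError(f"dump_path is empty: {path}")
    return path
```

For example, `check_dump_path('/tmp/your_dump_location')` could be called just before building the Flow that uses the YAML definition above.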
Use the prebuilt images from JinaHub in your Python code:
from jina import Flow
f = Flow().add(uses='jinahub+docker://NumpyPostgresSearcher')
or in the .yml config:
jtype: Flow
pods:
  - name: indexer
    uses: 'jinahub+docker://NumpyPostgresSearcher'
Use the source code from JinaHub in your Python code:
from jina import Flow
f = Flow().add(uses='jinahub://NumpyPostgresSearcher')
or in the .yml config:
jtype: Flow
pods:
  - name: indexer
    uses: 'jinahub://NumpyPostgresSearcher'
- Install the executor-indexers package:
  pip install git+https://github.com/jina-ai/executor-indexers/
- Use executor-indexers in your code:
  from jina import Flow
  from jinahub.indexers.searcher import NumpyPostgresSearcher
  f = Flow().add(uses=NumpyPostgresSearcher)
- Clone the repo and build the Docker image:
  git clone https://github.com/jina-ai/executor-indexers/
  cd executor-indexers/jinahub/indexers/searcher/compound/NumpyPostgresSearcher
  docker build -t numpy-psql-image .
- Use numpy-psql-image in your code:
  from jina import Flow
  f = Flow().add(uses='docker://numpy-psql-image:latest')
from jina import Flow, Document
f = Flow().add(uses='jinahub+docker://NumpyPostgresSearcher')
with f:
    resp = f.post(on='/search', inputs=Document(), return_results=True)
    print(f'{resp}')
The input is a Document whose .embedding has the same shape as the embeddings of the Documents stored in the NumpySearcher. The ids of the Documents stored in the NumpySearcher need to exist in the PostgresSearcher. Otherwise you will not get back the original metadata.
The NumpySearcher attaches matches to the Documents sent as inputs, with the id of the match, and its embedding. Then, the PostgresSearcher retrieves the full metadata (original text or image blob) and attaches those to the Document. You receive back the full Document.
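The two-stage flow described above can be illustrated with a small standalone sketch. It is not the Executor's implementation: a plain dict stands in for the embeddings held by NumpySearcher, another dict stands in for the PostgreSQL table, cosine similarity is an assumed metric, and the names `cosine_similarity` and `search` are hypothetical. It shows why the embedding shapes must agree and what happens when an id is missing from the metadata store.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; raises if their shapes differ."""
    if len(a) != len(b):
        raise ValueError("query and stored embeddings must have the same shape")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],  # stand-in for the vector index
    metadata: Dict[str, dict],            # stand-in for the PostgreSQL table
    top_k: int = 2,
) -> List[Tuple[str, float, Optional[dict]]]:
    # Stage 1: similarity search yields (id, score) pairs, best first.
    scored = sorted(
        ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )[:top_k]
    # Stage 2: look up metadata by id; None marks an id missing from the store.
    return [(doc_id, score, metadata.get(doc_id)) for doc_id, score in scored]


if __name__ == "__main__":
    vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    metadata = {"a": {"text": "apple"}, "c": {"text": "cherry"}}
    for match in search([1.0, 0.0], vectors, metadata, top_k=3):
        print(match)
```

In this sketch the id "b" has a stored vector but no metadata row, so its match comes back with `None` in place of the original content, which mirrors the missing-id case described above.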