This repository has been archived by the owner on Jul 27, 2021. It is now read-only.

✨ NumpyPostgresSearcher

NumpyPostgresSearcher is a compound Searcher Executor for Jina, made up of NumpySearcher for performing similarity search on the embeddings, and of PostgresSearcher for retrieving the metadata of the Documents.

🌱 Prerequisites

  • This Executor works on Python 3.7 and 3.8.
  • Make sure to install the requirements.

Additionally, you will need a running PostgreSQL database. This can be a local instance, a Docker image, or a virtual machine in the cloud. Make sure you have the credentials and connection parameters.

You can start one in a Docker container, like so:

docker run -e POSTGRES_PASSWORD=123456 -p 127.0.0.1:5432:5432/tcp postgres:13.2
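
Before starting a Flow, it can help to confirm that the container is accepting connections on the mapped port. The following is a minimal sketch using only the Python standard library. The helper name `port_is_open` is hypothetical and not part of this Executor. It only checks TCP reachability on the host and port from the command above, not the PostgreSQL credentials.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Hypothetical helper: it does not log in to PostgreSQL, it only checks
    that something is listening on the given address.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open()` with the defaults checks `127.0.0.1:5432`, which matches the port mapping used in the `docker run` command.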

🚀 Usages

Check integration tests for an example on how to use it.

Loading data

Since this is a "Searcher"-type Executor, it does not index new data. Searchers are write-once classes that take a dump_path as their data source.

This can be provided in different ways:

  • in the YAML definition
jtype: NumpyPostgresSearcher
with:
    dump_path: /tmp/your_dump_location
...
  • from the Flow.rolling_update method. See README.

The folder needs to contain the data exported from your Indexer. Again, see README.
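
Because the Searcher reads its data from dump_path once, a quick sanity check on the folder before starting the Flow can save a confusing startup failure. This is a minimal sketch under the assumption that an empty or missing folder is never a valid dump. The helper name `dump_looks_usable` is hypothetical, and it does not validate the dump's internal layout, which is defined by the Indexer that exported it.

```python
from pathlib import Path


def dump_looks_usable(dump_path: str) -> bool:
    """Return True if dump_path is an existing directory with at least one entry.

    Hypothetical helper: it checks presence only, not the format of the
    exported data.
    """
    path = Path(dump_path)
    return path.is_dir() and any(path.iterdir())
```

For example, `dump_looks_usable('/tmp/your_dump_location')` could be called before building the Flow that uses the YAML definition above.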

🚚 Via JinaHub

Using Docker images

Use the prebuilt image from JinaHub in your Python code:

from jina import Flow

f = Flow().add(uses='jinahub+docker://NumpyPostgresSearcher')

or in the .yml config:

jtype: Flow
pods:
  - name: indexer
    uses: 'jinahub+docker://NumpyPostgresSearcher'

Using source code

Use the source code from JinaHub in your Python code:

from jina import Flow

f = Flow().add(uses='jinahub://NumpyPostgresSearcher')

or in the .yml config:

jtype: Flow
pods:
  - name: indexer
    uses: 'jinahub://NumpyPostgresSearcher'

📦️ Via PyPI

  1. Install the executor-indexers package.

    pip install git+https://github.com/jina-ai/executor-indexers/
  2. Use executor-indexers in your code:

    from jina import Flow
    from jinahub.indexers.searcher import NumpyPostgresSearcher
    
    f = Flow().add(uses=NumpyPostgresSearcher)

🐳 Via Docker

  1. Clone the repo and build the Docker image:

    git clone https://github.com/jina-ai/executor-indexers/
    cd executor-indexers/jinahub/indexers/searcher/compound/NumpyPostgresSearcher
    docker build -t numpy-psql-image .
  2. Use numpy-psql-image in your code:

    from jina import Flow
    
    f = Flow().add(uses='docker://numpy-psql-image:latest')

🎉️ Example

from jina import Flow, Document

f = Flow().add(uses='jinahub+docker://NumpyPostgresSearcher')

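with f:
    # In practice the query Document should carry an .embedding with the
    # same shape as the indexed Documents (see Inputs).
    resp = f.post(on='/search', inputs=Document(), return_results=True)
    print(resp)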

Inputs

A Document whose .embedding has the same shape as the embeddings of the Documents stored in the NumpySearcher. The ids of the Documents stored in the NumpySearcher must also exist in the PostgresSearcher; otherwise you will not get back the original metadata.

Returns

The NumpySearcher attaches matches to the Documents sent as inputs, each match carrying its id and embedding. Then, the PostgresSearcher retrieves the full metadata (original text or image blob) and attaches it to the Document. You receive back the full Document.
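
The two-stage flow described above can be illustrated with a small, self-contained sketch. This is a conceptual stand-in, not the Executor's actual code: plain dictionaries replace both the NumpySearcher index and the PostgreSQL table, Euclidean distance is an assumed metric, and the function names `nearest_ids` and `attach_metadata` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def nearest_ids(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Stage 1 (stand-in for NumpySearcher): rank stored ids by distance to the query."""
    scored = [
        (doc_id, math.sqrt(sum((a - b) ** 2 for a, b in zip(query, vec))))
        for doc_id, vec in vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


def attach_metadata(
    matches: List[Tuple[str, float]],
    metadata: Dict[str, dict],
) -> List[Dict[str, Optional[object]]]:
    """Stage 2 (stand-in for PostgresSearcher): look up metadata by match id."""
    return [
        # .get() yields None when an id has no stored metadata.
        {"id": doc_id, "distance": dist, "metadata": metadata.get(doc_id)}
        for doc_id, dist in matches
    ]


# Toy data: id "c" has an embedding but no metadata row.
vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
metadata = {"a": {"text": "first"}, "b": {"text": "second"}}

results = attach_metadata(nearest_ids([0.9, 0.0], vectors), metadata)
```

In this toy data, id `c` is present in the vector store but missing from the metadata store, so a query that matches it comes back with `metadata` set to `None`. That mirrors the requirement under Inputs that ids must exist in both stores.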