jooni22/python_embeddings

Docker Compose Configuration:

  • Services:
    • embedding-service:

      • Build from dockerfile-embedding.
      • Exposes port 6000.
      • Mounts embedding-service.py.
      • Restarts on failure up to 3 attempts.
    • splade-doc-service:

      • Build from dockerfile-splade.
      • Exposes port 4000.
      • Mounts splade-doc-service.py.
    • splade-query-service:

      • Shares build context and Dockerfile with splade-doc service.
      • Exposes port 5000.
    • reranking-service:

      • Build from dockerfile-reranking.
      • Exposes port 8000.
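As a quick reference, the port layout above can be captured in a small lookup table. This is only a sketch: the port numbers come from the list above, while the names `SERVICE_PORTS` and `service_url` and the default host are illustrative assumptions, not part of the repository.

```python
# Ports are taken from the service list above; all names here are hypothetical.
SERVICE_PORTS = {
    "embedding-service": 6000,
    "splade-doc-service": 4000,
    "splade-query-service": 5000,
    "reranking-service": 8000,
}


def service_url(service: str, path: str, host: str = "127.0.0.1") -> str:
    """Build the URL of an endpoint on one of the services."""
    port = SERVICE_PORTS[service]  # raises KeyError for an unknown service
    return f"http://{host}:{port}/{path.lstrip('/')}"


print(service_url("embedding-service", "/embeddings"))
```

Keeping the mapping in one place makes it easier to point a client at a different host without editing every call site.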

Python Code Overview:

  • API Services (using FastAPI): Each service handles one text-processing or model-inference task: embedding generation, reranking based on embeddings, or sparse vector extraction with SPLADE models.

Key Features Across Services:

  • NVIDIA CUDA is used for computation (CUDA_VISIBLE_DEVICES=0).
  • Each service runs on the host IP and exposes its own port.
  • Models run on PyTorch, using the pytorch/pytorch:2.3.1-cuda11.8-cudnn8-runtime image.
  • Required Python packages are installed via pip, including transformers, which is heavily used in NLP tasks.

Specific API Endpoints Implemented in FastAPI:

  1. Embedding Service: Generates embeddings using pre-trained sentence transformers models like 'baai/bge-m3', 'jinaai/jina-embeddings-v2-base-en' and 'mixedbread-ai/mxbai-embed-large-v1'.
  2. Rerank Service: Uses cosine similarity to rerank given texts based on their relevance to a query string using embeddings generated by the model 'mixedbread-ai/mxbai-rerank-xsmall-v1'.
  3. Sparse Embedding Extraction: For both the document (SPLADE doc) and query (SPLADE query) services, extracts sparse vectors in which each important token is weighted by its contribution to the document or query representation.
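The two ideas mentioned above, cosine-similarity reranking and sparse token weights, can be illustrated with toy vectors. This is a minimal sketch under stated assumptions, not the code of the services themselves: the vectors stand in for model output, and the function names `cosine_similarity`, `rerank` and `to_sparse` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rerank(query_vec, text_vecs):
    """Return (index, score) pairs, most similar text first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(text_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def to_sparse(weights, vocab):
    """Keep only positive weights, as a token -> weight mapping."""
    return {vocab[i]: w for i, w in enumerate(weights) if w > 0.0}


# Toy data standing in for real embeddings and vocabulary weights.
print(rerank([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]]))
print(to_sparse([0.0, 1.5, 0.0], ["deep", "learning", "is"]))
```

In the real services the vectors would come from the models named above; only the ranking and sparsification steps are shown here.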

Deployment Considerations:

Each service runs in its own container, so components can be scaled independently, and the restart policies set in the Docker Compose file help the stack recover from failures.

Together, these components form a microservices architecture built on Docker, FastAPI, PyTorch, and the Transformers library, exposing NLP models through an API reachable over standard HTTP methods.

How to run:

You can run via:

docker compose up -d

or without Docker:

tail -f logs.txt & python3 embedding-service.py >> logs.txt & python3 reranking-service.py >> logs.txt & python3 splade-doc-service.py >> logs.txt & python3 splade-query-service.py >> logs.txt

How to call API:

EMBEDDING-SERVICE:

Multiple models are available only in embedding-service. You can call its API in several ways:

curl -s http://127.0.0.1:6000/embeddings -X POST -H "Content-Type: application/json" -d '{"input": "What is Deep Learning?", "model": "jina-embeddings-v2-base-en"}'

If your application appends an "api-version" query parameter to the endpoint, embedding-service still returns the result correctly (the value is ignored):

curl -s 'http://127.0.0.1:6000/embeddings?api-version=2023-05-15' -X POST -H "Content-Type: application/json" -d '{"input": "What is Deep Learning?", "model": "jina-embeddings-v2-base-en"}'

Embeddings are also returned correctly if you add the organisation name before the model name:

curl -s http://127.0.0.1:6000/embeddings -X POST -H "Content-Type: application/json" -d '{"input": "What is Deep Learning?", "model": "jinaai/jina-embeddings-v2-base-en"}'
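The same calls can be made from Python with the standard library. The sketch below mirrors the curl commands above; the function names `build_embedding_request` and `fetch_embeddings` are hypothetical, and since the response schema is not documented here, the parsed JSON is returned as-is.

```python
import json
import urllib.request


def build_embedding_request(text, model, base_url="http://127.0.0.1:6000", api_version=None):
    """Build a POST request equivalent to the curl examples above."""
    url = f"{base_url}/embeddings"
    if api_version is not None:
        # Optional query parameter; the service is described as ignoring its value.
        url += f"?api-version={api_version}"
    body = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embeddings(text, model, **kwargs):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_embedding_request(text, model, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires embedding-service to be running locally on port 6000.
    print(fetch_embeddings("What is Deep Learning?", "jinaai/jina-embeddings-v2-base-en"))
```

Separating request construction from sending keeps the payload easy to inspect without a running service.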

RERANKING-SERVICE:

curl -s http://127.0.0.1:8000/rerank -X POST -H 'Content-Type: application/json' -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."], "truncate": true}'
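A Python counterpart to the rerank call might look like the following sketch. The payload fields (`query`, `texts`, `truncate`) are taken from the curl command above; the function name `build_rerank_request` and the empty-list check are assumptions added for illustration.

```python
import json
import urllib.request


def build_rerank_request(query, texts, truncate=True, base_url="http://127.0.0.1:8000"):
    """Build a POST request for the /rerank endpoint."""
    texts = list(texts)
    if not texts:
        # Client-side guard (an assumption): nothing to rank without candidates.
        raise ValueError("texts must contain at least one candidate")
    payload = {"query": query, "texts": texts, "truncate": truncate}
    return urllib.request.Request(
        f"{base_url}/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires reranking-service to be running locally on port 8000.
    request = build_rerank_request(
        "What is Deep Learning?",
        ["Deep Learning is not...", "Deep learning is..."],
    )
    with urllib.request.urlopen(request) as response:
        print(json.load(response))
```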

SPLADE-DOC-SERVICE:

curl -s http://127.0.0.1:4000/embed_sparse -X POST -H "Content-Type: application/json" -d '{"inputs": "What is Deep Learning?"}'

SPLADE-QUERY-SERVICE:

curl -s http://127.0.0.1:5000/embed_sparse -X POST -H "Content-Type: application/json" -d '{"inputs": "What is Deep Learning?"}'
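Both SPLADE services share the `/embed_sparse` path and differ only in port, so a single helper can target either one. This is a sketch: the ports and the `inputs` field come from the curl commands above, while `SPLADE_PORTS`, `build_splade_request` and the `kind` argument are hypothetical names.

```python
import json
import urllib.request

# Ports as listed above: 4000 for the document service, 5000 for the query service.
SPLADE_PORTS = {"doc": 4000, "query": 5000}


def build_splade_request(text, kind, host="127.0.0.1"):
    """Build a POST request for the doc or query SPLADE service."""
    if kind not in SPLADE_PORTS:
        raise ValueError(f"kind must be one of {sorted(SPLADE_PORTS)}")
    url = f"http://{host}:{SPLADE_PORTS[kind]}/embed_sparse"
    body = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires splade-query-service to be running locally on port 5000.
    with urllib.request.urlopen(build_splade_request("What is Deep Learning?", "query")) as response:
        print(json.load(response))
```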
