GitHub - jooni22/python_embeddings: The services include embedding generation, reranking based on embeddings, and sparse vector extraction for SPLADE models.

Docker Compose Configuration:

Services:
- embedding-service:
  - Build from dockerfile-embedding.
  - Exposes port 6000.
  - Mounts embedding-service.py.
  - Restarts on failure up to 3 attempts.
- splade-doc-service:
  - Build from dockerfile-splade.
  - Exposes port 4000.
  - Mounts splade-doc-service.py.
- splade-query-service:
  - Shares build context and Dockerfile with splade-doc-service.
  - Exposes port 5000.
- reranking-service:
  - Build from dockerfile-reranking.
  - Exposes port 8000.

Python Code Overview:

API Services (using FastAPI): Each service corresponds to a different aspect of text processing or machine learning model inference. The services include embedding generation, reranking based on embeddings, and sparse vector extraction for SPLADE models.

Key Features Across Services:

GPU computation through NVIDIA CUDA (CUDA_VISIBLE_DEVICES=0).
Each service runs on the host IP with its own exposed port.
Docker images based on pytorch/pytorch:2.3.1-cuda11.8-cudnn8-runtime, with PyTorch used for neural network operations.
Required Python packages are installed via pip, including transformers, which is widely used for NLP tasks.
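
As a rough illustration of the GPU setup above, a service script might pin the visible device and fall back to CPU when CUDA is not available. The helper name `pick_device` is hypothetical and is not taken from the repository; the actual services may configure the device differently.

```python
import os

# Assumption: mirror the CUDA_VISIBLE_DEVICES=0 setting unless the
# environment already provides a value.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")


def pick_device():
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu" (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only CPU execution is possible.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
```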

Specific API Endpoints Implemented in FastAPI:

Embedding Service: Generates embeddings using pre-trained sentence-transformers models such as 'baai/bge-m3', 'jinaai/jina-embeddings-v2-base-en' and 'mixedbread-ai/mxbai-embed-large-v1'.
Rerank Service: Reranks the given texts by their relevance to a query string, using cosine similarity over embeddings generated by the model 'mixedbread-ai/mxbai-rerank-xsmall-v1'.
Sparse Embedding Extraction: Extracts sparse vectors for both documents (SPLADE doc) and queries (SPLADE query). Each vector lists the important tokens, weighted by their contribution to the document or query representation.
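
To make the idea of a sparse vector concrete, here is a minimal sketch of the commonly described SPLADE weighting, where each vocabulary term gets the maximum over input tokens of log(1 + ReLU(logit)). The function name, the toy vocabulary, and the logits are made up for illustration; the repository's SPLADE services may compute their output differently.

```python
import math


def splade_weights(token_logits, vocab):
    """Map per-token vocabulary logits to a sparse {term: weight} dict.

    token_logits: one row of vocabulary logits per input token (toy data here).
    """
    weights = {}
    for j, term in enumerate(vocab):
        # SPLADE-style activation: log(1 + ReLU(logit)), max-pooled over tokens.
        best = max(math.log1p(max(row[j], 0.0)) for row in token_logits)
        if best > 0.0:
            # Only terms with a positive weight are kept, which makes the vector sparse.
            weights[term] = best
    return weights


# Hypothetical three-term vocabulary and logits for a two-token input.
vocab = ["deep", "learning", "cat"]
logits = [[2.0, -1.0, -3.0], [0.5, 1.0, -0.2]]
sparse = splade_weights(logits, vocab)
```

In this toy input, "cat" never receives a positive logit, so it is dropped from the result.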

Deployment Considerations:

Each service runs in its own container, so components can be scaled independently, and the restart policies set in the Docker Compose file help them recover from failures.

Together, the services form a microservices architecture built on Docker, FastAPI, PyTorch, and the Transformers library, exposing NLP model inference over standard HTTP methods.

How to run:

You can run via:

docker compose up -d

or without Docker:

tail -f logs.txt & python3 embedding-service.py >> logs.txt & python3 reranking-service.py >> logs.txt & python3 splade-doc-service.py >> logs.txt & python3 splade-query-service.py >> logs.txt
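
For readers who prefer to start the four scripts from Python rather than a shell one-liner, the following is a sketch of an equivalent launcher. The names `build_commands` and `launch_all` are hypothetical; unlike the shell command above, this version also sends stderr to the log file.

```python
import subprocess
import sys

# Script names as they appear in the repository.
SERVICE_SCRIPTS = [
    "embedding-service.py",
    "reranking-service.py",
    "splade-doc-service.py",
    "splade-query-service.py",
]


def build_commands(python=sys.executable):
    """Return one command line per service script."""
    return [[python, script] for script in SERVICE_SCRIPTS]


def launch_all(log_path="logs.txt"):
    """Start every service in the background, appending output to log_path."""
    log = open(log_path, "a")
    return [
        subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        for cmd in build_commands()
    ]
```

Calling `launch_all()` from the repository root returns the list of `Popen` handles, which can later be used to wait for or terminate the services.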

How to call API:

EMBEDDING-SERVICE:

Multiple models are available only in embedding-service. You can call the embedding-service API in several different ways:

curl -s http://127.0.0.1:6000/embeddings -X POST -H "Content-Type: application/json" -d '{"input": "What is Deep Learning?", "model": "jina-embeddings-v2-base-en"}'
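
The same request can be sent from Python using only the standard library. This is a sketch that assumes the service is reachable at the address used in the curl example; the function names are hypothetical, and the shape of the JSON response is not documented here, so it is returned unparsed beyond JSON decoding.

```python
import json
import urllib.request


def build_embedding_request(text, model, base_url="http://127.0.0.1:6000"):
    """Build the POST request that mirrors the curl call above."""
    payload = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text, model):
    """Send the request; requires embedding-service to be running locally."""
    with urllib.request.urlopen(build_embedding_request(text, model)) as resp:
        return json.load(resp)


req = build_embedding_request("What is Deep Learning?", "jina-embeddings-v2-base-en")
```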

If your application appends "api-version" to the endpoint, embedding-service still returns the result correctly (the value is ignored):

curl -s "http://127.0.0.1:6000/embeddings?api-version=2023-05-15" -X POST -H "Content-Type: application/json" -d '{"input": "What is Deep Learning?", "model": "jina-embeddings-v2-base-en"}'

Also, embeddings will be returned correctly if you add the organisation name before the model name.

curl -s http://127.0.0.1:6000/embeddings -X POST -H "Content-Type: application/json" -d '{"input": "What is Deep Learning?", "model": "jinaai/jina-embeddings-v2-base-en"}'
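
One way a service could accept both forms of the model name, and ignore an "api-version" query parameter, is sketched below. Both helpers are hypothetical illustrations of the behaviour described above, not code from embedding-service.py.

```python
from urllib.parse import urlsplit


def normalize_model_name(model):
    """Drop an optional "organisation/" prefix from a model name."""
    return model.rsplit("/", 1)[-1]


def endpoint_path(url):
    """Return only the path of a URL, ignoring any query string such as api-version."""
    return urlsplit(url).path


short_name = normalize_model_name("jinaai/jina-embeddings-v2-base-en")
path = endpoint_path("http://127.0.0.1:6000/embeddings?api-version=2023-05-15")
```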

RERANKING-SERVICE:

curl -s http://127.0.0.1:8000/rerank -X POST -H 'Content-Type: application/json' -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."], "truncate": true}'
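
The cosine-similarity ranking step mentioned for the rerank service can be sketched in plain Python as follows. The vectors here are toy values standing in for model embeddings, and the function names are hypothetical; the real service obtains its vectors from the model and may differ in detail.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec, text_vecs):
    """Return (index, score) pairs sorted from most to least similar to the query."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(text_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings: one query and three candidate texts.
query_vec = [1.0, 0.0]
text_vecs = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
ranking = rerank(query_vec, text_vecs)
```

The third text points in the same direction as the query, so it ranks first regardless of its larger magnitude.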

SPLADE-DOC-SERVICE:

curl -s http://127.0.0.1:4000/embed_sparse -X POST -H "Content-Type: application/json" -d '{"inputs": "What is Deep Learning?"}'

SPLADE-QUERY-SERVICE:

curl -s http://127.0.0.1:5000/embed_sparse -X POST -H "Content-Type: application/json" -d '{"inputs": "What is Deep Learning?"}'

