
feat: add embed_raw route to get all embeddings without pooling #154

Merged
merged 8 commits into main from feat/raw_embeddings on Feb 12, 2024

Conversation

@OlivierDehaene (Member) commented Feb 9, 2024

@LLukas22, is this enough for your use case?
It adds an embed_all route that returns all token embeddings for a given request.
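For reference, a minimal sketch of calling the new route, assuming a TEI instance on localhost:8080 (the URL and the response handling here are illustrative, not a spec):

import requests

# One request, one input sentence; the route returns per-token embeddings
# with no pooling applied.
response = requests.post(
    "http://localhost:8080/embed_all",
    json={"inputs": "This is a test sentence"},
)
token_embeddings = response.json()
print(len(token_embeddings))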

@LLukas22 commented
@OlivierDehaene

This is exactly what I needed to get BAAI/bge-m3 running.
I'm currently away from my workstation, but I managed to test it on my personal machine, and it seems to work flawlessly. Thanks for adding this. Here are a few observations I made while experimenting with this PR:

Slight Discrepancy in Results

Upon closer inspection, I noticed a slight deviation in the results compared to those from FlagEmbedding. It isn't huge, but similarity scores differ by around 0.9%, which seems notable.

Further investigation revealed disparities between TEI's raw output and the output from transformers that are larger than what I would expect when running the model in float32 precision.

Running the following script:

from transformers import AutoModel, AutoTokenizer
import requests
import torch
from torch.testing import assert_close

sentence = "This is a test sentence"

raw_model = AutoModel.from_pretrained('BAAI/bge-m3')
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-m3')

# Reference: per-token hidden states straight from transformers.
tokenized = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    transformers_result = raw_model(**tokenized).last_hidden_state

# Same sentence through TEI's raw-embeddings route.
tei_result = torch.tensor(requests.post('http://localhost:8080/embed_raw', json={"inputs": sentence}).json())

assert_close(tei_result, transformers_result)

yielded the following output:

Mismatched elements: 7150 / 7168 (99.7%)
Greatest absolute difference: 0.026642441749572754 at index (0, 6, 297) (up to 1e-05 allowed)
Greatest relative difference: 11.229803085327148 at index (0, 1, 112) (up to 1.3e-06 allowed)

While I understand that the outputs might not perfectly align within assert_close's default thresholds, an absolute difference of 0.0266 seems fairly large.
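To put that number in retrieval terms, the script above can be extended with a quick cosine-similarity check (this snippet is my addition and reuses the tei_result and transformers_result tensors from the script; it is a rough gauge, not part of the original test):

import torch.nn.functional as F

# Cosine similarity between the flattened outputs: a value noticeably below
# 1.0 would indicate the discrepancy is large enough to move similarity scores.
similarity = F.cosine_similarity(tei_result.flatten(), transformers_result.flatten(), dim=0)
print(f"cosine similarity: {similarity.item():.6f}")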

The TEI container I used was built and started with the following commands:

docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=86 -t tei 
docker run --gpus all -p 8080:80 -v ./data:/data tei --model-id BAAI/bge-m3 --dtype float32

Issue with Loading Token-Classification and QA Models

I attempted to load various BERT-based models for token-classification and question-answering tasks, but ran into problems. It would be very useful if TEI could load these models: architecturally they are essentially the same encoder, the bulk of the computation happens there, and only a single additional layer would be needed on the client's side (see the sketch below). I don't expect TEI to natively support these specific tasks, but being able to load them as "generic" BERT models would suffice.
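To illustrate the "single additional layer" point, here is a rough sketch of applying a token-classification head client-side on top of per-token embeddings from TEI. The endpoint, the assumed (1, seq_len, hidden) response shape, and the idea of borrowing the head from the Hugging Face checkpoint are my assumptions, not something this PR provides:

import requests
import torch
from transformers import AutoModelForTokenClassification

model_id = "Babelscape/wikineural-multilingual-ner"  # BERT-based NER model
sentence = "This is a test sentence"

# Load the checkpoint locally just to borrow its classification head;
# for BERT-style models this is a single nn.Linear layer.
full_model = AutoModelForTokenClassification.from_pretrained(model_id)
classifier = full_model.classifier

# Per-token embeddings served by TEI; assumed shape (1, seq_len, hidden).
hidden_states = torch.tensor(
    requests.post("http://localhost:8080/embed_all", json={"inputs": sentence}).json()
)

with torch.no_grad():
    logits = classifier(hidden_states)
predicted = [full_model.config.id2label[i] for i in logits.argmax(-1)[0].tolist()]
print(predicted)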

Token-Classification

I tried to run wikineural-multilingual-ner, which resulted in the following error message:

Error: Could not create backend

Caused by:
    Could not start backend: `classifier` model type is not supported for Bert

QA Model

I also attempted to run bert-large-uncased-whole-word-masking-finetuned-squad, which failed to start because no pooler is defined in its configuration. After manually defining a pooler, the model loaded and performed as expected, but having to define a pooler for a QA model seems somewhat counterintuitive.
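For anyone hitting the same error: I believe TEI picks up the sentence-transformers pooling convention, so "manually defining a pooler" can be done by adding a 1_Pooling/config.json next to the model weights. The exact keys and values below are my assumption (CLS pooling, 1024 hidden dimensions for bert-large), not something verified in this PR:

{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false
}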

@OlivierDehaene (Member, Author) commented

#155 adds classification support for BERT.

@OlivierDehaene (Member, Author) commented

The route was renamed to embed_all.

@OlivierDehaene merged commit a059696 into main on Feb 12, 2024 (6 checks passed)
@OlivierDehaene deleted the feat/raw_embeddings branch on February 12, 2024 at 10:56