
feat: add embed_raw route to get all embeddings without pooling #154

Merged
merged 8 commits into main from feat/raw_embeddings on Feb 12, 2024

Conversation

@OlivierDehaene (Member) commented Feb 9, 2024

@LLukas22, is this enough for your use case?
It adds an embed_all route that returns all token embeddings for a given request.
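For reference, a minimal sketch of calling the new route, assuming a TEI instance on localhost:8080 (the URL and the response handling here are illustrative, not a spec):

import requests

# One request, one input sentence; the route returns per-token embeddings
# with no pooling applied.
response = requests.post(
    "http://localhost:8080/embed_all",
    json={"inputs": "This is a test sentence"},
)
token_embeddings = response.json()
print(len(token_embeddings))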

@LLukas22 commented
@OlivierDehaene

This is exactly what I needed to get BAAI/bge-m3 running.
I'm currently away from my workstation, but I managed to test it on my personal machine, and it seems to work flawlessly. Thanks for adding this. Here are a few observations I made while experimenting with this PR:

Slight Discrepancy in Results

Upon closer inspection, I noticed a slight deviation in the results compared to those from FlagEmbedding. It isn't huge, but similarity scores differ by around 0.9%, which seems notable.

Further investigation revealed disparities between TEI's raw output and the output from transformers that are larger than what I would expect when running the model in float32 precision.

Running the following script:

from transformers import AutoModel, AutoTokenizer
import requests
import torch
from torch.testing import assert_close

sentence = "This is a test sentence"

raw_model = AutoModel.from_pretrained('BAAI/bge-m3')
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-m3')

# Reference: per-token hidden states straight from transformers.
tokenized = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    transformers_result = raw_model(**tokenized).last_hidden_state

# Same sentence through TEI's raw-embeddings route.
tei_result = torch.tensor(requests.post('http://localhost:8080/embed_raw', json={"inputs": sentence}).json())

assert_close(tei_result, transformers_result)

yielded the following output:

Mismatched elements: 7150 / 7168 (99.7%)
Greatest absolute difference: 0.026642441749572754 at index (0, 6, 297) (up to 1e-05 allowed)
Greatest relative difference: 11.229803085327148 at index (0, 1, 112) (up to 1.3e-06 allowed)

While I understand that the outputs might not perfectly align within assert_close's default thresholds, an absolute difference of 0.0266 seems fairly large.
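To put that number in retrieval terms, the script above can be extended with a quick cosine-similarity check (this snippet is my addition and reuses the tei_result and transformers_result tensors from the script; it is a rough gauge, not part of the original test):

import torch.nn.functional as F

# Cosine similarity between the flattened outputs: a value noticeably below
# 1.0 would indicate the discrepancy is large enough to move similarity scores.
similarity = F.cosine_similarity(tei_result.flatten(), transformers_result.flatten(), dim=0)
print(f"cosine similarity: {similarity.item():.6f}")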

The TEI container I used was built and started with the following commands:

docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=86 -t tei 
docker run --gpus all -p 8080:80 -v ./data:/data tei --model-id BAAI/bge-m3 --dtype float32

Issue with Loading Token-Classification and QA Models

I attempted to load various BERT-based models for token-classification and question-answering tasks, but ran into problems. It would be very useful if TEI could load these models: architecturally they are essentially the same encoder, the bulk of the computation happens there, and only a single additional layer would be needed on the client's side (see the sketch below). I don't expect TEI to natively support these specific tasks, but being able to load them as "generic" BERT models would suffice.
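To illustrate the "single additional layer" point, here is a rough sketch of applying a token-classification head client-side on top of per-token embeddings from TEI. The endpoint, the assumed (1, seq_len, hidden) response shape, and the idea of borrowing the head from the Hugging Face checkpoint are my assumptions, not something this PR provides:

import requests
import torch
from transformers import AutoModelForTokenClassification

model_id = "Babelscape/wikineural-multilingual-ner"  # BERT-based NER model
sentence = "This is a test sentence"

# Load the checkpoint locally just to borrow its classification head;
# for BERT-style models this is a single nn.Linear layer.
full_model = AutoModelForTokenClassification.from_pretrained(model_id)
classifier = full_model.classifier

# Per-token embeddings served by TEI; assumed shape (1, seq_len, hidden).
hidden_states = torch.tensor(
    requests.post("http://localhost:8080/embed_all", json={"inputs": sentence}).json()
)

with torch.no_grad():
    logits = classifier(hidden_states)
predicted = [full_model.config.id2label[i] for i in logits.argmax(-1)[0].tolist()]
print(predicted)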

Token-Classification

I tried to run wikineural-multilingual-ner, which resulted in the following error message:

Error: Could not create backend

Caused by:
    Could not start backend: `classifier` model type is not supported for Bert

QA Model

I also attempted to run bert-large-uncased-whole-word-masking-finetuned-squad, which failed to start because no pooler is defined in its configuration. After manually defining a pooler, the model loaded and performed as expected, but having to define a pooler for a QA model seems somewhat counterintuitive.
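For anyone hitting the same error: I believe TEI picks up the sentence-transformers pooling convention, so "manually defining a pooler" can be done by adding a 1_Pooling/config.json next to the model weights. The exact keys and values below are my assumption (CLS pooling, 1024 hidden dimensions for bert-large), not something verified in this PR:

{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false
}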

@OlivierDehaene (Member, Author) commented

#155 adds classification support for BERT.

@OlivierDehaene (Member, Author) commented

The route was renamed to embed_all.

@OlivierDehaene merged commit a059696 into main on Feb 12, 2024 (6 checks passed)
@OlivierDehaene deleted the feat/raw_embeddings branch on February 12, 2024 at 10:56