feat: add embed_raw route to get all embeddings without pooling #154
Conversation
This is exactly what I needed.

**Slight Discrepancy in Results**

Upon closer inspection, I noticed a slight deviation in the results compared to those of the `transformers` library. Further investigation revealed disparities between TEI's raw output and the `last_hidden_state` produced by `transformers`. Running the following script:

```python
from transformers import AutoModel, AutoTokenizer
import requests
import torch
from torch.testing import assert_close

sentence = "This is a test sentence"

# Reference: last hidden state straight from transformers.
raw_model = AutoModel.from_pretrained('BAAI/bge-m3')
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-m3')
tokenized = tokenizer(sentence, return_tensors="pt")
transformers_result = raw_model(**tokenized).last_hidden_state

# Same sentence through TEI's raw-embeddings route.
tei_result = torch.tensor(
    requests.post('http://localhost:8080/embed_raw', json={"inputs": sentence}).json()
)

assert_close(tei_result, transformers_result)
```

yielded an assertion failure from `assert_close`.
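Since `assert_close` fails on any mismatch beyond its defaults, a quick way to see how large the deviation actually is would be something like the following sketch (the tolerance values here are arbitrary illustrations, not taken from the report):

```python
# Sketch: quantify the deviation instead of failing on tight defaults.
print((tei_result - transformers_result).abs().max())

# Compare again with explicitly loosened (illustrative) tolerances.
assert_close(tei_result, transformers_result, rtol=1e-3, atol=1e-3)
```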
While I understand that the outputs might not perfectly align within Torch's default tolerances, the deviation still seemed worth reporting. The TEI container I used was built and started with the following commands:
**Issue with Loading Token-Classification and QA Models**

I attempted to load various BERT-based models for token-classification and question-answering tasks but ran into initial setbacks. It would be greatly beneficial if TEI could load these models: their architectures are fundamentally the same, with the bulk of the computation happening in the encoder, so serving them would require just a single additional layer on the client's side (a sketch follows below). While I don't expect TEI to natively support these tasks, being able to load such checkpoints as "generic" BERT models would suffice.

**Token-Classification**

I tried to run wikineural-multilingual-ner, which resulted in the following error message:
**QA Model**

I also attempted to run bert-large-uncased-whole-word-masking-finetuned-squad, which failed to start because no pooler is defined in its configuration. After manually defining a pooler, the model loaded and performed as expected, but having to define a pooler for a QA model seems somewhat counterintuitive.
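To illustrate the "single additional layer on the client's side" idea, here is a hedged sketch. It assumes the per-token embeddings route from the script above (later renamed, see below), a hypothetical file `classifier.pt` holding the exported head weights of the fine-tuned checkpoint, and illustrative dimensions for a BERT-base NER model; none of this is TEI's documented API.

```python
import requests
import torch

# Hypothetical client-side token-classification head on top of TEI's
# per-token embeddings. "classifier.pt" is assumed to hold the exported
# Linear-head weights; 768/9 are illustrative dimensions for BERT-base NER.
HIDDEN_SIZE, NUM_LABELS = 768, 9

head = torch.nn.Linear(HIDDEN_SIZE, NUM_LABELS)
head.load_state_dict(torch.load("classifier.pt"))  # hypothetical exported weights
head.eval()

# Fetch all token embeddings for one input (route name as in the script above).
resp = requests.post(
    "http://localhost:8080/embed_raw",
    json={"inputs": "Albert Einstein was born in Ulm."},
)
token_embeddings = torch.tensor(resp.json()).squeeze()  # assumed (tokens, hidden)

with torch.no_grad():
    logits = head(token_embeddings)
label_ids = logits.argmax(dim=-1)  # one label id per token
```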
#155 adds classification support for BERT.
The route was renamed to `embed_all`.
@LLukas22, is this enough for your use case?
It adds an `embed_all` route that returns all token embeddings for a given request.
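For completeness, a minimal call against the renamed route might look like the sketch below, assuming a local TEI instance on port 8080 as in the script above; the exact response shape is not spelled out here, so inspect what comes back.

```python
import requests

# Ask TEI for all token embeddings (no pooling) via the renamed route.
resp = requests.post(
    "http://localhost:8080/embed_all",
    json={"inputs": "This is a test sentence"},
)
resp.raise_for_status()
embeddings = resp.json()

# Assumed: one embedding vector per input token; verify the nesting depth.
print(type(embeddings), len(embeddings))
```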