-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add HuggingFace Hub Embeddings #125
Conversation
Main design q is whether this should be consolidated with HuggingFaceEmbeddings, could be confusing to have 2 separate huggingface embedings classes. Will add unit tests and docs once that's decided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im pretty fine with them being separate
client = InferenceApi( | ||
repo_id=repo_id, | ||
token=huggingfacehub_api_token, | ||
task="feature-extraction", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets make this adjustable as well
def test_huggingfacehub_embedding_documents() -> None: | ||
"""Test huggingfacehub embeddings.""" | ||
documents = ["foo bar"] | ||
embedding = HuggingFaceHubEmbeddings(task="feature-extraction") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hwchase17 dunno how we feel about default model being one where the default task isn't valid (it's "sentence-similarity"). Can find a different one if that's preferred, was just matching the existing HuggingFace embeddings class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also seems like decent amount of feature-extraction models have 2D outputs — should we automatically flatten? or make the embed_
signatures more permissive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good questions. i thik the reason for the weirdness is the HF embeddings use the sentence transformers package, but this can in theory use any
I think we'll also potentially want some different handling for whether the model is sentence transformer or not? im not super sure but how about this:
-
default the task to feature-extraction (since its the only supported one, but we still let people change it if they really want to)
-
check that repo_id starts with sentence_transformer - a bit restrictive but we can worry about extending it later? start simple and tight
def test_huggingfacehub_embedding_documents() -> None: | ||
"""Test huggingfacehub embeddings.""" | ||
documents = ["foo bar"] | ||
embedding = HuggingFaceHubEmbeddings(task="feature-extraction") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good questions. i thik the reason for the weirdness is the HF embeddings use the sentence transformers package, but this can in theory use any
I think we'll also potentially want some different handling for whether the model is sentence transformer or not? im not super sure but how about this:
-
default the task to feature-extraction (since its the only supported one, but we still let people change it if they really want to)
-
check that repo_id starts with sentence_transformer - a bit restrictive but we can worry about extending it later? start simple and tight
Main design q is whether this should be consolidated with HuggingFaceEmbeddings, could be confusing to have 2 separate huggingface embedings classes. Will add unit tests and docs once that's decided.
Closes #86