Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HuggingFace Hub Embeddings #125

Merged
merged 10 commits into from
Nov 27, 2022
Merged

Conversation

baskaryan
Copy link
Collaborator

Main design q is whether this should be consolidated with HuggingFaceEmbeddings, could be confusing to have 2 separate huggingface embedings classes. Will add unit tests and docs once that's decided.

Closes #86

Main design q is whether this should be consolidated with
HuggingFaceEmbeddings, could be confusing to have 2 separate huggingface
embedings classes. Will add unit tests and docs once that's decided.
Copy link
Contributor

@hwchase17 hwchase17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im pretty fine with them being separate

langchain/embeddings/huggingface.py Outdated Show resolved Hide resolved
@baskaryan baskaryan changed the title WIP: Add HuggingFace Hub Embeddings Add HuggingFace Hub Embeddings Nov 20, 2022
langchain/embeddings/huggingface_hub.py Outdated Show resolved Hide resolved
langchain/embeddings/huggingface_hub.py Outdated Show resolved Hide resolved
client = InferenceApi(
repo_id=repo_id,
token=huggingfacehub_api_token,
task="feature-extraction",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make this adjustable as well

def test_huggingfacehub_embedding_documents() -> None:
"""Test huggingfacehub embeddings."""
documents = ["foo bar"]
embedding = HuggingFaceHubEmbeddings(task="feature-extraction")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hwchase17 dunno how we feel about default model being one where the default task isn't valid (it's "sentence-similarity"). Can find a different one if that's preferred, was just matching the existing HuggingFace embeddings class

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also seems like decent amount of feature-extraction models have 2D outputs — should we automatically flatten? or make the embed_ signatures more permissive?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good questions. i thik the reason for the weirdness is the HF embeddings use the sentence transformers package, but this can in theory use any

I think we'll also potentially want some different handling for whether the model is sentence transformer or not? im not super sure but how about this:

  1. default the task to feature-extraction (since its the only supported one, but we still let people change it if they really want to)

  2. check that repo_id starts with sentence_transformer - a bit restrictive but we can worry about extending it later? start simple and tight

langchain/embeddings/huggingface_hub.py Outdated Show resolved Hide resolved
def test_huggingfacehub_embedding_documents() -> None:
"""Test huggingfacehub embeddings."""
documents = ["foo bar"]
embedding = HuggingFaceHubEmbeddings(task="feature-extraction")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good questions. i thik the reason for the weirdness is the HF embeddings use the sentence transformers package, but this can in theory use any

I think we'll also potentially want some different handling for whether the model is sentence transformer or not? im not super sure but how about this:

  1. default the task to feature-extraction (since its the only supported one, but we still let people change it if they really want to)

  2. check that repo_id starts with sentence_transformer - a bit restrictive but we can worry about extending it later? start simple and tight

@baskaryan baskaryan merged commit b90e25f into master Nov 27, 2022
@baskaryan baskaryan deleted the bagatur/add_hfhub_embeddings branch November 27, 2022 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add HuggingFace Hub embeddings support
2 participants