Add HuggingFace Hub Embeddings #125

baskaryan · 2022-11-12T22:30:26Z

The main design question is whether this should be consolidated with HuggingFaceEmbeddings; it could be confusing to have two separate HuggingFace embeddings classes. I'll add unit tests and docs once that's decided.

Closes #86

hwchase17

I'm pretty fine with them being separate.

langchain/embeddings/huggingface.py

langchain/embeddings/huggingface_hub.py

hwchase17 · 2022-11-23T19:11:59Z

langchain/embeddings/huggingface_hub.py

+            client = InferenceApi(
+                repo_id=repo_id,
+                token=huggingfacehub_api_token,
+                task="feature-extraction",


Let's make this adjustable as well.
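A minimal sketch of what "adjustable" could look like, assuming the hunk above is the only place the task is set. `HubClientConfig` and `client_kwargs` are hypothetical names for illustration and are not part of this PR; the keyword names `repo_id`, `token`, and `task` are the ones shown in the diff, and the repo id below is just an example value.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Default mirrors the value hard-coded in the diff above.
DEFAULT_TASK = "feature-extraction"


@dataclass
class HubClientConfig:
    """Hypothetical holder for the arguments passed to the Hub client."""

    repo_id: str
    huggingfacehub_api_token: Optional[str] = None
    task: str = DEFAULT_TASK  # adjustable instead of hard-coded

    def client_kwargs(self) -> Dict[str, Any]:
        """Return keyword arguments matching the call shown in the diff."""
        return {
            "repo_id": self.repo_id,
            "token": self.huggingfacehub_api_token,
            "task": self.task,
        }


# The default is kept, but a caller can override it explicitly.
default_cfg = HubClientConfig(repo_id="sentence-transformers/all-mpnet-base-v2")
custom_cfg = HubClientConfig(repo_id="some-org/some-model", task="sentence-similarity")
```

The class would then build its client with something like `InferenceApi(**config.client_kwargs())`, so the task is a field rather than a literal.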

baskaryan · 2022-11-25T20:38:48Z

tests/integration_tests/embeddings/test_huggingface_hub.py

+def test_huggingfacehub_embedding_documents() -> None:
+    """Test huggingfacehub embeddings."""
+    documents = ["foo bar"]
+    embedding = HuggingFaceHubEmbeddings(task="feature-extraction")


@hwchase17 Not sure how we feel about the default model being one whose default task isn't valid here (it's "sentence-similarity"). I can find a different one if that's preferred; I was just matching the existing HuggingFace embeddings class.

Also, it seems a decent number of feature-extraction models have 2D outputs. Should we automatically flatten, or make the embed_ signatures more permissive?
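For reference, one way the "automatically flatten" option could work is mean pooling over the token axis. This is only a sketch under the assumption that a 2D output has one row per token; `pool_embedding` is a hypothetical helper, and mean pooling is one choice among several, not something this PR implements.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def pool_embedding(output: Sequence) -> List[float]:
    """Reduce a model output to a single 1D vector.

    A 1D output is returned unchanged (as floats). A 2D output, assumed
    to be one row per token, is averaged column by column.
    """
    if not output:
        raise ValueError("empty embedding output")

    first = output[0]
    if isinstance(first, (int, float)):
        # Already a flat vector.
        return [float(x) for x in output]

    dim = len(first)
    if any(len(row) != dim for row in output):
        raise ValueError("rows of a 2D output must all have the same length")

    n_rows = len(output)
    # Column-wise mean across all rows (tokens).
    return [sum(row[i] for row in output) / n_rows for i in range(dim)]
```

Making the embed_ signatures more permissive would avoid picking a pooling strategy at all, at the cost of pushing that decision onto callers.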


langchain/embeddings/huggingface_hub.py

hwchase17 · 2022-11-26T05:04:17Z

tests/integration_tests/embeddings/test_huggingface_hub.py

+def test_huggingfacehub_embedding_documents() -> None:
+    """Test huggingfacehub embeddings."""
+    documents = ["foo bar"]
+    embedding = HuggingFaceHubEmbeddings(task="feature-extraction")


Good questions. I think the reason for the weirdness is that the HF embeddings use the sentence-transformers package, but this can in theory use any model.

I think we'll also potentially want some different handling depending on whether the model is a sentence transformer or not? I'm not super sure, but how about this:

default the task to feature-extraction (since it's the only supported one, but we still let people change it if they really want to)

check that repo_id starts with sentence_transformer; a bit restrictive, but we can worry about extending it later. Start simple and tight.
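The two suggestions above could be sketched roughly as follows. `resolve_hub_embedding_task` is a hypothetical helper, not code from this PR, and the prefix is assumed to be `sentence-transformers` (the organization name as it appears on the Hub) rather than the `sentence_transformer` spelling used in the comment.

```python
from typing import Optional

DEFAULT_TASK = "feature-extraction"
# Assumption: Hub repo ids for these models look like "sentence-transformers/<name>".
REQUIRED_PREFIX = "sentence-transformers"


def resolve_hub_embedding_task(repo_id: str, task: Optional[str] = None) -> str:
    """Validate repo_id and return the task to use.

    The task defaults to feature-extraction but an explicit value is
    passed through unchanged, so callers can still override it.
    """
    if not repo_id.startswith(REQUIRED_PREFIX):
        raise ValueError(
            f"Only '{REQUIRED_PREFIX}' models are supported for now, got {repo_id!r}."
        )
    return task if task is not None else DEFAULT_TASK
```

Keeping the check to a single prefix test makes it easy to relax later, for example by swapping the constant for a tuple of allowed prefixes.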

WIP: Add HuggingFace Hub Embeddings

1542ff4

baskaryan requested a review from hwchase17 November 12, 2022 22:30

hwchase17 reviewed Nov 12, 2022

langchain/embeddings/huggingface.py (outdated, resolved)

hwchase17 mentioned this pull request Nov 14, 2022

Support HuggingFaceHub embeddings endpoint #136

Closed

baskaryan added 2 commits November 20, 2022 09:58

Merge branch 'master' into bagatur/add_hfhub_embeddings

31bf6ed

cr

c8842fc

baskaryan changed the title ~~WIP: Add HuggingFace Hub Embeddings~~ Add HuggingFace Hub Embeddings Nov 20, 2022

baskaryan requested a review from hwchase17 November 20, 2022 19:01

hwchase17 requested changes Nov 20, 2022

langchain/embeddings/huggingface_hub.py (outdated, resolved)

langchain/embeddings/huggingface_hub.py (outdated, resolved)

hwchase17 reviewed Nov 23, 2022

baskaryan added 2 commits November 25, 2022 12:23

Merge branch 'master' into bagatur/add_hfhub_embeddings

dcc5a38

cr

48491cb

baskaryan requested a review from hwchase17 November 25, 2022 20:26

baskaryan added 2 commits November 25, 2022 12:27

nit

c0d23bf

explicit task

65e5fac

baskaryan commented Nov 25, 2022

nit

195d925

hwchase17 reviewed Nov 26, 2022

baskaryan added 2 commits November 26, 2022 18:48

Merge branch 'master' into bagatur/add_hfhub_embeddings

cce2aef

cr

0ca0d58

baskaryan requested a review from hwchase17 November 27, 2022 03:19

hwchase17 approved these changes Nov 27, 2022

baskaryan merged commit b90e25f into master Nov 27, 2022

baskaryan deleted the bagatur/add_hfhub_embeddings branch November 27, 2022 08:25
