
HuggingFaceBgeEmbeddings Error, Please help! #10263

Closed
2 of 14 tasks
bathroomhu opened this issue Sep 6, 2023 · 6 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: embeddings Related to text embedding models module

Comments

@bathroomhu

System Info

It worked yesterday, but someone accidentally updated langchain and now the whole platform is down.

We built the whole platform with this code everywhere, and now nothing works.
We have around 50 models, all built like this, and we just went live as a startup.

We are scrambling here, guys. Please help us.

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import FAISS

embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-large-en")

news_instance = FAISS.load_local("federal_legislativeinstruments_inforce_index", embeddings)

data_state_nsw_legisation_index_instance = FAISS.load_local("data_indexes/federal/federal_legislativeinstruments_inforce_index", embeddings)

data_state_nsw_legisation_runner = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0,
                   openai_api_key=openai_api_key_value),
    chain_type="stuff",
    retriever=data_state_nsw_legisation_index_instance.as_retriever())

Please, please help. How do we refactor this so it works? Our team is going crazy trying to get it live again, and our very first customers are ringing us to complain. Please help.

Who can help?

No response

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import FAISS

embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-large-en")

news_instance = FAISS.load_local("federal_legislativeinstruments_inforce_index", embeddings)

data_state_nsw_legisation_index_instance = FAISS.load_local("data_indexes/federal/federal_legislativeinstruments_inforce_index", embeddings)

data_state_nsw_legisation_runner = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0,
                   openai_api_key=openai_api_key_value),
    chain_type="stuff",
    retriever=data_state_nsw_legisation_index_instance.as_retriever())

Expected behavior

The embeddings should load as they did yesterday and every time before.

@dosubot dosubot bot added Ɑ: embeddings Related to text embedding models module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Sep 6, 2023
@dosubot

dosubot bot commented Sep 6, 2023

🤖

I'm sorry to hear about the issues you're experiencing after the recent update. Based on a similar issue I found in the LangChain repository, it seems like the HuggingFaceBgeEmbeddings class does not exist in the latest version of LangChain. Instead, you can use the SelfHostedHuggingFaceEmbeddings or SelfHostedHuggingFaceInstructEmbeddings classes.

Here's how you can refactor your code:

from langchain.embeddings import SelfHostedHuggingFaceEmbeddings
import runhouse as rh

model_name = "BAAI/bge-large-en"
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
embeddings = SelfHostedHuggingFaceEmbeddings(model_name=model_name, hardware=gpu)

news_instance = FAISS.load_local("federal_legislativeinstruments_inforce_index", embeddings)

data_state_nsw_legisation_index_instance = FAISS.load_local("data_indexes/federal/federal_legislativeinstruments_inforce_index", embeddings)

data_state_nsw_legisation_runner = RetrievalQA.from_chain_type(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0,
                                                openai_api_key=openai_api_key_value),
                                                chain_type="stuff",
                                                retriever=data_state_nsw_legisation_index_instance.as_retriever())

Please ensure that you have the runhouse python package installed. If not, you can install it using pip:

pip install runhouse
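Before refactoring, it may also help to confirm which LangChain version is actually installed, since the behavior changed between releases (diagnostic commands, assuming pip manages your environment):

```shell
# Show the installed langchain package metadata (includes the version)
pip show langchain

# Or print the version from Python directly
python -c "import langchain; print(langchain.__version__)"
```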

I hope this helps! If you're still experiencing issues, could you please provide the specific error message you're getting? This will help me provide more targeted assistance.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@bathroomhu
Author

Thank you so much for helping us!!! We have refactored and are now getting this exception. Please help, and thank you so much 🙏

Traceback (most recent call last):
  File "/opt/homebrew/bin/flask", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/lib/python3.10/site-packages/flask/cli.py", line 1050, in main
    cli.main()
  File "/opt/homebrew/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/homebrew/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/homebrew/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/homebrew/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/click/decorators.py", line 84, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/flask/cli.py", line 911, in run_command
    raise e from None
  File "/opt/homebrew/lib/python3.10/site-packages/flask/cli.py", line 897, in run_command
    app = info.load_app()
  File "/opt/homebrew/lib/python3.10/site-packages/flask/cli.py", line 308, in load_app
    app = locate_app(import_name, name)
  File "/opt/homebrew/lib/python3.10/site-packages/flask/cli.py", line 218, in locate_app
    __import__(module_name)
  File "/Users/malcolm.fitzgerald/Desktop/ailearning/legalmind/app.py", line 49, in <module>
    embeddings = SelfHostedHuggingFaceEmbeddings(model_name=model_name, hardware=gpu)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/embeddings/self_hosted_hugging_face.py", line 100, in __init__
    super().__init__(load_fn_kwargs=load_fn_kwargs, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/self_hosted.py", line 153, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for SelfHostedHuggingFaceEmbeddings
model_name

@bathroomhu
Author

bathroomhu commented Sep 6, 2023

We had to install version langchain==0.0.279 to get it working. Is it still supported?

@dosubot
Copy link

dosubot bot commented Sep 7, 2023

🤖

Hello,

Thank you for reaching out and providing detailed information about the issue you're facing.

From the traceback you provided, it seems like the SelfHostedHuggingFaceEmbeddings class is expecting a model_name parameter, but it's not receiving it. This is causing the pydantic.error_wrappers.ValidationError.

In the latest version of LangChain, the model_name parameter for the SelfHostedHuggingFaceEmbeddings class is specified by the model_id attribute. By default, it is set to "sentence-transformers/all-mpnet-base-v2" as indicated by the DEFAULT_MODEL_NAME constant. However, it can be overridden by providing a different model name when initializing the class.

Here is the relevant code:

class SelfHostedHuggingFaceEmbeddings(SelfHostedEmbeddings):
    model_id: str = DEFAULT_MODEL_NAME
    """Model name to use."""

And in the __init__ method, it can be overridden:

def __init__(self, **kwargs: Any):
    """Initialize the remote inference function."""
    load_fn_kwargs = kwargs.pop("load_fn_kwargs", {})
    load_fn_kwargs["model_id"] = load_fn_kwargs.get("model_id", DEFAULT_MODEL_NAME)
    ...
    super().__init__(load_fn_kwargs=load_fn_kwargs, **kwargs)

So, the 'model_name' parameter should be a string that represents the name of a valid model that can be loaded by the sentence_transformers.SentenceTransformer or InstructorEmbedding.INSTRUCTOR classes, depending on the 'instruct' flag.
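To see why the old keyword fails, here is a toy, stdlib-only illustration (this is not LangChain code; StrictEmbeddings is a made-up stand-in for the pydantic model): a class that only declares a model_id field rejects the keyword model_name, producing an error analogous to the ValidationError in the traceback above.

```python
class StrictEmbeddings:
    """Toy stand-in for a pydantic model that only accepts declared fields."""

    _fields = {"model_id"}

    def __init__(self, **kwargs):
        # Reject any keyword that is not a declared field, like pydantic does.
        unknown = set(kwargs) - self._fields
        if unknown:
            raise ValueError(
                f"1 validation error for StrictEmbeddings: unexpected field(s) {sorted(unknown)}"
            )
        self.model_id = kwargs.get("model_id", "sentence-transformers/all-mpnet-base-v2")


# The old keyword, model_name, is rejected -- this mirrors the traceback:
try:
    StrictEmbeddings(model_name="BAAI/bge-large-en")
except ValueError as exc:
    print(exc)

# The declared keyword, model_id, is accepted:
ok = StrictEmbeddings(model_id="BAAI/bge-large-en")
print(ok.model_id)
```

The real fix follows the same shape: pass the keyword the class actually declares (model_id) rather than the one an older release used (model_name).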

As for your question about the support for version langchain==0.0.279, while it might still work for your current needs, it's generally recommended to use the latest version of the library to benefit from the latest features and bug fixes.
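Whichever version you settle on, pinning it explicitly can prevent this kind of accidental upgrade from taking a deployment down (a sketch; the file name and version are just examples):

```shell
# Record the exact version the platform was tested against
echo "langchain==0.0.279" >> requirements.txt

# Install from the pinned requirements so every environment matches
pip install -r requirements.txt
```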

If you're still encountering issues after checking the model_name parameter, please provide more details about how you're initializing the SelfHostedHuggingFaceEmbeddings class and I'll be glad to assist you further.

Best regards.


@deanchanter
Contributor

Working in 0.0.285!


dosubot bot commented Dec 9, 2023

Hi, @bathroomhu,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you raised this issue about a platform that went down after a LangChain update. You attempted to load embeddings and use them for retrieval, encountering exceptions. After some troubleshooting and guidance from me, you found that installing version langchain==0.0.279 resolved the issue, and a contributor confirmed that it also works in version 0.0.285.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Dec 9, 2023
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 18, 2023
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Dec 18, 2023