[Bug]: When using HuggingFaceEmbedding in VectorStoreIndex it loses attributes through copy.deepcopy() #14837
Comments
Hey there, @learnbott! 👋 I'm here to help you with any bugs, questions, or contributions you might have. Let's squash those bugs together!

The issue arises because the `_model` attribute is not carried over when `copy.deepcopy()` is applied to a `HuggingFaceEmbedding` instance. To resolve this, you can override the `__deepcopy__` method in a subclass so that `_model` is copied explicitly:

    from copy import deepcopy

    from llama_index.embeddings.huggingface.base import HuggingFaceEmbedding


    class CustomHuggingFaceEmbedding(HuggingFaceEmbedding):
        def __deepcopy__(self, memo):
            # Create a new instance of CustomHuggingFaceEmbedding
            new_instance = type(self)(
                model_name=self.model_name,
                max_length=self.max_length,
                query_instruction=self.query_instruction,
                text_instruction=self.text_instruction,
                normalize=self.normalize,
                cache_folder=self.cache_folder,
                device=self._device,
                callback_manager=self.callback_manager,
            )
            # Manually copy the _model attribute
            new_instance._model = deepcopy(self._model, memo)
            return new_instance

By adding the `__deepcopy__` method, the `_model` attribute is copied along with the rest of the instance. You can then use this modified class in your code:

    import copy

    from llama_index.core import VectorStoreIndex
    # Assumed import path for the Excel reader; adjust to your installed reader package.
    from llama_index.readers.file import PandasExcelReader

    filepath = "/some/path/to/.xlsx"
    documents = PandasExcelReader(sheet_name="sheet 1").load_data(filepath)
    embed_model_name = "BAAI/bge-small-en-v1.5"
    embed_model = CustomHuggingFaceEmbedding(model_name=embed_model_name)
    index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
    query_engine = index.as_retriever(similarity_top_k=2)
    qe_clone = copy.deepcopy(query_engine)
    # Both checks should print True once __deepcopy__ is overridden
    print('Original query engine', hasattr(query_engine._embed_model, "_model"))
    print('Copy query engine', hasattr(qe_clone._embed_model, "_model"))

This should resolve the missing `_model` attribute on the copied object.
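The workaround above relies on `copy.deepcopy()` consulting an object's `__deepcopy__` method before falling back to its generic copy path. A minimal sketch of that pattern without LlamaIndex follows. `StandInEmbedding` and `Retriever` are hypothetical stand-ins rather than library classes, and the `__getstate__` that drops private attributes is an assumption made only so the fallback path would lose `_model`.

```python
import copy


class StandInEmbedding:
    """Hypothetical stand-in for an embedding wrapper (not the LlamaIndex class)."""

    def __init__(self, model_name):
        self.model_name = model_name
        self._model = {"weights": [0.1, 0.2]}  # placeholder for a loaded model

    def __getstate__(self):
        # Assumption for illustration: private attributes are dropped from the state.
        return {k: v for k, v in self.__dict__.items() if not k.startswith("_")}

    def __deepcopy__(self, memo):
        # deepcopy() calls this hook instead of the __getstate__-based fallback.
        new_instance = type(self)(model_name=self.model_name)
        # Register the copy so shared or cyclic references resolve to it.
        memo[id(self)] = new_instance
        new_instance._model = copy.deepcopy(self._model, memo)
        return new_instance


class Retriever:
    """Hypothetical container that holds the embedding, like a retriever would."""

    def __init__(self, embed_model):
        self._embed_model = embed_model


retriever = Retriever(StandInEmbedding("demo"))
clone = copy.deepcopy(retriever)
print("Copy keeps _model:", hasattr(clone._embed_model, "_model"))
```

Copying the outer `Retriever` still goes through the generic path, but the nested embedding is copied through its own `__deepcopy__`, which is why the override is enough even when the deep copy is triggered on a containing object.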
Here is the line that causes the deep copy issue:
The intent of this line is unclear.
Commenting out this line allowed happiness to be published throughout all the land. Thank you.
Bug Description
I was integrating LlamaIndex with HuggingFaceEmbedding as the vector store in a DSPy model. It worked fine outside of DSPy teleprompters, but running `teleprompter.compile()` resulted in this error: `AttributeError: 'HuggingFaceEmbedding' object has no attribute '_model'`

It turns out that the teleprompters make a `copy.deepcopy()` of the DSPy model, including the LlamaIndex vector store. Before the copy is made, the `HuggingFaceEmbedding` has the attribute `_model`, but the copy does not.

This may be related to issue #14464.
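The symptom described here can be reproduced in isolation. The sketch below assumes one plausible mechanism, which this thread does not confirm: a `__getstate__` that omits private attributes from the state used for copying. `FakeEmbedding` is a hypothetical stand-in, not the LlamaIndex class.

```python
import copy


class FakeEmbedding:
    """Hypothetical stand-in; the real HuggingFaceEmbedding is not used here."""

    def __init__(self, model_name):
        self.model_name = model_name
        self._model = object()  # placeholder for the loaded model

    def __getstate__(self):
        # Assumed behaviour: attributes starting with "_" are left out of the state.
        return {k: v for k, v in self.__dict__.items() if not k.startswith("_")}


original = FakeEmbedding("demo")
# deepcopy() rebuilds the object from __getstate__() without calling __init__,
# so anything missing from the state is missing from the copy.
clone = copy.deepcopy(original)

print("Original has _model:", hasattr(original, "_model"))
print("Copy has _model:", hasattr(clone, "_model"))
```

Under these assumptions the original reports `True` and the copy reports `False`, and any later access to `clone._model` raises an `AttributeError` like the one quoted above.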
Version
0.10.55
Steps to Reproduce
Relevant Logs/Tracebacks