sparse vectors name unknown #24658

Closed
5 tasks done
EtienneFerrandi opened this issue Jul 25, 2024 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@EtienneFerrandi

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

#Step 1

import os
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import Qdrant, FastEmbedSparse, RetrievalMode

embeddings = HuggingFaceEmbeddings(model_name='OrdalieTech/Solon-embeddings-large-0.1', model_kwargs={"device": "cuda"})

sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

vectordb = Qdrant.from_texts(
    texts=texts,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    sparse_vector_name="sparse-vector",
    path=os.path.join(os.getcwd(), 'manuscrits_biblissima_vectordb'),
    collection_name="manuscrits_biblissima",
    retrieval_mode=RetrievalMode.HYBRID,
)

#Step 2

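import os
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode

model_kwargs = {"device": "cuda"}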
embeddings = HuggingFaceEmbeddings(
    model_name='OrdalieTech/Solon-embeddings-large-0.1',
    model_kwargs=model_kwargs
    )
sparse_embeddings = FastEmbedSparse(
    model_name="Qdrant/bm25",
    model_kwargs=model_kwargs,
    )
        
qdrant = QdrantVectorStore.from_existing_collection(
    collection_name="manuscrits_biblissima",
    path=os.path.join(os.getcwd(), 'manuscrits_biblissima_vectordb'),
    retrieval_mode=RetrievalMode.HYBRID,
    embedding=embeddings, 
    sparse_embedding=sparse_embeddings,
    sparse_vector_name="sparse-vector"
    )

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "/local/eferra01/data/get_ref_llama3_70B_gguf.py", line 101, in <module>
    qdrant = QdrantVectorStore.from_existing_collection(
  File "/local/eferra01/miniconda3/envs/llama-cpp-env/lib/python3.9/site-packages/langchain_qdrant/qdrant.py", line 286, in from_existing_collection
    return cls(
  File "/local/eferra01/miniconda3/envs/llama-cpp-env/lib/python3.9/site-packages/langchain_qdrant/qdrant.py", line 87, in __init__
    self._validate_collection_config(
  File "/local/eferra01/miniconda3/envs/llama-cpp-env/lib/python3.9/site-packages/langchain_qdrant/qdrant.py", line 937, in _validate_collection_config
    cls._validate_collection_for_sparse(
  File "/local/eferra01/miniconda3/envs/llama-cpp-env/lib/python3.9/site-packages/langchain_qdrant/qdrant.py", line 1022, in _validate_collection_for_sparse
    raise QdrantVectorStoreError(
langchain_qdrant.qdrant.QdrantVectorStoreError: Existing Qdrant collection manuscrits_biblissima does not contain sparse vectors named None. If you want to recreate the collection, set force_recreate parameter to True.

Description

I first create a qdrant database (#Step 1).

Then, in another script, to do RAG, I try to load the database (#Step 2).

However, I get the error above.

I named the sparse vectors when creating the database (Step 1) and passed the same name when loading the database for RAG (Step 2), but it does not seem to have been taken into account.
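
One way to check what Step 1 actually wrote to disk is to open the local collection with qdrant_client and list its sparse vector names. The sketch below is a minimal diagnostic and not part of LangChain: the helper names `sparse_vector_names` and `inspect_local_collection` are hypothetical, and it assumes the collection parameters returned by `QdrantClient.get_collection` expose a `sparse_vectors` mapping. An empty list would mean the collection was created without any sparse vectors, which would be consistent with the error above.

```python
def sparse_vector_names(collection_info):
    """Return the sorted sparse vector names of a collection info object.

    Works on any object exposing `.config.params.sparse_vectors`
    (a dict keyed by vector name, or None when no sparse vectors exist).
    """
    sparse = collection_info.config.params.sparse_vectors
    return sorted(sparse) if sparse else []


def inspect_local_collection(path, collection_name):
    """Open an on-disk Qdrant collection and list its sparse vector names."""
    # Imported lazily so the helper above stays usable without qdrant_client.
    from qdrant_client import QdrantClient

    client = QdrantClient(path=path)
    try:
        return sparse_vector_names(client.get_collection(collection_name))
    finally:
        client.close()  # release the local storage folder


# Example call, using the path and collection name from Step 1:
# import os
# inspect_local_collection(
#     os.path.join(os.getcwd(), "manuscrits_biblissima_vectordb"),
#     "manuscrits_biblissima",
# )
```

If "sparse-vector" is missing from the result, the problem lies in how the collection was created rather than in how it is loaded.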

System Info

langchain-qdrant==0.1.3
OS : Linux
OS Version : Linux dgx 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
Python Version : 3.9.19 | packaged by conda-forge | (main, Mar 20 2024, 12:50:21) [GCC 12.3.0]

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jul 25, 2024
@wulifu2hao
Contributor

wulifu2hao commented Jul 25, 2024

I am able to reproduce your issue. However, changing "Qdrant.from_texts(" in your Step 1 to "QdrantVectorStore.from_texts(" fixes the issue for me:


import os
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import FastEmbedSparse, RetrievalMode, QdrantVectorStore

embeddings = HuggingFaceEmbeddings(model_name='OrdalieTech/Solon-embeddings-large-0.1')

sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

texts = ["the capital of france is paris", "the capital of germany is berlin", "the capital of italy is rome"]
vectordb = QdrantVectorStore.from_texts(
    # vectordb = Qdrant.from_texts(
    texts=texts,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    sparse_vector_name="sparse-vector",
    path=os.path.join(os.getcwd(), 'manuscrits_biblissima_vectordb'),
    collection_name="manuscrits_biblissima",
    retrieval_mode=RetrievalMode.HYBRID,
)
print(vectordb)
        
qdrant = QdrantVectorStore.from_existing_collection(
    collection_name="manuscrits_biblissima",
    path=os.path.join(os.getcwd(), 'manuscrits_biblissima_vectordb'),
    retrieval_mode=RetrievalMode.HYBRID,
    embedding=embeddings, 
    sparse_embedding=sparse_embeddings,
    sparse_vector_name="sparse-vector"
    )
res = qdrant.search("where is the capital of france",search_type="similarity", k=1)
print(res)
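
Since the fix depends on the indexing script and the RAG script using the same class and the same vector name, one option is to keep the shared settings in a single place. This is only a sketch: `HybridStoreConfig`, `create_store`, and `load_store` are hypothetical names, while the `QdrantVectorStore.from_texts` and `QdrantVectorStore.from_existing_collection` calls mirror the ones in the snippet above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridStoreConfig:
    """Settings that must match between the indexing and the RAG script."""

    collection_name: str = "manuscrits_biblissima"
    path: str = os.path.join(os.getcwd(), "manuscrits_biblissima_vectordb")
    sparse_vector_name: str = "sparse-vector"

    def as_kwargs(self):
        """Keyword arguments shared by the create and load calls."""
        return {
            "collection_name": self.collection_name,
            "path": self.path,
            "sparse_vector_name": self.sparse_vector_name,
        }


def create_store(texts, embeddings, sparse_embeddings, config):
    """Create the hybrid collection with QdrantVectorStore (not Qdrant)."""
    from langchain_qdrant import QdrantVectorStore, RetrievalMode

    return QdrantVectorStore.from_texts(
        texts=texts,
        embedding=embeddings,
        sparse_embedding=sparse_embeddings,
        retrieval_mode=RetrievalMode.HYBRID,
        **config.as_kwargs(),
    )


def load_store(embeddings, sparse_embeddings, config):
    """Load the collection with the same names it was created with."""
    from langchain_qdrant import QdrantVectorStore, RetrievalMode

    return QdrantVectorStore.from_existing_collection(
        embedding=embeddings,
        sparse_embedding=sparse_embeddings,
        retrieval_mode=RetrievalMode.HYBRID,
        **config.as_kwargs(),
    )
```

Both scripts would then import the same `HybridStoreConfig`, so the sparse vector name cannot drift between creation and loading.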

@EtienneFerrandi
Author

Yes, it also works for me when I replace Qdrant.from_texts with QdrantVectorStore.from_texts.
Thanks !
