sparse vectors name unknown #24658

EtienneFerrandi · 2024-07-25T08:20:55Z

Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

#Step 1

import os
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import Qdrant, FastEmbedSparse, RetrievalMode

embeddings = HuggingFaceEmbeddings(model_name='OrdalieTech/Solon-embeddings-large-0.1', model_kwargs={"device": "cuda"})

sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

# texts (the list of strings to index) is defined elsewhere and not shown here
vectordb = Qdrant.from_texts(
    texts=texts,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    sparse_vector_name="sparse-vector",
    path=os.path.join(os.getcwd(), 'manuscrits_biblissima_vectordb'),
    collection_name="manuscrits_biblissima",
    retrieval_mode=RetrievalMode.HYBRID,
)

#Step 2

import os
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import QdrantVectorStore, FastEmbedSparse, RetrievalMode

model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(
    model_name='OrdalieTech/Solon-embeddings-large-0.1',
    model_kwargs=model_kwargs,
)
sparse_embeddings = FastEmbedSparse(
    model_name="Qdrant/bm25",
    model_kwargs=model_kwargs,
)

qdrant = QdrantVectorStore.from_existing_collection(
    collection_name="manuscrits_biblissima",
    path=os.path.join(os.getcwd(), 'manuscrits_biblissima_vectordb'),
    retrieval_mode=RetrievalMode.HYBRID,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    sparse_vector_name="sparse-vector",
)

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "/local/eferra01/data/get_ref_llama3_70B_gguf.py", line 101, in <module>
    qdrant = QdrantVectorStore.from_existing_collection(
  File "/local/eferra01/miniconda3/envs/llama-cpp-env/lib/python3.9/site-packages/langchain_qdrant/qdrant.py", line 286, in from_existing_collection
    return cls(
  File "/local/eferra01/miniconda3/envs/llama-cpp-env/lib/python3.9/site-packages/langchain_qdrant/qdrant.py", line 87, in __init__
    self._validate_collection_config(
  File "/local/eferra01/miniconda3/envs/llama-cpp-env/lib/python3.9/site-packages/langchain_qdrant/qdrant.py", line 937, in _validate_collection_config
    cls._validate_collection_for_sparse(
  File "/local/eferra01/miniconda3/envs/llama-cpp-env/lib/python3.9/site-packages/langchain_qdrant/qdrant.py", line 1022, in _validate_collection_for_sparse
    raise QdrantVectorStoreError(
langchain_qdrant.qdrant.QdrantVectorStoreError: Existing Qdrant collection manuscrits_biblissima does not contain sparse vectors named None. If you want to recreate the collection, set force_recreate parameter to True.

Description

I first create a Qdrant database (#Step 1).

Then, in another script, I try to load the database to do RAG (#Step 2).

However, I get the error above.

I named the sparse vectors when creating the database (Step 1) and made sure to pass the same name when loading it for RAG (Step 2), but the name doesn't seem to be taken into account...
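As the traceback shows, the error is raised while from_existing_collection runs the store's __init__ and validates the stored collection config, before any search happens. Below is a simplified, plain-Python model of that kind of check, written from the error message rather than from the library source (check_sparse_vectors and CollectionConfigError are hypothetical names). It illustrates why passing the right sparse_vector_name in Step 2 cannot help if the collection on disk has no sparse vectors at all:

```python
from typing import Mapping, Optional


class CollectionConfigError(Exception):
    """Hypothetical stand-in for langchain_qdrant's QdrantVectorStoreError."""


def check_sparse_vectors(
    collection_name: str,
    sparse_vectors: Optional[Mapping[str, object]],
    sparse_vector_name: str,
) -> None:
    """Simplified model of the load-time check (not the library's actual code)."""
    # A collection created without any sparse config has no sparse entries,
    # so no sparse_vector_name passed at load time can satisfy the check.
    if not sparse_vectors or sparse_vector_name not in sparse_vectors:
        raise CollectionConfigError(
            f"Existing Qdrant collection {collection_name} does not contain "
            f"sparse vectors named {sparse_vector_name}."
        )


# Collection written with a named sparse vector: the check passes.
check_sparse_vectors("manuscrits_biblissima", {"sparse-vector": {}}, "sparse-vector")

# Dense-only collection: the check fails even though the name is correct.
try:
    check_sparse_vectors("manuscrits_biblissima", None, "sparse-vector")
except CollectionConfigError as exc:
    print(exc)
```

In other words, the name given in Step 2 is only compared against what Step 1 actually stored; if Step 1 never created a sparse vector, the name is irrelevant.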

System Info

langchain-qdrant==0.1.3
OS : Linux
OS Version : Linux dgx 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
Python Version : 3.9.19 | packaged by conda-forge | (main, Mar 20 2024, 12:50:21) \n[GCC 12.3.0]


wulifu2hao · 2024-07-25T09:44:37Z

I am able to reproduce your issue. However, changing "Qdrant.from_texts(" in your Step 1 to "QdrantVectorStore.from_texts" fixes the issue for me:


import os
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import FastEmbedSparse, RetrievalMode, QdrantVectorStore

embeddings = HuggingFaceEmbeddings(model_name='OrdalieTech/Solon-embeddings-large-0.1')

sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

texts = ["the capital of france is paris", "the capital of germany is berlin", "the capital of italy is rome"]
vectordb = QdrantVectorStore.from_texts(
    # vectordb = Qdrant.from_texts(
    texts=texts,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    sparse_vector_name="sparse-vector",
    path=os.path.join(os.getcwd(), 'manuscrits_biblissima_vectordb'),
    collection_name="manuscrits_biblissima",
    retrieval_mode=RetrievalMode.HYBRID,
)
print(vectordb)

qdrant = QdrantVectorStore.from_existing_collection(
    collection_name="manuscrits_biblissima",
    path=os.path.join(os.getcwd(), 'manuscrits_biblissima_vectordb'),
    retrieval_mode=RetrievalMode.HYBRID,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    sparse_vector_name="sparse-vector",
)
res = qdrant.search("where is the capital of france", search_type="similarity", k=1)
print(res)

EtienneFerrandi · 2024-07-25T10:53:55Z

Yes, it also works for me when I replace Qdrant.from_texts with QdrantVectorStore.from_texts.
Thanks!

dosubot (bot) added the 🤖:bug label (Related to a bug, vulnerability, unexpected error with an existing feature) on Jul 25, 2024

EtienneFerrandi closed this as completed Jul 25, 2024
