🤖 Hello @hboen1990! I'm Dosu, a bot designed to help you with questions and issues related to the LangChain repository while you wait for a human maintainer.

To use the data already persisted in Weaviate as a vectorstore for RAG, without re-ingesting the documents, you can instantiate the `Weaviate` vectorstore directly from an existing client and index. Here is an example of how you can do this:

```python
import os

import weaviate
from langchain.vectorstores import Weaviate

client = weaviate.Client(url=os.environ["WEAVIATE_URL"], ...)
weaviate_vectorstore = Weaviate(client, index_name, text_key)
```

In this example, `index_name` is the name of the existing Weaviate class and `text_key` is the property that stores the document text. Once you have created the vectorstore, you can use it as a retriever.

Regarding your question about RAG with a Weaviate vectorstore: LangChain does support this. The provided context shows Weaviate used as a vectorstore to retrieve context for the RAG model. Here is the relevant code:

```python
# Add to vectorDB
vectorstore = Weaviate.from_existing_index(WEAVIATE_INDEX_NAME, OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

# RAG prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# RAG
model = ChatOpenAI()
chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | model
    | StrOutputParser()
)
```

You can find more details in the chain.py file. I hope this helps! If you have any other questions, feel free to ask.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
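The `|` composition in the chain above is LangChain Expression Language (LCEL). As a rough mental model only (this is an illustrative sketch with hypothetical names, not LangChain's actual `Runnable` implementation), each step transforms its input and pipes the result to the next step:

```python
# Toy analogue of LCEL pipe composition. Illustrative only: the class and
# function names here are hypothetical, not LangChain's real API.
class Runnable:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Chaining with `|`: run self first, then feed the result to `other`.
        return Runnable(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


# A miniature "retriever | prompt" pipeline over plain strings.
retrieve = Runnable(lambda q: {"context": "some retrieved text", "question": q})
fill_prompt = Runnable(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
chain = retrieve | fill_prompt

print(chain.invoke("What is RAG?"))
```

In the real chain, `retriever`, `prompt`, `model`, and `StrOutputParser()` play the roles of these toy steps, and `RunnableParallel` fans the input out to several sub-runnables at once.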
---
Hi All,

I am exploring Weaviate with LangChain. I have loaded a bunch of PDFs, done the standard splitting, and created a Weaviate class as:
```python
class_obj = {
    "class": "WMOInfo",
    "properties": [{"dataType": ["text"], "name": "content"}],
    "vectorizer": "text2vec-huggingface",
}
```
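As a side note, when LangChain computes the embeddings client-side (as with `Weaviate.from_documents(..., embedding=...)`), the class that actually holds the vectors typically has `"vectorizer": "none"` — this is visible in the auto-generated `LangChain_...` class in the schema dump further down. A minimal sketch of building such a class definition (the helper name is hypothetical, not part of any library):

```python
# Hypothetical helper: build a minimal Weaviate class definition.
# Use vectorizer="none" when embeddings are computed client-side (e.g. by
# LangChain), or a module name like "text2vec-huggingface" for server-side
# vectorization.
def make_class_obj(class_name, text_prop="content", vectorizer="none"):
    return {
        "class": class_name,
        "properties": [{"dataType": ["text"], "name": text_prop}],
        "vectorizer": vectorizer,
    }


wmo_class = make_class_obj("WMOInfo", vectorizer="text2vec-huggingface")
```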
and use the BGE embeddings:

```python
model_name = "BAAI/bge-base-en-v1.5"
encode_kwargs = {"normalize_embeddings": True}  # set True to compute cosine similarity
model_norm = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs={"device": "cpu"},
    encode_kwargs=encode_kwargs,
)
embedding = model_norm
```
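Why `normalize_embeddings=True`? With unit-length vectors, a plain dot product equals cosine similarity, which matches the `"distance": "cosine"` setting visible in the index config below. A quick self-contained illustration (plain Python, independent of the BGE model):

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 2.0]
# After normalization, the dot product equals the cosine similarity
# of the original vectors.
print(abs(dot(normalize(a), normalize(b)) - cosine(a, b)) < 1e-12)
```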
and create a vectorstore as:

```python
weaviate_vectorstore = Weaviate.from_documents(
    documents=texts,
    embedding=embedding,
    client=client,
    by_text=False,
)
```
with the retriever:

```python
weaviate_retriever = weaviate_vectorstore.as_retriever(search_kwargs={"k": 3})
response = weaviate_retriever.get_relevant_documents(query="some query")
print(response)
```

This returns the relevant documents.
The above code for ingesting the PDFs works.
Now I want to use the vectorstore to do RAG. I cannot find any code to access the persisted data and create a vectorstore without ingesting the documents again. In Chroma it's easy: you just do `vectorstore = Chroma(persist_directory=persist_directory, embedding_function=embedding)`.

How can I do this in Weaviate, similar to Chroma?

I am running a local instance of Weaviate in Docker.
Thanks in advance.
Schema output:
```json
{
  "classes": [
    {
      "class": "WMOInfo",
      "invertedIndexConfig": {
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        },
        "cleanupIntervalSeconds": 60,
        "stopwords": {
          "additions": null,
          "preset": "en",
          "removals": null
        }
      },
      "moduleConfig": {
        "text2vec-huggingface": {
          "vectorizeClassName": true
        }
      },
      "multiTenancyConfig": {
        "enabled": false
      },
      "properties": [
        {
          "dataType": [
            "text"
          ],
          "indexFilterable": true,
          "indexSearchable": true,
          "moduleConfig": {
            "text2vec-huggingface": {
              "skip": false,
              "vectorizePropertyName": false
            }
          },
          "name": "content",
          "tokenization": "word"
        }
      ],
      "replicationConfig": {
        "factor": 1
      },
      "shardingConfig": {
        "virtualPerPhysical": 128,
        "desiredCount": 1,
        "actualCount": 1,
        "desiredVirtualCount": 128,
        "actualVirtualCount": 128,
        "key": "_id",
        "strategy": "hash",
        "function": "murmur3"
      },
      "vectorIndexConfig": {
        "skip": false,
        "cleanupIntervalSeconds": 300,
        "maxConnections": 64,
        "efConstruction": 128,
        "ef": -1,
        "dynamicEfMin": 100,
        "dynamicEfMax": 500,
        "dynamicEfFactor": 8,
        "vectorCacheMaxObjects": 1000000000000,
        "flatSearchCutoff": 40000,
        "distance": "cosine",
        "pq": {
          "enabled": false,
          "bitCompression": false,
          "segments": 0,
          "centroids": 256,
          "trainingLimit": 100000,
          "encoder": {
            "type": "kmeans",
            "distribution": "log-normal"
          }
        }
      },
      "vectorIndexType": "hnsw",
      "vectorizer": "text2vec-huggingface"
    },
    {
      "class": "LangChain_71db963cf5114df8b31ef1edde728ef3",
      "invertedIndexConfig": {
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        },
        "cleanupIntervalSeconds": 60,
        "stopwords": {
          "additions": null,
          "preset": "en",
          "removals": null
        }
      },
      "multiTenancyConfig": {
        "enabled": false
      },
      "properties": [
        {
          "dataType": [
            "text"
          ],
          "indexFilterable": true,
          "indexSearchable": true,
          "name": "text",
          "tokenization": "word"
        },
        {
          "dataType": [
            "text"
          ],
          "description": "This property was generated by Weaviate's auto-schema feature on Fri Dec 29 01:00:07 2023",
          "indexFilterable": true,
          "indexSearchable": true,
          "name": "source",
          "tokenization": "word"
        },
        {
          "dataType": [
            "number"
          ],
          "description": "This property was generated by Weaviate's auto-schema feature on Fri Dec 29 01:00:07 2023",
          "indexFilterable": true,
          "indexSearchable": false,
          "name": "page"
        }
      ],
      "replicationConfig": {
        "factor": 1
      },
      "shardingConfig": {
        "virtualPerPhysical": 128,
        "desiredCount": 1,
        "actualCount": 1,
        "desiredVirtualCount": 128,
        "actualVirtualCount": 128,
        "key": "_id",
        "strategy": "hash",
        "function": "murmur3"
      },
      "vectorIndexConfig": {
        "skip": false,
        "cleanupIntervalSeconds": 300,
        "maxConnections": 64,
        "efConstruction": 128,
        "ef": -1,
        "dynamicEfMin": 100,
        "dynamicEfMax": 500,
        "dynamicEfFactor": 8,
        "vectorCacheMaxObjects": 1000000000000,
        "flatSearchCutoff": 40000,
        "distance": "cosine",
        "pq": {
          "enabled": false,
          "bitCompression": false,
          "segments": 0,
          "centroids": 256,
          "trainingLimit": 100000,
          "encoder": {
            "type": "kmeans",
            "distribution": "log-normal"
          }
        }
      },
      "vectorIndexType": "hnsw",
      "vectorizer": "none"
    }
  ]
}
```
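One detail worth noting in the schema above: `from_documents` stored the ingested chunks in the auto-generated class `LangChain_71db963cf5114df8b31ef1edde728ef3` (vectorizer `none`, text property `text`), not in the manually created `WMOInfo` class. That class name and text property are what you would pass as `index_name` and `text_key` when reconnecting to the persisted data. A small sketch of scanning a schema dump for these candidates (the helper name is hypothetical; the schema shape matches the output of `client.schema.get()` shown above):

```python
# Hypothetical helper: given a schema dump like the one above, list each
# class with its text-typed properties, to help pick the index_name /
# text_key pair for the Weaviate(client, index_name, text_key) constructor.
def index_candidates(schema):
    candidates = {}
    for cls in schema.get("classes", []):
        text_props = [
            p["name"]
            for p in cls.get("properties", [])
            if "text" in p.get("dataType", [])
        ]
        candidates[cls["class"]] = text_props
    return candidates


# Trimmed-down version of the schema dump above, for illustration.
sample = {
    "classes": [
        {"class": "WMOInfo",
         "properties": [{"dataType": ["text"], "name": "content"}]},
        {"class": "LangChain_71db963cf5114df8b31ef1edde728ef3",
         "properties": [{"dataType": ["text"], "name": "text"},
                        {"dataType": ["text"], "name": "source"},
                        {"dataType": ["number"], "name": "page"}]},
    ]
}
print(index_candidates(sample))
```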