
Cannot load it into Chroma, and then query it! index.as_query_engine(chroma_collection=chroma_collection) #6858

Closed
1 task done
ZhuJD-China opened this issue Jul 12, 2023 · 2 comments · Fixed by #6872
Labels
question Further information is requested

Comments

@ZhuJD-China

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

```python
# save to disk
from dotenv import load_dotenv

load_dotenv()

from chromadb import Settings
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding
from IPython.display import Markdown, display
import chromadb

# set up OpenAI
import os
import getpass

# create client and a new collection
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("quickstart")
print(chroma_collection.count())

# define embedding function
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

# load documents
documents = SimpleDirectoryReader("news").load_data()
print(documents)

# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

print(index)
print(chroma_collection.count())

print(chroma_collection.get()['documents'])
print(chroma_collection.get()['metadatas'])

# query data
# ("What has happened in China recently, and when did it happen?")
query_engine = index.as_query_engine(chroma_collection=chroma_collection)
response = query_engine.query("中国最近发生了什么,说出发生的时间?")
print(response)
display(Markdown(f"{response}"))
```

OUTPUT:

```
F:\Anaconda\python.exe D:\EmbeddingsSearch\Chroma_Search\test.py
0
<llama_index.indices.vector_store.base.VectorStoreIndex object at 0x00000210F8D88E50>
77
```

(Translated from Chinese:) What happened recently in China: at the end of November 2022, large numbers of young people in Beijing, Shanghai, Nanjing, Guangzhou, Chengdu, Chongqing, and other cities launched the "White Paper" protests against the extreme so-called "dynamic zero-COVID" policy; in Wuhan, Dalian, Anshan, and other cities, large numbers of retirees took to the streets in the "White Hair" protests against cuts to medical-insurance benefits; the CCP leadership repeatedly emphasized security; and the Dalai Lama said Beijing had reached out to him and that he had no objection to resuming dialogue. These events took place from late November 2022 to July 10, 2023.

```
<IPython.core.display.Markdown object>

Process finished with exit code 0
```

@ZhuJD-China ZhuJD-China added the question Further information is requested label Jul 12, 2023
@logan-markewich
Collaborator

This line

```python
query_engine = index.as_query_engine(chroma_collection=chroma_collection)
```

is not needed. The index is already loaded and connected to chroma.

Also, you are not creating the embed_model properly. It should look like this:

```python
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(embed_model=embed_model)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)
```

To reload the index from chroma after you created it with llama-index, you can create the vector store and use it like this:

```python
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
```

@logan-markewich
Collaborator

Looks like the docs for chroma are wrong. Updating now..
