
Cannot load it into Chroma, and then query it! index.as_query_engine(chroma_collection=chroma_collection) #6858

Closed
1 task done
ZhuJD-China opened this issue Jul 12, 2023 · 2 comments · Fixed by #6872
Labels
question Further information is requested

Comments

@ZhuJD-China

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

```python
# save to disk
from dotenv import load_dotenv

load_dotenv()

from chromadb import Settings
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding
from IPython.display import Markdown, display
import chromadb

# set up OpenAI
import os
import getpass

# create client and a new collection
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("quickstart")
print(chroma_collection.count())

# define embedding function
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

# load documents
documents = SimpleDirectoryReader("news").load_data()
print(documents)

# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

print(index)
print(chroma_collection.count())

print(chroma_collection.get()['documents'])
print(chroma_collection.get()['metadatas'])

# query data
# ("What has happened in China recently, and when did it happen?")
query_engine = index.as_query_engine(chroma_collection=chroma_collection)
response = query_engine.query("中国最近发生了什么,说出发生的时间?")
print(response)
display(Markdown(f"{response}"))
```

OUTPUT:

```
F:\Anaconda\python.exe D:\EmbeddingsSearch\Chroma_Search\test.py
0
<llama_index.indices.vector_store.base.VectorStoreIndex object at 0x00000210F8D88E50>
77
```

(Translated from Chinese:) What happened recently in China: at the end of November 2022, large numbers of young people in Beijing, Shanghai, Nanjing, Guangzhou, Chengdu, Chongqing, and other cities launched the "White Paper" protests against the extreme so-called "dynamic zero-COVID" policy; in Wuhan, Dalian, Anshan, and other cities, large numbers of retirees took to the streets in the "White Hair" protests against cuts to medical-insurance benefits; the CCP leadership repeatedly emphasized security; and the Dalai Lama said Beijing had reached out to him and that he had no objection to resuming dialogue. These events took place from late November 2022 to July 10, 2023.

```
<IPython.core.display.Markdown object>

Process finished with exit code 0
```

@ZhuJD-China ZhuJD-China added the question Further information is requested label Jul 12, 2023
@logan-markewich
Collaborator

This line

```python
query_engine = index.as_query_engine(chroma_collection=chroma_collection)
```

is not needed. The index is already loaded and connected to chroma.

Also, you are not creating the embed_model properly. It should look like this:

```python
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(embed_model=embed_model)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)
```

To reload the index from chroma after you created it with llama-index, you can create the vector store and use it like this:

```python
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
```

@logan-markewich
Collaborator

Looks like the docs for chroma are wrong. Updating now..
