This issue was moved to a discussion. You can continue the conversation there.


[Question]: Empty Response after query_engine.query #7803

Closed · pranavbhat12 opened this issue Sep 24, 2023 · 5 comments
Labels
question Further information is requested

Comments


pranavbhat12 commented Sep 24, 2023

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I am trying to read the content of a website and index it with llama_index, but after I run query_engine.query(question) I get an empty response. I have the latest version of llama_index installed (0.8.31).

When I looked at the index_store.json file, its content was:

```json
{"index_store/data": {"some_string": {"type": "vector_store", "data": "{\"index_id\": \"some_string\", \"summary\": null, \"nodes_dict\": {}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}"}}}
```

Below is the code:

```python
# (imports not shown in the original report; llm is defined elsewhere)
def document_loader():
    web_links = [
        "https://www.databricks.com/",
        "https://help.databricks.com",
        "https://databricks.com/try-databricks",
        "https://help.databricks.com/s/",
        "https://docs.databricks.com",
    ]
    loader = WebBaseLoader(web_links)
    documents = loader.load()
    docs = [
        Document(page_content=doc.page_content, metadata={"source": doc.metadata["source"]})
        for doc in documents
    ]
    model_name = "sentence-transformers/all-mpnet-base-v2"
    model_kwargs = {"device": "cuda"}

    embeddings = LangchainEmbedding(
        HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
    )

    service_context = ServiceContext.from_defaults(
        chunk_size=2048,
        llm=llm,
        embed_model=embeddings,
    )
    set_global_service_context(service_context)
    index = GPTVectorStoreIndex.from_documents(docs)
    index.storage_context.persist("db/naval_index")
    storage_context = StorageContext.from_defaults(persist_dir="db/naval_index")
    new_index = load_index_from_storage(storage_context)
    new_query_engine = new_index.as_query_engine()
    new_query_engine.query(question)
```

Is this a problem with llama_index, or is my approach incorrect?

@pranavbhat12 pranavbhat12 added the question Further information is requested label Sep 24, 2023

dosubot bot commented Sep 24, 2023

🤖

Hello,

Based on the information you've provided, it seems like the issue might be related to the content being loaded into the index or the query you're executing. Here are a few things you could check:

  1. Content Loading: Ensure that you have loaded relevant content into the index before executing the query. You can do this by calling the load method with the appropriate arguments.

  2. Keyword Matching: LlamaIndex uses a keyword-table based index. It first tries to extract keywords from the source text and stores the keywords as keys per item. It similarly extracts keywords from the query text. Then, it tries to match those keywords to existing keywords in the table. If the keywords extracted from your query do not match any of the keywords in the index, this could result in an empty response. Make sure that your query contains keywords that match the content loaded into the index.

  3. Knowledge Graph Matching: LlamaIndex also builds a Knowledge Graph by extracting triplets and leveraging the KG during query-time. If the triplets extracted from your query do not match any of the triplets in the Knowledge Graph, this could also result in an empty response. If you are using the Knowledge Graph feature, ensure that your query contains triplets that match the triplets in the Knowledge Graph.
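The keyword-matching failure mode in point 2 can be illustrated with a toy sketch (this is a simplified illustration, not LlamaIndex internals — the `extract_keywords` and `retrieve` helpers are hypothetical stand-ins): if no keyword extracted from the query overlaps the keywords stored for any document, nothing is retrieved and the response comes back empty.

```python
# Toy keyword-table retrieval: keyword -> list of matching document ids.
def extract_keywords(text):
    # Naive keyword extraction: lowercase words longer than 3 characters.
    return {w.lower().strip(".,?") for w in text.split() if len(w) > 3}

docs = {1: "Databricks is a unified data analytics platform."}
table = {}
for doc_id, text in docs.items():
    for kw in extract_keywords(text):
        table.setdefault(kw, []).append(doc_id)

def retrieve(query):
    # Union of documents hit by any query keyword; empty set -> empty response.
    hits = set()
    for kw in extract_keywords(query):
        hits.update(table.get(kw, []))
    return sorted(hits)

print(retrieve("What is Databricks?"))   # [1]
print(retrieve("Tell me about Spark"))   # [] -> nothing retrieved
```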

Also, it's worth noting that there have been similar issues reported in the past.

Please let me know if this helps or if you need further assistance.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


logan-markewich (Collaborator) commented
When you load the index, you should pass the service context back in:

```python
new_index = load_index_from_storage(storage_context, service_context=service_context)
```

logan-markewich (Collaborator) commented Sep 24, 2023

Also, it looks like you are using the langchain Document class, or you set the wrong kwargs. It should be:

```python
from llama_index import Document
docs = [Document(text=doc.page_content, metadata={"source": doc.metadata["source"]}) for doc in documents]
```

I.e. llama-index docs use the text= kwarg, not page_content=.
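The failure mode here can be shown with a toy sketch (`ToyDocument` is a hypothetical stand-in, not the real llama_index class): a constructor that silently accepts unknown keyword arguments ends up with empty text, which is why indexing "succeeds" but retrieval returns nothing.

```python
# Toy illustration of why a wrong kwarg produces an empty index.
class ToyDocument:
    def __init__(self, text: str = "", **extra_kwargs):
        self.text = text            # the kwarg the library expects
        self.extra = extra_kwargs   # unknown kwargs are silently swallowed

wrong = ToyDocument(page_content="Databricks is a data platform.")
right = ToyDocument(text="Databricks is a data platform.")

print(repr(wrong.text))  # '' -- nothing gets indexed
print(repr(right.text))  # 'Databricks is a data platform.'
```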

pranavbhat12 (Author) commented Sep 25, 2023

Thank you for the response @logan-markewich. I have now removed the manual Document creation code and replaced the Langchain web loader with BeautifulSoupWebReader from llama_index.

Below is the index_store.json file content:

```json
{"index_store/data": {"some_string": {"type": "vector_store", "data": "{\"index_id\": \"some_string\", \"summary\": null, \"nodes_dict\": {\"2e48e410-a236-4f61-a282-e94429cb9bb9\": \"2e48e410-a236-4f61-a282-e94429cb9bb9\"}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}"}}}
```

But doc_id_dict and embeddings_dict are still empty.

Below is the code:

```python
from llama_index.embeddings import LangchainEmbedding
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import set_global_service_context
from llama_index import ServiceContext
from llama_index import VectorStoreIndex, download_loader, GPTVectorStoreIndex
from llama_index import StorageContext, load_index_from_storage

def document_loader():
    web_links = ["https://www.databricks.com/"]
    BeautifulSoupWebReader = download_loader("BeautifulSoupWebReader")

    loader = BeautifulSoupWebReader()
    documents = loader.load_data(urls=web_links)
    model_name = "sentence-transformers/all-mpnet-base-v2"
    model_kwargs = {"device": "cuda"}

    embeddings = LangchainEmbedding(
        HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
    )

    service_context = ServiceContext.from_defaults(
        chunk_size=2048,
        llm=llm,  # llm is defined elsewhere
        embed_model=embeddings,
    )

    set_global_service_context(service_context)
    index = GPTVectorStoreIndex.from_documents(documents)
    index.storage_context.persist("db/naval_index")
    storage_context = StorageContext.from_defaults(persist_dir="db/naval_index")
    new_index = load_index_from_storage(storage_context, service_context=service_context)
    new_query_engine = new_index.as_query_engine()
    response = new_query_engine.query("What is Databricks")
    print(response)
```

Output: (screenshot showing blank output)

The output is now blank; at least earlier it was printing "Empty Response"!

logan-markewich (Collaborator) commented Sep 25, 2023

Hmm, I suspect if you print(response.source_nodes) it will show the retrieved nodes properly.

Seems like maybe an issue with the LLM? Which LLM are you using, and how is it set up? I would try decreasing the chunk size to 1024, and maybe setting context_window=3800 if using llama2.
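The two suggestions above could be applied to the earlier setup roughly like this (a sketch assuming the llama_index 0.8.x API; `llm`, `embeddings`, and `new_query_engine` are the objects from the snippets above):

```python
# Config sketch of the suggested changes (llama_index 0.8.x assumed).
service_context = ServiceContext.from_defaults(
    chunk_size=1024,        # down from 2048, as suggested
    context_window=3800,    # leave headroom below llama2's 4096-token limit
    llm=llm,
    embed_model=embeddings,
)

# After querying, inspect what was actually retrieved:
response = new_query_engine.query("What is Databricks")
print(response.source_nodes)  # non-empty here points the blame at the LLM
```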

@run-llama run-llama locked and limited conversation to collaborators Oct 24, 2023
@Disiok Disiok converted this issue into discussion #8421 Oct 24, 2023

