This issue was moved to a discussion. You can continue the conversation there.


[Question]: Empty Response after query_engine.query #7803

Closed · pranavbhat12 opened this issue Sep 24, 2023 · 5 comments
Labels
question Further information is requested

Comments


pranavbhat12 commented Sep 24, 2023

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I am trying to read the content of a website and index it with llama_index, but after I run query_engine.query(question) I get an empty response. I have the latest version of llama_index installed (0.8.31).

When I looked at the index_store.json file, its content was:

```json
{"index_store/data": {"some_string": {"type": "vector_store", "data": "{\"index_id\": \"some_string\", \"summary\": null, \"nodes_dict\": {}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}"}}}
```

Below is the code:

```python
# (imports not shown in the original report; llm is defined elsewhere)
def document_loader():
    web_links = [
        "https://www.databricks.com/",
        "https://help.databricks.com",
        "https://databricks.com/try-databricks",
        "https://help.databricks.com/s/",
        "https://docs.databricks.com",
    ]
    loader = WebBaseLoader(web_links)
    documents = loader.load()
    docs = [
        Document(page_content=doc.page_content, metadata={"source": doc.metadata["source"]})
        for doc in documents
    ]
    model_name = "sentence-transformers/all-mpnet-base-v2"
    model_kwargs = {"device": "cuda"}

    embeddings = LangchainEmbedding(
        HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
    )

    service_context = ServiceContext.from_defaults(
        chunk_size=2048,
        llm=llm,
        embed_model=embeddings,
    )
    set_global_service_context(service_context)
    index = GPTVectorStoreIndex.from_documents(docs)
    index.storage_context.persist("db/naval_index")
    storage_context = StorageContext.from_defaults(persist_dir="db/naval_index")
    new_index = load_index_from_storage(storage_context)
    new_query_engine = new_index.as_query_engine()
    new_query_engine.query(question)
```

Is this a problem with llama_index, or is my approach incorrect?

@pranavbhat12 pranavbhat12 added the question Further information is requested label Sep 24, 2023

dosubot bot commented Sep 24, 2023

🤖

Hello,

Based on the information you've provided, it seems like the issue might be related to the content being loaded into the index or the query you're executing. Here are a few things you could check:

  1. Content Loading: Ensure that you have loaded relevant content into the index before executing the query. You can do this by calling the load method with the appropriate arguments.

  2. Keyword Matching: LlamaIndex uses a keyword-table based index. It first tries to extract keywords from the source text and stores the keywords as keys per item. It similarly extracts keywords from the query text. Then, it tries to match those keywords to existing keywords in the table. If the keywords extracted from your query do not match any of the keywords in the index, this could result in an empty response. Make sure that your query contains keywords that match the content loaded into the index.

  3. Knowledge Graph Matching: LlamaIndex also builds a Knowledge Graph by extracting triplets and leveraging the KG during query-time. If the triplets extracted from your query do not match any of the triplets in the Knowledge Graph, this could also result in an empty response. If you are using the Knowledge Graph feature, ensure that your query contains triplets that match the triplets in the Knowledge Graph.
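The keyword-matching failure mode in point 2 can be illustrated with a toy sketch (this is a simplified illustration, not LlamaIndex internals — the `extract_keywords` and `retrieve` helpers are hypothetical stand-ins): if no keyword extracted from the query overlaps the keywords stored for any document, nothing is retrieved and the response comes back empty.

```python
# Toy keyword-table retrieval: keyword -> list of matching document ids.
def extract_keywords(text):
    # Naive keyword extraction: lowercase words longer than 3 characters.
    return {w.lower().strip(".,?") for w in text.split() if len(w) > 3}

docs = {1: "Databricks is a unified data analytics platform."}
table = {}
for doc_id, text in docs.items():
    for kw in extract_keywords(text):
        table.setdefault(kw, []).append(doc_id)

def retrieve(query):
    # Union of documents hit by any query keyword; empty set -> empty response.
    hits = set()
    for kw in extract_keywords(query):
        hits.update(table.get(kw, []))
    return sorted(hits)

print(retrieve("What is Databricks?"))   # [1]
print(retrieve("Tell me about Spark"))   # [] -> nothing retrieved
```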

Also, it's worth noting that there have been similar issues reported in the past.

Please let me know if this helps or if you need further assistance.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


logan-markewich (Collaborator) commented
When you load the index, you should pass the service context back in:

```python
new_index = load_index_from_storage(storage_context, service_context=service_context)
```

logan-markewich (Collaborator) commented Sep 24, 2023

Also, it looks like you are using the langchain Document class, or you set the wrong kwargs. It should be:

```python
from llama_index import Document
docs = [Document(text=doc.page_content, metadata={"source": doc.metadata["source"]}) for doc in documents]
```

I.e. llama-index docs use the text= kwarg, not page_content=.
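The failure mode here can be shown with a toy sketch (`ToyDocument` is a hypothetical stand-in, not the real llama_index class): a constructor that silently accepts unknown keyword arguments ends up with empty text, which is why indexing "succeeds" but retrieval returns nothing.

```python
# Toy illustration of why a wrong kwarg produces an empty index.
class ToyDocument:
    def __init__(self, text: str = "", **extra_kwargs):
        self.text = text            # the kwarg the library expects
        self.extra = extra_kwargs   # unknown kwargs are silently swallowed

wrong = ToyDocument(page_content="Databricks is a data platform.")
right = ToyDocument(text="Databricks is a data platform.")

print(repr(wrong.text))  # '' -- nothing gets indexed
print(repr(right.text))  # 'Databricks is a data platform.'
```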

pranavbhat12 (Author) commented Sep 25, 2023

Thank you for the response @logan-markewich. I have now removed the manual Document creation code and replaced the Langchain web loader with BeautifulSoupWebReader from llama_index.

Below is the index_store.json file content:

```json
{"index_store/data": {"some_string": {"type": "vector_store", "data": "{\"index_id\": \"some_string\", \"summary\": null, \"nodes_dict\": {\"2e48e410-a236-4f61-a282-e94429cb9bb9\": \"2e48e410-a236-4f61-a282-e94429cb9bb9\"}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}"}}}
```

But doc_id_dict and embeddings_dict are still empty.

Below is the code:

```python
from llama_index.embeddings import LangchainEmbedding
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import set_global_service_context
from llama_index import ServiceContext
from llama_index import VectorStoreIndex, download_loader, GPTVectorStoreIndex
from llama_index import StorageContext, load_index_from_storage

def document_loader():
    web_links = ["https://www.databricks.com/"]
    BeautifulSoupWebReader = download_loader("BeautifulSoupWebReader")

    loader = BeautifulSoupWebReader()
    documents = loader.load_data(urls=web_links)
    model_name = "sentence-transformers/all-mpnet-base-v2"
    model_kwargs = {"device": "cuda"}

    embeddings = LangchainEmbedding(
        HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
    )

    service_context = ServiceContext.from_defaults(
        chunk_size=2048,
        llm=llm,  # llm is defined elsewhere
        embed_model=embeddings,
    )

    set_global_service_context(service_context)
    index = GPTVectorStoreIndex.from_documents(documents)
    index.storage_context.persist("db/naval_index")
    storage_context = StorageContext.from_defaults(persist_dir="db/naval_index")
    new_index = load_index_from_storage(storage_context, service_context=service_context)
    new_query_engine = new_index.as_query_engine()
    response = new_query_engine.query("What is Databricks")
    print(response)
```

Output: (screenshot showing blank output)

The output is now blank; at least earlier it was printing "Empty Response"!

logan-markewich (Collaborator) commented Sep 25, 2023

Hmm, I suspect if you print(response.source_nodes) it will show the retrieved nodes properly.

Seems like maybe an issue with the LLM? Which LLM are you using, and how is it set up? I would try decreasing the chunk size to 1024, and maybe setting context_window=3800 if using llama2.
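The two suggestions above could be applied to the earlier setup roughly like this (a sketch assuming the llama_index 0.8.x API; `llm`, `embeddings`, and `new_query_engine` are the objects from the snippets above):

```python
# Config sketch of the suggested changes (llama_index 0.8.x assumed).
service_context = ServiceContext.from_defaults(
    chunk_size=1024,        # down from 2048, as suggested
    context_window=3800,    # leave headroom below llama2's 4096-token limit
    llm=llm,
    embed_model=embeddings,
)

# After querying, inspect what was actually retrieved:
response = new_query_engine.query("What is Databricks")
print(response.source_nodes)  # non-empty here points the blame at the LLM
```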

@run-llama run-llama locked and limited conversation to collaborators Oct 24, 2023
@Disiok Disiok converted this issue into discussion #8421 Oct 24, 2023

