[Question]: How to add previous Chat context when querying an index? #7296

Closed
niels-bosman opened this issue Aug 17, 2023 · 6 comments
Labels
question Further information is requested

Comments

@niels-bosman

Question Validation

  • I have searched both the documentation and Discord for an answer.

Question

I am looking for a way to send the context of my chat application to LlamaIndex via the .query() function, but it seems like it only accepts a string:

streaming_response = index.as_query_engine(
    streaming=True, similarity_top_k=1
).query(query_text)

I would like to know how I can make LlamaIndex understand the context of my chats, like I can do with OpenAI:

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

Any tips on how I can achieve this using the LlamaIndex API? Thanks.

@niels-bosman niels-bosman added the question Further information is requested label Aug 17, 2023
@dosubot

dosubot bot commented Aug 17, 2023

🤖

Hello,

To send the context of your chat application to LlamaIndex via the .query() function, you can use the chat() method of the ContextChatEngine class. This method takes a message and an optional chat history as arguments, generates a context from the message using a retriever, sets the context in the system prompt, and then uses an LLM to generate a response.

Here is a simplified example of how you can use it:

from llama_index.chat_engine.context import ContextChatEngine
from llama_index.llms import OpenAI
from llama_index.llms.base import ChatMessage
from llama_index.memory import ChatMemoryBuffer

# Initialize your components. Note that the original placeholders
# (BaseRetriever(), LLM(), BaseMemory()) are abstract classes and cannot be
# instantiated directly -- use concrete instances instead, for example a
# retriever built from your index and an OpenAI LLM:
retriever = index.as_retriever()  # replace with your actual retriever
llm = OpenAI()  # replace with your actual LLM
memory = ChatMemoryBuffer.from_defaults()  # replace with your actual memory
prefix_messages = [ChatMessage(role="system", content="Your system prompt here")]  # replace with your actual prefix messages

# Initialize the chat engine
chat_engine = ContextChatEngine(
    retriever=retriever,
    llm=llm,
    memory=memory,
    prefix_messages=prefix_messages,
)

# Use the chat method
message = "Your message here"
chat_history = [ChatMessage(role="user", content="Previous message")]  # replace with your actual chat history
response = chat_engine.chat(message, chat_history)

print(response)

Please replace the placeholders with your actual components. The chat() method will return an AgentChatResponse object, which contains the response from the LLM and the sources of the response.
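For illustration, a minimal sketch of reading that object (the response and source_nodes field names are those of AgentChatResponse in LlamaIndex at the time; treat them as an assumption to verify against your version):

print(response.response)  # the answer text generated by the LLM
for node_with_score in response.source_nodes:  # retrieved source chunks, if any
    print(node_with_score.node.get_text())  # text of each supporting chunk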

You can find more details about the ContextChatEngine class in the LlamaIndex codebase.

I hope this helps! If you have any other questions, feel free to ask.


@mcmoochi

I have the same question. While this method works to inject previous chat history, it seems to forgo a key function of the query engine, which is to constrain the response to the knowledge base provided. ContextChatEngine does not seem to be limited to the index (at least I don't know how to limit it).

@logan-markewich
Collaborator

@mcmoochi the context chat engine CAN be constrained, you just have to write a system prompt that encourages that behaviour (see the sketch after this list). Tbh the system prompt is powerful if written well.

Other options include:

  • condense question chat engine (will query the index every time, but can lead to some "forced" logic/conversation)
  • an agent with an index as a tool. Since the agent decides when to invoke the query engine, this is maybe less controlled, but...
  • an agent with forced tool calling -- you can set up an agent and force it to use a tool every time. Again, this can feel forced.
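For illustration, a minimal sketch of the first option: a context chat engine constrained via its system prompt. This assumes an existing index; chat_mode="context" and the system_prompt keyword follow the LlamaIndex chat engine usage pattern, so verify them against your version:

# a sketch, assuming `index` is an existing VectorStoreIndex
chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt=(
        "You are a support assistant for [My company]. Answer ONLY from the "
        "provided context. If the context does not cover the question, reply "
        "exactly: 'I don't know.'"
    ),
)
response = chat_engine.chat("How do I cook a potato?")
print(response)  # a well-behaved engine should decline, since the context is unrelated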

@mcmoochi

Thanks very much for the response. I have tried the system prompt; it doesn't seem to be working for me for some reason.

For example, I'm passing this system prompt: "Don't answer if the question is not directly related to [My company]." and using "how to cook a potato" as a control question. My index only consists of terms and conditions of products, yet currently the bot will give a detailed response on how to cook a potato...

Condense question seems to do the same thing, not sure if I'm doing something wrong. I'll try the agent with forced tool calling. Any help would be greatly appreciated!

Here is my code:

custom_prompt = """
    Act friendly and charming. \
    Say 'I don't know' if the question is not directly related to [my company]. \
    Make your responses short and concise. \
    If the question is not specific, ask for more information. \
    Use Australian English spelling and grammar. \
"""

def construct_index(directory_path):
    num_outputs = 512
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.7, model_name="text-davinci-003", max_tokens=num_outputs))
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
    docs = SimpleDirectoryReader(directory_path).load_data()
    # the original passed `custom_ft_prompt`, which is undefined -- presumably `custom_prompt` was meant
    index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context, summary_template=custom_prompt)
    index.storage_context.persist(persist_dir='index.json')
    return index

def respond(message, chat_history):  # moved to top level; it was unreachable after `return` in construct_index
    storage_context = StorageContext.from_defaults(persist_dir="index.json")
    index = load_index_from_storage(storage_context)
    chat_engine = index.as_chat_engine(
        chat_mode="condense_question", memory=memory)  # `memory` must be defined elsewhere, e.g. a ChatMemoryBuffer
    response = chat_engine.chat(message)
    # append to message history
    bot_message = str(response)
    chat_history.append((message, bot_message))
    return "", chat_history

@niels-bosman
Author

@logan-markewich When using the chat engine instead of the query engine, don’t you lose the ability to also use the data as context? It also does not look like the chat engine supports streaming like the query engine does with streaming=True.

@logan-markewich
Collaborator

@niels-bosman nope, you still get the data as context. Each chat engine uses the index under the hood in slightly different ways.

Also, with chat engines, streaming is enabled by using specific endpoints -- chat_engine.stream_chat() and chat_engine.astream_chat().

Then, you can iterate over the response.
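For illustration, a minimal streaming sketch (assuming chat_engine was built as above; response_gen is the token generator the streaming response exposes, so verify it against your version):

streaming_response = chat_engine.stream_chat("What do the terms and conditions say about refunds?")
for token in streaming_response.response_gen:  # tokens arrive as they are generated
    print(token, end="", flush=True)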

Going to link some docs pages that are relevant, but I think it's safe to close this for now. Feel free to reach out on Discord!

https://gpt-index.readthedocs.io/en/latest/core_modules/query_modules/chat_engines/usage_pattern.html

https://gpt-index.readthedocs.io/en/latest/examples/agent/openai_agent.html#streaming-chat
