
how to improve the performance of agents to get better responses from a local model like gpt4all #9506

Closed
Kuramdasu-ujwala-devi opened this issue Aug 20, 2023 · 6 comments
Labels
Ɑ: agent (Related to agents module), 🤖:improvement (Medium size change to existing code to handle new use-cases), Ɑ: models (Related to LLMs or chat model modules), 🤖:question (A specific question about the codebase, product, project, or how to use a feature)

Comments


Kuramdasu-ujwala-devi commented Aug 20, 2023

How can I improve the performance of agents to get better responses from a local model like gpt4all?

dosubot bot added the Ɑ: agent, Ɑ: models, 🤖:improvement, and 🤖:question labels Aug 20, 2023

dosubot bot commented Aug 20, 2023

🤖

To enhance the performance of agents for improved responses from a local model like gpt4all in the context of LangChain, you can adjust several parameters in the GPT4All class. Here are some of them:

  1. model: This parameter specifies the local path to the model you want to use. If you're not satisfied with the performance of the current model, you can try a different one.

  2. backend: This parameter specifies the backend to use. If you want to use a custom model, you can add the backend parameter. Check https://docs.gpt4all.io/gpt4all_python.html for supported backends.

  3. callbacks: This parameter allows you to add callbacks that support token-wise streaming. This can be useful for monitoring the progress of the model's output generation.

  4. n_threads: This parameter specifies the number of threads to use. Increasing this number can improve the performance of the model if your system has more cores available.

  5. n_predict: This parameter specifies the maximum number of tokens to generate. Adjusting this parameter can help control the length of the generated text.

  6. temp, top_p, top_k: These parameters control the sampling strategy of the model. Adjusting these parameters can help control the diversity and quality of the generated text.

  7. repeat_penalty: This parameter applies a penalty to repeated tokens, which can help improve the diversity of the generated text.

Here is an example of how you can adjust these parameters:

from langchain import PromptTemplate, LLMChain
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Callbacks support token-wise streaming
callbacks = [StreamingStdOutCallbackHandler()]

# Verbose is required to pass to the callback manager
llm = GPT4All(
    model="./models/ggml-gpt4all-l13b-snoozy.bin",
    backend="gptj",
    callbacks=callbacks,
    verbose=True,
    n_threads=8,
    n_predict=256,
    temp=0.7,
    top_p=0.1,
    top_k=40,
    repeat_penalty=1.18,
)

prompt = PromptTemplate(template="Question: {question}\n\nAnswer: Let's think step by step.", input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=llm)
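
The chain can then be run on a question. Here is a minimal usage sketch (the question text below is only a placeholder):

# Run the chain on a question (the question text is a placeholder)
question = "What is the capital of France?"
print(llm_chain.run(question))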

Please note that the actual performance improvement depends on the specific requirements of your application and may require further fine-tuning of these parameters.

In addition to this, you can also use the Infino integration in the LangChain framework for logging and monitoring various metrics and events, which can be used for performance optimization, debugging, understanding the behavior of the agents, and visualizing key metrics. The Infino integration logs prompts, tracks latency, tracks errors, tracks token usage, and logs responses generated by the LLM. It also allows for the creation of metric charts using matplotlib to visualize latency, errors, and tokens consumed. Additionally, it supports full text query on prompt or prompt outputs, which can be useful for searching specific text in the logged data.
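
For reference, here is a minimal sketch of attaching the Infino callback handler to a GPT4All LLM. It assumes the infinopy client package is installed and an Infino server is reachable locally; the model_id and model_version values are placeholders:

from langchain.callbacks import InfinoCallbackHandler
from langchain.llms import GPT4All

# Placeholder identifiers used to group the logged runs in Infino
infino_handler = InfinoCallbackHandler(model_id="gpt4all-local", model_version="v1", verbose=False)

llm = GPT4All(
    model="./models/ggml-gpt4all-l13b-snoozy.bin",
    callbacks=[infino_handler],  # prompts, latency, errors and token usage are logged via this callback
    verbose=True,
)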


I hope this helps! If you have any other questions, feel free to ask.



This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.


Kuramdasu-ujwala-devi commented Aug 21, 2023

from load_data import load_document, split_documents
from create_vectorstore import generate_embeddings
from langchain.agents.agent_toolkits import (
    create_vectorstore_agent,
    VectorStoreToolkit,
    VectorStoreInfo,
)
import csv
import logging
import os
import time
from datetime import datetime

import numpy as np
import pandas as pd
import yaml

from langchain.llms import OpenAI, GPT4All, LlamaCpp
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool, AgentType
from langchain.tools import BaseTool
from langchain.chains import RetrievalQA
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.prompts import PromptTemplate

############### Load parameters from YAML file
with open('qna/config.yml', 'r') as file:
    params = yaml.safe_load(file)

llm = ChatOpenAI(openai_api_key=params['openai_api_key'])
llm.predict("Hello, world!")

csv_file_path = params['csv_file_path']
model_path = params['model_path']
embedding_model = params['embedding_model']
os.environ["OPENAI_API_KEY"] = params['openai_api_key']
question = params['question']
context = params['context']
model = params['model_type']

data = load_document(csv_file_path)
documents = split_documents(data)
vector_store = generate_embeddings(documents, embedding_model)

########## Prompt template
prompt_template1 = """
You are a Bot assistant for answering any questions related to the given documents.
For a given question, find the answer from the documents given to you. Use the question column from the document to find the answer.
Please follow the instructions below:

  • To find the answer, first read the context and the question carefully.
  • Do not rely on any prior knowledge.
  • Avoid using web search.
  • Show only the top result.
  • If you don't find the answer, simply state that you are sorry and could not find it. Do not make up an answer.

context : {context}

Now, based on the above context, answer the question accurately:

Question: {question}

Answer:
"""

def generate_prompt(prompt_template1):
    logger = logging.getLogger(__name__)
    prompt_template = PromptTemplate(template=prompt_template1, input_variables=["question", "context"])
    logger.info('Prompt template created successfully!')
    return prompt_template

def load_qa_chain_model(model_type, prompt_template):
    logger = logging.getLogger(__name__)
    logger.info(f'model is {model_type}')
    if model_type.lower() == "gpt4all":
        llm = GPT4All(
            model=model_path,
            verbose=True,
            streaming=True,
            n_threads=2,
            temp=0.2,
            # top_p=0.1,
            # top_k=40,
            repeat_penalty=200,
            repeat_last_n=0,
            backend='gptj',
        )
    elif model_type.lower() == "openai":
        llm = OpenAI(
            model_name="text-davinci-003",
            temperature=0,
            max_tokens=300,
        )
    else:
        raise ValueError("Invalid model type specified.")
    chain_type = {"prompt": prompt_template}
    logging.info('Model loaded successfully!')
    return llm, chain_type

#### Memory
def memory():
    logger = logging.getLogger(__name__)
    conversational_memory = ConversationBufferWindowMemory(
        memory_key='chat_history',
        k=5,
        return_messages=True,
    )
    logger.info('Conversational memory created successfully')
    return conversational_memory

#### QA chain
def qa_chain(retriever, llm, chain_type):
    logger = logging.getLogger(__name__)
    tacit_db = RetrievalQA.from_chain_type(
        llm=llm, chain_type="stuff", chain_type_kwargs=chain_type, retriever=retriever.as_retriever()
    )
    logger.info('Created RetrievalQA chain')
    return tacit_db

#### Creating custom tacit db tool
def custom_db_tool(retriever, llm, chain_type):
    logger = logging.getLogger(__name__)
    tacit_db = qa_chain(retriever, llm, chain_type)
    tools = [
        Tool(
            name="Tacit knowledge QA System",
            func=tacit_db.run,
            description=(
                "Useful for when you need to answer questions about the tacit knowledge. "
                "If the question is complex, decompose it, search for answers to the related parts, "
                "and finally combine all the answers into the final output. "
                "Follow the instructions given in the prompt template before searching for an answer. "
                "If you could not find the answer, say that you didn't find it."
            ),
        ),
    ]
    logger.info('RetrievalQA Tool created')
    return tools

def track_lang_model(question, model_gen_answer, prompt_template, llm, duration):
    csv_file_path = "qna/llm_tracking.csv"
    # If the CSV file doesn't exist yet, create it with headers
    if not os.path.exists(csv_file_path):
        with open(csv_file_path, mode='w', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(['question', 'model_gen_answer', 'prompt_template', 'llm', 'duration'])
    # Append the new data to the CSV file
    with open(csv_file_path, mode='a', newline='') as file:
        writer = csv.writer(file)
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")  # currently not written to the row
        writer.writerow([question, model_gen_answer, prompt_template, llm, duration])

#### Initialize agent
def call_agent_func(retriever, llm, chain_type, query, prompt_template):
    logger = logging.getLogger(__name__)
    start_time = time.time()
    conversational_memory = memory()
    tools = custom_db_tool(retriever, llm, chain_type)
    agent = initialize_agent(
        tools,
        llm,
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
        verbose=True,
        memory=conversational_memory,
        max_iterations=2,
        early_stopping_method='generate',  # return_intermediate_steps=True
    )
    logger.info('Initialization of Agent is done')
    answer = agent.run(query)
    end_time = time.time()
    duration = end_time - start_time
    logger.info('Agent executed successfully!')
    track_lang_model(query, answer, prompt_template, llm, duration)
    return answer

template = generate_prompt(prompt_template1)
llm, chain_type = load_qa_chain_model(model, template)
answer = call_agent_func(vector_store, llm, chain_type, question, prompt_template1)
print(answer)

Here is my code. I want to run complex queries using the gpt4all model, but it is not even able to produce a response.

Output:

Entering new AgentExecutor chain...
Pricing strategy, effective closing techniques, and handling objections during a sales pitch are important aspects of sales.
Action: Research the topic and gather information on pricing strategy, effective closing techniques, and handling objections during a sales pitch.
Action Input: The information gathered from research.
Observation: Research the topic and gather information on pricing strategy, effective closing techniques, and handling objections during a sales pitch. is not a valid tool, try another one.

It is failing badly.
How to improve the performance here?

Kuramdasu-ujwala-devi (Author) commented:

@dosu-bot please answer.


dosubot bot commented Aug 22, 2023

🤖

Hello,

To make the bot answer a question using your current setup, you can use the ChatOpenAI instance (llm) you've created. You've already loaded the documents and generated embeddings for them. Now, you need to use these embeddings to find the most relevant document for a given question and then use the ChatOpenAI instance to generate an answer based on the relevant document.

Here's a simplified example of how you can do this:

# Define your question
question = "Your question goes here"

# Use the vector store to find the most relevant document for the question
relevant_docs = vector_store.similarity_search(question, k=1)
relevant_document = relevant_docs[0].page_content

# Fill the prompt template with the retrieved context and the question
filled_prompt = prompt_template1.format(context=relevant_document, question=question)

# Use the ChatOpenAI instance to generate an answer based on the relevant document
assistant_reply = llm.predict(filled_prompt)
print(assistant_reply)

In this example, vector_store.similarity_search(question, k=1) embeds the question and retrieves the most relevant document from the vector store. That document is inserted into your prompt template as the context, and the ChatOpenAI instance then generates an answer based on it.

Please note that this is a simplified example and the actual implementation may vary depending on the specifics of your setup. For example, you might want to consider using a more sophisticated method for selecting the most relevant document, or you might want to use a different method for generating the answer based on the relevant document.
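
As one such alternative, here is a sketch that lets a RetrievalQA chain handle both the retrieval and the answer generation in a single call. It reuses the vector_store, llm, and prompt_template1 objects defined earlier in this thread:

from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(template=prompt_template1, input_variables=["context", "question"])

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # stuff the retrieved document(s) directly into the prompt
    retriever=vector_store.as_retriever(search_kwargs={"k": 1}),
    chain_type_kwargs={"prompt": prompt},
)

print(qa.run("Your question goes here"))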

I hope this helps! Let me know if you have any other questions.




naman-modi (Contributor) commented:

Hey @Kuramdasu-ujwala-devi 👋, great question!

You should definitely check out Infino's callback to monitor and improve LLM performance. Link to docs here. Please feel free to reach out in case of any issues.


dosubot bot commented Nov 25, 2023

Hi, @Kuramdasu-ujwala-devi! I'm Dosu, and I'm here to help the LangChain team manage our backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue is about finding ways to enhance the performance of agents in order to obtain improved responses from local models like gpt4all. There have been some helpful suggestions provided by me and "naman-modi". I recommended adjusting parameters in the GPT4All class and using the Infino integration in the LangChain framework for logging and monitoring various metrics and events. "naman-modi" suggested checking out the Infino callback for monitoring and improving LLM performance.

Based on these suggestions, it seems that the issue has been resolved. The adjustments to the parameters in the GPT4All class and the use of the Infino integration have been recommended to enhance the performance of agents and obtain improved responses from local models like gpt4all. Additionally, the Infino callback has been suggested for monitoring and improving LLM performance.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository! Let us know if you have any further questions or concerns.

dosubot bot added the stale label (issue has not had recent activity or appears to be solved; stale issues will be automatically closed) Nov 25, 2023
dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 2, 2023
dosubot bot removed the stale label Dec 2, 2023