# WebResearchRetriever

Given a query, this retriever will: 

* Formulate a set of related Google searches
* Run each search
* Load all the resulting URLs
* Embed the consolidated page content and run a similarity search against it with the query
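
The four steps above can be sketched end to end without any network access. This is a toy illustration, not the retriever's actual implementation: the search index, page contents, `example.com` URLs, and the bag-of-words "embedding" are all hypothetical stand-ins for the LLM, Google Search, the URL loader, and a real embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for a search API and an HTML loader.
FAKE_SEARCH_INDEX = {
    "what are llm agents": ["https://example.com/agents"],
    "how do llm agents plan": [
        "https://example.com/planning",
        "https://example.com/agents",
    ],
}
FAKE_PAGES = {
    "https://example.com/agents": "LLM agents use a language model as the core controller.",
    "https://example.com/planning": "Planning breaks a complex task into smaller steps.",
}


def generate_queries(question):
    # Step 1: an LLM would write related searches; here they are fixed.
    return list(FAKE_SEARCH_INDEX)


def search(query):
    # Step 2: return the URLs found for one search query.
    return FAKE_SEARCH_INDEX.get(query, [])


def load(urls):
    # Step 3: load each distinct URL once, preserving first-seen order.
    return {url: FAKE_PAGES[url] for url in dict.fromkeys(urls)}


def embed(text):
    # Toy embedding: lowercase word counts (a real run would use an embedding model).
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, k=1):
    # Step 4: rank the consolidated pages by similarity to the original question.
    urls = [url for query in generate_queries(question) for url in search(query)]
    pages = load(urls)
    q_vec = embed(question)
    ranked = sorted(
        pages.items(), key=lambda item: cosine(q_vec, embed(item[1])), reverse=True
    )
    return ranked[:k]


top = retrieve("How does planning split a complex task?")
print(top[0][0])
```

Note that the similarity search uses the original question, not the generated search queries; the generated queries only decide which pages get loaded.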

In [2]:
from langchain.callbacks.manager import CallbackManager
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

## Run

Pass the desired model and vectorstore to the retriever, along with the Google search credentials.

In [3]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models.openai import ChatOpenAI
# Set input
llm = ChatOpenAI(temperature=0)
vectorstore = Chroma(embedding_function=OpenAIEmbeddings(), persist_directory="./chroma_db_oai")
GOOGLE_CSE_ID = "xxx"
GOOGLE_API_KEY = "xxx"
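
Rather than typing the credentials into a cell, one option is to read them from environment variables so they never end up in a saved notebook. The helper below is a minimal sketch; `read_google_credentials` is a hypothetical name, and it assumes the variables are called `GOOGLE_CSE_ID` and `GOOGLE_API_KEY`, matching the names used in this notebook.

```python
import os


def read_google_credentials(env=None):
    """Return (cse_id, api_key) from a mapping, defaulting to os.environ.

    Raises KeyError listing every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    names = ("GOOGLE_CSE_ID", "GOOGLE_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return env["GOOGLE_CSE_ID"], env["GOOGLE_API_KEY"]


# Demo with an explicit mapping; a real session would call read_google_credentials().
cse_id, api_key = read_google_credentials(
    {"GOOGLE_CSE_ID": "xxx", "GOOGLE_API_KEY": "xxx"}
)
```

In a real session, `GOOGLE_CSE_ID, GOOGLE_API_KEY = read_google_credentials()` would replace the two hard-coded assignments.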

In [4]:
# Initialize
web_research_retriever = WebResearchRetriever(
    vectorstore=vectorstore, 
    llm=llm, 
    GOOGLE_CSE_ID=GOOGLE_CSE_ID, 
    GOOGLE_API_KEY=GOOGLE_API_KEY
)

In [5]:
# Run
import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.web_research").setLevel(logging.INFO)
user_input = "How do LLM Powered Autonomous Agents work?"
docs = web_research_retriever.get_relevant_documents(user_input)

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'How do LLM Powered Autonomous Agents work?', 'text': LineList(lines=['1. What is the definition of LLM Powered Autonomous Agents?\n', '2. What are the key features of LLM Powered Autonomous Agents?\n', '3. How do LLM Powered Autonomous Agents use machine learning algorithms?\n', '4. What are some real-world applications of LLM Powered Autonomous Agents?\n'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. What is the definition of LLM Powered Autonomous Agents?\n', '2. What are the key features of LLM Powered Autonomous Agents?\n', '3. How do LLM Powered Autonomous Agents use machine learning algorithms?\n', '4. What are some real-world applications of LLM Powered Autonomous Agents?\n']
INFO:langchain.retrievers.web_research:Searching for relevat urls ...
INFO:langchain.retrievers.web_research
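
The generated questions in the log above still carry their list numbers and trailing newlines (`'1. What is ...?\n'`). The log does not show whether the retriever strips these before searching, so the helper below is only a hypothetical sketch of how such lines could be turned into plain search strings; `clean_question` is not a name from the library.

```python
import re


def clean_question(line):
    """Strip a leading list number such as '1. ' or '2) ' and surrounding whitespace."""
    return re.sub(r"^\s*\d+[.)]\s*", "", line).strip()


raw = [
    "1. What is the definition of LLM Powered Autonomous Agents?\n",
    "2. What are the key features of LLM Powered Autonomous Agents?\n",
]
queries = [clean_question(line) for line in raw]
print(queries)
```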

In [6]:
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type="stuff")
output = chain({"input_documents": docs, "question": user_input}, return_only_outputs=True)
output['output_text']

"LLM-powered autonomous agents work by using a large language model (LLM) as their core controller. These agents have several key components that complement the LLM:\n\n1. Planning: Complex tasks are broken down into simpler steps through task decomposition. This can be done by prompting the LLM, providing task-specific instructions, or using human input. Self-reflection techniques help the agents learn from experience and improve their reasoning.\n\n2. Memory: Autonomous agents have different types of memory, including working memory and long-term memory. They can use contextual embeddings to understand the user's intent and context by incorporating entire conversation histories. This allows them to respond based on collective knowledge gained throughout the interaction with a user.\n\n3. Tool Use: Autonomous agents have the capacity to use tools such as browsing the internet, accessing live data, or running code. They can define goals and tasks, identify the right actions to take, an

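## Run locally

The same flow can run with a local model: `LlamaCpp` as the LLM and `GPT4AllEmbeddings` for the vectorstore.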

In [7]:
from langchain.llms import LlamaCpp
from langchain.vectorstores import Chroma
from langchain.embeddings import GPT4AllEmbeddings

n_gpu_layers = 1  # With Metal, setting this to 1 is enough.
n_batch = 512  # Should be between 1 and n_ctx; consider the amount of RAM on your Apple Silicon chip.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llama = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=4096,  # Context window
    max_tokens=1000,  # Max tokens to generate
    f16_kv=True,  # MUST be set to True; otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
)
vectorstore_llama = Chroma(embedding_function=GPT4AllEmbeddings(), persist_directory="./chroma_db_llama")
GOOGLE_CSE_ID = "xxx"
GOOGLE_API_KEY = "xxx"

llama.cpp: loading model from /Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9132.71 MB (+ 1608.00 MB per state)
llama_new_context_with_model: kv self size  = 3200.00 MB
ggml_metal_init: allocating


Found model file at  /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin
llama_new_context_with_model: max tensor size =    87.89 MB


ggml_metal_init: using MPS
ggml_metal_init: loading '/Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/llama_cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                            0x2996f3910
ggml_metal_init: loaded kernel_mul                            0x2996f4e40
ggml_metal_init: loaded kernel_mul_row                        0x2996f5660
ggml_metal_init: loaded kernel_scale                          0x2996f5cf0
ggml_metal_init: loaded kernel_silu                           0x2996f6460
ggml_metal_init: loaded kernel_relu                           0x2996f3c60
ggml_metal_init: loaded kernel_gelu                           0x2996f3ec0
ggml_metal_init: loaded kernel_soft_max                       0x2996f77e0
ggml_metal_init: loaded kernel_diag_mask_inf                  0x2996f7c90
ggml_metal_init: loaded kernel_get_rows_f16                   0x2996f85b0
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x2996f8de0
ggml_metal_init: loaded kernel_get_rows_q4_1

In [8]:
# Initialize WebResearchRetriever
web_research_retriever = WebResearchRetriever(
    vectorstore=vectorstore_llama, 
    llm=llama, 
    GOOGLE_CSE_ID=GOOGLE_CSE_ID, 
    GOOGLE_API_KEY=GOOGLE_API_KEY
)

In [9]:
import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.web_research").setLevel(logging.INFO)
user_input = "How do LLM Powered Autonomous Agents work?"
docs = web_research_retriever.get_relevant_documents(user_input)

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...


  Sure! Based on the user input search query "How do LLM Powered Autonomous Agents work?", here are five search queries that could help answer their question:

1. What are LLM Powered Autonomous Agents and how do they differ from traditional AI agents?
2. How do LLM Powered Autonomous Agents learn and improve their decision-making abilities over time?
3. Can you provide examples of real-world applications of LLM Powered Autonomous Agents, such as self-driving cars or personal assistants?
4. How does the choice of LLM (Long Short-Term Memory) algorithm affect the performance and capabilities of an LLM Powered Autonomous Agent?
5. What are some common challenges and limitations of LLM Powered Autonomous Agents, such as dealing with unexpected events or handling conflicting goals?


llama_print_timings:        load time =  7308.57 ms
llama_print_timings:      sample time =   136.15 ms /   194 runs   (    0.70 ms per token,  1424.94 tokens per second)
llama_print_timings: prompt eval time =  7308.44 ms /    99 tokens (   73.82 ms per token,    13.55 tokens per second)
llama_print_timings:        eval time =  6384.79 ms /   193 runs   (   33.08 ms per token,    30.23 tokens per second)
llama_print_timings:       total time = 14072.87 ms
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'How do LLM Powered Autonomous Agents work?', 'text': LineList(lines=['1. What are LLM Powered Autonomous Agents and how do they differ from traditional AI agents?\n', '2. How do LLM Powered Autonomous Agents learn and improve their decision-making abilities over time?\n', '3. Can you provide examples of real-world applications of LLM Powered Autonomous Agents, such as self-driving cars or personal assistants?\n', '4. How does the choice of LLM (L

In [10]:
len(docs)

6

In [11]:
from langchain import PromptTemplate
from langchain.chains.question_answering import load_qa_chain

# Prompt
template = """<<SYS>> \n You are a QA assistant. Use the following pieces of context to answer the 
question at the end. Keep the answer as concise as possible. \n <</SYS>> \n\n  [INST] Context: 
{context} \n
Question: {question} [/INST]"""
QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

chain = load_qa_chain(llama, chain_type="stuff", prompt=QA_CHAIN_PROMPT)
output = chain({"input_documents": docs, "question": user_input}, return_only_outputs=True)
output['output_text']

Llama.generate: prefix-match hit


  LLM Powered Autonomous Agents use a combination of planning, memory, and tool use to perform tasks. The agent breaks down large tasks into smaller subgoals, reflects on past actions, and learns from mistakes. It also uses external tools to access proprietary information sources and more. The agent utilizes short-term and long-term memory to retain and recall information over extended periods.


llama_print_timings:        load time =  7308.57 ms
llama_print_timings:      sample time =    63.39 ms /    85 runs   (    0.75 ms per token,  1340.88 tokens per second)
llama_print_timings: prompt eval time = 103653.11 ms /  2117 tokens (   48.96 ms per token,    20.42 tokens per second)
llama_print_timings:        eval time =  3880.31 ms /    84 runs   (   46.19 ms per token,    21.65 tokens per second)
llama_print_timings:       total time = 107710.45 ms


'  LLM Powered Autonomous Agents use a combination of planning, memory, and tool use to perform tasks. The agent breaks down large tasks into smaller subgoals, reflects on past actions, and learns from mistakes. It also uses external tools to access proprietary information sources and more. The agent utilizes short-term and long-term memory to retain and recall information over extended periods.'
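
To make the `chain_type="stuff"` step more concrete: the idea is that all retrieved documents are concatenated into a single `{context}` string and substituted into the prompt together with the question. The sketch below imitates that with plain string formatting. It is not the chain's actual code; the `"\n\n"` separator is an assumption, the template is a lightly simplified copy of the one above, and the two sample passages are made up.

```python
# Simplified copy of the Llama 2 chat-style template used in the cell above.
TEMPLATE = (
    "<<SYS>> You are a QA assistant. Use the following pieces of context to answer "
    "the question at the end. Keep the answer as concise as possible. <</SYS>>\n\n"
    "[INST] Context:\n{context}\n\nQuestion: {question} [/INST]"
)


def stuff_prompt(page_contents, question, separator="\n\n"):
    """Join every document's text into one context block and fill the template."""
    context = separator.join(page_contents)
    return TEMPLATE.format(context=context, question=question)


# Made-up passages standing in for the page_content of the retrieved docs.
prompt = stuff_prompt(
    ["Agents plan tasks.", "Agents use memory."],
    "How do agents work?",
)
print(prompt)
```

Because every document lands in one prompt, the prompt grows with the number and size of the retrieved chunks, which is consistent with the 2117-token prompt evaluation reported in the timings above.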