# WebResearchRetriever

Given a query, this retriever will:

* Formulate a set of related Google searches
* Run each search
* Load all the resulting URLs
* Embed the consolidated page content and run a similarity search against it with the query
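
The four steps above can be sketched without any external services. This is a minimal illustration under loud assumptions, not the retriever's actual implementation: `bag_of_words`, `cosine`, `web_research`, `FAKE_SEARCH`, and `FAKE_PAGES` are all hypothetical names, word counts stand in for a real embedding model, and the chunking and vectorstore steps are skipped.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real setup would use an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def web_research(
    query: str,
    generate_queries: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    load: Callable[[str], str],
    k: int = 1,
) -> List[str]:
    # Steps 1 and 2: formulate related searches and run each one, de-duplicating URLs.
    urls: List[str] = []
    for related in generate_queries(query):
        for url in search(related):
            if url not in urls:
                urls.append(url)
    # Step 3: load every resulting URL.
    pages = [load(url) for url in urls]
    # Step 4: rank the loaded pages by similarity to the original query.
    query_vec = bag_of_words(query)
    ranked = sorted(pages, key=lambda p: cosine(query_vec, bag_of_words(p)), reverse=True)
    return ranked[:k]


# Hypothetical in-memory stand-ins for the LLM, the search API, and the page loader.
FAKE_SEARCH = {
    "what are llm agents": ["https://example.com/agents"],
    "how do agents plan": ["https://example.com/agents", "https://example.com/planning"],
}
FAKE_PAGES = {
    "https://example.com/agents": "LLM powered autonomous agents use a language model as controller",
    "https://example.com/planning": "Gardening tips for spring planting",
}

top = web_research(
    "How do LLM powered autonomous agents work?",
    generate_queries=lambda q: list(FAKE_SEARCH),
    search=lambda q: FAKE_SEARCH.get(q, []),
    load=FAKE_PAGES.__getitem__,
)
```

Swapping the three callables for an LLM, a search wrapper, and a page loader gives the same overall shape as the cells that follow.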

In [10]:
from langchain.callbacks.manager import CallbackManager
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

`Simple usage`

Specify the LLM to use for search query generation, and the retriever will do the rest.

In [25]:
import os

from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models.openai import ChatOpenAI
from langchain.utilities import GoogleSearchAPIWrapper

# Vectorstore
vectorstore = Chroma(
    embedding_function=OpenAIEmbeddings(), persist_directory="./chroma_db_oai"
)

# LLM
llm = ChatOpenAI(temperature=0)

# Search 
os.environ["GOOGLE_CSE_ID"] = "xxx"
os.environ["GOOGLE_API_KEY"] = "xxx"
search = GoogleSearchAPIWrapper()

In [23]:
# Initialize
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore,
    llm=llm, 
    search=search, 
)

# Run
user_input = "How do LLM Powered Autonomous Agents work?"
docs = web_research_retriever.get_relevant_documents(user_input)

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'How do LLM Powered Autonomous Agents work?', 'text': LineList(lines=['1. What is the definition of LLM Powered Autonomous Agents?\n', '2. What are the key features of LLM Powered Autonomous Agents?\n', '3. How do LLM Powered Autonomous Agents differ from traditional autonomous agents?\n', '4. What are the applications of LLM Powered Autonomous Agents?\n'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. What is the definition of LLM Powered Autonomous Agents?\n', '2. What are the key features of LLM Powered Autonomous Agents?\n', '3. How do LLM Powered Autonomous Agents differ from traditional autonomous agents?\n', '4. What are the applications of LLM Powered Autonomous Agents?\n']
INFO:langchain.retrievers.web_research:Searching for relevat urls ...
INFO:langchain.retrievers.web_research:Sea

In [24]:
len(docs)

5

`Added flexibility`

Pass an LLM chain with a custom prompt and output parser.

In [16]:
import os
import re
from typing import List
from langchain.chains import LLMChain
from pydantic import BaseModel, Field
from langchain.prompts import PromptTemplate
from langchain.output_parsers.pydantic import PydanticOutputParser

# LLMChain
search_prompt = PromptTemplate(
    input_variables=["question"],
    template="""<<SYS>> \n You are a web research assistant to help users
    answer questions. Answer using a numeric list. Do not include any extra
    text. \n <</SYS>> \n\n [INST] Given a user input search query, 
    generate a numbered list of five search queries to run to help answer their 
    question: \n\n {question} [/INST]""",
)

class LineList(BaseModel):
    """List of questions."""

    lines: List[str] = Field(description="Questions")

class QuestionListOutputParser(PydanticOutputParser):
    """Output parser for a list of numbered questions."""

    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = re.findall(r"\d+\..*?\n", text)
        return LineList(lines=lines)
    
llm_chain = LLMChain(
    llm=llm,
    prompt=search_prompt,
    output_parser=QuestionListOutputParser(),
)
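
The regular expression in `parse` only matches numbered lines that end with a newline, so a final question without a trailing `\n` is silently dropped. The standalone sketch below shows that behavior using only the `re` module; `parse_numbered_lines` and the sample strings are hypothetical and exist just for this illustration.

```python
import re

# Same pattern as in QuestionListOutputParser.parse above.
NUMBERED_LINE = re.compile(r"\d+\..*?\n")


def parse_numbered_lines(text: str) -> list:
    """Return every 'N. ...' line that is terminated by a newline."""
    return NUMBERED_LINE.findall(text)


with_newline = "Sure!\n1. What is an LLM?\n2. How do agents plan?\n"
without_newline = "1. What is an LLM?\n2. How do agents plan?"

parsed_ok = parse_numbered_lines(with_newline)
# The last question is lost because it has no trailing newline.
parsed_short = parse_numbered_lines(without_newline)
# One possible workaround (an assumption, not part of the library): pad the text.
parsed_padded = parse_numbered_lines(without_newline + "\n")
```

Here `parsed_ok` and `parsed_padded` each hold two questions, while `parsed_short` holds only the first one.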

In [27]:
# Initialize
web_research_retriever_llm_chain = WebResearchRetriever(
    vectorstore=vectorstore,
    llm_chain=llm_chain, 
    search=search, 
)

# Run
docs = web_research_retriever_llm_chain.get_relevant_documents(user_input)

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
ERROR:langchain.callbacks.tracers.langchain:Failed to post https://api.langchain.plus/runs in LangSmith API. {"detail":"Internal server error"}
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'How do LLM Powered Autonomous Agents work?', 'text': LineList(lines=['1. What is the definition of LLM Powered Autonomous Agents?\n', '2. What are the key features of LLM Powered Autonomous Agents?\n', '3. How do LLM Powered Autonomous Agents use machine learning algorithms?\n', '4. What are the applications of LLM Powered Autonomous Agents?\n'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. What is the definition of LLM Powered Autonomous Agents?\n', '2. What are the key features of LLM Powered Autonomous Agents?\n', '3. How do LLM Powered Autonomous Agents use machine learning algorithms?\n', '4. What are the applications of LLM Powered Autonomous 

In [28]:
len(docs)

5

`Run locally`

Specify LLM and embeddings that will run locally.

In [29]:
from langchain.llms import LlamaCpp
from langchain.embeddings import GPT4AllEmbeddings

n_gpu_layers = 1  # With Metal, setting this to 1 is enough.
n_batch = 512  # Should be between 1 and n_ctx; consider the amount of RAM on your Apple Silicon chip.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llama = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=4096,  # Context window
    max_tokens=1000,  # Max tokens to generate
    f16_kv=True,  # MUST be set to True, otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
)

vectorstore_llama = Chroma(embedding_function=GPT4AllEmbeddings(),persist_directory="./chroma_db_llama")

llama.cpp: loading model from /Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9132.71 MB (+ 1608.00 MB per state)
llama_new_context_with_model: kv self size  = 3200.00 MB


Found model file at  /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin
llama_new_context_with_model: max tensor size =    87.89 MB


ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/llama_cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                            0x2cdaed120
ggml_metal_init: loaded kernel_mul                            0x2cdaee650
ggml_metal_init: loaded kernel_mul_row                        0x2cdaeede0
ggml_metal_init: loaded kernel_scale                          0x2cdaef460
ggml_metal_init: loaded kernel_silu                           0x2cdaefbe0
ggml_metal_init: loaded kernel_relu                           0x2cdaed470
ggml_metal_init: loaded kernel_gelu                           0x2cdaed6d0
ggml_metal_init: loaded kernel_soft_max                       0x2cdaf0f20
ggml_metal_init: loaded kernel_diag_mask_inf                  0x2cdaf13b0
ggml_metal_init: loaded kernel_get_rows_f16                   0x2cdaf1e70
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x2cdaf2540
ggml_metal_init:

In [30]:
# Initialize
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore_llama,
    llm=llama, 
    search=search, 
)

# Run
user_input = "How do LLM Powered Autonomous Agents work?"
docs = web_research_retriever.get_relevant_documents(user_input)

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...


  Sure! Based on the user input search query "How do LLM Powered Autonomous Agents work?", here are five search queries that could be used to help answer this question:

1. What is an LLM (Large Language Model) and how does it differ from other machine learning models?
2. How do autonomous agents use LLMs to make decisions and take actions?
3. Can you provide examples of real-world applications of LLM Powered Autonomous Agents, such as self-driving cars or virtual assistants?
4. What are some potential risks or limitations associated with using LLM Powered Autonomous Agents in various industries or contexts?
5. How do experts predict the future of LLM Powered Autonomous Agents will evolve as technology advances and becomes more integrated into our daily lives?
These search queries could help provide a comprehensive overview of how LLM Powered Autonomous Agents work, their potential applications, risks and limitations, as well as the future outlook of this technology.


llama_print_timings:        load time = 12546.80 ms
llama_print_timings:      sample time =   166.15 ms /   236 runs   (    0.70 ms per token,  1420.44 tokens per second)
llama_print_timings: prompt eval time = 12546.65 ms /    99 tokens (  126.73 ms per token,     7.89 tokens per second)
llama_print_timings:        eval time =  9499.58 ms /   235 runs   (   40.42 ms per token,    24.74 tokens per second)
llama_print_timings:       total time = 22535.38 ms
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'How do LLM Powered Autonomous Agents work?', 'text': LineList(lines=['1. What is an LLM (Large Language Model) and how does it differ from other machine learning models?\n', '2. How do autonomous agents use LLMs to make decisions and take actions?\n', '3. Can you provide examples of real-world applications of LLM Powered Autonomous Agents, such as self-driving cars or virtual assistants?\n', '4. What are some potential risks or limitations associ

In [31]:
len(docs)

10

In [32]:
# Generate answer
from langchain.chains.question_answering import load_qa_chain

# Prompt
template = """<<SYS>> \n You are a QA assistant. Use the following pieces of context to answer the 
question at the end. Keep the answer as concise as possible. \n <</SYS>> \n\n  [INST] Context: 
{context} \n
Question: {question} [/INST]"""
QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

chain = load_qa_chain(llama, chain_type="stuff", prompt=QA_CHAIN_PROMPT)
output = chain({"input_documents": docs, "question": user_input}, return_only_outputs=True)
output['output_text']

Llama.generate: prefix-match hit


  LLM powered autonomous agents use large language models as their core controller to perform various tasks. These agents consist of three components: planning, memory, and tools. The LLM is responsible for generating actions based on the current state of the agent and its goals. The planning component breaks down complex tasks into smaller ones, while the memory component stores knowledge gained throughout the interaction with a user. Finally, the tool use component allows the agent to fetch current information, access live data, or perform dynamic computations using external tools such as search engines or APIs.


llama_print_timings:        load time = 12546.80 ms
llama_print_timings:      sample time =    83.38 ms /   114 runs   (    0.73 ms per token,  1367.27 tokens per second)
llama_print_timings: prompt eval time = 176621.27 ms /  3027 tokens (   58.35 ms per token,    17.14 tokens per second)
llama_print_timings:        eval time =  6285.97 ms /   113 runs   (   55.63 ms per token,    17.98 tokens per second)
llama_print_timings:       total time = 183147.12 ms


'  LLM powered autonomous agents use large language models as their core controller to perform various tasks. These agents consist of three components: planning, memory, and tools. The LLM is responsible for generating actions based on the current state of the agent and its goals. The planning component breaks down complex tasks into smaller ones, while the memory component stores knowledge gained throughout the interaction with a user. Finally, the tool use component allows the agent to fetch current information, access live data, or perform dynamic computations using external tools such as search engines or APIs.'
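
The `stuff` chain type puts all retrieved documents into a single prompt, which is consistent with the 3027-token prompt evaluation reported in the timings above. A rough, library-free sketch of that idea follows; `Doc`, `TEMPLATE`, and `stuff_prompt` are hypothetical stand-ins, and the `"\n\n"` separator is an assumption about how the pieces are joined.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Simplified template with the same two variables as QA_CHAIN_PROMPT.
TEMPLATE = "Context:\n{context}\n\nQuestion: {question}"


def stuff_prompt(docs: List[Doc], question: str, separator: str = "\n\n") -> str:
    """Join every document's text into one context string and fill the template."""
    context = separator.join(d.page_content for d in docs)
    return TEMPLATE.format(context=context, question=question)


example_prompt = stuff_prompt(
    [Doc("Agents plan tasks."), Doc("Agents use tools.")],
    "How do agents work?",
)
```

Because the prompt grows with every document, the number and size of retrieved chunks need to fit within the model's context window (`n_ctx=4096` in the configuration above).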