# WebResearchRetriever

Given a query, this retriever will:

* Formulate a set of related Google searches
* Search for each
* Load all the resulting URLs
* Embed the consolidated page content and run a similarity search against the query
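The four steps above can be sketched in plain Python. This is a toy illustration, not the retriever's implementation: `generate_queries`, `fake_search`, `fake_load`, the in-memory `PAGES` dictionary, and the bag-of-words `similarity` are hypothetical stand-ins for the LLM, the Google search wrapper, the page loader, and the embedding model.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical in-memory "web": URL -> page text.
PAGES: Dict[str, str] = {
    "https://example.com/agents": "agents use planning memory and tools",
    "https://example.com/cooking": "boil pasta in salted water",
}


def generate_queries(question: str) -> List[str]:
    """Stand-in for the LLM step: return related search queries."""
    return [question, f"what is {question}"]


def fake_search(query: str) -> List[str]:
    """Stand-in for the search step: return URLs whose text shares a word with the query."""
    words = set(query.lower().split())
    return [url for url, text in PAGES.items() if words & set(text.split())]


def fake_load(urls: List[str]) -> Dict[str, str]:
    """Stand-in for the loading step: fetch the text of each URL."""
    return {url: PAGES[url] for url in urls}


def similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts (a crude substitute for embeddings)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def research(question: str, k: int = 1) -> List[str]:
    """Run all four steps and return the k URLs most similar to the question."""
    urls = sorted({u for q in generate_queries(question) for u in fake_search(q)})
    docs = fake_load(urls)
    ranked = sorted(docs, key=lambda u: similarity(question, docs[u]), reverse=True)
    return ranked[:k]


print(research("planning agents"))
```

The real retriever replaces each stand-in with a network call or a model, but the data flow (queries, then URLs, then page text, then ranked results) follows the same order as the list.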

In [8]:
from langchain.retrievers.web_research import WebResearchRetriever

### Simple usage

Specify the LLM to use for Google search query generation.

In [10]:
import os
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models.openai import ChatOpenAI
from langchain.utilities import GoogleSearchAPIWrapper

# Vectorstore
vectorstore = Chroma(
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db_oai",
)

# LLM
llm = ChatOpenAI(temperature=0)

# Search 
os.environ["GOOGLE_CSE_ID"] = "xxx"
os.environ["GOOGLE_API_KEY"] = "xxx"
search = GoogleSearchAPIWrapper()
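The placeholder values above need to be replaced with real credentials. A small guard can fail early when one is missing. This is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical, while `GOOGLE_CSE_ID` and `GOOGLE_API_KEY` are the variables set above.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


required = ["GOOGLE_CSE_ID", "GOOGLE_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print(f"Set these environment variables before creating the search wrapper: {missing}")
```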

In [11]:
# Initialize
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore,
    llm=llm, 
    search=search, 
)

`Run with citations`

We can use `RetrievalQAWithSourcesChain` to retrieve docs and provide citations.

In [5]:
from langchain.chains import RetrievalQAWithSourcesChain

user_input = "How do LLM Powered Autonomous Agents work?"
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=web_research_retriever)
result = qa_chain({"question": user_input})
result

Fetching pages: 100%|######################################################################################################################################################################################################| 1/1 [00:00<00:00,  3.60it/s]


{'question': 'How do LLM Powered Autonomous Agents work?',
 'answer': 'LLM Powered Autonomous Agents work by utilizing a large language model (LLM) as the core controller of the agent. The agent is complemented by several key components, including planning, memory, and tool use. In terms of planning, the agent breaks down tasks into smaller subgoals and can reflect on past actions to improve future results. The memory component includes both short-term and long-term memory, allowing the agent to learn in-context and retain and recall information over extended periods. Tool use involves the agent calling external APIs for additional information. There are also challenges associated with LLM-powered autonomous agents, such as finite context length, long-term planning and task decomposition, and the reliability of natural language interfaces. \n\n',
 'sources': '\n- https://lilianweng.github.io/posts/2023-06-23-agent/'}
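In the output above, `sources` is a single string rather than a list. If you need the individual URLs, one option is to split it yourself. This is a minimal sketch, assuming sources are separated by newlines or commas and may carry a leading `- ` bullet as shown; the helper name `parse_sources` is hypothetical, and URLs that themselves contain commas would need different handling.

```python
from typing import List


def parse_sources(sources: str) -> List[str]:
    """Split a sources string into a list of cleaned, de-duplicated URLs."""
    urls: List[str] = []
    for chunk in sources.replace(",", "\n").splitlines():
        # Drop surrounding whitespace and any leading "-" bullet marker.
        url = chunk.strip().lstrip("-").strip()
        if url and url not in urls:
            urls.append(url)
    return urls


print(parse_sources("\n- https://lilianweng.github.io/posts/2023-06-23-agent/"))
```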

`Run with logging`

Here, we use the `get_relevant_documents` method to return docs.

In [16]:
# Run
import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.web_research").setLevel(logging.INFO)
user_input = "What is Task Decomposition in LLM Powered Autonomous Agents?"
docs = web_research_retriever.get_relevant_documents(user_input)

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'What is Task Decomposition in LLM Powered Autonomous Agents?', 'text': LineList(lines=['1. How do LLM powered autonomous agents utilize task decomposition?\n', '2. Can you explain the concept of task decomposition in LLM powered autonomous agents?\n', '3. What role does task decomposition play in the functioning of LLM powered autonomous agents?\n', '4. Why is task decomposition important for LLM powered autonomous agents?\n'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. How do LLM powered autonomous agents utilize task decomposition?\n', '2. Can you explain the concept of task decomposition in LLM powered autonomous agents?\n', '3. What role does task decomposition play in the functioning of LLM powered autonomous agents?\n', '4. Why is task decomposition important for LLM powered autonom

`Look at the URLs loaded`

In [17]:
web_research_retriever.get_urls()

['https://lilianweng.github.io/posts/2023-06-23-agent/']

`Generate answer using retrieved docs`

We can use `load_qa_chain` for QA using the retrieved docs.

In [19]:
from langchain.chains.question_answering import load_qa_chain

chain = load_qa_chain(llm, chain_type="stuff")
output = chain({"input_documents": docs, "question": user_input}, return_only_outputs=True)
output['output_text']

'Task decomposition in LLM-powered autonomous agents refers to the process of breaking down a complex task into smaller, more manageable subgoals. This allows the agent to efficiently handle and solve complex tasks by tackling them step by step. By decomposing a task, the agent can plan ahead and determine the sequence of actions required to achieve the overall goal. Task decomposition is an important component of planning in LLM-powered agents.'
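With `chain_type="stuff"`, all retrieved documents are placed into a single prompt, so the combined text has to fit in the model's context window. Here is a rough sketch of the idea in plain Python; the prompt wording and the `build_stuff_prompt` helper are hypothetical and do not reproduce LangChain's actual template.

```python
from typing import List


def build_stuff_prompt(doc_texts: List[str], question: str) -> str:
    """Concatenate every document into one context block followed by the question."""
    context = "\n\n".join(text.strip() for text in doc_texts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    ["Agents break tasks into subgoals.", "Memory can be short-term or long-term."],
    "What is task decomposition?",
)
print(prompt)
```

Because the prompt grows with every document, this approach suits a handful of short pages better than a large crawl.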

### More flexibility

Pass an `LLMChain` with a custom prompt and output parser.

In [16]:
import os
import re
from typing import List
from langchain.chains import LLMChain
from pydantic import BaseModel, Field
from langchain.prompts import PromptTemplate
from langchain.output_parsers.pydantic import PydanticOutputParser

# LLMChain
search_prompt = PromptTemplate(
    input_variables=["question"],
    template="""You are an assistant tasked with improving Google search 
    results. Generate FIVE Google search queries that are similar to
    this question. The output should be a numbered list of questions and each
    should have a question mark at the end: {question}""",
)

class LineList(BaseModel):
    """List of questions."""

    lines: List[str] = Field(description="Questions")

class QuestionListOutputParser(PydanticOutputParser):
    """Output parser for a list of numbered questions."""

    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

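    def parse(self, text: str) -> LineList:
        # Match numbered items such as "1. ...\n". Each match must end with a
        # newline, so a final item without a trailing newline is not captured.
        lines = re.findall(r"\d+\..*?\n", text)
        return LineList(lines=lines)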
    
llm_chain = LLMChain(
    llm=llm,
    prompt=search_prompt,
    output_parser=QuestionListOutputParser(),
)
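The `parse` pattern above only matches a numbered item that ends in a newline, so a final item without a trailing `\n` is dropped. The sketch below shows that behavior with the standard `re` module alone, next to a variant pattern (an assumption, not part of the original parser) that also accepts the end of the text. The helper `numbered_lines` and the sample string are hypothetical.

```python
import re
from typing import List

ORIGINAL_PATTERN = r"\d+\..*?\n"        # pattern used in parse() above
VARIANT_PATTERN = r"\d+\..*?(?:\n|$)"   # hypothetical variant: newline OR end of text


def numbered_lines(text: str, pattern: str) -> List[str]:
    """Return every numbered item that `pattern` finds in `text`."""
    return re.findall(pattern, text)


sample = "1. What is an agent?\n2. How do agents plan?"  # no trailing newline

print(numbered_lines(sample, ORIGINAL_PATTERN))  # only the first item is matched
print(numbered_lines(sample, VARIANT_PATTERN))   # both items are matched
```

If the model reliably ends its output with a newline, the two patterns return the same items.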

In [27]:
# Initialize
web_research_retriever_llm_chain = WebResearchRetriever(
    vectorstore=vectorstore,
    llm_chain=llm_chain, 
    search=search, 
)

# Run
docs = web_research_retriever_llm_chain.get_relevant_documents(user_input)

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
ERROR:langchain.callbacks.tracers.langchain:Failed to post https://api.langchain.plus/runs in LangSmith API. {"detail":"Internal server error"}
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'How do LLM Powered Autonomous Agents work?', 'text': LineList(lines=['1. What is the definition of LLM Powered Autonomous Agents?\n', '2. What are the key features of LLM Powered Autonomous Agents?\n', '3. How do LLM Powered Autonomous Agents use machine learning algorithms?\n', '4. What are the applications of LLM Powered Autonomous Agents?\n'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. What is the definition of LLM Powered Autonomous Agents?\n', '2. What are the key features of LLM Powered Autonomous Agents?\n', '3. How do LLM Powered Autonomous Agents use machine learning algorithms?\n', '4. What are the applications of LLM Powered Autonomous 

In [28]:
len(docs)

5

### Run locally

Specify an LLM and embeddings that will run locally (e.g., on your laptop).

In [6]:
from langchain.llms import LlamaCpp
from langchain.embeddings import GPT4AllEmbeddings
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

n_gpu_layers = 1  # With Metal, setting this to 1 is enough.
n_batch = 512  # Should be between 1 and n_ctx; consider how much RAM your Apple Silicon chip has.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llama = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=4096,  # Context window
    max_tokens=1000,  # Max tokens to generate
    f16_kv=True,  # Must be True; otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
)

vectorstore_llama = Chroma(
    embedding_function=GPT4AllEmbeddings(),
    persist_directory="./chroma_db_llama",
)

llama.cpp: loading model from /Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9132.71 MB (+ 1608.00 MB per state)
llama_new_context_with_model: kv self size  = 3200.00 MB


Found model file at  /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin
llama_new_context_with_model: max tensor size =    87.89 MB


ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/llama_cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                            0x2c147aa30
ggml_metal_init: loaded kernel_mul                            0x2c147bfd0
ggml_metal_init: loaded kernel_mul_row                        0x2c147c820
ggml_metal_init: loaded kernel_scale                          0x16c293d10
ggml_metal_init: loaded kernel_silu                           0x16c294770
ggml_metal_init: loaded kernel_relu                           0x16c2954f0
ggml_metal_init: loaded kernel_gelu                           0x16c295b90
ggml_metal_init: loaded kernel_soft_max                       0x16c296210
ggml_metal_init: loaded kernel_diag_mask_inf                  0x16c296960
ggml_metal_init: loaded kernel_get_rows_f16                   0x16c2970e0
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x16c297810
ggml_metal_init:
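The comment on `n_batch` in the cell above says it should be between 1 and `n_ctx`. A small check can catch a bad value before the model is loaded. This is a minimal sketch; the helper name `check_batch_size` is hypothetical and is not part of `LlamaCpp`.

```python
def check_batch_size(n_batch: int, n_ctx: int) -> int:
    """Return n_batch unchanged if 1 <= n_batch <= n_ctx, otherwise raise ValueError."""
    if not 1 <= n_batch <= n_ctx:
        raise ValueError(f"n_batch must be between 1 and n_ctx={n_ctx}, got {n_batch}")
    return n_batch


n_batch = check_batch_size(512, n_ctx=4096)  # the values used in the cell above
print(n_batch)
```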

We supplied `StreamingStdOutCallbackHandler()`, so model outputs are streamed to stdout.

In [7]:
# Initialize
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore_llama,
    llm=llama, 
    search=search, 
)

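# Run
# Note: `llm` is the ChatOpenAI model defined earlier, so search-query generation and
# embeddings run locally while the answer itself is generated by ChatOpenAI.
# Pass `llama` instead to generate the answer locally as well.
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=web_research_retriever)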
result = qa_chain({"question": user_input})
result

Using LlamaCpp
  Sure, here are five Google search queries that are similar to "How do LLM Powered Autonomous Agents work?":

1. What are the key components of an LLM-powered autonomous agent?
2. How do LLMs enable autonomous agents to make decisions?
3. Can you explain the training process for an LLM-powered autonomous agent?
4. What are some real-world applications of LLM-powered autonomous agents?
5. How do LLM-powered autonomous agents handle unexpected events or situations?


llama_print_timings:        load time =  8929.09 ms
llama_print_timings:      sample time =    88.29 ms /   125 runs   (    0.71 ms per token,  1415.76 tokens per second)
llama_print_timings: prompt eval time =  8928.89 ms /    97 tokens (   92.05 ms per token,    10.86 tokens per second)
llama_print_timings:        eval time =  8130.48 ms /   124 runs   (   65.57 ms per token,    15.25 tokens per second)
llama_print_timings:       total time = 17310.78 ms
Fetching pages: 100%|######################################################################################################################################################################################################| 1/1 [00:00<00:00,  3.90it/s]


{'question': 'How do LLM Powered Autonomous Agents work?',
 'answer': 'LLM Powered Autonomous Agents work by utilizing a large language model (LLM) as the core controller of the agent. The agent system consists of several key components, including planning, memory, and tool use. In terms of planning, the agent breaks down complex tasks into smaller subgoals and can reflect on past actions to improve future results. The memory component includes both short-term and long-term memory, allowing the agent to learn in-context and retain and recall information over extended periods. The tool use component involves the agent calling external APIs for additional information. There are also case studies, such as scientific discovery agents and generative agents simulations, that demonstrate the capabilities of LLM-powered autonomous agents. However, there are challenges, such as the limited context length, difficulties in long-term planning and task decomposition, and the reliability of natural 