# WebResearchRetriever

Given a query, this retriever will:

* Formulate a set of related Google searches
* Run each search
* Load all the resulting URLs
* Embed the consolidated page content and run a similarity search against the query
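
The four steps above can be sketched in plain Python. This is a toy illustration, not LangChain's implementation: `web_research`, `bag_of_words`, `cosine`, and `FAKE_WEB` are hypothetical names, the search and page loader are stubbed with an in-memory dictionary, and a word-count vector stands in for a real embedding. The actual retriever stores page content in the vector store you pass in, whereas this sketch simply ranks whole pages.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def web_research(
    query: str,
    generate_queries: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    load: Callable[[str], str],
    k: int = 2,
) -> List[Dict[str, str]]:
    # 1. Formulate a set of related searches.
    searches = generate_queries(query)
    # 2. Run each search, collecting URLs without duplicates (order preserved).
    urls: List[str] = []
    for s in searches:
        for url in search(s):
            if url not in urls:
                urls.append(url)
    # 3. Load every resulting URL.
    pages = [{"source": url, "text": load(url)} for url in urls]
    # 4. Rank the loaded pages by similarity to the original query.
    q_vec = bag_of_words(query)
    pages.sort(key=lambda p: cosine(q_vec, bag_of_words(p["text"])), reverse=True)
    return pages[:k]


# Stubbed "web": two pages, only one of which is about the query.
FAKE_WEB = {
    "https://example.com/agents": "Task decomposition splits a task into subtasks for agents.",
    "https://example.com/cooking": "A recipe for tomato soup.",
}

top = web_research(
    "What is task decomposition?",
    generate_queries=lambda q: [q, q.lower()],  # pretend the LLM returned two searches
    search=lambda s: list(FAKE_WEB),            # every search returns both URLs
    load=FAKE_WEB.__getitem__,
    k=1,
)
print(top[0]["source"])
```

Passing the query generator, search, and loader as callables mirrors how the retriever is assembled from an LLM, a search wrapper, and a vector store, and it keeps the sketch runnable without network access.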

In [2]:
from langchain.retrievers.web_research import WebResearchRetriever

### Simple usage

Specify the LLM to use for Google search query generation.

In [6]:
import os
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models.openai import ChatOpenAI
from langchain.utilities import GoogleSearchAPIWrapper

# Vectorstore
vectorstore = Chroma(embedding_function=OpenAIEmbeddings(), persist_directory="./chroma_db_oai")

# LLM
llm = ChatOpenAI(temperature=0)

# Search 
os.environ["GOOGLE_CSE_ID"] = "xxx"
os.environ["GOOGLE_API_KEY"] = "xxx"
search = GoogleSearchAPIWrapper()

In [7]:
# Initialize
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore,
    llm=llm, 
    search=search, 
)

`Run with citations`

We can use `RetrievalQAWithSourcesChain` to retrieve docs and provide citations.

In [8]:
from langchain.chains import RetrievalQAWithSourcesChain
user_input = "How do LLM Powered Autonomous Agents work?"
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=web_research_retriever)
result = qa_chain({"question": user_input})
result

Fetching pages: 100%|##########| 3/3 [00:01<00:00,  1.79it/s]


{'question': 'How do LLM Powered Autonomous Agents work?',
 'answer': 'LLM Powered Autonomous Agents work by utilizing a large language model (LLM) as the core controller of the agent. The agent is complemented by several key components, including planning, memory, and tool use. The planning component involves task decomposition and self-reflection. The memory component includes short-term and long-term memory, which allows the agent to learn and retain information. The tool use component enables the agent to call external APIs for additional information. There are also case studies and challenges associated with building LLM-powered autonomous agents. \n\n',
 'sources': '\n- https://lilianweng.github.io/posts/2023-06-23-agent/'}
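
The `sources` value is a free-form string produced by the model, so its layout can vary: in this notebook it appears once as `'\n- https://...'` and once as a bare URL. A minimal sketch of normalizing it into a list follows. `split_sources` is a hypothetical helper, not part of LangChain, and it assumes sources are separated by newlines or commas.

```python
import re
from typing import List


def split_sources(sources: str) -> List[str]:
    """Split a free-form `sources` string into a clean list of entries."""
    parts = re.split(r"[\n,]+", sources)
    # Drop surrounding whitespace and any leading "-" bullet marker.
    cleaned = [p.strip().lstrip("-").strip() for p in parts]
    return [p for p in cleaned if p]


print(split_sources("\n- https://lilianweng.github.io/posts/2023-06-23-agent/"))
print(split_sources("https://lilianweng.github.io/posts/2023-06-23-agent/"))
```

Both calls return the same single-element list, so downstream code can treat `result["sources"]` uniformly.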

`Run with logging`

Here, we use the `get_relevant_documents` method to return docs, with `INFO`-level logging enabled for the retriever.

In [9]:
# Run
import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.web_research").setLevel(logging.INFO)
user_input = "What is Task Decomposition in LLM Powered Autonomous Agents?"
docs = web_research_retriever.get_relevant_documents(user_input)

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'What is Task Decomposition in LLM Powered Autonomous Agents?', 'text': LineList(lines=['1. How do LLM powered autonomous agents use task decomposition?\n', '2. Why is task decomposition important for LLM powered autonomous agents?\n', '3. Can you explain the concept of task decomposition in LLM powered autonomous agents?\n', '4. What are the benefits of task decomposition in LLM powered autonomous agents?\n'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. How do LLM powered autonomous agents use task decomposition?\n', '2. Why is task decomposition important for LLM powered autonomous agents?\n', '3. Can you explain the concept of task decomposition in LLM powered autonomous agents?\n', '4. What are the benefits of task decomposition in LLM powered autonomous agents?\n']
INFO:langchain.retri
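
The logging setup in the cell above relies on standard-library behavior: `logging.basicConfig()` attaches a handler to the root logger, and lowering one named logger to `INFO` lets that logger's messages through while others stay at the default `WARNING` threshold. The sketch below shows the same mechanism in isolation. The logger name `demo.retriever` is hypothetical, and output is captured in memory rather than printed to stderr so it can be inspected.

```python
import io
import logging

# Capture log output in memory, using the same layout as basicConfig's default format.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

demo = logging.getLogger("demo.retriever")  # hypothetical logger name
demo.addHandler(handler)
demo.propagate = False  # keep the demo isolated from any root handlers

demo.setLevel(logging.WARNING)
demo.info("hidden")  # below the threshold, so nothing is emitted

demo.setLevel(logging.INFO)
demo.info("Generating questions ...")  # now emitted

print(stream.getvalue())
```

Only the second message reaches the handler, which is why the retriever's `INFO` lines appear once its logger level is set to `logging.INFO`.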

`Look at the URLs loaded`

In [10]:
web_research_retriever.get_urls()

['https://lilianweng.github.io/posts/2023-06-23-agent/']

`Generate answer using retrieved docs`

We can use `load_qa_chain` for QA over the retrieved docs.

In [11]:
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type="stuff")
output = chain({"input_documents": docs, "question": user_input}, return_only_outputs=True)
output['output_text']

'Task decomposition in LLM-powered autonomous agents refers to the process of breaking down complex tasks into smaller, more manageable subtasks. This allows the agent to efficiently handle and solve complex problems by dividing them into smaller steps. Task decomposition can be done using various techniques, such as prompting the LLM with specific instructions or using human inputs. The goal is to transform a large task into multiple smaller tasks that can be easily understood and executed by the agent.'

### More flexibility

Pass an `LLMChain` with a custom prompt and output parser.

In [12]:
import os
import re
from typing import List
from langchain.chains import LLMChain
from pydantic import BaseModel, Field
from langchain.prompts import PromptTemplate
from langchain.output_parsers.pydantic import PydanticOutputParser

# LLMChain
search_prompt = PromptTemplate(
    input_variables=["question"],
    template="""You are an assistant tasked with improving Google search 
    results. Generate FIVE Google search queries that are similar to
    this question. The output should be a numbered list of questions and each
    should have a question mark at the end: {question}""",
)

class LineList(BaseModel):
    """List of questions."""

    lines: List[str] = Field(description="Questions")

class QuestionListOutputParser(PydanticOutputParser):
    """Output parser for a list of numbered questions."""

    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = re.findall(r"\d+\..*?\n", text)
        return LineList(lines=lines)
    
llm_chain = LLMChain(
    llm=llm,
    prompt=search_prompt,
    output_parser=QuestionListOutputParser(),
)
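
One detail of the `parse` method is worth illustrating: the pattern `\d+\..*?\n` only matches a numbered line that ends with a newline, so a final item without a trailing newline is dropped. That may be why the log for this section lists four questions although the prompt asks for five, but this is a guess and not confirmed by the source. The sketch below compares that pattern with a more lenient variant that also accepts end-of-string. The variant and the numbering cleanup are illustrative assumptions, not the library's code.

```python
import re

# Model output with no newline after the last item.
text = "1. First question?\n2. Second question?\n3. Third question?"

strict = re.findall(r"\d+\..*?\n", text)          # pattern used in the parser above
lenient = re.findall(r"\d+\..*?(?:\n|$)", text)   # also accepts end-of-string

print(len(strict))   # 2
print(len(lenient))  # 3

# Optional cleanup: remove the "N." prefix and trailing newline before searching.
cleaned = [re.sub(r"^\d+\.\s*", "", line).strip() for line in lenient]
print(cleaned)
```

Swapping the lenient pattern into `QuestionListOutputParser.parse` would be one way to keep the last question; stripping the numbering is a separate choice that depends on how the queries are used downstream.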

In [13]:
# Initialize
web_research_retriever_llm_chain = WebResearchRetriever(
    vectorstore=vectorstore,
    llm_chain=llm_chain, 
    search=search, 
)

# Run
docs = web_research_retriever_llm_chain.get_relevant_documents(user_input)

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'What is Task Decomposition in LLM Powered Autonomous Agents?', 'text': LineList(lines=['1. How do LLM powered autonomous agents utilize task decomposition?\n', '2. Can you explain the concept of task decomposition in LLM powered autonomous agents?\n', '3. What role does task decomposition play in the functioning of LLM powered autonomous agents?\n', '4. Why is task decomposition important for LLM powered autonomous agents?\n'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. How do LLM powered autonomous agents utilize task decomposition?\n', '2. Can you explain the concept of task decomposition in LLM powered autonomous agents?\n', '3. What role does task decomposition play in the functioning of LLM powered autonomous agents?\n', '4. Why is task decomposition important for LLM powered autonom

In [14]:
len(docs)

3

### Run locally

Specify an LLM and embeddings that will run locally (e.g., on your laptop).

In [15]:
from langchain.llms import LlamaCpp
from langchain.embeddings import GPT4AllEmbeddings
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

n_gpu_layers = 1  # With Metal, setting this to 1 is enough.
n_batch = 512  # Should be between 1 and n_ctx; consider how much RAM your Apple Silicon chip has.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llama = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=4096,  # Context window
    max_tokens=1000,  # Max tokens to generate
    f16_kv=True,  # Must be True; otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
)

vectorstore_llama = Chroma(embedding_function=GPT4AllEmbeddings(), persist_directory="./chroma_db_llama")

llama.cpp: loading model from /Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9132.71 MB (+ 1608.00 MB per state)
llama_new_context_with_model: kv self size  = 3200.00 MB


Found model file at  /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin
llama_new_context_with_model: max tensor size =    87.89 MB


ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/llama_cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                            0x105bdb800
ggml_metal_init: loaded kernel_mul                            0x105bdcec0
ggml_metal_init: loaded kernel_mul_row                        0x105bde260
ggml_metal_init: loaded kernel_scale                          0x105bdd120
ggml_metal_init: loaded kernel_silu                           0x105bdd380
ggml_metal_init: loaded kernel_relu                           0x105bdf760
ggml_metal_init: loaded kernel_gelu                           0x2cc9deb10
ggml_metal_init: loaded kernel_soft_max                       0x105bdf9c0
ggml_metal_init: loaded kernel_diag_mask_inf                  0x105bdff80
ggml_metal_init: loaded kernel_get_rows_f16                   0x105be0620
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x105be0d20
ggml_metal_init:

We supplied `StreamingStdOutCallbackHandler()`, so model outputs (e.g., generated questions) are streamed. 

We also have logging on, so we see them there too.

In [16]:
# Initialize
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore_llama,
    llm=llama, 
    search=search, 
)

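# Run
# Note: the QA chain below still uses the OpenAI `llm` defined earlier; only search-query
# generation and embeddings use the local models. Pass `llama` instead to answer locally too.
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=web_research_retriever)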
result = qa_chain({"question": user_input})
result

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...


  Sure, here are five Google search queries that are similar to "What is Task Decomposition in LLM Powered Autonomous Agents?"

1. How does Task Decomposition work in LLM Powered Autonomous Agents?

2. What are the benefits of using Task Decomposition in LLM Powered Autonomous Agents?

3. Can you explain the process of Task Decomposition in LLM Powered Autonomous Agents with examples?

4. How does Task Decomposition improve the performance of LLM Powered Autonomous Agents?

5. What are some common tasks that can be decomposed using Task Decomposition in LLM Powered Autonomous Agents?


llama_print_timings:        load time = 16109.97 ms
llama_print_timings:      sample time =   113.70 ms /   160 runs   (    0.71 ms per token,  1407.25 tokens per second)
llama_print_timings: prompt eval time = 16109.88 ms /   101 tokens (  159.50 ms per token,     6.27 tokens per second)
llama_print_timings:        eval time = 10335.34 ms /   159 runs   (   65.00 ms per token,    15.38 tokens per second)
llama_print_timings:       total time = 26759.93 ms
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'What is Task Decomposition in LLM Powered Autonomous Agents?', 'text': LineList(lines=['1. How does Task Decomposition work in LLM Powered Autonomous Agents?\n', '2. What are the benefits of using Task Decomposition in LLM Powered Autonomous Agents?\n', '3. Can you explain the process of Task Decomposition in LLM Powered Autonomous Agents with examples?\n', '4. How does Task Decomposition improve the performance of LLM Powered Autonomous Agents?\

{'question': 'What is Task Decomposition in LLM Powered Autonomous Agents?',
 'answer': 'Task Decomposition in LLM Powered Autonomous Agents refers to the process of breaking down large tasks into smaller, manageable subgoals. This allows the agent to efficiently handle complex tasks. Task decomposition is one of the components of the planning phase in LLM-powered autonomous agents. \n',
 'sources': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}