## 06 Web Research Retriever
### Example

In [1]:
!pip install -q -U langchain openai chromadb tiktoken


In [5]:
%pip install -q -U "google-api-python-client>=2.100.0"

Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"  # placeholder: never commit a real key
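Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming the key is either already exported in the shell or typed in at runtime, is to read it from the environment and fall back to a prompt that does not echo the input. `ensure_env` is a hypothetical helper, not part of LangChain.

```python
import os
from getpass import getpass


def ensure_env(name: str) -> str:
    """Return the environment variable `name`, prompting for it if unset.

    getpass() does not echo the input, so the key never appears in the
    notebook output. The value is written back to os.environ so that
    libraries which read it from the environment can find it.
    """
    value = os.environ.get(name)
    if not value:
        value = getpass(f"Enter {name}: ")
        os.environ[name] = value
    return value


# Usage (hypothetical): call once per key before creating any clients.
# ensure_env("OPENAI_API_KEY")
# ensure_env("GOOGLE_API_KEY")
# ensure_env("GOOGLE_CSE_ID")
```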

In [3]:
from langchain.retrievers.web_research import WebResearchRetriever

from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models.openai import ChatOpenAI
from langchain.utilities import GoogleSearchAPIWrapper

In [6]:

# Vectorstore
vectorstore = Chroma(embedding_function=OpenAIEmbeddings(), persist_directory="./chroma_db_oai")

# LLM
llm = ChatOpenAI(temperature=0)

# Create a Programmable Search Engine and copy its ID: https://programmablesearchengine.google.com/controlpanel/all
os.environ["GOOGLE_CSE_ID"] = "<your-cse-id>"
# Create an API key for the Custom Search JSON API: https://developers.google.com/custom-search/v1/introduction
os.environ["GOOGLE_API_KEY"] = "<your-google-api-key>"
search = GoogleSearchAPIWrapper()

In [7]:
search.run("What is vitamin?")


"Sep 18, 2023 ... Total vitamin D intakes were three times higher with supplement use than with diet alone; the mean intake from foods and beverages alone for\xa0... Dec 16, 2020 ... A vitamin is an organic compound, which means that it contains carbon. It is also an essential nutrient that the body may need to get from food. Nov 8, 2022 ... Vitamin D helps maintain strong bones. Learn how much you need, good sources, deficiency symptoms, and health effects here. Jan 19, 2023 ... Vitamins are a group of substances that are needed for normal cell function, growth, and development. There are 13 essential vitamins. This\xa0... The two main forms of vitamin A in the human diet are preformed vitamin A (retinol, retinyl esters), and provitamin A carotenoids such as alpha-carotene and\xa0... Aug 2, 2022 ... Vitamin D deficiency is a common vitamin deficiency that causes issues with your bones and muscles. It most commonly affects people over the\xa0... Vitamin A, along with other vitamins, mi
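`search.run` returns the top snippets joined into one string, and the snippets still contain non-breaking spaces (`\xa0`) before the ellipses. If that string is printed or passed on to an LLM, a small normalization step can help. This sketch uses Unicode NFKC normalization, which maps U+00A0 to an ordinary space; `clean_snippets` is a hypothetical helper name.

```python
import re
import unicodedata


def clean_snippets(text: str) -> str:
    """Normalize a search-result string for display or prompting."""
    # NFKC replaces compatibility characters such as U+00A0 (no-break space)
    # with their plain equivalents (here, U+0020).
    text = unicodedata.normalize("NFKC", text)
    # Collapse any remaining runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(clean_snippets("for\xa0... Dec 16, 2020 ... A vitamin is an organic compound"))
```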

In [8]:
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore,
    llm=llm,
    search=search,
)
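The log messages in the later cells (generating questions, searching for relevant URLs, fetching pages, then embedding the text) suggest what `WebResearchRetriever` does on each call. The sketch below is a simplified, dependency-free approximation of that flow, not LangChain's implementation: the LLM, Google search, page loader, text splitter and Chroma store are replaced by plain callables and a toy keyword-overlap store, and all names (`web_research`, `KeywordStore`, `Doc`) are hypothetical. Skipping URLs that were already indexed (`seen_urls`) is an assumption about the real class.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Doc:
    """Stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class KeywordStore:
    """Toy vector store: ranks chunks by word overlap instead of embeddings."""

    def __init__(self) -> None:
        self.docs: List[Doc] = []

    def add(self, doc: Doc) -> None:
        self.docs.append(doc)

    def search(self, query: str, k: int = 2) -> List[Doc]:
        words = set(query.lower().split())

        def overlap(doc: Doc) -> int:
            return len(words & set(doc.page_content.lower().split()))

        # sorted() is stable, so ties keep insertion order.
        ranked = sorted(self.docs, key=overlap, reverse=True)
        return [d for d in ranked[:k] if overlap(d) > 0]


def web_research(
    question: str,
    generate_queries: Callable[[str], List[str]],
    search_urls: Callable[[str], List[str]],
    fetch_page: Callable[[str], str],
    split: Callable[[str], List[str]],
    store: KeywordStore,
    seen_urls: Set[str],
) -> List[Doc]:
    # 1. Ask the "LLM" for several search queries.
    queries = generate_queries(question)
    # 2. Collect result URLs, skipping ones that were already indexed.
    new_urls: List[str] = []
    for query in queries:
        for url in search_urls(query):
            if url not in seen_urls and url not in new_urls:
                new_urls.append(url)
    # 3. Fetch each new page, split it into chunks and index the chunks.
    for url in new_urls:
        for chunk in split(fetch_page(url)):
            store.add(Doc(chunk, {"source": url}))
        seen_urls.add(url)
    # 4. Retrieve chunks for every query and de-duplicate them.
    results: List[Doc] = []
    for query in queries:
        for doc in store.search(query):
            if doc not in results:
                results.append(doc)
    return results
```

Passing the components in as callables keeps each step swappable, which is also the point of the real retriever taking a vector store, an LLM and a search wrapper as separate arguments.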

In [9]:
!pip install -q -U html2text


In [10]:
from langchain.chains import RetrievalQAWithSourcesChain
user_input = "Who is the winner of FIFA world cup 2002?"
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=web_research_retriever)
result = qa_chain({"question": user_input})
result

Fetching pages: 100%|#########################################################################################################################################################| 2/2 [00:01<00:00,  1.40it/s]

{'question': 'Who is the winner of FIFA world cup 2002?',
 'answer': 'Brazil is the winner of the FIFA World Cup 2002.\n',
 'sources': 'https://en.wikipedia.org/wiki/2002_FIFA_World_Cup'}
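`RetrievalQAWithSourcesChain` returns `sources` as a single string produced by the LLM rather than as a list. When several pages are cited they usually come back comma-separated, so a small helper like the hypothetical `split_sources` below can turn the field into a list. Treat the comma convention as an assumption, since the exact format depends on the model's output.

```python
from typing import Dict, List


def split_sources(result: Dict[str, str]) -> List[str]:
    """Split the comma-separated `sources` field of a QA-with-sources result."""
    raw = result.get("sources", "")
    # Drop empty entries and surrounding whitespace.
    return [s.strip() for s in raw.split(",") if s.strip()]


example_result = {
    "question": "Who is the winner of FIFA world cup 2002?",
    "answer": "Brazil is the winner of the FIFA World Cup 2002.\n",
    "sources": "https://en.wikipedia.org/wiki/2002_FIFA_World_Cup",
}
print(split_sources(example_result))
```

A URL that itself contains a comma would be split incorrectly, so this is only suitable for quick inspection.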

In [11]:

import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.web_research").setLevel(logging.INFO)
user_input = "What is Task Decomposition in LLM Powered Autonomous Agents?"
docs = web_research_retriever.get_relevant_documents(user_input)

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'What is Task Decomposition in LLM Powered Autonomous Agents?', 'text': LineList(lines=['1. How does task decomposition work in LLM powered autonomous agents?\n', '2. What is the role of task decomposition in LLM powered autonomous agents?\n', '3. Can you explain the concept of task decomposition in LLM powered autonomous agents?'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. How does task decomposition work in LLM powered autonomous agents?\n', '2. What is the role of task decomposition in LLM powered autonomous agents?\n', '3. Can you explain the concept of task decomposition in LLM powered autonomous agents?']
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:S

In [12]:
import re
from typing import List
from langchain.chains import LLMChain
from pydantic import BaseModel, Field
from langchain.prompts import PromptTemplate
from langchain.output_parsers.pydantic import PydanticOutputParser

# LLMChain
search_prompt = PromptTemplate(
    input_variables=["question"],
    template="""You are an assistant tasked with improving Google search
    results. Generate 5 Google search queries that are similar to
    this question. The output should be a numbered list of questions and each
    should have a question mark at the end: {question}""",
)

class LineList(BaseModel):
    """List of questions."""

    lines: List[str] = Field(description="Questions")

class QuestionListOutputParser(PydanticOutputParser):
    """Output parser for a list of numbered questions."""

    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

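    def parse(self, text: str) -> LineList:
        # Match each numbered line; (?:\n|$) also keeps the last item when
        # the LLM output does not end with a newline.
        lines = re.findall(r"\d+\..*?(?:\n|$)", text)
        return LineList(lines=lines)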

llm_chain = LLMChain(llm=llm, prompt=search_prompt, output_parser=QuestionListOutputParser())
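The pattern `\d+\..*?\n` (the one originally used in `QuestionListOutputParser.parse`) only matches numbered lines that end in a newline. LLM output usually has no newline after its last item, so the final question is silently dropped, which most likely explains why the log below lists four queries even though the prompt asks for five. This standalone sketch (the function name is hypothetical) anchors on line boundaries instead, so the last item is kept, and it also strips the `N. ` prefix and trailing whitespace.

```python
import re
from typing import List

# With re.MULTILINE, ^ and $ match at every line boundary, so the last
# numbered line is matched even when the text has no trailing newline.
NUMBERED_LINE = re.compile(r"^\s*\d+\.\s*(.+?)\s*$", re.MULTILINE)


def parse_numbered_questions(text: str) -> List[str]:
    """Return the items of a numbered list without their "N. " prefixes."""
    return NUMBERED_LINE.findall(text)


llm_output = "1. How can I recycle plastics?\n2. Which plastics are recyclable?"
print(parse_numbered_questions(llm_output))
```

Whether to keep or strip the numbering is a design choice: the notebook's parser keeps it, so the queries sent to Google Search start with `1. `, `2. ` and so on.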

In [13]:
# Initialize
web_research_retriever_llm_chain = WebResearchRetriever(vectorstore=vectorstore, llm_chain=llm_chain, search=search)

# Run
docs = web_research_retriever_llm_chain.get_relevant_documents("What is the recommended way to recycle plastics?")

INFO:langchain.retrievers.web_research:Generating questions for Google Search ...
INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'What is the recommended way to recycle plastics?', 'text': LineList(lines=['1. How can I recycle plastics effectively?\n', '2. What are the best practices for recycling plastics?\n', '3. Which methods are recommended for recycling plastics?\n', '4. What is the most efficient way to recycle plastics?\n'])}
INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. How can I recycle plastics effectively?\n', '2. What are the best practices for recycling plastics?\n', '3. Which methods are recommended for recycling plastics?\n', '4. What is the most efficient way to recycle plastics?\n']
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:Searching for relevant urls...
INFO:langchain.retrievers.web_research:Search results: [{'title': '7 Tips to Recycle

In [14]:

docs

[Document(page_content='Effective recycling of mixed plastics waste is the next major challenge for\nthe plastics recycling sector. The advantage is the ability to recycle a\nlarger proportion of the plastic waste stream by expanding post-consumer\ncollection of plastic packaging to cover a wider variety of materials and pack\ntypes. Product design for recycling has strong potential to assist in such\nrecycling efforts. A study carried out in the UK found that the amount of\npackaging in a regular shopping basket that, even if collected, cannot be\neffectively recycled, ranged from 21 to 40% (Local Government Association (UK)\n2007). Hence, wider implementation of policies to promote the use of\nenvironmental design principles by industry could have a large impact on\nrecycling performance, increasing the proportion of packaging that can\neconomically be collected and diverted from landfill (see Shaxson _et al._\n2009). The same logic applies to durable consumer goods designing for\ndi
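Each returned chunk carries the page it came from in its metadata. Assuming the loader stores the URL under the `source` key (as LangChain's HTML loaders typically do), a quick way to see which pages contributed which chunks is to group them. `group_by_source` is a hypothetical helper; it works on any objects with `page_content` and `metadata` attributes.

```python
from typing import Dict, Iterable, List, Protocol


class HasContent(Protocol):
    page_content: str
    metadata: dict


def group_by_source(docs: Iterable[HasContent]) -> Dict[str, List[str]]:
    """Map each source URL to the list of chunk texts retrieved from it."""
    grouped: Dict[str, List[str]] = {}
    for doc in docs:
        source = doc.metadata.get("source", "<unknown>")
        grouped.setdefault(source, []).append(doc.page_content)
    return grouped


# Usage in the notebook (assumes `docs` from the cell above):
# for url, chunks in group_by_source(docs).items():
#     print(f"{len(chunks)} chunk(s) from {url}")
```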